International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1730
ISSN 2229‐5518
HIERARCHICAL PRIVACY PRESERVING
DISTRIBUTED FREQUENT ITEMSET MINING OVER VERICALLY DISTRIBUTED DATASET
(HPPDFIM)
Fuad Ali moh. Alyarimi, Sonajharia Minz
Abstract— The process of defining association rules is dependent of frequent itemsets. Distributed data mining is a significant scenario due to the fact of large quantity of data and adaptive resource sharing between networks that provides computational strength. In this context data that is high in volume need to be distributed to various computational clients to perform data mining in distributed manner, which is scalable in regard to computation and process time. Henceforth it become a serious issue to protect the privacy of the data that distributed for mining. Here in this paper we explored a novel perturbation technique called connected guassian approach (CGA) to protect the data privacy that distributed and a distributed frequent itemset mining modal. The model here we proposed is referred as hierarchical privacy preserving distributed frequent itemset mining (HPPDFIM) that applied on vertically partitioned data tuples. The other underlying goal of the proposed model is that achieving privacy by using one way digestive standard during communication between distributed computational resources involved in DDM. The experiments revealed that the model is scalable and accurate unlike other perturbation standards. In a glance the goals of the proposed HPPDFIM is (i) protecting privacy during the distribution of the vertically partitioned data tuples even distributed computational resources compromised, (ii) achieving privacy during the conversations between participant distributed computational resources, (iii) and the frequent itemset mining should be done such that each node of the network that participating in mining process should not aware of the transaction elements of the other nodes that are involving in data mining process.
Index Terms— Anonymity data, Data Mining, Distributed Frequent Itemset Mining, guassian Perturbation, Perturbation Approach, privacy preserving data mining.
—————————— ——————————
xtracting the repeated patterns that are frequent than the desired threshold is an essential process of Data mining and knowledge discovery. Rapid progress in data
collection and promulgation stratagies demands the reformation of the traditional mining modals. Along side a vivid progress experiencing in technologies related to data mining and knowledge discovery. In relation to these two circumstances, it is apparent that there is considerable scope for further research. In regard to desired computational resources to achieve scalability in mining and knowledge discovery (DM&KD) from high volumes of data, the distributed systems are considerably significant. Using distributed systems in data mining, which refers as distributed data mining exploring the need of research in different dimensions. One of that highly prioritized research issue is handling data identity leakage and privacy leakages. Preserving data privacy in DM&KD is considered as part of the process, henceforth evaluating the traditional algorithms
————————————————
Fuad Ali Mohammed Al-Yarimi is currently pursuing Ph.D degree program in computer science in Jawaharlal Nehru University,New Delhi, India, E-mail: frindh@gmail.com
Sonajharia Minz, Profesor at School of Computer & Systems Science
Jawaharlal Nehru University, New Delhi 110067 – India, (email:
sonaminz@mail.jnu.ac.in)
scalability under association of privacy preserving techniques is seriously considered in earlier research [1][2][3].
The identity of an individual or of a transaction is clearly observable in a given transaction dataset. We are also obvious to observe the transactional scenarios, such as type of transaction, key attributes involved in that transactions, position and of those attributes in a transaction and conclusions from that transactionsset for decision making, future transaction forecasting from a given transaction dataset. Henceforth producing a transactional datset to third party data mining resources causes identity and privacy leakages.
In the context of literature, (i) if data allows the unautherized third party to discover confidential information, then it is threat called data inference, (ii) any of the privacy preserving model reorganize and modify the actual data to handle the datainference, such that actual state of the data and knowledge from that data remain concealed. These privacy preserving techniques are conceptually different for distributed data mining (DDM). This is due to more than one third party participating in data mining and data distributed to these third party may recurrent or unique. Henceforth the privacy preserving techniques used in the case of single third party would not optimal in DDM.
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1731
ISSN 2229‐5518
In our previous work [45] we considers the case when the data owner is unable to do knowledge extraction from a vast repository of data containing some sensitive information. In such a case the data miner may be a trusted party who may be allowed access to the data in its original form for knowledge extraction, so we didn’t do any anonymity or perturbation to the data before encrypt. The paper only aims to explore the use of cryptographic techniques for secured channel for privacy preservation under the consideration of the data miner is a trusted party.
In this regard here we propose a Hierarchical Privacy Preserving Distributed Frequent Itemset Mining over Vertically Distributed data (HPPDFIM-VD) under the consideration of the data miner is not a trusted party so we need to perturbate the data before send it to the data miner. The rest of the paper set as, first outline the recent proposals relevant to privacy preserving data mining, then we explore the proposed HPPDFIM-VD. Further we explore the performance analysis of the proposal, which followed by the conclusion and references.
The exploration of the frequently cited privacy preserving with perturbation and cryptography models is follow. Handling data inference using secure computation by multi- party, which is a cryptographic concept that also referred as Secure Malty Party Computation (SMPC) approach is optimal for high-end privacy [14], [15]. Henceforth any traditional mining algorithm under SMPC approach can preserve privacy for multiple users of the distributed environment [16]. Due to the obstacles such as computationally very expensive and failure to correlate hypothesis and expediency, these approaches are failed to be optimal.
In this regard considerable alternatives to SMPC have been
proposed in recent literature, which are specific to the data
mining task opted. The solutions devised in [14] are specific
to decision tree mining over horizontally partitioned data. In
the context of vertically partitioned data, a set of algorithms
projected in [15] [18], the algorithms devised in [17] are
specific to clustering approach. A data compression technique
was devised in [19] to enable privacy preserving under
collaborative distributed data mining and analysis. The
modals [14] [15] [17] [18] [19] are providing the entire data
such that it can’t compromise for data inference. But
untrusted parties may succed partially to infer the properties,
which may be sensitive to an individual or a transaction. In
this context, researchers coined a process called information
hiding via property perturbation that progress the process of
privacy preserving. In this regard the information hiding is
achieved by transforming attributes of the given dataset such
that it can be useful to apply data mining modals without
compromising at scalability of the mining results. This
information hiding modals follows either one the (i)
Anonymizing [20], [21], [22], [23], [24], [25] (ii) attribute
probability change [26], [27], [28] and (iii) Data perturbation
[29], [30], [31], [32], [33], [34], [35], [36].
Privacy preserved data set related oprations is another research dimension of knowledge discovery strategy. In this regard set of modals devised in [37][39][40][41][42], which is two party based privacy preserved intersecting , combining two or more sets on state of one field. A study [38] explored that these proposed protocols vulunerable to leak information. While most of these protocols are equality based, algorithms in [38] compute arbitrary join predicates leveraging the power of a secure coprocessor. The models devisd in [43] are using Tiny trusted devices in regard to secure function evaluation.
The perturbation techniques build by using either additive or matrix ultiplication approaches. Few of the interesting additive based perturbation techniques can be found in [29], [30], [32], [33], [35] and matrix multiplication models are [31][34]. These models are optimal for continuous data. In recent time a multilevel perturbation model devised in [44]. Along the limits explored these perturbation techniques are optimal only for centralized data mining. The other factors in regard to this perturbation methods are (i) protection is centric to attributes identity and privacy, (ii) the genune relations between attributes, which extracted as mining results still disclosed to third party mining resource. Henceforth here in this paper we propose a Connected guassian approach to preturbate the attributes of the dataset in regard to achieve the privacy preserving in distributed data mining on vertically partitioned data. In the aim of preserving the privacy of emerged mining results at mining resource, the proposed technique perturbating even the positions of the attributes. Ideally we would get complete zero knowledge, but for a practical solution restricted information disclosure may be suitable. Finally, we quantify the accuracy and the efficiency of the algorithm, in view of the security restrictions.
Hierarchical Privacy Preserving Distributed Frequent Itemset
Mining on Vertically Partitioned Dataset
The proposed HPPDFIM is a twofold model. In first fold the
total dataset perturbated using proposed Connected Gaussian
Approach (CGA), then the data will be partitioned vertically
in non linier order, and the same will be distributed across the
nodes. Here the proposed connected Gaussian approach
protects the data anonymity and integrity even nodes
compromised together and analyze. In second fold each node
performs tuple level itemset mining on given vertical tuple of
the dataset, which is in perturbated format. Then the nodes
perform oneway communication in the order of tuples
allotted. In the second fold of the process the node with first
vertically partitioned tuple sends tuple level frequent itemsets
tlfi to the node with second tuple of the vertically partitioned
data. Then the node that received tlfi from its predecessor
extract the multi tuple level frequent itemsets, here in this
case it uses the tlfi received from its predecessor and tlfi
found at that node,these multituple level frequent itemsets
can be referred as mtlfi. Then this mtlfi will be send to next node in the order. Then mltfi that received and tlfi of the
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1732
ISSN 2229‐5518
current node will be used to find mtlfi of the current node. The last node in the sequence then sends mtlfi that projected
guasian noise vector V of size
smax (here
smax indicates the
to data provider. Then the data provider removes the perturbstion from the received mtlfi, hence data provider can find final set of frequent itemsets.
Let D be the data base with E number of attributes and N number of transactions. The dataset D partitioned such that the N transactions data vertically tupled in nonleniar model and then send these tuples to nodes in a sequential order. Let D be the dataset having N transactions that are generated by set of
max support of any item in given transaction dataset) will be
generated such that the vector values must be in the range of 0 to
1 with fixed interval in ascending order.
Table 1: Hierarchical guassian noise vector preparation algorithm
Let smax be the max support of any item in given transaction dataset
Let gmin rand _ double(min, max)
Here gmin is minimum guassian perturbation threshold
attributes
{a1 , a2 , a3 ,.........., ae }
Let consider a distributed
Here min : 0.0 and max : 0.1
network environment with m number data collection sensors, which are collecting data for attributes of the geven dataset D . The given dataset D vertically partitioned between data
Let gmax 1 gmin
Here gmax is maximum guassian perturbation threshold
collection sensors (here after can be referred as node), such that
Let g
inc
(gmax gmin )
tuple of attributes
{a1 , a2 , a3}
in node
n1 , attributes
smax
{a4 , a5 ,....ae9 } belongs to node
n2 , attributes {ae9 , ae8}
Here
ginc
is guassian perturbation fixed increment
belongs node n3 , attributes {aei , ae j , eek } belongs to node nm1 and {ae j ,.....ae } belongs to node nm . The data collected for these attributes perturbated using proposed CGA and then
partitions this data vertically according to the attribute partitions.
And the same will be sent to appropriate nodes in the form of matrix.
In general scenario of perturbation approaches for multiparty, the nexus of two or more parties leads to the exploration of the original state of the data and as frequent itemsets their influences in the business scenario. In this regard here we refine the general perturbation approaches as Connected Perturbation Approach, which prevents the leakage of the actual data state and their influence as frequent itemsets in business scenario even two or more parties compramized.
The base idea of the connected perturbation approach is, applying strategic hierarchical swaping between elements in
threshold
Reset guassian noise vector V of size smax to empty
foreach i {i=0, 1, 2, 3,...., smax } Vi gmin (i * ginc )
Then these hierarchical guassian noise vector is used to perturbate the frequency of each item. Here we perturbate item names by adding fixed random string followed by the implication of a digestive algorithm such as MD5.
Table 2: Item position and frequency perturbation approach
For each item i I begin
s(i j ) Find _ frequency(i j ) //traverse the entire database
and count the total occurances of the item i j . End;
Intialize vector PS as empty
For each item i I begin
s(i )
transactions by their frequency. To protect the privacy in regard
to this swaping we perturbate their frequency by hierarchical guassian noises. The steps follows.
ps(i j )
i j
// Here ps(i j ) is perturbated support of the item
Vj
Let dataset D with n number of transactions
T {t1 , t 2 , t3 ,.......t n 2 , t n 1 , t n } . Each transaction t j is a set of items {i , i , i , ......, i } that are subset of total set of items
1 2 3 |t j|
I , which are using to generate dataset D . The data source authority decides support s of the frequency. The hierarchical
Foreach j such that { j 1.... | I |} PS ps(i j )
end
For each transaction ti T
begin
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1733
ISSN 2229‐5518
PS
max
end
foreach p {p=1.....|PS|/2} begin
Swap items with support PSp and items with support end
last node in sequence found the mtlfim . Then this mtlfim will be sent to the server (the process initiator).
Since mtlfim received from last node in sequence is representing all the itemsets in their digestive format, hence server communicates with all nodes participated in mining process and
collects the frequent itemsets in narmal format.
Table 3: Field Name Perturbation approach
Let create a random string rs of given threshold size rl;
For each item i I begin
pi MD5(i rs)
End
Forther this data vertically partitioned and distributed to the multyparties that are performing mining. The mining of frequent itemsets multiparies. The process of mining frequent itemsets on perturbated data at multiparies is explained in following sections
Since the dataset is partitioned vertically, the total number of transactions at each node are N . By applying any of the popular itemset mining models such as apriori, FP-tree or BIDE the frequent itemsets of the locally available transactions should be found first by each node. Once the frequent itemsets with given support threshold found, then a boolean matrix that represents tlfi should prepare such that rows represents frequent
itemsets, columns represents transactions. Each column of the tlfi should represent the presence of the frequent itemset in a given transaction, if frequent itemset r is existing in transaction c , then column r X c represents 1 otherwise 0. If the node is first node then this node farwards tlfi to its successor. If the
node is not the first node in sequence, then continues to find the
Let consider a dataset D with 6 transactions and 9 attributes. Let consider the coverage value 40% that represents support 3.6. As of the given dataset smin is 1 and smax is 4;
Let consider gmin as 0.0394; Then the gmax is 0.9606;
The ginc is 0.10673
The support of the items {A1, A2, A3, A4, A5, A6, A7, A8,A9}
are {4, 3, 4, 4, 3, 3, 4, 3, 3}
The perturbated support of items {A1, A2, A3, A4, A5, A6, A7, A8,A9} are {27.372, 11.864, 11.123, 8.5778, 5.235 , 4.413 ,
5.085 , 3.358 , 3.000} .
Order of items by perturubated support: {A9, A8, A6, A7, A5, A4, A3, A2, A1}. TO understand the process see table 4 & 5
Table 4: Actual dataset D
mtlfi , which described in following section.
1 2 3 4 5 6
7 8 9
In the sequence if a node
ni received
mtlfii 1
from its
predecessor, then the node ni prepares mtlfii as follows
mtlfii tlfii mtlfii 1 (tlfii * mtlfii 1 ) …….Eq(1)
Table 5: Position perturbated Dataset
If the node
ni is not the last node in the sequence, then it
farwards mtlfii to next node ni 1 . This process continues til the
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1734
ISSN 2229‐5518
Tuple of Attributes in node n1 :
{346162c577bf5cba11dd7ce1098e3c0a,
Field ID perturbation by salting random string and applying message digest technique:
Let consider rs is “v534$^gfjgfgSDFGH”
The resultant value after salting rs and applying MD5for field id
A1 is 346162c577bf5cba11dd7ce1098e3c0a A2 is eb4210359fd0cd219ce8dd775cb2459c A3 is c1e4e966ac715390a9986cd9c0ece2ee A4 is 0454a42022ef166a73092cd0d75f57d7
A5 is 26d8473d4235dc8cb6ad79868d3b00e2
A6 is 99c4b622f8a5f1ec594356f0091c9cbc
A7 is 7185b700b8eb8ea2380c65f7c2372bb2
A8 is 442c57912f12e956ba36758da2358ee1
A9 is 17cfb4826cbaabd131413f0f2d69f315
Hence the final structure of the data at source can be found in table 6
Table 6: Data representation at source after perturbating frequency, position and field ids
eb4210359fd0cd219ce8dd775cb2459c, c1e4e966ac715390a9986cd9c0ece2ee}
Tuple of Attributes in node n2 :
{0454a42022ef166a73092cd0d75f57d7
26d8473d4235dc8cb6ad79868d3b00e2}
Tuple of Attributes in node n3 :
{99c4b622f8a5f1ec594356f0091c9cbc
7185b700b8eb8ea2380c65f7c2372bb2
442c57912f12e956ba36758da2358ee1
17cfb4826cbaabd131413f0f2d69f315}
Transaction matrix prepared from the data collected for attributes tuple {a1 , a2 , a3} for 6 transactions at node n1 is
Transaction matrix prepared from the data collected for
attributes tuple {a4 , a5} for 6 transactions at node n2 is
Attribut es | 0454a42022ef166a73092 cd0d75f57d7 | 26d8473d4235dc8cb6ad7 9868d3b00e2 |
Transac tions | 0454a42022ef166a73092 cd0d75f57d7 | 26d8473d4235dc8cb6ad7 9868d3b00e2 |
T1 | 1 | 0 |
T2 | 0 | 1 |
T3 | 1 | 1 |
T4 | 1 | 0 |
T5 | 1 | 0 |
T6 | 0 | 1 |
Transaction matrix prepared from the data collected for
attributes tuple {a6 , a7 , a8 , a9 } for 6 transactions at node n3 is
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1735
ISSN 2229‐5518
Since node n1 is first node in the sequence it will not perform the process of determining mtlfi , and farwards this tlfi to
i
next node n 2
The
tlfi found at node
2
n 2 for frequent itemsets
({0454a42022ef166a73092cd0d75f57d7}
is
{26d8473d4235dc8cb6ad79868d3b00e2})
T 1 | T 2 | T 3 | T 4 | T 5 | T 6 | |
{0454a42022ef166a73092cd0d75 | 1 | 0 | 1 | 1 | 1 | 0 |
{26d8473d4235dc8cb6ad79868d3 | 0 | 1 | 1 | 0 | 0 | 1 |
The frequent itemsets found at node n1 are
({346162c577bf5cba11dd7ce1098e3c0a},
{eb4210359fd0cd219ce8dd775cb2459c},
{c1e4e966ac715390a9986cd9c0ece2ee},
{eb4210359fd0cd219ce8dd775cb2459c,c1e4e966ac715390a
Then at node n 2 the product of tlfin and tlfin will be found that referred as tlfin Xn
tlfin1 X n2 tlfin1 X tlfin2 :
Then the tlfi is as follow:
1
T 1 | T 2 | T 3 | T 4 | T 5 | T 6 | |
{346162c577bf5cba11dd7ce1098 | 1 | 0 | 1 | 0 | 0 | 1 |
{eb4210359fd0cd219ce8dd775cb | 1 | 0 | 0 | 1 | 0 | 1 |
{c1e4e966ac715390a9986cd9c0e | 1 | 0 | 0 | 1 | 0 | 1 |
{eb4210359fd0cd219ce8dd775cb c1e4e966ac715390a9986cd9c0ec | 1 | 0 | 0 | 1 | 0 | 1 |
Here above matrix represents
{346162c577bf5cba11dd7ce1098e3c0a},
{eb4210359fd0cd219ce8dd775cb2459c},
{c1e4e966ac715390a9986cd9c0ece2ee},
{eb4210359fd0cd219ce8dd775cb2459c,c1e4e966ac715390
,
which indicates the existence of an itemset in a particular transaction.
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1736
ISSN 2229‐5518
Since the node
n 2 is not last node of the sequence, hence
farwards this mtlfin2
to next node n3
The same process continues at node n3 and determined tlfin
and mtlfi matrices at node n are
n3 3
The matrix tlfi is:
3
The final tlfin Xn after support analysis is
T 1 | T 2 | T 3 | T 4 | T 5 | T 6 | |
c1e4e966ac715390a9986cd9c0ec e2ee X 0454a42022ef166a73092cd0d75f 57d7 | 1 | 0 | 0 | 1 | 0 | 1 |
Since the node n 2 is not the first node the sequence hence it constructs matrix mtlfi as follows
2
mtlfin2 tlfin2 mtlfin1 tlfin1Xn2
The matrix mltfi is
2
The matrix
(tlfi Xmtlfi ) is:
n3 n 2
T 1 | T 2 | T 3 | T 4 | T 5 | T 6 | |
99c4b622f8a5f1ec594356f0091c9c bc X {26d8473d4235dc8cb6ad79868d3 | 1 | 0 | 1 | 0 | 1 | 0 |
7185b700b8eb8ea2380c65f7c2372 bb2 X {26d8473d4235dc8cb6ad79868d3 | 1 | 0 | 1 | 1 | 1 | 0 |
442c57912f12e956ba36758da2358 ee1 X {c1e4e966ac715390a9986cd9c0e | 1 | 0 | 0 | 1 | 0 | 1 |
442c57912f12e956ba36758da2358 ee1 | 1 | 0 | 0 | 1 | 0 | 1 |
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1737
ISSN 2229‐5518
[EBF1CA419D2EA2BCF2A208C3701A30E9] | 4 |
[[796BDC6F731D811A5F57EB0C06EFEF50], [3048BEB67F69393F2F987E646F24194F]] | 3 |
[[36EC455DE71F46C98F3131176973972A], [6593D7B12FD418CDB35BBF438DE72F66]] | 3 |
[36EC455DE71F46C98F3131176973972A] | 4 |
[[CA034CA406E03D1010A118F5B3677518], [36EC455DE71F46C98F3131176973972A]] | 3 |
[8650E375EE80B2277A84FC9B85375E36] | 3 |
[[C6F2F93133905F75DA4B02CCC19AB66A], [EBF1CA419D2EA2BCF2A208C3701A30E9]] | 3 |
[[36EC455DE71F46C98F3131176973972A], [CA034CA406E03D1010A118F5B3677518, 6593D7B12FD418CDB35BBF438DE72F66]] | 3 |
[3048BEB67F69393F2F987E646F24194F] | 3 |
[6593D7B12FD418CDB35BBF438DE72F66] | 4 |
Since the node n3 is last node in the sequence, hence it sends
mtlfi as database level frequent itemsets to the server.
3
Then the server identifies the actual itemsets by removing the position and frequency perturbation that added at source level. The process explored below
Database level frequent itemsets with attribute representation is
The matrix
mtlfi is: [
3
mtlfin tlfin mtlfin (tlfin Xmtlfin ) ]
Frequent Itemsets after removing the perturbation is:
A4, A1 | 3 |
A5, A7 | 3 |
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1738
ISSN 2229‐5518
A3, A1 | 4 |
A8 | 3 |
A7 | 4 |
A4, A3 | 3 |
A3 | 4 |
A3, A1, A4 | 3 |
A5 | 3 |
A2 | 3 |
A2, A6 | 3 |
A1 | 4 |
A9 | 3 |
A6 | 3 |
A4 | 4 |
The analysis of the frequent items after removing perturbation in frequency and position from the resultant frequent itemsets indicates the accuracy in identifying the frequent itemsets even from perturbated data. Here in this analysis we consider the basic itemset mining algorithm but it is quite evident to use any frequent itemset mining algorithm such as fptree, bide to find tlfi at each mining node.
Here in this paper we explored a novel connected perturbation approach to preserve privacy in vertically partitioned distributed data mining. Most of the solutions currently available in recent literature either not compatible to distributed data mining or fail to avoid data leaks under nexus of one more data mining node authorities. In this regard the model that proposed here is perturbating actual dataset in connected manner. The proposed model that referred as “Hierarchical Privacy Preserving Distributed Frequent Itemset Mining Over Verically Distributed Dataset (HPPDFIM)” is perturbating the actual item frequency, postion and field id in connected manner. In future this model can be expanded to horizontally partitioned distributed data mining and multi level data mining.
[1]. W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: A review and open problems. In Proceedings of the 2001 New Security Paradigms Workshop, Cloudcroft, New Mexico, Sept. 11-13 2001.
[2]. S. J. Rizvi and J. R. Haritsa. Privacy-preserving association rule mining. In Proceedings of 28th International Conference on Very
Large Data Bases. VLDB, Aug. 20-23 2002.
[3]. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the
Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on
Principles of Database Systems, Santa Barbara, California, USA, May
21-23 2001. ACM.
[4]. D. W.-L. Cheung, V. Ng, A. W.-C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. Transactions on Knowledge and Data Engineering, 8(6):911{922, Dec. 1996.
[5]. M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In The
ACM SIGMOD Workshop on Research Issues on Data Mining and
Knowledge Discovery (DMKD'02), June 2 2002.
[6]. P. Chan. On the accuracy of meta-learning for scalable data mining.
Journal of Intelligent Information Systems, 8:5{28, 1997.
[7]. Prodromidis, P. Chan, and S. Stolfo. Meta-learning in distributed data mining systems: Issues and approaches, chapter 3. AAAI/MIT Press,
2000.
[8]. R. Chen, K. Sivakumar, and H. Kargupta. Distributed web mining using bayesian networks from multiple data streams. In The 2001
IEEE International Conference on Data Mining. IEEE, Nov. 29 - Dec.
2 2001.
[9]. R. Wirth, M. Borth, and J. Hipp. When distribution is part of the semantics: A new problem class for distributed knowledge discovery. In Ubiquitous Data Mining for Mobile and Distributed Environments workshop associated with the Joint 12th European Conference on Machine Learning (ECML'01) and 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), Freiburg, Germany, Sept.3-7 2001.
[10]. Y. Lindell and B. Pinkas. Privacy preserving data mining. In Advances in Cryptology { CRYPTO 2000, pages 36{54. Springer- Verlag, Aug. 20-24 2000.
[11]. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD Conference on Management of Data, Dallas, TX, May 14-19 2000. ACM.
[12]. C. Yao. How to generate and exchange secrets. In Proceedings of the
27th IEEE Symposium on Foundations of Computer Science, pages
162{167. IEEE, 1986.
[13]. O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game - a completeness theorem for protocols with honest majority. In
19th ACM Symposium on the Theory of Computing, pages 218{229,
1987.
[14]. Y. Lindell and B. Pinkas, “Privacy Preserving Data Mining,” Proc.
Int’l Cryptology Conf. (CRYPTO), 2000.
[15]. J. Vaidya and C.W. Clifton, “Privacy Preserving Association Rule Mining in Vertically Partitioned Data,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2002.
[16]. O. Goldreich, “Secure Multi-Party Computation,” Final (incomplete)
draft, version 1.4, 2002.
[17]. J. Vaidya and C. Clifton, “Privacy-Preserving K-Means Clustering over Vertically Partitioned Data,” Proc. ACM SIGKDD Int’l Conf.
Knowledge Discovery and Data Mining, 2003.
[18]. A.W.-C. Fu, R.C.-W. Wong, and K. Wang, “Privacy-Preserving
Frequent Pattern Mining across Private Databases,” Proc. IEEE Fifth
Int’l Conf. Data Mining, 2005.
[19]. B. Bhattacharjee, N. Abe, K. Goldman, B. Zadrozny, V.R.
Chillakuru, M.del Carpio, and C. Apte, “Using Secure Coprocessors for Privacy Preserving Collaborative Data Mining and Analysis,”
Proc. Second Int’l Workshop Data Management on New Hardware
(DaMoN ’06), 2006.
[20]. C.C. Aggarwal and P.S. Yu, “A Condensation Approach to Privacy Preserving Data Mining,” Proc. Int’l Conf. Extending Database Technology (EDBT), 2004.
[21]. Bertino, B.C. Ooi, Y. Yang, and R.H. Deng, “Privacy and Ownership Preserving of Outsourced Medical Data,” Proc. 21st Int’l Conf. Data Eng. (ICDE), 2005.
[22]. D. Kifer and J.E. Gehrke, “Injecting Utility Into Anonymized
Datasets,” Proc. ACM SIGMOD Int’l Conf. Management of Data,
2006.
[23]. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam, “L-Diversity: Privacy Beyond K-Anonymity,” Proc. Int’l Conf. Data Eng., 2006.
[24]. L. Sweeney, “K-Anonymity: A Model for Protecting Privacy,” Int’l J.
Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), vol. 10, pp. 557-570, 2002.
[25]. X. Xiao and Y. Tao, “Personalized Privacy Preservation,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2006.
[26]. R. Agrawal, R. Srikant, and D. Thomas, “Privacy Preserving OLAP,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2005.
[27]. W. Du and Z. Zhan, “Using Randomized Response Techniques for Privacy-Preserving Data Mining,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2003.
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April‐2013 1739
ISSN 2229‐5518
[28]. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke, “Privacy Preserving Mining of Association Rules,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, 2002.
[29]. D. Agrawal and C.C. Aggarwal, “On the Design and Quantification of Privacy Preserving Data Mining Algorithms,” Proc. 20th ACM
SIGMOD-SIGACT-SIGART Symp. Principles of Database Systems
(PODS ’01), pp. 247-255, May 2001.
[30]. R. Agrawal and R. Srikant, “Privacy Preserving Data Mining,” Proc.
ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’00),
2000.
[31]. K. Chen and L. Liu, “Privacy Preserving Data Classification with
Rotation Perturbation,” Proc. IEEE Fifth Int’l Conf. Data Mining,
2005.
[32]. Z. Huang, W. Du, and B. Chen, “Deriving Private Information From
Randomized Data,” Proc. ACM SIGMOD Int’l Conf. Management of
Data (SIGMOD), 2005.
[33]. Li, J. Sun, S. Papadimitriou, G. Mihaila, and I. Stanoi, “Hiding in the
Crowd: Privacy Preservation on Evolving Streams Through
Correlation Tracking,” Proc. IEEE 23rd Int’l Conf. Data Eng. (ICDE), 2007.
[34]. K. Liu, H. Kargupta, and J. Ryan, “Random Projection-Based
Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 1, pp. 92-106, Jan. 2006.
[35]. S. Papadimitriou, F. Li, G. Kollios, and P.S. Yu, “Time Series Compressibility and Privacy,” Proc. 33rd Int’l Conf. Very Large Data Bases (VLDB ’07), 2007.
[36]. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, “On the Privacy Preserving Properties of Random Data Perturbation Techniques,” Proc. IEEE Third Int’l Conf. Data Mining, 2003.
[37]. R. Agrawal, A. Evfimievski, and R. Srikant, “Information Sharing across Private Databases,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2003.
[38]. R. Agrawal, D. Asonov, M. Kantarcioglu, and Y. Li, “Sovereign
Joins,” Proc. 22nd Int’l Conf. Data Eng. (ICDE ’06), 2006.
[39]. C. Clifton, M. Kantarcioglu, X. Lin, J. Vaidya, and M. Zhu, “Tools for Privacy Preserving Distributed Data Mining,” ACM SIGKDD
Explorations, vol. 4, no. 2, pp. 28-34, 2003.
[40]. B.A. Huberman, M. Franklin, and T. Hogg, “Enhancing Privacy and
Trust in Electronic Communities,” Proc. First ACM Conf. Electronic
Commerce, pp. 78-86, Nov. 1999.
[41]. M. Freedman, K. Nissim, and B. Pinkas, “Efficient Private Matching and Set Intersection,” Advances in Cryptology—EUROCRYPT, vol.
3027, pp. 1-19, 2004.
[42]. L. Kissner and D. Song, “Privacy-Preserving Set Operations,” Proc.
Int’l Cryptology Conf. (CRYPTO), 2005.
[43]. Iliev and S. Smith, “More Efficient Secure Function Evaluation Using Tiny Trusted Third Parties,” Technical Report TR2005-551, Dept. of Computer Science, Dartmouth Univ., 2005.
[44]. Yaping Li, Minghua Chen, Qiwei Li, and Wei Zhang; "Enabling Multilevel Trust in Privacy Preserving Data Mining"; ieee transactions on knowledge and data engineering, vol. 24, no. 9,
september 2012
[45]. Fuad Ali Mohammed Al-Yarimi, Sonajharia Minz; “Multilevel
Privacy Preserving In Distributed Environment using Cryptographic
Technique”; Proceedings of the World Congress on Engineering 2012
Vol I WCE 2012, July 4 - 6, 2012, London, U.K.
IJSER © 2013 http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013
ISSN 2229-5518
1740
IJSER IS) 2013
http://www.ijser ora
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013
ISSN 2229-5518
1741
IJSER IS) 2013
http://www.ijser ora
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013
ISSN 2229-5518
1742
IJSER IS) 2013
http://www.ijser ora