Privacy of data, preserving in Data Mining
|
Author(s) |
Deepika Saxena |
|
KEYWORDS |
data mining, data privacy, preserving data mining, model, quantization approach, predictive information, PPDP, PPDM.
|
|
ABSTRACT |
Large volumes of detailed personal data are regularly collected, and sharing these data has proved beneficial for data mining applications. Such data include shopping habits, criminal records, medical history, credit records, etc. On the one hand, such data are an important asset to business organizations and governments, which analyze them for decision making. On the other hand, privacy regulations and other privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goal of privacy preservation and accurate clustering results. As a solution, we implemented a piecewise vector quantization approach: each row of the dataset is divided into segments, each segment is quantized using K-means, and the quantized segments are then rejoined to form a transformed dataset. Experimental results are presented that seek the segment size and quantization parameter giving the best tradeoff between clustering utility and data privacy for the input dataset.
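The piecewise quantization described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `kmeans` and `piecewise_vq`, the use of contiguous equal-width segments, one codebook per segment position, and the plain Lloyd's K-means with random initialization are all assumptions made for illustration. Here `segment_size` stands in for the segment size and `k` for the quantization parameter.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (illustrative); returns (centroids, labels)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    # Assumption: initialize centroids from k randomly chosen points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, _assign(points, centroids)


def piecewise_vq(rows, segment_size, k, seed=0):
    """Quantize each row segment by segment and rejoin the segments.

    Hypothetical helper: every segment position gets its own K-means
    codebook, and each segment is replaced by its nearest centroid.
    """
    n_cols = len(rows[0])
    transformed = [[] for _ in rows]
    for start in range(0, n_cols, segment_size):
        segments = [list(r[start:start + segment_size]) for r in rows]
        centroids, labels = kmeans(segments, k, seed=seed)
        for i, lab in enumerate(labels):
            transformed[i].extend(centroids[lab])
    return transformed


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 10.0, 20.0],
        [1.2, 2.1, 10.5, 19.0],
        [8.0, 9.0, 30.0, 40.0],
        [8.1, 9.2, 31.0, 41.0],
    ]
    for row in piecewise_vq(data, segment_size=2, k=2):
        print(row)
```

In this sketch a smaller `k` replaces more records with the same centroid values, which hides more of the original data but distorts it more. A `k` equal to the number of distinct segments returns the data unchanged. This is the kind of utility versus privacy tradeoff the experiments are said to explore.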
|
|