International Journal of Scientific and Engineering Research
ISSN Online 2229-5518
ISSN Print: 2229-5518
Website: http://www.ijser.org
IJSER, Volume 2, Issue 3, March 2011 Edition
Privacy of data, preserving in Data Mining
Deepika Saxena
data mining, data privacy, preserving data mining, model, quantization approach, predictive information, PPDP, PPDM.
Huge volumes of detailed personal data are regularly collected, and sharing these data has proved beneficial for data mining applications. Such data include shopping habits, criminal records, medical history, credit records, etc. On one hand, such data are an important asset to business organizations and governments, which analyze them for decision making. On the other hand, privacy regulations and other privacy concerns may prevent data owners from sharing information for data analysis. In order to share data while preserving privacy, the data owner must come up with a solution that achieves the dual goal of privacy preservation and accurate clustering results. As a solution, we implemented a piecewise vector quantization approach on the datasets: each row of a dataset is divided into segments, quantization is performed on each segment using K-means, and the quantized segments are then reunited to form a transformed dataset. Some experimental results are presented that try to find the values of segment size and quantization parameter that give the best tradeoff between clustering utility and data privacy in the input dataset.
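The piecewise quantization step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes that a "segment" is a contiguous group of attributes (columns) within a row, that one K-means codebook is trained per segment position, and that the quantization parameter is the number of centroids k. The function names `kmeans` and `piecewise_vq` and all parameters are hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns a list of centroids (the codebook)."""
    rng = random.Random(seed)
    k = min(k, len(points))  # cannot have more centroids than points
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def piecewise_vq(rows, segment_size, k, iters=20, seed=0):
    """Split each row into column segments, quantize each segment position
    with its own K-means codebook, and reassemble the transformed rows."""
    n_cols = len(rows[0])
    out = [[] for _ in rows]
    for start in range(0, n_cols, segment_size):
        # Assumption: segments are contiguous attribute groups of one row.
        segs = [tuple(r[start:start + segment_size]) for r in rows]
        codebook = kmeans(segs, k, iters, seed)
        for i, s in enumerate(segs):
            nearest = min(codebook, key=lambda c: _dist2(s, c))
            out[i].extend(nearest)  # replace the segment by its codeword
    return out


if __name__ == "__main__":
    data = [[1, 2, 10, 20], [1, 3, 11, 19], [9, 8, 50, 60], [8, 9, 52, 61]]
    for row in piecewise_vq(data, segment_size=2, k=2):
        print(row)
```

Under these assumptions, a smaller k replaces more rows with the same codeword (more distortion, hence more privacy), while a larger k keeps the transformed data closer to the original (better clustering utility). The segment size controls how many attributes are quantized jointly.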
