International Journal of Scientific and Engineering Research
ISSN Online 2229-5518
ISSN Print: 2229-5518
Website: http://www.ijser.org
IJSER, Volume 3, Issue 5, May 2012
Multi-Domain Record Matching over Query Results from Multiple Web Databases
pp. 1117-1122
P. Kowsiga, T. Mohanraj
Record Matching, UDD, Duplicates, Multi-Domain, SVM, N-Staged SVM, Hyperplanes
Record matching is the process of identifying duplicate records in web databases, and it is an important step in data integration. Earlier systems address record matching with the Unsupervised Online Record Matching method, UDD, which, for a given user query, can effectively identify duplicates among the query result records of multiple web databases. That matching is performed within a single domain, which yields only a limited number of non-duplicate results. This paper therefore proposes a multi-domain record matching process built on an algorithm called N-Staged SVM, which uses classifiers in an iterative process to separate duplicate records from non-duplicate records. A single domain can include multiple web databases, a single database can include multiple hyperplanes, and a single hyperplane can cover multiple data records, which the N-Staged SVM separates into duplicates and non-duplicates. The process is repeated for multiple domains by constructing hyperplanes for each domain. As a result, the user query is answered more efficiently and with more reliable results.
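The abstract does not spell out the N-Staged SVM algorithm, so the following is only an illustrative sketch of one possible reading, not the authors' implementation. It assumes each candidate record pair is already reduced to a vector of field similarities in [0, 1], that a few seed pairs carry labels (+1 duplicate, -1 non-duplicate), and that each stage retrains a linear SVM (one hyperplane) after absorbing the pairs the previous stage classified with confidence. All names (`train_linear_svm`, `n_staged_svm`, `match_domains`) and all parameter values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights w, bias b) of one hyperplane


def score(model: Model, x: Vector) -> float:
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_duplicate(model: Model, x: Vector) -> bool:
    """A pair is called a duplicate when it falls on the positive side."""
    return score(model, x) > 0


def train_linear_svm(samples: Sequence[Vector], labels: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     reg: float = 0.01) -> Model:
    """Sub-gradient descent on the L2-regularised hinge loss.

    Assumes a non-empty training set with labels in {+1, -1}.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * score((w, b), x) < 1:
                # Margin violated: shrink w and step towards the sample.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regulariser acts on w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def n_staged_svm(seed_samples: Sequence[Vector], seed_labels: Sequence[int],
                 unlabeled: Sequence[Vector], stages: int = 3,
                 confidence: float = 0.5) -> Model:
    """Assumed staging scheme: retrain after absorbing confident pairs."""
    samples, labels = list(seed_samples), list(seed_labels)
    pending = list(unlabeled)
    model = train_linear_svm(samples, labels)
    for _ in range(stages):
        still_pending = []
        for x in pending:
            s = score(model, x)
            if abs(s) >= confidence:
                samples.append(x)
                labels.append(1 if s > 0 else -1)
            else:
                still_pending.append(x)
        if len(still_pending) == len(pending):
            break  # nothing new was absorbed, so later stages would not change
        pending = still_pending
        model = train_linear_svm(samples, labels)
    return model


def match_domains(
    domains: Dict[str, Tuple[Sequence[Vector], Sequence[int], Sequence[Vector]]]
) -> Dict[str, Model]:
    """Build one staged classifier (one hyperplane) per domain."""
    return {name: n_staged_svm(*data) for name, data in domains.items()}
```

In this sketch `match_domains` takes a dictionary that maps a domain name to a triple of seed vectors, seed labels, and unlabeled vectors, and returns one hyperplane per domain. How the similarity vectors and the seed labels are obtained is outside the sketch.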
[1] R. Ananthakrishna, S. Chaudhuri, and V. Ganti, “Eliminating Fuzzy Duplicates in Data Warehouses,” Proc. 28th Int'l Conf. Very Large Data Bases, pp. 586-597, 2002.

[2] M. Bilenko and R.J. Mooney, “Adaptive Duplicate Detection Using Learnable String Similarity Measures,” Proc. ACM SIGKDD, pp. 39-48, 2003.

[3] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD, pp. 313-324, 2003.

[4] P. Christen, T. Churches, and M. Hegland, “Febrl - A Parallel Open Source Data Linkage System,” Advances in Knowledge Discovery and Data Mining, pp. 638-647, Springer, 2004.

[5] S. Chaudhuri, V. Ganti, and R. Motwani, “Robust Identification of Fuzzy Duplicates,” Proc. 21st IEEE Int'l Conf. Data Eng., pp. 865-876, 2005.

[6] H. Zhao, W. Meng, A. Wu, V. Raghavan, and C. Yu, “Fully Automatic Wrapper Generation for Search Engines,” Proc. 14th World Wide Web Conf., pp. 66-75, 2005.

[7] B. He and K.C.-C. Chang, “Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach,” ACM Trans. Database Systems, vol. 31, no. 1, pp. 346-396, 2006.

[8] Y. Zhai and B. Liu, “Structured Data Extraction from the Web Based on Partial Tree Alignment,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 12, pp. 1614-1628, Dec. 2006.

[9] A.K. Elmagarmid, P.G. Ipeirotis, and V.S. Verykios, “Duplicate Record Detection: A Survey,” IEEE Trans. Knowledge and Data Eng., vol. 19, no. 1, pp. 1-16, Jan. 2007.

[10] P. Christen, “Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification,” Proc. ACM SIGKDD, pp. 151-159, 2008.

[11] W. Su, J. Wang, and F.H. Lochovsky, “Record Matching over Query Results from Multiple Web Databases,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 4, pp. 578-589, 2010.
