International Journal of Scientific and Engineering Research
ISSN Online 2229-5518
ISSN Print: 2229-5518
Website: http://www.ijser.org
IJSER, Volume 3, Issue 4, April 2012
Improved Decision tree algorithm for data streams with Concept-drift adaptation
pp. 67-70
K. Ruth Ramya, R.S.S. Vishnu Priya, P. Panini Sai, N. Chandrasekhar
Adaptive learning strategies, Bayes classifier, Concept drift, Data streams, Decision trees, Discriminant analysis, VFDT
Decision tree construction is a well-studied problem in data mining, and mining streaming data has recently attracted much interest. Algorithms such as VFDT and CVFDT construct decision trees from data streams, but a new model has to be generated as new examples are added. This paper presents a decision tree construction algorithm that uses discriminant analysis to choose the cut point for splitting tests, reducing the time complexity from O(n log n) to O(n). It also analyzes several adaptive learning strategies (contextual, dynamic ensemble, forgetting, and detector approaches) and discusses how concept drift caused by gradual change in the data set can be handled using a naïve Bayes classifier at each inner node.
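The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of how a discriminant-analysis cut point can be found in a single pass, which is where the O(n) versus O(n log n) difference comes from: no sorting of attribute values is needed. It assumes a two-class problem and one numeric attribute modeled as a Gaussian per class; the names `RunningStats`, `discriminant_cut_point`, and `cut_point_from_stream` are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass (Welford) mean and variance for one class."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def discriminant_cut_point(a: RunningStats, b: RunningStats):
    """Return x where the prior-weighted Gaussian densities of the two
    classes are equal, or None if no usable cut point exists.
    Assumption: each class is modeled as a univariate Gaussian."""
    va, vb = a.variance, b.variance
    if va <= 0.0 or vb <= 0.0:
        return None
    total = a.n + b.n
    pa, pb = a.n / total, b.n / total
    # Coefficients of qa*x^2 + qb*x + qc = 0, from equating log densities.
    qa = 1.0 / (2.0 * vb) - 1.0 / (2.0 * va)
    qb = a.mean / va - b.mean / vb
    qc = (b.mean ** 2 / (2.0 * vb) - a.mean ** 2 / (2.0 * va)
          + math.log(pa / pb) + 0.5 * math.log(vb / va))
    if abs(qa) < 1e-12:  # equal variances: the equation is linear
        return -qc / qb if qb != 0.0 else None
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    candidates = ((-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa))
    midpoint = (a.mean + b.mean) / 2.0
    # Keep the root closest to the midpoint of the two class means.
    return min(candidates, key=lambda x: abs(x - midpoint))


def cut_point_from_stream(stream):
    """stream yields (value, label) pairs with label 0 or 1."""
    stats = (RunningStats(), RunningStats())
    for value, label in stream:
        stats[label].update(value)  # O(1) work per example, no sorting
    return discriminant_cut_point(stats[0], stats[1])
```

Because only a count, a mean, and a sum of squared deviations are kept per class, the statistics can be updated as each example arrives, and the cut point can be recomputed at any time without revisiting earlier examples.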