International Journal of Scientific and Engineering Research
ISSN Online: 2229-5518
ISSN Print: 2229-5518
Website: http://www.ijser.org
Volume 2, Issue 9, September 2011
Ontology Based Text Categorization - Telugu Documents
Author(s)
Mrs. A. Kanaka Durga, Dr. A. Govardhan
KEYWORDS
Concept-based model, IR, Ontology, Retrieval model, Term frequency, Text categorization, Telugu documents
ABSTRACT
In this paper, we introduce a new method of ontology-based text classification and retrieval for Telugu documents. Many text categorization techniques are based on word and/or phrase analysis of the text. Term frequency analysis indicates the importance of a term within a document. However, two terms within a document can have the same frequency even though one contributes more to the meaning of its sentence than the other. Our aim is to capture the semantics of a text. The proposed model captures the terms that represent the concepts in the text and thus identifies the topic of the document. We introduce a new concept-based model that analyzes terms at both the sentence and document levels. This model effectively discriminates between terms that are unimportant to sentence semantics and terms that carry the concepts representing the sentence meaning. The limitations of keyword-based search are overcome by the use of an ontology, which is a motivation for semantic IR. The retrieval model is based on an adaptation of the classic vector-space model. In a learning stage, each concept in the ontology is associated with related words and their weights, derived from pre-classified documents. In the main process, words and their mutual relations are extracted from the target documents, and the ontology is then used to map the target document. The paper gives a detailed description of the test results and explains how concept-based classification is superior to word-based classification for Telugu documents.
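The two-stage idea in the abstract (learn word weights per concept from pre-classified documents, then map a target document in a vector space) can be illustrated with a minimal sketch. This is not the authors' implementation: the relative-frequency weighting, the cosine scoring, the function names, and the romanized toy tokens are all assumptions chosen for illustration, and the sketch omits the morphological analysis and word-relation extraction that a real Telugu pipeline would need.

```python
import math
from collections import Counter, defaultdict


def learn_concept_weights(labeled_docs):
    """Learning stage (assumed scheme): weight each word by its relative
    frequency within the pre-classified documents of a concept."""
    counts = defaultdict(Counter)
    for concept, tokens in labeled_docs:
        counts[concept].update(tokens)
    weights = {}
    for concept, counter in counts.items():
        total = sum(counter.values())
        weights[concept] = {word: n / total for word, n in counter.items()}
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, weights):
    """Main process: build a term-frequency vector for the target document
    and pick the concept whose weight vector is most similar to it."""
    doc_vector = Counter(tokens)
    scores = {c: cosine(doc_vector, w) for c, w in weights.items()}
    best = max(scores, key=scores.get)
    # Return no label when the document shares no words with any concept.
    return (best if scores[best] > 0 else None), scores


# Hypothetical pre-classified documents (romanized toy tokens).
labeled_docs = [
    ("sports", ["cricket", "jattu", "cricket", "aata"]),
    ("politics", ["ennikalu", "mantri", "party", "ennikalu"]),
]
weights = learn_concept_weights(labeled_docs)
label, scores = classify(["jattu", "cricket", "mantri"], weights)
print(label)  # sports
```

In the target document above, all three tokens have the same term frequency, yet the concept weights let two of them pull the document toward one category. This is a simplified version of the contrast the abstract draws between raw word frequency and concept-level evidence.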