Ontology Based Text Categorization - Telugu Documents
|
Author(s) |
Mrs. A. Kanaka Durga, Dr. A. Govardhan |
|
KEYWORDS |
Concept-based model, IR, Ontology, Retrieval model, Term frequency, Text categorization and Telugu documents
|
|
ABSTRACT |
In this paper, we introduce a new method of ontology-based text classification and retrieval for Telugu documents. Many text categorization techniques are based on word and/or phrase analysis of the text. Term frequency analysis indicates the importance of a term within a document. However, two terms within a document can have the same frequency while one contributes more to the meaning of a sentence than the other. Our aim is to capture the semantics of a text. The model we developed captures the terms that represent the concepts in the text and thereby identifies the topic of the document. We introduce a new concept-based model that analyzes terms at both the sentence level and the document level. This concept-based model effectively discriminates between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The limitations of keyword-based search are overcome by the use of an ontology, which is a motivation for semantic IR. The retrieval model is based on an adaptation of the classic vector-space model. In a learning stage, each ontology concept is associated with related words and their weights, taken from pre-classified documents. In the main process, the words and their mutual relations are extracted from the target documents, and the ontology is used to map the target document. The paper gives a detailed description of the test results and explains how concept-based classification is superior to word-based classification for Telugu documents.
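The two stages described in the abstract (learning concept-word weights from pre-classified documents, then mapping a target document onto concepts in a vector-space model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of raw term counts as weights, cosine similarity as the matching score, and the transliterated sample tokens are all assumptions made for illustration, and the sketch leaves out the sentence-level analysis and the mutual relations between words that the paper describes.

```python
import math
from collections import Counter, defaultdict


def learn_concept_weights(labeled_docs):
    """Learning stage (assumed form): build one unit-length word-weight
    vector per concept from pre-classified, already tokenized documents.

    labeled_docs: iterable of (concept_label, list_of_tokens) pairs.
    """
    counts = defaultdict(Counter)
    for concept, tokens in labeled_docs:
        counts[concept].update(tokens)

    weights = {}
    for concept, word_counts in counts.items():
        norm = math.sqrt(sum(c * c for c in word_counts.values()))
        weights[concept] = {w: c / norm for w, c in word_counts.items()}
    return weights


def categorize(tokens, concept_weights):
    """Main process (assumed form): score a target document against every
    concept vector by cosine similarity and return the best concept."""
    tf = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in tf.values()))
    if norm == 0 or not concept_weights:
        return None, {}

    scores = {
        concept: sum(tf[w] * vec.get(w, 0.0) for w in tf) / norm
        for concept, vec in concept_weights.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical, transliterated Telugu tokens used only as sample data.
training = [
    ("sports", ["cricket", "aata", "jattu"]),
    ("politics", ["ennikalu", "prabhutvam", "mantri"]),
]
weights = learn_concept_weights(training)
best, scores = categorize(["cricket", "jattu", "varta"], weights)
print(best, scores)
```

In a real system for Telugu, the tokens would first pass through a morphological analyser so that inflected forms map to the same root before counting.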
|
|