Author Topic: Abstract— Reordering is important problem to be considered when translating betw  (Read 1941 times)

IJSER Content Writer

Author : Nyein Thwet Thwet Aung, Khin Mar Soe, Ni Lar Thein
International Journal of Scientific & Engineering Research Volume 2, Issue 9, September-2011
ISSN 2229-5518

Abstract— Natural Language Processing (NLP) has been developed to allow human-machine communication to take place in natural language. Word Sense Disambiguation (WSD) has always been a key problem in NLP. WSD is defined as the task of finding the correct sense of a word in a specific context. WSD methods fall into two main groups, supervised and unsupervised approaches, and supervised approaches have obtained better results than unsupervised ones. There is no cited work on resolving the ambiguity of words in the Myanmar language. The Naïve Bayesian (NB) classifier is known as one of the best methods among supervised approaches to WSD. In this paper, we use a Naïve Bayesian classifier to disambiguate ambiguous Myanmar words with the part-of-speech ‘noun’ and ‘verb’. The system uses a Myanmar-English parallel corpus as training data. The WSD module developed here will be used as a complement to improve a Myanmar-English machine translation system. As an advantage, the system can improve the accuracy of Myanmar-to-English translation.
Index Terms— Myanmar-English machine translation, Myanmar-English parallel corpus, Naïve Bayes Classifier, Natural Language Processing, supervised approach, unsupervised approach, Word Sense Disambiguation.

1   INTRODUCTION                                                                      
Word sense disambiguation (WSD) is one of the most critical and widely studied Natural Language Processing (NLP) tasks. It is used to increase the success rates of NLP applications such as machine translation, information search and information extraction, natural language understanding (for example, man-machine conversation systems and interrogator-responder systems), automatic text proofreading, speech recognition, sound-character transformation, syntactic structure recognition and language study [8].
   WSD can be defined as the process of identifying the correct sense or meaning of a word in a particular context. When a human being encounters a word with multiple senses, he or she easily identifies the intended sense with the help of the context, without giving a single thought to the other senses. When a computer is given the same situation, correctly identifying the desired sense is not an easy task. The WSD process helps to resolve such ambiguity [1]. Sometimes a word differs in meaning when its Part-of-Speech (POS) is different. For example, butter can be a verb or a noun, as the following sentences show:
Will you spread butter [Noun] on toast? 
Don't think you can butter [Verb] me up that easily. 
In the first sentence, butter as a noun means “a solid yellow food made from milk or cream” [3], while in the second sentence butter as a verb means “to say nice things to someone so that they will do what you want” [3]. Because such ambiguities can easily be resolved with the help of POS, WSD does not deal with such words. A word that has different meanings under the same POS needs a WSD process to determine the accurate sense. For example, chair in English can be “a separate seat for one person” or “the person in charge of a meeting or an organization”.
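
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sense inventory SENSES and the helper needs_wsd are hypothetical names introduced here, not part of the system described in the paper. A POS tag alone settles butter, while chair as a noun still has two candidate senses and therefore needs WSD.

```python
# Hypothetical toy sense inventory keyed by (word, part of speech).
# The glosses are the ones quoted in the text above.
SENSES = {
    ("butter", "noun"): ["a solid yellow food made from milk or cream"],
    ("butter", "verb"): [
        "to say nice things to someone so that they will do what you want"
    ],
    ("chair", "noun"): [
        "a separate seat for one person",
        "the person in charge of a meeting or an organization",
    ],
}


def needs_wsd(word, pos):
    """Return True when the POS tag alone leaves more than one candidate sense."""
    return len(SENSES.get((word.lower(), pos), [])) > 1


if __name__ == "__main__":
    for word, pos in [("butter", "noun"), ("butter", "verb"), ("chair", "noun")]:
        print(word, pos, needs_wsd(word, pos))
```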

Three main approaches have been applied in the WSD field: knowledge-based approaches, corpus-based approaches and hybrid approaches. Knowledge-based approaches rely on the information provided by Machine Readable Dictionaries (MRD). Corpus-based approaches can be divided into two types, supervised and unsupervised learning approaches. Supervised learning approaches use information gathered from training on a corpus that has been sense-tagged for semantic disambiguation. The classification approach to WSD makes use of statistical methods, either referring to lexicons or using a corpus for training. Thesauri, lexicons and corpora are the main sources of training data in the supervised approach. Unsupervised learning approaches determine the class membership of each object to be classified in a sample without using sense-tagged training examples. The hybrid approach combines aspects of the aforementioned methodologies [11].
All of the approaches mentioned above have been used by different researchers for different languages. Among them, corpus-based approaches select the sense of a target word using statistical information that is automatically extracted from corpora. The corpus-based method is one of the successful lines of research on WSD. In this paper, we focus on implementing a WSD process for the Myanmar language. We aim at an application of WSD for machine translation (MT), where the system has to select the correct translation equivalent in the target language for a polysemous item in the source language. The current work is an initial step towards resolving the ambiguity of words in the Myanmar context. The technique implemented to resolve ambiguity is Bayesian classification.
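
To make the idea of Bayesian classification for sense selection concrete, the following is a minimal sketch of a Naïve Bayes sense classifier. It picks the sense s that maximizes P(s) multiplied by the product of P(w | s) over the context words w. It is not the authors' implementation: the bag-of-words context features, the add-one (Laplace) smoothing, the function names train and disambiguate, and the tiny English training set for chair are all assumptions made for illustration. In a machine translation setting, the sense labels could equally be translation equivalents in the target language.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from sense-tagged examples: (context_words, sense)."""
    sense_counts = Counter()            # how often each sense occurs
    word_counts = defaultdict(Counter)  # context word counts per sense
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def disambiguate(context, model):
    """Return the sense with the highest log P(s) + sum of log P(w | s)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior
        # Add-one smoothing (an assumption) avoids zero probabilities.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical sense-tagged contexts for the English word "chair".
EXAMPLES = [
    (["sit", "wooden", "table"], "seat"),
    (["sit", "comfortable", "room"], "seat"),
    (["meeting", "committee", "elected"], "chairperson"),
    (["organization", "meeting", "appointed"], "chairperson"),
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(disambiguate(["committee", "meeting"], model))
    print(disambiguate(["sit", "table"], model))
```

Working in log space keeps the product of many small probabilities from underflowing; the smoothing choice and feature set would need to be tuned against real sense-tagged data.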

The remainder of this paper is organized as follows. We discuss related work in Section 2. Section 3 shows the ambiguity of the Myanmar language. Sections 4 and 5 describe Naïve Bayesian classification and the Naïve Bayesian classifier for WSD. Section 6 gives an overview of the proposed system. Section 7 discusses the execution of the proposed WSD algorithm, and Section 8 shows the implementation of the system. The evaluation of the system is described in Section 9. The paper is concluded in Section 10.

Word   No. of Senses   Sense 1   Sense 2   Sense 3

2   RELATED WORK
Many researchers have worked on word sense disambiguation for the English language. In the research reported in this paper, we emphasize the ambiguity of Myanmar words because it remains an open problem in machine translation. In the following paragraphs, we briefly discuss some of the related work and history in the area of Word Sense Disambiguation.
Cuong Anh Le and Akira Shimazu (2004) obtained high WSD accuracy using a Naïve Bayesian classifier with rich features [2]. Ishizaki (2006) developed a word sense disambiguation system using modified Bayesian algorithms for the Indonesian language [9]. Samir Elmougy, Taher Hamza and Hatem M. Noaman (2008) discussed a rooting algorithm with a Naïve Bayes classifier for Arabic Word Sense Disambiguation [10]. Farag Ahmed and Andreas Nurnberger (2008) proposed Arabic/English word translation disambiguation using parallel corpora and matching schemes [4].
Yu Zheng-tao, Deng Bin, Hou Bo, Han Lu and Guo Jian-yi (2009) discussed word sense disambiguation based on a Bayes model and information gain [13]. Asma Naseer and Sarmad Hussain (2009) proposed supervised Word Sense Disambiguation for Urdu using Bayesian classification [1]. Zhang Zheng and Zhu Shu (2009) presented a new approach to Word Sense Disambiguation in MT systems [14]. Laroussi Merhbene, Anis Zouaghi and Mounir Zrigui (2010) discussed the disambiguation of ambiguous Arabic words [7]. They used a context-matching algorithm, and their system achieved a precision of 78% and a recall of 65%, using roots and signatures to identify each sense.

Myanmar is the official language of the Union of Myanmar. It is written from left to right with no spaces between words, although informal writing often contains spaces after each clause. Its script is a syllabic alphabet written in circular shapes, and it has a sentence boundary mark. It is a free-word-order language that usually follows subject-object-verb (SOV) order. In particular, prepositional adjuncts can appear in several different places in the sentence. English, by contrast, has a rigid subject-verb-object (SVO) order.
Like English, the Myanmar language has a semantic ambiguity problem. Although statistical methods have been very successful for some important problems in Myanmar Natural Language Processing, such as Part-of-Speech tagging, segmentation and alignment of parallel translations, an effective method for solving the semantic ambiguity problem does not exist yet. Tables 1 and 2 show some examples of ambiguous Myanmar nouns and verbs and their senses.