Topic: Automatic Reordering Rule Generation Based On Parallel Tagged Aligned Corpus for


IJSER Content Writer
Author : Thinn Thinn Wai, Tin Myat Htwe, Ni Lar Thein
International Journal of Scientific & Engineering Research Volume 2, Issue 9, September-2011
ISSN 2229-5518

Abstract— Reordering is an important problem when translating between language pairs with different word orders. Myanmar is a verb-final language, so reordering is needed when it is translated into languages whose word order differs from that of Myanmar. In this paper, automatic reordering rule generation for Myanmar-English machine translation is presented. To generate reordering rules, a Myanmar-English parallel tagged aligned corpus is first created. Reordering rules are then generated automatically using the linguistic information in this corpus. Function tag and part-of-speech tag reordering rule extraction algorithms are proposed for this purpose. These algorithms can be used for other language pairs that need reordering, because rule generation depends only on part-of-speech tags and function tags.
Index Terms— Constituent Analysis, English-Myanmar Machine Translation, Parallel Tagged Aligned Corpus, Reordering, Syntactic Analysis

1   INTRODUCTION
The goal of statistical machine translation is to translate an input word sequence in the source language into a word sequence in the target language. To improve the translation process, preprocessing steps can be performed on both the source and target language sequences before training and translation. Reordering is one of the major problems in machine translation, since different languages have different word order requirements. When a Myanmar sentence is translated into an English sentence, the verb in the Myanmar sentence must be moved after the subject of the English sentence to obtain the correct word order. On a sub-sentential level, Myanmar word order diverges from English mostly within the noun phrase and the verb phrase. Moreover, the Myanmar language has many particles that support nouns, adjectives, and verbs: subject marker particles, object marker particles, adjective support particles, and verb support particles. These particles do not exist in English, and losing them can cause translation errors. Each particle therefore needs to be moved to its respective place, such as beside a noun or a verb. Moving these particles to their respective places is essential to alleviate the problem of missing tags. Without reordering, the particles can end up far from their related nouns, verbs, and adjectives, so the correct word order cannot be obtained, and neither can a meaningful translation. Reordering is therefore necessary for translation from Myanmar to English. In this work, a corpus creation procedure and reordering rule generation procedures are proposed for Myanmar-English statistical machine translation.
   The plan of this paper is as follows. In the next section, related work that uses reordering approaches in a preprocessing step is reviewed. Section 3 presents the significant differences in word order between English and Myanmar. Section 4 describes the analysis steps and corpus creation. In Section 5, the proposed reordering rule extraction algorithm and the reordering rules are explained in detail. In the last two sections, the experiments are reported, and then we conclude and discuss future work, respectively.

Different approaches have been developed to deal with the word order problem. The first approaches worked by constraining reordering at decoding time [7]. In [12], the alignment model introduced restrictions on word order, which also leads to restrictions at decoding time. A comparison of these two approaches can be found in [2]. What they have in common is that they do not use any syntactic or lexical information; they therefore rely on a strong language model or on long phrases to get the right word order. Other approaches were introduced that use more linguistic knowledge, for example bitext grammars that allow parsing of the source and target language [13]. In [10], syntactic information was used to rerank the output of a translation system, with the idea of accounting for different reorderings at this stage. In [11], a lexicalized block-oriented reordering model is proposed that decides for a given phrase whether the next phrase should be oriented to its left or right.
The most recent and very promising approaches reorder the source sentences based on rules learned from an aligned training corpus with a POS-tagged source side [8, 9, 20]. These rules are then used to reorder the word sequence in the most likely way.
   In our approach we follow the idea proposed in [20] of using a parallel training corpus with a tagged source side to extract rules which allow a reordering before the translation task.
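The general idea of learning reordering rules from an aligned corpus with a tagged source side can be illustrated with a minimal Python sketch. This is not the algorithm from the paper or from [20]; the alignment format (one target position per source token), the function names, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter

def extract_rule(src_tags, alignment):
    """Return (tag pattern, permutation) for one aligned sentence or chunk.

    alignment[i] is the target-side position of source token i
    (a hypothetical one-to-one alignment format).
    """
    order = sorted(range(len(src_tags)), key=lambda i: alignment[i])
    return tuple(src_tags), tuple(order)

def collect_rules(corpus):
    """Count every rule that actually changes the source order."""
    counts = Counter()
    for src_tags, alignment in corpus:
        pattern, perm = extract_rule(src_tags, alignment)
        if perm != tuple(range(len(perm))):
            counts[(pattern, perm)] += 1
    return counts

def apply_rule(tokens, perm):
    """Reorder tokens so position k holds the token at source index perm[k]."""
    return [tokens[i] for i in perm]

# Toy corpus (invented): source tag sequence plus target positions.
corpus = [
    (["NN.Person", "JJ"], [1, 0]),
    (["NN.Person", "JJ"], [1, 0]),
    (["JJ"], [0]),
]
rules = collect_rules(corpus)
```

Counting how often each pattern and permutation pair occurs gives a simple way to pick the most likely reordering for a tag sequence when several candidates compete.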

When a Myanmar sentence is translated into an English sentence, many differences in word order can be found. In this section, two significant word order differences, adjective movement and adverb movement, are described.
Some adjectives (JJ) in a noun chunk (NC) of a Myanmar sentence must be moved before their related noun (NN.Person) to obtain the correct English order. For example, when the Myanmar phrase “[Myanmar text]” is translated into the English phrase “rich man”, the adjective “[Myanmar text] (JJ)” must be moved before its related noun “[Myanmar text] (NN.Person)”. This can be seen in Example (1).
Example (1),
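The adjective movement described for Example (1) can be sketched in Python as a single swap rule over a tagged noun chunk. This is only an illustration under stated assumptions: the English glosses stand in for the Myanmar words, the function name is hypothetical, and the rule is applied to every NN.Person followed by JJ, whereas the text says only some adjectives need to move.

```python
def move_adjective_before_noun(chunk):
    """Swap each NN.Person that is immediately followed by a JJ.

    chunk is a list of (word, tag) pairs in Myanmar source order.
    """
    out = list(chunk)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NN.Person" and out[i + 1][1] == "JJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out

# English glosses used as placeholders for the Myanmar words.
source_chunk = [("man", "NN.Person"), ("rich", "JJ")]
reordered_chunk = move_adjective_before_noun(source_chunk)
```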

Myanmar is also a language in which modifiers and adjuncts precede the words they modify. These adjuncts therefore need to be moved after their related verb to produce the correct word order in the English sentence. When the Myanmar sentence “[Myanmar text]” is translated into the English sentence “He runs quickly.”, the adverb of manner “[Myanmar text]” must be moved behind its related verb “[Myanmar text]” to fit the correct English order. This adverb movement can be seen in Example (2).
Example (2),
In this example, the verb particle POS tag (Sf.Dec) also needs to be moved beside its related verb so that the meaning of the Myanmar word is not lost. Therefore, the POS tags “VB.Common” and “Sf.Dec” in the Myanmar phrase are combined to form a single tag, “VB.Common”, in the English phrase.
For all of the reasons above, word reordering is needed for Myanmar-English statistical machine translation.
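The two operations described for Example (2), merging the verb particle into its verb and moving the adverb behind the verb, can be sketched in Python as follows. The tags "PRN" and "RB", the "+" joining convention, the placeholder word "<dec>", and the function name are hypothetical; only "VB.Common" and "Sf.Dec" come from the text, and English glosses stand in for the Myanmar words.

```python
ADVERB_TAG = "RB"  # hypothetical tag for an adverb of manner

def merge_particle_and_move_adverb(tagged):
    """Merge VB.Common + Sf.Dec, then move a preceding adverb after the verb."""
    # Step 1: fold each Sf.Dec particle into the VB.Common token before it.
    merged = []
    for word, tag in tagged:
        if tag == "Sf.Dec" and merged and merged[-1][1] == "VB.Common":
            verb_word, _ = merged[-1]
            merged[-1] = (verb_word + "+" + word, "VB.Common")
        else:
            merged.append((word, tag))

    # Step 2: swap each adverb that sits immediately before a verb.
    out = list(merged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == ADVERB_TAG and out[i + 1][1] == "VB.Common":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out

# Glosses in assumed Myanmar order: he quickly run <declarative particle>
source_sentence = [("he", "PRN"), ("quickly", "RB"),
                   ("run", "VB.Common"), ("<dec>", "Sf.Dec")]
reordered_sentence = merge_particle_and_move_adverb(source_sentence)
```

Merging the particle first keeps it attached to its verb during the swap, so the adverb movement never separates the two.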
