Building Bilingual Corpus based on Hybrid Approach for Myanmar-English Machine Translation
Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein
EM Algorithm, IBM Models, Machine Translation, Word-aligned Parallel Corpus, Natural Language Processing
Word alignment in bilingual corpora has been an active research topic in the machine translation research community. In this paper, we describe an alignment system that aligns English-Myanmar texts at the word level in parallel sentences. A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other, and aligning translated segments with their source segments is essential for building such corpora. Because word alignment research on the Myanmar and English language pair is still in its infancy, this is not a trivial task for Myanmar-English text. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. The proposed approach is a combination of a corpus-based approach and a dictionary lookup approach. The corpus-based approach is based on the first three IBM models and the Expectation Maximization (EM) algorithm. For the dictionary lookup approach, the proposed system uses a bilingual Myanmar-English dictionary.
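The hybrid idea described above can be illustrated with a minimal sketch. It is not the system's implementation: it trains only IBM Model 1 (the simplest of the three IBM models mentioned) with EM, omits the NULL word, and assumes that a dictionary match takes priority over the statistical estimate, which is one plausible way to combine the two approaches. The function names `train_ibm1` and `align`, the translation direction t(e|m), and the tokens `A`, `H`, `B`, `E`, `X` (hypothetical stand-ins for Myanmar words) are all assumptions made for illustration.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(e|m) with EM for IBM Model 1 (no NULL word).

    pairs: list of (myanmar_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (english_word, myanmar_word) to a probability.
    """
    english_vocab = {e for _, eng in pairs for e in eng}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, m) links
        total = defaultdict(float)  # expected count of links from m
        # E-step: distribute each English word over the Myanmar words
        # of the same sentence pair, in proportion to the current t.
        for mya, eng in pairs:
            for e in eng:
                z = sum(t[(e, m)] for m in mya)
                for m in mya:
                    c = t[(e, m)] / z
                    count[(e, m)] += c
                    total[m] += c
        # M-step: renormalise the expected counts per Myanmar word.
        for (e, m), c in count.items():
            t[(e, m)] = c / total[m]
    return dict(t)


def align(mya, eng, t, dictionary):
    """Link each English word to one Myanmar word in the sentence.

    A dictionary match is preferred (assumed combination rule);
    otherwise the Myanmar word with the highest t(e|m) is chosen.
    Returns a list of (myanmar_index, english_index) links.
    """
    links = []
    for j, e in enumerate(eng):
        hits = [i for i, m in enumerate(mya) if e in dictionary.get(m, ())]
        if hits:
            links.append((hits[0], j))
        else:
            best = max(range(len(mya)), key=lambda i: t.get((e, mya[i]), 0.0))
            links.append((best, j))
    return links


# Toy corpus with placeholder tokens; not real Myanmar data.
toy_pairs = [
    (["A", "H"], ["the", "house"]),
    (["A", "B"], ["the", "book"]),
    (["E", "B"], ["a", "book"]),
]
toy_t = train_ibm1(toy_pairs)

# "X" never occurs in the toy corpus, so only the dictionary can link it.
toy_dictionary = {"X": {"pen"}}
toy_links = align(["A", "X"], ["the", "pen"], toy_t, toy_dictionary)
```

In this sketch the dictionary covers a word that the corpus statistics have never seen, while the EM estimates handle the words that do occur in the training pairs. A fuller treatment would add the NULL word and the distortion and fertility parameters of IBM Models 2 and 3.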
