Building Bilingual Corpus based on Hybrid Approach for Myanmar-English Machine Translation


International Journal of Scientific and Engineering Research

ISSN Online: 2229-5518

ISSN Print: 2229-5518

Website: http://www.ijser.org

IJSER >> Volume 2, Issue 9, September 2011



Author(s)

Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein

KEYWORDS

EM Algorithm, IBM Models, Machine Translation, Word-aligned Parallel Corpus, Natural Language Processing

Word alignment in bilingual corpora has been an active research topic in the machine translation community. In this paper, we describe a system that aligns English-Myanmar texts at the word level in parallel sentences. Aligning translated segments with their source segments is essential for building parallel corpora. Because word alignment research on the Myanmar-English language pair is still in its infancy, this is not a trivial task for Myanmar-English text. A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. The proposed approach is a combination of a corpus-based approach and a dictionary lookup approach. The corpus-based approach is based on the first three IBM models and the Expectation Maximization (EM) algorithm. For the dictionary lookup approach, the proposed system uses the bilingual Myanmar-English Dictionary.
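To make the hybrid idea concrete, the following is a minimal sketch, not the authors' implementation. It covers only IBM Model 1 (the paper uses the first three IBM models), omits the NULL word for brevity, and assumes that a dictionary match takes priority over the statistical estimate; the abstract does not say how the two sources are combined. The function names, the toy sentence pairs, and the `mm_*` tokens (placeholders standing in for segmented Myanmar words) are all hypothetical.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate translation probabilities t[(e, m)] = t(e | m) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Simplified IBM Model 1: no NULL source word, no smoothing.
    """
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, m) links
        total = defaultdict(float)  # expected count of m being linked
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, m)] for m in src)
                for m in src:
                    delta = t[(e, m)] / norm
                    count[(e, m)] += delta
                    total[m] += delta
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(e, m): c / total[m] for (e, m), c in count.items()}
        )
    return t


def align(src, tgt, t, dictionary):
    """Return (source_index, target_index) links for one sentence pair.

    Assumption: a bilingual-dictionary hit overrides the statistical
    model; otherwise the source word with the highest t(e | m) is chosen.
    """
    links = []
    for j, e in enumerate(tgt):
        hits = [i for i, m in enumerate(src) if e in dictionary.get(m, ())]
        if hits:
            i = hits[0]
        else:
            i = max(range(len(src)), key=lambda k: t[(e, src[k])])
        links.append((i, j))
    return links


# Hypothetical toy corpus; "mm_*" tokens are placeholders, not real data.
pairs = [
    (["mm_the", "mm_house"], ["the", "house"]),
    (["mm_the", "mm_book"], ["the", "book"]),
    (["mm_a", "mm_book"], ["a", "book"]),
]
dictionary = {"mm_a": {"a"}}  # hypothetical bilingual dictionary entry

t = train_ibm1(pairs)
links = align(["mm_a", "mm_book"], ["a", "book"], t, dictionary)
```

In this sketch the dictionary resolves the link for "a" directly, while "book" falls back to the EM estimate. A fuller system would add the fertility and distortion parameters of IBM Models 2 and 3, which are left out here.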


