International Journal of Scientific and Engineering Research
ISSN Online 2229-5518
ISSN Print: 2229-5518
Website: http://www.ijser.org
IJSER >> Volume 2, Issue 10, October 2011 Edition
Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition
Author(s)
Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun
KEYWORDS
Bilingual OCR, Machine Printed, Myanmar-English Scripts, SVM
ABSTRACT
The growing development of digital libraries worldwide raises many new challenges for document image analysis research and development. Digital libraries store a wide variety of document images, for example cultural, technical, or historical documents written in many languages, and this calls for further advances in present-day digital image analysis systems. When a digital library holds science and technology documents, its OCR system needs to be bilingual, because most such documents are written in Myanmar in combination with English letters. This paper proposes a bilingual OCR system that simultaneously recognizes printed English and Myanmar text, including a segmentation mechanism that handles the overlapping nature of Myanmar script. The effectiveness of the proposed mechanism is demonstrated by experimental results: segmentation accuracy rates, a comparison of feature extraction methods, and overall accuracy rates.