Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition
|
|
|
Author(s) |
Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun |
|
KEYWORDS |
Bilingual OCR, Machine Printed, Myanmar-English Scripts, SVM
|
|
ABSTRACT |
The growth of digital libraries worldwide raises many new challenges for document image analysis research and development. Digital libraries store a wide variety of document images, for example cultural, technical, or historical documents written in many languages, and this variety places new demands on present-day digital image analysis systems. When a digital library holds science and technology documents, its OCR system must be bilingual, because most such documents are written in Myanmar combined with English letters. This paper proposes a bilingual OCR system that simultaneously recognizes printed English and Myanmar text, including a segmentation mechanism that handles the overlapping nature of Myanmar script. The effectiveness of the proposed mechanism is demonstrated by experimental results: segmentation accuracy rates, a comparison of feature extraction methods, and overall accuracy rates.
|
|