Numeral Handwritten Hindi/Arabic Numeric Recognition Method

Mohamed H. Ghaleb1, Loay E. George2, and Faisel G. Mohammed3

Department of Computer Science, College of Science, University of Baghdad, Iraq

Abstract: Handwritten numeral recognition plays a vital role in postal automation services. The major problem in handwriting recognition is the huge variability and distortion of patterns. The aim of the current research work is to develop a heuristic-based method that achieves good recognition efficiency for free handwritten numeral objects. In this research, the introduced method for extracting features from patterns is based on (i) the percentage of strokes in both the horizontal and vertical directions and (ii) some morphological operations. The proposed method gives good recognition results: the attained recognition rate is 98.15% on a test set of 4500 samples.

Keywords: Artificial intelligence, pattern recognition, image segmentation, character recognition, handwritten recognition.

1 INTRODUCTION

Handwritten recognition has attracted many researchers across the world [1, 9, 10]. The problem of automatic recognition of handwritten text, as opposed to machine-printed text, is a complex one, especially for cursive-based languages. Several researchers have introduced algorithms for character recognition for different languages such as English, Chinese, Japanese, and Arabic [2, 4, 5].
A typical Optical Character Recognition (OCR) system consists of the following phases: preprocessing, segmentation, feature extraction, classification, and recognition. The output of each stage is used as the input of the next stage. The preprocessing stage consists of many adjustment operations for slant correction, normalization, and thickening. Many new methods have been introduced for the purpose of feature extraction [3, 5].
Most of the Indian scripts are distinguished by the presence of matras (or character modifiers) in addition to the main characters, while the English script has no matras. Therefore, the algorithms developed specifically for English are not directly applicable to Indian scripts [6].

2 RELATED WORKS

In recent years some researchers have developed computational intelligence models for accurate recognition of Arabic text. Al-Omari [7] used an average template matching approach for recognizing Arabic (Indian) numerals. He suggested the use of feature vectors representing the distances of a set of significant boundary points from the center of gravity (COG) of the numeral object, and used these features to derive a model for each numeric digit. An overall hit ratio of 87.22% was achieved in the preliminary results. This ratio reached 100% for some of the digits, but there was misinterpretation between similar digits like (6) and (9). Classification was performed using the Euclidean distance between the feature vector of the test samples and the generated models.
Sadri et al. [8] proposed the use of support vector machines (SVM) for the recognition of isolated handwritten Arabic/Persian numerals. The method views each digit from four different directions and then extracts the features that are used to train SVM classifiers to recognize the digit classes. An average recognition rate of 94.14% was obtained.
A new method based on the Hidden Markov Model (HMM) for recognition of isolated handwritten Arabic (Indian) numerals was presented by Mahmoud [9]. In his method, four sets of features (i.e., angle, circle, horizontal, and vertical (ACHV)) were generated based on the segmentation of the numeral image, and for each segment the ratio of black pixels to segment size was computed. These features were used for training and evaluating the HMM models. An average recognition rate of 97.99% was achieved.
The use of abductive machine learning techniques for the recognition of handwritten Arabic (Indian) numerals was demonstrated by Lawal [10]. An average recognition rate of 99.03% was achieved with a set of only 32 features based on Freeman chain codes (FCC).

3 PROPOSED SYSTEM DESCRIPTION

Like other languages, the Indian numeral system has 10 basic digits; the scope of this paper is limited to developing an approach for detecting the handwritten Hindi numerals (from one to nine: 1-9), which are commonly used in Arabic writing. Each numeral type was written by different people in different styles, as shown in Table (1). The proposed system is able to recognize numeral objects with different background and foreground colors, as shown in Table (1).

Table 1: Different Styles of Handwritten Hindi Numeral Samples

Figure (1) shows the scheme of the proposed system. In general, the first step in this system is loading the numeral color image file; the image is then converted to a gray image, and then to a binary image using a thresholding method.
The obtained binary digit image is enhanced using a median filter to remove unwanted isolated pixels (noise). After this step, the image is clipped, and then the numbers of strokes are counted via horizontal and vertical scans. The counted numbers of strokes are normalized to determine the corresponding percentages of strokes.
The determined percentages are then tested to decide the class/type of the tested image. Some morphological criteria are used to discriminate between samples that belong to the same class.
The stages of the developed system, shown in Figure (1), are:

1. Preprocessing and segmentation stage (image to image).

2. Features extraction stage (image to feature).

3. Decision making stage (feature to interpretation).

The details of each stage are clarified in the next sections.

[Figure 1 block diagram. Preprocessing and segmentation: Load colored digit image, Convert to gray image, Binarization, Noise removal, Clipping. Features extraction: Calculate SP for horizontal and vertical. Decision making: Grand Class Type Decision, Single Numeric Class Decision (see Figure (2)), Unrecognized Numerals Decision (see Figure (3)), Print ID.]

Figure 1: The Block Diagram of the Proposed System; SP denotes strokes percentages

3.1 PREPROCESSING AND SEGMENTATION

This stage consists of the following tasks:

A. Load colored digit image: The system is able to load image files in bitmap format.

B. Convert to gray image: The digit image is converted to a gray image.

C. Binarization: In the normal case the numeral image consists of two colors (i.e., foreground and background); the most frequent color is taken as the background and the second most frequent color as the foreground. The steps taken to convert the grayscale image to a binary image are:

a. Determine the histogram of the image.

b. Search the histogram for the two largest peaks. The two peaks must be well separated, so the distance between their locations should be more than a predefined minimum distance value. The midpoint between the two peaks is taken as the threshold value used to convert the image from gray to binary:

Threshold value = (Pe+Ps)/2

where Pe is the value of the right peak and Ps is the value of the left peak.

c. Scan all image pixels: if the gray value of a pixel is close to the highest peak value, then set the pixel value to (0); otherwise set it to (1). The black pixel (0) refers to the background and the white pixel (1) refers to the foreground.

D. Noise removal: A median filter is applied to the binary digit image to remove noise.

E. Clipping: This process clips the numeral from the input image such that the new image boundaries confine the numeral object area. The scanning process consists of the following steps:

a. Find the left and right edges (the leftmost and rightmost columns that contain white pixels (1)).

b. Find the top and bottom rows (the first and last rows that contain white pixels (1)).

The new width and height of the clipped image are:

Width = right edge – left edge + 1

Height = bottom edge – top edge + 1

So, the size of the clipped image is (width, height).
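The binarization and clipping tasks above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the image is assumed to be a list of rows of integer gray levels (0 to 255), the function names and the default `min_distance=50` are hypothetical, and "close to the highest peak" is interpreted as "on the same side of the threshold as the highest peak". The median filter of task D is left out for brevity.

```python
def two_peak_threshold(gray, min_distance=50):
    """Return (threshold, background_level) from the two largest histogram peaks."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    # Largest peak: the most frequent gray level, taken as the background.
    first = max(range(256), key=lambda g: hist[g])
    # Second peak: most frequent level farther than min_distance from the first.
    candidates = [g for g in range(256) if abs(g - first) > min_distance]
    second = max(candidates, key=lambda g: hist[g])
    ps, pe = sorted((first, second))  # left peak Ps, right peak Pe
    return (pe + ps) / 2, first


def binarize(gray, min_distance=50):
    """Set background pixels to 0 and foreground pixels to 1."""
    threshold, background = two_peak_threshold(gray, min_distance)
    background_is_bright = background > threshold
    # A pixel on the same side of the threshold as the background peak becomes 0.
    return [[0 if (value > threshold) == background_is_bright else 1
             for value in row] for row in gray]


def clip(binary):
    """Crop the image to the bounding box of the foreground (1) pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        raise ValueError("image contains no foreground pixels")
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    # Width = right - left + 1 and Height = bottom - top + 1, as in the text.
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

For example, `clip(binarize(gray))` returns the numeral confined to its bounding box regardless of whether the digit was written dark on light or light on dark, because the background is chosen by frequency rather than by brightness.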

3.2 FEATURES EXTRACTION

The extracted features are mainly based on the percentages of the numbers of strokes found in the horizontal and vertical directions. The minimum number of strokes in either direction is one, because the Hindi numerals (1 to 9) are connected shapes, and the maximum number of strokes is 4, which can be found in the numeral image "4". The steps for calculating the stroke percentages in both directions are:

A. Let HP and VP be arrays [1 to 4] that hold the horizontal and vertical percentages for each possible number of strokes {1, 2, 3, 4}.

B. Let Counter be an array [1 to 4] that holds the counters for the four stroke counts. The initial values of the array elements are set to zero.

C. Scan horizontally each row in the clipped image:

C1. Set Stroke = number of strokes counted in the tested row

If Stroke=0 then set Stroke=1

Else If Stroke>4 then set Stroke=4

Increment Counter[Stroke] by 1

C2. Set HP[I]=Counter[I]/Width

where I indexes the four stroke counts and HP[I] is the horizontal percentage for stroke count I.

D. Scan vertically each column in the clipped image:

D1. Initialize the Counter array (all array elements are set to 0).

D2. Repeat steps C1 to C2, but for columns instead of rows and VP instead of HP.
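Steps A to D can be illustrated with the following Python sketch. It assumes the clipped image is a list of rows of 0/1 pixels; the function names are hypothetical. Two choices are assumptions of this sketch rather than statements of the text: Python lists are 0-based, so `hp[0]` holds HP(1), and the counters are divided by the number of scanned lines (rows for HP, columns for VP) so that each set of four percentages sums to one, whereas the text writes the divisor as Width.

```python
def count_strokes(line):
    """Number of runs of consecutive foreground pixels (1s) in a row or column."""
    strokes, previous = 0, 0
    for pixel in line:
        if pixel == 1 and previous == 0:
            strokes += 1
        previous = pixel
    return strokes


def stroke_percentages(lines):
    """Fraction of lines having 1, 2, 3 and 4 strokes (counts clamped to 1..4)."""
    counter = [0, 0, 0, 0]
    for line in lines:
        strokes = min(max(count_strokes(line), 1), 4)
        counter[strokes - 1] += 1
    # Assumption: normalize by the number of scanned lines.
    return [c / len(lines) for c in counter]


def extract_features(clipped):
    """Return (HP, VP) for a clipped binary image."""
    hp = stroke_percentages(clipped)                 # horizontal scan: rows
    columns = [list(col) for col in zip(*clipped)]   # transpose to get columns
    vp = stroke_percentages(columns)                 # vertical scan: columns
    return hp, vp
```

The result is a fixed-length feature vector of eight numbers per image, independent of the image size.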

3.3 DECISION MAKING

This stage involves the following operations:

A. Grand Class Type Decision: This step classifies the input images into the following 8 grand classes:

a. 1st GClass (N1): This class contains the image objects that are recognized as number "1". The features of this class are the height and width of the clipped image; this means there is no need to calculate the percentages of the four stroke types.

b. 2nd GClass (N4): This class contains the image objects that are recognized as number "4".

c. 3rd GClass (N5): This class contains the image objects that are recognized as number "5".

d. 4th GClass (N16): This class contains the image objects that are recognized as number "1" or "6".

e. 5th GClass (N78): This class contains the image objects that are recognized as number "7" or "8".

f. 6th GClass (N24): This class contains the image objects that are recognized as number "2" or "4".

g. 7th GClass (N39): This class contains the image objects that are recognized as number "3" or "9".

h. 8th GClass (N-1): This class contains the image objects that are not recognized.

The following set of criteria was used to recognize the above listed grand classes:

(i) If Height > Width*4 then GClass=1

(ii) If HP(1)>0.7 and (VP(4)>0.05 or VP(3)>0.25) then GClass=2

(iii) If HP(2)+HP(3)>0.53 and VP(2)+VP(3)>0.53 then GClass=3

(iv) If HP(1)>0.85 and VP(1)>0.85 then GClass=4

(v) If HP(2)>0.38 and VP(1)>0.81 then GClass=5

(vi) If HP(1)>0.89 and HP(3)<0.02 and VP(2)>0.3 then GClass=6

(vii) If HP(2)+HP(3)>0.2 and HP(2)+HP(3)<0.35 then GClass=7

(viii) Otherwise GClass=8

The fixed percentage values in the above list of criteria were chosen to reduce the interlacing between classes; using other values leads to increased interlacing between classes, see Table (2). The fixed percentage values were found by a trial-and-error procedure. When the grand class found consists of more than one numeral (i.e., the 4th, 5th, 6th, or 7th grand class), the next stage (i.e., B) should be applied.
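The grand class criteria (i) to (viii) translate directly into a chain of tests. The Python sketch below assumes the criteria are evaluated in the listed order with the first match winning, which the text implies but does not state; the function name and the `GCLASS_NAMES` table are hypothetical, and `hp`/`vp` are 0-based lists, so `hp[0]` holds HP(1).

```python
GCLASS_NAMES = {1: "N1", 2: "N4", 3: "N5", 4: "N16",
                5: "N78", 6: "N24", 7: "N39", 8: "N-1"}


def grand_class(hp, vp, width, height):
    """Return the grand class number (1..8) for one clipped numeral image."""
    def HP(i):
        return hp[i - 1]   # 1-based access, matching the notation in the text

    def VP(i):
        return vp[i - 1]

    if height > width * 4:
        return 1
    if HP(1) > 0.7 and (VP(4) > 0.05 or VP(3) > 0.25):
        return 2
    if HP(2) + HP(3) > 0.53 and VP(2) + VP(3) > 0.53:
        return 3
    if HP(1) > 0.85 and VP(1) > 0.85:
        return 4
    if HP(2) > 0.38 and VP(1) > 0.81:
        return 5
    if HP(1) > 0.89 and HP(3) < 0.02 and VP(2) > 0.3:
        return 6
    if 0.2 < HP(2) + HP(3) < 0.35:
        return 7
    return 8               # unrecognized at this stage
```

Because the thresholds overlap (for instance, an image can satisfy both (iv) and (vi)), the evaluation order matters; reordering the tests would change which grand class such an image receives.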

B. Single Numeric Class Decision

In single numeric class decision making, there are four expected grand classes ("16", "78", "24", "39"), and each grand class contains more than one expected numeral. The sub-classification task is handled using different sets of criteria; each set is designed to handle the numerals that belong to one of the grand classes. So, four sets of morphology-based criteria were introduced to handle the within-class classification task, see Figure (2):

a. The fourth GClass (N16): To distinguish between the "1" and "6" numerals in an input image nominated as the "16" grand class, the relation between the height and width of the tested clipped image is used as follows:

(i) If height > width*2 then class=1 Else class=6

b. The fifth GClass (N78): To distinguish between the "7" and "8" numerals in an input image nominated as the "78" grand class, the main difference between "7" and "8" is used, namely the position of the highest distance found between the two strokes during the horizontal scans. If the highest distance is found at the top side of the image, then the numeral is recognized as "7"; otherwise it is "8". The implementation steps are as follows:

(i) Set count=0.

(ii) Scan horizontally each row in the clipped image.

(iii) For each row that contains only two strokes, calculate the distance between the first and second strokes; if this distance is more than or equal to the distance of the previously scanned row, then increment count by 1.

(iv) If (count div number of the rows that contain only two strokes) > 0.7 then the tested numeral=7; otherwise the numeral=8.

c. The sixth grand class (N24): To distinguish between the "2" and "4" numerals in an input image nominated as the "24" grand class, the fact that numeral "4" has two corners while numeral "2" has only one corner is used. If numeral "2" is scanned horizontally from the left side, starting from the bottom, the line direction runs from right to left, and at the upper side it reverses to run from left to right. For numeral "4", in contrast, the line direction must alternate from right to left and then from left to right twice.

d. The seventh grand class (N39): To distinguish between the "3" and "9" numerals in an input image nominated as the "39" grand class, the fact that numeral "9" has a closed circle in the high part of its image is used. This circle must be closed fully or to some extent, while the numeral "3" does not have this circular segment.

[Figure 2 flowchart: the grand class is tested in turn. 4th GClass: Class=16, test height with width. 5th GClass: Class=78, test position of highest distance. 6th GClass: Class=24. 7th GClass: Class=39, test closed circle.]

Figure 2: Single Numeric Class Decision

C. Unrecognized Numerals Decision

In this stage the numerals left unrecognized by the first (i.e., grand) classifier are distinguished. This is done through the following recognition steps, see Figure (3):

a. If a closed circle is found in the high part (around 60% of the image), then the numeral is recognized as "9".

b. If the whole image content represents a semi-closed circle, then the numeral is recognized as "5".

c. If HP(2) + HP(3) > 0.65 and VP(2) + VP(3) > 0.06, then go to step (d). Otherwise the image is not recognized as a numeral.

d. Test the curvature of the upper part of the numeral image: if it is convex, the numeral is considered "2" or "4" (and is then recognized as in the sixth grand class (N24)), while if it is concave the numeral is considered "3".

[Figure 3 flowchart: 8th GClass (N-1); Numeral 9 distinguishing procedure (Yes: Class=9); Numeral 5 distinguishing procedure (Yes: Class=5); Class=234: Numeral 3or24 distinguishing procedure; Class=24: Class24 analysis.]

Figure 3: Unrecognized Numerals Decision
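The sub-classification rules for the "16" and "78" grand classes in stage B are simple enough to sketch in Python. The code below is an illustration under several assumptions that the text leaves open: rows are scanned from bottom to top (so that the widening gaps of a "7", which is widest at the top, raise the count), the distance of the "previous row" before the first two-stroke row is taken as zero, and "div" is read as ordinary real division. All function names are hypothetical, and the clipped image is a list of rows of 0/1 pixels.

```python
def stroke_runs(line):
    """(start, end) index pairs of each run of foreground pixels in a line."""
    runs, start = [], None
    for i, pixel in enumerate(line):
        if pixel == 1 and start is None:
            start = i
        elif pixel != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(line) - 1))
    return runs


def classify_16(width, height):
    """Grand class N16: tall, narrow shapes are "1", the rest are "6"."""
    return 1 if height > width * 2 else 6


def classify_78(clipped, bottom_up=True):
    """Grand class N78: count how often the gap between the two strokes widens."""
    rows = reversed(clipped) if bottom_up else clipped
    count = 0
    two_stroke_rows = 0
    previous_gap = 0          # assumption: no gap before the first scanned row
    for row in rows:
        runs = stroke_runs(row)
        if len(runs) != 2:
            continue          # only rows with exactly two strokes are used
        two_stroke_rows += 1
        gap = runs[1][0] - runs[0][1] - 1   # background pixels between strokes
        if gap >= previous_gap:
            count += 1
        previous_gap = gap
    if two_stroke_rows == 0:
        return None           # the rule cannot be applied
    return 7 if count / two_stroke_rows > 0.7 else 8
```

Passing `bottom_up=False` scans from the top instead, which swaps the two outcomes; the flag is there only because the scan direction is an assumption of this sketch.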

C.1 Numeral 9 distinguishing procedure:

This can be done as follows:

a. Let the Closed Circle Counter CCC=0 and the Two Strokes Counter TSC=0.

b. Count the number of strokes in the high part (around 60%) of the image. If the number of counted strokes equals 2 or 3, then increment TSC by 1; otherwise go to step (b).

c. If the number of counted strokes=3, then the two strokes with the biggest distance between them are used.

d. Locate the first point belonging to the background after the 1st stroke.

e. Locate the last point belonging to the background before the 2nd stroke.

f. Set the column counter ColC=0.

g. Perform vertical scans along the points from the 1st to the 2nd point; at each vertical scan instance, if both upper and lower strokes are found, then increment ColC by 1.

h. If ColC > 60% of the distance between the two points, then increment CCC by 1.

i. Search all columns in the image, and repeat steps (b) to (h) (but for columns instead of rows).

j. If the attained CCC > 10% of the total points within the image and CCC > 60% of TSC, then the image has a closed circle; otherwise it does not.

C.2 Numeral 5 distinguishing procedure:

The same steps mentioned above for numeral "9" are followed, but the search covers the whole image region.
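A simplified Python sketch of the closed-circle test is given below. It follows steps (a) to (h) for rows only and deliberately omits the column pass of step (i) and the "CCC > 10% of total points" condition of step (j), so it is an approximation rather than the full procedure. The function names and the `part` parameter are hypothetical; "the two strokes with the biggest distance" is interpreted here as the pair of neighbouring strokes with the widest gap.

```python
def stroke_runs(line):
    """(start, end) index pairs of each run of foreground pixels in a line."""
    runs, start = [], None
    for i, pixel in enumerate(line):
        if pixel == 1 and start is None:
            start = i
        elif pixel != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(line) - 1))
    return runs


def has_closed_circle(image, part=0.6):
    """Row-based approximation of the closed-circle test (steps a to h)."""
    height = len(image)
    limit = max(1, round(height * part))   # rows belonging to the high part
    tsc = ccc = 0
    for r in range(limit):
        runs = stroke_runs(image[r])
        if len(runs) not in (2, 3):
            continue
        tsc += 1
        # Background span between neighbouring strokes; keep the widest one.
        gaps = [(runs[k][1] + 1, runs[k + 1][0] - 1) for k in range(len(runs) - 1)]
        first, last = max(gaps, key=lambda g: g[1] - g[0])
        colc = 0
        for c in range(first, last + 1):
            above = any(image[y][c] == 1 for y in range(r))
            below = any(image[y][c] == 1 for y in range(r + 1, height))
            if above and below:
                colc += 1                  # this gap column is enclosed vertically
        if colc > 0.6 * (last - first + 1):
            ccc += 1
    return tsc > 0 and ccc > 0.6 * tsc
```

Calling the function with `part=1.0` scans the whole image, which corresponds to the search region described for the numeral "5" procedure.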

4 RESULTS AND CONCLUSIONS

The conducted tests were applied to a set of (4500) numeral images extracted from (42) scanned documents prepared by (42) persons. The test results indicate that the attained recognition rate is 98.15%. This recognition rate is achieved when the percentages of strokes in both the horizontal and vertical directions are utilized as discriminating features and the interlaced classes are separated using morphological operations, as explained in Section 3. Table (2) shows the attained recognition ratios when the criteria listed in Section 3.3 (A) are used. Table (3) shows the final ratios (success rate, failure rate, and misclassified rate) for all numerals from 1 to 9.
Since the features based on stroke percentages are reflection invariant (in the vertical and horizontal directions), some additional morphological attributes (like the distance between strokes) have been used to distinguish between numerals that are mirror images of each other (e.g., the numerals "7" and "8").

Table 2. The Recognition for the 8 Grand Classes

Samples               1     2     3     4     5     6     7     8     9
-1                    0.8   10.8  36.8  30    10.4  2.6   1.8   0.4   52
Misclassified Rate    0.4   0     0     0     0     0     1.2   0.2   1

Table 3. The final recognition ratio of (Success Rate, Failure Rate, and Misclassified Rate), in %

Samples               1     2     3     4     5     6     7     8     9
Success Rate          96.6  99.8  99.8  97.6  99.2  97.4  97    99.4  96.6
Failure Rate          3.4   0.2   0.2   2.4   0.6   1.4   2.8   0.4   3.4
Misclassified Rate    0     0     0     0     0.2   1.2   0.2   0.2   0
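As a quick consistency check on Table (3), the overall recognition rate can be recomputed from the per-numeral success rates. The short Python sketch below assumes that every numeral contributes the same number of test samples (4500 / 9 = 500 each), which the text does not state explicitly; under that assumption the overall rate is the plain mean of the nine success rates.

```python
success = [96.6, 99.8, 99.8, 97.6, 99.2, 97.4, 97.0, 99.4, 96.6]
failure = [3.4, 0.2, 0.2, 2.4, 0.6, 1.4, 2.8, 0.4, 3.4]
misclassified = [0.0, 0.0, 0.0, 0.0, 0.2, 1.2, 0.2, 0.2, 0.0]

# Each numeral's three rates should account for all of its samples (100%).
column_totals = [s + f + m for s, f, m in zip(success, failure, misclassified)]

# Assumption: equal sample counts per numeral, so the overall rate is the mean.
overall_success = sum(success) / len(success)
print(round(overall_success, 3))
```

The mean comes out at roughly 98.156%, in line with the reported 98.15%, and each column of the table sums to 100%.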

REFERENCES

[1] Wadhwa, D. and Verma, K., "Online Handwriting Recognition of Hindi Numerals using Svm", International Journal of Computer Applications, Vol. 48, No. 11, pp (13-17), June 2012.

[2] Elnagar, A., Al-Kharousi, F., and Harous, S., "Recognition of Handwritten Hindi Numerals using Structural Descriptors", IEEE International Conference, Vol. 2, pp (983-988), 12-15 Oct 1997.

[3] Sinha, G., Rani, R., and Dhir, R., "Handwritten Gurmukhi Numeral Recognition using Zone-based Hybrid Feature Extraction Techniques", International Journal of Computer Applications, Vol. 47, No. 21, pp (24-29), June 2012.

[4] Rani, A., Rani, R., and Dhir, R., "Combination of Different Feature Sets and SVM Classifier for Handwritten Gurumukhi Numeral Recognition", International Journal of Computer Applications, Vol. 47, No. 18, pp (28-33), June 2012.

[5] Dhandra, B.V., Benne, R.G., and Hangarge, M., "Printed and Handwritten Kannada Numerals Recognition Using Directional Stroke and Directional Density with KNN", International Journal of Machine Intelligence, Vol. 3, Issue 3, pp (121-125), November 2011.

[6] Hanmandlu, M., Nath, A.V., Mishra, A.C., and Madasu, V.K., "Fuzzy Model Based Recognition of Handwritten Hindi Numerals using Bacterial Foraging", 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007), 11-13 July 2007, Melbourne, Australia.

[7] Al-Omari, F., "Hand-Written Indian Numerals Recognition System Using Template Matching Approaches", ACS/IEEE International Conference on 2001, pp (83-88), 25-29 Jun 2001.

[8] Sadri, J., Suen, C.Y., and Bui, T.D., "Application of Support Vector Machines for Recognition of Handwritten Arabic/Persian Digits", Toosi Univ. of Tech., 2nd MVIP, Vol. 1, pp (300-307), Feb 2003, Tehran, Iran.

[9] Mahmoud, S., "Recognition of Writer-Independent off-line Handwritten Arabic (Indian) Numerals using Hidden Markov models", Signal Processing, Vol. 88, Issue 4, pp (844-857), April 2008.

[10] Lawal, I.A., Abdel-Aal, R.E., and Mahmoud, S.A., "Recognition of Handwritten Arabic (Indian) Numerals Using Freeman's Chain Codes and Abductive Network Classifiers", International Conference on Pattern Recognition, IEEE Computer Society, pp (1884-1887), 23-26 Aug 2010.