
Tamil Studies


Computer Recognition of Printed Numerals: 
A Study in Artificial Intelligence

Madras Christian College Magazine, Vol.XLVII, 1978
M. Chandrasekaran, R. Chandrasekaran and Gift Siromoney

Almost every significant advance made by mankind over the past few decades can be attributed to the advent of sophisticated computers. Whether it is the development of nuclear devices, the exploration of new sources of energy, weather forecasting, the planning of a township or the analysis of public opinion poll results, the use of computers has become inevitable. Apart from these areas of application, computers are also widely used for scientific research.

There have been remarkable developments during the last fifteen years in many areas of computer science, particularly in the field of computer pattern recognition.

What is pattern recognition? Put simply, patterns are the means by which we understand and interpret the world. A child learns to differentiate the visual patterns of mother and father, the sounds of speech and music, and the patterns perceived by the other senses. The older he grows, the more refined the pattern recognition process becomes: he learns, for instance, to tell apart the paintings of two different artists. For a mathematician, the detection of an elegant proof for a specific theorem is pattern recognition, and a social scientist finds patterns in the analysis of his data.

When a man glances at a page of printed characters he identifies them correctly without hesitation. For such an identification process, he makes use of the rules he has learned from past experience. He is able to distinguish between ' V ' and ' Y ' in a standard type font. One interesting question that can be asked at this stage is whether the computer can do the same work and, if so, to what extent and with what percentage of success.

To answer this question in the affirmative, we conducted a simple experiment demonstrating a method by which a computer can recognize the numerals and the four operators (+, -, X, /) used in arithmetic expressions. We present here the results of the experiment.

Of the different fonts used by printers, we chose Gill Sans Bold Condensed type numerals for our study. The method used for recognising the numerals and the four operators can be described in a simple way as follows.

Each character is converted into a rectangular binary array in which a ' 0 ' (zero) represents a blank and a ' 1 ' represents black. Here the word character is used to denote any numeral from 0 to 9 or any of the four operators +, -, X, /. Each binary matrix is examined by the computer row by row, and the number of runs of 1's in each row is noted, a run being an unbroken sequence of consecutive 1's. These values form a string of numerals in which the same numeral may occur in several consecutive positions. This string is called the row run for the given character. We omit the repeated occurrences of the same numeral in the row run string and so form a condensed row run string. Similarly, a condensed column run string is formed by scanning the picture matrix column-wise. These two strings together represent a particular character.
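The run-counting procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' 1978 program: the function names and the toy ring-shaped pattern are assumptions introduced here, and the sketch assumes that no row or column contains more than nine runs, so that each count fits in one digit.

```python
from itertools import groupby


def count_runs(bits):
    """Count the maximal runs of '1' in a string of '0' and '1' characters."""
    return sum(1 for value, _ in groupby(bits) if value == "1")


def run_string(lines):
    """Concatenate the run count of every line (row or column) into a string."""
    return "".join(str(count_runs(line)) for line in lines)


def condense(run):
    """Drop repeated consecutive occurrences of the same numeral."""
    return "".join(value for value, _ in groupby(run))


def signature(rows):
    """Return (condensed row run, condensed column run) for a binary matrix."""
    columns = ["".join(column) for column in zip(*rows)]
    return condense(run_string(rows)), condense(run_string(columns))


# Toy ring-shaped pattern, invented for illustration (not taken from the article).
TOY = [
    "01110",
    "10001",
    "10001",
    "10001",
    "01110",
]

print(run_string(TOY))  # row run before condensing
print(signature(TOY))   # condensed row and column run strings
```

Transposing the matrix with `zip(*rows)` lets the same row-scanning code produce the column run, which mirrors the symmetry of the description.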

All the characters except two have unique representations under this scheme: a tie is observed between the numerals 0 and 4. To break the tie we developed another method, called the symbolic run method. In this procedure each stretch of repeated occurrences of the same numeral in the row and column run strings is classified by its length into one of three types, namely small, medium and long runs. The symbolic representation of each character is composed of symbolic row runs and symbolic column runs, which are obtained by a row-wise and a column-wise examination of the picture matrix. With this method we observed a unique representation for each character.
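A minimal sketch of the symbolic run method follows. The article does not state where the boundaries between small, medium and long runs lie, so the two thresholds below are assumptions, chosen only so that the sketch reproduces the symbolic strings of the sample pattern printed at the end of this article. The names `SMALL_MAX`, `MEDIUM_MAX` and `symbolic` are likewise hypothetical.

```python
from itertools import groupby

# Assumed thresholds; the article does not give the actual values.
SMALL_MAX = 7    # a stretch of up to 7 repeats is treated as small (S)
MEDIUM_MAX = 15  # a stretch of 8 to 15 repeats is treated as medium (M)


def symbolic(run):
    """Turn a run string into a symbolic run string such as 'S1S2L1'."""
    parts = []
    for numeral, group in groupby(run):
        length = len(list(group))
        if length <= SMALL_MAX:
            size = "S"
        elif length <= MEDIUM_MAX:
            size = "M"
        else:
            size = "L"
        parts.append(size + numeral)
    return "".join(parts)


# Run strings copied from the sample pattern in this article.
print(symbolic("1111221111111111111111111111111"))  # symbolic row run
print(symbolic("2222223333222211"))                 # symbolic column run
```

Because the symbolic string keeps a coarse measure of how long each numeral persists, two characters with the same condensed strings can still be told apart when their proportions differ.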

Such information is extracted from each of the binary matrices for the ten numerals from 0 to 9 and the four operators. Each character is now represented by a numeral in the memory of the computer. The sample input that is fed into the computer is read character by character. Each input character is reduced to a string pattern using the above described methods and compared with the characters already stored in the memory.

If there is agreement, the numeral or the operator is recognized as that character. The following arithmetic expression is used for the experiment.
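The comparison step can be pictured as a table lookup, sketched below under stated assumptions. The first entry uses the symbolic strings of the sample pattern printed at the end of this article, which appears to depict the numeral 2; the second entry is purely hypothetical, and the names `TEMPLATES` and `recognize` are introduced here for illustration only.

```python
# Stored representations: (symbolic row run, symbolic column run) -> character.
TEMPLATES = {
    ("S1S2L1", "S2S3S2S1"): "2",  # sample pattern, assumed to depict the numeral 2
    ("S1", "S1"): "-",            # hypothetical entry, invented for illustration
}


def recognize(signature, templates=TEMPLATES):
    """Return the stored character whose representation agrees with the input.

    Returns None when no stored representation matches.
    """
    return templates.get(signature)


print(recognize(("S1S2L1", "S2S3S2S1")))  # a stored pattern
print(recognize(("S9", "S9")))            # an unknown pattern
```

An exact lookup of this kind succeeds only when the input produces exactly the stored strings, which is reasonable for clean print in a single font.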

1870 X 15 + 63

There are ten characters in the above arithmetic expression. Our aim is to make the computer recognize the characters and, after evaluating the expression, to give the result in the form of picture matrices. For example, the number ' 1870 ' is identified as a string composed of four different numerals. The computer is made to recognize it as ' one thousand eight hundred and seventy ' as soon as it encounters an operator immediately following a numeral. Similarly, the rest of the arithmetic expression is evaluated and the final result ' 28113 ' is obtained in the form of pictures.
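The grouping of recognized numerals into numbers and the subsequent evaluation might look like the sketch below. It is an assumption-laden illustration: the article does not say how operator precedence was handled, so the usual convention (X and / before + and -) is assumed here, and for this particular expression a strict left-to-right evaluation gives the same result. The function names are hypothetical, the input is assumed to be a well-formed sequence of alternating numbers and operators, and rendering the result back into picture matrices is omitted.

```python
def tokenize(characters):
    """Group consecutive numerals into integers; an operator ends the number."""
    tokens, digits = [], ""
    for ch in characters:
        if ch.isdigit():
            digits += ch
        else:
            if digits:
                tokens.append(int(digits))
                digits = ""
            tokens.append(ch)
    if digits:
        tokens.append(int(digits))
    return tokens


def evaluate(tokens):
    """Evaluate number/operator tokens, applying X and / before + and -."""
    stack = [tokens[0]]
    # First pass: fold multiplications and divisions into the preceding number.
    for op, rhs in zip(tokens[1::2], tokens[2::2]):
        if op == "X":
            stack[-1] *= rhs
        elif op == "/":
            stack[-1] /= rhs
        else:
            stack.extend([op, rhs])
    # Second pass: apply additions and subtractions from left to right.
    result = stack[0]
    for op, rhs in zip(stack[1::2], stack[2::2]):
        result = result + rhs if op == "+" else result - rhs
    return result


recognized = list("1870X15+63")  # the ten recognized characters
print(tokenize(recognized))
print(evaluate(tokenize(recognized)))
```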

The methods described here are quite general in nature and we have been successful in using them for the recognition of printed Tamil characters¹ and the mridangam mnemonics². They can be suitably modified to identify handwritten characters also.


1 Gift Siromoney, R. Chandrasekaran and M. Chandrasekaran, Computer Recognition of Printed Tamil Characters, Pattern Recognition (forthcoming).
2 Gift Siromoney, M. Chandrasekaran and R. Chandrasekaran, Computer Recognition and Transliteration of Mridangam Mnemonics, Quarterly Journal of the National Centre for the Performing Arts (forthcoming).

A Sample Pattern and its Representation

0001111100000000
0111111111000000
1111111111100000
1111111111110000
1110000011111000
1100000001111000
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000001111000
0000000001111000
0000000011110000
0000000011110000
0000000111100000
0000001111100000
0000001111000000
0000011110000000
0000111100000000
0000111100000000
0001111000000000
0011110000000000
0011111111111111
0111111111111111
1111111111111111
1111111111111111

Row Run

1111221111111111111111111111111

Condensed Row Run

121

Symbolic Row Run

S1S2L1

Column Run

2222223333222211

Condensed Column Run

2321

Symbolic Column Run

S2S3S2S1
