Almost every significant advance made by mankind over the past few decades can be attributed to the advent of sophisticated computers. Whether it is the development of nuclear devices, the exploration of new sources of energy, weather forecasting, planning a township or analysing public opinion poll results, the use of computers has become indispensable. Apart from these areas of application, computers are also widely used for scientific research.
There have been remarkable developments during the last fifteen years in many areas of computer science, particularly in the field of computer pattern recognition.
What is pattern recognition? To define it in a simple way, patterns are the means by which we understand and interpret the world. A child learns to distinguish the visual patterns of mother and father, the sounds of speech and music, and the patterns of the other senses. The older he grows, the more refined the pattern recognition process becomes. He manages to tell apart the paintings of two different artists, and so on. For a mathematician, the detection of an elegant proof for a specific theorem is pattern recognition, and a social scientist finds patterns in the analysis of his data.
When a man glances at a page of printed characters he identifies them correctly without hesitation. For such an identification process, he makes use of the rules he learned from his past experience. He is able to distinguish between 'V' and 'Y' in a standard type font. One interesting question that can be asked at this stage is whether the computer can do the same work. If so, to what extent, and what will be the percentage of success?
To answer the above question positively, we conducted a simple experiment to demonstrate a method by which a computer can recognize the numerals and the four operators (+, -, X, /) used in arithmetic expressions. We present here the results of the experiment.
Out of the different fonts used by the printers, we have chosen Gill Sans Bold Condensed type numerals for our study. The method used for recognising the numerals and the four operators can be described in a simple way as follows.
Each character is converted into a rectangular binary array in which a '0' (zero) represents a blank and a '1' represents black. Here the word character denotes any numeral from 0 to 9 or one of the four operators +, -, X, /. Each binary matrix is examined by the computer row by row, and the number of runs of 1's in each row is noted. These values form a string of numerals, in which any one numeral may occur in consecutive positions. This string is called the row run for the given character. We omit the repeated occurrences of the same numeral in the row run string to form a condensed row run string. Similarly, a condensed column run string is formed by scanning the picture matrix column-wise. These two strings together represent a particular character.
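The run-counting steps above can be sketched in Python as follows. This is a minimal sketch and not the program used in the experiment: the function names and the 5 x 5 ring-shaped pattern `RING` are hypothetical, and the sketch assumes that no scanned line contains more than nine runs, so that one digit per line is enough.

```python
from itertools import groupby


def count_runs(line):
    """Count the maximal runs of consecutive '1's in a string of '0'/'1'."""
    return sum(1 for bit, _ in groupby(line) if bit == "1")


def run_string(lines):
    """Build a run string: one digit per scanned line (assumes <= 9 runs)."""
    return "".join(str(count_runs(line)) for line in lines)


def condense(runs):
    """Drop consecutive repeats of the same numeral, e.g. '12221' -> '121'."""
    return "".join(digit for digit, _ in groupby(runs))


def columns_of(rows):
    """Transpose the picture matrix so that each column becomes a string."""
    return ["".join(col) for col in zip(*rows)]


def signature(rows):
    """Return (condensed row run, condensed column run) for a pattern."""
    return (condense(run_string(rows)),
            condense(run_string(columns_of(rows))))


# Hypothetical ring-shaped pattern, not one of the Gill Sans glyphs.
RING = [
    "01110",
    "10001",
    "10001",
    "10001",
    "01110",
]

print(run_string(RING))              # row run
print(run_string(columns_of(RING)))  # column run
print(signature(RING))               # condensed row and column runs
```

For this toy pattern the row run is `12221`, which condenses to `121`; the column scan gives the same strings because the ring is symmetric.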
All the numerals and the four operators except two have unique representations. A tie is observed between the numerals 0 and 4. To break the tie we developed another method called the symbolic run method. In this procedure each block of repeated occurrences of the same numeral in the row and column run strings is classified by its length into one of three types, namely, small, medium and long runs. The symbolic representation for each character is composed of symbolic row runs and symbolic column runs, which are obtained by a row-wise and a column-wise examination of the picture matrix. With this method we observed a unique representation for each character.
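A symbolic run string can be derived from a run string along the following lines. This is only a sketch: the article does not state where the boundaries between small, medium and long runs lie, so the thresholds `SMALL_MAX` and `MEDIUM_MAX` below are assumptions, chosen to agree with the sample representation given at the end of this article. The function names are hypothetical as well.

```python
from itertools import groupby

# Assumed cut-offs; the actual values used in the experiment are not given.
SMALL_MAX = 7    # a numeral repeated 1 to 7 times is a "small" run (S)
MEDIUM_MAX = 15  # 8 to 15 repeats is "medium" (M); anything longer is "long" (L)


def length_class(count):
    """Map the number of repeats of a numeral to S, M or L."""
    if count <= SMALL_MAX:
        return "S"
    if count <= MEDIUM_MAX:
        return "M"
    return "L"


def symbolic(runs):
    """Replace each block of repeated numerals by its class and the numeral."""
    return "".join(
        length_class(len(list(block))) + digit
        for digit, block in groupby(runs)
    )


# Run strings copied from the sample pattern at the end of the article.
print(symbolic("1111221111111111111111111111111"))  # symbolic row run
print(symbolic("2222223333222211"))                 # symbolic column run
```

With these assumed thresholds the two calls give `S1S2L1` and `S2S3S2S1`, the symbolic strings listed for the sample pattern. Two characters with the same condensed strings can still differ here, because the lengths of the blocks are retained in coarse form.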
Such information is extracted from each of the binary matrices for the ten numerals from 0 to 9 and the four operators. Each character is now represented by a numeral in the memory of the computer. The sample input that is fed into the computer is read character by character. Each input character is reduced to a string pattern using the above described methods and compared with the characters already stored in the memory.
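The comparison step can be pictured as a table lookup keyed on the string pattern of each character. The sketch below assumes that the patterns have already been extracted. Only the first entry of `TEMPLATES` uses strings from this article (those of the sample pattern shown at the end, which appears to depict the numeral 2); the other entries, and all the names, are made-up placeholders rather than measured values.

```python
# Stored patterns: (condensed row run, condensed column run) -> character.
TEMPLATES = {
    ("121", "2321"): "2",  # strings of the sample pattern in this article
    ("131", "131"): "+",   # placeholder, not a measured value
    ("12", "21"): "7",     # placeholder, not a measured value
}


def recognise(pattern, templates=TEMPLATES):
    """Return the stored character that agrees exactly, or None if none does."""
    return templates.get(pattern)


def read_expression(patterns, templates=TEMPLATES):
    """Recognise the input character by character; '?' marks no agreement."""
    return "".join(recognise(p, templates) or "?" for p in patterns)


# Hypothetical input: the patterns of three characters read in sequence.
scanned = [("121", "2321"), ("131", "131"), ("12", "21")]
print(read_expression(scanned))
```

An exact-match lookup of this kind works only when every stored character has a distinct pattern, which is why ties such as the one between 0 and 4 have to be broken first.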
If there is agreement, the numeral or the operator is recognized as that character. The following arithmetic expression is used for the experiment.
The methods described here are quite general in nature, and we have been successful in using them for the recognition of printed Tamil characters [1] and mridangam mnemonics [2]. They can be suitably modified to identify handwritten characters also.
[1] Gift Siromoney, Chandrasekaran, R. and Chandrasekaran, M., Computer Recognition of Printed Tamil Characters, Pattern Recognition (forthcoming).
[2] Gift Siromoney, Chandrasekaran, M. and Chandrasekaran, R., Computer Recognition and Transliteration of Mridangam Mnemonics, Quarterly Journal of the National Centre for the Performing Arts (forthcoming).
A Sample Pattern and its Representation
0001111100000000
0111111111000000
1111111111100000
1111111111110000
1110000011111000
1100000001111000
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000000111100
0000000001111000
0000000001111000
0000000011110000
0000000011110000
0000000111100000
0000001111100000
0000001111000000
0000011110000000
0000111100000000
0000111100000000
0001111000000000
0011110000000000
0011111111111111
0111111111111111
1111111111111111
1111111111111111
Row Run: 1111221111111111111111111111111
Condensed Row Run: 121
Symbolic Row Run: S1S2L1
Column Run: 2222223333222211
Condensed Column Run: 2321
Symbolic Column Run: S2S3S2S1