Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2017
Volume: 95
Issue: 15
Pages: 3692
Publisher: Journal of Theoretical and Applied
Abstract: Optical Character Recognition (OCR) is a long-standing problem of great interest in the field of pattern recognition. In this paper, a new algorithm based on morphological structure is proposed for Arabic character recognition. The proposed method relies on center-of-mass calculations and is invariant to the size, translation and rotation of the target image. In addition, topology-based landmarks, namely intersection pixels marking the intersections of loops and multiple strokes, as well as end points, are used to compute the centers of mass of the points located in the individual quadrants of the circle enclosing each character. After initial pre-processing operations (binarization, resizing, normalization, noise removal and skeletonization), the total number of intersection pixels and the total number of end points are determined and stored. The character image is then encircled and divided into four quadrants. The center of mass of the character image and the centers of mass of each of its four quadrants are determined, and the Euclidean distances (ED) from the intersection and end points in each quadrant to these centers of mass are calculated. These quantities are determined for both the target and the prototype images, and the best match is the prototype character with the minimum ED. The results suggest that the presented method opens up a new direction for dealing with the complex problems of OCR.
Keywords: Arabic Character Recognition; OCR; Center of Mass; Geometric-Topological Features
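
The pipeline outlined in the abstract (landmark detection on a skeleton, quadrant centers of mass, Euclidean-distance matching) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: end points and intersections are found with the common crossing-number heuristic, quadrants are taken about the center of mass of the whole character, distances are divided by the radius of the enclosing circle to reduce size dependence, and rotation handling is left out. All function names (`crossing_number`, `landmarks`, `features`, `classify`) and the toy images `PLUS` and `BAR` are hypothetical.

```python
import math

# 8-neighbourhood in clockwise order, starting from the pixel above.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(img, r, c):
    """Count 0 -> 1 transitions around (r, c); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    ring = [img[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)

def landmarks(skeleton):
    """Return (end_points, intersections) of a one-pixel-wide binary skeleton.
    Assumption: crossing number 1 marks an end point, 3 or more an intersection."""
    ends, crosses = [], []
    for r, row in enumerate(skeleton):
        for c, value in enumerate(row):
            if value:
                cn = crossing_number(skeleton, r, c)
                if cn == 1:
                    ends.append((r, c))
                elif cn >= 3:
                    crosses.append((r, c))
    return ends, crosses

def centroid(points):
    """Center of mass of equally weighted pixels."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def quadrant(p, centre):
    """Quadrant index 0..3 of point p relative to the character's center of mass."""
    return (0 if p[0] < centre[0] else 2) + (0 if p[1] < centre[1] else 1)

def features(skeleton):
    """Feature vector: landmark counts, then one summed landmark-to-centroid
    distance per quadrant. Assumes at least one foreground pixel."""
    pixels = [(r, c) for r, row in enumerate(skeleton)
              for c, value in enumerate(row) if value]
    centre = centroid(pixels)
    # Radius of the enclosing circle, used here as an assumed size normalisation.
    radius = max(math.dist(p, centre) for p in pixels) or 1.0
    ends, crosses = landmarks(skeleton)
    feats = [float(len(ends)), float(len(crosses))]
    for q in range(4):
        q_marks = [p for p in ends + crosses if quadrant(p, centre) == q]
        if q_marks:
            q_centre = centroid([p for p in pixels if quadrant(p, centre) == q])
            feats.append(sum(math.dist(p, q_centre) for p in q_marks) / radius)
        else:
            feats.append(0.0)
    return feats

def classify(target, prototypes):
    """Name of the prototype whose feature vector has the minimum Euclidean distance."""
    t = features(target)
    return min(prototypes, key=lambda name: math.dist(t, features(prototypes[name])))

# Toy skeletons standing in for pre-processed character images.
PLUS = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0]]
BAR = [[0, 0, 1, 0, 0] for _ in range(5)]
PROTOTYPES = {"plus": PLUS, "bar": BAR}
```

Calling `classify(image, PROTOTYPES)` on a skeletonized binary image returns the name of the closest prototype. Because every distance is measured relative to a center of mass, shifting the character inside the image leaves the feature vector unchanged, which mirrors the translation invariance claimed in the abstract.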