Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2015
Volume: 82
Issue: 2
Publisher: Journal of Theoretical and Applied
Abstract: Character recognition in ancient document images remains a challenging task. The initial scanning process deforms the document image, while the aging of the document introduces unwanted background noise. Segmentation is an essential step in OCR. Complex scripts, such as those derived from Brahmi, pose various problems for the segmentation process. A hybrid model that combines segmentation of noisy images with binarization is proposed. In the first phase, a technique for segmenting the ancient Telugu document image into meaningful units is proposed, in which the horizontal profile pattern is convolved with a Gaussian kernel. The statistical properties of the meaningful units are explored through an extensive analysis of their geometrical patterns. In the second phase, noisy documents are cleaned with a Modified IGT algorithm and then segmented using the conventional profile mechanism. The performance of the hybrid technique is demonstrated by the higher efficiencies obtained for the cleaned documents. The efficiency analysis of segmentation carried out for the hybrid technique reveals that a threshold number of Vowel (V), Consonant (C), and CV core characters is required to exhibit higher efficiencies. It also reflects the non-canonical features of any other marks in the Telugu document.
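The first-phase idea, a horizontal profile convolved with a Gaussian kernel, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `sigma` value, and the relative threshold `rel_threshold` are assumptions chosen for illustration, and the input is assumed to be an already binarized image with ink pixels set to 1.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (radius defaults to 3*sigma)."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smoothed_horizontal_profile(binary, sigma=2.0):
    """Count ink pixels per row, then smooth the counts with a Gaussian.

    Assumes `binary` is a 2-D array (1 = ink, 0 = background) with more
    rows than the kernel has taps, so mode="same" keeps one value per row.
    """
    profile = binary.sum(axis=1).astype(float)
    return np.convolve(profile, gaussian_kernel(sigma), mode="same")


def segment_lines(binary, sigma=2.0, rel_threshold=0.1):
    """Return half-open (start_row, end_row) ranges of candidate text lines.

    A row is treated as text when its smoothed ink count exceeds
    rel_threshold times the maximum (a hypothetical rule for this sketch).
    """
    smooth = smoothed_horizontal_profile(binary, sigma)
    if smooth.max() == 0:
        return []
    is_text = smooth > rel_threshold * smooth.max()
    # Pad with False so runs touching the image border are also closed.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


# Synthetic page: two horizontal ink bands on a blank background.
page = np.zeros((40, 30), dtype=np.uint8)
page[5:10, :] = 1
page[25:30, :] = 1
lines = segment_lines(page, sigma=1.0)
```

Smoothing suppresses small gaps and isolated noise peaks in the raw profile, so the valleys between text lines become easier to locate; a larger `sigma` merges more aggressively and may fuse closely spaced lines.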