Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Electronic ISSN: 0975-3397
Publication year: 2013
Volume: 5
Issue: 06
Pages: 538-546
Publisher: Engg Journals Publications
Abstract: Telugu is an ancient, historic language spoken by about 84.6 million people of Andhra Pradesh. Its script has a circular orthography with few horizontal and slanted strokes. A large body of literature exists for this language in printed form, which needs to be preserved by scanning it and converting it into editable form. Segmentation of touching characters is a major issue in any OCR system: segmenting words into individual glyphs by Connected Component Analysis yields poor results because touching characters are not separated. Touching conjunct consonants are the major case that must be properly addressed to improve the accuracy of an OCR system. This paper presents an overlapping bounding box approach for segmenting conjunct consonants, along with an algorithm for identifying the correct touching location. An accuracy rate of 91.27% is achieved.
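The abstract's point about Connected Component Analysis can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `connected_component_boxes`, the helper `parse`, the use of 8-connectivity, and the toy bitmaps are all assumptions made here for illustration. It only shows why two glyphs that touch are reported as a single component with one bounding box.

```python
from collections import deque


def connected_component_boxes(image):
    """Return (top, left, bottom, right) boxes of 8-connected foreground regions.

    `image` is a list of rows holding 1 for ink and 0 for background.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def parse(lines):
    """Convert '#'/'.' strings into a 0/1 bitmap (toy input format)."""
    return [[1 if ch == "#" else 0 for ch in line] for line in lines]


# Two toy "glyphs" separated by background columns.
separate = parse(["##..##",
                  "##..##",
                  "##..##"])

# The same glyphs joined by a one-pixel bridge in the middle row.
touching = parse(["##..##",
                  "######",
                  "##..##"])

print(connected_component_boxes(separate))
print(connected_component_boxes(touching))
```

In the first bitmap the two glyphs are found as separate components; in the second, the bridge merges them into one component, so a segmenter that relies on component labels alone cannot split them and needs an additional step to locate the touching point.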