Basic Article Information

  • Title: Dataset Generation for OCR
  • Authors: Aparna Vara Lakshmi Vemuri; T.V.Sai Krishna; Atul Negi
  • Journal: International Journal of Computer Trends and Technology
  • Electronic ISSN: 2231-2803
  • Publication year: 2011
  • Volume: 2
  • Issue: 1
  • Publisher: Seventh Sense Research Group
  • Abstract: Telugu is one of the prominent scripts in India and Asia, with more than 62 million speakers. While OCR technology is mature for English and other Roman/Latin scripts, OCR for Asian scripts, and Indian scripts in particular, is still at a relatively nascent stage. One reason is the complexity of the orthography, especially in Telugu. Although potentially 10,000 syllables are in frequent use in the language, the orthographic units are composed of combinations of 36 consonants and 16 vowels. A practical OCR system for Telugu script was proposed and developed by Negi et al. [3], who described the complexity of Telugu script and proposed methods for reducing it. Their approach consists of identifying and recognizing connected components. Their recognition step used a modification of the template matching approach, the fringe distance method proposed by Brown [1]. In this paper we propose an improved and robust recognition strategy that first uses the pixel distributions of the script and then exploits the structural information of Telugu orthography. We do not discuss layout-related issues in isolating Telugu text regions, which are taken up elsewhere.
  • Keywords: Dataset Generation for OCR