Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2016
Volume: 86
Issue: 2
Publisher: Journal of Theoretical and Applied
Abstract: Pre-processing is the step that varies most from one type of document image to another. Document images generally require more intensive pre-processing than other types of images, and pre-printed form images are one such category. Pre-processing these documents differs from pre-processing images that contain only simple text and no graphical components. This paper proposes a generic pre-processing algorithm adaptable to pre-printed application form images. The work focuses specifically on detecting and removing scratched-out words in the text, since these elements can be interpreted neither by humans nor by machines. To detect scratched-out text blocks, the algorithm exploits features such as the Euler number, the number of connected components, and the area covered by holes within a text block. The algorithm yielded reasonably good results, with an overall efficacy of around 96.5%.
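The abstract names three features but does not give the exact decision rule. The following is a minimal Python sketch, assuming a binarised text block stored as a list of rows (1 = ink, 0 = background). It computes the number of connected components, the holes (background regions that do not touch the block border), the Euler number (components minus holes), and the fraction of the block covered by holes. The 8/4-connectivity pairing, the helper names `block_features` and `is_scratched`, and the thresholds `max_euler` and `min_hole_ratio` are hypothetical illustrations, not the paper's method or tuned values.

```python
from collections import deque

# Ink uses 8-connectivity and background uses 4-connectivity, the usual dual
# pairing for computing Euler numbers on a square pixel grid.
N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _components(img, value, neighbours):
    """Return connected regions of pixels equal to `value`, each a list of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps


def block_features(img):
    """Components, holes, Euler number and hole-area ratio of a binary block."""
    rows, cols = len(img), len(img[0])
    n_cc = len(_components(img, 1, N8))
    # A hole is a background region lying entirely inside the block border.
    holes = [comp for comp in _components(img, 0, N4)
             if all(0 < y < rows - 1 and 0 < x < cols - 1 for y, x in comp)]
    hole_area = sum(len(h) for h in holes)
    return {
        "components": n_cc,
        "holes": len(holes),
        "euler": n_cc - len(holes),
        "hole_area_ratio": hole_area / (rows * cols),
    }


def is_scratched(features, max_euler=-2, min_hole_ratio=0.05):
    """HYPOTHETICAL rule with illustrative thresholds: scribbling merges letters
    into few components while enclosing many holes, pushing the Euler number down."""
    return features["euler"] <= max_euler and features["hole_area_ratio"] >= min_hole_ratio


if __name__ == "__main__":
    # Cross-hatched block standing in for a scratched-out word: 1 component, 6 holes.
    scratched = [
        [1, 1, 1, 1, 1, 1, 1],
        [1, 0, 1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1, 1, 1],
        [1, 0, 1, 0, 1, 0, 1],
        [1, 1, 1, 1, 1, 1, 1],
    ]
    f = block_features(scratched)
    print(f["euler"], is_scratched(f))  # prints "-5 True"
```

The idea behind the sketch is that ordinary text blocks usually have a small positive or zero Euler number (separate letters, a few loops such as "o"), whereas dense cross-strokes produce one merged component with many enclosed holes. Any real thresholds would need to be tuned on form images.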