Journal title: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year of publication: 2019
Volume: 10
Issue: 8
Pages: 406-415
Publisher: Science and Information Society (SAI)
Abstract: Arabic script is inherently cursive, even when machine-printed. When connected to other characters, some Arabic characters may optionally be written in compact aesthetic forms known as ligatures. Distinguishing ligatures from ordinary characters is useful for several applications, especially automatic text recognition. Datasets that do not annotate these ligatures may confuse the training of recognition systems. Some popular datasets annotate ligatures manually, but no dataset prior to this work took ligatures into consideration from the design phase. This paper presents a detailed study of Arabic ligatures and a dataset design that considers the representation of both ligative and unligative characters. Pilot data collection and recognition experiments are then conducted on the presented dataset and on another popular dataset of handwritten Arabic words. These experiments show that annotating ligatures in datasets reduces error rates in character recognition tasks.
Keywords: Arabic ligatures; automatic text recognition; handwriting datasets; Hidden Markov Models
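As a minimal illustration of why ligatures need explicit handling in text data (this is not the paper's method or dataset format), Python's standard unicodedata module shows that Unicode encodes some Arabic ligatures, such as Lam-Alef, as single presentation-form code points, and that NFKC normalization folds them back into their base letters. The helper names base_letters and describe are hypothetical. Because normalized text keeps only the base letters, labels derived from it would not by themselves record whether a ligature was written, which is the kind of information the abstract argues a dataset should capture.

```python
import unicodedata

# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM:
# a single code point standing for a two-letter ligature glyph.
LIGATURE = "\uFEFB"


def base_letters(text: str) -> str:
    """Hypothetical helper: fold presentation-form ligatures back to base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


def describe(text: str) -> list[str]:
    """Hypothetical helper: list each code point with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    print(describe(LIGATURE))                # one ligature code point
    print(describe(base_letters(LIGATURE)))  # LAM followed by ALEF
```

The ligature and its decomposition render similarly but differ at the code-point level, so a label set must decide explicitly whether to keep the ligature as its own class.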