Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2011
Volume: 11
Issue: 6
Pages: 146-151
Publisher: International Journal of Computer Science and Network Security
Abstract: Record linkage is the process of determining whether two records represent the same real-world entity. It is one of the most important and most investigated issues in the data quality literature. Most existing research has been applied to English-language data and does not describe the modifications required to apply it in other contexts, such as Arabic. Applying record linkage algorithms to Arabic data is challenging because of the unique morphological and orthographical features of the Arabic language. This paper proposes a token-based framework for record linkage in Arabic data sets. The framework uses a new technique for Arabic name tokenization and a new approach to similarity computation.
Keywords: Arabic Data Cleaning; Data Quality; Duplicate Detection; Data Warehouse; Entity Resolution; Record Linkage; Object Identification; String Similarity