Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Print ISSN: 2302-9293
Publication year: 2018
Volume: 16
Issue: 2
Pages: 834-842
DOI: 10.12928/telkomnika.v16i2.7669
Language: English
Publisher: Universitas Ahmad Dahlan
Other abstract: A data warehouse is a collection of data drawn from various data sources. Data in a data warehouse are prone to several complications and irregularities. A data cleaning service identifies errors, removes them, and improves the quality of the data; it is a non-trivial activity for ensuring data quality. One of the common methods is duplicate elimination. This research focuses on duplicate elimination services applied to local data. It first surveys data quality, focusing on quality problems, cleaning methodology, and the stages and services involved within a data warehouse environment. It then compares, through experiments, different duplicate elimination services on the following cases: different spellings and pronunciations, misspellings, name abbreviations, honorific prefixes, common nicknames, split names, and exact matches. The comparison also evaluates the performance of each service in terms of required response time, memory load, and CPU time, so that in the future these services can reliably handle big data in a data warehouse.
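The name-variation cases listed in the abstract can be illustrated with a minimal sketch. It is not the method of any service evaluated in the paper. The lookup tables `HONORIFICS` and `NICKNAMES`, the function names, and the 0.8 similarity threshold are all hypothetical choices made for illustration, and the standard-library `difflib.SequenceMatcher` stands in for whatever string similarity a real service would use. Phonetic (pronunciation-based) matching is not covered, and the normalization assumes ASCII names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real service would need far larger ones.
HONORIFICS = {"mr", "mrs", "ms", "dr", "prof"}
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize(name):
    """Lowercase, strip punctuation, drop honorifics, expand nicknames."""
    tokens = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    tokens = [t for t in tokens if t not in HONORIFICS]
    return [NICKNAMES.get(t, t) for t in tokens]


def token_match(a, b, threshold):
    """Compare two name tokens: exact, initial-only, or fuzzy."""
    if a == b:
        return True
    # Name abbreviation: a single letter matches any token with that initial.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Misspelling: accept tokens whose similarity ratio reaches the threshold.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_duplicate(name_a, name_b, threshold=0.8):
    """Return True if the two names plausibly refer to the same person."""
    ta, tb = normalize(name_a), normalize(name_b)
    if not ta or not tb:
        return False
    # Exact match and split names: identical once tokens are joined.
    if "".join(ta) == "".join(tb):
        return True
    if len(ta) != len(tb):
        return False
    return all(token_match(x, y, threshold) for x, y in zip(ta, tb))


def eliminate_duplicates(names, threshold=0.8):
    """Keep the first occurrence of each name, dropping later duplicates."""
    kept = []
    for name in names:
        if not any(is_duplicate(name, k, threshold) for k in kept):
            kept.append(name)
    return kept
```

For example, `is_duplicate("Dr. William Smith", "Bill Smith")` exercises the honorific and nickname cases, and `is_duplicate("Abdul Rahman", "Abdulrahman")` the split-name case. The pairwise loop in `eliminate_duplicates` is quadratic in the number of names, which is why response time, memory load, and CPU time matter once such a service is applied to large data sets.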