期刊名称:International Journal on Computer Science and Engineering
印刷版ISSN:2229-5631
电子版ISSN:0975-3397
出版年度:2010
卷号:2
期号:2
页码:416-423
出版社:Engg Journals Publications
摘要:It is very difficult to over-emphasize the benefits of accurate data. Errors in data are generally the most expensive aspect of data entry, costing the users even much more compared to the original data entry. Unfortunately, these costs are intangibles or difficult to measure. If errors are detected at an early stage then it requires little cost to remove the errors. Incorrect and misleading data lead to all sorts of unpleasant and unnecessary expenses. Unluckily, it would be very expensive to correct the errors after the data has been processed, particularly when the processed data has been converted into the knowledge for decision making. No doubt a stitch in time saves nine i.e. a timely effort will prevent more work at later stage. Moreover, time spent in processing errors can also have a significant cost. One of the major problems with automated data entry systems are errors. In this paper we discuss many well known techniques to minimize errors, different cleansing approaches and, suggest how we can improve accuracy rate. Framework available for data cleansing offer the fundamental services such as attribute selection, formation of tokens, selection of clustering algorithms, selection of eliminator functions etc.
关键词:Data warhouse; data cleansing; data quality; data mining;error detection; data anomalies.