Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Publication year: 2016
Volume: 7
Issue: 5
DOI: 10.14569/IJACSA.2016.070548
Publisher: Science and Information Society (SAI)
Abstract: Noisy training data have a large negative impact on machine learning algorithms. Noise-filtering algorithms have been proposed to eliminate such noisy instances. In this work, we empirically show that the most popular noise-filtering algorithms have a high False Positive (FP) error rate: they mistakenly identify genuine instances as outliers and eliminate them. We therefore propose more conservative outlier identification criteria that reduce the FP error rate and thus improve the performance of the noise filters. With the new filter, an instance is eliminated if and only if both a Naïve Bayesian (NB) classifier and the original filtering criteria identify it as misclassified. As a result, fewer genuine instances are incorrectly eliminated, which improves classification accuracy.
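
The elimination rule in the abstract (remove an instance only when the NB classifier and the original filter agree that it is noisy) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `conservative_noise_mask` and `apply_mask`, the input layout, and the toy data are all assumptions made for illustration; how the NB predictions and the base filter's flags are produced (for example, by cross-validation) is left outside the sketch.

```python
from typing import List, Sequence


def conservative_noise_mask(
    labels: Sequence[int],
    nb_predictions: Sequence[int],
    base_filter_flags: Sequence[bool],
) -> List[bool]:
    """Return True for each instance that should be eliminated.

    Hypothetical helper: an instance is eliminated only if BOTH hold:
      1. the Naive Bayes prediction disagrees with its given label, and
      2. the original (base) noise filter has flagged it as noisy.
    """
    if not (len(labels) == len(nb_predictions) == len(base_filter_flags)):
        raise ValueError("all inputs must have the same length")
    return [
        bool(flagged) and (predicted != label)
        for label, predicted, flagged in zip(
            labels, nb_predictions, base_filter_flags
        )
    ]


def apply_mask(instances: Sequence, remove_mask: Sequence[bool]) -> list:
    """Keep only the instances that are not marked for elimination."""
    return [x for x, remove in zip(instances, remove_mask) if not remove]


# Toy data (made up for illustration, not taken from the paper).
labels = [0, 0, 1, 1, 0]
nb_predictions = [0, 1, 1, 0, 0]                       # assumed NB outputs
base_filter_flags = [False, True, True, True, False]   # assumed base filter flags

remove = conservative_noise_mask(labels, nb_predictions, base_filter_flags)
kept_indices = apply_mask(range(len(labels)), remove)
print(remove)        # [False, True, False, True, False]
print(kept_indices)  # [0, 2, 4]
```

In this toy run the base filter alone would remove three instances (indices 1, 2, 3), while the combined rule removes only two, because NB agrees with the given label of instance 2. That is the sense in which the combined criterion is more conservative: it can only shrink the set of eliminated instances, never enlarge it.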