
Article Information

  • Title: Enhancing the Classification Accuracy of Noisy Dataset By Fusing Correlation Based Feature Selection with K-Nearest Neighbour
  • Authors: Samir Kumar Singha ; Syed Imtiaz Hassan
  • Journal: Oriental Journal of Computer Science and Technology
  • Print ISSN: 0974-6471
  • Publication year: 2017
  • Volume: 10
  • Issue: 2
  • Pages: 282-290
  • Language: English
  • Publisher: Oriental Scientific Publishing Company
  • Abstract: The performance of data mining and machine learning tasks can be significantly degraded by noisy, irrelevant, and high-dimensional data containing a large number of features. A large amount of real-world data contains noise or missing values. During data collection, storage repositories may gather many irrelevant features. These redundant and irrelevant feature values distort the classification principle, increase computational overhead, and decrease the predictive ability of the classifier. The high dimensionality of such datasets poses a major bottleneck in data mining, statistics, and machine learning. Among the several methods of dimensionality reduction, attribute (feature) selection is often used. Because the k-NN algorithm is sensitive to irrelevant attributes, its performance degrades significantly when a dataset contains missing values or noisy data. However, this weakness of the k-NN algorithm can be reduced by combining it with feature selection techniques. In this research we combine Correlation based Feature Selection (CFS) with the k-Nearest Neighbour (k-NN) classification algorithm to obtain better classification results when the dataset contains missing values or noisy data. The reduced attribute set decreases the time required for classification. The research shows that for datasets with no or very little noise, reducing dimensionality with CFS before classifying with k-NN may lower classification accuracy compared with k-NN alone. When additional noise is introduced to these datasets, the performance of k-NN degrades significantly. When these noisy datasets are classified using CFS and k-NN together, classification accuracy improves.
  • Keywords: k-Nearest Neighbour ; Correlation based feature selection ; Attribute Selection ; Missing Values ; Dimensionality Reduction
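
The abstract describes running CFS before k-NN but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It makes several assumptions: Pearson correlation stands in for whatever correlation measure the paper used, the subset search is a simple greedy forward search, the data is synthetic (one informative feature plus ten pure-noise features), and every function name (`cfs_merit`, `cfs_select`, `knn_predict`, `make_data`) is hypothetical. The merit score is the usual CFS heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation.

```python
import math
import random
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, X, y):
    """CFS merit: k * r_cf / sqrt(k + k*(k-1) * r_ff)."""
    cols = [[row[j] for row in X] for j in subset]
    k = len(subset)
    r_cf = sum(abs(pearson(c, y)) for c in cols) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    r_ff = sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_select(X, y):
    """Greedy forward search: keep adding the feature that most improves merit."""
    remaining = set(range(len(X[0])))
    chosen, best = [], 0.0
    while remaining:
        score, j = max((cfs_merit(chosen + [j], X, y), j) for j in remaining)
        if score <= best:
            break  # no candidate improves the merit, so stop
        chosen.append(j)
        remaining.remove(j)
        best = score
    return sorted(chosen)

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, features, k=3):
    """Fraction of test rows classified correctly using only `features`."""
    def project(rows):
        return [[r[j] for j in features] for r in rows]
    tr, te = project(train_X), project(test_X)
    hits = sum(knn_predict(tr, train_y, x, k) == t for x, t in zip(te, test_y))
    return hits / len(test_y)

def make_data(n, n_noise, rng):
    """Synthetic data (an assumption): feature 0 carries the class, the rest are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [label + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = make_data(200, 10, rng)
test_X, test_y = make_data(100, 10, rng)

selected = cfs_select(train_X, train_y)
acc_all = accuracy(train_X, train_y, test_X, test_y, list(range(11)))
acc_cfs = accuracy(train_X, train_y, test_X, test_y, selected)

print("selected features:", selected)
print("k-NN accuracy, all features:", acc_all)
print("k-NN accuracy, CFS subset:  ", acc_cfs)
```

Comparing the two printed accuracies shows how irrelevant attributes affect the distance computation in k-NN on this toy data. It says nothing about the datasets or noise levels studied in the paper, and, as the abstract notes, on clean data the reduced attribute set may lower accuracy rather than raise it.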