
Article Information

  • Title: A Proposal for Recommendation of Feature Selection Algorithm based on Data Set Characteristics
  • Authors: Saptarsi Goswami; Amlan Chakrabarti; Basabi Chakraborty
  • Journal: Journal of Universal Computer Science
  • Print ISSN: 0948-6968
  • Year: 2016
  • Volume: 22
  • Issue: 6
  • Pages: 760-781
  • DOI: 10.3217/jucs-022-06-0760
  • Publisher: Graz University of Technology and Know-Center
  • Abstract: Feature selection is an important prerequisite for any pattern recognition, machine learning or data mining problem. Many feature subset selection algorithms have been developed to reduce the dimensionality of a data set, with the aim of achieving high recognition accuracy at low computational cost. However, some methods work well on some data sets and perform poorly on others. For any particular data set, it is difficult to identify the most suitable algorithm without a trial-and-error process. It appears that the characteristics of a data set may influence which feature selection algorithm is most suitable. In this work, data set characteristics are studied in order to recommend an appropriate feature selection algorithm for a particular data set. A new proposal based on intra-attribute relationships, together with a measure called MVS (multivariate score), is introduced to quantify data sets by their correlation structure and group them into several categories. The measure is used to group 63 publicly available benchmark data sets according to their characteristics. The performance of different feature selection algorithms on the different groups is then studied through simulation experiments to verify the relationship between data set characteristics and the choice of feature selection algorithm. The effect of some other data set characteristics is also studied. Finally, a framework for recommending a suitable feature selection algorithm is outlined.
  • Keywords: correlation structure; data set characteristics; feature selection algorithm; multivariate score
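The abstract does not define MVS, so the sketch below does not reproduce it. To illustrate the general idea of grouping data sets by their correlation structure, it uses a hypothetical stand-in: the mean absolute Pearson correlation over all attribute pairs, bucketed with arbitrary thresholds (`low=0.3`, `high=0.7`). The function names, the score, and the thresholds are illustrative assumptions, not the authors' method.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / (sx * sy)


def mean_abs_correlation(columns):
    """Average |r| over all attribute pairs (hypothetical stand-in, NOT the paper's MVS)."""
    pairs = list(combinations(columns, 2))
    if not pairs:
        return 0.0
    return mean(abs(pearson(a, b)) for a, b in pairs)


def correlation_group(columns, low=0.3, high=0.7):
    """Bucket a data set by inter-attribute correlation; thresholds are hypothetical."""
    score = mean_abs_correlation(columns)
    if score < low:
        return "weak"
    if score < high:
        return "moderate"
    return "strong"
```

For example, `correlation_group([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])` returns `"strong"`, since every pair of columns is perfectly (positively or negatively) correlated. Only linear pairwise relationships are captured here; a multivariate measure such as the one the paper proposes may account for more structure.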