Article Information

  • Title: DISTRIBUTED AND PROGRESSIVE FEATURE SELECTION ALGORITHM FOR HIGH DIMENSIONAL DATA: A MAP-REDUCE APPROACH
  • Authors: CH. RAJA RAMESH ; G. JENA ; K RAGHAVA RAO
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year of publication: 2017
  • Volume: 95
  • Issue: 24
  • Pages: 7020
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Dimensionality reduction or feature selection is an essential pre-processing step before applying a machine learning algorithm to any data set. For medium-dimensional datasets it is an optional or on-demand requirement, but for high-dimensional datasets it is mandatory, and its importance grows when accurate and relevant output is expected from the machine learning algorithm. Most existing methods fall into two types: dimensionality reduction and feature selection. The gap between the two is narrow. Dimensionality reduction relies on mathematical analysis with transformations, and its result may or may not be a subset of the original features. Feature selection is an application of feature engineering and requires domain knowledge. Any algorithm applied to high-dimensional data, however, requires more processing time and storage resources. We took processing time as the basis of our problem statement and implemented a distributed algorithm for feature selection, named the Distributed Progressive Feature Selection algorithm with Knn+Relieff for high-dimensional data. In this paper we apply the MapReduce concept to select the final subset of relevant features in a progressive manner. Simulation results show the features with their weights for various parameters.
  • Keywords: Feature Selection; Dimensionality Reduction; Mappers; Similarity Measures
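
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: feature weights are computed independently on each data partition (the map step) and then merged into one ranking (the reduce step). It uses the basic two-class Relief update with a single nearest hit and nearest miss, not the authors' Knn+Relieff procedure. The function names (`relief_partial`, `merge`, `distributed_relief`, `top_k`), the toy data, and the assumption that features are already scaled to [0, 1] are all hypothetical and chosen for illustration.

```python
from functools import reduce


def relief_partial(partition):
    """Map step (hypothetical): basic Relief weight sums on one partition.

    partition is (rows, labels); features are assumed scaled to [0, 1].
    Returns (per-feature weight sums, number of rows processed).
    """
    rows, labels = partition
    n_features = len(rows[0])
    sums = [0.0] * n_features
    for i, (x, y) in enumerate(zip(rows, labels)):
        hit = miss = None
        hit_d = miss_d = float("inf")
        for j, (z, t) in enumerate(zip(rows, labels)):
            if i == j:
                continue
            d = sum(abs(a - b) for a, b in zip(x, z))  # Manhattan distance
            if t == y and d < hit_d:
                hit, hit_d = z, d      # nearest neighbour of the same class
            elif t != y and d < miss_d:
                miss, miss_d = z, d    # nearest neighbour of another class
        if hit is None or miss is None:
            continue  # partition lacks a same-class or other-class neighbour
        for f in range(n_features):
            # Reward features that differ on the miss, penalise those
            # that differ on the hit.
            sums[f] += abs(x[f] - miss[f]) - abs(x[f] - hit[f])
    return sums, len(rows)


def merge(a, b):
    """Reduce step (hypothetical): add weight sums and row counts."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def distributed_relief(partitions):
    """Average the per-partition weight sums over all rows."""
    sums, count = reduce(merge, map(relief_partial, partitions))
    return [s / count for s in sums]


def top_k(weights, k):
    """Indices of the k highest-weighted features."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return order[:k]


# Toy data: feature 0 separates the classes, feature 1 is noise.
partitions = [
    ([[0.0, 0.2], [0.1, 0.8], [0.9, 0.3], [1.0, 0.7]], [0, 0, 1, 1]),
    ([[0.2, 0.9], [0.05, 0.1], [0.8, 0.6], [0.95, 0.4]], [0, 0, 1, 1]),
]
weights = distributed_relief(partitions)
selected = top_k(weights, 1)
```

In this sketch each partition searches for neighbours only within itself, which keeps the mappers independent at the cost of approximating the weights a single-machine run would produce. On a real cluster, `map` and `reduce` would be replaced by the framework's own mapper and reducer stages.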