Abstract: Real-life problems handled by machine learning involve various forms of values in the data set attributes, such as continuous and discrete forms. Discretization is an important step in the pre-processing stage, as most attribute selection techniques assume the discreteness of the input values. This step can change the internal structure of the input attribute values with respect to the classification problem, so the quality of this step directly affects the quality of the selected features. This work discusses the problems existing in current discretization techniques and proposes an attribute evaluation and selection technique that avoids them. Attributes are evaluated directly in their continuous form without biasing their internal structure, which also reduces computational cost by eliminating the discretization step. The basic insight of the proposed approach relies on the inverse relationship between class label distribution overlap and the relative information content of a given attribute. To estimate the validity of this assumption, a series of data sets was examined using several standard approaches, including our own implementation, and the approaches were ranked with respect to overall classification accuracy. The results, at least with respect to the test data sets used in this study, indicate that the proposed approach outperformed the other methods selected for evaluation. These results will be examined over a wider range of continuous-attribute data sets from non-medical domains in order to investigate their robustness.
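The following is a minimal sketch of the overlap idea stated above, not the paper's exact formulation: it assumes a simple histogram-based estimate of how much the per-class value distributions of a continuous attribute overlap, and scores an attribute as more informative the smaller that overlap is. The function name overlap_score, the binning scheme, and all parameters are illustrative assumptions.

```python
# Hypothetical sketch: rank continuous attributes by (1 - per-class histogram overlap).
# This illustrates the stated insight (less class-distribution overlap => more
# information), not the authors' actual algorithm.
import numpy as np

def overlap_score(values, labels, bins=20):
    """Return 1 - mean pairwise histogram overlap; higher means more informative."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(values, bins=bins)
    classes = np.unique(labels)
    # Normalized histogram of the attribute for each class label.
    hists = [np.histogram(values[labels == c], bins=edges)[0] for c in classes]
    hists = [h / max(h.sum(), 1) for h in hists]
    # Average pairwise overlap (sum of bin-wise minima) over all class pairs.
    overlaps = [np.minimum(hists[i], hists[j]).sum()
                for i in range(len(classes)) for j in range(i + 1, len(classes))]
    mean_overlap = float(np.mean(overlaps)) if overlaps else 0.0
    return 1.0 - mean_overlap

if __name__ == "__main__":
    # Synthetic example: one class-separating attribute and one noise attribute.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    X = np.column_stack([
        rng.normal(y * 2.0, 1.0),    # per-class distributions barely overlap
        rng.normal(0.0, 1.0, 200),   # pure noise, heavy overlap
    ])
    scores = [overlap_score(X[:, j], y) for j in range(X.shape[1])]
    ranking = np.argsort(scores)[::-1]
    print("scores:", scores, "ranking (best first):", ranking)
```

In this sketch the attribute whose per-class histograms overlap least receives the highest score, and no discretization of the attribute is required beyond the internal binning used to estimate the distributions.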