Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2018
Volume: 96
Issue: 4
Page: 1164
Publisher: Journal of Theoretical and Applied
Abstract: Big Data refers to very large volumes of high-dimensional data produced by different sources, and this dimensionality is a considerable challenge in data processing applications. In this paper, we propose a framework for handling Big Data dimensionality that combines MapReduce parallel processing with FuzzyRough feature selection, together with a new feature selection method based on fuzzy similarity relations. Initial experiments show that it reduces dimensionality and improves classification accuracy. The proposed framework consists of three main steps. The first is data preprocessing. The next two are the map and reduce steps of the MapReduce model: in the map step, FuzzyRough is used to select features, and in the reduce step, fuzzy similarity is used to reduce the selected features. In our experiments, the proposed framework achieved 86.4% accuracy with a decision tree classifier, whereas previous frameworks evaluated on the same data set achieved accuracies between 70% and 80%.
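The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: min-max scaling as the preprocessing step, the similarity relation R(x, y) = 1 - |a(x) - a(y)| on scaled values, a single-feature fuzzy-rough dependency score in the map step, and a redundancy filter with a hypothetical `threshold` in the reduce step. The function names (`normalize`, `dependency`, `map_select`, `reduce_merge`) are illustrative, and the partitions are processed in a plain loop instead of a real MapReduce runtime.

```python
from collections import Counter

def normalize(rows):
    """Preprocessing (assumed): min-max scale every column to [0, 1]."""
    scaled = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero on constant columns
        scaled.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled)]

def dependency(col, labels):
    """Assumed fuzzy-rough dependency of the class on one scaled feature.

    With R(x, y) = 1 - |x - y|, the membership of x in the fuzzy positive
    region is the smallest distance to any sample of a different class.
    """
    total = 0.0
    for x, cx in zip(col, labels):
        dists = [abs(x - y) for y, cy in zip(col, labels) if cy != cx]
        total += min(dists) if dists else 1.0
    return total / len(col)

def map_select(rows, labels, k):
    """Map step (sketch): return the k best-scoring feature indices of one partition."""
    scores = [(dependency(c, labels), i) for i, c in enumerate(zip(*rows))]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for _, i in ranked[:k]]

def feature_similarity(a, b):
    """Assumed fuzzy similarity between two scaled feature columns."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def reduce_merge(candidate_lists, rows, threshold=0.9):
    """Reduce step (sketch): merge candidates and drop near-duplicate features."""
    cols = list(zip(*rows))
    votes = Counter(i for lst in candidate_lists for i in lst)
    kept = []
    for i, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if all(feature_similarity(cols[i], cols[j]) < threshold for j in kept):
            kept.append(i)
    return sorted(kept)

# Toy data: column 1 duplicates column 0, column 2 is unrelated to the class.
data = [[0.0, 0.0, 5.0], [1.0, 1.0, 3.0], [9.0, 9.0, 5.0], [10.0, 10.0, 3.0]]
classes = [0, 0, 1, 1]
scaled = normalize(data)
partitions = [([scaled[0], scaled[2]], [0, 1]), ([scaled[1], scaled[3]], [0, 1])]
candidates = [map_select(rows, labs, k=2) for rows, labs in partitions]
print(reduce_merge(candidates, scaled))
```

The selected indices would then be used to train a classifier such as a decision tree on the reduced feature set. The 86.4% figure in the abstract comes from the authors' experiments and is not reproduced by this toy example.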