Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2019
Volume: 97
Issue: 11
Pages: 3077-3097
Publisher: Journal of Theoretical and Applied
Abstract: Classification methods can be used to derive value from big data in the form of models, which can then be used to predict new cases. Several parallel classification methods for big data have been developed for Hadoop MapReduce as well as for Spark. As big data keeps arriving, the models must be updated from time to time so that they represent the old as well as the new data. The computations must be efficient and scalable to handle big data. This research aims to enhance existing parallel classifiers so that they operate as incremental classifiers handling batches of big data. The research results are presented as follows. First, the architecture and main concept of the enhancement are presented. Second, the proposed incremental parallel Naïve Bayes classifier (NBC) based on MapReduce, which handles datasets with discrete attributes, is discussed in detail. Two series of experiments were performed on Hadoop clusters with 5 and 10 nodes. The results show that the incremental parallel NBC has acceptable accuracy and is efficient and scalable.
Keywords: Big Data Classification Method; Incremental Parallel Classifier; MapReduce Patterns
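
Illustrative note: the abstract does not give the algorithm itself, so the following is only a minimal single-machine sketch of the general idea, not the authors' implementation. It assumes that a Naïve Bayes model for discrete attributes can be stored as frequency counts, that a map step counts each data split independently, that a reduce step sums the partial counts, and that a new batch is absorbed by adding its counts to the existing model. All function names (`map_batch`, `reduce_counts`, `update_model`, `predict`) and the Laplace smoothing parameter `alpha` are hypothetical.

```python
from collections import Counter
from functools import reduce
import math


def map_batch(records):
    """Map step (assumed): count labels and (attribute, value, label) triples in one split."""
    counts = Counter()
    for *attrs, label in records:
        counts[("class", label)] += 1
        for i, value in enumerate(attrs):
            counts[("attr", i, value, label)] += 1
    return counts


def reduce_counts(partials):
    """Reduce step (assumed): sum the partial counters produced by the mappers."""
    return reduce(lambda a, b: a + b, partials, Counter())


def update_model(model, new_batch, n_splits=2):
    """Incremental update: count only the new batch, then add it to the old counts."""
    splits = [new_batch[i::n_splits] for i in range(n_splits)]
    return model + reduce_counts(map_batch(s) for s in splits)


def predict(model, attrs, alpha=1.0):
    """Pick the label with the highest log-posterior, using Laplace smoothing."""
    labels = [k[1] for k in model if k[0] == "class"]
    total = sum(model[("class", c)] for c in labels)
    best_label, best_score = None, -math.inf
    for c in labels:
        score = math.log(model[("class", c)] / total)
        for i, value in enumerate(attrs):
            # Number of distinct values seen so far for attribute i.
            n_values = len({k[2] for k in model if k[0] == "attr" and k[1] == i})
            numerator = model[("attr", i, value, c)] + alpha
            denominator = model[("class", c)] + alpha * n_values
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Two batches arriving at different times (toy data, invented for illustration).
batch1 = [("sunny", "hot", "no"), ("sunny", "hot", "no"), ("rainy", "mild", "yes")]
batch2 = [("rainy", "mild", "yes"), ("rainy", "hot", "yes"), ("sunny", "mild", "no")]

model = update_model(Counter(), batch1)
model = update_model(model, batch2)
```

Because the model in this sketch is just a table of additive counts, updating it with `batch2` gives the same counts as retraining on `batch1 + batch2`, without rereading the old data.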