Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2017
Volume: 5
Issue: 1
Pages: 891
DOI: 10.15680/IJIRCCE.2017.0501187
Publisher: S&S Publications
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies and researchers focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Frequent itemset mining is one of the classical problems in most data mining applications. It requires very large computation and I/O traffic capacity. Moreover, the memory and CPU of a single processor are very limited, which degrades the performance of the algorithm. In this paper we propose a distributed algorithm that runs on Hadoop, one of the most popular recent distributed frameworks, which focuses mainly on the MapReduce paradigm. The proposed approach takes into account inherent characteristics of the Apriori algorithm related to frequent itemset generation and, through block-based partitioning, uses dynamic workload management. The algorithm greatly enhances performance and achieves high scalability compared to existing distributed Apriori-based approaches. The proposed algorithm is implemented and tested on large-scale datasets distributed over a cluster.
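
Illustrative sketch (not the paper's algorithm): the abstract does not give implementation details, so the following is only a minimal single-process Python illustration of how Apriori support counting can be split across blocks of transactions in a map/reduce style. The function names, the fixed `block_size`, and the sequential loop standing in for parallel mappers are all assumptions made for this example; the dynamic workload management mentioned in the abstract is not modeled.

```python
from collections import Counter
from itertools import combinations


def split_into_blocks(transactions, block_size):
    """Partition the transactions into fixed-size blocks (one per hypothetical mapper)."""
    return [transactions[i:i + block_size]
            for i in range(0, len(transactions), block_size)]


def map_count(block, candidates):
    """Map step: within one block, count the transactions containing each candidate."""
    counts = Counter()
    for transaction in block:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def reduce_counts(partial_counts, min_support):
    """Reduce step: sum per-block counts and keep candidates meeting min_support."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def next_candidates(frequent, k):
    """Apriori candidate generation: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent for item in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def apriori_blocks(transactions, min_support, block_size=2):
    """Level-wise Apriori; each level is one map/reduce pass over all blocks."""
    blocks = split_into_blocks(transactions, block_size)
    all_items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in all_items]
    result, k = {}, 1
    while candidates:
        # Sequential stand-in for mappers that would run in parallel on a cluster.
        partial = [map_count(block, candidates) for block in blocks]
        frequent = reduce_counts(partial, min_support)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

For example, with the transactions `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]` and `min_support=3`, the sketch reports the three single items (support 4) and the three pairs (support 3) as frequent, while `{a, b, c}` (support 2) is rejected. Because the reduce step sums per-block counts, the result does not depend on how the transactions are split into blocks.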