Journal title: Current Journal of Applied Science and Technology
Print ISSN: 2457-1024
Publication year: 2018
Volume: 26
Issue: 1
Pages: 1-14
Language: English
Publisher: Sciencedomain International
Abstract: Information retrieval and decision-making demand a scalable and efficient method to process and extract relevant information from Big Data. Data mining is the detailed analysis of large quantities of data to discover new information in the form of patterns, trends, and relations. With the spread of the World Wide Web, the quantity of data stored and made available electronically has increased enormously, and methods to retrieve information from such big data have gained immense significance for both the business and scientific research communities. Frequent Itemset Mining is one of the most widely applied techniques for retrieving useful information from data. However, when this method is applied to Big Data, the combinatorial explosion of candidate itemsets becomes a challenge. Recent developments in parallel programming offer powerful tools to overcome this problem. Nevertheless, these tools have their own technical drawbacks, e.g. the difficulty of distributing data evenly and inter-communication costs. In our study, we examine the applicability of Frequent Itemset Mining in the MapReduce framework. We introduce a new method for mining large datasets: Big-Frequent-ItemsetMining. This method is optimized to run on extremely large datasets. Our approach is similar to FP-growth but uses a different data structure that is based on algebraic topology. In this study, we demonstrate the scalability of our techniques.
Keywords: Data mining; frequent itemsets; association rules; big data sets; frequent pattern mining; MapReduce