Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Publication year: 2015
Volume: 3
Issue: 11
DOI: 10.15680/IJIRCCE.2015.0311068
Publisher: S&S Publications
Abstract: Discovering frequent itemsets is a key problem in data mining applications such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from the minimum-length frequent itemsets and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. A new algorithm is presented that combines the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set, which is used to prune early the candidates that would normally be encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, thus immediately specifying all frequent itemsets. The pattern-mining algorithm presented (Max-Miner) scales roughly linearly in the number of maximal patterns embedded in a database, irrespective of the length of the longest pattern. In comparison, previous algorithms based on Apriori scale exponentially with the longest pattern length. Experiments on real data show that when the patterns are long, our algorithm is more efficient by an order of magnitude or more.
Keywords: Max-Miner Algorithm; Data Cleaning; Data Access; System ideal state
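
The idea in the abstract of combining a bottom-up search with a restricted top-down search over a maximum frequent candidate set can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the paper's implementation: the names `maximal_frequent_itemsets`, `update_mfcs`, and `min_count` are hypothetical, and unlike the algorithm described above the sketch still enumerates every frequent itemset bottom-up. It only skips the support count for candidates already covered by a maximal frequent itemset found top-down.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def update_mfcs(mfcs, infrequent):
    """Shrink the candidate set so that no element contains an infrequent itemset."""
    for s in infrequent:
        shrunk = set()
        for m in mfcs:
            if s <= m:
                # Drop one item of s at a time so the result no longer contains s.
                shrunk.update(m - {e} for e in s)
            else:
                shrunk.add(m)
        # Keep only the maximal elements.
        mfcs = {m for m in shrunk if not any(m < o for o in shrunk)}
    return mfcs


def maximal_frequent_itemsets(transactions, min_count):
    """Simplified two-direction search (assumption: not the paper's exact procedure)."""
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    mfcs = {items}          # maximum frequent candidate set (top-down side)
    counted = set()         # candidate-set elements whose support was already counted
    mfs = set()             # maximal frequent itemsets confirmed top-down
    found = set()           # all frequent itemsets seen bottom-up
    candidates = {frozenset([i]) for i in items}
    k = 1
    while candidates:
        # Top-down step: count each new element of the candidate set once.
        for m in mfcs - counted:
            counted.add(m)
            if m and support(m, transactions) >= min_count:
                mfs.add(m)
        # Bottom-up step: subsets of a confirmed maximal itemset need no counting.
        frequent, infrequent = set(), set()
        for c in candidates:
            if any(c <= m for m in mfs) or support(c, transactions) >= min_count:
                frequent.add(c)
            else:
                infrequent.add(c)
        found |= frequent
        mfcs = update_mfcs(mfcs, infrequent)
        # Apriori-style join and prune to build the next level.
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    found |= mfs
    return {s for s in found if not any(s < o for o in found)}


if __name__ == "__main__":
    data = ["abcd", "abcd", "abcd", "de", "e"]
    print(maximal_frequent_itemsets(data, min_count=2))
```

In this toy data set the infrequent pairs found at the second level shrink the candidate set to the itemsets {a, b, c, d} and {e}, so the longer itemset is confirmed with a single top-down count and none of its 3-item and 4-item subsets need to be counted.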