Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Online ISSN: 0976-8491
Year: 2012
Volume: 3
Issue: 1Ver4
Publisher: Ayushmaan Technologies
Abstract: This paper describes the issues and remedies in mining distributed databases. Directly applying sequential algorithms to distributed databases is not effective, because it incurs a large communication overhead. The paper proposes an efficient algorithm for mining distributed databases that minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an application based on an improved algorithm, C Matrix, which is used to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, written (x, y), which compresses the transmitted content and serves as the key for querying the corresponding support counts in the C Matrix. This approach also reduces the average transaction size and the dataset size, which shortens scan time. The performance study shows that the proposed algorithm has higher running efficiency, lower communication cost, and stronger scalability than the direct application of a sequential algorithm in distributed databases.
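The pair encoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `number_itemsets` and `encode_candidates` are hypothetical, the join rule is the standard Apriori prefix join (assumed here, since the abstract does not say how pairs are chosen), and the sketch omits both the subset-pruning step and the C Matrix itself.

```python
from itertools import combinations


def number_itemsets(frequent_k):
    """Number the frequent k-itemsets 1..m in sorted order (hypothetical helper)."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    return {itemset: i for i, itemset in enumerate(ordered, start=1)}


def encode_candidates(frequent_k):
    """Encode each candidate (k+1)-itemset as a pair (x, y) of itemset numbers.

    Assumption: two frequent k-itemsets are joined when they share their
    first k-1 items, as in the classic Apriori join.
    """
    numbering = number_itemsets(frequent_k)
    encoded = {}
    # The numbering dict is in sorted order, so a always precedes b.
    for (a, x), (b, y) in combinations(numbering.items(), 2):
        if a[:-1] == b[:-1]:
            # The pair (x, y) stands in for the full candidate itemset.
            encoded[(x, y)] = a + (b[-1],)
    return numbering, encoded


frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
numbering, encoded = encode_candidates(frequent_2)
for pair, candidate in encoded.items():
    print(pair, candidate)
```

Under these assumptions, a site would transmit only the pair (x, y) rather than the k+1 item identifiers, so the message size per candidate stays constant as k grows.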