Journal:International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN:2320-9798
Electronic ISSN:2320-9801
Publication year:2013
Volume:1
Issue:10
Publisher:S&S Publications
Abstract:The first phase applies a Genetic Algorithm (GA) to the global clustering process, addressing the optimization problem in both clustering and feature selection. The second phase uses a confusion matrix for derivative works and includes an improved GA for the final classification. The third phase presents an optimization technique that evaluates cluster optimality for efficient document clustering based on the optimized conceptual feature words. The final phase introduces a joint approach to clustering web pages, which first finds the recurrent sets and then clusters the documents. These recurrent sets are generated using a recurrent pattern expansion technique. Applying the Fuzzy K-Means algorithm to optimized web document clustering using recurrent sets then yields clusters whose documents are closely related and share related features. Experimental results show that our approach is more efficient than the above two joint approaches and is more robust. Performance evaluation shows benefits in terms of cluster optimality, true negative rate, and information retrieval on real datasets and the UCI repository bag-of-words dataset.
Keywords:Genetic Algorithm; Fuzzy K-Means Algorithm; Recurrent Pattern Expansion; Web Document Clustering; Confusion Matrix; Optimization Technique; World Wide Web; Feature Selection
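
The abstract names the Fuzzy K-Means algorithm as the final clustering step but gives no details of it. The following is a minimal sketch of the standard fuzzy k-means (fuzzy c-means) update rules, not the authors' implementation. The function name `fuzzy_k_means`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy input vectors are all assumptions made for illustration; in the setting described, each input vector would presumably be a document's feature-word weights.

```python
import math
import random


def fuzzy_k_means(points, k, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (centers, memberships) from a fuzzy k-means (fuzzy c-means) run.

    points: list of equal-length numeric vectors (e.g. term-weight rows).
    m: fuzzifier (> 1); the default of 2.0 is an assumption, not from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Each center is the membership-weighted mean of all points.
        centers = []
        for j in range(k):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Memberships are recomputed from relative distances to the centers.
        shift = 0.0
        for i, p in enumerate(points):
            # Clamp distances to avoid division by zero when a point sits on a center.
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_row = []
            for j in range(k):
                denom = sum(
                    (dists[j] / dists[l]) ** (2.0 / (m - 1.0)) for l in range(k)
                )
                new_row.append(1.0 / denom)
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break
    return centers, u


# Toy usage: two well-separated groups of 2-D vectors (hypothetical data).
docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, memberships = fuzzy_k_means(docs, k=2)
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike hard k-means, each document receives a degree of membership in every cluster, and the rows of `memberships` sum to 1. A hard assignment, if needed, can be taken as the cluster with the largest membership, as in `hard_labels`.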