Article Information

Title: Relevant Data Clustering In Web Search Engine
Authors: N. NAGAKUMARI; P. SRIVALLI; K. SATYA TEJ et al.
Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Year of publication: 2011
Volume: 2
Issue: 5
Pages: 2464-2466
Publisher: TechScience Publications
Abstract: As the number of web pages grows, information retrieval engines often fail to return the relevant documents; clustering can help find them. The main purpose of clustering techniques is to partition a set of entities into different groups, called clusters. These groups should be consistent in terms of the similarity of their members. As the name suggests, representative-based clustering techniques use some form of representation for each cluster; thus, every group has a member that represents it. Besides reducing the cost of the algorithm, the use of representatives makes the process easier to understand. Here the clustering is done with the k-means algorithm, which has several disadvantages: it is very slow and is not applicable to large databases. The fast greedy k-means algorithm is therefore used, as it overcomes these drawbacks of k-means. However, it is still limited when applied to a large number of data points, so we introduce an efficient method to compute the distortion for this algorithm.
Keywords: Document clustering; k-means; Fast k-means; algorithm
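
Illustrative note: the abstract refers to k-means clustering and to computing the distortion of a clustering. The sketch below shows plain (Lloyd-style) k-means and a straightforward distortion computation, taking distortion to mean the sum of squared distances from each point to its nearest cluster center. It is not the fast greedy k-means variant or the efficient distortion method proposed in the paper, whose details are not given in this record. The function names (`squared_distance`, `distortion`, `kmeans`) and the parameters `iterations` and `seed` are hypothetical choices made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns a list of k cluster centers (tuples)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn at random.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers
```

A possible use is `centers = kmeans(points, 2)` followed by `distortion(points, centers)`, where `points` is a list of equal-length numeric tuples such as document feature vectors. Each call to `distortion` costs on the order of (number of points) × (number of centers) distance evaluations, which is the kind of expense that becomes significant for a large number of data points.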