Article information

  • Title: Improving the Performance of K-Means Clustering For High Dimensional Data Set
  • Authors: P. Prabhu; N. Anbazhagan
  • Journal: International Journal on Computer Science and Engineering
  • Print ISSN: 2229-5631
  • Electronic ISSN: 0975-3397
  • Publication year: 2011
  • Volume: 3
  • Issue: 6
  • Pages: 2317-2322
  • Publisher: Engg Journals Publications
  • Abstract: Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Many dimensions are hard to reason about, impossible to visualize, and, because the number of possible values grows exponentially with each dimension, impossible to enumerate. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed by an efficient dimensionality reduction method such as Principal Component Analysis (PCA). Cluster analysis in high-dimensional data is the process of fast identification and efficient description of clusters. The clusters have to be of high quality with regard to a suitably chosen homogeneity measure. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters represented by their centroids. It is difficult to compare the quality of the clusters produced: different initial partitions can result in different final clusters. Hence, in this paper we propose to use Principal Component Analysis to reduce the data set from high-dimensional to low-dimensional. The new method is used to find the initial centroids, making the algorithm more effective and efficient. Comparing the results of the original and proposed methods, we found that the results obtained from the proposed method are more accurate.
  • Keywords: clustering; k-means; principal component analysis; dimension reduction; initial centroid
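
The pipeline outlined in the abstract (reduce dimensionality with PCA, pick initial centroids, then run k-means) can be illustrated with a minimal NumPy sketch. The abstract does not say how the initial centroids are derived, so the `initial_centroids` heuristic below is an assumption made purely for illustration and should not be read as the authors' method: it sorts the points along the first principal component, splits them into k equal-sized groups, and uses each group's mean. The function names, the synthetic data, and all parameters are likewise hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def initial_centroids(Z, k):
    """Assumed heuristic (not taken from the paper): sort points along the
    first principal component, split them into k groups, average each group."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.array([Z[idx].mean(axis=0) for idx in groups])


def assign(Z, centroids):
    """Return the index of the nearest centroid for every point."""
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(Z, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(max_iter):
        labels = assign(Z, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(Z, centroids), centroids


# Synthetic example: three Gaussian blobs in 50 dimensions (made-up data).
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
X = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])

Z = pca_reduce(X, n_components=2)
labels, centroids = kmeans(Z, initial_centroids(Z, k=3))
print(Z.shape, centroids.shape, np.bincount(labels, minlength=3))
```

Because the starting centroids are computed deterministically from the data, repeated runs on the same input give the same clusters, which addresses the sensitivity to initial partitions that the abstract mentions.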