Journal: International Journal of Computer Science and Information Technologies
Online ISSN: 0975-9646
Publication year: 2011
Volume: 2
Issue: 3
Pages: 955-959
Publisher: TechScience Publications
Abstract: Clustering is the task of dividing a data set into subsets such that elements within each subset are similar to one another and dissimilar to elements belonging to other subsets. Clustering techniques usually belong to the group of undirected data mining tools; they are also sometimes referred to as “unsupervised learning” because there is no particular dependent or outcome variable to predict. Cluster analysis is common in any discipline that involves the analysis of multivariate data. K-Means is one of the most widely used clustering algorithms because of its simplicity and performance. The initial centroids for K-Means clustering are generated randomly. The performance of K-Means degrades considerably when the data set is high-dimensional: both accuracy and time complexity suffer. Hence, the initial centroids provided must be appropriate. For this purpose, the dimensionality reduction technique called Principal Component Analysis (PCA) is used. For better performance, this paper uses Kernel Principal Component Analysis (KPCA) to decide the initial centroids. The experimental results show that the proposed clustering technique yields better accuracy and reduced time complexity.
Keywords: K-Means; Principal Component Analysis (PCA); Kernel Principal Component Analysis (KPCA); Centroid
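The abstract does not say how the KPCA projection is turned into initial centroids, so the following is only a minimal sketch of one plausible reading, not the paper's procedure. It assumes an RBF kernel, scores each point on the first kernel principal component (found by power iteration), sorts the points by that score, splits them into k equal groups, and uses each group's mean as a starting centroid. All function names, the `gamma` value, and the sort-and-split rule are illustrative assumptions.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(pts) for col in zip(*pts)]


def centered_rbf_kernel(points, gamma):
    """RBF kernel matrix, double-centered as kernel PCA requires."""
    n = len(points)
    K = [[math.exp(-gamma * sq_dist(p, q)) for q in points] for p in points]
    row_mean = [sum(row) / n for row in K]  # K is symmetric: also column means
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def first_kernel_component(Kc, iters=200, seed=0):
    """Leading eigenvector of the centered kernel via power iteration.

    Its i-th entry is proportional to the projection of point i onto
    the first kernel principal component.
    """
    rng = random.Random(seed)
    v = [rng.random() for _ in Kc]
    for _ in range(iters):
        w = [sum(k * x for k, x in zip(row, v)) for row in Kc]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def kpca_initial_centroids(points, k, gamma=0.1):
    """Assumed seeding rule: sort by first-component score, split into k groups.

    Requires len(points) >= k so that every group is non-empty.
    """
    scores = first_kernel_component(centered_rbf_kernel(points, gamma))
    n = len(points)
    order = sorted(range(n), key=lambda i: scores[i])
    chunks = [order[c * n // k:(c + 1) * n // k] for c in range(k)]
    return [mean_point([points[i] for i in chunk]) for chunk in chunks]


def kmeans(points, centroids, iters=100):
    """Standard Lloyd iterations starting from the supplied centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(cl) if cl else centroids[c]
                   for c, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy data: two well-separated groups of four points each.
data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
initial = kpca_initial_centroids(data, k=2)
final = kmeans(data, initial)
print(sorted(final))
```

Because the seeding is deterministic for a fixed data set, repeated runs start K-Means from the same centroids, unlike random initialization. The kernel matrix here costs O(n²) time and memory, so this sketch is meant for illustration on small inputs only.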