Article Information

Title: A Novel Benchmark K-Means Clustering on Continuous Data
Authors: K. Prasanna; M. Sankara Prasanna Kumar; G. Surya Narayana; et al.
Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Online ISSN: 0975-3397
Year: 2011
Volume: 3
Issue: 8
Pages: 2974-2977
Publisher: Engg Journals Publications
Abstract: Cluster analysis is one of the prominent techniques in data mining, and k-means is one of the best-known and most popular partition-based clustering algorithms. The k-means algorithm is widely used in clustering, but its performance is affected when clustering continuous data. In this paper, a novel approach for performing k-means clustering on continuous data is proposed. It organizes all the continuous data in a sorted structure so that all the data points closest to a given centroid can be found efficiently. The key intuition behind this approach is to calculate the distance from the origin to each data point in the data set. The data set is partitioned into k clusters of equal size, each with an initial centroid, and these centroids are all updated at once, with each point assigned to its closest centroid according to the newly calculated distances. The experimental results demonstrate that the proposed approach can improve the computational speed of the direct k-means algorithm, in both the total number of distance calculations and the overall computation time, particularly when handling continuous data.
Keywords: cluster analysis; data mining; k-means clustering algorithm; continuous data
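The abstract describes sorting points by their distance from the origin, splitting them into k equal parts to obtain initial centroids, and then updating all centroids at once. The Python sketch below shows one plausible reading of that procedure. It is not the authors' implementation. It assumes Euclidean distance, assumes each initial centroid is the mean of one of the k equal-sized parts of the origin-sorted data, and then runs ordinary Lloyd-style k-means iterations. The helper names (`origin_sorted_init`, `kmeans`) are hypothetical, and the sketch does not reproduce the paper's claimed savings in distance calculations. It requires 1 <= k <= len(data).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def origin_sorted_init(data: List[Point], k: int) -> List[Point]:
    """Hypothetical initialization: sort points by distance from the origin,
    split them into k (nearly) equal parts, and use each part's mean as a centroid."""
    origin = (0.0,) * len(data[0])
    ordered = sorted(data, key=lambda p: dist(p, origin))
    size = len(ordered) // k
    parts = [ordered[i * size:(i + 1) * size] for i in range(k - 1)]
    parts.append(ordered[(k - 1) * size:])  # last part absorbs any remainder
    return [mean(part) for part in parts]


def kmeans(data: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Standard Lloyd iterations seeded with origin_sorted_init (a sketch)."""
    centroids = origin_sorted_init(data, k)
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: recompute all centroids at once from the new assignments.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return centroids, labels
```

For example, with two well-separated groups of three 2-D points each and `k=2`, the origin-sorted split already places each group in its own part, so the assignments stabilize after a single update.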