Article Information

Title: Finding the Number of Clusters in Data and Better Initial Centers for K-means Algorithm
Author: Ahmed Fahim
Journal: International Journal of Intelligent Systems and Applications
Print ISSN: 2074-904X
Online ISSN: 2074-9058
Year: 2020
Volume: 12
Issue: 6
Pages: 1-20
DOI: 10.5815/ijisa.2020.06.01
Publisher: MECS Publisher
Abstract: The k-means algorithm is the best-known algorithm for data clustering in data mining. Its most important advantages are its simplicity, its fast convergence to a local minimum, and its linear time complexity. Its most important open problems are the selection of initial centers and the determination of the exact number of clusters in advance. This paper proposes a solution to both problems together by adding a preprocessing step that yields the expected number of clusters in the data and better initial centers. Many studies address each of these problems separately, but none addresses both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. It produces an initial partitioning of the data without fixing the number of clusters in advance, then computes the means of the initial clusters. k-means is then applied to the original data, using the information from the preprocessing step, to obtain the final clusters. The proposed method is tested on many benchmark datasets, and the experimental results show its efficiency.
Keywords: Data clustering; k in k-means; initial centers in k-means; clustering algorithms
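The abstract describes a two-stage pipeline: a preprocessing step produces an initial partition (which fixes k), the means of that partition become the initial centers, and k-means then refines them on the original data. The sketch below, in Python with NumPy, illustrates only the second stage, assuming an initial label vector is already available; the paper's O(n log n) preprocessing step is not specified in the abstract and is not reproduced here. The function names `means_from_partition` and `kmeans_from_centers` are hypothetical, and the refinement is plain Lloyd-style k-means, not necessarily the paper's exact variant.

```python
import numpy as np

def means_from_partition(X, labels):
    """Hypothetical helper: one center per distinct label of an initial partition.

    k is inferred as the number of distinct labels, so it is never passed in.
    """
    labels = np.asarray(labels)
    ks = np.unique(labels)  # sorted distinct labels
    return np.array([X[labels == k].mean(axis=0) for k in ks])

def kmeans_from_centers(X, centers, max_iter=100, tol=1e-9):
    """Hypothetical helper: Lloyd-style k-means seeded with the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance of every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster becomes empty
                new_centers[j] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift <= tol:  # centers stopped moving: converged
            break
    # Final assignment consistent with the returned centers.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d.argmin(axis=1)
```

For example, with two well-separated groups of points and a rough initial labeling that misplaces one point, `kmeans_from_centers(X, means_from_partition(X, labels))` recovers the two groups; the number of clusters comes entirely from the initial partition.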