摘要:In K-means clustering, we are given a set of n data points in multidimensional space, and the problem is to determine the number k of clusters. In this paper, we present three methods which are used to determine the true number of spherical Gaussian clusters with additional noise features. Our algorithms take into account the structure of Gaussian data sets and the initial centroids. These three algorithms have their own emphases and characteristics. The first method uses Minkowski distance as a measure of similarity, which is suitable for the discovery of non-convex spherical shape or the clusters with a large difference in size. The second method uses feature weighted Minkowski distance, which emphasizes the different importance of different features for the clustering results. The third method combines Minkowski distance with the best feature factors. We experiment with a variety of general evaluation indexes on Gaussian data sets with and without noise features. The results showed that the algorithms have higher precision than traditional K-means algorithm.