Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Online ISSN: 1694-0814
Year: 2011
Volume: 8
Issue: 5
Publisher: IJCSI Press
Abstract: Clustering techniques focus mainly on pattern recognition for further organizational design analysis: they find groups of data objects such that the objects in a group are similar to one another and dissimilar from the objects in other groups. Preprocessing is important because data may contain noise, errors, inconsistencies, outliers, and missing variable values. Different data preprocessing techniques, such as cleaning, outlier detection, data integration, and transformation, can be carried out before clustering to achieve a successful analysis. Normalization is an important preprocessing step in data mining that rescales the values of all variables from their dynamic range into a specific range. Outliers can significantly affect data mining performance, so outlier detection and removal is an important task in a wide variety of data mining applications. k-Means is one of the best-known clustering algorithms, yet it suffers from major shortcomings: the number of clusters and the seed values must be chosen in advance, and it converges to local minima. This paper analyzes the performance of a modified k-Means clustering algorithm combined with data preprocessing (a cleaning method, a normalization approach, and outlier detection) and automatic initialization of seed values, on datasets from the UCI dataset repository.
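The abstract does not say which normalization approach the paper uses. As an illustration only, the sketch below assumes min-max normalization, one common way to rescale a variable from its observed range into a fixed range such as [0, 1]. The function name `min_max_normalize` and its parameters are hypothetical and are not taken from the paper.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of numbers into [new_min, new_max].

    Assumption: min-max scaling stands in for the paper's unspecified
    normalization approach.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


if __name__ == "__main__":
    print(min_max_normalize([2.0, 4.0, 6.0, 10.0]))
```

Rescaling each variable this way keeps a variable with a large numeric range from dominating the distance computations that k-Means relies on.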