Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 6
Pages: 5037-5043
Publisher: TechScience Publications
Abstract: Outlier detection in high-dimensional data is an emerging research topic in data mining. It seeks entities that are considerably unrelated to, distinct from, and inconsistent with the bulk of the data in an input database, and it becomes more challenging as dimensionality increases. Hubness has recently emerged as an important concept: it is a characteristic of increasing dimensionality that relates to nearest neighbors. Clustering also plays a vital role in handling high-dimensional data and is an important tool for outlier detection. This paper presents a technique in which the concept of hubness, specifically the antihub algorithm (antihubs are points with low hubness), is applied within the clusters produced by clustering techniques such as K-means and Fuzzy C-Means (FCM) to detect outliers, mainly to reduce computation time. In addition, small clusters obtained after clustering are treated as outliers and removed before the antihub algorithm is applied, which reduces computation time further. The techniques are compared on three real data sets. The experimental results show that, of the five algorithms compared, KCAntihubStage2 gives a significant reduction in computation time and also gives better accuracy when the data set is large. The paper concludes that applying Antihub within K-means clusters after removing the small clusters performs best.
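The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's KCAntihubStage2 implementation, whose details the abstract does not give. The sketch assumes that hubness is measured by the k-occurrence count N_k (how often a point appears in other points' k-nearest-neighbor lists), that an antihub is a point with N_k = 0, and that a cluster counts as "small" when it has fewer than `min_size` members. The function names, the `min_size` threshold, and the toy data are all hypothetical.

```python
from collections import Counter
from math import dist


def k_occurrences(points, k):
    """N_k for each point: how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def kmeans(points, centers, iters=20):
    """Plain Lloyd's k-means from caller-supplied initial centers; returns one label per point."""
    centers = list(centers)  # do not mutate the caller's list
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_then_antihub(points, centers, k=2, min_size=3):
    """Cluster, flag small clusters as outliers, then flag antihubs (N_k == 0) per cluster."""
    labels = kmeans(points, centers)
    outliers = set()
    for c, size in Counter(labels).items():
        members = [i for i, lab in enumerate(labels) if lab == c]
        if size < min_size:
            # Assumed rule: a small cluster is treated as outliers outright,
            # so no neighbour search is spent on it.
            outliers.update(members)
            continue
        counts = k_occurrences([points[i] for i in members], k)
        outliers.update(i for i, n in zip(members, counts) if n == 0)
    return sorted(outliers)


# Toy data (hypothetical): two tight groups, one straggler, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5),
          (20, 20), (20, 21), (21, 20), (21, 21),
          (100, 100)]
initial_centers = [(0, 0), (20, 20), (100, 100)]
print(cluster_then_antihub(points, initial_centers, k=2, min_size=3))
```

Restricting the neighbor search to each cluster is what saves time in this sketch: the pairwise distance work grows with the square of the cluster size rather than the square of the whole data set, and discarded small clusters cost nothing further.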