Journal: International Journal of Hybrid Information Technology
Print ISSN: 1738-9968
Year: 2015
Volume: 8
Issue: 10
Pages: 63-72
DOI: 10.14257/ijhit.2015.8.10.06
Publisher: SERSC
Abstract: Subspace outlier mining is highly important in big data analysis, and its efficiency depends to a large extent on the subspace clustering algorithm used. The CMI method selects the best clustering subspaces in a way that is unstable and computationally complex. To address this, chain-rule formulas for Cumulative Entropy, Cumulative Total Correlation, and Cumulative Holoentropy are given. Cumulative Holoentropy is then used to mine the best clustering subspaces of continuous data sets, and outliers are detected within those subspaces. On this basis, a subspace outlier detection algorithm based on Cumulative Holoentropy is proposed. Finally, the validity and scalability of the proposed method are tested on real and synthetic data sets. Experiments show that the proposed algorithm improves the efficiency of mining outliers in subspaces.
Keywords: Big Data Analysis; Outlier Detection; Subspace Clustering; Cumulative Holoentropy
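Cumulative entropy, the base quantity named in the abstract, replaces the probability density in Shannon entropy with the cumulative distribution function: h(X) = -∫ P(X ≤ x) log P(X ≤ x) dx. The following is a minimal sketch that assumes the sorted-sample estimator commonly used with CMI. For sorted values x_1 ≤ ... ≤ x_n, it computes h(X) ≈ -Σ_{i=1}^{n-1} (x_{i+1} - x_i)(i/n) log(i/n). It is illustrative only and does not reproduce the paper's chain-rule formulas or its Cumulative Holoentropy algorithm. The function name `cumulative_entropy` is hypothetical.

```python
import math
from typing import Sequence


def cumulative_entropy(values: Sequence[float]) -> float:
    """Hypothetical helper: empirical cumulative entropy of one continuous attribute.

    Assumed estimator (sorted-sample form): for x_1 <= ... <= x_n,
        h(X) ~= -sum_{i=1}^{n-1} (x_{i+1} - x_i) * (i/n) * log(i/n)
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        return 0.0  # a single point carries no spread
    total = 0.0
    for i in range(1, n):
        f = i / n  # empirical CDF on the interval [xs[i-1], xs[i])
        # -f*log(f) > 0 for 0 < f < 1, so every gap contributes non-negatively
        total -= (xs[i] - xs[i - 1]) * f * math.log(f)
    return total


if __name__ == "__main__":
    print(cumulative_entropy([0.0, 1.0]))  # 0.5 * ln 2
    print(cumulative_entropy([3.0, 3.0, 3.0]))  # constant data
```

For example, `cumulative_entropy([0.0, 1.0])` equals 0.5 · ln 2 ≈ 0.347, and constant data yields 0. Because the estimator uses only value gaps and empirical CDF levels, it needs no density estimation. This property makes cumulative entropy attractive for continuous data sets like those the abstract targets.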