Journal: Journal of King Saud University - Computer and Information Sciences
Print ISSN: 1319-1578
Publication year: 2019
Pages: 1-9
DOI: 10.1016/j.jksuci.2019.08.003
Publisher: Elsevier
Abstract: Detecting outliers in real time is increasingly important for many real-world applications, such as detecting abnormal heart activity, system intrusions, spam, or abnormal credit card transactions. However, detecting outliers in data streams raises many challenges, such as high dimensionality, dynamic data distributions, and unpredictable relationships. Our simulations demonstrate that some advanced solutions still show drawbacks. In this paper, we first improve the outlier-detection capacity of both micro-cluster-based algorithms (MCOD) and distance-based algorithms (Abstract-C and Exact-Storm), which are known for their performance. We do so by adding a layer called LiCS that classifies online the K-nearest neighbors (Knn) of each node based on their evolutionary status. This layer aggregates the results and uses a count threshold to better classify nodes. Experiments on SpamBase datasets confirmed that our technique enhances the accuracy and precision of such algorithms and helps to reduce the number of unclassified nodes. Second, we propose a hybrid solution based on iterative majority voting and our LiCS. Experiments on real data prove that it outperforms the discussed algorithms in terms of accuracy, precision, and sensitivity in detecting outliers. It also minimizes the issue of unclassified instances and consolidates the different outputs of the algorithms.
Keywords: Data streams; Outlier detection; High-dimensional data; Big data mining; Intrusion detection