Basic article information

  • Title: i-CODAS: An Improved Online Data Stream Clustering in Arbitrary Shaped Clusters
  • Authors: Md Kamrul Islam; Md Manjur Ahmed; Kamal Zuhairi Zamli
  • Journal: Engineering Letters
  • Print ISSN: 1816-093X
  • Electronic ISSN: 1816-0948
  • Year of publication: 2019
  • Volume: 27
  • Issue: 4
  • Pages: 752-762
  • Publisher: Newswood Ltd
  • Abstract: Many IT-based applications now generate huge data streams continuously, and clustering these streams provides many advantages in data mining. In data stream clustering, the density-based technique is the most popular because it can generate arbitrary shaped clusters with high cluster quality in a noisy environment. However, most existing density-based algorithms for data stream clustering are either offline, a hybrid of offline and online phases, or able to handle only hyper-elliptical clusters. Offline algorithms are not a good choice for data stream clustering because storing the data stream is impractical, and the shape of a cluster is often arbitrary rather than regular in data space. Recently, an online clustering method called CODAS has been proposed in which the generated clusters are arbitrary in shape. However, like other existing density-based clustering algorithms, CODAS uses a global, constant radius for all micro-clusters. It is hard to set the optimal value of the micro-cluster radius in practice, and a global radius may not be optimal for each micro-cluster. An erroneous choice of radius decreases the clustering quality remarkably. In this paper, we present an improved version of CODAS called i-CODAS, based on the concept of maintaining a local radius for each micro-cluster independently. The radius is updated in an online manner towards its local optimal value as new data samples fall within the cluster. The data samples are summarized in metadata called a micro-cluster. The micro-clusters are represented in a clustering graph based on the connectivity among micro-clusters. The clustering graph is finally used to generate arbitrary shaped clusters. The performance of the proposed i-CODAS is measured and compared with other density-based clustering algorithms. The experimental results show the superiority of i-CODAS over other clustering algorithms in terms of noise sensitivity, accuracy, purity, processing speed, and scalability.
  • Keywords: data stream; online clustering; arbitrary shape; Euclidean distance; cluster graph
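
The general idea in the abstract (micro-clusters that each keep their own radius, linked into a graph whose connected components form arbitrary shaped clusters) can be illustrated with a minimal sketch. This is not the i-CODAS algorithm: the abstract does not give the update rules, so the class names, the radius update (a step towards twice the distance of the newly absorbed sample), the `rate` and `min_radius` parameters, and the overlap test used to link micro-clusters are all assumptions made for illustration.

```python
import math


class MicroCluster:
    """Summary of nearby samples: a center, a local radius, and a count."""

    def __init__(self, center, radius):
        self.center = list(center)
        self.radius = radius
        self.count = 1


class LocalRadiusClusterer:
    """Illustrative online clusterer with one radius per micro-cluster."""

    def __init__(self, initial_radius=1.0, rate=0.1, min_radius=0.5):
        # All three parameters are hypothetical, not taken from the paper.
        self.initial_radius = initial_radius
        self.rate = rate
        self.min_radius = min_radius
        self.micro_clusters = []

    def insert(self, x):
        # Find the nearest micro-cluster whose own radius covers x.
        best, best_d = None, math.inf
        for mc in self.micro_clusters:
            d = math.dist(mc.center, x)  # Euclidean distance
            if d <= mc.radius and d < best_d:
                best, best_d = mc, d
        if best is None:
            # No micro-cluster covers x: start a new one.
            self.micro_clusters.append(MicroCluster(x, self.initial_radius))
            return
        # Absorb x: update the running-mean center.
        best.count += 1
        best.center = [c + (xi - c) / best.count
                       for c, xi in zip(best.center, x)]
        # Assumed rule: nudge only this micro-cluster's radius towards a
        # target derived from the new sample's distance.
        target = max(self.min_radius, 2.0 * best_d)
        best.radius += self.rate * (target - best.radius)

    def clusters(self):
        # Assumed connectivity rule: link micro-clusters that overlap.
        n = len(self.micro_clusters)
        adj = {i: set() for i in range(n)}
        for i in range(n):
            for j in range(i + 1, n):
                a, b = self.micro_clusters[i], self.micro_clusters[j]
                if math.dist(a.center, b.center) <= a.radius + b.radius:
                    adj[i].add(j)
                    adj[j].add(i)
        # Each connected component of the graph is one final cluster.
        seen, result = set(), []
        for start in range(n):
            if start in seen:
                continue
            seen.add(start)
            stack, component = [start], []
            while stack:
                node = stack.pop()
                component.append(node)
                for nb in adj[node]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            result.append(sorted(component))
        return result
```

Feeding the sketch a stream of points one at a time with `insert` and then calling `clusters()` returns lists of micro-cluster indices, one list per connected component. Because clusters are unions of overlapping micro-clusters rather than single spheres, elongated or curved groups can be represented as chains of linked micro-clusters.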