Article Information

  • Title: Newsgroup Topic Extraction using Probabilistic Inverse Cluster Frequency Term-Cluster Weighting and Growing Neural Gas Clustering
  • Authors: Sigit Adinugroho; Muh Arif Rahman; Dahnial Syauqy
  • Journal: IAENG International Journal of Computer Science
  • Print ISSN: 1819-656X
  • Online ISSN: 1819-9224
  • Publication year: 2021
  • Volume: 48
  • Issue: 1
  • Language: English
  • Publisher: IAENG - International Association of Engineers
  • Abstract: The effectiveness of a topic extraction method depends on its capability to extract information from a large amount of data. This paper proposes a new solution for selecting a set of topics from text documents, as well as a new approach to setting the weight of a term in clusters of documents. The process of revealing topics from multiple documents starts with a preprocessing step, which removes unnecessary portions of the text. Next, the weights of terms in documents are calculated with the term frequency-inverse document frequency method. Feature transformation based on singular value decomposition is then employed to produce the weights used for clustering. The clustering itself is conducted using the Growing Neural Gas method. Finally, the proposed probabilistic inverse cluster frequency term-cluster method is applied to determine the weights of terms in clusters as a way of selecting topics from those clusters. Experiments show that the framework attains satisfactory results, as indicated by average accuracies of 0.8606, 0.7406, 0.4039, and 0.6647 for topics obtained from the Binary2, Multi5, Multi7, and Multi10 categories of the 20Newsgroup dataset, respectively.
  • Keywords: topic extraction; growing neural gas clustering; probabilistic inverse cluster frequency term-cluster weighting; feature transformation
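
The weighting steps named in the abstract can be illustrated with a rough sketch. It is not the authors' implementation, and it rests on several assumptions. The toy corpus, the function names `tf_idf` and `top_terms_per_cluster`, and the hard-coded cluster labels (standing in for Growing Neural Gas output) are all hypothetical. Preprocessing and the singular value decomposition step are skipped. The term-cluster weight is a plain inverse cluster frequency, log(K / cf), and not the paper's probabilistic variant, whose formula the abstract does not give.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

def top_terms_per_cluster(docs, labels, k=2):
    """Rank terms per cluster by cluster term frequency x inverse cluster frequency.

    Assumption: plain ICF = log(K / cf), not the paper's probabilistic variant.
    """
    clusters = sorted(set(labels))
    # Term counts aggregated over the documents of each cluster.
    cluster_tf = {c: Counter() for c in clusters}
    for doc, label in zip(docs, labels):
        cluster_tf[label].update(doc)
    # Cluster frequency: number of clusters in which each term occurs.
    cf = Counter(t for c in clusters for t in cluster_tf[c])
    topics = {}
    for c in clusters:
        scores = {t: n * math.log(len(clusters) / cf[t])
                  for t, n in cluster_tf[c].items()}
        # Descending score, then alphabetical order so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        topics[c] = ranked[:k]
    return topics

# Hypothetical toy corpus; stop words are deliberately left in.
docs = [
    "the engine of the car".split(),
    "car engine repair".split(),
    "the orbit of the moon".split(),
    "moon orbit launch".split(),
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments, not real GNG output

doc_weights = tf_idf(docs)
topics = top_terms_per_cluster(docs, labels)
print(topics)
```

Terms that occur in every cluster, such as "the" and "of" in this toy corpus, receive a weight of zero, so the terms that remain at the top of each ranking are the ones that distinguish one cluster from the others.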