Article information

  • Title: Clustering topic groups of documents using K-Means algorithm: Australian Embassy Jakarta media releases 2006-2016
  • Authors: Wishnu Hardi; Wisnu Ananta Kusuma; Sulistyo Basuki
  • Journal: Berkala Ilmu Perpustakaan dan Informasi
  • Print ISSN: 1693-7740
  • Electronic ISSN: 2477-0361
  • Publication year: 2019
  • Volume: 15
  • Issue: 2
  • Pages: 226-238
  • DOI: 10.22146/bip.36451
  • Publisher: Universitas Gadjah Mada
  • Abstract: Introduction. The Australian Embassy in Jakarta stores a wide array of media release documents. Analyzing the salient patterns in this document collection is important because it yields new insights into, and knowledge of, the significant topic groups of the documents. Methodology. The K-Means algorithm was used as a non-hierarchical clustering method that partitions data objects into clusters. The method works by minimizing data variation within each cluster and maximizing data variation between clusters. Data Analysis. Of the documents issued between 2006 and 2016, 839 were examined to determine term frequencies and to generate clusters. The clustering result was evaluated by a nominated expert, who validated it. Results and discussions. The result showed 57 meaningful terms grouped into 3 clusters. “People to people links”, “economic cooperation”, and “human development” were chosen to represent the topics of the Australian Embassy Jakarta media releases from 2006 to 2016. Conclusions. Text mining can be used to cluster topic groups of documents. It provides a more systematic clustering process because the text analysis is conducted in a number of stages with specifically set parameters.
  • Keywords: Text mining; document clustering; K-Means algorithm; Cosine Similarity
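
The abstract describes K-Means as partitioning objects so that variation within each cluster is minimized. A minimal sketch of that idea follows. It is not the authors' pipeline: the four toy documents, the whitespace tokenizer, the helper names (`tf_vectors`, `kmeans`, `wcss`), and the choice of k = 2 are all assumptions made for illustration. Term-frequency vectors are scaled to unit length, so that squared Euclidean distance between two documents equals 2 − 2·(cosine similarity), which ties the sketch to the "Cosine Similarity" keyword.

```python
import math
import random
from collections import Counter


def tf_vectors(docs):
    """Build unit-length term-frequency vectors over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity K-Means tries to reduce."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy corpus, not taken from the embassy media releases.
docs = [
    "scholarship students education exchange",
    "students exchange scholarship program",
    "trade investment economic partnership",
    "economic trade investment growth",
]
vocab, vectors = tf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels, round(wcss(vectors, labels, centroids), 3))
```

On this toy corpus the two education-themed documents end up in one cluster and the two trade-themed documents in the other. A real corpus would typically need stop-word removal, stemming, and a weighting scheme such as TF-IDF before clustering.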