Article Information

  • Title: Evaluation of Text Clustering Methods Using WordNet
  • Authors: Abdelmalek Amine; Zakaria Elberrichi; Michel Simonet
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Year: 2010
  • Volume: 7
  • Issue: 4
  • Publisher: Zarqa Private University
  • Abstract: The growing number of digitized texts now available, notably on the Web, has created an acute need for text mining techniques. Clustering systems are used more and more often in text mining, especially to analyze texts and extract the knowledge they contain. With so many clustering algorithms and techniques available, users find it very difficult to choose the algorithm that best suits their target dataset. In fact, it is very hard to determine which algorithms work best, since results depend considerably on the application and on the kind of data at hand. In this paper, we propose, study, and compare three text clustering methods: an ascending (agglomerative) hierarchical clustering method, a clustering method based on self-organizing maps (SOM), and an ant-based clustering method. All three use WordNet synsets as the terms for representing textual documents. The methods are examined in several experiments using three similarity measures: the cosine distance, the Euclidean distance, and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, with the F-measure as the evaluation criterion. The results show that the SOM-based clustering method with the cosine distance performs best.
  • Keywords: Text clustering; similarity; WordNet; Reuters-21578; F-measure
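The abstract names three dissimilarity measures and the F-measure without defining them. The following is a minimal Python sketch of their common textbook forms, not the authors' implementation. It assumes each document has already been mapped to an equal-length vector of WordNet-synset weights (the weighting scheme is not stated here), treats "cosine distance" as 1 minus cosine similarity, and uses the widely used class-weighted clustering F-measure. The paper's exact formulations may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: an all-zero vector is treated as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def clustering_f_measure(labels, clusters):
    """Class-weighted F-measure of a clustering against reference labels.

    For each true class c and cluster k: precision = n_ck / n_k,
    recall = n_ck / n_c, F = 2PR / (P + R). Each class keeps its best F
    over all clusters, weighted by the class's share of documents.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # n_ck counts
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

For example, `clustering_f_measure(["earn", "earn", "acq", "acq"], [0, 0, 1, 1])` scores a clustering that exactly matches two Reuters-style topic labels, while `[0, 0, 0, 1]` as the cluster assignment gives a lower score because one `acq` document is merged into the `earn` cluster. Cosine distance ignores vector length, so it compares documents by the proportions of their synset weights rather than by their absolute magnitudes. This property may help explain why it suits sparse text vectors, though the source does not give this reasoning.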