Article Information

Title: Evaluation of Text Clustering Methods Using WordNet
Authors: Abdelmalek Amine; Zakaria Elberrichi; Michel Simonet et al.
Journal: The International Arab Journal of Information Technology
Print ISSN: 1683-3198
Year of publication: 2010
Volume: 7
Issue: 4
Publisher: Zarqa Private University
Abstract: The growing number of digitized texts now available, notably on the Web, has created an acute need for text mining techniques. Clustering systems are used increasingly often in text mining, especially to analyze texts and to extract the knowledge they contain. Given the vast number of clustering algorithms and techniques available, it is highly confusing for a user to choose the algorithm that best suits the target dataset. Indeed, it is very hard to say which algorithms work best, since results depend considerably on the application and on the kind of data at hand. In this paper, we propose, study, and compare three text clustering methods: an ascending hierarchical clustering method, a SOM-based clustering method, and an ant-based clustering method, all of which use WordNet synsets as terms for representing textual documents. The effects of these methods are examined in several experiments using three similarity measures: the cosine distance, the Euclidean distance, and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, and performance is assessed with the F-measure. The results show that the SOM-based clustering method using the cosine distance performs best.
Keywords: Text clustering; similarity; WordNet; Reuters-21578; F-measure