Article Information

Title: Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM
Authors: Abdelmalek Amine; Zakaria Elberrichi; Michel Simonet et al.
Journal: INFOCOMP
Print ISSN: 1807-4545
Publication year: 2008
Volume: 7
Issue: 01
Pages: 27-35
Publisher: Federal University of Lavras
Abstract: With the large and rapidly growing number of documents available in digital form (Internet, libraries, CD-ROM, etc.), the automatic classification of texts has become a significant research field and a fundamental task in document processing. This paper deals with the unsupervised classification of textual documents, also called text clustering, using Kohonen Self-Organizing Maps in two new settings: a conceptual representation of texts and a representation based on n-grams, instead of a representation based on words. The effects of these combinations are examined in several experiments using four similarity measures. The Reuters-21578 corpus is used for evaluation, which is carried out using the F-measure and entropy.
Keywords: Text clustering, Self-Organizing Maps of Kohonen, n-grams, concept, similarity, Reuters-21578.