
Article Information

  • Title: Text Clustering Using a Suffix Tree Similarity Measure
  • Authors: HUANG, Chenghui; YIN, Jian; HOU, Fang
  • Journal: Journal of Computers
  • Print ISSN: 1796-203X
  • Publication year: 2011
  • Volume: 6
  • Issue: 10
  • Pages: 2180-2186
  • DOI: 10.4304/jcp.6.10.2180-2186
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: In text mining, popular methods use bag-of-words models, which represent a document as a vector. These methods ignore word sequence information, and their good clustering results are limited to certain special domains. This paper proposes a new similarity measure based on a suffix tree model of text documents. It analyzes word sequence information and then computes the similarity between the text documents of a corpus by applying a suffix tree similarity combined with the TF-IDF weighting method. Experimental results on the standard document benchmark corpora Reuters and BBC indicate that the new text similarity measure is effective. Compared with the results of two other frequent word sequence based methods, the proposed method achieves an improvement of about 15% in the average F-Measure score.
  • Keywords: clustering algorithm; suffix tree; document model; similarity measure
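
The idea summarized in the abstract (comparing documents by shared word sequences, weighted by TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that enumerating contiguous word sequences up to a fixed length is an acceptable stand-in for the node labels of a word-level suffix tree, and the function names, the `max_len` parameter, and the sublinear TF and smoothed IDF formulas are all illustrative choices rather than details taken from the paper.

```python
import math
from collections import Counter


def phrases(doc, max_len=3):
    """Return every contiguous word sequence of 1..max_len words.

    Stand-in (assumption) for the node labels of a word-level suffix tree.
    """
    words = doc.lower().split()
    return [
        tuple(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, min(i + max_len, len(words)) + 1)
    ]


def tfidf_vectors(docs, max_len=3):
    """Build one sparse phrase -> weight vector per document."""
    counts = [Counter(phrases(d, max_len)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each phrase.
    df = Counter(p for c in counts for p in c)
    # Sublinear TF and smoothed IDF are illustrative assumptions.
    return [
        {p: (1 + math.log(tf)) * math.log(1 + n / df[p]) for p, tf in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock markets fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents sharing long word sequences
sim_far = cosine(vecs[0], vecs[2])    # documents sharing no words
```

Because multi-word phrases such as "the cat sat" contribute their own dimensions, two documents that share word order score higher than two documents that merely share the same words in a different order, which a plain bag-of-words vector cannot distinguish.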