Article Information

  • Title: Semantic Clustering for Large-Scale Documents
  • Author: Ming Liu
  • Journal: Computer and Information Science
  • Print ISSN: 1913-8989
  • Online ISSN: 1913-8997
  • Year of publication: 2010
  • Volume: 3
  • Issue: 1
  • Page: 91
  • DOI: 10.5539/cis.v3n1p91
  • Publisher: Canadian Center of Science and Education
  • Abstract: With the explosion of information, clustering large-scale document collections has become increasingly important. This paper proposes a novel document clustering algorithm (CLCL) to address this problem. The algorithm first constructs lexical chains from the feature space to reflect the different topics that the input documents contain, and the documents can then be separated into clusters by these lexical chains. This separation is too coarse, however, so the idea of self-organizing mapping is used to optimize the cluster partition. To agglomerate semantically similar documents into one cluster, the influence of similar features is also considered. Experiments demonstrate that, because it accounts for semantic similarities between different documents, CLCL performs better than traditional document clustering algorithms.
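
The two-stage idea in the abstract (a coarse split by lexical chains, then a refinement inspired by self-organizing mapping) can be illustrated with a rough Python sketch. This is not the CLCL algorithm from the paper: the function names `coarse_assign` and `refine`, the representation of a lexical chain as a plain set of terms, the term-frequency vectors, and the winner-only update (a self-organizing map with its neighbourhood reduced to the single best-matching prototype) are all assumptions made for illustration. The abstract's handling of semantically similar features is not modelled here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_assign(docs, chains):
    """Stage 1 (assumed): put each document in the chain it shares the most tokens with."""
    labels = []
    for tokens in docs:
        overlaps = [sum(1 for t in tokens if t in chain) for chain in chains]
        labels.append(max(range(len(chains)), key=overlaps.__getitem__))
    return labels


def refine(docs, chains, epochs=10, lr=0.5):
    """Stage 2 (assumed): winner-only competitive learning seeded from the chains."""
    vocab = sorted({t for d in docs for t in d} | {t for c in chains for t in c})
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append([float(counts[t]) for t in vocab])
    # One prototype per chain, initialised as the chain's term-indicator vector.
    protos = [[1.0 if t in chain else 0.0 for t in vocab] for chain in chains]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly
        for v in vectors:
            best = max(range(len(protos)), key=lambda k: cosine(v, protos[k]))
            # Pull only the winning prototype toward the document vector.
            protos[best] = [p + rate * (x - p) for p, x in zip(protos[best], v)]
    return [max(range(len(protos)), key=lambda k: cosine(v, protos[k])) for v in vectors]


# Toy data: four tokenised documents and two hand-made "chains".
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
chains = [{"cat", "dog", "pet"}, {"stock", "market", "bank"}]

coarse_labels = coarse_assign(docs, chains)
refined_labels = refine(docs, chains)
```

On this toy input both stages agree, because the two topics share no vocabulary. The refinement step only matters when documents overlap several chains and the prototypes drift toward the documents they actually attract.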