Article Information

  • Title: A SEMI-STRUCTURED TEXTS CLUSTERING ALGORITHM
  • Authors: ZHANG PEI YUN ; CHEN EN HONG ; HUANG BO
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year: 2013
  • Volume: 50
  • Issue: 3
  • Publisher: Journal of Theoretical and Applied
  • Abstract: To improve the clustering of semi-structured texts, the dimensionality and sparsity of their representation must be reduced. To reduce dimensionality, we build metadata feature vectors from the metadata of semi-structured texts. Based on the domain concepts model, we build domain vectors from the domain concepts tree (set). With the help of WordNet, we compute the semantic similarity between each metadata feature vector and each domain vector. Finally, we design a clustering algorithm that groups semi-structured texts according to this semantic similarity. The analysis shows that the algorithm is feasible and achieves a higher clustering accuracy rate. It can ease the problem of lacking a domain ontology and can improve clustering quality.
  • Keywords: Domain Concepts Model; Metadata; Semi-Structured Texts; Clustering; Semantic Similarity
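
The pipeline outlined in the abstract (metadata feature vectors, domain vectors, semantic similarity, then clustering) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the record does not give the similarity formula or the assignment rule, so the average-of-best-matches score, the `threshold` parameter, and all function names are hypothetical. A small hand-written table (`TOY_SIMILARITY`) stands in for a WordNet-based word similarity so that the sketch runs without external data.

```python
from collections import defaultdict

# Hypothetical word-pair scores standing in for a WordNet-based measure.
TOY_SIMILARITY = {
    frozenset(("price", "cost")): 0.9,
    frozenset(("price", "market")): 0.6,
    frozenset(("doctor", "physician")): 0.95,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def vector_similarity(metadata_terms, domain_terms):
    """Assumed score: mean over metadata terms of the best domain-term match."""
    if not metadata_terms or not domain_terms:
        return 0.0
    best = [max(word_similarity(m, d) for d in domain_terms)
            for m in metadata_terms]
    return sum(best) / len(best)


def cluster_by_domain(documents, domains, threshold=0.3):
    """Assign each document to its most similar domain, or to 'unassigned'."""
    clusters = defaultdict(list)
    for doc_id, terms in documents.items():
        scores = {name: vector_similarity(terms, concepts)
                  for name, concepts in domains.items()}
        best_domain = max(scores, key=scores.get)
        if scores[best_domain] >= threshold:
            clusters[best_domain].append(doc_id)
        else:
            clusters["unassigned"].append(doc_id)
    return dict(clusters)


# Toy data: domain vectors (concept terms) and metadata feature vectors.
domains = {
    "finance": ["market", "cost", "bank"],
    "medicine": ["physician", "hospital"],
}
documents = {
    "doc1": ["price", "bank"],
    "doc2": ["doctor", "hospital"],
    "doc3": ["weather"],
}

result = cluster_by_domain(documents, domains)
print(result)
```

Because each text is compared only against a few domain vectors rather than against every other text, the number of comparisons stays small, which is in the spirit of the dimension reduction the abstract describes. In practice the toy table would be replaced by a real WordNet similarity measure.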