Article Information

  • Title: Efficient Clustering Algorithms in Text Mining
  • Authors: Nataraj Gudapaty; G Loshma; Dr. Nagaratna P Hegde
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Online ISSN: 0976-8491
  • Year: 2012
  • Volume: 3
  • Issue: 2
  • Pages: 1068-1072
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Text mining processes unstructured (textual) information, extracts meaningful numeric indices from the text, and thus makes the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries of the words contained in the documents. Document clustering is a fundamental text mining task that enables efficient organization, navigation, summarization, and retrieval of documents. Clustering documents is challenging because of the sparsity and high dimensionality of text data and its complex semantics. Building on the K-means and PAM (Partitioning Around Medoids) text clustering algorithms and a semantic-based vector space model, a semantic-based PAM text clustering model is proposed to address the high-dimensional and sparse characteristics of text data sets. The model reduces the semantic loss of the text data and improves the quality of text clustering. We propose combining textual content and citation information for clustering using a novel adaptive kernel K-means clustering algorithm and the PAM algorithm. Within this semantics-based text mining process, K-Means and PAM are compared: the time and space complexities of the two algorithms are measured and presented as bar charts and line graphs.
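The abstract compares K-means and PAM for clustering documents represented in a vector space model. The sketch below illustrates that comparison under stated assumptions; it is not the authors' implementation. It uses plain whitespace tokenisation and TF-IDF weighting in place of the paper's semantic-based vector space model, does not implement the adaptive kernel K-means or the citation information mentioned above, and seeds K-means with farthest-first initialisation (an arbitrary choice). The function names `tfidf_vectors`, `kmeans` and `pam` and the toy corpus are hypothetical. K-means moves each centre to the mean of its members, costing roughly O(nkd) per iteration for n documents, k clusters and d terms. PAM restricts centres to actual documents (medoids), and each SWAP pass examines k(n - k) exchanges, so it is usually slower but less sensitive to outliers.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Assumption: plain TF-IDF stands in for the paper's semantic VSM."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenised for w in set(toks))
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1 keeps every weight positive.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])  # L2-normalise each document
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    # Inputs are unit vectors, so cosine similarity is just the dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's K-means (squared Euclidean) with farthest-first seeding."""
    rng = random.Random(seed)
    centroids = [list(vectors[rng.randrange(len(vectors))])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each document joins its nearest centroid.
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # assignments stopped changing: converged
        labels = new
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pam(vectors, k, max_swaps=100):
    """PAM (k-medoids) on cosine distance: greedy BUILD, then best-improvement SWAP."""
    n = len(vectors)
    d = [[cosine_distance(a, b) for b in vectors] for a in vectors]

    def cost(meds):
        # Total distance from every document to its nearest medoid.
        return sum(min(d[i][m] for m in meds) for i in range(n))

    # BUILD: add one medoid at a time, choosing the point that lowers cost most.
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: cost(medoids + [i])))

    # SWAP: try every (medoid, non-medoid) exchange; apply the best improving one.
    current = cost(medoids)
    for _ in range(max_swaps):
        best, best_cost = None, current
        for j in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:j] + [h] + medoids[j + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best, best_cost = trial, trial_cost
        if best is None:
            break  # no exchange reduces the cost: a local optimum
        medoids, current = best, best_cost

    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return labels, medoids


if __name__ == "__main__":
    docs = [
        "stock market shares trading prices",
        "market prices stock investors trading",
        "football match goal team players",
        "team players football league goal",
    ]
    vecs = tfidf_vectors(docs)
    print("K-means labels:", kmeans(vecs, 2))
    print("PAM labels:    ", pam(vecs, 2)[0])
```

On this toy corpus both methods should group the two finance documents together and the two football documents together, although the numeric label assigned to each group may differ between methods. This naive `pam` recomputes the full cost for every trial exchange, so it is slower than the classic PAM, which evaluates the cost change of a swap incrementally.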