Article Information

  • Title: Document Similarity Detection using K-Means and Cosine Distance
  • Authors: Wendi Usino; Anton Satria Prabuwono; Khalid Hamed S. Allehaibi
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2019
  • Volume: 10
  • Issue: 2
  • Pages: 165-170
  • DOI: 10.14569/IJACSA.2019.0100222
  • Publisher: Science and Information Society (SAI)
  • Abstract: A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the softcopies of many doctoral dissertations, which closely resemble texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind it is the lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using K-means and cosine distance. The process starts with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and a combined K-means and cosine distance calculation on 17 test documents. Overall, the results show a detection accuracy of 93.33%.
  • Keywords: K-means; cosine distance; cluster; document similarity; document frequency; inverse document frequency; preprocessing; vector space model
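
The pipeline named in the abstract (vector space model, cosine distance, K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer stands in for their preprocessing (the Indonesian dictionary check is not reproduced), the smoothed IDF formula, the choice of the first k documents as seeds, and the four sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumption; the paper's exact weighting is not given here).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(vectors, k, max_iterations=20):
    """Lloyd-style K-means that assigns points by cosine distance.

    Seeds are the first k vectors, which keeps the sketch deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample documents, not the paper's 17 test documents.
docs = [
    "plagiarism detection compares document text",
    "cluster analysis groups similar vectors",
    "plagiarism detection compares dissertation text",
    "cluster analysis groups similar documents",
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
print(round(cosine_distance(vectors[0], vectors[2]), 3))
```

In this sketch, documents that share most of their vocabulary end up in the same cluster, and the cosine distance between two documents in a cluster gives a rough similarity score that a threshold could turn into a plagiarism flag. How the paper combines the two measures, and what threshold it uses, is not stated in the abstract.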