Article Information

  • Title: Near-Duplicates Detection and Elimination Based on Web Provenance for Effective Web Search
  • Authors: Y. Syed Mudhasir; J. Deepika; S. Sendhilkumar
  • Journal: International Journal on Internet and Distributed Computing Systems
  • Print ISSN: 2219-1127
  • Electronic ISSN: 2219-1887
  • Year of publication: 2011
  • Volume: 1
  • Issue: 1
  • Pages: 22-32
  • Publisher: IJIDCS Press
  • Abstract: Users of the World Wide Web rely on search engines to retrieve information from the web. However, the performance of a web search is greatly affected when the results are flooded with redundant information, i.e., near-duplicates. Such near-duplicates hold up other promising results from reaching users. Many of these near-duplicates come from distrusted websites and/or authors who host information on the web. Such near-duplicates may be eliminated by means of provenance. Thus, this paper proposes a novel approach to identify such near-duplicates based on provenance. In this approach, a provenance model is built from web pages that are the search results returned by an existing search engine. The proposed model combines both content-based and trust-based factors to classify the results as original or near-duplicate.