
Article Information

  • Title: Title Based Duplicate Detection of Web Documents
  • Authors: Mrs. M. Kiruthika; Mrs. Smita Dange; Mrs. P. Sandhya
  • Journal: International Journal of Electronics and Computer Science Engineering
  • Electronic ISSN: 2277-1956
  • Year: 2012
  • Volume: 1
  • Issue: 4
  • Pages: 2084-2094
  • Publisher: Buldanshahr: IJECSE
  • Abstract: In recent years, web crawling has gained considerable importance owing to the tremendous growth of the World Wide Web. The sheer volume of web documents makes it harder for search engines to return results that suit their users. Many of these documents are duplicates or near-duplicates, i.e. variants derived from the same original web document, and they impose additional overhead on search engines that significantly degrades their performance and quality. The web crawling research community has widely recognized the need to detect duplicate and near-duplicate web pages. Presenting users with relevant results for their queries on the first page, free of duplicate and redundant entries, is essential. Avoiding duplication also saves storage and improves search quality. Near-duplicate web pages are detected first, and the crawled pages are then stored in repositories. Detecting near-duplicates conserves network bandwidth, lowers storage costs, and improves the quality of search engines. In this paper, we discuss a feasible method for detecting near-duplicate web documents based on their titles, which will help reduce the overhead of search engines and improve their performance.
  • Keywords: Watermarking; Haar Wavelet; DWT; PSNR
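The abstract does not specify how titles are compared. Purely as an illustration, here is a minimal sketch of one common title-based approach. It assumes each title is tokenized into lowercase alphanumeric words and that two documents count as near-duplicates when the Jaccard similarity of their word sets meets a threshold. The function names and the 0.8 default threshold are hypothetical and are not taken from the paper.

```python
import re
from itertools import combinations


def normalize_title(title: str) -> list[str]:
    # Hypothetical normalization: lowercase, drop punctuation, keep word tokens.
    return re.findall(r"[a-z0-9]+", title.lower())


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity |A ∩ B| / |A ∪ B|; two empty titles count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(
    titles: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    # Compare every pair of documents by the word sets of their titles and
    # return (doc_a, doc_b, score) for pairs at or above the (assumed) threshold.
    token_sets = {doc_id: set(normalize_title(t)) for doc_id, t in titles.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(token_sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


if __name__ == "__main__":
    titles = {
        "p1": "Title Based Duplicate Detection of Web Documents",
        "p2": "Title-based duplicate detection of web documents!",
        "p3": "Haar Wavelet Watermarking",
    }
    print(find_near_duplicates(titles))  # [('p1', 'p2', 1.0)]
```

The pairwise loop is quadratic in the number of documents. It is kept simple for clarity. A crawler handling a large repository would more likely index titles (for example by token or by hash of the normalized title) so that only candidate pairs are compared.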