Article information

  • Title: Self Adjusting Refresh Time Based Architecture for Incremental Web Crawler
  • Authors: A.K. Sharma, Ashutosh Dixit
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Year of publication: 2008
  • Volume: 8
  • Issue: 12
  • Pages: 349-354
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract:

    Because of deficiencies in their refresh techniques [12], current crawlers add unnecessary traffic to the already overloaded Internet. Moreover, there is no reliable way to verify whether a document has been updated. This paper proposes an efficient approach to building an effective incremental web crawler [13]. Instead of periodically refreshing the whole collection in batch mode, the crawler selectively updates its database and/or local collection of web pages, thereby significantly improving the “freshness” of the collection and bringing in new pages in a more timely manner. It also detects web pages that are updated frequently and dynamically calculates each page's refresh time for its next update.

  • Keywords:

    World Wide Web, Search engine, Incremental Crawler, Hypertext, Browser
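
The abstract describes a crawler that recalculates each page's refresh time depending on how often the page changes, but it does not give the formula. The sketch below is only one plausible way to illustrate the idea, not the authors' method: it assumes that a change is detected by comparing SHA-256 digests of the page body, and that the interval is halved after a change and multiplied by 1.5 otherwise. The names PageRecord and update_refresh_interval and all constants are hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical tuning constants; none of these values come from the paper.
MIN_INTERVAL_H = 1.0          # never revisit more often than once an hour
MAX_INTERVAL_H = 24.0 * 30    # never wait longer than 30 days
SHRINK = 0.5                  # factor applied when the page has changed
GROW = 1.5                    # factor applied when the page is unchanged


@dataclass
class PageRecord:
    """Per-page crawl state (illustrative structure, not from the paper)."""
    url: str
    digest: str = ""          # digest of the last fetched body, "" if never fetched
    interval_h: float = 24.0  # current refresh interval in hours


def content_digest(body: bytes) -> str:
    """Fingerprint a page body; assumes byte-level equality means 'unchanged'."""
    return hashlib.sha256(body).hexdigest()


def update_refresh_interval(record: PageRecord, body: bytes) -> bool:
    """Adjust record.interval_h after a fetch and report whether the page changed."""
    new_digest = content_digest(body)
    if record.digest == "":
        # First visit: nothing to compare against, so keep the default interval.
        record.digest = new_digest
        return False
    changed = new_digest != record.digest
    factor = SHRINK if changed else GROW
    # Clamp so the interval stays within the configured bounds.
    record.interval_h = min(MAX_INTERVAL_H,
                            max(MIN_INTERVAL_H, record.interval_h * factor))
    record.digest = new_digest
    return changed
```

Under these assumptions, a page that keeps changing is revisited more and more often until the interval reaches the lower bound, while a static page drifts toward the upper bound and so generates less traffic. A digest comparison is sensitive to trivial changes such as embedded timestamps, so a real crawler would likely need a more tolerant change test.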
