Article Information

  • Title: A framework for dynamic indexing from hidden web
  • Authors: Hasan Mahmud; Moumie Soulemane; Mohammad Rafiuzzaman
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Online ISSN: 1694-0814
  • Year: 2011
  • Volume: 8
  • Issue: 5
  • Publisher: IJCSI Press
  • Abstract: The proliferation of dynamic websites backed by databases requires generating web pages on the fly, a process too sophisticated for most search engines to index. In an attempt to crawl the contents of dynamic web pages, we have tried to come up with a simple approach to index the huge amount of dynamic content hidden behind search forms. Our key contribution in this paper is the design and implementation of a simple framework to index dynamic web pages, together with the use of the Hadoop MapReduce framework to update and maintain the index. In our approach, starting from an initial URL, our crawler downloads both static and dynamic web pages, detects form interfaces, adaptively selects keywords to generate the most promising search results, automatically fills in the search form interfaces, submits the dynamic URL, and processes the results until some conditions are satisfied.
  • Keywords: Dynamic web pages; crawler; hidden web; index; Hadoop.
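
The crawl steps named in the abstract (detect form interfaces, fill them with keywords, build the dynamic URL, index the results) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `FormFinder`, `dynamic_urls`, `map_page`, and `reduce_postings` are hypothetical, the keyword list is taken as given rather than selected adaptively, only GET forms are handled, and the map and reduce steps are plain Python stand-ins for a Hadoop MapReduce job.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collects GET forms and the names of their text inputs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished forms: {"action": str, "fields": [str]}
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and (attrs.get("method") or "get").lower() == "get":
            self._current = {"action": attrs.get("action") or "", "fields": []}
        elif tag == "input" and self._current is not None:
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search") and attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            # Keep only forms that expose at least one fillable text field.
            if self._current["fields"]:
                self.forms.append(self._current)
            self._current = None


def dynamic_urls(page_url, html, keywords):
    """Fill every detected form with every keyword and return the dynamic URLs."""
    finder = FormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        target = urljoin(page_url, form["action"])
        for keyword in keywords:
            query = urlencode({name: keyword for name in form["fields"]})
            urls.append(f"{target}?{query}")
    return urls


def map_page(url, text):
    """Map step: emit one (term, url) pair per distinct term on a page."""
    return [(term, url) for term in sorted(set(text.lower().split()))]


def reduce_postings(pairs):
    """Reduce step: group URLs by term into sorted posting lists."""
    index = defaultdict(set)
    for term, url in pairs:
        index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


if __name__ == "__main__":
    page = '<form action="/search" method="get"><input type="text" name="q"></form>'
    for url in dynamic_urls("http://example.org/index.html", page, ["hidden web"]):
        print(url)
```

In a full crawler, each generated URL would be fetched and the returned page text passed through the map and reduce steps; the abstract's stopping conditions and adaptive keyword selection are left out of this sketch because the record does not describe them.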