Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2012
Volume: 1
Issue: 4
Pages: 353-357
Publisher: Shri Pannalal Research Institute of Technology
Abstract: The tremendous growth of the World Wide Web has made tools such as search engines and information retrieval systems essential. In this dissertation, we propose a fully distributed, peer-to-peer architecture for web crawling. The main goal behind the development of such a system is to provide an alternative that is efficient, easily implementable, and decentralized for crawling, indexing, caching and querying web pages. The main function of a web crawler is to recursively visit web pages, extract all URLs from each page, parse the page for keywords, and visit the extracted URLs recursively. We propose an architecture that can be easily implemented on a local (campus) network and that follows a fully distributed, peer-to-peer design. The architecture specifications, implementation details, requirements to be met, and an analysis of such a system are discussed.
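
The basic crawl loop described in the abstract (visit a page, extract its URLs, parse it for keywords, then visit the extracted URLs) can be sketched roughly as follows. This is only an illustrative single-node sketch, not the paper's peer-to-peer architecture: the names `crawl`, `LinkAndTextParser`, and `PAGES`, the `campus.example` URLs, and the in-memory stand-in for the network are all assumptions made for this example, and the recursion is expressed with an explicit queue rather than recursive calls.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seed, fetch, max_pages=100):
    """Visit pages reachable from `seed`; return {url: set of keywords}.

    `fetch` is any callable mapping a URL to its HTML, or None if the
    page is unavailable (an assumption of this sketch).
    """
    index = {}
    queue = deque([seed])
    seen = {seed}  # URLs already scheduled, so no page is visited twice
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        # Keywords here are simply the lowercase alphabetic words on the page.
        text = " ".join(parser.text_parts).lower()
        index[url] = set(re.findall(r"[a-z]+", text))
        # Resolve relative links against the current page and schedule new ones.
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://campus.example/": '<a href="/a.html">A</a> <a href="b.html">B</a> home page',
    "http://campus.example/a.html": '<a href="/">back</a> peer crawler',
    "http://campus.example/b.html": "distributed index",
}

index = crawl("http://campus.example/", PAGES.get)
```

In a distributed setting such as the one the abstract proposes, the `seen` set and the queue would presumably be partitioned across peers; how the paper does this is not stated in the abstract, so it is left out of the sketch.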