Journal: International Journal of Computer Science and Information Technologies
e-ISSN: 0975-9646
Year: 2015
Volume: 6
Issue: 3
Pages: 2159-2163
Publisher: TechScience Publications
Abstract: A web crawler is a software program that browses the web in a systematic manner. Crawlers are used to create a replica of all visited web pages. A search engine then processes and indexes the downloaded pages so that searches can be answered quickly. Search engines and other users rely on crawling to keep their databases up to date, because a large number of HTML pages are added to the web every day and the information on it is constantly changing. Some web pages cannot be located directly by search engines, because the searchable databases that hold them cannot be properly indexed or queried by almost any of today's search engines. These pages therefore appear hidden to the average Internet user and are referred to as the Hidden Web or the Deep Web. On the World Wide Web, a huge amount of information is available through the surface web alone, yet the deep web is currently the fastest-growing area of information on the Internet. This paper briefly studies the concepts of web crawlers, their types, and architectures for searching hidden web documents. It also examines the various categories of web crawlers and how they work, and provides some future directions for research on web crawling for searching the hidden web.
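The systematic browsing described in the abstract can be illustrated with a minimal breadth-first crawl loop. This is only a sketch under stated assumptions, not the paper's architecture: the toy link graph `TOY_WEB`, the `crawl` function, and its `max_pages` parameter are hypothetical, and a real crawler would fetch pages over HTTP and parse links out of the HTML instead of reading an in-memory dictionary.

```python
from collections import deque

# Hypothetical toy web (illustration only): each URL maps to (page_text, outgoing_links).
# A real crawler would download pages over HTTP and extract links from the HTML.
TOY_WEB = {
    "a.example/": ("home page", ["a.example/about", "a.example/search"]),
    "a.example/about": ("about us", ["a.example/"]),
    "a.example/search": ("search form", []),  # records behind this form are never linked
}


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: visit each reachable URL once and keep a local replica."""
    replica = {}               # url -> downloaded page text, later handed to an indexer
    frontier = deque(seeds)    # URLs waiting to be visited, in FIFO (breadth-first) order
    seen = set(seeds)          # URLs already queued, so no page is fetched twice
    while frontier and len(replica) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:       # unreachable or unknown URL: skip it
            continue
        text, links = page
        replica[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return replica


replica = crawl(["a.example/"], TOY_WEB.get)
print(list(replica))  # ['a.example/', 'a.example/about', 'a.example/search']
```

Because this loop only follows hyperlinks, it stores the search form page but never reaches the database records that would appear only after a query is submitted. That limitation is the one the abstract attributes to the hidden web.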