Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Electronic ISSN: 2319-8753
Publication year: 2017
Volume: 6
Issue: 8
Pages: 17472
DOI: 10.15680/IJIRSET.2017.0608276
Publisher: S&S Publications
Abstract: The web is a collection of documents that contain information such as text, images, audio, and video. A web page is a document that can be displayed by web browsers such as Mozilla Firefox, Google Chrome, and Microsoft Internet Explorer. A website is a collection of web pages that are grouped and linked together in various ways. The number of web pages on the World Wide Web is increasing significantly. Many search engines, such as Google, Yahoo, and Bing, are available to help users find a specific web page. Search engines generally work through keyword inputs. However, results retrieved in this manner often include invalid links and irrelevant web pages. Hence, there is a need to categorize web pages or web documents in order to facilitate indexing, searching, and retrieval, and to identify the category of a page. Classification plays a vital role in many information retrieval tasks. Web page classification is essential for crawling, the development of web directories, topic-specific web link analysis, contextual advertising, and analysis of the topical structure of the Web. It can also help to improve the quality of Web search. The main problem in web mining is classifying web pages, which can be done using optimization algorithms, classification algorithms, and feature selection and feature extraction algorithms. Web mining techniques are used to extract knowledge and relevant information from web data, including web pages and web documents. This paper presents a detailed survey covering the basic concepts of web page classification, the algorithms and techniques used for it, web directories, types, approaches, applications, features, and research issues.
Keywords: Web Mining; Web Page Classification; Web Directories; Research Issues