期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2021
卷号:12
期号:8
DOI:10.14569/IJACSA.2021.0120886
语种:English
出版社:Science and Information Society (SAI)
摘要:In recent years, the emergence of WWW (World Wide Web) led to the accumulation of huge amount of information and data. Hence the web is found to consist of unstructured and structured information that impacts the day to day life of the society. Because of such availability of huge information, utilization of the required information becomes more challenging. This paper provided a comprehensive survey on the current situation and recent trends on web content mining (WCM) and its applications thereby contributing to the enhancement of the upcoming research in WCM. The paper focused mainly on the mining and retrieval techniques, various WCM approaches, challenges and process of information retrieval and information extraction. The paper describes the four major tasks of web content mining that is information retrieval, information extraction, generalization and validation in detail. WCM concentrates on orchestrating, sorting, classifying, collecting, congregating of web data and provide the improved data which can be easily accessed by the users. Web content mining tools were needed to scan text, images and HTML documents and provide results to the search engine. It guides the search engine to provide better productive results for every search based on their importance. The paper also analysed different web content mining tools for the extraction of relevant information from the corresponding web page.
关键词:Web content mining; web structure mining; web usage mining; data mining; information retrieval; information extraction.