期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2020
卷号:11
期号:10
DOI:10.14569/IJACSA.2020.0111001
出版社:Science and Information Society (SAI)
摘要:In this paper, advancing web scale knowledge extraction and alignment by integrating few sources has been considered by exploring different methods of aggregation and attention in order to focus on image information. An improved model, namely, Wrapper Extraction of Image using DOM and JSON (WEIDJ) has been proposed to extract images and the related information in fastest way. Several models, such as Document Object Model (DOM), Wrapper using Hybrid DOM and JSON (WHDJ), WEIDJ and WEIDJ (no-rules) are been discussed. The experimental results on real world websites demonstrate that our models outperform others, such as Document Object Model (DOM), Wrapper using Hybrid DOM and JSON (WHDJ) in terms of mining in a higher volume of web data from a various types of image format and taking the consideration of web data extraction from deep web.
关键词:Data extraction; Document Object Model; web data extraction; Wrapper using Hybrid DOM and JSON; Wrapper Extraction of Image using DOM and JSON