Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Year: 2014
Issue: ICETS
Page: 298
Publisher: S&S Publications
Abstract: An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then each group is annotated from different aspects, and the different annotations are aggregated to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective. By combining data alignment, data annotation, and wrapper generation for web databases, the approach aims to give users better results when they search for terms.
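The three stages in the abstract (align data units into groups, annotate each group from several aspects and aggregate the results, then reuse the labels as a wrapper) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every result record lists its data units in the same order, so alignment is simply grouping by position. It also uses weighted voting as a stand-in aggregation rule. The annotator names (`table_header`, `query_term`, `common_prefix`), their weights, and all function names are hypothetical.

```python
from collections import defaultdict


def align_by_position(records):
    """Naive alignment (assumption): group data units by their position,
    assuming every result record lists its units in the same order."""
    groups = defaultdict(list)
    for record in records:
        for i, unit in enumerate(record):
            groups[i].append(unit)
    return dict(groups)


def aggregate_labels(candidates, weights):
    """Combine candidate labels from several (hypothetical) annotators by
    weighted voting. `candidates` maps annotator -> label, or None if that
    annotator abstains. `weights` maps annotator -> confidence weight.
    Returns the label with the highest total weight, or None if all abstain."""
    scores = defaultdict(float)
    for annotator, label in candidates.items():
        if label is not None:
            scores[label] += weights.get(annotator, 0.0)
    if not scores:
        return None
    return max(scores, key=scores.get)


def apply_wrapper(wrapper, record):
    """Label the data units of a new record from the same site using a
    wrapper, here just a mapping from group position to final label."""
    return {wrapper.get(i, f"unlabeled_{i}"): unit for i, unit in enumerate(record)}


# Usage with made-up result records and annotator outputs.
records = [["Dune", "Frank Herbert", "$9.50"],
           ["Emma", "Jane Austen", "$7.25"]]
groups = align_by_position(records)

weights = {"table_header": 0.9, "query_term": 0.7, "common_prefix": 0.5}
price_label = aggregate_labels(
    {"table_header": "Price", "query_term": None, "common_prefix": "Price"},
    weights,
)

wrapper = {0: "Title", 1: "Author", 2: price_label}
labeled = apply_wrapper(wrapper, ["Ulysses", "James Joyce", "$12.00"])
```

Grouping by position is the simplest possible alignment. Real result pages often have missing or reordered fields, which is why an alignment step that compares unit features is needed in practice.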