Article Information

Title: Learning Semantic Definitions of Online Information Sources
Authors: Mark J. Carman; Craig A. Knoblock
Journal: Journal of Automation, Mobile Robotics & Intelligent Systems (JAMRIS)
Print ISSN: 1897-8649
Electronic ISSN: 2080-2145
Publication year: 2007
Volume: 30
Pages: 1-50
Publisher: Industrial Research Inst. for Automation and Measurements, Warsaw
Abstract: The Internet contains a very large number of information sources providing many types of data from weather forecasts to travel deals and financial information. These sources can be accessed via Web-forms, Web Services, RSS feeds and so on. In order to make automated use of these sources, we need to model them semantically, but writing semantic descriptions for Web Services is both tedious and error prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions of Web sources. In order to learn these definitions, our system actively invokes the sources and compares the data they produce with that of known sources of information. It then performs an inductive logic search through the space of plausible source definitions in order to learn the best possible semantic model for each new source. In this paper we perform an empirical evaluation of the system using real-world Web sources. The evaluation demonstrates the effectiveness of the approach, showing that we can automatically learn complex models for real sources in reasonable time. We also compare our system with a complex schema matching system, showing that our approach can handle the kinds of problems tackled by the latter.