Journal: Journal of Automation, Mobile Robotics & Intelligent Systems (JAMRIS)
Print ISSN: 1897-8649
Electronic ISSN: 2080-2145
Publication year: 2007
Volume: 30
Pages: 1-50
Publisher: Industrial Research Inst. for Automation and Measurements, Warsaw
Abstract: The Internet contains a very large number of information sources providing many
types of data from weather forecasts to travel deals and financial information.
These sources can be accessed via Web-forms, Web Services, RSS feeds and so on.
In order to make automated use of these sources, we need to model them
semantically, but writing semantic descriptions for Web Services is both tedious
and error-prone. In this paper we investigate the problem of automatically
generating such models. We introduce a framework for learning Datalog
definitions of Web sources. In order to learn these definitions, our system
actively invokes the sources and compares the data they produce with that of
known sources of information. It then performs an inductive logic search through
the space of plausible source definitions in order to learn the best possible
semantic model for each new source. In this paper we perform an empirical
evaluation of the system using real-world Web sources. The evaluation
demonstrates the effectiveness of the approach, showing that we can
automatically learn complex models for real sources in reasonable time. We also
compare our system with a complex schema matching system, showing that our
approach can handle the kinds of problems tackled by the latter.
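
The invoke-and-compare idea in the abstract can be illustrated with a small sketch. This is not the paper's system: the sources are stand-in Python functions with made-up data, candidate definitions are plain callables rather than Datalog clauses, the candidate set is fixed by hand instead of being produced by an inductive logic search, and Jaccard overlap is an assumed scoring function. All names below are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Row = Tuple[str, int]
Source = Callable[[str], Set[Row]]


def jaccard(a: Set[Row], b: Set[Row]) -> float:
    """Overlap between two sets of tuples (assumed scoring function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def score(new_source: Source, candidate: Source, inputs: Iterable[str]) -> float:
    """Invoke both sources on the same inputs and average the overlap."""
    inputs = list(inputs)
    if not inputs:
        return 0.0
    return sum(jaccard(new_source(x), candidate(x)) for x in inputs) / len(inputs)


def best_definition(
    new_source: Source, candidates: Dict[str, Source], inputs: Iterable[str]
) -> Tuple[str, float]:
    """Return the name and score of the candidate that best explains the data."""
    inputs = list(inputs)
    scored = {name: score(new_source, c, inputs) for name, c in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical "known" sources with made-up data.
ZIP_TO_CITY = {"90292": "Marina del Rey", "10001": "New York"}
CITY_TEMP_F = {"Marina del Rey": 68, "New York": 50}


def new_source(zip_code: str) -> Set[Row]:
    """The unknown source to be modeled: returns (zip, temperature)."""
    table = {"90292": 20, "10001": 10}
    return {(zip_code, table[zip_code])} if zip_code in table else set()


def fahrenheit_forecast(zip_code: str) -> Set[Row]:
    """Candidate: join zip -> city -> temperature in Fahrenheit."""
    city = ZIP_TO_CITY.get(zip_code)
    return {(zip_code, CITY_TEMP_F[city])} if city else set()


def celsius_forecast(zip_code: str) -> Set[Row]:
    """Candidate: same join, followed by a Fahrenheit-to-Celsius conversion."""
    return {(z, round((f - 32) * 5 / 9)) for z, f in fahrenheit_forecast(zip_code)}


CANDIDATES = {
    "fahrenheit_forecast": fahrenheit_forecast,
    "celsius_forecast": celsius_forecast,
}

best, best_score = best_definition(new_source, CANDIDATES, ["90292", "10001"])
print(best, best_score)
```

In this toy setting the candidate that composes the known sources with a unit conversion reproduces the new source's output, so it receives the highest score. A real search would generate and refine candidates incrementally rather than enumerate a fixed dictionary.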