
Article Information

  • Title: Unsupervised DNF Blocking for Efficient Linking of Knowledge Graphs and Tables
  • Author: Mayank Kejriwal
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Publication year: 2021
  • Volume: 12
  • Issue: 3
  • Pages: 134
  • DOI: 10.3390/info12030134
  • Publisher: MDPI Publishing
  • Abstract: Entity Resolution (ER) is the problem of identifying co-referent entity pairs across datasets, including knowledge graphs (KGs). ER is an important prerequisite in many applied KG search and analytics pipelines, with a typical workflow comprising two steps. In the first ‘blocking’ step, entities are mapped to blocks. Blocking is necessary for preempting comparing all possible pairs of entities, as (in the second ‘similarity’ step) only entities within blocks are paired and compared, allowing for significant computational savings with a minimal loss of performance. Unfortunately, learning a blocking scheme in an unsupervised fashion is a non-trivial problem, and it has not been properly explored for heterogeneous, semi-structured datasets, such as are prevalent in industrial and Web applications. This article presents an unsupervised algorithmic pipeline for learning Disjunctive Normal Form (DNF) blocking schemes on KGs, as well as structurally heterogeneous tables that may not share a common schema. We evaluate the approach on six real-world dataset pairs, and show that it is competitive with supervised and semi-supervised baselines.
  • Keywords: entity resolution; knowledge graphs; blocking; DNF blocking; heterogeneous linking; table-graph linking
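
The two-step workflow described in the abstract can be illustrated with a minimal sketch of how a DNF blocking scheme, once available, might be applied to produce candidate pairs. This is not the article's unsupervised learning pipeline; the records, field names, predicates (`exact`, `first_token`), and the scheme itself are hypothetical and chosen only for illustration. A scheme is modelled here as a disjunction of conjunctions of key-generating predicates, and two records share a block if they agree on every predicate of at least one conjunction.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; fields and values are illustrative only.
records = {
    1: {"name": "Acme Corp", "city": "Boston", "phone": "555-0100"},
    2: {"name": "Acme Corporation", "city": "Boston", "phone": "555-0101"},
    3: {"name": "Zenith Ltd", "city": "Austin", "phone": "555-0199"},
    4: {"name": "Zenith Limited", "city": "Dallas", "phone": "555-0199"},
}


def exact(field):
    """Hypothetical predicate: block key is the lower-cased field value."""
    return lambda rec: rec[field].lower()


def first_token(field):
    """Hypothetical predicate: block key is the first token of the field."""
    return lambda rec: rec[field].lower().split()[0]


# Assumed DNF scheme: (same city AND same first name token) OR (same phone).
scheme = [
    [exact("city"), first_token("name")],
    [exact("phone")],
]


def candidate_pairs(records, scheme):
    """Map records to blocks, then pair only records that share a block."""
    blocks = defaultdict(set)
    for rec_id, rec in records.items():
        for term_idx, term in enumerate(scheme):
            # A conjunction yields one composite key per record.
            key = (term_idx, tuple(pred(rec) for pred in term))
            blocks[key].add(rec_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


pairs = candidate_pairs(records, scheme)
total = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} of {total} possible pairs kept: {sorted(pairs)}")
```

In this toy setting only the pairs that share a block would be passed on to the second, similarity step; the remaining pairs are never compared, which is where the computational savings mentioned in the abstract come from.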