首页    期刊浏览 2025年03月03日 星期一
登录注册

文章基本信息

  • 标题:The Complexity of Aggregates over Extractions by Regular Expressions
  • 本地全文:下载
  • 作者:Doleschal, Johannes ; Bratman, Noa ; Kimelfeld, Benny
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2021
  • 卷号:186
  • 页码:10:1-10:20
  • DOI:10.4230/LIPIcs.ICDT.2021.10
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Regular expressions with capture variables, also known as "regex-formulas", extract relations of spans (intervals identified by their start and end indices) from text. In turn, the class of regular document spanners is the closure of the regex formulas under the Relational Algebra. We investigate the computational complexity of querying text by aggregate functions, such as sum, average, and quantile, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation. More precisely, we show that in a restricted case, all studied aggregate functions can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).
  • 关键词:Information extraction; document spanners; regular expressions; aggregation functions
国家哲学社会科学文献中心版权所有