
Article Information

  • Title: Weighted Set-Based String Similarity
  • Authors: Marios Hadjieleftheriou; Divesh Srivastava
  • Journal: Bulletin of the Technical Committee on Data Engineering
  • Publication year: 2010
  • Volume: 33
  • Issue: 01
  • Publisher: IEEE Computer Society
  • Abstract: Consider a universe of tokens, each of which is associated with a weight, and a database consisting of strings that can be represented as subsets of these tokens. Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database whose similarity to the query is larger than a user-specified threshold. Weighted string similarity queries are useful in applications like data cleaning and integration for finding approximate matches in the presence of typographical mistakes, multiple formatting conventions, data transformation errors, etc. We show that this problem has semantic properties that can be exploited to design index structures that support very efficient algorithms for query answering.
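
The query described in the abstract can be illustrated with a minimal sketch. The abstract does not name a similarity measure, a tokenization, or a weighting scheme, so everything below is an assumption made for illustration: strings are split into 3-grams, similarity is weighted Jaccard, unlisted tokens get a default weight of 1.0, and the database is scanned linearly rather than through the index structures the paper proposes. The function names `tokenize`, `weighted_jaccard`, and `similarity_query` are hypothetical.

```python
from typing import Dict, List, Set


def tokenize(s: str, q: int = 3) -> Set[str]:
    """Split a string into its set of lowercase q-grams (assumed tokenization)."""
    s = s.lower()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def weighted_jaccard(a: Set[str], b: Set[str], weight: Dict[str, float]) -> float:
    """Weight of the shared tokens divided by the weight of all tokens in either set.

    Tokens missing from `weight` default to 1.0 (an assumption of this sketch).
    """
    union = sum(weight.get(t, 1.0) for t in a | b)
    if union == 0.0:
        return 0.0
    inter = sum(weight.get(t, 1.0) for t in a & b)
    return inter / union


def similarity_query(
    query: str,
    database: List[str],
    weight: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Return every database string whose similarity to the query exceeds the threshold.

    This is a brute-force linear scan, not an index-based algorithm.
    """
    q_tokens = tokenize(query)
    return [
        s for s in database
        if weighted_jaccard(q_tokens, tokenize(s), weight) > threshold
    ]
```

For example, calling `similarity_query("main street", ["main street", "maine street", "park avenue"], {}, 0.5)` compares the query's 3-grams against each stored string. A real deployment would typically derive the weights from token statistics and replace the scan with an index, which is the subject of the paper itself.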