首页    期刊浏览 2025年02月28日 星期五
登录注册

文章基本信息

  • 标题:Optimal lower bounds for locality sensitive hashing (except when q is tiny)
  • 本地全文:下载
  • 作者:Ryan O'Donnell ; YI WU ; Yuan Zhou
  • 期刊名称:Electronic Colloquium on Computational Complexity
  • 印刷版ISSN:1433-8092
  • 出版年度:2009
  • 卷号:2009
  • 出版社:Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
  • 摘要:

    We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest setting: point sets in 01d under the Hamming distance. Recall that is said to be an (rcrpq) -sensitive hash family if all pairs xy01d with dist(xy)r have probability at least p of collision under a randomly chosen h, whereas all pairs xy01d with dist(xy)cr have probability at most q of collision. Typically, one considers d, with 1">c1 fixed and q bounded away from 0.

    For its applications to approximate nearest neighbor search in high dimensions, the quality of an LSH family is governed by how small its ``rho parameter'' =ln(1p)ln(1q) is as a function of the parameter c. The seminal paper of Indyk and Motwani showed that for each c1, the extremely simple family =xxi:id achieves 1c . The only known lower bound, due to Motwani, Naor, and Panigrahy, is that must be at least (e1c−1)(e1c+1)46c (minus od(1)).

    In this paper we show an optimal lower bound: must be at least 1c (minus od(1)). This lower bound for Hamming space yields a lower bound of 1c2 for Euclidean space (or the unit sphere) and 1c for the Jaccard distance on sets; both of these match known upper bounds. Our proof is simple; the essence is that the noise stability of a boolean function at e−t is a log-convex function of t.

    Like the Motwani--Naor--Panigrahy lower bound, our proof relies on the assumption that q is not ``tiny'', meaning of the form 2−(d). Some lower bound on q is always necessary, as otherwise it is trivial to achieve =0 . The range of q for which our lower bound holds is the same as the range of q for which accurately reflects an LSH family's quality. Still, we conclude by discussing why it would be more satisfying to find LSH lower bounds that hold for tiny q .

  • 关键词:Locality Sensitive Hashing; LSH; noise sensitivity; noise stability
国家哲学社会科学文献中心版权所有