We study lower bounds for Locality Sensitive Hashing (LSH) in the strongest setting: point sets in 01d under the Hamming distance. Recall that is said to be an (rcrpq) -sensitive hash family if all pairs xy01d with dist(xy)r have probability at least p of collision under a randomly chosen h, whereas all pairs xy01d with dist(xy)cr have probability at most q of collision. Typically, one considers d, with 1">c1 fixed and q bounded away from 0.
For its applications to approximate nearest neighbor search in high dimensions, the quality of an LSH family is governed by how small its ``rho parameter'' =ln(1p)ln(1q) is as a function of the parameter c. The seminal paper of Indyk and Motwani showed that for each c1, the extremely simple family =xxi:id achieves 1c . The only known lower bound, due to Motwani, Naor, and Panigrahy, is that must be at least (e1c−1)(e1c+1)46c (minus od(1)).
In this paper we show an optimal lower bound: must be at least 1c (minus od(1)). This lower bound for Hamming space yields a lower bound of 1c2 for Euclidean space (or the unit sphere) and 1c for the Jaccard distance on sets; both of these match known upper bounds. Our proof is simple; the essence is that the noise stability of a boolean function at e−t is a log-convex function of t.
Like the Motwani--Naor--Panigrahy lower bound, our proof relies on the assumption that q is not ``tiny'', meaning of the form 2−(d). Some lower bound on q is always necessary, as otherwise it is trivial to achieve =0 . The range of q for which our lower bound holds is the same as the range of q for which accurately reflects an LSH family's quality. Still, we conclude by discussing why it would be more satisfying to find LSH lower bounds that hold for tiny q .