Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny)

Published: 01 March 2014

Abstract

We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}^d under the Hamming distance. Recall that H is said to be an (r, cr, p, q)-sensitive hash family if all pairs x, y ∈ {0,1}^d with dist(x, y) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x, y ∈ {0,1}^d with dist(x, y) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0.

For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family H is governed by how small its ρ parameter ρ = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani [1998] showed that for each c ≥ 1, the extremely simple family H = {x ↦ x_i : i ∈ [d]} achieves ρ ≤ 1/c. The only known lower bound, due to Motwani et al. [2007], is that ρ must be at least (e^{1/c} - 1)/(e^{1/c} + 1) ≥ .46/c (minus o_d(1)). The contribution of this article is twofold. (1) We show the “optimal” lower bound for ρ: it must be at least 1/c (minus o_d(1)). Our proof is very simple, following almost immediately from the observation that the noise stability of a boolean function at time t is a log-convex function of t. (2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the q parameter is tiny. Here, “tiny” means q = 2^{-Θ(d)}, a parameter range we believe is natural.
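As an illustration of the quantities in the abstract, the following minimal Python sketch computes the collision probability of the bit-sampling family H = {x ↦ x_i : i ∈ [d]}, its ρ value, and the earlier lower bound of Motwani et al. [2007]. The function names and the example parameters (d = 1000, r = 10, c = 2) are assumptions chosen for illustration and do not come from the article. The sketch relies only on the fact that a uniformly random coordinate agrees on x and y with probability 1 - dist(x, y)/d, so that p = 1 - r/d and q = 1 - cr/d.

```python
import math


def collision_probability(x, y):
    """Exact Pr[h(x) = h(y)] for h drawn uniformly from the bit-sampling family.

    x and y are equal-length sequences of 0/1 values. A random coordinate
    agrees with probability 1 - dist(x, y) / d.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    agreements = sum(1 for a, b in zip(x, y) if a == b)
    return agreements / len(x)


def bit_sampling_rho(d, r, c):
    """rho = ln(1/p) / ln(1/q) for bit sampling, with p = 1 - r/d, q = 1 - c*r/d."""
    p = 1.0 - r / d
    q = 1.0 - c * r / d
    if not (0.0 < q < p < 1.0):
        raise ValueError("need 0 < c*r < d, r > 0 and c > 1")
    return math.log(1.0 / p) / math.log(1.0 / q)


def earlier_lower_bound(c):
    """The bound (e^{1/c} - 1) / (e^{1/c} + 1) quoted in the abstract (o_d(1) term ignored)."""
    e = math.exp(1.0 / c)
    return (e - 1.0) / (e + 1.0)


if __name__ == "__main__":
    d, r, c = 1000, 10, 2.0  # hypothetical example parameters
    print("bit-sampling rho :", bit_sampling_rho(d, r, c))
    print("1/c              :", 1.0 / c)
    print("earlier bound    :", earlier_lower_bound(c))
```

For these parameters the bit-sampling ρ falls slightly below 1/c, consistent with the upper bound ρ ≤ 1/c, while the earlier lower bound sits strictly below 1/c; this is the gap the article's 1/c lower bound addresses.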

References

  1. A. Andoni and P. Indyk. 2006. Efficient algorithms for substring near neighbor problem. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, 1212.
  2. A. Andoni and P. Indyk. 2008. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, 1, 117--122.
  3. J. Buhler. 2001. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 5, 419--428.
  4. E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. D. Ullman, and C. Yang. 2002. Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng. 13, 1, 64--78.
  5. A. S. Das, M. Datar, A. Garg, and S. Rajaram. 2007. Google news personalization: Scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web. ACM, 271--280.
  6. M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the 20th Annual Symposium on Computational Geometry (SCG’04). ACM, New York, NY, 253--262.
  7. A. Gionis, P. Indyk, and R. Motwani. 1999. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases.
  8. P. Indyk and R. Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 13th Annual ACM Symposium on Theory of Computing. ACM, New York, NY, 604--613.
  9. P. Indyk. 2001. High-dimensional Computational Geometry. Doctoral dissertation, Stanford University.
  10. P. Indyk. 2004. Nearest neighbors in high-dimensional spaces. In Handbook of Discrete and Computational Geometry, Discrete Mathematics and Its Applications, Chapman and Hall/CRC, 877--892.
  11. P. Indyk. 2009. Personal communication.
  12. R. Motwani, A. Naor, and R. Panigrahy. 2007. Lower bounds on locality sensitive hashing. SIAM J. Disc. Math. 21, 4, 930--935.
  13. T. Neylon. 2010. A locality-sensitive hash for real vectors. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms. 1179--1189.
  14. R. O’Donnell. 2003. Computational Applications of Noise Sensitivity. Doctoral dissertation, Massachusetts Institute of Technology.
  15. R. Panigrahy. 2006. Entropy based nearest neighbor search in high dimensions. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM, 1186--1195.
  16. R. Panigrahy, K. Talwar, and U. Wieder. 2008. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, 414--423.
  17. D. Ravichandran, P. Pantel, and E. Hovy. 2005. Randomized algorithms and NLP: Using locality sensitive hash function for high speed noun clustering. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 622--629.
  18. G. Shakhnarovich, P. Viola, and T. Darrell. 2003. Fast pose estimation with parameter-sensitive hashing. In Proceedings of the 9th IEEE International Conference on Computer Vision. Citeseer, 750.
  19. K. Terasawa and Y. Tanaka. 2007. Spherical LSH for approximate nearest neighbor search on unit hypersphere. In Proceedings of the 10th International Workshop on Algorithms and Data Structures. Lecture Notes in Computer Science, vol. 4619, Springer-Verlag, Berlin Heidelberg, 27--38.
