ABSTRACT
We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). The data structure requires O(n) space regardless of the metric's structure yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the point set has a bounded expansion constant c, which is a measure of the intrinsic dimensionality, as defined in (Karger & Ruhl, 2002), the cover tree data structure can be constructed in O(c^6 n log n) time. Furthermore, nearest neighbor queries require time only logarithmic in n, in particular O(c^12 log n) time. Our experimental results show speedups over the brute force search varying between one and several orders of magnitude on natural machine learning datasets.
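To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the usual reading of the Karger & Ruhl expansion constant: the smallest c >= 2 such that |B(p, 2r)| <= c |B(p, r)| for every point p in the set and every r > 0, with closed balls (the closed-ball convention is our assumption). The function names `ball_size`, `expansion_constant`, and `brute_force_nearest` are hypothetical, and the brute force search is only the linear-scan baseline that the reported speedups are measured against, not the cover tree itself.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")
Metric = Callable[[P, P], float]


def ball_size(points: Sequence[P], center: P, radius: float, dist: Metric) -> int:
    """Number of points in the closed ball B(center, radius)."""
    return sum(1 for q in points if dist(center, q) <= radius)


def expansion_constant(points: Sequence[P], dist: Metric) -> float:
    """Smallest c >= 2 with |B(p, 2r)| <= c * |B(p, r)| for all p and r > 0.

    For a finite set, both ball counts are step functions of r that only
    change at r = d(p, q) or r = d(p, q) / 2, so checking those radii is
    enough. This naive version takes O(n^3) distance evaluations and is
    meant for small illustrative inputs only.
    """
    c = 2.0
    for p in points:
        radii = set()
        for q in points:
            d = dist(p, q)
            if d > 0:
                radii.add(d)
                radii.add(d / 2)
        for r in radii:
            ratio = ball_size(points, p, 2 * r, dist) / ball_size(points, p, r, dist)
            c = max(c, ratio)
    return c


def brute_force_nearest(points: Sequence[P], query: P, dist: Metric) -> P:
    """Linear-scan nearest neighbor: one distance evaluation per point."""
    return min(points, key=lambda p: dist(query, p))


if __name__ == "__main__":
    line = [0.0, 1.0, 2.0, 3.0, 4.0]
    metric = lambda a, b: abs(a - b)
    print(expansion_constant(line, metric))
    print(brute_force_nearest(line, 2.8, metric))
```

Evenly spaced points on a line have a small expansion constant, which is the regime in which the stated O(c^6 n log n) construction and O(c^12 log n) query bounds are most favorable.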
References
- Beygelzimer, A., Kakade, S., & Langford, J. (2005). Cover trees for nearest neighbor. Available at http://hunch.net/~jl/projects/cover_tree.
- Clarkson, K. (1999). Nearest neighbor queries in metric spaces. Discrete and Computational Geometry, 22, 63--93.
- Clarkson, K. (2002). Nearest neighbor searching in metric spaces: Experimental results for sb(s). http://cm.bell-labs.com/who/clarkson/Msb/readme.html.
- Friedman, J., Bentley, J., & Finkel, R. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3, 209--226.
- Gray, A., & Moore, A. (2000). N-body problems in statistical learning. Advances in Neural Information Processing Systems, 13, 521--527.
- Gupta, A., Krauthgamer, R., & Lee, J. (2003). Bounded geometries, fractals, and low-distortion embeddings. Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (pp. 534--543).
- Har-Peled, S., & Mendel, M. (2006). Fast constructions of nets in low dimensional metrics and their applications. SIAM Journal on Computing, 35, 1148--1184.
- Karger, D., & Ruhl, M. (2002). Finding nearest neighbors in growth restricted metrics. Proceedings of the 34th Annual ACM Symposium on Theory of Computing (pp. 741--750).
- Krauthgamer, R., & Lee, J. (2004a). The black-box complexity of nearest neighbor search. Proceedings of the 31st International Colloquium on Automata, Languages and Programming (pp. 858--869).
- Krauthgamer, R., & Lee, J. (2004b). Navigating nets: Simple algorithms for proximity search. Proceedings of the 15th Annual Symposium on Discrete Algorithms (pp. 791--801).
- Laviolette, F., Marchand, M., & Shah, M. (2005). A PAC-bayes approach to the set covering machine. Advances in Neural Information Processing Systems, 18.
- Omohundro, S. (1987). Efficient algorithms with neural network behavior. Journal of Complex Systems, 1, 273--347.
- Uhlmann, J. (1991). Satisfying general proximity/similarity queries with metric trees. Information Processing Letters, 40, 175--179.