ABSTRACT
The Euclidean Minimum Spanning Tree (EMST) problem has applications in a wide range of fields, and many efficient algorithms have been developed to solve it. We present a new, fast, general EMST algorithm, motivated by the clustering and analysis of astronomical data. Large-scale astronomical surveys, including the Sloan Digital Sky Survey, and large simulations of the early universe, such as the Millennium Simulation, can contain millions of points and fill terabytes of storage. Traditional EMST methods scale quadratically, and more advanced methods lack rigorous runtime guarantees. We present a new dual-tree algorithm for efficiently computing the EMST, use adaptive algorithm analysis to prove the tightest (and possibly optimal) runtime bound for the EMST problem to date, and demonstrate the scalability of our method on astronomical data sets.
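To make the quadratic baseline mentioned above concrete, here is a minimal Python sketch of a textbook approach: Prim's algorithm run over the implicit complete graph of pairwise Euclidean distances. It is not the dual-tree algorithm the abstract describes, and the function name `naive_emst` and its return format are assumptions made for illustration only.

```python
import math


def naive_emst(points):
    """Quadratic-time EMST baseline (illustrative sketch, not the paper's method).

    Runs Prim's algorithm on the implicit complete Euclidean graph.
    Returns a list of (i, j, distance) edges indexing into `points`.
    """
    n = len(points)
    if n == 0:
        return []

    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from each point to the tree
    best_from = [-1] * n        # tree vertex that achieves best_dist
    best_dist[0] = 0.0          # start growing the tree from point 0
    edges = []

    for _ in range(n):
        # Select the closest point that is not yet in the tree: O(n) per step.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))

        # Relax distances to all remaining points: O(n) per step.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u

    return edges
```

Each of the n iterations scans all n points, so the sketch performs on the order of n^2 distance evaluations. With millions of points, as in the surveys named in the abstract, that amounts to trillions of evaluations, which is the scaling problem the abstract points to.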
Fast euclidean minimum spanning tree




