DOI: 10.1145/1835804.1835882 · ACM Conferences · KDD Conference Proceedings
research-article

Fast Euclidean minimum spanning tree: algorithm, analysis, and applications

ABSTRACT

The Euclidean Minimum Spanning Tree (EMST) problem has applications in a wide range of fields, and many efficient algorithms have been developed to solve it. We present a new, fast, general EMST algorithm, motivated by the clustering and analysis of astronomical data. Large-scale astronomical surveys, including the Sloan Digital Sky Survey, and large simulations of the early universe, such as the Millennium Simulation, can contain millions of points and fill terabytes of storage. Traditional EMST methods scale quadratically, and more advanced methods lack rigorous runtime guarantees. We present a new dual-tree algorithm for efficiently computing the EMST, use adaptive algorithm analysis to prove the tightest (and possibly optimal) runtime bound for the EMST problem to date, and demonstrate the scalability of our method on astronomical data sets.
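
To make the quadratic scaling of traditional methods concrete, the following is a minimal sketch of a naive EMST baseline: Prim's algorithm run on the implicit complete graph of pairwise Euclidean distances. It is not the dual-tree algorithm presented in the paper; the function name `naive_emst` and the edge format `(i, j, length)` are assumptions made for illustration only.

```python
import math


def naive_emst(points):
    """Naive EMST via Prim's algorithm on the complete Euclidean graph.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    Runs in O(n^2) time for n points, the quadratic scaling noted above.
    Returns a list of edges (i, j, length) over point indices.
    """
    n = len(points)
    if n == 0:
        return []

    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each point to the growing tree
    best_from = [-1] * n        # tree vertex that achieves best_dist
    best_dist[0] = 0.0          # start the tree at point 0
    edges = []

    for _ in range(n):
        # Select the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))

        # Relax distances of the remaining points against the new tree vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u

    return edges
```

For example, `naive_emst([(0, 0), (1, 0), (3, 0), (3, 4)])` returns three edges with total length 7.0. Every pair of points is examined, so the cost grows with the square of the input size, which is impractical at the scale of millions of points mentioned in the abstract.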
