DOI: 10.1145/2807591.2807651
research-article

A work-efficient algorithm for parallel unordered depth-first search

Published: 15 November 2015

ABSTRACT

Advances in processing power and memory technology have made multicore computers an important platform for high-performance graph-search (or graph-traversal) algorithms. Since the introduction of multicore processors, much progress has been made on parallel breadth-first search. However, less attention has been given to algorithms for unordered or loosely ordered traversals.

We present a parallel algorithm for unordered depth-first search on graphs. We prove that the algorithm is work efficient in a realistic algorithmic model that accounts for important scheduling costs. This work-efficiency result applies to all graphs, including those with high diameter and high out-degree vertices. The algorithmic techniques behind this result include a new data structure for representing the frontier of vertices in depth-first search, a new amortization technique for controlling excess parallelism, and an adaptation of the lazy-splitting technique to depth-first search.
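The abstract names the techniques without describing them. As a rough illustration only, the Python sketch below sequentially simulates several workers running an unordered depth-first traversal. It rests on assumptions that are ours, not the paper's: the frontier (the hypothetical class `EdgeFrontier`) is a stack of pending out-edge ranges, and it is split by pending-edge count, so even the out-edges of a single high out-degree vertex can be divided between workers. "Lazy splitting" is modeled in a simplified way: a worker processes up to `poll_every` edges (a hypothetical parameter), then gives away half of its frontier only if it finds an idle peer. The paper's actual frontier data structure and its amortization technique are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # adjacency lists; every vertex must be a key


class EdgeFrontier:
    """Hypothetical frontier: a stack of (vertex, lo, hi) ranges of pending
    out-edges. Its weight is the number of pending edges, so a split can
    divide the out-edges of a single high out-degree vertex."""

    def __init__(self) -> None:
        self.ranges: List[Tuple[int, int, int]] = []
        self.weight = 0

    def push_vertex(self, g: Graph, v: int) -> None:
        # Queue all out-edges of a newly discovered vertex.
        d = len(g[v])
        if d > 0:
            self.ranges.append((v, 0, d))
            self.weight += d

    def pop_edge(self, g: Graph) -> int:
        # Take one edge from the top range (LIFO, so depth-first locally).
        v, lo, hi = self.ranges.pop()
        if lo + 1 < hi:
            self.ranges.append((v, lo + 1, hi))
        self.weight -= 1
        return g[v][lo]

    def split(self) -> "EdgeFrontier":
        # Move floor(weight / 2) pending edges, taken from the bottom, into a
        # new frontier; a range straddling the boundary is cut in two.
        other = EdgeFrontier()
        target = self.weight // 2
        while other.weight < target:
            v, lo, hi = self.ranges[0]
            need = target - other.weight
            if hi - lo <= need:
                self.ranges.pop(0)
                other.ranges.append((v, lo, hi))
                other.weight += hi - lo
            else:
                self.ranges[0] = (v, lo + need, hi)
                other.ranges.append((v, lo, lo + need))
                other.weight += need
        self.weight -= other.weight
        return other


def simulated_pdfs(g: Graph, source: int, workers: int = 4,
                   poll_every: int = 8) -> Tuple[Set[int], int]:
    """Sequential round-robin simulation of `workers` workers doing an
    unordered DFS from `source`. Returns (visited set, edges examined)."""
    visited = {source}
    frontiers = [EdgeFrontier() for _ in range(workers)]
    frontiers[0].push_vertex(g, source)
    edges_examined = 0
    while any(f.weight > 0 for f in frontiers):
        for i in range(workers):
            f = frontiers[i]
            # Work phase: process up to poll_every edges before looking up.
            for _ in range(poll_every):
                if f.weight == 0:
                    break
                u = f.pop_edge(g)
                edges_examined += 1
                # A real parallel version would claim u with an atomic
                # compare-and-swap on a visited flag instead of a set.
                if u not in visited:
                    visited.add(u)
                    f.push_vertex(g, u)
            # Lazy splitting: hand off half the frontier only if some peer
            # is idle and there are at least two pending edges to share.
            if f.weight >= 2:
                for j in range(workers):
                    if frontiers[j].weight == 0:
                        frontiers[j] = f.split()
                        break
    return visited, edges_examined
```

For example, `simulated_pdfs({0: [1, 2], 1: [2], 2: []}, 0)` reaches all three vertices and examines three edges. In this sketch, the number of edges examined always equals the sum of the out-degrees of the reached vertices, however the frontier is split. That is a toy version of work efficiency: splitting redistributes pending edges but never duplicates them.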

We validate the theoretical results with an implementation and experiments. The experiments show that the algorithm performs well on a range of graphs and that it can lead to significant improvements over comparable algorithms.

Published in

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2015
985 pages
ISBN: 9781450337236
DOI: 10.1145/2807591
General Chair: Jackie Kern
Program Chair: Jeffrey S. Vetter

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SC '15 Paper Acceptance Rate: 79 of 358 submissions, 22%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
