ABSTRACT
Advances in processing power and memory technology have made multicore computers an important platform for high-performance graph-search (or graph-traversal) algorithms. Since the introduction of multicore, much progress has been made in improving parallel breadth-first search. However, less attention has been given to algorithms for unordered or loosely ordered traversals.
We present a parallel algorithm for unordered depth-first search on graphs. We prove that the algorithm is work-efficient in a realistic algorithmic model that accounts for important scheduling costs. This work-efficiency result applies to all graphs, including those with high diameter and those with high-out-degree vertices. The algorithmic techniques behind this result include a new data structure for representing the frontier of vertices in depth-first search, a new amortization technique for controlling excess parallelism, and an adaptation of the lazy-splitting technique to depth-first search.
We validate the theoretical results with an implementation and experiments. The experiments show that the algorithm performs well on a range of graphs and that it can lead to significant improvements over comparable algorithms.
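To make the idea of an unordered traversal with a split frontier concrete, here is a minimal sequential sketch. It is not the algorithm from the paper: the function name `unordered_dfs`, the `num_workers` parameter, the round-robin simulation of workers, and the "take half of the largest frontier" rule are all assumptions chosen for illustration, and the paper's frontier data structure and amortization technique are not modeled.

```python
def unordered_dfs(graph, source, num_workers=2):
    """Return the set of vertices reachable from `source`.

    Hypothetical sketch: workers are simulated round-robin in one thread.
    Each worker owns a stack (its share of the frontier). The visit order
    is unspecified, which is what "unordered" refers to here.
    """
    visited = {source}  # a real parallel version would claim vertices atomically
    frontiers = [[] for _ in range(num_workers)]
    frontiers[0].append(source)

    while any(frontiers):  # true while some worker still holds vertices
        for w in range(num_workers):
            stack = frontiers[w]
            if not stack:
                # Idle worker: split off half of the largest frontier.
                donor = max(frontiers, key=len)
                if len(donor) < 2:
                    continue  # nothing worth splitting right now
                half = len(donor) // 2
                stack.extend(donor[:half])
                del donor[:half]
            v = stack.pop()
            for u in graph.get(v, ()):
                if u not in visited:
                    visited.add(u)
                    stack.append(u)
    return visited


# Example graph as an adjacency mapping (assumed input format).
example = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
reachable = unordered_dfs(example, 0)
```

Because each vertex is added to `visited` before it is pushed, it enters exactly one frontier, so the total work stays proportional to the number of vertices and edges reached regardless of how the frontier is split.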
Index Terms
A work-efficient algorithm for parallel unordered depth-first search