
Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

Published: 12 February 2011

Abstract

Outside of computational science, most problems are formulated in terms of irregular data structures such as graphs, trees and sets. Unfortunately, we understand relatively little about the structure of parallelism and locality in irregular algorithms. In this paper, we study multiple algorithms for four such problems: discrete-event simulation, single-source shortest path, breadth-first search, and minimal spanning trees. We show that the algorithms can be classified into two categories that we call unordered and ordered, and demonstrate experimentally that there is a trade-off between parallelism and work efficiency: unordered algorithms usually have more parallelism than their ordered counterparts for the same problem, but they may also perform more work. Nevertheless, our experimental results show that unordered algorithms typically lead to more scalable implementations, demonstrating that less work-efficient irregular algorithms may be better for parallel execution.
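The ordered/unordered distinction can be made concrete with single-source shortest path, one of the four problems studied. The sketch below is illustrative only (the toy graph and function names are my own, not the paper's implementations): the ordered variant is Dijkstra's algorithm, which settles each node exactly once but serializes execution behind a global priority order; the unordered variant is a label-correcting algorithm, whose worklist entries can be processed in any order (and hence concurrently) at the cost of possibly relaxing a node several times.

```python
import heapq

# Toy directed graph: node -> list of (neighbor, edge weight)
graph = {
    'a': [('b', 4), ('c', 1)],
    'b': [('d', 1)],
    'c': [('b', 1), ('d', 5)],
    'd': [],
}

def sssp_ordered(source):
    """Dijkstra: nodes are processed in strict distance order via a
    priority queue, so each node is settled once (work-efficient),
    but the global order limits available parallelism."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry; node already settled
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def sssp_unordered(source):
    """Label-correcting (Bellman-Ford-style): the worklist imposes no
    order, so any subset of its nodes could be relaxed in parallel,
    but a node may be re-enqueued and re-relaxed (extra work)."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    work = [source]
    while work:
        u = work.pop()           # any element would do -- no ordering
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                work.append(v)   # re-enqueue: possible repeated work
    return dist

print(sssp_ordered('a') == sssp_unordered('a'))  # both reach the same fixed point
```

Both routines compute identical distances; they differ only in scheduling. This is the trade-off the abstract describes: the unordered worklist exposes more parallelism because independent relaxations need no coordination, while the ordered queue is more work-efficient on a single thread.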

