research-article

How much parallelism is there in irregular applications?

Published: 14 February 2009

Abstract

Irregular programs are programs organized around pointer-based data structures such as trees and graphs. Recent investigations by the Galois project have shown that many irregular programs have a generalized form of data-parallelism called amorphous data-parallelism. However, in many programs, amorphous data-parallelism cannot be uncovered using static techniques, and its exploitation requires runtime strategies such as optimistic parallel execution. This raises a natural question: how much amorphous data-parallelism actually exists in irregular programs?

In this paper, we describe the design and implementation of a tool called ParaMeter that produces parallelism profiles for irregular programs. Parallelism profiles are an abstract measure of the amount of amorphous data-parallelism at different points in the execution of an algorithm, independent of implementation details such as the number of cores, cache sizes, and load balancing. ParaMeter can also generate constrained parallelism profiles for a fixed number of cores. We show parallelism profiles for seven irregular applications, and explain how these profiles provide insight into the behavior of these applications.
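To make the idea of a parallelism profile concrete, here is a minimal sketch in Python. It is not ParaMeter's implementation: it assumes the work items and their pairwise conflicts are known up front as a static, symmetric conflict graph, whereas real irregular programs create work as they run. The function name `parallelism_profile` and the `max_cores` parameter are hypothetical, introduced only for illustration. Passing `max_cores` caps each round, which mimics a constrained profile for a fixed number of cores.

```python
def parallelism_profile(conflicts, max_cores=None):
    """Return the number of work items executed in each round.

    conflicts: dict mapping each work item to the set of items it
               conflicts with (assumed symmetric; a hypothetical input).
    max_cores: optional cap on items per round (constrained profile).
    """
    if max_cores is not None and max_cores < 1:
        raise ValueError("max_cores must be at least 1")

    remaining = set(conflicts)
    profile = []
    while remaining:
        chosen = []
        # Greedily build a set of mutually non-conflicting items.
        for node in sorted(remaining):
            if max_cores is not None and len(chosen) >= max_cores:
                break
            if all(other not in conflicts[node] for other in chosen):
                chosen.append(node)
        profile.append(len(chosen))
        # Executed items leave the system; their conflicts disappear.
        remaining.difference_update(chosen)
    return profile


# Four items on a chain: each item conflicts with its neighbours.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(parallelism_profile(chain))               # [2, 2]
print(parallelism_profile(chain, max_cores=1))  # [1, 1, 1, 1]
```

The unconstrained call reports how much independent work exists per round regardless of hardware, while the capped call shows how the same work stretches over more rounds when only one core is available.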


      Reviews

      Wolfgang Schreiner

      Most success in parallel computing is achieved with regular data-parallel algorithms, where the same task is performed independently on all elements of a data structure, typically a dense vector or matrix. However, it is not clear a priori how much parallelism is inherent in irregular applications with amorphous data parallelism; such applications operate on pointer-based graphs where, due to algorithmic constraints, only a subset of the graph nodes can be processed simultaneously at any given time. To analyze such algorithms, Kulkarni et al. developed the ParaMeter tool, a simulator for programs written in a sequential style, extended by an iteration construct for the parallel processing of all elements of a set. The simulation proceeds in a number of rounds: in every round, essentially, a maximal set of independent nodes is selected, and then the effect of processing these nodes in parallel is determined. The number of independent nodes per round thus serves as a measure of the amount of parallelism inherent in the algorithm. As the authors point out, the resulting parallelism profile is influenced by the tool's greedy scheduling strategy and by the particular (random) choice of a maximal independent set of nodes per round (there may be better choices that yield more parallelism). However, the paper contains sample analyses of six algorithms, five of which seem insensitive to these choices; only for the last one does the tool significantly underestimate the available parallelism. Therefore, the tool is certainly valuable as an initial guide for the development of irregular parallel applications.

      Online Computing Reviews Service
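The round-based simulation described in the review can be sketched as follows. This is an illustrative model under stated assumptions, not the tool's actual code: each work item is assumed to touch a known "neighborhood" of data elements, two items are treated as independent when their neighborhoods are disjoint, and processing an item may create new work. The names `simulate_rounds`, `neighborhood`, and `process` are hypothetical.

```python
import random


def simulate_rounds(initial_items, neighborhood, process, seed=0):
    """Return the number of work items executed in each round.

    neighborhood(item): set of data elements the item touches (hypothetical).
    process(item): iterable of new work items created by the item (hypothetical).
    """
    rng = random.Random(seed)
    worklist = list(initial_items)
    profile = []
    while worklist:
        # A random visiting order yields a random maximal independent set.
        rng.shuffle(worklist)
        touched = set()          # elements claimed by items chosen this round
        chosen, deferred = [], []
        for item in worklist:
            hood = neighborhood(item)
            if touched.isdisjoint(hood):
                touched |= hood
                chosen.append(item)
            else:
                deferred.append(item)   # conflicts with a chosen item
        profile.append(len(chosen))
        # Determine the effect of the round: collect newly created work.
        new_work = [w for item in chosen for w in process(item)]
        worklist = deferred + new_work
    return profile


# Example: item i touches elements i and i + 1, and creates no new work.
profile = simulate_rounds(
    range(6),
    neighborhood=lambda i: {i, i + 1},
    process=lambda i: [],
    seed=1,
)
print(profile)
```

Running the example with different `seed` values can produce different profiles, which illustrates the review's point that the measured parallelism depends on which maximal independent set happens to be chosen in each round.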

      • Published in

        ACM SIGPLAN Notices, Volume 44, Issue 4
        PPoPP '09
        April 2009
        294 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1594835

        PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
        February 2009
        322 pages
        ISBN: 9781605583976
        DOI: 10.1145/1504176

        Copyright © 2009 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
