Optimistic parallelism benefits from data partitioning

Published: 01 March 2008
Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations.

The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detecting conflicts and rolling back iterations both require information from class implementors.
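The idea of detecting conflicts and rolling back can be illustrated with a minimal sketch. It is not the Galois runtime or its API: the names `Iteration`, `Conflict`, `acquire`, `write`, `rollback`, and `release` are hypothetical, and the sketch assumes the simplest possible conflict rule (exclusive ownership of each touched element) together with an undo log. The scheme described in the abstract relies on information supplied by class implementors, which this sketch does not model.

```python
class Conflict(Exception):
    """Raised when an iteration touches an element owned by another iteration."""


class Iteration:
    """One speculative iteration: tracks owned elements and an undo log."""

    def __init__(self, ident, owners):
        self.ident = ident
        self.owners = owners   # shared map: element -> id of owning iteration
        self.owned = []        # elements acquired by this iteration
        self.undo_log = []     # callables that reverse each update, in order

    def acquire(self, element):
        holder = self.owners.get(element)
        if holder is not None and holder != self.ident:
            raise Conflict(element)
        if holder is None:
            self.owners[element] = self.ident
            self.owned.append(element)

    def write(self, store, element, value):
        self.acquire(element)
        old = store[element]
        # Record how to undo this update before performing it.
        self.undo_log.append(lambda: store.__setitem__(element, old))
        store[element] = value

    def rollback(self):
        # Undo updates in reverse order, then give up ownership.
        for undo in reversed(self.undo_log):
            undo()
        self.release()

    def release(self):
        # Called on commit (and at the end of rollback).
        for element in self.owned:
            del self.owners[element]
        self.owned.clear()
        self.undo_log.clear()


def demo():
    store = {"a": 1, "b": 2, "c": 3}
    owners = {}
    first, second = Iteration(1, owners), Iteration(2, owners)
    first.write(store, "a", 10)
    first.write(store, "b", 20)
    second.write(store, "c", 30)
    try:
        second.write(store, "b", 99)   # "b" is owned by the first iteration
    except Conflict:
        second.rollback()              # restores "c" to its old value
    after_rollback = dict(store)
    first.release()                    # the first iteration commits
    second.write(store, "c", 30)       # the second iteration is retried
    second.write(store, "b", 99)
    second.release()
    return after_rollback, store, owners
```

In this sketch the two iterations are interleaved by hand in a single thread so that the conflict is deterministic; a real runtime would run them on separate threads and would need the ownership map itself to be thread-safe.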

In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
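As a rough illustration of how the four mechanisms fit together, the sketch below uses a toy one-dimensional data structure in which an iteration on element `e` touches `e - 1`, `e`, and `e + 1`. Everything here is an assumption made for illustration: the function names, the block partitioner, the round-robin mapping of partitions to workers, and the neighborhood shape are hypothetical and are not taken from the paper.

```python
def make_partitions(num_elements, num_workers, over_decomposition):
    """Data partitioning with over-decomposition.

    Creates more partitions than workers (num_workers * over_decomposition)
    and maps each partition to a worker round-robin.
    """
    num_partitions = num_workers * over_decomposition
    block = -(-num_elements // num_partitions)  # ceiling division
    part_of = [e // block for e in range(num_elements)]
    owner_of = [p % num_workers for p in range(num_partitions)]
    return part_of, owner_of


def assign_work(worklist, part_of, owner_of, num_workers):
    """Data-centric work assignment.

    Each work item goes to the worker that owns the partition
    containing the item's element.
    """
    queues = [[] for _ in range(num_workers)]
    for element in worklist:
        queues[owner_of[part_of[element]]].append(element)
    return queues


def locks_needed(element, part_of):
    """Lock coarsening.

    Instead of one lock per touched element, an iteration acquires
    one lock per touched partition.
    """
    neighborhood = (element - 1, element, element + 1)
    return {part_of[n] for n in neighborhood if 0 <= n < len(part_of)}


part_of, owner_of = make_partitions(num_elements=16, num_workers=2,
                                    over_decomposition=2)
queues = assign_work(range(16), part_of, owner_of, num_workers=2)
```

With these parameters there are four partitions of four elements each. An iteration in the interior of a partition, such as `locks_needed(5, part_of)`, needs a single partition lock, while one on a boundary, such as `locks_needed(4, part_of)`, needs two. Because each worker owns more than one partition, a worker that cannot make progress in one partition still has work available in another.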
