
Exploiting coarse-grain speculative parallelism

Published: 22 October 2011

Abstract

Speculative execution at coarse granularities (e.g., code-blocks, methods, algorithms) offers a promising programming model for exploiting parallelism on modern architectures. In this paper we present Anumita, a framework that includes programming constructs and a supporting runtime system to enable the use of coarse-grain speculation to improve program performance, without burdening the programmer with the complexity of creating, managing and retiring speculations. Speculations may be composed by specifying surrogate code blocks at any arbitrary granularity, which are then executed concurrently, with a single winner ultimately modifying program state. Anumita provides expressive semantics for winner selection that go beyond time to solution to include user-defined notions of quality of solution. Anumita can be used to improve the performance of hard to parallelize algorithms whose performance is highly dependent on input data. Anumita is implemented as a user-level runtime with programming interfaces to C, C++, Fortran and as an OpenMP extension. Performance results from several applications show the efficacy of using coarse-grain speculation to achieve (a) robustness when surrogates fail and (b) significant speedup over static algorithm choices.
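The pattern the abstract describes, running several surrogate implementations of the same computation concurrently and committing only a single winner, can be sketched in a few lines. The sketch below is illustrative only: the function names, the thread-pool execution, and the quality callback are assumptions for exposition, not Anumita's actual C/C++/Fortran/OpenMP API. It shows both winner-selection policies mentioned in the abstract: first-to-finish (time to solution) and best score under a user-defined quality metric.

```python
import concurrent.futures

def speculate(surrogates, data):
    """Run all surrogates concurrently; the first to finish wins and its
    result is committed (time-to-solution winner selection)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(surrogates)) as pool:
        futures = [pool.submit(fn, data) for fn in surrogates]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # best effort: discard losing speculations
        return next(iter(done)).result()

def speculate_best(surrogates, data, quality):
    """Run every surrogate to completion and commit the result with the
    highest user-defined score (quality-of-solution winner selection)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(surrogates)) as pool:
        futures = [pool.submit(fn, data) for fn in surrogates]
        results = [f.result() for f in futures]
    return max(results, key=quality)
```

In the real framework the surrogates would additionally run in isolation, with only the winner's updates reaching program state; this sketch sidesteps that by having surrogates return values rather than mutate shared data.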

