Abstract
Speculative execution at coarse granularities (e.g., code blocks, methods, algorithms) offers a promising programming model for exploiting parallelism on modern architectures. In this paper we present Anumita, a framework that includes programming constructs and a supporting runtime system to enable the use of coarse-grain speculation to improve program performance, without burdening the programmer with the complexity of creating, managing, and retiring speculations. Speculations may be composed by specifying surrogate code blocks at arbitrary granularity, which are then executed concurrently, with a single winner ultimately modifying program state. Anumita provides expressive semantics for winner selection that go beyond time to solution to include user-defined notions of quality of solution. Anumita can be used to improve the performance of hard-to-parallelize algorithms whose performance is highly dependent on input data. Anumita is implemented as a user-level runtime with programming interfaces to C, C++, and Fortran, and as an OpenMP extension. Performance results from several applications show the efficacy of using coarse-grain speculation to achieve (a) robustness when surrogates fail and (b) significant speedup over static algorithm choices.
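The execution model described above (concurrent surrogates, a single winner that commits, and a user-defined notion of acceptable quality) can be illustrated with a minimal sketch. This is not Anumita's API: the function `speculate`, the `accept` predicate, and the three surrogates are hypothetical names chosen for illustration, and Python threads stand in for the runtime's own isolation mechanism. Each surrogate works on a private copy of the state, and only the first surrogate whose result passes `accept` writes back.

```python
import copy
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def speculate(state, surrogates, accept=lambda result: True):
    """Run every surrogate on a private copy of `state`; commit one winner.

    Hypothetical helper, not part of Anumita. The first surrogate that
    finishes without raising and whose result passes `accept` wins, and its
    private copy replaces the contents of `state`. Returns
    (winner_name, result), or (None, None) if no surrogate qualifies.
    """
    def run(fn):
        private = copy.deepcopy(state)  # isolate speculative writes
        return fn(private), private

    with ThreadPoolExecutor(max_workers=len(surrogates)) as pool:
        futures = {pool.submit(run, fn): fn.__name__ for fn in surrogates}
        for fut in as_completed(futures):
            try:
                result, private = fut.result()
            except Exception:
                continue  # a failed surrogate is discarded
            if accept(result):
                state.clear()
                state.update(private)  # only the winner modifies state
                # Losing threads cannot be killed here; they run to
                # completion and their private copies are dropped.
                return futures[fut], result
    return None, None


# Three hypothetical surrogates for the same task.
def fast_but_fragile(s):
    raise RuntimeError("did not converge")


def quick_low_quality(s):
    s["x"] = 7
    return 7


def slow_but_robust(s):
    time.sleep(0.05)
    s["x"] = 42
    return 42
```

Calling `speculate({"x": 0}, [fast_but_fragile, quick_low_quality, slow_but_robust], accept=lambda r: r >= 10)` selects `slow_but_robust`: the fragile surrogate fails, the quick one is rejected on quality, and only the robust surrogate's copy of the state is committed. With the default `accept`, selection reduces to time to solution among the surrogates that succeed.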
Exploiting coarse-grain speculative parallelism
OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications