Abstract
This paper explores how fork-join parallelism, as supported by concurrency platforms such as Cilk and OpenMP, can be embedded into a compiler's intermediate representation (IR). Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations across parallel control constructs. Remedying this situation is generally thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics.
Tapir is a compiler IR that represents logically parallel tasks asymmetrically in the program's control flow graph. Tapir allows the compiler to optimize across parallel control constructs with only minor changes to its existing analyses and code transformations. To prototype Tapir in the LLVM compiler, for example, we added or modified about 6000 lines of LLVM's 4-million-line codebase. Tapir enables LLVM's existing compiler optimizations for serial code -- including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination -- to work with parallel control constructs such as spawning and parallel loops. Tapir also supports parallel optimizations such as loop scheduling.
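As a rough illustration of the problem described above, the following sketch contrasts a parallel loop whose body has been packaged into a closure and handed to a runtime call with the form that loop-invariant code motion would ideally produce. It is not Tapir or LLVM IR: the names `parallel_for`, `normalize_lowered`, and `normalize_hoisted` are hypothetical, and a Python thread pool merely stands in for a Cilk or OpenMP runtime.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def norm(values):
    """Euclidean norm. It depends only on `values`, so it is loop-invariant below."""
    return math.sqrt(sum(v * v for v in values))


def parallel_for(lo, hi, body, workers=4):
    """Hypothetical stand-in for a parallel-runtime entry point.

    Forks one task per index in [lo, hi) and joins them all before returning.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises any exception from a task;
        # leaving the `with` block waits for every task (the join).
        list(pool.map(body, range(lo, hi)))


def normalize_lowered(values):
    """Loop body hidden behind a runtime call: norm() is recomputed per iteration."""
    out = [0.0] * len(values)

    def body(i):
        out[i] = values[i] / norm(values)

    parallel_for(0, len(values), body)
    return out


def normalize_hoisted(values):
    """The result loop-invariant code motion aims for: norm() is computed once."""
    out = [0.0] * len(values)
    n = norm(values)  # hoisted out of the parallel loop

    def body(i):
        out[i] = values[i] / n

    parallel_for(0, len(values), body)
    return out
```

Both functions return the same result, but the first does quadratic work because the invariant call sits inside an opaque closure. The abstract's claim is that representing the parallel loop in the IR itself lets the existing serial optimization perform this hoisting automatically.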
Tapir: Embedding Fork-Join Parallelism into LLVM's Intermediate Representation. In PPoPP '17: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.