Abstract
Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures for which coarse-grained parallelism and locality are crucial, consider only a subspace of transformations to avoid a combinatorial explosion in finding transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations: in particular, transformation compositions involving loop reversals and loop skewing by negative factors. In addition, there is currently no proof that the algorithm successfully finds a tree of permutable loop bands for all affine loop nests. In this article, we propose an approach to address these two issues (1) by modeling a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, and (2) by extending the Pluto algorithm in a way that allows a proof for its soundness and completeness for all affine loop nests. We perform an experimental evaluation of both, the effect on compilation time, and performance of generated codes. The evaluation shows that our new framework, Pluto+, provides no degradation in performance for any benchmark from Polybench. For the Lattice Boltzmann Method (LBM) simulations with periodic boundary conditions, it provides a mean speedup of 1.33 × over Pluto. We also show that Pluto+ does not increase compilation time significantly. Experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases in which it improves execution time significantly, it increased polyhedral optimization time by only 2.04 × .
- Aravind Acharya and Uday Bondhugula. 2015. PLUTO+: Near-complete modeling of affine transformations for parallelism and locality. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’15). ACM, New York, NY. Google Scholar
Digital Library
- N. Ahmed, Nikolay Mateev, and Keshav Pingali. 2001. Synthesizing transformations for locality enhancement of imperfectly-nested loops. International Journal of Parallel Programming 29, 5. Google Scholar
Digital Library
- Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, and Monica S. Lam. 2006. Compilers: Principles, Techniques, and Tools (2nd Edition). Prentice Hall, Upper Saddle River, NJ. Google Scholar
Digital Library
- Corinne Ancourt and Francois Irigoin. 1991. Scanning polyhedra with DO loops. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, New York, NY, 39--50. Google Scholar
Digital Library
- Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula. 2012. Tiling stencil computations to maximize parallelism. In Supercomputing. Article 40, 11 pages. Google Scholar
Digital Library
- Cédric Bastoul. 2004. Code generation in the polyhedral model is easier than you think. In International Conference on Parallel Architectures and Compilation Techniques (PACT). 7--16. DOI:http://dx.doi.org/10.1109/PACT.2004.11 Google Scholar
Digital Library
- Uday Bondhugula. 2008. Effective Automatic Parallelization and Locality Optimization Using the Polyhedral Model. Ph.D. Dissertation. The Ohio State University, Columbus, OH. Google Scholar
Digital Library
- Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, and Nicolas Vasilache. 2014. Tiling and optimizing time-iterated computations on periodic domains. In International Conference on Parallel Architectures and Compilation Techniques (PACT’14). 39--50. Google Scholar
Digital Library
- Uday Bondhugula, M. Baskaran, Sriram Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. 2008a. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In International Conference on Compiler Construction (ETAPS CC’08). Google Scholar
Digital Library
- Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and Lakshminarayanan Renganarayanan. 2010. A model for fusion and code motion in an automatic parallelizing compiler. In International Conference on Parallel Architectures and Compilation Techniques. 343--352. Google Scholar
Digital Library
- Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008b. A practical automatic polyhedral parallelizer and locality optimizer. In ACM SIGPLAN Symposium on Programming Languages Design and Implementation (PLDI’08). ACM, New York, NY, 101--113. Google Scholar
Digital Library
- Chun Chen. 2012. Polyhedra scanning revisited. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, NY, 499--508. Google Scholar
Digital Library
- Shiyi Chen and Gary D. Doolen. 1998. Lattice Boltzmann method for fluid flows. Annual Review of Fluid Mechanics 30, 1, 329--364.Google Scholar
Cross Ref
- C. Choffrut and K. Culik. 1983. Folding of the plane and the design of systolic arrays. Information Processing Letters 17, 3, 149--153.Google Scholar
Cross Ref
- Cloog 2004. The CLooG Code Generator in the Polyhedral Model. http://www.cloog.org.Google Scholar
- COIN-OR. 2001. Computational Infrastructure for Operations Research. Retrieved March 2, 2016 from http://www.coin-or.org.Google Scholar
- Dominique d’Humières. 2002. Multiple--relaxation--time Lattice Boltzmann models in three dimensions. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 360, 1792, 437--451.Google Scholar
- P. Feautrier. 1988. Parametric integer programming. RAIRO Recherche Opérationnelle 22, 3, 243--268.Google Scholar
Cross Ref
- P. Feautrier. 1991. Dataflow analysis of scalar and array references. International Journal of Parallel Programming 20, 1, 23--53.Google Scholar
Cross Ref
- P. Feautrier. 1992a. Some efficient solutions to the affine scheduling problem: Part I, one-dimensional time. International Journal of Parallel Programming 21, 5, 313--348. Google Scholar
Digital Library
- P. Feautrier. 1992b. Some efficient solutions to the affine scheduling problem: Part II, multidimensional time. International Journal of Parallel Programming 21, 6, 389--420. Google Scholar
Digital Library
- GNU. 2012. GLPK (GNU Linear Programming Kit). Retrieved March 2, 2016 from https://www.gnu.org/software/glpk/.Google Scholar
- Martin Griebl. 2004. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. Habilitation thesis. University of Passau, Passau, Germany.Google Scholar
- Tobias Grosser, Hongbin Zheng, Ragesh Aloor, Andreas Simbrger, Armin Grolinger, and Louis-Noël Pouchet. 2011. Polly polyhedral optimization in LLVM. In Proceedings of the 1st International Workshop on Polyhedral Compilation Techniques (IMPACT’11).Google Scholar
- Albert Hartono, Muthu Baskaran, Cédric Bastoul, Albert Cohen, Sriram Krishnamoorthy, Boyana Norris, and J. Ramanujam. 2009. A parametric multi-level tiler for imperfect loop nests. In International Conference on Supercomputing (ICS’09). Google Scholar
Digital Library
- Tom Henretty, Kevin Stock, Louis-Noël Pouchet, Franz Franchetti, J. Ramanujam, and P. Sadayappan. 2011. Data layout transformation for stencil computations on short SIMD architectures. In ETAPS International Conference on Compiler Construction (CC’11). 225--245. Google Scholar
Digital Library
- Tom Henretty, Richard Veras, Franz Franchetti, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. 2013. A stencil compiler for short-vector SIMD architectures. In ACM International Conference on Supercomputing. ACM, New York, NY. Google Scholar
Digital Library
- F. Irigoin and R. Triolet. 1988. Supernode partitioning. In ACM SIGPLAN Principles of Programming Languages. ACM, New York, NY, 319--329. Google Scholar
Digital Library
- W. Kelly and W. Pugh. 1995. A Unifying Framework for Iteration Reordering Transformations. Technical Report CS-TR-3430. Department of Computer Science, University of Maryland, College Park, MD. Google Scholar
Digital Library
- W. Kelly, W. Pugh, and E. Rosser. 1995. Code generation for multiple mappings. In International Symposium on the Frontiers of Massively Parallel Computation. 332--341. Google Scholar
Digital Library
- Martin Kong, Richard Veras, Kevin Stock, Franz Franchetti, Louis-Noël Pouchet, and P. Sadayappan. 2013. When polyhedral transformations meet SIMD code generation. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’13). ACM, New York, NY. Google Scholar
Digital Library
- A. Leung, N. T. Vasilache, B. Meister, and R. A. Lethin. 2010. Methods and apparatus for joint parallelism and locality optimization in source code compilation. (June 2010). US Patent 2010/0070956 A1. Filed September 16, 2009, Issued March 18, 2010.Google Scholar
- Wei Li and Keshav Pingali. 1994. A singular loop transformation framework based on non-singular matrices. International Journal of Parallel Programming 22, 2, 183--205. Google Scholar
Digital Library
- A. Lim, Gerald I. Cheong, and Monica S. Lam. 1999. An affine partitioning algorithm to maximize parallelism and minimize communication. In ACM International Conference on Supercomputing (ICS’99). 228--237. Google Scholar
Digital Library
- A. Lim and Monica S. Lam. 1997. Maximizing parallelism and minimizing synchronization with affine transforms. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 201--214. Google Scholar
Digital Library
- A. Lim and Monica S. Lam. 1998. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing 24, 3--4, 445--475. Google Scholar
Digital Library
- Robin Lougee-Heimer. 2003. The common optimization interface for operations research. IBM Journal of Research and Development 47, 1, 57--66. Google Scholar
Digital Library
- Benoit Meister, Allen Leung, Nicolas Vasilache, David Wohlford, Cédric Bastoul, and Richard Lethin. 2009. Productivity via automatic code generation for PGAS platforms with the R-stream compiler. In Workshop on Asynchrony in the PGAS Programming Model.Google Scholar
- Benoît Meister, Nicolas Vasilache, David Wohlford, Muthu Baskaran, Allen Leung, and Richard Lethin. 2011. R-stream compiler. In Encyclopedia of Parallel Computing. 1756--1765.Google Scholar
- Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. 2015. PolyMage: Automatic optimization for image processing pipelines. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). Google Scholar
Digital Library
- Nissa Osheim, Michelle Mills Strout, Dave Rostron, and Sanjay Rajopadhye. 2008. Smashing: Folding space to tile through time. In Workshop on Languages and Compilers for Parallel Computing (LCPC’08). Springer, Berlin, 80--93. Google Scholar
Digital Library
- Palabos. 2012. Palabos. Retrieved March 2, 2016 from http://www.palabos.org/.Google Scholar
- Irshad Pananilath, Aravind Acharya, Vinay Vasista, and Uday Bondhugula. 2015. An optimizing code generator for a class of Lattice-Boltzmann computations. ACM Transactions on Architecture and Code Optimization. 12, 2, 14:1--14:23. Google Scholar
Digital Library
- Pluto 2008. PLUTO: An automatic parallelizer and locality optimizer for affine loop nests. http://pluto-compiler.sourceforge.net.Google Scholar
- Polybench 2010. Polybench suite. Retrieved March 2, 2016 from http://polybench.sourceforge.net.Google Scholar
- L.-N. Pouchet, Cédric Bastoul, John Cavazos, and Albert Cohen. 2008. Iterative optimization in the polyhedral model: Part II, multidimensional time. In ACM SIGPLAN Symposium on Programming Languages Design and Implementation (PLDI). Google Scholar
Digital Library
- Louis-Noël Pouchet, Uday Bondhugula, Cédric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan, and Nicolas Vasilache. 2011. Loop transformations: Convexity, pruning and optimization. In ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’11). Google Scholar
Digital Library
- Fabien Quilleré, Sanjay V. Rajopadhye, and Doran Wilde. 2000. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming 28, 5, 469--498. Google Scholar
Digital Library
- Gabe Rudy, Malik Murtaza Khan, Mary Hall, Chun Chen, and Jacqueline Chame. 2011. A programming language interface to describe transformations and code generation. In Proceedings of the 23rd International Conference on Languages and Compilers for Parallel Computing (LCPC’10). Springer, Berlin, 136--150. Google Scholar
Digital Library
- Robert Sadourny. 1975. The dynamics of finite-difference models of the shallow-water equations. Journal of the Atmospheric Sciences 32, 4.Google Scholar
Cross Ref
- Alexander Schrijver. 1986. Theory of Linear and Integer Programming. John Wiley & Sons, Hoboken, NJ. Google Scholar
Digital Library
- Robert Strzodka, Mohammed Shaheen, Dawid Pajak, and Hans-Peter Seidel. 2011. Cache accurate time skewing in iterative stencil computations. In International Conference on Parallel Processing (ICPP’11). 571--581. Google Scholar
Digital Library
- Paul N. Swarztrauber. 2000. 171.swim SPEC CPU2000 Benchmark Description File. Standard Performance Evaluation Corporation. Retrieved March 2, 2016 from http://www.spec.org/cpu2000/CFP2000/171.swim/docs/171.swim.html.Google Scholar
- Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The Pochoir stencil compiler. In ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). New York, NY, 117--128. Google Scholar
Digital Library
- Nicolas Vasilache. 2007. Scalable Program Optimization Techniques in the Polyhedral Model. Ph.D. Dissertation. Université de Paris-Sud, INRIA Futurs, Orsay, France.Google Scholar
- Nicolas Vasilache, Benoit Meister, Muthu Baskaran, and Richard Lethin. 2012. Joint scheduling and layout optimization to enable multi-level vectorization. In International Workshop on Polyhedral Compilation Techniques (IMPACT’12).Google Scholar
- Sven Verdoolaege. 2010. ISL: An integer set library for the polyhedral model. In Mathematical Software—ICMS 2010, Komei Fukuda, Joris Hoeven, Michael Joswig, and Nobuki Takayama (Eds.). Lecture Series in Computer Science, Vol. 6327. Springer, Berlin, 299--302. Google Scholar
Digital Library
- Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. 2013. Polyhedral parallel code generation for CUDA. ACM Transactions on Architecture and Code Optimization (TACO) 9, 4, 54:1--54:23. Google Scholar
Digital Library
- M. Wolf and Monica S. Lam. 1991. A data locality optimizing algorithm. In ACM SIGPLAN Symposium on Programming Languages Design and Implementation. ACM, New York, NY, 30--44. Google Scholar
Digital Library
- D. Wonnacott. 2000. Using time skewing to eliminate idle time due to memory bandwidth and network limitations. In IPDPS. 171--180. Google Scholar
Digital Library
- Yoav Yaacoby and Peter R. Cappello. 1995. Converting affine recurrence equations to quasi-uniform recurrence equations. VLSI Signal Processing 11, 1--2, 113--131. Google Scholar
Digital Library
- Tomofumi Yuki. 2014. Understanding Polybench/C 3.2 kernels. In International Workshop on Polyhedral Compilation Techniques (IMPACT’14).Google Scholar
- Tomofumi Yuki, Gautam Gupta, DaeGon Kim, Tanveer Pathan, and Sanjay Rajopadhye. 2013. AlphaZ: A system for design space exploration in the polyhedral model. In Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, Vol. 7760. Springer, Berlin, 17--31.Google Scholar
- Yutao Zhong, Xipeng Shen, and Chen Ding. 2009. Program locality analysis using reuse distance. ACM Transactions on Programming Languages and Systems 31, 6, Article 20, 39 pages. Google Scholar
Digital Library
- Qisu Zou and Xiaoyi He. 1997. On pressure and velocity boundary conditions for the Lattice Boltzmann BGK model. Physics of Fluids (1994-present) 9, 6, 1591--1598.Google Scholar
Recommendations
A practical automatic polyhedral parallelizer and locality optimizer
PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and ImplementationWe present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this ...
A practical automatic polyhedral parallelizer and locality optimizer
PLDI '08We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this ...
PLUTO+: near-complete modeling of affine transformations for parallelism and locality
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingAffine transformations have proven to be very powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multi-dimensional affine function can represent a long and complex sequence of simpler ...






Comments