Abstract
Compiler scalability is a well-known problem: reasoning about the application of useful optimizations over large program scopes consumes too much time and memory during compilation. The problem is exacerbated in polyhedral compilers, which use powerful yet costly integer programming algorithms to compose loop optimizations. As a result, the benefits a polyhedral compiler has to offer to programs with long sequences of loop nests, such as real scientific applications, remain out of reach for most users. In this work, we address this scalability problem in polyhedral compilers. We identify three causes of unscalability, each of which stems from the large number of statements and dependences in the program scope. We propose a one-shot solution that reduces the effective number of statements and dependences as seen by the compiler: we represent each sequence of statements in the program by a single super-statement. The resulting set of super-statements exposes the minimum constraints sufficient for the Integer Linear Programming (ILP) solver to find correct optimizations. We implement our approach in the PLuTo polyhedral compiler and find that it condenses the program statements and dependences by factors of 4.7x and 6.4x, respectively, averaged over 9 hot regions (ranging from 48 to 121 statements) in 5 real applications. As a result, compilation time and memory requirements improve by 268x and 20x, respectively, over the latest version of the PLuTo compiler. The final compile times are comparable to those of the Intel compiler, while performance is 1.92x better on average owing to the latter's conservative approach to loop optimization.
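The core idea can be illustrated with a small sketch (not the paper's implementation; all names are illustrative): given a statement-level dependence graph and a grouping of statements into super-statements, dependences between statements inside the same group disappear from the solver's view, and the many edges that cross a group boundary collapse into a single edge between super-statements.

```python
# Hedged sketch of super-statement condensation. Assumes a precomputed
# grouping of statements into super-statements; the paper's actual grouping
# and constraint construction are more involved.

def condense(statements, deps, group_of):
    """statements: list of statement ids.
    deps: set of (src, dst) dependence edges between statements.
    group_of: maps a statement id to its super-statement id."""
    supers = sorted({group_of[s] for s in statements})
    # A dependence between statements in different groups becomes one edge
    # between their super-statements; intra-group edges and duplicate
    # cross-group edges collapse away.
    super_deps = {(group_of[a], group_of[b]) for (a, b) in deps
                  if group_of[a] != group_of[b]}
    return supers, super_deps

# Toy example: 6 statements in two loop nests, grouped 3-and-3.
stmts = [0, 1, 2, 3, 4, 5]
group = {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
deps = {(0, 1), (1, 2), (0, 3), (1, 4), (2, 5)}  # intra- and inter-nest edges

supers, sdeps = condense(stmts, deps, group)
print(supers)  # ['A', 'B']   -- 6 statements reduced to 2
print(sdeps)   # {('A', 'B')} -- 5 dependences reduced to 1
```

Here 6 statements and 5 dependences condense to 2 super-statements and 1 dependence, which is the kind of reduction (4.7x and 6.4x on average in the paper's benchmarks) that shrinks the ILP problem handed to the scheduler.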
Improving compiler scalability: optimizing large programs at small price
PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation