ABSTRACT
Parameterized tiled loops, in which the tile sizes are not fixed at compile time but remain symbolic parameters until later, are quite useful for iterative compilers and "auto-tuners" that produce highly optimized libraries and codes. Tile size parameterization could also enable optimizations such as register tiling to become dynamic optimizations. Although it is easy to generate such loops for (hyper-)rectangular iteration spaces tiled with (hyper-)rectangular tiles, many important computations do not fall into this restricted domain. Parameterized tiled code generation for the general case of convex iteration spaces tiled by (hyper-)rectangular tiles has in the past been solved with bounding-box approaches or symbolic Fourier-Motzkin approaches. However, both approaches have less than ideal code generation efficiency and resulting code quality. We present the theoretical foundations, implementation, and experimental validation of a simple, unified technique for generating parameterized tiled code. Our code generation efficiency is comparable to that of all existing code generation techniques, including those for fixed tile sizes, and the resulting code is as efficient as, if not more efficient than, the code produced by all previous techniques. Thus the technique provides parameterized tiled loops for free! Our "one-size-fits-all" solution, which is available as open source software, can be adapted for use in production compilers.
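To make the term concrete, the following is a minimal sketch of a parameterized tiled loop nest over a non-rectangular (triangular) iteration space, 0 <= j <= i < n, written in the simple bounding-box style the abstract lists as a prior approach. It is not the technique presented in the paper. The function names `tiled_triangle` and `untiled_triangle` and the parameters `n`, `ti`, `tj` are hypothetical and chosen only for illustration.

```python
def tiled_triangle(n, ti, tj):
    """Visit every (i, j) with 0 <= j <= i < n, one rectangular tile at a time.

    ti and tj are ordinary run-time arguments, so the same loop nest
    serves any positive tile size without being regenerated.
    """
    visited = []
    # Tile loops: enumerate tile origins over the bounding box [0, n) x [0, n).
    for it in range(0, n, ti):
        for jt in range(0, n, tj):
            # Point loops: clamp each tile to the bounds of the triangle.
            for i in range(it, min(it + ti, n)):
                for j in range(jt, min(jt + tj, i + 1)):
                    visited.append((i, j))
    return visited


def untiled_triangle(n):
    """Reference order: the original, untiled loop nest."""
    return [(i, j) for i in range(n) for j in range(i + 1)]
```

In this sketch the tile loops also enumerate tiles that lie entirely above the diagonal and therefore contain no points. Scanning such empty tiles is one reason bounding-box code can be less efficient than code that enumerates only non-empty tiles.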
Parameterized tiled loops for free
Proceedings of the 2007 PLDI conference