Abstract
Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and autotuners that produce highly optimized libraries and codes. Although it is easy to generate such loops for (hyper-) rectangular iteration spaces tiled with (hyper-) rectangular tiles, many important computations do not fall into this restricted domain. In the past, parameterized tiled code generation for the general case of convex iteration spaces being tiled by (hyper-) rectangular tiles has been solved with bounding box approaches or with sophisticated and expensive machinery.
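The bounding box approach mentioned above can be illustrated with a minimal sketch. It assumes a triangular iteration space 0 <= j <= i < n, chosen here only as an example. The function names and the `empty_tiles` counter are hypothetical and do not come from the article. The tile sizes `si` and `sj` are ordinary runtime values, which is what makes the loop nest parameterized.

```python
def untiled(n):
    """Reference enumeration of the triangle 0 <= j <= i < n."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def tiled_bounding_box(n, si, sj):
    """Visit the same triangle with si x sj rectangular tiles.

    The tile loops scan the bounding box [0, n) x [0, n); the point
    loops clamp to both the tile and the original loop bounds.
    """
    points = []
    empty_tiles = 0
    for ti in range(0, n, si):          # tile origins along i
        for tj in range(0, n, sj):      # tile origins along j (whole box)
            before = len(points)
            for i in range(ti, min(ti + si, n)):
                for j in range(tj, min(tj + sj, i + 1)):
                    points.append((i, j))
            if len(points) == before:
                empty_tiles += 1        # tile lies entirely outside the triangle
    return points, empty_tiles
```

Calling `tiled_bounding_box(10, 3, 4)` visits exactly the points of `untiled(10)`, but some tiles of the bounding box contain no iterations at all. For non-rectangular spaces this wasted tile-loop work is the usual cost of the bounding box approach.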
We present a novel formulation of the parameterized tiled loop generation problem using a polyhedral set called the outset. By reducing the problem of parameterized tiled code generation to that of generating standard loops followed by simple postprocessing of these loops, the outset method achieves a code generation efficiency comparable to that of existing code generation techniques, including those for fixed tile sizes. We compare the performance of our technique with several other tiled loop generation methods on kernels from BLAS3 and scientific computations. The simplicity of our solution makes it well suited for use in production compilers; in particular, the IBM XL compiler uses the inset-based technique introduced in this article for register tiling. We also provide complete coverage of parameterized tiling of perfect loop nests by describing three related techniques: (i) a scheme for separating full and partial tiles; (ii) a scheme for generating tiled loops directly from the abstract syntax tree representation of loops; and (iii) a formal characterization of parameterized loop tiling using bilinear forms and a Symbolic Fourier-Motzkin Elimination (SFME)-based parameterized tiled loop generation method.
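To give a feel for how restricting the tile loops to a polyhedral set of tile origins reduces the problem to standard loop generation, here is a hedged sketch on the triangle 0 <= j <= i < n. The construction is an assumption made for illustration and is not taken from this abstract: the constraint i - j >= 0 is relaxed by si - 1 in the i direction, giving tj <= ti + si - 1, which then serves as an ordinary bound of the tj loop. The function name and the `empty_tiles` counter are hypothetical.

```python
def tiled_relaxed(n, si, sj):
    """Tile the triangle 0 <= j <= i < n, scanning only tile origins
    that satisfy the relaxed constraint tj <= ti + si - 1.

    Assumed construction for illustration only; si and sj stay
    runtime parameters throughout.
    """
    points = []
    empty_tiles = 0
    for ti in range(0, n, si):
        # The relaxed constraint appears as a plain upper bound on tj.
        for tj in range(0, min(n, ti + si), sj):
            before = len(points)
            for i in range(ti, min(ti + si, n)):
                for j in range(tj, min(tj + sj, i + 1)):
                    points.append((i, j))
            if len(points) == before:
                empty_tiles += 1
    return points, empty_tiles
```

For this particular triangle the relaxed bound skips every empty tile, so the tile loops do no wasted work. For other iteration spaces such a relaxation may still admit some empty tiles, so the sketch should not be read as a general guarantee.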
Index Terms
Parameterized loop tiling