Parameterized loop tiling

Published: 04 May 2012

Abstract

Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and autotuners that produce highly optimized libraries and codes. Although it is easy to generate such loops for (hyper-) rectangular iteration spaces tiled with (hyper-) rectangular tiles, many important computations do not fall into this restricted domain. In the past, parameterized tiled code generation for the general case of convex iteration spaces being tiled by (hyper-) rectangular tiles has been solved with bounding box approaches or with sophisticated and expensive machinery.
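As a minimal sketch of the easy case mentioned above, the following Python function tiles a rectangular iteration space with rectangular tiles whose sizes stay ordinary runtime arguments. The function name `tiled_rect` and its parameters are hypothetical, chosen only for this illustration; a production compiler would emit equivalent loops in C or Fortran.

```python
def tiled_rect(n, m, si, sj):
    """Visit the rectangle 0 <= i < n, 0 <= j < m tile by tile.

    si and sj are plain runtime arguments, so the same code serves
    every tile size an autotuner wants to try.
    """
    order = []
    # Tile loops: enumerate tile origins, stepping by the tile size.
    for ti in range(0, n, si):
        for tj in range(0, m, sj):
            # Point loops: scan one tile, clamped to the iteration space
            # so that partial tiles at the boundary stay in bounds.
            for i in range(ti, min(ti + si, n)):
                for j in range(tj, min(tj + sj, m)):
                    order.append((i, j))
    return order


if __name__ == "__main__":
    # Same code, two different tile sizes chosen at run time.
    print(len(tiled_rect(5, 7, 2, 3)))
    print(len(tiled_rect(5, 7, 4, 4)))
```

Because the tile origins of a rectangle are simply the multiples of the tile sizes inside the loop bounds, no polyhedral machinery is needed here; the difficulty described in the abstract arises once the bounds of one loop depend on the indices of another.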

We present a novel formulation of the parameterized tiled loop generation problem using a polyhedral set called the outset. By reducing the problem of parameterized tiled code generation to that of generating standard loops and simple postprocessing of these loops, the outset method achieves a code generation efficiency that is comparable to existing code generation techniques, including those for fixed tile sizes. We compare the performance of our technique with several other tiled loop generation methods on kernels from BLAS3 and scientific computations. The simplicity of our solution makes it well suited for use in production compilers; in particular, the IBM XL compiler uses the inset-based technique introduced in this article for register tiling. We also provide complete coverage of parameterized tiling of perfect loop nests by describing three related techniques: (i) a scheme for separating full and partial tiles; (ii) a scheme for generating tiled loops directly from the abstract syntax tree representation of loops; and (iii) a formal characterization of parameterized loop tiling using bilinear forms and a Symbolic Fourier-Motzkin Elimination (SFME)-based parameterized tiled loop generation method.
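The abstract does not define the outset or the inset, so the sketch below is only an illustration of the general idea on one hand-worked example, not the article's algorithm. It assumes the triangular space 0 <= j <= i < n, and it assumes that a set of candidate tile origins can be obtained by relaxing the bound j <= i by (si - 1), while a tile counts as full when every one of its points satisfies all bounds. All names (`tiled_triangle`, `bounding_box_tiles`) are hypothetical.

```python
def bounding_box_tiles(n, si, sj):
    """Tiles a bounding-box scheme would enter for the n x n box."""
    return -(-n // si) * -(-n // sj)  # ceil(n/si) * ceil(n/sj)


def tiled_triangle(n, si, sj):
    """Tile the triangle 0 <= j <= i < n with si x sj rectangular tiles.

    Returns (points in visit order, tiles entered, full tiles).
    """
    points, tiles, full = [], 0, 0
    for ti in range(0, n, si):
        # Relaxed bound (assumption of this sketch): a tile at (ti, tj)
        # can hold a point with j <= i only if tj <= ti + si - 1.
        for tj in range(0, ti + si, sj):
            tiles += 1
            # Full-tile test: the largest i fits below n, and the largest
            # j in the tile does not exceed the smallest i in the tile.
            if ti + si <= n and tj + sj - 1 <= ti:
                full += 1
                # Full tile: point loops need no min/max clamping.
                for i in range(ti, ti + si):
                    for j in range(tj, tj + sj):
                        points.append((i, j))
            else:
                # Partial (possibly empty) tile: clamp to the triangle.
                for i in range(ti, min(ti + si, n)):
                    for j in range(tj, min(tj + sj, i + 1)):
                        points.append((i, j))
    return points, tiles, full


if __name__ == "__main__":
    pts, tiles, full = tiled_triangle(10, 4, 2)
    print(len(pts), tiles, full, bounding_box_tiles(10, 4, 2))
```

The relaxed bound is conservative: it may still admit a few empty tiles near the corners, which cost only a loop-bound check because the clamped point loops run zero iterations there. The tile loops remain standard loops with the tile sizes as symbolic parameters, which is the property the abstract emphasizes.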

    • Published in

      ACM Transactions on Programming Languages and Systems  Volume 34, Issue 1
      April 2012
      225 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/2160910
      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 May 2012
      • Accepted: 1 January 2012
      • Revised: 1 June 2010
      • Received: 1 April 2009

      Qualifiers

      • research-article
      • Research
      • Refereed
