DOI: 10.1145/1250734.1250780

Parameterized tiled loops for free

Published: 10 June 2007

ABSTRACT

Parameterized tiled loops, where the tile sizes are not fixed at compile time but remain symbolic parameters until later, are quite useful for iterative compilers and "auto-tuners" that produce highly optimized libraries and codes. Tile size parameterization could also enable optimizations such as register tiling to become dynamic optimizations. Although it is easy to generate such loops for (hyper) rectangular iteration spaces tiled with (hyper) rectangular tiles, many important computations do not fall into this restricted domain. Parameterized tile code generation for the general case of convex iteration spaces being tiled by (hyper) rectangular tiles has in the past been solved with bounding box approaches or symbolic Fourier-Motzkin approaches. However, both approaches have less than ideal code generation efficiency and resulting code quality. We present the theoretical foundations, implementation, and experimental validation of a simple, unified technique for generating parameterized tiled code. Our code generation efficiency is comparable to that of all existing code generation techniques, including those for fixed tile sizes, and the resulting code is as efficient as, if not more efficient than, that of all previous techniques. Thus the technique provides parameterized tiled loops for free! Our "one-size-fits-all" solution, which is available as open source software, can be adapted for use in production compilers.
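
To make the terminology concrete, the following is a minimal sketch of the easy rectangular case and of a simplified bounding-box treatment of a triangular iteration space. It is an illustration written for this summary, not the technique presented in the paper or the output of its generator; the function names (`untiled_points`, `tiled_points`, `triangle_bounding_box`) and the choice of a triangular space `0 <= j <= i < n` are assumptions made for the example.

```python
def untiled_points(n, m):
    """Visit a rectangular n-by-m iteration space in the original order."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled_points(n, m, ti, tj):
    """Visit the same space tile by tile.

    ti and tj are ordinary run-time arguments, so the tile sizes stay
    symbolic until the call: this is the 'parameterized' property.
    """
    if ti <= 0 or tj <= 0:
        raise ValueError("tile sizes must be positive")
    visited = []
    for it in range(0, n, ti):              # tile loops step by the tile size
        for jt in range(0, m, tj):
            # point loops: min() clips partial tiles at the space boundary
            for i in range(it, min(it + ti, n)):
                for j in range(jt, min(jt + tj, m)):
                    visited.append((i, j))
    return visited


def triangle_bounding_box(n, ti, tj):
    """Tile the triangle 0 <= j <= i < n via its n-by-n bounding box.

    Hypothetical, simplified bounding-box scheme: every tile of the box
    is enumerated and the point loops clip against the triangle's own
    bounds. Tiles that lie wholly outside the triangle still cost loop
    overhead; they are counted in empty_tiles.
    """
    if ti <= 0 or tj <= 0:
        raise ValueError("tile sizes must be positive")
    visited = []
    empty_tiles = 0
    for it in range(0, n, ti):
        for jt in range(0, n, tj):
            before = len(visited)
            for i in range(it, min(it + ti, n)):
                # upper bound on j is the tighter of the tile edge and j <= i
                for j in range(jt, min(jt + tj, i + 1)):
                    visited.append((i, j))
            if len(visited) == before:
                empty_tiles += 1
    return visited, empty_tiles
```

In the rectangular case the tiled loop nest visits exactly the original points, only in a different order, whatever positive tile sizes are passed in. In the triangular case with `n = 4` and 2-by-2 tiles, one of the four enumerated tiles contains no points, which hints at why bounding-box code can be less efficient than the abstract would like.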

Published in

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2007
508 pages
ISBN: 9781595936332
DOI: 10.1145/1250734

ACM SIGPLAN Notices, Volume 42, Issue 6
Proceedings of the 2007 PLDI conference
June 2007
491 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1273442

Copyright © 2007 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 406 of 2,067 submissions, 20%
