Research Article

Loop transformations: convexity, pruning and optimization

Published: 26 January 2011

Abstract

High-level loop transformations are a key instrument in mapping computational kernels to effectively exploit the resources in modern processor architectures. Nevertheless, selecting the composition of loop transformations needed to achieve this remains a major challenge: current compilers may be off by orders of magnitude in performance compared to hand-optimized programs. To address this fundamental challenge, we first present a convex characterization of all distinct, semantics-preserving, multidimensional affine transformations. We then bring together algebraic, algorithmic, and performance analysis results to design a tractable optimization algorithm over this highly expressive space. Our framework has been implemented and validated experimentally on a representative set of benchmarks running on state-of-the-art multi-core platforms.
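To give a flavor of what a multidimensional affine transformation is, here is a small illustrative sketch. It is not the paper's algorithm, and the function and variable names are invented for this example: each iteration vector of a loop nest is mapped by an affine schedule (here a plain coefficient matrix) to a multidimensional time-stamp, and executing iterations in lexicographic order of their time-stamps realizes the transformed loop nest. Loop interchange, for instance, is the schedule that swaps the two loop dimensions.

```python
# Illustrative sketch only (hypothetical helper, not from the paper):
# a multidimensional affine schedule maps each iteration vector of a
# loop nest to a time-stamp; executing iterations in lexicographic
# order of time-stamps yields the transformed execution order.

def apply_schedule(iterations, schedule):
    """Return the iteration vectors ordered by the affine schedule.

    `schedule` is a list of rows; row r applied to iteration vector `it`
    gives component r of the multidimensional time-stamp.
    """
    def timestamp(it):
        return tuple(sum(c * x for c, x in zip(row, it)) for row in schedule)
    return sorted(iterations, key=timestamp)

# A 2x2 iteration domain {(i, j) | 0 <= i, j < 2}.
domain = [(i, j) for i in range(2) for j in range(2)]

identity = [(1, 0), (0, 1)]     # original order: i outermost, then j
interchange = [(0, 1), (1, 0)]  # loop interchange: j becomes outermost

print(apply_schedule(domain, identity))     # [(0,0), (0,1), (1,0), (1,1)]
print(apply_schedule(domain, interchange))  # [(0,0), (1,0), (0,1), (1,1)]
```

If the two statements carried by the loop body are independent, both orders compute the same result; the paper's contribution is a convex characterization of exactly those schedules that preserve the program's semantics, over which an optimizer can then search.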


Supplemental Material

50-mpeg-4.mp4



• Published in

  ACM SIGPLAN Notices, Volume 46, Issue 1 (POPL '11), January 2011, 624 pages
  ISSN: 0362-1340
  EISSN: 1558-1160
  DOI: 10.1145/1925844

  POPL '11: Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
  January 2011, 652 pages
  ISBN: 9781450304900
  DOI: 10.1145/1926385

  Copyright © 2011 ACM

  Publisher: Association for Computing Machinery, New York, NY, United States
