
PLUTO+: near-complete modeling of affine transformations for parallelism and locality

Published: 24 January 2015

Abstract

Affine transformations have proven to be very powerful for loop restructuring because they can model a very wide range of transformations. A single multi-dimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures where coarse-grained parallelism and locality are crucial, consider only a sub-space of transformations to avoid a combinatorial explosion in the search for transformations. The ensuing practical trade-offs exclude certain useful transformations, in particular compositions involving loop reversals and loop skewing by negative factors. In this paper, we propose an approach that addresses this limitation by modeling a much larger space of affine transformations in conjunction with the Pluto algorithm's cost function. We experimentally evaluate both the effect on compilation time and the performance of the generated code. The evaluation shows that our new framework, Pluto+, causes no performance degradation on any of the Polybench benchmarks. For Lattice Boltzmann Method (LBM) codes with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not increase compile times significantly: experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases where it improves execution time significantly, it increases polyhedral optimization time by only 2.04x.
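
To make the abstract's point about composition concrete, here is a minimal sketch of how a sequence of simple loop transformations collapses into one affine function. It is not taken from the paper or from the Pluto code base: the 2-D iteration space, the matrix names, and the helper functions are all assumptions chosen for illustration. The composed matrix contains negative coefficients, which is what one would expect for a composition involving a reversal and a skew by a negative factor.

```python
# Illustrative only: loop transformations on a 2-D iteration space (i, j),
# written as 2x2 integer matrices acting on the iteration vector.
# All names below are hypothetical and are not part of Pluto or Pluto+.

def matmul(a, b):
    """Multiply two 2x2 integer matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def apply(m, point):
    """Apply matrix m to an iteration point (i, j)."""
    i, j = point
    return (m[0][0] * i + m[0][1] * j, m[1][0] * i + m[1][1] * j)

INTERCHANGE = [[0, 1], [1, 0]]     # (i, j) -> (j, i)
REVERSE_INNER = [[1, 0], [0, -1]]  # (i, j) -> (i, -j)
SKEW_NEGATIVE = [[1, 0], [-1, 1]]  # (i, j) -> (i, j - i)

def compose(*steps):
    """Fold steps, applied in the order given, into a single matrix."""
    result = [[1, 0], [0, 1]]
    for step in steps:
        result = matmul(step, result)
    return result

def has_negative_coefficient(m):
    """True if any entry of the transformation matrix is negative."""
    return any(entry < 0 for row in m for entry in row)

# One matrix stands in for the three-step sequence.
COMBINED = compose(INTERCHANGE, REVERSE_INNER, SKEW_NEGATIVE)

if __name__ == "__main__":
    print(COMBINED)                            # the single composed matrix
    print(apply(COMBINED, (2, 5)))             # image of one iteration point
    print(has_negative_coefficient(COMBINED))  # whether it needs negative entries
```

Applying `COMBINED` to a point gives the same result as applying the three steps one after another, so a search over transformation matrices implicitly searches over such sequences. This sketch ignores dependences and legality entirely; it only shows the algebra of composition.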

