
Revisiting loop fusion in the polyhedral framework

Published: 06 February 2014

Abstract

Loop fusion is an important compiler optimization for improving memory hierarchy performance by enabling data reuse. Traditional compilers have approached loop fusion in isolation from other high-level loop optimizations, missing several interesting solutions. Recently, the polyhedral compiler framework, with its ability to compose complex transformations, has proved promising for optimizing loops in small programs. However, our experiments with large programs using state-of-the-art polyhedral compiler frameworks reveal suboptimal fusion partitions in the transformed code. We trace the cause to the lack of an effective cost model for choosing a good fusion partitioning among the possible choices, whose number grows exponentially with the number of program statements. In this paper, we propose a fusion algorithm that chooses good fusion partitions under two objectives: achieving good data reuse and preserving the parallelism inherent in the source code. These objectives, although targeted by previous work in traditional compilers, pose new challenges within the polyhedral compiler framework and have thus not been addressed there. Our algorithm employs several heuristics that work effectively within the polyhedral framework and allow us to achieve both objectives. Experimental results show that our fusion algorithm achieves performance comparable to existing polyhedral compilers on small kernel programs, and significantly outperforms them on large benchmark programs such as those in the SPEC benchmark suite.

