An Effective Fusion and Tile Size Model for PolyMage

Published: 08 November 2020

Abstract

Effective models for fusion of loop nests remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with parallelism and locality. This article presents a new fusion algorithm for high-performance domain-specific compilers for image processing pipelines. The algorithm is driven by dynamic programming and explores spaces of fusion possibilities not covered by previous approaches. It also uses a cost function that captures optimization criteria more concretely and precisely than prior approaches. The fusion model is tailored to the transformation and optimization sequence applied by PolyMage and Halide, two recent DSLs for image processing pipelines. Our model-driven technique, when implemented in PolyMage, provides significant improvements over PolyMage’s approach, which uses auto-tuning to aid its model (up to 4.32×), and over Halide’s automatic approach (up to 2.46×) on two state-of-the-art shared-memory multicore architectures.

Published in

ACM Transactions on Programming Languages and Systems, Volume 42, Issue 3
September 2020
230 pages
ISSN: 0164-0925
EISSN: 1558-4593
DOI: 10.1145/3430314

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 8 November 2020
• Accepted: 1 June 2020
• Revised: 1 March 2020
• Received: 1 December 2018

Qualifiers

• research-article
• Research
• Refereed