Abstract
Effective models for fusing loop nests remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with parallelism and locality. This article presents a new fusion algorithm for high-performance domain-specific compilers for image processing pipelines. The algorithm is driven by dynamic programming and explores spaces of fusion possibilities not covered by previous approaches; it is also guided by a cost function that captures optimization criteria more concretely and precisely than prior approaches. The fusion model is tailored to the transformation and optimization sequence applied by PolyMage and Halide, two recent DSLs for image processing pipelines. When implemented in PolyMage, our model-driven technique provides significant improvements over PolyMage’s own approach, which uses auto-tuning to aid its model (up to 4.32×), and over Halide’s automatic approach (up to 2.46×) on two state-of-the-art shared-memory multicore architectures.
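To make the idea of dynamic programming over grouping choices concrete, the following is a minimal sketch and not the article's algorithm. It assumes a purely linear pipeline (real pipelines are DAGs), restricts fused groups to contiguous runs of stages, and uses a hypothetical cost function `toy_cost`; the names `best_grouping` and `toy_cost` and all constants are illustrative assumptions.

```python
def best_grouping(num_stages, group_cost):
    """Split stages 0..num_stages-1 of a linear pipeline into contiguous
    fused groups so that the summed group cost is minimal.

    best[e] holds the minimal cost of grouping the first e stages;
    cut[e] records where the last group of that optimal solution starts.
    """
    best = [0.0] + [float("inf")] * num_stages
    cut = [0] * (num_stages + 1)
    for end in range(1, num_stages + 1):
        for start in range(end):
            # Candidate: optimal grouping of stages [0, start) followed by
            # one fused group covering stages [start, end).
            candidate = best[start] + group_cost(start, end)
            if candidate < best[end]:
                best[end] = candidate
                cut[end] = start

    # Walk the recorded cut points backwards to recover the groups.
    groups = []
    end = num_stages
    while end > 0:
        start = cut[end]
        groups.append(list(range(start, end)))
        end = start
    groups.reverse()
    return best[num_stages], groups


def toy_cost(start, end):
    """Hypothetical cost of fusing stages [start, end) into one group.

    Assumption for illustration only: every group pays a fixed cost of 10
    for the memory traffic at its boundary, plus a redundant-computation
    term that grows quadratically with the number of fused stages.
    """
    n = end - start
    return 10 + n * n


cost, groups = best_grouping(6, toy_cost)
print(cost, groups)
```

With this toy cost, fusing everything and fusing nothing are both suboptimal, which is the kind of trade-off between locality and redundant computation that a fusion cost model has to resolve. The quadratic loop over cut points is enough here only because groups are contiguous; grouping on a general DAG has a much larger search space.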
An Effective Fusion and Tile Size Model for PolyMage