Abstract
This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages that process images successively. Each stage typically performs one of four kinds of operations on image pixels: point-wise, stencil, reduction, or data-dependent. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, which prevents effective utilization of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries such as OpenCV or to optimize manually. Using libraries precludes optimization across library routines, while manual optimization that accounts for both parallelism and locality is very tedious.
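The four kinds of stages named above can be illustrated with a minimal sketch in plain Python. This is not PolyMage's DSL or the code it generates. The function names, the nested-list image representation, the clamped borders, and the assumption that pixel values lie in [0, 1] are all hypothetical choices made for illustration.

```python
from typing import List

Image = List[List[float]]


def pointwise_scale(img: Image, k: float) -> Image:
    # Point-wise: each output pixel depends only on the same input pixel.
    return [[k * v for v in row] for row in img]


def stencil_blur_x(img: Image) -> Image:
    # Stencil: each output pixel reads a fixed neighbourhood of the input,
    # here a 3-wide horizontal window with clamped borders.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            out[y][x] = (left + img[y][x] + right) / 3.0
    return out


def reduction_row_sums(img: Image) -> List[float]:
    # Reduction: many input pixels combine into fewer outputs
    # (one sum per row, collapsing the x dimension).
    return [sum(row) for row in img]


def data_dependent_lut(img: Image, lut: List[float]) -> Image:
    # Data-dependent: which entry of `lut` is read depends on the pixel
    # value itself (values assumed to lie in [0, 1]).
    n = len(lut)
    return [[lut[min(int(v * n), n - 1)] for v in row] for row in img]
```

Stages compose by feeding one stage's output to the next, for example `reduction_row_sums(stencil_blur_x(pointwise_scale(img, 0.5)))`. Each stage here writes a full-size result before the next stage reads it, which is exactly the memory-bandwidth pattern the paragraph above describes.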
The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
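To make fusion, tiling, and storage optimization concrete, the following minimal sketch applies one such technique, overlapped tiling, by hand to a hypothetical two-stage blur: a 3-tap horizontal box filter followed by a 3-tap vertical one. It is not the code PolyMage generates. The stage definitions, the tiling into bands of rows, and the function names are assumptions for illustration. The unfused version materializes a full-size intermediate image. The fused version computes each band of output rows from a small scratch buffer that holds only the intermediate rows that band needs.

```python
from typing import List

Image = List[List[float]]


def blur_x_row(img: Image, y: int) -> List[float]:
    # Stage 1 for a single row: 3-tap horizontal box filter, clamped borders.
    row, w = img[y], len(img[y])
    return [(row[max(x - 1, 0)] + row[x] + row[min(x + 1, w - 1)]) / 3.0
            for x in range(w)]


def blur_unfused(img: Image) -> Image:
    # Stage 1 writes a full h x w intermediate; stage 2 reads it back.
    h, w = len(img), len(img[0])
    bx = [blur_x_row(img, y) for y in range(h)]
    return [[(bx[max(y - 1, 0)][x] + bx[y][x] + bx[min(y + 1, h - 1)][x]) / 3.0
             for x in range(w)]
            for y in range(h)]


def blur_fused_tiled(img: Image, tile_h: int) -> Image:
    # Fused and tiled: for each band of tile_h output rows, recompute only the
    # stage-1 rows the band needs (the band plus a one-row halo on each side)
    # into a scratch buffer of at most (tile_h + 2) x w, then consume it.
    h, w = len(img), len(img[0])
    out: Image = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, tile_h):
        y1 = min(y0 + tile_h, h)
        lo, hi = max(y0 - 1, 0), min(y1, h - 1)  # stage-1 rows needed, inclusive
        scratch = [blur_x_row(img, r) for r in range(lo, hi + 1)]
        for y in range(y0, y1):
            up, down = max(y - 1, 0), min(y + 1, h - 1)
            out[y] = [(scratch[up - lo][x] + scratch[y - lo][x]
                       + scratch[down - lo][x]) / 3.0
                      for x in range(w)]
    return out
```

Adjacent bands recompute their halo rows of the intermediate, trading a little redundant work for a working set that stays small and for bands that do not depend on one another and could therefore run in parallel. Because both versions perform the same floating-point operations in the same order, `blur_fused_tiled(img, t)` should equal `blur_unfused(img)` exactly for any tile height `t >= 1`.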
PolyMage: Automatic Optimization for Image Processing Pipelines. ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems.