
PolyMage: Automatic Optimization for Image Processing Pipelines

Published: 14 March 2015

Abstract

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, reduction or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth, preventing effective utilization of the parallelism available on modern architectures. For applications that demand high performance, the traditional options are to use optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious.
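
To make the stage categories concrete, here is a minimal sketch of a three-stage pipeline in plain Python. It is not the PolyMage language. The function names (`pointwise`, `stencil_blur_x`, `reduce_histogram`, `pipeline`) and the choice of operations are assumptions made for illustration, and pixel values are assumed to be non-negative. Each stage materializes a full intermediate image, which is the memory-bandwidth cost the abstract refers to.

```python
# Illustrative only: plain Python, not the PolyMage DSL.
# All names below are hypothetical and chosen for this sketch.

def pointwise(img, f):
    """Point-wise stage: each output pixel depends only on the same input pixel."""
    return [[f(p) for p in row] for row in img]


def stencil_blur_x(img):
    """Stencil stage: 3-tap horizontal box blur, clamped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            row.append((left + img[y][x] + right) / 3.0)
        out.append(row)
    return out


def reduce_histogram(img, bins):
    """Reduction stage: many pixels accumulate into each histogram bin."""
    hist = [0] * bins
    for row in img:
        for p in row:
            # The bin index depends on the pixel value (assumed non-negative).
            hist[min(int(p), bins - 1)] += 1
    return hist


def pipeline(img):
    """Run the stages one after another, as a library-call style pipeline would."""
    scaled = pointwise(img, lambda p: p * 2)  # full intermediate image written
    blurred = stencil_blur_x(scaled)          # full intermediate image read back
    return reduce_histogram(blurred, bins=8)
```

Calling `pipeline([[0, 1, 2], [3, 3, 3]])` runs all three stages in sequence. Each stage is trivially parallel over pixels, but every stage boundary is a round trip through memory.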

The focus of our system, PolyMage, is on automatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. Our optimization approach primarily relies on the transformation and code generation capabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically. Experimental results on a modern multicore system show that the performance achieved by our automatic approach is up to 1.81x better than that achieved through manual tuning in Halide, a state-of-the-art language and compiler for image processing pipelines. For a camera raw image processing pipeline, our performance is comparable to that of a hand-tuned implementation.
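
The following hand-written sketch illustrates what fusing and tiling two stencil stages can look like. It is an assumption-laden example, not PolyMage output: the abstract does not specify the tiling strategy, so this uses overlapped tiles with a one-element halo as one possible scheme, on a 1-D signal for brevity. The names `blur3`, `unfused`, and `fused_tiled` are hypothetical. The small `scratch` list stands in for the storage optimization: the intermediate stage never exists at full size.

```python
# Illustrative only: a manual version of fusion + tiling for two 3-tap stencils.
# This is not code generated by PolyMage; names and tiling scheme are assumptions.

def blur3(src, i):
    """3-tap box filter at index i, clamped at the borders of src."""
    n = len(src)
    return (src[max(i - 1, 0)] + src[i] + src[min(i + 1, n - 1)]) / 3.0


def unfused(signal):
    """Two stages run back to back with a full-size intermediate buffer."""
    stage1 = [blur3(signal, i) for i in range(len(signal))]
    return [blur3(stage1, i) for i in range(len(signal))]


def fused_tiled(signal, tile=4):
    """Same result, computed tile by tile with a small scratch buffer."""
    n = len(signal)
    out = [0.0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # Stage 1 is computed for the tile plus a one-element halo on each
        # side, so halo elements are recomputed by neighbouring tiles.
        lo, hi = max(start - 1, 0), min(end + 1, n)
        scratch = [blur3(signal, i) for i in range(lo, hi)]
        for i in range(start, end):
            # Stage 2 reads stage-1 values at i-1, i, i+1 (clamped),
            # translated into scratch coordinates by subtracting lo.
            a = scratch[max(i - 1, 0) - lo]
            b = scratch[i - lo]
            c = scratch[min(i + 1, n - 1) - lo]
            out[i] = (a + b + c) / 3.0
    return out
```

Both functions perform the same arithmetic in the same order for every output element, so `fused_tiled(signal) == unfused(signal)` holds exactly. The fused version trades a small amount of redundant computation at tile edges for locality, and tiles are independent of each other, so they could be processed in parallel.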

