Abstract
Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.
We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
Supplemental Material
Available for Download
Supplemental material.
- Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The Frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1--29:12. Google Scholar
Digital Library
- Aubry, M., Paris, S., Hasinoff, S. W., Kautz, J., and Durand, F. 2011. Fast and robust pyramid-based image processing. Tech. Rep. MIT-CSAIL-TR-2011-049, Massachusetts Institute of Technology.Google Scholar
- Buck, I. 2007. GPU computing: Programming a massively parallel processor. In CGO '07: Proceedings of the International Symposium on Code Generation and Optimization, IEEE Computer Society, 17. Google Scholar
Digital Library
- Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics 26, 3 (July), 103:1--103:9. Google Scholar
Digital Library
- CoreImage. Apple CoreImage programming guide. http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/CoreImaging.Google Scholar
- Elliott, C., Finne, S., and de Moor, O. 2003. Compiling embedded languages. Journal of Functional Programming 13, 2. Updated version of paper by the same name that appeared in SAIG '00 proceedings. Google Scholar
Digital Library
- Elliott, C. 2001. Functional image synthesis. In Proceedings of Bridges.Google Scholar
- Fatahalian, K., Horn, D. R., Knight, T. J., Leem, L., Houston, M., Park, J. Y., Erez, M., Ren, M., Aiken, A., Dally, W. J., and Hanrahan, P. 2006. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, SC '06. Google Scholar
Digital Library
- Feautrier, P. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20.Google Scholar
- Gordon, M. I., Thies, W., Karczmarek, M., Lin, J., Meli, A. S., Leger, C., Lamb, A. A., Wong, J., Hoffman, H., Maze, D. Z., and Amarasinghe, S. 2002. A stream compiler for communication-exposed architectures. In International Conference on Architectural Support for Programming Languages and Operating Systems. Google Scholar
Digital Library
- Guenter, B., and Nehab, D. 2010. The neon image processing language. Tech. Rep. MSR-TR-2010-175, Microsoft Research.Google Scholar
- IPP. Intel Integrated Performance Primitives. http://software.intel.com/en-us/articles/intel-ipp/.Google Scholar
- Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., and Towles, B. 2002. Stream scheduling. Concurrent VLSI Architecture Tech Report 122, Stanford University, March.Google Scholar
- Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 4.Google Scholar
Cross Ref
- Levoy, M. 1994. Spreadsheets for images. In Proceedings of SIGGRAPH 94, Computer Graphics Proceedings, Annual Conference Series, 139--146. Google Scholar
Digital Library
- Li, C., Xu, C., Gui, C., and Fox, M. D. 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Transactions on Image Processing 19, 12 (December), 3243--3254. Google Scholar
Digital Library
- LLVM. The LLVM compiler infrastructure. http://llvm.org.Google Scholar
- McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Graphics Hardware 2002, 57--68. Google Scholar
Digital Library
- Newburn, C. J., So, B., Liu, Z., McCool, M., Ghuloum, A., Toit, S. D., Wang, Z. G., Du, Z. H., Chen, Y., Wu, G., Guo, P., Liu, Z., and Zhang, D. 2011. Intel's array building blocks: A retargetable, dynamic compiler and embedded language. In Proceedings of the 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, CGO '11, 224--235. Google Scholar
Digital Library
- OpenCL, 2011. The OpenCL specification, version 1.2. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.Google Scholar
- OpenMP. OpenMP. http://openmp.org/.Google Scholar
- Paris, S., and Durand, F. 2009. A fast approximation of the bilateral filter using a signal processing approach. International Journal of Computer Vision 81, 1, 24--52. Google Scholar
Digital Library
- Paris, S., Kornprobst, P., Tumblin, J., and Durand, F. 2009. Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision. Google Scholar
Digital Library
- Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. ACM Transactions on Graphics 30, 4. Google Scholar
Digital Library
- Pharr, M., and Mark, W. R. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Proceedings of Innovative Parallel Computing (InPar).Google Scholar
- PixelBender. Adobe PixelBender reference. http://www.adobe.com/content/dam/Adobe/en/devnet/pixelbender/pdfs/pixelbender_reference.pdf.Google Scholar
- Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. 2005. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation" 93, 2, 232--275.Google Scholar
- Shantzis, M. A. 1994. A model for efficient and flexible image computing. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, ACM, SIGGRAPH '94, 147--154. Google Scholar
Digital Library
Index Terms
Decoupling algorithms from schedules for easy optimization of image processing pipelines
Recommendations
Automatically scheduling halide image processing pipelines
The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a ...
Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines
Seminal Graphics Papers: Pushing the Boundaries, Volume 2Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions ...
PACXXv2 + RV: An LLVM-based Portable High-Performance Programming Model
LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPCTo achieve high performance on today's high-performance computing (HPC) systems multiple programming models have to be used. An example for this burden to the developer is OpenCL: the OpenCL's SPMD programming model must be used together with a host ...






Comments