Auto-vectorization for image processing DSLs

Published: 21 June 2017

Abstract

Parallelizing programs and distributing their workloads across multiple threads can be a challenging task. In addition to multi-threading, harnessing the vector units in CPUs is highly desirable. However, employing vector units to speed up programs can be quite tedious: a developer either relies solely on the auto-vectorization capabilities of the compiler or manually applies vector intrinsics, which is extremely error-prone, difficult to maintain, and not portable at all.

Based on whole-function vectorization, a method to replace control flow with data flow, we propose auto-vectorization techniques for image processing DSLs in the context of source-to-source compilation. The approach does not require the input to be available in SSA form. We formulate constraints under which the vectorization analysis and code transformations may be greatly simplified in the context of image processing DSLs. As part of our methodology, we present the control-flow-to-data-flow transformation as a source-to-source translation. In addition, we propose a method to efficiently analyze algorithms with mixed bit-width data types to determine the optimal SIMD width, independently of the target instruction set. The techniques are integrated into an open-source DSL framework, and the resulting vectorization capabilities are compared to a variety of existing state-of-the-art C/C++ compilers. A geometric mean speedup of up to 3.14 is observed for benchmarks taken from ISPC and image processing, compared to non-vectorized executions.
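
To make the idea of replacing control flow with data flow more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical per-pixel threshold kernel; the names `threshold_scalar`, `threshold_vector`, and `select` are invented for illustration. Plain Python lists stand in for SIMD registers, and the `select` helper stands in for a hardware blend instruction.

```python
# Illustrative sketch only: the kernel and all names here are hypothetical,
# and Python lists merely model the lanes of a SIMD register.

def threshold_scalar(pixel, limit):
    """Scalar kernel with a branch (control flow)."""
    if pixel > limit:
        return pixel - limit
    else:
        return 0


def select(mask, if_true, if_false):
    """Model of a lane-wise blend: take if_true[i] where mask[i] is set,
    otherwise if_false[i]."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def threshold_vector(pixels, limit):
    """If-converted kernel: the branch condition becomes a mask, both
    branches are computed for every lane, and the mask picks the result."""
    mask = [p > limit for p in pixels]       # condition evaluated per lane
    then_val = [p - limit for p in pixels]   # 'then' branch for all lanes
    else_val = [0 for _ in pixels]           # 'else' branch for all lanes
    return select(mask, then_val, else_val)
```

Calling `threshold_vector([10, 200, 128, 129], 128)` gives the same values as applying `threshold_scalar` to each pixel in turn. The cost of this style is that both branches are always executed, which is why analyses that simplify or avoid masking can matter for performance.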

Published in

ACM SIGPLAN Notices, Volume 52, Issue 5 (LCTES '17), May 2017, 120 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140582

LCTES 2017: Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, June 2017, 120 pages
ISBN: 9781450350303
DOI: 10.1145/3078633
General Chair: Vijay Nagarajan
Program Chair: Zili Shao

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States