Abstract
SIMD instructions have been common in CPUs for years. Using these instructions effectively requires not only vectorizing the code but also modifying the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from a restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that expand directly to machine instructions. Intrinsics significantly decrease programmer productivity by imposing an error-prone, hard-to-read, assembly-like programming style. Furthermore, intrinsics are not portable because each one is tied to a specific instruction set.
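As an illustration of the intrinsics style described above, the following minimal sketch (not taken from the paper) computes c[i] = a[i] * b[i] + 1 with SSE intrinsics from <xmmintrin.h>. The function name mul_add_one is hypothetical. The #if guard and the scalar remainder loop show the portability problem: the vector path is compiled only where the compiler defines __SSE__ (GCC and Clang do when targeting x86 with SSE enabled), and every other instruction set needs its own code path.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* x86-only SSE intrinsics */
#endif

/* Hypothetical example: c[i] = a[i] * b[i] + 1.0f for 0 <= i < n. */
void mul_add_one(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Broadcast the constant 1.0f into all four 32-bit lanes. */
    const __m128 one = _mm_set1_ps(1.0f);
    /* Main loop: four floats per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_add_ps(_mm_mul_ps(va, vb), one);
        _mm_storeu_ps(c + i, vc);
    }
#endif
    /* Scalar remainder, and the entire loop on non-SSE targets. */
    for (; i < n; ++i)
        c[i] = a[i] * b[i] + 1.0f;
}
```

Even this trivial loop needs explicit loads, stores, a constant broadcast, a remainder loop, and a preprocessor guard, whereas the scalar remainder loop alone already expresses the whole computation.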
In this paper, we show how a C-like language can be extended to allow for portable and efficient SIMD programming. Our extension gives the programmer full control over where and how control-flow vectorization is triggered. We present a type system and a formal semantics for our extension and prove the soundness of the type system. Using our prototype implementation, IVL, which targets Intel's MIC architecture and the SSE instruction set, we show that the generated code performs roughly on par with handwritten intrinsics code.
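Control-flow vectorization is commonly built on converting control dependence into data dependence (if-conversion, as in Allen et al., POPL 1983): for a group of lanes, the branch condition is evaluated into a per-lane mask, both branches are computed, and a blend selects each lane's result. The following sketch shows only this underlying masking technique in portable C, with an emulated vector width W = 4; the types vfloat and vmask, the helper functions, and double_or_negate are hypothetical and are neither IVL syntax nor the paper's code generator.

```c
#include <stddef.h>

#define W 4 /* hypothetical vector width (4 floats, as in SSE) */

typedef struct { float lane[W]; } vfloat; /* emulated SIMD register */
typedef struct { int lane[W]; } vmask;    /* emulated per-lane mask */

static vfloat vload(const float *p)
{
    vfloat r;
    for (int k = 0; k < W; ++k) r.lane[k] = p[k];
    return r;
}

static void vstore(float *p, vfloat v)
{
    for (int k = 0; k < W; ++k) p[k] = v.lane[k];
}

/* Lane-wise compare: the branch condition becomes a data value. */
static vmask vgt(vfloat a, float s)
{
    vmask m;
    for (int k = 0; k < W; ++k) m.lane[k] = a.lane[k] > s;
    return m;
}

/* Blend: lane k takes t where the mask is set, f otherwise. */
static vfloat vselect(vmask m, vfloat t, vfloat f)
{
    vfloat r;
    for (int k = 0; k < W; ++k) r.lane[k] = m.lane[k] ? t.lane[k] : f.lane[k];
    return r;
}

/* Masked form of the scalar loop
 *   for (i = 0; i < n; ++i)
 *       if (x[i] > 0) y[i] = 2 * x[i]; else y[i] = -x[i];
 */
void double_or_negate(const float *x, float *y, size_t n)
{
    size_t i = 0;
    for (; i + W <= n; i += W) {
        vfloat vx = vload(x + i);
        vmask m = vgt(vx, 0.0f);
        vfloat then_v, else_v;
        for (int k = 0; k < W; ++k) {
            then_v.lane[k] = 2.0f * vx.lane[k]; /* then-branch on all lanes */
            else_v.lane[k] = -vx.lane[k];       /* else-branch on all lanes */
        }
        vstore(y + i, vselect(m, then_v, else_v));
    }
    /* Scalar tail for the last n % W elements. */
    for (; i < n; ++i)
        y[i] = (x[i] > 0.0f) ? 2.0f * x[i] : -x[i];
}
```

Evaluating both branches is only safe when they have no side effects; stores or calls inside a branch must themselves be masked. A common refinement also skips a branch entirely when its mask has no active lanes.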
References
- J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of Control Dependence to Data Dependence. In POPL, 1983.
- R. Allen and K. Kennedy. Automatic Translation of FORTRAN Programs to Vector Form. ACM Trans. Program. Lang. Syst., 1987.
- aobench. URL http://code.google.com/p/aobench/.
- G. E. Blelloch et al. Implementation of a Portable Nested Data-Parallel Language. In PPOPP, 1993.
- A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 2000.
- M. Farrar. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics, 23: 156--161, January 2007.
- I. Georgiev and P. Slusallek. RTfact: Generic Concepts for Flexible and High Performance Ray Tracing. In IEEE/Eurographics Symposium on Interactive Ray Tracing, 2008.
- A. Ghuloum et al. Future-Proof Data Parallel Algorithms and Software on Intel Multi-Core Architecture. Intel Technology Journal, 11 (04), November 2007.
- GNU Press. Using the GNU Compiler Collection. For GCC version 4.6.2.
- P. Hanrahan and J. Lawson. A Language for Shading and Lighting Calculations. In SIGGRAPH, 1990.
- Intel Corp. Intel® Compilers and Libraries. URL http://software.intel.com/en-us/articles/intel-compilers.
- Intel Corp. Intel SPMD Program Compiler. URL http://ispc.github.com.
- Intel Corp. Intel® 64 and IA-32 Architectures Optimization Reference Manual, 2009.
- Intel Corp. The Intel Many Integrated Core (MIC) Architecture, 2010.
- K. E. Iverson. A Programming Language. John Wiley & Sons, Inc., 1962.
- R. Karrenberg and S. Hack. Whole Function Vectorization. In CGO, 2011.
- Khronos Group. OpenCL 1.0 Specification, 2009.
- A. Krall and S. Lelait. Compilation Techniques for Multimedia Processors. Int. J. Parallel Program., 28 (4): 347--361, 2000.
- S. Larsen and S. Amarasinghe. Exploiting Superword Level Parallelism with Multimedia Instruction Sets. PLDI, 35 (5): 145--156, 2000.
- R. Leißa, S. Hack, and I. Wald. Extending a C-like Language for Portable SIMD Programming. The full version of our PPoPP'12 paper, available online at http://www.cdl.uni-saarland.de/projects/vecimp.
- K.-C. Li and H. Schwetman. Vector C--A Vector Processing Language. Journal of Parallel and Distributed Computing, 2 (2): 132--169, 1985.
- A. Lokhmotov, B. R. Gaster, A. Mycroft, N. Hickey, and D. Stuttard. Revisiting SIMD Programming. In LCPC, pages 32--46, 2007.
- MatLab. URL http://www.mathworks.com/products/matlab.
- M. McCool. A Retargetable, Dynamic Compiler and Embedded language. In CGO, 2011.
- G. Michaelson and P. Cockshott. Vector Pascal, an array language, 2002.
- V. Ngo. Parallel Loop Transformation Techniques For Vector-Based Multiprocessor Systems. PhD thesis, University of Minnesota, 1994.
- M. Norrish. C formalised in HOL. PhD thesis, University of Cambridge, 1998.
- D. Nuzman and R. Henderson. Multi-platform Auto-vectorization. In CGO, 2006.
- D. Nuzman and A. Zaks. Outer-Loop Vectorization: Revisited for Short SIMD Architectures. In PACT, 2008.
- NVIDIA. CUDA Programming Guide, 2009.
- R. G. Scarborough and H. G. Kolsky. A vectorizing Fortran compiler. IBM J. Res. Dev., 30 (2): 163--171, 1986.
- L. Seiler et al. Larrabee: A Many-Core x86 Architecture for Visual Computing. In SIGGRAPH, 2008.
- J. Shin. Introducing Control Flow into Vectorized Code. In PACT '07, 2007.
- J. Shin, J. Chame, and M. W. Hall. Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures. In PACT, 2002.
- N. Sreraman and R. Govindarajan. A Vectorizing Compiler for Multimedia Extensions. Int. J. Parallel Program., 28 (4): 363--400, 2000.
- I. Wald. Fast Construction of SAH BVHs on the Intel® Many Integrated Core (MIC) Architecture. IEEE Transactions on Visualization and Computer Graphics, 99, 2010.
- J. Zhou and K. A. Ross. Implementing Database Operations Using SIMD Instructions. In SIGMOD, 2002.
Extending a C-like language for portable SIMD programming. PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.