Abstract
SIMD (single-instruction multiple-data) instruction set extensions are now common in both high-performance and embedded microprocessors, and they enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance gains are possible when SLP is exploited, placing SIMD instructions in application code manually can be very difficult and error prone. In this paper, we propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, whose primary goals are to increase SIMD parallelism and, more importantly, to capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory such that the cost of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.
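To make the terms "statement grouping" and "superword reuse" concrete, the following is a minimal toy sketch and not the algorithm proposed in the paper. It assumes a hypothetical statement form `dst[i] = lhs[i] op rhs[i]`, an assumed superword width of four elements, and a simple greedy pass over consecutive statements; it also ignores data dependences, which a real compiler would have to check. The names `Stmt`, `LANES`, `group_statements`, and `superwords` are illustrative only.

```python
from collections import namedtuple

# Hypothetical statement form: dst = lhs op rhs, where each operand
# is an (array_name, index) pair. Not taken from the paper.
Stmt = namedtuple("Stmt", "dst op lhs rhs")

LANES = 4  # assumed superword width, in elements


def isomorphic_adjacent(s, t):
    """True if t applies the same operator as s to the next element of each array."""
    if s.op != t.op:
        return False
    pairs = zip((s.dst, s.lhs, s.rhs), (t.dst, t.lhs, t.rhs))
    return all(a[0] == b[0] and b[1] == a[1] + 1 for a, b in pairs)


def group_statements(stmts, lanes=LANES):
    """Greedily pack runs of isomorphic, memory-adjacent statements.

    Toy illustration only: it looks at consecutive statements and does
    no dependence analysis or global reuse analysis.
    """
    groups, run = [], []
    for s in stmts:
        if run and len(run) < lanes and isomorphic_adjacent(run[-1], s):
            run.append(s)
        else:
            if run:
                groups.append(run)
            run = [s]
    if run:
        groups.append(run)
    return groups


def superwords(group):
    """Superwords a group touches, each named by (array, starting index)."""
    first = group[0]
    return {first.dst, first.lhs, first.rhs}


# a[i] = b[i] + c[i] and d[i] = a[i] * c[i] for i = 0..3
stmts = [Stmt(("a", i), "+", ("b", i), ("c", i)) for i in range(4)]
stmts += [Stmt(("d", i), "*", ("a", i), ("c", i)) for i in range(4)]

groups = group_statements(stmts)
reused = superwords(groups[0]) & superwords(groups[1])
print(len(groups), sorted(reused))
```

In this example the eight scalar statements pack into two superword statements, and the superwords starting at `a[0]` and `c[0]` appear in both. Values shared in this way are what the abstract calls superword reuse: they could stay in SIMD registers between the two statements instead of being reloaded or repacked.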
A compiler framework for extracting superword level parallelism
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation