Abstract
Existing pattern-based compiler technology is unable to effectively exploit the full potential of SIMD architectures. We present a new program-synthesis-based technique for auto-vectorizing performance-critical innermost loops. Our synthesis technique is applicable to a wide range of loops, consistently produces performant SIMD code, and generates correctness proofs for the output code. The technique, which leverages existing work on relational verification methods, is a novel combination of deductive loop restructuring, synthesis condition generation, and a new inductive synthesis algorithm for producing loop-free code fragments. The inductive synthesis algorithm wraps an optimized depth-first exploration of code sequences inside a counterexample-guided inductive synthesis (CEGIS) loop. Our technique quickly produces SIMD implementations (up to 9 instructions in 0.12 seconds) for a wide range of fundamental looping structures. The resulting SIMD implementations outperform the original loops by 2.0x to 3.7x.
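The idea of wrapping a depth-first exploration of code sequences inside a CEGIS loop can be illustrated with a toy sketch. This is not the paper's algorithm. The four-operation instruction set, the 8-bit scalar domain, the exhaustive checker standing in for a solver-based verifier, and every name below (`OPS`, `run`, `dfs`, `verify`, `cegis`) are assumptions made only for illustration.

```python
MASK = 0xFF  # assumed 8-bit domain, small enough to verify exhaustively

# Hypothetical instruction set: each op maps one 8-bit value to another.
OPS = {
    "inc": lambda x: (x + 1) & MASK,
    "dbl": lambda x: (x << 1) & MASK,
    "neg": lambda x: (-x) & MASK,
    "not": lambda x: (~x) & MASK,
}


def run(prog, x):
    """Execute a loop-free instruction sequence on input x."""
    for name in prog:
        x = OPS[name](x)
    return x


def dfs(spec, examples, max_len, prefix=()):
    """Depth-first search for a sequence matching spec on every example."""
    if all(run(prefix, x) == spec(x) for x in examples):
        return prefix
    if len(prefix) == max_len:
        return None
    for name in OPS:
        found = dfs(spec, examples, max_len, prefix + (name,))
        if found is not None:
            return found
    return None


def verify(spec, prog):
    """Return a counterexample input, or None if prog matches spec everywhere."""
    for x in range(MASK + 1):
        if run(prog, x) != spec(x):
            return x
    return None


def cegis(spec, max_len=4):
    """CEGIS loop: propose from examples, verify, learn from counterexamples."""
    examples = [0]
    while True:
        candidate = None
        for limit in range(max_len + 1):  # try shorter sequences first
            candidate = dfs(spec, examples, limit)
            if candidate is not None:
                break
        if candidate is None:
            return None  # nothing within max_len fits the examples
        counterexample = verify(spec, candidate)
        if counterexample is None:
            return candidate
        examples.append(counterexample)


def spec(x):
    """Reference behaviour the synthesized sequence must reproduce."""
    return (2 * x + 2) & MASK


if __name__ == "__main__":
    print(cegis(spec))
```

For this specification the loop first proposes a sequence that only fits the single starting example, the checker rejects it with a counterexample, and the second round returns a two-instruction sequence that matches on all 256 inputs. A real synthesizer would replace the exhaustive check with a solver query and search over SIMD instructions rather than scalar ones.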
From relational verification to SIMD loop synthesis
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming