research-article

From relational verification to SIMD loop synthesis

Published: 23 February 2013

Abstract

Existing pattern-based compiler technology is unable to effectively exploit the full potential of SIMD architectures. We present a new program-synthesis-based technique for auto-vectorizing performance-critical innermost loops. Our synthesis technique is applicable to a wide range of loops, consistently produces performant SIMD code, and generates correctness proofs for the output code. The technique, which leverages existing work on relational verification methods, is a novel combination of three parts: deductive loop restructuring, synthesis condition generation, and a new inductive synthesis algorithm for producing loop-free code fragments. The inductive synthesis algorithm wraps an optimized depth-first exploration of code sequences inside a counterexample-guided inductive synthesis (CEGIS) loop. Our technique quickly produces SIMD implementations (up to 9 instructions in 0.12 seconds) for a wide range of fundamental looping structures. The resulting SIMD implementations outperform the original loops by 2.0x to 3.7x.
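The idea of wrapping a depth-first exploration of code sequences inside a CEGIS loop can be illustrated with a small sketch. This is not the paper's algorithm or instruction set: it assumes a hypothetical four-operation, single-register, 8-bit machine (`OPS`), and it replaces the verifier with an exhaustive check over all 256 inputs. The names `run`, `verify`, `find_candidate`, and `cegis` are invented for illustration.

```python
MASK = 0xFF  # toy 8-bit domain, so an exhaustive check can stand in for a verifier

# Hypothetical single-register instruction set (an assumption, not a SIMD ISA).
OPS = {
    "inc": lambda x: (x + 1) & MASK,
    "dbl": lambda x: (x << 1) & MASK,
    "neg": lambda x: (-x) & MASK,
    "not": lambda x: (~x) & MASK,
}


def run(prog, x):
    """Execute a loop-free instruction sequence on input x."""
    for name in prog:
        x = OPS[name](x)
    return x


def verify(spec, prog):
    """Return an input where prog and spec disagree, or None if there is none."""
    for x in range(MASK + 1):
        if run(prog, x) != spec(x):
            return x
    return None


def find_candidate(spec, examples, max_len, prefix=()):
    """Depth-first search for a sequence that agrees with spec on the examples."""
    if all(run(prefix, x) == spec(x) for x in examples):
        return prefix
    if len(prefix) < max_len:
        for name in OPS:
            found = find_candidate(spec, examples, max_len, prefix + (name,))
            if found is not None:
                return found
    return None


def cegis(spec, max_len=3):
    """CEGIS loop: propose from examples, verify, learn from counterexamples."""
    examples = [0]
    while True:
        candidate = find_candidate(spec, examples, max_len)
        if candidate is None:
            return None  # no sequence of length <= max_len fits the examples
        counterexample = verify(spec, candidate)
        if counterexample is None:
            return candidate  # agrees with spec on every 8-bit input
        examples.append(counterexample)


if __name__ == "__main__":
    target = lambda x: (2 * x + 2) & MASK
    print(cegis(target))
```

Each counterexample rules out the candidate that produced it, so the example set grows by one new input per iteration and the loop terminates on this finite search space. A realistic implementation would presumably replace the exhaustive check with a solver query and prune the depth-first search, which this sketch does not attempt.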



Published in

ACM SIGPLAN Notices, Volume 48, Issue 8 (PPoPP '13)
August 2013
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2517327

PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2013
332 pages
ISBN: 9781450319225
DOI: 10.1145/2442516

Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
