Abstract
This paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and leverages counter-example guided inductive synthesis (CEGIS) to find provably correct translations. Lifting existing code to a high-performance description language has a number of benefits, including maintainability and performance portability. For example, our experiments show that the lifted summaries can enable domain specific compilers to do a better job of parallelization as compared to an off-the-shelf compiler working on the original code, and can even support fully automatic migration to hardware accelerators such as GPUs. We have implemented verified lifting in a system called STNG and have evaluated it using microbenchmarks, mini-apps, and real-world applications. We demonstrate the benefits of verified lifting by first automatically summarizing Fortran source code into a high-level predicate language, and subsequently translating the lifted summaries into Halide, with the translated code achieving median performance speedups of 4.1X and up to 24X for non-trivial stencils as compared to the original implementation.
- Apache Hive. http://hive.apache.org.Google Scholar
- Syntax-Guided Synthesis Competition. http://www.sygus. org.Google Scholar
- Rajeev Alur, Rastislav Bod´ık, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–17, 2013.Google Scholar
- Saman P Amarasinghe, Jennifer-Ann M Anderson, Monica S Lam, and Chau-Wen Tseng. An overview of the SUIF compiler for scalable parallel machines. In PPSC, pages 662–667, 1995. Google Scholar
Digital Library
- Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O’Reilly, and Saman Amarasinghe. OpenTuner: An extensible framework for program autotuning. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pages 303–316, New York, NY, USA, 2014. ACM. Google Scholar
Digital Library
- D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS parallel benchmarks— summary and preliminary results. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, Supercomputing ’91, pages 158–165, New York, NY, USA, 1991. ACM. Google Scholar
Digital Library
- Gilles Barthe, Juan Manuel Crespo, Sumit Gulwani, Cesar Kunz, and Mark Marron. From relational verification to SIMD loop synthesis. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pages 123–134, 2013. Google Scholar
Digital Library
- John R. Baumgardner. Three-dimensional treatment of convective flow in the earth’s mantle. Journal of Statistical Physics, 39(5-6):501–511, 1985.Google Scholar
Cross Ref
- Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, and Jim Demmel. Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology. In Proceedings of the 11th International Conference on Supercomputing, ICS ’97, pages 340–347, New York, NY, USA, 1997. ACM. Google Scholar
Digital Library
- Aaron R. Bradley. SAT-based model checking without unrolling. In Proceedings of the 12th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI’11, pages 70–87, 2011. Google Scholar
Digital Library
- Aaron R. Bradley, Zohar Manna, and Henny B. Sipma. What’s decidable about arrays? In Proceedings of the 7th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI’06, pages 427–442, Berlin, Heidelberg, 2006. Springer-Verlag. Google Scholar
Digital Library
- Peter E Bulychev, Egor V Kostylev, and Vladimir A Zakharov. Anti-unification algorithms and their applications in program analysis. In Perspectives of Systems Informatics, pages 413– 423. Springer, 2010. Google Scholar
Digital Library
- Bryan Catanzaro, Shoaib Kamil, Yunsup Lee, Krste Asanovic, James Demmel, Kurt Keutzer, John Shalf, Kathy Yelick, and Armando Fox. SEJITS: Getting productivity and performance with selective embedded JIT specialization. In Workshop on Programmable Models for Emerging Architecture (PMEA), 2009.Google Scholar
- B.L. Chamberlain, D. Callahan, and H.P. Zima. Parallel programmability and the Chapel language. International Journal High Performance Computing Applications, 21(3):291– 312, 2007. Google Scholar
Digital Library
- Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. Optimizing database-backed applications with query synthesis. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, pages 3–14, New York, NY, USA, 2013. ACM. Google Scholar
Digital Library
- M. Christen, O. Schenk, and H. Burkhart. PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In International Parallel Distributed Processing Symposium (IPDPS), pages 676–687, May 2011. Google Scholar
Digital Library
- Alessandro Cimatti and Alberto Griggio. Software model checking via IC3. In Proceedings of the 24th International Conference on Computer Aided Verification, CAV’12, pages 277–293, 2012. Google Scholar
Digital Library
- Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf, and Katherine Yelick. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC ’08, pages 4:1–4:12, Piscataway, NJ, USA, 2008. IEEE Press. Google Scholar
Digital Library
- Kei Davis and Daniel J. Quinlan. ROSE: An optimizing transformation system for C++ array-class libraries. In Workshop on Object-Oriented Technology, ECOOP ’98, pages 452–453, London, UK, UK, 1998. Springer-Verlag. Google Scholar
Digital Library
- Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. Google Scholar
Digital Library
- Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. Dynamically discovering likely program invariants to support program evolution. In Proceedings of the 21st International Conference on Software Engineering, ICSE ’99, pages 213–224, New York, NY, USA, 1999. ACM. Google Scholar
Digital Library
- Paul Feautrier. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1):23–53, 1991.Google Scholar
Digital Library
- Cormac Flanagan and Shaz Qadeer. Predicate abstraction for software verification. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’02, pages 191–202, New York, NY, USA, 2002. ACM. Google Scholar
Digital Library
- Matteo Frigo and Volker Strumpen. Cache oblivious stencil computations. In Proceedings of the 19th Annual International Conference on Supercomputing, ICS ’05, pages 361–366, New York, NY, USA, 2005. ACM. Google Scholar
Digital Library
- Pranav Garg, Christof Löding, P. Madhusudan, and Daniel Neider. ICE: A robust framework for learning invariants. In Armin Biere and Roderick Bloem, editors, Computer Aided Verification, volume 8559 of Lecture Notes in Computer Science, pages 69–87. Springer International Publishing, 2014. Google Scholar
Digital Library
- Tobias Grosser, Albert Cohen, Justin Holewinski, P. Sadayappan, and Sven Verdoolaege. Hybrid hexagonal/classical tiling for gpus. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pages 66:66–66:75, New York, NY, USA, 2014. ACM. Google Scholar
Digital Library
- Tobias Grosser, Sven Verdoolaege, Albert Cohen, and P. Sadayappan. The relation between diamond tiling and hexagonal tiling. Parallel Processing Letters, 24(03):1441002, 2014.Google Scholar
Cross Ref
- Michael A. Heroux, Douglas W. Doerfler, Paul S. Crozier, James M. Willenbring, H. Carter Edwards, Alan Williams, Mahesh Rajan, Eric R. Keiter, Heidi K. Thornquist, and Robert W. Numrich. Improving performance via mini-applications. Technical report, Sandia National Laboratory, 2009.Google Scholar
- C. A. R. Hoare. An axiomatic basis for computer programming. Communications of the ACM, 12(10):576–580, October 1969. Google Scholar
Digital Library
- Intel. The X10 parallel programming language. http:// x10-lang.org.Google Scholar
- Francois Irigoin and Remi Triolet. Supernode partitioning. In Symposium on Principles of Programming Languages (POPL’88), pages 319–328, San Diego, CA, January 1988. Google Scholar
Digital Library
- Ming-Yee Iu and Willy Zwaenepoel. HadoopToSQL: A MapReduce query optimizer. In Proceedings of the 5th European Conference on Computer Systems, EuroSys ’10, pages 251–264, New York, NY, USA, 2010. ACM. Google Scholar
Digital Library
- Jinseong Jeon, Xiaokang Qiu, Armando Solar-Lezama, and Jeffrey S. Foster. Adaptive concretization for parallel program synthesis. In Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part II, pages 377–394, 2015.Google Scholar
- Rajeev Joshi, Greg Nelson, and Keith Randall. Denali: A goal-directed superoptimizer. pages 304–314, 2002. Google Scholar
Digital Library
- David Joyner, Ondˇrej ˇ Cert´ık, Aaron Meurer, and Brian E. Granger. Open source computer algebra systems: SymPy. ACM Communications in Computer Algebra, 45(3/4):225–234, January 2012. Google Scholar
Digital Library
- Shoaib Kamil. A cross-domain stencil benchmark suite. In Workshop on Optimizing Stencil Computations, 2013.Google Scholar
- Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, and Samuel Williams. An auto-tuning framework for parallel multicore stencil computations. In International Parallel and Distributed Processing Symposium (IPDPS), pages 1–12, 2010.Google Scholar
Cross Ref
- Aleksandr Karbyshev, Nikolaj Bjørner, Shachar Itzhaky, Noam Rinetzky, and Sharon Shoham. Property-directed inference of universal invariants or proving their absence. In Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I, pages 583–602, 2015.Google Scholar
- Martin Kong, Richard Veras, Kevin Stock, Franz Franchetti, Louis-No¨el Pouchet, and P. Sadayappan. When polyhedral transformations meet SIMD code generation. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, pages 127– 138, New York, NY, USA, 2013. ACM. Google Scholar
Digital Library
- A.C. Mallinson, D.A. Beckingsale, W.P. Gaudin, J.A. Herdman, J.M. Levesque, and S.A. Jarvis. Cloverleaf: Preparing hydrodynamics codes for exascale. In Cray User Group, 2013.Google Scholar
- Henry Massalin. Superoptimizer: A look at the smallest program. In Proceedings of the Second International Conference on Architectual Support for Programming Languages and Operating Systems, ASPLOS II, pages 122–126, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press. Google Scholar
Digital Library
- John Matthews, J. Strother Moore, Sandip Ray, and Daron Vroon. Verification condition generation via theorem proving. In Proceedings of the 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning, LPAR’06, pages 362–376, 2006. Google Scholar
Digital Library
- Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. Helium: Lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. In ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, June 2015. Google Scholar
Digital Library
- Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. PolyMage: Automatic optimization for image processing pipelines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pages 429–443, New York, NY, USA, 2015. ACM. Google Scholar
Digital Library
- Phitchaya Mangpo Phothilimthana, Tikhon Jelvis, Rohin Shah, Nishant Totla, Sarah Chasins, and Rastislav Bodik. Chlorophyll: Synthesis-aided compiler for low-power spatial architectures. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 396–407, New York, NY, USA, 2014. ACM. Google Scholar
Digital Library
- Gordon D Plotkin. A note on inductive generalization. Machine Intelligence, 5(1):153–163, 1970.Google Scholar
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, 2013. Google Scholar
Digital Library
- John C. Reynolds. Transformational systems and the algebraic structure of atomic formulas. In Bernard Meltzer and Donald Michie, editors, Machine Intelligence 5, pages 135–151. Edinburgh University Press, Edinburgh, Scotland, 1969.Google Scholar
- Gabriel Rivera and Chau-Wen Tseng. Tiling optimizations for 3D scientific computations. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing, SC ’00, Washington, DC, USA, 2000. IEEE Computer Society. Google Scholar
Digital Library
- Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic superoptimization. SIGPLAN Notices, 48(4):305–316, March 2013. Google Scholar
Digital Library
- Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic optimization of floating-point programs with tunable precision. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, page 9, 2014. Google Scholar
Digital Library
- Armando Solar Lezama. Program Synthesis By Sketching. PhD thesis, EECS Department, University of California, Berkeley, Dec 2008.Google Scholar
- Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodk, Vijay A. Saraswat, and Sanjit A. Seshia. Sketching stencils. In Jeanne Ferrante and Kathryn S. McKinley, editors, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 167–178. ACM, 2007. Google Scholar
Digital Library
- Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. Combinatorial sketching for finite programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pages 404–415, New York, NY, USA, 2006. ACM. Google Scholar
Digital Library
- Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. Delite: A compiler architecture for performance-oriented embedded domain-specific languages. ACM Transactions in Embedded Computer Systems, 13(4s):134:1–134:25, April 2014. Google Scholar
Digital Library
- Yuan Tang, Rezaul Chowdhury, Bradley C. Kuszmaul, Chi keung Luk, and Charles E. Leiserson. The Pochoir stencil compiler. In ACM Symposium on Parallelism in Algorithms and Architectures, (SPAA), 2011. Google Scholar
Digital Library
- Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. TRANSIT: specifying protocols with concolic snippets. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, Seattle, WA, USA, June 16-19, 2013, pages 287–296, 2013. Google Scholar
Digital Library
- Diego Vasco, Nelson Moraga, and Gundolf Haase. Parallel finite volume method simulation of three-dimensional fluid flow and convective heat transfer for viscoplastic non-newtonian fluids. J. Numerical Heat Transfer, Part A: Applications, 66(2):990–1019, 2014.Google Scholar
Cross Ref
- Diego Vasco, Nelson Moraga, Aurel Neic, and Gundolf Haase. OpenMP parallel acceleration of a 3D finite volume method based code for simulation of non-Newtonian fluid flows. In Sergio Gutiérrez, Daniel Hurtado, and Esteban Sáez, editors, Cuadernos de Mecánica Computacional, pages 108–117, Concepci´on, Chile, 2011. Sociedad Chilena de Mecánica Computacional.Google Scholar
- Ben Wiedermann, Ali Ibrahim, and William R. Cook. Interprocedural query extraction for transparent persistence. In Proceedings of the 23rd ACM SIGPLAN Conference on Objectoriented Programming Systems Languages and Applications, pages 19–36, 2008. Google Scholar
Digital Library
- Glynn Winskel. The Formal Semantics of Programming Languages: An Introduction. MIT Press, Cambridge, MA, USA, 1993. Google Scholar
Digital Library
- Christoph M. Wintersteiger, Youssef Hamadi, and Leonardo Mendonc¸a de Moura. Efficiently solving quantified bit-vector formulas. In Proceedings of 10th International Conference on Formal Methods in Computer-Aided Design, FMCAD 2010, Lugano, Switzerland, October 20-23, pages 239–246, 2010. Google Scholar
Digital Library
- David Wonnacott. Achieving scalable locality with time skewing. International Journal of Parallel Programming, 30(3):181–221, 2002. Google Scholar
Digital Library
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, 2012. Google Scholar
Digital Library
- Fubo Zhang and Erik H. D’Hollander. Using hammock graphs to structure programs. IEEE Transactions on Software Engineering, 30(4):231–245, 2004. Google Scholar
Digital Library
Index Terms
Verified lifting of stencil computations
Recommendations
Verified lifting of stencil computations
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and ImplementationThis paper demonstrates a novel combination of program synthesis and verification to lift stencil computations from low-level Fortran code to a high-level summary expressed using a predicate language. The technique is sound and mostly automated, and ...
Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift
Stencil computations are a widely used type of algorithm, found in applications from physical simulations to machine learning. Stencils are embarrassingly parallel, therefore fit on modern hardware such as Graphic Processing Units perfectly. Although ...







Comments