ABSTRACT
The performance of stencil computations can be significantly improved through smart implementations that improve memory locality, increase computation reuse, or parallelize the computation. Unfortunately, efficient implementations are hard to obtain because they often involve non-traditional transformations, which means that they cannot be produced by optimizing the reference stencil with a compiler. In fact, many stencils are produced by code generators that were tediously handcrafted.
In this paper, we show how stencil implementations can be produced with sketching. Sketching is a software synthesis approach where the programmer develops a partial implementation--a sketch--and a separate specification of the desired functionality given by a reference (unoptimized) stencil. The synthesizer then completes the sketch to behave like the specification, filling in code fragments that are difficult to develop manually.
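As a rough illustration of the sketch/specification split described above, the following toy example uses a three-point 1D stencil as the specification and a sliding-window implementation with two unknown index offsets as the sketch. All names here (`spec`, `sketch`, `synthesize`, the hole range) are hypothetical and chosen for illustration only. The brute-force search over small bounded inputs merely stands in for a real synthesizer; it is not the algorithm used by any actual sketching system.

```python
from itertools import product


def spec(a):
    """Reference (unoptimized) stencil: out[i] = a[i-1] + a[i] + a[i+1]."""
    return [a[i - 1] + a[i] + a[i + 1] for i in range(1, len(a) - 1)]


def sketch(a, holes):
    """Partial implementation that reuses the previous window sum.

    `holes` is a pair of index offsets (hypothetical unknowns) that the
    programmer leaves for the synthesizer to fill in.
    """
    add_off, sub_off = holes
    out = []
    if len(a) < 3:
        return out
    s = a[0] + a[1] + a[2]          # first window, computed directly
    out.append(s)
    for i in range(2, len(a) - 1):  # slide the window one cell at a time
        s = s + a[i + add_off] - a[i + sub_off]
        out.append(s)
    return out


def bounded_inputs(max_len=5, values=(0, 1, 2)):
    """All arrays of length 3..max_len over a small value set."""
    for n in range(3, max_len + 1):
        for a in product(values, repeat=n):
            yield list(a)


def synthesize(hole_range=range(-2, 3)):
    """Return the first hole assignment under which the sketch matches the
    specification on every bounded input, or None if there is none."""
    tests = list(bounded_inputs())
    for holes in product(hole_range, repeat=2):
        try:
            if all(sketch(a, holes) == spec(a) for a in tests):
                return holes
        except IndexError:
            continue  # this candidate reads past the end of the array
    return None


if __name__ == "__main__":
    print(synthesize())
```

Under these assumptions, the search fills the holes so that each new window sum adds the entering cell and subtracts the leaving one. Note that the check only covers arrays up to a fixed length, which is exactly the bounded-input limitation that makes unbounded stencils harder than small finite programs.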
Existing sketching systems work only for small finite programs, i.e., programs that can be represented as small Boolean circuits. In this paper, we develop a sketching synthesizer that works for stencil computations, a large class of programs that, unlike circuits, have unbounded inputs and outputs, as well as an unbounded number of computations. The key contribution is a reduction algorithm that turns a stencil into a circuit, allowing us to synthesize stencils using an existing sketching synthesizer.
Sketching stencils. Proceedings of the 2007 PLDI conference.