Abstract

Stencil convolution is a fundamental building block of many scientific and image processing algorithms. We present a declarative approach to writing such convolutions in Haskell that is both efficient at runtime and implicitly parallel. To achieve this we extend our prior work on the Repa array library with two new features: partitioned and cursored arrays. Combined with careful management of the interaction between GHC and its back-end code generator LLVM, we achieve performance comparable to the standard OpenCV library.
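As a minimal illustration of what a stencil convolution computes, the sketch below applies a 3x3 Laplace stencil to a small image using ordinary Haskell lists. It is not the Repa API and says nothing about partitioned or cursored arrays. The names `Image`, `Stencil`, `pixelAt`, and `convolve` are hypothetical, pixels outside the image are assumed to be zero, and the stencil is applied without flipping, which coincides with convolution for symmetric stencils such as this one.

```haskell
-- Plain-list sketch of 2D stencil application; NOT the Repa API.
-- All names here are illustrative assumptions.

type Image   = [[Double]]
type Stencil = [[Double]]   -- assumed square with an odd side length

-- Read a pixel, treating everything outside the image as zero.
pixelAt :: Image -> Int -> Int -> Double
pixelAt img r c
  | r < 0 || r >= length img         = 0
  | c < 0 || c >= length (img !! r)  = 0
  | otherwise                        = img !! r !! c

-- Centre the stencil on every pixel and sum the weighted neighbours.
convolve :: Stencil -> Image -> Image
convolve st img =
  [ [ sum [ w * pixelAt img (r + dr) (c + dc)
          | (dr, row) <- zip [-k ..] st
          , (dc, w)   <- zip [-k ..] row ]
    | c <- [0 .. width - 1] ]
  | r <- [0 .. height - 1] ]
  where
    k      = length st `div` 2
    height = length img
    width  = case img of
               []        -> 0
               (row : _) -> length row

main :: IO ()
main = mapM_ print (convolve laplace image)
  where
    laplace = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
    image   = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

List indexing with `!!` makes this sketch slow. It is meant only to show the arithmetic, whereas an efficient version would use unboxed arrays and handle border pixels separately from interior ones.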
Published as "Efficient parallel stencil convolution in Haskell" in Haskell '11: Proceedings of the 4th ACM Symposium on Haskell.