Abstract
Previous work has demonstrated that it is possible to generate efficient and highly parallel code for multicore CPUs and GPUs from combinator-based array languages for a range of applications. That work, however, has been limited to operating on flat, rectangular structures without any facilities for irregularity or nesting.
In this paper, we show that even a limited form of nesting provides substantial benefits both in terms of the expressiveness of the language (increasing modularity and providing support for simple irregular structures) and the portability of the code (increasing portability across resource-constrained devices, such as GPUs). Specifically, we generalise Blelloch's flattening transformation along two lines: (1) we explicitly distinguish between definitely regular and potentially irregular computations; and (2) we handle multidimensional arrays. We demonstrate the utility of this generalisation by an extension of the embedded array language Accelerate to include irregular streams of multidimensional arrays. We discuss code generation, optimisation, and irregular stream scheduling as well as a range of benchmarks on both multicore CPUs and GPUs.
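As a rough illustration of the segmented representation that underlies Blelloch-style flattening (a plain-Haskell sketch, independent of Accelerate; the names `Nested`, `flatten`, and `mapNested` are ours for exposition): a nested array is stored as a flat payload vector plus a segment descriptor of subarray lengths, so a nested map becomes a single flat traversal.

```haskell
-- A "flattened" nested array: segment lengths plus one flat payload.
-- This mirrors the representation used by flattening transformations
-- such as Blelloch's; the names here are illustrative only.
data Nested a = Nested { segments :: [Int], payload :: [a] }
  deriving Show

-- Convert a list of lists to the flat representation.
flatten :: [[a]] -> Nested a
flatten xss = Nested (map length xss) (concat xss)

-- Recover the nested structure from the flat representation.
unflatten :: Nested a -> [[a]]
unflatten (Nested [] _) = []
unflatten (Nested (n:ns) xs) =
  let (seg, rest) = splitAt n xs
  in seg : unflatten (Nested ns rest)

-- A nested map needs no nested loops: it is one flat traversal over
-- the payload, which is what makes this representation amenable to
-- SIMD and GPU execution.
mapNested :: (a -> b) -> Nested a -> Nested b
mapNested f (Nested segs xs) = Nested segs (map f xs)

main :: IO ()
main = print (unflatten (mapNested (+1) (flatten [[1,2],[],[3,4,5 :: Int]])))
-- prints [[2,3],[],[4,5,6]]
```

The paper's contribution goes further than this sketch, distinguishing definitely regular from potentially irregular computations and handling multidimensional arrays, but the segment-descriptor idea is the common starting point.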
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/
- Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. 2013. Data-Only Flattening for Nested Data Parallelism. In PPoPP'13: Principles and Practice of Parallel Programming. ACM, 81–92.
- Lars Bergstrom and John Reppy. 2012. Nested data-parallelism on the GPU. In ICFP: International Conference on Functional Programming. ACM.
- Guy E. Blelloch. 1995. NESL: A Nested Data-Parallel Language. Technical Report CMU-CS-95-170. Carnegie Mellon University.
- Guy E. Blelloch and Gary W. Sabot. 1988. Compiling collection-oriented languages onto massively parallel computers. In Symposium on the Frontiers of Massively Parallel Computation. IEEE, 575–585.
- Ian Buck. 2003. Brook Language Specification. (2003). http://merrimac.stanford.edu/brook
- Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. 2004. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH Papers. ACM, 777–786.
- Manuel M. T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell array codes with multicore GPUs. In DAMP: Declarative Aspects of Multicore Programming. ACM, 3–14.
- Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha. 1990. Scan primitives for vector computers. In Supercomputing. IEEE, 666–675.
- Koen Claessen, Mary Sheeran, and Bo Joel Svensson. 2012. Expressive array constructs in an embedded GPU kernel programming language. In DAMP: Declarative Aspects and Applications of Multicore Programming. ACM.
- Koen Claessen, Mary Sheeran, and Joel Svensson. 2008. Obsidian: GPU programming in Haskell. In IFL: Implementation and Application of Functional Languages.
- Duncan Coutts, Roman Leshchinskiy, and Don Stewart. 2007. Stream fusion from lists to streams to nothing at all. In ICFP: International Conference on Functional Programming. ACM.
- Daniel Palmer, Jan Prins, and Stephen Westfold. 1995. Work-Efficient Nested Data-Parallelism. In Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Processing (Frontiers '95). IEEE.
- Tim A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Software 38, 1 (2011), 1–25. http://www.cise.ufl.edu/research/sparse/matrices
- Conal Elliott. 2003. Functional Images. In The Fun of Programming. Palgrave.
- Conal Elliott. 2004. Programming Graphics Processors Functionally. In Haskell Workshop. ACM.
- Matthew Fluet, Nic Ford, Mike Rainey, John Reppy, Adam Shaw, and Yingqi Xiao. 2007. Status Report: The Manticore Project. In ML'07: Workshop on ML. ACM, 15–24.
- Leo J. Guibas and Douglas K. Wyatt. 1978. Compilation and Delayed Evaluation in APL. In POPL '78: Principles of Programming Languages. 1–8.
- Amir H. Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge, and Scott Mahlke. 2011. Sponge: Portable Stream Programming on Graphics Engines. SIGARCH: Computer Architecture News 39, 1 (March 2011), 381–392.
- Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Ben Lippmeier, and Simon Peyton Jones. 2012. Vectorisation avoidance. In ACM SIGPLAN Notices, Vol. 47. ACM, 37–48.
- Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon L. Peyton Jones, and Ben Lippmeier. 2010. Regular, Shape-polymorphic, Parallel Arrays in Haskell. In ICFP: International Conference on Functional Programming. ACM, 261–272.
- Bradford Larsen. 2011. Simple optimizations for an applicative array language for graphics processors. In DAMP: Declarative Aspects of Multicore Programming.
- Ben Lippmeier, Manuel Chakravarty, Gabriele Keller, and Simon Peyton Jones. 2012. Guiding parallel array fusion with indexed types. In Haskell Symposium. ACM, 25–36.
- Ben Lippmeier, Manuel M. T. Chakravarty, Gabriele Keller, Roman Leshchinskiy, and Simon Peyton Jones. 2012. Work Efficient Higher-Order Vectorisation. In ICFP'12: International Conference on Functional Programming. ACM, 259–270.
- Frederik M. Madsen, Robert Clifton-Everest, Manuel M. T. Chakravarty, and Gabriele Keller. 2015. Functional Array Streams. In FHPC'15: Workshop on Functional High-Performance Computing. ACM, 23–34.
- Frederik M. Madsen and Andrzej Filinski. 2013. Towards a Streaming Model for Nested Data Parallelism. In FHPC'13: Workshop on Functional High-performance Computing. ACM, 13–24.
- Geoffrey Mainland and Greg Morrisett. 2010. Nikola: Embedding Compiled GPU Functions in Haskell. In Haskell Symposium. ACM.
- Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, and Ryan R. Newton. 2015. Type-safe Runtime Code Generation: Accelerate to LLVM. In Haskell '15: Symposium on Haskell. ACM, 201–212.
- Trevor L. McDonell, Manuel M. T. Chakravarty, Gabriele Keller, and Ben Lippmeier. 2013. Optimising Purely Functional GPU Programs. In ICFP: International Conference on Functional Programming. 49–60.
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank citation ranking: Bringing order to the web. (1999).
- Daniel W. Palmer, Jan F. Prins, Siddhartha Chatterjee, and Rickard E. Faith. 1996. Piecewise execution of nested data-parallel programs. In Languages and Compilers for Parallel Computing. Springer Heidelberg, 346–361.
- Simon Peyton Jones, Roman Leshchinskiy, Gabriele Keller, and Manuel M. T. Chakravarty. 2008. Harnessing the Multicores: Nested Data Parallelism in Haskell. In Foundations of Software Technology and Theoretical Computer Science.
- Simon Peyton Jones, Dimitrios Vytiniotis, Stephanie Weirich, and Geoffrey Washburn. 2006. Simple unification-based type inference for GADTs. In ICFP'06: International Conference on Functional Programming. 50–61.
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In PLDI'13: Programming Language Design and Implementation. ACM.
- Ronald Rivest. 1992. The MD5 message-digest algorithm. (1992).
- Tiark Rompf, Arvind K. Sujeeth, Nada Amin, Kevin J. Brown, Vojin Jovanovic, HyoukJoong Lee, Martin Odersky, and Kunle Olukotun. 2013. Optimizing Data Structures in High-Level Programs: New Directions for Extensible Compilers based on Staging. In POPL'13: Principles of Programming Languages. ACM.
- Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens. 2007. Scan primitives for GPU computing. In Symposium on Graphics Hardware. Eurographics Association, 97–106.
- William Thies, Michal Karczmarek, and Saman Amarasinghe. 2002. StreamIt: A language for streaming applications. In Compiler Construction. Springer.
- Philip Wadler. 1990. Deforestation: Transforming programs to eliminate trees. Theoretical Computer Science 73, 2 (June 1990), 231–248.
- Yongpeng Zhang and F. Mueller. 2012. CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures. In ICPP '12: International Conference on Parallel Processing. 340–349.
Streaming irregular arrays. In Haskell 2017: Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell.