Streaming irregular arrays

Published: 07 September 2017

Abstract

Previous work has demonstrated that it is possible to generate efficient and highly parallel code for multicore CPUs and GPUs from combinator-based array languages for a range of applications. That work, however, has been limited to operating on flat, rectangular structures without any facilities for irregularity or nesting.

In this paper, we show that even a limited form of nesting provides substantial benefits both in terms of the expressiveness of the language (increasing modularity and providing support for simple irregular structures) and the portability of the code (increasing portability across resource-constrained devices, such as GPUs). Specifically, we generalise Blelloch's flattening transformation along two lines: (1) we explicitly distinguish between definitely regular and potentially irregular computations; and (2) we handle multidimensional arrays. We demonstrate the utility of this generalisation by an extension of the embedded array language Accelerate to include irregular streams of multidimensional arrays. We discuss code generation, optimisation, and irregular stream scheduling as well as a range of benchmarks on both multicore CPUs and GPUs.
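
To make the idea of flattening concrete, the following is a minimal sketch in plain Python. It is not Accelerate code and not the paper's implementation. It assumes the common textbook encoding in which a nested, irregular array is stored as a vector of segment lengths plus one flat vector of values. The names `flatten`, `segmented_sum`, and `is_regular` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def flatten(nested):
    """Encode variable-length rows as (segment_lengths, flat_values)."""
    lengths = [len(row) for row in nested]
    values = [x for row in nested for x in row]
    return lengths, values


def segmented_sum(lengths, values):
    """Sum each segment of the flat vector without rebuilding the rows."""
    # offsets[i] is where segment i starts; the last entry is the total length.
    offsets = [0] + list(accumulate(lengths))
    return [sum(values[offsets[i]:offsets[i + 1]]) for i in range(len(lengths))]


def is_regular(lengths):
    """True when every segment has the same length (a rectangular array)."""
    return len(set(lengths)) <= 1


rows = [[1, 2, 3], [], [4, 5]]      # an irregular nested array
lengths, values = flatten(rows)
sums = segmented_sum(lengths, values)
```

In this encoding a regular computation is one where all segment lengths coincide, so a single shape can stand in for the whole length vector. That is one way to picture why distinguishing definitely regular from potentially irregular computations can matter.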
