Research Article · Open Access · Distinguished Paper

Getting to the point: index sets and parallelism-preserving autodiff for pointful array programming

Published: 19 August 2021

Abstract

We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operations, we argue for an explicit nested indexing style that mirrors application of functions to arguments. We also introduce a fine-grained typed effects system which affords concise and automatically-parallelized in-place updates. Specifically, an associative accumulation effect allows reverse-mode automatic differentiation of in-place updates in a way that preserves parallelism. Empirically, we benchmark against the Futhark array programming language, and demonstrate that aggressive inlining and type-driven compilation allows array programs to be written in an expressive, "pointful" style with little performance penalty.
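To make the abstract's central idea concrete, here is a minimal Python sketch (not the paper's own Dex syntax; the `Fin` and `Table` names are illustrative assumptions) of arrays as eagerly-memoized functions on typed index sets, where nested "pointful" indexing mirrors function application and currying-style manipulations such as transposition fall out of the function view:

```python
class Fin:
    """A typed index set with n elements: the indices 0 .. n-1."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return iter(range(self.n))

class Table:
    """An array viewed as an eagerly-memoized function on an index set."""
    def __init__(self, index_set, f):
        # Eager memoization: evaluate f at every index up front.
        self.values = {i: f(i) for i in index_set}
    def __getitem__(self, i):
        return self.values[i]

# A 3x4 "matrix" built in an explicit, pointful style.
m, n = Fin(3), Fin(4)
mat = Table(m, lambda i: Table(n, lambda j: i * 10 + j))

# Nested indexing mirrors application of a curried function: mat[i][j].
assert mat[1][2] == 12

# Transposition is just swapping the order of the index arguments.
transposed = Table(n, lambda j: Table(m, lambda i: mat[i][j]))
assert transposed[2][1] == 12
```

Because each `Table` is total over its statically known index set, every lookup is in bounds by construction, which is one way to read the paper's claim that typed index sets recover the safety of high-level functional languages without giving up array efficiency.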


Supplemental Material

3473593.mp4 — Auxiliary Presentation Video


