Abstract
Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.
Supplemental Material
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/ Software available from tensorflow.org.Google Scholar
- Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2019. Provably and Practically Efficient Granularity Control. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). ACM, New York, NY, USA. 214–228. isbn:978-1-4503-6225-2 http://mike-rainey.site/papers/oracle-ppop19-long.pdfGoogle Scholar
Digital Library
- Umut A Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018. Heartbeat Scheduling: Provable Efficiency for Nested Parallelism. http://mike-rainey.site/papers/heartbeat.pdfGoogle Scholar
- Todd A. Anderson, Hai Liu, Lindsey Kuper, Ehsan Totoni, Jan Vitek, and Tatiana Shpeisman. 2017. Parallelizing Julia with a Non-Invasive DSL. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 4:1–4:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.4 Google Scholar
Cross Ref
- Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst., 11 (1989), October, 598–632. issn:0164-0925 https://doi.org/10.1145/69558.69562 Google Scholar
Digital Library
- Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. 2013. Data-Only Flattening for Nested Data Parallelism. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). Association for Computing Machinery, New York, NY, USA. 81–92. isbn:9781450319225 https://doi.org/10.1145/2442516.2442525 Google Scholar
Digital Library
- Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2017. Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang., 2, POPL (2017), Article 5, Dec., 29 pages. https://doi.org/10.1145/3158093 Google Scholar
Digital Library
- Guy E. Blelloch. 1992. NESL: A Nested Data-Parallel Language. USA.Google Scholar
Digital Library
- Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’95). Association for Computing Machinery, New York, NY, USA. 207–216. isbn:0897917006 https://doi.org/10.1145/209936.209958 Google Scholar
Digital Library
- Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell Array Codes with Multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming (DAMP ’11). Association for Computing Machinery, New York, NY, USA. 3–14. isbn:9781450304863 https://doi.org/10.1145/1926354.1926358 Google Scholar
Digital Library
- Adam Chlipala. 2015. An Optimizing Compiler for a Purely Functional Web-application Language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 10–21. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784741 Google Scholar
Digital Library
- Byn Choi, Rakesh Komuravelli, Victor Lu, Hyojin Sung, Robert L. Bocchino, Sarita V. Adve, and John C. Hart. 2010. Parallel SAH K-D Tree Construction. In Proceedings of the Conference on High Performance Graphics (HPG ’10). Eurographics Association, Goslar, DEU. 77–86.Google Scholar
- Matthias Felleisen, Robert Bruce Findler, and Matthew Flatt. 2009. Semantics engineering with PLT Redex. Mit Press.Google Scholar
Digital Library
- Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw., 3, 3 (1977), Sept., 209–226. issn:0098-3500 https://doi.org/10.1145/355744.355745 Google Scholar
Digital Library
- Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. 2013. General transformations for GPU execution of tree traversals. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Supercomputing) (SC ’13).Google Scholar
Digital Library
- Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. 2018. Hierarchical Memory Management for Mutable State. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 81–93. isbn:9781450349826 https://doi.org/10.1145/3178487.3178494 Google Scholar
Digital Library
- Robert H. Halstead. 1985. MULTILISP: A Language for Concurrent Symbolic Computation. ACM Trans. Program. Lang. Syst., 7, 4 (1985), Oct., 501–538. issn:0164-0925 https://doi.org/10.1145/4472.4478 Google Scholar
Digital Library
- Tim Harris and Satnam Singh. 2007. Feedback Directed Implicit Parallelism. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP ’07). Association for Computing Machinery, New York, NY, USA. 251–264. isbn:9781595938152 https://doi.org/10.1145/1291151.1291192 Google Scholar
Digital Library
- Gabriele Keller and Manuel M. T. Chakravarty. 1998. Flattening Trees. In Proceedings of the 4th International Euro-Par Conference on Parallel Processing (Euro-Par ’98). Springer-Verlag, Berlin, Heidelberg. 709–719. isbn:3540649522Google Scholar
- Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, and Ryan R. Newton. 2021. Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations. arxiv:2107.00522.Google Scholar
- Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014. Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). Association for Computing Machinery, New York, NY, USA. 2–14. isbn:9781450327848 https://doi.org/10.1145/2594291.2594312 Google Scholar
Digital Library
- Doug Lea. 2000. A java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande. 36–43.Google Scholar
Digital Library
- Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2020. The OCaml system release. Feb, https://ocaml.org/releases/4.10/htmlman/index.htmlGoogle Scholar
- Junichiro Makino. 1990. Vectorization of a treecode. J. Comput. Phys., 87 (1990), March, 148–160. issn:0021-9991 https://doi.org/10.1016/0021-9991(90)90231-O Google Scholar
Digital Library
- Simon Marlow, Tim Harris, Roshan P. James, and Simon Peyton Jones. 2008. Parallel Generational-Copying Garbage Collection with a Block-Structured Heap. In Proceedings of the 7th International Symposium on Memory Management (ISMM ’08). Association for Computing Machinery, New York, NY, USA. 11–20. isbn:9781605581347 https://doi.org/10.1145/1375634.1375637 Google Scholar
Digital Library
- Simon Marlow, Ryan Newton, and Simon Peyton Jones. 2011. A Monad for Deterministic Parallelism. SIGPLAN Not., 46, 12 (2011), Sept., 71–82. issn:0362-1340 https://doi.org/10.1145/2096148.2034685 Google Scholar
Digital Library
- Simon Marlow, Simon Peyton Jones, and Satnam Singh. 2009. Runtime Support for Multicore Haskell. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 65–78. isbn:9781605583327 https://doi.org/10.1145/1596550.1596563 Google Scholar
Digital Library
- R. Menon and L. Dagum. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. Computing in Science and Engineering, v, 01 (1998), jan, 46–55. issn:1558-366X https://doi.org/10.1109/99.660313 Google Scholar
Digital Library
- Leo A. Meyerovich, Todd Mytkowicz, and Wolfram Schulte. 2011. Data Parallel Programming for Irregular Tree Computations. In HotPAR. https://www.microsoft.com/en-us/research/publication/data- parallel-programming-for-irregular-tree-computations/Google Scholar
- Robin Milner, Mads Tofte, and David Macqueen. 1997. The Definition of Standard ML. MIT Press, Cambridge, MA, USA. isbn:0262631814Google Scholar
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8026–8037.Google Scholar
Digital Library
- Ram Raghunathan, Stefan K Muller, Umut A Acar, and Guy Blelloch. 2016. Hierarchical memory management for parallel programs. ACM SIGPLAN Notices, 51, 9 (2016), 392–406.Google Scholar
Digital Library
- Mike Rainey, Kyle A Hale, Ryan Newton, Nikos Hardavellas, Simone Campanoni, Peter Dinda, and Umut Acar. 2021. Task Parallel Assembly Language for Uncompromising Parallelism.Google Scholar
- John Reppy, Claudio V. Russo, and Yingqi Xiao. 2009. Parallel Concurrent ML. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 257–268. isbn:9781605583327 https://doi.org/10.1145/1596550.1596588 Google Scholar
Digital Library
- Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. 2004. A Nanopass Infrastructure for Compiler Education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04). Association for Computing Machinery, New York, NY, USA. 201–212. isbn:1581139055 https://doi.org/10.1145/1016850.1016878 Google Scholar
Digital Library
- Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12). Association for Computing Machinery, New York, NY, USA. 68–70. isbn:9781450312134 https://doi.org/10.1145/2312005.2312018 Google Scholar
Digital Library
- Jeremy Siek, Carl Factora, Andre Kuhlenschmidt, Ryan Newton, Scott Ryan, Cameron Swords, Michael Vitousek, and Michael Vollmer. 2020. Essentials Of Compilation: An Incremental Approach. https://iucompilercourse.github.io/IU-P423-P523-E313-E513-Fall-2020/Google Scholar
- KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. Proc. ACM Program. Lang., 4, ICFP, Article 113, 30 pages. https://doi.org/10.1145/3408995 Google Scholar
Digital Library
- KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. arXiv preprint arXiv:2004.11663.Google Scholar
- Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg. 2004. A Retrospective on Region-Based Memory Management. Higher Order Symbol. Comput., 17, 3 (2004), Sept., 245–265. issn:1388-3690 https://doi.org/10.1023/B:LISP.0000029446.78563.a4 Google Scholar
Digital Library
- Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Inf. Comput., 132, 2 (1997), Feb., 109–176. issn:0890-5401 https://doi.org/10.1006/inco.1996.2613 Google Scholar
Digital Library
- Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage collector for functional programs on multicore processors. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. 421–433.Google Scholar
Digital Library
- Kenton Varda. 2015. Cap’n Proto. https://capnproto.org/Google Scholar
- Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, and Ryan R. Newton. 2019. LoCal: A Language for Programs Operating on Serialized Data. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 48–62. isbn:9781450367127 https://doi.org/10.1145/3314221.3314631 Google Scholar
Digital Library
- Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, and Ryan R. Newton. 2017. Compiling Tree Transforms to Operate on Packed Representations. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 26:1–26:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.26 Google Scholar
Cross Ref
- Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. 2019. Disentanglement in Nested-Parallel Programs. Proc. ACM Program. Lang., 4, POPL (2019), Article 47, Dec., 32 pages. https://doi.org/10.1145/3371115 Google Scholar
Digital Library
- Edward Z. Yang, Giovanni Campagna, Ömer S. Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, and Ryan R. Newton. 2015. Efficient Communication and Collection with Compact Normal Forms. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 362–374. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784735 Google Scholar
Digital Library
Index Terms
Efficient tree-traversals: reconciling parallelism and dense data representations
Recommendations
Data-only flattening for nested data parallelism
PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programmingData parallelism has proven to be an effective technique for high-level programming of a certain class of parallel applications, but it is not well suited to irregular parallel computations. Blelloch and others proposed nested data parallelism (NDP) as ...
A Comparison of 12 Parallel FORTRAN Dialects
A simple program that approximates pi by numerical quadrature is rewritten to run on nine commercially available processors to illustrate the compilations that arise in parallel programming in FORTRAN. The machines used are the Alliant FX/8, BBN ...
Tools-supported HPF and MPI parallelization of the NAS parallel benchmarks
FRONTIERS '96: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel ComputationHigh Performance Fortran (HPF) compilers and communication libraries with the standardized Message Passing Interface (MPI) are becoming widely available, easing the development of portable parallel applications. The Annai tool environment supports ...






Comments