skip to main content

Efficient tree-traversals: reconciling parallelism and dense data representations

Published:19 August 2021Publication History
Skip Abstract Section

Abstract

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.

Skip Supplemental Material Section

Supplemental Material

3473596.mp4

Presentation Videos

Auxiliary Presentation Video

Version 2 of the video for "Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations".

References

  1. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http://tensorflow.org/ Software available from tensorflow.org.Google ScholarGoogle Scholar
  2. Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2019. Provably and Practically Efficient Granularity Control. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). ACM, New York, NY, USA. 214–228. isbn:978-1-4503-6225-2 http://mike-rainey.site/papers/oracle-ppop19-long.pdfGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  3. Umut A Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018. Heartbeat Scheduling: Provable Efficiency for Nested Parallelism. http://mike-rainey.site/papers/heartbeat.pdfGoogle ScholarGoogle Scholar
  4. Todd A. Anderson, Hai Liu, Lindsey Kuper, Ehsan Totoni, Jan Vitek, and Tatiana Shpeisman. 2017. Parallelizing Julia with a Non-Invasive DSL. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 4:1–4:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.4 Google ScholarGoogle ScholarCross RefCross Ref
  5. Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst., 11 (1989), October, 598–632. issn:0164-0925 https://doi.org/10.1145/69558.69562 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, Stephen Rosen, and Adam Shaw. 2013. Data-Only Flattening for Nested Data Parallelism. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). Association for Computing Machinery, New York, NY, USA. 81–92. isbn:9781450319225 https://doi.org/10.1145/2442516.2442525 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Jean-Philippe Bernardy, Mathieu Boespflug, Ryan R. Newton, Simon Peyton Jones, and Arnaud Spiwack. 2017. Linear Haskell: Practical Linearity in a Higher-Order Polymorphic Language. Proc. ACM Program. Lang., 2, POPL (2017), Article 5, Dec., 29 pages. https://doi.org/10.1145/3158093 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Guy E. Blelloch. 1992. NESL: A Nested Data-Parallel Language. USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP ’95). Association for Computing Machinery, New York, NY, USA. 207–216. isbn:0897917006 https://doi.org/10.1145/209936.209958 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. 2011. Accelerating Haskell Array Codes with Multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming (DAMP ’11). Association for Computing Machinery, New York, NY, USA. 3–14. isbn:9781450304863 https://doi.org/10.1145/1926354.1926358 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Adam Chlipala. 2015. An Optimizing Compiler for a Purely Functional Web-application Language. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 10–21. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784741 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Byn Choi, Rakesh Komuravelli, Victor Lu, Hyojin Sung, Robert L. Bocchino, Sarita V. Adve, and John C. Hart. 2010. Parallel SAH K-D Tree Construction. In Proceedings of the Conference on High Performance Graphics (HPG ’10). Eurographics Association, Goslar, DEU. 77–86.Google ScholarGoogle Scholar
  13. Matthias Felleisen, Robert Bruce Findler, and Matthew Flatt. 2009. Semantics engineering with PLT Redex. Mit Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. 1977. An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw., 3, 3 (1977), Sept., 209–226. issn:0098-3500 https://doi.org/10.1145/355744.355745 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Michael Goldfarb, Youngjoon Jo, and Milind Kulkarni. 2013. General transformations for GPU execution of tree traversals. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Supercomputing) (SC ’13).Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut Acar, and Matthew Fluet. 2018. Hierarchical Memory Management for Mutable State. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). Association for Computing Machinery, New York, NY, USA. 81–93. isbn:9781450349826 https://doi.org/10.1145/3178487.3178494 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Robert H. Halstead. 1985. MULTILISP: A Language for Concurrent Symbolic Computation. ACM Trans. Program. Lang. Syst., 7, 4 (1985), Oct., 501–538. issn:0164-0925 https://doi.org/10.1145/4472.4478 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Tim Harris and Satnam Singh. 2007. Feedback Directed Implicit Parallelism. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming (ICFP ’07). Association for Computing Machinery, New York, NY, USA. 251–264. isbn:9781595938152 https://doi.org/10.1145/1291151.1291192 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Gabriele Keller and Manuel M. T. Chakravarty. 1998. Flattening Trees. In Proceedings of the 4th International Euro-Par Conference on Parallel Processing (Euro-Par ’98). Springer-Verlag, Berlin, Heidelberg. 709–719. isbn:3540649522Google ScholarGoogle Scholar
  20. Chaitanya Koparkar, Mike Rainey, Michael Vollmer, Milind Kulkarni, and Ryan R. Newton. 2021. Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations. arxiv:2107.00522.Google ScholarGoogle Scholar
  21. Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014. Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). Association for Computing Machinery, New York, NY, USA. 2–14. isbn:9781450327848 https://doi.org/10.1145/2594291.2594312 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Doug Lea. 2000. A java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande. 36–43.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Xavier Leroy, Damien Doligez, Alain Frisch, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon. 2020. The OCaml system release. Feb, https://ocaml.org/releases/4.10/htmlman/index.htmlGoogle ScholarGoogle Scholar
  24. Junichiro Makino. 1990. Vectorization of a treecode. J. Comput. Phys., 87 (1990), March, 148–160. issn:0021-9991 https://doi.org/10.1016/0021-9991(90)90231-O Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Simon Marlow, Tim Harris, Roshan P. James, and Simon Peyton Jones. 2008. Parallel Generational-Copying Garbage Collection with a Block-Structured Heap. In Proceedings of the 7th International Symposium on Memory Management (ISMM ’08). Association for Computing Machinery, New York, NY, USA. 11–20. isbn:9781605581347 https://doi.org/10.1145/1375634.1375637 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Simon Marlow, Ryan Newton, and Simon Peyton Jones. 2011. A Monad for Deterministic Parallelism. SIGPLAN Not., 46, 12 (2011), Sept., 71–82. issn:0362-1340 https://doi.org/10.1145/2096148.2034685 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Simon Marlow, Simon Peyton Jones, and Satnam Singh. 2009. Runtime Support for Multicore Haskell. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 65–78. isbn:9781605583327 https://doi.org/10.1145/1596550.1596563 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Menon and L. Dagum. 1998. OpenMP: An Industry-Standard API for Shared-Memory Programming. Computing in Science and Engineering, v, 01 (1998), jan, 46–55. issn:1558-366X https://doi.org/10.1109/99.660313 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Leo A. Meyerovich, Todd Mytkowicz, and Wolfram Schulte. 2011. Data Parallel Programming for Irregular Tree Computations. In HotPAR. https://www.microsoft.com/en-us/research/publication/data- parallel-programming-for-irregular-tree-computations/Google ScholarGoogle Scholar
  30. Robin Milner, Mads Tofte, and David Macqueen. 1997. The Definition of Standard ML. MIT Press, Cambridge, MA, USA. isbn:0262631814Google ScholarGoogle Scholar
  31. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8026–8037.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Ram Raghunathan, Stefan K Muller, Umut A Acar, and Guy Blelloch. 2016. Hierarchical memory management for parallel programs. ACM SIGPLAN Notices, 51, 9 (2016), 392–406.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Mike Rainey, Kyle A Hale, Ryan Newton, Nikos Hardavellas, Simone Campanoni, Peter Dinda, and Umut Acar. 2021. Task Parallel Assembly Language for Uncompromising Parallelism.Google ScholarGoogle Scholar
  34. John Reppy, Claudio V. Russo, and Yingqi Xiao. 2009. Parallel Concurrent ML. In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’09). Association for Computing Machinery, New York, NY, USA. 257–268. isbn:9781605583327 https://doi.org/10.1145/1596550.1596588 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. 2004. A Nanopass Infrastructure for Compiler Education. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming (ICFP ’04). Association for Computing Machinery, New York, NY, USA. 201–212. isbn:1581139055 https://doi.org/10.1145/1016850.1016878 Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12). Association for Computing Machinery, New York, NY, USA. 68–70. isbn:9781450312134 https://doi.org/10.1145/2312005.2312018 Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Jeremy Siek, Carl Factora, Andre Kuhlenschmidt, Ryan Newton, Scott Ryan, Cameron Swords, Michael Vitousek, and Michael Vollmer. 2020. Essentials Of Compilation: An Incremental Approach. https://iucompilercourse.github.io/IU-P423-P523-E313-E513-Fall-2020/Google ScholarGoogle Scholar
  38. KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. Proc. ACM Program. Lang., 4, ICFP, Article 113, 30 pages. https://doi.org/10.1145/3408995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy. 2020. Retrofitting Parallelism onto OCaml. arXiv preprint arXiv:2004.11663.Google ScholarGoogle Scholar
  40. Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg. 2004. A Retrospective on Region-Based Memory Management. Higher Order Symbol. Comput., 17, 3 (2004), Sept., 245–265. issn:1388-3690 https://doi.org/10.1023/B:LISP.0000029446.78563.a4 Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Inf. Comput., 132, 2 (1997), Feb., 109–176. issn:0890-5401 https://doi.org/10.1006/inco.1996.2613 Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Katsuhiro Ueno and Atsushi Ohori. 2016. A fully concurrent garbage collector for functional programs on multicore processors. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. 421–433.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Kenton Varda. 2015. Cap’n Proto. https://capnproto.org/Google ScholarGoogle Scholar
  44. Michael Vollmer, Chaitanya Koparkar, Mike Rainey, Laith Sakka, Milind Kulkarni, and Ryan R. Newton. 2019. LoCal: A Language for Programs Operating on Serialized Data. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 48–62. isbn:9781450367127 https://doi.org/10.1145/3314221.3314631 Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Michael Vollmer, Sarah Spall, Buddhika Chamith, Laith Sakka, Chaitanya Koparkar, Milind Kulkarni, Sam Tobin-Hochstadt, and Ryan R. Newton. 2017. Compiling Tree Transforms to Operate on Packed Representations. In 31st European Conference on Object-Oriented Programming (ECOOP 2017), Peter Müller (Ed.) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 74). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany. 26:1–26:29. isbn:978-3-95977-035-4 issn:1868-8969 https://doi.org/10.4230/LIPIcs.ECOOP.2017.26 Google ScholarGoogle ScholarCross RefCross Ref
  46. Sam Westrick, Rohan Yadav, Matthew Fluet, and Umut A. Acar. 2019. Disentanglement in Nested-Parallel Programs. Proc. ACM Program. Lang., 4, POPL (2019), Article 47, Dec., 32 pages. https://doi.org/10.1145/3371115 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Edward Z. Yang, Giovanni Campagna, Ömer S. Ağacan, Ahmed El-Hassany, Abhishek Kulkarni, and Ryan R. Newton. 2015. Efficient Communication and Collection with Compact Normal Forms. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA. 362–374. isbn:978-1-4503-3669-7 https://doi.org/10.1145/2784731.2784735 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient tree-traversals: reconciling parallelism and dense data representations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!