Abstract
Relaxed Radix Balanced Trees (RRB-Trees) are one of the latest members of a family of persistent tree-based data structures that combine wide branching factors with simple and relatively flat structures. Like the battle-tested immutable sequences of Clojure and Scala, they offer effectively constant lookup and update and good cache utilization, and they additionally support logarithmic concatenation and slicing. Our goal is to bring the benefits of persistent data structures to the discipline of systems programming via generic yet efficient immutable vectors supporting transient batch updates. We describe a C++ implementation that can be integrated into the runtime of higher-level languages with a C core (Lisps like Guile or Racket, but also Python or Ruby), thus widening access to these persistent data structures.
In this work we propose (1) an Embedding RRB-Tree (ERRB-Tree) data structure that efficiently stores arbitrary unboxed types, (2) a technique for implementing tree operations orthogonally to optimizations for a more compact representation of the tree, (3) a policy-based design to support multiple memory management and reclamation mechanisms (including automatic garbage collection and reference counting), (4) a model of transience based on move semantics and reference counting, and (5) a definition of transience for confluent meld operations. By combining these techniques, performance comparable to that of mutable arrays can be achieved in many situations while the data structure is still used in a functional way.
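The idea behind contribution (4) can be illustrated with a minimal sketch. It is not the paper's implementation and not the Immer API: `cow_vector`, `buffer_id`, and `iota_vector` are hypothetical names, and a flat `std::vector` behind a `std::shared_ptr` stands in for the tree. The sketch assumes single-threaded use, where a reference count of one means no other value can observe an in-place change.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical illustration type: a copy-on-write vector whose updates
// become in-place when called on an rvalue that is the sole owner.
template <typename T>
class cow_vector {
    std::shared_ptr<std::vector<T>> data_;

    explicit cow_vector(std::shared_ptr<std::vector<T>> d)
        : data_(std::move(d)) {}

public:
    cow_vector() : data_(std::make_shared<std::vector<T>>()) {}

    std::size_t size() const { return data_->size(); }

    const T& operator[](std::size_t i) const {
        assert(i < size());
        return (*data_)[i];
    }

    // Identity of the shared storage, exposed only so that callers can
    // observe whether an update reused it.
    const void* buffer_id() const { return data_.get(); }

    // Persistent update (lvalue receiver): always copies, so the
    // original value stays unchanged.
    cow_vector push_back(T value) const& {
        auto copy = std::make_shared<std::vector<T>>(*data_);
        copy->push_back(std::move(value));
        return cow_vector(std::move(copy));
    }

    // Transient update (rvalue receiver): mutate in place when the
    // reference count shows this value is the only owner; otherwise
    // fall back to the copying overload.
    cow_vector push_back(T value) && {
        if (data_.use_count() == 1) {
            data_->push_back(std::move(value));
            return std::move(*this);
        }
        return static_cast<const cow_vector&>(*this).push_back(std::move(value));
    }
};

// Builds [0, n) by moving the vector into each update, so every step
// after the first can reuse the same storage.
inline cow_vector<int> iota_vector(int n) {
    cow_vector<int> v;
    for (int i = 0; i < n; ++i)
        v = std::move(v).push_back(i);
    return v;
}
```

Calling `a.push_back(x)` on a named value leaves `a` intact, while `std::move(a).push_back(x)` may reuse the storage of `a`. The caller's code keeps a functional shape in both cases; only the value category and the reference count decide whether a copy is made.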
Supplemental Material
% Artifact for the ICFP17 paper "Persistence for the masses: RRB-Vectors in a systems language"
% Juan Pedro Bolívar Puente

Artifact contents
=================

* A LibreOffice Calc document containing the aggregated benchmark result data and plots used in the paper. \
  File: `data/benchmarks.ods`
* All the raw output from the various benchmarking tools used. \
  Folder: `data/raw/`
* A `Dockerfile` for a Docker[^docker] image including a whole system suitable for reproducing our results (see next section).
* A `Makefile` to ease launching the various scripts.

[^immer]: Immer: `https://sinusoid.es/immer`
[^git]: Immer source code: `https://github.com/arximboldi/immer`
[^docker]: Docker: `https://www.docker.com`