Abstract
An immutable multi-map is a many-to-many map data structure with expected fast insert and lookup operations. This data structure is used in applications that process graphs or many-to-many relations, such as compilers, runtimes of programming languages, and static analysis of object-oriented systems. Collection data structures are expected to carefully balance the execution time of operations against memory consumption, and they need to scale gracefully from a few elements to at least multiple gigabytes. When processing larger in-memory data sets, the overhead of the data structure encoding itself becomes a memory bottleneck that dominates overall performance.
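To make the definition concrete, here is a minimal Python sketch of an immutable multi-map under a naive encoding in which every key maps to a frozenset of values. The class `NaiveMultiMap` and its methods are hypothetical illustrations, not code from the paper. Updates copy the top-level dictionary instead of sharing structure, so the sketch only shows the semantics (older versions remain valid after an insert) and the per-key set overhead that even single-valued keys pay; it is not an efficient persistent design.

```python
from typing import Dict, FrozenSet, Hashable, Optional


class NaiveMultiMap:
    """Immutable many-to-many map: each key maps to a frozenset of values.

    Every key, even one with a single value, pays for its own set object.
    Insertion copies the top-level dict (O(n) per update); this is a
    baseline for illustration, not a structure-sharing persistent design.
    """

    __slots__ = ("_data",)

    def __init__(self, data: Optional[Dict[Hashable, FrozenSet]] = None):
        self._data = dict(data) if data else {}

    def insert(self, key: Hashable, value: Hashable) -> "NaiveMultiMap":
        values = self._data.get(key, frozenset())
        if value in values:
            return self  # tuple already present: reuse this version
        updated = dict(self._data)
        updated[key] = values | {value}  # frozenset | set yields a new frozenset
        return NaiveMultiMap(updated)

    def get(self, key: Hashable) -> FrozenSet:
        return self._data.get(key, frozenset())

    def __len__(self) -> int:
        # Number of key-value tuples, not number of keys.
        return sum(len(vs) for vs in self._data.values())


empty = NaiveMultiMap()
m = empty.insert("a", 1).insert("a", 2).insert("b", 1)
print(sorted(m.get("a")))  # [1, 2]
print(len(m), len(empty))  # 3 0
```

Note that key `"b"` holds a one-element set: in this encoding the wrapper object costs the same whether a key has one value or many.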
In this paper we propose AXIOM, a novel hash-trie data structure that allows for a highly efficient and type-safe multi-map encoding by distinguishing inlined values of singleton sets from nested sets of multi-mappings. AXIOM strictly generalizes previous hash-trie data structures by supporting the processing of fine-grained type-heterogeneous content on the implementation level (API and language support for type heterogeneity are outside the scope of this paper). We detail the design and optimizations of AXIOM and compare it against state-of-the-art immutable maps and multi-maps in Java, Scala and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a case study in static analysis. AXIOM reduces the key-value storage overhead by 1.87x; with specialization and inlining across collection boundaries, the overhead is reduced by 5.1x.
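The core idea of distinguishing inlined singleton values from nested value sets inside one hash-trie can be sketched in a few dozen lines. The following Python code is a simplified illustration under stated assumptions, not AXIOM's actual node layout: the `ONE`/`MANY`/`NODE` tag tuples, the dictionary-based nodes (real hash-array mapped tries use bitmap-indexed compact arrays), the 64-bit hash truncation, and the collision handling are all our own hypothetical choices. Each trie slot is type-heterogeneous: it holds either a key with a single inlined value, a key with a nested set, or a sub-trie. A key is promoted to a nested set only when it gains a second value.

```python
from typing import Any, Dict, FrozenSet, Hashable, Optional, Tuple

BITS = 5                      # 32-way branching per trie level
MASK = (1 << BITS) - 1
HASH_BITS = 64                # hashes are truncated to 64 bits (assumption)

# Hypothetical per-slot type tags: an inlined singleton value, a nested set
# of values (values must then be hashable), or a sub-trie.
ONE, MANY, NODE = "one", "many", "node"

Entry = Tuple[Any, ...]
Node = Dict[Any, Entry]       # never mutated after construction


def _hash(key: Hashable) -> int:
    return hash(key) & ((1 << HASH_BITS) - 1)


def _index(key: Hashable, h: int, shift: int) -> Any:
    # Branch on a 5-bit hash chunk; once the hash bits are used up, keys
    # with fully colliding hashes are told apart by the key itself.
    return (h >> shift) & MASK if shift < HASH_BITS else key


def _insert(node: Node, key: Hashable, h: int, value: Any, shift: int) -> Node:
    idx = _index(key, h, shift)
    entry = node.get(idx)
    if entry is None:
        new: Entry = (ONE, key, value)                 # inline the singleton
    elif entry[0] == NODE:
        child = _insert(entry[1], key, h, value, shift + BITS)
        if child is entry[1]:
            return node                                # nothing changed
        new = (NODE, child)
    elif entry[1] == key:
        values = frozenset({entry[2]}) if entry[0] == ONE else entry[2]
        if value in values:
            return node                                # tuple already present
        new = (MANY, key, values | {value})            # promote to nested set
    else:
        # Two keys share this slot: push the old entry one level down.
        sub_shift = shift + BITS
        sub: Node = {_index(entry[1], _hash(entry[1]), sub_shift): entry}
        new = (NODE, _insert(sub, key, h, value, sub_shift))
    updated = dict(node)                               # path copying
    updated[idx] = new
    return updated


def _get(node: Node, key: Hashable, h: int) -> FrozenSet[Any]:
    shift = 0
    while True:
        entry = node.get(_index(key, h, shift))
        if entry is None:
            return frozenset()
        if entry[0] == NODE:
            node, shift = entry[1], shift + BITS
        elif entry[1] == key:
            return frozenset({entry[2]}) if entry[0] == ONE else entry[2]
        else:
            return frozenset()


class HeterogeneousMultiMap:
    """Immutable multi-map over a hash-trie with type-tagged slots."""

    __slots__ = ("_root",)

    def __init__(self, root: Optional[Node] = None):
        self._root: Node = root if root is not None else {}

    def insert(self, key: Hashable, value: Any) -> "HeterogeneousMultiMap":
        root = _insert(self._root, key, _hash(key), value, 0)
        return self if root is self._root else HeterogeneousMultiMap(root)

    def get(self, key: Hashable) -> FrozenSet[Any]:
        return _get(self._root, key, _hash(key))

    def payload_kinds(self) -> Dict[str, int]:
        """Count inlined singletons vs. nested sets across the whole trie."""
        counts = {ONE: 0, MANY: 0}
        stack = [self._root]
        while stack:
            for entry in stack.pop().values():
                if entry[0] == NODE:
                    stack.append(entry[1])
                else:
                    counts[entry[0]] += 1
        return counts
```

For example, inserting `("a", 1)`, `("a", 2)` and `("b", 1)` into an empty map leaves `"b"` as an inlined singleton and gives only `"a"` a nested set, so `payload_kinds()` returns `{"one": 1, "many": 1}`. In a language with a compact object layout, skipping the set wrapper for single-valued keys is where the memory savings of such an encoding would come from; in Python the sketch only demonstrates the logic.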
To-many or to-one? All-in-one! Efficient purely functional multi-maps with type-heterogeneous hash-tries
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation