
To-many or to-one? All-in-one! Efficient purely functional multi-maps with type-heterogeneous hash-tries

Published: 11 June 2018

Abstract

An immutable multi-map is a many-to-many map data structure that is expected to offer fast insert and lookup operations. It is used in applications that process graphs or many-to-many relations, as found in compilers, runtimes of programming languages, and static analysis of object-oriented systems. Collection data structures are expected to carefully balance the execution time of operations against memory consumption, and need to scale gracefully from a few elements to multiple gigabytes at least. When processing larger in-memory data sets, the overhead of the data structure encoding itself becomes a memory usage bottleneck that dominates overall performance.

In this paper we propose AXIOM, a novel hash-trie data structure that allows for a highly efficient and type-safe multi-map encoding by distinguishing inlined values of singleton sets from nested sets of multi-mappings. AXIOM strictly generalizes over previous hash-trie data structures by supporting the processing of fine-grained type-heterogeneous content on the implementation level (API and language support for type-heterogeneity are outside the scope of this paper). We detail the design and optimizations of AXIOM and compare it against state-of-the-art immutable maps and multi-maps in Java, Scala, and Clojure. We isolate key differences using microbenchmarks and validate the resulting conclusions on a case study in static analysis. AXIOM reduces the key-value storage overhead by 1.87x; with specializing and inlining across collection boundaries it improves by 5.1x.
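The distinction between inlined singleton values and nested sets can be illustrated with a minimal Java sketch. This is not AXIOM itself: it uses a flat, copy-on-write `HashMap` instead of a hash-trie, and every name in it (`InlinedMultiMap`, `NestedSet`, `inserted`, `get`, `isInlined`) is hypothetical. It only shows the encoding idea under those assumptions: a key with exactly one value stores that value directly, and the slot is promoted to a nested set when a second distinct value arrives.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only: a flat copy-on-write map, not a hash-trie.
final class InlinedMultiMap<K, V> {

    // Wrapper marking a slot as a 1:n mapping. It is private, so a
    // user-supplied value can never be mistaken for a nested set.
    private static final class NestedSet<E> {
        final Set<E> values;
        NestedSet(Set<E> values) { this.values = values; }
    }

    // Each slot holds either a bare value (1:1) or a NestedSet (1:n).
    private final Map<K, Object> slots;

    private InlinedMultiMap(Map<K, Object> slots) { this.slots = slots; }

    static <K, V> InlinedMultiMap<K, V> empty() {
        return new InlinedMultiMap<>(new HashMap<>());
    }

    // Returns a multi-map that contains (key, value); the receiver is unchanged.
    @SuppressWarnings("unchecked")
    InlinedMultiMap<K, V> inserted(K key, V value) {
        Objects.requireNonNull(value, "null values are not supported in this sketch");
        Object slot = slots.get(key);
        Object updated;
        if (slot == null) {
            updated = value;                          // first value: store inline
        } else if (slot instanceof NestedSet) {
            Set<V> old = ((NestedSet<V>) slot).values;
            if (old.contains(value)) return this;     // already present
            Set<V> grown = new HashSet<>(old);
            grown.add(value);
            updated = new NestedSet<>(grown);
        } else if (slot.equals(value)) {
            return this;                              // same singleton value
        } else {
            Set<V> promoted = new HashSet<>();        // second distinct value:
            promoted.add((V) slot);                   // promote inline value
            promoted.add(value);                      // to a nested set
            updated = new NestedSet<>(promoted);
        }
        Map<K, Object> copy = new HashMap<>(slots);
        copy.put(key, updated);
        return new InlinedMultiMap<>(copy);
    }

    // Returns all values for the key as a read-only set (empty if absent).
    @SuppressWarnings("unchecked")
    Set<V> get(K key) {
        Object slot = slots.get(key);
        if (slot == null) return Collections.emptySet();
        if (slot instanceof NestedSet) {
            return Collections.unmodifiableSet(((NestedSet<V>) slot).values);
        }
        return Collections.singleton((V) slot);
    }

    // True if the key currently maps to exactly one value stored inline.
    boolean isInlined(K key) {
        Object slot = slots.get(key);
        return slot != null && !(slot instanceof NestedSet);
    }
}
```

In this sketch a key with a single value costs no extra set object, which is the kind of per-entry overhead the abstract refers to. Copying the whole `HashMap` on each insert is a simplification for brevity; a hash-trie would share unchanged structure between versions instead.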


Supplemental Material

p283-steindorfer.webm

