skip to main content
research-article

Optimizing hash-array mapped tries for fast and lean immutable JVM collections

Published:23 October 2015Publication History
Skip Abstract Section

Abstract

The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or Scala, contain scalable and well-performing immutable collection data structures that are implemented as Hash-Array Mapped Tries (HAMTs). HAMTs already feature efficient lookup, insert, and delete operations, however due to their tree-based nature their memory footprints and the runtime performance of iteration and equality checking lag behind array-based counterparts. This particularly prohibits their application in programs which process larger data sets. In this paper, we propose changes to the HAMT design that increase the overall performance of immutable sets and maps. The resulting general purpose design increases cache locality and features a canonical representation. It outperforms Scala’s and Clojure’s data structure implementations in terms of memory footprint and runtime efficiency of iteration (1.3–6.7x) and equality checking (3–25.4x).

Skip Supplemental Material Section

Supplemental Material

References

  1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Bagwell. Fast And Space Efficient Trie Searches. Technical Report LAMP-REPORT-2000-001, Ecole polytechnique fédérale de Lausanne, 2000.Google ScholarGoogle Scholar
  3. P. Bagwell. Ideal Hash Trees. Technical Report LAMPREPORT-2001-001, Ecole polytechnique fédérale de Lausanne, 2001.Google ScholarGoogle Scholar
  4. P. Bagwell. Fast Functional Lists, Hash-Lists, Deques, and Variable Length Arrays. Technical Report LAMP-REPORT- 2002-003, Ecole polytechnique fédérale de Lausanne, 2002.Google ScholarGoogle Scholar
  5. P. Bagwell and T. Rompf. RRB-Trees: Efficient Immutable Vectors. Technical Report EPFL-REPORT-169879, Ecole polytechnique fédérale de Lausanne, 2011.Google ScholarGoogle Scholar
  6. A. Biboudis, N. Palladinos, G. Fourtounis, and Y. Smaragdakis. Streams a la carte: Extensible Pipelines with Object Algebras. In ECOOP ’15: Proceedings of the 29th European conference on Object-Oriented Programming. Schloss Dagstuhl, 2015.Google ScholarGoogle Scholar
  7. T. J. Biggerstaff. A perspective of generative reuse. Annals of Software Engineering, 5(1):169–226, 1998. ISSN 1022-7091. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. K. D. Cooper, T. J. Harvey, and K. Kennedy. A Simple, Fast Dominance Algorithm. Technical Report TR-06-33870, Rice University, 2006.Google ScholarGoogle Scholar
  9. K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. De La Briandais. File Searching Using Variable Length Keys. In Papers presented at the the March 3-5, 1959, western joint computer conference, pages 295–298. ACM Press, 1959. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. In Proceedings of the 18th annual ACM symposium on Theory of computing. ACM, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Ebert, D. Bildhauer, H. Schwarz, and V. Riediger. Using difference information to reuse software cases. Softwaretechnik-Trends, 2007.Google ScholarGoogle Scholar
  13. E. Fredkin. Trie Memory. Communications of the ACM, 3(9): 490–499, 1960. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Gil and Y. Shimron. Smaller Footprint for Java Collections. In ECOOP ’12: Proceedings of the 26th European conference on Object-Oriented Programming. Springer-Verlag, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Hajiyev, M. Verbaere, and O. de Moor. Codequest: Scalable source code queries with datalog. In ECOOP ’06: Proceedings of the 20th European Conference on Object-Oriented Programming. Springer-Verlag, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Hills and P. Klint. PHP AiR: Analyzing PHP systems with Rascal. In Proceedings of IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering. IEEE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  17. A. Igarashi and M. Viroli. On Variance-Based Subtyping for Parametric Types. In ECOOP ’02: Proceedings of the 16th European conference on Object-Oriented Programming. Springer-Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Klint, T. van der Storm, and J. Vinju. Rascal: A Domain Specific Language for Source Code Analysis and Manipulation. In Proceedings of Ninth IEEE International Working Conference on Source Code Analysis and Manipulation. IEEE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. R. E. Ladner, R. Fortna, and B.-H. Nguyen. A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation. In Experimental Algorithmics. Springer-Verlag, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. McIlroy. Mass-Produced Software Components. In P. Naur and B. Randell, editors, Proceedings of NATO Software Engineering Conference, pages 138–155. NATO Scientific Affairs Division, 1968.Google ScholarGoogle Scholar
  21. N. Mitchell and G. Sevitsky. The causes of bloat, the limits of health. In OOPSLA ’07: Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Olsson and S. Nilsson. TRASH A dynamic LC-trie and hash data structure. In High Performance Switching and Routing, 2007. HPSR ’07. Workshop on. IEEE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  24. A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. Concurrent tries with efficient non-blocking snapshots. In PPoPP ’12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. Rademaker. Binary relational querying for structural source code analysis. Universiteit Utrecht. Master’s thesis, 2008.Google ScholarGoogle Scholar
  26. I. Ramakrishnan, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. Efficient Tabling Mechanisms for Logic Programs. In Proceedings of the 12th International Conference on Logic Programming. Elsevier, 1995.Google ScholarGoogle Scholar
  27. N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. Communications of the ACM, 29(7): 669–679, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. N. Shavit and D. Touitou. Software transactional memory. ACM Press, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. J. Steindorfer and J. J. Vinju. Code Specialization for Memory Efficient Hash Tries (Short Paper). In GPCE ’14: Proceedings of Generative Programming Concepts & Experiences. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. N. Stucki, T. Rompf, V. Ureche, and P. Bagwell. RRB Vector: A Practical General Purpose Immutable Sequence. In ICFP ’15: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V. Ureche, C. Talau, and M. Odersky. Miniboxing: improving the speed to code size tradeoff in parametric polymorphism translations. In OOPSLA ’13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer, and H. Mössenböck. An object storage model for the truffle language implementation framework. In PPPJ ’14: Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimizing hash-array mapped tries for fast and lean immutable JVM collections

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 50, Issue 10
      OOPSLA '15
      October 2015
      953 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2858965
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
      • cover image ACM Conferences
        OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
        October 2015
        953 pages
        ISBN:9781450336895
        DOI:10.1145/2814270

      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 October 2015

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!