Abstract
The hash trie data structure is a common component of the standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of the JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants that leads to a maximal reduction in memory footprint on any given data. In a set of experiments, we measured the distribution of internal tree node sizes in hash tries. We used the results as guidance to decide which variants of the family to generate and which to leave to the generic implementation. A preliminary validating experiment on the implementation of sets and maps shows that this technique leads to a median decrease of 55% in memory footprint for maps (and 78% for sets), while still maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
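To make the idea of node specialization concrete, the following is a minimal sketch, not the paper's generated code. It assumes a set-only trie with 5-bit hash fragments, and every name in it (`HashTrieNodeSketch`, `GenericNode`, `Node2`, `of`) is hypothetical. The generic node keeps its elements in a separately allocated array indexed through a bitmap, whereas the specialized two-element node stores positions and elements directly in fields, which avoids the extra array object and its header.

```java
/** Hypothetical sketch; none of these names come from the paper. */
final class HashTrieNodeSketch {

    /** A set node addressed by a 5-bit hash fragment (0..31). */
    interface Node {
        boolean contains(int mask, Object key);
        int arity();
    }

    /** Generic variant: a 32-bit bitmap plus a separately allocated array. */
    static final class GenericNode implements Node {
        private final int bitmap;
        private final Object[] keys; // extra heap object (header + length)

        GenericNode(int bitmap, Object[] keys) {
            this.bitmap = bitmap;
            this.keys = keys.clone();
        }

        public boolean contains(int mask, Object key) {
            int bit = 1 << mask;
            if ((bitmap & bit) == 0) {
                return false;
            }
            // Array index = number of occupied slots below this position.
            int index = Integer.bitCount(bitmap & (bit - 1));
            return keys[index].equals(key);
        }

        public int arity() {
            return keys.length;
        }
    }

    /** Specialized variant for exactly two elements: no array, only fields. */
    static final class Node2 implements Node {
        private final byte pos1, pos2;   // positions 0..31 fit in a byte
        private final Object key1, key2;

        Node2(byte pos1, Object key1, byte pos2, Object key2) {
            this.pos1 = pos1;
            this.key1 = key1;
            this.pos2 = pos2;
            this.key2 = key2;
        }

        public boolean contains(int mask, Object key) {
            if (mask == pos1) return key1.equals(key);
            if (mask == pos2) return key2.equals(key);
            return false;
        }

        public int arity() {
            return 2;
        }
    }

    /** Assumed factory: use the specialization when one exists, else fall back. */
    static Node of(int bitmap, Object[] keys) {
        if (Integer.bitCount(bitmap) != keys.length) {
            throw new IllegalArgumentException("bitmap and keys disagree");
        }
        if (keys.length == 2) {
            int low = Integer.numberOfTrailingZeros(bitmap);
            int high = Integer.numberOfTrailingZeros(bitmap & (bitmap - 1));
            return new Node2((byte) low, keys[0], (byte) high, keys[1]);
        }
        return new GenericNode(bitmap, keys);
    }

    public static void main(String[] args) {
        Node small = of((1 << 3) | (1 << 17), new Object[] {"a", "b"});
        Node large = of(1 | (1 << 5) | (1 << 31), new Object[] {"x", "y", "z"});
        System.out.println(small.getClass().getSimpleName() + " " + small.contains(17, "b"));
        System.out.println(large.getClass().getSimpleName() + " " + large.contains(31, "z"));
    }
}
```

In this sketch only arity 2 is specialized. Choosing which arities deserve a dedicated class is the selection problem the abstract describes, and the measured node-size distribution is what informs that choice.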
Code specialization for memory efficient hash tries (short paper)
GPCE 2014: Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences