
Code specialization for memory efficient hash tries (short paper)

Published: 15 September 2014

Abstract

The hash trie data structure is a common component of the standard collection libraries of JVM programming languages such as Clojure and Scala. It enables fast immutable implementations of maps, sets, and vectors, but it requires considerably more memory than an equivalent array-based data structure. This hinders the scalability of functional programs and the further adoption of this otherwise attractive style of programming. In this paper we present a product family of hash tries. We generate Java source code to specialize them using knowledge of JVM object memory layout. The number of possible specializations is exponential. The optimization challenge is thus to find a minimal set of variants that yields a maximal reduction in memory footprint on any given data. Using a set of experiments, we measured the distribution of internal tree node sizes in hash tries. We used the results as guidance to decide which variants of the family to generate and which to leave to the generic implementation. A preliminary validation experiment on implementations of sets and maps shows that this technique yields a median decrease in memory footprint of 55% for maps (and 78% for sets), while maintaining comparable performance. Our combination of data analysis and code specialization proved to be effective.
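The specialization idea can be illustrated with a minimal Java sketch. This is not the paper's generated code: the class and method names (`Node1`, `Node2`, `GenericNode`, `of`) are hypothetical, and the sketch assumes a bitmap-indexed (HAMT-style) node with 5-bit (32-way) branching. It models only a single trie level holding inline keys; sub-nodes, insertion, and removal are omitted. The generic node keeps its keys in a separate `Object[]`, while the specialized variants store keys directly in fields, so no extra array object is allocated for small nodes.

```java
import java.util.Arrays;

// Hypothetical sketch: specialized vs. generic nodes for ONE level of a hash trie set.
// Sub-nodes (children), insertion, and removal are deliberately omitted.
public final class SpecializedNodes {

    static final int BITS = 5;                 // assumed 32-way branching
    static final int MASK = (1 << BITS) - 1;

    interface Node {
        boolean contains(Object key, int hash, int shift);
        int arity();
    }

    // Bit for the 5-bit hash fragment at this level.
    static int bitFor(int hash, int shift) { return 1 << ((hash >>> shift) & MASK); }

    // Dense index: number of set bitmap bits below this key's bit (popcount).
    static int index(int bitmap, int bit) { return Integer.bitCount(bitmap & (bit - 1)); }

    // Generic variant: keys live in a separate array object, which carries
    // its own object header and length field on top of the node itself.
    static final class GenericNode implements Node {
        final int bitmap;
        final Object[] keys;
        GenericNode(int bitmap, Object[] keys) { this.bitmap = bitmap; this.keys = keys; }
        @Override public boolean contains(Object key, int hash, int shift) {
            int bit = bitFor(hash, shift);
            return (bitmap & bit) != 0 && keys[index(bitmap, bit)].equals(key);
        }
        @Override public int arity() { return keys.length; }
    }

    // Specialized variant for arity 1: the key is a field, no array object.
    static final class Node1 implements Node {
        final int bitmap;
        final Object key0;
        Node1(int bitmap, Object key0) { this.bitmap = bitmap; this.key0 = key0; }
        @Override public boolean contains(Object key, int hash, int shift) {
            return (bitmap & bitFor(hash, shift)) != 0 && key0.equals(key);
        }
        @Override public int arity() { return 1; }
    }

    // Specialized variant for arity 2: two key fields, selected by popcount index.
    static final class Node2 implements Node {
        final int bitmap;
        final Object key0, key1;
        Node2(int bitmap, Object key0, Object key1) {
            this.bitmap = bitmap; this.key0 = key0; this.key1 = key1;
        }
        @Override public boolean contains(Object key, int hash, int shift) {
            int bit = bitFor(hash, shift);
            if ((bitmap & bit) == 0) return false;
            return (index(bitmap, bit) == 0 ? key0 : key1).equals(key);
        }
        @Override public int arity() { return 2; }
    }

    static int pos(Object key, int shift) { return (key.hashCode() >>> shift) & MASK; }

    // Factory: small arities get a specialized class, larger ones fall back to the generic node.
    static Node of(int shift, Object... keys) {
        Object[] sorted = keys.clone();
        // Order keys by hash fragment so array/field order matches bitmap rank.
        Arrays.sort(sorted, (a, b) -> Integer.compare(pos(a, shift), pos(b, shift)));
        int bitmap = 0;
        for (Object k : sorted) {
            int bit = 1 << pos(k, shift);
            if ((bitmap & bit) != 0) {
                throw new IllegalArgumentException("hash collision at this level: a sub-node would be needed");
            }
            bitmap |= bit;
        }
        switch (sorted.length) {
            case 0: throw new IllegalArgumentException("empty node");
            case 1: return new Node1(bitmap, sorted[0]);
            case 2: return new Node2(bitmap, sorted[0], sorted[1]);
            default: return new GenericNode(bitmap, sorted);
        }
    }

    public static void main(String[] args) {
        for (Node n : new Node[] { of(0, "a"), of(0, "a", "b"), of(0, "a", "b", "c") }) {
            System.out.println(n.getClass().getSimpleName() + " arity=" + n.arity()
                    + " contains(b)=" + n.contains("b", "b".hashCode(), 0));
        }
    }
}
```

In this sketch the cut-off at arity 2 is arbitrary; in the approach the abstract describes, the set of arities that receive a dedicated class would instead be chosen from the measured distribution of node sizes, with everything else handled by the generic array-based node.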
