Abstract
Collection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain users from using an optimal variant for a given problem. Although a wide range of specialized collections is available for the Java Virtual Machine (JVM), they introduce yet another dependency and complicate user adoption by requiring specific Application Program Interfaces (APIs) incompatible with the standard library.
A product line for collection data structures would relieve library designers from optimizing for the general case. Furthermore, a product line allows evolving the potentially large code base of a collection family efficiently. The challenge is to find a small core framework for collection data structures which covers all variations without exhaustively listing them, while supporting good performance at the same time.
We claim that the concept of Array Mapped Tries (AMTs) embodies a high degree of commonality in the sub-domain of immutable collection data structures. AMTs are flexible enough to cover most of the variability, while minimizing code bloat in the generator and the generated code. We implemented a Data Structure Code Generator (DSCG) that emits immutable collections based on an AMT skeleton foundation. The generated data structures outperform competitive hand-optimized implementations, and the generator still allows for customization towards specific workloads.
- V. Alves, D. Schneider, M. Becker, N. Bencomo, and P. Grace. “Comparitive Study of Variability Management in Software Product Lines and Runtime Adaptable Systems”. In: Proceedings of VaMoS ’09. Universität Duisburg-Essen, 2009.Google Scholar
- P. Bagwell. Fast And Space Efficient Trie Searches. Tech. rep. LAMP-REPORT-2000-001. Ecole polytechnique fédérale de Lausanne, 2000.Google Scholar
- P. Bagwell. Ideal Hash Trees. Tech. rep. LAMPREPORT-2001-001. Ecole polytechnique fédérale de Lausanne, 2001.Google Scholar
- P. Bagwell and T. Rompf. RRB-Trees: Efficient Immutable Vectors. Tech. rep. EPFL-REPORT-169879.Google Scholar
- Ecole polytechnique fédérale de Lausanne, 2011.Google Scholar
- T. J. Biggerstaff. “A Perspective of Generative Reuse”. In: Annals of Software Engineering 5.1 (1998). Google Scholar
Digital Library
- R. Bird. “Two Dimensional Pattern Matching”. In: Information Processing Letters 6.5 (1977).Google Scholar
Cross Ref
- R. de la Briandais. “File Searching Using Variable Length Keys”. In: Proceedings of IRE-AIEE-ACM ’59 (Western). ACM, 1959. Google Scholar
Digital Library
- P. Clements and L. Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001. Google Scholar
Digital Library
- K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press, 2000. Google Scholar
Digital Library
- A. van Deursen and P. Klint. “Domain-Specific Language Design Requires Feature Descriptions”. In: Journal of Computing and Information Technology 10.1 (2002).Google Scholar
Cross Ref
- J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. “Making Data Structures Persistent”. In: Proceedings of STOC ’86. ACM, 1986. Google Scholar
Digital Library
- E. Fredkin. “Trie Memory”. In: Communications of the ACM 3.9 (1960). Google Scholar
Digital Library
- S. Hallsteinsen, M. Hinchey, S. Park, and K. Schmid. “Dynamic Software Product Lines”. In: Computer 41.4 (2008). Google Scholar
Digital Library
- P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. “Concurrent Data Representation Synthesis”. In: Proceedings of PLDI ’12. ACM, 2012. Google Scholar
Digital Library
- P. Hawkins, A. Aiken, K. Fisher, M. Rinard, and M. Sagiv. “Data Representation Synthesis”. In: Proceedings of PLDI ’11. ACM, 2011. Google Scholar
Digital Library
- P. Helland. “Immutability Changes Everything”. In: Communications of the ACM 59.1 (2015). Google Scholar
Digital Library
- P. Klint, T. van der Storm, and J. Vinju. “Rascal: A Domain Specific Language for Source Code Analysis and Manipulation”. In: Proceedings of SCAM ’09. IEEE, 2009. Google Scholar
Digital Library
- C. Loncaric, E. Torlak, and M. D. Ernst. “Fast Synthesis of Fast Collections”. In: Proceedings of PLDI ’16. ACM, 2016. Google Scholar
Digital Library
- D. McIlroy. “Mass-Produced Software Components”. In: Proceedings of NATO Software Engineering Conference. Scientific Affairs Division, NATO, 1969.Google Scholar
- C. Okasaki. Purely Functional Data Structures. Cambridge University Press, 1999. Google Scholar
Digital Library
- A. Prokopec, N. G. Bronson, P. Bagwell, and M. Odersky. “Concurrent Tries with Efficient Non-blocking Snapshots”. In: Proc. of PPoPP ’12. ACM, 2012. Google Scholar
Digital Library
- E. Schonberg, J. T. Schwartz, and M. Sharir. “Automatic Data Structure Selection in SETL”. In: Proceedings of POPL ’79. ACM, 1979. Google Scholar
Digital Library
- O. Shacham, M. Vechev, and E. Yahav. “Chameleon: Adaptive Selection of Collections”. In: Proceedings of PLDI ’09. ACM, 2009. Google Scholar
Digital Library
- Y. Smaragdakis and D. Batory. “DiSTiL: A Transformation Library for Data Structures”. In: Proceedings of DSL’97. USENIX Association, 1997. Google Scholar
Digital Library
- M. J. Steindorfer. Towards a Feature Model of Trie-Based Collections. 2016.Google Scholar
- DOI : 10. 5281 / zenodo. 59739.Google Scholar
- M. J. Steindorfer and J. J. Vinju. “Code Specialization for Memory Efficient Hash Tries (Short Paper)”. In: Proceedings of GPCE ’14. ACM, 2014. Google Scholar
Digital Library
- M. J. Steindorfer and J. J. Vinju. “Fast and Lean Immutable Multi-Maps on the JVM based on Heterogeneous Hash-Array Mapped Tries”. In: ArXiv e-prints (2016). arXiv: 1608.01036 {cs.DS}.Google Scholar
- M. J. Steindorfer and J. J. Vinju. “Optimizing Hasharray Mapped Tries for Fast and Lean Immutable JVM Collections”. In: Proceedings of OOPSLA ’15. ACM, 2015. Google Scholar
Digital Library
- M. J. Steindorfer and J. J. Vinju. “Performance Modeling of Maximal Sharing”. In: Proceedings of ICPE ’16. ACM, 2016. Google Scholar
Digital Library
- N. Stucki, T. Rompf, V. Ureche, and P. Bagwell. “RRB Vector: A Practical General Purpose Immutable Sequence”. In: Proceedings of ICFP ’15. ACM, 2015. Google Scholar
Digital Library
- V. Ureche, C. Talau, and M. Odersky. “Miniboxing: Improving the Speed to Code Size Tradeoff in Parametric Polymorphism Translations”. In: Proceedings of OOPSLA ’13. ACM, 2013. Google Scholar
Digital Library
- Introduction Related Work A Stable Data Type Independent Encoding Intermediate Generator Abstractions ConclusionGoogle Scholar
Index Terms
Towards a software product line of trie-based collections
Recommendations
Towards a software product line of trie-based collections
GPCE 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and ExperiencesCollection data structures in standard libraries of programming languages are designed to excel for the average case by carefully balancing memory footprint and runtime performance. These implicit design decisions and hard-coded trade-offs do constrain ...
Optimizing hash-array mapped tries for fast and lean immutable JVM collections
OOPSLA '15The data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or ...
Optimizing hash-array mapped tries for fast and lean immutable JVM collections
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and ApplicationsThe data structures under-pinning collection API (e.g. lists, sets, maps) in the standard libraries of programming languages are used intensively in many applications. The standard libraries of recent Java Virtual Machine languages, such as Clojure or ...







Comments