Abstract
Dynamic programming languages are becoming increasingly popular, yet often show a significant performance slowdown compared to static languages. In this paper, we study the performance overhead of automatic memory management in dynamic languages. We propose to improve the performance and memory bandwidth usage of dynamic languages by co-optimizing garbage collection overhead and cache performance for newly initialized and dead objects. Our study shows that less frequent garbage collection results in a large number of cache misses for initial stores to new objects. We solve this problem by directly placing uninitialized objects into on-chip caches without off-chip memory accesses. We further optimize garbage collection by reducing unnecessary cache pollution and write-backs through partial tracing that invalidates dead objects between full garbage collections. Experimental results on PyPy and V8 show that less frequent garbage collection, along with our optimizations, can significantly improve the performance of dynamic languages.
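The interaction between collection frequency and initializing-store misses can be illustrated with a toy model. The sketch below is not the paper's simulator or mechanism: it assumes a fully associative LRU cache with 64-byte lines, a bump-pointer nursery that is reused from its start after each (idealized) minor collection, and a hypothetical `no_fetch` flag standing in for the idea of installing a new object's cache line without reading it from off-chip memory. All names and sizes are illustrative assumptions.

```python
from collections import OrderedDict

LINE = 64  # assumed cache line size in bytes


class Cache:
    """Toy fully associative LRU cache that only counts off-chip line fetches."""

    def __init__(self, size_bytes):
        self.capacity = size_bytes // LINE
        self.lines = OrderedDict()  # line number -> None, ordered oldest first
        self.fetches = 0            # lines read from off-chip memory

    def store(self, addr, no_fetch=False):
        line = addr // LINE
        if line in self.lines:
            # Hit: refresh the LRU position.
            self.lines.move_to_end(line)
            return
        # Miss: evict the least recently used line if the cache is full.
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        # A conventional write-allocate cache fetches the line first.
        # With no_fetch (hypothetical hint) the line is installed directly.
        if not no_fetch:
            self.fetches += 1
        self.lines[line] = None


def run(nursery_bytes, cache_bytes, total_alloc_bytes, no_fetch=False):
    """Initialize total_alloc_bytes of new objects, 8 bytes per store."""
    cache = Cache(cache_bytes)
    bump = 0
    for _ in range(total_alloc_bytes // 8):
        cache.store(bump, no_fetch=no_fetch)
        bump += 8
        if bump >= nursery_bytes:
            bump = 0  # idealized minor GC: nursery reused, survivors ignored
    return cache.fetches


if __name__ == "__main__":
    KB = 1024
    print("small nursery:", run(16 * KB, 32 * KB, 1024 * KB))
    print("large nursery:", run(256 * KB, 32 * KB, 1024 * KB))
    print("large nursery, no fetch:", run(256 * KB, 32 * KB, 1024 * KB, no_fetch=True))
```

In this model, a nursery that fits in the cache misses only on its first pass, while a nursery larger than the cache (that is, less frequent collection) misses on the first store to every new line. Installing lines without a fetch removes those off-chip reads in either case. The model ignores surviving objects, associativity, and write-back traffic, so it should be read as intuition rather than as a performance estimate.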
Hardware-software co-optimization of memory management in dynamic languages
ISMM 2018: Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management