Abstract
Procrastination is the fundamental technique in synchronization mechanisms such as Read-Copy-Update (RCU): to synchronize with readers, writers defer freeing an object until no reader still refers to it. The synchronization mechanism determines both when a deferred object is safe to reclaim and when it is actually reclaimed. Such memory reclamation is therefore completely oblivious to the state of the memory allocator, which hurts allocator performance, for instance when reclamations are ill-timed. Furthermore, deferred objects are hints about the future: they identify memory regions that are about to be freed. These hints go unexploited because deferred objects are not visible to memory allocators. We introduce Prudence, a dynamic memory allocator that is tightly integrated with the synchronization mechanism so that deferred objects are visible to the allocator. This integration enables Prudence to (i) identify the safe time to reclaim deferred objects' memory, (ii) maintain an inclusive view of allocated, free, and about-to-be-freed objects, and (iii) exploit optimizations based on the hints about the future during important state transitions. Our evaluation in the Linux kernel shows that Prudence integrated with RCU performs 3.9X to 28X better in micro-benchmarks than SLUB, a recent memory allocator in the Linux kernel. It also perceptibly improves overall performance (4%-18%) on a mix of widely used synthetic and application benchmarks. Further, it performs better than SLUB (by up to 98%) in terms of object hits in caches, object cache churn, slab churn, peak memory usage, and total fragmentation.
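To make the deferral idea concrete, the following is a minimal, single-threaded toy model in Python. It is not RCU's implementation and not Prudence's design; the class name ToyDeferredReclaimer and all of its methods are hypothetical, introduced only for illustration. It also assumes, as a simplification, that an object is safe to reclaim once no reader at all is active, whereas RCU only needs to wait for readers that started before the free was deferred.

```python
class ToyDeferredReclaimer:
    """Toy, single-threaded model of deferred (procrastinated) freeing.

    Assumption: an object is safe to reclaim once *no* reader is active.
    Real RCU only waits for readers that began before the deferral.
    """

    def __init__(self):
        self.active_readers = 0   # readers currently inside a read-side section
        self.deferred = []        # retired by writers, not yet safe to reuse
        self.free_pool = []       # memory the allocator may hand out again

    def read_lock(self):
        self.active_readers += 1

    def read_unlock(self):
        assert self.active_readers > 0, "unbalanced read_unlock"
        self.active_readers -= 1

    def defer_free(self, obj):
        """Writer side: retire obj instead of freeing it immediately."""
        self.deferred.append(obj)

    def about_to_be_freed(self):
        """The 'hint about the future': number of objects that will soon be free."""
        return len(self.deferred)

    def reclaim(self):
        """Move deferred objects to the free pool if it is safe; return the count."""
        if self.active_readers > 0:
            return 0
        reclaimed = len(self.deferred)
        self.free_pool.extend(self.deferred)
        self.deferred.clear()
        return reclaimed


r = ToyDeferredReclaimer()
r.read_lock()                    # a reader may still hold a reference
r.defer_free("old_node")         # writer unlinks the object and defers the free
blocked = r.reclaim()            # not safe yet: a reader is active
pending = r.about_to_be_freed()  # the deferred object is visible as a hint
r.read_unlock()
done = r.reclaim()               # now safe: the object moves to the free pool
```

In this sketch the deferred list and the free pool live in the same structure, which loosely mirrors the visibility the abstract argues for: the component that hands out memory can see how many objects are about to be freed and can choose when to reclaim them.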
Prudent Memory Reclamation in Procrastination-Based Synchronization. ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems.