Abstract
Dynamic memory allocation plays a vital role in modern application programs. Modern lock-free memory allocators built on hardware atomic primitives usually perform well. However, individual threads may starve in these lock-free implementations, which leads to an unbounded worst-case execution time that real-time embedded systems cannot tolerate. This article presents wfspan, a decentralized dynamic memory allocator based on non-linearizable wait-free lists. It employs a helping mechanism to ensure that no thread starves. As a design tradeoff, wfspan guarantees a bounded number of execution steps in both the allocation and deallocation procedures, at the cost of a larger, though still bounded, worst-case memory footprint. Benchmark results on an x86-64 machine and an aarch64 machine show that wfspan achieves performance and memory footprint competitive with practical lock-based and lock-free memory allocators, while outperforming the other allocators in worst-case execution time.
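The difference between lock-free and wait-free progress can be illustrated with a minimal sketch. This is not wfspan's algorithm and does not show its helping mechanism or its wait-free lists; the `Arena` type, its size, and both function names are hypothetical and exist only for illustration. The first allocation path retries a compare-and-swap, so a thread that keeps losing the race has no bound on its retries. The second path finishes in a fixed number of steps, assuming the target implements `fetch_add` as a single atomic instruction (on some architectures it compiles to a retry loop instead), and it pays for that bound by wasting space when a request fails.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator over a fixed-size arena (illustration only).
struct Arena {
    static constexpr std::size_t kSize = 1024;
    std::atomic<std::size_t> next{0};  // offset of the first free byte

    // True if n bytes starting at offset cur fit inside the arena.
    static bool fits(std::size_t cur, std::size_t n) {
        return n <= kSize && cur <= kSize - n;
    }

    // Lock-free: some thread always makes progress, but this thread may
    // lose the compare-and-swap race an unbounded number of times.
    bool alloc_lock_free(std::size_t n, std::size_t& offset) {
        std::size_t cur = next.load();
        while (true) {
            if (!fits(cur, n)) return false;
            // On failure, cur is reloaded with the current value of next.
            if (next.compare_exchange_weak(cur, cur + n)) {
                offset = cur;
                return true;
            }
        }
    }

    // Bounded steps: one atomic add, no retry loop. A failed request still
    // advances next, so the space behind it is lost (larger footprint).
    bool alloc_wait_free(std::size_t n, std::size_t& offset) {
        std::size_t cur = next.fetch_add(n);
        if (!fits(cur, n)) return false;
        offset = cur;
        return true;
    }
};

// Small single-threaded usage check.
inline void demo() {
    Arena a;
    std::size_t off = 0;
    bool ok = a.alloc_lock_free(16, off);
    assert(ok && off == 0);
    ok = a.alloc_wait_free(16, off);
    assert(ok && off == 16);
}
```

The sketch mirrors the tradeoff stated in the abstract only loosely: bounding the number of steps is paid for with memory. A real allocator would also have to handle deallocation and counter overflow, which this sketch omits.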
Index Terms
wfspan: Wait-free Dynamic Memory Management