research-article

wfspan: Wait-free Dynamic Memory Management

Published: 23 August 2022

Abstract

Dynamic memory allocation plays a vital role in modern application programs. Modern lock-free memory allocators based on hardware atomic primitives usually provide good performance. However, threads may starve in these lock-free implementations, leading to an unbounded worst-case execution time that is unacceptable in real-time embedded systems. This article presents wfspan, a decentralized dynamic memory manager based on non-linearizable wait-free lists. It employs a helping mechanism to ensure that no thread starves. As a design tradeoff, wfspan guarantees a bounded number of execution steps in both the allocation and deallocation procedures, at the cost of an increased, though still bounded, worst-case memory footprint. Benchmark results on an x86-64 machine and an aarch64 machine show that wfspan achieves performance and memory footprint competitive with practical lock-based and lock-free memory allocators, while outperforming the other allocators in worst-case execution time.
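To make the idea of bounded execution steps concrete, the following is a minimal C++ sketch of a fixed-size slot pool whose allocation and deallocation paths contain no retry loop. It is not wfspan's algorithm: it uses neither wait-free lists nor a helping mechanism, and the name `BoundedSlotPool` and its interface are hypothetical, introduced only for illustration. It also assumes that an atomic exchange on the target hardware completes in a bounded number of steps, which may not hold on every platform.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size pool used only to illustrate bounded-step
// allocation. This is NOT wfspan's data structure.
template <std::size_t N>
class BoundedSlotPool {
public:
    // Visits each slot at most once, so the call performs at most N atomic
    // exchanges and never retries. Returns the claimed slot index, or -1 if
    // every slot looked busy during this single pass.
    int allocate() {
        for (std::size_t i = 0; i < N; ++i) {
            // exchange() returns the previous value; false means the slot
            // was free and this call has just claimed it.
            if (!used_[i].exchange(true, std::memory_order_acquire)) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // A single atomic store: a constant number of steps.
    void deallocate(int slot) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < N);
        used_[static_cast<std::size_t>(slot)].store(false, std::memory_order_release);
    }

private:
    // Value-initialized, so every slot starts out free (false).
    std::array<std::atomic<bool>, N> used_{};
};
```

A caller would declare, for example, `BoundedSlotPool<64> pool;` and treat a return value of `-1` from `pool.allocate()` as exhaustion. Because the pass is not repeated, an allocation can report failure even though another thread frees a slot a moment later. That is a much simpler form of the tradeoff the abstract describes, in which a step bound is bought with extra memory.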

Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 4
July 2022
330 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3551651
Editor: Tulika Mitra

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 23 August 2022
• Online AM: 4 May 2022
• Revised: 1 April 2022
• Accepted: 1 April 2022
• Received: 1 July 2021

Qualifiers

• research-article
• Refereed
