Abstract
We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a concurrent allocator that, in our experiments, generally performs and scales better than other allocators while using less memory, and remains competitive otherwise. The main ideas behind the design of scalloc are: (1) uniform treatment of small and big objects through so-called virtual spans; (2) efficient and effective reclamation of free memory through fast and scalable global data structures; and (3) constant-time (modulo synchronization) allocation and deallocation operations that trade off memory reuse and spatial locality without being subject to false sharing.
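As a rough illustration of what uniform spans and constant-time operations can look like, the following is a toy, single-threaded Python model. It is not scalloc's code. The 2 MiB span size, the `Span` class, and the `span_of` helper are all assumptions made for this sketch, and it leaves out synchronization, the global data structures, and any real memory mapping.

```python
# Toy model only: addresses are plain integers, nothing is actually mapped.
VIRTUAL_SPAN_SIZE = 2 * 1024 * 1024  # assumed: every span reserves the same virtual range


class Span:
    """One span serving blocks of a single size (hypothetical structure)."""

    def __init__(self, base, block_size, real_size):
        # Only `real_size` bytes of the virtual range are used for blocks.
        assert base % VIRTUAL_SPAN_SIZE == 0, "spans are assumed to be aligned"
        assert real_size <= VIRTUAL_SPAN_SIZE
        self.base = base
        self.block_size = block_size
        self.capacity = real_size // block_size
        self.next_fresh = 0   # index of the next never-used block
        self.free_list = []   # LIFO stack of freed block addresses

    def allocate(self):
        """Return a block address in O(1), or None if the span is full."""
        if self.free_list:
            return self.free_list.pop()  # reuse the most recently freed block
        if self.next_fresh < self.capacity:
            addr = self.base + self.next_fresh * self.block_size
            self.next_fresh += 1
            return addr
        return None

    def free(self, addr):
        """Return a block to this span in O(1)."""
        self.free_list.append(addr)


def span_of(addr):
    """Recover the span base from any block address in O(1).

    This works for every size class because all spans share one size
    and alignment in this model.
    """
    return addr - (addr % VIRTUAL_SPAN_SIZE)
```

In this model, small and big block sizes differ only in `block_size` and `real_size`. The lookup in `span_of` and both operations on `Span` stay the same, which is the sense in which the treatment is uniform here.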
Supplemental Material
The artifact can be used to reproduce the results presented in the paper, i.e., it can be used to evaluate scalloc against other interesting allocators.
Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structures
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications