research-article
Open Access

Fast, multicore-scalable, low-fragmentation memory allocation through large virtual memory and global data structures

Published: 23 October 2015

Abstract

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a concurrent allocator that, in our experiments, generally performs and scales better than other allocators while using less memory, and remains competitive otherwise. The main ideas behind the design of scalloc are: (1) uniform treatment of small and big objects through so-called virtual spans; (2) efficient and effective reclamation of free memory through fast and scalable global data structures; and (3) constant-time (modulo synchronization) allocation and deallocation operations that trade off memory reuse and spatial locality without being subject to false sharing.
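
To make the design ideas in the abstract more concrete, here is a minimal Python model. It is not scalloc's implementation, and it rests on assumptions the abstract does not state: that a virtual span is a fixed-size reservation of address space (2 MiB here, chosen for illustration), that only a prefix of it is carved into equally sized blocks, and that a lock-protected list can stand in for the fast, scalable global data structure that recycles empty spans. The names `VirtualSpan`, `SpanPool`, and `VIRTUAL_SPAN_SIZE` are hypothetical.

```python
import mmap
import threading

# Assumed for illustration: every virtual span reserves the same amount of
# address space, regardless of the object size it serves.
VIRTUAL_SPAN_SIZE = 2 * 1024 * 1024


class VirtualSpan:
    """A fixed-size anonymous mapping of which only a prefix holds blocks."""

    def __init__(self):
        # Anonymous mapping; the OS typically backs pages with physical
        # memory only when they are first touched.
        self.memory = mmap.mmap(-1, VIRTUAL_SPAN_SIZE)
        self.block_size = 0
        self.capacity = 0
        self.free_offsets = []

    def format(self, block_size, real_span_size):
        """Carve the first real_span_size bytes into block_size blocks."""
        if not 0 < block_size <= real_span_size <= VIRTUAL_SPAN_SIZE:
            raise ValueError("invalid block or real-span size")
        self.block_size = block_size
        self.capacity = real_span_size // block_size
        # LIFO free list of block offsets, lowest offset on top.
        self.free_offsets = [i * block_size
                             for i in reversed(range(self.capacity))]

    def allocate(self):
        """Constant time: pop a free offset, or None if the span is full."""
        return self.free_offsets.pop() if self.free_offsets else None

    def free(self, offset):
        """Constant time: push the offset back onto the free list."""
        self.free_offsets.append(offset)

    def is_empty(self):
        return len(self.free_offsets) == self.capacity


class SpanPool:
    """Global pool of empty spans (a lock stands in for a scalable stack)."""

    def __init__(self):
        self._spans = []
        self._lock = threading.Lock()

    def get(self, block_size, real_span_size):
        with self._lock:
            span = self._spans.pop() if self._spans else None
        if span is None:
            span = VirtualSpan()
        # All spans have the same virtual size, so an empty span can be
        # reformatted for any block size.
        span.format(block_size, real_span_size)
        return span

    def put(self, span):
        with self._lock:
            self._spans.append(span)
```

In this model, a span obtained with `pool.get(64, 4096)` hands out offsets into `span.memory`. Once every block has been freed, the span can be returned with `pool.put(span)` and later reused for a different block size, which is one way to read "uniform treatment of small and big objects".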




Published in

ACM SIGPLAN Notices, Volume 50, Issue 10
OOPSLA '15
October 2015
953 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2858965
Editor: Andy Gill

OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications
October 2015
953 pages
ISBN: 9781450336895
DOI: 10.1145/2814270

Copyright © 2015 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
