research-article

Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead

Published: 4 June 2011

Abstract

Multiprocessors built from multicore processors usually have a non-uniform memory access (NUMA) architecture; even current 2-processor systems with 8 cores exhibit non-uniform memory access times. Because the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. We find that optimizing only for data locality can counteract the benefits of avoiding cache contention, and vice versa. Therefore, system software must take both data locality and cache contention into account to achieve good performance, and memory management cannot be decoupled from process scheduling. We present a detailed analysis of a commercially available NUMA-multicore architecture, the Intel Nehalem. We describe two scheduling algorithms: maximum-local, which optimizes for maximum data locality, and its extension, N-MASS, which gives up some data locality to avoid the performance degradation caused by cache contention. N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance by up to 32%, and by 7% on average, over the default setup in current Linux implementations.
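
A minimal Python sketch may help make the locality/contention trade-off concrete. It is not the authors' implementation: it models a two-node system such as the 2-processor Nehalem, and the per-process `numa_penalty` and `cache_pressure` values, the `threshold` parameter, and the rule of moving the lowest-penalty process first are hypothetical stand-ins chosen for illustration. `maximum_local` keeps each process on the node that holds its memory; `n_mass_sketch` starts from that placement and gives up locality for the processes least hurt by remote access until the shared-cache pressure on the two processors is roughly balanced.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    home: int              # NUMA node (0 or 1) holding the process's memory
    numa_penalty: float    # hypothetical: remote-run time / local-run time
    cache_pressure: float  # hypothetical: demand placed on the shared cache


def maximum_local(procs, cores_per_node):
    """Place each process on its home node while that node has a free core.

    Processes with the highest NUMA penalty are placed first, so the most
    locality-sensitive ones keep local memory; overflow goes to the other
    node of a two-node system.
    """
    free = [cores_per_node, cores_per_node]
    mapping = {}
    for p in sorted(procs, key=lambda q: q.numa_penalty, reverse=True):
        node = p.home if free[p.home] > 0 else 1 - p.home
        if free[node] == 0:
            raise ValueError("more processes than cores")
        mapping[p.name] = node
        free[node] -= 1
    return mapping


def node_pressure(procs, mapping):
    """Sum the cache pressure of the processes mapped to each node."""
    totals = [0.0, 0.0]
    for p in procs:
        totals[mapping[p.name]] += p.cache_pressure
    return totals


def n_mass_sketch(procs, cores_per_node, threshold=0.1):
    """Start from maximum-local, then trade locality for cache balance.

    While one node's cache pressure exceeds the other's by more than
    `threshold` times the total, move the process with the lowest NUMA
    penalty whose move still shrinks the imbalance.
    """
    mapping = maximum_local(procs, cores_per_node)
    total = sum(p.cache_pressure for p in procs)
    while True:
        pressure = node_pressure(procs, mapping)
        heavy = 0 if pressure[0] >= pressure[1] else 1
        light = 1 - heavy
        diff = pressure[heavy] - pressure[light]
        light_used = sum(1 for node in mapping.values() if node == light)
        if diff <= threshold * total or light_used >= cores_per_node:
            return mapping
        # Moving pressure c changes the gap from diff to |diff - 2c|,
        # which is smaller only when 0 < c < diff.
        movable = [p for p in procs
                   if mapping[p.name] == heavy and 0 < p.cache_pressure < diff]
        if not movable:
            return mapping
        victim = min(movable, key=lambda q: q.numa_penalty)
        mapping[victim.name] = light


if __name__ == "__main__":
    procs = [
        Proc("A", home=0, numa_penalty=1.40, cache_pressure=10.0),
        Proc("B", home=0, numa_penalty=1.30, cache_pressure=9.0),
        Proc("C", home=0, numa_penalty=1.05, cache_pressure=8.0),
        Proc("D", home=1, numa_penalty=1.10, cache_pressure=1.0),
        Proc("E", home=1, numa_penalty=1.20, cache_pressure=1.0),
    ]
    print(maximum_local(procs, cores_per_node=4))
    # {'A': 0, 'B': 0, 'E': 1, 'D': 1, 'C': 0}
    print(n_mass_sketch(procs, cores_per_node=4))
    # {'A': 0, 'B': 0, 'E': 1, 'D': 1, 'C': 1}
```

In this made-up workload all three cache-heavy processes keep their memory on node 0, so maximum-local crowds them onto one shared cache. The sketch moves only C, the process with the smallest assumed remote-access penalty, and stops once no further single move narrows the cache-pressure gap.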



    • Published in

      ACM SIGPLAN Notices, Volume 46, Issue 11 (ISMM '11)
      November 2011
      135 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2076022

      • ACM Conferences
        ISMM '11: Proceedings of the International Symposium on Memory Management
        June 2011
        148 pages
        ISBN: 9781450302630
        DOI: 10.1145/1993478

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States



