research-article

Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems

Published: 1 April 2012

Abstract

Cores in chip-multiprocessors (CMPs) share multiple memory subsystem resources. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms for each resource. Such resource-based fairness mechanisms implemented independently in each resource can make contradictory decisions, leading to low fairness and performance loss. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable.

This article proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for, and the complexity of, developing a separate fairness mechanism for each resource. Our technique, Fairness via Source Throttling (FST), estimates unfairness in the entire memory system. If the estimated unfairness exceeds a threshold set by system software, FST throttles down the cores causing unfairness by limiting both the number of requests they create and the frequency at which they create them. As such, our source-based fairness control ensures that fairness decisions are made in tandem across the entire memory system. FST enforces thread priorities/weights and enables system software to enforce different fairness objectives in the memory system.
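The control loop described above can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not define the unfairness estimate, how interfering cores are identified, or the throttle settings. The sketch assumes unfairness is the ratio of the largest to the smallest per-core slowdown, takes the interfering core as an input, and uses a hypothetical table of throttle levels. The names `Core`, `THROTTLE_LEVELS`, `estimate_unfairness`, and `fst_interval` are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical throttle levels: fraction of the full request-injection rate.
# The abstract does not give the actual settings.
THROTTLE_LEVELS = [0.05, 0.10, 0.25, 0.50, 1.00]
MAX_LEVEL = len(THROTTLE_LEVELS) - 1


@dataclass
class Core:
    name: str
    slowdown: float          # assumed estimate: shared run time / alone run time
    level: int = MAX_LEVEL   # index into THROTTLE_LEVELS; starts unthrottled


def estimate_unfairness(cores):
    """Assumed metric: largest slowdown divided by smallest slowdown."""
    slowdowns = [c.slowdown for c in cores]
    return max(slowdowns) / min(slowdowns)


def fst_interval(cores, threshold, interfering):
    """Run one decision interval and return the estimated unfairness.

    `threshold` stands in for the value set by system software.
    `interfering` is the core assumed to be causing unfairness; how it is
    identified is outside the scope of this sketch.
    """
    unfairness = estimate_unfairness(cores)
    if unfairness > threshold:
        slowest = max(cores, key=lambda c: c.slowdown)
        # Throttle the source of interference down by one level.
        if interfering is not slowest:
            interfering.level = max(interfering.level - 1, 0)
        # Let the most slowed-down core inject requests more freely.
        slowest.level = min(slowest.level + 1, MAX_LEVEL)
    return unfairness


if __name__ == "__main__":
    cores = [Core("A", slowdown=3.0), Core("B", slowdown=1.2)]
    u = fst_interval(cores, threshold=1.4, interfering=cores[1])
    for c in cores:
        print(c.name, THROTTLE_LEVELS[c.level])
    print("unfairness:", u)
```

Because throttling is applied at the cores rather than inside each shared resource, one decision per interval affects the caches and the memory controllers together, which is the coordination the abstract argues for.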

Our evaluations show that FST provides the best system fairness and performance compared to three systems with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.



Published in

ACM Transactions on Computer Systems, Volume 30, Issue 2
April 2012
111 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2166879

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 April 2012
• Accepted: 1 January 2012
• Revised: 1 December 2011
• Received: 1 March 2011

Qualifiers

• research-article
• Research
• Refereed
