DOI: 10.1145/1736020.1736058 (ASPLOS conference proceedings)
research-article

Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems

Published: 13 March 2010

ABSTRACT

Cores in a chip-multiprocessor (CMP) system share multiple hardware resources in the memory subsystem. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms in each individual resource. Such resource-based fairness mechanisms implemented independently in each resource can make contradictory decisions, leading to low fairness and loss of performance. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable.

This paper proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for and complexity of developing fairness mechanisms for each individual resource. Our technique, Fairness via Source Throttling (FST), estimates the unfairness in the entire shared memory system. If the estimated unfairness is above a threshold set by system software, FST throttles down cores causing unfairness by limiting the number of requests they can inject into the system and the frequency at which they do. As such, our source-based fairness control ensures fairness decisions are made in tandem in the entire memory system. FST also enforces thread priorities/weights, and enables system software to enforce different fairness objectives and fairness-performance tradeoffs in the memory system.
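
The control loop described above can be illustrated with a minimal sketch. The abstract does not say how unfairness is estimated, how the offending cores are identified, or what the throttling levels are, so everything below is an assumption made for illustration: unfairness is taken to be the ratio of the largest to the smallest estimated per-core slowdown, the least-slowed core is treated as the one causing unfairness, and the `Core` class, `THROTTLE_LEVELS` table, and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical throttle levels, from most to least restrictive:
# (max requests a core may have in flight, min cycles between injections).
# The numbers are illustrative only; they do not come from the abstract.
THROTTLE_LEVELS = [(1, 8), (2, 4), (4, 2), (8, 1)]


@dataclass
class Core:
    name: str
    slowdown: float  # assumed estimate: time when sharing / time when running alone
    level: int = len(THROTTLE_LEVELS) - 1  # index into THROTTLE_LEVELS; starts unthrottled


def estimate_unfairness(cores):
    """Assumed metric: largest estimated slowdown divided by the smallest."""
    slowdowns = [c.slowdown for c in cores]
    return max(slowdowns) / min(slowdowns)


def throttle_step(cores, threshold):
    """Run one decision interval.

    If estimated unfairness exceeds the software-set threshold, move one core
    to a more restrictive throttle level and return it; otherwise return None.
    """
    if estimate_unfairness(cores) <= threshold:
        return None
    # Simplification: treat the least-slowed core as the one causing unfairness.
    culprit = min(cores, key=lambda c: c.slowdown)
    culprit.level = max(0, culprit.level - 1)
    return culprit
```

For example, with `cores = [Core("A", 3.0), Core("B", 1.2)]` and `threshold=1.4`, the estimated unfairness is 2.5, so `throttle_step` moves core B from level 3 to level 2, which in this hypothetical table halves both its request cap and its injection rate. Raising the threshold trades fairness for throughput, which is one way system software could express the fairness-performance tradeoff the abstract mentions.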

Our evaluations show that FST provides the best system fairness and performance compared to four systems with no fairness control and with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.

