ABSTRACT
Cores in a chip-multiprocessor (CMP) system share multiple hardware resources in the memory subsystem. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms for each individual resource. Implemented independently, such resource-based fairness mechanisms can make contradictory decisions, leading to low fairness and loss of performance. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable.
This paper proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for and complexity of developing fairness mechanisms for each individual resource. Our technique, Fairness via Source Throttling (FST), estimates the unfairness in the entire shared memory system. If the estimated unfairness is above a threshold set by system software, FST throttles down cores causing unfairness by limiting the number of requests they can inject into the system and the frequency at which they do. As such, our source-based fairness control ensures fairness decisions are made in tandem in the entire memory system. FST also enforces thread priorities/weights, and enables system software to enforce different fairness objectives and fairness-performance tradeoffs in the memory system.
Our evaluations show that FST provides the best system fairness and performance compared to four systems with no fairness control and with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.
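The control loop described in the abstract (estimate system-wide unfairness, compare it against a software-set threshold, throttle the offending core at its source) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not specified in the abstract: the unfairness metric (ratio of the largest to the smallest estimated slowdown), the rule for picking the core to throttle (the least slowed-down one), the throttle-level table, and all names such as `Core`, `THROTTLE_LEVELS`, and `throttle_step`.

```python
from dataclasses import dataclass

# Hypothetical throttle levels, illustrative values only:
# (max outstanding requests, min cycles between request injections).
THROTTLE_LEVELS = [(128, 1), (64, 2), (32, 4), (8, 16), (2, 64)]


@dataclass
class Core:
    name: str
    slowdown: float  # assumed estimate: shared-mode time / alone-mode time
    level: int = 0   # index into THROTTLE_LEVELS; 0 means unthrottled


def estimate_unfairness(cores):
    """Assumed metric: largest estimated slowdown divided by the smallest."""
    slowdowns = [c.slowdown for c in cores]
    return max(slowdowns) / min(slowdowns)


def throttle_step(cores, threshold):
    """Run one interval: throttle down one core if unfairness exceeds threshold."""
    unfairness = estimate_unfairness(cores)
    if unfairness > threshold:
        # Assumption: the least slowed-down core is the one causing unfairness.
        culprit = min(cores, key=lambda c: c.slowdown)
        # Move one step toward the most restrictive level, saturating at the end.
        culprit.level = min(culprit.level + 1, len(THROTTLE_LEVELS) - 1)
    return unfairness


cores = [Core("A", 1.5), Core("B", 3.0)]
unfairness = throttle_step(cores, threshold=1.4)
max_requests, min_gap = THROTTLE_LEVELS[cores[0].level]
print(unfairness, cores[0].name, max_requests, min_gap)
```

In this sketch each throttle level pairs the two knobs the abstract mentions, a cap on the number of requests a core can inject and a limit on how frequently it injects them, so a single decision at the source affects every shared resource downstream.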
Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems
ASPLOS '10