Abstract
Cores in chip-multiprocessors (CMPs) share multiple memory subsystem resources. If resource sharing is unfair, some applications can be delayed significantly while others are unfairly prioritized. Previous research proposed separate fairness mechanisms for each resource. Such resource-based fairness mechanisms implemented independently in each resource can make contradictory decisions, leading to low fairness and performance loss. Therefore, a coordinated mechanism that provides fairness in the entire shared memory system is desirable.
This article proposes a new approach that provides fairness in the entire shared memory system, thereby eliminating the need for, and the complexity of, developing a separate fairness mechanism for each resource. Our technique, Fairness via Source Throttling (FST), estimates unfairness in the entire memory system. If unfairness is above a threshold set by system software, FST throttles down the cores causing unfairness by limiting the number of requests they can generate and the frequency at which they issue them. Because fairness is controlled at the source, fairness decisions are made in tandem across the entire memory system. FST also enforces thread priorities/weights and enables system software to enforce different fairness objectives in the memory system.
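The interval-based control loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the FST hardware design. The names `Core`, `LEVELS`, `end_of_interval`, and `interference` are hypothetical. The unfairness metric (largest weighted slowdown divided by smallest weighted slowdown), the throttle levels, and the choice to also relax the most slowed-down core are assumptions of this sketch. How slowdowns and inter-core interference are estimated at run time is left out entirely.

```python
from dataclasses import dataclass

# Hypothetical throttle levels, most to least restrictive. Each level is
# (max outstanding requests, fraction of cycles in which the core may inject
# a new request). The values are illustrative, not taken from the article.
LEVELS = [(1, 0.05), (2, 0.10), (4, 0.25), (8, 0.50), (16, 1.00)]


@dataclass
class Core:
    name: str
    slowdown: float               # estimated shared-mode time / alone-mode time
    weight: float = 1.0           # system-software priority (assumed scaling)
    level: int = len(LEVELS) - 1  # every core starts unthrottled

    @property
    def limits(self):
        """Current (request cap, injection rate) for this core."""
        return LEVELS[self.level]


def weighted_slowdown(core):
    # Assumption: a higher weight makes a core's slowdown count for more.
    return core.slowdown * core.weight


def unfairness(cores):
    """Assumed metric: largest weighted slowdown / smallest weighted slowdown."""
    ws = [weighted_slowdown(c) for c in cores]
    return max(ws) / min(ws)


def end_of_interval(cores, interference, threshold):
    """Make one throttling decision at the end of an interval (needs >= 2 cores).

    interference[a][b] is a hypothetical estimate of the cycles by which core
    a's requests delayed core b. Returns the name of the core throttled down,
    or None if unfairness is within the threshold.
    """
    if unfairness(cores) <= threshold:
        return None
    slowest = max(cores, key=weighted_slowdown)
    # Throttle down the core that interfered most with the slowest core.
    culprit = max((c for c in cores if c is not slowest),
                  key=lambda c: interference[c.name][slowest.name])
    culprit.level = max(culprit.level - 1, 0)
    # Assumption of this sketch: also relax the slowest core by one level.
    slowest.level = min(slowest.level + 1, len(LEVELS) - 1)
    return culprit.name


cores = [Core("A", slowdown=3.0), Core("B", slowdown=1.2), Core("C", slowdown=1.5)]
interference = {"B": {"A": 900}, "C": {"A": 300}}
throttled = end_of_interval(cores, interference, threshold=1.4)
print(throttled, cores[1].limits)  # prints: B (8, 0.5)
```

In this example the unfairness is 3.0 / 1.2 = 2.5, which exceeds the threshold of 1.4, so core B (the largest source of interference for A) drops one throttle level. The decision only changes how fast each core generates requests; the shared cache and memory controller policies the requests later pass through are left untouched.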
Our evaluations show that FST provides the best system fairness and performance compared to three systems with state-of-the-art fairness mechanisms implemented in both shared caches and memory controllers.
Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems