Abstract
Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant past research efforts. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads that determines how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables us to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis, we discovered a classification scheme that addresses not only contention for cache space but also contention for other shared resources, such as the memory controller, memory bus, and prefetching hardware. To show the applicability of our analysis, we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving the performance of a workload as a whole but in improving quality of service or performance isolation for individual applications and in optimizing system energy consumption.
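The abstract does not name the classification metric or spell out the scheduling algorithm, so the following is only an illustrative sketch of the general idea of contention-aware placement, not the authors' method. It assumes a hypothetical per-thread "intensity" score (for example, some measure of how heavily a thread uses the shared memory hierarchy) and greedily spreads the most intensive threads across groups of cores that share resources. The function name, the parameters, and the thread names are all assumptions made for illustration.

```python
from typing import Dict, List


def spread_by_intensity(
    intensity: Dict[str, float],
    num_domains: int,
    cores_per_domain: int,
) -> List[List[str]]:
    """Greedily assign threads to shared-resource domains.

    `intensity` maps a thread name to a hypothetical contention score.
    A "domain" is a group of cores sharing resources (an assumption here).
    Threads are placed from most to least intensive, each onto the
    non-full domain whose accumulated intensity is currently lowest.
    """
    if len(intensity) > num_domains * cores_per_domain:
        raise ValueError("more threads than available cores")

    domains: List[List[str]] = [[] for _ in range(num_domains)]
    load = [0.0] * num_domains

    # Most intensive first; ties are broken by name for determinism.
    for name in sorted(intensity, key=lambda n: (-intensity[n], n)):
        candidates = [
            d for d in range(num_domains) if len(domains[d]) < cores_per_domain
        ]
        target = min(candidates, key=lambda d: (load[d], d))
        domains[target].append(name)
        load[target] += intensity[name]

    return domains


if __name__ == "__main__":
    # Hypothetical scores: two intensive threads and two light ones.
    scores = {"a": 30.0, "b": 25.0, "c": 2.0, "d": 1.0}
    print(spread_by_intensity(scores, num_domains=2, cores_per_domain=2))
```

In this toy input the two intensive threads end up in different domains, each paired with a light thread, which is the kind of separation a contention-aware scheduler aims for. A real implementation would also need to measure the scores online and migrate threads, which this sketch leaves out.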
Index Terms
Contention-Aware Scheduling on Multicore Systems