ABSTRACT
Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant prior research. Previous solutions focused primarily on hardware techniques and software page coloring. Our goal is to investigate how, and to what extent, contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool because it requires no extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is finding a classification scheme for threads that determines how they affect each other when competing for shared resources. We analyze such classification schemes using a newly proposed methodology that lets us evaluate them separately from the scheduling algorithm itself and compare them to the optimal. As a result of this analysis, we discovered a classification scheme that addresses not only contention for cache space but also contention for other shared resources, such as the memory controller, memory bus, and prefetching hardware. To show the applicability of our analysis, we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques lies not in improving the performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.
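The idea of judging a classification scheme against the optimal, independently of any particular scheduler, can be illustrated with a minimal sketch. It assumes a machine where threads run in pairs on a shared cache, a table of measured co-run degradations, and a single per-thread "intensity" number standing in for the classification scheme. The abstract does not specify any of these: the function names, the intensity metric, and every number below are hypothetical and chosen only for illustration.

```python
def all_pairings(threads):
    """Yield every way to split an even-sized list of threads into unordered pairs."""
    if not threads:
        yield []
        return
    first, rest = threads[0], threads[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in all_pairings(remaining):
            yield [(first, partner)] + tail


def schedule_cost(pairing, degradation):
    """Total degradation of a schedule.

    degradation[a][b] is the (hypothetical) slowdown of a, in percent,
    when it shares a cache with b.
    """
    return sum(degradation[a][b] + degradation[b][a] for a, b in pairing)


def optimal_pairing(threads, degradation):
    """Brute-force search over all pairings; feasible only for small thread counts."""
    return min(all_pairings(threads), key=lambda p: schedule_cost(p, degradation))


def classified_pairing(threads, intensity):
    """Pair the most intensive thread with the least intensive, and so on inward.

    `intensity` is an assumed stand-in for a classification scheme.
    """
    ranked = sorted(threads, key=lambda t: intensity[t])
    return [(ranked[i], ranked[-1 - i]) for i in range(len(ranked) // 2)]


# Hypothetical measurements for four threads.
threads = ["A", "B", "C", "D"]
intensity = {"A": 30, "B": 25, "C": 2, "D": 1}
degradation = {
    "A": {"B": 40, "C": 5, "D": 4},
    "B": {"A": 35, "C": 6, "D": 3},
    "C": {"A": 8, "B": 7, "D": 1},
    "D": {"A": 6, "B": 5, "C": 1},
}

best_cost = schedule_cost(optimal_pairing(threads, degradation), degradation)
scheme_cost = schedule_cost(classified_pairing(threads, intensity), degradation)
gap_percent = 100.0 * (scheme_cost - best_cost) / best_cost
print(f"optimal={best_cost} scheme={scheme_cost} gap={gap_percent:.1f}%")
```

The gap between the two costs measures the quality of the classification scheme alone, since no scheduler implementation is involved. The brute-force search grows very quickly with the number of threads, so a sketch like this only suits small workloads.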
Addressing shared resource contention in multicore processors via scheduling
ASPLOS '10