DOI: 10.1145/1736020.1736036 (ASPLOS conference proceedings)
research-article

Addressing shared resource contention in multicore processors via scheduling

Published: 13 March 2010

ABSTRACT

Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research efforts dedicated to this problem in the past. Previous solutions focused primarily on hardware techniques and software page coloring to mitigate this problem. Our goal is to investigate how and to what extent contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool because it does not require extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is to find a classification scheme for threads that determines how they affect each other when competing for shared resources. We provide a comprehensive analysis of such classification schemes using a newly proposed methodology that enables us to evaluate these schemes separately from the scheduling algorithm itself and to compare them to the optimal. As a result of this analysis we discovered a classification scheme that addresses not only contention for cache space, but also contention for other shared resources, such as the memory controller, memory bus, and prefetching hardware. To show the applicability of our analysis we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques is not in improving performance of a workload as a whole but in improving quality of service or performance isolation for individual applications.
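
To make the idea of scheduling by classification concrete, here is a minimal sketch of one way a user-level scheduler might act on a per-thread classification. It is an illustration only, not the algorithm from the paper: the abstract does not name the classification metric or the placement policy, so the `intensity` scores, the `Domain` model (a group of cores sharing a cache and memory path), and the greedy rule below are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical model: cores that share a cache and memory path."""
    capacity: int
    threads: list = field(default_factory=list)
    load: float = 0.0  # sum of the assumed intensity scores placed here


def spread_by_intensity(intensity, num_domains, cores_per_domain):
    """Greedy placement: take threads from most to least intensive and put
    each one on the least-loaded domain that still has a free core."""
    if len(intensity) > num_domains * cores_per_domain:
        raise ValueError("more threads than cores")
    domains = [Domain(cores_per_domain) for _ in range(num_domains)]
    for name in sorted(intensity, key=intensity.get, reverse=True):
        free = [d for d in domains if len(d.threads) < d.capacity]
        target = min(free, key=lambda d: d.load)
        target.threads.append(name)
        target.load += intensity[name]
    return [d.threads for d in domains]


# Assumed scores: A and B are contention-heavy, C and D are light.
placement = spread_by_intensity(
    {"A": 30.0, "B": 25.0, "C": 2.0, "D": 1.0},
    num_domains=2,
    cores_per_domain=2,
)
print(placement)
```

With these assumed scores the two heavy threads end up in different domains, each paired with a light thread, which is the kind of separation a contention-aware scheduler aims for. A real scheduler would also need to measure the scores online and re-place threads as their behavior changes.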

