Contention-Aware Scheduling on Multicore Systems

Published: 01 December 2010

Abstract

Contention for shared resources on multicore processors remains an unsolved problem in existing systems despite significant research effort. Previous solutions focused primarily on hardware techniques and software page coloring. Our goal is to investigate how, and to what extent, contention for shared resources can be mitigated via thread scheduling. Scheduling is an attractive tool because it requires no extra hardware and is relatively easy to integrate into the system. Our study is the first to provide a comprehensive analysis of contention-mitigating techniques that use only scheduling. The most difficult part of the problem is finding a classification scheme for threads that determines how they affect each other when competing for shared resources. We analyze such classification schemes using a newly proposed methodology that lets us evaluate the schemes separately from the scheduling algorithm itself and compare them to the optimal. As a result of this analysis, we discovered a classification scheme that addresses contention not only for cache space but also for other shared resources, such as the memory controller, memory bus, and prefetching hardware. To show the applicability of our analysis, we design a new scheduling algorithm, which we prototype at user level, and demonstrate that it performs within 2% of the optimal. We also conclude that the highest impact of contention-aware scheduling techniques lies not in improving the performance of a workload as a whole but in improving quality of service or performance isolation for individual applications and in optimizing system energy consumption.

  • Published in

    ACM Transactions on Computer Systems, Volume 28, Issue 4
    December 2010
    100 pages
    ISSN: 0734-2071
    EISSN: 1557-7333
    DOI: 10.1145/1880018

    Copyright © 2010 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 1 December 2010
    • Accepted: 1 October 2010
    • Received: 1 May 2010

    Qualifiers

    • research-article
    • Research
    • Refereed
