ASPLOS Conference Proceedings · Research article · DOI: 10.1145/1736020.1736033

Probabilistic job symbiosis modeling for SMT processor scheduling

Published: 13 March 2010

ABSTRACT

Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have 'compatible' demands on the processor's shared resources. Existing approaches, however, require a sampling phase, evaluate a limited number of possible co-schedules, use heuristics to gauge symbiosis, are rigid in their optimization target, and do not preserve system-level priorities/shares.

This paper proposes probabilistic job symbiosis modeling, which predicts whether jobs will create positive or negative symbiosis when co-scheduled without requiring the co-schedule to be evaluated. The model, which uses per-thread cycle stacks computed through a previously proposed cycle accounting architecture, is simple enough to be used in system software. Probabilistic job symbiosis modeling provides six key innovations over prior work in symbiotic job scheduling: (i) it does not require a sampling phase, (ii) it readjusts the job co-schedule continuously, (iii) it evaluates a large number of possible co-schedules at very low overhead, (iv) it is not driven by heuristics, (v) it can optimize a performance target of interest (e.g., system throughput or job turnaround time), and (vi) it preserves system-level priorities/shares. These innovations make symbiotic job scheduling both practical and effective.
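To make innovation (iii) concrete, the following Python sketch enumerates every co-schedule of k jobs on a k-context SMT core and picks the one with the highest predicted system throughput. It is not the paper's model: the `Predictor` interface, the `toy_predict` function and its per-job `intensity` values are hypothetical stand-ins for the cycle-stack-based prediction, and the sketch ignores priorities/shares entirely.

```python
from collections.abc import Callable, Sequence
from itertools import combinations
from typing import Optional

# Hypothetical interface: maps a candidate co-schedule (tuple of job ids that
# would share the core) to each job's predicted normalized progress.
Predictor = Callable[[tuple[str, ...]], dict[str, float]]


def best_coschedule(jobs: Sequence[str], contexts: int,
                    predict: Predictor) -> Optional[tuple[str, ...]]:
    """Exhaustively score all C(len(jobs), contexts) co-schedules and return
    the one with the highest predicted throughput (sum of progress)."""
    best, best_score = None, float("-inf")
    for candidate in combinations(jobs, contexts):
        score = sum(predict(candidate).values())
        if score > best_score:
            best, best_score = candidate, score
    return best  # None if contexts > len(jobs): no candidate exists


# Hypothetical per-job "resource intensity" in [0, 1]; not from the paper.
intensity = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2}


def toy_predict(candidate: tuple[str, ...]) -> dict[str, float]:
    """Toy model (assumption): a job slows down in proportion to its own
    intensity times the combined intensity of its co-runners."""
    return {
        j: 1.0 / (1.0 + intensity[j] * sum(intensity[k] for k in candidate if k != j))
        for j in candidate
    }


print(best_coschedule(["A", "B", "C", "D"], 2, toy_predict))  # prints ('C', 'D')
```

The exhaustive search grows as C(n, k); for example, 8 jobs on a four-thread core already give C(8, 4) = 70 candidates, which is why a model that scores each candidate cheaply matters. Note that repeatedly applying this single-slice choice would starve jobs A and B; the abstract states that the proposed scheduler preserves system-level priorities/shares, which this sketch does not attempt.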

Our experimental evaluation, which assumes a realistic scenario in which jobs come and go, reports an average 16% (and up to 35%) reduction in job turnaround time compared to the previously proposed SOS (sample, optimize, symbios) approach for a two-thread SMT processor, and an average 19% (and up to 45%) reduction in job turnaround time for a four-thread SMT processor.

References

  1. C. Boneti, F. J. Cazorla, R. Gioiosa, A. Buyuktosunoglu, C.-Y. Cher, and M. Valero. Software-controlled priority characterization of POWER5 processor. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 415--426, June 2008.
  2. J. R. Bulpin and I. Pratt. Hyper-threading aware process scheduling heuristics. In Proceedings of the USENIX Annual Technical Conference, pages 103--106, Apr. 2005.
  3. F. J. Cazorla, P. M. W. Knijnenburg, R. Sakellariou, E. Fernández, A. Ramirez, and M. Valero. Predictable performance in SMT processors: Synergy between the OS and SMTs. IEEE Transactions on Computers, 55(7):785--799, July 2006.
  4. F. J. Cazorla, A. Ramirez, M. Valero, and E. Fernandez. Dynamically controlled resource allocation in SMT processors. In Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 171--182, Dec. 2004.
  5. F. J. Cazorla, A. Ramirez, M. Valero, P. M. W. Knijnenburg, R. Sakellariou, and E. Fernández. QoS for high-performance SMT processors in embedded systems. IEEE Micro, 24(4):24--31, July 2004.
  6. D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip-multiprocessor architecture. In Proceedings of the Eleventh International Symposium on High Performance Computer Architecture (HPCA), pages 340--351, Feb. 2005.
  7. S. Choi and D. Yeung. Learning-based SMT processor resource distribution via hill-climbing. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA), pages 239--250, June 2006.
  8. E. Cota-Robles. Priority Based Simultaneous Multi-Threading, Dec. 2003. United States Patent No. 6,658,447 B2.
  9. A. El-Moursy, R. Garg, D. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a CMP of multi-threaded processors. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), Apr. 2006.
  10. S. Eyerman and L. Eeckhout. A memory-level parallelism aware fetch policy for SMT processors. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), pages 240--249, Feb. 2007.
  11. S. Eyerman and L. Eeckhout. System-level performance metrics for multi-program workloads. IEEE Micro, 28(3):42--53, May/June 2008.
  12. S. Eyerman and L. Eeckhout. Per-thread cycle accounting in SMT processors. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 133--144, Mar. 2009.
  13. A. Fedorova, M. Seltzer, and M. D. Smith. A non-work-conserving operating system scheduler for SMT processors. In Proceedings of the Workshop on the Interaction between Operating Systems and Computer Architecture (WIOSCA), in conjunction with ISCA, June 2006.
  14. R. Gabor, S. Weiss, and A. Mendelson. Fairness enforcement in switch on event multithreading. ACM Transactions on Architecture and Code Optimization (TACO), 4(3):34, Sept. 2007.
  15. B. Gibbs, B. Atyam, F. Berres, B. Blanchard, L. Castillo, P. Coelho, N. Guerin, L. Liu, C. D. Maciel, C. Sosa, and C. Thirumalai. Advanced POWER Virtualization on IBM eServer p5 Servers: Architecture and Performance Considerations. IBM, Nov. 2005.
  16. R. Jain, C. J. Hughes, and S. V. Adve. Soft real-time scheduling on simultaneous multithreaded processors. In Proceedings of the 23rd IEEE International Real-Time Systems Symposium, pages 134--145, Dec. 2002.
  17. K. Luo, J. Gummaraju, and M. Franklin. Balancing throughput and fairness in SMT processors. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 164--171, Nov. 2001.
  18. S. Parekh, S. Eggers, H. Levy, and J. Lo. Thread-sensitive scheduling for SMT processors. Technical report, University of Washington, 2000.
  19. M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 423--432, Dec. 2006.
  20. S. E. Raasch and S. K. Reinhardt. The impact of resource partitioning on SMT processors. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 15--26, Sept. 2003.
  21. T. Ramirez, A. Pajuelo, O. J. Santana, and M. Valero. Runahead threads to improve SMT performance. In Proceedings of the Fourteenth International Symposium on High-Performance Computer Architecture (HPCA), pages 149--158, Feb. 2008.
  22. A. Settle, J. Kihm, A. Janiszewski, and D. Connors. Architectural support for enhanced SMT job scheduling. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 63--73, Sept. 2004.
  23. T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 45--57, Oct. 2002.
  24. A. Snavely and L. Carter. Symbiotic jobscheduling on the MTA. In Proceedings of the Workshop on Multi-Threaded Execution, Architecture and Compilers, Jan. 2000.
  25. A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 234--244, Nov. 2000.
  26. A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 66--76, June 2002.
  27. D. Tam, R. Azimi, and M. Stumm. Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors. In Proceedings of the European Conference on Computer Systems (EuroSys), pages 47--58, Mar. 2007.
  28. N. Tuck and D. M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 26--34, Sept. 2003.
  29. D. Tullsen. Simulation and modeling of a simultaneous multithreading processor. In Proceedings of the 22nd Annual Computer Measurement Group Conference, Dec. 1996.
  30. D. M. Tullsen and J. A. Brown. Handling long-latency loads in a simultaneous multithreading processor. In Proceedings of the 34th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 318--327, Dec. 2001.
  31. D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA), pages 191--202, May 1996.
  32. D. M. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), pages 392--403, June 1995.
  33. VMware. HyperThreading Support in VMware ESX Server 2.1, Apr. 2004.


        Reviews

        Srini Ramaswamy

        In today's climate of increasingly complex processors, improving hardware resource utilization by sharing resources across multiple active threads is a very important consideration for system software. Because hardware resources are now shared at a much finer granularity to improve performance, designers have to grapple with avoiding runtime dependencies between co-scheduled jobs. Given today's move toward large-scale multi-core systems, this problem is becoming increasingly significant. Intuitively, job symbiosis modeling implies that co-scheduled jobs have a natural symbiotic relationship with respect to their use of shared hardware resources. For such scenarios, however, each job's execution profile with respect to resource usage must be well understood; other researchers have called this "job workload characterization." By understanding the workload characterization, or by accurately predicting it, system software can create beneficial scheduling scenarios for performance improvements. However, this implies that the workload characterization must be fully understood, typically by means of sampling. In this paper, the authors introduce probabilistic job symbiosis modeling as an approach for co-scheduling jobs in such environments. Key to the success of this approach is the ability "to predict whether jobs will create positive or negative symbiosis when co-scheduled." The approach uses per-thread cycle stacks, computed using a previously proposed cycle accounting architecture, to estimate single-threaded progress rates for the individual jobs in a job co-schedule. System software can in turn use these estimates to identify job co-schedules that are maximally positive with respect to shared resource usage. As the authors put it, a per-thread "cycle stack is an estimate for the single-threaded cycle stack had the thread been executed in isolation."
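The quoted definition suggests a simple way to turn a per-thread cycle stack into a progress estimate: if the stack's components sum to the cycles the thread would have needed in isolation, dividing that total by the cycles actually observed under SMT gives the job's normalized progress. The Python sketch below assumes that interpretation; the component names in the hypothetical `CycleStack` class are illustrative and are not the cycle accounting architecture's actual categories.

```python
from dataclasses import dataclass


@dataclass
class CycleStack:
    """Illustrative per-thread cycle stack: estimated cycles the thread would
    have spent in each component had it run alone (component names assumed)."""
    base: float
    branch: float
    cache: float
    memory: float

    def total(self) -> float:
        """Estimated single-threaded cycles for the measured instructions."""
        return self.base + self.branch + self.cache + self.memory


def normalized_progress(stack: CycleStack, smt_cycles: float) -> float:
    """Estimated isolated cycles divided by observed SMT cycles for the same
    instructions; 1.0 means co-scheduling caused no slowdown."""
    if smt_cycles <= 0:
        raise ValueError("smt_cycles must be positive")
    return stack.total() / smt_cycles


stack = CycleStack(base=400, branch=50, cache=150, memory=400)
print(normalized_progress(stack, smt_cycles=1600))  # prints 0.625
```

In this example the thread would have needed an estimated 1,000 cycles alone but took 1,600 cycles while sharing the core, so it progressed at 62.5% of its single-threaded rate.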
Evaluating the model involves computing the "model formulas for every possible co-schedule" during each time period. The authors define a measure called fairness, which "is the minimum ratio of proportional progress for any two jobs in the system, and equals zero if at least one program starves and equals one if all jobs make progress proportional to their relative shares." I see the potential for this to become computationally expensive if the estimations must be done for each possible combination of jobs at runtime; however, approximations can be used to trim the search space. Furthermore, the authors use two primary performance metrics for multi-program workloads: a system-oriented metric (system throughput) and a user-oriented metric (average normalized turnaround time). These are used to tune the fairness measure to the needs of the system under investigation. Using these metrics, the authors compare their approach with the sample, optimize, symbios (SOS) approach and report average reductions in job turnaround time of 16% and 19% for two- and four-thread simultaneous multithreading processors, respectively.
Online Computing Reviews Service
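The metrics discussed in the review can be written down directly. The following Python sketch takes each job's normalized progress (single-threaded time divided by multithreaded time) and computes system throughput (STP, the sum of normalized progress), average normalized turnaround time (ANTT, the mean of the reciprocals), and fairness as the quoted minimum ratio of proportional progress over any two jobs. Defining a job's proportional progress as its normalized progress divided by its fraction of the total shares is an assumption of this sketch, as are the function names.

```python
from collections.abc import Sequence


def stp(progress: Sequence[float]) -> float:
    """System throughput: sum of per-job normalized progress (higher is better)."""
    return sum(progress)


def antt(progress: Sequence[float]) -> float:
    """Average normalized turnaround time: mean slowdown (lower is better).
    Undefined if any job makes zero progress."""
    if any(p <= 0 for p in progress):
        raise ValueError("ANTT is undefined when a job makes no progress")
    return sum(1.0 / p for p in progress) / len(progress)


def fairness(progress: Sequence[float], shares: Sequence[float]) -> float:
    """Minimum ratio of proportional progress over all job pairs, which equals
    min/max: 0 if any job starves, 1 if progress matches relative shares."""
    total = sum(shares)
    # Assumed definition: normalized progress divided by the job's share fraction.
    proportional = [p / (s / total) for p, s in zip(progress, shares)]
    top = max(proportional)
    return min(proportional) / top if top > 0 else 0.0


progress = [0.5, 0.75]
print(stp(progress))               # prints 1.25
print(antt(progress))              # mean of 2.0 and 1.333...
print(fairness(progress, [1, 1]))  # 1.0 / 1.5 with equal shares
print(fairness(progress, [1, 2]))  # approximately 0.75: second job holds twice the share
```

Taking min/max is equivalent to minimizing the ratio over all ordered pairs of jobs, so a single pass suffices rather than a pairwise loop.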
