ABSTRACT
Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have "compatible" demands on the processor's shared resources. Existing approaches, however, require a sampling phase, evaluate a limited number of possible co-schedules, use heuristics to gauge symbiosis, are rigid in their optimization target, and do not preserve system-level priorities/shares.
This paper proposes probabilistic job symbiosis modeling, which predicts whether jobs will create positive or negative symbiosis when co-scheduled without requiring the co-schedule to be evaluated. The model, which uses per-thread cycle stacks computed through a previously proposed cycle accounting architecture, is simple enough to be used in system software. Probabilistic job symbiosis modeling provides six key innovations over prior work in symbiotic job scheduling: (i) it does not require a sampling phase, (ii) it readjusts the job co-schedule continuously, (iii) it evaluates a large number of possible co-schedules at very low overhead, (iv) it is not driven by heuristics, (v) it can optimize a performance target of interest (e.g., system throughput or job turnaround time), and (vi) it preserves system-level priorities/shares. These innovations make symbiotic job scheduling both practical and effective.
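To illustrate why a cycle-stack-based predictor removes the need for sampling, here is a toy sketch. It is not the paper's probabilistic model. The two-component cycle stack (base vs. stall), the assumption that co-runners are independently in their base state and split dispatch slots equally, the assumption that stall cycles are unaffected by co-runners, and all names (`CycleStack`, `predicted_smt_cpi`, `predicted_stp`, `best_coschedule`) are hypothetical simplifications. The point it shows is that scoring a candidate co-schedule costs only a few arithmetic operations on per-thread cycle stacks, so every subset of jobs can be evaluated without ever running it.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CycleStack:
    """Hypothetical two-component single-threaded cycle stack, in cycles per instruction."""
    name: str
    base: float   # cycles in which the job dispatches useful work
    stall: float  # cycles lost to miss events (cache misses, branch mispredictions, ...)

    @property
    def cpi(self) -> float:
        return self.base + self.stall

    @property
    def p_dispatch(self) -> float:
        # Probability that a randomly chosen cycle is a base (dispatching) cycle.
        return self.base / self.cpi


def predicted_smt_cpi(job: CycleStack, others: tuple) -> float:
    # Assumption: each co-runner is independently in its base state with
    # probability p_dispatch, and threads dispatching in the same cycle split
    # the dispatch slots equally. The base component is stretched by the
    # expected number of sharers; stall cycles are assumed unaffected.
    expected_sharers = 1.0 + sum(o.p_dispatch for o in others)
    return job.base * expected_sharers + job.stall


def predicted_stp(coschedule: tuple) -> float:
    # System throughput: sum of each job's progress relative to running alone.
    total = 0.0
    for i, job in enumerate(coschedule):
        others = coschedule[:i] + coschedule[i + 1:]
        total += job.cpi / predicted_smt_cpi(job, others)
    return total


def best_coschedule(jobs, contexts: int) -> tuple:
    # Score every subset of `contexts` jobs with the model; no sampling runs needed.
    return max(combinations(jobs, contexts), key=predicted_stp)


# Made-up cycle stacks: A and C are compute-bound, B is memory-bound.
jobs = [CycleStack("A", 0.8, 0.2), CycleStack("B", 0.3, 1.7), CycleStack("C", 0.9, 0.1)]
best = best_coschedule(jobs, contexts=2)
print([j.name for j in best], round(predicted_stp(best), 3))  # ['A', 'B'] 1.786
```

In this toy setting the model pairs a compute-bound job with a memory-bound one, since their dispatch demands rarely overlap. Swapping `predicted_stp` for a turnaround-oriented score would change the optimization target without changing the search, which mirrors innovation (v) only in spirit.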
Our experimental evaluation, which assumes a realistic scenario in which jobs come and go, reports an average 16% (and up to 35%) reduction in job turnaround time compared to the previously proposed SOS (sample, optimize, symbios) approach for a two-thread SMT processor, and an average 19% (and up to 45%) reduction in job turnaround time for a four-thread SMT processor.
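The percentages above are reductions relative to SOS. The sketch below shows one plausible way such numbers are computed. It assumes a per-workload relative reduction, 1 - T_new / T_baseline, averaged across workloads; the exact averaging used in the evaluation is an assumption here. It also includes average normalized turnaround time (ANTT), a common turnaround-oriented metric defined by Eyerman and Eeckhout (IEEE Micro, 2008) as the mean of each job's CPI when co-scheduled divided by its CPI when running alone. All numbers are made up and are not results from the paper.

```python
from statistics import mean


def antt(cpi_alone, cpi_shared):
    """Average normalized turnaround time: mean per-job slowdown under co-scheduling (lower is better)."""
    return mean(shared / alone for alone, shared in zip(cpi_alone, cpi_shared))


def relative_reduction(baseline, improved):
    """Fractional reduction of a lower-is-better metric, e.g. 0.16 means 16%."""
    return 1.0 - improved / baseline


# Made-up turnaround times for three workloads: baseline scheduler vs. model-driven scheduler.
baseline = [100.0, 80.0, 120.0]
model_driven = [84.0, 72.0, 90.0]
per_workload = [relative_reduction(b, m) for b, m in zip(baseline, model_driven)]
print(f"average {mean(per_workload):.0%}, best {max(per_workload):.0%}")  # prints: average 17%, best 25%
```

Reporting both the average and the maximum, as the abstract does, shows the typical gain as well as the best-case gain across workloads.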
Probabilistic job symbiosis modeling for SMT processor scheduling (ASPLOS '10)