Abstract
Modern embedded systems such as cars need high-power integrated CPUs–GPU SoCs for real-time applications such as lane and pedestrian detection. As a result, they face greater thermal problems than before, which may in turn incur higher failure rates and cooling costs. We demonstrate, via experimentation on a representative CPUs–GPU platform, the importance of accounting for two distinct thermal characteristics in real-time scheduling: the platform's temperature imbalance and the different power dissipations of different tasks. Accounting for both makes it possible to avoid bursts of power dissipation while guaranteeing all timing constraints. To achieve this goal, we propose a new Real-Time Thermal-Aware Scheduling (RT-TAS) framework. We first capture the differences in CPU core temperatures caused by different GPU power dissipations (i.e., CPUs–GPU thermal coupling) with core-specific thermal coupling coefficients. We then develop thermally-balanced task-to-core assignment and CPUs–GPU co-scheduling. The former addresses the platform's temperature imbalance by efficiently distributing the thermal load across cores while preserving scheduling feasibility. Building on the thermally-balanced task assignment, the latter cooperatively schedules CPU and GPU computations to avoid simultaneous peak power dissipations on both the CPUs and the GPU, thus mitigating excessive temperature rises while meeting task deadlines. We have implemented and evaluated RT-TAS on an automotive embedded platform, where it reduces the maximum temperature by 6 to 12.2°C compared with existing approaches without violating any task deadline.
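To make the idea of thermally-balanced task-to-core assignment more concrete, the following is a minimal sketch, not the RT-TAS algorithm itself. It assumes a simple steady-state model in which a core's temperature rise is its own average CPU power times a thermal resistance, plus a core-specific coupling coefficient times the GPU power. The names `Task`, `Core`, `assign`, `r_core`, and `util_bound`, the greedy placement order, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    utilization: float  # execution time / period
    power: float        # average power while running, in watts (assumed known)


@dataclass
class Core:
    name: str
    gpu_coupling: float  # assumed: temperature rise in degrees C per watt of GPU power
    tasks: list = field(default_factory=list)

    def utilization(self) -> float:
        return sum(t.utilization for t in self.tasks)

    def temp_rise(self, gpu_power: float, r_core: float) -> float:
        # Own heating (average CPU power times thermal resistance)
        # plus the heat coupled in from the GPU.
        own = r_core * sum(t.utilization * t.power for t in self.tasks)
        return own + self.gpu_coupling * gpu_power


def assign(tasks, cores, gpu_power, r_core=2.0, util_bound=1.0):
    """Greedy sketch: place the hottest tasks first, each on the
    feasible core whose estimated temperature ends up lowest."""
    by_heat = sorted(tasks, key=lambda t: t.utilization * t.power, reverse=True)
    for task in by_heat:
        # Feasibility check: a per-core utilization bound (hypothetical stand-in
        # for a real schedulability test).
        feasible = [
            c for c in cores
            if c.utilization() + task.utilization <= util_bound + 1e-9
        ]
        if not feasible:
            raise ValueError(f"no feasible core for task {task.name}")
        added = r_core * task.utilization * task.power
        best = min(feasible, key=lambda c: c.temp_rise(gpu_power, r_core) + added)
        best.tasks.append(task)
    return cores
```

With two cores where `cpu0` is assumed to be more strongly coupled to the GPU than `cpu1`, this sketch tends to push CPU-heavy tasks toward `cpu1`, so that the estimated temperatures of the two cores end up close to each other while every core stays within its utilization bound. A real implementation would replace the utilization bound with the schedulability analysis appropriate to the scheduler in use.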
Index Terms
Thermal-Aware Scheduling for Integrated CPUs–GPU Platforms