ABSTRACT
Multicores are becoming ubiquitous, not only in general-purpose but also in embedded computing. This trend is a reflection of contemporary embedded applications placing steadily increasing demands on processing power. On such platforms, predicting timing behavior to ensure that deadlines of real-time tasks can be met is becoming increasingly difficult. While real-time multicore scheduling approaches help to assure deadlines based on firm theoretical properties, their reliance on task migration poses a significant challenge to timing predictability in practice. Task migration (a) reduces timing predictability for contemporary multicores due to cache warm-up overheads and (b) increases traffic on the network-on-chip (NoC) interconnect.
This paper puts forth a fundamentally new approach to increasing the timing predictability of multicore architectures under task migration in embedded environments. A task migration between two cores imposes cache warm-up overheads on the migration target, which can lead to missed deadlines for tight real-time schedules. We propose novel micro-architectural support to migrate cache lines. Our scheme shows dramatically increased predictability in the presence of cross-core migration.
Experimental results for schedules demonstrate that our scheme enables real-time tasks to meet their deadlines in the presence of task migration. Our results illustrate that increases in execution time due to migration are reduced by our scheme to levels that may prevent deadline misses of real-time tasks that would otherwise occur. Our mechanism imposes an overhead that is a fraction of the task's execution time, yet this overhead can be steered to fill idle slots in the schedule, i.e., it does not contribute to the execution time of the migrated task. Overall, our novel migration scheme provides a unique mechanism capable of significantly increasing timing predictability in the wake of task migration.
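The trade-off described above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's mechanism or its evaluation: the class `MigrationModel`, its fields, and every number below are hypothetical. It assumes each working-set line misses exactly once on a cold target cache, and that pushing a line costs a fixed number of cycles that can overlap with idle time in the schedule.

```python
from dataclasses import dataclass


@dataclass
class MigrationModel:
    """Hypothetical cost model for one task migration (all values in cycles)."""
    base_cycles: int         # execution time with a warm cache
    working_set_lines: int   # cache lines the task reuses after migrating
    miss_penalty: int        # cost of one cold miss on the target core
    push_cost_per_line: int  # cost of pushing one line over the interconnect

    def cold_migration_time(self) -> int:
        # Without support, every working-set line misses once on the target.
        return self.base_cycles + self.working_set_lines * self.miss_penalty

    def push_overhead(self) -> int:
        # Total cost of moving the working set ahead of the task.
        return self.working_set_lines * self.push_cost_per_line

    def pushed_migration_time(self, idle_cycles: int) -> int:
        # Only the part of the push that does not fit into the idle slot
        # delays the migrated task.
        spill = max(0, self.push_overhead() - idle_cycles)
        return self.base_cycles + spill


def meets_deadline(exec_cycles: int, deadline_cycles: int) -> bool:
    return exec_cycles <= deadline_cycles


# Illustrative numbers only; they are not taken from the paper.
model = MigrationModel(base_cycles=10_000, working_set_lines=200,
                       miss_penalty=100, push_cost_per_line=20)
deadline = 15_000
cold = model.cold_migration_time()
pushed_hidden = model.pushed_migration_time(idle_cycles=5_000)
pushed_partial = model.pushed_migration_time(idle_cycles=1_000)
```

In this made-up setting the cold migration exceeds the deadline, while the pushed variant meets it whether the push is fully or only partly hidden in idle time. A real analysis would need measured miss penalties and interconnect costs rather than constants.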
Push-assisted migration of real-time tasks in multi-core processors
LCTES '09