Abstract
Driven by hardware and workload consolidation in embedded systems and by the rapid development of edge computing, there is increasing interest in supporting parallel real-time tasks that better utilize multi-core platforms while meeting stringent real-time constraints. For such tasks, the federated scheduling paradigm, which assigns each parallel task a set of dedicated cores, achieves good theoretical bounds because exclusive use of processing resources reduces interference. In practice, however, cores share the last-level cache and memory bandwidth, so tasks may still interfere with each other despite executing on dedicated cores. This interference from concurrent accesses can be even more severe on embedded platforms or edge servers, where computing power and cache/memory space are limited. To tackle this issue, we present a holistic resource allocation framework for parallel real-time tasks under federated scheduling. Under the proposed framework, each parallel task is assigned dedicated cache and memory bandwidth resources in addition to dedicated cores. Further, we propose a holistic resource allocation algorithm that balances the allocation across the different resources to achieve good schedulability. Additionally, we provide a full implementation of the framework by extending the federated scheduling system with Intel’s Cache Allocation Technology and MemGuard. Finally, we demonstrate the practicality of the proposed framework via extensive numerical evaluations and empirical experiments using real benchmark programs.
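To make the idea of allocating cores, cache, and memory bandwidth together more concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each task comes with a hypothetical `profile` function that maps a number of cache ways and bandwidth units to a (work, span) pair, and it uses the standard federated scheduling core count, ceil((C - L) / (D - L)), for a task with work C, span L, and deadline D. The greedy loop, the names `Task`, `demand`, and `allocate`, and the one-unit-at-a-time policy are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

INFEASIBLE = 10**9  # sentinel core demand when the span alone reaches the deadline


@dataclass
class Task:
    name: str
    deadline: float
    # Hypothetical profile: (cache_ways, bw_units) -> (work, span)
    profile: Callable[[int, int], Tuple[float, float]]


def cores_needed(work: float, span: float, deadline: float) -> int:
    """Federated-scheduling core count: ceil((C - L) / (D - L)), at least 1."""
    if span >= deadline:
        return INFEASIBLE
    return max(1, math.ceil((work - span) / (deadline - span)))


def demand(task: Task, cache: int, bw: int) -> int:
    """Cores the task needs when given `cache` ways and `bw` bandwidth units."""
    work, span = task.profile(cache, bw)
    return cores_needed(work, span, task.deadline)


def allocate(
    tasks: List[Task], m_cores: int, cache_ways: int, bw_units: int
) -> Optional[Dict[str, Tuple[int, int, int]]]:
    """Greedy sketch: return {task: (cores, cache, bw)} or None if it gives up."""
    alloc = {t.name: [1, 1] for t in tasks}  # every task starts with one unit each
    spare = [cache_ways - len(tasks), bw_units - len(tasks)]
    if min(spare) < 0:
        return None
    while True:
        need = {t.name: demand(t, *alloc[t.name]) for t in tasks}
        if sum(need.values()) <= m_cores:
            return {n: (need[n], c, b) for n, (c, b) in alloc.items()}
        # Hand one spare unit to the (task, resource) pair that saves most cores.
        best_gain, best_move = 0, None
        for t in tasks:
            for r in (0, 1):  # 0 = cache way, 1 = bandwidth unit
                if spare[r] == 0:
                    continue
                trial = list(alloc[t.name])
                trial[r] += 1
                gain = need[t.name] - demand(t, *trial)
                if gain > best_gain:
                    best_gain, best_move = gain, (t.name, r)
        if best_move is None:
            return None  # no single extra unit reduces the core demand
        name, r = best_move
        alloc[name][r] += 1
        spare[r] -= 1


if __name__ == "__main__":
    # Made-up profiles: task A benefits from cache and bandwidth, task B only
    # from bandwidth. The numbers are illustrative, not measurements.
    task_a = Task("A", 50.0, lambda c, b: (120.0 / min(c, 3) + 60.0 / min(b, 2), 10.0))
    task_b = Task("B", 40.0, lambda c, b: (40.0 + 40.0 / min(b, 2), 20.0))
    print(allocate([task_a, task_b], m_cores=4, cache_ways=4, bw_units=4))
```

The sketch trades cache and bandwidth units for cores one step at a time, which illustrates why the three resources have to be considered jointly: giving a task more cache or bandwidth shrinks its work and therefore the number of dedicated cores it needs. Because the search is purely greedy, it can give up on task sets that a more careful search would still schedule.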
Holistic Resource Allocation Under Federated Scheduling for Parallel Real-time Tasks