Abstract
Growing processing demand on multitasking real-time systems can be met by employing scalable multicore architectures. For such environments, locking cache lines for hard real-time systems ensures timing predictability of data references and may lower worst-case execution time. This work studies the benefits of cache locking on massive multicore architectures with private caches in the context of hard real-time systems. In shared cache architectures, the cache is a single resource shared among all of the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores.
The objective of this work is to increase the predictability of memory accesses resolved by caches while reducing the number of cores required for a given task set. This allows designers to reduce the footprint of their subsystem of real-time tasks, and thereby cost, either by choosing a target product with fewer cores or by co-locating more subsystems on a given fixed number of cores.
Our work proposes a novel variant of the cache-unaware First Fit Decreasing (FFD) algorithm, called the Naive locked First Fit Decreasing (NFFD) policy. We also propose two cache-aware static scheduling schemes, (a) Greedy First Fit Decreasing (GFFD) and (b) Colored First Fit Decreasing (CoFFD), for task sets in which tasks do not have intratask conflicts among locked regions (Scenario A). NFFD is capable of scheduling high-utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD, resulting in a lower number of cores and lower system utilization. CoFFD reduces the number of cores required by 30% to 60% compared to NFFD.
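For orientation, the cache-unaware FFD baseline that these schemes build on can be sketched as classic bin packing over task utilizations. This is a minimal illustration rather than the authors' implementation: the per-core capacity of 1.0 (as under an EDF-style utilization bound) and the names `Core` and `first_fit_decreasing` are assumptions, and no cache locking or conflict handling is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    utilization: float = 0.0
    tasks: list = field(default_factory=list)


def first_fit_decreasing(utilizations, capacity=1.0):
    """Pack tasks onto cores, largest utilization first (cache-unaware).

    `capacity` is an assumed per-core utilization bound, not a value
    taken from the paper.
    """
    eps = 1e-12  # tolerance for floating-point sums
    cores = []
    # Visit task indices in order of decreasing utilization.
    order = sorted(range(len(utilizations)),
                   key=lambda i: utilizations[i], reverse=True)
    for i in order:
        u = utilizations[i]
        if u > capacity + eps:
            raise ValueError(f"task {i} cannot fit on any core")
        for core in cores:
            # First fit: take the first core with enough spare capacity.
            if core.utilization + u <= capacity + eps:
                core.utilization += u
                core.tasks.append(i)
                break
        else:
            # No existing core fits, so open a new one.
            cores.append(Core(u, [i]))
    return cores


cores = first_fit_decreasing([0.6, 0.5, 0.4, 0.3, 0.2])
```

With these five hypothetical utilizations, the sketch needs two cores. A cache-aware variant would additionally have to check locked-region conflicts before placing a task on a core.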
For the more general case where tasks have intratask conflicts, we split task partitioning into two phases: task selection and task allocation (Scenario B). Instead of resolving conflicts at a global level, these algorithms resolve conflicts among regions while allocating a task onto a core, and they unlock at the region level instead of the task level. We show that combining dynamic ordering (task selection) with Chaitin’s Coloring (task allocation) reduces the number of cores required by up to 22% over a basic scheme (a combination of monotone ordering and regional FFD). Regional unlocking allows this scheme to outperform CoFFD for medium-utilization task sets from Scenario A. However, CoFFD performs better than any other scheme for high-utilization task sets from Scenario A. Overall, this work is unique in considering the challenges of future multicore architectures for real-time systems and provides key insights into task partitioning and cache-locking mechanisms for architectures with private caches.
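The coloring step can be illustrated with a generic Chaitin-style simplify-and-spill pass over a conflict graph. This is a sketch under stated assumptions: nodes stand for locked regions, an edge means two regions conflict, `k` is a hypothetical number of available colors, and spilled nodes stand in for regions that would be unlocked. The abstract does not specify how the paper maps colors and spills onto cores and regions, and the function name `chaitin_color` is made up for this example.

```python
def chaitin_color(conflicts, k):
    """Chaitin-style coloring of a symmetric conflict graph.

    conflicts: dict mapping each node to the set of nodes it conflicts with
               (every node must appear as a key).
    k:         number of available colors (assumed parameter).
    Returns (coloring, spilled): a color per non-spilled node, and the
    list of nodes left uncolored.
    """
    graph = {n: set(nbrs) for n, nbrs in conflicts.items()}
    stack, spilled = [], []
    while graph:
        # Simplify: a node with fewer than k neighbours is always colorable.
        node = next((n for n in sorted(graph) if len(graph[n]) < k), None)
        if node is None:
            # Spill: no such node exists, so give up on a highest-degree node.
            node = max(sorted(graph), key=lambda n: len(graph[n]))
            spilled.append(node)
        else:
            stack.append(node)
        # Remove the node and its edges from the working graph.
        for nbr in graph.pop(node):
            graph[nbr].discard(node)
    coloring = {}
    while stack:
        # Select: pop in reverse removal order and take the lowest free color.
        node = stack.pop()
        used = {coloring[n] for n in conflicts[node] if n in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


# Hypothetical regions: A, B, C conflict pairwise; D conflicts only with A.
conflicts = {"A": {"B", "C", "D"}, "B": {"A", "C"},
             "C": {"A", "B"}, "D": {"A"}}
coloring, spilled = chaitin_color(conflicts, k=2)
```

In this toy graph the triangle A, B, C cannot be colored with two colors, so one region (A, which has the highest degree) is spilled and the remaining three are colored without conflicts.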
Static Task Partitioning for Locked Caches in Multicore Real-Time Systems