Static Task Partitioning for Locked Caches in Multicore Real-Time Systems

Published: 21 January 2015

Abstract

Growing processing demand on multitasking real-time systems can be met by employing scalable multicore architectures. For such environments, locking cache lines for hard real-time systems ensures timing predictability of data references and may lower worst-case execution time. This work studies the benefits of cache locking on massive multicore architectures with private caches in the context of hard real-time systems. In shared cache architectures, the cache is a single resource shared among all of the tasks. However, in scalable cache architectures with private caches, conflicts exist only among the tasks scheduled on one core. This calls for a cache-aware allocation of tasks onto cores.

The objective of this work is to increase the predictability of memory accesses resolved by caches while reducing the number of cores for a given task set. This allows designers to reduce the footprint of their subsystem of real-time tasks, and thereby cost, either by choosing a product with fewer cores as a target or by co-locating more subsystems on a given fixed number of cores.

Our work proposes a novel variant of the cache-unaware First Fit Decreasing (FFD) algorithm, called the Naive locked First Fit Decreasing (NFFD) policy. We also propose two cache-aware static scheduling schemes, (a) Greedy First Fit Decreasing (GFFD) and (b) Colored First Fit Decreasing (CoFFD), for task sets where tasks do not have intratask conflicts among locked regions (Scenario A). NFFD is capable of scheduling high utilization task sets that FFD cannot schedule. Experiments also show that CoFFD consistently outperforms GFFD, resulting in a lower number of cores and lower system utilization. CoFFD reduces the number of cores required by 30% to 60% compared to NFFD.
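
For orientation, the cache-unaware FFD baseline named above can be sketched as classic bin packing over task utilizations. This is a minimal illustration, not the NFFD, GFFD, or CoFFD policies themselves, whose details the abstract does not give. The function name `first_fit_decreasing`, the per-core capacity of 1.0 (a utilization bound assumed here as the schedulability test), and the sample utilizations are all assumptions made for the example.

```python
def first_fit_decreasing(utilizations, capacity=1.0):
    """Pack tasks onto cores by utilization (hypothetical helper).

    Returns a list of cores, each a list of task indices.
    The capacity of 1.0 per core is an assumed schedulability bound.
    """
    # Consider tasks in order of decreasing utilization.
    order = sorted(range(len(utilizations)),
                   key=lambda i: utilizations[i], reverse=True)
    cores = []  # task indices assigned to each core
    loads = []  # summed utilization of each core
    for i in order:
        u = utilizations[i]
        if u > capacity:
            raise ValueError(f"task {i} does not fit on any core")
        for c in range(len(cores)):
            # First fit: take the first core with enough spare capacity.
            # The small tolerance absorbs floating-point rounding.
            if loads[c] + u <= capacity + 1e-9:
                cores[c].append(i)
                loads[c] += u
                break
        else:
            # No existing core fits, so open a new one.
            cores.append([i])
            loads.append(u)
    return cores


if __name__ == "__main__":
    print(first_fit_decreasing([0.6, 0.5, 0.4, 0.3, 0.2]))
```

A cache-aware variant would add a second condition to the fit test, for example rejecting a core whose already-assigned tasks lock cache regions that overlap with those of the new task.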

For a more generic case where tasks have intratask conflicts, we split the task partitioning into two phases: task selection and task allocation (Scenario B). Instead of resolving conflicts at a global level, these algorithms resolve conflicts among regions while allocating a task onto a core, and they unlock at the region level instead of the task level. We show that a combination of dynamic ordering (task selection) with Chaitin’s Coloring (task allocation) reduces the number of cores required by up to 22% over a basic scheme (a combination of monotone ordering and regional FFD). Regional unlocking allows this scheme to outperform CoFFD for medium utilization task sets from Scenario A. However, CoFFD performs better than any other scheme for high utilization task sets from Scenario A. Overall, this work is unique in considering the challenges of future multicore architectures for real-time systems and provides key insights into task partitioning and cache-locking mechanisms for architectures with private caches.
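
To make the coloring idea concrete, the following is a rough sketch of Chaitin-style graph coloring (simplify, spill, select) on a conflict graph. The mapping used here is an assumption, since the abstract does not specify it: nodes stand for tasks or locked regions, an edge marks a cache conflict, and the `k` colors stand for the available placements. The function name `chaitin_color` and the spill heuristic (highest remaining degree) are also illustrative choices; how the paper treats a spilled node, for instance by unlocking a region, is not stated in the abstract.

```python
def chaitin_color(graph, k):
    """Chaitin-style coloring sketch (hypothetical helper).

    graph: dict mapping each node to the set of its neighbors (symmetric).
    Returns (coloring, spilled): a dict node -> color in range(k), and the
    list of nodes that could not be guaranteed a color.
    """
    remaining = {n: set(nb) for n, nb in graph.items()}
    stack = []    # nodes guaranteed to be colorable, in removal order
    spilled = []  # nodes removed without a color
    while remaining:
        # Simplify: a node with fewer than k neighbors can always be colored.
        node = next((n for n in sorted(remaining)
                     if len(remaining[n]) < k), None)
        if node is None:
            # Spill: no such node exists; drop the highest-degree node.
            node = max(sorted(remaining), key=lambda n: len(remaining[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for nb in remaining.pop(node):
            remaining[nb].discard(node)
    coloring = {}
    # Select: color nodes in reverse removal order.
    while stack:
        node = stack.pop()
        used = {coloring[nb] for nb in graph[node] if nb in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


if __name__ == "__main__":
    # Triangle a-b-c plus a pendant node d attached to a.
    g = {"a": {"b", "c", "d"}, "b": {"a", "c"},
         "c": {"a", "b"}, "d": {"a"}}
    print(chaitin_color(g, 2))
    print(chaitin_color(g, 3))
```

With two colors the triangle cannot be fully colored, so one node is spilled; with three colors every node receives a color and no two adjacent nodes share one.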

