Abstract
Worst-Case Execution Time (WCET) is one of the most important metrics in real-time embedded system design. For embedded systems with a clustered VLIW architecture, register allocation, instruction scheduling, and cluster assignment are three key code optimization activities, and each has a profound impact on WCET. These three activities also exhibit a phase-ordering problem: performing register allocation, instruction scheduling, and cluster assignment independently can degrade the results of the other phases and thereby produce sub-optimal compiled code. In this paper, a compiler-level optimization, WCET-aware Re-scheduling Register Allocation (WRRA), is proposed to minimize WCET for real-time embedded systems with a clustered VLIW architecture. The novelty of the proposed approach is that it accounts for the effects of register allocation, instruction scheduling, and cluster assignment on the quality of the generated code when minimizing WCET. These three compilation processes are integrated into a single phase to obtain a balanced result. The proposed technique is implemented in Trimaran 4.0. The experimental results show that the proposed technique reduces WCET effectively, by 33% on average.
WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture
LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems





