Abstract
In real-time embedded system design, a major goal is to construct a feasible schedule, and whether one exists depends on the Worst-Case Execution Time (WCET) of each task. Consequently, it is important to minimize the WCET of each task. We investigate the problem of instruction scheduling and register allocation for a program executed on a clustered Very Long Instruction Word (VLIW) processor such that the WCET of the program is minimized, and we propose a novel heuristic approach that unifies instruction scheduling and register allocation. Our heuristic is underpinned by a set of novel techniques: spanning-graph-based WCET-aware live range splitting, WCET-aware dynamic register pressure control, WCET-aware basic block prioritization for integrated instruction scheduling and register allocation, and WCET-aware spill code handling. We have implemented our approach in Trimaran 4.0 and compared it with the state-of-the-art approach on a set of 20 benchmarks. The experimental results show that our approach achieves a maximum WCET improvement of 29.61% and an average WCET improvement of 10.23%.
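To make the idea of WCET-aware basic block prioritization more concrete, here is a minimal sketch. It assumes an acyclic control-flow graph (loops already bounded and unrolled) in which each basic block has a fixed worst-case cycle estimate. The sketch takes the WCET to be the longest entry-to-exit path and ranks blocks by the length of the longest path that passes through them, so blocks on the worst-case path come first. The function `wcet_priorities` and its inputs are hypothetical. This is a generic longest-path heuristic, not the prioritization scheme used in the paper. Real WCET analyzers usually handle loops with techniques such as implicit path enumeration rather than plain longest-path search.

```python
from graphlib import TopologicalSorter


def wcet_priorities(costs, succs, entry):
    """Hypothetical sketch: return (wcet, blocks ranked by worst-case path through them).

    costs: dict mapping block name -> estimated worst-case cycles of that block
    succs: dict mapping block name -> list of successor blocks (CFG must be acyclic)
    entry: name of the entry block
    """
    # TopologicalSorter expects a mapping node -> predecessors.
    preds = {b: [] for b in costs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    order = list(TopologicalSorter(preds).static_order())

    # to_block[b]: longest path cost from a root up to and including b.
    to_block = {}
    for b in order:
        to_block[b] = costs[b] + max((to_block[p] for p in preds[b]), default=0)

    # from_block[b]: longest path cost from b (inclusive) to an exit block.
    from_block = {}
    for b in reversed(order):
        from_block[b] = costs[b] + max(
            (from_block[s] for s in succs.get(b, [])), default=0
        )

    # Longest path passing through b (b's own cost counted once).
    through = {b: to_block[b] + from_block[b] - costs[b] for b in costs}
    wcet = from_block[entry]
    # Blocks on longer worst-case paths get higher priority; ties broken by name.
    ranked = sorted(costs, key=lambda b: (-through[b], b))
    return wcet, ranked


if __name__ == "__main__":
    # Diamond CFG: A branches to B (expensive) or C (cheap); both join at D.
    costs = {"A": 2, "B": 10, "C": 3, "D": 1}
    succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
    print(wcet_priorities(costs, succs, "A"))  # (13, ['A', 'B', 'D', 'C'])
```

In the diamond example, the path A, B, D determines the WCET of 13 cycles, so those blocks rank ahead of C, which lies only on the shorter path. In an integrated scheduler, block costs change as blocks are scheduled and registers are allocated, so a ranking like this would presumably need to be recomputed as the costs are updated.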
An Efficient WCET-Aware Instruction Scheduling and Register Allocation Approach for Clustered VLIW Processors