
An Efficient WCET-Aware Instruction Scheduling and Register Allocation Approach for Clustered VLIW Processors

Published: 27 September 2017

Abstract

In real-time embedded system design, a major goal is to construct a feasible schedule. Whether a feasible schedule exists depends on the Worst-Case Execution Time (WCET) of each task, so it is important to minimize the WCET of each task. We investigate the problem of instruction scheduling and register allocation for a program executed on a clustered Very Long Instruction Word (VLIW) processor such that the WCET of the program is minimized, and we propose a novel heuristic approach that unifies instruction scheduling and register allocation. Our approach is underpinned by a set of novel techniques: spanning-graph-based WCET-aware live range splitting, WCET-aware dynamic register pressure control, WCET-aware basic block prioritization for integrated instruction scheduling and register allocation, and WCET-aware spill code handling. We have implemented our approach in Trimaran 4.0 and compared it with the state-of-the-art approach on a set of 20 benchmarks. The experimental results show that our approach achieves a maximum WCET improvement of 29.61% and an average WCET improvement of 10.23%.
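The abstract does not spell out how the WCET is computed or how basic blocks are prioritized. As a minimal sketch, assume an acyclic control-flow graph (loops already bounded or unrolled) whose basic blocks carry fixed, positive worst-case cycle counts, with no cache or pipeline effects. Under that assumption the WCET is the longest entry-to-exit path, and one plausible WCET-aware prioritization handles blocks on that path first. The function names, the example CFG, and the priority rule below are hypothetical illustrations, not the authors' heuristic. The last function assumes that "WCET improvement" is measured relative to the baseline WCET.

```python
from typing import Dict, List, Tuple


def wcet_path(succ: Dict[str, List[str]], cost: Dict[str, int],
              entry: str) -> Tuple[int, List[str]]:
    """Longest (worst-case) entry-to-exit path in an acyclic CFG.

    Assumption: succ maps each block to its successors, the graph has no
    cycles, and cost holds a positive worst-case cycle count per block.
    """
    memo: Dict[str, Tuple[int, List[str]]] = {}

    def visit(block: str) -> Tuple[int, List[str]]:
        if block in memo:
            return memo[block]
        best: Tuple[int, List[str]] = (0, [])
        # Keep the successor whose remaining path is the most expensive.
        for s in succ.get(block, []):
            cand = visit(s)
            if cand[0] > best[0]:
                best = cand
        memo[block] = (cost[block] + best[0], [block] + best[1])
        return memo[block]

    return visit(entry)


def prioritize_blocks(succ: Dict[str, List[str]], cost: Dict[str, int],
                      entry: str) -> Tuple[int, List[str]]:
    """Hypothetical priority rule: blocks on the WCET path come first,
    and within each group more expensive blocks come first."""
    wcet, path = wcet_path(succ, cost, entry)
    on_path = set(path)
    order = sorted(cost, key=lambda b: (b not in on_path, -cost[b], b))
    return wcet, order


def wcet_improvement(baseline: int, optimized: int) -> float:
    """Relative WCET reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - optimized) / baseline


if __name__ == "__main__":
    # Diamond-shaped CFG: B0 branches to B1 or B2, both join at B3.
    succ = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
    cost = {"B0": 4, "B1": 10, "B2": 6, "B3": 3}
    print(wcet_path(succ, cost, "B0"))          # (17, ['B0', 'B1', 'B3'])
    print(prioritize_blocks(succ, cost, "B0"))  # (17, ['B1', 'B0', 'B3', 'B2'])
    print(wcet_improvement(200, 150))           # 25.0
```

Shortening a block that lies off the worst-case path (B2 here) leaves the WCET unchanged. This is the intuition behind spending scarce registers and issue slots on the worst-case path first. After each transformation the path must be recomputed, because the worst-case path can shift to a different branch.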
