research-article

WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture

Published: 12 June 2012

Abstract

Worst-Case Execution Time (WCET) is one of the most important metrics in real-time embedded system design. For embedded systems with a clustered VLIW architecture, register allocation, instruction scheduling, and cluster assignment are three key code-optimization activities that have a profound impact on WCET. These three activities also exhibit a phase-ordering problem: performing register allocation, scheduling, and cluster assignment independently can degrade the results of the other phases and thereby produce suboptimal compiled code. This paper proposes a compiler-level optimization, WCET-aware Re-scheduling Register Allocation (WRRA), to minimize WCET for real-time embedded systems with a clustered VLIW architecture. The novelty of the approach is that it accounts for the effects of register allocation, instruction scheduling, and cluster assignment on the quality of the generated code when minimizing WCET. The three compilation processes are integrated into a single phase to obtain a balanced result. The technique is implemented in Trimaran 4.0. Experimental results show that it reduces WCET by 33% on average.
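To make the WCET objective concrete, the following is a minimal sketch, not the WRRA algorithm itself. It assumes a loop-free control-flow graph with a fixed cycle cost per basic block, and it computes the WCET bound as the longest path from the entry block. The graph, block names, costs, and the `wcet` helper are all hypothetical.

```python
from functools import lru_cache


def wcet(cfg, cost, entry):
    """Return the longest-path cost from `entry` through an acyclic CFG.

    cfg  -- dict mapping each block name to a list of successor blocks
    cost -- dict mapping each block name to its cycle count (assumed fixed)
    """
    @lru_cache(maxsize=None)
    def longest(block):
        succs = cfg.get(block, [])
        # Cost of this block plus the most expensive continuation, if any.
        return cost[block] + max((longest(s) for s in succs), default=0)

    return longest(entry)


# Hypothetical CFG: "entry" branches to "then" or "else"; both join at "exit".
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
cost = {"entry": 4, "then": 10, "else": 6, "exit": 2}

baseline = wcet(cfg, cost, "entry")                  # 4 + 10 + 2

# Saving 3 cycles in "else", which is off the worst-case path.
off_path = wcet(cfg, {**cost, "else": 3}, "entry")

# Saving the same 3 cycles in "then", which is on the worst-case path.
on_path = wcet(cfg, {**cost, "then": 7}, "entry")

print(baseline, off_path, on_path)  # prints "16 16 13"
```

In this toy model, only savings on the worst-case path lower the bound. This is one reason a WCET-aware optimization can make different choices from one tuned for average-case performance.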


    • Published in

      ACM SIGPLAN Notices, Volume 47, Issue 5
      LCTES '12
      May 2012
      152 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2345141

      • LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
        June 2012
        153 pages
        ISBN: 9781450312127
        DOI: 10.1145/2248418

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

