DOI: 10.1145/1542452.1542454
research-article

Modulo scheduling without overlapped lifetimes

Published: 19 June 2009

ABSTRACT

This paper describes complementary software- and hardware-based approaches for handling overlapping register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N instructions in a loop body and constructs an M-stage software pipeline. The length of each stage in the software pipeline is the Initiation Interval (II), which is the rate at which new loop iterations are started. An overlapped lifetime has a live range longer than the II, and as a consequence, the current iteration writes a new value to a register before a previous loop iteration has finished using the old value. Hardware and software solutions for dealing with overlapped lifetimes have been proposed by researchers and also implemented in commercial products. These solutions include rotating register files, register queues, modulo variable expansion, and post-scheduling live range splitting. Each of these approaches has drawbacks for embedded systems such as an increase in silicon area, power consumption, and code size.
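The definition above can be illustrated with a small sketch. It is not taken from the paper: the `Lifetime` class, the cycle numbers, and the helper names are hypothetical, and the copy count uses the commonly cited modulo variable expansion rule of ceil(lifetime / II) rather than anything specific to this work.

```python
import math
from dataclasses import dataclass


@dataclass
class Lifetime:
    """Hypothetical record of one register value within a single iteration's schedule."""
    name: str
    def_cycle: int       # cycle at which the value is written
    last_use_cycle: int  # cycle at which the value is last read

    @property
    def length(self) -> int:
        return self.last_use_cycle - self.def_cycle


def is_overlapped(lt: Lifetime, ii: int) -> bool:
    # Per the abstract: a live range longer than the II is overlapped,
    # because the next iteration rewrites the register II cycles later.
    return lt.length > ii


def copies_needed(lt: Lifetime, ii: int) -> int:
    # Assumed rule (modulo variable expansion): one register copy
    # for every II cycles the value must stay live.
    return max(1, math.ceil(lt.length / ii))


ii = 2
lifetimes = [Lifetime("a", 0, 1), Lifetime("b", 1, 6)]
for lt in lifetimes:
    print(lt.name, lt.length, is_overlapped(lt, ii), copies_needed(lt, ii))

# One simple (not necessarily minimal-code-size) kernel unroll factor
# is the largest copy count among all lifetimes.
unroll = max(copies_needed(lt, ii) for lt in lifetimes)
print("unroll factor:", unroll)
```

With an II of 2, value `b` stays live for 5 cycles, so two later iterations would overwrite it before its last use. Modulo variable expansion pays for this with extra registers and an unrolled kernel, which is the code-size cost the abstract mentions.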

Our approach, which is an improvement to the current solutions, prevents overlapped lifetimes through a combination of hardware and software techniques. The hardware element of our approach implements a register assignment latency that allows multiple in-flight writes to be pending to the same register. The software element of our approach uses dependence analysis and a constrained modulo scheduling algorithm to prevent overlapped lifetimes. We describe how to use these hardware and software techniques during modulo scheduling. Finally, we report the results of using our approach to compile embedded application code, in terms of modulo schedule quality and application performance.
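The idea of a register assignment latency can be sketched with a toy cycle-level model. This is an illustration under stated assumptions, not the paper's hardware design. The function `simulate_reads` and its parameters are hypothetical. The model assumes that a write issued at cycle t updates the register at the start of cycle t + latency, and that a read in the same cycle sees the new value.

```python
def simulate_reads(ii, issue_cycle, write_latency, read_cycle, iterations):
    """Toy model: iteration i issues a write of value i at i*ii + issue_cycle,
    the write lands write_latency cycles later, and iteration i reads the
    register at i*ii + read_cycle. Returns (iteration, value_seen) pairs."""
    landings = {i * ii + issue_cycle + write_latency: i for i in range(iterations)}
    reads = {i * ii + read_cycle: i for i in range(iterations)}
    register = None
    observed = []
    for cycle in range(max(max(landings), max(reads)) + 1):
        if cycle in landings:   # assumed: writes land at the start of the cycle
            register = landings[cycle]
        if cycle in reads:      # assumed: a same-cycle read sees the new value
            observed.append((reads[cycle], register))
    return observed


def all_reads_correct(observed):
    # Each iteration should read the value it produced itself.
    return all(iteration == value for iteration, value in observed)


# II = 2, value produced at cycle 0 and consumed at cycle 5 of each iteration.
immediate = simulate_reads(ii=2, issue_cycle=0, write_latency=0, read_cycle=5, iterations=4)
delayed = simulate_reads(ii=2, issue_cycle=0, write_latency=4, read_cycle=5, iterations=4)
print(all_reads_correct(immediate))  # False: later iterations clobber the value
print(all_reads_correct(delayed))    # True: the register holds each value for 1 cycle
```

In the delayed case two writes to the same register are in flight at once, yet the register itself holds each value for only one cycle, which is shorter than the II. In this model the scheduler's job is to keep that visible portion of every lifetime within the II.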

Published in

LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
June 2009
188 pages
ISBN: 9781605583563
DOI: 10.1145/1542452

ACM SIGPLAN Notices, Volume 44, Issue 7
LCTES '09
July 2009
176 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1543136

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


            Acceptance Rates

LCTES '09 Paper Acceptance Rate: 18 of 81 submissions, 22%
Overall Acceptance Rate: 116 of 438 submissions, 26%
