ABSTRACT
This paper describes complementary software- and hardware-based approaches for handling the overlapped register lifetimes that occur in modulo scheduled loops. Modulo scheduling takes the N instructions in a loop body and constructs an M-stage software pipeline. The length of each stage in the software pipeline is the Initiation Interval (II), the number of cycles between the starts of successive loop iterations. An overlapped lifetime is one whose live range is longer than the II. As a consequence, the current iteration writes a new value to a register before a previous iteration has finished using the old value. Researchers have proposed hardware and software solutions for dealing with overlapped lifetimes, and some of these solutions have been implemented in commercial products. They include rotating register files, register queues, modulo variable expansion, and post-scheduling live range splitting. Each of these approaches has drawbacks for embedded systems, such as increased silicon area, power consumption, or code size.
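The definitions above can be made concrete with a small sketch. It assumes that a value's lifetime is measured in cycles from the cycle its register is written to the cycle of its last read, both taken from one iteration's flat schedule. Under that assumption a lifetime is overlapped when it exceeds II, because iteration i+1 writes the same register II cycles after iteration i does. The names `Value`, `stage_count`, `overlapped`, and `mve_unroll_factor` are hypothetical. The unroll factor uses the usual modulo variable expansion bound (the largest ceil(lifetime / II) over all values). It is not taken from this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Value:
    """One register value produced and consumed within a single loop iteration."""
    name: str
    def_cycle: int       # cycle in the flat (one-iteration) schedule when the register is written
    last_use_cycle: int  # cycle of the last instruction that reads it


def stage_count(schedule_length: int, ii: int) -> int:
    """Number of pipeline stages M: the flat schedule is cut into II-cycle stages."""
    return math.ceil(schedule_length / ii)


def lifetime(v: Value) -> int:
    return v.last_use_cycle - v.def_cycle


def overlapped(values: list[Value], ii: int) -> list[str]:
    """Values whose lifetime exceeds II. Iteration i+1 writes the same register
    II cycles after iteration i, which is before iteration i's last read."""
    return [v.name for v in values if lifetime(v) > ii]


def mve_unroll_factor(values: list[Value], ii: int) -> int:
    """Kernel unroll factor modulo variable expansion would need: a value with
    lifetime L needs ceil(L / II) register copies, and the kernel is unrolled
    by the largest such count (at least 1)."""
    return max([1] + [math.ceil(lifetime(v) / ii) for v in values])


if __name__ == "__main__":
    ii = 2
    vals = [Value("a", 0, 5), Value("b", 3, 4), Value("c", 2, 4)]
    print(stage_count(7, ii))           # 4
    print(overlapped(vals, ii))         # ['a']
    print(mve_unroll_factor(vals, ii))  # 3
```

Here only `a` (lifetime 5, II 2) is overlapped. Removing it with modulo variable expansion would triple the kernel, which illustrates the code-size cost mentioned above.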
Our approach improves on these solutions by preventing overlapped lifetimes through a combination of hardware and software techniques. The hardware element of our approach implements a register assignment latency that allows multiple in-flight writes to the same register to be pending at once. The software element uses dependence analysis and a constrained modulo scheduling algorithm to prevent overlapped lifetimes. We describe how to use these hardware and software techniques during modulo scheduling. Finally, we present the results of using our approach to compile embedded application code, measured in terms of both modulo schedule quality and application performance.
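One way to picture the hardware element is a register whose writes take effect a fixed number of cycles after they issue. Under this model, the old value stays readable while newer writes to the same register are still in flight. The sketch below is a hypothetical model, not the paper's hardware. `DelayedRegister`, `run_pipelined_loads`, and the rule that a write issued at cycle t with latency L is visible to reads at cycle t + L or later are all assumptions made for illustration. With II = 1 and a 5-cycle gap between each iteration's producer and consumer, several iterations write the same register before earlier consumers have read it.

```python
import heapq


class DelayedRegister:
    """Hypothetical model of a register whose writes become visible a fixed
    number of cycles after issue, so several writes can be in flight at once."""

    def __init__(self, initial=None):
        self.value = initial
        self.pending = []  # min-heap of (land_cycle, issue_order, value)
        self.order = 0

    def issue_write(self, cycle: int, latency: int, value) -> None:
        heapq.heappush(self.pending, (cycle + latency, self.order, value))
        self.order += 1

    def read(self, cycle: int):
        # Retire, in landing order, every write visible by this cycle;
        # the old value stays readable until a newer write lands.
        while self.pending and self.pending[0][0] <= cycle:
            _, _, self.value = heapq.heappop(self.pending)
        return self.value


def run_pipelined_loads(n_iters: int, ii: int, consumer_delay: int, write_latency: int) -> list:
    """Iteration i issues a producer at cycle i*II that writes i*10 to one shared
    register, and a consumer reads that register consumer_delay cycles later."""
    reg = DelayedRegister()
    seen = []
    last_cycle = (n_iters - 1) * ii + consumer_delay
    for cycle in range(last_cycle + 1):
        if cycle % ii == 0 and cycle // ii < n_iters:
            reg.issue_write(cycle, write_latency, (cycle // ii) * 10)
        start = cycle - consumer_delay  # issue cycle of the iteration whose consumer runs now
        if start >= 0 and start % ii == 0 and start // ii < n_iters:
            seen.append(reg.read(cycle))
    return seen


if __name__ == "__main__":
    # Write latency matches the producer-to-consumer gap: each consumer sees its own value.
    print(run_pipelined_loads(4, 1, 5, 5))  # [0, 10, 20, 30]
    # Writes visible after 1 cycle: later iterations clobber earlier values before they are read.
    print(run_pipelined_loads(4, 1, 5, 1))  # [30, 30, 30, 30]
```

In this model, a single register serves every iteration without rotating registers or kernel unrolling, provided each consumer reads before the next iteration's write lands. How the paper's scheduler enforces such a constraint is not reproduced here.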
Modulo scheduling without overlapped lifetimes (LCTES '09)