Abstract
Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining, so that the register allocator never has to insert spill instructions into the software pipelined loop. If the input code does not require spilling, preconditioning techniques insert dependence arcs so that the maximum register pressure MAXLIVE reachable by any loop schedule does not exceed the number of available registers, without increasing the initiation interval whenever possible. When such arcs can be found, a spill-free software pipeline is guaranteed to exist.
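To make the MAXLIVE quantity concrete, the following Python sketch computes the steady-state register pressure of one fixed modulo schedule for a single register type. It assumes each value is described by a half-open lifetime [def, kill) in cycles measured from the start of its iteration (kill may exceed the initiation interval II when a later iteration consumes the value); the function name `max_live` and this lifetime encoding are illustrative assumptions, not taken from the article. Preconditioning bounds MAXLIVE over all schedules of the preconditioned graph, whereas this sketch evaluates only one given schedule.

```python
def max_live(lifetimes, ii):
    """Steady-state register pressure (MaxLive) of one modulo schedule.

    lifetimes: list of (def_cycle, kill_cycle) pairs, each a half-open
               interval [def_cycle, kill_cycle) relative to the start of
               one iteration (hypothetical encoding for illustration).
    ii:        initiation interval; a new iteration starts every ii cycles.
    """
    pressure = [0] * ii  # live values in each modulo slot 0 .. ii-1
    for start, end in lifetimes:
        # In the steady state, every cycle c of a lifetime contributes one
        # live instance (from some overlapping iteration) to slot c % ii.
        for c in range(start, end):
            pressure[c % ii] += 1
    return max(pressure)


print(max_live([(0, 5), (1, 3), (2, 4)], 2))  # 5
```

For these lifetimes at II = 2 the pressure is 5, which matches the usual lower bound ceil((5 + 2 + 2) / 2). If a hypothetical added arc forced the first value's last use earlier, shortening its lifetime to (0, 3), the pressure of this schedule would drop to 4.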
Existing preconditioning techniques consider one register type (register class) at a time [Deschinkel and Touati 2008]. In this article, we extend preconditioning techniques so that multiple register types are considered simultaneously. First, we generalize the existing theory of register pressure minimization for cyclic scheduling. Second, we implement our method inside the production compiler for the ST2xx VLIW family and evaluate it on industry benchmarks (FFMPEG, MEDIABENCH, SPEC2000, SPEC2006), obtaining a high spill reduction rate without a significant loss in initiation interval.
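With several register types, the spill-free condition has to hold for every register file at once. The sketch below applies the same modulo-slot counting to values tagged with a register type and checks each type against its own register count. The type names `"gpr"` and `"br"`, the function names, and the register counts in the example are hypothetical and chosen only for illustration; as before, the check applies to one fixed schedule, not to every schedule of a preconditioned graph.

```python
from collections import defaultdict


def max_live_per_type(values, ii):
    """MaxLive of one modulo schedule, computed separately per register type.

    values: iterable of (reg_type, def_cycle, kill_cycle); each lifetime is
            the half-open interval [def_cycle, kill_cycle) relative to the
            start of one iteration.
    ii:     initiation interval.
    """
    slots = defaultdict(lambda: [0] * ii)  # reg_type -> pressure per slot
    for reg_type, start, end in values:
        for c in range(start, end):
            slots[reg_type][c % ii] += 1
    return {t: max(p) for t, p in slots.items()}


def spill_free(values, ii, available):
    """True if every register type's MaxLive fits in its own register file.

    Types missing from `available` are treated as having zero registers.
    """
    pressure = max_live_per_type(values, ii)
    return all(p <= available.get(t, 0) for t, p in pressure.items())


example = [("gpr", 0, 5), ("gpr", 1, 3), ("br", 2, 4), ("br", 0, 3)]
print(max_live_per_type(example, 2))  # {'gpr': 4, 'br': 3}
```

In this example the general-purpose values need 4 registers and the branch-register values need 3, so a hypothetical 2-entry branch register file would not be spill-free regardless of how many general-purpose registers remain free.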
- Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ.
- Briais, S. and Touati, S.-A.-A. 2009. Schedule-sensitive register pressure reduction in innermost loops, basic blocks, and super-blocks. Tech. rep. INRIA-HAL-00436348, University of Versailles Saint-Quentin en Yvelines.
- Briggs, P., Cooper, K. D., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst. 16, 3, 428--455.
- Chaitin, G. 2004. Register allocation and spilling via graph coloring. SIGPLAN Not. 39, 4, 66--74.
- De Werra, D., Eisenbeis, C., Lelait, S., and Marmol, B. 1999. On a graph-theoretical model for cyclic register allocation. Discr. Appl. Math. 93, 2--3, 191--203.
- Deschinkel, K. and Touati, S.-A.-A. 2008. Efficient method for periodic task scheduling with storage requirement minimization. In Proceedings of the 2nd International Conference on Combinatorial Optimization and Applications (COCOA'08). Lecture Notes in Computer Science, vol. 5165. Springer, 438--447.
- Dupont-de-Dinechin, B. 1997. Parametric computation of margins and of minimum cumulative register lifetime dates. In Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing (LCPC'96). Springer, 231--245.
- Eichenberger, A. and Davidson, E. S. 1997. Efficient formulation for optimal modulo schedulers. SIGPLAN Not. 32, 5, 194--205.
- Eisenbeis, C., Lelait, S., and Marmol, B. 1995. The meeting graph: A new model for loop cyclic register allocation. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT'95). L. Bic, W. Bohm, P. Evripidou, and J. L. Gaudiot Eds., ACM Press, New York, 264--267.
- Faraboschi, P., Brown, G., Fisher, J. A., Desoli, G., and Homewood, F. 2000. Lx: A technology platform for customizable VLIW embedded processing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA'00). ACM, New York, 203--213.
- Fisher, J. A., Faraboschi, P., and Young, C. 2005. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann Publishers.
- Guillon, C., Rastello, F., Bidault, T., and Bouchez, F. 2005. Procedure placement using temporal-ordering information: Dealing with code size expansion. J. Embed. Comput. 1, 4, 437--459.
- Hendren, L. J., Gao, G. R., Altman, E. R., and Mukerji, C. 1992. A register allocation framework based on hierarchical cyclic interval graphs. In Proceedings of the 4th International Conference on Compiler Construction (CC'92). Springer, 176--191.
- Jain, R. 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley and Sons, New York.
- Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'88). ACM Press, New York, 318--328.
- Nagarakatte, S. G. and Govindarajan, R. 2007. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proceedings of the 16th International Conference on Compiler Construction (CC'07), held as part of the Joint European Conferences on Theory and Practice of Software (ETAPS'07). Lecture Notes in Computer Science, vol. 4420. Springer, 126--140.
- Nicolau, A., Potasman, R., and Wang, H. 1992. Register allocation, renaming and their impact on fine-grain parallelism. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Springer, 218--235.
- Rau, B. R. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture (MICRO-27). ACM, New York, 63--74.
- Rau, B. R., Schlansker, M. S., and Tirumalai, P. P. 1992a. Code generation schema for modulo scheduled loops. SIGMICRO Newslett. 23, 1--2, 158--169.
- Rau, B. R., Lee, M., Tirumalai, P. P., and Schlansker, M. S. 1992b. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'92). ACM, New York, 283--299.
- Ruttenberg, J., Gao, G. R., Stoutchinin, A., and Lichtenstein, W. 1996. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'96). ACM Press, New York, 1--11.
- Schrijver, A. 1986. Theory of Linear and Integer Programming. John Wiley and Sons.
- Touati, S.-A.-A. 2007. On the periodic register need in software pipelining. IEEE Trans. Comput. 56, 11, 1103--1504.
- Touati, S.-A.-A. and Barthou, D. 2006. On the decidability of phase ordering problem in optimizing compilation. In Proceedings of the ACM International Conference on Computing Frontiers.
- Touati, S.-A.-A. and Eisenbeis, C. 2004. Early periodic register allocation on ILP processors. Parall. Process. Lett. 14, 2, 287.
- Wang, J., Eisenbeis, C., Jourdan, M., and Su, B. 1994. Decomposed software pipelining: A new perspective and a new approach. Int. J. Parall. Program. 22, 3, 351--373.
Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors