skip to main content
research-article

Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

Published:01 November 2011Publication History
Skip Abstract Section

Abstract

Integrating register allocation and software pipelining of loops is an active research area. We focus on techniques that precondition the dependence graph before software pipelining in order to ensure that no register spill instructions are inserted by the register allocator in the software pipelined loop. If spilling is not necessary for the input code, preconditioning techniques insert dependence arcs so that the maximum register pressure MAXLIVE achieved by any loop schedule is below the number of available registers, without hurting the initiation interval if possible. When a solution exists, a spill-free software pipeline is guaranteed to exist.

Existing preconditioning techniques consider one register type (register class) at a time [Deschinkel and Touati 2008]. In this article, we extend preconditioning techniques so that multiple register types are considered simultaneously. First, we generalize the existing theory of register pressure minimization for cyclic scheduling. Second, we implement our method inside the production compiler of the ST2xx VLIW family, and we demonstrate its efficiency on industry benchmarks (FFMPEG, MEDIABENCH, SPEC2000, SPEC2006). We demonstrate a high spill reduction rate without a significant initiation interval loss.

References

  1. Briais, S., and Touati, S.-A.-A. 2009. Schedule-Sensitive register pressure reduction in innermost loops, basic blocks, and super-blocks. Tech. rep. INRIA-HAL-00436348, University of Versailles Saint-Quentin en Yvelines.Google ScholarGoogle Scholar
  2. Briggs, P., Cooper, K. D., and Torczon, L. 1994. Improvements to graph coloring register allocation. ACM Trans. Program. Lang. Syst. 16, 3, 428--455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Chaitin, G. 2004. Register allocation and spilling via graph coloring. SIGPLAN Not. 39, 4, 66--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. De Werra, D., Eisenbeis, C., Lelait, S., and Marmol, B. 1999. On a graph-theoretical model for cyclic register allocation. Discr. Appl. Math. 93, 2--3, 191--203. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Deschinkel, K. and Touati, S.-A.-A. 2008. Efficient method for periodic task scheduling with storage requirement minimization. In Proceedings of the 2nd International Conference on Combinatorial Optimization and Applications (COCOA’08). Lecture Notes in Computer Science, vol. 5165. Springer, 438--447. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Dupont-de-Dinechin, B. 1997. Parametric computation of margins and of minimum cumulative register lifetime dates. In Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing (LCPC’96). Springer, 231--245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Eichenberger, A. and Davidson, E. S. 1997. Efficient formulation for optimal modulo schedulers. SIGPLAN Not. 32, 5, 194--205. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Eisenbeis, C., Lelait, S., and Marmol, B. 1995. The meeting graph: A new model for loop cyclic register allocation. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques (PACT’95). L. Bic, W. Bohm, P. Evripidou, and J. L. Gaudiot Eds., ACM Press, New York, 264--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Farabosch, G., Fisher, J. A., Desoli, G., and Homewood, F. 2000. Lx: A technology platform for customizable VLIW embedded processing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA’00). ACM, New York, 203--213. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fisher, J. A., Faraboschi, and Young, C. 2005. Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools. Morgan Kaufmann Publishers.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Guillon, C., Rastello, F., Bidault, T., and Bouchez, F. 2005. Procedure placement using temporal-ordering information: Dealing with code size expansion. J. Embed. Comput. 1, 4, 437--459. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Hendren, L. J., Gao, G. R., Altman, E. R., and Mukerji, C. 1992. A register allocation framework based on hierarchical cyclic interval graphs. In Proceedings of the 4th International Conference on Compiler Construction (CC’92). Springer, 176--191. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jain, R. 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley and Sons, New York.Google ScholarGoogle Scholar
  14. Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’88). ACM Press, New York, 318--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Nagarakatte, S. G. and Govindarajan, R. 2007. Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation. In Proceedings of the 16th International Conference on Compiler Construction (CC’07) Held as Part of the Joint European Conference on Theory and Practice of Software (ETAPS’07). Lecture Notes in Computer Science, vol. 4420., Springer, 126--140. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Nicolau, A., Potasman, R., and Wang, H. 1992. Register allocation, renaming and their impact on fine-grain parallelism. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Springer, 218--235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ramakrishna, R. B. 1994. Iterative modulo scheduling: An algorithm for software pipelining loops. In Proceedings of the 27th Annual International Symposium on Microarchitecture (Micro27). ACM, New York, 63--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ramakrishna, R. B., Schlansker, M. S., and Tirumalai, P. P. 1992a. Code generation schema for modulo scheduled loops. SIGMICRO Newslett. 23, 1--2, 158--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Ramakrishna, R. B., Lee, M., Tirumalaiand, P. P., and Schlansker, M. S. 1992b. Register allocation for software pipelined loops. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’92). ACM, New York, 283--299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ravindra, K. A., Magnanti, T. L., and Orlin, J. B. 1991. Network Flows: Theory, Algorithms, and Applications. John Wiley and Sons, New York.Google ScholarGoogle Scholar
  21. Ruttenberg, J., Gao, G. R., Stoutchinin, A., and Lichtenstein, W. 1996. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’96). ACM Press, New York, 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Schrijver, A. 1986. Theory of Linear and Integer Programming. Wiley and Sons. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Touati, S.-A.-A. 2007. On the periodic register need in software pipelining. IEEE Trans. Comput. 56, 11, 1103--1504. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Touati, S.-A.-A. and Barthou, D. 2006. On the decidability of phase ordering problem in optimizing compilation. In Proceedings of the ACM International Conference on Computing Frontiers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Touati, S.-A.-A. and Eisenbeis, C. 2004. Early periodic register allocation on ILP processors. Parall. Process. Lett. 14, 2, 287.Google ScholarGoogle ScholarCross RefCross Ref
  26. Wang, J., Eisenbeis, C., Jourdan, M., and Su, B. 1994. Decomposed software pipelining: A new perspective and a new approach. Int. J. Parall. Program. 22, 3, 351--373. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!