Integrated Code Generation for Loops

Published: 01 June 2012
Abstract

Code generation in a compiler is commonly divided into several phases: instruction selection, scheduling, register allocation, spill code generation, and, in the case of clustered architectures, cluster assignment. These phases are interdependent; for instance, a decision in the instruction selection phase affects how an operation can be scheduled. We examine the effect of this separation of phases on the quality of the generated code. To study this, we have formulated optimal methods for code generation with integer linear programming, first for acyclic code and then extended to modulo scheduling of loops. In our experiments we compare optimal modulo scheduling, where all phases are integrated, to modulo scheduling where instruction selection and cluster assignment are done in a separate phase. The results show that, for an architecture with two clusters, the integrated method finds a better solution than the nonintegrated method for 27% of the instances.
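
To make the notion of modulo scheduling concrete, the following is a minimal sketch, not the integer linear programming formulation of the article. It assumes a hypothetical four-operation loop body, two hypothetical unit types ("mem" and "alu") with one unit each, and it finds the smallest initiation interval (II) by brute-force enumeration rather than with a solver. Instruction selection, register allocation, spilling, and cluster assignment are left out entirely. All names (OPS, EDGES, CAPACITY, is_valid, smallest_ii) are illustrative assumptions.

```python
from itertools import product

# Hypothetical loop body: the unit type each operation needs.
OPS = ["mem", "alu", "alu", "mem"]  # load, multiply, add, store

# Dependence edges: (source, destination, latency, iteration distance).
# The last edge is a loop-carried recurrence (assumed for illustration).
EDGES = [
    (0, 1, 2, 0),
    (1, 2, 1, 0),
    (2, 3, 1, 0),
    (2, 1, 1, 1),
]

# Assumed machine: one unit of each type can start an operation per cycle.
CAPACITY = {"mem": 1, "alu": 1}


def is_valid(times, ii, ops, edges, capacity):
    """Check one candidate schedule against dependences and resources."""
    # Dependence constraint: the consumer, shifted by `distance`
    # iterations of length ii, must start after the producer finishes.
    for src, dst, latency, distance in edges:
        if times[dst] + ii * distance < times[src] + latency:
            return False
    # Modulo reservation table: operations whose start times are equal
    # modulo ii occupy the same slot in the steady state of the loop.
    usage = {}
    for op, start in enumerate(times):
        key = (ops[op], start % ii)
        usage[key] = usage.get(key, 0) + 1
        if usage[key] > capacity[ops[op]]:
            return False
    return True


def smallest_ii(ops, edges, capacity, max_ii=8, horizon=8):
    """Return (ii, start_times) for the smallest feasible ii, or None."""
    for ii in range(1, max_ii + 1):
        # Exhaustive search over start times; only viable for tiny loops.
        for times in product(range(horizon), repeat=len(ops)):
            if is_valid(times, ii, ops, edges, capacity):
                return ii, times
    return None


result = smallest_ii(OPS, EDGES, CAPACITY)
print(result)
```

In this toy instance an II of 1 is infeasible because the two memory operations would collide in the single slot, and the assumed recurrence between the two arithmetic operations also requires an II of at least 2. An exact method such as the integer linear programs described in the abstract replaces the enumeration with a solver and adds the remaining code generation decisions to the same model.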
