
Index Terms
Code generation schema for modulo scheduled loops