Abstract
We describe a compilation algorithm for efficient software pipelining of general inner loops, where the number of iterations and the time taken by each iteration may be unpredictable, due to arbitrary if-then-else statements and conditional exit statements within the loop. As our target machine, we assume a wide instruction word architecture that allows multi-way branching in the form of if-then-else trees, and that allows conditional register transfers depending on where the microinstruction branches to (a hardware implementation proposal for such a machine is briefly described in the paper). Our compilation algorithm, which we call the pipeline scheduling technique, produces a software-pipelined version of a given inner loop, which allows a new iteration of the loop to begin on every cycle whenever dependences and resources permit. The correctness and termination properties of the algorithm are studied in the paper.
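As a rough illustration of the effect the abstract describes (a new iteration starting every cycle, with an exit that is only discovered at run time), the following sketch builds an idealized overlapped schedule. It is not the paper's pipeline scheduling algorithm. It assumes single-cycle stages, no cross-iteration dependences, and unlimited resources, and the names `pipelined_schedule`, `sequential_cycles`, and the "speculative"/"committed" labels are hypothetical modeling choices.

```python
def sequential_cycles(num_stages, exit_iteration, exit_stage):
    """Cycles needed when iterations run one after another.

    Iterations 0..exit_iteration-1 run all stages; the exiting iteration
    runs stages 0..exit_stage and then leaves the loop.
    """
    return exit_iteration * num_stages + exit_stage + 1


def pipelined_schedule(num_stages, exit_iteration, exit_stage):
    """Idealized schedule with one new iteration started per cycle.

    Stage s of iteration i is issued in cycle i + s. Returns a list of
    cycles; each cycle is a list of (iteration, stage, status) tuples.
    """
    # The exit condition is resolved in this cycle.
    exit_cycle = exit_iteration + exit_stage
    last_cycle = exit_cycle
    if exit_iteration > 0:
        # Earlier iterations still have to drain their remaining stages.
        last_cycle = max(exit_cycle, exit_iteration - 1 + num_stages - 1)

    schedule = []
    for cycle in range(last_cycle + 1):
        ops = []
        for stage in range(num_stages):
            iteration = cycle - stage
            if iteration < 0:
                continue  # this iteration does not exist
            if iteration == exit_iteration and stage > exit_stage:
                continue  # the exiting iteration stops at its exit stage
            if iteration > exit_iteration:
                if cycle > exit_cycle:
                    continue  # nothing new is issued once the exit is known
                # Started before the exit was known; results must be discarded.
                ops.append((iteration, stage, "speculative"))
            else:
                ops.append((iteration, stage, "committed"))
        schedule.append(ops)
    return schedule


if __name__ == "__main__":
    for cycle, ops in enumerate(pipelined_schedule(3, 2, 1)):
        print(cycle, ops)
```

Under these assumptions, a three-stage loop whose third iteration exits at its second stage occupies 4 cycles in the overlapped schedule, against 8 cycles when iterations run back to back. Real loops rarely reach this bound, since dependences and resource limits delay iteration starts.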
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20: Proceedings of the 20th Annual Workshop on Microprogramming