Abstract
Automatic loop parallelization has received considerable attention in research on parallelizing compilers, and several algorithms have been proposed to achieve it. In this paper we address the time-optimal parallelization of loops with conditional jumps. We prove that, even for machines with unlimited resources, there are simple loops for which no semantically and algorithmically equivalent time-optimal program exists.
Index Terms
On optimal loop parallelization