Abstract
Lookahead is a common technique in high-performance uniprocessor design. In general, however, the hardware lookahead window is too small to exploit instruction-level parallelism at run time, while compaction-based parallelizing compilers suffer from worst-case exponential code explosion at compile time. In this paper, we propose a software lookahead method, which at compile time allows inter-basic-block code motions within a prespecified number of operations, called the software lookahead window, on any path emanating from the currently processed instruction. With software lookahead, instruction-level parallelism can be exploited over a much larger code region than with the hardware approach, yet the lookahead region is still limited to a constant depth by a user-specifiable window, so code explosion is restricted. The proposed scheme has been implemented in our prototype parallelizing compiler, which can generate code for uniprocessors with multiple functional units and multiway conditional branches, such as VLIW machines, and potentially for superscalars as well. To study the code explosion problem and instruction-level parallelism in branch-intensive code, we compiled five AIX utilities: sort, fgrep, sed, yacc, and compress. We demonstrate that, with software lookahead, the code explosion problem is effectively alleviated, while a substantial amount of inter-basic-block parallelism is still successfully extracted.
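The core idea of the abstract, bounding the region a scheduler may draw operations from to a fixed number of operations along every path leaving the current instruction, can be sketched as a bounded walk over a control-flow graph. The sketch below is illustrative only and is not the paper's algorithm: the data structures (`blocks`, `succs`) and the function name are hypothetical, and real compaction would additionally track data dependences and perform the actual code motion.

```python
from collections import deque

def lookahead_candidates(blocks, succs, start_block, start_index, window):
    """Collect operations visible within a software lookahead window.

    blocks: dict mapping block name -> list of operation labels
    succs:  dict mapping block name -> list of successor block names
    Starting just after blocks[start_block][start_index], walk every
    path and gather operations until `window` operations have been
    seen on that path. Because the depth is a user-specified constant,
    the region the scheduler must examine (and hence any duplication
    caused by moving code across branches) stays bounded.
    """
    candidates = set()
    # Each worklist entry: (block, index of next op, ops still allowed on this path)
    queue = deque([(start_block, start_index + 1, window)])
    seen = set()
    while queue:
        block, i, budget = queue.popleft()
        ops = blocks[block]
        while i < len(ops) and budget > 0:
            candidates.add(ops[i])
            i += 1
            budget -= 1
        if i >= len(ops) and budget > 0:
            # Window not yet exhausted: continue on every successor path.
            for s in succs.get(block, []):
                state = (s, 0, budget)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return candidates

# A diamond-shaped CFG: B0 branches to B1 and B2, which rejoin at B3.
blocks = {"B0": ["a0", "a1"], "B1": ["b0", "b1", "b2"],
          "B2": ["c0"], "B3": ["d0", "d1"]}
succs = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"]}

# From a0 with a window of 3 operations, the window reaches two ops
# down the taken-branch side but three down the shorter fall-through
# side, crossing two basic-block boundaries on the latter path.
print(lookahead_candidates(blocks, succs, "B0", 0, 3))
# → {'a1', 'b0', 'b1', 'c0', 'd0'}
```

Note how the window is counted per path: the shorter path through B2 reaches into the join block B3, while the longer path through B1 exhausts its budget inside B1. This is what distinguishes a constant-depth window from whole-region compaction.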
Using a lookahead window in a compaction-based parallelizing compiler
MICRO 23: Proceedings of the 23rd Annual Workshop and Symposium on Microprogramming and Microarchitecture