Abstract
Horizontally microprogrammable CPUs belong to a class of machines having statically schedulable parallel instruction execution (SPIE machines). Several experiments have shown that within basic blocks, real code offers a potential speed-up factor of only 2 or 3 when compacted for SPIE machines, even with unlimited hardware. This paper describes similar experiments that instead measure the potential parallelism available to any global compaction method, that is, one that compacts code beyond basic-block boundaries. Global compaction is a subject of current investigation; no measurements yet exist on implemented systems.
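As a rough illustration (ours, not taken from the cited experiments): under unlimited hardware, the potential speedup within a single basic block is the number of operations divided by the length of the longest chain in the block's data-dependence graph, since every operation can issue as soon as its inputs are ready. The function name, the example dependence graph, and the unit-latency assumption below are all hypothetical.

    def block_speedup(deps, n_ops):
        """deps maps op index -> list of op indices it depends on; unit latency."""
        level = {}
        def depth(op):
            if op not in level:
                level[op] = 1 + max((depth(p) for p in deps.get(op, [])), default=0)
            return level[op]
        critical = max(depth(op) for op in range(n_ops))   # longest dependence chain
        return n_ops / critical

    deps = {2: [0, 1], 5: [2, 3]}      # ops 0, 1, 3, 4 have no inputs
    print(block_speedup(deps, 6))      # 6 ops / 3-cycle critical path = 2.0

On real code, chains like this keep the intra-block ratio near the 2-3x regime reported above.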
The approach taken is to assume first that an oracle is available during compaction. The oracle resolves all dynamic considerations in advance, letting us find the maximum parallelism available without reformulating the algorithm. The parallelism found is constrained only by legitimate data dependencies, since questions of conditional jump directions and unresolved indirect memory references are answered by the oracle. Using such an oracle, we find that typical scientific programs may be sped up by anywhere from 3 to 1000 times. These dramatic results provide an upper bound for global compaction techniques. We also describe experiments in progress that progressively limit the oracle, with the aim of eventually producing one that supplies only information obtainable by a very good compiler. This will give a more practical measure of the parallelism attainable via global compaction methods.
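A minimal sketch of how such an oracle-driven limit study can be simulated, under assumptions of our own: the oracle is represented by the dynamic trace itself (every branch direction and indirect address already resolved), each operation takes one cycle, and anti- and output dependencies are assumed removable by renaming, so only flow dependencies constrain the schedule. The Op class and oracle_speedup function are illustrative names, not the paper's.

    from dataclasses import dataclass

    @dataclass
    class Op:
        reads: list      # locations (registers/memory cells) this op reads
        writes: list     # locations it writes

    def oracle_speedup(trace):
        """Sequential ops / critical-path cycles for a dynamic trace."""
        ready = {}       # location -> cycle its current value becomes available
        depth = 0
        for op in trace:
            # Earliest issue: one cycle after the latest input value is produced.
            start = max((ready[loc] for loc in op.reads if loc in ready), default=0)
            finish = start + 1                  # unit-latency assumption
            for loc in op.writes:
                ready[loc] = finish             # renaming: no anti/output deps
            depth = max(depth, finish)
        return len(trace) / depth if depth else 1.0

    # Two independent 8-op dependence chains interleaved in one trace:
    trace = [Op(['a'], ['a']), Op(['b'], ['b'])] * 8
    print(oracle_speedup(trace))                # 16 ops / 8 cycles = 2.0

Because the trace crosses branch boundaries freely, independent work from far-apart blocks can overlap, which is where speedups far beyond the intra-block 2-3x can appear.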