Abstract
Parallelism has been shown to be a very promising way to enhance computer performance, but it adds considerable complexity to the programming effort. This has led to the development of automatic concurrency extraction techniques. Prior work has demonstrated that static program restructuring via compiler-based techniques exposes a large degree of parallelism to the target machine. Purely hardware-based extraction techniques (without software preprocessing) have also demonstrated significant, though lesser, degrees of parallelism. This paper considers the performance effects of combining the hardware and software techniques. The concurrency extracted from a given set of benchmarks by each technique separately, and by both together, is determined via simulation and/or analysis. The "common parallelism" extracted by both methods is thus also considered, using new metrics. The analytic techniques for predicting the performance of specific programs are also described.
On the combination of hardware and software concurrency extraction methods
MICRO 20: Proceedings of the 20th Annual Workshop on Microprogramming