Exploiting superword level parallelism with multimedia instruction sets

Published: 01 May 2000

Abstract

Increasing focus on multimedia applications has prompted the addition of multimedia extensions to most existing general-purpose microprocessors. This added functionality comes primarily with the addition of short SIMD instructions. Unfortunately, access to these instructions is limited to in-line assembly and library calls. Generally, it has been assumed that vector compilers provide the most promising means of exploiting multimedia instructions. Although vectorization technology is well understood, it is inherently complex and fragile. In addition, it is incapable of locating SIMD-style parallelism within a basic block.
In this paper we introduce the concept of Superword Level Parallelism (SLP), a novel way of viewing parallelism in multimedia and scientific applications. We believe SLP is fundamentally different from the loop level parallelism exploited by traditional vector processing, and therefore demands a new method of extracting it. We have developed a simple and robust compiler for detecting SLP that targets basic blocks rather than loop nests. As with techniques designed to extract ILP, ours is able to exploit parallelism both across loop iterations and within basic blocks. The result is an algorithm that provides excellent performance in several application domains. In our experiments, dynamic instruction counts were reduced by 46%. Speedups ranged from 1.24 to 6.70.
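
To make the idea of parallelism within a basic block more concrete, the following is a minimal sketch, not the paper's algorithm or its compiler output. It shows four independent, isomorphic scalar statements in one basic block and a hand-packed equivalent that performs the same work with one operation per group of four lanes. The `superword4` type and the `sw_*` helpers are hypothetical stand-ins for a multimedia register and its SIMD instructions; on real hardware they would correspond to target-specific instructions.

```c
/* Hypothetical 4-lane "superword": a stand-in for a multimedia register. */
typedef struct { float lane[4]; } superword4;

/* Scalar form: four isomorphic, mutually independent statements that sit
 * in a single basic block (no loop is involved). */
void scale_add_scalar(float *a, const float *b, const float *c, float k) {
    a[0] = b[0] + k * c[0];
    a[1] = b[1] + k * c[1];
    a[2] = b[2] + k * c[2];
    a[3] = b[3] + k * c[3];
}

/* Hypothetical helpers that model packed load, store, splat, add, multiply. */
static superword4 sw_load(const float *p) {
    superword4 v;
    for (int i = 0; i < 4; i++) v.lane[i] = p[i];
    return v;
}

static void sw_store(float *p, superword4 v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}

static superword4 sw_splat(float x) {
    superword4 v;
    for (int i = 0; i < 4; i++) v.lane[i] = x;
    return v;
}

static superword4 sw_add(superword4 x, superword4 y) {
    superword4 v;
    for (int i = 0; i < 4; i++) v.lane[i] = x.lane[i] + y.lane[i];
    return v;
}

static superword4 sw_mul(superword4 x, superword4 y) {
    superword4 v;
    for (int i = 0; i < 4; i++) v.lane[i] = x.lane[i] * y.lane[i];
    return v;
}

/* Packed form: the four statements above become two packed loads, one
 * splat, one packed multiply, one packed add, and one packed store. */
void scale_add_packed(float *a, const float *b, const float *c, float k) {
    superword4 vb = sw_load(b);   /* adjacent loads b[0..3] packed together */
    superword4 vc = sw_load(c);   /* adjacent loads c[0..3] packed together */
    superword4 vk = sw_splat(k);  /* scalar k replicated into every lane */
    sw_store(a, sw_add(vb, sw_mul(vk, vc)));
}
```

Both functions compute the same four results. The packed version simply makes explicit the grouping of adjacent memory references and isomorphic operations that a basic-block-level approach looks for, without relying on any loop structure.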
