research-article

A compiler framework for extracting superword level parallelism

Published: 11 June 2012

Abstract

SIMD (single-instruction multiple-data) instruction set extensions are now common in both high-performance and embedded microprocessors, and they enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that exploiting SLP can yield significant performance gains, placing SIMD instructions in application code manually can be very difficult and error-prone. In this paper, we propose a novel automated compiler framework for improving the exploitation of superword level parallelism. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, whose primary goals are to increase SIMD parallelism and, more importantly, to capture more superword reuse among the superword statements through global data access and reuse pattern analysis. As a complementary optimization, our data layout optimization organizes data in memory so that the cost of memory operations for SLP is minimized. Results from our compiler implementation and tests on two systems indicate performance improvements of up to 15.2% over a state-of-the-art SLP optimization algorithm.
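To make the terms "superword statement" and "superword reuse" concrete, here is a minimal sketch of what SLP packing does in general, in the style of the original SLP formulation by Larsen and Amarasinghe (PLDI 2000). It is not this paper's grouping, scheduling, or data layout algorithm. It assumes a 4-wide single-precision SIMD unit and uses the GCC/Clang `vector_size` extension in place of target-specific intrinsics; the type `v4f` and the functions `scalar_kernel` and `slp_kernel` are hypothetical names chosen for illustration.

```c
#include <string.h>

/* Hypothetical 4-wide superword: four 32-bit floats in one 128-bit SIMD
 * register, expressed with the GCC/Clang vector_size extension rather
 * than target-specific intrinsics. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar form: two groups of four isomorphic statements (same operator,
 * adjacent array elements). Assumes a, b, c, d do not overlap. */
void scalar_kernel(float *restrict a, const float *restrict b,
                   const float *restrict c, float *restrict d) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];

    d[0] = a[0] * b[0];
    d[1] = a[1] * b[1];
    d[2] = a[2] * b[2];
    d[3] = a[3] * b[3];
}

/* SLP form: each group of four scalar statements becomes one superword
 * statement operating on packed operands. */
void slp_kernel(float *restrict a, const float *restrict b,
                const float *restrict c, float *restrict d) {
    v4f vb, vc;
    memcpy(&vb, b, sizeof vb);   /* pack {b[0..3]}: one contiguous load */
    memcpy(&vc, c, sizeof vc);   /* pack {c[0..3]} */

    v4f va = vb + vc;            /* replaces the first four statements */
    memcpy(a, &va, sizeof va);   /* store the packed result {a[0..3]} */

    /* Superword reuse: va and vb are already packed in registers, so the
     * second group consumes them directly without reloading or repacking. */
    v4f vd = va * vb;
    memcpy(d, &vd, sizeof vd);
}
```

In this sketch the operands are already contiguous, so each pack is a single 16-byte copy. If the elements of a pack were scattered in memory, each pack would instead need element-by-element gathers and scatters, which is the kind of memory-operation overhead a data layout stage aims to reduce.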



Published in

  • ACM SIGPLAN Notices, Volume 47, Issue 6 (PLDI '12), June 2012. 534 pages. ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/2345156
  • PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2012. 572 pages. ISBN: 9781450312059; DOI: 10.1145/2254064

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
