DOI: 10.1145/1133981.1133996
Article

Optimizing data permutations for SIMD devices

Published: 11 June 2006

ABSTRACT

The widespread presence of SIMD devices in today's microprocessors has made compiler techniques for these devices tremendously important. One of the most important and difficult issues that must be addressed by these techniques is the generation of the data permutation instructions needed for non-contiguous and misaligned memory references. These instructions are expensive and, therefore, it is of crucial importance to minimize their number to improve performance and, in many cases, enable speedups over scalar code.

Although it is often difficult to optimize an isolated data reorganization operation, a collection of related data permutations can often be manipulated to reduce the number of operations. This paper presents a strategy to optimize all forms of data permutations. The strategy is organized into three steps. First, all data permutations in the source program are converted into a generic representation. These permutations can originate from vector accesses to non-contiguous and misaligned memory locations or result from compiler transformations. Second, an optimization algorithm is applied to reduce the number of data permutations in a basic block. By propagating permutations across statements and merging consecutive permutations whenever possible, the algorithm can significantly reduce the number of data permutations. Finally, a code generation algorithm translates generic permutation operations into native permutation instructions for the target platform.

Experiments were conducted on various kinds of applications. The results show that up to 77% of the permutation instructions are eliminated and, as a result, the average performance improvement is 48% on VMX and 68% on SSE2. For several applications, near perfect speedups have been achieved on both platforms.
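The two rewrites named in the abstract, merging consecutive permutations and propagating a permutation across a statement, can be illustrated with a small sketch. This is not the paper's algorithm or its generic representation. It assumes, purely for illustration, that a permutation is an index list `p` with `out[i] = v[p[i]]`, and the helper names `permute`, `merge`, and `add` are hypothetical.

```python
from typing import List, Sequence


def permute(v: Sequence[int], p: Sequence[int]) -> List[int]:
    """Apply index-list permutation p to v: out[i] = v[p[i]] (assumed encoding)."""
    return [v[i] for i in p]


def merge(p: Sequence[int], q: Sequence[int]) -> List[int]:
    """Fold 'apply p, then q' into a single permutation r.

    permute(permute(v, p), q)[i] = v[p[q[i]]], so r[i] = p[q[i]].
    """
    return [p[i] for i in q]


def add(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Element-wise addition, standing in for a SIMD vector add."""
    return [x + y for x, y in zip(a, b)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
p = [3, 2, 1, 0]  # reverse the vector
q = [1, 0, 3, 2]  # swap adjacent pairs

# Merging: two consecutive permutations become one.
two_steps = permute(permute(a, p), q)
one_step = permute(a, merge(p, q))

# Propagation: the same permutation on both operands of an element-wise
# operation can be moved past the operation, so two permutations become one.
before = add(permute(a, p), permute(b, p))
after = permute(add(a, b), p)

print(two_steps == one_step, before == after)
```

In this toy setting each rewrite removes one permutation. Whether such a rewrite pays off on real hardware depends on which native permutation instructions the target provides, which is the concern of the code generation step the abstract describes.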

Published in

PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2006, 438 pages
ISBN: 1595933204
DOI: 10.1145/1133981

ACM SIGPLAN Notices, Volume 41, Issue 6
Proceedings of the 2006 PLDI Conference
June 2006, 426 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1133255

Copyright © 2006 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 406 of 2,067 submissions, 20%
