research-article

Efficient SIMD code generation for irregular kernels

Published: 25 February 2012

Abstract

Array indirection poses several challenges for compilers that target single instruction, multiple data (SIMD) instructions. The main ones are disjoint memory references, arbitrarily misaligned memory references, and dependence cycles in loops. Because of these challenges, existing SIMD compilers exclude loops with array indirection from their candidate loops for SIMD vectorization. Addressing them is nevertheless unavoidable, since many important compute-intensive applications use array indirection extensively to reduce memory and computation requirements. In this work, we propose a method to generate efficient SIMD code for loops containing indirected memory references. We extract both inter- and intra-iteration parallelism, taking data reorganization overhead into consideration. We also optimally place data reorganization code so that the reorganization overhead is amortized by the performance gain of SIMD vectorization. Experiments on four array indirection kernels, extracted from real-world scientific applications, show that the proposed method effectively generates SIMD code for irregular kernels with array indirection. Compared to existing SIMD vectorization methods, the proposed method improves the performance of irregular kernels by 91% on average.
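To make the setting concrete, the following is a minimal C sketch of a loop with an indirected read and a hand-packed variant of it. It is only an illustration of the kind of kernel the abstract describes, not the code generation method proposed in the paper. The function names, the kernel `y[i] += a[i] * x[idx[i]]`, and the width `VLEN` are assumptions chosen for the example. The sketch covers only indirected reads; an indirected write such as `y[idx[i]] += ...` can create the dependence cycles mentioned above and is not handled here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed SIMD width in floats (for example, one 128-bit register). */
#define VLEN 4

/* Scalar reference: x is read through the index array idx, so the
 * references to x are disjoint and arbitrarily aligned. */
void indirect_scalar(float *y, const float *a, const float *x,
                     const int *idx, size_t n)
{
    assert(y && a && x && idx);
    for (size_t i = 0; i < n; i++)
        y[i] += a[i] * x[idx[i]];
}

/* Hand-packed variant: gather VLEN indirected operands into a
 * contiguous buffer (the data reorganization step), then run a
 * unit-stride loop that a vectorizing compiler can map to SIMD. */
void indirect_packed(float *y, const float *a, const float *x,
                     const int *idx, size_t n)
{
    assert(y && a && x && idx);
    size_t i = 0;
    for (; i + VLEN <= n; i += VLEN) {
        float packed[VLEN];
        for (size_t k = 0; k < VLEN; k++)   /* gather through idx */
            packed[k] = x[idx[i + k]];
        for (size_t k = 0; k < VLEN; k++)   /* unit-stride compute */
            y[i + k] += a[i + k] * packed[k];
    }
    for (; i < n; i++)                      /* scalar remainder */
        y[i] += a[i] * x[idx[i]];
}
```

Both functions compute the same result. The packed variant pays for the gather on every block of `VLEN` elements, which is the reorganization overhead that the abstract says must be amortized by the SIMD gain.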



Published in

ACM SIGPLAN Notices, Volume 47, Issue 8 (PPOPP '12), August 2012, 334 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, February 2012, 352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
