
Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization

Published: 30 January 2018

Abstract

Compiler-based vectorization is a promising way to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. The two main auto-vectorization techniques, superword-level parallelism (SLP) vectorization and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs in order to vectorize isomorphic scalar instructions (in the case of SLP) and to reduce dynamic dependence checks at runtime (in the case of LLV).

The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (but based on field-insensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural Loop-oriented Pointer Analysis for C, called Lpa, for analyzing arrays and structs to support aggressive SLP and LLV optimizations effectively. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model to generate access-based location sets based on how structs and arrays are accessed. Lpa can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating location set generation as an independent concern from the rest of the pointer analysis, Lpa is designed so that existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can be reused easily.

We have implemented Lpa fully in the LLVM compiler infrastructure (version 3.8.0). We evaluate Lpa by considering SLP and LLV, the two classic vectorization techniques, on a set of 20 C and Fortran CPU2000/2006 benchmarks. For SLP, Lpa outperforms LLVM’s BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in a best speedup of 2.95% for 173.applu. For LLV, LLVM introduces a total of 551 and 652 static bound checks under BasicAA and ScevAA, respectively. In contrast, Lpa reduces these static checks to 220, an average of 15.7 checks per benchmark, resulting in a best speedup of 7.23% for 177.mesa.

