Abstract
Compiler-based vectorization is a promising way to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. The two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), both require precise dependence analysis on arrays and structs: SLP needs it to vectorize isomorphic scalar instructions, and LLV needs it to reduce the dependence checks performed at runtime.
The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-insensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural Loop-oriented Pointer Analysis for C, called Lpa, for analyzing arrays and structs to support aggressive SLP and LLV optimizations effectively. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model to generate access-based location sets based on how structs and arrays are accessed. Lpa can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. Because location set generation is separated as an independent concern from the rest of the pointer analysis, existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can be reused easily.
We have fully implemented Lpa in the LLVM compiler infrastructure (version 3.8.0). We evaluate Lpa on SLP and LLV, the two classic vectorization techniques, using a set of 20 C and Fortran SPEC CPU2000/2006 benchmarks. For SLP, Lpa outperforms LLVM’s BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in the best speedup of 2.95% for 173.applu. For LLV, LLVM introduces a total of 551 and 652 static bound checks under BasicAA and ScevAA, respectively. In contrast, Lpa reduces these static checks to 220, with an average of 15.7 checks per benchmark, resulting in the best speedup of 7.23% for 177.mesa.
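To make the two settings in the abstract concrete, the following is a minimal C sketch, not taken from the article. `add4` shows isomorphic scalar statements of the kind SLP packs into one vector operation, provided the pointers are known not to alias. `scale` shows the loop-versioning pattern a loop vectorizer may fall back on when alias analysis cannot prove independence: a runtime overlap check selects between a vectorizable loop and a scalar one. The names `add4`, `ranges_overlap`, and `scale` are hypothetical, and the check is written by hand here only to illustrate what a compiler would otherwise insert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SLP candidate: four isomorphic scalar statements on adjacent elements.
 * They can be packed into one vector add only if stores to `a` are known
 * not to alias the loads from `b` and `c`. */
void add4(float *a, const float *b, const float *c) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}

/* Hypothetical hand-written overlap test, standing in for the bound check
 * a loop vectorizer may emit. Returns nonzero if the byte ranges
 * [p, p + bytes) and [q, q + bytes) intersect. */
static int ranges_overlap(const void *p, const void *q, size_t bytes) {
    uintptr_t lo_p = (uintptr_t)p;
    uintptr_t lo_q = (uintptr_t)q;
    return lo_p < lo_q + bytes && lo_q < lo_p + bytes;
}

/* LLV candidate with explicit loop versioning. */
void scale(float *dst, const float *src, size_t n, float k) {
    assert(dst != NULL && src != NULL);
    if (!ranges_overlap(dst, src, n * sizeof(float))) {
        /* Ranges are disjoint: iterations are independent, so this
         * version is eligible for vectorization. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    } else {
        /* Possible overlap: keep the original scalar order. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }
}
```

If a pointer analysis can prove at compile time that `dst` and `src` never refer to overlapping memory at any call site, the check and the scalar fallback become unnecessary. That is the kind of reduction in bound checks the evaluation reports for LLV.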
Index Terms
Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization