Abstract
Compiler-based auto-vectorization is a promising solution for automatically generating code that makes efficient use of SIMD processors in high-performance platforms and embedded systems. The two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs in order to vectorize isomorphic scalar instructions and/or reduce the dynamic dependence checks incurred at runtime. The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data flows) or inter-procedural (using field-insensitive models, which are too imprecise in handling arrays and structs). This paper proposes an inter-procedural Loop-oriented Pointer Analysis, called LPA, for analyzing arrays and structs to support aggressive SLP and LLV optimizations. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a fine-grained memory model to generate location sets based on how structs and arrays are accessed. LPA can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating location-set generation as an independent concern from the rest of the pointer analysis, LPA is designed to easily reuse existing points-to resolution algorithms. We evaluate LPA using SLP and LLV, the two classic vectorization techniques, on a set of 20 CPU2000/2006 benchmarks. For SLP, LPA enables it to vectorize a total of 133 more basic blocks, an average of 12.09 per benchmark, yielding a best speedup of 2.95% for 173.applu. For LLV, LPA eliminates a total of 319 static bound checks, an average of 22.79 per benchmark, yielding a best speedup of 7.18% for 177.mesa.
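The two situations the abstract describes can be sketched in a minimal C example (hypothetical, not code from the paper): a group of isomorphic scalar statements of the kind SLP packs into a single SIMD operation, and the runtime overlap check that LLV must emit when alias analysis cannot prove that two pointers are disjoint. The function names `add4` and `may_overlap` are illustrative only.

```c
#include <stddef.h>

/* Hypothetical sketch (not from the paper). Four isomorphic scalar
 * statements on adjacent elements: the pattern SLP packs into a single
 * SIMD add -- but only if the compiler's alias analysis can prove the
 * stores through `dst` never clobber values later loaded via `src`. */
void add4(float *dst, const float *src) {
    dst[0] = src[0] + 1.0f;
    dst[1] = src[1] + 1.0f;
    dst[2] = src[2] + 1.0f;
    dst[3] = src[3] + 1.0f;
}

/* When disjointness cannot be proven statically, LLV falls back on a
 * runtime dependence (overlap) check of roughly this shape; a more
 * precise inter-procedural analysis lets the compiler drop it. */
int may_overlap(const float *dst, const float *src, size_t n) {
    return !(dst + n <= src || src + n <= dst);
}
```

A field-insensitive inter-procedural analysis merges every field of an aggregate into one abstract object and so still reports a possible overlap in cases like this; the location-set model described in the abstract distinguishes the individual elements actually accessed, which is what allows vectorizing without the check.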
Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization. In LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems.