Abstract
This paper presents a fully automatic approach to loop parallelization that integrates static and run-time analysis, thereby overcoming many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelizing transformation by verifying the independence of the loop's memory references. To this end it represents array references in the USR (uniform set representation) language and expresses the independence condition as an equation, S = ∅, where S is a set expression representing array indexes. Using a language rather than an array-abstraction representation for S results in fewer conservative approximations but carries a potentially high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates, such that F(S) ⇒ S = ∅. Loop parallelization is then validated using a novel logic inference algorithm that factorizes the resulting complex predicates F(S) into a sequence of sufficient independence conditions, which are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexity. We evaluate our automated solution on 26 benchmarks from the PERFECT Club and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full-program speedups than the Intel and IBM Fortran compilers.
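The staged strategy described above can be illustrated with a minimal sketch. This is not the paper's implementation: the set expressions here are concrete strided ranges rather than full USRs, and the sufficient conditions (`ranges_disjoint`, `strides_interleave`) and the fallback `dynamic_check` are hypothetical names chosen for the example. It shows only the overall shape: to prove S = ∅, try cheap predicates first, in increasing order of estimated cost, and fall back to a run-time test only when static reasoning fails.

```python
# Minimal sketch (hypothetical names, not the paper's implementation) of the
# hybrid strategy: the overlap of two accessed index sets is the set S, and
# independence (S = empty) is tested by a cascade of sufficient conditions
# ordered by estimated evaluation cost.
from dataclasses import dataclass

@dataclass
class Range:
    lo: int
    hi: int
    stride: int  # symbolic in the real analysis; concrete here for simplicity

def ranges_disjoint(a: Range, b: Range) -> bool:
    """Cheapest sufficient condition: the two index ranges never overlap."""
    return a.hi < b.lo or b.hi < a.lo

def strides_interleave(a: Range, b: Range) -> bool:
    """Sufficient condition: equal strides but different phases, so the
    two access streams interleave without ever touching the same index."""
    return a.stride == b.stride and (a.lo - b.lo) % a.stride != 0

def dynamic_check(a: Range, b: Range) -> bool:
    """Fallback evaluated at run time: materialize both index sets and
    intersect them. Always precise, but the most expensive test."""
    sa = set(range(a.lo, a.hi + 1, a.stride))
    sb = set(range(b.lo, b.hi + 1, b.stride))
    return not (sa & sb)

def independent(a: Range, b: Range) -> bool:
    # Sufficient conditions in increasing order of estimated cost; the first
    # one that succeeds proves S = empty and validates parallelization.
    for test in (ranges_disjoint, strides_interleave, dynamic_check):
        if test(a, b):
            return True
    return False

# Writes to A[2i] vs reads of A[2i+1], i = 0..49: the ranges overlap, so the
# cheap disjointness test fails, but the interleaving test proves S = empty.
print(independent(Range(0, 98, 2), Range(1, 99, 2)))  # True
```

The point of the ordering is that most loops are discharged by the cheap static predicates; the dynamic test is reached only for the residue that static analysis cannot decide, which is what keeps the run-time overhead low.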
Index Terms
Logical inference techniques for loop parallelization
Published in PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation.