Abstract
This paper presents GLORE, a novel approach to detecting and removing large-scoped redundant computations in nested loops. GLORE works on LER-notation, a new representation of computations in both regular and irregular loops. Together with a set of novel algorithms, this representation lets GLORE systematically consider computation reordering at both the expression level and the loop level in a unified manner. GLORE applies far more broadly than prior methods and frequently lowers the computational complexity of nested loops that elude prior optimization techniques, producing significantly larger speedups.
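To make the kind of redundancy described above concrete, here is a minimal hand-written sketch in Python. It is not GLORE's algorithm and does not use LER-notation. The function names and the loop itself are hypothetical, chosen only to illustrate how reordering at the expression level (factoring `a[i]` out of the sum) and at the loop level (moving the inner reduction out of the outer loop) can lower the complexity of a nested loop.

```python
def weighted_sums_naive(a, b):
    """Nested loop, O(len(a) * len(b)).

    The sum over b is implicitly recomputed for every element of a.
    """
    result = [0] * len(a)
    for i in range(len(a)):
        for j in range(len(b)):
            result[i] += a[i] * b[j]
    return result


def weighted_sums_reordered(a, b):
    """Same results in O(len(a) + len(b)).

    Uses distributivity: sum_j a[i] * b[j] == a[i] * sum_j b[j].
    The reduction over b no longer depends on i, so it runs once.
    """
    total_b = 0
    for j in range(len(b)):
        total_b += b[j]
    return [a[i] * total_b for i in range(len(a))]
```

The two versions agree exactly on integer inputs. With floating-point values the reordered sum may differ slightly in rounding, which is one reason such a transformation would need to be applied with care.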
Index Terms
GLORE: generalized loop redundancy elimination upon LER-notation