Abstract
There has been a significant amount of effort invested in designing scheduling transformations such as loop tiling and loop fusion that rearrange the execution of dynamic instances of loop nests to place operations that access the same data close together temporally. In recent years, there has been interest in designing similar transformations that operate on recursive programs, but until now these transformations have only considered simple scenarios: multiple recursions to be fused, or a recursion nested inside a simple loop. This paper develops the first set of scheduling transformations for nested recursions: recursive methods that call other recursive methods. These are the recursive analog to nested loops. We present a transformation called recursion twisting that automatically improves locality at all levels of the memory hierarchy, and show that this transformation can yield substantial performance improvements across several benchmarks that exhibit nested recursion.
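To make the notion of nested recursion concrete, here is a minimal sketch of a recursive method that calls another recursive method, the recursive counterpart of a doubly nested loop. The names `Node`, `build`, `inner`, `outer`, and `nested_pairs` are illustrative assumptions and do not come from the paper. The sketch shows only the original, untransformed schedule; it does not implement recursion twisting.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(values: List[int]) -> Optional[Node]:
    """Hypothetical helper: build a balanced binary tree from a list."""
    if not values:
        return None
    mid = len(values) // 2
    return Node(values[mid], build(values[:mid]), build(values[mid + 1:]))


def inner(o: Node, i: Optional[Node], visit: Callable[[int, int], None]) -> None:
    """Inner recursion: pair the fixed outer node `o` with every node under `i`."""
    if i is None:
        return
    visit(o.value, i.value)
    inner(o, i.left, visit)
    inner(o, i.right, visit)


def outer(o: Optional[Node], inner_root: Optional[Node],
          visit: Callable[[int, int], None]) -> None:
    """Outer recursion: for each node under `o`, run the whole inner recursion."""
    if o is None:
        return
    inner(o, inner_root, visit)
    outer(o.left, inner_root, visit)
    outer(o.right, inner_root, visit)


def nested_pairs(outer_values: List[int],
                 inner_values: List[int]) -> List[Tuple[int, int]]:
    """Record the (outer, inner) pairs in the order the nested recursion visits them."""
    pairs: List[Tuple[int, int]] = []
    outer(build(outer_values), build(inner_values),
          lambda a, b: pairs.append((a, b)))
    return pairs
```

In this schedule every outer node triggers a full traversal of the inner tree, so a given inner node is not touched again until the entire inner tree has been visited. When the inner tree is larger than the cache, that reuse distance is what a locality transformation would aim to shorten.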
Locality Transformations for Nested Recursive Iteration Spaces
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems