Abstract
Series of traversals of tree structures arise in numerous contexts: abstract syntax tree traversals in compiler passes, rendering traversals of the DOM in web browsers, and kd-tree traversals in computational simulation codes. In each of these settings, a tree is traversed multiple times to compute various values and modify various portions of the tree. While it is relatively easy to write these traversals as separate small updates to the tree, for efficiency reasons traversals are often fused to reduce the number of times that each portion of the tree is traversed: by performing multiple operations on the tree simultaneously, each node of the tree can be visited fewer times, increasing opportunities for optimization and decreasing cache pressure and other overheads. This fusion is usually done by hand, which requires a careful understanding of how the traversals of the tree interact. This paper presents an automatic approach to traversal fusion: tree traversals can be written independently, and our framework then analyzes the dependences between the traversals to determine how they can be fused to reduce the number of visits to each node in the tree. A critical aspect of our framework is that it exploits two opportunities to increase the amount of fusion: (i) it automatically integrates code motion, and (ii) it supports partial fusion, where portions of one traversal can be fused with another, allowing for a reduction in node visits without requiring that two traversals be fully fused. We implement our framework in Clang and show across several case studies that we can successfully fuse complex tree traversals, reducing the overall number of traversals and substantially improving locality and performance.
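To make the idea of traversal fusion concrete, the following is a minimal hand-written sketch in C++. It is not the framework's input language or its generated output. The `Node` type, the function names, and the `visits` counter are all hypothetical and exist only for illustration. Two independently written traversals (one adds a constant to every node, the other stores subtree sums) are shown next to a fused version that does both in a single pass. Fusing is legal here because the second traversal reads, at each node, only that node's `value` and results from its children, all of which the fused visit has already produced.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type for illustration; not taken from the paper.
struct Node {
    int value = 0;
    int subtreeSum = 0;
    std::unique_ptr<Node> left, right;
};

// Counts node visits so the unfused and fused schedules can be compared.
static long visits = 0;

// Builds a perfect binary tree; each node's value is its height (leaves are 1).
std::unique_ptr<Node> buildTree(int depth) {
    assert(depth >= 0);
    if (depth == 0) return nullptr;
    auto n = std::make_unique<Node>();
    n->value = depth;
    n->left = buildTree(depth - 1);
    n->right = buildTree(depth - 1);
    return n;
}

// Traversal 1: add delta to every node's value.
void addToValues(Node* n, int delta) {
    if (!n) return;
    ++visits;
    n->value += delta;
    addToValues(n->left.get(), delta);
    addToValues(n->right.get(), delta);
}

// Traversal 2: store the sum of each subtree in its root.
// It reads the values written by traversal 1.
int computeSums(Node* n) {
    if (!n) return 0;
    ++visits;
    int l = computeSums(n->left.get());
    int r = computeSums(n->right.get());
    n->subtreeSum = n->value + l + r;
    return n->subtreeSum;
}

// Hand-fused version: both operations happen in one visit per node.
// The update to n->value runs before the sum reads it, so the
// dependence between the two original traversals is preserved.
int fusedAddAndSum(Node* n, int delta) {
    if (!n) return 0;
    ++visits;
    n->value += delta;
    int l = fusedAddAndSum(n->left.get(), delta);
    int r = fusedAddAndSum(n->right.get(), delta);
    n->subtreeSum = n->value + l + r;
    return n->subtreeSum;
}
```

Calling `addToValues` followed by `computeSums` on a 7-node tree built with `buildTree(3)` visits each node twice, while `fusedAddAndSum` computes the same result with one visit per node. An automatic approach has to establish that kind of dependence argument by analysis rather than by inspection.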
Index Terms
TreeFuser: a framework for analyzing and fusing general recursive tree traversals