DOI: 10.1145/3168815 (CGO '18 conference proceedings)
research-article

DeLICM: scalar dependence removal at zero memory cost

Published: 24 February 2018

Abstract

Increasing data movement costs motivate the integration of polyhedral loop optimizers into the standard optimization pipeline (-O3) of production compilers. While polyhedral optimizers have been shown to be effective when applied as source-to-source transformations, the static single assignment (SSA) form used in modern compiler mid-ends makes such optimizers less effective. Scalar dependences (dependences carried over a single memory location) are the main obstacle preventing effective optimization. We present DeLICM, a set of transformations which, backed by a polyhedral value analysis, eliminate problematic scalar dependences by 1) relocating scalar memory references to unused array locations and 2) forwarding computations that otherwise cause scalar dependences. Our experiments show that DeLICM effectively eliminates dependences introduced by compiler-internal canonicalization passes, human programmers, optimizing code generators, or inlining, without the need for any additional memory allocation. As a result, polyhedral loop optimizations can be better integrated into compiler pass pipelines, which is essential for metaprogramming optimization.


Cited By

  • (2022) Reduced O3 subsequence labelling: a stepping stone towards optimisation sequence prediction. Connection Science 34:1, 2860–2877. DOI: 10.1080/09540091.2022.2044761. Online publication date: 1 March 2022.
  • (2021) Polygeist: Raising C to Polyhedral MLIR. Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 45–59. DOI: 10.1109/PACT52795.2021.00011. Online publication date: 26 September 2021.
  • (2021) Negative Perceptions About the Applicability of Source-to-Source Compilers in HPC: A Literature Review. High Performance Computing, 233–246. DOI: 10.1007/978-3-030-90539-2_16. Online publication date: 24 June 2021.


    Published In

    CGO '18: Proceedings of the 2018 International Symposium on Code Generation and Optimization
    February 2018
    377 pages
    ISBN: 9781450356176
    DOI: 10.1145/3179541

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication Notes

    Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging


    Author Tags

    1. LLVM
    2. Polly
    3. Polyhedral Framework
    4. Scalar Dependence


    Conference

    CGO '18

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,061 submissions, 29%


    Article Metrics

    • Downloads (Last 12 months)20
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 16 Dec 2024
