
Effective automatic parallelization of stencil computations

Published: 10 June 2007

Abstract

Performance optimization of stencil computations has been widely studied in the literature, since such computations occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes to optimize data locality and parallelism. However, tiling stencil codes along the time dimension typically requires loop skewing, and the resulting tiles suffer from load imbalance when they are executed as a parallel pipeline. In this paper, we develop an approach to the automatic parallelization of stencil codes that explicitly addresses load-balanced execution of tiles. Experimental results demonstrate the effectiveness of the approach.
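The skewing and pipelining problem described above can be made concrete with a small simulation. The sketch below assumes a one-dimensional three-point Jacobi stencil with fixed boundary values; the stencil, the grid and tile sizes, and all function names are illustrative assumptions, not taken from the paper. It skews the spatial index by the time step (i' = i + t), tiles the skewed space with rectangles, runs the tiles in a pipelined wavefront order, and checks the result against the untiled loop. It does not reproduce the paper's load-balancing technique, which the abstract does not describe.

```python
from collections import defaultdict

# Dependence distance vectors (dt, di) of an assumed 3-point 1D Jacobi stencil:
# A[t][i] reads A[t-1][i-1], A[t-1][i] and A[t-1][i+1].
DEPS = [(1, -1), (1, 0), (1, 1)]
# Skewing i' = i + t maps (dt, di) to (dt, di + dt). Every component becomes
# non-negative, which is what makes rectangular tiles in (t, i') legal.
SKEWED_DEPS = [(dt, di + dt) for dt, di in DEPS]


def update(A, t, i):
    """One stencil point: average of three neighbours at the previous time step."""
    A[t][i] = (A[t - 1][i - 1] + A[t - 1][i] + A[t - 1][i + 1]) / 3.0


def new_grid(a0, T):
    """Keep all T+1 time levels (a simplification); boundaries stay fixed."""
    n = len(a0)
    A = [list(a0)] + [[0.0] * n for _ in range(T)]
    for t in range(1, T + 1):
        A[t][0], A[t][n - 1] = a0[0], a0[n - 1]
    return A


def jacobi_reference(a0, T):
    """Untiled sequential execution with the time loop outermost."""
    A = new_grid(a0, T)
    for t in range(1, T + 1):
        for i in range(1, len(a0) - 1):
            update(A, t, i)
    return A


def skewed_tiles(n, T, bt, bi):
    """Assign each interior point (t, i) to a bt x bi tile of skewed (t, i + t) space."""
    tiles = defaultdict(list)
    for t in range(1, T + 1):
        for i in range(1, n - 1):
            # Points are appended in (t, i) order, so each tile runs time-step by time-step.
            tiles[((t - 1) // bt, (i + t) // bi)].append((t, i))
    return tiles


def wavefronts(tiles):
    """Group tiles by tt + ss; every predecessor of a tile lies on an earlier front."""
    fronts = defaultdict(list)
    for tt, ss in tiles:
        fronts[tt + ss].append((tt, ss))
    return [fronts[w] for w in sorted(fronts)]


def jacobi_tiled(a0, T, bt, bi):
    """Pipelined (wavefront) schedule of the skewed tiles, simulated sequentially."""
    A = new_grid(a0, T)
    tiles = skewed_tiles(len(a0), T, bt, bi)
    for front in wavefronts(tiles):
        # Tiles within one front are mutually independent and could run concurrently.
        for key in front:
            for t, i in tiles[key]:
                update(A, t, i)
    return A


if __name__ == "__main__":
    a0 = [float(k % 5) for k in range(34)]
    assert jacobi_tiled(a0, 8, 2, 8) == jacobi_reference(a0, 8)
    widths = [len(f) for f in wavefronts(skewed_tiles(34, 8, 2, 8))]
    print(widths)  # number of tiles that could run concurrently at each pipeline step
```

For this particular (hypothetical) configuration of 34 grid points, 8 time steps, and 2 x 8 tiles, the front widths come out as `[1, 2, 3, 3, 4, 3, 2, 1, 1]`. At most four tiles are ever ready together, and during pipeline fill and drain most processors would sit idle. This is the kind of load imbalance in pipelined tile execution that the abstract refers to.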



Published in

ACM SIGPLAN Notices, Volume 42, Issue 6 (Proceedings of the 2007 PLDI conference), June 2007, 491 pages
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/1273442

PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2007, 508 pages
ISBN: 9781595936332; DOI: 10.1145/1250734

Copyright © 2007 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


