DOI: 10.5555/370049.370403

Tiling optimizations for 3D scientific computations

Published: 01 November 2000

ABSTRACT

Compiler transformations can significantly improve data locality for many scientific programs. In this paper, we show that iterative solvers for partial differential equations (PDEs) in three dimensions require new compiler optimizations not needed for 2D codes, since reuse along the third dimension cannot fit in cache for larger problem sizes. Tiling is a program transformation compilers can apply to capture this reuse, but successful application of tiling requires selection of non-conflicting tiles and/or padding array dimensions to eliminate conflicts. We present new algorithms and cost models for selecting tiling shapes and array pads. We explain why tiling is rarely needed for 2D PDE solvers, but can be helpful for 3D stencil codes. Experimental results show tiling 3D codes can reduce miss rates and achieve performance improvements of 17-121 percent for key scientific kernels, including a 27 percent average improvement for the key computational loop nest in the SPEC/NAS benchmark mgrid.
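
To make the idea concrete, here is a minimal sketch of tiling a 3D stencil sweep. It is not the paper's tile-selection or padding algorithm. It assumes a 7-point Jacobi stencil on an n x n x n grid stored in a flat list with i as the fastest-varying index, and all names (`make_grid`, `jacobi_untiled`, `jacobi_tiled`, `pad_i`, `pad_j`, `ti`, `tj`) are hypothetical. The tiled version blocks the two inner dimensions and sweeps k inside each tile, so the three planes touched at each k step are only `ti` by `tj` in extent. The pads simply enlarge the allocated leading dimensions, which is the kind of layout change the abstract refers to as padding; the tile sizes and pads here are arbitrary illustrative values, not ones chosen by a cost model.

```python
def make_grid(n, pad_i=0, pad_j=0):
    """Allocate an n*n*n grid in a flat list with padded leading dimensions.

    ni and nj are the allocated extents of the i and j dimensions; the pad
    elements are never read or written by the stencil.
    """
    ni, nj = n + pad_i, n + pad_j
    a = [0.0] * (ni * nj * n)
    for k in range(n):
        for j in range(n):
            for i in range(n):
                a[i + ni * (j + nj * k)] = float((3 * i + 5 * j + 7 * k) % 11)
    return a, ni, nj


def stencil_point(a, ni, nj, i, j, k):
    """Average of the six face neighbours of interior point (i, j, k)."""
    c = i + ni * (j + nj * k)
    plane = ni * nj
    return (a[c - 1] + a[c + 1] + a[c - ni] + a[c + ni]
            + a[c - plane] + a[c + plane]) / 6.0


def jacobi_untiled(a, n, ni, nj):
    """One Jacobi sweep in the original k, j, i loop order."""
    b = list(a)
    for k in range(1, n - 1):
        for j in range(1, n - 1):
            for i in range(1, n - 1):
                b[i + ni * (j + nj * k)] = stencil_point(a, ni, nj, i, j, k)
    return b


def jacobi_tiled(a, n, ni, nj, ti, tj):
    """Same sweep with the j and i loops tiled and k swept inside each tile."""
    b = list(a)
    for jj in range(1, n - 1, tj):          # tile origin in j
        for ii in range(1, n - 1, ti):      # tile origin in i
            for k in range(1, n - 1):       # full sweep along the third dimension
                for j in range(jj, min(jj + tj, n - 1)):
                    for i in range(ii, min(ii + ti, n - 1)):
                        b[i + ni * (j + nj * k)] = stencil_point(a, ni, nj, i, j, k)
    return b


if __name__ == "__main__":
    grid, ni, nj = make_grid(10, pad_i=1, pad_j=2)
    same = jacobi_tiled(grid, 10, ni, nj, ti=4, tj=3) == jacobi_untiled(grid, 10, ni, nj)
    print("tiled sweep matches untiled sweep:", same)
```

Because a Jacobi sweep reads one array and writes another, reordering the iterations does not change the result, which is why the two versions can be compared element by element. Python is used here only for readability; any locality benefit would show up in compiled code such as Fortran or C, where tile sizes and pads would be chosen against the actual cache parameters.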


Published in

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN: 0780398025

Copyright (c) 2000 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Publisher

IEEE Computer Society, United States


Acceptance Rates

SC '00 Paper Acceptance Rate: 62 of 179 submissions, 35%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
