Effective padding of multidimensional arrays to avoid cache conflict misses

Published: 02 June 2016

Abstract

Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
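The conflict-miss effect described in the abstract can be illustrated with a small sketch. This is not the paper's padding algorithm; it only counts how many cache lines of one tile fall into the same set, before and after padding the row length. The cache parameters (32 KiB, 8-way, 64-byte lines), the simple modulo set indexing, the array and tile sizes, and all function names are assumptions chosen for illustration.

```python
# Illustrative sketch only: this is NOT the paper's padding algorithm.
# Assumed (hypothetical) cache: 32 KiB, 8-way set associative, 64-byte lines.
CACHE_BYTES = 32 * 1024
WAYS = 8
LINE_BYTES = 64
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 64 sets

ELEM_BYTES = 8  # double-precision elements


def set_index(byte_address):
    """Cache set that a byte address maps to (assumed modulo indexing)."""
    return (byte_address // LINE_BYTES) % NUM_SETS


def max_lines_per_set(row_length, rows_in_tile, cols_in_tile):
    """Largest number of distinct cache lines of a tile that share one set.

    The tile is the top-left rows_in_tile x cols_in_tile block of a
    row-major array whose rows hold row_length elements (base address 0).
    """
    lines_in_set = {}
    for i in range(rows_in_tile):
        for j in range(cols_in_tile):
            addr = (i * row_length + j) * ELEM_BYTES
            line = addr // LINE_BYTES
            lines_in_set.setdefault(set_index(addr), set()).add(line)
    return max(len(lines) for lines in lines_in_set.values())


def conflicts(row_length, rows_in_tile, cols_in_tile):
    """True if some set must hold more lines than the associativity allows."""
    return max_lines_per_set(row_length, rows_in_tile, cols_in_tile) > WAYS


# A 32 x 8 tile of doubles occupies only 2 KiB, far below the 32 KiB capacity.
unpadded = max_lines_per_set(1024, 32, 8)      # row length is a power of two
padded = max_lines_per_set(1024 + 8, 32, 8)    # row length padded by 8 elements
```

With these assumed parameters, the unpadded tile places all 32 of its cache lines in a single set, which exceeds the 8 ways, so `conflicts(1024, 32, 8)` is true even though the tile is much smaller than the cache. Padding each row by 8 elements shifts successive rows into different sets, leaving at most one line per set. The pad of 8 was picked by hand for this example; finding a good pad for arbitrary tile sizes is the problem the paper addresses.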

    • Published in

      ACM SIGPLAN Notices, Volume 51, Issue 6
      PLDI '16
      June 2016
      726 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2980983
      • Editor: Andy Gill

      PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2016
      726 pages
      ISBN: 9781450342612
      DOI: 10.1145/2908080
      • General Chair: Chandra Krintz
      • Program Chair: Emery Berger

      Copyright © 2016 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • article
