Abstract
Caches are used to significantly improve performance. Even with high degrees of set associativity, the number of accessed data elements mapping to the same set in a cache can easily exceed the degree of associativity. This can cause conflict misses and lower performance, even if the working set is much smaller than cache capacity. Array padding (increasing the size of array dimensions) is a well-known optimization technique that can reduce conflict misses. In this paper, we develop the first algorithms for optimal padding of arrays aimed at a set-associative cache for arbitrary tile sizes. In addition, we develop the first solution to padding for nested tiles and multi-level caches. Experimental results with multiple benchmarks demonstrate a significant performance improvement from padding.
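The conflict-miss effect described above can be illustrated with a small brute-force sketch. It is not the padding algorithm developed in the paper; it only counts how many distinct cache lines of one tile fall into each cache set, and then tries successive pad values for the leading dimension. The function names and cache parameters (64-byte lines, 64 sets, 8 ways, roughly a 32 KiB cache), the simple modulo set indexing, and the assumption of a row-major array starting at address 0 are all hypothetical choices made for illustration.

```python
from collections import Counter


def max_lines_per_set(rows, cols, leading_dim, elem_size=8,
                      line_size=64, num_sets=64):
    """Largest number of distinct cache lines that a rows x cols tile of a
    row-major array maps to any single cache set.

    Assumes (hypothetically) that the array starts at address 0 and that
    the set index is simply the line number modulo num_sets.
    """
    lines = set()
    for i in range(rows):
        for j in range(cols):
            addr = (i * leading_dim + j) * elem_size
            lines.add(addr // line_size)
    per_set = Counter(line % num_sets for line in lines)
    return max(per_set.values())


def smallest_conflict_free_pad(rows, cols, leading_dim, ways=8,
                               max_pad=64, **cache):
    """Exhaustively search for the smallest pad (in elements) such that no
    set holds more lines of the tile than the associativity allows.
    Returns None if no pad up to max_pad works."""
    for pad in range(max_pad + 1):
        if max_lines_per_set(rows, cols, leading_dim + pad, **cache) <= ways:
            return pad
    return None


if __name__ == "__main__":
    # A 16 x 8 tile of doubles is only 1 KiB of data, yet with a leading
    # dimension of 512 doubles every tile row starts in the same set.
    print("unpadded:", max_lines_per_set(16, 8, 512))
    print("padded by 8:", max_lines_per_set(16, 8, 520))
    print("smallest pad found:", smallest_conflict_free_pad(16, 8, 512))
```

With an unpadded leading dimension of 512 doubles, each row of the tile is exactly 64 cache lines apart, so all 16 rows compete for one 8-way set even though the tile is far smaller than the cache. Changing the leading dimension spreads the rows across different sets. An exhaustive search like this is only practical for toy cases.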
Effective padding of multidimensional arrays to avoid cache conflict misses
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation