Abstract
A program can benefit from improved cache block utilization when contemporaneously accessed data elements are placed in the same memory block. This placement can reduce the program's memory block working set and thereby reduce the capacity miss rate. We formally define the data packing problem for an arbitrary number of cache blocks and an arbitrary packing factor (the number of data objects that fit in a cache block), and we study how well the optimal solution can be approximated for two dual problems. On the one hand, we show that the cache hit maximization problem is approximable within a constant factor for every fixed number of blocks in the cache. On the other hand, we show that, unless P=NP, the cache miss minimization problem cannot be efficiently approximated.
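To make the objective concrete, the following minimal sketch counts cache misses for a given packing of objects into blocks. It is an illustration only, not the paper's formal model: the fully associative LRU replacement policy, the function name `count_misses`, and the toy trace are all assumptions introduced here.

```python
from collections import OrderedDict


def count_misses(trace, packing, cache_blocks):
    """Count misses for an access trace under a given packing.

    Assumptions (not taken from the paper): the cache is fully
    associative, holds `cache_blocks` blocks, and evicts with LRU.
    `packing` maps each data object to the id of its memory block.
    """
    cache = OrderedDict()  # block ids, ordered least to most recently used
    misses = 0
    for obj in trace:
        block = packing[obj]
        if block in cache:
            cache.move_to_end(block)       # hit: mark block as most recent
        else:
            misses += 1                    # miss: block must be loaded
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
            cache[block] = None
    return misses


# Hypothetical trace over four objects, packing factor 2, one cache block.
trace = list("abcdabcdabcd")
together = {"a": 0, "b": 0, "c": 1, "d": 1}  # neighbours in the trace share a block
apart = {"a": 0, "c": 0, "b": 1, "d": 1}     # neighbours in the trace are split

print(count_misses(trace, together, 1))  # 6 misses, so 6 hits
print(count_misses(trace, apart, 1))     # 12 misses, so 0 hits
```

On this toy trace the two packings differ by a factor of two in misses but by 6 versus 0 in hits, which hints at why the hit maximization and miss minimization views can behave differently under approximation.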
The hardness of data packing
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages