ABSTRACT
In POPL 2002, Petrank and Rawitz showed a universal result---finding optimal data placement is not only NP-hard but also impossible to approximate within a constant factor if P ≠ NP. Here we study a recently published concept called reference affinity, which characterizes a group of data that are always accessed together in computation. On the theoretical side, we give the complexity for finding reference affinity in program traces, using a novel reduction that converts the notion of distance into satisfiability. We also prove that reference affinity automatically captures the hierarchical locality in divide-and-conquer computations including matrix solvers and N-body simulation. The proof establishes formal links between computation patterns in time and locality relations in space.On the practical side, we show that efficient heuristics exist. In particular, we present a sampling method and show that it is more effective than the previously published technique, especially for data that are often but not always accessed together. We show the effect on generated and real traces. These theoretical and empirical results demonstrate that effective data placement is still attainable in general-purpose programs because common (albeit not all) locality patterns can be precisely modeled and efficiently analyzed.
- A. Aggarwal, B. Alpern, A. Chandra, and M. Snir. A model for hierarchical memory. In Proceedings of the ACM Conference on Theory of Computing, New York, NY, 1987. Google Scholar
Digital Library
- B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12(2/3):72--109, 1994.Google Scholar
Digital Library
- M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious b-trees. In Proceedings of Symposium on Foundations of Computer Science, November 2000. Google Scholar
Digital Library
- K. Beyls and E. D'Hollander. Reuse distance-based cache hint selection. In Proceedings of the 8th International Euro-Par Conference, Paderborn, Germany, August 2002. Google Scholar
Digital Library
- B. Calder, C. Krintz, S. John, and T. Austin. Cache-conscious data placement. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, Oct 1998. Google Scholar
Digital Library
- D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2005. Google Scholar
Digital Library
- S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi. Nonlinear array layouts for hierarchical memory systems. In Proceedings of International Conference on Supercomputing, 1999. Google Scholar
Digital Library
- T. M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001. Google Scholar
Digital Library
- T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999. Google Scholar
Digital Library
- E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, May 1992. Google Scholar
Digital Library
- A. Darte. On the complexity of loop fusion. Parallel Computing, 26(9):1175--1193, 2000. Google Scholar
Digital Library
- C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999. Google Scholar
Digital Library
- C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1):108--134, 2004. Google Scholar
Digital Library
- C. Ding and Y. Zhong. Predicting whole-program locality with reuse distance analysis. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003. Google Scholar
Digital Library
- C. Fang, S. Carr, S. Onder, and Z. Wang. Instruction based memory distance analysis and its application to optimization. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, St. Louis, MO, 2005. Google Scholar
Digital Library
- J. D. Frens and D. S. Wise. Auto-blocking matrix-multiplication or tracking BLAS3 performance with source code. In Proceedings of the ACM SIGPLAN Symposium on Principles Practice of Parallel Programming, Las Vegas, NV, 1997. Google Scholar
Digital Library
- J. D. Frens and D. S. Wise. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism. In Proceedings of the ACM SIGPLAN Symposium on Principles Practice of Parallel Programming, San Diego, CA, 2003. Google Scholar
Digital Library
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of Symposium on Foundations of Computer Science, October 1999. Google Scholar
Digital Library
- N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Transactions on Programming Languages and Systems, 21(5), September 1999. Google Scholar
Digital Library
- H. Han and C. W. Tseng. Locality optimizations for adaptive irregular scientific codes. Technical report, Department of Computer Science, University of Maryland, College Park, 2000.Google Scholar
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000. Google Scholar
Digital Library
- R. E. Hank, W. W. Hwu, and B. R. Rau. Region-based compilation: An introduction and motivation. In Proceedings of the Annual International Symposium on Microarchitecture, 1995. Google Scholar
Digital Library
- K. Kennedy and U. Kremer. Automatic data layout for distributed memory machines. ACM Transactions on Programming Languages and Systems, 20(4), 1998. Google Scholar
Digital Library
- K. Kennedy and K. S. McKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report TR93-208, Dept. of Computer Science, Rice University, Aug. 1993. (also available as CRPC-TR94370).Google Scholar
- I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In Proceedings of the SIGPLAN '97 Conference on Programming Language Design and Implementation, Las Vegas, NV, June 1997. Google Scholar
Digital Library
- G. Marin and J. Mellor-Crummey. Cross architecture performance predictions for scientific applications using parameterized models. In Proceedings of Joint International Conference on Measurement and Modeling of Computer Systems, New York City, NY, June 2004. Google Scholar
Digital Library
- G. Marin and J. Mellor-Crummey. Scalable cross-architecture predictions of memory hierarchy response for scientific applications. In Proceedings of the Symposium of the Las Alamos Computer Science Institute, Sante Fe, New Mexico, 2005.Google Scholar
- R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM System Journal, 9(2):78--117, 1970.Google Scholar
Digital Library
- K. S. McKinley, S. Carr, and C.-W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424--453, July 1996. Google Scholar
Digital Library
- J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. International Journal of Parallel Programming, 29(3), June 2001. Google Scholar
Digital Library
- C. H. Papadimitriou. Computational Complexity. Addison Wesley, 1994.Google Scholar
- E. Petrank and D. Rawitz. The hardness of cache conscious data placement. In Proceedings of ACM Symposium on Principles of Programming Languages, Portland, Oregon, January 2002. Google Scholar
Digital Library
- X. Shen, Y. Zhong, and C. Ding. Regression-based multi-model prediction of data reuse signature. In Proceedings of the 4th Annual Symposium of the Las Alamos Computer Science Institute, Sante Fe, New Mexico, November 2003.Google Scholar
- X. Shen, Y. Zhong, and C. Ding. Locality phase prediction. In Proceedings of the Eleventh International Conference on Architect ural Support for Programming Languages and Operating Systems (ASPLOS XI), Boston, MA, 2004. Google Scholar
Digital Library
- M. Snir and J. Yu. On the theory of spatial and temporal locality. Technical Report DCS-R-2005-2564, Computer Science Dept., Univ. of Illinois at Urbana-Champaign, July 2005.Google Scholar
- M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003. Google Scholar
Digital Library
- M. M. Strout and P. Hovland. Metrics and models for reordering transformations. In Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance, Washington DC, June 2004. Google Scholar
Digital Library
- K. O. Thabit. Cache Management by the Compiler. PhD thesis, Dept. of Computer Science, Rice University, 1981. Google Scholar
Digital Library
- B. S. White, S. A. McKee, B. R. de Supinski, B. Miller, D. Quinlan, and M. Schulz. Improving the computational intensity of unstructured mesh applications. In Proceedings of the 19th ACM International Conference on Supercomputing, Cambridge, MA, June 2005. Google Scholar
Digital Library
- M. E. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991. Google Scholar
Digital Library
- Q. Yi, V. Adve, and K. Kennedy. Transforming loops to recursion for multi-level memory hierarchies. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Vancouver, Canada, June 2000. Google Scholar
Digital Library
- C. Zhang, Y. Zhong, C. Ding, and M. Ogihara. Finding reference affinity groups in trace using sampling method. Technical Report TR 842, Department of Computer Science, University of Rochester, July 2004. presented at the 3rd Workshop on Mining Temporal and Sequential Data, in conjunction with ACM SIGKDD 2004.Google Scholar
- Y. Zhong, S. G. Dropsho, and C. Ding. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003. Google Scholar
Digital Library
- Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2004. Google Scholar
Digital Library
Index Terms
A hierarchical model of data locality
Recommendations
A hierarchical model of data locality
Proceedings of the 2006 POPL ConferenceIn POPL 2002, Petrank and Rawitz showed a universal result---finding optimal data placement is not only NP-hard but also impossible to approximate within a constant factor if P ≠ NP. Here we study a recently published concept called reference affinity, ...
Locality approximation using time
POPL '07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languagesReuse distance (i.e. LRU stack distance) precisely characterizes program locality and has been a basic tool for memory system research since the 1970s. However, the high cost of measuring has restricted its practical uses in performance debugging, ...
Locality approximation using time
Proceedings of the 2007 POPL ConferenceReuse distance (i.e. LRU stack distance) precisely characterizes program locality and has been a basic tool for memory system research since the 1970s. However, the high cost of measuring has restricted its practical uses in performance debugging, ...







Comments