DOI: 10.1145/1111037.1111040 (POPL conference proceedings)
Article

A hierarchical model of data locality

Published: 11 January 2006

ABSTRACT

In POPL 2002, Petrank and Rawitz showed a universal result: finding optimal data placement is not only NP-hard but also impossible to approximate within a constant factor if P ≠ NP. Here we study a recently published concept called reference affinity, which characterizes a group of data that are always accessed together in computation. On the theoretical side, we give the complexity for finding reference affinity in program traces, using a novel reduction that converts the notion of distance into satisfiability. We also prove that reference affinity automatically captures the hierarchical locality in divide-and-conquer computations including matrix solvers and N-body simulation. The proof establishes formal links between computation patterns in time and locality relations in space.

On the practical side, we show that efficient heuristics exist. In particular, we present a sampling method and show that it is more effective than the previously published technique, especially for data that are often but not always accessed together. We show the effect on generated and real traces. These theoretical and empirical results demonstrate that effective data placement is still attainable in general-purpose programs because common (albeit not all) locality patterns can be precisely modeled and efficiently analyzed.
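To make the idea of data that are "always accessed together" concrete, the following is a minimal sketch under simplifying assumptions. It is not the paper's formal definition of reference affinity, nor its sampling method. It assumes a hypothetical closeness measure (the number of distinct elements accessed between two trace positions) and groups two elements when every access to one has an access to the other within a threshold `k`. The names `volume_distance`, `always_close`, and `affinity_groups` are illustrative only.

```python
from itertools import combinations


def volume_distance(trace, i, j):
    """Number of distinct elements accessed strictly between positions i and j."""
    lo, hi = sorted((i, j))
    return len(set(trace[lo + 1:hi]))


def always_close(trace, x, y, k):
    """True if every access to x has some access to y within volume distance k."""
    xs = [i for i, e in enumerate(trace) if e == x]
    ys = [j for j, e in enumerate(trace) if e == y]
    return bool(xs) and bool(ys) and all(
        any(volume_distance(trace, i, j) <= k for j in ys) for i in xs
    )


def affinity_groups(trace, k):
    """Group elements by the mutual 'always close' relation (simplified, assumed)."""
    elems = sorted(set(trace))
    parent = {e: e for e in elems}

    def find(e):
        # Union-find lookup with path halving.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for x, y in combinations(elems, 2):
        if always_close(trace, x, y, k) and always_close(trace, y, x, k):
            parent[find(x)] = find(y)

    groups = {}
    for e in elems:
        groups.setdefault(find(e), []).append(e)
    return sorted(groups.values())


trace = list("abxabycdcdab")
print(affinity_groups(trace, 0))   # [['a', 'b'], ['c', 'd'], ['x'], ['y']]
print(affinity_groups(trace, 10))  # [['a', 'b', 'c', 'd', 'x', 'y']]
```

In this toy trace, the strict threshold `k = 0` separates the pairs that always appear side by side, while a loose threshold merges everything into one group. Sweeping `k` from small to large therefore yields nested groups, which gives an informal sense of the hierarchy the abstract refers to. The brute-force pairwise check is only suitable for very short traces.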

References

  1. A. Aggarwal, B. Alpern, A. Chandra, and M. Snir. A model for hierarchical memory. In Proceedings of the ACM Conference on Theory of Computing, New York, NY, 1987.
  2. B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12(2/3):72–109, 1994.
  3. M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In Proceedings of Symposium on Foundations of Computer Science, November 2000.
  4. K. Beyls and E. D'Hollander. Reuse distance-based cache hint selection. In Proceedings of the 8th International Euro-Par Conference, Paderborn, Germany, August 2002.
  5. B. Calder, C. Krintz, S. John, and T. Austin. Cache-conscious data placement. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, Oct 1998.
  6. D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2005.
  7. S. Chatterjee, V. V. Jain, A. R. Lebeck, S. Mundhra, and M. Thottethodi. Nonlinear array layouts for hierarchical memory systems. In Proceedings of International Conference on Supercomputing, 1999.
  8. T. M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001.
  9. T. M. Chilimbi, M. D. Hill, and J. R. Larus. Cache-conscious structure layout. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.
  10. E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiway cuts. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, May 1992.
  11. A. Darte. On the complexity of loop fusion. Parallel Computing, 26(9):1175–1193, 2000.
  12. C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999.
  13. C. Ding and K. Kennedy. Improving effective bandwidth through compiler enhancement of global cache reuse. Journal of Parallel and Distributed Computing, 64(1):108–134, 2004.
  14. C. Ding and Y. Zhong. Predicting whole-program locality with reuse distance analysis. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.
  15. C. Fang, S. Carr, S. Onder, and Z. Wang. Instruction based memory distance analysis and its application to optimization. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, St. Louis, MO, 2005.
  16. J. D. Frens and D. S. Wise. Auto-blocking matrix-multiplication or tracking BLAS3 performance with source code. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Las Vegas, NV, 1997.
  17. J. D. Frens and D. S. Wise. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, 2003.
  18. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of Symposium on Foundations of Computer Science, October 1999.
  19. N. Gloy and M. D. Smith. Procedure placement using temporal-ordering information. ACM Transactions on Programming Languages and Systems, 21(5), September 1999.
  20. H. Han and C. W. Tseng. Locality optimizations for adaptive irregular scientific codes. Technical report, Department of Computer Science, University of Maryland, College Park, 2000.
  21. J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
  22. R. E. Hank, W. W. Hwu, and B. R. Rau. Region-based compilation: An introduction and motivation. In Proceedings of the Annual International Symposium on Microarchitecture, 1995.
  23. K. Kennedy and U. Kremer. Automatic data layout for distributed memory machines. ACM Transactions on Programming Languages and Systems, 20(4), 1998.
  24. K. Kennedy and K. S. McKinley. Typed fusion with applications to parallel and sequential code generation. Technical Report TR93-208, Dept. of Computer Science, Rice University, Aug. 1993. (Also available as CRPC-TR94370.)
  25. I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In Proceedings of the SIGPLAN '97 Conference on Programming Language Design and Implementation, Las Vegas, NV, June 1997.
  26. G. Marin and J. Mellor-Crummey. Cross architecture performance predictions for scientific applications using parameterized models. In Proceedings of Joint International Conference on Measurement and Modeling of Computer Systems, New York City, NY, June 2004.
  27. G. Marin and J. Mellor-Crummey. Scalable cross-architecture predictions of memory hierarchy response for scientific applications. In Proceedings of the Symposium of the Los Alamos Computer Science Institute, Santa Fe, New Mexico, 2005.
  28. R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78–117, 1970.
  29. K. S. McKinley, S. Carr, and C.-W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424–453, July 1996.
  30. J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. International Journal of Parallel Programming, 29(3), June 2001.
  31. C. H. Papadimitriou. Computational Complexity. Addison Wesley, 1994.
  32. E. Petrank and D. Rawitz. The hardness of cache conscious data placement. In Proceedings of ACM Symposium on Principles of Programming Languages, Portland, Oregon, January 2002.
  33. X. Shen, Y. Zhong, and C. Ding. Regression-based multi-model prediction of data reuse signature. In Proceedings of the 4th Annual Symposium of the Los Alamos Computer Science Institute, Santa Fe, New Mexico, November 2003.
  34. X. Shen, Y. Zhong, and C. Ding. Locality phase prediction. In Proceedings of the Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XI), Boston, MA, 2004.
  35. M. Snir and J. Yu. On the theory of spatial and temporal locality. Technical Report DCS-R-2005-2564, Computer Science Dept., Univ. of Illinois at Urbana-Champaign, July 2005.
  36. M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.
  37. M. M. Strout and P. Hovland. Metrics and models for reordering transformations. In Proceedings of the 2nd ACM SIGPLAN Workshop on Memory System Performance, Washington DC, June 2004.
  38. K. O. Thabit. Cache Management by the Compiler. PhD thesis, Dept. of Computer Science, Rice University, 1981.
  39. B. S. White, S. A. McKee, B. R. de Supinski, B. Miller, D. Quinlan, and M. Schulz. Improving the computational intensity of unstructured mesh applications. In Proceedings of the 19th ACM International Conference on Supercomputing, Cambridge, MA, June 2005.
  40. M. E. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991.
  41. Q. Yi, V. Adve, and K. Kennedy. Transforming loops to recursion for multi-level memory hierarchies. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Vancouver, Canada, June 2000.
  42. C. Zhang, Y. Zhong, C. Ding, and M. Ogihara. Finding reference affinity groups in trace using sampling method. Technical Report TR 842, Department of Computer Science, University of Rochester, July 2004. Presented at the 3rd Workshop on Mining Temporal and Sequential Data, in conjunction with ACM SIGKDD 2004.
  43. Y. Zhong, S. G. Dropsho, and C. Ding. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.
  44. Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2004.
