research-article

Identifying the sources of cache misses in Java programs without relying on hardware counters

Published: 15 June 2012

Abstract

Cache miss stalls are one of the major sources of performance bottlenecks for multicore processors. A Hardware Performance Monitor (HPM) in the processor is useful for locating the cache misses, but is rarely used in the real world for various reasons. It would be better to find a simple approach to locate the sources of cache misses and apply runtime optimizations without relying on an HPM. This paper shows that pointer dereferencing in hot loops is a major source of cache misses in Java programs. Based on this observation, we devised a new approach to identify the instructions and objects that cause frequent cache misses. Our heuristic technique effectively identifies the majority of the cache misses in typical Java programs by matching the hot loops to simple idiomatic code patterns. On average, our technique selected only 2.8% of the load and store instructions generated by the JIT compiler and these instructions accounted for 47% of the L1D cache misses and 49% of the L2 cache misses caused by the JIT-compiled code. To prove the effectiveness of our technique in compiler optimizations, we prototyped object placement optimizations, which align objects in cache lines or collocate paired objects in the same cache line to reduce cache misses. For comparison, we also implemented the same optimizations based on the accurate information obtained from the HPM. Our results showed that our heuristic approach was as effective as the HPM-based approach and achieved comparable performance improvements in the SPECjbb2005 and SPECpower_ssj2008 benchmark programs.
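The object placement idea in the abstract (aligning an object to a cache line, or collocating a pair of objects in one line) comes down to simple address arithmetic. The following is a minimal sketch of that arithmetic only, not the authors' implementation. The class name `CacheLineSketch`, its method names, and the 128-byte line size are assumptions made for illustration; the abstract does not state a line size, and a real JVM would apply this logic inside its allocator or garbage collector rather than on plain `long` values.

```java
// Hypothetical helper for illustration; none of these names come from the paper.
final class CacheLineSketch {
    // Assumed cache line size in bytes (must be a power of two).
    // The real value depends on the target processor.
    static final long LINE_SIZE = 128;

    // Round an address up to the next cache line boundary.
    static long alignUp(long address) {
        return (address + LINE_SIZE - 1) & ~(LINE_SIZE - 1);
    }

    // Number of cache lines touched by an object of `size` bytes
    // (size > 0) that starts at `address`.
    static long linesTouched(long address, long size) {
        long first = address / LINE_SIZE;
        long last = (address + size - 1) / LINE_SIZE;
        return last - first + 1;
    }

    // True if two objects laid out back to back starting at `address`
    // both fit inside a single cache line.
    static boolean collocated(long address, long sizeA, long sizeB) {
        return linesTouched(address, sizeA + sizeB) == 1;
    }
}
```

Under these assumptions, a 48-byte object starting at address 100 straddles two lines, while the same object moved to the aligned address 128 touches only one, which is the effect the alignment optimization aims for.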


    • Published in

      ACM SIGPLAN Notices, Volume 47, Issue 11
      ISMM '12
      November 2012
      136 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2426642

      ISMM '12: Proceedings of the 2012 International Symposium on Memory Management
      June 2012
      152 pages
      ISBN: 9781450313506
      DOI: 10.1145/2258996

      Copyright © 2012 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

