research-article

How a Java VM can get more from a hardware performance monitor

Published: 25 October 2009

Abstract

This paper describes our sampling-based profiler, which exploits a processor's HPM (Hardware Performance Monitor) to collect information on running Java applications for use by the Java VM. Our profiler provides two novel features: Java-level event profiling and lightweight context-sensitive event profiling. For Java events, we propose new techniques that leverage the sampling facility of the HPM to generate object creation profiles and lock activity profiles. HPM sampling is the key to achieving a smaller overhead than profilers that do not rely on hardware support. Because the HPM can sample only hardware events such as executed instructions or cache misses, we sample object creations by correlating them with the store instructions that write Java object headers. For the lock activity profile, we introduce an instrumentation-based technique called ProbeNOP, which uses a special NOP instruction whose executions are counted by the HPM. For context-sensitive event profiling, we propose a new technique called CallerChaining, which detects the calling context of HPM events based on the call stack depth (the value of the stack frame pointer). We show that it can detect the calling contexts in many programs, including a large commercial application. Our proposed techniques enable both programmers and runtime systems to get more valuable information from the HPM to understand and optimize programs without adding significant runtime overhead.
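
The abstract states only that CallerChaining identifies a calling context from the call stack depth. As a rough illustration of that idea, and not the paper's implementation, the sketch below assumes a table that is trained by occasional full stack walks and later consulted with just a method name and a stack depth. The class name DepthContextTable, the methods learn and resolve, and the use of a byte count for depth are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: maps (sampled method, stack depth) to a calling context.
// This is an assumption about how depth-based lookup could work, not the
// algorithm from the paper.
final class DepthContextTable {
    // Key is "method@depth"; value is every distinct caller chain seen for it.
    private final Map<String, Set<List<String>>> contexts = new HashMap<>();

    private static String key(String method, long stackDepth) {
        return method + "@" + stackDepth;
    }

    // Training step: record a caller chain obtained from a full stack walk.
    void learn(String method, long stackDepth, List<String> callers) {
        List<String> chain = Collections.unmodifiableList(new ArrayList<>(callers));
        contexts.computeIfAbsent(key(method, stackDepth), k -> new HashSet<>()).add(chain);
    }

    // Sampling step: resolve the context from depth alone.
    // Returns empty when the depth is unknown or maps to more than one chain.
    Optional<List<String>> resolve(String method, long stackDepth) {
        Set<List<String>> seen = contexts.get(key(method, stackDepth));
        if (seen == null || seen.size() != 1) {
            return Optional.empty();
        }
        return Optional.of(seen.iterator().next());
    }
}
```

In this sketch a lookup succeeds only when a given depth has been seen with exactly one caller chain; a depth shared by two different chains is reported as unresolved rather than guessed.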


Published in

ACM SIGPLAN Notices, Volume 44, Issue 10
OOPSLA '09
October 2009
554 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1639949

OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
October 2009
590 pages
ISBN: 9781605587660
DOI: 10.1145/1640089

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
