Abstract
This paper describes our sampling-based profiler, which exploits a processor's HPM (Hardware Performance Monitor) to collect information on running Java applications for use by the Java VM. Our profiler provides two novel features: Java-level event profiling and lightweight context-sensitive event profiling. For Java-level events, we propose new techniques that leverage the sampling facility of the HPM to generate object creation profiles and lock activity profiles. HPM sampling is the key to achieving lower overhead than profilers that do not rely on hardware support. The HPM can sample only hardware events such as executed instructions or cache misses, so to sample object creations we correlate them with the store instructions that write Java object headers. For the lock activity profile, we introduce an instrumentation-based technique called ProbeNOP, which uses a special NOP instruction whose executions are counted by the HPM. For context-sensitive event profiling, we propose a new technique called CallerChaining, which detects the calling context of HPM events based on the call stack depth (the value of the stack frame pointer). We show that it can detect the calling contexts in many programs, including a large commercial application. Our proposed techniques enable both programmers and runtime systems to obtain more valuable information from the HPM to understand and optimize programs without adding significant runtime overhead.
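The idea behind CallerChaining, inferring a calling context from the stack depth observed at a sample, can be illustrated with a toy model. The sketch below is not the profiler's implementation: it assumes that every method has a fixed frame size, that the stack depth of a call path is the sum of those frame sizes, and that all candidate paths end in the same sampled method. The class name, method names, and frame sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class CallerChainingSketch {
    // Hypothetical per-method frame sizes in bytes (illustrative values only).
    static final Map<String, Integer> FRAME_SIZE = Map.of(
            "main", 64, "handleA", 96, "handleB", 128, "alloc", 48);

    // Assumption: the stack depth of a call path is the sum of its frame sizes.
    static int depthOf(List<String> path) {
        int depth = 0;
        for (String method : path) {
            depth += FRAME_SIZE.get(method);
        }
        return depth;
    }

    // Index the known call paths by the stack depth each one would produce.
    static Map<Integer, List<List<String>>> indexByDepth(List<List<String>> paths) {
        Map<Integer, List<List<String>>> index = new HashMap<>();
        for (List<String> path : paths) {
            index.computeIfAbsent(depthOf(path), d -> new ArrayList<>()).add(path);
        }
        return index;
    }

    // Report a calling context only when exactly one known path matches the depth.
    static Optional<List<String>> infer(Map<Integer, List<List<String>>> index,
                                        int sampledDepth) {
        List<List<String>> candidates = index.getOrDefault(sampledDepth, List.of());
        return candidates.size() == 1
                ? Optional.of(candidates.get(0))
                : Optional.empty();
    }

    public static void main(String[] args) {
        List<List<String>> paths = List.of(
                List.of("main", "handleA", "alloc"),
                List.of("main", "handleB", "alloc"));
        Map<Integer, List<List<String>>> index = indexByDepth(paths);
        System.out.println(infer(index, depthOf(paths.get(0))));
        System.out.println(infer(index, 999));
    }
}
```

In this model a sample is attributed to a context only when its depth matches exactly one known path; two paths with equal depth are left unresolved rather than guessed.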
How a Java VM can get more from a hardware performance monitor
OOPSLA '09: Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications