ABSTRACT
Hardware performance monitors provide detailed, direct feedback about application behavior and are an additional source of information that a compiler may use for optimization. A JIT compiler is in a good position to make use of such information because it runs on the same platform as the user applications. As hardware platforms grow more complex, modeling their behavior becomes increasingly difficult. Profile information that captures general program properties (like the execution frequency of methods or basic blocks) may be useful, but it does not capture sufficient information about the execution platform. Machine-level performance data obtained from a hardware performance monitor can not only direct the compiler to those parts of the program that deserve its attention but also determine whether an optimization step actually improved the performance of the application.
This paper presents an infrastructure, based on a combined dynamic compiler and runtime environment for Java, that incorporates machine-level information as an additional kind of feedback for the compiler and runtime environment. The low-overhead monitoring system provides fine-grained performance data that can be traced back to individual Java bytecode instructions. As an example, the paper presents results for object co-allocation in a generational garbage collector that optimizes the spatial locality of objects on-line using measurements of cache misses. In the best case, the execution time is reduced by 14% and L1 cache misses by 28%.
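To make the co-allocation idea more concrete, the following is a minimal sketch of how sampled cache-miss events, once attributed to a reference field, might be aggregated to suggest which referenced object to place next to its parent. The abstract does not describe the actual data structures or policy, so the class `CoAllocationAdvisor`, its methods, and the `minSamples` threshold are hypothetical names and assumptions made purely for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch (not the paper's implementation): counts sampled
 * cache misses per reference field and proposes a co-allocation candidate.
 */
class CoAllocationAdvisor {
    // owner class name -> (reference field name -> number of sampled misses)
    private final Map<String, Map<String, Integer>> misses = new HashMap<>();
    // Assumed threshold: ignore fields with fewer samples than this.
    private final int minSamples;

    CoAllocationAdvisor(int minSamples) {
        this.minSamples = minSamples;
    }

    /** Records one sampled miss attributed to a load through ownerClass.field. */
    void recordMiss(String ownerClass, String field) {
        misses.computeIfAbsent(ownerClass, k -> new HashMap<>())
              .merge(field, 1, Integer::sum);
    }

    /** Returns the field with the most sampled misses, if it reached minSamples. */
    Optional<String> fieldToCoAllocate(String ownerClass) {
        Map<String, Integer> perField =
                misses.getOrDefault(ownerClass, Collections.emptyMap());
        return perField.entrySet().stream()
                .filter(e -> e.getValue() >= minSamples)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

In such a design, a copying collector could consult `fieldToCoAllocate` when it promotes an object and copy the object referenced by the returned field directly after its parent. Whether the system described in the paper works this way is not stated in the abstract.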
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 PLDI conference