DOI: 10.1145/1250734.1250777
Article

Online optimizations driven by hardware performance monitoring

Published: 10 June 2007

ABSTRACT

Hardware performance monitors provide detailed direct feedback about application behavior and are an additional source of information that a compiler may use for optimization. A JIT compiler is in a good position to make use of such information because it is running on the same platform as the user applications. As hardware platforms become more and more complex, it becomes more and more difficult to model their behavior. Profile information that captures general program properties (like execution frequency of methods or basic blocks) may be useful, but does not capture sufficient information about the execution platform. Machine-level performance data obtained from a hardware performance monitor can not only direct the compiler to those parts of the program that deserve its attention but also determine if an optimization step actually improved the performance of the application.

This paper presents an infrastructure based on a dynamic compiler+runtime environment for Java that incorporates machine-level information as an additional kind of feedback for the compiler and runtime environment. The low-overhead monitoring system provides fine-grained performance data that can be tracked back to individual Java bytecode instructions. As an example, the paper presents results for object co-allocation in a generational garbage collector that optimizes spatial locality of objects on-line using measurements about cache misses. In the best case, the execution time is reduced by 14% and L1 cache misses by 28%.
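The abstract does not spell out how sampled cache misses are turned into a co-allocation decision, so the following is only a minimal sketch under stated assumptions. It supposes that each sampled L1 miss has already been mapped back to the reference field whose dereference caused it, and that the runtime then picks, per class, the field with the most sampled misses as the candidate to co-allocate with its parent object. The names `MissSample`, `pick_coallocation_fields`, and the `min_misses` threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class MissSample(NamedTuple):
    """One sampled cache-miss event, already mapped back to source level.

    Hypothetical record type: the paper's actual sample format is not
    described in the abstract.
    """
    owner_class: str  # class that declares the dereferenced reference field
    field: str        # reference field whose target object missed in the cache


def pick_coallocation_fields(
    samples: Iterable[MissSample], min_misses: int
) -> Dict[str, str]:
    """Return, for each class, the reference field with the most sampled misses.

    Fields with fewer than `min_misses` samples are ignored so that a few
    stray samples do not trigger a layout change (an assumed heuristic).
    On a tie, the field that was sampled first is kept.
    """
    counts = Counter(samples)  # (owner_class, field) -> number of samples
    best: Dict[str, Tuple[str, int]] = {}
    for (owner, field), n in counts.items():
        if n < min_misses:
            continue
        if owner not in best or n > best[owner][1]:
            best[owner] = (field, n)
    return {owner: field for owner, (field, _) in best.items()}


# Example: misses concentrate on String.value and Node.next.
samples = (
    [MissSample("String", "value")] * 5
    + [MissSample("Node", "next")] * 3
    + [MissSample("Node", "payload")] * 1
)
candidates = pick_coallocation_fields(samples, min_misses=2)
```

In this sketch a copying collector could consult `candidates` when it promotes an object and place the object referenced by the chosen field directly next to its parent. Whether the paper's system makes the decision this way is an assumption.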


    Published in

      PLDI '07: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2007
      508 pages
      ISBN:9781595936332
      DOI:10.1145/1250734
        ACM SIGPLAN Notices  Volume 42, Issue 6
        Proceedings of the 2007 PLDI conference
        June 2007
        491 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/1273442

      Copyright © 2007 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 406 of 2,067 submissions, 20%
