DOI: 10.1145/1542476.1542526
Research article

Binary analysis for measurement and attribution of program performance

Published: 15 June 2009

ABSTRACT

Modern programs frequently employ sophisticated modular designs. As a result, performance problems cannot be identified from costs attributed to routines in isolation; understanding code performance requires information about a routine's calling context. Existing performance tools fall short in this respect. Prior strategies for attributing context-sensitive performance at the source level either compromise measurement accuracy, remain too close to the binary, or require custom compilers. To understand the performance of fully optimized modular code, we developed two novel binary analysis techniques: 1) on-the-fly analysis of optimized machine code to enable minimally intrusive and accurate attribution of costs to dynamic calling contexts; and 2) post-mortem analysis of optimized machine code and its debugging sections to recover its program structure and reconstruct a mapping back to its source code. By combining the recovered static program structure with dynamic calling context information, we can accurately attribute performance metrics to calling contexts, procedures, loops, and inlined instances of procedures. We demonstrate that the fusion of this information provides unique insight into the performance of complex modular codes. This work is implemented in the HPCToolkit performance tools (http://hpctoolkit.org).
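The core data structure the abstract describes is a calling context tree (CCT): costs are attributed to full dynamic call paths, so the same routine accrues separate costs under each distinct caller chain. The following is a minimal illustrative sketch of that idea, not HPCToolkit's actual implementation; the class, routine names, and sample paths are hypothetical.

```python
# Minimal sketch of a calling context tree (CCT). Each sampled call path
# contributes cost to its full context, so a routine's cost is kept
# separate per calling context rather than aggregated in isolation.

class CCTNode:
    def __init__(self, name):
        self.name = name
        self.cost = 0          # samples attributed to exactly this context
        self.children = {}     # callee name -> CCTNode

    def record(self, call_path, cost=1):
        """Attribute `cost` to the context named by call_path (outermost frame first)."""
        node = self
        for frame in call_path:
            node = node.children.setdefault(frame, CCTNode(frame))
        node.cost += cost

    def inclusive_cost(self):
        """Cost of this context plus every context nested beneath it."""
        return self.cost + sum(c.inclusive_cost() for c in self.children.values())

# Hypothetical call paths, as a sampling-based profiler might unwind them:
root = CCTNode("<root>")
root.record(["main", "solve", "sparse_mv"], cost=7)
root.record(["main", "setup", "sparse_mv"], cost=2)

# The same routine (sparse_mv) carries distinct costs in distinct contexts:
solve_ctx = root.children["main"].children["solve"].children["sparse_mv"]
setup_ctx = root.children["main"].children["setup"].children["sparse_mv"]
print(solve_ctx.cost, setup_ctx.cost, root.inclusive_cost())  # 7 2 9
```

A flat profile would report only sparse_mv's total of 9 samples; the context tree shows that most of its cost comes via solve, which is the distinction the paper's attribution techniques preserve for fully optimized binaries.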


Published in:
PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2009, 492 pages. ISBN: 9781605583921. DOI: 10.1145/1542476.
Also appears in: ACM SIGPLAN Notices, Volume 44, Issue 6 (PLDI '09), June 2009, 478 pages. ISSN: 0362-1340. EISSN: 1558-1160. DOI: 10.1145/1543135.

          Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 406 of 2,067 submissions, 20%
