Performance tuning with instruction-level cost derived from call-stack sampling

Published: 01 August 2007

Abstract

Except for program-counter histogramming, most modern profiling tools summarize at the level of entire functions or basic blocks, with or without additional information such as calling context or call graphs. This paper explicates the value of information about the cost of specific instructions, relative to summaries that do not include it. A good source of this information is time-random sampling of the call stack. To get the diagnostic benefit of instruction costs, it is not necessary to measure them with high precision or efficiency. In fact, manual sampling suffices quite well when it can be used. Other benefits of call-stack sampling are that it can be used with unmodified software and libraries, and that it is easily confined to the time intervals of interest. As with other profiling techniques, it can be employed repeatedly to remove all significant performance problems in single-thread programs.
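
As a rough illustration of how instruction-level cost might be derived from call-stack samples, the following minimal Python sketch takes a handful of stack samples and estimates each instruction's cost as the fraction of samples in which it appears anywhere on the stack. That definition of cost is an assumption made for this sketch; the abstract does not spell it out. The function name `instruction_costs`, the `(function, line)` frame representation, and the sample data are all hypothetical.

```python
from collections import Counter


def instruction_costs(samples):
    """Estimate per-instruction cost from call-stack samples.

    Each sample is a list of (function, line) frames, outermost first.
    Cost is taken here (an assumption of this sketch) to be the fraction
    of samples in which a given location appears on the stack.
    """
    counts = Counter()
    for stack in samples:
        # Count each location once per sample, even if recursion repeats it.
        for location in set(stack):
            counts[location] += 1
    total = len(samples)
    if total == 0:
        return {}
    return {location: n / total for location, n in counts.items()}


# Hypothetical samples, as might be collected by pausing a program by hand.
samples = [
    [("main", 10), ("load", 42), ("parse", 7)],
    [("main", 10), ("load", 42), ("parse", 7)],
    [("main", 10), ("load", 42), ("parse", 9)],
    [("main", 12), ("report", 3)],
]

costs = instruction_costs(samples)
for (function, line), cost in sorted(costs.items(), key=lambda kv: -kv[1]):
    print(f"{function}:{line}  {cost:.0%}")
```

With only four samples the estimates are coarse, but the call at `main:10` appearing in three of four samples already marks it as a candidate worth examining, which is in the spirit of the abstract's remark that high precision is not required.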

