Statistical debugging for real-world performance problems

Research article. Published: 15 October 2014.

Abstract

Design and implementation defects that lead to inefficient computation are widespread in software. These defects are difficult to avoid and to discover. They cause severe performance degradation and energy waste during production runs, and they are becoming increasingly critical as single-core hardware performance improves only slowly and energy constraints draw growing concern. Effective tools that diagnose performance problems and point out the root cause of the inefficiency are sorely needed.

The state of the art of performance diagnosis is preliminary. Profiling can identify the functions that consume the most computation resources, but it can neither identify the ones that waste the most resources nor explain why. Performance-bug detectors can identify specific types of inefficient computation, but they are not suited to diagnosing general performance problems. Effective failure-diagnosis techniques, such as statistical debugging, have been proposed for functional bugs, but whether they work for performance problems remains an open question.

In this paper, we first conduct an empirical study of how performance problems are observed and reported by real-world users. Our study shows that statistical debugging is a natural fit for diagnosing performance problems, which are often observed through comparison-based approaches and reported together with both good and bad inputs. We then thoroughly investigate different design points in statistical debugging, including three types of predicates and two types of statistical models, to understand which design point works best for performance diagnosis. Finally, we study how certain unique characteristics of performance bugs allow sampling techniques to lower the overhead of run-time performance diagnosis without lengthening diagnosis latency.
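The core idea behind applying statistical debugging here, ranking predicates by how strongly their truth correlates with slow ("bad") runs versus fast ("good") ones, can be sketched as follows. This is a minimal illustration in the style of the classic Increase metric from statistical debugging, not the paper's exact predicates or models, and all run counts below are hypothetical.

```python
def increase_score(s_obs, f_obs, s_true, f_true):
    """Classic statistical-debugging Increase metric (illustrative sketch).

    s_obs / f_obs  : good (fast) / bad (slow) runs where predicate P was observed
    s_true / f_true: good / bad runs where P was actually true

    Increase(P) = P(bad | P true) - P(bad | P observed).
    A large positive value means P being true correlates with the
    performance problem beyond P merely being reached.
    """
    context = f_obs / (s_obs + f_obs)      # baseline probability of a bad run
    failure = f_true / (s_true + f_true)   # probability of a bad run when P holds
    return failure - context

# Hypothetical counts over 10 good and 10 bad runs: one branch predicate is
# true only in slow runs, another is true in every run regardless of outcome.
suspicious = increase_score(s_obs=10, f_obs=10, s_true=0, f_true=10)   # 0.5
benign     = increase_score(s_obs=10, f_obs=10, s_true=10, f_true=10)  # 0.0
```

Ranking predicates by such a score (or feeding per-run predicate samples into a statistical model) is the kind of design space the paper explores, with "slow run" playing the role that "failing run" plays for functional bugs.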


Published in:
ACM SIGPLAN Notices, Volume 49, Issue 10 (OOPSLA '14), October 2014, 907 pages. ISSN: 0362-1340; EISSN: 1558-1160. DOI: 10.1145/2714064. Editor: Andy Gill.

Also appears in OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, October 2014, 946 pages. ISBN: 9781450325851. DOI: 10.1145/2660193.

Copyright © 2014 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.
