research-article

Bottle graphs: visualizing scalability bottlenecks in multi-threaded applications

Published: 29 October 2013

Abstract

Understanding and analyzing multi-threaded program performance and scalability is far from trivial, which severely complicates parallel software development and optimization. In this paper, we present bottle graphs, a powerful analysis tool that visualizes multi-threaded program performance with regard to both per-thread parallelism and execution time. Each thread is represented as a box whose height equals that thread's share of the total program execution time, whose width equals its parallelism, and whose area equals its total running time. The boxes of all threads are stacked upon each other, so the height of the stack equals the total program execution time. Bottle graphs show exactly how scalable each thread is, and thus guide optimization toward the threads that have a smaller parallel component (narrower boxes) and a larger share of the total execution time (taller boxes), i.e., toward the 'neck' of the bottle.
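The box geometry described above can be illustrated with a minimal sketch. It assumes execution has already been split into intervals during which the set of running threads is constant, and it assumes each interval's duration is divided equally among the threads running in it. This is one plausible attribution rule that makes the box heights sum to the total execution time; it is not necessarily the authors' exact implementation. The names `Box` and `bottle_graph`, the input format, and the stacking order are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Box:
    thread: str
    height: float        # thread's share of total execution time
    width: float         # thread's parallelism (running_time / height)
    running_time: float  # box area: height * width


def bottle_graph(intervals):
    """Compute bottle-graph boxes.

    intervals: iterable of (duration, running_threads), where running_threads is
    the set of threads running during that interval (assumed input format).
    """
    share = defaultdict(float)    # per-thread height
    running = defaultdict(float)  # per-thread total running time
    for duration, threads in intervals:
        if duration <= 0 or not threads:
            continue  # nothing to attribute (assumption: idle time is ignored)
        n = len(threads)
        for t in threads:
            share[t] += duration / n  # assumed: split interval equally
            running[t] += duration
    boxes = [Box(t, share[t], running[t] / share[t], running[t]) for t in share]
    # Hypothetical ordering: widest boxes at the bottom, narrow ones on top (the "neck").
    boxes.sort(key=lambda b: b.width, reverse=True)
    return boxes


if __name__ == "__main__":
    # main runs alone for 2 s, then with three workers for 4 s, then workers alone for 3 s.
    example = [
        (2.0, {"main"}),
        (4.0, {"main", "w1", "w2", "w3"}),
        (3.0, {"w1", "w2", "w3"}),
    ]
    for box in bottle_graph(example):
        print(box)
```

In this example, `main` gets a height of 2 + 4/4 = 3 s over 6 s of running time (width 2), and each worker gets 4/4 + 3/3 = 2 s over 7 s (width 3.5). The heights sum to the 9 s of total execution time, and `main` is the narrowest box.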

Using light-weight OS modules, we calculate bottle graphs for unmodified multi-threaded programs running on real processors, with an average overhead of 0.68%. To demonstrate their utility, we perform an extensive analysis of 12 Java benchmarks running on top of the Jikes JVM, which introduces many JVM service threads. We not only reveal and explain the scalability limitations of several well-known Java benchmarks, but also analyze why the garbage collector itself does not scale; in fact, it performs best with two collector threads for all benchmarks, regardless of the number of application threads. Finally, we compare the scalability of Jikes with that of the OpenJDK JVM. We demonstrate how useful and intuitive bottle graphs are as a tool for analyzing scalability and helping optimize multi-threaded applications.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 10 (OOPSLA '13), October 2013, 867 pages
ISSN: 0362-1340; EISSN: 1558-1160
DOI: 10.1145/2544173

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications, October 2013, 904 pages
ISBN: 9781450323741
DOI: 10.1145/2509136

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States