ABSTRACT
Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications, and experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework: the coverage of the sample space provided by purely random sampling is superior to that of counter- and timer-based sampling. Finally, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
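The three sampling policies named in the abstract (counter-based, timer-based, and purely random) can be contrasted with a small sketch. This is not PHOTON's implementation: the function names, the synthetic workload, the message tags, and the 1-in-8 sampling rate are all hypothetical, and the workload is deliberately periodic so that it shows one way a fixed-stride sampler can lose coverage. Each sampled event is folded into a histogram immediately, so no raw trace is retained.

```python
import random

def counter_sampler(n):
    """Hypothetical policy: sample every n-th message."""
    count = 0
    def should_sample(_now):
        nonlocal count
        count += 1
        return count % n == 0
    return should_sample

def timer_sampler(interval):
    """Hypothetical policy: sample the first message seen once `interval` has elapsed."""
    next_due = interval
    def should_sample(now):
        nonlocal next_due
        if now >= next_due:
            next_due = now + interval
            return True
        return False
    return should_sample

def random_sampler(p, seed=0):
    """Hypothetical policy: sample each message independently with probability p."""
    rng = random.Random(seed)
    def should_sample(_now):
        return rng.random() < p
    return should_sample

def profile(messages, should_sample):
    """Count sampled messages per tag; unsampled and raw events are discarded."""
    hist = {}
    for now, tag in messages:
        if should_sample(now):
            hist[tag] = hist.get(tag, 0) + 1
    return hist

# Synthetic, assumed workload: four message kinds in a fixed repeating order,
# one message per time unit.
TAGS = ["halo_x", "halo_y", "halo_z", "reduce"]
messages = [(t, TAGS[t % 4]) for t in range(4000)]

counter_hist = profile(messages, counter_sampler(8))
timer_hist = profile(messages, timer_sampler(8.0))
random_hist = profile(messages, random_sampler(1 / 8))

print("counter:", counter_hist)        # only "reduce" is ever sampled
print("timer:  ", timer_hist)          # only "halo_x" is ever sampled
print("random: ", sorted(random_hist)) # all four tags appear
```

In this contrived case the stride of 8 is a multiple of the workload's period of 4, so the counter- and timer-based samplers each observe a single message kind, while random sampling at the same average rate touches all four. Real applications are less regular, so the sketch only illustrates the coverage concern and says nothing about the magnitude of the effect.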
Dynamic statistical profiling of communication activity in distributed applications
Measurement and modeling of computer systems