Abstract
Performance modeling for scientific applications is important for assessing potential application performance and for systems procurement in high-performance computing (HPC). Recent progress in communication tracing opens up novel opportunities for communication modeling because trace collection is now both lossless and scalable. Still, estimating the impact of scaling on communication efficiency remains non-trivial due to execution-time variations and exposure to hardware and software artifacts. This work contributes a fundamentally novel modeling scheme: we synthetically generate the application trace for large numbers of nodes by extrapolation from a set of smaller traces. We devise an innovative approach for topology extrapolation of single program, multiple data (SPMD) codes with stencil or mesh communication. The extrapolated trace can subsequently be (a) replayed to assess communication requirements before porting an application, (b) transformed to auto-generate communication benchmarks for various target platforms, and (c) analyzed to detect communication inefficiencies and scalability limitations. To the best of our knowledge, rapidly obtaining the communication behavior of parallel applications at arbitrary scale, with timed replay available yet without actually executing the application at that scale, is without precedent, and it has the potential to enable system simulation at the exascale level that would otherwise be infeasible.
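The idea behind topology extrapolation for stencil codes can be illustrated with a deliberately simplified sketch. It is not the ScalaExtrap algorithm. It assumes that the trace of a small run has already been reduced to a map from each rank to the set of ranks it sends to, that ranks are placed row-major on a 2D process grid whose width is known, and that the grid boundaries are non-periodic. All function names (`stencil_trace`, `learn_offsets`, `extrapolate`) are hypothetical. Because an SPMD stencil code exchanges data with neighbors at fixed relative grid offsets, the sketch learns those offsets from a small run and re-applies them on a larger grid to synthesize partner lists at the target scale.

```python
from typing import Dict, List, Set, Tuple

Offset = Tuple[int, int]


def coords(rank: int, px: int) -> Tuple[int, int]:
    """Map a rank to (x, y) grid coordinates (assumes row-major placement)."""
    return rank % px, rank // px


def stencil_trace(px: int, py: int) -> Dict[int, Set[int]]:
    """Hypothetical stand-in for a small observed trace: a 5-point stencil
    where each rank sends to its N/S/E/W neighbors (non-periodic grid)."""
    trace: Dict[int, Set[int]] = {}
    for r in range(px * py):
        x, y = coords(r, px)
        trace[r] = {
            (y + dy) * px + (x + dx)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < px and 0 <= y + dy < py
        }
    return trace


def learn_offsets(trace: Dict[int, Set[int]], px: int) -> Set[Offset]:
    """Collect every relative (dx, dy) partner offset seen in the small trace.
    Relative offsets, unlike absolute ranks, do not depend on the node count."""
    offsets: Set[Offset] = set()
    for src, dests in trace.items():
        sx, sy = coords(src, px)
        for dst in dests:
            dx, dy = coords(dst, px)
            offsets.add((dx - sx, dy - sy))
    return offsets


def extrapolate(offsets: Set[Offset], px: int, py: int) -> Dict[int, List[int]]:
    """Synthesize partner lists for a larger px-by-py grid."""
    result: Dict[int, List[int]] = {}
    for rank in range(px * py):
        x, y = coords(rank, px)
        partners: List[int] = []
        for dx, dy in sorted(offsets):
            nx, ny = x + dx, y + dy
            if 0 <= nx < px and 0 <= ny < py:  # drop partners outside the grid
                partners.append(ny * px + nx)
        result[rank] = partners
    return result


if __name__ == "__main__":
    small = stencil_trace(4, 4)           # "observed" 16-rank run
    offsets = learn_offsets(small, px=4)  # the four N/S/E/W offsets
    big = extrapolate(offsets, 32, 32)    # synthetic 1024-rank topology
    print(len(big), big[0], big[33])      # prints: 1024 [32, 1] [32, 1, 65, 34]
```

Encoding partners relative to the sender's grid position, rather than as absolute ranks, is what makes the pattern independent of scale. A fuller approach would also need to infer the grid dimensions from the traces and extrapolate message sizes and timings, all of which this sketch omits.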
ScalaExtrap: trace-based communication extrapolation for SPMD programs. In PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming.