Abstract
Benchmarks are essential for evaluating HPC hardware and software for petascale machines and beyond, but benchmark creation is a tedious manual process. As a result, benchmarks tend to lag behind the development of complex scientific codes. Our work automates the creation of communication benchmarks. Given an MPI application, we use ScalaTrace, a lossless and scalable tracing framework, to record communication operations and execution times while abstracting away the computation. A single trace file that reflects the behavior of all nodes is then expanded into C source code by a novel code generator. The resulting benchmark code is compact, portable, human-readable, and accurately reflects the original application's communication characteristics and performance. Experimental results demonstrate that the generated benchmarks preserve both the communication patterns and the run-time behavior of the original application. Such automatically generated benchmarks not only shorten the transition from application development to benchmark extraction but also facilitate code obfuscation, which is essential for extracting benchmarks from commercial and restricted applications.
Auto-generation of communication benchmark traces
Published in PMBS '11: Proceedings of the Second International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems.