Auto-generation of communication benchmark traces

Published: 08 October 2012

Abstract

Benchmarks are essential for evaluating HPC hardware and software for petascale machines and beyond. But benchmark creation is a tedious manual process; as a result, benchmarks tend to lag behind the development of complex scientific codes. Our work automates the creation of communication benchmarks. Given an MPI application, we use ScalaTrace, a lossless and scalable tracing framework, to record communication operations and execution time while abstracting away the computation. A single trace file that reflects the behavior of all nodes is then expanded into C source code by a novel code generator. The resulting benchmark code is compact, portable, human-readable, and accurately reflects the original application's communication characteristics and performance. Experimental results demonstrate that the generated benchmarks preserve both the communication patterns and the run-time behavior of the original application. Such automatically generated benchmarks not only shorten the transition from application development to benchmark extraction but also facilitate code obfuscation, which is essential for extracting benchmarks from commercial and restricted applications.
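The pipeline the abstract describes, recording a repetitive MPI event stream, compressing it losslessly, and emitting C source code that replays it, can be sketched in miniature. The snippet below is an illustrative toy, not ScalaTrace's actual compression algorithm: it detects a single repeating loop body in an event stream and prints a C replay skeleton. The function names `compress_loop` and `generate_benchmark` are hypothetical, chosen for this sketch.

```python
def compress_loop(events):
    """Lossless toy compression: find the shortest prefix whose repetition
    reproduces the whole event stream, and return (loop_body, trip_count)."""
    n = len(events)
    for p in range(1, n + 1):
        if n % p == 0 and events == events[:p] * (n // p):
            return events[:p], n // p
    return events, 1  # unreachable fallback; p == n always matches

def generate_benchmark(body, count):
    """Emit a C loop skeleton that replays the compressed trace.
    Real generated benchmarks would also carry recorded arguments,
    ranks, message sizes, and inter-event compute delays."""
    calls = "\n".join(f"    {op}(/* recorded arguments */);" for op in body)
    return f"for (int i = 0; i < {count}; i++) {{\n{calls}\n}}"

# A tight communication loop yields a highly repetitive trace:
trace = ["MPI_Isend", "MPI_Irecv", "MPI_Waitall"] * 100
body, count = compress_loop(trace)
assert body * count == trace  # lossless round trip
print(generate_benchmark(body, count))
```

The real system handles far more, including nested loops, cross-node merging of per-rank traces, and delta timing, but the same shape holds: compressed trace in, compact replayable C source out.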
