
ScalaExtrap: trace-based communication extrapolation for SPMD programs

Published: 12 February 2011

Abstract

Performance modeling of scientific applications is important for assessing potential application performance and for systems procurement in high-performance computing (HPC). Recent progress in communication tracing opens up novel opportunities for communication modeling due to its lossless yet scalable trace collection. Estimating the impact of scaling on communication efficiency nonetheless remains non-trivial due to execution-time variations and exposure to hardware and software artifacts. This work contributes a fundamentally novel modeling scheme: we synthetically generate the application trace for large numbers of nodes by extrapolating from a set of smaller traces. We devise an innovative approach to topology extrapolation for single program, multiple data (SPMD) codes with stencil or mesh communication. The extrapolated trace can subsequently be (a) replayed to assess communication requirements before porting an application, (b) transformed to auto-generate communication benchmarks for various target platforms, and (c) analyzed to detect communication inefficiencies and scalability limitations. To the best of our knowledge, rapidly obtaining the communication behavior of a parallel application at arbitrary scale, with timed replay available, yet without executing the application at that scale, is without precedent; it has the potential to enable otherwise infeasible system simulation at the exascale level.
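
The core idea behind topology extrapolation can be illustrated with a deliberately simplified sketch. Everything below, including the function names, the square process grid, and the four-point stencil, is an illustrative assumption, not the authors' implementation (the paper's actual scheme handles general stencil and mesh topologies, timing, and compressed trace representations). The sketch shows the essence: communication partners observed in small-scale traces are recovered as relative offsets in the process grid, and partner lists at an untraced, larger scale are synthesized from those offsets.

```python
# Minimal sketch, assuming a 2D non-periodic four-point stencil on a
# square grid of MPI ranks. Small-scale "traces" are fabricated locally;
# in reality they would come from a tracing tool such as ScalaTrace.

import math

def stencil_partners(rank, dim):
    """Ground truth, used here only to fabricate the small-scale traces:
    the up/down/left/right neighbors of `rank` on a dim x dim grid."""
    r, c = divmod(rank, dim)
    return [nr * dim + nc
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            for nr, nc in [(r + dr, c + dc)]
            if 0 <= nr < dim and 0 <= nc < dim]

def infer_offsets(traces):
    """Recover the set of relative (dr, dc) grid offsets that explains
    every observed (rank, partner) pair across all small-scale traces."""
    offsets = set()
    for nprocs, partners_of in traces.items():
        dim = math.isqrt(nprocs)
        for rank, partners in partners_of.items():
            r, c = divmod(rank, dim)
            for p in partners:
                pr, pc = divmod(p, dim)
                offsets.add((pr - r, pc - c))
    return offsets

def extrapolate(offsets, nprocs):
    """Synthesize each rank's partner list at a scale never traced."""
    dim = math.isqrt(nprocs)
    return {rank: [(r + dr) * dim + (c + dc)
                   for dr, dc in sorted(offsets)
                   if 0 <= r + dr < dim and 0 <= c + dc < dim]
            for rank in range(nprocs)
            for r, c in [divmod(rank, dim)]}

# Collect "traces" at 2x2 and 4x4 ranks, then extrapolate to 32x32 = 1024.
small = {n: {r: stencil_partners(r, math.isqrt(n)) for r in range(n)}
         for n in (4, 16)}
offsets = infer_offsets(small)   # {(-1, 0), (1, 0), (0, -1), (0, 1)}
big = extrapolate(offsets, 1024)
print(big[33])                   # [1, 32, 34, 65]
```

In this sketch, rank 33 of the 32x32 grid is assigned partners 1, 32, 34, and 65 even though no trace was ever collected at 1,024 ranks. Expressing endpoints as scale-independent offsets rather than absolute ranks is what lets an extrapolated trace be replayed or turned into a benchmark without ever running the application at the target scale.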


Published in

ACM SIGPLAN Notices, Volume 46, Issue 8 (PPoPP '11), August 2011, 300 pages.
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/2038037

PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, February 2011, 326 pages.
ISBN: 9781450301190; DOI: 10.1145/1941553
General Chair: Calin Cascaval; Program Chair: Pen-Chung Yew

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States