research-article

ScalaExtrap: Trace-based communication extrapolation for SPMD programs

Published: 04 May 2012

Abstract

Performance modeling for scientific applications is important both for assessing potential application performance and for systems procurement in high-performance computing (HPC). Recent progress on communication tracing opens up novel opportunities for communication modeling due to its lossless yet scalable trace collection. Estimating the impact of scaling on communication efficiency nevertheless remains nontrivial due to execution-time variations and exposure to hardware and software artifacts.

This work contributes a fundamentally novel modeling scheme. We synthetically generate the application trace for large numbers of nodes via extrapolation from a set of smaller traces. We devise an innovative approach for topology extrapolation of single program, multiple data (SPMD) codes with stencil or mesh communication. Experimental results show that the extrapolated traces precisely reflect the communication behavior and the performance characteristics at the target scale for both strong and weak scaling applications. The extrapolated trace can subsequently be (a) replayed to assess communication requirements before porting an application, (b) transformed to autogenerate communication benchmarks for various target platforms, and (c) analyzed to detect communication inefficiencies and scalability limitations.
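As a rough illustration of the idea of topology extrapolation for stencil codes, the sketch below infers how neighbor rank offsets depend on the grid dimension from two small runs and projects them to a larger scale. It is not the ScalaExtrap algorithm: the row-major 2D grid, the linear dependence of each offset on the grid dimension, and the helper names `stencil_offsets` and `extrapolate_offsets` are all assumptions made for this example, and `stencil_offsets` merely stands in for offsets that would be read from collected traces.

```python
def stencil_offsets(dim):
    """Stand-in for a collected trace (assumption): the rank offsets of the
    four neighbors of an interior rank in a dim x dim row-major process grid."""
    return sorted([-dim, -1, 1, dim])


def extrapolate_offsets(small_traces, target_dim):
    """Fit each offset as a * dim + b from two small scales and evaluate the
    fit at target_dim. Offsets are paired by their sorted position."""
    (d0, offs0), (d1, offs1) = small_traces
    if len(offs0) != len(offs1):
        raise ValueError("traces disagree on the number of neighbors")
    result = []
    for y0, y1 in zip(offs0, offs1):
        slope, rem = divmod(y1 - y0, d1 - d0)
        if rem != 0:
            raise ValueError("offset is not an integer-linear function of dim")
        intercept = y0 - slope * d0
        result.append(slope * target_dim + intercept)
    return result


# Two small "traces" at 4x4 and 8x8 processes, projected to 32x32.
small = [(4, stencil_offsets(4)), (8, stencil_offsets(8))]
predicted = extrapolate_offsets(small, 32)
print(predicted)
```

Under these assumptions the projected offsets match what a direct run at the target grid dimension would record for an interior rank; boundary ranks and non-square grids would need additional handling that this sketch omits.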

To the best of our knowledge, rapidly obtaining the communication behavior of parallel applications at arbitrary scale, with timed replay available yet without actually executing the application at that scale, is without precedent and has the potential to enable otherwise infeasible system simulation at the exascale level.

        • Published in

          ACM Transactions on Programming Languages and Systems  Volume 34, Issue 1
          April 2012
          225 pages
          ISSN:0164-0925
          EISSN:1558-4593
          DOI:10.1145/2160910

          Copyright © 2012 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 May 2012
          • Accepted: 1 February 2012
          • Revised: 1 November 2011
          • Received: 1 June 2011

          Qualifiers

          • research-article
          • Research
          • Refereed
