DOI: 10.1145/1693453.1693493
Research Article

PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node

Published: 9 January 2010

ABSTRACT

For designers of large-scale parallel computers, it is highly desirable to predict the performance of parallel applications at the design phase. This is difficult, however, because the execution time of a parallel application is determined by several factors: the sequential computation time in each process, the communication time, and their convolution. Despite previous efforts, accurately and efficiently estimating the sequential computation time of each process of a large-scale parallel application on a target machine that does not yet exist remains an open problem.

This paper proposes a novel approach to predicting the sequential computation time accurately and efficiently. We assume that at least one node of the target platform is available, but the whole target system need not be. We make two main technical contributions. First, we employ deterministic replay techniques to execute any process of a parallel application on a single node at real speed. As a result, we can simply measure the real sequential computation time of each process, one by one, on a target node. Second, we observe that the computation behavior of the processes in a parallel application can be clustered into a few groups, where the processes in each group behave similarly. This observation reduces measurement time significantly because we need to execute only representative processes instead of all of them.
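The representative-process idea above can be sketched as a simple grouping step: processes whose per-phase computation profiles are close are placed in one group, and only one representative per group is replayed and timed on the target node. This is an illustrative sketch, not PHANTOM's actual clustering algorithm; the profile vectors, the greedy threshold scheme, and the 10% similarity threshold are all assumptions made here for concreteness.

```python
def cluster_processes(profiles, threshold=0.1):
    """Greedy threshold clustering of process computation profiles.

    `profiles[rank]` is a vector of per-phase computation times for
    that process. Each process joins the first existing group whose
    representative is within `threshold` relative distance; otherwise
    it founds a new group with itself as representative.
    """
    groups = []  # list of (representative_rank, [member_ranks])
    for rank, vec in enumerate(profiles):
        for rep_rank, members in groups:
            rep = profiles[rep_rank]
            # Maximum element-wise relative difference between profiles.
            dist = max(abs(a - b) / max(abs(a), abs(b), 1e-12)
                       for a, b in zip(rep, vec))
            if dist <= threshold:
                members.append(rank)
                break
        else:
            groups.append((rank, [rank]))
    return groups

# Four hypothetical processes with two behavior classes
# (e.g., boundary ranks vs. interior ranks).
profiles = [
    [1.00, 2.00, 0.50],  # rank 0
    [1.02, 1.98, 0.51],  # rank 1: close to rank 0
    [3.00, 0.10, 0.90],  # rank 2
    [2.95, 0.11, 0.88],  # rank 3: close to rank 2
]
groups = cluster_processes(profiles)
```

With these inputs only two processes (the two representatives) would need to be replayed on the target node; their measured times stand in for every member of their group.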

We have implemented a performance prediction framework, called PHANTOM, which integrates the above computation-time acquisition approach with a trace-driven network simulator. We validate our approach on several platforms. For ASCI Sweep3D, the prediction error of our approach is less than 5% on 1024 processor cores. Compared to a recent regression-based prediction approach, PHANTOM achieves better prediction accuracy across different platforms.
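The trace-driven convolution step can be illustrated with a toy per-process model: measured sequential computation segments are interleaved with communication delays from a simple linear (latency plus bandwidth) cost model. This is a minimal sketch under assumed constants; the trace format, `LATENCY`, and `BANDWIDTH` are hypothetical, and a real simulator such as the one PHANTOM uses would additionally resolve message matching and dependencies across processes rather than pricing each record in isolation.

```python
LATENCY = 5e-6    # seconds per message (assumed constant)
BANDWIDTH = 1e9   # bytes per second (assumed constant)

def predict_time(trace):
    """Walk one process's trace and accumulate predicted time.

    `trace` is a list of records: ('comp', seconds) for a sequential
    computation segment measured by replay on a real target node, or
    ('send'/'recv', bytes) for a communication event priced by a
    simple linear network model.
    """
    t = 0.0
    for kind, value in trace:
        if kind == "comp":
            t += value                         # measured, not modeled
        else:
            t += LATENCY + value / BANDWIDTH   # modeled communication
    return t

# Hypothetical trace: two computation segments and two 1 MiB messages.
trace = [("comp", 0.010), ("send", 1 << 20),
         ("comp", 0.020), ("recv", 1 << 20)]
total = predict_time(trace)
```

The design point this illustrates is the convolution mentioned in the abstract: computation times come from measurement, communication times from a model, and the predicted execution time is their interleaving along the recorded trace.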


Published in

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2010, 372 pages
ISBN: 9781605588773
DOI: 10.1145/1693453

Also published in ACM SIGPLAN Notices, Volume 45, Issue 5 (PPoPP '10)
May 2010, 346 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1837853

Copyright © 2010 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 230 of 1,014 submissions, 23%
