ABSTRACT
For designers of large-scale parallel computers, the ability to predict the performance of parallel applications at the design phase is highly desirable. This is difficult, however, because the execution time of a parallel application is determined by several factors: the sequential computation time within each process, the communication time, and the convolution of the two. Despite previous efforts, accurately and efficiently estimating the sequential computation time of each process remains an open problem for large-scale parallel applications on target machines that do not yet exist.
This paper proposes a novel approach to predicting sequential computation time both accurately and efficiently. We assume that at least one node of the target platform is available, but the whole target system need not be. We make two main technical contributions. First, we employ deterministic replay techniques to execute any process of a parallel application on a single node at real speed; as a result, we can simply measure the real sequential computation time of each process, one by one, on a target node. Second, we observe that the processes of a parallel application can be clustered into a small number of groups whose members exhibit similar computation behavior. This observation reduces measurement time significantly, because we need to execute only representative processes rather than all of them.
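The representative-selection idea can be illustrated with a minimal sketch: group per-process computation profiles (here, hypothetical vectors of per-phase computation times) greedily by similarity, then measure only one representative per group. The profile format, relative-L1 distance metric, and `tol` threshold are illustrative assumptions, not the paper's actual clustering algorithm.

```python
# Hypothetical sketch: cluster per-process computation profiles so that
# only one representative process per group needs to be replayed and
# measured. Profiles and the `tol` threshold are illustrative assumptions.

def cluster_profiles(profiles, tol=0.05):
    """Greedy clustering: a process joins the first group whose
    representative profile differs by less than `tol` (relative L1)."""
    groups = []  # each group: (representative_rank, [member_ranks])
    for rank, prof in enumerate(profiles):
        for rep_rank, members in groups:
            rep = profiles[rep_rank]
            diff = sum(abs(a - b) for a, b in zip(prof, rep))
            norm = sum(rep) or 1.0
            if diff / norm < tol:
                members.append(rank)
                break
        else:
            groups.append((rank, [rank]))
    return groups

# Example: 8 processes exhibiting two distinct computation behaviors.
profiles = [[1.0, 2.0, 1.0]] * 4 + [[3.0, 1.0, 2.0]] * 4
groups = cluster_profiles(profiles)
representatives = [rep for rep, _ in groups]  # only these are measured
```

With 8 processes and two behaviors, only two representatives need to be executed under replay, a 4x reduction in measurement time in this toy case.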
We have implemented a performance prediction framework, called PHANTOM, which integrates this computation-time acquisition approach with a trace-driven network simulator. We validate our approach on several platforms. For ASCI Sweep3D, the prediction error of our approach is less than 5% on 1024 processor cores. Compared to a recent regression-based prediction approach, PHANTOM achieves better prediction accuracy across different platforms.
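How a trace-driven simulator can convolve per-process computation times with communication is sketched below. This is a minimal illustration under assumed simplifications: a toy trace format ('comp'/'send'/'recv' events) and a simple latency/bandwidth message-cost model, not the actual simulator's network model.

```python
# Minimal sketch of trace-driven prediction. The trace format and the
# latency/bandwidth constants are illustrative assumptions.

LATENCY = 2e-6    # assumed per-message latency in seconds
BANDWIDTH = 1e9   # assumed link bandwidth in bytes per second

def comm_time(nbytes):
    return LATENCY + nbytes / BANDWIDTH

def predict(traces):
    """traces[rank]: list of ('comp', seconds), ('send', peer, nbytes, tag),
    or ('recv', peer, nbytes, tag) events. Returns per-process finish times."""
    clock = [0.0] * len(traces)
    send_done = {}                       # (sender_rank, tag) -> completion time
    pending = [list(t) for t in traces]
    progress = True
    while progress:                      # sweep until no process can advance
        progress = False
        for rank, events in enumerate(pending):
            while events:
                ev = events[0]
                if ev[0] == 'comp':      # measured sequential computation
                    clock[rank] += ev[1]
                elif ev[0] == 'send':
                    clock[rank] += comm_time(ev[2])
                    send_done[(rank, ev[3])] = clock[rank]
                else:                    # 'recv' blocks on the matching send
                    key = (ev[1], ev[3])
                    if key not in send_done:
                        break
                    clock[rank] = max(clock[rank], send_done[key])
                events.pop(0)
                progress = True
    return clock

# Two processes: rank 0 computes, sends 1000 bytes to rank 1, computes again;
# rank 1 computes briefly, then waits on the receive before its final phase.
traces = [
    [('comp', 1.0), ('send', 1, 1000, 0), ('comp', 0.5)],
    [('comp', 0.2), ('recv', 0, 1000, 0), ('comp', 0.3)],
]
finish = predict(traces)
predicted_runtime = max(finish)
```

The predicted runtime is the latest per-process finish time; rank 1's wait on the receive is what makes the result a convolution of computation and communication rather than a simple sum.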
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. In PPoPP '10.