Abstract

Developing highly scalable algorithms for global atmospheric modeling is becoming increasingly important as scientists seek to understand the behavior of the global atmosphere at extreme scales. Heterogeneous architectures that combine processors and accelerators have become an important solution for large-scale computing. However, large-scale simulation of the global atmosphere poses a severe challenge to the development of highly scalable algorithms that fit well into state-of-the-art heterogeneous systems. Although GPU-accelerated computing has succeeded in some top-level applications, studies that fully exploit heterogeneous architectures in global atmospheric modeling remain rare, due in large part to both the computational difficulty of the mathematical models and the requirement of high accuracy for long-term simulations.
In this paper, we propose a peta-scalable hybrid algorithm and apply it successfully to a cubed-sphere shallow-water model for global atmospheric simulations. We employ an adjustable partition between CPUs and GPUs to achieve balanced utilization of the entire hybrid system, and present a pipe-flow scheme that conducts conflict-free inter-node communication on the cubed-sphere geometry and maximizes communication-computation overlap. Systematic multithreading optimizations on both the GPU and CPU sides enhance computing throughput and improve memory efficiency. Our experiments demonstrate nearly ideal strong and weak scalability on up to 3,750 nodes of the Tianhe-1A. The largest run sustains 0.8 Pflops in double precision (32% of peak performance), using 45,000 CPU cores and 3,750 GPUs.
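The idea of an adjustable CPU-GPU partition can be illustrated with a minimal sketch. The abstract does not specify how the partition is chosen, so the proportional rule below is an assumption, and the names `split_rows` and `step_time` and the throughput figures are hypothetical. The sketch divides the rows of one subdomain in proportion to measured per-device throughput, so that both sides finish a time step at about the same moment.

```python
from typing import Tuple


def split_rows(total_rows: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Split the rows of one subdomain between CPU and GPU.

    Assumption (not taken from the paper): each device receives a share
    proportional to its measured throughput in rows per second.
    Returns (cpu_rows, gpu_rows).
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if cpu_rate < 0 or gpu_rate < 0 or cpu_rate + gpu_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    gpu_rows = round(total_rows * gpu_rate / (cpu_rate + gpu_rate))
    cpu_rows = total_rows - gpu_rows  # remainder goes to the CPU
    return cpu_rows, gpu_rows


def step_time(cpu_rows: int, gpu_rows: int, cpu_rate: float, gpu_rate: float) -> float:
    """Time per step when both devices work concurrently: the slower side decides."""
    cpu_time = cpu_rows / cpu_rate if cpu_rows else 0.0
    gpu_time = gpu_rows / gpu_rate if gpu_rows else 0.0
    return max(cpu_time, gpu_time)


if __name__ == "__main__":
    # Hypothetical rates: the GPU processes rows three times as fast as the CPU cores.
    cpu_rows, gpu_rows = split_rows(1000, cpu_rate=1.0, gpu_rate=3.0)
    print(cpu_rows, gpu_rows)
    print(step_time(cpu_rows, gpu_rows, 1.0, 3.0))
```

In practice the ratio would be tuned from measurements rather than fixed in advance, which is what makes the partition adjustable. For scale, the reported configuration of 45,000 CPU cores and 3,750 GPUs on 3,750 nodes works out to 12 cores and one GPU per node.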
A peta-scalable CPU-GPU algorithm for global atmospheric simulations
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming