
A peta-scalable CPU-GPU algorithm for global atmospheric simulations

Published: 23 February 2013

Abstract

Developing highly scalable algorithms for global atmospheric modeling is becoming increasingly important as scientists seek to understand the behavior of the global atmosphere at extreme scales. Heterogeneous architectures that combine processors and accelerators are becoming an important solution for large-scale computing. However, large-scale simulation of the global atmosphere poses a severe challenge to the development of highly scalable algorithms that fit well into state-of-the-art heterogeneous systems. Although GPU-accelerated computing has succeeded in some top-level applications, studies that fully exploit heterogeneous architectures in global atmospheric modeling remain scarce, due in large part to both the computational difficulty of the mathematical models and the requirement of high accuracy for long-term simulations.

In this paper, we propose a peta-scalable hybrid algorithm and apply it successfully to a cubed-sphere shallow-water model for global atmospheric simulations. We employ an adjustable partition between CPUs and GPUs to achieve a balanced utilization of the entire hybrid system, and present a pipe-flow scheme to conduct conflict-free inter-node communication on the cubed-sphere geometry and to maximize communication-computation overlap. Systematic optimizations for multithreading on both the GPU and CPU sides are performed to enhance computing throughput and improve memory efficiency. Our experiments demonstrate nearly ideal strong and weak scalability on up to 3,750 nodes of the Tianhe-1A. The largest run sustains a performance of 0.8 Pflops in double precision (32% of the peak performance), using 45,000 CPU cores and 3,750 GPUs.
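
The idea behind an adjustable CPU-GPU partition can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the partition is chosen, so the sketch assumes the work in one subdomain is a set of independent grid rows and that each device has a fixed, measured throughput in rows per second. The names `split_rows` and `step_time` and the example rates are hypothetical.

```python
def split_rows(total_rows: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Split grid rows between CPU and GPU in proportion to their throughput.

    Assumption (not from the paper): cpu_rate and gpu_rate are measured
    throughputs in rows per second, and rows cost the same on average.
    """
    if total_rows < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("total_rows must be >= 0 and both rates must be > 0")
    # Each side gets work proportional to its speed, so both finish together.
    cpu_share = cpu_rate / (cpu_rate + gpu_rate)
    cpu_rows = round(total_rows * cpu_share)
    return cpu_rows, total_rows - cpu_rows


def step_time(cpu_rows: int, gpu_rows: int, cpu_rate: float, gpu_rate: float) -> float:
    """Time for one step when CPU and GPU work concurrently (the slower side dominates)."""
    return max(cpu_rows / cpu_rate, gpu_rows / gpu_rate)


if __name__ == "__main__":
    # Hypothetical rates: the GPU processes rows three times faster than the CPU cores.
    cpu_rows, gpu_rows = split_rows(1000, cpu_rate=1.0, gpu_rate=3.0)
    balanced = step_time(cpu_rows, gpu_rows, 1.0, 3.0)
    gpu_only = step_time(0, 1000, 1.0, 3.0)
    print(cpu_rows, gpu_rows, balanced, gpu_only)
```

With these assumed rates, giving the CPU a quarter of the rows makes both sides finish at the same time, and the step is shorter than when the GPU does all the work. A real code would also have to account for host-device transfers and halo exchange, which this sketch ignores.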
