Abstract
One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers have to face extremely complex stencil kernels that are costly in terms of both computing and memory resources. This article aims to accelerate the solution of the global shallow water equations (SWEs), one of the most essential equation sets describing atmospheric dynamics. We first design a hybrid methodology that employs both the host CPU cores and the field-programmable gate array (FPGA) accelerators to work in parallel. Through a careful adjustment of the computational domains, we achieve balanced resource utilization and a further improvement in overall performance. By decomposing the resource-demanding SWE kernel, we manage to map the double-precision algorithm onto three FPGAs. Moreover, by using fixed-point and reduced-precision floating-point arithmetic, we manage to build a fully pipelined mixed-precision design on a single FPGA, which can perform 428 floating-point and 235 fixed-point operations per cycle. The mixed-precision design with four FPGAs running together achieves a speedup of 20 over a fully optimized design on a CPU rack with two eight-core processors and is 8 times faster than the fully optimized Kepler GPU design. As for power efficiency, the mixed-precision design with four FPGAs is 10 times more power efficient than a Tianhe-1A supercomputer node.
Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms