research-article

Solving the Global Atmospheric Equations through Heterogeneous Reconfigurable Platforms

Published: 25 March 2015

Abstract

One of the most essential and challenging components in climate modeling is the atmospheric model. To solve multiphysical atmospheric equations, developers must handle extremely complex stencil kernels that are costly in both computing and memory resources. This article aims to accelerate the solution of the global shallow water equations (SWEs), one of the most essential equation sets describing atmospheric dynamics. We first design a hybrid methodology in which the host CPU cores and the field-programmable gate array (FPGA) accelerators work in parallel. Through careful adjustment of the computational domains, we achieve balanced resource utilization and a further improvement in overall performance. By decomposing the resource-demanding SWE kernel, we manage to map the double-precision algorithm onto three FPGAs. Moreover, by using fixed-point and reduced-precision floating-point arithmetic, we build a fully pipelined mixed-precision design on a single FPGA that can perform 428 floating-point and 235 fixed-point operations per cycle. The mixed-precision design with four FPGAs running together achieves a speedup of 20 times over a fully optimized design on a CPU rack with two eight-core processors and is 8 times faster than the fully optimized Kepler GPU design. As for power efficiency, the mixed-precision design with four FPGAs is 10 times more power efficient than a Tianhe-1A supercomputer node.
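
The idea of adjusting the computational domains so that the CPU cores and the FPGA accelerators stay equally busy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_rows` and `step_time`, the row-wise partitioning, and the throughput figures are all assumptions made for illustration. The sketch assigns mesh rows in proportion to each device's throughput so that both sides finish a time step at about the same moment.

```python
def split_rows(total_rows, cpu_rate, fpga_rate):
    """Split mesh rows between CPU and FPGA in proportion to throughput.

    cpu_rate and fpga_rate are hypothetical throughputs in rows per second.
    Returns (cpu_rows, fpga_rows), which always sum to total_rows.
    """
    if total_rows < 0 or cpu_rate <= 0 or fpga_rate <= 0:
        raise ValueError("total_rows must be >= 0 and rates must be > 0")
    # The device with the higher throughput receives the larger share.
    cpu_rows = round(total_rows * cpu_rate / (cpu_rate + fpga_rate))
    fpga_rows = total_rows - cpu_rows
    return cpu_rows, fpga_rows


def step_time(cpu_rows, fpga_rows, cpu_rate, fpga_rate):
    """Time of one step when both devices run in parallel.

    The step finishes only when the slower side is done, so the
    step time is the maximum of the two per-device times.
    """
    return max(cpu_rows / cpu_rate, fpga_rows / fpga_rate)


if __name__ == "__main__":
    # Assumed rates: the FPGA side is taken to be 9x faster than the CPU side.
    cpu_rows, fpga_rows = split_rows(1000, cpu_rate=1.0, fpga_rate=9.0)
    balanced = step_time(cpu_rows, fpga_rows, 1.0, 9.0)
    naive = step_time(500, 500, 1.0, 9.0)
    print(cpu_rows, fpga_rows, balanced, naive)
```

With an even split the faster device idles while the slower one finishes, whereas the proportional split keeps both busy for the whole step. A real design would also have to account for data transfer and boundary exchange between the two sub-domains, which this sketch ignores.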



            • Published in

              ACM Transactions on Reconfigurable Technology and Systems  Volume 8, Issue 2
              Special Section on FPL 2013
              April 2015
              129 pages
              ISSN: 1936-7406
              EISSN: 1936-7414
              DOI: 10.1145/2746532
              • Editor: Steve Wilton

              Copyright © 2015 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 25 March 2015
              • Revised: 1 March 2014
              • Accepted: 1 March 2014
              • Received: 1 December 2013

              Qualifiers

              • research-article
              • Research
              • Refereed
