Big Data Assimilation: Real-Time 30-Second-Refresh Heavy Rain Forecast Using Fugaku During Tokyo Olympics and Paralympics

Real-time 30-second-refresh numerical weather prediction (NWP) was performed with exclusive use of 11,580 nodes (~7%) of the supercomputer Fugaku during the Tokyo Olympics and Paralympics in 2021. A total of 75,248 forecasts were disseminated in the 1-month period, mostly stably, with time-to-solution of less than 3 minutes for a 30-minute forecast. Japan's Big Data Assimilation (BDA) project developed the novel NWP system for precise prediction of hazardous rains toward solving the global climate crisis. Compared with typical 1-hour-refresh systems, the BDA system offered two orders of magnitude increase in problem size and revealed the effectiveness of 30-second refresh for highly nonlinear, rapidly evolving convective rains. To achieve the required time-to-solution for real-time 30-second refresh with high accuracy, the core BDA software incorporated single precision and enhanced parallel I/O with properly selected configurations of 1000 ensemble members and a 500-m-mesh weather model. The massively parallel, I/O intensive real-time BDA computation demonstrated a promising future direction.


Justification for ACM Gordon Bell Prize for Climate Modelling
Every 30 seconds, ~100 MB of radar sensing data was transferred to Fugaku in ~3 seconds and was assimilated with a 1000-member ensemble Kalman filter in ~15 seconds to achieve a time-to-solution of <3 minutes for a 30-minute forecast. The computation revealed the effectiveness of 30-second-refresh real-time NWP for highly nonlinear, rapidly evolving convective rains.

Overview of the Problem
The global climate crisis involves increasing risks of extreme rains, and their precise prediction is essential for effective risk management. Figure 1 shows an example of such precise prediction from our demonstration experiment during the Tokyo Olympics. This type of high-precision rain prediction has never existed before due to the computational challenges of stable real-time performance of massively parallel, I/O intensive numerical weather prediction (NWP) refreshed every 30 seconds, 120x faster than 1-hour-refresh systems.
The typical 1-hour-refresh NWP is not designed to make precise predictions of extreme rains associated with highly nonlinear, rapidly evolving convective activities with complex cloud processes. A regular weather radar observes rain distributions on a single line of sight at a time and is rotated for 5 minutes to scan 15 vertical levels. In 5 minutes, convective clouds undergo highly nonlinear processes and change shape substantially, especially during rapid development of hazardous rains. Even though NWP includes physical equations to predict such phenomena, the hourly refresh rate is too slow to represent them precisely.
Therefore, a phased array weather radar (PAWR, [1,2,3]) was developed to observe the evolution of rain distributions every 30 seconds over the full 3-D volume without gaps, totaling 100x more data. To use such big data in NWP, we developed a novel Big Data Assimilation (BDA) system with promising offline proof-of-concept demonstrations using the K computer [4,5]. Namely, we showed two real-world cases of precise representation of highly nonlinear, rapidly evolving convective rains achieved by the BDA system offline, without real-time computational restrictions. For online application and real-time prediction, the time-to-solution for the entire workflow from PAWR observation to NWP production needs to be accelerated by O(10)x.

Current State of the Art
The current state-of-the-art operational regional NWP systems at national weather centers are listed in Table 1. The horizontal grid spacings are a few km, and refresh rates are hourly or less frequent. Ensemble data assimilation methods employ about 40 ensemble members. Ensemble forecasts are not always available and, when available, use fewer ensemble members at lower resolution. Many systems assimilate radar data indirectly, as relative humidity (RH) or latent heating derived from reflectivity, the directly observed quantity (the power of radio signals reflected by raindrops and other objects).
In addition, the current state-of-the-art NWP systems usually use double precision. Also, the weather model and data assimilation codes are usually developed independently, and the data transfer between the two independent codes is made by writing and reading files. A typical NWP model has O(10⁹) variables, and the file I/O for the GB-scale data is usually fast enough compared with the refresh rate of O(1 hour).
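For orientation, the orders of magnitude quoted above translate into a quick back-of-envelope calculation (a sketch using only the numbers stated in the text, not measurements from any specific system):

```python
# Back-of-envelope sizes and rates, using only figures quoted in the text.
N_VARS = 10**9                  # O(10^9) model variables

double_gb = N_VARS * 8 / 1e9    # 8 bytes per double-precision value
single_gb = N_VARS * 4 / 1e9    # 4 bytes per single-precision value
print(f"model state: {double_gb:.0f} GB (double) vs {single_gb:.0f} GB (single)")

# GB-scale file I/O is a small fraction of an hourly cycle, but a
# 30-second-refresh system must complete a full cycle 120x more often.
hourly_refresh_s = 3600
bda_refresh_s = 30
print(f"refresh-rate speedup required: {hourly_refresh_s // bda_refresh_s}x")
```

At an hourly refresh, writing and reading a few GB between the model and the data assimilation code is negligible; at a 30-second refresh, the same file I/O would consume a large share of the cycle budget, which motivates the in-memory data transfer described later.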

Innovations Realized
Compared with the current state of the art in the previous section, the BDA system offers two orders of magnitude increase in problem size (Table 1, bottom row). To achieve stable time-to-solution for 30-second-refresh real-time NWP for real-life applications, we improved the overall workflow (Fig. 2) including the data transfer of the multi-parameter PAWR (MP-PAWR, [24,25]) and the BDA core software SCALE-LETKF [26], consisting of the regional NWP model SCALE-RM (Scalable Computing for Advanced Library and Environment Regional Model, [27,28]) and the Local Ensemble Transform Kalman Filter (LETKF, [29,30]). MP-PAWR is a new-generation PAWR installed at Saitama University (Fig. 3a, red star) with overall hardware improvements.
Innovations were realized in the data transfer from MP-PAWR to Fugaku by developing a dedicated software, JIT-DT (Just-In-Time Data Transfer, [31,32]), and in the SCALE-LETKF by incorporating single precision, enhancing parallel I/O between SCALE and the LETKF, and selecting proper configurations such as 1000 ensemble members and a combination of 500-m and 1.5-km grid spacings. The workflow starts from the MP-PAWR data transfer (Fig. 2, yellow). As soon as the MP-PAWR completes a 3-D volume scan of the previous 30 seconds, a data file of ~100 MB is created in a server at Saitama University. JIT-DT monitors the new data file creation and transfers it immediately and directly to the SCALE-LETKF processes running on Fugaku. For a fail-safe workflow in case of abnormal delays or failures, data transfer activities are monitored, and JIT-DT is restarted automatically when necessary.
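The monitor-and-transfer behavior of JIT-DT can be sketched as a simple watch loop (an illustration only, not the actual JIT-DT implementation; the polling design, function names, and parameters are all hypothetical):

```python
import os
import time

def scan_once(watch_dir, seen, transfer):
    """Check the watch directory once; hand every newly created file to
    `transfer` exactly once."""
    for name in sorted(os.listdir(watch_dir)):
        if name not in seen:
            seen.add(name)
            transfer(os.path.join(watch_dir, name))

def watch_loop(watch_dir, transfer, poll_s=0.5, max_iters=None):
    """Poll for new MP-PAWR volume files and push each one as soon as it
    appears (the real JIT-DT moves ~100 MB to Fugaku in ~3 s and is
    restarted automatically by an external monitor on failure)."""
    seen = set()
    iters = 0
    while max_iters is None or iters < max_iters:
        scan_once(watch_dir, seen, transfer)
        time.sleep(poll_s)
        iters += 1
```

In a production design, a file-system notification mechanism (e.g., inotify on Linux) could replace polling to shave detection latency.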
The SCALE-LETKF was run as a single executable continuously without interruption on 8888 nodes, or 426,624 CPU cores, of Fugaku. Fugaku is a general CPU machine, and the SCALE-LETKF is designed for maximal portability, not exclusively for Fugaku. The SCALE-LETKF is composed of two parts: <1> assimilating the MP-PAWR data to produce 1000-member ensemble analyses and <2> running 11-member ensemble forecasts for the next 30 minutes initialized by the ensemble mean analysis and 10 analyses randomly chosen from the 1000-member ensemble analyses. Part <1> is further decomposed into two parts: <1-1> assimilating the MP-PAWR data to obtain 1000-member ensemble analyses and <1-2> running 1000-member ensemble forecasts for the next 30 seconds initialized by the 1000-member ensemble analyses.
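The two-level decomposition above can be sketched schematically (a pure illustration of the control flow; each function stands in for a massively parallel computation, and all names are hypothetical):

```python
def letkf_analysis(background, obs, members=1000):
    """<1-1> Assimilate one 30-s MP-PAWR volume to produce the
    1000-member ensemble analyses (background is unused in this stub)."""
    return [f"analysis[{m}]|{obs}" for m in range(members)]

def short_forecast(analyses, seconds=30):
    """<1-2> Advance every member 30 s to form the next background."""
    return [a + f"+{seconds}s" for a in analyses]

def production_forecast(analyses, members=11, minutes=30):
    """<2> 30-minute, 11-member forecast; the real system uses the
    ensemble mean plus 10 randomly chosen analyses (here simply the
    first 11, for illustration)."""
    return [a + f"+{minutes}min" for a in analyses[:members]]

def one_cycle(background, obs):
    """One 30-second refresh cycle of the SCALE-LETKF."""
    analyses = letkf_analysis(background, obs)   # <1-1>
    production = production_forecast(analyses)   # <2>, on separate nodes
    next_background = short_forecast(analyses)   # <1-2>
    return next_background, production
```

In the real system, parts <1> and <2> run on disjoint node sets so that the expensive 30-minute forecast does not block the next 30-second cycle.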
We converted the variables of both the SCALE and LETKF Fortran codes from double precision to single precision for a 2x acceleration. The LETKF contains an eigenvalue decomposition of the size of the ensemble at each grid point, involving a total of 256x256x60 calls of an eigenvalue solver for matrices of size 1000. We applied KeDV [33] for the eigenvalue solver in place of the standard LAPACK solver to accelerate the computation. Next, the data transfer between SCALE and the LETKF was accelerated by replacing the original file I/O with parallel I/O using MPI data transfer with RAM copy and node-to-node network communications, without using files. Furthermore, we designed an efficient node allocation to initialize the expensive part <2> 30-minute SCALE forecasts every 30 seconds [32,34]. In addition, selecting proper configurations for the SCALE-LETKF is not a trivial task. We performed comprehensive sensitivity tests with various choices of grid spacings, ensemble sizes, LETKF localization scales, and boundary data options, and reached the overall configurations shown in Fig. 3 with reasonable trade-offs of forecast accuracy and computer time within the allocated resources [35].

How Performance Was Measured

What application(s) was used to measure performance
We measured time-to-solution as shown in Figs. 2 and 4; this is the most relevant measure for the real-world NWP application. This is performance actually measured (not projected), based on the entire application including I/O and with uniform single precision. The time-to-solution includes the time for the MP-PAWR data file creation, which is determined by the MP-PAWR hardware and beyond our control. We included this part since it contributes to the forecast lead time for end users. Figure 4 shows exactly what is included in time-to-solution, which is defined as the total wall-clock time from time Tobs (00:00:00 in Fig. 2), when the MP-PAWR completes the scanning of the previous 30 seconds, to time Tfcst, when the final production forecast data file is created. The raw MP-PAWR data includes the time stamp when the MP-PAWR scanning is completed, and we used this time stamp as Tobs. This time Tobs coincides with the valid time of data assimilation (Fig. 2, <1-1>) and the initial time of the 30-minute forecast (Fig. 2, <2>). We did not use the MP-PAWR data file time stamp, which includes additional data creation time after the scan completion. For the final production forecast data, the file time stamp was used as Tfcst, which is the time when the final product is available for use.
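The measurement itself reduces to a timestamp difference between the two endpoints defined above; a minimal sketch (the function and its inputs are illustrative, not the actual measurement code):

```python
import os
from datetime import datetime, timezone

def time_to_solution(t_obs: datetime, product_path: str) -> float:
    """Wall-clock seconds from T_obs (the scan-completion time stamp
    embedded in the raw MP-PAWR data) to T_fcst (the file time stamp
    of the final production forecast)."""
    t_fcst = datetime.fromtimestamp(os.path.getmtime(product_path),
                                    tz=timezone.utc)
    return (t_fcst - t_obs).total_seconds()
```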
To summarize, time-to-solution includes the time to create the MP-PAWR data file in the server at Saitama University (Fig. 4, File creation), the time for JIT-DT data transfer (Fig. 4, JIT-DT), and the time to run the parts of the SCALE-LETKF including <1-1> LETKF data assimilation and <2> SCALE 11-member 30-minute forecasts. Time-to-solution was measured every 30 seconds whenever the end-to-end workflow ran normally. In addition to the time-to-solution, we measured the forecast skill for relevance to the real-world application. We show a single heavy rain event on July 29, 2021. 30-minute forecast rains are compared with the actual MP-PAWR observation. Both a snapshot and time-averaged skill are investigated. For comparison, the persistence forecast is used as a baseline, following a common practice in the meteorological domain science. In the persistence forecast, the initial rain patterns are taken from the MP-PAWR observation and do not evolve. Other comparative forecast data did not exist since, for example, the JMA operational LFM was initialized every hour (cf. Table 1). To highlight this point, we selected the initial time to be 19:27:30 UTC, July 29, 2021, for the snapshot. One can imagine that the novel BDA system is the only forecasting system that can initialize a forecast at such a fractional time.
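The skill comparison against persistence uses the threat score, a standard categorical verification metric: hits / (hits + misses + false alarms) on a thresholded field. A minimal sketch with synthetic arrays (not actual radar data):

```python
import numpy as np

def threat_score(forecast, observation, threshold=30.0):
    """Threat score (critical success index): hits / (hits + misses +
    false alarms) for reflectivity exceeding `threshold` dBZ."""
    f = forecast >= threshold
    o = observation >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return float(hits / denom) if denom > 0 else float("nan")

# Synthetic illustration: a persistence forecast (the initial observed
# field, frozen in time) is perfect at the initial time but degrades as
# soon as the rain pattern moves.
obs_t0 = np.array([[35.0, 10.0],
                   [45.0,  5.0]])
obs_t1 = np.array([[10.0, 35.0],
                   [45.0,  5.0]])   # one rain cell has moved
score_t0 = threat_score(obs_t0, obs_t0)   # 1.0 (perfect at initial time)
score_t1 = threat_score(obs_t0, obs_t1)   # 1/3: 1 hit, 1 miss, 1 false alarm
```

This mirrors the behavior in Fig. 7: persistence scores perfectly at the initial time and then decays, while a system that actually evolves the rain field can stay skillful.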

System and environment where performance was measured
JIT-DT uses the Science Information NETwork (SINET), a scientific communication infrastructure for academic institutions in Japan. SINET offers fast and reliable data transfer between Saitama University and Fugaku at the RIKEN Center for Computational Science in Kobe over a 400-Gbps line [36]. This experiment required continuous use of computational nodes of Fugaku without queue time. Therefore, we made a special use case of Fugaku with exclusive access to 11,580 nodes for the periods of the Olympics and Paralympics (July 20 to August 8 and August 25 to September 5), except for the period from July 27 to August 8 when technical issues forced us to use 13,854 nodes. Moreover, we had stable performance for disk access through a special exclusive allocation of a disk volume. The resources were competitively granted under the HPCI (High Performance Computing Infrastructure) general access program (project ID: hp210028). This was the very first use case of exclusive access to Fugaku computational nodes and a disk volume for a real-time real-world application. The inner-domain SCALE-LETKF uses 8888 nodes, 8008 of which were used for part <1> and the remaining 880 for part <2> (cf. Fig. 2).

Performance Results
Figure 5 shows the time-to-solution measured every 30 seconds for the entire experimental period. The system performed stably in general and produced a total of 75,248 forecasts, a net 26 days, 3 hours, and 4 minutes during the 1-month period. Time-to-solution was less than 3 minutes for ~97% of the cases (Fig. 5c). On average, JIT-DT sends ~100 MB of data in ~3 seconds, <1> the SCALE-LETKF takes ~15 seconds, and <2> the SCALE 30-minute forecast takes ~2 minutes. We would expect some variation of compute time depending on the area of rain: generally speaking, the larger the rain area, the more the computation, since more information content must be processed.
Figure 6 (a) shows an example of 30-minute forecast rains initialized at 19:27:30 UTC, July 29, 2021, one of the 75,248 forecasts. Although only this single snapshot is shown here, the results are generally similar at other times, supporting a robust conclusion. This forecast, valid at 19:57:30 UTC, was available within three minutes, by 19:30:30 UTC (cf. Fig. 5). This was only possible with the novel BDA system; no other system existed to produce an equivalent forecast since, for example, the JMA operational LFM was initialized every hour (cf. Table 1). The forecast shows heavy rains with strong radar reflectivity (Fig. 6a, orange shades >40 dBZ), which agrees generally well with the actual MP-PAWR observation (Fig. 6b). Looking at the average forecast skill for heavy rains (>30 dBZ) for this case, the BDA system produces highly skillful forecasts compared with the baseline persistence (Fig. 7). The only exception is the forecast initial time, when the persistence shows perfect skill (i.e., equal to the MP-PAWR observation) since the persistence forecast starts from the very MP-PAWR observation at the initial time. In addition, the red line in Fig. 7 shows a monotonic decline of forecast skill; namely, if we wait another 30 seconds, we would statistically expect to obtain a newly refreshed forecast with consistently higher skill. Such precise forecasts will bring a revolution to emergency management for the increasing risks of local torrential rains and associated disasters under the changing climate.

With the innovations realized, we demonstrated that the massively parallel and I/O intensive computation of real-time 30-second-refresh NWP was feasible during the Tokyo Olympics and Paralympics. The 30-second refresh rate is 120x faster than the current state-of-the-art 1-hour-refresh systems and provides an effective approach to precise prediction of convective rains, essential for risk management under the changing climate. In the 1-month experiment, we found multiple promising cases of such precise prediction of real-world rain events as shown in Figs. 6 and 7. Figure 8 shows another example on July 30, 2021, in which we find precise 3-D structures of each rain core. This would enhance our understanding of convective rains, their predictability, and the associated risks as important climate impacts.
The innovative BDA technology has already started expanding to solve the global climate crisis. For example, an international cooperation project between Argentina and Japan was recently launched to develop Argentinean operational NWP and flood risk prediction systems based on Japan's BDA project achievements. Climate change induces increasing global risks of flood disasters, particularly in vulnerable populated regions. We will further expand the development in Argentina toward various places in the world.
Expo 2025, Osaka, Kansai, Japan is another good opportunity to further develop the BDA system. We have new MP-PAWRs installed in Osaka and Kobe, and dual coverage is available. Our recent simulation study based on the BDA project achievements suggested that multiple-PAWR coverage would be beneficial for disastrous heavy rain prediction [42].
The success of the real-time BDA computation during the Tokyo Olympics and Paralympics demonstrated a promising future direction. Future HPC systems with more compute power and I/O performance will enhance the BDA system with even bigger data from an increasing number, and increasing spatiotemporal resolution, of new sensors, including satellite remote sensing. This type of precise rain prediction would enhance humanity's predictive capability and would be essential to overcoming the increasing risks of climate change.

Figure 1:
Figure 1: Final production images of (a) map view of rain intensity in the RIKEN webpage and (b) 3-D views of the MTI's smartphone application.

Figure 2:
Figure 2: Overall workflow. Time 00:00:00 indicates the time when the MP-PAWR scanning of the previous 30 seconds is completed. Time-to-solution starts from 00:00:00 until the 30-minute ensemble forecast file is created. JIT-DT transfers the MP-PAWR data to Fugaku. The SCALE-LETKF is a single executable run on 8888 nodes of Fugaku with multiple components labeled as <1-1>, <1-2>, and <2>, corresponding to the main text descriptions.

Figure 5:
Figure 5: Every-30-second time series of time-to-solution (minutes, left axis) for each forecast initial time (JST) in 2021 for the periods of (a) the Olympics and (b) the Paralympics. Gray shadings show the periods when 30-minute forecasts were not produced in due course. Cyan and blue curves show the independent Japan Meteorological Agency observed rain area (100 km², right axis) in the computational domain for rain rates >= 1 mm/h (cyan) and >= 20 mm/h (blue). (c) Histogram of time-to-solution (minutes). A total of 75,248 forecasts were issued.

Figure 6:
Figure 6: (a) 30-minute forecast rains at 19:57:30 UTC, July 29, 2021. Colors represent radar reflectivity (dBZ) at the 2-km height. (b) Similar to (a), but for the actual MP-PAWR observation at the closest time. Hatched areas indicate no data due to being outside the 60-km range, radar beam blockage, or other reasons.

Figure 7:
Figure 7: Heavy rain forecast skill as shown by threat scores (the higher, the more skillful) for radar reflectivity at the 30-dBZ threshold for 120 forecast cases between 19:00:00 UTC and 20:00:00 UTC, July 29, 2021. Red and black lines indicate the BDA system and persistence, respectively.

Figure 8:
Figure 8: 3-D bird's-eye view of 30-minute forecast rains at 04:48:00 UTC, July 30, 2021. Colors represent simulated radar reflectivity every 10 dBZ for 10-50 dBZ. The vertical scale is stretched three times. Map data is from the web page of the Geospatial Information Authority of Japan [43] (courtesy of H. Sakamoto of RIKEN).