Reduced non-Gaussianity by 30-second rapid update in convective-scale numerical weather prediction

Non-Gaussian forecast error is a challenge for ensemble-based data assimilation (DA), particularly for the strongly nonlinear dynamics of convection. In this study, we investigate the degree of non-Gaussianity of forecast error distributions at 1-km resolution using a 1000-member ensemble Kalman filter, and how it is affected by the DA update frequency and the number of assimilated observations. Regional numerical weather prediction experiments are performed with the SCALE (Scalable Computing for Advanced Library and Environment) model and the LETKF (Local Ensemble Transform Kalman Filter), assimilating phased array radar observations available every 30 seconds. The results show that non-Gaussianity develops rapidly within convective clouds and is sensitive to the DA frequency and the number of assimilated observations. Non-Gaussianity is reduced by up to 40% when the assimilation window is shortened from 5 minutes to 30 seconds, particularly for vertical velocity and radar reflectivity.

In this study, the regional NWP model known as the Scalable Computing for Advanced Library and Environment model (SCALE; Nishizawa et al., 2015) is coupled with the local ensemble transform Kalman filter (LETKF; Hunt et al., 2007). Lien et al. (2017) and Honda et al. (2018) describe the SCALE-LETKF system in detail. The model configuration follows Lien et al. (2017), with a single-moment bulk microphysics scheme (Tomita, 2008), a level-2.5 boundary layer turbulence scheme (Nakanishi and Niino, 2004), the Model Simulation Radiation Transfer code for radiation (Sekiguchi and Nakajima, 2008), and soil processes represented by a Beljaars-type soil model (Beljaars and Holtslag, 1991).
The SCALE-LETKF system is implemented over a single 180 km × 180 km domain with a horizontal resolution of 1 km and 50 vertical sigma levels (Fig. 1a). A 1000-member ensemble is used to assimilate the observations, because Kondo and Miyoshi (2019) showed that measures of non-Gaussianity are significantly contaminated by sampling error when the ensemble size is smaller than 1000. The initial conditions for the first cycle and the boundary conditions are taken from the National Centers for Environmental Prediction Global Data Assimilation System final analysis (FNL). Using FNL as the boundary conditions may be overly optimistic for forecasting purposes, but this does not affect the goal of this study, which focuses on non-Gaussian distributions and the impact of DA frequency. The boundary-condition ensemble is perturbed by adding balanced large-scale random perturbations following Necker et al. (2020a). These perturbations are generated by taking differences between Climate Forecast System Reanalysis (CFSR; Saha et al., 2010) fields on randomly selected dates in the same season at the same time of day. The perturbations are scaled by a multiplicative factor of 0.1 so that their amplitude is roughly 10% of the climatological variability. All variables, including soil variables, are perturbed.
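The boundary perturbation recipe above (difference of two reanalysis states from random dates, scaled by 0.1) can be sketched as follows. This is a minimal illustration, not the SCALE-LETKF implementation: the function name and the `clim_fields` array layout (dates along the first axis) are hypothetical, and reading and interpolating actual CFSR files is omitted.

```python
import numpy as np


def reanalysis_difference_perturbations(clim_fields, n_members, scale=0.1, seed=0):
    """Boundary perturbations from scaled differences of reanalysis states.

    clim_fields: hypothetical array of shape (n_dates, ...) holding reanalysis
    fields (e.g. CFSR) valid at the same time of day on different dates in the
    same season. Returns an array of shape (n_members, ...).
    """
    rng = np.random.default_rng(seed)
    n_dates = clim_fields.shape[0]
    perts = np.empty((n_members,) + clim_fields.shape[1:])
    for m in range(n_members):
        # Draw two distinct dates at random for this member.
        i, j = rng.choice(n_dates, size=2, replace=False)
        # A difference of two real atmospheric states is expected to be roughly
        # balanced; the factor 0.1 reduces its amplitude to about 10%.
        perts[m] = scale * (clim_fields[i] - clim_fields[j])
    return perts


# Usage (hypothetical names): add each member's perturbation to the FNL field.
# boundary_ens = fnl_boundary[None, ...] + reanalysis_difference_perturbations(clim, 1000)
```

Because each member uses its own pair of dates, the perturbations carry realistic large-scale structure rather than white noise.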
In the SCALE-LETKF system, radar data can be assimilated using different localization scales for different variables. Preliminary experiments with SCALE-LETKF using smaller ensembles and every-30-second phased array weather radar (PAWR) data showed that a vertical localization scale of 2 km (with a 7.3-km cutoff; cutoffs are defined similarly hereafter) produced good results. For horizontal localization, better results were obtained using a 4-km localization scale to assimilate observations with reflectivities > 10 dBZ.
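The 2-km scale with a 7.3-km cutoff is consistent with a Gaussian localization weight truncated at 2√(10/3) ≈ 3.65 times the localization scale, a convention used in some LETKF codes. The sketch below assumes that form (the text does not state the functional form) and lists the cutoffs implied for the scales quoted in this section; the function names are illustrative.

```python
import math


def cutoff_distance(scale_km):
    """Truncation distance for a Gaussian localization function (km).

    Assumes the convention cutoff = 2 * sqrt(10/3) * scale, i.e. about 3.65
    times the localization scale.
    """
    return 2.0 * math.sqrt(10.0 / 3.0) * scale_km


def localization_weight(dist_km, scale_km):
    """Gaussian weight exp(-d^2 / (2 L^2)), set to zero at and beyond the cutoff."""
    if dist_km >= cutoff_distance(scale_km):
        return 0.0
    return math.exp(-0.5 * (dist_km / scale_km) ** 2)


# Localization scales quoted in the text (km); the 2-km case gives the 7.3-km cutoff.
scales = {
    "reflectivity, vertical": 2.0,
    "reflectivity > 10 dBZ, horizontal": 4.0,
    "reflectivity <= 10 dBZ, horizontal": 2.0,
    "Doppler velocity, horizontal": 10.0,
    "Doppler velocity, vertical": 3.0,
}
for name, scale in scales.items():
    print(f"{name}: scale {scale:.1f} km, cutoff {cutoff_distance(scale):.1f} km")
```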
Observed reflectivities ≤ 10 dBZ are assimilated with a fixed value of 10 dBZ to avoid large observation-minus-forecast departures associated with clear-air reflectivities (Aksoy et al., 2009). For these observations, a shorter horizontal localization scale of 2 km is used to reduce the impact of no-rain observations at the edges of clouds. Doppler velocity observations are assimilated with horizontal and vertical localization scales of 10 km and 3 km, respectively. Relaxation to prior spread (RTPS; Whitaker and Hamill, 2012) with a relaxation parameter of 0.9 is applied; as in Lien et al. (2017), this helps account for the inhomogeneous distribution of observations.
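Two steps in this paragraph can be sketched in a few lines: flooring clear-air reflectivity observations at 10 dBZ, and RTPS, which rescales analysis perturbations so that the spread at each point becomes α·σ_b + (1 − α)·σ_a. Since the spread is relaxed back most where the analysis reduced it most, RTPS inflates more strongly in densely observed areas. This is a minimal NumPy illustration with hypothetical function names, not the SCALE-LETKF code; only the observed values are floored here, as the text does not say how the model equivalents are treated.

```python
import numpy as np

CLEAR_AIR_DBZ = 10.0  # floor value quoted in the text


def floor_reflectivity(obs_dbz, floor=CLEAR_AIR_DBZ):
    """Replace observed reflectivities <= floor (clear air) with the floor value."""
    return np.maximum(np.asarray(obs_dbz, dtype=float), floor)


def rtps(analysis, background, alpha=0.9):
    """Relaxation to prior spread (Whitaker and Hamill, 2012).

    analysis, background: ensembles of shape (n_members, n_state).
    Analysis perturbations are rescaled so that, at each state element,
    the spread becomes alpha * sigma_b + (1 - alpha) * sigma_a.
    """
    mean_a = analysis.mean(axis=0)
    pert_a = analysis - mean_a
    sigma_a = analysis.std(axis=0, ddof=1)
    sigma_b = background.std(axis=0, ddof=1)
    target = alpha * sigma_b + (1.0 - alpha) * sigma_a
    # Leave perturbations unchanged where the analysis spread is zero.
    factor = np.divide(target, sigma_a, out=np.ones_like(sigma_a), where=sigma_a > 0)
    return mean_a + pert_a * factor
```

With α = 0.9, an analysis spread that collapsed to half the background spread is relaxed back to 95% of it.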
Reflectivity and Doppler velocity observations are superobbed to a horizontal resolution of 1 km and a vertical resolution of 500 m to match the model resolution. The observation error standard deviations for these superobservations are set to 5.0 dBZ for reflectivity and 3.0 m s−1 for Doppler velocity. The radar data are assimilated up to a maximum height of 11 km.
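Superobbing as described here amounts to averaging raw radar samples within 1 km × 1 km × 500 m boxes and discarding data above 11 km. The sketch below assumes Cartesian coordinates in metres relative to a hypothetical domain origin and averages values directly in their observed units (dBZ for reflectivity); whether the actual system averages reflectivity in dBZ or in linear units is not stated in the text.

```python
import numpy as np

# Observation error standard deviations quoted in the text.
OBS_ERROR_STD = {"reflectivity_dBZ": 5.0, "doppler_m_per_s": 3.0}


def superob(x_m, y_m, z_m, values, dx=1000.0, dz=500.0, z_max=11000.0):
    """Average raw radar observations into dx-by-dx-by-dz boxes.

    Returns (centers, means, counts): box-center coordinates (m) of shape
    (n_boxes, 3), the mean observed value in each box, and the number of raw
    observations averaged. Observations above z_max are discarded.
    """
    x_m, y_m, z_m, values = (np.asarray(a, dtype=float) for a in (x_m, y_m, z_m, values))
    keep = z_m <= z_max
    idx = np.stack(
        [np.floor(x_m[keep] / dx), np.floor(y_m[keep] / dx), np.floor(z_m[keep] / dz)],
        axis=1,
    ).astype(int)
    # Group observations by box index; reshape keeps the inverse 1-D across NumPy versions.
    boxes, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    means = np.bincount(inverse, weights=values[keep]) / counts
    centers = (boxes + 0.5) * np.array([dx, dx, dz])
    return centers, means, counts
```

Each returned superobservation would then be assigned the corresponding error standard deviation from `OBS_ERROR_STD`.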
A spin-up DA experiment assimilating every-5-minute PAWR reflectivity and Doppler velocity data is performed for one hour from 0400 UTC on July 13, 2013. Only the single PAWR volume scan closest to the analysis time is assimilated at each analysis. The 1000-member analysis ensemble at 0500 UTC is used as the initial conditions for the DA experiments. Experiments with different DA update frequencies are performed to study the impacts of DA frequency and observation number on the forecast error distributions. All experiments share the configuration described above and differ only in the DA frequency and the amount of data assimilated. First, four experiments with DA frequencies of 5, 2, 1, and 0.5 minutes are performed, hereafter referred to as 5MIN, 2MIN, 1MIN, and 30SEC, respectively. Here, only the single volume scan closest to the analysis time is used at each analysis, so more frequent updates assimilate more data.
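To make concrete how "more frequent updates assimilate more data", the sketch below selects the single 30-second volume scan closest to each analysis time and counts how many distinct scans each experiment uses per hour of cycling. It assumes analysis times aligned with scan times; the helper name and the one-hour counting period are illustrative.

```python
import numpy as np

SCAN_INTERVAL_S = 30  # PAWR volume scans every 30 seconds


def scans_used(cycle_s, period_s=3600, scan_interval_s=SCAN_INTERVAL_S):
    """Distinct volume scans assimilated over period_s when each analysis uses
    only the single scan closest to its analysis time."""
    scan_times = np.arange(0, period_s + 1, scan_interval_s)
    analysis_times = np.arange(cycle_s, period_s + 1, cycle_s)
    closest = {int(scan_times[np.argmin(np.abs(scan_times - t))]) for t in analysis_times}
    return len(closest)


for name, cycle in [("5MIN", 300), ("2MIN", 120), ("1MIN", 60), ("30SEC", 30)]:
    print(name, scans_used(cycle))
```

Under these assumptions, 5MIN, 2MIN, 1MIN, and 30SEC use 12, 30, 60, and 120 volume scans per hour, respectively, so 30SEC assimilates ten times as much radar data as 5MIN.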
Next, to separate the impacts of DA frequency and the amount of assimilated data, two additional experiments are performed with 5-minute and 1-minute DA frequencies, in which all radar volume scans available every 30 seconds are assimilated using a four-dimensional EnKF (4D-EnKF) approach (Hunt et al., 2004). These experiments, referred to as 5MIN-4D and 1MIN-4D, respectively, assimilate the same amount of data as 30SEC but use longer assimilation windows.
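The data bookkeeping of the 4D experiments can be sketched by assigning every 30-second scan to the window that ends at the next analysis time. The window convention (t_a − Δ, t_a] is an assumption for illustration, since the text does not state whether windows are centered on or end at the analysis time; the helper name is hypothetical. In a 4D-EnKF, each scan in a window is compared with the ensemble valid at that scan's own time, which this sketch does not show.

```python
import numpy as np


def scans_per_window(cycle_s, period_s=3600, scan_interval_s=30):
    """Assign every 30-s volume scan to the assimilation window (t_a - cycle_s, t_a]
    ending at each analysis time t_a. Returns one array of scan times per window."""
    scan_times = np.arange(scan_interval_s, period_s + 1, scan_interval_s)
    analysis_times = np.arange(cycle_s, period_s + 1, cycle_s)
    return [scan_times[(scan_times > t - cycle_s) & (scan_times <= t)] for t in analysis_times]


for name, cycle in [("5MIN-4D", 300), ("1MIN-4D", 60), ("30SEC", 30)]:
    windows = scans_per_window(cycle)
    print(name, len(windows[0]), "scans per window,", sum(len(w) for w in windows), "per hour")
```

Under this convention, 5MIN-4D uses 10 scans per window and 1MIN-4D uses 2, and both match the 120 scans per hour of 30SEC.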

To measure the degree of non-Gaussianity of the error distributions, we compute the Kullback-Leibler divergence (hereafter KLD; Kullback and Leibler, 1951), defined as follows:

$$\mathrm{KLD} = \int P(x)\,\ln\frac{P(x)}{Q(x)}\,dx,$$

where P(x) and Q(x) are two probability density functions (PDFs). In our case, P(x) is the ensemble-based sample distribution of x, and Q(x) is a Gaussian distribution whose mean and standard deviation are given by the ensemble-based sample estimates. The KLD is 0 if P and Q are identical and positive otherwise; therefore, a low KLD value indicates that the sample distribution is close to Gaussian. To compute the KLD for different variables and at different grid points, we approximate P(x) with the sample histogram of the 1000-member ensemble, using 32 equally sized bins covering the range where P(x) is greater than 0.
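The discrete KLD described above can be sketched for a single variable at a single grid point as follows. Two choices here are assumptions rather than details from the text: the bins span the ensemble minimum to maximum (our reading of "the range where P(x) is greater than 0"), and Q is obtained by integrating the fitted Gaussian over each bin and renormalizing over those bins. The function name is hypothetical.

```python
import math

import numpy as np


def kld_from_gaussian(samples, n_bins=32):
    """Discrete KLD between the ensemble histogram P and a fitted Gaussian Q.

    Bins span [min, max] of the ensemble. Q is the Gaussian with the ensemble
    mean and standard deviation, integrated over each bin and renormalized
    over the same bins.
    """
    x = np.asarray(samples, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    counts, edges = np.histogram(x, bins=n_bins)  # default range is (min, max)
    p = counts / counts.sum()
    # Gaussian CDF at each bin edge, then bin probabilities.
    cdf = np.array([0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges])
    q = np.diff(cdf)
    q = q / q.sum()
    nz = p > 0  # empty bins contribute 0 * ln(0 / q) = 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


rng = np.random.default_rng(0)
print(kld_from_gaussian(rng.normal(size=1000)))  # near 0: Gaussian ensemble
bimodal = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
print(kld_from_gaussian(bimodal))  # clearly larger: strongly non-Gaussian ensemble
```

Even for a truly Gaussian 1000-member sample, the histogram estimate is slightly positive because of sampling noise, which is one reason a large ensemble is needed for this diagnostic.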

All experiments show that the analyzed reflectivity fields are in good agreement with the observations. However, some differences can be found between experiments that assimilate different amounts of data and use different assimilation windows.
For example, Figures 1c and d show that 30SEC captures the strong reflectivity areas (>45 dBZ, orange and red shadings) better than 5MIN. 5MIN shows noisy patterns of spurious convective cells surrounding the main convective rainband.
First, the impact of DA frequency is explored using the 5MIN, 2MIN, 1MIN, and 30SEC experiments, in which more frequent DA also means more assimilated observations. The top row of Fig. 2 (a-d) shows that the reflectivity (Z) patterns (shades) are similar among all experiments, but the vertical velocity (W, contours) patterns differ. Figure 2e shows strong non-Gaussianity in the first-guess ensemble for W and temperature (T). KLD for W and T is consistently reduced with more frequent DA (Fig. 2e-h, shades and blue contours), although the reduction is smaller for T. Overall, KLD is reduced more from 5MIN to 2MIN than from 1MIN to 30SEC. The forecast error distributions for W and Z at the location of maximum KLD for W show some departures from the Gaussian distribution (Figs. 2i-p). The ensemble spread for W is reduced significantly from 5MIN to 2MIN (Fig. 2e-h, red contours). 5MIN shows strong non-Gaussianity for W at the southern edge and at the highest peak of the convective cell (Fig. 2e), which is probably related to the development of a new updraft at the southern edge and to the top of the strong updraft, respectively. Weaker low-level maxima south of the convective line are associated with shallow convective clouds that are not effectively corrected by the radar observations. Kondo and Miyoshi (2019) found that at synoptic scales the ensemble spread maxima are co-located with the KLD maxima. At convective scales, however, the ensemble spread maxima for W (Fig. 2e-h, red contours) are slightly out of phase with the KLD maxima (shades). The KLD maxima for T (blue contours) are approximately co-located with those for W (shades); they can be associated with non-Gaussianity in W through the vertical advection of scalar quantities such as T and moisture. Another KLD maximum for T is found near the surface south of the convective cell, probably associated with the gust front.
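Whether spread maxima and KLD maxima coincide, as Kondo and Miyoshi (2019) found at synoptic scales, or are out of phase, as found here for W, can be quantified with simple diagnostics once both fields are available on the same 2-D section. The sketch below, with a hypothetical function name and user-supplied grid spacing per axis, reports the distance between the two maxima and the pattern correlation of the fields; it is an illustrative diagnostic, not one used in the study.

```python
import numpy as np


def maxima_offset(kld_field, spread_field, spacing=(1.0, 1.0)):
    """Distance between the grid points of maximum KLD and maximum spread on a
    2-D section, in the units of `spacing` (per-axis grid spacing), plus the
    pattern correlation of the two fields."""
    i_k = np.unravel_index(np.argmax(kld_field), kld_field.shape)
    i_s = np.unravel_index(np.argmax(spread_field), spread_field.shape)
    dist = float(np.hypot((i_k[0] - i_s[0]) * spacing[0], (i_k[1] - i_s[1]) * spacing[1]))
    corr = float(np.corrcoef(kld_field.ravel(), spread_field.ravel())[0, 1])
    return dist, corr


# Synthetic example: two identical bumps displaced by 3 grid points (1-km spacing).
yy, xx = np.mgrid[0:30, 0:40]
kld = np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / 20.0)
spread = np.exp(-((yy - 10) ** 2 + (xx - 13) ** 2) / 20.0)
print(maxima_offset(kld, spread))
```

A co-located pair would give zero distance and a correlation near 1; an out-of-phase pair gives a nonzero offset and a reduced correlation.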
The 4D-EnKF experiments allow us to investigate the impact of changing the assimilation frequency while keeping the number of observations unchanged. 5MIN-4D shows almost the same ensemble spread for W as 5MIN (Fig. 3c, red contours), but lower KLD for W (Fig. 3c, shades), indicating that the number of observations contributes to reducing non-Gaussianity. This is not the case for T, for which KLD is similar or larger (Fig. 3c, blue contours). 1MIN-4D is close to 1MIN and 30SEC in terms of non-Gaussianity and of the shape and strength of the convective cell (Figs. 3b and d).

Conclusions
We performed 1000-member, 1-km-resolution ensemble DA experiments using real phased array radar observations and a mesoscale NWP model to investigate the impact of DA frequency and observation number on non-Gaussian error distributions. We found that a DA frequency of 5 minutes, although already much higher than typical DA frequencies, resulted in strong non-Gaussianity that may affect the performance of the EnKF. Non-Gaussianity is stronger for vertical velocity, as also found by Kawabata and Ueno (2020). Non-Gaussianity is also larger at mid-levels within convective cells, near the levels of large latent-heat release and of vertical accelerations associated with convective instability. At convective scales, some of the local maxima in KLD can be related directly to advection by mesoscale circulations associated with strong convective cells, but other processes not specifically examined in this study may also contribute to the generation of non-Gaussianity, including processes not directly associated with clouds, such as differential heating circulations or gravity waves. We found that increasing the analysis update frequency and the number of observations, going from 5-minute to 30-second updates, has a large impact on non-Gaussianity in the error distributions of all model variables, particularly vertical velocity and reflectivity, which show the largest KLD at these scales. Increasing the assimilation frequency to 30 seconds and assimilating more observations can reduce KLD by up to 40%. Moreover, the 4D-EnKF experiments revealed that for frequent DA every 1 minute, the observation number explained most of the reduction in non-Gaussianity; in contrast, with a longer 5-minute window, even the experiment using all 30-second observations shows significant departures from Gaussianity.
While convective clouds are particularly favorable for nonlinear error growth, non-Gaussianity is not necessarily larger within convective clouds. This is mainly because convective-scale radar DA is usually most effective within precipitating clouds. This is the