Simulating model uncertainty of subgrid-scale processes by sampling model errors at convective scales

Abstract. Ideally, perturbation schemes in ensemble forecasts should be based on the statistical properties of the model errors. Often, however, the statistical properties of these model errors are unknown. In practice, the perturbations are pragmatically modelled and tuned to maximize the skill of the ensemble forecast. In this paper a general methodology is developed to diagnose the model error, linked to a specific physical process, based on a comparison between a target and a reference model. Here, the reference model is a configuration of the ALADIN (Aire Limitée 5 Adaptation Dynamique Développement International) model with a parameterization of deep convection. This configuration is also run with the deep convection parameterization scheme switched off, degrading the forecast skill. The model error is then defined as the difference of the energy and mass fluxes between the reference model with scale-aware deep convection parameterization and the target model without deep convection parameterization. In the second part of the paper, the diagnosed model-error characteristics are used to stochastically perturb the fluxes of the 10 target model by sampling the model errors from a training period in such a way that the distribution and the vertical and multivariate correlation within a grid column are preserved. By perturbing the fluxes it is guaranteed that the total mass, heat and momentum remain conserved. The tests, performed over the period 11 – 20 April 2009, show that the ensemble system with the stochastic flux perturbations combined with the initial condition perturbations, not only outperforms the target ensemble, where deep convection is not 15 parameterized, but for many variables it even performs better than the reference ensemble (with scale-aware deep convection scheme). The introduction of the stochastic flux perturbations reduces the small-scale erroneous spread while increasing the overall spread leading to a more skillful ensemble. The impact is largest in the upper troposphere with substantial improvements compared to other state-of-the-art stochastic perturbation schemes. At lower levels the improvements are smaller or neutral, except for temperature where the forecast skill is degraded. 20


Introduction
Numerical weather prediction (NWP) is a numerical initial value problem with models based on equations that describe the evolution of the atmospheric state and its interaction with other Earth system components. The solution of these equations is highly sensitive to the initial conditions (e.g. Thompson, 1957;Lorenz, 1969;Vannitsem and Nicolis, 1997; description of the methodology is given, after which it is applied to deep convection. This is done by running the ALADIN (Aire Limitée Adaptation Dynamique Développement International) limited area model (LAM)  in a configuration where deep convection is not parameterized and comparing it to a reference configuration with a scale-aware deep convection parameterization scheme. The simulations are performed over the tropics during a period of enhanced convective activity on a grid with a horizontal grid spacing of 4 km. This resolution lies on the verge of what is considered the convection-15 permitting scale, where it has been shown (Deng and Stauffer, 2006;Lean et al., 2008;Roberts and Lean, 2008) that nonparameterized convection often shows unrealistically strong updrafts and density currents as well as overestimated precipitation rates.
The model error is expressed in the form of a flux difference and its characteristics are obtained from a well-chosen training period. After discussing the statistical properties of the model error, its usefulness for probabilistic forecasting is investigated by 20 developing a prototype stochastic flux perturbation scheme. The scheme introduces flux perturbations by sampling the model error from the training period. The impact of perturbing the fluxes, using the properties of the model error, on a short-range (48 h) LAM forecast is then studied, revealing a positive impact on the forecast quality.
The structure of this paper is as follows. The methodology for quantifying the model error and its statistical properties are presented in Section 2. In Section 3 the design of the perturbation scheme is described and the results of its application are 25 discussed. Finally, the main results are summarized and a conclusions are drawn in Section 4.
2 Quantification of the model error

Methodology for a high-order NWP model
In the theoretical work on generic dynamical low-order models (e.g. Nicolis, 2003Nicolis, , 2004Nicolis et al., 2009) the behaviour of the model error is investigated by comparing the evolution laws of an approximating model with the exact evolution laws. For 30 these low-order problems, both the model and exact evolution laws can be written in an analytical form. This is formally done as follows. First one writes: where X represents the exact solution of the perfect model f and and Y the solution of the model g. Then, starting from identical initial conditions X(t 0 ) = Y(t 0 ) one can write for a sufficiently small time interval δt: X(t 0 + δt) = X(t 0 ) + f (X(t 0 ))δt (3) 5 Y(t 0 + δt) = X(t 0 ) + g(X(t 0 ))δt. (4) The model error can be written as U(t) ≡ Y(t) − X(t) by extracting Eq. (3) from Eq. (4) U(t + δt) = (g(X(t 0 )) − f (X(t 0 ))) δt.
The model error source, determining the short time behaviour of U is then characterized by estimating 10 Of course, when dealing with the comparison of the evolution operator of an NWP model and reality, the model error cannot be formally expressed as the underlying dynamics of the reality is not fully known. Nevertheless the same formal approach can be applied to quantify the model errors that are caused by poor choices in the model formulation compared to well-performing ones of a reference version. In practice this is done by 15 1. running a validated reference configuration of a full NWP model.
2. switching off one of the parameterizations (or replacing it with a poor choice) to create a target of study, 3. verifying with standard NWP scores that the target configuration performs less well than the reference configuration, 4. computing the model error.
In this case we consider the reference as a perfect model corresponding to the model f in Eq. (1), while the target model now 20 becomes the one in Eq. (2). We then define the model error as in Eq. (6).
Here we apply this method to quantify the model error related to a specific parameterization of a subgrid process. The formulations of both models differ only in the representation of the physical process under consideration. Other parameters, such as the time step and vertical and horizontal resolution are taken identically. Finally, both models start from the same initial model state, and the error is evaluated over a single time step. The model error obtained in this way, can then be seen as a source 25 of model uncertainty of the subgrid processes. It will then be shown how its properties can serve as a guide when developing stochastic perturbation schemes.
We will apply this method to simulations with the ALADIN NWP model, using the ALARO (ALadin-AROme) canonical configuration , henceforth called the ALARO model. The model can be run both in a hydrostatic and a non-hydrostatic configuration and uses a spectral dynamical core with a two time level semi-Lagrangian semi-implicit scheme.
The vertical discretization uses a mass-based hybrid pressure terrain-following coordinate. An overview of the used parameterization schemes and a description of the scale-aware deep convection parameterization can be found in Appendix A.
schemes within the same model (e.g. radiation, turbulence, condensation, microphysics). However, in this paper, it is applied to the representation of deep convection in a model with a horizontal grid spacing of 4 km. The reference model uses a scaleaware deep convective parameterization, called 3MT (Gerard et al., 2009). This scheme has been developed specifically for the convection-permitting transition regime within the so-called grey zone of deep convection (Yano et al., 2018). These are horizontal resolutions where deep convection is not entirely resolved by the dynamics. As such the parameterization accounts 10 for some subgrid transport that would be handled by the dynamics at fully convection resolving resolutions. In operational applications with resolutions of about 4-5 km, the ALARO model is usually run with the hydrostatic setup of the dynamical core.
For the target setup we switch off the 3MT scheme. Fig. 1 shows, for the experimental setup that will be described in section 3.2, that switching off the deep convection scheme leads to a significant degradation of the model performance, both 15 for the hydrostatic configuration and the non-hydrostatic configuration. So the computed model error can be seen as an error with respect to the operational reference. Moreover, it can be seen that the hydrostatic version performs better than the nonhydrostatic one, which does not come as a suprise since the entire ALARO model configuration including the deep-convection parameterization has been tuned with the hydrostatic option of the dynamical core. It is also useful to note that the error correction between the simulation with deep-convection parameterization and the one without is largest for the hydrostatic 20 simulations, which confirms the above statement that the parameterization somehow compensates for non-hydrostatic subgrid effects. Verification scores for the 250 hPa (not shown) lead to the same conclusions. Therefore, in this paper, the hydrostatic version of the model will be used as the reference.
Within the ALARO model configuration the physics-dynamics coupling is organized in a flux-conservative way according to the proposal of Catry et al. (2007) where the contributions from the different physics parameterizations appear as fluxes in 25 the thermodynamic energy equation. This allows to express the model errors of Eq. (6) between the simulation without deep convection and the one with deep convection parameterization in terms of the fluxes instead of tendencies. Here we use the following definition, where J td ψ and J c ψ represent respectively the turbulent diffusion and convective transport flux of water vapour (q v ), cloud water (q l ), cloud ice (q i ), enthalpy (h) and zonal (u) and meridional (v) momentum. F st ϕ and F c ϕ represent the stratiform and convective condensation flux from water vapour to cloud water (vl) and from water vapour to cloud ice (vi). Stratiform and convective evaporation fluxes from rain to water vapour (rv) and snow to water vapour (sv) are denoted by F st φ and F c φ respectively. Non-primed fluxes are those obtained when running with deep convection parameterization while primed fluxes represent those using no deep convection parameterization. By defining the errors in this way one can see in Eqs. (7)  In a numerical model, the time derivative in Eqs. (3) and (4) can be approximated by the evolution during a single model time step at t 0 . For the model error to be correctly quantified, it is thus important that both reference and target model start from the same atmospheric state. For this reason one should use exactly the same model as the reference and the target model and only modify the source of the error, and nothing else. On the other hand, it is well known that NWP models suffer from spin 10 up periods during the first hours of the model runs when the model is still finding its physical balances. So taking initial model states immediately after the initial state may generate spurious tendencies which may lead to spurious errors in Eqs. (7)-(9).
We solve this issue here by performing two parallel runs with from identical initial conditions with the reference configuration.
These identical simulations are given 12 hours to spin up. After 12 hours the deep convection parameterization in one of the simulations is switched off and flux errors defined in Eqs. (7), (8) and (9) are diagnosed one time step later. In the next section 15 it will be explained how this is implemented within the NWP model.

Construction of a model error database: implementation with an NWP model
The previous section explained how the model error due to the parameterization of deep convection can be quantified appropriately. This section describes how a database of model errors can be constructed.
In order to build a database with relevant weather cases (i.e. cases with active convection), the model simulations are per-   The deactivation of the deep convection parameterization leads to an instant increase of the domain-averaged water vapour 10 transport flux due to the absence of a (mainly) upward convective transport (Fig. 4a). We see that the dynamics of the NCP model instantly takes over some of the removed transport of the deep convective parameterization, and this leads to a difference that remains fairly constant between the NCP and the CP throughout the remainder of the two runs. The total transport flux difference one time step after the switch can be considered as a representative measurement of the error in the transport flux as defined in Eq. (7). Similar conclusions were found for the transport of the other species in Eq. (7) (not shown).

15
Form Fig. 4b it can be seen that the resolved condensation in the microphysics parameterization starts to compensate for the absence of convective condensation with a negative jump at 12-h lead time. However, the NCP time series converges to the one of the CP run after 2 hours. So the time series of the condensation flux does not suggest a model error after 2 hours but instead we find that the NCP run is in an adjustment phase of the microphysics at the time step where we compute the model error. In The differences between the fluxes are diagnosed after one time step (red dashed vertical line). this paper we will not investigate this issue in further detail but instead we only the study of the errors for the transport fluxes Compared to water vapour, the error in enthalpy transport ( Fig. 5b) peaks at a higher level (250-350 hPa). This peak is linked to the updraft parameterization, typically entraining warm air at its start and detraining it at higher levels. The reason the maximum transport flux error of water vapour lies at lower levels than that of enthalpy is explained by the condensation. At higher levels condensation removes some of the water vapour from the updraft reducing its total transport. The smaller peak around 950 hPa in Fig. 5b coincides with that of Fig. 5a and is thus also linked to the downdraft transporting colder air down. expect the error to be negative since the missing parameterized updraft transports air with lower momentum to higher levels where detrainment reduces the total momentum. The wind direction, however, is dominantly westward resulting in a missing upward transport of slower, negative momentum and thus a positive error. At lower levels the downdraft transports high momentum eastward (positive) wind downward resulting in a (small) negative momentum flux error at 900-1000 hPa.
The standard deviations, shown in the bottom row of Figure 5, are typically one order of magnitude larger than the mean 10 of the respective errors. The convective transport is thus, besides being responsible for a systematic forcing, also an important source of variability for the tendencies in the middle and upper troposphere. Figure 6 shows the probability density distribution of the flux errors at the level of highest variability (∼ 250 hPa) for water vapour, enthalpy and zonal momentum. All three distributions show a large peak around zero, indicating that in a majority of 15 the grid points no error is present. This peak represents the grid points without any convective activity. The long tail of the distributions explains the large variance seen in Fig. 5. More specifically, the linear trend in the log-scaled figures for positive values, most distinct for water vapour and enthalpy, hints at an exponential distribution of the transport error in grid points with convective activity. The number of grid points having a negative transport flux error is very small (0.01 % and 1.8 % of the total amount of grid points with non-zero error for respectively water vapour and enthalpy).

Vertical correlation
The vertical autocorrelation of the water vapour, enthalpy and zonal wind transport flux errors is shown in Fig 7. The correlation with vertically neighbouring grid cells is high for all three flux errors. The vertical correlation decays fastest for enthalpy, where the correlation drops below 0.2 after a distance of 75 hPa. Vertical correlation becomes negative for all errors for distances larger than 225 hPa. The vertical correlations indicate a clear vertical structure of the transport fluxes. Accounting correctly for this 25 vertical structure is important when developing a stochastic perturbation scheme in Sect 3.

Inter-variable correlation
The transport flux errors of the different variables all originate from the absence of an up-and downdraft. Therefore, their inter-variable correlation is expected to be large. However, the correlation matrix (Table 1) shows only moderate correlation between water vapour, cloud water and zonal wind, while the correlation coefficient of the flux error of meridional wind w.r.t. 30 the other errors is close to zero. This is because, contrary to the other variables, meridional momentum flux error doesn't have a dominant sign. For the same reasons as discussed for the vertical correlation, an appropriate representation of these inter-variable correlations should be a major requirement when building a stochastic perturbation scheme.   Figure 8 shows the correlation between the transport flux and its error for water vapour, enthalpy and zonal wind for all model levels. In general, the relation between a flux and its corresponding error is weak, with correlation coefficients ranging from 5 -0.1 to 0.3. This seems to suggest that a stochastic perturbation scheme based on a simple multiplication of the fluxes with a random number, will ignore part of the actual variability seen in the reference configuration, resulting in an underdispersive ensemble system. 3 Random Perturbations for uncertainty forecasting

Perturbation scheme
When perturbing the transport fluxes during a forecast, the distribution of the perturbations ideally is the same as the distribution of the errors described in the previous section. Additionally, the perturbations should preserve the vertical, horizontal, temporal and inter-variable correlations of the errors. However, how to enforce all of these constraints simultaneously remains an open question, not only here but also in other schemes ). Here, a simple perturbation scheme is proposed 5 that simulates appropriately the distributions (Fig. 6), while also preserving the vertical and inter-variable correlation of the errors. As discussed above, the correct representation of both the vertical and inter-variable correlation of the transport flux perturbations is necessary to keep the perturbed fluxes physically consistent.
The vertical and inter-variable correlation are preserved by organizing the flux-error profiles per grid column in the model error database, i.e. by grouping the vertical error profiles trans ψ for all species ψ = q v , q l , q i , h, u, v at a specific grid point and a specific time always as one multivariate set. During the forecast, the transport fluxes inside a particular grid column are randomly perturbed by sampling a grid column from the model-error sets and subtracting the associated flux errors from the respective total fluxes (Fig. 9). The resulting perturbed tendencies can thus be written as: with all flux-error profiles (ψ = q v , q l , q i , h, u, v) belonging to the same sampled grid column i. Adding the error as a flux in a scheme we will exclude these point from the model-error database. Only grid columns where the convective activity is significant are retained by defining a cut-off updraft mass flux M cut-off and selecting only those grid columns which satisfy: with M u the updraft mass flux, available from the CP configuration, averaged over the vertical column. This allows us to retain only the error at those grid columns where convective activity is relevant. This is achieved by taking a value of M cut-off = -0.5 Pa s −1 .

5
Finally, a proxy for deep convection is needed to determine the grid columns in the NCP run where the transport fluxes should be perturbed. Two different convection criteria are compared. The first criterion (labelled MOCON) uses the vertically integrated moisture convergence in the PBL as a proxy for convective activity (Waldstreicher, 1989;Calas et al., 1998;van Zomeren and van Delden, 2007), while the second configuration (labelled OMEGA) uses the grid column averaged resolved vertical velocity. For both proxies a threshold value, M OCON c and OM EGA c respectively, is determined. During every time 10 step of the integration, when the chosen threshold is exceeded for a given grid column, a sample is drawn from the database as shown in Fig. 9

Results and Discussion
The above described perturbation scheme has been tested with the ALARO model configuration without deep-convection parameterization to see whether a stochastic perturbation can compensate for the error due to the lack of the parameterization.
The runs were carried out for a 10 day period between 11 and 20 April 2009 over the same domain as described in Sect.
2.2. Every day a 10-member 48-hour ensemble forecast is started at 0000 UTC. Tests were carried out both for the hydrostatic and non-hydrostatic setup of the ALARO model. Since conventional wisdom dictates that non-hydrostatic models should be 5 applied at these resolutions we will only show here the results for the non-hydrostatic model configurations. The tests with the hydrostatic formulation (not shown) led to exactly the same conclusions.

No IC and LCB perturbations
The direct impact of the stochastic forcing is studied by running both ensemble configurations MOCON and OMEGA without IC and LBC perturbations and comparing the RMSE of the ensemble mean (w.r.t. the ERA5 HRES analysis) and the RMSE of 10 the deterministic NCP (target) forecast with that of the deterministic CP (reference) forecast.
The impact is largest in the upper atmosphere (Fig. 10). Here the RMSE of the ensemble mean of both ensemble configu-       Figure 11. Same as Fig. 10 but for the 850 hPa level rations is significantly better than the NCP configuration for most of the variables and leadtimes. The OMEGA configuration only performs better than the CP configuration for specific humidity (all leadtimes) and vertical velocity (after 24 h).
This means that perturbing with flux errors indeed compensates the error made by disabling the deep convection parameterization, as was the immediate goal of these perturbations. But even better, part of the error in the reference configuration due to the stochastic nature of deep convection, is correctly captured by the flux-error perturbation scheme.
In the lower atmosphere (Fig. 11) the effect of the stochastic perturbations is smaller. Here, the forcings have no significant 5 impact on the temperature RMSE and also the reduction in specific humidity error is, albeit still significant, smaller. This is explained by looking back at profiles of flux differences in Figs. 5a and 5b. At 850 hPa the mean tendency perturbations caused by the addition of the corrective flux profiles is close to zero for both specific humidity and temperature, while also the standard variation of the corrective fluxes is smaller here than aloft. The situation for cloud condensates is opposite with a small but significant improvement in RMSE at the 250 hPa level and a quite large improvement below at the 850 hPa level. Figure 12 shows the spread created by the stochastic forcing from both perturbation configurations for specific humidity at 250 hPa at different leadtimes. The two ensemble configurations create similar spread both in amplitude and in spatial distribution, with largest spread concentrated in regions with convective activity. While the spread does show some spatial growth during the first 24 hours, the impact of the stochastic perturbations remains limited for regions with no convective activity. Furthermore, the spread created during a convective episode does not seem to continue to grow once the convective episode is 15 over and there is no longer a stochastic contribution to the tendencies. This is seen for instance over the Java Sea between Java and Borneo, where the spread induced after 24 hours has almost disappeared 24 hours later. These results clearly show that the proposed stochastic scheme provides a sensible way to generate perturbations. The effect of the stochastic perturbations is namely twofold. First of all they force the individual members closer to the CP forecast, reducing the error. This is a logical consequence of perturbing the members by sampling the flux differences between target and reference forecast. On top of that, the spread is created at locations consistent with the error of the reference configuration, resulting in an ensemble that is more skillful than the reference CP forecast.

IC and LBC perturbations 5
Next, the interaction of the stochastic forcing with the IC and LBC perturbations is studied. The impact on the ensemble spread at 250 hPa is shown in Figure 13. In general the spread created by the stochastic forcing alone is much smaller (between 40 % and 60 %) than that created by the perturbed initial and boundary conditions. Only for cloud condensates and vertical velocity, the spread induced by stochastic forcing alone approaches the spread generated by the perturbed initial and boundary conditions. The figure also shows that the combined effect of physics and IC/BC perturbations is smaller than the sum of the 10 individual effects. This is in agreement with the findings of Baker et al. (2014) and Gebhardt et al. (2011).
For all variables the spread of the CP ensemble is smaller than that of the NCP ensemble. Comparing the snapshots of the zonal wind spread at 250 hPa of CP and NCP on 0000 UTC 13 April 2009 (+48h) (Fig. 14) shows a similar pattern of large scale variability. The higher domain-averaged spread of the NCP ensemble is (dominantly) caused by increased small scale variability. This small scale variability is linked to regions with excessive updrafts resulting from the absence of a sub-grid stabilizing mechanism. It is however questionable that an increase in spread resulting from a inadequate representation of a certain process is an appropriate way to increase the skill of an ensemble.
The addition of the stochastic perturbations increases the spread (between 4.9 % and 8.6 % w.r.t. the NCP ensemble) for 5 all variables at 250 hPa except specific humidity. Fig. 14c shows that the increased spread is mainly caused by an expansion of regions with moderate spread (e.g. around 95 • E), while regions of high spread now show a smoother pattern thanks to the introduced stabilizing fluxes. This is true especially for the specific humidity (not shown), where a decrease in spread is combined with a large increase in skill (Fig. 15b). Here the stabilizing flux perturbations reduce the (erroneous) spread, while still maintaining enough small scale spread resulting in substantial increase in skill. Considering the spread in Fig. 15, for horizontal wind, one can see that the CP reference ensemble performs worse than the target NCP ensemble after 22 (4) hours for the zonal (meridional) component. This is probably due to the smaller spread of the reference ensemble compared to that of the target ensemble (see Fig. 13) as explained above. Finally, for cloud condensates and 5 vertical velocity the difference between target NCP and reference CP ensemble is small and also the impact of the stochastic forcings is limited, with the CRPS of the MOCON and OMEGA ensemble lying below the CP scores or between the NCP and CP ones, respectively. It seems that the spread created by the initial and boundary condition perturbations partly reduces the improved skill of the CP configuration seen in Fig. 10.
At 850 hPa (Fig. 16), the most notable changes in CRPS scores are found in temperature, zonal wind and vertical velocity.     Figure 15. CRPS of the NCP (black), MOCON (blue) and OMEGA (red) ensemble for temperature (a), specific humidity (b), cloud condensates (c), zonal (d) and meridional (e) wind and vertical velocity (f) at 250 hPa. All ensemble forecasts are performed with perturbations in IC and LBC's. Error bars show the 95 % confidence interval. Leadtimes where the ensemble mean RMSE is significantly lower than the NCP control RMSE at the 95% confidence level are indicated with a filled circle. meridional wind scores between NCP and CP ensemble are small and also the impact of stochastic forcings is mostly neutral.
For the upper troposphere zonal wind, the improvements (9 %) in skill are substantial when compared to other state-of-theart stochastic perturbation schemes. In Ollinaho et al. (2017) improvements by the SPP and SPPT scheme around 2.5 % are reported for zonal wind at 200 hPa. Also for 850 hPa zonal wind CRPS improvements induced by the SPP and SPPT scheme (1.8 %) are comparable to the results presented here (1.7 %). The RP scheme presented in Baker et al. (2014) has a slightly larger impact with improvements around 15 % and 10 % for 1.5-m temperature and humidity respectively. However, one should 5 keep in mind that only perturbations related to deep convection are considered here, while the SPP (RP) scheme perturbs 20 (16) parameters touching turbulent diffusion, convection, cloud processes and radiation.
Finally, the relative change in dispersion of the MOCON and OMEGA ensemble is investigated w.r.t. the NCP ensemble. Table 2 shows the relative difference (as a percentage) in the number of times the ERA5 HRES reanalysis lies inside the range of the ensemble at leadtime 48 h. For all configurations the ensemble is underdispersive, a characteristic common to all single-10 model ensembles (Palmer et al., 2005). In the upper atmosphere (250 hPa) the flux perturbation reduce the under-dispersion for all variables. At this level the increase in number of observations falling within the ensemble is the result of a reduction in the bias of the individual ensemble members (not shown) and an increase in the spread. In the middle (500 hPa) and lower (850 hPa) atmosphere, underdispersion is reduced for water vapour and zonal wind, while the stochastic ensembles are more underdispersive for temperature. The flux perturbations have little influence on the spread in the lower atmosphere (not shown).    Therefore, changes in dispersion are mainly attributed to the positive (negative) shift in ensemble bias for specific humidity and zonal wind (temperature).

Precipitation
In what follows, the results for the OMEGA ensemble are very similar to those of the MOCON ensemble and are omitted for clarity. The ensemble skill for precipitation was evaluated using the CRPS calculated for the 3-hourly averaged precipitation w.r.t. the TRMM observations. The CRPS of all ensembles correlates well with the convective diurnal cycle (not shown). smaller peak at 2100 UTC (0400 local time) (not shown). The peaks in CRPS at leadtimes 9 h, 21 h and 33 h are closely linked to these maxima of precipitation. Investigation of the domain-averaged precipitation amounts (not shown) reveals that the absolute error in domain-averaged precipitation is constant throughout the day, meaning that the diurnal pattern in the CRPS is rather caused by a mismatch of local precipitation regions than by an overall misrepresentation of the convective diurnal cycle. Figure 17 shows the relative change in CRPS w.r.t. the CP ensemble. The impact of the absence of a deep convection 5 parameterization and stochastic forcing on the 3-hourly averaged precipitation is relatively small.
There is a neutral to positive impact in skill for the MOCON ensemble (albeit inside the significance confidence intervals).
This is an improvement compared to Baker et al. (2014) who only found improved skill in precipitation rate during the first two hours with their RP scheme and a worsening afterwards. A significant improvement in CRPS is also seen at lead time 33 h, coinciding with a maximum in convective activity. Unsurprisingly, the convective activity must be high in order for the 10 stochastic forcing to have a significant effect on the precipitation.
In order to differentiate between different precipitation regimes, the Brier skill score (BSS) w.r.t. the CP ensemble for 24 h cumulated precipitation is shown in Fig.18 for different thresholds. A significant difference between the NCP and MOCON ensemble is only seen at low thresholds (1 and 5 mm day −1 ). At these low precipitation thresholds, the perturbations bring the precipitation distribution of the MOCON ensemble closer to that of the reference CP ensemble. Fig. 18 also shows that an ensemble forecast. This result can be attributed to the same syndrome of better scores for the wrong reasons as discussed in Sect. 3.2.2. At larger thresholds no significant difference in BSS between the CP and both NCP and MOCON configurations is found. These results further confirm the suggestion that the (transport-) fluxes from the deep convection scheme have a limited impact on the precipitation amounts.

Time to Solution
Finally, the computing-performance of the different configurations is studied.

24
This paper presents a novel approach for perturbing physics parameterizations of numerical weather prediction systems. The novelty lies in the idea that the perturbations are sampled from a database of model error estimates. In particular, the model errors studied here originate from switching off the parameterization of deep convection, thus degrading the forecast skill. The error is then diagnosed by taking the difference of the contributing physics fluxes of the runs with and without parameterization. Analysis of the error fluxes reveals the importance of the vertical correlation structure, as well as of the inter-variable 5 correlation. Next, it is investigated whether these stochastic flux perturbations can increase the probabilistic forecast skill in an ensemble context in such a way that it compensates for the lack of the parameterization.
The sampling method was implemented in a 10-member ensemble system testbed driven by the members of the ERA5 analysis. From the tests it has been shown that the ensemble system with the stochastic perturbations along with the IC and LBC perturbations compensates for the absence of the parameterization scheme. The results shown here were produced in a controlled testbed, relying on perfect boundary conditions from the ERA5 reanalysis. To be able to optimally exploit the features of this approach within an operational EPS system it would be helpful to address a few issues. 25 For instance, a better understanding of the strong moisture readjustment regime after disabling the deep convection parameterization within the error-diagnostics procedure could be useful. It could be investigated how to estimate the condensation flux differences by protecting the convective condensates from re-evaporation after disabling the deep convection scheme. If a controlled adjustment could be developed then the errors in the condensation and evaporation fluxes might also be taken into account in the stochastic perturbation, potentially leading to a higher probabilistic forecast skill. 30 Combining the vertical correlation and multivariate correlation between the different flux perturbations with adequate spatiotemporal correlations would be useful, possibly relying on principle component analysis methods. In addition, the definition of the conditions determining the convective activity of a grid column and triggering the perturbations might be improved following the suggestions made in Banacos and Schultz (2005).
The creation of the sampling database requires extra implementation efforts and maintenance in an operational context.
Additional research could be useful to model the error statistics and to find some universal error functions that may become candidates for PDFs to draw from for the stochastic sampling. The distributions found in Fig. 6 provide a good starting point for fitting such curves. If the functions are parameterized with a few parameters, the cumbersome computing task for estimating the error statistics may be replaced by a tuning exercise. Also the demanding task of building and maintaining the sampling 5 database would not be necessary.
In Subramanian and Palmer (2017) the work on stochastic tropical convection parameterization done by Khouider et al. (2010), Frenkel et al. (2012) and Deng et al. (2015) is used in an ensemble context and the probabilistic features of the so called ensemble superparameterization are compared to those of a classical SPPT approach. It would be useful to extend their comparison with the approach suggested here. The process-oriented perturbation method, studied in this work, is indicated 10 as highly desirable in Leutbecher et al. (2017) and is complementary to the different representations of model uncertainty presented in the introduction. Appendix A: The ALARO model parameterization schemes The ALARO model, used in this study, uses the "p-TKE"-scheme for the parameterization of the boundary layer mixing.
This scheme complements the method described in Louis (1979) by a prognostic equation for the Total Kinetic Energy (TKE) 20 (Geleyn et al., 2006). Shallow convection is part of the turbulence scheme through a modification of the Richardson number (Ri) as described in Geleyn (1986). The ALARO model can be run with and without parameterized deep convection. In the case of parameterized convection, the scale-aware 3MT-scheme of Gerard et al. (2009) is used. This scheme calculates vertical transport fluxes as well as condensation fluxes which contribute to the gross cloud condensates passed to the microphysical scheme. The microphysical processes are parameterized similar to the scheme described in Lopez (2002), while a statistical 25 sedimentation formalism proposed by Geleyn et al. (2008) is used to calculate the precipitation and sedimentation fluxes.
The coupling between physics and dynamics is handled by an interface based on a flux-conservative formulation developed in Catry et al. (2007) and further generalized in Degrauwe et al. (2016). The interface transforms the fluxes coming from the parameterizations to tendencies using a clean description of the thermodynamic and continuity equations, ensuring the conservation of heat, mass and momentum. Evaluating the impact on the forecast skill of the stochastic forcing in the absence of perturbations in initial and boundary conditions is done on the basis of root mean square error (RMSE) and ensemble spread. Root mean square errors are calculated with respect to ERA5 HRES reanalysis: with x era i,j and x ctrl i,j the respective reanalysis and control (CP and NCP) forecast values of field x at grid point (i,j) and x stoch i,j the 5 ensemble (MOCON and OMEGA) mean of field x at grid point (i,j) and nx and ny the number of grid points in the x and y direction respectively.
The spread of the ensemble created by the stochastic forcing is given by: The impact of introducing the stochastic forcing together with perturbed initial and boundary conditions on the skill is expressed using the CRPS (Brown, 1974;Matheson and Winkler, 1976). The CRPS in point (i,j) measures the distance between the probabilistic forecast ρ of variable x and the reanalysis value x era as 15 CRPS i,j = +∞ −∞ (P (x i,j ) − P era (x i,j )) 2 dx with P (x) and P era (x) cumulative distributions: where f i,j is the probability of a given event occurring in the ensemble forecast and o i,j a binary indicator of the occurrence of the event in the TRMM observations.
Competing interests. The authors declare that they have no conflict of interest.