Machine learning represents a potential method to cope with the gray zone problem of representing motions in dynamical systems on scales comparable to the model resolution. Here we explore the possibility of using a neural network to directly learn the error caused by unresolved scales. We use a modified shallow water model that includes highly nonlinear processes mimicking atmospheric convection. To create the training dataset, we run the model in a high- and a low-resolution setup and compare the difference after one low-resolution time step, starting from the same initial conditions, thereby obtaining an exact target. The neural network is able to learn a large portion of the difference when evaluated on single-time-step predictions on a validation dataset. When coupled to the low-resolution model, we find substantial forecast improvements for up to 1 d on average. Beyond this, the error accumulated through the neural network's violation of mass conservation starts to dominate and deteriorates the forecast. This deterioration can be delayed effectively by adding a penalty term to the loss function used to train the ANN, enforcing mass conservation in a weak sense. This study reinforces the need to include physical constraints in neural network parameterizations.

Current limitations on computational power force weather and climate prediction to use relatively low-resolution simulations. Subgrid-scale processes, i.e., processes that are not resolved by the model grid, are typically represented using physical parameterizations

A particularly difficult problem in the representation of unresolved processes is the so-called gray zone

Viewing the atmosphere as a turbulent flow, with up- and downscale cascades, phenomena like synoptic cyclones and cumulus clouds emerge where geometric or physical constraints impose length scales on the flow

An important example of the gray zone in practice is the simulation of deep convective clouds in kilometer-scale models used operationally for regional weather prediction. The models typically have a horizontal resolution of 2–4 km, which is not sufficient to fully resolve the cumulus clouds with sizes in the range from 1 to 10 km. In these models, the simulated cumulus clouds collapse to a scale proportional to the model grid length, unrealistically becoming smaller and more intense as the resolution is increased

Using machine learning (ML) methods such as artificial neural networks (ANNs) for alleviating the problems described above has received increasing attention over the past few years. One approach is to avoid the need for parameterizations altogether by emulating the entire model using observations

Though studies have shown that surrogate models produced by machine learning can be accurate for small dynamical systems, replacing an entire numerical weather prediction model for operational use is not yet within reach. A more practical approach is therefore to use ANNs as replacements for uncertain parameterizations. This has been done either by learning from expensive physics-based parameterization schemes

Here, we use the modified rotating shallow water (modRSW) model to explore the use of a machine learning subgrid representation in a highly nonlinear dynamical system. The modRSW is an idealized fluid model of convective-scale numerical weather prediction in which convection is triggered by orography. As such, the model mimics the gray zone problem of operational kilometer-scale models. Using a simplified model allows us to focus on some key conceptual questions surrounding machine learning parameterizations, such as how choices in the neural network training affect long-term physical consistency. In particular, we include weak physical constraints in the training procedure.

The contents of this work are outlined in the following.
Section

The modRSW model

A 24 h segment of the HR simulation for the three model variables

Conceptually, the ANN's task is to correct a low-resolution (LR) model forecast towards the model truth, which is a coarse-grained high-resolution (HR) model simulation. The coarse-graining factor in this study is set to 4, which is analogous to the range of scales found in the gray zone where deep cumulus convection is partially resolved (e.g., 2.5–10 km).
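As a minimal sketch of this step, coarse-graining by a factor of 4 can be implemented as a block average over the HR grid; block averaging is one common choice of operator and is assumed here for illustration:

```python
import numpy as np

def coarse_grain(field_hr, factor=4):
    """Block-average a 1-D high-resolution field onto a grid
    that is `factor` times coarser."""
    n = field_hr.size
    assert n % factor == 0, "HR grid must be divisible by the factor"
    return field_hr.reshape(n // factor, factor).mean(axis=1)

# Example: 16 HR grid points collapse to 4 LR grid points
hr = np.arange(16.0)
lr = coarse_grain(hr, factor=4)  # -> [1.5, 5.5, 9.5, 13.5]
```

Each LR value is the mean over a block of 4 HR points, so structures narrower than the block are removed, exactly the scales the ANN must account for.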

Schematic of the training data generation process. A HR run is coarse grained to LR to generate the model truth. Each model truth state is integrated forward for one time step using LR dynamics. The difference between the obtained states and corresponding model truth defines the desired network output (red arrows), while the preceding model truth defines the network input.
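The data-generation process in the schematic can be sketched as follows. Here `step_lr`, a function advancing a state by one LR time step, is a hypothetical stand-in for the LR dynamics, and the sign convention is chosen so that the target is the correction to be added to the LR forecast:

```python
import numpy as np

def make_training_pairs(truth, step_lr):
    """Build (input, target) pairs from a model-truth trajectory.

    truth   : array of shape (n_times, n_grid), coarse-grained HR states
    step_lr : function advancing one state by one LR time step
              (hypothetical stand-in for the LR dynamics)
    """
    inputs, targets = [], []
    for t in range(truth.shape[0] - 1):
        forecast = step_lr(truth[t])              # one LR step from the truth
        inputs.append(truth[t])                   # input: preceding model truth
        targets.append(truth[t + 1] - forecast)   # target: the model error
    return np.array(inputs), np.array(targets)
```

Because every pair starts from the model truth, the target isolates the error of a single LR step rather than an accumulated trajectory error.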

A training sample (input target pair) is defined by the model truth at some time

Model setting parameters.

A characteristic property of convolutional ANNs is that they encode spatial invariance and localization. These two properties also apply to the dynamics of many physical systems, such as the one investigated here. Convolutional networks differ from, e.g., dense networks by the use of a so-called kernel. This weight vector is shifted across the domain grid, covering

The ANN structure used in this research is described in the following. The network has five hidden layers, each using the ReLU activation function. The input layer also uses ReLU, while the output layer uses a linear activation function. All hidden layers have 32 filters. The input and output layer shapes are defined by the input and target data. The kernel size is set uniformly to three grid points.
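A minimal numpy sketch of the described architecture follows, with random, untrained weights. The choice of three input and output channels, standing for the model's state variables, and the periodic boundary handling are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_periodic(x, w, b):
    """1-D convolution, kernel size 3, periodic boundaries.
    x: (n_grid, c_in), w: (3, c_in, c_out), b: (c_out,)."""
    xp = np.concatenate([x[-1:], x, x[:1]], axis=0)  # wrap-around padding
    out = np.stack([np.tensordot(xp[i:i + 3], w, axes=([0, 1], [0, 1]))
                    for i in range(x.shape[0])])
    return out + b

def init_params(c_in=3, c_out=3, filters=32, hidden=5, k=3):
    """Random weights: an input conv layer plus `hidden` conv layers
    (all ReLU), followed by a linear conv output layer."""
    widths = [c_in] + [filters] * (hidden + 1)
    params = [(0.1 * rng.standard_normal((k, a, c)), np.zeros(c))
              for a, c in zip(widths[:-1], widths[1:])]
    params.append((0.1 * rng.standard_normal((k, filters, c_out)),
                   np.zeros(c_out)))
    return params

def forward(x, params):
    """Forward pass: ReLU after every layer except the linear output."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(conv1d_periodic(h, w, b), 0.0)
    w, b = params[-1]
    return conv1d_periodic(h, w, b)

# Example: 64 LR grid points, 3 state variables in and out
x = rng.standard_normal((64, 3))
y = forward(x, init_params())  # correction field, shape (64, 3)
```

Because every layer is a periodic convolution, the network is shift-equivariant: shifting the input field along the grid shifts the output by the same amount, which is the spatial-invariance property discussed above.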

The loss is determined during training by comparing the ANN output to the corresponding target. A standard loss measure is the mean squared error (MSE), but the loss function can be tailored to the application. For example, additional terms can be added to impose weak constraints on the training process, as done in
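As an illustrative sketch of such a penalized loss (not the exact formula used in the paper), a mass-conservation term can be added to the MSE by penalizing the squared spatial mean of the predicted depth correction, since any net correction to the fluid depth changes the total mass; the weight `lam` is a hypothetical tuning parameter:

```python
import numpy as np

def penalized_loss(pred, target, lam=1.0):
    """MSE plus a weak mass-conservation penalty.

    pred, target : arrays of shape (batch, n_grid) holding the
                   fluid-depth correction (illustrative; the full model
                   has three variables, of which depth carries the mass)
    lam          : penalty weight (hypothetical tuning parameter)
    """
    mse = np.mean((pred - target) ** 2)
    # Net mass added by the correction = spatial mean of the depth increment;
    # penalizing its square enforces conservation in a weak sense.
    mass_violation = np.mean(pred, axis=1)
    return mse + lam * np.mean(mass_violation ** 2)
```

A correction field with zero spatial mean incurs no penalty, so the constraint only discourages net mass changes rather than restricting the shape of the correction.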

The Adam algorithm, with a learning rate of

Loss function value for

Mean (bars) and standard deviations

As the initial training weights of the ANNs and the exact number of epochs performed are, to some extent, arbitrary, it is desirable to measure the sensitivity of our results to the realization of these quantities. Figure

In the following, the main scores that are used to verify the efficacy of the ANNs are the root mean squared error (RMSE), spatial mean error (SME), and bias:
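These scores can be computed as follows; the bias is taken here as the SME averaged over a set of forecasts, a plausible stand-in for the exact definition given in the text:

```python
import numpy as np

def rmse(forecast, truth):
    """Root mean squared error over the spatial grid."""
    return np.sqrt(np.mean((forecast - truth) ** 2))

def spatial_mean_error(forecast, truth):
    """Spatial mean error (SME): domain average of the error field."""
    return np.mean(forecast - truth)

def bias(forecasts, truths):
    """Bias: SME averaged over a set of forecasts (assumed definition)."""
    return np.mean([spatial_mean_error(f, t)
                    for f, t in zip(forecasts, truths)])
```

The RMSE measures the overall error magnitude, while the SME and bias isolate systematic offsets such as those produced by an accumulated mass error.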

We performed a series of experiments designed to investigate the feasibility of using an ANN to correct for model error due to unresolved scales. In Sect.

Figure

The

Same as Fig.

Next we examine the effect of the ANN on a 48 h forecast. Here we compare the LR simulation with (

For both orographies, the ANN has a clear positive effect on the forecast until the error of LR saturates, after which the error of

It is not surprising that

The

Same as Fig.

Instead of including mass conservation in the training process of the ANN, it is natural to first try to correct the mass violation by post-processing the ANN corrections. We tested two approaches, namely homogeneously subtracting the spatial mean of the
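The first of these post-processing options can be sketched as follows, applied here to the depth increment alone, which carries the mass:

```python
import numpy as np

def remove_net_mass(correction):
    """Post-process an ANN depth correction so that it adds no net mass,
    by homogeneously subtracting its spatial mean."""
    return correction - np.mean(correction)
```

The resulting correction has exactly zero spatial mean, so the total fluid mass is unchanged, although the pointwise correction is shifted uniformly across the domain.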

Figures

The

Same as Fig.

Correlation of the different weighting (blue colors) between the bias of

Figure

Evolution of

Next we look at the variability in the forecast errors in terms of

Snapshot of the state variables for the chaotic case of a 6 h forecast starting from initial conditions of the validation dataset for the truth (red), LR (black), and

A visual examination of the animations of the forecast evolution suggests that convective events produced in the LR run are wider and shallower than in the coarse-grained HR run. This behavior mimics the collapse of convective clouds towards the grid length that is typical of kilometer-scale numerical weather prediction models, as noted in the introduction. This then leads to a lack of rain mass but also, via conservation of momentum, a drift in the wind field. The convective events in the LR simulations are therefore also increasingly misplaced as the forecast lead time increases. The ANNs are capable of sharpening the gradients of the convective events, leading to highly accurate forecasts of convective events up to 6–12 h. After this, spurious, missed, and misplaced events start to occur, although the forecast skill remains significant for a while longer, in contrast to the LR simulations, where the forecast skill dissolves after just a few hours. A snapshot of the state for the chaotic case is presented in Fig.

In this paper, we evaluated the feasibility of using an ANN to correct for model error in the gray zone, where important features occur on scales comparable to the model resolution. The model used in our idealized setup mimics key aspects of convection, such as conditional instability triggered by orography and the resulting convective events, including rain. As such, this model is representative of fully complex convective-scale numerical weather prediction models and, in particular, of the corresponding errors due to unresolved scales in the gray zone. We considered two cases, each with a different realization of the orography, leading to two different regimes: one where the convective events are large and long-lived, which we refer to as regular, and one where they are small and short-lived, which we refer to as chaotic. We showed that the ANNs are capable of accurately sharpening gradients where necessary in both cases, preventing the missing and flattening of convective events caused by the low-resolution model's inability to resolve fine scales. For the regular case, the RMSE is still significantly lower than that of the low-resolution simulation (LR) after 48 h. For the chaotic case, the RMSE surpasses that of LR after about 1 d. Since the ANNs are not perfect, their errors accumulate over time, deteriorating the forecast skill. In particular, the accumulated mass error causes biases that are not present in LR because the model conserves mass exactly. We therefore investigated the effect of adding a term to the loss function of the ANN's training process to penalize violations of mass conservation. We found that reducing the mass error reduces the biases in the wind and rain fields, leading to further forecast improvements. For the chaotic case, an additional 15 h of forecast lead time is gained before the RMSE exceeds that of the LR control simulation, and at least 30 h for the regular case.
Such a positive effect of mass conservation was also found in, for example,

While these results are encouraging, there are some issues to consider when applying this method to operational configurations. On a technical level, generating the training data and training the ANN can be costly and time-consuming due to the requirement of sufficient HR data and the cumbersome exercise of tuning the ANN. The latter is a known problem that can be mitigated through the clever iteration of tested ANN settings, but it cannot be fully avoided. Depending on the costs of generating HR data, using observations could be considered instead, as done by

The provided source code (

RK produced the source code. RK and YR ran the experiments and visualized the results. SR provided expertise on neural networks. GC provided expertise on convective-scale dynamics. All authors contributed to the scientific design of the study, the analysis of the numerical results, and the writing of the paper.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research has been supported by the German Research Foundation (DFG; subproject B6 of the Transregional Collaborative Research Project SFB/TRR 165, “Waves to Weather” and grant no. JA 1077/4-1).

This paper was edited by Stéphane Vannitsem and reviewed by Julien Brajard and Davide Faranda.