Physical numerical weather prediction models have biases and miscalibrations that can depend on the weather situation, which makes it difficult to post-process them effectively using the traditional model output statistics (MOS) framework based on parametric regression models. Consequently, much recent work has focused on flexible machine learning methods that can take additional weather-related predictors into account during post-processing, beyond only the forecast of the variable of interest. Some of these methods have achieved impressive results, but they typically require significantly more training data than traditional MOS and are less straightforward to implement and interpret.

We propose MOS random forests, a new post-processing method that avoids these problems by fusing traditional MOS with a powerful machine learning method called random forests to estimate weather-adapted MOS coefficients from a set of predictors. Since the assumed parametric base model contains valuable prior knowledge, much smaller training data sizes are required to obtain skillful forecasts, and model results are easy to interpret. MOS random forests are straightforward to implement and typically work well even with little or no hyperparameter tuning. For the difficult task of post-processing daily precipitation sums in complex terrain, they outperform reference machine learning methods at most of the stations considered. Additionally, the method is highly robust to changes in training data size and works well even when fewer than 100 observations are available for training.

Although physically based numerical weather predictions (NWPs) have improved significantly in recent decades

Post-processing with MOS or EMOS is intuitive and can work well but requires a dataset that is both sufficiently large to allow for stable estimation of model coefficients and homogeneous enough for a single model with constant coefficients to work well. This means that the numerical weather model which is to be post-processed must have relatively constant systematic biases and miscalibrations. In order to obtain such a homogeneous dataset, it is standard practice to estimate separate MOSs for different atmospheric quantities, locations, and lead times. Seasonal changes in predictability can be accounted for using time-adaptive MOSs that employ sliding-window training schemes
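Such a sliding-window scheme can be sketched as follows. This is a minimal synthetic illustration, not the exact setup used in this paper: a simple linear MOS is re-estimated each day from only the most recent forecast-observation pairs, so that slowly varying seasonal biases are tracked automatically.

```python
import numpy as np

def sliding_window_mos(fcst, obs, window=30):
    # Re-estimate the coefficients of a simple linear MOS
    # (obs ~ a + b * fcst) each day from only the most recent
    # `window` forecast-observation pairs.
    coefs = []
    for t in range(window, len(obs)):
        sl = slice(t - window, t)
        b, a = np.polyfit(fcst[sl], obs[sl], 1)  # slope, intercept
        coefs.append((a, b))
    return np.array(coefs)

# Synthetic data: 120 days of forecasts with a constant additive bias.
rng = np.random.default_rng(4)
fcst = rng.normal(10, 3, 120)
obs = 2.0 + 1.0 * fcst + rng.normal(0, 1, 120)
coefs = sliding_window_mos(fcst, obs, window=30)
print(coefs.shape)  # one (intercept, slope) pair per day after the warm-up window
```

The window length trades adaptivity against estimation variance: short windows follow seasonal changes quickly but yield noisier coefficients.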

Weather-adaptive post-processing – i.e., allowing biases and miscalibrations of the NWP model to depend on the weather situation – is necessary to obtain optimal forecast performance but is made complicated by the large number of potentially relevant atmospheric variables whose interactions are unknown or poorly understood. It is possible to include such additional predictors in a MOS model by using selection procedures based on expert knowledge

Machine learning (ML) methods have become increasingly popular post-processing tools in recent years because they are well suited to dealing with this high-dimensional predictor space

MOS random forests (MOS forests for short) fuse traditional and ML-based post-processing by first assuming an appropriate parametric MOS model and then adapting its coefficients to the weather situation at hand using random forests. The split variables and corresponding split points in the individual trees of a MOS forest are not selected based on properties of the response variable directly (e.g., their mean, quantiles, or other parameters), as done in quantile forests or distributional forests. Instead, the splits are chosen based on changes in the MOS coefficients of the assumed model, which may reflect either changes in the marginal distribution of the response (e.g., captured by intercepts) or changes in the dependence on the model outputs (e.g., captured by slopes). The predictor space is thus partitioned to ensure homogeneity with respect to the MOS coefficients, meaning that a single model with constant coefficients can be assumed to work well in each corresponding subsample of the data. In order to decrease variance and to allow for smooth dependencies, a MOS forest combines the partitions from many different MOS trees grown using bootstrapped or subsampled data
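The split-selection principle can be illustrated with a deliberately simplified sketch. Instead of the score-based tests used in actual MOS trees, this toy version exhaustively searches for the split point of a single hypothetical weather variable `z` that maximizes the joint Gaussian log-likelihood of two separately refitted linear MOS models; all data and variable names here are synthetic.

```python
import numpy as np

def profile_loglik(y, yhat):
    # Gaussian log-likelihood with the variance profiled out (MLE plug-in).
    n = len(y)
    sigma2 = np.mean((y - yhat) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def node_loglik(nwp, obs):
    # Fit a linear MOS (obs ~ a + b * nwp) and return its log-likelihood.
    coef = np.polyfit(nwp, obs, 1)
    return profile_loglik(obs, np.polyval(coef, nwp))

def best_split(nwp, obs, z, min_size=10):
    # Exhaustively search split points of the weather variable z; keep the
    # split whose two separately fitted MOS models maximize the summed
    # log-likelihood, i.e., capture the most coefficient heterogeneity.
    order = np.argsort(z)
    nwp, obs, z = nwp[order], obs[order], z[order]
    best_point, best_ll = None, -np.inf
    for i in range(min_size, len(z) - min_size):
        ll = node_loglik(nwp[:i], obs[:i]) + node_loglik(nwp[i:], obs[i:])
        if ll > best_ll:
            best_point, best_ll = (z[i - 1] + z[i]) / 2, ll
    return best_point

# Synthetic regime-dependent bias: the NWP is 2 degrees too warm in one
# regime of the hypothetical variable z and 2 degrees too cold in the other.
rng = np.random.default_rng(1)
nwp = rng.normal(10, 3, 400)
z = rng.uniform(0, 1, 400)
obs = nwp + np.where(z < 0.5, -2.0, 2.0) + rng.normal(0, 0.5, 400)
print(best_split(nwp, obs, z))  # close to the true change point 0.5
```

Note that the split is chosen because the MOS *coefficients* (here, the intercepts) differ between regimes, not because the response itself differs, which is exactly what distinguishes MOS trees from response-based trees.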

A detailed description of MOS forests can be found in Sect.

MOS forests adapt the regression coefficients of an assumed (non-adaptive) base MOS to some set of additional atmospheric variables that characterize the current weather situation. Thus, it is first necessary to choose a suitable base MOS for the specific post-processing task at hand (Sect.

The goal of MOS is to improve upon the quality of physical NWP models by identifying and correcting their systematic errors using regression models trained on historical observations and corresponding predictions

In the simplest case – with a single (deterministic) forecast for an atmospheric quantity and forecast errors that may be assumed to be Gaussian – systematic biases in the NWPs can be identified using a classical linear regression. A classical example is to regress observed temperatures

This simple post-processing model not only allows biases in the NWP to be corrected but also implicitly estimates the uncertainty of the post-processed forecast. Namely, if
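A minimal sketch of this simplest case (synthetic temperature data, hypothetical coefficients): ordinary least squares corrects the systematic bias, and the residual standard deviation provides the implicit uncertainty of the post-processed Gaussian forecast.

```python
import numpy as np

# Synthetic temperature example: observations differ from the NWP
# forecasts by a constant offset and a scaling of the signal.
rng = np.random.default_rng(0)
fcst = rng.normal(15, 5, 500)                   # deterministic NWP forecasts
obs = 1.5 + 0.9 * fcst + rng.normal(0, 2, 500)  # corresponding observations

# Classical linear-regression MOS: obs ~ a + b * fcst.
b, a = np.polyfit(fcst, obs, 1)
sigma = (obs - (a + b * fcst)).std(ddof=2)  # residual sd = implicit uncertainty

# The post-processed forecast for a new NWP value of 20 degrees is a
# full predictive distribution N(a + b * 20, sigma^2), not just a number.
mu = a + b * 20.0
print(f"N({mu:.1f}, {sigma:.2f}^2)")
```
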

In general, though, weather forecasts do not have constant uncertainty, and many atmospheric variables do not follow Gaussian distributions, even conditionally. To allow for more flexibility in post-processing, modern implementations of MOS therefore often employ distributional regressions

Typically, coefficients of distributional regression models are estimated by maximizing the log-likelihood
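A small sketch of this estimation step, assuming (purely for illustration) a Gaussian response whose location is linear in the ensemble mean and whose log-scale is linear in the ensemble spread; the coefficients are obtained by numerically minimizing the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic ensemble summaries: mean m and spread s for 1000 forecast cases.
rng = np.random.default_rng(2)
m = rng.normal(10, 3, 1000)
s = rng.uniform(0.5, 2.0, 1000)
y = 1.0 + 0.8 * m + rng.normal(0, 0.5 + 1.0 * s)

def negloglik(theta):
    # Gaussian distributional regression: the location is linear in the
    # ensemble mean, and the log-scale is linear in the ensemble spread.
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * m
    sigma = np.exp(g0 + g1 * s)  # log link keeps the scale positive
    return np.sum(np.log(sigma) + 0.5 * ((y - mu) / sigma) ** 2)

res = minimize(negloglik, x0=np.zeros(4), method="BFGS")
b0, b1, g0, g1 = res.x
print(round(b1, 2), g1 > 0)  # location slope near 0.8; scale grows with spread
```

The log link on the scale is a standard device to keep the optimization unconstrained while guaranteeing a positive standard deviation.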

In the subsequent sections, we therefore assume that the base MOS for

In order to adapt the coefficients of the base MOS chosen in Sect.

MOS coefficients

Scores with respect to each coefficient are again computed at all observations (Eq.

Once the splitting variable

The three steps described above split a dataset of size

Coefficients

Individual MOS trees grown according to Sect.

Given a MOS forest with

By using partitions from many different trees to estimate the weather-adapted MOS, model coefficients are not restricted to a discrete number of unique values at most equal to the number of terminal nodes (as can be seen with estimates for

The MOS coefficients

Using neighborhood weights as described above is commonplace in forests that contain more complex models rather than just a single scalar value in the terminal nodes
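As a toy illustration of such neighborhood weights (all numbers invented for the example), each training observation is weighted by how often it shares a terminal node with the new point across the trees, normalized by the size of that node in each tree.

```python
import numpy as np

def forest_weights(leaf_ids, new_leaves):
    # leaf_ids:   (n_train, n_trees) terminal-node id of each training
    #             observation in each tree
    # new_leaves: (n_trees,) terminal node of the new point in each tree
    # An observation's weight averages, over trees, the indicator of
    # sharing the new point's leaf, normalized by that leaf's size.
    n, T = leaf_ids.shape
    w = np.zeros(n)
    for t in range(T):
        same = leaf_ids[:, t] == new_leaves[t]
        w[same] += 1.0 / same.sum()
    return w / T

# Toy forest: 6 training observations, 2 trees.
leaf_ids = np.array([[1, 1],
                     [1, 2],
                     [2, 2],
                     [2, 2],
                     [2, 1],
                     [1, 1]])
w = forest_weights(leaf_ids, np.array([1, 2]))
print(w, w.sum())  # weights sum to 1
```

These weights would then multiply the per-observation log-likelihood contributions when re-estimating the MOS coefficients for the weather situation of the new point.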

The MOS forests described in Sect.

The

There are 80 different predictor variables derived from the GEFS that can be used for post-processing. These include the direct predictor of the observation: the mean of the ensemble forecast of total (24 h) precipitation between

The ensemble forecasts described in Sect.

Overview of methods used to post-process precipitation forecasts from the

To deal with the fact that precipitation sums are strictly non-negative, we follow
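One common choice in this setting, assumed here purely for illustration rather than taken from this paper, is a Gaussian distribution left-censored at zero for (possibly power-transformed) precipitation; its log-likelihood mixes a point mass for dry observations with a density for wet ones.

```python
import numpy as np
from scipy.stats import norm

def censored_gaussian_negloglik(mu, sigma, y):
    # Gaussian left-censored at zero: dry days (y == 0) contribute the
    # probability mass below zero, wet days the Gaussian density.
    y = np.asarray(y, dtype=float)
    ll = np.where(y <= 0,
                  norm.logcdf(0.0, loc=mu, scale=sigma),
                  norm.logpdf(y, loc=mu, scale=sigma))
    return -ll.sum()

# Hypothetical transformed precipitation with several dry (zero) days:
y = np.array([0.0, 0.0, 0.4, 1.2, 0.0, 2.1])
print(round(censored_gaussian_negloglik(mu=0.5, sigma=1.0, y=y), 2))  # ≈ 7.81
```

The censoring makes the probability of a dry day an explicit part of the model, so no separate occurrence model is needed.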

The prespecified base MOS

MOS forests are able to flexibly model MOS coefficients

Distributional forests

Both MOS forests and distributional forests require specifying a parametric response distribution a priori. Since this assumption may not always hold (even conditionally), a fully non-parametric method called quantile regression forests

All three methods described above incorporate additional predictors using forest-based algorithms to allow for weather-adaptive post-processing. In order to quantify the benefit that comes with this added model flexibility, a simple fully parametric non-adaptive EMOS is also considered:

To illustrate how post-processing with MOS forests works in practice, first a single MOS tree is grown at the station of Axams

A MOS tree for Axams is grown from the first 24 years of data and is visualized in Fig.

MOS models for each terminal node (i.e., distinct weather situation) are visualized in Fig.

A single MOS tree estimated for Axams. Ellipses represent nodes used for splitting and contain the name of the splitting variable along with the

MOS forests are compared to the reference methods described in Sect.

Scatterplots of observations versus ensemble mean forecasts in each terminal node of Fig.

The Axams data are randomly split into seven disjoint folds that each contain observations and NWPs from 4 different years. MOS forests and the reference post-processing methods outlined in Sect.

Solid lines are out-of-sample predictions for the location (

To investigate predictive performance at all 95 stations, all models are trained on the first 24 years of data (1985–2008), and out-of-sample predictions are made for the last 4 years (2009–2012).

CRPS skill scores relative to EMOS are computed for each method at each station and are visualized by boxplots in Fig.
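For reference, the CRPS skill score (CRPSS) of a method relative to EMOS can be sketched as follows; the per-day CRPS values are invented for the example.

```python
import numpy as np

def crpss(crps_method, crps_reference):
    # CRPS skill score relative to a reference forecast (here, EMOS):
    # positive values mean the method improves on the reference,
    # zero means no improvement, and 1 would be a perfect forecast.
    return 1.0 - np.mean(crps_method) / np.mean(crps_reference)

# Hypothetical per-day CRPS values at one station:
crps_mosforest = np.array([1.1, 0.8, 2.0, 0.5])
crps_emos = np.array([1.3, 1.0, 2.4, 0.8])
print(round(crpss(crps_mosforest, crps_emos), 3))  # → 0.2
```
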

Regional differences in model performance can be seen in the map of Fig.

Overall, probabilistic forecasts obtained from the MOS forests not only have a better CRPS than those obtained from the other two methods but are also more statistically consistent with observations (i.e., calibrated). Calibration across all stations is visualized by probability integral transform (PIT) histograms for MOS forests and distributional forests and with a rank histogram for the quantile regression forests (Fig.

Map showing the post-processing method that performs best at each station. Three different circle sizes (small, medium, large) are used to indicate where the CRPSS with respect to the second-best method is less than 0.2, between 0.2 and 0.4, and more than 0.4, respectively. Terrain elevation is indicated by background color.

Probability integral transform (PIT) histograms for MOS forests and distributional forests and rank histogram for quantile regression forests across all stations for the time period 2009–2012. Dashed red lines are the 95 % confidence intervals for a uniform distribution.
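The PIT values underlying such histograms can be sketched as follows: each verifying observation is plugged into its own predictive CDF, and a calibrated forecast yields approximately uniform values (synthetic Gaussian example, not the data used in the paper).

```python
import numpy as np
from scipy.stats import norm

# Synthetic check: observations drawn from their own Gaussian predictive
# distributions yield uniform PIT values, i.e., a flat histogram.
rng = np.random.default_rng(3)
mu = rng.normal(0, 1, 5000)             # predicted locations
obs = rng.normal(mu, 1.0)               # verifying observations
pit = norm.cdf(obs, loc=mu, scale=1.0)  # predictive CDF at the observation
hist, _ = np.histogram(pit, bins=10, range=(0, 1))
print(hist / len(pit))  # all bins close to 0.1
```

Deviations from uniformity diagnose miscalibration: a U shape indicates underdispersion, a hump shape overdispersion, and a slope a systematic bias.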

The methods compared above use 24 years of data for model training, but since such large datasets are not always available in post-processing – e.g., for newly erected observational sites – the hold-out evaluations for all stations in Sect.

As for the hold-out evaluation of all stations in Fig.

When compared to state-of-the-art weather-adaptive post-processing methods, MOS forests have the main advantage of being highly robust: they reliably outperform simple non-adaptive reference methods even when trained on very small sample sizes. This is possible because, unlike state-of-the-art weather-adaptive methods that treat all predictors equally and use a data-driven approach to learn their relationships to the response, MOS forests directly incorporate prior (physically based) knowledge about the most important relationships in the form of a parametric model. One might think that robustness is not important in our current big-data era, but consider the fact that NWP models are continuously updated (e.g., with improved resolutions or parameterizations), and new stations (or measurement instruments) can always be installed. In the words of

In the application considered here, MOS forests are used to post-process NWP ensembles, and separate models are estimated for each station. Without any modifications, MOS forests also offer a powerful way to obtain probabilistic forecasts from deterministic NWPs, where no predictors explicitly characterizing the forecast uncertainty are available. Similarly, MOS forests could also be employed as spatial (rather than station-wise) post-processing models by including predictors that contain information about the individual grid points or stations within the training data. Potentially relevant variables would then include latitude, longitude, and altitude but also surface roughness, land cover type, or other characteristics.

Despite their many advantages, MOS forests require specifying the same two things as all other MOS models: (i) a parametric distribution for the response and (ii) models linking the parameters of that distribution with appropriate predictors derived from the NWP. Not much can be done about the first point besides trying different response distributions or transformations of the data. As for the second point, in cases where no suitable models for the distributional parameters can be specified a priori, MOS forests have no advantage over distributional forests. In fact, MOS forests collapse to distributional forests if the assumed base MOS has intercept-only models for the parameters of the response distribution.

Since NWPs have errors that can depend on the weather situation, weather-adaptive post-processing methods are necessary to obtain optimal probabilistic forecasts. By fusing traditional (non-adaptive) and modern (weather-adaptive) post-processing approaches, MOS forests retain the best of both worlds: a method that is flexible enough to allow for weather-adaptive post-processing but that is also robust, intuitive, and straightforward to implement. This is achieved by using random forests to adapt the regression coefficients of a prespecified parametric base MOS to a set of additional predictor variables that characterize the current weather situation. In contrast to state-of-the-art post-processing methods, which typically directly estimate properties of the response from these predictors, MOS forests only use them to estimate the regression coefficients of the assumed base model. As a result, they can generate skillful forecasts even when only a very limited amount of data are available for training and when purely data-driven weather-adaptive methods fail to outperform a simple non-adaptive model.

Code with wrapper functions for training and evaluating post-processing models on the RainTyrol dataset can be found at

The RainTyrol dataset used for training and evaluating the post-processing models is available at

TM, GJM, AZ, and TS planned the research. TM wrote the original paper draft, and all the authors subsequently reviewed and revised it.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

The authors thank the two anonymous reviewers for their helpful comments.

This research has been supported by the Austrian Science Fund (grant no. P 31836). Thomas Muschinski was also supported by the Doktoratsstipendium of Universität Innsbruck. The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).

This paper was edited by Takemasa Miyoshi and reviewed by two anonymous referees.