Non-homogeneous regression (NR) is a frequently used post-processing method for increasing the predictive skill of probabilistic ensemble weather forecasts. To adjust for seasonally varying error characteristics between ensemble forecasts and corresponding observations, different time-adaptive training schemes, including the classical sliding training window, have been developed for non-homogeneous regression. This study compares three such training approaches with the sliding-window approach for post-processing near-surface air temperature forecasts across central Europe. The predictive performance is evaluated conditional on three different groups of stations located in plains, in mountain foreland, and within mountainous terrain, as well as on a specific change in the ensemble forecast system of the European Centre for Medium-Range Weather Forecasts (ECMWF) used as input for the post-processing.

The results show that time-adaptive training schemes using data over multiple years stabilize the temporal evolution of the coefficient estimates, yielding an increased predictive performance for all station types tested compared to the classical sliding-window approach based on the most recent days only. While this may not be surprising under fully stable model conditions, it is shown that “remembering the past” from multiple years of training data is typically also superior to the classical sliding-window approach when the ensemble prediction system is affected by certain model changes. Thus, reducing the variance of the non-homogeneous regression estimates due to increased training data appears to be more important than reducing its bias by adapting rapidly to the most current training data only.

The need for accurate probabilistic weather forecasts is steadily increasing,
because reliable information about the expected uncertainty is crucial for
optimal risk assessment in agriculture and industry or for personal planning
of outdoor activities. Therefore, most forecast centers nowadays issue
probabilistic forecasts based on ensemble prediction systems (EPSs). To
quantify the uncertainty of a specific forecast, an EPS provides a set of
numerical weather predictions using slightly perturbed initial conditions and
different model parameterizations

One of the most frequently used parametric post-processing methods is “ensemble
model output statistics” (EMOS) introduced by

As the error characteristics between the covariates, typically provided by the
EPS, and the observations often show seasonal dependencies and might change
inter-annually over time, different time-adaptive training schemes have been
developed for NR models.

Alternative time-adaptive models are based on historical analogs or
non-parametric approaches. For approaches employing analogs

In addition to the training scheme employed, an important data-specific aspect
which has to be considered in post-processing is that the EPS may change over
time

This paper compares four different, widely used time-adaptive
training schemes proposed in the literature that employ alternative strategies
to account for varying error characteristics in the data. To show a wide
spectrum of possible approaches in a unified setup – rather than finding the
universally best method – we consider typical basic applications of these
training schemes and refrain from more elaborate tuning or combinations. A
case study is shown for post-processed

The structure of the paper is as follows: Sect.

The different training schemes for NR models proposed in the literature try to adapt to various kinds of error sources that can occur in post-processing, both in space and time. In order to provide a unifying view and to fix jargon, we first discuss these different error sources and then introduce the training schemes considered along with the comparison setup employed.

NR models aim to adjust for errors and biases in EPS forecasts but, of course, the NR models can be affected by errors and misspecifications themselves. Therefore, we try to carefully distinguish between the two different models involved with their associated errors, i.e., the numerical weather prediction model underlying the EPS vs. the statistical NR model employed for post-processing.

The skill of the EPS can be quantified in terms of EPS forecast biases and variances, which (i) typically vary for different locations conditional on the surrounding terrain, (ii) often show cyclic seasonal patterns, and (iii) can experience non-seasonal temporal changes, e.g., due to changes in the EPS itself.

In addition to the error sources in the employed EPS, the performance of the statistical post-processing itself will typically also (iv) differ between measurement sites, (v) depend strongly on the amount of training data used, and (vi) depend on whether there are effects that are not accounted for in the NR specification.

Clearly, larger training samples (v) will lead to more reliable predictions when the NR specification (vi) – in terms of response distribution, covariates and corresponding effects, link functions, estimation method, etc. – appropriately captures the error characteristics in the relationship between EPS forecasts and actual observations. However, when these error characteristics differ in space (i and iv) and/or in time (ii and iii), it is not obvious what the best strategy for training the NR is. Extending the training data (v) in space or time will reduce the variance of the NR estimation but might also introduce bias if the NR specification (vi) is not adapted. Thus, this is a classical bias-variance trade-off problem, and we investigate which strategies for dealing with this are most useful in a typical temperature forecasting situation.
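The trade-off described above can be illustrated with a deliberately simplified toy simulation. The sketch below is not the NR model of this study: it only estimates a mean forecast error from a 40 d window, either from one year or pooled over several years. The annual sine-shaped bias, the noise level, the number of years, and the parameter `old_year_shift` (a hypothetical constant offset in all but the most recent year, mimicking a non-seasonal change) are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulated_errors(n_years, window, rng, noise_sd=2.0, old_year_shift=0.0):
    """Forecast errors for the same `window` days of the season in each year.

    Toy assumptions: the bias follows a smooth annual cycle that repeats every
    year, independent Gaussian noise is added, and all years except the most
    recent one carry an extra constant offset `old_year_shift`.
    """
    errors = []
    for year in range(n_years):
        shift = old_year_shift if year < n_years - 1 else 0.0
        for day in range(window):
            seasonal_bias = math.sin(2.0 * math.pi * day / 365.0)
            errors.append(seasonal_bias + shift + rng.gauss(0.0, noise_sd))
    return errors


def estimator_summary(n_years, window=40, n_rep=500, seed=1, old_year_shift=0.0):
    """Mean and standard deviation of the estimated bias over repeated samples."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(
            simulated_errors(n_years, window, rng, old_year_shift=old_year_shift)
        )
        for _ in range(n_rep)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    for label, kwargs in [
        ("40 d, 1 year", {"n_years": 1}),
        ("40 d, 4 years", {"n_years": 4}),
        ("40 d, 4 years, shifted past", {"n_years": 4, "old_year_shift": 0.5}),
    ]:
        mean_est, sd_est = estimator_summary(**kwargs)
        print(f"{label}: mean = {mean_est:.3f}, sd = {sd_est:.3f}")
```

In this toy setting, pooling several years reduces the spread of the estimate, whereas a shift in the older years moves its mean away from the value that is valid for the most recent year. Which of the two effects dominates depends on the assumed magnitudes.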

To fix jargon, we employ the terms “model” and “bias” without further qualifiers when referring to the NR model in post-processing, whereas when referring to the numerical weather prediction model we employ “EPS model” and “EPS bias”. Moreover, we refer to a statistical model whose estimates have small bias and variance as stable.

Non-homogeneous regression as originally introduced by

The regression coefficients

The

In this study, we use a period of 40 d for the

A regularized adaption of the classical

Therefore,

For the

As already pointed out by

This idea has recently been pursued by

In this study, to be comparable to the

If we reformulate the

Cyclic smooth functions belong to the broader model class of generalized
additive models

To account for seasonal variations we only need to fit one single model, here
called the

Overview of time-adaptive training schemes, distinguished by
model specification/estimation and training data selection corresponding
to error sources (vi) and (v), respectively. The basic model specification
refers to Eqs. (

The NR training schemes presented in Sect.

Potential spatial differences (i) and (iv) are handled for all training schemes in the same way: the NR models are estimated separately for each station and subsequently evaluated in groups of terrain types (plain, foreland, alpine). The underlying EPS data – described subsequently – are the same for all NR training schemes and are thus affected by the same seasonal (ii) and non-seasonal changes (iii).

For validation of the training schemes, we consider 2 m temperature ensemble
forecasts and corresponding observations at

Overview of the study area with selected stations classified as
plain, foreland, and alpine station sites. The two highlighted and labeled
stations, Hamburg and Innsbruck, are discussed in detail in
Sect.

As covariates for Eqs. (

This period has been selected in order to investigate the impact of
non-seasonal long-term changes in the EPS model (iii) that are not reflected in
the NR model specifications; i.e., the horizontal resolution of the ECMWF
EPS changed from the previous version (cycle 36r1; 26 January 2010) to the
new version on 8 March 2016 (cycle 41r2). This specific model change was
chosen among various others as it modifies the height of the terrain and, thus,
likely introduces an EPS bias for temperature forecasts directly affecting the
coefficient estimates; other changes such as modified model parameterizations
or improvements in the analysis scheme are expected to have a minor impact on
the post-processing of 2 m temperatures.
It is of specific interest how the

To understand how this affects the different training schemes, we first illustrate
in Fig.

Now Fig.

Data set A. All models are trained and evaluated without being affected by the EPS change.

Data set B. All models start with a training period entirely before the
EPS change but a validation period entirely after the change. However, for the

Data set C. Effects from A and B are mixed so that the

The validation period is 2 years for A and B and 1 year for C. A total number
of 731/730/365 NR models has to be estimated for the three sliding-window
approaches, while only 1/1/1

This section assesses the performance of the different time-adaptive training
schemes. First, the temporal evolutions of the estimated coefficients are shown
for two stations representative of one measurement site in the plains and one
in mountainous terrain. Afterwards, the predictive performance of the training
schemes is evaluated in terms of the CRPS conditional on the three data sets
with and without the change in the horizontal resolution of the EPS (Fig.

Temporal evolution of regression coefficients for the validation
period in data set A for Innsbruck at forecast step

As Fig.

Figure

CRPS skill scores clustered into groups of stations located in the
plain, in the mountain foreland near the Alps, and within mountainous terrain
and for the out-of-sample validation periods according to the different data
sets: data set A without the change in the horizontal resolution of the EPS,
data set B with the EPS change in
between the training and validation data sets, and data set C with the EPS
change within training data (Fig.

In comparison to the other time-adaptive training schemes, the classical

For Hamburg (Fig.

After the illustrative evaluation of the coefficients' temporal evolution for
the different time-adaptive training schemes, Fig.

For data set A, the

For data set B at stations in the plains and foreland, the mean predictive
skill behaves similarly to data set A, except that the

For data set C at stations in the plains and foreland, the predictive skill
is again similar to data set A with slight performance losses. For alpine
stations, the

The validation of the different time-adaptive training schemes shows that the

Non-homogeneous regression (NR) is a widely used method to statistically post-process ensemble weather forecasts. In its original version it was used for temperature forecasts employing a Gaussian response distribution, but over the last decade various statistical model extensions have been proposed, either for other quantities employing different response distributions or to enhance its predictive performance. When estimating NR models, there is always a trade-off between using training data sets large enough to yield stable estimates and still allowing the statistical model to adjust to temporal changes in the error characteristics of the data. Therefore, different training schemes with specific advantages and drawbacks have been developed as presented in this paper. To show a wide spectrum of possible approaches in a unified setup, we consider typical basic applications of the training schemes and refrain from more elaborate tuning or combinations.
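For a Gaussian response distribution, the CRPS used for evaluation in this study has a well-known closed form, which makes comparisons between training schemes cheap to compute. The following minimal sketch shows this closed form together with a generic skill score. The function names and the choice of reference forecast are assumptions made for illustration and are not taken from the code used in this study.

```python
import math


def crps_gaussian(obs, mu, sigma):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) at `obs`.

    Lower values are better; the unit is that of the observation.
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def crps_skill_score(crps_model, crps_reference):
    """Skill score relative to a reference; positive means better than the reference."""
    return 1.0 - crps_model / crps_reference


if __name__ == "__main__":
    # Hypothetical values: one observation and two competing Gaussian forecasts.
    obs = 12.3
    crps_a = crps_gaussian(obs, mu=12.0, sigma=1.0)
    crps_b = crps_gaussian(obs, mu=13.5, sigma=1.0)
    print(f"CRPS A = {crps_a:.3f}, CRPS B = {crps_b:.3f}")
    print(f"Skill of A relative to B = {crps_skill_score(crps_a, crps_b):.3f}")
```

In practice the CRPS would be averaged over all forecast cases of a station before forming the skill score.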

The classical

The differences between the methods presented can be seen in the coefficient
paths shown in Figs.

To conclude, all four training schemes shown in this paper have their
advantages in particular applications. If only short periods of training data
are available (

All computations are performed in R 3.6.1

The supplement related to this article is available online at:

This study is based on the PhD work of MNL under supervision of GJM and AZ. The majority of the work for this study was performed by MNL with the support of RS. All the authors worked closely together in discussing the results and commenting on the manuscript.

Sebastian Lerch is one of the editors of the special issue on “Advances in post-processing and blending of deterministic and ensemble forecasts”. The remaining authors declare that they have no conflict of interest.

This article is part of the special issue “Advances in post-processing and blending of deterministic and ensemble forecasts”. It is not associated with a conference.

We thank the Zentralanstalt für Meteorologie und Geodynamik (ZAMG) for providing access to the data.

This project was partly funded by the Austrian Research Promotion Agency (FFG, grant no. 858537) and by the Austrian Science Fund (FWF, grant no. P31836). Sebastian Lerch gratefully acknowledges support by the Deutsche Forschungsgemeinschaft (DFG) through SFB/TRR 165 “Waves to Weather”.

This paper was edited by Maxime Taillardat and reviewed by two anonymous referees.