The success of ensemble data assimilation systems depends substantially on localization, which is required to mitigate sampling errors caused by modeling background error covariances with undersized ensembles. However, finding an optimal localization is highly challenging, as covariances, sampling errors, and appropriate localization depend on various factors. Our study investigates vertical localization based on a unique convection-permitting 1000-member ensemble simulation; 1000-member ensemble correlations serve as truth for examining vertical correlations and their sampling error. We discuss requirements for vertical localization by deriving an empirical optimal localization (EOL) that minimizes the sampling error in 40-member subsample correlations with respect to the 1000-member reference. Our analysis covers temperature, specific humidity, and wind correlations on various pressure levels. Results suggest that vertical localization should depend on several aspects, such as the respective variable, vertical level, or correlation type (self- or cross-correlations). Comparing the EOL with common distance-dependent localization approaches highlights that there is substantial room for improvement in finding suitable localization functions. Furthermore, we examine approaches for achieving positive semi-definiteness for covariance localization that hardly affect the sampling error reduction. Finally, we discuss the gain of combining different localization approaches with an adaptive statistical sampling error correction.

The accuracy of the initial conditions provided by data assimilation systems strongly determines the skill of numerical weather prediction (NWP). Data assimilation (DA) relies on accurate estimates of forecast errors and error covariances that determine the weighting and spreading of observational information. However, modeling suitable error covariances is intrinsically difficult given various atmospheric processes acting on different scales, leading to situation- and flow-dependent error covariance structures. A breakthrough in estimating background errors has been the development of ensemble and hybrid data assimilation algorithms

Considering the large state space of atmospheric models with a hundred million or more degrees of freedom, estimating error covariances with an ensemble forecast is demanding. Computational restrictions usually limit the number of affordable ensemble members to about 20 to 80 members

In the past decade, advanced high-performance computing systems such as the Japanese K-computer

In recent years, several approaches for vertical localization have been developed. The most frequently applied localization approach is a distance-dependent localization that dampens long-range correlations

Several studies investigated different aspects of optimal localization but often focused on horizontal localization. These studies cover fundamental research on sampling errors and their correction

Localization approaches can roughly be grouped into two categories: adaptive and non-adaptive approaches. Non-adaptive approaches apply fixed domain- or variable-uniform localization functions and scales that do not change with time. Adaptive localization approaches, such as statistical sampling error correction methods, enable a flow- or error-correlation-dependent localization

Current regional NWP models exhibit a grid spacing of a few kilometers, allowing an explicit representation of deep convection

This paper investigates how vertical error covariances should be localized based on an existing convection-permitting 1000-member ensemble simulation

How do vertical error correlations for humidity, temperature, or wind behave on average?

How should we localize vertical error correlations from small ensembles?

How much error reduction can be achieved with a domain-uniform vertical localization or by combining different localization approaches?

The remainder of the paper is outlined as follows:
Sect.

Our study uses an existing convective-scale 1000-member ensemble simulation described in detail by

Initial and boundary conditions: the data assimilation cycling has been performed in the coarse European domain assimilating conventional observations with a LETKF

Our study uses the model output from the inner model domain with a

Atmospheric blocking over the Atlantic influenced the large-scale flow over Europe in the 5 d experimental period. The blocking led to a quasi-stationary weather pattern over central Europe with an upper-level trough over western Europe and a shallow surface low over central Europe. The low-pressure system was associated with a cold front and a warm front that moved over Germany during the period. A convergence zone over southern Germany caused large-scale lifting. Furthermore, mid-level winds advected warm and moist air masses from southern Europe towards Germany at the beginning of the experimental period. Combined with the convergence zone, atmospheric conditions led to intense convection and heavy precipitation, including hail. Weak pressure gradients and slowly moving convective cells resulted in high local precipitation rates and flash flooding. Due to these severe weather events, several studies focused on this exceptional period

Error covariances are a key component in data assimilation and determine how assimilated information is weighted and distributed in state space. Given a sample of state vectors

Usually, the number of affordable ensemble members is limited in NWP due to a huge state space and computational restrictions. This deficit causes severe sampling errors. Consequently, all ensemble filters require a correction of sampling errors, often referred to as localization. For example,

The implementation of localization depends on various factors determined by the type of ensemble filter. Usually, localization is applied directly to the background error covariance matrix using a Schur product:
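As a minimal sketch of what such a Schur (element-wise) product does, the following Python example localizes a sample covariance matrix of a single synthetic vertical column. All names (`P_b`, `C`, `P_loc`), the column size, and the Gaussian-shaped taper are assumptions made for illustration; they are not the notation or the localization function of this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 30 vertical levels, 40 ensemble members.
n_levels, n_members = 30, 40

# Synthetic ensemble of vertical profiles (members x levels), illustration only.
ensemble = rng.standard_normal((n_members, n_levels))

# Sample background error covariance from the ensemble perturbations.
perturbations = ensemble - ensemble.mean(axis=0)
P_b = perturbations.T @ perturbations / (n_members - 1)

# Hypothetical localization matrix: Gaussian-shaped taper in level index
# (an assumption; any symmetric positive semi-definite taper could be used).
levels = np.arange(n_levels)
separation = np.abs(levels[:, None] - levels[None, :])
C = np.exp(-0.5 * (separation / 5.0) ** 2)

# Schur product: element-wise multiplication of taper and covariance.
P_loc = C * P_b
```

Because the taper equals one on the diagonal, the variances are unchanged, while covariances between distant levels are damped towards zero. If both factors are positive semi-definite, so is their Schur product.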

The most common localization approach is a distance-dependent localization that determines tapering factors

In Sect.

Analyzed correlation pairs. Self-correlations on the diagonal and cross-correlations on the off-diagonal of the table. The first variable of each pair represents the ensemble at the reference level.

The sampling noise expected for zero correlation estimates and sample size

The present study will adopt the subsampling approach from

In the present study, we will analyze four prognostic variables: temperature (

Vertical temperature–temperature correlations and empirical optimal localization for reference level 500 hPa on 29 May 2016, 15:00 UTC.

Figure

Throughout this paper we will analyze the 1000-member horizontally averaged absolute vertical correlation to support the discussion of the empirical optimal localization. Averaged absolute correlations are computed as follows:
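One plausible reading of this diagnostic can be sketched in Python as follows: for each horizontal grid column, the ensemble correlation between the reference level and every other level is computed, its absolute value is taken, and the result is averaged over all columns. The array layout (members, levels, columns), the sizes, and the random data are assumptions for illustration and do not reproduce the data or the exact formula of this study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: 1000 members, 20 levels, 50 horizontal columns.
n_members, n_levels, n_columns = 1000, 20, 50
ref_level = 10  # index of the reference level (assumption)

# Synthetic ensemble field with layout (members, levels, columns).
field = rng.standard_normal((n_members, n_levels, n_columns))

# Ensemble anomalies and sample standard deviations per level and column.
anom = field - field.mean(axis=0)
std = anom.std(axis=0, ddof=1)                      # (levels, columns)

# Covariance between the reference level and all levels, per column.
cov = (anom[:, ref_level:ref_level + 1, :] * anom).sum(axis=0) / (n_members - 1)

# Vertical correlation profile per column, then horizontal mean of |r|.
corr = cov / (std[ref_level] * std)                 # (levels, columns)
mean_abs_corr = np.abs(corr).mean(axis=1)           # (levels,)
```

By construction, the profile equals one at the reference level. For uncorrelated synthetic data such as these, the remaining values reflect only the residual sampling noise of the 1000-member estimate.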

Figure

Our goal is to empirically find the optimal localization factor

Applying the EOL by construction yields a symmetric but not necessarily positive semi-definite localization matrix. In our case, constructed localization matrices were not positive semi-definite. Depending on the data assimilation algorithm, additional steps could be required to apply the EOL results to guarantee the positive semi-definiteness of the localized covariance matrix. For this purpose, Sect.

Our approach for empirically estimating localization is inspired by

Figure

This section presents mean absolute 1000-member vertical correlations and EOLs for various settings. First, we will evaluate how vertical localization for various single variable pairs should be constructed. Afterward, we will group variable pairs based on similar behavior. Finally, at the end of the results section, we will evaluate the error reduction of all discussed localization approaches, including combinations with the SEC.

As discussed in Sect.

Domain-averaged absolute 1000-member (true) vertical correlations for reference level 500 hPa and different variable pairs.

Next, we focus on the EOL derived for 40-member subsamples from all forecasts. Figure

EOLs for humidity correlations all peak at the reference level 500 hPa (Fig.

Overall, the variability of domain-averaged correlations from forecast to forecast is small (Fig.

EOL for vertical sample correlations of 40-member ensembles:

Same as Fig.

Same as Fig.

Subsequently, we will discuss the EOL for two additional reference levels to highlight changes in height within the troposphere. Figure

All reference levels within the boundary layer show similar behavior of the EOL (see, for example, Fig.

Root mean square difference before and after the EOL was applied to each vertical correlation. Shading and numbers (%) indicate the change in RMSD analyzed for each variable pair averaged over all reference levels, columns, subsamples, and 10 forecasts from 29 May to 2 June 2016. Self-correlations are highlighted via hatching.
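The error measure used here can be illustrated with a small synthetic experiment. The sketch below assumes a least-squares cost: a single localization factor `alpha` is chosen to minimize the summed squared difference between localized 40-member subsample correlations and the 1000-member reference, which gives `alpha = sum(r_sub * r_ref) / sum(r_sub**2)`. The synthetic data, the disjoint split into 25 subsamples, the underlying correlation `rho`, and all variable names are assumptions for illustration and are not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 200 columns, 1000-member reference, 40-member subsamples.
n_columns, n_full, n_sub = 200, 1000, 40
rho = 0.2  # hypothetical underlying correlation of the synthetic data


def corr(x, y):
    """Pearson correlation along the last axis."""
    xa = x - x.mean(axis=-1, keepdims=True)
    ya = y - y.mean(axis=-1, keepdims=True)
    return (xa * ya).sum(-1) / np.sqrt((xa**2).sum(-1) * (ya**2).sum(-1))


# Two correlated synthetic variables per column (columns x members).
x = rng.standard_normal((n_columns, n_full))
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((n_columns, n_full))

# Reference ("true") correlation from all 1000 members.
r_ref = corr(x, y)[:, None]                                   # (columns, 1)

# Correlations from 25 disjoint 40-member subsamples (an assumption).
xs = x.reshape(n_columns, n_full // n_sub, n_sub)
ys = y.reshape(n_columns, n_full // n_sub, n_sub)
r_sub = corr(xs, ys)                                          # (columns, 25)

# Least-squares localization factor for this correlation pair.
alpha = (r_sub * r_ref).sum() / (r_sub**2).sum()

# RMSD with respect to the reference before and after localization.
rmsd_before = np.sqrt(np.mean((r_sub - r_ref) ** 2))
rmsd_after = np.sqrt(np.mean((alpha * r_sub - r_ref) ** 2))
change_percent = 100.0 * (rmsd_after - rmsd_before) / rmsd_before
```

Since `alpha = 1` (no localization) is one of the admissible choices, the least-squares factor cannot increase the RMSD on the data it was estimated from. Evaluating it on independent data is a stricter test.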

Assessing the EOL for single variable pairs revealed several requirements for vertical localization. Now, we evaluate the error reduction by the EOL, considering each possible correlation pair separately. The 1000-member ensemble correlation serves as truth to compute the RMSD of each 40-member subsample correlation. Figure

The sampling error of the 40-member correlation of most correlations lies within the expected range and close to

Some operational DA systems apply a uniform distance-based vertical localization that does not change with time, height, variable, or observation type. In this case, appropriate localization needs to meet several requirements using a suitable uniform localization approach. Results in Sect.

Domain mean absolute 1000-member (true) vertical correlations for different variable combinations (self, cross, and all): reference levels

Figure

EOL for vertical sample correlations of 40-member ensembles and different variable combinations (self, cross, and all): reference levels

In contrast, the peak amplitude of the EOL for all correlations is closer to the peak of self-correlation (Fig.

As discussed in Sect.

Root mean square difference before and after localization of 40-member vertical subsample correlations. EOL and Gaspari–Cohn scales are obtained and tuned using the first eight forecasts. Errors are evaluated using two independent forecasts on 2 June 2016: 03:00 UTC (opaque) and 15:00 UTC (hatched). Numbers (%) indicate the average change in RMSD analyzed for different settings (

Figure

Now, we will compare the performance of EOLs to three different domain-uniform distance-dependent localization approaches using Gaspari–Cohn functions. Section

In contrast, a vertical localization constructed similarly to the regional DA system of the DWD increases the difference of the 40-member ensemble correlation with respect to the 1000-member ensemble. The increased difference originates from the damping of meaningful error correlations. The DWD system employs a LETKF that uses observation-space localization, tuned to function in all seasons and weather situations that may differ from our investigation period. Furthermore, it needs to be considered that localization in the LETKF also affects the degrees of freedom of the analysis
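For reference, the fifth-order piecewise rational Gaspari–Cohn function that underlies such distance-dependent approaches can be sketched in Python as below. The function name, the use of a log-pressure distance, the pressure levels, and the half-width value are assumptions chosen for illustration; they do not correspond to the tuned scales or the vertical coordinate of this study or of any operational system.

```python
import numpy as np


def gaspari_cohn(distance, half_width):
    """Fifth-order Gaspari-Cohn taper; reaches zero at 2 * half_width."""
    z = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / half_width
    taper = np.zeros_like(z)

    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)

    zi = z[inner]
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - (5.0 / 3.0) * zi**2 + 1.0)

    zo = z[outer]
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return taper


# Hypothetical usage: vertical distance measured in ln(p) from 500 hPa.
pressure_hpa = np.array([1000.0, 850.0, 700.0, 500.0, 300.0, 200.0])
distance = np.log(pressure_hpa) - np.log(500.0)
weights = gaspari_cohn(distance, half_width=0.5)
```

The taper equals one at zero distance and decreases monotonically to zero at twice the half-width. A single scale therefore damps all variable pairs and levels in the same way, regardless of how far meaningful correlations actually extend.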

Now, we will evaluate the benefit of using a look-up table-based SEC that adjusts correlations based on predefined statistical assumptions. The SEC is an adaptive localization approach that corrects sampling errors as a function of the correlation value. Therefore, the SEC applies an individual correction for each correlation within the domain. An adaptive localization (

Finally, we investigate the benefit of combining the statistical SEC with an EOL or a distance-dependent localization. For this analysis, EOLs have been estimated after applying the SEC to highlight the maximum error reduction achieved by combining SEC with an optimal localization. The localization scale of the distance-dependent localization is kept the same as for the

The EOL approach empirically yields an optimal localization by minimizing differences between sample correlations and a defined true correlation. By design, the results discussed above exclude algorithm-specific requirements to better understand how vertical localization should behave in different situations. However, further steps might be required to apply EOL results depending on the data assimilation algorithm. As mentioned in Sect.

Besides function fitting, re-conditioning of matrices can help achieve positive definiteness

Finally, our most successful attempt at achieving positive definiteness with the least change to the EOL was to search for the nearest correlation matrix using specifically designed mathematical algorithms. For example,

This example suggests that ensuring SPSD can be achieved with minor changes to the EOL estimate. However, providing a general answer on how the EOL needs to be adapted is difficult as changes will depend on the construction of the localization matrix and its unique nearest correlation matrix. NCM algorithms can iteratively determine the nearest correlation matrix for a symmetric matrix and could be a useful tool for data assimilation. Choosing the best approach to guaranteeing SPSD is likely case-dependent given changing properties of the problem and potentially very large matrices in NWP.
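To make the idea of an NCM search concrete, the following Python sketch implements an alternating-projections iteration with Dykstra's correction, in the spirit of the method proposed by Higham. Whether this corresponds to the NCM algorithm used in this study is an assumption. The function name, the tolerance, the small test matrix, and the final diagonal rescaling (a pragmatic step added here so that the returned matrix is positive semi-definite with exact unit diagonal even if the iteration stops early) are likewise illustrative choices.

```python
import numpy as np


def nearest_correlation_matrix(A, max_iter=500, tol=1e-10):
    """Approximate the nearest correlation matrix by alternating projections."""
    Y = np.array(A, dtype=float)
    Y = 0.5 * (Y + Y.T)                  # enforce symmetry
    correction = np.zeros_like(Y)        # Dykstra's correction term
    X = Y
    for _ in range(max_iter):
        R = Y - correction
        # Projection onto the positive semi-definite cone.
        eigvals, eigvecs = np.linalg.eigh(R)
        X = (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
        correction = X - R
        # Projection onto the set of matrices with unit diagonal.
        Y_new = X.copy()
        np.fill_diagonal(Y_new, 1.0)
        converged = np.linalg.norm(Y_new - Y) <= tol * np.linalg.norm(Y_new)
        Y = Y_new
        if converged:
            break
    # Rescale the last PSD iterate to unit diagonal (illustrative extra step).
    d = np.sqrt(np.diag(X))
    return X / np.outer(d, d)


# Hypothetical symmetric, unit-diagonal matrix that is not positive semi-definite.
L_raw = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
L_psd = nearest_correlation_matrix(L_raw)
```

The difference `L_psd - L_raw` indicates how strongly a localization matrix has to be modified to become a valid correlation matrix. For the large matrices encountered in NWP, the repeated eigendecomposition is the dominant cost of this approach.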

Examples of EOL-based localization matrices for a single vertical column:

Current ensemble data assimilation systems suffer from severe undersampling requiring vertical localization of error covariances. Our study analyzes vertical correlations from an existing convection-permitting 1000-member ensemble simulation

Furthermore, we use the 1000-member ensemble to evaluate the error reduction achieved by different localization approaches. These approaches include EOLs, distance-dependent localization approaches using a Gaspari–Cohn tapering function

Our results allow a better understanding of the requirements for vertical localization. When employing these conclusions, it is important to consider the specific demands of different ensemble filter algorithms. In ensemble transform Kalman filters, localization increases the degrees of freedom of the analysis and thereby enables the assimilation of more observations

How to apply EOL estimates will vary with the data assimilation algorithm as the application of localization is highly algorithm-specific. In case of covariance localization, constructing a generally non-SPSD localization matrix based on the EOL does not guarantee a symmetric positive semi-definite localized covariance matrix. However, different approaches allow one to achieve positive semi-definiteness of localization matrices. Applying an NCM algorithm

For a serial filter (e.g., the ensemble adjustment Kalman filter (EAKF) by

Our study judges localization solely based on ensemble sampling error, assuming the 1000-member ensemble correlation as truth. It is difficult to predict the number of ensemble members needed to apply our method, as it will vary for differing scenarios. However, we do not expect our results to change drastically with a larger ensemble. Besides, it would be interesting to compare the EOL with the ELF or GGF approach. For example, comparing ELF and EOL could allow us to investigate other error sources in the assimilation that can influence localization

We have found robust results for a mid-latitude convective summer period. Ever-increasing computational capabilities will enable extended data sets and a higher vertical resolution, which is comparatively coarse in the current setup. Furthermore, our approach can be easily applied to other large ensemble simulations to study additional aspects, including horizontal localization. Extending this analysis is desirable given that localization can depend on the underlying weather condition

Code and processed data such as derived empirical optimal localizations are shared on Zenodo:

TN and MW were responsible for the conceptualization and formal analysis. All the authors from the University of Vienna contributed to the development of the methodology. TN developed the software code and was responsible for data curation and visualization of results. TM supported the research with important computational resources. TN and MW wrote and prepared the original paper draft. DH, PJG, and TM helped during the review and editing.

At least one of the (co-)authors is a member of the editorial board of

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Many thanks to Juan Ruiz, Jago Silberbauer, Jeffrey Anderson, and other colleagues at the University of Vienna, RIKEN RCC-S in Kobe, and LMU in Munich, who contributed to this research. Furthermore, we want to thank the two reviewers and the editor for their helpful comments that allowed us to improve the manuscript. The open-source project and Python package “xarray” (

Open-access funding was provided by the University of Vienna.

This paper was edited by Olivier Talagrand and reviewed by Pavel Sakov and one anonymous referee.