This contribution explores a new approach to forecasting multivariate covariances for atmospheric chemistry through the use of the parametric Kalman filter (PKF). In the PKF formalism, the error covariance matrix is approximated by a covariance model that relies on a set of parameters, whose dynamics are then computed. The PKF has previously been formulated in univariate cases, and a multivariate extension for chemical transport models is explored here. This contribution focuses on the situation where the uncertainty is due to the chemistry and not to the weather. To do so, a simplified two-species chemical transport model over a 1D domain is introduced, based on the non-linear Lotka–Volterra equations, which allows us to propose a multivariate pseudo-covariance model. Then, the multivariate PKF dynamics are formulated and their results are compared with those of a large ensemble Kalman filter (EnKF) in several numerical experiments. In these experiments, the PKF accurately reproduces the EnKF. Finally, the PKF is formulated for a more complex chemical model composed of six chemical species (generic reaction set). Again, the PKF succeeds at reproducing the multivariate covariances diagnosed on the large ensemble.

Data assimilation aims to provide an estimation of the true state of a system. This estimation, called the analysis, is a compromise between the forecast of the state and the available observations. The optimal combination of the forecast and the observations relies on their respective error covariance matrices as given by the Kalman filter equations
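As a reminder of how the two error covariance matrices enter this compromise, the following minimal Python sketch implements the textbook Kalman analysis update for a linear observation operator. The function name `kalman_analysis`, the variable names, and the toy numbers are illustrative assumptions and are not taken from the experiments of this paper.

```python
import numpy as np

def kalman_analysis(xf, Pf, y, H, R):
    """One Kalman filter analysis step (standard textbook form).

    xf : forecast state, shape (n,)
    Pf : forecast-error covariance matrix, shape (n, n)
    y  : observations, shape (p,)
    H  : linear observation operator, shape (p, n)
    R  : observation-error covariance matrix, shape (p, p)
    """
    S = H @ Pf @ H.T + R                  # innovation covariance
    K = np.linalg.solve(S, H @ Pf).T      # gain K = Pf H^T S^{-1} (S and Pf are symmetric)
    xa = xf + K @ (y - H @ xf)            # analysis state
    Pa = (np.eye(len(xf)) - K @ H) @ Pf   # analysis-error covariance
    return xa, Pa

# Toy two-variable state (illustrative numbers): only the first variable is observed.
xf = np.array([1.0, 2.0])
Pf = np.array([[1.0, 0.5],
               [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([2.0])
xa, Pa = kalman_analysis(xf, Pf, y, H, R)
```

In this toy example only the first component is observed, yet the second one is also corrected through the off-diagonal term of the forecast-error covariance, which is the role played by inter-species covariances in a multivariate assimilation.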

In atmospheric chemistry applications, the system to study is the concentration of multiple chemical species in the atmosphere. In most cases, chemical transport models (CTMs) are used to forecast the concentrations, such as the operational Model Of atmospheric Chemistry At larGE scale (MOCAGE) used at Météo-France

In this context, the forecast-error covariance matrix contains the correlations of the forecast errors within and between the chemical species. In multivariate covariance modelling applied in meteorology, these correlations are respectively denoted as

However, the estimation and the modelling of multivariate covariances in air quality are complex topics

Recently, a new approximation of the Kalman filter (KF) was introduced, the parametric Kalman filter (PKF), where the error covariance matrices are approximated by a covariance model fitted with a set of parameters, e.g. the grid-point variance and the local anisotropy

Applying the PKF approach for CTMs is attractive because the parametric dynamics are known for the transport equations

While the PKF has been formulated for univariate statistics, a first attempt at multivariate statistics has been proposed based on the balance operator approach

This contribution focuses only on the uncertainty dynamics due to the chemistry and does not account for the uncertainty in the weather: for example, the uncertainty in the wind that transports the chemical species is not taken into account.

The paper is organized as follows. Section

The PKF is a recent implementation of the Kalman filter where the covariance matrices are approximated by some covariance model. For the sake of consistency, this section first recaps the basics of the Kalman filter; it then recalls the diagnosis of covariance matrices in large dimension, as well as covariance models, in order to introduce the formalism of the PKF in univariate statistics. The section ends with a numerical example of interest for air quality that illustrates the PKF.

Here we consider a system whose state is denoted by

Because of the spatio-temporal sparsity of observations, modelling and measurement errors, and the chaotic amplification of initial errors in the forecast, the exact actual state at a time

Data assimilation aims to provide the analysis state,

The process of estimating the analysis state from a forecast and some observations is called the analysis step. The forecast-error covariance matrix denoted by

Next, the forecast step pushes the uncertainty forward in time.
The analysis state

While the Kalman filter formalism is based on simple vector algebra equations, it is not easy to understand the statistical content of the error covariances, which would require representing each covariance function and exploring their temporal evolution. Fortunately, simple diagnostics can be introduced to summarize the statistical relationships between points in the geographic domain. In turn, these diagnostics can be used as parameters of covariance models, as detailed now.

In data assimilation, two diagnoses for the error covariance matrices are often introduced: the variance field and the anisotropy of the correlation functions which correspond to the principal axes of the spatial correlation. These diagnoses are used for the description of the forecast-error covariance matrix.
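To make these two diagnoses concrete, the sketch below estimates them from an ensemble on a 1D periodic domain. It assumes that the 1D metric tensor is the variance of the spatial derivative of the normalized error, g(x) = E[(∂x ε̃)²] with ε̃ = ε/σ, and that the length scale is L = g^(-1/2). The sampling routine, the function names, the grid size, and the ensemble size are illustrative assumptions, not the settings of the experiments.

```python
import numpy as np

def sample_periodic_gaussian(n_members, n_grid, length_scale, rng):
    """Draw unit-variance errors with a Gaussian correlation on the periodic domain [0, 1)."""
    x = np.arange(n_grid) / n_grid
    d = np.minimum(x, 1.0 - x)                      # periodic distance to x = 0
    corr = np.exp(-0.5 * (d / length_scale) ** 2)   # first row of the circulant correlation matrix
    spectrum = np.clip(np.fft.fft(corr).real, 0.0, None)
    white = rng.standard_normal((n_members, n_grid))
    # Apply the circulant square root of the correlation matrix to white noise.
    return np.fft.ifft(np.sqrt(spectrum) * np.fft.fft(white, axis=1), axis=1).real

def diagnose(ensemble, dx):
    """Variance and length-scale fields diagnosed from an ensemble of shape (members, grid)."""
    errors = ensemble - ensemble.mean(axis=0)
    variance = errors.var(axis=0, ddof=1)           # unbiased variance estimator
    normalized = errors / np.sqrt(variance)
    # Centred finite difference of the normalized error on the periodic grid.
    d_norm = (np.roll(normalized, -1, axis=1) - np.roll(normalized, 1, axis=1)) / (2.0 * dx)
    metric = (d_norm ** 2).mean(axis=0)             # g(x) = E[(d/dx of the normalized error)^2]
    return variance, 1.0 / np.sqrt(metric)          # length scale L = g^(-1/2)

rng = np.random.default_rng(0)
n_grid, L_true = 200, 0.05
ens = sample_periodic_gaussian(1600, n_grid, L_true, rng)
variance, length_scale = diagnose(ens, dx=1.0 / n_grid)
```

For this homogeneous test case the diagnosed fields should fluctuate around a unit variance and the prescribed length scale, up to sampling noise and a small finite-difference bias.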

The forecast-error variance field,

When the forecast error is a differentiable random field, the anisotropy of the correlation is characterized by the so-called local forecast-error metric tensor

In practice, the direction of the largest correlation anisotropy corresponds to the principal axis of the smallest eigenvalue for the metric tensor: the metric tensor is

One of the motivations behind the diagnosis of the variance and the local anisotropy tensor is that they can be used as parameters of covariance models, the VLATcov models

Heterogeneous covariance models are important because they provide a way to produce non-trivial correlation functions from a set of parameters. Hence, approximating a covariance matrix, such as the forecast-error covariance at a given time, by a covariance model reduces to knowing a set of parameters. The parametric Kalman filter takes advantage of this kind of approximation to reproduce the Kalman filter dynamics, as explained now.
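As an illustration of how a full covariance matrix can be rebuilt from a few parameter fields, the sketch below assembles a 1D heterogeneous Gaussian covariance matrix from a variance field V(x) and a length-scale field L(x). It assumes the usual 1D form of this model, ρ(x, y) = [2 L_x L_y / (L_x² + L_y²)]^(1/2) exp(-(x - y)² / (L_x² + L_y²)), uses a non-periodic distance for simplicity, and relies on a hypothetical function name and illustrative fields.

```python
import numpy as np

def heterogeneous_gaussian_cov(x, variance, length_scale):
    """1D heterogeneous Gaussian covariance matrix built from V(x) and L(x), with s = L^2."""
    s = length_scale ** 2
    s_sum = s[:, None] + s[None, :]
    dist2 = (x[:, None] - x[None, :]) ** 2
    # Normalization factor; it equals one on the diagonal, so that the diagonal of P is V.
    norm = np.sqrt(2.0 * np.sqrt(s[:, None] * s[None, :]) / s_sum)
    corr = norm * np.exp(-dist2 / s_sum)
    std = np.sqrt(variance)
    return std[:, None] * std[None, :] * corr

# Illustrative heterogeneous parameter fields on [0, 1].
x = np.linspace(0.0, 1.0, 50)
variance = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
length_scale = 0.05 + 0.02 * np.cos(2.0 * np.pi * x)
P = heterogeneous_gaussian_cov(x, variance, length_scale)
```

Only the two fields of size n need to be stored and evolved; the n × n matrix is rebuilt on demand, which is the memory saving exploited by the PKF. For a homogeneous length scale L, the correlation reduces to the Gaussian exp(-(x - y)² / (2 L²)).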

A covariance model is first considered,

To describe the sequential evolution of error-covariance matrices along the assimilation cycles, we assume that the forecast-error covariance matrix at a time

At an abstract level, the parametric Kalman filter consists of the following sequential steps

Then, the forecast step of the PKF, equivalent to Eq. (

An illustration of the PKF is now proposed for a univariate advection problem, with a focus on the forecast step. This introduction of an intermediate problem aims to give the reader a good understanding of the PKF and its advantages and difficulties, which will be necessary to address the more complex problem encountered in a multivariate CTM.

For a 1D and periodic domain, of coordinate

The forecast step of the PKF is illustrated for the conservative dynamics, where the covariance matrices are approximated by a VLATcov model. The computation of the PKF dynamics can be performed using SymPKF

In the following, a numerical test bed shows the ability of the PKF to predict the uncertainty dynamics compared to a reference ensemble estimation (EnKF).

The numerical experiment studies the time propagation of an uncertainty at time

To assess the PKF's ability to forecast the error statistics, we compare its results with diagnoses obtained from the forecast of a large ensemble,

Hence, from the ensemble, the variance at a given time is estimated from its unbiased estimator

The numerical framework used to forecast both the ensemble and the PKF system is described now. The periodic domain is

Pre-defined heterogeneous and stationary wind field

For this experiment, the mean state

Comparison of the (low-resolution) forecasts (

The dynamics of the uncertainty shown in Fig.

Regarding the performance of the two methods, the PKF forecast results for the error statistics are quite similar to those diagnosed from the ensemble, i.e. the EnKF for this test bed. The forecasts of the concentrations in Fig.

This example shows the motivation behind the PKF: it is able to predict the (main parameters of the) error covariance with good skill and at a low numerical cost. This low numerical cost first concerns the computer memory: the information contained in a covariance matrix of size

As another advantage, the PKF provides information about the physics of the uncertainty: whereas an ensemble diagnosis only observes the time evolution of the statistics without any explanation, the PKF provides a simplified proxy that details the origins of these statistical evolutions with only three equations, and thus the PKF improves our knowledge of uncertainty dynamics.
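The three equations can be sketched numerically as follows. The sketch assumes the conservative advection ∂t c + ∂x(u c) = 0, for which a direct derivation gives the mean, variance, and aspect-tensor (s = L²) tendencies used below. The centred finite differences, the RK4 scheme, the wind field, and the initial values are illustrative assumptions and not the numerical settings of the experiment.

```python
import numpy as np

def ddx(f, dx):
    """Centred finite difference on a periodic grid."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def pkf_tendency(state, u, dx):
    """Tendencies of (mean c, variance V, aspect tensor s) for d_t c + d_x(u c) = 0."""
    c, V, s = state
    du = ddx(u, dx)
    return np.array([
        -u * ddx(c, dx) - c * du,        # mean: conservative transport
        -u * ddx(V, dx) - 2.0 * V * du,  # variance: transported, damped where the wind diverges
        -u * ddx(s, dx) + 2.0 * s * du,  # aspect tensor: transported, stretched where the wind diverges
    ])

def rk4_step(state, u, dx, dt):
    """One fourth-order Runge-Kutta step of the three coupled PKF equations."""
    k1 = pkf_tendency(state, u, dx)
    k2 = pkf_tendency(state + 0.5 * dt * k1, u, dx)
    k3 = pkf_tendency(state + 0.5 * dt * k2, u, dx)
    k4 = pkf_tendency(state + dt * k3, u, dx)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

n = 241
dx = 1.0 / n
x = np.arange(n) * dx
u = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)  # illustrative heterogeneous, stationary wind
state = np.array([np.ones(n), 0.01 * np.ones(n), 0.05 ** 2 * np.ones(n)])
dt = 0.5 * dx / np.abs(u).max()
for _ in range(200):
    state = rk4_step(state, u, dx, dt)
c, V, s = state
```

Starting from homogeneous fields, the heterogeneity of the wind alone creates structure in the variance and in the length scale: the variance decreases and the length scale increases where the wind diverges, and conversely where it converges.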

The exploration of the multivariate extension is now addressed.
For multivariate problems, a model of the cross-correlation functions (or inter-species correlation functions) is needed. Moreover, it would be convenient to introduce a multivariate covariance model that extends the univariate VLATcov model, such as the heterogeneous Gaussian model (Eq.

Because multivariate modelling is a difficult topic, a multivariate covariance model is proposed in a simplified test bed in Sect.

To explore a multivariate formulation of the PKF, a simplified chemical transport model is introduced that mimics the MOCAGE framework. This simplified CTM contains the essential features of what can be found in a more realistic CTM, i.e. advection, multiple chemical species, and non-linearities.
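The non-linear chemistry of this test bed is of Lotka–Volterra type. A 0D sketch of such a chemistry (without advection) is given below, assuming the classical two-species form dA/dt = k1 A - k2 A B and dB/dt = k2 A B - k3 B. The rate values and the initial concentrations are illustrative assumptions and are not the settings of the experiments.

```python
import math

def lv_tendency(a, b, k1, k2, k3):
    """Lotka-Volterra chemistry: dA/dt = k1 A - k2 A B, dB/dt = k2 A B - k3 B."""
    return k1 * a - k2 * a * b, k2 * a * b - k3 * b

def rk4_step(a, b, dt, rates):
    """One fourth-order Runge-Kutta step of the 0D Lotka-Volterra system."""
    a1, b1 = lv_tendency(a, b, *rates)
    a2, b2 = lv_tendency(a + 0.5 * dt * a1, b + 0.5 * dt * b1, *rates)
    a3, b3 = lv_tendency(a + 0.5 * dt * a2, b + 0.5 * dt * b2, *rates)
    a4, b4 = lv_tendency(a + dt * a3, b + dt * b3, *rates)
    return (a + dt * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0,
            b + dt * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0)

def invariant(a, b, k1, k2, k3):
    """Quantity conserved along each orbit, which is why the orbits are closed curves."""
    return k2 * (a + b) - k3 * math.log(a) - k1 * math.log(b)

rates = (1.0, 0.5, 0.75)             # illustrative (k1, k2, k3)
critical_point = (rates[2] / rates[1], rates[0] / rates[1])  # (k3/k2, k1/k2)
a, b = 2.0, 1.0                      # illustrative initial concentrations
h0 = invariant(a, b, *rates)
for _ in range(1000):
    a, b = rk4_step(a, b, 0.01, rates)
drift = abs(invariant(a, b, *rates) - h0)
```

The conserved quantity gives a simple check of a numerical integration: its drift should remain at the level of the time-scheme error along the whole orbit.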

To do so, a 1D periodic domain of coordinate

Numerical simulations of the Lotka–Volterra dynamical system whose solutions are periodic orbits (purple curves with different transparencies), flowing anti-clockwise around the critical point

Considered as a dynamical system of ordinary differential equations and represented in the phase space

In this multivariate framework, the error-covariance matrix

From a covariance-modelling point of view, and from the perspective of the PKF, the univariate covariances

Since no multivariate modelling extending the VLATcov model is available, a numerical exploration of the dynamics of multivariate statistics is performed for the LV CTM so as to guess a proxy for the cross-covariance functions.

Compared to the univariate experiment described in Sect.

For the simulation, the fields

While there is no cross-correlation at the initial condition, the coupling provided by the LV equations should introduce a non-zero cross-correlation between errors in

Evaluation of the cross-correlation model

Figure

Now, a proxy for the cross-correlation is introduced from the data set of multivariate forecasts.

After a trial-and-error process, and inspired by the VLATcov model in Eq. (

One of the main advantages of considering a simple analytic formula is that it can be extended to a problem with more chemical species and for a domain of a higher dimension.

Note that formulation Eq. (

To assess the skill of the proxy, Fig.

Time evolutions of the relative errors between the empirical cross-correlation matrix (EnKF) and the proxy-generated cross-correlation matrix fitted with EnKF-diagnosed parameters for two different settings of the initial length scales: equal length scales with

At a quantitative level, Fig.

As the two multivariate error fields are uncorrelated at the initial time, the true cross-correlation matrix

To our knowledge, no proxy of cross-correlations similar to Eq. (

Despite the limitations of the proxy, a multivariate extension of the univariate VLATcov model is explored below, where the cross-correlation is approximated by the proxy in Eq. (

The computation of the PKF dynamics leverages the SymPKF package which, applied to the dynamics Eq. (

The overlines of the mean states

For the dynamics of the anisotropy in Eqs. (

Note that the dynamics induced by the transport process are exact, as mentioned in Sect.

The dynamics of the aspect tensors, Eqs. (

A closure is proposed for the LV CTM multivariate PKF dynamics. Note that the open terms of the PKF dynamics Eq. (

From a detailed quantification of the impact of the chemistry alone (see Appendix

For multivariate statistics, the update Eq. (

For an observation at location

In this section, two numerical experiments, labelled FCST and DA, are proposed to evaluate the multivariate formulation of the PKF for the LV CTM. Again, a large EnKF is used as the reference against which the error statistics are compared. The first experiment, FCST, focuses on the forecast step alone. Therefore, the PKF dynamics (Eq.

In both experiments, the EnKF relies on 6400 members. The total time of the simulation is

For the data assimilation experiment, a network of four sensors regularly spaced on the right-hand side of the domain is considered to generate observations of the chemical species

The results for the FCST experiment are shown in Fig.

Results of the forecast numerical experiment. PKF error statistics (solid red lines) and EnKF-diagnosed error statistics (dashed blue lines) at times

The forecasts of the means match perfectly for both methods (see Fig.

Results of the data assimilation numerical experiment. Nature run (dash-dotted green lines, only in panels

The outcome of the DA experiment in Fig.

For the DA experiment (Fig.

In both of these experiments, the PKF has proven able to reproduce the results of a large ensemble Kalman filter. Again, these qualitative results of the PKF were obtained at a low numerical cost: the equivalent of three time integrations of Eq. (

It would be interesting to assess the robustness of the results, including whether the advection terms remain dominant under different conditions, such as weaker winds or accelerated chemistry, from a set of operational CTM predictions.

The simplified LV CTM has allowed for a multivariate PKF assimilation validated in numerical experiments. To explore the ability of the PKF to apply to a more complex chemical scheme, an intermediate chemical model is now introduced, the GRS

The GRS describes the dynamics of a reduced number of chemical species or pseudo species. Hence, six species are considered and interact as

The system of equations of the GRS CTM is written as

GRS settings.

In the

Settings of the GRS CTM, with the pre-defined heterogeneous and stationary wind field

In a new numerical experiment, the PKF forecasts will be compared with those of an EnKF (of size

Given the complexity of the set of Eq. (

For the settings of this numerical experiment, the resolution of the grid has been reduced to

Multivariate forecast statistics for the GRS CTM, PKF outputs (coloured lines), and ensemble estimations from

Realistic heterogeneous initial concentration fields are constructed as follows.
First, starting from zero concentrations, a chemical equilibrium state is computed from a 4-week time integration of a 0D version of Eq. (

The initial condition for the PKF is set as follows. The mean state is given by the six 1D fields

For the validation, an ensemble of 1600 initial conditions has been populated, consistently from the PKF initial conditions, by adding univariate perturbations to the GRS-CTM initial condition. For each member

Figure

Regarding the behaviour of the error statistics, the impact of the chemistry appears: the chemical reactions lead to non-zero cross-correlations visible in the right column (except Fig.

The impact of chemistry leads to non-zero cross-correlations between all pairs of species (Fig.

Compared to the EnKF, the PKF offers a high-quality forecast at a very low computational cost. The means (left column) of the two methods are in perfect agreement. Slight differences can be observed regarding the standard-deviation fields (second column) but, as established in Sect.

Note that the specific behaviour of the ROC error variance can be understood from the PKF equations for the GRS CTM (not detailed here but available on the GitHub repository; see

This work explored a multivariate formulation of the PKF for atmospheric chemistry needs, when the PKF is formulated from the variance and the anisotropy tensor.

While a significant portion of the air quality uncertainty is due to meteorology (e.g. the uncertainty in the wind used for the transport), the present work focuses on the situation where the uncertainty in chemical variables is due solely to chemistry as it evolves during a given meteorological situation.

A simplified univariate chemical transport model was introduced in a 1D periodic domain with a heterogeneous wind field and conservative dynamics, illustrating the impact of the transport on the error statistics, and in particular the evolution of the variance and of the anisotropy (length scale) due to the wind heterogeneity. Compared with an estimation from a large ensemble of 6400 forecasts, the PKF proved able to reproduce the variance and the anisotropy and to provide a proxy for the correlation functions. The PKF prediction was obtained at a lower numerical cost than that of the ensemble. In addition, the PKF was shown to be less sensitive to a dispersive model error encountered in this simulation, which required the ensemble to be computed at a high resolution to mitigate the effect of the dispersive term on the ensemble estimation. This simplified model proposed a proxy for multivariate covariance to approximate cross-covariances, which extends the univariate covariance model parameterized from variance and anisotropy, but the resulting multivariate covariance is symmetric with no guarantee of positive semi-definiteness.

Then a simplified multivariate chemical transport model was introduced to tackle multivariate error statistics. Based on Lotka–Volterra (LV) dynamics, this test bed reproduces the non-linear coupling between chemical species and the transport due to the wind, as can be observed in a real chemical transport model. A multivariate PKF formulation was then proposed, which revealed a closure issue in the dynamics of the anisotropy that is related to the chemistry but not to the transport. A detailed analysis of the effect of the chemistry on the dynamics of the anisotropy led to an analytical solution of the multivariate evolution of the uncertainty in a 1D harmonic oscillator, which helps to understand the transfer of uncertainty from one species to another.

The PKF has helped to understand the uncertainty dynamics: it provides equations that describe the time evolution of the variances, cross-covariances, and anisotropies. The impacts of the advection and the chemistry have been clearly identified in the dynamics of the error statistics, allowing for a better comprehension of the overall problem. Since the relative contribution of the transport to the trend of the anisotropy was larger than that of the chemistry, a closed form has been obtained by removing the terms related to the chemistry in the dynamics of the anisotropy.

Despite this approximation, a validation test bed using an ensemble method showed that the PKF dynamics are able to predict the uncertainty dynamics for two chemical schemes based on LV. Moreover, a multivariate formulation of the PKF analysis step has been introduced, given by Algorithm

A final multivariate example, focused on the forecast step, was introduced to evaluate the potential of the multivariate PKF formulation for a larger system. In this case, the chemical scheme (GRS) describes the interaction of six species. Again, this example has shown the ability of the PKF to reproduce the EnKF error statistics.

To go further, it will be interesting to see whether the advection terms remain dominant under different conditions, such as weaker winds or accelerated chemistry, using an ensemble of forecasts from operational CTMs, where isotropic and homogeneous correlations are often considered in variational data assimilation.

In addition, since we have focused on the uncertainty due to chemistry, it would be interesting to address the part of the uncertainty due to meteorology. For a CTM like MOCAGE, this could be done by considering an ensemble of weather forecasts with each member used as a forcing for a single CTM forecast. However, this solution would lead to multiple CTM forecasts, which would be expensive. Therefore, from the perspective of using a PKF (applied to a CTM), a less expensive solution would be to consider a single PKF forecast where the wind is uncertain (stochastic advection wind), with the wind uncertainty characterized by the variance and the anisotropy tensor estimated from the weather forecast ensemble. The challenge will be to find an appropriate closure for the unknown terms in the dynamics, including the cross-correlation between the wind error and chemical species, with the help of this contribution to multivariate statistics.

This work is a milestone in the development of a multivariate assimilation based on the PKF and applied to air quality, and it is an important step in extending the univariate PKF implementation to complex operational CTMs such as MOCAGE at Météo-France. The work also highlights a drawback of the PKF: the cost of the current multivariate PKF formulation scales as the square of the number of chemical species, which appears to be a limitation, at least if all the chemical species are considered in the multivariate uncertainty prediction. Hence, it would be interesting to test a PKF formulation on a reduced chemical scheme of interest for the data assimilation.
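The quadratic scaling can be illustrated by simply counting the prognostic fields. The sketch below assumes a 1D domain with one mean, one variance, and one anisotropy field per species, plus one cross-covariance field per pair of species. This count is an illustration only: it ignores the cost of each individual equation, and the function name is hypothetical.

```python
def pkf_field_count(n_species):
    """Number of 1D prognostic fields in a multivariate PKF (illustrative count)."""
    means = n_species
    variances = n_species
    anisotropies = n_species                               # one aspect-tensor component per species in 1D
    cross_covariances = n_species * (n_species - 1) // 2   # one field per pair of species
    return means + variances + anisotropies + cross_covariances

counts = {n: pkf_field_count(n) for n in (2, 6, 100)}
```

Under these assumptions, a two-species scheme leads to 7 fields and a six-species scheme to 33 fields, while a scheme with 100 species would already require 5250 fields, most of them cross-covariances.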

Moreover, while this contribution focused on air quality, it contributes to improving our understanding of multivariate statistics, e.g. with the analytical solution of the 1D harmonic oscillator. It would be interesting to extend this multivariate PKF formulation to other geophysical applications, e.g. numerical weather prediction, with particular attention paid to the extension of the multivariate cross-covariance proxy to 2D or 3D domains. Compared with air quality, where the chemical reactions are point-wise, geophysical equations involve local interactions that have to be studied in view of the PKF approach, e.g. the geostrophic balance in the barotropic model.

The exploration of the uncertainty dynamics from numerical experiments, as made here to validate the PKF from an ensemble method, faces some limits.
Figure

As the problem is discretized for numerical simulations, the actual equation that is simulated is not exactly Eq. (

This can be understood as follows. Since Eq. (

This justifies why the dispersion does not affect the prediction of the mean state – the estimation for the means coinciding for the two methods in Fig.

Since the magnitude of the dispersive term scales as

Same experiment as Fig.

This is demonstrated by comparing the PKF statistics to a high-resolution forecast of the EnKF with a grid of 3 times the original resolution, i.e.

Figure

Correlation functions at location

We consider four chemical species

The kinetics of the reaction, deduced from the mass action law for reaction rates, are written as

This section contributes to evaluating the impact of chemistry on the dynamics of uncertainty with respect to the effect due to advection, leading to a closure for the PKF applied to the multivariate LV CTM.

Regarding the dynamics of the anisotropy fields presented in the prognostic equations (Eqs.

In the PKF dynamics in Eq. (

To focus on the contribution of the chemistry to the dynamics of the anisotropies, an ensemble of

Time series of the spatial average of the error statistics from the ensemble forecast with

Time series of the spatial average of the error statistics from the ensemble forecast with

In the first experiment, Fig.

Numerical results computed for the HO are represented in Fig.

Numerical results for the case

Numerical results for the cases

Note that, for equal initial length scales, the anisotropy appears to be stationary (see Fig.

The time evolution of the HO error statistics reveals an alternating transfer of the error statistics between the two components

The following section aims at identifying the dominant terms or processes in the dynamics of the anisotropy (Eqs.

Two different evaluations are performed.
The first one evaluates the relative contribution

The computation of these relative contributions will rely on an ensemble of forecasts. These forecasts will be used to diagnose a posteriori the PKF parameters

The quantifications of the relative contribution by term and by process will be performed for equal and different initial length scales for

The results of the relative contributions presented in Fig.

The harmonic oscillator equations are written as

Their analytical solution is given by

At the initial time, we consider the case where the errors are uncorrelated

From the analytical solution for the errors in Eq. (

Following the same process, we deduce that

We can also deduce an analytical solution for the term

Note that we could have derived analytical solutions in the case of heterogeneous initial fields, but for the sake of simplicity we chose to consider only the homogeneous case. However, obtaining the analytical solution when the initial error fields are correlated seems more difficult.
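The transfer of uncertainty between the two components can be checked with a few lines of Python. The sketch assumes that the harmonic oscillator is written as dA/dt = -k B and dB/dt = k A (the sign convention and the name k are assumptions), so that the errors are rotated in the (A, B) plane, and that the initial errors in A and B are uncorrelated at a given point. Only the point-wise variances and cross-covariance are considered, not the length scales.

```python
import math

def ho_error_statistics(t, k, var_a0, var_b0):
    """Point-wise error statistics of the harmonic oscillator for uncorrelated initial errors.

    The errors follow the same linear dynamics as the state:
      eps_A(t) = eps_A(0) cos(kt) - eps_B(0) sin(kt)
      eps_B(t) = eps_A(0) sin(kt) + eps_B(0) cos(kt)
    """
    c, s = math.cos(k * t), math.sin(k * t)
    var_a = var_a0 * c * c + var_b0 * s * s
    var_b = var_a0 * s * s + var_b0 * c * c
    cov_ab = (var_a0 - var_b0) * s * c   # the sign depends on the assumed sign convention
    return var_a, var_b, cov_ab

k = 1.0
var_a0, var_b0 = 4.0, 1.0                # illustrative initial variances
quarter = ho_error_statistics(0.5 * math.pi / k, k, var_a0, var_b0)
eighth = ho_error_statistics(0.25 * math.pi / k, k, var_a0, var_b0)
```

After a quarter of a period the two variances are exchanged and the cross-covariance vanishes again, while in between the cross-covariance is non-zero whenever the initial variances differ. The total variance is conserved at all times.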

By introducing the true state and the error fields

Then, using the definition of the cross-correlation function

The update of the variance in the multivariate situation leads to a new version of the PKFO1 as detailed in Algorithm

Sequential process building the analysis state and its error covariance matrix for the first-order PKF (PKFO1) with a pseudo-multivariate covariance model.

Univariate fields of

The code developed and used to generate the experiments is available at

AP and OP explored the multivariate extension of the PKF and designed the experiments. A part of the work was co-supervised with VG during the master internship of AP.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank Annika Vogel, the other two anonymous referees, and Zoltan Toth for their fruitful comments, which helped to improve the article. We thank Richard Ménard and Béatrice Josse for interesting discussions.

The Toulouse Paul Sabatier University and the SDU2E (Sciences de l'Univers, de l'Environnement et de l'Espace) doctoral school supported Antoine Perrot's thesis. This work was supported by French national programme LEFE/INSU grant “Multivariate Parametric Kalman Filter” (MPKF).

This paper was edited by Zoltan Toth and reviewed by Annika Vogel and two anonymous referees.