Data assimilation is considered as a problem in Bayesian estimation, viz. determine the probability distribution for the state of the observed system, conditioned by the available data. In the linear and additive Gaussian case, a Monte Carlo sample of the Bayesian probability distribution (which is Gaussian and known explicitly) can be obtained by a simple procedure: perturb the data according to the probability distribution of their own errors, and perform an assimilation on the perturbed data. The performance of that approach, called here ensemble variational assimilation (EnsVAR), also known as ensemble of data assimilations (EDA), is studied in this two-part paper on the non-linear low-dimensional Lorenz-96 chaotic system, with the assimilation being performed by the standard variational procedure. In this first part, EnsVAR is implemented first, for reference, in a linear and Gaussian case, and then in a weakly non-linear case (assimilation over 5 days of the system). The performances of the algorithm, considered either as a probabilistic or a deterministic estimator, are very similar in the two cases. Additional comparison shows that the performance of EnsVAR is better, both in the assimilation and forecast phases, than that of standard algorithms for the ensemble Kalman filter (EnKF) and particle filter (PF), although at a higher cost. Globally similar results are obtained with the Kuramoto–Sivashinsky (K–S) equation.

Introduction

The purpose of assimilation of observations is to reconstruct as accurately as possible the state of the system under observation, using all the relevant available information. In geophysical fluid applications, such as meteorology or oceanography, that relevant information essentially consists of the physical observations and of the physical laws which govern the evolution of the atmosphere or the ocean. Those physical laws are in practice available in the form of a discretized numerical model. Assimilation is therefore the process by which the observations are combined together with a numerical model of the dynamics of the observed system in order to obtain an accurate description of the state of that system.

All the available information, the observations as well as the numerical model, is affected (and, as far as we can tell, will always be affected) by some uncertainty, and one may wish to quantify the resulting uncertainty in the output of the assimilation process. If one chooses to quantify uncertainty in the form of probability distributions, assimilation can be stated as a problem in Bayesian estimation: namely, determine the probability distribution for the state of the observed system, conditioned by the available information. That statement makes sense only under the condition that the available information is itself described from the start in the form of probability distributions. We will not discuss here the difficult problems associated with that condition and will assume below that it is verified.

There is one situation in which the Bayesian probability distribution is readily obtained in analytical form. That is when the link between the available information on the one hand, and the unknown system state on the other, is linear, and affected by additive Gaussian error. The Bayesian probability distribution is then Gaussian, with explicitly known expectation and covariance matrix (see Sect. 2 below).

Now, the very large dimension of the numerical models used in meteorology and oceanography (that dimension can lie in the range $10^6$ to $10^9$) forbids explicit description of probability distributions in the corresponding state spaces. A widely used practical solution is to describe the uncertainty in the form of an ensemble of points in state space, with the dispersion of the ensemble being meant to span the uncertainty. Two main classes of algorithms for ensemble assimilation exist at present. The ensemble Kalman filter (EnKF), originally introduced and since studied by many authors, is a heuristic extension to large dimensions of the standard Kalman filter (KF). The latter exactly achieves Bayesian estimation in the linear and Gaussian case that has just been described: it explicitly determines the expectation and covariance matrix of the (Gaussian) conditional probability distribution and evolves those quantities in time, updating them as new observations become available.

The EnKF, contrary to the standard KF, evolves an ensemble of points in state space. One advantage is that it can be readily, if empirically, implemented on non-linear dynamics. On the other hand, it keeps the same linear Gaussian procedure as the KF for updating the current uncertainty with new observations. The EnKF exists in many variants and, even with ensembles of relatively small size (O(10–100)), produces results of high quality. It has now become, together with variational assimilation, one of the two most powerful algorithms used for assimilation in large-dimension geophysical fluid applications.

Concerning the Bayesian properties of the EnKF, it has been proven that, in the case of linear dynamics and in the limit of infinite ensemble size, the EnKF achieves Bayesian estimation, in that it determines the exact (Gaussian) conditional probability distribution. In the case of non-linear dynamics, the EnKF has a limiting probability distribution, which is not in general the Bayesian conditional distribution.

Contrary to the EnKF, which was from the start developed for geophysical applications (but has since extended to other fields), particle filters (PFs) have been developed totally independently of such applications. They are based on general Bayesian principles and are thus independent of any hypothesis of linearity or Gaussianity. Like the EnKF, they evolve an ensemble of (usually weighted) points in state space and update them as new observations become available. They exist in numerous variants, many of which have been mathematically proven to achieve Bayesianity in the limit of infinite ensemble size. On the other hand, no such results exist, to the authors' knowledge, for finite ensemble size. They are actively studied in the context of geophysical applications, but have not at this stage been operationally implemented on large-dimension meteorological or oceanographic models.

There exist at least two other algorithms that can be used to build a sample of a given probability distribution: the acceptance–rejection algorithm and the Metropolis–Hastings algorithm, the latter of which possesses a number of variants. These algorithms can be very efficient in some circumstances, but it is not clear at this stage whether they could be successfully implemented in large-dimension geophysical applications.

Coming back to the linear and Gaussian case, not only, as said above, is the (Gaussian) conditional probability distribution explicitly known, but a simple algorithm exists for generating independent realizations of that distribution. In succinct terms: perturb the data additively according to their own error probability distribution, and perform the assimilation on the perturbed data. Repetition of this procedure on successive sets of independently perturbed data produces a Monte Carlo sample of the Bayesian posterior distribution.

The present work is devoted to the study of that algorithm, and of its properties as a Bayesian estimator, in non-linear and/or non-Gaussian cases. Systematic experiments are performed on two low-dimensional chaotic toy models, namely the Lorenz-96 model and the Kuramoto–Sivashinsky (K–S) equation. Variational assimilation, which produces the Bayesian expectation in the linear and Gaussian case, and is routinely, and empirically, implemented in non-linear situations in operational meteorology, is used for estimating the state vector for given (perturbed) data. The algorithm is therefore called ensemble variational assimilation, abbreviated to EnsVAR.

This algorithm is not new. There actually exist a rather large number of assimilation algorithms that are (at least partially) variational and build (at least at some stage) an ensemble of estimates of the state of the observed system; a review of those algorithms has been given recently. Most of these algorithms are actually different from the one that is considered here. They have not been defined with the explicit purpose of achieving Bayesian estimation and are not usually evaluated from that perspective.

EnsVAR, as defined here, has been specifically studied under various names and in various contexts by several authors, and has been extended to what is called the randomize-then-optimize (RTO) algorithm. These works have shown that EnsVAR is not in general Bayesian in the non-linear case, but can nevertheless lead to a useful estimate.

EnsVAR is also used operationally at the European Centre for Medium-Range Weather Forecasts (ECMWF) in the definition of the initial conditions of ensemble forecasts. It is also used, both at ECMWF and at Météo-France, under the name ensemble of data assimilations (EDA), for defining the background error covariance matrix of the variational assimilation system. And ECMWF, in its latest reanalysis project, ERA5, uses a low-resolution ensemble of data assimilations in order to estimate the uncertainty in the analysis.

None of the above ensemble methods seems however to have been systematically and objectively evaluated as a probabilistic estimator. That is precisely the object of the present two papers.

The first of these is devoted to the exactly linear and weakly non-linear cases, and the second to the fully non-linear case. In this first one, Sect. 2 describes in detail the EnsVAR algorithm, as well as the experimental set-up that is to be used in both parts of the work. Section 3 describes the statistical tests to be used for objectively assessing EnsVAR as a probabilistic estimator. EnsVAR is implemented in Sect. 4, for reference, in an exactly linear and Gaussian case in which theory says it achieves exact Bayesian estimation. It is implemented in Sect. 5 on the non-linear Lorenz system, over a relatively short assimilation window (5 days), over which the tangent linear approximation remains basically valid and the performance of the algorithm is shown not to be significantly altered. Comparison is made in Sect. 6 with two standard algorithms for EnKF and PF. Experiments performed on the Kuramoto–Sivashinsky equation are summarized in Sect. 7. Partial conclusions, valid for the weakly non-linear case, are drawn in Sect. 8.

The second part is devoted to the fully non-linear situation, in which EnsVAR is implemented over assimilation windows for which the tangent linear approximation is no longer valid. Good performance is nevertheless achieved through the technique of quasi-static variational assimilation (QSVA), defined in earlier works. Comparison is made again with the EnKF and PF.

The general conclusion of both parts is that EnsVAR can produce good results which, in terms of performance as a probabilistic estimator and of numerical accuracy, are at least as good as the results of EnKF and PF.

In the sequel of the paper we denote by N(m,P) the multivariate Gaussian probability distribution with expectation m and covariance matrix P (for a univariate Gaussian probability distribution, we will use the similar notation N(m,r)). E will denote statistical expectation, and Var will denote variance.

The method of ensemble variational assimilation

We assume the available data make up a vector $z$, belonging to data space $\mathcal{D}$ with dimension $N_z$, of the form
$$z = \Gamma x + \zeta.$$
In this expression, $x$ is the unknown vector to be determined, belonging to state space $\mathcal{S}$ with dimension $N_x$, while $\Gamma$ is a known linear operator from $\mathcal{S}$ into $\mathcal{D}$, called the data operator and represented by an $N_z \times N_x$ matrix. As for the $N_z$-vector $\zeta$, we will call it an "error", even though it may not represent an error in the usual sense, but any form of uncertainty. It is assumed to be a realization of the Gaussian probability distribution $N(0, \Sigma)$ (if the expectation $E(\zeta)$ were non-zero, but known, it would first be necessary to unbias the data vector $z$ by subtracting that expectation). It should be stressed that all available information about $x$ is assumed to be included in the data vector $z$. For instance, if one, or even several, Gaussian prior estimates $N(x_b, P_b)$ are available for $x$, they must be introduced as subsets of $z$, each with $N_x$ components, in the form $x_b = x + \zeta_b$, with $\zeta_b \sim N(0, P_b)$.

In those conditions the Bayesian probability distribution $P(x \mid z)$ for $x$ conditioned by $z$ is the Gaussian distribution $N(x_a, P_a)$ with
$$x_a = (\Gamma^T \Sigma^{-1} \Gamma)^{-1} \Gamma^T \Sigma^{-1} z, \qquad P_a = (\Gamma^T \Sigma^{-1} \Gamma)^{-1}.$$
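As an illustration, the posterior formulas above can be checked numerically on a small synthetic problem. The sketch below is illustrative only (dimensions, seed, and error covariance are arbitrary choices, not the paper's configuration):

```python
import numpy as np

# Toy linear-Gaussian setup (all sizes illustrative).
rng = np.random.default_rng(0)
Nx, Nz = 3, 8
Gamma = rng.standard_normal((Nz, Nx))   # data operator, rank Nx almost surely
Sigma = 0.1 * np.eye(Nz)                # data-error covariance
x_true = rng.standard_normal(Nx)
z = Gamma @ x_true + rng.multivariate_normal(np.zeros(Nz), Sigma)

# Posterior expectation and covariance:
#   x_a = (Gamma^T Sigma^-1 Gamma)^-1 Gamma^T Sigma^-1 z
#   P_a = (Gamma^T Sigma^-1 Gamma)^-1
Si = np.linalg.inv(Sigma)
P_a = np.linalg.inv(Gamma.T @ Si @ Gamma)
x_a = P_a @ Gamma.T @ Si @ z
```

The vector `x_a` is also the minimizer of the quadratic objective function $J$ introduced just below, so its gradient $\Gamma^T \Sigma^{-1}(\Gamma x_a - z)$ vanishes.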

At first glance, the above equations seem to require the invertibility of the $N_z \times N_z$ matrix $\Sigma$ and then of the $N_x \times N_x$ matrix $\Gamma^T \Sigma^{-1} \Gamma$. Without going into full details, the need for invertibility of $\Sigma$ is only apparent, and invertibility of $\Gamma^T \Sigma^{-1} \Gamma$ is equivalent to the condition that the data operator $\Gamma$ is of rank $N_x$. This in turn means that the data vector $z$ contains information on every component of $x$. This condition is known as the determinacy condition; it implies that $N_z \ge N_x$. We will call $p = N_z - N_x$ the degree of over-determinacy of the system.

The conditional expectation $x_a$ can be determined by minimizing the following scalar objective function, defined on state space $\mathcal{S}$:
$$\xi \in \mathcal{S} \longmapsto J(\xi) = \frac{1}{2} \left[ \Gamma \xi - z \right]^T \Sigma^{-1} \left[ \Gamma \xi - z \right].$$
In addition, the covariance matrix $P_a$ is equal to the inverse of the Hessian of $J$:
$$P_a = \left[ \frac{\partial^2 J}{\partial \xi^2} \right]^{-1}.$$

In the case where the error ζ, while still being random with expectation 0 and covariance matrix Σ, is not Gaussian, the vector xa defined in Eq. () is not the conditional expectation of x for a given z, but only the least-variance linear estimate, or best linear unbiased estimate (BLUE), of x from z. Similarly, the matrix Pa is no longer the conditional covariance matrix of x for a given z, but the covariance matrix of the estimation error associated with the BLUE, averaged over all realizations of the error ζ.

Minimization of Eq. () can also be performed, at least in favourable circumstances, with a non-linear data operator $\Gamma$. This is what is done, heuristically but with undisputable usefulness, in meteorological and oceanographic variational assimilation, which is routinely implemented in a number of major meteorological centres on non-linear dynamical models with non-linear observation operators.

Coming back to the linear and Gaussian case, consider the perturbed data vector $z' = z + \zeta'$, where the perturbation $\zeta'$ has the same probability distribution $N(0, \Sigma)$ as the error $\zeta$. It is easily seen that the corresponding estimate
$$x'_a = (\Gamma^T \Sigma^{-1} \Gamma)^{-1} \Gamma^T \Sigma^{-1} z'$$
is distributed according to the Gaussian posterior distribution $N(x_a, P_a)$ (Eq. ). This defines a simple algorithm for obtaining a Monte Carlo sample of that posterior distribution: perturb the data vector $z$ according to its own error probability distribution, compute the corresponding estimate (Eq. ), and repeat the same process with independent perturbations on $z$.
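A minimal sketch of this perturbed-data Monte Carlo procedure, in a toy linear-Gaussian setting (all dimensions and values are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(1)
Nx, Nz, Nens = 3, 8, 20000
Gamma = rng.standard_normal((Nz, Nx))        # toy data operator
Sigma = 0.1 * np.eye(Nz)                     # data-error covariance
z = Gamma @ rng.standard_normal(Nx) + rng.multivariate_normal(np.zeros(Nz), Sigma)

Si = np.linalg.inv(Sigma)
P_a = np.linalg.inv(Gamma.T @ Si @ Gamma)    # posterior covariance
K = P_a @ Gamma.T @ Si                       # estimation operator
x_a = K @ z                                  # posterior expectation

# Perturb the data with independent draws from N(0, Sigma) and re-estimate:
perts = rng.multivariate_normal(np.zeros(Nz), Sigma, size=Nens)
sample = (z + perts) @ K.T                   # Nens perturbed estimates x'_a
```

With a large sample, the empirical mean and covariance of the perturbed estimates reproduce $x_a$ and $P_a$, which is precisely the property exploited by EnsVAR in the linear and Gaussian case.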

That is the ensemble variational assimilation, or EnsVAR, algorithm that is implemented below in non-linear and non-Gaussian situations, with the analogue of the estimate xa being computed by minimization of Eq. (). In general, this procedure, as already mentioned in the introduction, does not achieve Bayesian estimation, but it is interesting to study the properties of the ensembles thus obtained.

Remark. In the case when, the data operator $\Gamma$ being linear, the error $\zeta$ in Eq. () is not Gaussian, the quantity $x'_a$ defined by Eq. () has expectation $x_a$ (the BLUE) and covariance matrix $P_a$. The probability distribution of the $x'_a$ is in general not Bayesian, but it has the same expectation and covariance matrix as the Bayesian distribution corresponding to a Gaussian $\zeta$.

All the experiments presented in this work are of the standard identical-twin type, in which the observations to be assimilated are extracted from a prior reference integration of the assimilating model. And all experiments presented in this first part are of the strong-constraint variational assimilation type, in which the temporal sequence of states produced by the assimilation is constrained to satisfy exactly the equations of the assimilating model.

That model, which will emanate from either the Lorenz or the Kuramoto–Sivashinsky equation, will be written as
$$x_{t+1} = M(x_t),$$
where $x_t$ is the model state at time $t$, belonging to model space $\mathcal{M}$, with dimension $N$ (in the strong-constraint case considered in this first part, the model space $\mathcal{M}$ will be identical with the state space $\mathcal{S}$). For each model, a "truth", or reference, run $x^r_t$ has first been produced. A typical (strong-constraint) experiment is as follows.
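For concreteness, a possible implementation of the model step $M$ for the Lorenz-96 system is sketched below. The forcing $F = 8$ and the RK4 timestep are the standard choices for this model and are assumed here; the configuration actually used in the experiments is described in Sect. 5.

```python
import numpy as np

F = 8.0   # standard Lorenz-96 forcing (assumed; not stated in this section)
N = 40    # model space dimension, as in the experiments

def l96_tendency(x):
    """dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F, with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step, x_t -> x_{t+1} = M(x_t)."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Spin up from a small perturbation of the unstable fixed point x = F,
# so that the state settles onto the chaotic attractor.
x = F * np.ones(N)
x[0] += 0.01
for _ in range(1000):
    x = rk4_step(x, 0.05)
```

A "truth" run $x^r_t$ can then be produced by continuing this integration, and synthetic observations extracted from it as described next.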

Choosing an assimilation window $[t_0, t_T]$ with length $T$ (it is mainly the parameter $T$ that will be varied in the experiments), synthetic observations are produced at successive times $t_0 < t_1 < \dots < t_k < \dots < t_K = t_T$, of the form
$$y_k = H_k x^r_k + \epsilon_k,$$
where $H_k$ is a linear observation operator and $\epsilon_k \sim N(0, R_k)$ is an observation error. The $\epsilon_k$ are taken to be mutually independent.

The following process is then implemented Nens times (iens=1,,Nens).

(i) Perturb the observations $y_k$, $k = 0, \dots, K$, according to
$$y'^{\,i_{ens}}_k = y_k + \delta_k,$$
where $\delta_k \sim N(0, R_k)$ is an independent realization of the same probability distribution that has produced $\epsilon_k$. The prime stresses, as in Eq. (), the perturbed character of $y'^{\,i_{ens}}_k$.

(ii) Assimilate the perturbed observations $y'^{\,i_{ens}}_k$ by minimization of the following objective function:
$$\xi_0 \in \mathcal{M} \longmapsto J^{i_{ens}}(\xi_0) = \frac{1}{2} \sum_{k=0}^{K} \left[ H_k \xi_k - y'^{\,i_{ens}}_k \right]^T R_k^{-1} \left[ H_k \xi_k - y'^{\,i_{ens}}_k \right],$$
where $\xi_k$ is the value at time $t_k$ of the solution of Eq. () emanating from $\xi_0$.

The objective function (Eq. ) is of type (Eq. ), with the state space $\mathcal{S}$ being the model space $\mathcal{M}$ ($N = N_x$) and the data vector $z$ consisting of the concatenation of the $K+1$ perturbed data vectors $y'^{\,i_{ens}}_k$.

The process (i)–(ii), repeated Nens times, produces an ensemble of Nens model solutions over the assimilation window [t0,tT].
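The assimilation step amounts to measuring the misfit between a model trajectory, entirely determined by its initial condition, and the (perturbed) observations. The following self-contained sketch evaluates such a strong-constraint objective function; a toy linear model stands in for $M$, with full-state observations ($H_k = I$) and $R_k = \sigma^2 I$ (all names, sizes, and values are illustrative assumptions):

```python
import numpy as np

# Minimal strong-constraint setup: a simple stable linear map stands in
# for the model M; H_k = I and R_k = sigma^2 I throughout.
N, K, sigma = 5, 10, 0.1
A = np.diag(np.full(N, 0.95)) + 0.05 * np.eye(N, k=1)   # toy model matrix

def step(x):
    return A @ x

def J(xi0, obs):
    """J(xi0) = 1/2 sum_k |xi_k - y_k|^2 / sigma^2, xi_k stepped from xi0."""
    val, xi = 0.0, xi0.copy()
    for k in range(K + 1):
        d = xi - obs[k]
        val += 0.5 * d @ d / sigma**2
        xi = step(xi)
    return val

rng = np.random.default_rng(2)
x0 = rng.standard_normal(N)
# Noise-free synthetic observations along the true trajectory:
traj, x = [], x0.copy()
for k in range(K + 1):
    traj.append(x.copy())
    x = step(x)
obs = np.array(traj)
```

With noise-free observations, $J$ vanishes exactly at the true initial condition and is positive elsewhere; in the experiments the minimization is of course performed on perturbed, noisy observations, and the gradient is obtained through the adjoint model rather than by direct evaluation.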

In the perspective taken here, it is not the properties of those individual solutions that matter the most, but the properties of the ensemble considered as a sample of a probability distribution.

The ensemble assimilation process, starting from Eq. (), is then repeated over Nwin assimilation windows of length T (taken sequentially along the true solution xtr).

In variational assimilation as it is usually implemented, the objective function to be minimized contains a so-called background term at the initial time t0 of the assimilation window. That term consists, together with an associated error covariance matrix, of a climatological estimate of the model state vector, or of a prior estimate of that vector at time t0 coming from assimilation of previous observations. An estimate of the state vector at t0 is explicitly present in Eq. (), in the form of the perturbed observation y0iens. But that is not a background term in the usual sense of the expression. In particular, no cycling of any type is performed from one assimilation window to the next. The question of a possible cycling of ensemble variational assimilation will be discussed in Part 2 .

The covariance matrix $R_k$ in Eq. () is the same as the covariance matrix of the perturbations $\delta_k$ in Eq. (). The situation in which the error statistics assumed in the assimilation differ from the actual statistics of the observation errors has not been considered.

We sum up the description of the experimental procedure and define precisely the vocabulary to be used in the sequel. The output of one experiment consists of Nwin ensemble variational assimilations. Each ensemble variational assimilation produces, through Nens minimizations of form (Eq. ), or individual variational assimilations, an ensemble of Nens model solutions corresponding to one set of observations yk(k=0,,K) over one assimilation window. These model solutions will be simply called the elements of the ensemble. The various experiments will differ through various parameters and primarily the length T of the assimilation windows.

The minimizations (Eq. ) are performed through an iterative limited-memory BFGS (Broyden–Fletcher–Goldfarb–Shanno) algorithm, started from the observation $y_0$ at time $t_0$ (which, as said below, is taken here as bearing on the entire state vector $x^r_0$). Each step of the minimization algorithm requires the explicit knowledge of the local gradient of the objective function $J^{i_{ens}}$ with respect to $\xi_0$. That gradient is computed, as usual in variational assimilation, through the adjoint of Eq. (). Unless specified otherwise, the size of the assimilation ensembles will be $N_{ens} = 30$, and the number $N_{win}$ of ensemble variational assimilations for one experiment will be equal to 9000.

The validation procedure

We recall the general result that, among all deterministic functions from data space into state space, the conditional expectation zE(x|z) minimizes the variance of the estimation error on x.

What should ideally be done here for the validation of the results is to assess objectively (if not on a case-by-case basis, at least in a statistical sense) whether the ensembles produced by EnsVAR are samples of the corresponding Bayesian probability distributions. In the present setting, where the probability distribution of the errors $\epsilon_k$ in Eq. () is known, and where a prior probability distribution for the state vector $x_0$ is also known through the observation $y_0$, one could in principle obtain a sample of the exact Bayesian probability distribution by proceeding as follows.

Through repeated independent realizations of the process defined by Eqs. () and (), build a sample of the joint probability distribution for the couple (x, z). That sample can then be read backwards for a given z and, if large enough, will produce a useful sample estimate of the corresponding Bayesian probability distribution for x. That would actually solve numerically the problem of Bayesian estimation. But it is clear that the sheer numerical cost of the whole process, which requires explicit exploration of the joint space (x, z), makes this approach totally impossible in any realistic situation.

We have evaluated instead the weaker property of reliability (also called calibration). Reliability of a probabilistic estimation system (i.e. a system that produces probabilities for the quantities to be estimated) is the statistical consistency between the predicted probabilities and the observed frequencies of occurrence.

Consider a probability distribution $\pi$ (the words "probability distribution" must be taken here in the broadest possible sense, covering discrete probabilities for the occurrence of a binary or multi-outcome event as well as continuous distributions for a one- or multi-dimensional random variable), and denote by $\pi'(\pi)$ the distribution followed by reality in the circumstances when $\pi$ has been predicted. Reliability is the property that, for any $\pi$, the distribution $\pi'(\pi)$ is equal to $\pi$.

Reliability can be objectively evaluated, provided a large enough verification sample is available. Bayesianity clearly implies reliability. For any data vector z, the true state vector x is distributed according to the conditional probability distribution P(x|z), so that a probabilistic estimation system which always produces P(x|z) is reliable. The converse is clearly not true. A system which, ignoring the observations, always produces the climatological probability distribution for x will be reliable. It will however not be Bayesian (at least if, as one can reasonably hope, the available data bring more than climatological information on the state of the system).

Root-mean-square errors from the truth as functions of time along the assimilation window (linear and Gaussian case). Blue curve: error in individual minimizations. Red curve: error in the means of the ensembles. Green curve: error in the assimilations performed with the unperturbed observations $y_k$ (Eq. ). Dashed–dotted horizontal curve: standard deviation of the observation error. Each point on the blue curve corresponds to an average over a sample of $N_x N_{win} N_{ens} = 1.08 \times 10^7$ elements, and each point on the red and green curves to an average over a sample of $N_x N_{win} = 3.6 \times 10^5$ elements.

Diagnostics of statistical performance (linear and Gaussian case). (a) Rank histogram for the model variable $x$. (b) Reliability diagram for the event $E = \{x > 1.14\}$ (black horizontal dashed–dotted line: frequency of occurrence of the event). (c) Variation with threshold $\tau$ of the reliability and resolution components of the Brier score for the events $E = \{x > \tau\}$ (red and blue curves respectively; note the logarithmic scale on the vertical). The diagnostics have been computed over all grid points, timesteps, and realizations, making up a sample of size $7.56 \times 10^6$.

Another desirable property of a probabilistic estimation system, although not directly related to Bayesianity, is resolution (also called sharpness). It is the capacity of the system for a priori distinguishing between different outcomes. For instance, a system which always predicts the climatological probability distribution is perfectly reliable, but has no resolution. Resolution, like reliability, can be objectively evaluated if a large enough verification sample is available.

We will use several standard diagnostic tools for validation of our results. We first note that the error in the mean of the predicted ensembles is itself a measure of resolution: the smaller that error, the higher the capacity of the system to a priori distinguish between different outcomes. Concerning reliability, the classical rank histogram and the reduced centred random variable (RCRV) (the latter is described in Appendix A) are (non-equivalent) measures of the reliability of probabilistic prediction of a scalar variable. The reliability diagram and the associated Brier score are relative to probabilistic prediction of a binary event. The Brier score decomposes into two parts, which measure respectively the reliability and the resolution of the prediction. The definition used here for those components is given in Appendix A (Eqs.  and  respectively). Both scores are positive and negatively oriented, so that perfect reliability and resolution are achieved when the corresponding scores take the value 0. For more on these diagnostics and, more generally, on the objective validation of probabilistic estimation systems, see the standard literature on forecast verification.
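The rank histogram mentioned above counts, for each case, the rank of the truth within the ordered ensemble. A minimal sketch follows, for the reliable case in which truth and ensemble members are drawn from the same distribution, so that the $N_{ens}+1$ possible ranks are equiprobable and the histogram is flat (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
Nens, Ncases = 30, 5000

# Reliable case: truth and ensemble members from the same distribution.
ens = rng.standard_normal((Ncases, Nens))
truth = rng.standard_normal(Ncases)

# Rank of the truth within each ordered ensemble (0 .. Nens).
ranks = (ens < truth[:, None]).sum(axis=1)
hist = np.bincount(ranks, minlength=Nens + 1)

# For a reliable system, each of the Nens + 1 ranks is equally likely.
expected = Ncases / (Nens + 1)
```

Deviations from flatness (a U-shape for under-dispersive ensembles, a dome for over-dispersive ones) then diagnose a failure of reliability.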

Numerical results: the linear case

We present in this section results obtained in an exactly linear and Gaussian case, in which theory says that EnsVAR must produce an exact Monte Carlo Bayesian sample. These results are to be used as a benchmark for the evaluation of later results. The numerical model (Eq. ) is obtained by linearizing the non-linear Lorenz model, which describes the space–time evolution of a scalar variable denoted $x$, about one particular solution (the Lorenz model will be described and discussed in more detail in Sect. 5; see Eq.  below). The model space dimension $N$ is equal to 40. The length $T$ of the assimilation windows is 5 days, which covers $N_t = 20$ timesteps (the "day" will be defined in the next section). The complete state vector ($H_k = I$ in Eq. ) is observed every 0.5 days ($K = 10$). The data vector $z$ therefore has dimension $(K+1)N = 440$. The observation errors are Gaussian, spatially uncorrelated, with constant standard deviation $\sigma = 0.1$ ($R_k = \sigma^2 I$ for all $k$). Because of the linearity, however, the absolute amplitude of those errors has no impact.

Since the conditions for exact Bayesianity are verified, any deviation of the results from exact reliability can be due only to the finite size $N_{ens}$ of the ensembles (except for the rank histogram, which takes that finite size into account), the finite size $N_{win}$ of the validation sample, or numerical effects (such as those resulting from incomplete minimization or round-off errors).

Figure  shows the root-mean-square errors from the truth along the assimilation window, averaged at each time over all grid points and all realizations. The upper (blue) curve shows the average error in the individual minimizing solutions of Jiens (Eq. ). The lower (red) curve shows the error in the mean of the individual ensembles, while the green curve shows the error in the fields obtained in minimizations performed with the raw unperturbed observations yk (Eq. ).

All errors are smaller than the observation error (horizontal dashed–dotted line). The estimation errors are largest at both ends of the assimilation window and smallest at some intermediate time. As is known, and as already discussed by various authors, this is due to the fact that the error along the stable components of the flow decreases over the assimilation window, while the error along the unstable components increases. The ratio between the values on the blue and green curves, averaged over the whole assimilation window, is equal to 1.414. This is close to $\sqrt{2}$, as can be expected from the linearity of the process and the perturbation procedure defined by Eqs. ()–() (actually, it can be noted that the value $\sqrt{2}$ is itself, independently of any linearity, a test for reliability, since the standard deviation of the difference between two independent realizations of a random variable must be equal to $\sqrt{2}$ times the standard deviation of the variable itself). The green curve corresponds to the expectation of (what must be) the Bayesian probability distribution, while the red curve corresponds to a sample expectation, computed over $N_{ens}$ elements. The latter expectation is therefore not, as can be seen in the figure, as accurate an estimate of the truth. The relative difference must be about $1/(2 N_{ens}) \approx 0.017$. This is the value obtained here.

For a reliable system, the reduced centred random variable, which we denote $s$, has expectation 0 and variance 1 (see Appendix A). The sample values, computed over all grid points, times, and assimilation windows (which amounts to a set of size $N_x (N_t + 1) N_{win} = 7.56 \times 10^6$), are $E(s) = 0.0035$ and $\mathrm{Var}(s) = 1.00$.
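The paper's exact definition of the RCRV is given in its Appendix A; a commonly used variant, sketched below for a synthetic perfectly reliable Gaussian ensemble, centres and scales the truth by the ensemble mean and standard deviation (all sizes and the finite-ensemble scaling factor are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
Nens, Ncases = 30, 100000

# Reliable case: in each case, truth and members drawn from the same N(mu, 1).
mu = rng.standard_normal(Ncases)
ens = mu[:, None] + rng.standard_normal((Ncases, Nens))
truth = mu + rng.standard_normal(Ncases)

# RCRV: truth centred by the ensemble mean and scaled by the ensemble
# standard deviation, with the usual sqrt(1 + 1/Nens) finite-size factor.
m = ens.mean(axis=1)
sd = ens.std(axis=1, ddof=1)
s = (truth - m) / (sd * np.sqrt(1.0 + 1.0 / Nens))
```

For a reliable system, $s$ has expectation 0 and variance close to 1; with a finite ensemble, $s$ actually follows a Student-t distribution with $N_{ens}-1$ degrees of freedom, so its variance slightly exceeds 1.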

Histogram of (half) the minima of the objective function (Eq. ), along with the corresponding mean (vertical black line) and standard deviation (horizontal blue line) (linear and Gaussian case).

Figure  shows other diagnostics of the statistical performance of the system, computed again over the full sample of size $7.56 \times 10^6$. The top-left panel is the rank histogram. The top-right panel is the reliability diagram relative to the event $\{x > 1.14\}$, which occurs with frequency 0.32 (black horizontal dashed–dotted line in the diagram). Both panels visually show high reliability (flatness for the histogram, closeness to the diagonal for the reliability diagram), although that reliability is obviously not perfect. More accurate quantitative diagnostics are given by the lower panel, which shows, as functions of the threshold $\tau$, the two components (reliability and resolution; see Eqs.  and  respectively) of the Brier score for the events $\{x > \tau\}$. The reliability component is about $10^{-3}$; the resolution component is about $5 \times 10^{-2}$. A further diagnostic has been made by comparison with an experiment in which the validating truth has been obtained, for each of the $N_{win}$ windows, from an additional independent $(N_{ens}+1)$st variational assimilation. That procedure is by construction perfectly reliable, and any difference with Fig.  could result only from the fact that the validating truth is not defined by the same process. The reliability (not shown) is very slightly improved in comparison with Fig.  (possibly due to a lack of full convergence of the minimizations). The resolution is not modified.

Diagnostics relative to the non-linear and Gaussian case, with assimilation over 5 days. (a) and (b) are relative to one particular assimilation window. (a) (horizontal coordinate: spatial position j) Reference truth at the initial time of the assimilation window (black dashed curve), observations (blue circles), and minimizing solutions (full red curves). (b) (horizontal coordinate: time along the assimilation window) Truth (dashed curve) and minimizing solutions (full red curves) at three points in space. (c) Overall diagnostics of estimation errors (same format as in Fig. ).

Same as Fig. , for the non-linear case (the reliability diagram in the top-right panel is for the event E={x<1.0}, which occurs with frequency 0.33).

It is known that the minimum Jmin=J(xa) of the objective function (Eq. ) takes on average the value E(Jmin)=p/2, where p=Nz-Nx has been defined as the degree of over-determinacy of the minimization. This result is true provided the following two conditions are verified: (i) the operator Γ is linear and (ii) the error ζ in Eq. () has expectation 0 and the covariance matrix Σ used in the objective function (Eq. ). It is independent of whether ζ is Gaussian or not. But when ζ is Gaussian, the quantity 2Jmin follows a χ2 probability distribution of order p (for that reason, Eq.  is often called the χ2 condition, although it is verified in circumstances where 2Jmin does not follow a χ2 distribution). As a consequence, the minimum Jmin has standard deviation σ(Jmin)=√(p/2). In the present case, Nx=40 and Nz=(K+1)Nx=440, so that p/2=200 and √(p/2)≈14.14.

The histogram of the minima Jmin (corrected for a multiplicative factor 1/2 resulting from the additional perturbations, Eq. ) is shown in Fig. . The corresponding empirical expectation and standard deviation are 199.39 and 14.27 respectively, in agreement with Eqs. ()–(). It can be noted that, as a consequence of the central limit theorem, the histogram in Fig.  is effectively Gaussian. Indeed the value of its negentropy, a measure of deviation from Gaussianity that will be defined in the next section, is 0.0012.
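The χ2 property recalled above can be checked numerically on a toy linear Gaussian least-squares problem (a sketch of ours with an arbitrary random observation operator, not the operator of the paper): with unit-variance Gaussian error, 2Jmin follows a χ2 distribution of order p, so that E(Jmin)=p/2 and σ(Jmin)=√(p/2).

```python
import numpy as np

rng = np.random.default_rng(1)
n_z, n_x = 440, 40                    # dimensions used in the paper: p = 400
p = n_z - n_x

gamma = rng.normal(size=(n_z, n_x))   # arbitrary full-rank linear operator (assumption)
mins = []
for _ in range(2000):
    x_true = rng.normal(size=n_x)
    z = gamma @ x_true + rng.normal(size=n_z)    # unit-variance Gaussian error
    x_a, res, *_ = np.linalg.lstsq(gamma, z, rcond=None)
    mins.append(0.5 * res[0])         # J_min = (1/2) |z - Gamma x_a|^2

mins = np.array(mins)
# empirically, mean(mins) ≈ p/2 = 200 and std(mins) ≈ sqrt(p/2) ≈ 14.14
```

The empirical mean and standard deviation of the minima reproduce the theoretical values quoted in the text.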

For the theoretical conditions of exact Bayesianity considered here, reliability should be perfect and should not be degraded when the information content of the observations decreases (through increased observation error and/or degraded spatial and/or temporal resolution of the observations). Statistical resolution should, on the other hand, be degraded. Experiments have been performed to check this aspect (the exact experimental procedure is described in Sect. 5). The numerical results (not shown) are that both components of the Brier score are actually degraded and can increase by 1 order of magnitude. The reliability component always remains much smaller than the resolution component, and the degradation of the latter is much more systematic. This is in good agreement with the fact that the degradation of reliability can only be due to numerical effects, such as less efficient minimizations.

The above results, obtained in the case of exact theoretical Bayesianity, are going to serve as reference for the evaluation of EnsVAR in non-linear and non-Gaussian situations where Bayesianity does not necessarily hold.

Numerical results: the non-linear case

The non-linear Lorenz-96 model reads dx_j/dt = (x_{j+1} − x_{j−2}) x_{j−1} − x_j + F, where j = 1, …, N represents the spatial coordinate (longitude), with cyclic boundary conditions. As in , we choose N=40 and F=8. For these values, the model is chaotic with 13 positive Lyapunov exponents, the largest of which has a value of (2.5 days)^-1, where 1 day is equal to 0.24 time units in Eq. (). This is the definition of “day” we will use hereafter. It is slightly different from the choice made in , where the day is equal to 0.2 time units in Eq. (12). The difference is not critical for the sequel, nor for possible comparison with other works.
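The model above can be integrated with, e.g., a standard fourth-order Runge–Kutta scheme; the following sketch uses illustrative choices of time step and spin-up length (not necessarily those of the paper):

```python
import numpy as np

F = 8.0
N = 40

def l96_tendency(x):
    """dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F, cyclic in j."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Spin up from a slightly perturbed rest state onto the attractor
x = F * np.ones(N)
x[0] += 0.01
dt = 0.05                  # a common choice of time step in model time units
for _ in range(1000):
    x = rk4_step(x, dt)
```

After the spin-up, the state evolves chaotically within the model's climatological range.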

Impact of the informative content of observations on the two components of the Brier score (non-linear case). The format of each panel is the same as the format of the bottom panels of Figs.  and  (red and blue curves: reliability and resolution components respectively). (a) Impact of the temporal density of the observations. Observations are performed at every grid point, with error variance σ^2=0.4, at every time step (full curves) and at every second and fourth time step (dashed and dashed–dotted curves respectively). (b) Impact of the spatial density of the observations. Observations are performed at every time step, with error variance σ^2=0.4, at every grid point (full curves) and at every second and fourth grid point (dashed and dashed–dotted curves respectively). (c) Impact of the variance σ^2 of the observation error. Observations are performed at every second time step and at every grid point, with observation error variance σ^2 = 0.4, 2×0.4, and 4×0.4 (full, dashed, and dashed–dotted curves respectively).

Values of (half) the minima of the objective function for all realizations (non-linear case) (horizontal coordinate: realization number; vertical coordinate: value of the minima).

Except for the dynamical model, the experimental setup is fundamentally the same as in the linear case. In particular, the model time step (0.25 days in our definition), the observation frequency (0.5 days), and the values Nens=30 and Nwin=9000 are the same. The observation error is uncorrelated in space and time, with constant variance σ^2=0.4 (Rk = σ^2 I for all k). The associated standard deviation σ=0.63 is equal to 2 % of the variability of the reference solution (it is because of the different range of variability that the value of σ has been chosen different from the value in the linear case). We mention again that no cycling is present between successive assimilation windows.
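As a reminder, the EnsVAR (EDA) sample is generated by perturbing the observations according to the distribution of their own errors and redoing the variational analysis. A minimal sketch on a toy linear problem follows (the dimensions and the operator H are illustrative, not those of the experiment; in this linear Gaussian case the resulting ensemble is an exact Monte Carlo sample of the posterior distribution):

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_obs, n_ens = 5, 40, 30
sigma = 0.4 ** 0.5                      # obs error std; variance 0.4 as in the text

H = rng.normal(size=(n_obs, n_x))       # toy linear observation operator (assumption)
x_true = rng.normal(size=n_x)
y = H @ x_true + sigma * rng.normal(size=n_obs)

# EnsVAR principle: perturb the data with errors drawn from their own
# error distribution, and redo the (here linear least-squares) analysis.
ensemble = []
for _ in range(n_ens):
    y_pert = y + sigma * rng.normal(size=n_obs)
    x_a, *_ = np.linalg.lstsq(H, y_pert, rcond=None)
    ensemble.append(x_a)
ensemble = np.array(ensemble)           # shape (n_ens, n_x): posterior sample
```

In the non-linear case the least-squares solve is replaced by the iterative variational minimization, and the sampling is no longer exactly Bayesian.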

The results are shown on Fig. . The top panels are relative to one particular assimilation window. In the left panel, where the horizontal coordinate is the spatial position j, the black dashed curve is the reference truth at the initial time of the assimilation window, the blue circles are the corresponding observations, and the full red curves (Nens=30 of them) are the minimizing solutions at the same time. The right panel, where the horizontal coordinate is time along the assimilation window, shows the truth (dashed curve) and the Nens minimizing solutions (full red curves) at three different points in space. Both panels show that the minimizations reconstruct the truth with a high degree of accuracy.

The bottom panel, which shows error statistics accumulated over all assimilation windows, is in the same format as Fig.  (note that, because of the different dynamics and observational error, the amplitude on the vertical axis is different from Fig. ). The conclusions are qualitatively the same. The estimation error, which is smaller than the observational error, is maximum at both ends of the assimilation window and minimum at some intermediate time. The ratio between the blue and red curves, equal on average to 1.41, is close to the value √2, which, as already said, is in itself an indication of reliability. But a significant difference is that the green curve now lies above the red curve. One obtains a better approximation of the truth by taking the average of the Nens minimizing solutions than by performing an assimilation on the raw observations (Eq. ). This is an obvious non-linear effect. One can note that it is fully consistent with the fact that the expectation of the a posteriori Bayesian probability distribution is the variance-minimizing estimate of the truth. The expectation and variance of the RCRV are respectively E(s)=0.012 and Var(s)=1.047.

Figure , which is in the same format as Fig. , shows similar diagnostics: rank histogram; reliability diagram for the event {x<1.0}, which occurs with frequency 0.33; and the two components of the Brier score for events of the form {x>τ}. The general conclusion is the same as in the linear case. A high level of reliability is achieved. Actually, the reliability component of the Brier score (bottom panel) is now decreased below 10^-3. That improvement, in the present situation where exact Bayesianity cannot be expected, can only be due to better numerical conditioning than in the linear case. The resolution component of the Brier score, on the other hand, is increased.

Cross section of the objective function Jiens, for one particular minimization, between the starting point of the minimization and the minimum of Jiens (black curve). Parabola going through the starting point and having the same minimum (red curve).

(a) Identical with the top-right panel of Fig. , repeated for comparison with figures that follow. The other panels show the same diagnostics as in Fig.  but performed at the final time of the assimilation windows. (b) Rank histogram. (c) Reliability diagram for the event E={x>1.33}, which occurs with frequency 0.42. (d) Components of the Brier score for the events E={x>τ} (same format as in the bottom panels of Figs.  and ).

Figure  is relative to experiments in which the informative content of the observations, i.e. their temporal density, spatial density, and accuracy (top, middle, and bottom panels respectively), has been varied. Each panel shows the two components of the Brier score, in the same format as in the bottom panels of Figs.  and  (but with more curves corresponding to different informative contents). The reliability component (red curves) always remains significantly smaller than the resolution component (blue curves). With the exception of the reliability component in the top panel, both components are systematically degraded when the information content of the observations decreases. This is certainly to be expected for the resolution component, but not necessarily for the reliability component. The degradation of the latter is significantly larger than in the linear case (not shown), where we concluded that it could be due only to degradation of numerical conditioning. The degradation of reliability in the lower two panels may therefore be due here to non-linearity. One noteworthy feature is that the degradation of the resolution scores, for the same total decrease in the number of observations, is much larger for the decrease in spatial density than for the decrease in temporal density (middle and top panels respectively). Less information is therefore lost in degrading the temporal than the spatial density of observations.

Figure  shows the distribution of (half) the minima of the objective function (it contains the same information as Fig. , in a different format). Most values are concentrated around the linear value 200, but a small number of values are present in the range 600–1000. Excluding these outliers, the expectation and standard deviation of the minima are 199.62 and 14.13 respectively. These values are actually in better agreement with the theoretical χ2 values (200 and 14.14) than the ones obtained above in the theoretically exact Bayesian case (199.39 and 14.27). This again suggests better numerical conditioning for the non-linear situation.

In view of previous results, in particular results obtained by , a likely explanation for the presence of the larger minima in Fig.  is the following. Owing to the non-linearity of Eq. (), and more precisely to the folding which occurs in state space as a consequence of the chaotic character of the motion, the uncertainty in the initial state is distributed along a folded subset in state space. It occasionally happens that the minimum of the objective function falls in a secondary fold, which corresponds to a larger value of the objective function. This aspect will be further discussed in the second part of the paper. In any case, the presence of larger minima of the objective function is an obvious sign of non-linearity.

Non-linearity is also obvious in Fig. , which shows, for one particular minimization, a cross section of the objective function between the starting point of the minimization and the minimum of the objective function (black curve), as well as a parabola going through the starting point and having the same minimum (red curve). The two curves are distinctly different, while they would be identical in a linear case.

We have evaluated the Gaussian character of univariate marginals of the ensembles produced by the assimilation by computing their negentropy. The negentropy of a probability distribution is the Kullback–Leibler divergence of that distribution with respect to the Gaussian distribution with the same expectation and variance (see Appendix B). The negentropy is positive and is equal to 0 for exact Gaussianity. The mean negentropy of the ensembles is here about 10^-3, indicating closeness to Gaussianity (for reference, the negentropy of the Laplace distribution is 0.072). Although non-linearity is present in the whole process, EnsVAR produces ensembles that are close to Gaussianity.

Experiments have been performed in which the observational error, instead of being Gaussian, has been taken to follow a Laplace distribution (with still the same variance σ2=0.4). No significant difference has been observed in the results in comparison with the Gaussian case. This suggests that the Gaussian character of the observational error is not critical for the conclusions obtained above.

Same as Fig. , for the ensemble Kalman filter.

Same as Fig. , for the particle filter.

Comparison with the ensemble Kalman filter and the particle filter

We present in this section a comparison with results obtained with the ensemble Kalman filter (EnKF) and the particle filter (PF). As used here, those filters are sequential in time. Fair comparison is therefore possible only at the end of the assimilation window. Figure  shows the diagnostics obtained from EnsVAR at the end of the window (the top-left panel, identical with the top-right panel of Fig. , is included for easy comparison with the figures that will follow). Comparison with Fig.  shows that the reliability (as measured by the rank histogram, the reliability diagram, and the reliability component of the Brier score) is significantly degraded. It has been verified (not shown) that this degradation is mostly due not to a really degraded performance at the end of the window, but to the use of a smaller validation sample (smaller by a factor of Nt+1=21, which leads to a sample of size 3.6×10^5).

Figure , which is in the same format as Fig. , shows the same diagnostics for the EnKF. The algorithm used is the one described by . It is stochastic in the sense that observations have been perturbed randomly, for updating the background ensembles, according to the probability distribution of the observation errors. Spatial localization of the background error covariance matrix has been implemented by Schur-multiplying the sample covariance matrix by a squared exponential kernel with length scale 12.0 (the positive definiteness of the periodic kernel has been ensured by removing its negative Fourier components). And multiplicative inflation with factor r=1.001 has been applied, as in , on the ensemble after each analysis.
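The localization described above, i.e. the Schur product of the sample covariance with a squared-exponential kernel whose negative Fourier components are removed, can be sketched as follows (the function name and implementation details are ours, not the paper's):

```python
import numpy as np

def periodic_se_localization(n, length_scale):
    """Squared-exponential localization matrix on a cyclic domain of n
    grid points.  Positive definiteness of the periodic kernel is
    enforced by zeroing its negative Fourier components, as in the text."""
    d = np.arange(n)
    d = np.minimum(d, n - d)                 # cyclic distance between grid points
    row = np.exp(-0.5 * (d / length_scale) ** 2)
    spec = np.fft.rfft(row).real             # spectrum of the (even, real) kernel
    spec = np.maximum(spec, 0.0)             # remove negative Fourier components
    row = np.fft.irfft(spec, n)
    # circulant localization matrix built from the corrected first row
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return row[idx]

C = periodic_se_localization(40, 12.0)
# localized covariance = element-wise (Schur) product C * sample_covariance
```

Because the corrected spectrum is non-negative, the resulting circulant matrix is positive semi-definite, so the Schur product with the sample covariance remains a valid covariance matrix.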

Comparison with Fig.  shows that the individual ensembles, after a warm-up period, tend to remain more dispersed than in EnsVAR (top-left panel). Reliability, as measured by the reliability diagram and the Brier score, is similar to what it is in Fig. . But it is significantly degraded as evaluated by the rank histogram. The ensembles, although they have larger absolute dispersion than in EnsVAR, tend to miss reality more often.

Following comments from referees, we have made a few experiments not using localization in the EnKF. The RMSE and the RCRV are significantly degraded, while the rank histogram and the resolution component of the Brier score are improved. The reliability component of the Brier score remains the same. All this is true for both assimilation and forecast. These results, not included in the paper, would deserve further study, which is postponed to future work.

Figure (again in the same format as Fig. ) shows the same diagnostics for a particle filter. The algorithm used here is the “Sampling Importance Particle Filter” presented in . Comparison with Fig.  shows first that the individual ensembles are still more dispersed than in EnKF (top-left panel). It also shows a slight degradation of the reliability component of the Brier score (and, incidentally, a significant degradation of the resolution component), but no visible difference on the reliability diagram. Concerning the rank histogram, PF produces unequally weighted particles, and the standard histogram could not be used. A histogram has been built instead on the quantiles defined by the weights of the particles. This shows, as for EnKF, a significant tendency to miss the truth.
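A single analysis step of a sampling-importance-resampling particle filter of the kind evoked above might look as follows (a generic sketch of ours, with a scalar observation and multinomial resampling; the exact algorithm of the reference may differ):

```python
import numpy as np

def sir_update(particles, weights, obs, obs_std, rng):
    """One analysis step of a bootstrap (SIR) particle filter:
    reweight particles by the Gaussian observation likelihood,
    then resample to equal weights (multinomial resampling)."""
    # likelihood of the (scalar) observation given each particle
    lik = np.exp(-0.5 * ((obs - particles) / obs_std) ** 2)
    w = weights * lik
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

rng = np.random.default_rng(3)
parts = rng.normal(size=1000)            # prior sample, here N(0, 1)
w = np.full(1000, 1e-3)                  # equal initial weights
new_parts, new_w = sir_update(parts, w, obs=1.0, obs_std=0.5, rng=rng)
```

For this Gaussian toy case the resampled particles approximate the known posterior, which makes the sketch easy to verify.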

RMS errors at the end of 5 days of assimilation (left column) and of 5 days of forecast (right column) for the three algorithms.

          Assimilation   Forecast
EnsVAR    0.22           1.49
EnKF      0.24           1.67
PF        0.76           2.63

Same as Fig. , but at the end of 5-day forecasts. On the top-left panel the horizontal axis spans both the assimilation and the forecast intervals.

Same as Fig. , but for EnKF.

Same as Fig. , but for PF.

Same as Fig. , for variational ensemble assimilations performed on the Kuramoto–Sivashinsky equation, i.e. root-mean-square error from the truth along the assimilation window, averaged at each time over all grid points and all realizations, for both the linear and non-linear cases (a and b respectively).

The left column of Table  shows the mean root-mean-square error in the means of the ensembles as obtained from the three algorithms. The performance of EnsVAR and EnKF (0.22 and 0.24) is comparable by that measure, while the performance of PF is significantly worse (0.76). Figures  are relative to ensemble forecasts performed, for each of the three assimilation algorithms, from the ensembles obtained at the end of the 5-day assimilations. They are in the same format as Fig.  and show diagnostics at the end of 5-day forecasts. One can first observe that the dispersion of individual forecasts (top-left panels) increases, as can be expected, with the forecast range, but much less with the EnsVAR than with EnKF and PF. Reliability, as measured by the Brier score, is slightly degraded in all three algorithms with respect to the case of the assimilations. It is slightly worse for EnKF than for EnsVAR and significantly worse for PF. Resolution is, on the other hand, significantly degraded in all three algorithms. This is associated with the dispersion of ensembles and corresponds to what could be expected. Concerning the rank histograms, the histogram of EnsVAR, although still noisy, shows no systematic sign of over- or underdispersion of the ensembles. The EnKF and PF histograms both present, as before, what appears to be a significant underdispersion.

Finally, the right column of Table  shows that RMS errors, which are of course now larger, still rank comparatively in the same order as before, i.e. EnsVAR < EnKF < PF.

The Kuramoto–Sivashinsky equation

Similar experiments have been performed with the Kuramoto–Sivashinsky (K–S) equation. It is a one-dimensional spatially periodic evolution equation, with an advective non-linearity, a fourth-order dissipation term, and a second-order anti-dissipative term. It reads

∂u/∂t + ∂^4 u/∂x^4 + ∂^2 u/∂x^2 + u ∂u/∂x = 0,  x ∈ [0, L],
∂^i u/∂x^i (x+L, t) = ∂^i u/∂x^i (x, t)  for i = 0, 1, …, 4,  t > 0,
u(x, 0) = u0(x),

where the spatial period L is a bifurcation parameter for the system. The K–S equation models pattern formation in different physical contexts and is a paradigm of low-dimensional behaviour in solutions to partial differential equations. It arises as a model amplitude equation for interfacial instabilities in many physical contexts. It was originally derived by  to model small thermal diffusive instabilities in laminar flame fronts in two space dimensions. Equation () has been used here with the value L = 32π and has been discretized to 64 Fourier modes. In accordance with the calculations of , we observe chaotic motion with 27 positive Lyapunov exponents, with the largest one being λmax ≈ 0.13.

With L = 32π and the initial condition u(x, 0) = cos(x/16) (1 + sin(x/16)), Eq. () is known to be stiff. The stiffness is due to rapid exponential decay of some modes (the dissipative part) and to rapid oscillations of other modes (the dispersive part). Figure , where the two panels are in the same format as Fig. 1, shows the errors in the EnsVAR assimilations, in both a linearized (top panel) and a fully non-linear (bottom panel) case. The length of the assimilation window, marked as 1 in the figure, is equal to 1/λmax ≈ 7.7 in the units of Eq. (), i.e. a typical predictability time of the system. The shapes of the curves show that the K–S equation is globally more stable and less unstable than the Lorenz equation. The figure shows similar performance for the linear and non-linear situations. Other results (not shown) are also qualitatively very similar to those obtained with the Lorenz equation: high reliability of the ensembles produced by EnsVAR and slightly superior performance over EnKF and PF.
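A minimal pseudo-spectral integration of the K–S equation with this initial condition can be sketched as follows (a first-order semi-implicit scheme of ours, chosen for brevity; the stiffness noted above would normally call for a dedicated integrator such as ETDRK4):

```python
import numpy as np

# Fourier pseudo-spectral sketch of  u_t + u_xxxx + u_xx + u u_x = 0
# on [0, L], periodic, with L = 32*pi and 64 modes as in the text.
# Linear part treated implicitly, nonlinear part explicitly.
L = 32 * np.pi
n = 64
x = L * np.arange(n) / n
k = 2 * np.pi * np.fft.rfftfreq(n, d=L / n)   # angular wavenumbers for rfft
lin = k**2 - k**4                             # symbol of -(d^4/dx^4 + d^2/dx^2)

u = np.cos(x / 16) * (1 + np.sin(x / 16))     # initial condition of the text
uh = np.fft.rfft(u)
dt = 0.05                                     # illustrative time step (assumption)
for _ in range(2000):                         # integrate to t = 100
    nl = -0.5j * k * np.fft.rfft(np.fft.irfft(uh, n) ** 2)   # -u u_x
    uh = (uh + dt * nl) / (1 - dt * lin)
u = np.fft.irfft(uh, n)
```

The implicit treatment of the fourth-order term removes the severe explicit stability restriction; the spatial mean (the k = 0 mode) is exactly conserved by this scheme.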

Summary and conclusions

Ensemble variational assimilation (EnsVAR) has been implemented on two small-dimension non-linear chaotic toy models, as well as on linearized versions of those models.

One specific goal of the paper was to stress what is, in the authors' view, a critical aspect, namely the need to systematically evaluate ensembles produced by ensemble assimilation as probabilistic estimators. This requires us to consider these ensembles as defining probability distributions (instead of evaluating them principally, for instance, by the error in their mean). In view of the impossibility of objectively validating the Bayesianity of ensembles, the weaker property of reliability has been evaluated instead. In the linear and Gaussian case, where theory says that EnsVAR is exactly Bayesian, the reliability of the ensembles produced by EnsVAR is high, but not numerically perfect, showing the effect of sampling errors and, probably, of numerical conditioning.

In the non-linear case, EnsVAR, implemented on temporal windows on the order of magnitude of the predictability time of the systems, shows as good (and in some cases slightly better) performance as in the exactly linear case. Comparison with the ensemble Kalman filter (EnKF) and the particle filter (PF) shows EnsVAR is globally as good a statistical estimator as those two other algorithms.

On the other hand, EnsVAR, as it has been implemented here, is numerically more costly than either EnKF or PF. And the specific algorithms used for the latter two methods may not be the most efficient. But it is worthwhile to evaluate EnsVAR in the more demanding conditions of stronger non-linearity. That is the object of the second part of this work.

Methods for ensemble evaluation

This Appendix describes in some detail two of the scores that are used for evaluation of results in the paper, namely the reduced centred random variable and the reliability–resolution decomposition of the classical Brier score. Given a predicted probability distribution for a scalar variable x and a verifying observation ξ, the corresponding value of the reduced centred random variable (RCRV) is defined as s = (ξ − μ)/σ, where μ and σ are respectively the mean and the standard deviation of the predicted distribution. For a perfectly reliable prediction system, and over all realizations of the system, s, by the very definition of expectation and standard deviation, has expectation 0 and variance 1. This is true independently of whether or not the predicted distribution is always the same. An expectation of s that is different from 0 means that the system is globally biased. If the expectation is equal to 0, a variance of s that is smaller (respectively larger) than 1 is a sign of global overdispersion (respectively underdispersion) of the predicted distributions. One can note that, contrary to the rank histogram, which is invariant under any monotonic one-to-one transformation of the variable x, the RCRV is invariant only under a linear transformation.
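The RCRV diagnostic above amounts to a few lines of code (a generic sketch of ours; note that with a finite ensemble the empirical variance of s exceeds 1 slightly even for a perfectly reliable system):

```python
import numpy as np

def rcrv(ensembles, truths):
    """Reduced centred random variable s = (xi - mu)/sigma, where mu and
    sigma are the mean and standard deviation of each predicted ensemble
    and xi is the verifying value.  Reliability implies E(s) = 0 and
    Var(s) = 1 (up to finite-ensemble effects)."""
    ensembles = np.asarray(ensembles)            # shape (n_real, n_ens)
    mu = ensembles.mean(axis=1)
    sigma = ensembles.std(axis=1, ddof=1)
    return (np.asarray(truths) - mu) / sigma

# Synthetic reliable system: truth drawn from the same N(0, 1) as the members
rng = np.random.default_rng(4)
ens = rng.normal(size=(50000, 30))
truth = rng.normal(size=50000)
s = rcrv(ens, truth)
# mean(s) ≈ 0; var(s) ≈ 1, inflated by O(1/n_ens) finite-size effects
```
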

We recall that the Brier score for a binary event E is defined by B = E[(p − p0)^2], where p is the probability predicted for the occurrence of E in a particular realization of the probabilistic prediction process, p0 is the corresponding verifying observation (p0 = 1 or 0 depending on whether E has been observed to occur or not), and E denotes the mean taken over all realizations of the process. Denoting by p′(p), for any probability p, the frequency with which E is observed to occur in the circumstances when p has been predicted, B can be rewritten as B = E[(p − p′)^2] + E[p′(1 − p′)].

The first term on the right-hand side, which measures the horizontal dispersion of the points on the reliability diagram about the diagonal, is a measure of reliability. The second term, which is a (negative) measure of the vertical dispersion of the points, is a measure of resolution (the larger the dispersion, the higher the resolution, and the smaller the second term on the right-hand side). It is those two terms, divided by the constant pc(1 − pc), where pc = E(p0) is the overall observed frequency of occurrence of E, that are taken in the present paper as measures of reliability and resolution: Breli = E[(p − p′)^2] / [pc(1 − pc)], Breso = E[p′(1 − p′)] / [pc(1 − pc)].

Both measures are negatively oriented and have 0 as optimal value. Breli is bounded above by 1/[pc(1 − pc)], while Breso is bounded above by 1.
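In practice the two components are estimated by grouping the predicted probabilities into bins; a sketch of the computation (the binning choice is ours):

```python
import numpy as np

def brier_components(p_pred, occurred, nbins=11):
    """Reliability and resolution components of the Brier score,
    normalized by p_c (1 - p_c), with predicted probabilities
    grouped into nbins equal-width bins."""
    p_pred = np.asarray(p_pred)
    occurred = np.asarray(occurred, dtype=float)
    bins = np.clip((p_pred * nbins).astype(int), 0, nbins - 1)
    reli = reso = 0.0
    n = len(p_pred)
    for b in range(nbins):
        mask = bins == b
        if not mask.any():
            continue
        p_bar = p_pred[mask].mean()          # mean predicted probability in bin
        p_prime = occurred[mask].mean()      # observed frequency for that prediction
        w = mask.sum() / n
        reli += w * (p_bar - p_prime) ** 2   # estimates E[(p - p')^2]
        reso += w * p_prime * (1 - p_prime)  # estimates E[p'(1 - p')]
    p_c = occurred.mean()
    norm = p_c * (1 - p_c)
    return reli / norm, reso / norm

# Perfectly reliable synthetic forecasts: the event occurs with the
# predicted probability, so the reliability component should be near 0.
rng = np.random.default_rng(5)
p = rng.uniform(size=200000)
occ = rng.uniform(size=200000) < p
b_reli, b_reso = brier_components(p, occ)
```
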

Remark. There exist other definitions of the reliability and resolution components of the Brier score. In particular, concerning resolution, the uncertainty term pc(1-pc) (which depends on the particular event E under consideration) is often subtracted from the start from the raw score (Eq. ). This leads to slightly different scores.

As said in the main text, more on the above diagnostics and, more generally, on objective validation of probabilistic estimation systems can be found in e.g. chap. 8 of the book by , or in the papers by and .

Negentropy

The negentropy of a probability distribution with density f(y) is the Kullback–Leibler divergence, or relative entropy, of that distribution with respect to the Gaussian distribution with the same expectation and variance. Denoting by fG(y) the density of that Gaussian distribution, the negentropy can be expressed as N(f) = ∫ f(y) ln[f(y)/fG(y)] dy. The negentropy is always positive and is equal to 0 if and only if the density f(y) is Gaussian. As examples, a Laplace distribution has negentropy 0.072, while the empirical negentropy of a 30-element random Gaussian sample is about 10^-6. In the case of small skewness s and normalized kurtosis k, the negentropy can be approximated by N(f) ≈ (1/12) s^2 + (1/48) k^2. It is this formula that has been used in the present paper.
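The small-deviation approximation above can be sketched as follows (generic code of ours; note that the approximation is only accurate near Gaussianity, so for a strongly non-Gaussian case like the Laplace distribution it overestimates the exact value 0.072 quoted in the text):

```python
import numpy as np

def negentropy_approx(sample):
    """Small-deviation approximation N(f) ≈ s^2/12 + k^2/48, with s the
    skewness and k the excess (normalized) kurtosis of the sample."""
    y = np.asarray(sample, dtype=float)
    y = (y - y.mean()) / y.std()
    s = np.mean(y ** 3)                 # skewness
    k = np.mean(y ** 4) - 3.0           # excess kurtosis
    return s ** 2 / 12 + k ** 2 / 48

rng = np.random.default_rng(6)
n_gauss = negentropy_approx(rng.normal(size=10**6))     # ≈ 0 for Gaussian data
# Laplace: s = 0, k = 3, so the approximation gives 9/48 ≈ 0.19,
# larger than the exact negentropy 0.072 (the expansion is not valid so far
# from Gaussianity).
n_laplace = negentropy_approx(rng.laplace(size=10**6))
```
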

MJ and OT have defined together the scientific approach to the paper and the numerical experiments to be performed. MJ has written the codes and run the experiments. Most of the writing has been carried out by OT.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Numerical modeling, predictability and data assimilation in weather, ocean and climate: A special issue honoring the legacy of Anna Trevisan (1946–2016)”. It is a result of a Symposium Honoring the Legacy of Anna Trevisan – Bologna, Italy, 17–20 October 2017.

Acknowledgements

This work has been supported by Agence Nationale de la Recherche, France, through the Prevassemble and Geo-Fluids projects, as well as by the programme Les enveloppes fluides et l'environnement of Institut national des sciences de l'Univers, Centre national de la recherche scientifique, Paris. The authors acknowledge fruitful discussions during the preparation of the paper with Julien Brajard and Marc Bocquet. The latter also acted as a referee along with Massimo Bonavita. Both of them made further suggestions which significantly improved the paper. Edited by: Alberto Carrassi Reviewed by: Marc Bocquet and Massimo Bonavita

References Anderson, J. L. and Anderson, S. L.: A Monte Carlo Implementation of the Nonlinear Filtering Problem to Produce Ensemble Assimilations and Forecasts, Mon. Weather Rev., 127, 2741–2785, 1999. Arulampalam, M. S., Maskell, S., Gordon, N., and Clapp, T.: A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking, IEEE T. Signal Proces., 150, 174–188, 2002. Bannister, R. N.: A review of operational methods of variational and ensemble-variational data assimilation, Q. J. Roy. Meteor. Soc., 143, 607–633, 10.1002/qj.2982, 2017. Bardsley, J. M.: MCMC-Based Image Reconstruction with Uncertainty Quantification, SIAM J. Sci. Comput., 34, A1316–A1332, 2012. Bardsley, J. M., Solonen, A., Haario, H., and Laino, M.: Randomize-then-Optimize: a method for sampling from posterior distributions in nonlinear inverse problems, SIAM J. Sci. Comput., 36, A1895–A1910, 2014. Berre, L., Varella, H., and Desroziers, G.: Modelling of flow-dependent ensemble-based background-error correlations using a wavelet formulation in 4D-Var at Meteo-France, Q. J. Roy. Meteor. Soc., 141, 2803–2812, 10.1002/qj.2565, 2015. Bonavita, M., Hólm, E., Isaksen, L., and Fisher, M.: The evolution of the ECMWF hybrid data assimilation system, Q. J. Roy. Meteor. Soc., 142, 287–303, 10.1002/qj.2652, 2016. Bowler, N. E., Clayton, A. C., Jardak, M., Lee, E., Lorenc, A. C., Piccolo, C., Pring, S. R., Wlasak, M. A., Barker, D. M., Inverarity, G. W., and Swinbank, R.: The development of an ensemble of 4D-ensemble variational assimilations, Q. J. Roy. Meteor. Soc., 143, 785–797, 10.1002/qj.2964, 2017. Candille, G. and Talagrand, O.: Evaluation of probabilistic prediction systems for a scalar variable, Q. J. Roy. Meteor. Soc., 131, 2131–2150, 10.1256/qj.04.71, 2005. Chavent, G.: Nonlinear Least Square for Inverse Problems, Theoretical Foundations and Step-by-Step Guide for applications, Springer-Verlag, 2010. Crisan, D. 
and Doucet, A.: A survey of convergence results on particle filtering methods for practitioners, IEEE T. Signal Proces., 50, 736–746, 10.1109/78.984773, 2002. Doucet, A., Godsill, A. S., and Andrieu, C.: On Sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comput., 10, 197–208, 2000. Doucet, A., de Freitas, J. F. G., and Gordon, N. J.: An introduction to sequential Monte Carlo methods, in: Sequential Monte Carlo Methods in Practice, edited by: Doucet, A., de Freitas, J. F. G., and Gordon, N. J., Springer-Verlag, New York, 2001. Evensen, G.: Sequential data assimilation in non-linear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99, 10143–10162, 1994. Evensen, G.: The Ensemble Kalman Filter: theoretical formulation and practical implementation, Ocean Dynam., 53, 343–367, 2003. Gordon, N. J., Salmond, D., and Smith, A. F. M.: Novel approach to nonlinear non-Gaussian Bayesian state estimate, IEEE Proc.-F., 140, 107–113, 1993. Hersbach, H. and Dee, D.: ERA5 reanalysis is in production, ECMWF Newsletter No. 147, 7 pp., 2016. Houtekamer, P. and Mitchell, H.: Data assimilation using an ensemble Kalman filter technique, Mon. Weather Rev., 126, 796–811, 1998. Houtekamer, P. and Mitchell, H.: A Sequential Ensemble Kalman Filter for Atmospheric Data Assimilation, Mon. Weather Rev., 129, 123–137, 2001. Isaksen, L., Bonavita, M., Buizza, R., Fisher, M., Haseler, J., Leubecher, M., and Raynaud, L.: Ensemble of Data Assimilation at ECMWF, ECMWF Technical Memoranda 636, ECMWF, December 2010. Jardak, M. and Talagrand, O.: Ensemble variational assimilation as a probabilistic estimator – Part 2: The fully non-linear case, Nonlin. Processes Geophys., 25, 589–604, 10.5194/npg-25-589-2018, 2018. Järvinen, H., Thépaut, J. N., and Courtier, P.: Quasi-continuous variational data assimilation, Q. J. Roy. Meteorol. Soc., 122, 515–534, 1996. Jaynes, E. 
T.: Probability Theory: The Logic of Science, Cambridge University Press, 2004. Kalman, R. E.: A new approach to linear filtering and prediction problems, J. Basic Eng.-T. ASME, 82, 35–45, 1960. Kullback, S. and Leibler, R. A.: On Information and Sufficiency, Ann. Math. Statist., 22, 79–86, 1951. Kuramoto, Y. and Tsuzuki, T.: On the formation of dissipative structures in reaction-diffusion systems, Prog. Theor. Phys., 54, 687–699, 1975. Kuramoto, Y. and Tsuzuki, T.: Persistent propagation of concentration waves in dissipative media far from thermal equilibrium, Prog. Theor. Phys., 55, 356–369, 1976. Le Gland, F., Monbet, V., and Tran, V.-D.: Large sample asymptotics for the ensemble Kalman Filter, The Oxford handbook of nonlinear filtering, Oxford University Press, 598–631, 2011. Lorenz, E. N.: Predictability: A problem partly solved, Proc. Seminar on Predictability, vol. 1, ECMWF, Reading, Berkshire, UK, 1–18, 1996. Lorenz, E. N. and Emanuel, K. A.: Optimal sites for supplementary weather observations: simulation with a small model, J. Atmos. Sci., 55, 399–414, 1998. Liu, Y., Haussaire, J., Bocquet, M., Roustan, Y., Saunier, O., and Mathieu, A.: Uncertainty quantification of pollutant source retrieval: comparison of Bayesian methods with application to the Chernobyl and Fukushima Daiichi accidental releases of radionuclides, Q. J. Roy. Meteor. Soc., 143, 2886–2901, 10.1002/qj.3138, 2017. Manneville, P.: Macroscopic Modelling of Turbulent Flows, in: Liapounov exponents for the Kuramoto-Sivashinsky model, edited by: Frisch, U., Keller, J., Papanicolaou, G., and Pironneau, O., Lecture Notes in Physics, Springer, 230 pp., 1985. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.: Equation of State Calculations by Fast Computing Machines, J. Chem. Phys., 21, 1087–1092, 1953. Miller, R. N., Carter, E. F., and Blue, S. T.: Data assimilation into nonlinear stochastic models, Tellus A, 51, 167–194, 1999. Nocedal, J. and Wright, S. 
J.: Numerical Optimization, Operations Research Series, 2nd edn., Springer, 2006. Oliver, D. S., He, N., and Reynolds, A. C.: Conditioning permeability fields to pressure data, in: ECMOR V-5th European Conference on the Mathematics of Oil Recovery, EAGE, 259–269, 10.3997/2214-4609.201406884, 1996. Pires, C., Vautard, R., and Talagrand, O.: On extending the limits of variational assimilation in nonlinear chaotic systems, Tellus A, 48, 96–121, 1996. Pires, C. A., Talagrand, O., and Bocquet, M.: Diagnosis and impacts of non-Gaussianity of innovations in data assimilation, Physica D, 239, 1701–1717, 2010. Robert, C. P.: The Metropolis–Hastings Algorithm, Wiley StatsRef, Statistics Reference Online, 1–15, 10.1002/9781118445112, 2015. Talagrand, O., Vautard, R., and Strauss, B.: Evaluation of probabilistic prediction systems, Proc. ECMWF Workshop on Predictability, 125, 1–25, 1997. Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, Philadelphia, 2005. Trevisan, A., D'Isidoro, M., and Talagrand, O.: Four-dimensional variational assimilation in the unstable subspace and the optimal subspace dimension, Q. J. Roy. Meteor. Soc., 136, 387–496, 10.1002/qj.571, 2010. van Leeuwen, P. J.: Particle filtering in geophysical systems, Mon. Weather Rev., 137, 4089–4114, 2009. van Leeuwen, P. J.: Nonlinear Data Assimilation in geosciences: an extremely efficient particle filter, Q. J. Roy. Meteor. Soc., 136, 1991–1996, 10.1002/qj.699, 2010. Van Leeuwen, P. J.: Particle Filters for nonlinear data assimilation in high-dimensional systems, Annales de la faculte des sciences de Toulouse Mathematiques, 26, 1051–1085, 10.5802/afst.1560, 2017. Wilks, D. S.: Statistical Methods in the Atmospheric Sciences, 3rd edn., Academic Press, New York, 704 pp., 2011.