Expanding the validity of the ensemble Kalman filter without the intrinsic need for inflation

The ensemble Kalman filter (EnKF) is a powerful data assimilation method meant for high-dimensional nonlinear systems. But its implementation requires somewhat ad hoc procedures such as localization and inflation. The recently developed finite-size ensemble Kalman filter (EnKFN) does not require multiplicative inflation meant to counteract sampling errors. Aside from the practical interest in avoiding the tuning of inflation in perfect model data assimilation experiments, it also offers theoretical insights and a unique perspective on the EnKF. Here, we revisit, clarify and correct several key points of the EnKF-N derivation. This simplifies the use of the method, and expands its validity. The EnKF is shown to not only rely on the observations and the forecast ensemble, but also on an implicit prior assumption, termed hyperprior, that fills in the gap of missing information. In the EnKF-N framework, this assumption is made explicit through a Bayesian hierarchy. This hyperprior has so far been chosen to be the uninformative Jeffreys prior. Here, this choice is revisited to improve the performance of the EnKF-N in the regime where the analysis is strongly dominated by the prior. Moreover, it is shown that the EnKF-N can be extended with a normal-inverse Wishart informative hyperprior that introduces additional information on error statistics. This can be identified as a hybrid EnKF–3D-Var counterpart to the EnKF-N.


Introduction
The ensemble Kalman filter (EnKF) has become a popular data assimilation method for high-dimensional geophysical systems (Evensen, 2009, and references therein).The flow dependence of the forecast error used in the analysis is its main strength, compared to schemes using static background statistics such as 3D-Var and 4D-Var.
However, to perform satisfyingly, the EnKF may require the use or inflation and/or localization, depending on the data assimilation system setup.Localization is required in the rank-deficient regime, in which the limited size of the ensemble leads to an empirical error covariance matrix of overly small rank, as is often the case in realistic high-dimensional systems (Houtekamer and Mitchell, 2001;Hamill et al., 2001;Ott et al., 2004).It can also be useful in a ranksufficient context in the presence of non-Gaussian/nonlinear effects.
Inflation is a complementary technique meant to increase the variances diagnosed by the EnKF (Pham et al., 1998;Anderson and Anderson, 1999).It is usually intended to compensate for an underestimation of uncertainty.This underestimation can be caused either by sampling error, an intrinsic deficiency of the EnKF system, or model error, an extrinsic deficiency.
A variant of the EnKF, called the finite-size ensemble Kalman filter (EnKF-N), has been introduced in Bocquet (2011) and Bocquet and Sakov (2012).It has subsequently been successfully applied in Bocquet and Sakov (2013) and Bocquet and Sakov (2014)  tive inflation usually needed to counteract sampling errors.In particular, it avoids the costly chore of tuning this inflation.
The EnKF-N is derived by assuming that the ensemble members are drawn from the same distribution as the truth, but makes no further assumptions about the ensemble's accuracy.In particular, the EnKF-N, unlike the traditional EnKFs, does not make the approximation that the sample first-and second-order moments coincide with the actual moments of the prior (which would be accessible if the ensemble size N were infinite).
Through its mathematical derivation, the scheme underlines the missing information besides the observations and the ensemble forecast, an issue that is ignored by traditional EnKFs.This missing information is explicitly compensated for in the EnKF-N using a so-called hyperprior.In Bocquet (2011), a simple choice was made for this hyperprior, namely the Jeffreys prior, which is meant to be as non-informative as possible.While the EnKF-N built on the Jeffreys prior often performs very well with low-order models, it may fail in specific dynamical regimes because a finer hyperprior is needed (Bocquet and Sakov, 2012).Other choices were made in the derivation of the EnKF-N that remain only partly justified or insufficiently clear.
The objective of this paper is to clarify several of those choices, to answer several questions raised in the above references, and to advocate the use of improved or new hyperpriors.This should add to the theoretical understanding of the EnKF, but also provide a useful algorithm.Specifically, the EnKF-N allows the development of data assimilation systems under perfect model conditions without worrying about tuning the inflation.In the whole paper, we will restrict ourselves to perfect model conditions.
In Sect.2, the key ideas and algorithms of the EnKF-N are recalled and several aspects of the approach are clarified.It is shown that the redundancy in the EnKF-centered perturbations leads to a subtle but important correction to the EnKF-N when the analysis is performed in the affine space defined by the mean state and the ensemble perturbations.In Sect.3, the ensemble update step of the EnKF-N is revisited and clarified.In Sect.4, the nonlinearity of the ensemble forecast step and its handling by the EnKF-N, and more generally multiplicative inflation, are discussed.The corrections to the EnKF-N are illustrated with numerical experiments in Sect. 5.Sections 6 and 7 discuss modifying or even changing the hyperprior.In Sect.6, we discuss caveats of the method in regimes where the posterior ensemble is drawn to the prior ensemble.Simple alternatives to the Jeffreys hyperprior are proposed.Finally, a class of more informative priors based on the normal-inverse Wishart distribution and permitting one to incorporate additional information into error statistics is introduced and theoretically discussed in Sect.7. Conclusions are given in Sect.8.

The finite-size ensemble Kalman filter (EnKF-N)
The key ideas of the EnKF-N are presented and clarified in this section.Additional insights into the scheme and why it is successful are also given.Bocquet (2011) (later Boc11) recognized that the ensemble mean x and ensemble error covariance matrix P used in the EnKF may be different from the unknown first-and secondorder moments of the true error distribution, x b and B, where B is a positive definite matrix.The mismatch is due to the finite size of the ensemble that leads to sampling errors, partially induced by the nonlinear ensemble propagation in the forecast step (see Sect. 4 for a justification).Figure 1 illustrates the effect of sampling error when the prior is assumed Gaussian and reliable, whereas the prior actually stems from an uncertain sampling using the ensemble.

Marginalizing over potential priors
The EnKF-N prior accounts for the uncertainty in (1) The symbol dB corresponds to the Lebesgue measure on all independent entries M i≤j d[B] ij , but the integration is restricted to the cone of positive definite matrices.Since p(x|E, x b , B) is conditioned on the knowledge of the true prior statistics and assumed to be Gaussian, it does not depend on E, so that (2) Bayes' rule can be applied to p(x b , B|E), yielding Assuming independence of the samples, the likelihood of the ensemble E can be written as The last factor, p(x b , B), is the hyperprior.This distribution represents our beliefs about the forecasted filter statistics, x b and B, prior to actually running any filter.This distribution is termed a hyperprior because it represents a prior for the background information in the first stage of a Bayesian hierarchy.Assuming one subscribes to this EnKF-N view of the EnKF, it shows that additional information is actually required in the EnKF, in addition to the observations and the prior ensemble that are potentially insufficient to make an inference.
A simple choice was made in Boc11 for the hyperprior: the Jeffreys prior is an analytically tractable and uninformative hyperprior of the form where |B| is the determinant of the background error covariance matrix B of dimension M × M.

Predictive prior
With a given hyperprior, the marginalization over x b and B, Eq. ( 3), can in principle be carried out to obtain p(x|E).We choose to call it a predictive prior to comply with the traditional view that sees it as a prior before assimilating the observations.Note, however, that statisticians would rather call it a predictive posterior distribution as the outcome of a first-stage inference of a Bayesian hierarchy, where E is the data.Using Jeffreys' hyperprior, Boc11 showed that the integral can be obtained analytically and that the predictive prior is a multivariate T distribution: where |.| denotes the determinant and ε N = 1 + 1/N .The determinant is computed in the space generated by the perturbations of the ensemble so that it is not singular.This distribution has fat tails, thus accounting for the uncertainty in B. The factor ε N is a result of the uncertainty in x b ; if x b were known to coincide with the ensemble mean x, then ε N would be 1 instead.For a Gaussian process, ε N P is an unbiased estimator of the squared error of the ensemble mean x (Sacher and Bartello, 2008), where ε N stems from the uncertain x b , which does not coincide with x.In the derivation of Boc11, the ε N P correction comes from integrating out on x b .Therefore, ε N can be seen as an inflation factor on the prior covariance matrix that should actually apply to any type of EnKF.This non-Gaussian prior distribution can be seen as an average over Gaussian distributions weighted according to the hyperprior.It can be shown that Eq. ( 6) can be re-arranged: where P † is the Moore-Penrose inverse of P.
In comparison, the traditional EnKF implicitly assumes that the hyperprior is δ(B − P)δ(x b − x), where δ is a Dirac multidimensional distribution.In other words, the background statistics generated from the ensemble coincide with the true background statistics.As a result, one obtains in this case the Gaussian prior:

Analysis
Consider a given analysis step of the data assimilation cycle.
This is at odds with the ill-founded claim by Boc11 that the likelihood still depends on E. This expression clarifies one of the issues raised in Boc11.
Let us recall and further discuss the analysis step of the EnKF-N for state estimation.For the sake of simplicity, the observational error distribution is assumed Gaussian, unbiased, with covariance matrix R. The observation operator will be denoted H .Because the predictive prior Eq. ( 6) is non-Gaussian, the analysis is performed through a variational optimization similarly to the maximum likelihood filter (Zupanski, 2005), rather than by matrix algebra as in traditional EnKFs.Working in ensemble space, states are parameterized by vectors w of size N such that There is at least one "gauge" degree of freedom in w due to the fact that x is invariant under w −→ w + λ1, where λ is an arbitrary scalar.This is the result of the linear dependence of the centered perturbation vectors.
For reference, with these notations, the cost function of the ensemble transform Kalman filter (ETKF, Bishop et al., 2001;Ott et al., 2004) based on Eq. ( 8) reads as where z 2 R = z T R −1 z and w is the orthogonal projector onto the row space of X. Algebraically, w = X † X where X † is the Moore-Penrose inverse of X. Equation ( 12) is the direct result of the substitution into Eq.( 8) of x by w using Eq. ( 11).As explained by Hunt et al. (2007), one can add the term w 2 I N − w to the cost function without altering the minimum.Denoting z 2 = z T z, this leads to The added term has been labelled the gauge fixing term by Boc11 using standard physics terminology.The EnKF-N cost function in Boc11 is It is the result of the substitution of x by w using Eq. ( 11) into Eq.( 7), and of the addition of the gauge fixing term albeit inside the logarithm, which was justified by extending the idea of Hunt et al. (2007) and the monotonicity of the logarithm.The restriction of x to the ensemble subspace is an approximation inherent in the traditional EnKFs.By virtue of the hyperprior, it is not necessarily part of the EnKF-N.However, it is quite justified assuming the ensemble tracks the unstable subspace of the dynamics.When the ensemble is of limited size and cannot span the full range of uncertain directions, such as in high-dimensional systems, this ensemble transform representation can be made local (Hunt et al., 2007).A cost function in state space could be rigorously derived from the prior Eq. ( 7) following Boc11.The cost function J (w) was obtained from the substitution x(w) = x + Xw in the state space cost function, which, however, ignores the Jacobian of this transformation.Hence, it would be preferable to directly obtain the probability density of the prior as a function of w, which requires some mathematical development compared to the immediate substitution in the cost function.
From a probabilistic standpoint, the logarithm of the determinant of the Jacobian matrix should be added to the cost function since ln p w (w) = ln p x (x(w)) + ln ∂x(w) ∂w . (15) Had the transformation w −→ x(w) been nonlinear, the cost function would have been impacted (see for instance Fletcher and Zupanski, 2006).However, the standard ensemble transform is linear, which should result in an irrelevant constant.
Unfortunately, because of the gauge degree(s) of freedom of the perturbations, the transformation is not injective and therefore singular, and the determinant of the transformation is zero, yielding an undefined constant.Hence, the issue should be addressed more carefully.It will turn out in the following section that the cost function should be amended in the non-quadratic case.

Accounting for the gauge degrees of freedom of the ensemble transform
Let us denote N ≤ min(N − 1, M) the rank of X.The number of gauge degrees of freedom is then g ≡ N − N. The most common case encountered when applying the EnKF to highdimensional systems is that the rank of X is N − 1 M, that is to say g = 1, because X1 = 0.A nonsingular ensemble transform is obtained by restricting w to N ⊥ the orthogonal complement of the null space, N , of X.Hence, the ensemble transform is nonsingular.This amounts to fixing the gauge at zero.With this restriction to N ⊥ , the prior of the ETKF defined over whereas the prior pdf of the EnKF-N is In principle, the analysis can be performed in N ⊥ using reduced variables w r ∈ R N , looking for an estimate of the form x = x + X r w r , where X r would stand for a reduced perturbation matrix.To do so, let us introduce the singular value decomposition of the initial perturbation matrix: The reduced perturbation matrix X r is then simply given by X r = U .However, the change of variable w −→ w r would prevent us from using the elegant symmetric formalism of the ensemble transform Kalman filter because the perturbation matrix X r is not centered.Moreover, the new perturbations, X r , are non-trivial linear combinations of the initial perturbations, X.It is likely to generate imbalances with nonlinear dynamics.Indeed, it is unlikely that the displacement of the ensemble in the analysis would be minimized, as opposed to what happens with the ETKF when the transform matrix is chosen symmetric (Ott et al., 2004).We applied this change of variable to a standard ETKF and tested it numerically with the Lorenz-95 low-order model (Lorenz and Emanuel, 1998).
We obtained much larger displacements and intermittent instabilities that require more inflation.Hence, we wish to fix the gauge while keeping the initial perturbations as much as possible.To do so, the definition of the prior pdfs defined on N ⊥ is extended to the full ensemble space R N = N ⊥ ⊕ N , while maintaining their correct marginal over N ⊥ .For the EnKF, we can fix the gauge by choosing as in Eq. ( 13), which has indeed the correct marginal since p(w) factorizes into independent components for N and N ⊥ .For the EnKF-N, we can fix the gauge while keeping the symmetry by choosing It can be seen that this pdf has the correct marginal by integrating out on N , using the change of variable w − w −→ ε N + w 2 (w − w).The use of these extended pdfs in the analysis is justified by the fact that the Bayesian analysis pdf p(w|y) in ensemble space has the correct marginal over N ⊥ .Indeed, if p(y|w) = p(y|x = x + Xw) is the likelihood in ensemble space that does not depend on w, then the marginal of the Bayesian analysis pdf p(w|y) ∝ p(y|w)p(w) is consistently given by p( w|y) ∝ p(y| w)p( w).We conclude that it is possible to perform an analysis in terms of the redundant w in place of w.
As opposed to the Gaussian case, the form of pdf Eq. ( 20) brings in a change in the EnKF-N when the analysis is performed in ensemble space.The appearance of g in the exponent is due to a non-trivial Jacobian determinant when passing from the ungauged to the gauged variables, a minimalist example of the so-called Faddeev-Popov determinant (Zinn-Justin, 2002).This consideration generates a modification of the EnKF-N cost function when using Eq. ( 20) as the predictive prior.Henceforth, we shall assume g = 1, which will always be encountered in the rest of the paper.Consequently, the modified EnKF-N has the following cost function: which replaces Eq. ( 14).This modification, g = 0 → 1, as compared with Boc11, will be enforced in the rest of the paper.Such a change will be shown to significantly impact the numerical experiments in Sect. 5.

Update of the ensemble
The form of the predictive prior also has important consequences for the EnKF-N theory.First of all, the pdfs Eqs. ( 18) or ( 20) are multivariate T distributions, and more specifically multivariate Cauchy distributions.They are proper, i.e., normalizable to 1, but have neither first-order nor second-order moments.

Laplace approximation
Conditioned on B, both the prior and the posterior are Gaussian provided the observation error distribution is Gaussian, which is assumed for the sake of simplicity.Without this conditioning, however, they are both a (continuous) mixture of candidate Gaussians in the EnKF-N derivation.Therefore, the posterior p(w|y) ∝ p(y|w)p(w) should be interpreted with caution.As was done in Boc11, its mode can in principle be safely estimated.However, its moments do not generally exist.They exist only if the likelihood p(y|w) enables it.Even when they do exist, they do not carry the same significance as for Gaussians.
Hence, the analysis w a is safely defined using the EnKF-N Cauchy prior as the most likely w of the posterior pdf.But, using the mean and the error covariance matrix of the posterior is either impossible or questionable because as explained above they may not exist.
One candidate Gaussian that does not involve integrating over the hyperprior is the Laplace approximation of the posterior (see Bishop, 2006, for instance), which is the Gaussian approximation fitted to the pdf in the neighborhood of w a .This way, the covariance matrix of the Laplace distribution is obtained as the Hessian of the cost function at the minimum w a .Refining the covariance matrix from the inverse Hessian is not an option since the exact covariance matrix of the posterior pdf may not exist.This is a counterintuitive argument against looking for a better approximation of the posterior covariance matrix rather than the inverse Hessian.
Once a candidate Gaussian for the posterior has been obtained, the updated ensemble of the EnKF-N is obtained from the Hessian, just as in the ETKF.The updated ensemble is www.nonlin-processes-geophys.net/22/645/2015/ Nonlin.Processes Geophys., 22, 645-662, 2015 Algorithm 1 Algorithm of the primal EnKF-N Require: The forecast ensemble {x k } k=1,...,N , the observations y, the observation error covariance matrix R, and U an orthogonal matrix satisfying U1 = 1.1: Compute the mean x and the perturbations X from {x k } k=1,...,N , Y = HX 2: Find the argument of the minimum: where x a is the analysis in state space; w a is the argument of the minimum of Eq. ( 21).The updated ensemble of perturbations X a is given by where U is an arbitrary orthogonal matrix satisfying U1 = 1 (Sakov and Oke, 2008) and where H a is the Hessian of Eq. ( 21), with Y = HX and H the tangent linear of H .The algorithm of this so-called primal EnKF-N is recalled by Algorithm 1.Note that the algorithm can handle nonlinear observation operators since it is based on a variational analysis similarly to the maximum likelihood ensemble filter of Zupanski (2005).
We will choose U to be the identity matrix in all numerical illustrations of this paper, and in particular Sect.5, in order to minimize the displacement in the analysis (Ott et al., 2004).

Theoretical equivalence between the primal and dual approaches
Boc11 showed that the functional Eq. ( 21) is generally nonconvex but has a global minimum.Yet, the cost function is only truly non-quadratic in the direction of the radial degree of freedom w of w, because the predictive prior is elliptical.This remark led Bocquet and Sakov (2012) (later BS12) to show, assuming H is linear or linearized, that the minimization of Eq. ( 21) can be performed simply by minimizing the following dual cost function over ]0, (N + 1)/ε N ]: where δ = y −H (x).Its global minimum can easily be found since ζ −→ D(ζ ) is a scalar cost function.The variable ζ is conjugate to the square radius w 2 .It can be seen as the number of effective degrees of freedom in the ensemble.
Once the argument of the minimum of D(ζ ), ζ a , is computed, the analysis for w can be obtained from the ETKF-like cost function with the solution Based on this effective cost function, an updated set of perturbations can be obtained: As a consequence, the EnKF-N with an analysis performed in ensemble space can be seen as an ETKF with an adaptive optimal inflation factor λ a applied to the prior distribution, and related to ζ a by λ a = √ (N − 1)/ζ a .Provided one subscribes to the EnKF-N formalism, this tells us that sampling errors can be cured by multiplicative inflation.This is supported by Whitaker and Hamill (2012), who experimentally showed that multiplicative inflation is well suited to account for sampling errors, whereas additive inflation is better suited to account for model errors in a meteorological context.Other efficient adaptive inflation methods have been proposed by, e.g., Wang and Bishop (2003), Anderson (2007), Li et al. (2009), Zheng (2009), Brankart et al. (2010), Miyoshi (2011), Liang et al. (2012), and Ying and Zhang (2015) for broader uses including extrinsic model error.Nevertheless, for the experiments described in Sect.5, they are not as successful with the specific goal of accounting for sampling errors as the EnKF-N.
Equation ( 28), on which the results of BS12 are based, is only an approximation, because it does not use the Hessian of the complete cost function Eq. ( 21).Only the diagonal term of the Hessian of the background term is kept: off-diagonal rank-one correction, −2(N + 1) −1 ζ 2 a w a w T a , has been neglected.This approximation is similar to that of the Gauss-Newton method, which is an approximation of the Newton method where the Hessian of the cost function to be minimized is approximated by the product of first-order derivative terms and by neglecting second-order derivative terms.The approximation actually consists in neglecting the co-dependence of the errors in the radial ( w ) and angular (w/ w ) degrees of freedom of w.
Since the dual EnKF-N is meant to be equivalent to the primal EnKF-N when the observation operator is linear, the updated ensemble should actually be based on Eq. ( 24), which can also be written as with H a = Y T R −1 Y+ζ a I N , and compared to the approximation Eq. ( 28) used in BS12.The algorithm of this so-called dual EnKF-N is recalled in Algorithm 2 and includes the correction.With Eq. ( 30), the dual scheme is strictly equivalent to the primal scheme provided that H is linear, whereas it is only approximately so with Eq. ( 28).
The co-dependence of the radial and angular degrees of freedom exposed by the dual cost function is further explored in Appendix A.

Cycling of the EnKF-N and impact of model nonlinearity
We have discussed and amended the analysis step of the EnKF-N.To complete the data assimilation cycle, the ensemble must be forecasted between analyses.The cycling of the EnKF-N can be summarized by the following diagram:

Cycling of the EnKF-N and impact of model nonlinearity
We have discussed and amended the analysis step of the EnKF-N.To complete the data assimilation cycle, the ensemble must be forecasted between analyses.The cycling of the 5 EnKF-N can be summarized by the following diagram: In accounting for sampling error, the EnKF-N framework differs quite significantly from that of van Leeuwen (1999); Furrer and Bengtsson (2007); Sacher and Bartello (2008).Focusing on the bias of the EnKF gain and precision matrix, these studies are geared towards 10 single-cycle corrections.By contrast, the EnKF-N enables the likelihood to influence the estimation of the posterior covariance matrix.This can be seen by writing and recognizing the posterior as a non-uniform mixture of Gaussians, as for the prior.The inclusion of the likelihood is what makes the EnKF-N equipped to handle the effects of model nonlinearity and the sequentiality of data assimilation. 15 With linear perfect evolution and observation models, and provided the ensemble is big enough to span the unstable and neutral subspace, inflation or localization are unnecessary in the ensemble square root filter (Sakov and Oke, 2008;Gurumoorthy et al., 2015).Sampling errors, if present, can be ignored in this case.Therefore, it is likely that inflation is actually compensating for the misestimation of errors generated by model nonlinearity.In accounting for sampling error, the EnKF-N framework differs quite significantly from that of van Leeuwen (1999), Furrer and Bengtsson (2007), and Sacher and Bartello (2008).Focusing on the bias of the EnKF gain and precision matrix, these studies are geared towards single-cycle corrections.By contrast, the EnKF-N enables the likelihood to influence the estimation of the posterior covariance matrix.This can be seen by writing and recognizing the posterior as a non-uniform mixture of Gaussians, as for the prior.The inclusion of the likelihood is what makes the EnKF-N equipped to handle the effects of model nonlinearity and the sequentiality of data assimilation.
Assume that an ensemble square root Kalman filter is applied to linear forecast and observation models, and further assume that the ensemble is big enough to span the unstable and neutral subspace.In this case, it was shown that inflation or localization are unnecessary to regularize the error covariance matrix (Sakov and Oke, 2008;Gurumoorthy et al., 2015).Sampling errors, if present, can be ignored in this case.Therefore, it is inferred from this result that inflation is actually compensating for the mis-estimation of errors generated by model nonlinearity.Following this line of thought, Boc11 hypothesized that the finite-size scheme actually accounts for the error generated in the nonlinear deformation of the ensemble in the forecast step of the EnKF.What happens to the EnKF-N when the model gets more linear is addressed in Sect.6.
A recent study by Palatella and Trevisan (2015) confirms and clarifies this suggestion.The authors show that the nonlinear evolution of the error in the extended Kalman filter generates additional errors unaccounted for by the extended Kalman filter linear propagation of the error.In a specific example, they are able to avoid the need for inflation with the 40-variable Lorenz-95 model using a total of 24 perturbations (14 for the unstable and neutral subspace and 10 for the main nonlinear corrections).We checked that the same root mean square errors as shown in Table II of Palatella and Trevisan (2015) can be achieved by the EnKF-N and the optimally tuned EnKF with an ensemble of size N = 24.This reinforces the idea that the EnKF-N accounts, albeit within ensemble space, for the error generated by nonlinear corrections inside and outside the ensemble subspace.Additionally, note that the EnKF-N does not show any sign of divergence in the regime studied by Palatella and Trevisan (2015) even for much stronger model nonlinearity.
To picture the impact of inflation on the fully cycled EnKF, let us consider the simplest possible, one-variable, perfect, linear model x k+1 = αx k , with k the time index.If α 2 > 1, the model is unstable, and stable if α 2 < 1.In terms of uncertainty quantification, multiplicative inflation is meant to increase the error covariances so as to account for misestimated errors.Here, we apply the inflation to the prior at each analysis step since the EnKF-N implicitly does it.Let us denote b k the forecast/prior error variance, r the static observation error variance and a k the error analysis variance.ζ plays the same role as in the EnKF-N scheme, so that a uniform inflation is ζ − 1 2 .Sequential data assimilation implies the following recursions for the variances: whose asymptotic solution (a Now, consider a multivariate model that is the collection of several independent one-variable models with as many growth factors α.In the absence of inflation, ζ = 1, the stable modes, α 2 < 1, converge to a perfect analysis (a = 0), whereas the unstable modes, α 2 > 1, converge to a finite error (a > 0) that grows with the instability of the modes, as expected.When inflation is used, ζ < 1; the picture changes but mostly affects the modes close to neutral (see Fig. 2).
Algorithm 2 Algorithm of the dual EnKF-N Require: The forecast ensemble {x k } k=1,...,N , the observations y, the observation error covariance matrix R, and U an orthogonal matrix satisfying U1 = 1.1: Compute the mean x and the perturbations X from {x k } k=1,...,N , Y = HX, δ = y − Hx 2: Find the argument of the minimum: The threshold is displaced and the modes with finite asymptotic errors now include a fraction of the stable modes.The strongly unstable modes are much less impacted.
In spite of its simplicity and its linearity, this model makes the link between the EnKF-N, multiplicative inflation and the dynamics.Ng et al. (2011) and Palatella and Trevisan (2015) have argued that, in the absence of model error, systematic error of the EnKF comes from the error transported from the unstable subspace to the stable subspace by the effect of nonlinearity.Unaccounted error would accumulate on the stable modes close to neutrality.As seen above, the use of the EnKF-N, or multiplicative inflation on the prior, precisely acts on these modes by increasing their error statistics without affecting the most unstable modes that mainly drive the performance of the EnKF.

Numerical experiments
Twin experiments using a perfect model and the EnKF-N have been carried out on several low-order models in previous studies.In many cases the EnKF-N, or its variant with localization (using domain localization), were reported to perform on the Lorenz-63 and Lorenz-95 models as well as the ETKF but with optimally tuned uniform inflation.With a two-dimensional forced turbulence model, driven by the barotropic vorticity advection equation, it was found to perform almost as well as the ETKF with optimally tuned uniform inflation (Bocquet and Sakov, 2014), although the local EnKF-N has not yet been thoroughly tested with this model.
The choice of ε N has remained a puzzle in these experiments.It has been reported that the Lorenz-63 model required ε N = 1 + 1/N , whereas the Lorenz-95 model required ε N = 1, seemingly owing to the larger ensemble size.It was also previously reported that domain localization of the EnKF-N with both models required ε N = 1+1/N .In the present study, we have revisited those experiments using the correction g = 0 → 1 of Sect.2.4, sticking with the theoretical value ε N = 1 + 1/N and the same ensemble sizes.This essentially reproduced the results of the best choice for ε N in each case.For these low-order models, this solved a puzzle: there is no need to adjust ε N = 1 + 1/N .Hence, the EnKF-N in the subsequent experiments uses the correction g = 0 → 1 and ε N = 1 + 1/N. Figure 3 summarizes the corrections of Sects. 2 and 3.It also illustrates the equivalence between the primal and dual EnKF-N.It additionally shows the performance of the dual EnKF-N with the approximate Hessian used in BS12, and the performance of the ensemble square root Kalman filter with optimally tuned uniform inflation.The Lorenz-95 low-order model is chosen for this illustration (Lorenz and Emanuel, 1998).Details about the model can be found in their article.A twin experiment is performed, with a fully observed system (H = I d , where d = M = 40), an observation error variance matrix R = I d that is also used to generate synthetic observations from the truth.The ensemble size is N = 20.The time interval between observation updates t is varied, which changes the nonlinearity strength.Varying the magnitude of a model's nonlinearity is highly relevant because, as explained in Sect.4, model nonlinearity is the underlying cause of the need for inflation, in this rank-sufficient context (N = 20).We plot the mean analysis root mean square error (RMSE) between the analysis state and the truth state.To ob-  tain a satisfying convergence of the statistics, the RMSEs are averaged over 10 5 cycles, after a spin-up of 5 × 10 3 cycles.
The performances of the primal and dual EnKF-N are indistinguishable for the full t range.The dual EnKF-N with approximate Hessian hardly differs from the EnKF-N, i.e., using Eq. ( 28) in place of Eq. ( 30).However, it is slightly suboptimal for t = 0.05 by about 5 %.
The additional numerical cost of using the finite-size formalism based on Jeffreys' hyperprior is now compared to the analysis step of an ensemble Kalman filter or of an ensemble Kalman smoother based on the ensemble-transform formulation.The computational cost depends on the type of method.Let us first discuss non-iterative methods, such as the ETKF or a smoother based on the ETKF.If the singular value decomposition (SVD) of R − 1 2 Y has already been obtained, the dual approach can be used and the additional cost of the EnKF-N, or EnKS-N, is due to the minimization of the dual cost function Eq. ( 25), which is negligible.This is indeed the case in all our experiments where the SVD has been obtained in order to compute the inverse in the state update Eq. ( 27) or the inverse square root in the perturbation update Eqs. ( 30) or ( 24).If the data assimilation is iterative (for significantly nonlinear models) such as the maximum likelihood ensemble filter (Zupanski, 2005) or the iterative ensemble Kalman smoother (Bocquet and Sakov, 2014), then the primal approach of the finite-size scheme can be made to coincide with the iterative scheme.Examples of such integrated schemes are given in Bocquet and Sakov (2012) and Bocquet and Sakov (2014).The additional cost is often negligible except if the number of expected iterations is small, which is the case if the models are weakly nonlinear.However, in this case, the finite-size correction is also expected to be small, with an effective inflation value close to 1.
Moreover, it is important to notice that the perturbations update as given by Eq. ( 30) can induce a significant extra numerical cost as compared to the update of an ETKF.Indeed, the SVD used to compute Eq. ( 27) cannot be directly used to compute Eq. ( 30), which might require another SVD.However, using the approximate scheme that consists in neglecting the off-diagonal term does not require the additional SVD.Even if the off-diagonal term is included in the Hessian, the inverse square root of the Hessian could be computed from the original SDV through a Sherman-Morisson update because the off-diagonal term is of rank one.
Let us finally mention that no significant additional storage cost is required by the scheme.

Performance in the prior-driven regime
The EnKF-N based on the Jeffreys hyperprior was found to fail in the limit where the system is almost linear but remains nonlinear (BS12).This regime is rarely explored with low-order models, but it is likely to be encountered in less homogeneous, more realistic applications.Figure 4a illustrates this failure.It extrapolates the results of Fig. 3 to very small time intervals between updates where the dynamics are quasi-linear.As t decreases the RMSE of the optimal inflation EnKF decreases as one would expect, while the RMSE of the EnKF-N based on the Jeffreys prior increases.
In this regime, the EnKF-N has great confidence in the prior as any filter would do.Therefore, the innovation-driven term becomes less important than the prior term 2 in the dual cost function Eq. ( 25), so that its mode ζ a tends to the mode of D b (ζ ), which is ζ a = (N +1)/ε N = N.Note that an inflation of 1 corresponds to ζ = N − 1.Hence, in this regime, even for moderately sized innovations, there is deflation.The failure of the EnKF-N was empirically fixed in BS12 by capping ζ a to prevent deflation.
More generally, we believe the problem is to be encountered whenever the prior largely dominates the analysis (prior-driven regime).This is bound to happen when the observations are too few and too sparsely distributed, which could occur when using domain localization, and whenever they are unreliable compared to the prior.Quasi-linear dynamics also fit this description, the ratio of the observation precision to the prior precision becoming small after a few iterations.This failure may not be due to the EnKF-N framework.It may be due to an inappropriate choice of candidate Gaussian posterior as described in Sec. 3. Or it may be due to an inappropriate choice of hyperprior in this regime.Although it seems difficult to devise a hyperprior that performs optimally in all regimes, we can suggest two adjustments to Jeffreys' hyperprior in this prior-driven regime.

Capping of the inflation
Here, deflation is avoided by capping ζ .Firstly, we build the desired dual cost function.Instead of minimizing D(ζ ) over ]0, (N + 1)/ε N ], it is minimized over ]0, ζ ], with 0 ≤ ζ ≤ (N + 1)/ε N , which defines the dual cost function.ζ is a tunable bound that is meant to be fixed over a wide range of regimes.Following a similar derivation to Appendix A of BS12, one can show that the background term of the primal cost function corresponding to this dual cost function is The dual and primal cost functions can both be shown to be convex.There is no duality gap, which means, with our definitions of these functions, that the minimum of the dual cost function is equal to the minimum of the primal cost function.By construction, in the small innovation range, i.e., w 2 ≤ (N + 1)/ζ − ε N , the EnKF-N, endowed with this new hyperprior, corresponds to the ETKF (Hunt et al., 2007), with an inflation of the prior by (N − 1)/ζ ≥ 1.Since the hyperprior assumed in the regime of small w is p(x b , B) = δ(B−ζ P), this could be called the Dirac-Jeffreys hyperprior.
Even with the Dirac-Jeffreys hyperprior, it is still necessary to introduce a tiny amount of inflation through ζ in the quasi-linear regime.This might prove barely relevant in a high-dimensional realistic system, as it was for the sensitive low-order models that we tested the scheme with.Even with Lorenz-95, an instability develops over very long experimental runs in the absence of this residual inflation.This still remains a theoretical concern.Moreover, we could not find a rigorous argument to support avoiding deflation in all regimes and hence the capping.That is why we propose an alternative solution in the following.

Smoother schemes in the prior-driven regime
In the limit of R getting very large, the observations cannot carry information, and the ensemble should not be updated at all; i.e., it should be close to the prior ensemble, with an inflation of 1 (ζ = N − 1).Outside of this regime, we do not see any fundamental reason to constrain ζ to be smaller than N − 1.A criterion to characterize this regime would be which computes the ratio of the prior variances to the observation error variances.When ψ tends to zero, the analysis should be dominated by the prior and ζ should tend to N −1.
When ψ drifts away from zero, we do not want to alter the hyperprior and the EnKF-N scheme, even if it implies deflation.We found several schemes that satisfy these constraints.
The point of these formulae is to make ζ b tend to N −1 (no inflation) when the criterion ψ tends to zero.On the other hand, when ψ gets bigger, ζ b tends to N, i.e., to the original dual cost function's behavior dictated by Jeffreys' hyperprior.The implementation of these schemes is straightforward for any of the Algorithms 1 or 2, since only ε N needs to be modified either in the dual or primal cost functions.

Numerical illustrations
The performance of the Dirac-Jeffreys EnKF-N where we choose (N − 1)/ζ = 1.005, and of the EnKF-N with the hyperprior corrections (R1) and (R2), are illustrated with a twin experiment on the Lorenz-95 model in the quasi-linear regime.Also included are the EnKF-N with the Jeffreys prior and the ensemble square root Kalman filter with optimally tuned inflation.The RMSEs are plotted as a function of t in [0.01, 0.5] in Fig. 4a.
Another way to make a data assimilation system based on the Lorenz-95 model more linear, rather than decreasing t, is to decrease the forcing parameter to render the model more linear.Figure 4b illustrates this when F is varied from 4 (linear) to 12 (strongly nonlinear), with t = 0.05 and the same setup as in Sect. 5.As anticipated, the EnKF-N based on Jeffreys' hyperprior fails for F < 7.5.However, the EnKF-N based on the Dirac-Jeffreys hyperprior and the EnKF-N with schemes R1 and R2 show performances equivalent to the EnKF with optimally tuned inflation.We note a slight underperformance of the EnKF-N in the very strongly chaotic regimes compared to the optimally tuned EnKF.We have also checked that these good performances also apply to the Lorenz-63 model.
The spread of the ensemble for the Dirac-Jeffreys EnKF-N has also been plotted in Fig. 4a and b.The value of the spread is consistent with the RMSE except in significantly nonlinear regimes such as when t > 0.15 and F = 8, or to a lesser extent when t = 0.05 and F > 8.In those nonlinear regimes and with such non-iterative EnKFs, the Gaussian error statistics approximation is invalidated so that the RMSE could differ significantly from the ensemble spread.

Informative hyperprior, covariance localization and hybridization
So far, the EnKF-N has relied on a noninformative hyperprior.In this section we examine, mostly at a formal level, the possibility of accounting for additional, possibly independent, information on the error statistics, like a hybrid EnKF-3D-Var is meant to (Hamill and Snyder, 2000;Wang et al., 2007a).A single numerical illustration is intended since extended results would involve many more developments and would be very model dependent.
In a perfect model context, we observed that uncertainty in the variances usually addressed by inflation could be taken care of by the EnKF-N based on Jeffreys' hyperprior.However, it does not take care of the correlation (as opposed to variance) and rank-deficiency issues, which are usually addressed by localization.Localization has to be superimposed onto the finite-size scheme to build a local EnKF-N without the intrinsic need for inflation (Bocquet, 2011).Nonetheless, by marginalizing over limited-range covariance matrices (Sect.5 of Bocquet, 2011), we also argued that the use of an informative hyperprior would produce covariance localization within the EnKF-N framework.A minimal example where the hyperprior is defined over B matrices that are positive diagonal, and hence very short-ranged, was given and supported by a numerical experiment.Hence, it is likely that the inclusion of an informative prior is a way to elegantly impose localization within the EnKF-N framework.
An informative hyperprior is the normal-inverse Wishart (NIW) pdf: It is convenient because, with this hyperprior, Eq. ( 3) remains analytically integrable.The location state x c , the scale matrix C, which is assumed to be full-rank, κ and ν are hyperparameters of the distribution from which the true error moments x b and B are drawn.The pdf p NIW is proper only if ν > M−1, but this is not an imperative requirement provided that the integral in Eq. ( 3) is proper.The resulting predictive prior can be deduced from Gelman et al. (2014), Sect. 3.6: where x = (κx c + Nx)/(N + κ).From these expressions, x c could be interpreted as some climatological state and C would be proportional to some static error covariance matrix, which could be estimated from a prior, long and welltuned EnKF run.They could also be parameterized by tunable scalars that could be estimated by a maximum likelihood principle (Hannart and Naveau, 2014).
A subclass of hyperpriors is obtained when the degree of freedom x c is taken out, leading to the inverse Wishart (IW) distribution and to the predictive prior Jeffreys' hyperprior is recovered from the IW hyperprior in the limit where ν → 0 and C → 0, well within the region ν ≤ M − 1 where the IW pdf is improper.Note that the use of an IW distribution was advocated owing to its natural conjugacy in a remarkable paper by Myrseth and Omre (2010) where a hierarchical stochastic EnKF was first proposed and developed.
Because the scale matrix C is assumed to be full rank, updating in state space is preferred to an analysis in ensemble space.Based on the marginals Eqs. ( 38) and ( 40), the J b term of the analysis cost function is of the form In the case of the NIW hyperprior, one has In the case of the IW hyperprior, one has We observe that the J b term is formally similar to that of the EnKF-N with Jeffreys' hyperprior that is directly obtained in state space from Eq. ( 7).Hence, the sequential data assimilation schemes built from the NIW and IW hyperpriors formally follow that of the EnKF-N.But, to do so, the analysis must be written in state space, whereas it has been expressed in ensemble space so far.

Primal analysis and dual analysis
The primal analysis in state space is obtained from x a = argmin x J (x), where For the dual analysis, we further assume that the observation operator H is linear -and hence denoted H -for the primal/dual correspondence to be exact.The derivation of the dual cost function follows that of BS12.The following Lagrangian is introduced to separate the radial and angular degrees of freedom of x: where ζ is a Lagrange multiplier.The saddle-point equations of this Lagrangian are x a , ρ a , and ζ a are the saddle-point values of the variables.Using these saddle-point equations, it can be shown that the minimization of Eq. ( 44) is equivalent to the minimization of the following scalar dual cost function over ]0, γ /ε N ], a mild generalization of Eq. ( 25).As in BS12, ζ is interpreted as an effective size of the ensemble as seen by the analysis.Note that, in this context, it could easily be larger than N − 1 if the added information content of the informative hyperprior is significant.

State space update of the ensemble perturbations
Recall that the square root ensemble update corresponding to Eq. ( 30) and Jeffreys' hyperprior is Note that covariance localization cannot be implemented in ensemble space using Eq.(50).To make the covariance matrix explicit, we wish to write this in state space.Firstly, from Eq. ( 27), w a can be written as w a = Y T z, where z = ζ a R + YY T −1 δ.Then, by the matrix shift lemma that asserts that Af (BA) = f (AB)A for any two matrices A and B of compatible sizes and f an analytic function 1 , we can turn this right-transform into a left-transform 2 : When ζ a = N − 1 and z = 0, one recovers the ensemble square root Kalman update formula written in state space: (Sakov and Bertino, 2011).
Note that we could absorb − 2ζ 2 a N+1 zz T into R using the Sherman-Morrison formula, leading to an effective observation error covariance matrix R e that is bigger than R (using the order of the positive symmetric matrices).To superimpose localization on this Jeffreys hyperprior EnKF-N, a (14), which, for traditional EnKFs, would not require localization but inflation, and N = 10, which, for traditional EnKFs, would require both localization and inflation.We do not use inflation since it is meant to be accounted for by the finite-size scheme.We do not superimpose domain or covariance localization.Analysis RMSEs are computed for each run and reported in Fig. 5.This is a preliminary experiment.In particular, we do not perform any optimization of α and β based for instance on empirical Bayesian estimation.For N = 20, we barely note any improvement in terms of RMSEs due to the use of the NIW hyperprior as compared to the EnKF-N based on Jeffreys' hyperprior, i.e., (α, β) = (0, 0).However, we observe that for N = 10, localization is naturally enforced via the hyperprior due to a mechanism known in statistics as shrinkage.Although there is no dynamical tuning of α and β, and even though the choice for C is gross, good RMSEs can be obtained.A RMSE of 0.33 is achieved for (α, β) = (0.50, 057) as compared to a typical analysis RMSE of 0.20 for the EnKF-N with optimally tuned, superimposed localization.Interestingly, the average optimal effective size in this case is ζ a = 15, above the unstable subspace dimension, validating its potential use as a diagnostic.

Conclusions
In this article, we have revisited the finite-size ensemble Kalman filter, or EnKF-N.The scheme offers a Bayesian hierarchical framework to account for the uncertainty in the forecast error covariance matrix of the EnKF that is inferred from a limited-size ensemble.We have discussed, introduced additional arguments for, and sometimes improved several of the key steps of the EnKF-N derivation.Our main findings are the following.
1.A proper account of the gauge degrees of freedom in the redundant ensemble of perturbations and the resulting analysis led to a small but important modification of the ensemble transform-based EnKF-N analysis cost function (g = 0 → 1, as seen in Eq. 21).
2. Consequently, the marginal posterior distribution of the system state is a Cauchy distribution, which is proper but does not have first-and second-order moments.Hence, only the maximum a posteriori estimator is unambiguously defined.Moreover, this suggests that the Laplace approximation should be used to estimate the full posterior.
3. The modification g = 0 → 1 frees us from the inconvenient tweaking of ε N to 1 or to 1 + 1 N : now, only ε N = 1 + 1 N is required.4. The connection to dynamics has been clarified.It had already been assumed that the EnKF-N compensates for the nonlinear deformation of the ensemble in the forecast step.This conjecture was substantiated here by arguing that the effect of the nonlinearities is similar to sampling error, thus explaining why multiplicative inflation, and the EnKF-N in particular, can compensate for it.
5. The ensemble update of the dual EnKF-N was amended to offer a perfect equivalence with the primal EnKF-N.It was shown that the additional term in the posterior error covariance matrix accounts for the error codependence between the angular and the radial degrees of freedom.However, this correction barely affected the numerical experiments we tested it with.
6.The EnKF-N based on Jeffreys' hyperprior led to unsatisfying performance in the limit where the analysis is largely driven by the prior, especially in the regime where the model is almost (but not) linear.We proposed two new types of schemes that rectify the hyperprior.These schemes have been successfully tested on low-order models, meaning that the performance of the EnKF-N becomes as good as the ensemble square root Kalman filter with optimally tuned inflation in all the tested dynamical regimes.
7. As originally mentioned in Boc11, the EnKF-N offers a broad framework to craft variants of the EnKF with alternative hyperpriors.Inflation was shown to be addressed by a noninformative hyperprior, whereas a localization seems to require an informative hyperprior.
Here, we showed that choosing the informative normalinverse Wishart distribution as a hyperprior for x b , B leads to a formally similar EnKF-N, albeit expressed in state space rather than ensemble space.The EnKF-N based on this informative hyperprior is a finite-size variant of the hybrid EnKF-3D-Var.It has a potential for tuning the balance between static and dynamical errors.Moreover, we showed on a preliminary numerical experiment that localization can be naturally carried out through shrinkage induced by the scale matrix of the normal-inverse Wishart hyperprior.
With the corrections and new interpretations on the EnKF-N based on Jeffreys' hyperprior, we have obtained a practical and robust tool that can be used in perfect model EnKF experiments in a wide range of conditions without the burden of tuning the multiplicative inflation.This has saved us a lot of computational time in recent published methodological studies.
An EnKF-N based on an informative hyperprior, the normal-inverse Wishart distribution, has been described and its equations derived.We plan to evaluate it thoroughly in extensive numerical experiments.Several optional uses of the method are contemplated.Hyperparameters x c , C, ν and κ could be diagnosed from the statistics of a prior well-tuned data assimilation run.Empirical Bayesian approaches could then be used to objectively balance the static errors and the dynamical errors.Alternatively, the hyperparameters could be estimated online in the course of the EnKF, rather than being obtained from prior statistics, using a more systematic empirical Bayesian approach.
The EnKF-N is not designed to handle model error, which is critical for realistic applications.Other adaptive inflation techniques currently in operation would be more robust in such contexts.We are working on a consistent merging of the finite-size approach that accounts for sampling errors and of a multiplicative inflation scheme designed to account for model error.
in an ensemble variational context.It has been shown to avoid the need for the multiplica-Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.M. Bocquet et al.: Improving the finite-size ensemble Kalman filter x b and B. Denote E = [x 1 , x 2 , . .., x N ] the ensemble of size N formatted as an M × N matrix where M is the state space dimension, x = E1/N the ensemble mean where 1 = (1, • • •, 1) T , and X = E−x1 T the perturbation matrix.Hence, P = XX T /(N − 1) is the empirical covariance matrix of the ensemble.Marginalizing over all potential x b and B, the prior of x reads p (x|E) = dx b dB p (x|E, x b , B) p (x b , B|E) .
which can be simply written H b ζ a I N using ζ a = N +1 ε N + w a 2 shown in BS12 to be one of the optimum conditions.The Nonlin.Processes Geophys., 22, 645-662, 2015 www.nonlin-processes-geophys.net/22/645/2015/ M. Bocquet et al.: Improving the finite-size ensemble Kalman filter 651 Paper | Discussion Paper | Discussion Paper | Discussion Paper | The co-dependence of the radial and angular degrees of freedom exposed by the dual cost function are further explored in Appendix A.

Figure 2 .
Figure 2. Analysis error variance when applying sequential data assimilation to x k+1 = αx k with (ζ = 0.75, dashed line) or without (ζ = 1, full line) multiplicative inflation on the prior, as a function of the model growth α.We chose r = 1.

Figure 3 .
Figure 3. Average analysis RMSE for the primal EnKF-N, the dual EnKF-N, the approximate EnKF-N, and the EnKF with uniform optimally tuned inflation, applied to the Lorenz-95 model, as a function of the time step between updates.The finite-size EnKFs are based on Jeffreys' hyperprior.

Figure 4 .
Figure 4. Average analysis RMSE for the EnKF-N with Jeffreys' hyperprior, with the EnKF-N based on the Dirac-Jeffreys hyperprior, with the EnKF-N based on the Jeffreys hyperprior but enforcing schemes R1 or R2, and the EnKF with uniform optimally tuned inflation, applied to the Lorenz-95 model, as a function of the time step between update (top), and as a function of the forcing F of the Lorenz-95 model (bottom).The analysis ensemble spread of the EnKF-N based on the Dirac-Jeffreys hyperprior is also shown.
Two of them, denoted R1 and R2, consist in modifying ε N into ε N and yield a well-behaved mode of the background part of the dual cost function ζ b = argmin ζ [D b (ζ )]:

Figure 5 .
Figure 5. Average analysis RMSE as a function of (α, β) for the EnKF-N based on the IW hyperprior, without inflation or enforced localization, for ensemble sizes of N = 20 (left) and of N = 10 (right).The RMSEs above 1, i.e., worse than an analysis based only on observations, are in white.
Analysis with a reliable Gaussian prior p(x|xb, B) Figure1.Schematic of the traditional standpoint on the analysis of the EnKF (top row), what it actually does using a Gaussian prior sampled from three particles (middle row), and using a predictive prior accounting for the uncertainty due to sampling (bottom row).The full green line represents the Gaussian observation error prior pdfs, and the dashed blue lines represent the Gaussian/predictive priors if known, or estimated from an ensemble, or obtained from a marginalization over multiple potential error statistics.The dotted red curves are the resulting analysis pdfs.