Ensemble Kalman filtering without the intrinsic need for inflation

The mainintrinsic source of error in the ensemble Kalman filter (EnKF) is sampling error. External sources of error, such as model error or deviations from Gaussianity, depend on the dynamical properties of the model. Sampling errors can lead to instability of the filter which, as a consequence, often requires inflation and localization. The goal of this article is to derive an ensemble Kalman filter which is less sensitive to sampling errors. A prior probability density function conditional on the forecast ensemble is derived using Bayesian principles. Even though this prior is built upon the assumption that the ensemble is Gaussian-distributed, it is different from the Gaussian probability density function defined by the empirical mean and the empirical error covariance matrix of the ensemble, which is implicitly used in traditional EnKFs. This new prior generates a new class of ensemble Kalman filters, called finite-size ensemble Kalman filter (EnKF-N). One deterministic variant, the finite-size ensemble transform Kalman filter (ETKF-N), is derived. It is tested on the Lorenz ’63 and Lorenz ’95 models. In this context, ETKF-N is shown to be stable without inflation for ensemble size greater than the model unstable subspace dimension, at the same numerical cost as the ensemble transform Kalman filter (ETKF). One variant of ETKF-N seems to systematically outperform the ETKF with optimally tuned inflation. However it is shown that ETKF-N does not account for all sampling errors, and necessitates localization like any EnKF, whenever the ensemble size is too small. In order to explore the need for inflation in this small ensemble size regime, a local version of the new class of filters is defined (LETKF-N) and tested on the Lorenz ’95 toy model. Whatever the size of the ensemble, the filter is stable. Its performance without inflation is slightly inferior to that of LETKF with optimally tuned inflation for small interval between updates, and superior to LETKF with optimally tuned inflation for large time interval between updates. Correspondence to: M. Bocquet (bocquet@cerea.enpc.fr)


Introduction
The ensemble Kalman filter (EnKF) has become a very popular potential substitute to variational data assimilation in high dimension, because it does not require the adjoint of the evolution model, because of a low storage requirement, because of its natural probabilistic formulation, and because it easily lends itself to parallel computing (Evensen, 2009 and reference therein).

Errors in the ensemble Kalman filter schemes
The EnKF schemes can be affected by errors of different nature.A flaw of the original scheme (Evensen, 1994) which incompletely took into account the impact of the uncertainty of the observations in the analysis, was corrected by Burgers et al. (1998).They introduced a stochastic EnKF, by perturbing the observations for each member of the ensemble, in accordance with the assumed observational noise.Alternatives to the stochastic schemes are the deterministic ensemble Kalman filters introduced by Anderson (2001); Whitaker and Hamill (2002); Tippett et al. (2003).
Even with this correction, EnKF is known to often suffer from undersampling issues, because it is based on the initial claim that the few tens of members of the ensemble may suffice to represent the first and second-order statistics of errors of a large geophysical system.This issue was diagnosed very early by Houtekamer and Mitchell (1998); Whitaker and Hamill (2002).Indeed, the failure to properly sample leads to an underestimation of the error variances, and ultimately to a divergence of the filter.
Adding to the error amplitude mismatch, undersampling generates spurious correlations, especially at long distance separation as addressed by Houtekamer and Mitchell (1998); Hamill et al. (2001).
The sampling errors are intrinsic deficiencies of the EnKF algorithms.In addition to these, one should also account for model errors that are of external nature for an EnKF scheme.Indeed, they are not due to a flaw in the data assimilation algorithm but to deficiencies in the evolution model.
Published by Copernicus Publications on behalf of the European Geosciences Union and the American Geophysical Union.

Strategies to reduce error
Besides the early correction of the stochastic filter, techniques were devised to correct, or make up for the sampling issues.For both deterministic and stochastic filters, the error amplitude problem can be fixed by the use of an inflation of the ensemble: the anomalies (deviations of the members from the ensemble mean) are scaled up by a factor that accounts for the underestimation of the variances (Anderson and Anderson, 1999;Hamill et al., 2001).Alternatively, the inflation can be additive via stochastic perturbations of the ensemble members, as shown by Mitchell and Houtekamer (1999); Corazza et al. (2002) where it was used to account for the misrepresented model error.
As far as stochastic filters are concerned, Houtekamer andMitchell (1998, 2001) proposed to use a multi-ensemble configuration, where the ensemble is split into several subensembles.The Kalman gain of one sub-ensemble can be computed from the rest of the ensemble, avoiding the socalled inbreeding effect.Remarkably, in a perfect model context, the scheme was shown to avoid the intrinsic need for inflation (Mitchell and Houtekamer, 2009).
Unfortunately, inflation or multi-ensemble configuration do not entirely solve the sampling problem and especially the long-range spurious correlations.These can be addressed in two ways under the name of localization.The first route consists in increasing the rank of the forecast error covariance matrix by applying a Schur product with a short-range admissible correlation matrix (Houtekamer and Mitchell, 2001;Hamill et al., 2001).The second route consists in making the analysis local by assimilating a subset of nearby observations (Ott et al., 2004 and references therein).Though vaguely connected, the two approaches still require common grounds to be understood (Sakov and Bertino, 2010).But alternative methods have emerged, either based on cross-validation (Anderson, 2007a), on multiscale analysis (Zhou et al., 2006), or on empirical considerations (Bishop and Hodyss, 2007).
Many of these techniques introduce additional parameters, such as the inflation factor, the number of sub-ensembles, or the localization length.A few of these parameters can even be made local.They can be chosen from experience gathered on a particular system, or they can be estimated online.
The online estimation methods are adaptive techniques, which is a growing subject.Focussing on the inflation issue, they are based on a specific maximum likelihood estimator of the inflation scaling, or of several scalars that parameterize the error covariance matrices (Mitchell and Houtekamer, 1999;Anderson, 2007b;Brankart et al., 2010) essentially following the ideas of Dee (1995).Another adaptive approach (Li et al., 2009) use the diagnostics of Desroziers et al. (2005).

Towards objective identification of errors
More straightforward approaches have recently been explored through the identification of the sampling errors.Mallat et al. (1998); Furrer and Bengtsson (2007); Raynaud et al. (2009) put forward a quantitative argument that shows the shortcomings of sampling.Let us define an ensemble of N state vectors x k in R M , for k = 1,...,N.The empirical mean of the ensemble is and the empirical background error covariance matrix of the ensemble is (2) They assume that the ensemble members are drawn from a multivariate Gaussian distribution of unknown covariance matrix B, that generally differs from the empirically estimated covariance matrix P. Then the variance of each entry of P can be assessed using Wick's theorem (Wick, 1950): with i,j = 1,...,M indexing the state space grid-cells; E is the expectation operator of the Gaussian process and [C] ij generically denotes entry (i,j ) of matrix C.
In particular one obtains the average of the error on the estimated variances in B which has been used in an ensemble of assimilations (Raynaud et al., 2009).Considering covariances at long distance (i = j ), [B] ij is expected to vanish for most geophysical systems.And yet the errors in estimating [B] ij are all but vanishing for a small ensemble.The impact of these errors on the analysis can be objectively estimated using the results of van Leeuwen (1999); Furrer and Bengtsson (2007); Sacher and Bartello (2008).This type of approach may offer objective solutions to account for sampling errors.However incorporating them into data assimilation scheme is not straightforward.For instance, the objective identification of the covariance errors Eq. (3) depends on the true covariances, which are unknown, and some approximate closure is needed.
The Gaussian assumption made by these authors on the distribution from which the ensemble is generated should be regarded as an approximation in the context of ensemble Kalman filtering since such an ensemble often results from Nonlin.Processes Geophys., 18,[735][736][737][738][739][740][741][742][743][744][745][746][747][748][749][750]2011 www.nonlin-processes-geophys.net/18/735/2011/ the propagation by a possibly nonlinear dynamical model.However this assumption allows to perform analytical computation using the properties of Gaussian distributions.Besides if the analysis of the data assimilation system only requires first-and second-order moments, higher-order moments are irrelevant for the update, although certainly not for the global performance of a filter.Following these authors, we shall use this statistical assumption.

Objectives and outline
In the context of ensemble Kalman filtering, the first objective of this article is to build a prior, to be used in the analysis step.Working on the first-and second-order empirical moments of the ensemble, a traditional ensemble Kalman filter performs an update as if the prior distribution was given by a Gaussian defined by the empirical moments x and P. Instead, our prior of the true state is conditioned on the entire forecast ensemble, not only its first-and second-order empirical moments.Knowing about the discrete nature of the ensemble, it should partly or completely account for the sampling flaws.
Our goal is, within the framework of ensemble Kalman filtering, to perform a Bayesian analysis with this new prior.In Sect.2, such a prior is derived.
The use of this prior in the analysis will result in the definition of a new class of algorithms for high-dimensional filtering that are exploited in Sect.3, the finite-size (i.e.finitesample) ensemble Kalman filters (denoted EnKF-N).We shall study one of its variant, which is an extension of the ensemble transform Kalman filter (ETKF) of Hunt et al. (2007), that we call the finite-size ensemble transform Kalman filter (ETKF-N).
In Sect.4, the new filters are applied to the Lorenz '63 and Lorenz '95 models.Their performance is compared to ETKF.The new filters do not seem to require inflation.Unfortunately, like any ensemble Kalman filter, ETKF-N diverges for small ensemble sizes in the Lorenz '95 case.It does require localization.This shows that the new filters do not entirely solve the sampling issue, and the reason for this is discussed in Sect. 5. Yet, a local variant of ETKF-N, the finite-size local ensemble transform Kalman filter (LETKF-N), can be built.It is tested on the Lorenz '95 toy model, and compared to the local ensemble transform Kalman filter (LETKF).The main goal of introducing LETKF-N is to examine whether the need for inflation is still avoided, in spite of the imbalance that localization is known to generate.In Sect.6, the results are summarized.A few leads to go further are also discussed.
In this article, model error is not considered.It is assumed throughout this study that the model is perfect.Therefore, in this study, inflation is meant to compensate for sampling errors (hence the adjective intrinsic in the title).Theoretically, (additive or multiplicative) inflation for model error is a rather distinct subject from inflation for sampling errors, even though it is difficult to untangle the two in practice.
The filters derived in this article should be applicable to (very) high-dimensional geophysical systems.This requires that only a small ensemble can be propagated between updates (typically no more than 100 members).

Accounting for sampling errors
We would like to reformulate the traditional analysis step of the EnKF.The prior (or previous forecast) is the focus of the reasoning.The prior that is usually used in the EnKF is given by the prior pdf of the state vector x, a vector in R M , conditional on the empirical mean x and on the empirical background error covariance matrix P, defined in Eqs. ( 1) and ( 2).Moreover this conditional pdf of the prior p(x|x,P) is implicitly assumed to be Gaussian.Lacking further information, it is the more natural distribution knowing its firstand second-order moments.

Getting more from the ensemble
Unfortunately, information is lost: this prior does not take into account the fact that x and P originate from sampling.That is why we aim at computing the prior pdf of x conditional on the ensemble, p(x|x 1 ,...,x N ).It is assumed that the members of the ensemble are independently drawn from a multivariate Gaussian distribution of mean state x b and covariance matrix B. As argued in the introduction this assumption leads to an approximation, since the ensemble members are rather samples of a (more or less) non-Gaussian distribution (Bocquet et al., 2010;Lei et al., 2010).There is no point in modelling higher-order moments of the statistics prior to the analysis, since the analysis of the Kalman filter only uses the first-and second-order moments.The moments x b and B of the true sampled distribution are unknown a priori and may differ from x and P.
Summing over all potential x b and B, where B is a positive definite matrix, the prior pdf reads The symbol dB corresponds to the Lebesgue measure on all independent entries M i≤j d[B] ij , but the integration is restricted to the cone of positive definite matrices.From the first to the second line, we used the fact that under the assumption of Gaussianity of the prior pdf of the errors, the conditioning of p(x|x 1 ,...,x N ,x b ,B) on the ensemble is redundant, since the pdf is completely characterized by x b , and B. Bayes' rule can be applied to p(x b ,B|x 1 ,...,x N ), so that www.nonlin-processes-geophys.net/18/735/2011/ Nonlin.Processes Geophys., 18, 735-750, 2011 The probability densities that are conditional on x b and B can be written explicitly thanks to the Gaussian assumptions.The first one in Eq. ( 7) would be the prior of x, if one knew the exact mean and error covariance matrix.The second one is the likelihood of the members to be drawn from the Gaussian distribution of the same mean and error covariance matrix (similarly to Dee, 1995).The third pdf in the integral of Eq. ( 7) is a prior on the background statistics (an hyperprior) whose choice will be discussed later.Writing explicitly the two Gaussian pdfs in the integral of Eq. ( 7) and re-organizing the terms, one gets where where |B| denotes the determinant of B.

Choosing priors for the background statistics
For the filters designed in this article, like for any (very) highdimensional ensemble-based Kalman filters, information on the background error statistics can only be transported by the ensemble between analyzes.Passing along information on the full statistics of the errors requires too much storage.That is one reason why the EnKF was preferred over the impractical extended Kalman filter.Still, we have to make a priori assumptions on (the statistics of) x b and B.
The most popular one in multivariate statistics is Jeffreys' prior.It maximizes the information that will be gained in any subsequent analysis made with that prior (making it as much less informative as possible).It is known that Jeffrey's prior for the couple (x b ,B) is not satisfying in practice, and one should make the independence assumption (Jeffreys, 1961): and compute the Jeffreys' priors for x b and B separately.Jeffreys' choice corresponds to where M is the dimension of the state space.The fact that p J (x b ,B) cannot be normalized is not truly an issue, like for any non-informative priors in Bayesian statistics, provided (as far as we are concerned) that integral Eq. ( 7) is proper.
The prior of B has some important properties that are essential for this study.First, it is invariant by any reparameterization of state vectors.Consider the change of state variables x = Fx , where F is a non-singular matrix in state space.It translates to B = FB F T for the error covariance matrices.The Jacobian of this change of variables for B is (see for instance Muirhead, 1982) dB so that This justifies the power (M + 1)/2.Besides we want the hyperprior to lead to asymptotic Gaussianity: in the limit of a large ensemble, this choice should lead to the usual Gaussian prior used in EnKF analysis.This will be checked in Sect.3.

Effective J b functional
Choosing the prior p(x b ,B) ≡ p J (x b )p J (B), the integration on x b in Eq. ( 8) is straightforward and leads to where This functional can be compactly written as where Like for most ensemble Kalman filters, especially ensemble transform Kalman filters, it is assumed in the following that x −x belongs to the vector space V spanned by the anomalies integral Eq. ( 14) turns out to be improper.The problem can be circumvented.Indeed, the B matrices to integrate on are merely test positive-definite matrices representing potential error covariance matrices.We could choose to integrate on a relevant subspace rather than on all positive definite matrices.We are merely interested in the matrices that act on V only, because the state vector lies in x +V.Integration on the other Nonlin.Processes Geophys., 18, 735-750, 2011 www.nonlin-processes-geophys.net/18/735/2011/ matrices will produce an infinite volume factor with no dependence on x, that can be subtracted from the final effective functional.On more rigorous grounds, one can extend matrix A to a full rank positive definite matrix A = A+ I M , where I M is the identity matrix of state space, and > 0. Then the integral in Eq. ( 14) becomes proper.After the integration, one can let goes to 0. A diverging term depending on only, and hence of no interest, can then be safely ignored.
To perform the integration on B in Eq. ( 14), one can proceed to the change of variables B = A 1/2 A 1/2 .From Eq. ( 12), the Jacobian of this change of variable is Therefore, the dependence in x through A can be extracted from the integral: It is important to realize that the last determinant of A actually applies to the restriction of the linear operator represented by A in the canonical basis of subspace V, which is of dimension lower or equal to N − 1, and is, by this definition, not singular.
From the expression of p(x|x 1 ,...,x N ), we deduce the background functional to be used in the subsequent analysis of our variant of the EnKF: up to some irrelevant constant.Let us remark that the mean of the ensemble x is the mean and mode of p(x|x 1 ,...,x N ).

Alternate J b functional
One can argue against the choice of p J (x b ) = 1.It might be considered too weakly informative.However as an hyperprior, it provides information before the observation, but also before exploiting the ensemble.So, whatever information is passed on to the subsequent analysis, it is weak, unless the information content of the ensemble is weak and the observation are not dense (small ensemble size, sparse/infrequent observation).
One alternative to the uniform distribution is to use a climatology for x b .It is not tested in this study.However it was recently demonstrated in the context of ensemble Kalman filtering that such an approach is helpful for sparsely observed systems (Gottwald et al., 2011).Another alternative is to make specific choices for x b .Equation ( 6) would be affected in the following way.The probability density p(x|x b ,B) is conditional on the knowledge of B and x b .For this density, we additionally assume a great confidence in x, like any standard EnKF, so that the first guess x b of the sampled prior is believed to be very close to x and p(x|x b ,B) p(x|x,B).This assumption can be wrong for small ensemble size.Therefore: The rest of the derivation is fundamentally unchanged.The final background functional reads However the disappearance of the N/(N + 1) factor is not cosmetic, and may have consequences that are investigated later.

Finite-size ensemble transform Kalman filter
Because J b and J alt b are not quadratic, it is clear that the analysis should be variational, in a similar flavor as the maximum likelihood ensemble filter (Zupanski, 2005;Carrassi et al., 2009).As such it can accommodate nonlinear observation operators.Therefore, in this study, the analysis step will be variational, similarly to 3D-Var.One should minimize the cost function with where y is the observation vector in observation space R d , R is the observation error covariance matrix, and H is the observation operator.We shall call finite-size (or finite-sample) ensemble Kalman filters (EnKF-N), the ensemble Kalman filters that could be generated using this type of J b term in the analysis step of the filter.In the following, the focus will be on the ensemble transform Kalman filter (ETKF) variant, following Hunt et al. (2007).The analysis is expressed as an element of subspace x + V.The state vector is characterized by a set of (redundant) coordinates {w k } k=1,...,N in the ensemble space: If X k = x k − x are the anomalies, and X the matrix of these anomalies, X = (X 1 ,...,X N ), then x − x = Xw.Hence, one has www.nonlin-processes-geophys.net/18/735/2011/ Nonlin.Processes Geophys., 18, 735-750, 2011 Recall that |A| represents the determinant of the linear operator related to A but restricted to subspace V.In the same subspace, the linear operator related to XX T is invertible, of inverse denoted XX T −1 .One gets There is a subtlety that we need to develop on, and which generalizes the clear explanation given by Hunt et al. (2007).

Gauge-invariance of the parameterization
As a family of vectors, the anomalies are not independent since N k=1 X k = 0. Therefore parameterizing J b (x) = J b (x + Xw) with w entails a so-called gauge invariance (a redundancy in w): J b (x + Xw) is invariant under a shift of all w k by a same constant.The number of degrees of freedom of this invariance is given by the dimension of the kernel of X, which is at least one according to the previous remark.
The expression given by Eq. ( 29) is not invariant under rotations of w.We could make it invariant by using the freedom of the gauge invariance.We can fix this gauge by choosing to minimize the cost function over the w that have a null orthogonal projection on the kernel of X: With this constraint, |A| is proportional to 1 + N N +1 w T w.This is cumbersome to enforce though.Instead, to perform the same task, a gauge-fixing term is inserted into the cost function Eq. ( 25) yielding an augmented cost function For instance, in the case where rank(A) = N − 1, one has Because ln is a monotonically increasing function, one gets J a (w) ≥ J a (x + Xw), for all w in R N , with equality if and only if G(w) = 0.Moreover, for any x there is a w in the kernel of X (G(w ) = 0) such that J a (x) = J a (w ).As a consequence, the two cost functions J a (w) and J a (x) have the same minimum.Note that this implies that at the minimum w a of J a , one has G(w a ) = 0. Hence, instead of Eq. ( 25) one can use the cost function with a gauge-fixing term: Cost function Eq. ( 35) is not necessarily convex because the ln function is concave.Let us assume a linear observation operator, or linearized around the innovation y − H (x). Then a minimum always exists since for a linear observation operator, J o (x) is convex in w, and is a monotonically increasing function when the norm of w goes to infinity.Conversely, the cost function may have several minima (see Appendix A).As a consequence the nature of the minimizer, as well as the first guess of the iterative optimization, may have an impact on the result.The first guess of the iterative minimization was chosen to be w = 0, which favors the prior against the observation if several minima do exist.Even though it may sound wiser to favor observation, the choice w = 0 is clearly simpler.

Posterior ensemble
Once w a is obtained as the minimizer of Eq. ( 35), the posterior state estimate is given by We wish to compute a local approximation of the error covariances at the minimum.The Hessian of J b can be computed in ensemble space: The Hessian of the observation term is where H is the tangent linear of H .The analysis error covariance matrix P a in ensemble space is approximately obtained from the inverse matrix of the total Hessian at the minimum where Note that H a must be positive definite by construction, even though H b (w a ) is not necessarily so.
Then, a posterior ensemble can be obtained from the square root of (N − 1) P a .More precisely, the posterior Nonlin.Processes Geophys., 18, 735-750, 2011 www.nonlin-processes-geophys.net/18/735/2011/ ensemble anomalies, in ensemble space, are given by the columns W a k of the transform matrix where U is an arbitrary orthogonal matrix that preserves the ensemble mean: Uu = u where u = (1,...,1) T .The degrees of freedom introduced by U allow to span the ensemble space of any ensemble square root Kalman filter (Sakov and Oke, 2008).Accordingly, the posterior ensemble in state space is given for k = 1,...,N by Let us check that the posterior ensemble is centered on x a .To do so, one has to verify that u is in the kernel of XW a .If we can prove that u is an eigenvector of P a , then XW a u ∝ Xu = 0.The eigenvectors of P a are those of the Hessian H a at the minimum.Since J o (x + Xw) is gauge invariant, it is easy to check that u is in the kernel of the Hessian H o .(Note that this remark also applies without approximation to nonlinear observation operators.)As for J b whose gauge-invariance has been intentionally broken, the argument cannot apply.But it was seen earlier that at the minimum G(w a ) = 0.In particular, one has u T w a = 0.As a consequence, it is clear from Eq. ( 38) that u is an eigenvector of H b , of eigenvalue N(1 + 1/N + (w a ) T w a ) −1 .Therefore the posterior ensemble is centered on x a .This property is important for the consistency and ultimately the stability and performance of the filter (Wang et al., 2004;Livings et al., 2008;Sakov and Oke, 2008).
The new filters are based on several mild approximations that are imposed by the non-Gaussianity of the prior.Firstly, one might not sample the right minimum when there are several of them (see Appendix A).Or the right estimator could be the average rather than a mode of the posterior pdf.Secondly, and unlike the Gaussian case, the inverse Hessian is only a local approximation of the analysis error covariance matrix (Gejadze et al., 2008).

Asymptotic Gaussianity
When the ensemble size goes to large N → ∞, the ln term in the background part of cost function Eq. ( 36), must decrease to smaller, yet always positive, values.So should ε ≡ w T w = N k=1 w 2 k .Therefore, in this limit, one has and the ETKF of Hunt et al. (2007) is recovered (assuming U is the identity matrix).

Algorithm
The variant of the finite-size EnKF that has just been described is the finite-size ensemble transform Kalman filter (ETKF-N).The numerical implementation is similar to that of Harlim and Hunt (2007) (see Algorithm 2).The pseudocode for ETKF-N is: 1. Obtain the forecast ensemble {x k } k=1,•••,N from the model propagation of the previous ensemble analysis.
2. Form the mean x, and the anomaly matrix X, necessary for the evaluation of cost function Eq. ( 35).

Compute
6. Generate the new ensemble: The complexity is the same as that of ETKF.The minimization of the analysis cost function, which is already well conditioned by construction, might be longer in such nonquadratic, and even non-convex context.However, the minimization remains in ensemble space, and is almost negligible in cost for high-dimensional applications with an ensemble size in the range of 10-100.

Interpretation
The influence of the background term of the cost function, J b = N 2 ln 1 + 1/N + w T w , within the full cost function Eq. ( 35), is compared to its counterpart in ETKF, J b = N −1 2 w T w.Firstly, let us assume that the innovation is such that, in the ETKF system, the analysis is driven away from the ensemble mean: In the ETKF-N system, the constraint enforced by the background term would be alleviated by the presence of the ln function.Therefore, in the same situation (same innovation), ETKF-N would be more controlled by the observation than ETKF.In particular, larger deviations from the ensemble mean would be allowed.It is reminiscent of the way the Huber norm operates (Huber, 1973).Secondly, assume that the innovation drives the ETKF system towards an analysis close to the ensemble mean From Eq. ( 43), it is clear that the ETKF-N system is in a similar regime.However, because of the 1/N offset in the ln www.nonlin-processes-geophys.net/18/735/2011/ Nonlin.Processes Geophys., 18, 735-750, 2011 function, the prior term cannot vanish even when the ensemble mean is taken as the optimal state.This is confirmed by the inverse of the Hessian H b , the contribution of the prior to P a , which is N −1 (1 + 1/N) at w a = 0, instead of N −1 .This also corresponds to the residual 1/2 term in J b of Eq. ( 43).Algebraically, this offset comes in the formula by the integration on x b : this blurring tells the system not to trust the ensemble mean entirely at finite N.
We believe this is the same term 1 + 1/N that was diagnosed by Sacher and Bartello (2008), who showed that, for a Gaussian process, the dispersion of the ensemble around the mean of the Gaussian should be (1 + 1/N)P, instead of P, because the ensemble mean does not coincide with the mean of the Gaussian distribution.

Alternate ETKF-N
The alternative formulation of ETKF-N, that assumes x is the best estimator for the prior, leads to the background term The only difference is in the missing 1/N offset term, which is not surprising since it was identified as a measure of the mistrust in the ensemble mean to represent the true forecast mean.

Tests and validation with simple models
In this section, the new filters will be numerically tested, on a three-variable chaotic dynamical toy model, as well as a onedimensional chaotic dynamical toy model.For the numerical experiments, U is chosen to be the identity.

Setup
The Lorenz '63 model (Lorenz, 1963) is a model with M = 3 variables, defined by the equations: The parameters are set to the original values (σ,ρ,β) = (10,28,8/3), which are known to lead to chaotic dynamics, with a doubling time of 0.78 time units.In the following simulations, a reference simulation stands for the truth.The model is considered to be perfect: the model of the truth is the same as the one used in data assimilation runs.We generate synthetic observations from the reference simulation for the three variables each t time interval, with t = 0.10, 8 M. Bocquet: Ensemble Kalman filtering without inflation From Eq. ( 43), it is clear that the ETKF-N system is in a similar regime.However, because of the 1/N offset in the ln function, the prior term cannot vanish even when the ensemble mean is taken as the optimal state.This is confirmed by the inverse of the Hessian H b , the contribution of the prior to P a , which is N −1 (1+1/N ) at w a = 0, instead of N −1 .This also corresponds to the residual 1/2 term in J b of Eq. ( 43).
Algebraically, this offset comes in the formula by the integration on x b : this blurring tells the system not to trust the ensemble mean entirely at finite N .We believe this is the same term 1 + 1/N that was diagnosed by Sacher and Bartello (2008), who showed that, for a Gaussian process, the dispersion of the ensemble around the mean of the Gaussian should be (1 + 1/N )P, instead of P, because the ensemble mean does not coincide with the mean of the Gaussian distribution.

Alternate ETKF-N
The alternative formulation of ETKF-N, that assumes x is the best estimator for the prior, leads to the background term The only difference is in the missing 1/N offset term, which is not surprising since it was identified as a measure of the mistrust in the ensemble mean to represent the true forecast mean.

Tests and validation with simple models
In this section, the new filters will be numerically tested, on a three-variable chaotic dynamical toy model, as well as a onedimensional chaotic dynamical toy model.For the numerical experiments, U is chosen to be the identity.

Setup
The Lorenz '63 model (Lorenz, 1963) is a model with M = 3 variables, defined by the equations: The parameters are set to the original values (σ,ρ,β) = (10,28,8/3), which are known to lead to chaotic dynamics, with a doubling time of 0.78 time units.In the following simulations, a reference simulation stands for the truth.The model is considered to be perfect: the model of the truth is the same as the one used in data assimilation runs.We generate synthetic observations from the reference simulation for the three variables each ∆t time interval, with ∆t = 0.10, ∆t = 0.25 and ∆t = 0.50.These choices are expected to generate mild, medium and strong impact of non-linearity and, as a possible consequence, non-Gaussianity of errors.These observations are independently perturbed with a normal white noise of standard deviation 2 following Harlim and Hunt (2007).In comparison, the natural variability (standard deviation from the mean of a long model run) of the x, y, and z variables is 7.9, 9.0, and 8.6 respectively.All the simulations are run for a period of time corresponding to 5 × 10 5 cycles, for the three values of ∆t.We use a burn-in period of 10 4 cycles to minimize any impact on the final result.The ensemble size is varied from N = 3 to N = 9.The filters are judged by the time-averaged value of the root mean square error between the analysis and the true state of the reference run.

Best rmse
For ETKF, a multiplicative inflation is applied by rescaling of the ensemble deviations from the mean: so that r = 1 means no inflation.A wide range of inflation factors r is tested.The inflation factor leading to the smallest (best) rmse is selected.For finite-size filters, inflation is not considered.Therefore for each finite-size filter score, only one run is necessary.
The results are reported in Fig. 1.For mild non-linearity, the ETKF is slightly better than ETKF-N.With a stronger impact of non-linearity (∆t = 0.25 and ∆t = 0.50), ETKF-N significantly outperforms ETKF.The alternate ETKF-N is t = 0.25 and t = 0.50.These choices are expected to generate mild, medium and strong impact of non-linearity and, as a possible consequence, non-Gaussianity of errors.These observations are independently perturbed with a normal white noise of standard deviation 2 following Harlim and Hunt (2007).In comparison, the natural variability (standard deviation from the mean of a long model run) of the x, y, and z variables is 7.9, 9.0, and 8.6 respectively.
All the simulations are run for a period of time corresponding to 5 × 10 5 cycles, for the three values of t.We use a burn-in period of 10 4 cycles to minimize any impact on the final result.The ensemble size is varied from N = 3 to N = 9.The filters are judged by the time-averaged value of the root mean square error between the analysis and the true state of the reference run.

Best rmse
For ETKF, a multiplicative inflation is applied by rescaling of the ensemble deviations from the mean: so that r = 1 means no inflation.A wide range of inflation factors r is tested.The inflation factor leading to the smallest (best) rmse is selected.For finite-size filters, inflation is not considered.Therefore for each finite-size filter score, only one run is necessary.
The results are reported in Fig. 1.
For mild non-linearity, the ETKF is slightly better than ETKF-N.With a stronger impact of non-linearity ( t = 0.25 and t = 0.50), ETKF-N significantly outperforms ETKF.
The alternate ETKF-N is diverging for t = 0.10 and for small ensemble size N ≤ 6.This emphasizes the fact that the ensemble mean x is not a fine estimation of x b , the mean Nonlin.Processes Geophys., 18, 735-750, 2011 www.nonlin-processes-geophys.net/18/735/2011/ of the true error distribution.For t = 0.25 and t = 0.50, where the errors are larger and the estimation of x b may be relatively less important, the performance is almost as good as ETKF-N, with slight deviations for the smallest ensembles.
To a large extent, these results are similar to those of Harlim and Hunt (2007).However, even though getting better results than ETKF, their filter still necessitates to adjust one parameter.

Setup
The filters are also applied to the one-dimensional Lorenz '95 toy-model (Lorenz and Emmanuel, 1998).This model represents a mid-latitude zonal circle of the global atmosphere, discretized into M = 40 variables {x m } m=1,...,M .The model reads, for m = 1,...,M, where F = 8, and the boundary is cyclic.Its dynamics is chaotic, and its attractor has a topological dimension of 13, a doubling time of about 0.42 time units, and a Kaplan-Yorke dimension of about 27.1.
The experiments follow the configuration of Sakov and Oke (2008).In the first experiment, the time interval between analyzes is t = 0.05, representative of time intervals of 6 hours for a global meteorological model.With this choice, non-linearity mildly affects the dynamics between updates.All variables are observed every t.Therefore, the observation operator is the identity matrix.All observations, which are obtained from a reference model run (the truth), are perturbed with a univariate normal white distribution of standard deviation 1.The observation error prior is accordingly a normal distribution of error covariance matrix the identity.In comparison, the natural variability of the model (standard deviation from the mean) is 3.6 for any of the M = 40 variables.The performance of a filter is assessed by the root mean square error (rmse) of the analysis with the truth, averaged over the whole experiment run.
As a burn-in period, 5 × 10 3 analysis cycles are used, whereas 10 4 analysis cycles are used for the assimilation experiments.This may be considered relatively short.However, on the one hand, the convergence was deemed sufficient for this demonstrative study.On the other hand, about 5 × 10 4 assimilation experiments have been performed, because the inflation (and later the localization) parameters are investigated for many sizes of the ensemble.Longer runs (10 5 analysis cycles) have also been performed, but no (longterm) instability was noted.Moreover these tests showed that the rmses of the 10 4 -cycle cases had reasonably converged.

Ensemble size -inflation diagrams
Following Sakov and Oke (2008) and many others, we investigate the rmse of the analysis with the reference state (the truth).The ensemble size is varied from 5 to 50.A multiplicative inflation is applied by rescaling of the ensemble deviations from the mean according to Eq. ( 48).The inflation factor r is varied from 1. to 1.095 by step of 0.005.As a result, one obtains two-dimensional tables of rmse, which displayed graphically.
The results for ETKF are reported in Fig. 2a.They are similar to the symmetric ensemble square root Kalman filter of Sakov and Oke (2008).The filter starts converging when the ensemble size is larger than the model unstable subspace dimension.Inflation is always necessary, even for a size of the ensemble greater than the Kaplan-Yorke dimension pointing to a systematic underestimation of sampling errors.
The results of ETKF-N are reported on Fig. 2b.At first, it is striking that the filter diverges for ensemble sizes below N = 15.This is disappointing, since the original goal of this study was to remedy to all sampling flaws in a deterministic context.This is obviously not achieved, similarly to any kind of EnKF without localization.However, the formalism developed here allows to understand the reason of this failure, and how it could later be amended.This will be discussed in Sect. 5.
Beyond N = 15 (which corresponds to a rank of 14 or less from the anomaly subspace, close to the model unstable subspace dimension), the filter is unconditionally stable.
The results of the alternate ETKF-N are also reported on Fig. 2c.It is also unconditionally stable beyond N = 15, but the rmses are better.

Best rmse
In the case of ETKF, the best root mean square error is obtained by taking, for each ensemble size, the minimal rmse over all inflation factors.For ETKF-N, there is only one rmse, since inflation is not considered.In Fig. 3 are plotted the best rmses for the three filters.The alternate ETKF-N seems to outperforms ETKF slightly.But its major asset is that the alternate ETKF-N obtains the best rmses without inflation.
Both ETKF-N and alternate ETKF-N perform better than ETKF over the range N = 5-16, especially in the critical range N = 14-16.This has been checked for other configurations (changing M and F ) of the Lorenz '95 model.
The ETKF-N is not as good as the other two filters beyond N = 16.It underperforms both filters by a maximum of 30 %, for N = 20.We believe this is explained by the robustness of the filter.Indeed, as discussed in Sect.3, ETKF-N assumes that the mean state can be different from the mean state of the true distribution, whereas the alternate ETKF-N assumes they match.Both finite-size filters are symmetric.If the model flow remains approximately linear, which is the www.nonlin-processes-geophys.net/18/735/2011/ Nonlin.Processes Geophys., 18, 735-750, 2011 If the model flow remains approximately linear, which is the case for small ∆t, the forecasted ensemble will remain centered on the trajectory of the mean, so that the mean of the ensemble will remain a good estimate of the true distribution mean.Therefore, the alternate ETKF-N, as well as symmetric ensemble square root filters, have an advantage in linear conditions over the more conservative, too cautious ETKF-N.If this is correct, then the performance of ETKF-N (which is symmetric) should be better, or at worst equal to the performance of a non-symmetric ensemble square root Kalman filter for small ∆t.Indeed this can be checked by comparison of Fig. 3 of the present article and Fig. 4 of Sakov and Oke (2008).Moreover, according to this argument, the performance of ETKF-N should improve for larger ensemble size and larger ∆t, in comparison with ETKF (with optimal inflation).
As mentioned earlier, the time interval between updates has been set to ∆t = 0.05.We know from the experiment on the Lorenz '63 model and from the previous remark, that the performance of ETKF-N as compared to ETKF is susceptible to vary with ∆t.Let us take the example of an ensemble size of N = 20.The setup is unchanged except for the time interval which is set to ∆t = 0.05,0.10,...,0.30.As shown in Fig. 4, ∆t ≤ 0.15 is a turning point beyond which ETKF-N obtains better rmse than ETKF without inflation.
The alternate ETKF-N offers the best of ETKF (with optimal inflation) and ETKF-N, over the full range of ∆t.

Can the use of localization be avoided?
We saw numerical evidence that ETKF-N does not solve the full sampling problem.A similar conclusion can be reached case for small t, the forecasted ensemble will remain centered on the trajectory of the mean, so that the mean of the ensemble will remain a good estimate of the true distribution mean.Therefore, the alternate ETKF-N, as well as symmetric ensemble square root filters, have an advantage in linear conditions over the more conservative, too cautious ETKF-N.If this is correct, then the performance of ETKF-N (which is symmetric) should be better, or at worst equal to the performance of a non-symmetric ensemble square root Kalman filter for small t.Indeed this can be checked by comparison of Fig. 3 of the present article and Fig. 4 of Sakov and Oke (2008).Moreover, according to this argument, the performance of ETKF-N should improve for larger ensemble size and larger t, in comparison with ETKF (with optimal inflation).
As mentioned earlier, the time interval between updates has been set to t = 0.05.We know from the experiment on the Lorenz '63 model and from the previous remark, that the performance of ETKF-N as compared to ETKF is susceptible to vary with t.Let us take the example of an ensemble size of N = 20.The setup is unchanged except for the time interval which is set to t = 0.05,0.10,...,0.30.
As shown in Fig. 4, t ≤ 0.15 is a turning point beyond which ETKF-N obtains better rmse than ETKF without inflation.The alternate ETKF-N offers the best of ETKF (with optimal inflation) and ETKF-N, over the full range of t.

Can the use of localization be avoided?
We saw numerical evidence that ETKF-N does not solve the full sampling problem.A similar conclusion can be reached from a more mathematical standpoint.The functional form Nonlin. Processes Geophys., 18,[735][736][737][738][739][740][741][742][743][744][745][746][747][748][749][750]2011 www.nonlin-processes-geophys.net/18/735/2011/ from a more mathematical standpoint.The functional form of ETKF-N Eq. ( 35) is formulated in ensemble space, via a set of ensemble coordinates that do not depend on the real space position.This functional form is due to the choice of the Jeffrey's prior.It implies that the dimension of the analysis space has a very reduced rank.Localization, which is meant to increase this rank, is therefore mandatory below some context-dependent ensemble size.We could contemplate two ways to tackle this difficult problem.
The first one would consist in trading Jeffrey's prior for a more informative one.The particular form of the ETKF-N background term which depends only on the ensemble coordinates was due to the specific choice of Jeffrey's prior, which had the merit to be simple.However, errors of the Lorenz '95 data assimilation system, or of more realistic geo-physical systems, often have short-range correlations.If, using an hyperprior different from Jeffreys', one could integrate on a restricted set of error covariance matrices of correlation matching the climatological correlations of the data assimilation system, we conjecture that localization could be consistently achieved within the proposed formalism.Let us take an example where it is assumed that the correlations of the data assimilation system are very short-range.At the extreme, we integrate Eq. ( 14) on the set of all positive definite diagonal matrices B, that is a set of M positive scalars, with the non-informative univariate prior: Following the derivation of Section 2, one obtains As opposed to the background terms introduced earlier, this J b cannot be written in ensemble space, i.e., not in terms of the coordinates w k in the vector space of anomalies.Assuming the same setup used for the Lorenz '95 model, the average analysis rmse of such EnKF-N is in the range 0.50 for N = 5 down to 0.35 for N = 40.It has been checked to be similar to any EnKF or ETKF with a minimal (meaningful) localization length, except that, for this new filter, inflation is not necessary even for small N .We conclude that localization can potentially be expressed in the formalism.Pursuing this idea is well beyond the scope of this article, because it seems mathematically challenging.
As an alternative, simpler, and widespread solution, a local version of the filter will be developed and tested in the remaining of this section.

LETKF-N
The extension of ETKF-N to a local ensemble transform Kalman filter is the same as the passage from ETKF to LETKF as described by Hunt et al. (2007), and by Harlim and Hunt (2007) for non-quadratic cost functions.For highdimensional and computationally challenging systems, this requires to follow their algorithm.However, for the Lorenz '95 toy-model the passage from ETKF-N to LETKF-N is trivial.Indeed, fixing a localization length l for each point of control space, one performs a local analysis using all observations within a radius of l units.Hence, for the Lorenz '95 model, l ranges from l = 0, using the single local observation if any, to l = 20, meaning that all observations are assimilated, i.e. no localization.We shall call this filter the finite-size (finite-sample) local ensemble transform Kalman filter (LETKF-N). of ETKF-N Eq. ( 35) is formulated in ensemble space, via a set of ensemble coordinates that do not depend on the real space position.This functional form is due to the choice of the Jeffrey's prior.It implies that the dimension of the analysis space has a very reduced rank.Localization, which is meant to increase this rank, is therefore mandatory below some context-dependent ensemble size.We could contemplate two ways to tackle this difficult problem.
The first one would consist in trading Jeffrey's prior for a more informative one.The particular form of the ETKF-N background term which depends only on the ensemble coordinates was due to the specific choice of Jeffrey's prior, which had the merit to be simple.However, errors of the Lorenz '95 data assimilation system, or of more realistic geophysical systems, often have short-range correlations.If, using an hyperprior different from Jeffreys', one could integrate on a restricted set of error covariance matrices of correlation matching the climatological correlations of the data assimilation system, we conjecture that localization could be consistently achieved within the proposed formalism.Let us take an example where it is assumed that the correlations of the data assimilation system are very short-range.At the extreme, we integrate Eq. ( 14) on the set of all positive definite diagonal matrices B, that is a set of M positive scalars, with the non-informative univariate prior: Following the derivation of Sect.2, one obtains As opposed to the background terms introduced earlier, this J b cannot be written in ensemble space, i.e. not in terms of from a more mathematical standpoint.The functional form of ETKF-N Eq. ( 35) is formulated in ensemble space, via a set of ensemble coordinates that do not depend on the real space position.This functional form is due to the choice of the Jeffrey's prior.It implies that the dimension of the analysis space has a very reduced rank.Localization, which is meant to increase this rank, is therefore mandatory below some context-dependent ensemble size.We could contemplate two ways to tackle this difficult problem.
The first one would consist in trading Jeffrey's prior for a more informative one.The particular form of the ETKF-N background term which depends only on the ensemble coordinates was due to the specific choice of Jeffrey's prior, which had the merit to be simple.However, errors of the Lorenz '95 data assimilation system, or of more realistic geo-  the coordinates w k in the vector space of anomalies.Assuming the same setup used for the Lorenz '95 model, the average analysis rmse of such EnKF-N is in the range 0.50 for N = 5 down to 0.35 for N = 40.It has been checked to be similar to any EnKF or ETKF with a minimal (meaningful) localization length, except that, for this new filter, inflation is not necessary even for small N. We conclude that localization can potentially be expressed in the formalism.Pursuing this idea is well beyond the scope of this article, because it seems mathematically challenging.
As an alternative, simpler, and widespread solution, a local version of the filter will be developed and tested in the remaining of this section.

LETKF-N
The extension of ETKF-N to a local ensemble transform Kalman filter is the same as the passage from ETKF to LETKF as described by Hunt et al. (2007), and by Harlim and Hunt (2007) for non-quadratic cost functions.For highdimensional and computationally challenging systems, this requires to follow their algorithm.However, for the Lorenz '95 toy-model the passage from ETKF-N to LETKF-N is trivial.Indeed, fixing a localization length l for each point of control space, one performs a local analysis using all observations within a radius of l units.Hence, for the Lorenz '95 model, l ranges from l = 0, using the single local observation if any, to l = 20, meaning that all observations are assimilated, i.e. no localization.We shall call this filter the finite-size (finite-sample) local ensemble transform Kalman filter (LETKF-N).

Ensemble size -inflation diagrams
The results of the experiments with LETKF, LETKF-N, and with the alternate LETKF-N are reported in Fig. 5.For LETKF, inflation is still necessary to stabilize the filter for not-so-small ensemble sizes (N ≤ 11).LETKF-N does not require inflation (at least for N ≥ 5, the case N < 4 was not investigated).Again, it means that LETKF-N estimates well, or over-estimates, sampling errors.But it is unconditionally stable with the best performance obtained without inflation.The alternate LETKF-N may still be the best filter for an ensemble size beyond the model unstable subspace dimension, but it disappoints by requiring inflation for small and moderate ensemble size.This indicates that trusting the mean x as the first guess is a source of error for small ensemble size.

Best rmse
In Fig. 6 are plotted the best rmses for the three filters, with localization.
LETKF-N is always slightly suboptimal (with a maximal discrepancy of 10 % for N = 5).However, it is the only unconditionally stable filter of the three: it does not require inflation.
The alternate LETKF-N is as good as LETKF-N for small ensembles but it does require inflation, which is why it is not so interesting in this regime.The alternate LETKF-N is as good as LETKF in the large ensemble size regime, but without inflation.
As we increase the time interval between updates, the performance of the filters degrades but their relative performance evolves.Let us take the example of an ensemble size of N = 10, following Harlim and Hunt (2007).The setup is unchanged except for the time interval between updates which is set to t = 0.05,0.10,...,0.50.The results are reported in Fig. 7.
For t ≤ 0.20, LETKF with optimal inflation and localization outperforms LETKF-N with optimal localization and no inflation.For t ≥ 0.20, LETKF-N dominates.Like in the Lorenz '63 case, this is reminiscent of the results of Harlim and Hunt (2007).This indicates that the relative performance of filters as shown for instance by Fig. 6 should not be taken as a rule, since there are regimes where LETKF-N (without inflation) performs better than LETKF.
The good performances of EnKF-Ns relative to the EnKFs in the strong nonlinear regime, is not an indication that EnKF-N can handle non-Gaussianity in this regime.However the sampling errors may be created and exacerbated by the non-linearity of the model flow, and hence of the non-Gaussianity of the underlying pdf of errors.This may give an advantage to the finite-size ensemble filters in this regime.

Summary and future directions
Current strategies for stabilizing the ensemble Kalman filter are empirical tuning of inflation, use of multi-ensemble configuration, explicit identification of the sampling/model errors, or adaptive optimization of inflation.In this article, we have followed a somehow different route.A new background prior pdf that takes into account the discrete nature of the ensemble was derived using Bayesian principles.It accounts for the uncertainty attached to the first-and secondorder empirical moments of the ensemble seen as a sample of a true error distribution.The definition of the prior pdf leads to a new class of filters (EnKF-N).Even though the resulting prior is non-Gaussian, it is entirely based on Gaussian hypotheses for the errors.In principle, through this prior, the analysis should take into account sampling errors.
Specifically, an ensemble transform variant (ETKF-N) was derived in the spirit of the ETKF of Hunt et al. (2007).It is tested on the Lorenz '63 and the Lorenz '95 toy models.
In the absence of model error, the filter appear to be stable without inflation for ensemble size greater than the model unstable subspace dimension of the attractor.Moreover, for large enough time interval between updates, its performance is superior to that of ETKF.A variant of ETKF-N is expected to outperform ETKF-N for small time interval between updates: without inflation, it seems to systematically perform as well as, or better than ETKF.Unfortunately, as shown in the case of the Lorenz '95 model, these finite-size filters diverge for smaller ensemble size, like any ensemble Kalman filter.Localization is mandatory.That is why a local variant of the filter (LETKF-N) which parallels LETKF, is developed.
From experiments on the Lorenz '95 toy model, LETKF-N scheme seems stable without inflation.Depending on the time interval between updates, its performance with optimally tuned localization can be slightly inferior or superior to LETKF with optimally tuned localization and optimally tuned inflation.
The methodology presented here is mainly a proof of concept.We believe that more work is needed to explore the strengths and limitations of the methodology, and that there is room for improvement of the schemes.
For instance, we conjectured that the incapacity of ETKF-N to fully account for sampling errors (as opposed to LETKF-N with optimally tuned localization), was mainly due to the use of an hyperprior which generates correlations different from that of the data assimilation system built on the particular model.To avoid using weakly informative (hyper)priors on x b and B, one solution is to pass information between analyzes beyond the knowledge of the ensemble.In the context of oil reservoir monitoring, Myrseth and Omre (2010) have built a sophisticated and elegant ensemble Kalman filter that could be seen as a stochastic extension of ETKF-N that achieves such a goal.They see covariance matrices as random matrices with an inverse Wishart distribution of precision matrix in R M×M (Muirhead, 1982).
Nonlin.Processes Geophys., 18, 735-750, 2011 www.nonlin-processes-geophys.net/18/735/2011/ The hyperprior for x b and B are chosen in such a way (conjugate distribution) that the posterior error covariance matrix still follows an inverse-Wishart distribution.However, such a B matrix should be drawn from this distribution for each member, and the corresponding innovation statistics computed and inverted, which could become very costly.Even though one assumes all members use the same draw of B, it is necessary to store the precision matrix Ψ, which cannot be afforded in the high-dimensional context of geophysics.Still, to pass supplementary information (beyond the ensemble) one might contemplate adapting the algorithm of Myrseth and Omre (2010) so as to maintain a reduced-order, short memory, precision matrix, with a rank of a few ensemble sizes.
The behavior of EnKF-N in limiting regimes is worth exploring.For instance, in the limiting case where the dynamical model is linear, EnKF-N may not exhibit optimal perfor-mance since EnKF-N does not make implicit assumptions on the linearity of the model as opposed to traditional EnKFs.In this limiting case, the hyperprior p(x b ,B) should optimally be a Dirac delta function pdf peaked at the empirical moments of the ensemble, which would make EnKF-N the traditional EnKF.But what happens to EnKF-N with Jeffreys' hyperprior in this regime is less clear.
Another lead for improvement points to the derivation of the new prior used in the (L)ETKF-N filters, which, by definition, ignores the observations to be assimilated.From a Bayesian perspective, this is suboptimal: in our scheme, any B matrix's likelihood is tested against the ensemble, but not against both the ensemble and the observations, which is what a full Bayesian scheme would prescribe.Indeed Eq. ( 6) should be generalized to: p(x|y,x 1 ,...,x N ) The hyperprior for x b and B are chosen in such a way (conjugate distribution) that the posterior error covariance matrix still follows an inverse-Wishart distribution.However, such a B matrix should be drawn from this distribution for each member, and the corresponding innovation statistics computed and inverted, which could become very costly.Even though one assumes all members use the same draw of B, it is necessary to store the precision matrix , which cannot be afforded in the high-dimensional context of geophysics.Still, to pass supplementary information (beyond the ensemble) one might contemplate adapting the algorithm of Myrseth and Omre (2010) so as to maintain a reduced-order, short memory, precision matrix, with a rank of a few ensemble sizes.
The behavior of EnKF-N in limiting regimes is worth exploring.For instance, in the limiting case where the dynamical model is linear, EnKF-N may not exhibit optimal performance since EnKF-N does not make implicit assumptions on the linearity of the model as opposed to traditional EnKFs.In this limiting case, the hyperprior p(x b ,B) should optimally be a Dirac delta function pdf peaked at the empirical moments of the ensemble, which would make EnKF-N the traditional EnKF.But what happens to EnKF-N with Jeffreys' hyperprior in this regime is less clear.
Another lead for improvement points to the derivation of the new prior used in the (L)ETKF-N filters, which, by definition, ignores the observations to be assimilated.From a Bayesian perspective, this is suboptimal: in our scheme, any B matrix's likelihood is tested against the ensemble, but not against both the ensemble and the observations, which is what a full Bayesian scheme would prescribe.Indeed Eq. ( 6) should be generalized to:  Because of the presence of p(y|x b ,B) in the last integral and its dependence in x b and B, it seems difficult to analytically solve the problem in order to generalize ETKF-N.Stability without inflation is a property shared by ETKF-N, or LETKF-N, with the multi-ensemble configuration of Mitchell and Houtekamer (2009).However the two approaches draw their rationale from two different standpoints: Bayesian statistics and cross-validation respectively, whose connections are not clearly understood in Statistics.Additionally, we note that the multi-ensemble approach makes use of the observation while the finite-size ensemble transform filters do not.In other words, the multi-ensemble approach reduces the errors in the analysis while the finite-size ensemble transform filters do so prior to the analysis.The methodology developed in this article naturally led to deterministic filters, whose comparison with stochastic filters cannot be simple.Therefore it would be interesting to develop a stochastic filter counterpart to the deterministic EnKF-N presented here.
The focus of this study was primarily on sampling errors.In a realistic context, one should additionally take into account model errors, and the errors that come from the viation from Gaussianity due to model non-linearity.If one trusts from the previous results that EnKF-N reduces significantly the need for inflation meant to compensate for sampling errors, then the use of inflation in EnKF-N would essentially be a measure of model errors.It could also be a measure of the deviation from Gaussianity, or of the misspecification of the hyperprior as discussed earlier.These ideas have been successfully tested on the context of the Lorenz '95 using the setup of this article.However, reporting these results is beyond the scope of this article.
As a final remark, we would like to mention that the prior pdf p(x|x 1 ,...,x N ) ∝ exp(−J b (x)), where J b is defined by Eq. ( 21), could more generally be useful in environmental statistical studies, when one needs to derive a pdf from samples of the system state, or of some error about it, which is assumed Gaussian-distributed.Note that the ensemble size needs to be large enough otherwise localization is still necessary.
The author acknowledges a useful discussion with C. Snyder on state-of-the-art ensemble Kalman filtering.The article has benefited from the useful comments and suggestions of L. Delle Monache, N. Bowler, P. Sakov acting as a Reviewer, an anonymous Reviewer, O. Talagrand acting as Editor, and L. Wu.Because of the presence of p(y|x b ,B) in the last integral and its dependence in x b and B, it seems difficult to analytically solve the problem in order to generalize ETKF-N.Stability without inflation is a property shared by ETKF-N, or LETKF-N, with the multi-ensemble configuration of Mitchell and Houtekamer (2009).However the two approaches draw their rationale from two different standpoints: Bayesian statistics and cross-validation respectively, whose connections are not clearly understood in Statistics.Additionally, we note that the multi-ensemble approach makes use of the observation while the finite-size ensemble transform filters do not.In other words, the multi-ensemble approach reduces the errors in the analysis while the finite-size ensemble transform filters do so prior to the analysis.The methodology developed in this article naturally led to deterministic filters, whose comparison with stochastic filters cannot be simple.Therefore it would be interesting to develop a stochastic filter counterpart to the deterministic EnKF-N presented here.
The focus of this study was primarily on sampling errors.In a realistic context, one should additionally take into account model errors, and the errors that come from the deviation from Gaussianity due to model non-linearity.If one Because of the presence of p(y|x b ,B) in the last integral and its dependence in x b and B, it seems difficult to analytically solve the problem in order to generalize ETKF-N.Stability without inflation is a property shared by ETKF-N, or LETKF-N, with the multi-ensemble configuration of Mitchell and Houtekamer (2009)  trusts from the previous results that EnKF-N reduces significantly the need for inflation meant to compensate for sampling errors, then the use of inflation in EnKF-N would essentially be a measure of model errors.It could also be a measure of the deviation from Gaussianity, or of the misspecification of the hyperprior as discussed earlier.These ideas have been successfully tested on the context of the Lorenz '95 using the setup of this article.However, reporting these results is beyond the scope of this article.
As a final remark, we would like to mention that the prior pdf p(x|x 1 ,...,x N ) ∝ exp(−J b (x)), where J b is defined by Eq. ( 21), could more generally be useful in environmental statistical studies, when one needs to derive a pdf from samples of the system state, or of some error about it, which is assumed Gaussian-distributed.Note that the ensemble size needs to be large enough otherwise localization is still necessary.

Fig. 1 .
Fig. 1.Time-averaged analysis rmse for ETKF, ETKF-N and the alternate ETKF-N, for three experiments with different time intervals between updates, and for an ensemble size from N = 3 to N = 9.

Fig. 1 .
Fig. 1.Time-averaged analysis rmse for ETKF, ETKF-N and the alternate ETKF-N, for three experiments with different time intervals between updates, and for an ensemble size from N = 3 to N = 9.

Fig. 3 .Fig. 4 .
Fig. 3. Best rmse over a wide range of inflation factors for ETKF, and rmse without inflation for ETKF-N and for the alternate ETKF-N.

Fig. 3 .
Fig. 3. Best rmse over a wide range of inflation factors for ETKF, and rmse without inflation for ETKF-N and for the alternate ETKF-N.

Fig. 3 .Fig. 4 .
Fig. 3. Best rmse over a wide range of inflation factors for ETKF, and rmse without inflation for ETKF-N and for the alternate ETKF-N.

Fig. 4 .
Fig. 4. Best rmse for ETKF over a wide range of inflation factors, rmse of ETKF-N without inflation and rmse of the alternate ETKF-N without inflation, for several time intervals between updates, and for an ensemble size N = 20.

Fig. 6 .Fig. 7 .
Fig. 6.Best rmse over a wide range of inflation factors, and all possible localization lengths for LETKF and the alternate LETKF-N.Best rmse without inflation over all possible localization lengths for LETKF-N.

Fig. 6 .
Fig. 6.Best rmse over a wide range of inflation factors, and all possible localization lengths for LETKF and the alternate LETKF-N.Best rmse without inflation over all possible localization lengths for LETKF-N.

Fig. 6 .Fig. 7 .
Fig. 6.Best rmse over a wide range of inflation factors, and all possible localization lengths for LETKF and the alternate LETKF-N.Best rmse without inflation over all possible localization lengths for LETKF-N.

Fig. 7 .
Fig. 7. Best rmse for LETKF and LETKF-N over all possible localization lengths and a wide range of inflation factors for LETKF, for several time intervals between updates, and for an ensemble size N = 10.
. However the two approaches draw their rationale from two different standpoints: