Combining inflation-free and iterative ensemble Kalman filters for strongly nonlinear systems

Abstract

The finite-size ensemble Kalman filter (EnKF-N) is an ensemble Kalman filter (EnKF) which, in perfect model conditions, does not require inflation because it partially accounts for the ensemble sampling errors. For the Lorenz '63 and '95 toy-models, it was so far shown to perform as well as or better than the EnKF with an optimally tuned inflation. The iterative ensemble Kalman filter (IEnKF) is an EnKF which was shown to perform much better than the EnKF in strongly nonlinear conditions, such as with the Lorenz '63 and '95 models, at the cost of iteratively updating the trajectories of the ensemble members. This article aims at further exploring the two filters and at combining both into an EnKF that does not require inflation in perfect model conditions, and that is as efficient as the IEnKF in very nonlinear conditions. In this study, the EnKF-N is first introduced and a new implementation is developed. It decomposes the EnKF-N into a cheap two-step algorithm that amounts to computing an optimal inflation factor. This offers a justification …


Introduction
Let us first introduce two recently developed and complementary ensemble Kalman filters.

An ensemble Kalman filter without the intrinsic need for inflation
The finite-size ensemble Kalman filter (EnKF-N), introduced in Bocquet (2011), relies on the statistical modelling assumption that the ensemble used in the analysis is a collection of samples of the prior probability density function p(x|x_1, x_2, …, x_N), where x_k is the k-th member of the N-member forecast ensemble. The idea behind the EnKF-N is that the empirical moments of the ensemble do not necessarily match the mean x_b and the error covariance matrix B of the prior probability density function (pdf). In the large ensemble size limit N → ∞, we expect them to coincide, but for any finite N, sampling errors may induce a discrepancy.
An effective form for the prior pdf was proposed. It is the result of an integration over all admissible x_b and B. Rather than the Gaussian prior of Eq. (3), which is implicitly assumed in the traditional EnKF, the effective prior pdf of Eq. (2) is advocated for use in the analysis step of the ensemble Kalman filter. The notation |X| designates the determinant of matrix X. Note that the determinant and the inverse operators in Eqs. (2) and (3) are meant to operate in the vector space spanned by the anomalies {x_k − x̄}_{k=1,…,N}; otherwise, the determinant would often be zero and the inverse undefined. The constant ε_N depends on the assumptions made to derive the prior, and two classes of filter that depend on these assumptions have been introduced. For the first, both x_b and B are uncertain, and ε_N = 1 + 1/N. For the second, it is assumed that the empirical mean x̄ is a fine approximation of x_b; in this case, ε_N = 1.

Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
The analysis step of the EnKF-N follows from Bayes' rule: p(x|y, x_1, x_2, …, x_N) ∝ p(y|x) p(x|x_1, x_2, …, x_N), where y ∈ R^d is the observation vector at a given update step. The observational likelihood is assumed to be Gaussian, where H is the observation operator and R is the observation error covariance matrix. In Bocquet (2011), the filter was seen as a deterministic filter and, in particular, the focus was on its ensemble transform variant. It is assumed that the system state x we would like to estimate can be decomposed as x = x̄ + Aw, where A = [x_1 − x̄, …, x_N − x̄] is the matrix of the anomalies. The weights w ∈ R^N are the (redundant) coordinates of x in ensemble space. The filter follows the same algorithm as the ensemble transform Kalman filter of Hunt et al. (2007), except that the optimal vector of weights w_a at the analysis step is obtained from the minimisation of the cost function

J̃(w) = (1/2) (y − H(x̄ + Aw))^T R^{-1} (y − H(x̄ + Aw)) + (N/2) ln(ε_N + w^T w).  (7)

The tilde symbol indicates that the function is defined in ensemble space. This variational analysis step has similarities with those of Zupanski (2005) and Harlim and Hunt (2007). The other modification is the computation of the posterior error covariance matrix which, instead of being based on the ensemble-space Hessian of the traditional scheme, H̃ = (HA)^T R^{-1} HA + (N − 1) I_N, where H is the Jacobian matrix of H, is based on the Hessian of J̃ in Eq. (7), where I_N is the identity matrix in ensemble space. The complete algorithm is recalled in Algorithm 1. In this algorithm, U is an arbitrary orthogonal matrix in ensemble space that preserves the ensemble mean (Sakov and Oke, 2008).
Algorithm 1 Finite-size ensemble Kalman filter (primal scheme).
Require: The forecast ensemble {x_k}_{k=1,…,N}, the observations y and the error covariance matrix R
1: Compute the mean x̄ and the anomalies A from {x_k}_{k=1,…,N}
2: Compute Y = HA, δ = y − H x̄
3: Find the minimum w_a of the cost function J̃ of Eq. (7)
4: Compute the Hessian H̃_a of J̃ at w_a
5: Compute x_a = x̄ + A w_a
6: Compute W_a = ((N − 1) H̃_a^{-1})^{1/2} U
7: Compute x_a^k = x_a + [A W_a]_k

The filter was, in particular, tested on the Lorenz '95 toy-model (Lorenz and Emanuel, 1998), using the root-mean-square error of the analysis as a criterion. In this context, the EnKF-N was used without the inflation that is usually required to stabilise or optimise the performance of such a system (Anderson and Anderson, 1999). As a consequence, the burden of tuning inflation was avoided. Yet, the EnKF-N (ε_N = 1 type) systematically matched or slightly outperformed the EnKF with optimally tuned inflation, for a large range of time intervals between updates. The extra numerical cost of using the EnKF-N instead of the EnKF was deemed marginal, especially for high-dimensional systems. This statement will be confirmed and clarified in the present article.
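The posterior update of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the helper names are ours, U is taken to be the identity, and the Hessian is that of the EnKF-N cost function written in ensemble space.

```python
import numpy as np

def sym_sqrt(S):
    # Symmetric square root of a symmetric positive-definite matrix
    d, V = np.linalg.eigh(S)
    return (V * np.sqrt(d)) @ V.T

def enkf_n_posterior(x_mean, A, w_a, Y, R_inv, eps_N=1.0):
    # Steps 4-7 of Algorithm 1 with U = I_N: Hessian of the EnKF-N cost
    # function at w_a, analysis mean, and posterior ensemble.
    N = A.shape[1]
    s2 = eps_N + w_a @ w_a
    hess = Y.T @ R_inv @ Y + (N / s2) * np.eye(N) \
        - (2.0 * N / s2**2) * np.outer(w_a, w_a)
    W_a = sym_sqrt((N - 1) * np.linalg.inv(hess))
    x_a = x_mean + A @ w_a
    return x_a, x_a[:, None] + A @ W_a   # analysis mean and ensemble
```

Since the anomalies sum to zero (Y 1 = 0) and w_a is orthogonal to 1, the vector 1 is an eigenvector of the Hessian, so the update preserves the ensemble mean, as required of U in the text.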

An ensemble Kalman filter for strongly nonlinear systems
The extended Kalman filter propagates the error covariance matrix from time t_1 to time t_2, based on a tangent linear model computed at the previous analysis step around the analysis x_1; the tangent linear model should then be corrected and recomputed at the new state. One would usually solve for the optimal analysed state at time t_2 by minimising the cost function J_2, where P_2 is the background error covariance at time t_2. In strongly nonlinear conditions, it is instead preferable to minimise, for the (re-)analysed state x_1 conditional on the future observations, the cost function J_1, where M is the transition model from time t_1 to time t_2, and P_1 is the background error covariance at time t_1, obtained by a forecast from the assimilation step prior to t_1, which does not depend on future observations. This is equivalent to solving a one-lag extended Kalman smoother. Even with a linear observation operator, cost function J_1 is difficult to minimise when the model is nonlinear since, as opposed to J_2, J_1 is non-quadratic, so that it needs to be iteratively optimised. This approach has been suggested by Wishner et al. (1969), Jazwinski (1970) and Tarantola (2005). Cost function J_1 can, for instance, be optimised using a Newton approach, where p is the iteration index and where the gradient and the Hessian involve M^(p), the tangent linear model computed at x_1^(p). More recently, it has been suggested to use a similar approach, but with the EnKF (Gu and Oliver, 2007). Kalnay and Yang (2010) suggested repeating the assimilation cycle by applying the ensemble transform calculated at t_2 to the ensemble from the previous iteration at t_1, until the best fit to the observations is obtained. One advantage of the EnKF framework is that the use of the tangent linear model and its adjoint is replaced with the use of the ensemble.
It has been properly formalized as the iterative ensemble Kalman filter (IEnKF) in the framework of deterministic filters and tested on the Lorenz '63 and '95 models by Sakov et al. (2012). At the cost of additional iterations and, hence, of additional propagations of the ensemble, the IEnKF was shown to significantly outperform the EnKF, especially for large time intervals between updates.
Two versions of the filter were put forward. The first one mimics the use of the tangent linear model by the propagation of a rescaled ensemble, a bundle of trajectories. It is called IEKF by Sakov et al. (2012). In this study, we shall not use IEKF, but a close variant that will be called the bundle variant. It will turn out to offer a performance improvement over the IEKF. The second one consists in transforming the ensemble before its propagation, using the ensemble transform obtained at the previous iteration. The inverse transformation is applied after propagation. It performs a form of preconditioning of the optimisation problem in ensemble space. We shall call it here the transform variant. The IEnKF still requires inflation. In strongly nonlinear conditions, the inflation factor may be large, although Sakov et al. (2012) remark that the sensitivity to it is weaker and that the required magnitudes are smaller than in non-iterative schemes.
Note that the minimisation scheme implicitly adopted by Sakov et al. (2012) is the iterative Newton scheme. Because the Newton method is but one of many schemes to minimise cost function Eq. (11), there is considerable freedom in choosing the iterative scheme. In this article, we shall choose the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963), in order to have better control of the optimisation and to safely generalize the iterative ensemble Kalman filter of Sakov et al. (2012) into an inflation-free iterative ensemble Kalman filter, which is the final goal of this article.

Outline
In this article, the focus is on the deterministic EnKFs rather than the stochastic EnKF. The variants of the filters with localization are deliberately not studied: localization is well beyond the objective of this article and is, to some extent, a disconnected topic.
To start with, we shall come back to the EnKF-N and IEnKF filters. Our starting points are the results of Bocquet (2011) and Sakov et al. (2012), and we develop on these two filters. In Sect. 2, we shall first give another interpretation of the EnKF-N. It makes an explicit connection with inflation and provides an efficient way to optimise cost function Eq. (7). Besides, it sheds light on the use of inflation and on why it can be surprisingly efficient at accounting for sampling errors. In Sect. 3, we introduce an implementation of the IEnKF that is based on a Levenberg-Marquardt algorithm. As a precursor of the trust-region methods, it allows control of the update, which will be useful in the generalization to a finite-size version of the filter, introduced in Sect. 4.
Numerical experiments are carried out on the new filters in Sect. 5. Conclusions and leads for improvements are given in Sect. 6.

The finite-size ensemble Kalman filter
In this section, we give a new insight on the EnKF-N, whose principles have been recalled in the introduction. For the sake of simplicity, the observation operator H is assumed linear. The generalization to a nonlinear H is not difficult and will, in any case, be treated in the framework of the iterative EnKFs (Sects. 3 and 4). Equation (7) can be written

J̃(w) = (1/2) (δ − Yw)^T R^{-1} (δ − Yw) + (N/2) ln(ε_N + w^T w),  (16)

where Y = HA is the observation anomaly matrix and δ = y − H x̄ is the innovation. The minimisation of J̃ is performed over ensemble space, which is numerically efficient in high-dimensional applications. However, this cost function was shown to be non-convex: it has one global minimum and possibly additional local minima. In practice, in Bocquet (2011), the L-BFGS-B minimizer of Byrd et al. (1995) was used. The quasi-Newton algorithm avoids forming a Hessian with a negative eigenvalue, which might occur with the joint use of such a cost function and a Newton optimisation method.
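The ensemble-space cost function of Eq. (16) and its gradient are simple to code directly. The following is an illustrative sketch (the function names are ours, not the authors'); the gradient expression follows by differentiating Eq. (16) term by term.

```python
import numpy as np

def enkf_n_cost(w, Y, R_inv, delta, N, eps_N=1.0):
    # Primal EnKF-N cost function of Eq. (16) in ensemble space
    r = delta - Y @ w
    return 0.5 * r @ R_inv @ r + 0.5 * N * np.log(eps_N + w @ w)

def enkf_n_grad(w, Y, R_inv, delta, N, eps_N=1.0):
    # Gradient of the cost above: Gaussian misfit term plus the
    # non-quadratic background term N w / (eps_N + w^T w)
    return -(Y.T @ R_inv @ (delta - Y @ w)) + N * w / (eps_N + w @ w)
```

The logarithmic background term is what makes the function non-convex, which motivates the quasi-Newton (or dual) treatment discussed in the text.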

Dual scheme
Although this path is successful and efficient, we would like to find a more explicit scheme, in order to establish stronger connections with the traditional EnKF with inflation. We wish to split the radial degree of freedom of w, that is √(w^T w), from its angular degrees of freedom, that is w/√(w^T w). Having in mind J̃(w) of Eq. (16), a related Lagrangian is introduced, in which the radial variable ρ = w^T w is enforced through a Lagrange multiplier ζ:

L(w, ρ, ζ) = (1/2) (δ − Yw)^T R^{-1} (δ − Yw) + (N/2) ln(ε_N + ρ) + (ζ/2) (w^T w − ρ).

The dual cost function, which we define for ζ > 0, is obtained by optimising the Lagrangian over w and ρ at fixed ζ; it is easy to check that the maximum and minimum that define D(ζ) exist. The dual problem consists in minimising this dual cost function, whereas the original problem is called the primal problem. In Appendix A, we demonstrate that the global minima of D and J̃ coincide: min_{ζ>0} D(ζ) = min_w J̃(w). This is a remarkable non-quadratic case of so-called strong duality (Borwein and Lewis, 2000). It means that the problem of finding the global minimum of our original (primal) problem can rigorously be traded for the problem of finding the global minimum of the dual function.
To connect the corresponding optimal ζ⋆, ρ⋆ and w⋆, and to compute the dual cost function, we write the conditions of stationarity of the Lagrangian over ρ and over w. They yield

N/(ε_N + ρ⋆) = ζ⋆,  (23)
Y^T R^{-1} (δ − Y w⋆) = ζ⋆ w⋆.  (24)

Since ρ⋆ ≥ 0, Eq. (23) implies that ζ⋆ belongs to (0, N/ε_N], the interval from 0 (excluded) to N/ε_N (included). The solutions of Eqs. (23) and (24) are

ρ⋆ = N/ζ⋆ − ε_N,  w⋆ = (Y^T R^{-1} Y + ζ⋆ I_N)^{-1} Y^T R^{-1} δ.  (25)

By inserting these solutions in the Lagrangian, one obtains the dual cost function

D(ζ) = (1/2) δ^T (Y Y^T/ζ + R)^{-1} δ + ε_N ζ/2 + (N/2) ln(N/ζ) − N/2.  (27)

The dual cost function is a function of a single variable. Hence, it is easy to find its global minimum, even in the presence of several minima.
As a result, instead of minimising J̃(w) over w ∈ R^N, one can equivalently: (i) minimise the dual cost function D(ζ) over the scalar ζ ∈ (0, N/ε_N], which yields ζ_a; and (ii) perform the analysis in ensemble space with ζ_a in place of N − 1:

w_a = (Y^T R^{-1} Y + ζ_a I_N)^{-1} Y^T R^{-1} δ.  (26)

The implementation of the corresponding EnKF-N is detailed in Algorithm 2.
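The two-step dual scheme can be sketched as follows. This is our own illustrative code, not the authors' implementation: a crude grid search stands in for a proper one-dimensional minimiser of D, and the dual cost is written up to an additive constant.

```python
import numpy as np

def dual_cost(zeta, Y, R, delta, N, eps_N=1.0):
    # Dual cost function D(zeta) of Eq. (27), up to an additive constant
    S = Y @ Y.T / zeta + R
    return (0.5 * delta @ np.linalg.solve(S, delta)
            + 0.5 * eps_N * zeta + 0.5 * N * np.log(N / zeta))

def dual_enkf_n_analysis(Y, R, delta, N, eps_N=1.0, n_grid=10000):
    # Step 1: 1-D minimisation of D over (0, N/eps_N] (here by grid search)
    zetas = np.linspace(1e-6, N / eps_N, n_grid)
    za = zetas[np.argmin([dual_cost(z, Y, R, delta, N, eps_N) for z in zetas])]
    # Step 2: standard ensemble-space analysis, zeta_a in place of N - 1
    R_inv = np.linalg.inv(R)
    w_a = np.linalg.solve(Y.T @ R_inv @ Y + za * np.eye(Y.shape[1]),
                          Y.T @ R_inv @ delta)
    return za, w_a
```

At the minimum, the stationarity condition ζ_a = N/(ε_N + w_a^T w_a) of Eq. (23) should hold up to grid resolution, which provides a cheap self-consistency check.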

Assets of the dual approach
Let us discuss this alternate minimisation. It has several advantages.
Firstly, even though the primal problem seems to be efficiently solved with the help of a quasi-Newton minimizer, such a minimizer is only guaranteed to find one minimum, not necessarily the global one. On the contrary, because the search for the global minimum is reduced to a one-dimensional problem, the dual approach allows one to easily find the global minimum. We have performed numerical tests on this issue with the Lorenz '63 model (Lorenz, 1963), which are discussed in Sect. 5. We found that, in marginal cases, the dual approach (global optimisation) would perform better than the primal scheme (local optimisation). However, for most of the tests (Lorenz '63 and Lorenz '95), we found no significant performance differences between the dual and primal approaches. Specifically, for the numerical tests we have performed with Lorenz '95, the primal and dual algorithms systematically led to the same optimum. Secondly, the algorithm clearly exhibits the extra cost of the EnKF-N over the EnKF: minimising Eq. (27) over the scalar ζ. Note that the inverse matrix in the first term of Eq. (27) can be obtained from a singular value decomposition that can also be used in the gain computation of Eq. (26). In practice, for the experiments of Sect. 5, we found that the extra cost is negligible.

Algorithm 2 Dual finite-size ensemble Kalman filter.
Require: The forecast ensemble {x_k}_{k=1,…,N}, the observations y, and the error covariance matrix R
1: Compute the mean x̄ and the anomalies A from {x_k}_{k=1,…,N}
2: Compute Y = HA, δ = y − H x̄
3: Find the minimum ζ_a of the dual cost function D of Eq. (27)
4: Compute w_a from Eq. (26) using ζ_a
5: Compute x_a = x̄ + A w_a
6: Compute W_a = ((N − 1) H̃_a^{-1})^{1/2} U, where the Hessian H̃_a is evaluated using ζ_a
7: Compute x_a^k = x_a + [A W_a]_k

Nonlin. Processes Geophys., 19, 383-399, 2012. www.nonlin-processes-geophys.net/19/383/2012/
Thirdly, this algorithm parallels the traditional EnKF scheme. Comparing Eq. (26) with Eqs. (20) and (21) of Hunt et al. (2007), it is clear that ζ replaces the N − 1 found in the traditional deterministic filters. That is why ζ can be seen as the effective size of the ensemble: the mean number of members that truly contribute to the effective prior. The Lagrange multiplier ζ is also connected to an inflation of the prior error covariance matrix. It can be absorbed into a rescaling of the ensemble anomalies by a factor √((N − 1)/ζ), so that the analysis Eq. (26) coincides with the usual formula. Therefore, if we define the inflation operation as x_k ← x̄ + λ (x_k − x̄), the EnKF-N forecasts the following optimal prior inflation factor:

λ_a = √((N − 1)/ζ_a).

That the EnKF-N can equivalently be rewritten as a traditional EnKF with an optimal (prior) inflation factor sheds light on why inflation can be so successful in dealing with sampling errors. At a fundamental level, the introduction of ζ was possible because of the rotational invariance of the prior in ensemble space. In short, inflation works well to compensate for sampling errors because of the exchangeability of the members in the ensemble.
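The equivalence between the ζ-analysis and a traditional analysis with inflated anomalies can be verified numerically. The short script below (our own check, with arbitrary synthetic data and an arbitrary value of ζ) shows that the state increment A w from Eq. (26) equals the increment obtained by first inflating the anomalies by √((N − 1)/ζ) and then performing the standard ensemble-space analysis with N − 1.

```python
import numpy as np

rng = np.random.default_rng(1)
M, d, N = 8, 5, 6
E = rng.normal(size=(M, N))
A = E - E.mean(axis=1, keepdims=True)        # forecast anomalies
H = rng.normal(size=(d, M))
Y, delta = H @ A, rng.normal(size=d)
R_inv = np.eye(d)
zeta = 3.7                                   # an arbitrary effective ensemble size

# EnKF-N-style analysis: zeta replaces N - 1
w = np.linalg.solve(Y.T @ R_inv @ Y + zeta * np.eye(N), Y.T @ R_inv @ delta)

# Same analysis via prior inflation by sqrt((N - 1)/zeta), then standard update
lam = np.sqrt((N - 1) / zeta)
Yi = lam * Y                                 # inflated observation anomalies
wi = np.linalg.solve(Yi.T @ R_inv @ Yi + (N - 1) * np.eye(N),
                     Yi.T @ R_inv @ delta)

# Both produce the same state increment
assert np.allclose(A @ w, (lam * A) @ wi)
```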
Finally, we found the dual approach to be more stable than the primal one. Indeed, in the primal algorithm, the BFGS iterations cannot lead to a singular Hessian by construction. However, at the end of the minimisation, one has to generate the new ensemble by computing W_a in Algorithm 1, which requires inverting the Hessian H̃_a. In exact arithmetic, this Hessian is positive definite. However, it might become numerically singular in finite precision, in very demanding conditions that we only encountered in the severe Lorenz '63 test case of Sect. 5. The dual algorithm does not meet this problem because the value of ζ_a that enters the definition of H̃_a is controlled and is guaranteed to be positive, since D(ζ) goes to +∞ when ζ vanishes.

The iterative ensemble Kalman filter
In this section, we follow the steps of Sakov et al. (2012). One minor difference is in the formulation of the iterative ensemble filter, which is written here in ensemble space. Another one is an improvement in terms of performance over the IEKF of Sakov et al. (2012), whose updated version will be called the bundle IEnKF. Another, more significant, difference consists in noticing that one has the freedom to choose any iterative optimisation scheme. Here, we shall choose the Levenberg-Marquardt scheme.

The IEnKF in ensemble space
Let us denote by x̄ the ensemble mean at time t_1 and by A the anomaly ensemble matrix at time t_1. Equivalently to using cost function Eq. (11) to perform the analysis, one can use a cost function depending on the coordinates w of x_1 = x̄ + Aw in ensemble space:

J̃(w) = (1/2) (y − H ∘ M(x̄ + Aw))^T R^{-1} (y − H ∘ M(x̄ + Aw)) + ((N − 1)/2) w^T w.

The derivation of the background term can be read in Hunt et al. (2007). The iterative minimisation of the cost function following a Newton algorithm reads

w^(p+1) = w^(p) − H̃_(p)^{-1} ∇J̃(w^(p)),

where p is the iteration index and where the gradient and the Hessian are given by

∇J̃(w^(p)) = −Y_(p)^T R^{-1} (y − H ∘ M(x̄ + A w^(p))) + (N − 1) w^(p),
H̃_(p) = Y_(p)^T R^{-1} Y_(p) + (N − 1) I_N,

where Y_(p) is the tangent linear of the operator from ensemble space to observation space that propagates the ensemble anomalies A through the model M and the observation operator H, computed at x̄ + A w^(p).
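One iteration of this scheme can be sketched in Python. This is our own illustrative code, not the authors' implementation: the tangent linear Y_(p) is approximated in the bundle fashion described later in the text, by propagating an ε-rescaled ensemble centred on the current iterate.

```python
import numpy as np

def bundle_gauss_newton_step(E1, y, R_inv, model, H, w, eps=1e-4):
    # One iteration of the ensemble-space IEnKF update, with the tangent
    # linear approximated by finite differences on a shrunk ensemble.
    N = E1.shape[1]
    x1 = E1.mean(axis=1)
    A1 = E1 - x1[:, None]
    xw = x1 + A1 @ w                    # current state iterate at t1
    Eb = xw[:, None] + eps * A1         # shrunk ("bundle") ensemble
    E2 = np.stack([model(Eb[:, k]) for k in range(N)], axis=1)
    x2 = model(xw)
    Y = H @ ((E2 - x2[:, None]) / eps)  # finite-difference proxy of the tangent linear
    delta = y - H @ x2
    grad = -(Y.T @ R_inv @ delta) + (N - 1) * w
    hess = Y.T @ R_inv @ Y + (N - 1) * np.eye(N)
    return w - np.linalg.solve(hess, grad)
```

For a linear model and a linear observation operator, the cost function is exactly quadratic and a single iteration reaches the minimiser, which provides a simple sanity check.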

Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963) is a precursor of the trust-region method in the sense that it seeks to determine when the (superlinear) Newton method is applicable and when it is not and should be replaced by the slower but safer gradient descent method (also known as steepest descent). The distinction between the two regimes is obtained from the gain ratio

θ = (J̃(w) − J̃(w')) / (L(w) − L(w')),

where w' is the new tentative vector, and L is the local quadratic expansion of J̃. Instead of determining a region of confidence as in modern trust-region methods, the algorithm shifts the Hessian in ensemble space used in the Newton method: H̃ → H̃ + µ I_N, where µ is a positive constant. If the quadratic expansion of the cost function allowed by the gradient and the Hessian matches the behaviour of the exact cost function, which corresponds to a large θ, then µ is reduced; otherwise (small or negative θ), µ is increased. When µ is small, the algorithm is close to a Gauss-Newton method, which has superlinear convergence. When µ is large enough, the algorithm is close to a gradient descent method. A textbook treatment and a clear synthesis of this well-established technique are given by Nocedal and Wright (2006) and by Madsen et al. (2004). In the following, the Levenberg-Marquardt algorithm described in Madsen et al. (2004) is adapted to our ensemble Kalman filter context.
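Outside of the data assimilation context, the µ-adaptation logic can be illustrated on a generic least-squares problem. The sketch below follows the standard scheme of Madsen et al. (2004) in spirit; the function names and tolerances are ours, and it is not the filter algorithm of Appendix B.

```python
import numpy as np

def levenberg_marquardt(r, J, w0, tau=1e-3, tol=1e-10, max_iter=100):
    # Minimise 0.5*||r(w)||^2: Gauss-Newton Hessian shifted by mu*I,
    # with mu adapted from the gain ratio theta (actual vs. predicted
    # decrease of the cost under the local quadratic expansion L).
    w = np.asarray(w0, dtype=float)
    rw, Jw = r(w), J(w)
    g = Jw.T @ rw                        # gradient of the cost
    A = Jw.T @ Jw                        # Gauss-Newton Hessian
    mu, nu = tau * np.max(np.diag(A)), 2.0
    for _ in range(max_iter):
        dw = np.linalg.solve(A + mu * np.eye(w.size), -g)
        if np.linalg.norm(dw) < tol:
            break
        w_new = w + dw
        pred = 0.5 * dw @ (mu * dw - g)  # predicted decrease L(w) - L(w_new)
        theta = (0.5 * rw @ rw - 0.5 * r(w_new) @ r(w_new)) / pred
        if theta > 0:                    # step accepted: move toward Gauss-Newton
            w = w_new
            rw, Jw = r(w), J(w)
            g, A = Jw.T @ rw, Jw.T @ Jw
            mu *= max(1.0 / 3.0, 1.0 - (2.0 * theta - 1.0) ** 3)
            nu = 2.0
        else:                            # step rejected: safer, shorter step
            mu *= nu
            nu *= 2.0
    return w
```

With a large µ the step tends to a short gradient-descent step; with a small µ it tends to the Gauss-Newton step, matching the two regimes described above.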
In the part of the algorithm which seeks a satisfying µ, a single model propagation is necessary. At each successful Newton step, the propagation of the full ensemble is required. The resulting IEnKF scheme is detailed in algorithm 3 of Appendix B. The algorithm describes both the bundle and the transform variants. When algorithmic steps differ in the two variants, the variant is explicitly mentioned.
In the bundle variant, the ensemble is shrunk by a small factor ǫ before propagation. It is chosen to be ǫ = 10^{-4} throughout this article. This is the same value as that of Sakov et al. (2012), and we found it to be suitable for the experiments described in Sect. 5. After propagation, the ensemble is inflated back by a factor 1/ǫ.
Note that in the transform variant, the last propagation of the ensemble is often unnecessary. But when the algorithm exits on, e.g., the maximum-iteration criterion, it may be necessary to propagate the ensemble, because it may not have been updated at the latest w (as opposed to the Gauss-Newton implementation of Sakov et al., 2012). Therefore, it is often possible to avoid this last propagation; however, for the sake of simplicity, we have preferred to keep the simpler implementation.

Combining the finite-size and iterative ensemble Kalman filters
The EnKF-N and the IEnKF are complementary since, when looking at the underlying cost function of the analysis, the EnKF-N modifies the prior likelihood, while the IEnKF modifies the observational likelihood. Combining the outcomes of the previous sections, the cost function used in the analysis, as a function of the coordinates w of x_1 = x̄ + Aw, should be

J̃(w) = (1/2) (y − H ∘ M(x̄ + Aw))^T R^{-1} (y − H ∘ M(x̄ + Aw)) + (N/2) ln(ε_N + w^T w).  (37)

J̃ has a global minimum in R^N since it is a positive function and since it goes to infinity as w^T w goes to infinity. Several strategies are certainly possible to minimise this cost function.

Dual approach
The first one consists in using the dual transformation put forward for the EnKF-N in Sect. 2. The derivation holds if one replaces the quadratic observation misfit with its nonlinear counterpart, (1/2) (y − H ∘ M(x̄ + Aw))^T R^{-1} (y − H ∘ M(x̄ + Aw)). In particular, the strong duality holds (this can be checked by going through Appendix A), which demonstrates that an optimal inflation factor also exists in this context. However, each of the subproblems indexed by ζ, which were previously equivalent to solving the traditional EnKF analysis in ensemble space, Eq. (26), is now equivalent to solving an IEnKF analysis with a prior inflation factor √((N − 1)/ζ). Hence, such a path may be numerically costly. Solutions such as primal-dual algorithms may be contemplated, but we prefer to explore a more straightforward way to minimise Eq. (37).

Primal approach
The second and more direct way to minimise Eq. (37) is to minimise with respect to the w coordinates, with the sole guarantee of finding a local minimum. The gradient and the Hessian are

∇J̃(w) = −Y^T R^{-1} (y − H ∘ M(x̄ + Aw)) + N w/(ε_N + w^T w),
H̃ = Y^T R^{-1} Y + N/(ε_N + w^T w) I_N − 2N/(ε_N + w^T w)^2 w w^T.

The main drawback is that the Hessian may have a non-positive eigenvalue. A priori, this precludes Newton approaches. One way around it is to use a quasi-Newton minimizer such as L-BFGS-B (Byrd et al., 1995), to which only the gradient is provided. This approach was successfully tested. However, on very rare occasions, and only for very non-Gaussian systems, the approximate tangent linear, which leads to an approximate adjoint, can drive the minimizer into re-initialising the quasi-Hessian, which may break the filter. This is nevertheless a very economical approach, as part of the work is carried out by the minimizer, known to be very efficient.
However, for this article, we prefer to report results obtained in a more controlled environment. We can use the Levenberg-Marquardt algorithm, since it uses a shifted Hessian, which can be made positive definite for large enough µ. Besides, we know that at the minimum of the cost function the Hessian is positive definite, so there is a neighbourhood of the minimum where the Hessian is positive definite. This means that, in the vicinity of the minimum, the Levenberg-Marquardt algorithm can safely rely on the Hessian of the cost function.
As a diagnostic, an intermediate result of the dual approach can be used. Indeed, from Eq. (23), and using the fact that ρ⋆ = w⋆^T w⋆ at the saddle-point, an equivalent optimal prior inflation factor can be obtained from the analysis in ensemble space:

ζ⋆ = N/(ε_N + w⋆^T w⋆).

For instance, the statistics of ζ⋆ over a long data assimilation experiment can tell us whether adaptive inflation needs to be enforced or not. Cases where ζ⋆ is strongly fluctuating are cases where the IEnKF-N may outperform the IEnKF with an optimal but constant inflation.
The details of this primal scheme are given in Algorithm 4 of Appendix B. In severe conditions, such as the one studied at the end of Sect. 5, it might be tedious to ensure that µ is always large enough to guarantee a positive-definite Hessian. One rigorous way around this is to truncate the Hessian to a positive-definite matrix in the iterative minimisation, while the gradient is left unchanged. This truncation of the Hessian does not apply when building the new ensemble. It is equivalent to a slightly sub-optimal pre-conditioning of the minimisation problem. For instance, one can choose

H̃ ≈ Y^T R^{-1} Y + N/(ε_N + w^T w) I_N,

which is an always positive-definite substitute for the Hessian.
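This truncation amounts to dropping the negative rank-one term of the Hessian of the background term, which can be coded in one line. The sketch below is ours (function name included) and assumes the Gauss-Newton form of the observation term:

```python
import numpy as np

def truncated_hessian(Y, R_inv, w, N, eps_N=1.0):
    # Positive-definite substitute for the IEnKF-N Hessian: the negative
    # rank-one term -2N/(eps_N + w^T w)^2 * w w^T is dropped, while the
    # gradient is left unchanged in the minimisation.
    return Y.T @ R_inv @ Y + (N / (eps_N + w @ w)) * np.eye(N)
```

Since the dropped term is negative semi-definite, the substitute upper-bounds the full Hessian, and it is positive definite for any w, being the sum of a positive semi-definite matrix and a positive multiple of the identity.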

Numerical experiments
Most numerical tests will be performed on the Lorenz '95 toy-model (Lorenz and Emanuel, 1998). It has M = 40 variables, and its dynamics reads, for m = 1, …, M,

dx_m/dt = (x_{m+1} − x_{m−2}) x_{m−1} − x_m + F,

where F = 8, and the boundary conditions are chosen cyclic. It is integrated using a fourth-order Runge-Kutta scheme with a time step of 0.05. With this choice of F, the model is strongly chaotic, with 13 positive Lyapunov exponents and a doubling time of 0.42 time units. Hence, it is difficult to control by filtering techniques and offers simple but severe tests for new methods. Since the model is a simplistic representation of a mid-latitude band of the global atmosphere, a time step of 0.05 in the model's time represents 6 h of physical time.
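For reference, the model and its integration scheme fit in a few lines of numpy; this is a standard textbook implementation, not the authors' code, using np.roll for the cyclic boundary conditions.

```python
import numpy as np

def lorenz95_tendency(x, F=8.0):
    # Lorenz '95 dynamics: dx_m/dt = (x_{m+1} - x_{m-2}) x_{m-1} - x_m + F,
    # with cyclic boundary conditions handled by np.roll
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    # One fourth-order Runge-Kutta step of length dt
    k1 = lorenz95_tendency(x, F)
    k2 = lorenz95_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz95_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz95_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

A typical truth run starts from the unstable fixed point x = F·1, slightly perturbed in one component, and is spun up for a while before observations are generated.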
The data assimilation experiment setup we have chosen consists in computing a reference run of the model, defined as the truth (Sakov and Oke, 2008; Bocquet, 2011; Sakov et al., 2012). Every variable of this truth is observed every Δt. Each observation is perturbed by an independent draw from a Gaussian variable of mean 0 and variance 1. Because we do not want to use localization, which would mask some of the properties of the filters, the ensemble size is chosen to be N = 40, in the rank-sufficient regime. Nevertheless, one can contemplate building local versions of the filters, similarly to what was done by Hunt et al. (2007) and Bocquet (2011). The performance of a data assimilation run is assessed by computing the average root-mean-square of the difference between the data assimilation analysis and the truth (denoted rmse in the following), for a sufficiently long run. In the following, the runs' duration, which does not include the burn-in period, is 5 × 10^4 days (physical time), and we have checked that the convergence of the statistical indicators, such as the rmse, is satisfying.

Complementary experiments on EnKF-N
The EnKF-N has been tested on the Lorenz '63 and Lorenz '95 models in Bocquet (2011). Here, we wish to report some complementary experiments related to the dual implementation of the EnKF-N introduced in Sect. 2, in its ε_N = 1 variant. The efficient rank ζ_a of the ensemble prior, as estimated by the EnKF-N formalism, is computed using Eq. (23), with ρ_a = w_a^T w_a at optimality:

ζ_a = N/(ε_N + w_a^T w_a),

for a long run of the EnKF-N applied to Lorenz '95, with the setup described above. Note that ζ_a = N − 1 would correspond to a deterministic EnKF without inflation. The time interval between updates Δt is varied: Δt = 0.05, 0.10, …, 0.60, so as to probe the critical cases where inflation is strong (when ζ_a/N is small). The statistics of ζ_a (mean, standard deviation, percentiles) are plotted in Fig. 1. This allows us to study the variability of the efficient rank or, indirectly, of the optimal prior inflation factor (as seen by the EnKF-N). It is clear that the efficient rank decreases with Δt. More importantly, the variability increases very significantly, since the tenth percentile decreases below 20 for Δt ≥ 0.50. Because of this variability of the rank in time, the optimal inflation factor diagnosed by the EnKF-N is not constant and varies with time. That is why we believe the EnKF-N may outperform the EnKF with optimised constant inflation when the system becomes significantly nonlinear. Using Eq. (42), the efficient rank has been diagnosed from the optimal inflation of the EnKF obtained by selecting the best-rmse configuration. It is also plotted in Fig. 1 and shows an evolution similar to that of the efficient rank of the EnKF-N. Since the mean and the median do not differ much, it also suggests that, most of the time, the efficient rank is rather constant, but that it occasionally suffers sudden drops. We have checked this by examining sequences of ζ_a. A typical sequence in the case Δt = 0.30 is displayed in Fig. 2.

Testing IEnKF, Levenberg-Marquardt implementation
The IEnKF is first tested in its Levenberg-Marquardt implementation, for the bundle and transform variants. Since the focus is on the ability of the IEnKF to handle strong nonlinearity, the time interval between updates Δt is varied: Δt = 0.05, 0.10, …, 0.60. The filters are run with optimal inflation: the inflation factors leading to the best rmses are selected. The termination criteria of the iterative minimisation are such that the maximal number of iterations is p_max = 40 and the precision has reached ||Δw|| = √(Δw^T Δw) ≤ e_2, with e_2 = 10^{-3}. Provided p_max is large enough (especially for large Δt), the results are largely independent of this choice. The choice of the influential criterion e_2 is more critical: a compromise must be found between precision and limiting the number of iterations. This leads to choosing e_2 = 10^{-3}, found to be satisfying for all the experiments reported in this article. The condition on the gradient was not implemented (see Algorithm 3), which means that e_1 = 0. We also chose τ = 10^{-3}, which has an influence on the iteration number, since it determines whether the minimisation at the beginning is closer to a Newton method (small τ) or to a steepest descent method. We found that in the weakly nonlinear regime of small Δt, a smaller τ could decrease the number of required iterations. This is consistent with the intuition that a steepest descent method is unnecessary in this regime and that favouring Newton's method results in a finer convergence. The results are reported in Fig. 3. As a comparison, the rmse for the EnKF with optimal inflation is also reported. It clearly shows that, at the cost of cycling the ensemble propagation, the more nonlinear the propagation is, the more the IEnKF outperforms the EnKF.
As opposed to Sakov et al. (2012), we did not find a significant difference between the transform and bundle variants for small and moderate Δt, as far as analysis rmses are concerned. The main difference is not in the Levenberg-Marquardt implementation, but in the fact that, after the full analysis cycle, we propagate the ensemble from t_1 to t_2 while avoiding the use of the rescaled ensemble employed within the cycle. This final propagation of the non-rescaled ensemble makes the bundle scheme used in this study no longer based on the extended Kalman filter, unlike the IEKF of Sakov et al. (2012).
However, note that for Δt ≥ 0.50, the transform variant outperforms the bundle variant. As opposed to the implementation of the IEnKF by Sakov et al. (2012), we found that the rmses increase rapidly beyond Δt = 0.60. Beyond this time interval, which is more than the doubling time of the dynamical system, multiple minima are likely to form in the underlying cost function Eq. (11), as pointed out by Pires et al. (1996).
The average number of model runs used until convergence is reported in Fig. 4 for the bundle and transform variants. In addition to N = 40, an ensemble of N = 25 members is also considered, in order to compare quantitatively with the results of Sakov et al. (2012). However, we did not notice any qualitative change between the outcomes of the two ensemble configurations.
It is clear that the bundle variant requires fewer iterations than the transform variant, by one to two iterations. This differs from the implementation of Sakov et al. (2012), who showed that the IEKF variant usually requires slightly more iterations than the transform variant when not using the Levenberg-Marquardt framework. Note that in the transform variant algorithm, as opposed to the bundle variant, it is legitimate to skip the computation of the forecast ensemble (lines 41-44) because it has already been done at line 29. This could lead to a gain of up to one iteration. However, in practice, we found this shortcut inefficient in our test cases, since it had a negative impact on the precision, which may have required additional iterations at a later cycle.
The large number of iterations needed at small Δt, between three and four, is not very significant, because the Levenberg-Marquardt method is not designed to converge optimally within a very few iterations. In this case, the more straightforward Gauss-Newton method, such as the implementation of Sakov et al. (2012), is preferable.
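For contrast with the Levenberg-Marquardt loop, a plain Gauss-Newton iteration, in the spirit of the Sakov et al. (2012) implementation, dispenses with the damping parameter entirely and therefore avoids the bookkeeping that costs extra iterations in nearly linear conditions. Again a generic least-squares sketch under our own assumptions, not the IEnKF itself:

```python
import numpy as np

def gauss_newton(residual, jacobian, w0, p_max=40, e2=1e-3):
    """Undamped Gauss-Newton iteration with the same stopping rule on ||dw||.

    Each step solves the normal equations of the linearised problem;
    there is no damping parameter to adapt, so weakly nonlinear problems
    converge in very few iterations.
    """
    w = np.asarray(w0, dtype=float)
    p = 0
    for p in range(1, p_max + 1):
        r = residual(w)
        J = jacobian(w)
        dw = np.linalg.solve(J.T @ J, -(J.T @ r))  # Gauss-Newton step
        w = w + dw
        if np.linalg.norm(dw) <= e2:
            break
    return w, p
```

The trade-off is the one discussed in the text: without damping there is no safeguard when the Gauss-Newton Hessian approximation degrades in strongly nonlinear conditions.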

Testing IEnKF-N
Here, the IEnKF-N, in its bundle and transform versions, is tested on the same configuration as the IEnKF. The results are reported in Fig. 5.
First, the two variants of the IEnKF-N yield the same rmses in this configuration over the full range of Δt. Secondly, in this example, they perform essentially as well as or better than the IEnKF with optimal inflation. This shows that, as hoped for, some of the properties of the finite-size EnKF carry over to the iterative variants of the EnKF.
The average number of model runs until convergence is reported in Fig. 4 for the bundle and transform variants of the IEnKF-N. As for the IEnKF experiments, the same termination criteria are chosen (p_max = 40, e_1 = 0, e_2 = 10^-3, and τ = 10^-3) for all runs. The finite-size versions of the filters require a similar number of iterations as (or fewer than) their optimised-inflation counterparts. Here again, the bundle variant does better than the transform variant: it requires on average one to two fewer ensemble propagations than the transform variant to achieve the same precision, using the same termination criteria.
Similarly to Fig. 1 in the case of the EnKF-N, the statistics of ζ_a are plotted in Fig. 6: mean, standard deviation, median, tenth percentile, and ninetieth percentile.
The results are qualitatively similar. However, as could be expected, the efficient rank is larger in the IEnKF-N case than in the EnKF-N case in the same configuration. Even though the percentile remains high for Δt ≤ 0.5, it falls off rapidly beyond. Consistently, this is where the IEnKF-N starts to slightly outperform the IEnKF with optimal inflation.

Focusing on the weakly nonlinear regime
To temper the optimistic results obtained on Lorenz '95, we would like, in this section, to illustrate and discuss apparent limitations of the formalism. To do so, we choose the quite demanding case of the 3-variable Lorenz '63 toy model, with an ensemble of 3 members. This case might not be directly relevant to high-dimensional geophysical systems, but the goal of this section is to uncover flaws or inconsistencies of the schemes and to point to directions of possible improvement.

Diagnosis
The Lorenz '63 model (Lorenz, 1963) is defined by the set of three ordinary differential equations:

dx/dt = σ(y − x),
dy/dt = ρx − y − xz,
dz/dt = xy − βz,

where σ = 10, ρ = 28, and β = 8/3. This model is chaotic, with a doubling time of 0.78 time units. It is integrated using a fourth-order Runge-Kutta scheme with a time step of 0.01 time units. In the data assimilation experiments ahead, all three variables are observed every Δt. These observations have normal uncorrelated errors of standard deviation σ_obs. All runs are 5 × 10^5 cycles long, with an additional spin-up period of 5 × 10^4 cycles. To illustrate an apparent limitation of the finite-size formalism in either the EnKF or the iterative EnKF, we vary the time interval between updates between Δt = 0.05 (nearly linear regime between update steps) and Δt = 0.50 (strongly nonlinear regime). The observation error variance is either σ^2_obs = 1 or σ^2_obs = 8. Results are first obtained for three non-iterative ensemble Kalman filters: the EnKF with optimally tuned inflation (the inflation factor that yields the best analysis rmse), the EnKF-N using its primal implementation, and the EnKF-N using its dual implementation. As discussed in Bocquet (2011) regarding the application of the EnKF-N filters to the Lorenz '63 model, the ε_N = 1 + 1/N variant should be used for such a small ensemble, where the analysed state vector is not well estimated by the ensemble mean. The fundamental difference between the primal and dual formalisms is that the primal implementation picks one minimum for the analysis, whereas the dual implementation determines the global minimum of the analysis cost function.
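The model integration described above can be sketched as follows. The function names are illustrative, but the parameter values (σ = 10, ρ = 28, β = 8/3, fourth-order Runge-Kutta with time step 0.01) are those of the text.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz63(x):
    """Right-hand side of the Lorenz '63 ordinary differential equations."""
    return np.array([
        SIGMA * (x[1] - x[0]),
        RHO * x[0] - x[1] - x[0] * x[2],
        x[0] * x[1] - BETA * x[2],
    ])

def rk4_step(x, dt=0.01):
    """One fourth-order Runge-Kutta step with the time step used in the text."""
    k1 = lorenz63(x)
    k2 = lorenz63(x + 0.5 * dt * k1)
    k3 = lorenz63(x + 0.5 * dt * k2)
    k4 = lorenz63(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def propagate(x, t, dt=0.01):
    """Propagate the state over a time interval t, e.g. the interval Dt between updates."""
    for _ in range(int(round(t / dt))):
        x = rk4_step(x, dt)
    return x
```

In a data assimilation cycle, `propagate` would be applied to each ensemble member over Δt before the analysis step.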
The analysis root-mean-square errors are reported in Fig. 7.
First of all, it is clear that the finite-size filters underperform below some threshold: Δt = 0.16 for σ^2_obs = 1, and Δt = 0.14 for σ^2_obs = 8. They ultimately diverge for too small Δt. This divergence is not directly due to the quasi-linearity of the model in that regime, but to the fact that, even in that regime, the filter still diverges without a properly tuned inflation. Indeed, we have checked that the EnKF-N behaves to a large extent like an EnKF on the linear advection (LA) model used in Sakov and Oke (2008). It is, therefore, likely that this underperformance can be traced back to the determination of the optimal prior inflation through the minimisation of Eq. (27), which may become inaccurate in that regime. If one trusts the Bayesian formalism underlying the EnKF-N, then, following the discussion in Bocquet (2011), this inaccuracy can be ascribed to (i) an inappropriate choice of the hyperprior, which is at the heart of the derivation of Eq. (2) and in fine specifies the particular form of Eq. (27), (ii) the use of the prior p(x|x_1, . . . , x_N) rather than p(x|y, x_1, . . . , x_N), or (iii) the use of a local minimum rather than the global minimum. Point (iii) can be ruled out, because the present study has essentially solved this problem.
Beyond the time interval Δt ≃ 0.15, the EnKF-N primal and dual implementations significantly outperform the EnKF with optimally tuned inflation. As was to be expected, the dual implementation is better than the primal one. Note that when Δt ≤ 0.15, the primal variant can beat the dual one. However, since this occurs in a regime where the formalism breaks down for the likely reasons given above, the difference is essentially irrelevant.
A similar experiment was performed with N = 9, closer to the asymptotic ensemble size limit. In that case, the turning point is at about Δt ≃ 0.04.
The same study was conducted for the iterative ensemble Kalman filters. The results are reported in Fig. 8. Note that we have not introduced any dual variant of the IEnKF-N; that is why only the IEnKF with optimally tuned inflation and the IEnKF-N in its primal implementation have been tested. The bundle variant was chosen. The results are qualitatively similar to those of the non-iterative filters. However, since the flow between updates is made more linear by the iterative corrections, the regime is pushed towards the qualitative behaviour observed at small Δt. Consistently, the turning point beyond which the IEnKF-N outperforms the IEnKF is pushed towards higher Δt.

An empirical solution
Exploring the small Δt regime and the behaviour of the dual cost function Eq. (27), we found that, in that regime, the argument ζ⋆ of its minimum is mostly given by the maximum of the interval, that is N/ε_N. But while N/ε_N asymptotically behaves like N − 1, it is bigger than N − 1 at finite N, for ε_N = 1 or ε_N = 1 + 1/N. For instance, in the case N = 3 and ε_N = 1 + 1/N, one has N/ε_N = 2.25, as compared to N − 1 = 2. Hence, in that regime, the EnKF-N often tends to implicitly deflate the ensemble, which may lead to an overconfident Kalman filter and to its divergence. From this educated guess, we propose to cap the argument ζ⋆ of the minimum of Eq. (27) at N − 1. Note that the case ζ⋆ = N − 1 corresponds to an absence of inflation and does not a priori guarantee the filter's stability. Moreover, rather than capping ζ⋆ directly, we prefer to modify Eq. (27) in such a way that the maximum value of ζ⋆ cannot exceed N − 1. The first reason for doing so is that the duality result of Appendix A is derived on the full interval ]0, N/ε_N], and an abrupt truncation would invalidate the duality equivalence. The second reason is that a direct capping of ζ⋆ requires access to Eq. (27), which is not straightforward in the case of the primal schemes. For these two reasons, we propose to renormalise ε_N to ε_N = N/(N − 1). That way, it is easy to check through Eq. (23) that the maximum value of ζ⋆ is ζ⋆ = N − 1, attained at ρ⋆ = 0. In the following, capping will refer to this renormalisation of ε_N, and not to the abrupt truncation of Eq. (27) to the interval ]0, N − 1].
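The arithmetic behind the renormalisation can be checked in a few lines. The sketch below computes the upper end N/ε_N of the search interval ]0, N/ε_N] for the three choices of ε_N discussed in the text; the variant labels are ours.

```python
def zeta_max(N, eps_variant="capped"):
    """Upper end N/eps_N of the interval ]0, N/eps_N] over which zeta* is sought.

    Hypothetical labels for the choices of eps_N discussed in the text:
      "one"     : eps_N = 1         -> interval end N
      "plusinv" : eps_N = 1 + 1/N   -> interval end N**2/(N+1), larger than N-1
      "capped"  : eps_N = N/(N-1)   -> interval end exactly N-1 (no implicit deflation)
    """
    if eps_variant == "one":
        eps = 1.0
    elif eps_variant == "plusinv":
        eps = 1.0 + 1.0 / N
    else:  # "capped" renormalisation
        eps = N / (N - 1.0)
    return N / eps
```

For N = 3, the "plusinv" choice gives N/ε_N = 2.25 > N − 1 = 2, reproducing the implicit-deflation example of the text, while the capped choice gives exactly 2.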
The resulting performance of the capped EnKF-N, in its primal and dual variants, is displayed in Fig. 9. With our setup, the dual EnKF-N outperforms or equals the EnKF with optimally tuned inflation over the enlarged range Δt ∈ [0.05, 0.50]. The benefit of the dual filter over the primal variant is now even clearer in the almost linear regime.
In that regime, ζ⋆ remains close to N − 1, with occasional excursions to smaller values. These events are mostly triggered by transitions of the Lorenz '63 dynamics between lobes. Because of the short Δt, it is only on these occasions that the ensemble departs from the control run, whereas most of the time the system is in a quasi-linear regime where almost no inflation is needed. The behaviour of ζ⋆ is illustrated in Fig. 10 for the case Δt = 0.05.
The corresponding results for the bundle IEnKF-N, in its primal variant, are reported in Fig. 11. Even though the capping essentially solves the divergence problem diagnosed earlier, the IEnKF-N underperforms the IEnKF with optimal inflation in the nearly linear regime. Guided by the capped primal and dual EnKF-N results, we conjecture that a dual variant of the capped IEnKF-N would at least partially close this gap. However, implementing the dual IEnKF-N is certainly challenging and is left as an open question.
In addition to being empirical, the capping solution cannot be fully satisfying, since it leads back to some tuning. However, in the context of this numerical experiment, no scalar was tuned: it was only necessary to distinguish the weakly nonlinear regime from the rest of the Δt range. This also suggests that, in the context of this experiment, a genuinely satisfactory solution, one that avoids tuning of inflation or distinguishing between regimes, would necessitate a fine modelling of the hyperprior, beyond Jeffreys' prior, which leads to the particular form of the dual cost function Eq. (27).

[Fig. 10 caption] Sequence of efficient ranks ζ_a in a long run of a capped dual EnKF-N applied to the Lorenz '63 model, with σ^2_obs = 1 and Δt = 0.05, which corresponds to a nearly linear regime, and with an ensemble of N = 3 members. The lower graph displays the x variable of the Lorenz '63 reference trajectory (the truth). Note the good correlation between the periods of relative stability of ζ⋆ and the presence of the true state in orbit within a lobe.

Conclusion
In this article, we have further explored two recently developed ensemble Kalman filters meant to operate in high-dimensional geophysical systems. We have presented and justified their algorithms. They have mostly been tested on the Lorenz '95 chaotic toy model.
The first filter, the finite-size ensemble Kalman filter, is built to reduce the impact of sampling errors and was shown, in a perfect model context, to significantly reduce the need for inflation. We have shown that the scheme can be made equivalent to a traditional deterministic EnKF, but with an inflation of the prior whose value is determined by the minimum of a one-dimensional, not necessarily convex, cost function. Assuming that the EnKF-N does significantly reduce the need for inflation by accounting for sampling errors, this suggests why inflation is usually so efficient at accounting for sampling errors: this efficiency mathematically stems from the invariance of the ensemble under permutation of its members. Through a dual transformation, it was shown how to find the global minimum of the EnKF-N analysis cost function. This solves one of the open questions raised by Bocquet (2011).
The second filter, the iterative ensemble Kalman filter, outperforms the EnKF in strongly nonlinear regimes, at the cost of iterating the ensemble propagation between updates. The implementation of Sakov et al. (2012) corresponds to a Gauss-Newton method for minimising the underlying analysis cost function. In this article, we have proposed to use a Levenberg-Marquardt implementation of the filter instead. It offers a better control on the convergence and on the positive-definiteness of the Hessian, and seems to require fewer iterations in strongly nonlinear conditions. However, it is less efficient in mildly nonlinear conditions, where the number of required iterations is small. The transform and bundle variants of the IEnKF lead to very similar results, and we suggest that they should be considered as variants of the same IEnKF filter.

Exploiting their complementarity, the two filtering approaches have been combined into an inflation-free iterative ensemble Kalman filter. The Levenberg-Marquardt scheme provides the necessary control on the minimisation of the non-convex underlying cost function. Its performance is close to that of the IEnKF with optimised inflation. The number of ensemble propagations required by the IEnKF-N seems to be very similar to that of the IEnKF in the same context.
In this article, we have deliberately set aside the rank problem of the EnKF. To go beyond this, and extend the results to low-rank ensembles, one should additionally build localization into the algorithm, as was for instance done for the EnKF-N in Bocquet (2011). But this would take us far from the primary objectives of this article.
A possible improvement over the iterative filters in their Levenberg-Marquardt implementation would be to reshape the algorithm so that it avoids fundamentally unnecessary extra iterations for small time intervals between updates. One has to keep in mind that the Levenberg-Marquardt method was designed to reduce the large number of iterations needed when optimising a system built on a strongly nonlinear model and, in its present form, is not optimal for a system built on a mildly nonlinear model.
Another previously identified challenge is to build an optimisation algorithm for the cost function of the dual IEnKF-N.
In the last section of this article, we have identified a demanding regime, the almost (but not exactly) linear regime with a small ensemble, in which the finite-size EnKFs (iterative or not) apparently do not succeed in determining a proper effective inflation. We have proposed a motivated but empirical solution, which consists in capping the diagnosed number of degrees of freedom in the ensemble at N − 1 through a modification of the dual cost function Eq. (27). For the cases under scrutiny, this offers a satisfactory solution in the EnKF-N case and a promising one in the IEnKF-N case. However, it surely requires a more robust justification, such as the derivation of the modified dual cost function from a better fitted hyperprior. Additionally, it was shown in this special case that knowledge of the global minimum of the analysis, which is provided by the dual algorithm, leads to a systematically better performance than that of the primal algorithm.
For the longer term, we believe that this finite-size iterative ensemble Kalman filter can be seen as a first elementary one-lag brick of an efficient ensemble Kalman smoother.

Using the concavity of the ln function, which implies in particular, for any ρ ≥ 0,

N ln(ε_N + ρ) ≤ N ln ε_N + (N/ε_N) ρ,

it is found that, for any ρ > 0,

f**(ρ) = N ln(ε_N + ρ).
Therefore, f coincides with its bi-conjugate f**.
To prove strong duality, we consider the dual problem and gradually transform it into the primal one. The key assumptions that make this derivation possible are the following. First, the infima of D(ζ) and J(w) in Eq. (A5) are attained, which makes the derivation meaningful. Secondly, we used the fact that f, defined by Eq. (18), coincides with its bi-conjugate, f** = f, as proven above. It can also be checked that the infimum of D(ζ) over ζ > 0 is attained in ]0, N/ε_N].