Quasi static ensemble variational data assimilation : a theoretical and numerical study with the iterative ensemble Kalman smoother

The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a GaussNewton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This 5 is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It consists in gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces 10 to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.

Abstract.The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization.Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function.If this minimization uses a Gauss-Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum.Otherwise the method may converge to a local extremum, which degrades the analysis.With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy.This is unfortunate because the assimilation performance also increases with this temporal extent.However, a quasi-static (QS) minimization may overcome these local extrema.It accomplishes this by gradually injecting the observations in the cost function.This method was introduced by Pires et al. (1996) in a 4D-Var context.
We generalize this approach to four-dimensional strongconstraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble.This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms.We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods.Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows.We also examine the computational cost of QS implementations and suggest cheaper algorithms.

Context
Data assimilation (DA) aims at gathering knowledge about the state of a system from acquired observations.In the Bayesian framework, this knowledge is represented by the posterior probability density function (pdf) of the system state given the observations.A specificity of sequential DA is that observations are not directly available; they become available as time goes by.Thus, the posterior pdf should be regularly updated.
In order to do so, one usually proceeds in two steps: the analysis and the propagation (or forecast).During the analysis step, a background pdf is used as a prior together with the observation likelihood to construct the (often approximate) posterior pdf, following Bayes' theorem.During the propagation step, this posterior pdf is propagated in time with the model to yield the prior pdf of the next assimilation cycle.
In general these posterior and prior pdfs are not easily computable.In the Kalman filter, assumptions are notably made on the linearity of operators, to keep these pdfs Gaussian; this way, they are characterized by their mean and covariance matrix.Linear algebra is then sufficient to enforce both Bayes' theorem and the propagation step into operations on means and covariances.
However, with nonlinear models, the Kalman filter assumptions do not hold as the posterior and prior pdfs are not Gaussian anymore.A possibility in this case is to enforce Gaussianity with approximations.This requires the selection of mean and covariances intended for the Gaussian surrogate pdfs.With the Kullback-Leibler divergence, the best Gaus-sian approximation of a pdf is achieved by equating the mean and covariances (see, e.g., Bishop, 2006).However, the integrations necessary to evaluate these moments are also prohibitive.
In the 4D-Var algorithm (see, e.g., Lorenc, 2014, and references therein), the Laplace approximation gives us a way to work around the problem by replacing the posterior mean with the presumed unique minimizer of the cost function over all input values.A model propagation is then sufficient to estimate the prior pdf mean.This global approach calls for efficient global optimization routines.However, in practice, solving a global optimization problem is challenging when the number of unknowns is large, and local methods focused on finding a minimizer over an open subset like Gauss-Newton are often preferred (see, e.g., Björck, 1996).
Unfortunately, the Gauss-Newton method's ability to locate the global minimum depends on the minimization starting point and the cost function properties.Furthermore, missing this global minimum is likely to cause a quick divergence (from the truth) of the sequential DA method.Thus, it is critical for the assimilation algorithm to keep the minimization starting point in a global minimum basin of attraction.

Quasi-static variational data assimilation
Keeping the minimization starting point in a global minimum basin of attraction is constraining because, with a chaotic model, the number of local minima may increase exponentially with the data assimilation window (DAW) time extent L (Pires et al., 1996;Swanson et al., 1998).Unfortunately, assuming a perfect, chaotic -and hence unstable -model, the assimilation performs best for the longest time extents.Several strategies have been investigated to go beyond this restriction.Pires et al. (1996) propose the quasi-static (QS) minimization in a 4D-Var context: as the observations are progressively added to the cost function, the starting point (or first guess) of the 4D-Var minimization is also gradually updated.This led to the method known as QSVA for quasistatic variational assimilation.Ye et al. (2015) propose the gradual increase of the model error covariances in the weakconstraint 4D-Var cost function in a minimization over an entire trajectory; this way the model nonlinearity is gradually introduced into the cost function (see also Judd et al., 2004).They also propose to parallelize this minimization over multiple starting points to increase the chance to locate the global minimum.
On the one hand, 4D-Var benefits from the QS approach to approximate the posterior and prior means.On the other hand, with traditional 4D-Var, the prior covariance matrix is taken as static.This is appropriate when only one cycle of assimilation is considered.But this limits the dynamical transfer of error statistics from one cycle to the next, for instance when Pires et al. (1996) propose gradually moving their fixed-lag data assimilation window in order to build a sequential QSVA.

Ensemble variational methods
By contrast, four-dimensional ensemble variational (EnVar) schemes allow one to perform both a nonlinear variational analysis and a propagation of dynamical errors via the ensemble (see chap. 7 of Asch et al., 2016).The improvement brought by QS minimizations to these schemes has been suggested and numerically evaluated in Bocquet andSakov (2013, 2014) and Goodliff et al. (2015).This motivates a more complete analytical and numerical investigation.
The iterative ensemble Kalman smoother (IEnKS) (Bocquet and Sakov, 2014;Bocquet, 2016) is the archetype of such a four-dimensional nonlinear EnVar scheme, where the ensemble parts of the algorithm are deterministic.Using low-order models (usually toy-models), it was shown to significantly outperform 4D-Var, the EnKF or the ensemble Kalman smoother in terms of accuracy.
The IEnKS improves the DA cycling by keeping track of the pdfs' mean and covariance matrix.To do this, a Laplace approximation is used to replace the posterior mean and covariance matrix with the minimizer of the cost function and an approximation of the inverse Hessian at the minimizer, respectively.These moments are then used to update the ensemble statistics.The updated ensemble is next propagated to estimate the prior mean and covariance matrix.Hence, it is also critical for the IEnKS to locate the global minimum of the cost function.
Here, we are interested in the application of the QS minimization to the IEnKS.One of the variant of the IEnKS called the multiple data assimilation (MDA) IEnKS was shown (Bocquet and Sakov, 2014) to be quasi-static by design and can be seen as the EnVar generalization of the sequential QSVA by Pires et al. (1996).It was first tested in Bocquet and Sakov (2013).However, the MDA IEnKS is a specific variant of the IEnKS whereas we see here the IEnKS as an exemplar of deterministic nonlinear 4D EnVar methods.Goodliff et al. (2015) have applied QSVA numerically to a collection of hybrid and EnVar techniques on the Lorenz 1963 model (Lorenz, 1963), where they vary the magnitude of nonlinearity.Nonetheless the focus of their study was not cycling and the transfer of information from one cycle to the next, which is critical to EnVar methods.They also showed that the ensemble transform Kalman smoother outperforms all of the benchmarked methods.We have furthermore shown that this smoother was systematically outperformed by the IEnKS by design, which was also demonstrated on numerics (Bocquet and Sakov, 2013).This strengthens our claim that the IEnKS can be used here as an exemplar of deterministic nonlinear 4D EnVar methods.

Outline
The rest of the paper is organized as follows.In Sect.2, the performance dependency of 4D-Var and IEnKS algorithms Nonlin.Processes Geophys., 25,[315][316][317][318][319][320][321][322][323][324][325][326][327][328][329][330][331][332][333][334]2018 www.nonlin-processes-geophys.net/25/315/2018/ on the DAW parameters is investigated.The emphasis is on the transfer of information from one cycle to next, which distinguishes the IEnKS from 4D-Var.In order to do this, a brief presentation of 4D-Var and the IEnKS algorithms is given.We then define a measure of performance for assimilation algorithms.This definition is used to give analytic expressions for the accuracy of both algorithms with a linear, diagonal, autonomous model.This quantifies the impact of cycling on the algorithms.After these preliminaries, the nonlinear, chaotic case is studied.In Sect.3, we provide and describe the algorithms of the quasi-static IEnKS (IEnKS QS ).Section 4 is dedicated to numerical experiments with two Lorenz low-order models and to the improvement of the numerical efficiency.Conclusions are given in Sect. 5. We emphasize that the algorithmic developments of this study are not meant to improve either high-dimensional or imperfect model data assimilation techniques.Even if Miller et al. (1994) show some similarities between the perfect and imperfect settings of a model of intermediate complexity, model error would generally forbid the use of very long DAWs as sometimes considered in this study.Instead, the objective of this paper is to better understand the interplay between chaotic dynamics, ensemble variational data assimilation schemes and their cycling, irrespective of whether they could be useful in high-dimensional systems.

The data assimilation window and assimilation performance
After reviewing 4D-Var (in a constant background matrix version) and the IEnKS algorithms, we investigate the dependency of assimilation performance on the DAW key parameters.This illustrates the cycling improvement brought in by the IEnKS compared to 4D-Var.We will see that, with chaotic models, the longer the DAW is, the better the accuracy of these algorithms.Which highlights the QSVA relevance for cycled data assimilation.The evolution and observation equations of the system are assumed to be in the following form: where the unknown state x l at time t l is propagated to t l+1 with the model resolvent M : R m → R m .The model is assumed to be perfect (no errors in Eq. (1b)) and autonomous (M does not depend on time).The observation operator H : R m → R d relates the state x l to the observation vector y l .The observation errors (ε l ) l ≥ 0 are assumed to be Gaussian with mean 0 ∈ R d and covariance matrix R ∈ R d×d ; they are uncorrelated in time.
Figure 1.Schematic of a DAW.The state variable at t 0 is x 0 , the observation vector y K at time t K is the first of the DAW to be assimilated and the observation y L at present time t L is the last to be assimilated.These observations, possibly observation vectors, are represented by black dots.

4D-Var and IEnKS algorithms
Both 4D-Var and the IEnKS use a variational minimization in their analysis step.The objective of this minimization is to locate the global maximum of the posterior pdf p x 0 |y L:K of the system past state x 0 given the observations y L:K = y K , . .., y L at times t L:K = [t K , . .., t L ] ∈ R S .The system state and observations are seen as random vectors with values in R m and R d , respectively.The posterior pdf quantifies how our knowledge on the state x 0 changes with realizations of y L:K .Thus, its maximum is the most probable state after assimilating the observations.The DAW is displayed in Fig. 1.The parameters K and L are the time index of the DAW first and last assimilated observation batch, respectively.The number of observation vectors used within the DAW is L − K + 1.
To specify this posterior pdf, we have to make further assumptions on x 0 .
The initial state x 0 is assumed to be Gaussian with mean x b 0 ∈ R m and covariance matrix B ∈ R m×m : Following these assumptions, an analytic expression can be obtained for the posterior smoothing pdf p x 0 |y L:K at the first cycle, or for the cost function associated with this pdf.The latter is defined as The notation G is used rather than the traditional J to refer to an exact cost function, i.e., a cost function defined from an exact posterior pdf.Bayes' theorem consequently yields the following: and with the Gaussian assumption applied to the background and observation errors we find www.nonlin-processes-geophys.net/25/315/2018/ Nonlin.Processes Geophys., 25, 315-334, 2018 where x 2 A = x T Ax is the norm of x associated with a symmetric positive definite matrix A; c 0 is a normalization constant ensuring e −G(x 0 |y L:K ) dx 0 = 1; M l stands for l compositions of M.
The propagation corresponds to a time shift of S time steps.Thus, at the kth assimilation cycle, the posterior pdf is p x kS |y kS+L:K .Using Bayes' theorem the kth cycle cost function is where − ln p x kS |y (k−1)S+L:K is the background term.If the model and observation operators are nonlinear, an analytical expression for this prior is not accessible and one needs to approximate it.The 4D-Var and the IEnKS algorithms are solutions based on distinct approximation strategies.The 4D-Var cost function at the kth cycle, based on the static error covariance matrix B, is defined by The analysis of 4D-Var consists in by minimizing Eq. ( 7), yielding x a kS at t kS .Because this cost function depends on realizations of the random observations, x a kS is also a random variable.This analysis is then propagated at time t (k+1)S with the resolvent of the model to produce the next cycle background state: In general, G x 0 |y L:K and J x 0 ; y L:K , x b 0 only coincide at the first cycle of an assimilation because of the assumption outlined by Eq. (2).Subsequently, the background term of the 4D-Var cost function is a Gaussian approximation of the exact background term − ln p x kS |y (k−1)S+L:K .By definition, the background error covariance matrix B of the traditional 4D-Var cost function is the same for each cycle -this is not the case for the IEnKS.
The IEnKS (Bocquet and Sakov, 2014) is an ensemble method with a variational analysis.Two versions of the algorithm exist: the singular data assimilation (SDA) version where observations are assimilated only once, and the multiple data assimilation (MDA) version where they are assimilated several times.We focus on the SDA version in the theoretical development.The MDA version, which can be seen as the first published quasi-static EnVar scheme is used in the numerical experiments for comparison.Note that, for the SDA IEnKS, the number of observations is At the kth cycle of the IEnKS, the background ensemble at t kS is obtained by a propagation from the previous cycle.The ensemble members are the columns of the matrix E b kS , which is seen as a random matrix with values in R m×n .It is used to estimate the prior mean and covariance matrix as follows: where E and C are the expectation and covariance operators, respectively; x b kS and X b kS are the empirical mean and normalized anomaly of E b kS , respectively: with 1 n = [1, . .., 1] T ∈ R n a vector of ones and I n is the identity of R n .Note that Eqs. ( 9) and ( 10) are approximations because of sampling errors.If the state vector is of the form with w kS ∈ R n the control variable in the ensemble space, the IEnKS cost function is defined in the ensemble space by where • = • I with I the identity matrix.The analysis of the IEnKS consists in minimizing this cost function.It yields an updated ensemble E a kS at time t kS verifying with x a kS and X a kS representing the empirical mean and normalized anomalies of the analyzed ensemble, respectively, w a kS the cost function minimizer, ∇ 2 J w a kS ; y kS+L:kS+K , E b kS its Hessian at this minimum usually approximated in the Gauss-Newton method by Analysis Propagation Analysis Chaining of the 4D-Var and IEnKS first two cycles with S = 3, L = 4.The first 4D-Var analysis uses the background x b 0 at t 0 and the observations y L:K to give the analysis x a 0 at t 0 .It is propagated S steps forward in time to produce the new background at time t S where another analysis can be performed.The IEnKS does the same but with an ensemble.The dashed rectangle symbolizes the current DAW, black dots represent the observations assimilated in the current cycle, gray dots represent already assimilated observations and white dots represent observations not assimilated. with kS w kS , in order to avoid computing the model second derivatives.The exponent −1/2 refers to the unique symmetric definite positive inverse square root of a symmetric definite positive matrix and U ∈ R n×n an orthogonal matrix such that U1 n = 1 n .This analyzed ensemble is also propagated to time t (k+1)S to produce the next cycle background ensemble The cycle is completed by using the cost function Eq. ( 14) with time indexes incremented by S in the next analysis.4D-Var and the IEnKS are displayed in Fig. 2.

Performance of assimilation
In order to evaluate the efficiency of 4D-Var and the IEnKS with DAW parameters S and L, two measures of accuracy are investigated here: the usual empirical RMSE, and a theoretical counterpart.
At the kth cycle, the algorithm generates at time t kS an analysis x a kS from the observations.This analysis is propagated with the model l steps forward in time to yield the analysis x a kS+l meant to approximate the system's true state x kS+l .A traditional measure of assimilation performance is the root mean square error (RMSE).It is defined by The RMSE takes different names depending on when it is computed.If l = L, it is called the filtering RMSE; if 0 ≤ l ≤ L − 1, it is the smoothing RMSE with lag L − l.In the following, the smoothing RMSE will correspond to the one with (maximum) lag L.
The RMSE rigorously depends on the random variable realizations, and thus it is also a random variable.In our numerical experiments, as is usually the case, the RMSE is averaged over the cycles to mitigate this variability: Let us assume that there is a random couple x ∞S+l , x a ∞S+l whose distribution is invariant and ergodic with respect to the shift transformation: Then by Birkhoff's ergodic theorem (see Walters, 1982) the sequence (aRMSE N ) N converges when N → ∞ and its limit aRMSE verifies where the expectation E is taken over p x ∞S+l , x a ∞S+l .In this case, the aRMSE measures the long-term impact of the cycling on the assimilation accuracy.This limit is difficult to exploit algebraically.That is why, in the theoretical developments, we prefer the expected MSE (eMSE), denoted by P as follows: where the expectation is taken over p x kS+l , x a kS+l .In the following subsection, we will focus on the long term impact of the cycling on P .Simplifying assumptions will be made to express P ∞S+l = lim k→∞ P kS+l as a function of S and L.

Performance in the linear, diagonal, autonomous case
In order to obtain analytical expressions of the eMSE for 4D-Var and the IEnKS, we make drastic simplifying assumptions.
First, the model is assumed to be the resolvent of a linear, diagonal, autonomous ordinary differential equation.Thus, it can be expressed as where M = diag(α i ) i=1..m is diagonal and does not depend on x.We further assume H = hI m , B = bI m , R = rI m , where I m is the identity matrix of R m and h, r, b > 0. With these assumptions, Appendix A provides an expression for the 4D-Var asymptotic eMSE in the univariate case.The generalization to the diagonal multivariate case is obtained by summing www.nonlin-processes-geophys.net/25/315/2018/ Nonlin.Processes Geophys., 25, 315-334, 2018 up the eMSEs of each direction: The case i ≥ 1 means that too much credit is given to the background variance, which is approximated for all cycles by the constant b in our 4D-Var scheme.Therefore, the information carried by the observations is not sufficient to mitigate the exponential growth of errors in the propagation.
Concerning the IEnKS, the anomalies are assumed to be full rank to avoid any complication due to singular covariance matrices.Moreover, the linearity of the model is employed to express the background statistics: This way, sampling errors are avoided as they are not the focus of this study.The background ensemble E b kS becomes a notational shortcut for the pair x b kS , X b kS .This simplified IEnKS is actually a Kalman smoother (Cosme et al., 2012;Bocquet and Carrassi, 2017).With these assumptions, Appendix B gives an expression for the IEnKS asymptotic eMSE in the univariate case.The optimality of the IEnKS eMSE is also proven.The generalization to the diagonal multivariate case is also obtained by summing up the eMSEs of each direction: This expression shows that the eMSE components on the stable directions are null.Indeed one expects the IEnKS to be at least more efficient than a free-run, whose errors in the stable directions tend to zero.1 This is not the case for 4D-Var since the static background covariance matrix introduces spurious variance in the stable directions as seen in Eq. ( 24).
In Trevisan et al. (2010), 4D-Var error variances in the stable directions are forced to zero to improve the accuracy of the assimilation.
where c 1 does not depend on S, L, l.The contribution to P IEnKS ∞S+l on the stable direction is zero.It depends only on the lag L − l; it does not depend on S.Moreover, the filtering eMSE (l = L) is constant: the propagation compensates for the analysis.The smoothing eMSE (l = 0) decreases exponentially with L.
Concerning 4D-Var, we assume K to be fixed and S → ∞ to give an asymptotic eMSE expression: where c 2 is constant with S, l and o (1) → 0 when S → ∞.
The unstable component is close to the IEnKS overall eMSE.
The biggest difference with it concerns the eMSE on the stable component.The inexact background variance modeling adds a detrimental term to the eMSE.To qualify the long term impact of the cycling on the errors, the filtering eMSE is more instructive than the smoothing eMSE.Indeed, the smoothing eMSE is improved with L as it adds future observations (with respect to the analysis time) in the DAW.This improvement dominates the potentially detrimental impact of the Gaussian background approximations.In the filtering eMSE, this improvement is balanced by the propagation of the analysis at the end of the DAW.In Fig. 3, the filtering eMSE stable component is mitigated such that it has little effect.However, the parameter S improves the filtering eMSE on the unstable component.The bigger S becomes, the closer P 4D-Var ∞S+l is to P IEnKS ∞S+l , which is optimal (cf.Appendix B).A qualitative explanation is that to assimilate the same numbers of observations, a 4D-Var using high values of S needs fewer cycles; therefore, it less often relies on the background approximation, making the analysis more trustworthy.
Figure 4 displays the asymptotic eMSEs of both algorithms as a function of the lag for S = L = 5; these curves are similar to those of Trevisan et al. (2010).Concerning 4D-Var, the eMSE can be written as a function of the lag L − l as follows: is an exponentially increasing function of the lag.The sum therefore decreases when the unstable component is dominant and increases when the stable component is dominant.
In this section, we studied the accuracy of both the cycled IEnKS and 4D-Var eMSEs as a function of the DAW parameters under linear, autonomous, diagonal assumptions.We found that the DAW parameter L improves the smoothing eMSE and S improves the filtering eMSE.These properties will be discussed and numerically investigated in a nonlinear context in the next section.

Performance in the nonlinear, chaotic case
The results of Sect.2.3 foster the use of the largest possible L to improve the 4D-Var and IEnKS smoothing eMSEs.For filtering, the error propagation at the end of the DAW balances the gain in eMSE due to the assimilation of future observations.Thus, the filtering eMSE is not improved by the assimilation of observations distant in time; it is affected by the background pdf approximation.To assimilate the same number of observations, algorithms using high values of S need fewer cycles.Therefore, they less often rely on the background approximation.That is why the 4D-Var filtering performance is improved with S. For the IEnKS as in Sect.2.3, the prior pdf approximation is exact so that there is no dependence on S.This is no longer true in the nonlinear case.Hence, with a similar reasoning, one actually expects an improvement on the IEnKS filtering performance with S. As a matter of fact, it has been shown that with a nonlinear chaotic model, the filtering accuracy increases with L in most cases (see Bocquet and Sakov, 2014, and Sect. 4 of the present paper).

Multiple local minima
However, with a chaotic model, Pires et al. (1996) show that the 4D-Var cost function number of local extrema increases with L, making minimization problematic.We show in this section that the IEnKS cost function suffers from the same problem.
This behavior will be illustrated with the Lorenz 95 (L95) model (Lorenz and Emanuel, 1998).It represents a midlatitude zonal circle of the atmosphere and is described by a set of m nonlinear differential equations: where x j is the j -th modulo m component of x, m = 40 and F = 8.This equation is integrated using a fourth-order Runge-Kutta scheme with a time step of δt = 0.05.The dynamics of L95 are chaotic; the L95 model's largest Lyapunov exponent is λ 1.7.
Figure 5 shows a typical IEnKS cost function profile in one direction of the analyzed ensemble space for multiple values of the DAW parameters.The system is observed at every time step and The curves have more and more local extrema as L increases.The curves with the highest amplitudes of the ripples are found for small values of S. Indeed, an averaging effect may settle in as the number of observations increases.
This hilly shape causes minimization problems.A possible minimization procedure for the IEnKS is the Gauss-Newton (GN) algorithm (e.g., Björck, 1996).GN is not a global procedure meaning that, depending on the starting point and the cost function properties, the algorithm can con-verge towards any local extremum, take many iterations or even diverge.However, if the cost function is quadratic, the global minimum is reached in one iteration.The cost function non-quadraticity is induced by the model nonlinearity.
In the following, we will give a heuristic argument yielding a bound on the S parameter beyond which the GN method probably misses the global minimum in the configuration H = B = R = I m .To some extent, our argument can be seen as an improvement on the notion of useful DAW length introduced by Pires et al. (1996) beyond which the performance gain is negligible.In contrast with the useful DAW length, we account in the following for both the cycling and the nonlinearity.

Effective data assimilation window length
First, the GN convergence properties are drastically simplified.We assume the method converges to the global minimum if, and only if, the minimization starting point is in a neighborhood of the global minimizer where the IEnKS cost function is almost quadratic.
Unfortunately, this minimizer is unknown because the cost function depends on realizations of many random variables.In order to eliminate this variability, Pires et al. (1996) introduced a so-called error-free cost function.We prefer an averaged cost function J ∞S in this particular study defined by Relying on an ergodicity assumption, Appendix C proves that this averaged cost function verifies where the ergodic random variables x ∞S , X b ∞S and x ∞S have been defined in Sect.2.2 and Appendix C. As seen in Eq. ( 32), a sufficient condition for the starting point w = 0 to be in a neighborhood of the global minimizer where the cost function is assumed almost quadratic requires that x b ∞S be in a neighborhood of x ∞S where all the M l K≤l≤L are almost linear.
In the univariate case, if the model behavior is almost linear and unstable, we can use Eq. ( 26) to estimate the terms in the sum in Eq. ( 32) at the starting point w = 0: where α is the linear part of the model and the extra S accounts for the propagation.But in a necessarily bounded physical system, the right-hand side of Eq. ( 33) cannot grow indefinitely with l + S. Such model saturation imposes with B representing a bound.Hence, Eqs. ( 33) and ( 34) yield the following inequality on S: We choose l = L because it corresponds to the most constraining case.When Eq. (35a) is violated, x b ∞S departs from x ∞S such that the nonlinearities of M L are significant.
To apply this inequality to the L95 model we choose the following: σ being the mean of the singular values greater than 1.This corresponds to an average of the error amplification by dM dx (x ∞S ) in the local unstable subspace.We also choose the average squared norm between two long trajectories for the bound B: This quantity is greater than E δx b ∞S+l 2 because the IEnKS asymptotic performance is at least better than a free run.From the values of α and B we find S max = 14.
Figure 6 shows the filtering and smoothing aRMSEs of an IEnKS L = S with L95 as a function of S; the performance strongly deteriorates for S > 16, which is remarkably consistent with our estimation.
Figure 6 also shows another difference with the linear case: the IEnKS filtering aRMSE depends on S. The former discussion on the local extrema explains this dependency for large values of S.
However, for small values of S, the decreasing aRMSE has not been explained.This is a consequence of the Gaussian background approximation.At each cycle, the IEnKS uses the background ensemble E b kS to estimate the first two moments of the background pdf and makes the following ap-  proximation: Because the model is nonlinear, this pdf is unlikely to be Gaussian.Therefore, this approximation results in a loss of information and the more it is used, the farther from G the IEnKS cost function is.This is exactly what happens when S is small: to assimilate the same number of observations, the IEnKS uses more cycles so that it relies more on the Gaussian background approximation.
3 Quasi-static algorithms We have seen in the previous section that the effective DAW length is constrained by the cost function non-quadraticity.In this section we review and propose algorithms able to overcome these minimization issues and reach longer DAWs.Quasi-static algorithms were introduced by Pires et al. (1996) in a 4D-Var context.The idea behind QSVA is to control the way observations are added to the cost function in order to keep the minimization starting point in the basin of attraction containing the global minimizer.The method is carried out by repeating the minimization with an increasing number of observations: the first minimization is performed using the cost function with a single observation vector, then the number of observation vectors is increased and another minimization can be performed with the former minimizer as the new starting point.The process is then repeated until all observations are accounted for.This procedure is directly applicable to the IEnKS cost function minimization.The left panel in Fig. 7 is a schematic of a QS minimization and Algorithm 1 gives the pseudo-code of an IEnKS with a QS minimization.The new parameters (L q ) q<N Q control the number of observations added at each minimization, where N Q is the total number of batches of observation vectors.
The first three lines initialize the minimization starting point, the ensemble mean and anomaly matrix.The for loop in lines 4-23 repeats the QS minimization.The while loop in lines 6-22 is the Gauss-Newton minimization.Lines 7 and 8 center the ensemble on the current minimizer.Lines 9 and 10 initialize the cost function gradient and the approximate Hessian.The for loop in lines 12-18 compute the observation terms of the cost function, the gradient and the approximate Hessian.Lines 13 and 14 use a finite difference formula to compute the tangent linear and adjoint of w −→ H • M l x b 0 + X b 0 w .Lines 15 and 16 use this adjoint to update the gradient and approximate the Hessian.Lines 19 and 20 solve the linear system of the Gauss-Newton algorithm to update the current minimizer.When GN convergence is reached, this minimizer will be used as a starting point for the next QS minimization.Line 24 updates the ensemble.Line 25 propagates the updated ensemble to the next assimilation cycle.Figure 5 illustrates the QS scheme on a single analysis, as described in the caption.
To cycle the scheme, the DAW is then shifted with a small S.This ensures minimal cost function deformation, since few vectors of observation enter and few leave the DAW.This new cost function is then minimized using the forecast of the preceding minimizer as a starting point.
To keep this cost function deformation statistically consistent, Bocquet and Sakov (2014) advocate the use of S = 1 and assimilate one observation vector at the end of the DAW, which avoids multiple assimilation of the same observations.It is also the easiest way to ensure inequality (Eq.35a).However, this method has been shown to be suboptimal because of the frequent Gaussian background approximations.Moreover, it ultimately fails to be QS since there is only one distant observation vector per analysis.
An alternative is to keep S = 1 but use all observations in the DAW and, consequently, relax the condition K = L−S + 1.This way, observations are assimilated several times.This is done in the MDA IEnKS (Bocquet and Sakov, 2014).To keep the statistics consistent, at least in the linear/Gaussian case, the observations error covariances should be adequately altered.Hence, the MDA IEnKS is truly a QS scheme and makes a good reference scheme for our numerical experiments.However, the multiple assimilation of the observations introduces spurious correlations in the nonlinear/non-Gaussian case, which entail sub-optimality.A scheme similar to the IEnKS QS has also been successfully used in Carrassi et al. (2017) to compute model evidence.Indeed, the efficient computation of model evidence as an integral over the state space depended on the proper identification of a global Minimizer: Minimizer: . Schematics of an IEnKS QS minimization (a) and an IEnKS QC minimization (b).The rectangle contains the observations to be assimilated and the black dots represent the observations used in the current minimization.The "Gauss-Newton" surrounded by arrows represents the iterations of the Gauss-Newton procedure.The number of quasi-static steps is N Q = 3.The flow of observations is controlled by the parameters (L 0 , L 1 , L 2 ) = (3, 5, 6).For the QC IEnKS, the number of GN iterations is controlled by the (j 0 , j 1 , j 2 ) parameters.
maximum of the integrand.However, its implementation was based on the update of the ensemble whenever an observation batch is added to the cost function, which is not as numerically efficient as the scheme presented here.The success of the QS minimization lies in the fact that, when an observation is successfully assimilated, the eMSE is reduced.Thus, the analysis probability mass concentrates around the true state.The analysis is then more likely to be in a neighborhood of the true state where the model is linear.The cost function non-quadraticity can then be increased by adding a new term in it.This is confirmed by the following argument: let P (q) be the IEnKS asymptotic eMSE at the qth step of a QS minimization.With the notations and assumptions of Sect.2.3, i.e., in a linear context, we have the following recurrence relation: Thus, P (q+1) < P (q) and we can increase the cost function non-quadraticity by adding new terms in it as long as the propagation of errors does not exceed the bound: This implies which yields S ≤ N Q S max .Therefore, the QS minimizations allow for a N Q times longer DAW.
Unfortunately, these QS minimizations are very expensive.Indeed, they add a third outer loop repeating N Q GN minimizations.The GN iterations used to compute the intermediate starting points give unnecessary precision; all that is required for these starting points is to be in a neighborhood of x ∞S where the model is almost linear.Thus, one can restrain the number of intermediate GN loops and save the full convergence to the last minimization.This is done in the quasi-convergent IEnKS (IEnKS QC ) with the parameters (j q ) q<N Q .These parameters correspond to the numbers of GN loops in the intermediate QS minimizations.They are typically equal to 1 except for the last one.Algorithm 2 gives the pseudo-code of the IEnKS QC and the right panel in Fig. 7 is a schematic for it.

Algorithm 1 One cycle of the IEnKS QS
Require: E b 0 the background ensemble at t 0 ; λ the inflation; (L q ) q<NQ a list of DAW time indexes; the finite differences step; δ, j max GN end of loop parameters; 1 = (1, . . ., 1) T ∈ R n and I is the identity of R n . 11: for l = K, ..., L q do 13: end for 19: solve ∇2 Jδw = ∇J 20: 4 Numerical experiments with low-order models In the following, we perform numerical experiments with the Lorenz 1963 (L63) and Lorenz 1995 (L95) models.L95 has already been presented in Sect.2.4.L63 (Lorenz, 1963) is a simplified model for atmospheric convection.It is defined by the following ordinary differential equations: These equations are integrated using a fourth-order Runge-Kutta scheme with a time step of δt = 0.01 and (σ, ρ, β) = (10, 28, 8/3).The dynamics of the L63 model are chaotic, with a largest Lyapunov exponent given by λ 0.91.Both models are assumed perfect.The truth run is generated from a random state space point.The initial ensemble is generated from the truth with B = I m where m = 40, 3 and n = 20, 3 for L95 and L63, respectively.Observation vectors are generated from the truth with H = R = I m every t = 0.05 for L95 and every t = 0.02 for L63.A burn-in period of 5 × 10 3 × t is enforced in both cases.
Unlike Goodliff et al. (2015), our numerical experiments neither address increasing nonlinearity, nor do they address the use of climatological background error covariance matrices.Instead, we focus exclusively on the IEnKS performance dependence on the DAW key parameters L, S and the number N Q of QS minimizations.Goodliff et al. (2015) numerically evaluates the QSVA approach with hybrid and EnVar techniques, similarly to Bocquet and Sakov (2013), and confirms the findings of Pires et al. (1996) albeit in a En-Var context.In terms of accuracy, Goodliff et al. (2015) show that the ensemble transform Kalman smoother (ETKS) outperforms all hybrid schemes in their numerical experiment.Since the IEnKS systematically outperforms the ETKS in all Algorithm 2 One cycle of the IEnKS QC Require: E b 0 the background ensemble at t 0 ; λ the inflation; (L q ) q<NQ a list of DAW time indexes; (j q ) q<NQ the number of intermediate GN loops; the finite differences step; δ GN end of loop parameter; 1 = (1, . . ., 1) T ∈ R n and I is the identity of R n ; 1: w a 0 = 0 12: for l = K, ..., L q do 13: end for 19: solve ∇2 Jδw = ∇J 20: w a 0 = w a 0 − δw 21: until ||δw|| ≤ δ or j ≥ j q 23: end for 24: conditions (Bocquet andSakov, 2013, 2014) as long as the DAW length is not excessively long (for a chaotic model), then one concludes that our RMSEs would be systematically equal to or smaller than those reported for any hybrid scheme in Goodliff et al. (2015).Figures 8 and 9 show the aRMSE of the IEnKS and IEnKS QS for both L95 and L63 as a function of S and L. The smoothing and filtering performance of the IEnKS increases for small values of L, S then decreases for high values of L, S. This is due to the appearances of local minima.As noted by Goodliff et al. (2015) with L63, the QS variant allows one to reach much longer DAWs, and improves the performance.However, some limits to the L95 model method are visible with when, for L = 50, best smoothing aRMSEs are reached for S < 50.However, the IEnKS QS filtering performance is invariant with L and improves with S as in the 4D-Var filtering performance shown in Fig. 3.As suggested by Pires et al. (1996), one may be tempted to estimate the useful DAW length from past observations beyond which the performance gain is negligible.However, their study estimated useful DAW length in a one-cycle 4D-Var context with a focus on the filtering RMSE.Hence, this estimation is not directly relevant for a cycled IEnKS QS .By contrast, in this study, a lot of observations have already been assimi-lated and condensed in the background approximation.Thus, the performance gain with the DAW length comes from the precision of this Gaussian background approximation; a precision that the linearized theory is not able to provide, attested to by the filtering (l = L) performance independence of Eq. ( 26) with the DAW parameters.
Figure 10 compares the smoothing aRMSE (first column), the filtering aRMSE (second column) and the number of ensemble propagations (third column) of the IEnKS QS (N Q = S, S = L), the IEnKS (S = L) and the IEnKS-MDA (S = 1), for both L95 and L63 as a function of L. The number of ensemble propagations is the total number of ensemble propagations in units of t divided by the total number of assimilated observation vectors.For L < 20, all three algorithms show smoothing and filtering performance improvements with L. For L > 20, the IEnKS filtering and smoothing RMSEs increase because of the multiple local extrema.For L < 40, the IEnKS QS has smaller aRMSEs than the IEnKS-MDA.Because the IEnKS QS is SDA by design, it does not suffer from sub-optimality related to multiple assimilations and nonlinearity.Moreover, the IEnKS QS always requires less propagations of the ensemble, which improves the computational cost.However, for L > 40 with the L95 model, the quasi-static approach cannot sustain the nonlinearity any-  more and the IEnKS QS aRMSE degrades.Hence, the IEnKS-MDA S = 1 still has the best performance.With the L63 model, the IEnKS QS is always better than the IEnKS-MDA suggesting that L could still be increased.
Figure 11 compares the smoothing aRMSE (first column), the filtering aRMSE (second column) and the number of ensemble propagations (third column) of the IEnKS QS and IEnKS QC as a function of the N Q parameter.The IEnKS QS aRMSE quickly decreases after a point for both algorithms and for both models.Before this point, the algorithms fail to find the global minimum and the RMSE is close to the climatological variance.After this point, the algorithms succeed in finding the global minimum and the RMSE is low.For the IEnKS QS with the L95 model, this point can be estimated using results of Sect. 3 by S/S max 3.6, in remarkable agreement with Fig. 11.This point comes later for the IEnKS QC but it demands less ensemble propagations making this algorithm numerically more efficient than the IEnKS QS .However A. Fillion et al.: Quasi-static ensemble variational data assimilation those ensemble propagations have different behavior for the L95 and L63 models when the minimizations fail to find the global minimum.For the L95 model, the number of ensemble propagations is high meaning that the minimization takes a lot of iterations and fails to converge.For the L63 model, the number of ensemble propagations is low indicating that minimizations converge but to a non-global extrema.

Conclusions
In this paper, we have extended the study of Pires et al. (1996) on quasi-static variational data assimilation, focused on 4D-Var technique, to cycled data assimilation schemes and specifically four-dimensional nonlinear ensemble variational techniques, an exemplar of this being the iterative ensemble Kalman smoother (IEnKS).
The long term impact of cycling was first theoretically investigated in a linear context for 4D-Var and the IEnKS, then numerically for the IEnKS in a nonlinear context.The way information is propagated between data assimilation cycles indeed explains the difference between 4D-Var and the IEnKS.Both reveal performance improvements with the DAW parameter S, which counts the number of observation vectors within the DAW, as well as the time shift between cycles.This is a consequence of the Gaussian background approximation; the larger S is, the less the assimilation relies on it.However, it is observed that this improvement has a limit in the chaotic, perfect model case.The cost function global minimum basin of attraction appears to shrink with increasing L. This causes the Gauss-Newton procedure to miss the cost function global minimum, which deteriorates the assimilation performance.
Quasi-static minimizations lead slowly but surely to the global minimum by repeated cost function minimizations.As the DAW length L is gradually increased, the starting point of the minimization remains in the global minimum basin of attraction.For most S, L couples, the quasi-static IEnKS turns out to be a more accurate substitute for the multiple data assimilation IEnKS (IEnKS-MDA).
Unfortunately, this method (IEnKS QS ) adds an outer loop which could significantly increase the numerical cost.Precision on intermediate minima being superfluous, one can limit the intermediate Gauss-Newton number of loops.The unavoidable space increments required to minimize the nonquadratic cost function are thus reported in time in the quasiconvergent IEnKS (IEnKS QC ).
We did not focus on the applicability of the methods to high-dimensional and imperfect models.In particular, we considered very long DAWs, which, even if of high mathematical interest or for low-order reliable models, is less relevant for significantly noisy models.However, we know from Swanson et al. (1998), that the perfect model results are expected to extend to the imperfect model case provided that the growth rate of the model error is similar to that of the leading Lyapunov vectors of the model.This is likely to also apply to a (strong-constraint) IEnKS QS .Finally, an extension to this work would therefore consist in investigating the same ideas but using a weak constraint 4D-Var and IEnKS (Trémolet, 2006;Sakov and Bocquet, 2018;Sakov et al., 2018).where c k+1 is a constant independent from x (k+1)S and w (k+1)S .Hence G x (k+1)S |y (k+1)S+L:K is Gaussian with moments given by Eqs.(B1) and (B2).
The conditional variance Eq. ( B2) is therefore related to the IEnKS performance by the total law of expectation: Then, from Eq. ( 16) we get the recurrence relation: Thus, X a kS is not random and P IEnKS kS = X a kS 2 .Equation (B8) shows that the sequence of inverse IEnKS eMSEs is arithmetico-geometric: where the notation Eq. (A7) has been used.Properties of arithmetico-geometric sequences allow one to obtain the IEnKS asymptotic eMSE: and the generalization to asymptotic eMSEs with lag L − l is straightforward: Let us now show that the IEnKS eMSE is optimal.Let x a kS (y kS+L:K ) be the 4D-Var analysis or any other function of y kS+L:K .A bias-variance decomposition (e.g., Bishop, 2006)  In the multivariate, diagonal case the algebra can be conducted on each direction independently.Thus, the eMSE in this case is the sum of the univariate eMSEs of each direction. where

Figure 3 .
Figure3.The asymptotic smoothing (a) and filtering (b) eMSEs of 4D-Var and the IEnKS.eMSEs are displayed as functions of L, S with their components on the stable and unstable subspaces.The L parameter is on the abscissa axis for 4D-Var and on the ordinate axis for the IEnKS.The S parameter is on the abscissa axis for the IEnKS and on the ordinate axis for 4D-Var.

Figure 4 .
Figure 4.The asymptotic eMSEs as a function of the lag (lag = 0 is the filtering performance and lag = L is the smoothing performance).The superposing curves have been slightly translated for better readability, S = L = 5, h = b = r = 1.

Figure 5 .
Figure5.Cost functions of the IEnKS projected in one direction of the analyzed ensemble (hence centered and normalized) with various DAW parameters S, L. A quasi-static minimization can be visualized from these panels.It begins with the bottom-left cost function; the orange square is the starting point and the green dot is at the minimum.From the bottom-left cost function up to the top-right cost function, batches of 9 then 10 observation vectors are progressively added to the DAW, and the minimizer (green dot) is updated accordingly.

Figure 6 .
Figure6.Smoothing aRMSE (a) and filtering aRMSE (b) of a Gauss-Newton IEnKS L = S as a function of S with the L95 model over 5 × 10 5 cycles (logarithmic scale).We used the finite-size version(Bocquet, 2011) to account for sampling errors and avoid the need for inflation, H = R = B = I m , n = 20.

Figure 8 .Figure 9 .
Figure 8. IEnKS (lower triangles) and IEnKS QS (upper triangles, N Q = S) smoothing and filtering aRMSEs as a function of L and S with the L95 model.The L parameter is on the abscissa axis for the IEnKS and on the ordinate axis for the IEnKS QS .The S parameter is on the abscissa axis for the IEnKS QS and on the ordinate axis for the IEnKS.For readability, smoothing RMSE beyond 0.10 and filtering RMSE beyond 0.20 are in the same color and the scale is logarithmic.

=
of this estimator yields E x kS ,y kS+L:K x kS − x a kS 2 E y kS+L:K V x kS x kS |y kS+L:K + E y kS+L:K E x kS x kS |y kS+L:K − x a ∞S w and x b ∞S , X b ∞S are respectively the mean and normalized anomaly of M S E a ∞S .Finally, www.nonlin-processes-geophys.net/25/315/2018/ Nonlin.Processes Geophys., 25, 315-334, 2018