The ensemble Kalman smoother (EnKS) is used as a linear least-squares solver in the Gauss–Newton method for the large nonlinear least-squares system in incremental 4DVAR. The ensemble approach is naturally parallel over the ensemble members and no tangent or adjoint operators are needed. Furthermore, adding a regularization term results in replacing the Gauss–Newton method, which may diverge, by the Levenberg–Marquardt method, which is known to be convergent. The regularization is implemented efficiently as an additional observation in the EnKS. The method is illustrated on the Lorenz 63 model and a two-level quasi-geostrophic model.

Introduction

Four-dimensional variational data assimilation (4DVAR) is a dominant data assimilation method used in weather forecasting centers worldwide. 4DVAR attempts to reconcile model and data variationally, by solving a large weighted nonlinear least-squares problem. The unknown is a vector of system states at the discrete points in time when the data are given. The objective function minimized is the sum of the squares of the difference of the initial state from a known background state, plus the squares of the differences between the values of the observation operator and the data at every given time point. In the weak-constraint 4DVAR, considered here, the model error is accounted for by allowing the ending and starting states of the model at every given time point to differ, and by adding the sums of the squares of those differences to the objective function. The sums of squares are weighted by the inverses of the appropriate error covariance matrices, and much of the work in applications of 4DVAR goes into modeling those covariance matrices.

In the incremental approach, the nonlinear least-squares problem is solved iteratively through a succession of linearized least-squares problems. The major cost of 4DVAR iterations lies in evaluating the model, the tangent and adjoint operators, and solving the large linear least-squares problems. A significant software development effort is needed for the additional code implementing the tangent and adjoint operators of the model and observation operators. Straightforward linearization leads to the Gauss–Newton method for nonlinear least squares. Gauss–Newton iterations are not guaranteed to converge, not even locally, though careful design of an application system may avoid divergence in practice. Finally, while the evaluation of the model operator is typically parallelized on modern computer architectures, there is a need to further parallelize the 4DVAR process itself.

The Kalman filter is a sequential Bayesian estimation of the Gaussian state of a linear system at a sequence of discrete time points. At each time point, applying the Bayes theorem results in an update of the state, represented by its mean and covariance. The Kalman smoother considers all states within an assimilation time window as one large composite state. Consequently, the Kalman smoother can be obtained from the Kalman filter by simply applying the same update as in the filter to the past states as well. Historically, however, the focus was on efficient short recursions, as in the Kalman filter.

It is well known that weak-constraint 4DVAR is equivalent to the Kalman smoother in the linear case when all observations lie in the assimilation window. Using the Kalman smoother to solve the linear least squares in the Gauss–Newton method is known as the iterated Kalman smoother, and considerable improvements can be obtained over running the Kalman smoother only once.

The Kalman filter and smoother require maintaining the covariance of the state, which is not feasible for large systems, such as those in numerical weather prediction. Hence, the ensemble Kalman filter (EnKF) and ensemble Kalman smoother (EnKS) use a Monte Carlo approach for large systems, representing the state by an ensemble of simulations and estimating the state covariance from the ensemble. One implementation of the EnKS uses the adjoint model explicitly, with the short recursions and a forward and backward pass, as in the Kalman smoother. Other implementations do not depend on the adjoint model and simply apply EnKF algorithms to the composite state over multiple time points; such composite variables are also called 4-D vectors. We use the latter approach in the computations reported here.

In this paper, we use the EnKS as a linear least-squares solver in 4DVAR. The EnKS is implemented in the physical space and with randomization. The ensemble approach is naturally parallel over the ensemble members. The rest of the computational work is relatively cheap compared to the ensemble of simulations, and parallel dense linear algebra libraries can be used; however, in high-dimensional systems or for a large lag, the storage requirements can be prohibitive. The proposed approach uses finite differences from the ensemble, and no tangent or adjoint operators are needed. To stabilize the method and ensure convergence, a Tikhonov regularization term is added to the linear least squares, and the Gauss–Newton method becomes the Levenberg–Marquardt method. The Tikhonov regularization is implemented within the EnKS as an independent observation, assimilated in a computationally cheap additional analysis step, which is statistically correct because the smoother operates only on the linearized problem. A new probabilistic ensemble is generated in every iteration, so the minimization is not restricted to the combinations of a single ensemble. We use finite differences from the ensemble mean towards the ensemble members to linearize the model and observation operators. The iterations can be proven to converge to incremental 4DVAR iterations for small finite-difference steps and large ensemble sizes. Thus, in the limit, the method performs actual minimization of the weak-constraint objective function and inherits the advantages of 4DVAR in handling nonlinear problems. We call the resulting method EnKS-4DVAR.

Combinations of ensemble and variational approaches have been of considerable recent interest. Estimating the background covariance for 4DVAR from an ensemble was one of the first connections; it is now standard and has become operational. A two-way connection between the EnKF and 4DVAR has also been used, with the EnKF providing the covariance for 4DVAR and 4DVAR feeding the mean analysis into the EnKF. The EnKF is operational at the National Centers for Environmental Prediction (NCEP) as part of its Global Forecast System Hybrid Variational Ensemble Data Assimilation System (GDAS), together with the Gridpoint Statistical Interpolation (GSI) variational data assimilation system.

The first methods that used ensembles for more than computing the covariance minimized the 3DVAR objective function in the analysis step. The maximum likelihood ensemble filter (MLEF) works in the ensemble space, i.e., it minimizes in the span of the ensemble members, with the control variables being the coefficients of a linear combination of the ensemble members. Another approach uses an iterated ensemble Kalman filter (with randomization) in the state space, with a linearization of the observation operator obtained by a regression on the increments given by the ensemble. This approach was later extended to a Levenberg–Marquardt method, with the regularization done by a multiplicative inflation of the covariance in the linearized problem rather than by adding a Tikhonov regularization term. Other methods minimize the (strong-constraint) 4DVAR objective function over linear combinations of the ensemble by computations in the observation space.

The iterated ensemble Kalman filter (IEnKF) minimizes the lag-one 4DVAR objective function in the ensemble space, using the square root EnKF as a linear solver in the Gauss–Newton method and rescaling the ensemble to approximate the tangent operators, which is similar to the use of finite differences and the EnKS here. The IEnKF has been combined with an inflation-free approach to obtain a 4-D ensemble variational method, and with the Levenberg–Marquardt method by adding a diagonal regularization to the Hessian. The Levenberg–Marquardt method has also been used for faster convergence, as an adaptive method between steepest descent and the Gauss–Newton method rather than to overcome divergence, and scaling the ensemble to approximate the tangent operators has been considered there as well (the “bundle variant”). The IEnKF has further been extended to a smoother (IEnKS) with a fixed lag and moving window, for which it was noted that Gauss–Newton can be replaced by Levenberg–Marquardt; that method is formulated in terms of the composite model operator, i.e., with strong constraints, and has been developed further, including cycling. Since various optimizers could be used in IEnKF/IEnKS, the present method can be understood as the EnKS used as such an optimizer.

It is well known that, for good practical performance, ensemble methods need to be modified by localization to reduce the sampling error. Ensemble methods can be localized in multiple ways. For methods operating in the physical space, localization can be achieved, e.g., by tapering of the covariance matrix or by replacing the sample covariance by its diagonal in a spectral space. This is not completely straightforward for the EnKS, but implementations of the EnKS based on the Bryson–Frazier version of the classical formulation of the Kalman smoother, with a forward and backward pass, are more flexible. Methods in the ensemble space can be modified to update only nodes in a neighborhood of the observation. The 4DEnVAR method uses ensemble-derived background covariance, with several proposed ways to solve the linearized problem in each iteration by combinations of ensemble members whose weights are allowed to vary spatially. Hybrid 4DEnVAR and hybrid 4DVAR have been compared for operational weather forecasts; “hybrid” refers to a combination of a fixed climatological model of the background error covariances and localized covariances obtained from ensembles.

The paper is organized as follows. In Sect. , we review the formulation of 4DVAR. The EnKF and the EnKS are reviewed in Sect. . The proposed method is described in Sect. . Section  contains the results of the computational experiments, and Sect.  is the conclusion.

Incremental 4DVAR

For vectors $u_i$, $i=0,\dots,L$, denote the composite (column) 4-D vector $u_{0:L}=[u_0^\mathrm{T},\dots,u_L^\mathrm{T}]^\mathrm{T}$, where $L$ is the number of cycles in the assimilation window. We want to estimate $x_0,\dots,x_L$, where $x_i$ is the state at time $i$, from the background state, $x_0\approx x_\mathrm{b}$, the model, $x_i\approx\mathcal{M}_i(x_{i-1})$, and the observations, $\mathcal{H}_i(x_i)\approx y_i$, where $\mathcal{M}_i$ is the model operator and $\mathcal{H}_i$ is the observation operator. Quantifying the uncertainty by covariances, with $x_0\approx x_\mathrm{b}$ taken as $\|x_0-x_\mathrm{b}\|^2_{B^{-1}}=(x_0-x_\mathrm{b})^\mathrm{T}B^{-1}(x_0-x_\mathrm{b})\approx 0$, etc., we get the nonlinear least-squares problem
$$\|x_0-x_\mathrm{b}\|^2_{B^{-1}}+\sum_{i=1}^{L}\|x_i-\mathcal{M}_i(x_{i-1})\|^2_{Q_i^{-1}}+\sum_{i=1}^{L}\|y_i-\mathcal{H}_i(x_i)\|^2_{R_i^{-1}}\to\min_{x_{0:L}},$$
called weak-constraint 4DVAR. Originally, in 4DVAR, $x_i=\mathcal{M}_i(x_{i-1})$ exactly; the weak constraint $x_i\approx\mathcal{M}_i(x_{i-1})$ accounts for model error.
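As a concrete illustration, the weak-constraint objective is just three sums of Mahalanobis-weighted squares and can be evaluated directly. The following sketch (Python with NumPy; all names are illustrative and not the implementation used in the paper) takes the model and observation operators as generic callables:

```python
import numpy as np

def wc4dvar_cost(x, xb, B, y, M, H, Q, R):
    """Weak-constraint 4DVAR objective for a trajectory x = [x_0, ..., x_L].

    M, H are lists of (possibly nonlinear) model/observation operators,
    indexed 1..L; B, Q[i], R[i] are error covariances. Illustrative names.
    """
    def sq(v, C):
        # squared Mahalanobis norm ||v||^2_{C^{-1}}
        return float(v @ np.linalg.solve(C, v))

    J = sq(x[0] - xb, B)                       # background term
    for i in range(1, len(x)):
        J += sq(x[i] - M[i](x[i - 1]), Q[i])   # model-error penalty
        J += sq(y[i] - H[i](x[i]), R[i])       # data misfit
    return J
```

The operators enter only through forward evaluations, which is the property the ensemble method below exploits.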

The least-squares problem (Eq. ) is solved iteratively by linearization,
$$\mathcal{M}_i(x_{i-1}+\delta x_{i-1})\approx\mathcal{M}_i(x_{i-1})+\mathcal{M}_i'(x_{i-1})\,\delta x_{i-1},\qquad \mathcal{H}_i(x_i+\delta x_i)\approx\mathcal{H}_i(x_i)+\mathcal{H}_i'(x_i)\,\delta x_i.$$
In each iteration $x_{0:L}\leftarrow x_{0:L}+\delta x_{0:L}$, one solves the auxiliary linear least-squares problem for the increments $\delta x_{0:L}$,
$$\|x_0+\delta x_0-x_\mathrm{b}\|^2_{B^{-1}}+\sum_{i=1}^{L}\|x_i+\delta x_i-\mathcal{M}_i(x_{i-1})-\mathcal{M}_i'(x_{i-1})\,\delta x_{i-1}\|^2_{Q_i^{-1}}+\sum_{i=1}^{L}\|y_i-\mathcal{H}_i(x_i)-\mathcal{H}_i'(x_i)\,\delta x_i\|^2_{R_i^{-1}}\to\min_{\delta x_{0:L}}.$$
This is the Gauss–Newton method for nonlinear least squares, known in 4DVAR as the incremental approach. Write the auxiliary linear least-squares problem (Eq. ) for $\delta x_{0:L}$ as
$$\|\delta x_0-\delta x_\mathrm{b}\|^2_{B^{-1}}+\sum_{i=1}^{L}\|\delta x_i-M_i\,\delta x_{i-1}-m_i\|^2_{Q_i^{-1}}+\sum_{i=1}^{L}\|d_i-H_i\,\delta x_i\|^2_{R_i^{-1}}\to\min_{\delta x_{0:L}},$$
where $\delta x_\mathrm{b}=x_\mathrm{b}-x_0$, $m_i=\mathcal{M}_i(x_{i-1})-x_i$, $d_i=y_i-\mathcal{H}_i(x_i)$, $M_i=\mathcal{M}_i'(x_{i-1})$, and $H_i=\mathcal{H}_i'(x_i)$.

The function minimized in Eq. () is the same as the one minimized by the Kalman smoother.

Ensemble Kalman filter and smoother

We present the EnKF and EnKS algorithms in a form suitable for our purposes. We start with a formulation of the EnKF, in a notation useful for the extension to the EnKS. The notation $v\sim N(m,A)$ means that $v$ is sampled from the Gaussian distribution with mean $m$ and covariance $A$, independently of anything else. The ensemble of states of the linearized model at time $i$, conditioned on data up to time $j$ (that is, with the data up to time $j$ already ingested), is denoted by $X_{i|j}^N=[x_{i|j}^1,\dots,x_{i|j}^N]$, where the ensemble member index $\ell$ always runs over $\ell=1,\dots,N$, and similarly for other ensembles. Assume for the moment that the observation operator $\mathcal{H}_i$ is linear; that is, $\mathcal{H}_i(u)=H_iu$. The EnKF algorithm consists of the following steps.

Initialize: $x_{0|0}^{\ell}\sim N(x_\mathrm{b},B)$, $\ell=1,\dots,N$.

For $i=1,2,\dots,L$:

The analysis step is
$$x_{i|i}^{\ell}=x_{i|i-1}^{\ell}-P_{i,i}^{N}H_i^\mathrm{T}\bigl(H_iP_{i,i}^{N}H_i^\mathrm{T}+R_i\bigr)^{-1}\bigl(H_ix_{i|i-1}^{\ell}-d_i-w_i^{\ell}\bigr),\qquad w_i^{\ell}\sim N(0,R_i),$$
where $P_{i,i}^{N}$ is the sample covariance computed from the ensemble $X_{i|i-1}^{N}$.

Denote by $A_i^N$ the matrix of anomalies of the ensemble $X_{i|i-1}^N$,
$$A_i^N=[a_i^1,\dots,a_i^N]=\bigl[x_{i|i-1}^1-\overline{x_{i|i-1}},\dots,x_{i|i-1}^N-\overline{x_{i|i-1}}\bigr],\qquad \overline{x_{i|i-1}}=\frac{1}{N}\sum_{j=1}^{N}x_{i|i-1}^{j}.$$
Then $P_{i,i}^N=\frac{1}{N-1}A_i^N(A_i^N)^\mathrm{T}$, and we can write the matrices in Eq. () as $P_{i,i}^NH_i^\mathrm{T}=\frac{1}{N-1}A_i^N(H_iA_i^N)^\mathrm{T}$ and $H_iP_{i,i}^NH_i^\mathrm{T}=\frac{1}{N-1}(H_iA_i^N)(H_iA_i^N)^\mathrm{T}$. In particular, the matrix $H_i$ is used here only in the matrix–vector multiplications
$$g_i^{\ell}=H_ia_i^{\ell}=H_ix_{i|i-1}^{\ell}-H_i\overline{x_{i|i-1}}=H_ix_{i|i-1}^{\ell}-\frac{1}{N}\sum_{j=1}^{N}H_ix_{i|i-1}^{j},$$
which allows the matrix–vector multiplication to be replaced by the use of a possibly nonlinear observation operator $\mathcal{H}_i$ evaluated on the ensemble members only (Eq. below). This technique is commonly used for nonlinear observation operators. With $H_iA_i^N=G_i^N=[g_i^1,\dots,g_i^N]$, Eq. () becomes
$$P_{i,i}^NH_i^\mathrm{T}=\frac{1}{N-1}A_i^N(G_i^N)^\mathrm{T},\qquad H_iP_{i,i}^NH_i^\mathrm{T}=\frac{1}{N-1}G_i^N(G_i^N)^\mathrm{T}.$$
Also, from Eqs. () and (), and writing the matrix of anomalies in the form $A_i^N=X_{i|i-1}^N\bigl(I-\frac{\mathbf{1}\mathbf{1}^\mathrm{T}}{N}\bigr)$, where $\mathbf{1}$ is the column vector of all ones of length $N$, it follows that the analysis ensemble $X_{i|i}^N$ consists of linear combinations of the forecast ensemble. Hence, it can be written as multiplying the forecast ensemble by a suitable transformation matrix $T_i^N$,
$$X_{i|i}^N=X_{i|i-1}^NT_i^N,\qquad T_i^N\in\mathbb{R}^{N\times N},$$
where
$$T_i^N=I-\frac{1}{N-1}\Bigl(I-\frac{\mathbf{1}\mathbf{1}^\mathrm{T}}{N}\Bigr)(G_i^N)^\mathrm{T}\Bigl(\frac{1}{N-1}G_i^N(G_i^N)^\mathrm{T}+R_i\Bigr)^{-1}\bigl[H_ix_{i|i-1}^{\ell}-d_i-w_i^{\ell}\bigr]_{\ell=1,\dots,N}.$$
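The anomaly-based form of the analysis step can be sketched as follows (a minimal stochastic-EnKF sketch in NumPy; the function name and the layout, one ensemble member per column, are our own choices, not the paper's code):

```python
import numpy as np

def enkf_analysis(X, Hfun, y, R, rng):
    """One stochastic EnKF analysis step written in terms of anomalies.

    X: (n, N) forecast ensemble, one member per column.
    Hfun: observation operator applied to one member (may be nonlinear).
    y: data vector; R: observation-error covariance; rng: NumPy Generator.
    """
    n, N = X.shape
    HX = np.column_stack([Hfun(X[:, l]) for l in range(N)])
    A = X - X.mean(axis=1, keepdims=True)        # state anomalies A_i^N
    G = HX - HX.mean(axis=1, keepdims=True)      # observation anomalies G_i^N
    PHt = A @ G.T / (N - 1)                      # P H^T
    HPHt = G @ G.T / (N - 1)                     # H P H^T
    # perturbed data (stochastic EnKF), one perturbation per member
    W = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    K = PHt @ np.linalg.inv(HPHt + R)            # Kalman gain estimate
    return X - K @ (HX - y[:, None] - W)         # analysis ensemble
```

Note that `Hfun` is only ever evaluated on ensemble members, matching the remark above about nonlinear observation operators.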

The EnKS is obtained by applying the same analysis step as in the EnKF (Eq. ) to the ensemble $X_{0:i|i-1}^N$ of 4-D composite states from time 0 to $i$, conditioned on data up to time $i-1$,
$$X_{0:i|i-1}^N=\begin{bmatrix}X_{0|i-1}^N\\ \vdots\\ X_{i|i-1}^N\end{bmatrix},$$
in place of $X_{i|i-1}^N$, with the observation matrix $\tilde{H}_{0:i}=[0,\dots,0,H_i]$. Then, Eq. () becomes
$$x_{0:i|i}^{\ell}=x_{0:i|i-1}^{\ell}-P_{0:i,0:i}^N\tilde{H}_{0:i}^\mathrm{T}\bigl(\tilde{H}_{0:i}P_{0:i,0:i}^N\tilde{H}_{0:i}^\mathrm{T}+R_i\bigr)^{-1}\bigl(\tilde{H}_{0:i}x_{0:i|i-1}^{\ell}-d_i-w_i^{\ell}\bigr),$$
where $P_{0:i,0:i}^N$ is the sample covariance matrix of $X_{0:i|i-1}^N$. Fortunately, the matrix–vector and matrix–matrix products simplify:
$$\tilde{H}_{0:i}x_{0:i|i-1}^{\ell}=H_ix_{i|i-1}^{\ell},\qquad P_{0:i,0:i}^N\tilde{H}_{0:i}^\mathrm{T}=P_{0:i,i}^NH_i^\mathrm{T},\qquad \tilde{H}_{0:i}P_{0:i,0:i}^N\tilde{H}_{0:i}^\mathrm{T}=H_iP_{i,i}^NH_i^\mathrm{T},$$
which is the same expression as in Eq. (). Also using Eq. (), we obtain the EnKS algorithm.

Initialize: $x_{0|0}^{\ell}\sim N(x_\mathrm{b},B)$, $\ell=1,\dots,N$.

For $i=1,\dots,L$:

Compute the anomalies of the ensemble in the state space and in the observation space:
$$A_{0:i}^N=[a_{0:i}^1,\dots,a_{0:i}^N],\qquad a_{0:i}^{\ell}=x_{0:i|i-1}^{\ell}-\frac{1}{N}\sum_{j=1}^{N}x_{0:i|i-1}^{j},$$
$$G_i^N=[g_i^1,\dots,g_i^N],\qquad g_i^{\ell}=\mathcal{H}_i(x_{i|i-1}^{\ell})-\frac{1}{N}\sum_{j=1}^{N}\mathcal{H}_i(x_{i|i-1}^{j}).$$

The analysis step:
$$x_{0:i|i}^{\ell}=x_{0:i|i-1}^{\ell}-\frac{1}{N-1}A_{0:i}^N(G_i^N)^\mathrm{T}\Bigl(\frac{1}{N-1}G_i^N(G_i^N)^\mathrm{T}+R_i\Bigr)^{-1}\bigl(\mathcal{H}_i(x_{i|i-1}^{\ell})-y_i-w_i^{\ell}\bigr),\qquad w_i^{\ell}\sim N(0,R_i),\quad \ell=1,\dots,N.$$

Comparing Eqs. () and (), we see that the EnKS can be implemented in a straightforward manner by applying the same transformation as in the EnKF to the composite 4-D state vector from times 0 to $i$,
$$X_{0:i|i}^N=X_{0:i|i-1}^NT_i^N,$$
where $T_i^N$ is the transformation matrix in Eq. ().
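This transform view, computing $T_i^N$ from the newest time block and applying it to every block of the composite state, can be sketched as follows (illustrative NumPy code under the same one-member-per-column layout as before, not the paper's implementation):

```python
import numpy as np

def enks_update(Xs, Hfun, y, R, rng):
    """EnKS analysis via the EnKF transform matrix T_i^N.

    Xs: list of (n, N) blocks [X_0, ..., X_i], all conditioned on data
    up to time i-1; only the newest block X_i is observed.
    """
    Xi = Xs[-1]
    n, N = Xi.shape
    HX = np.column_stack([Hfun(Xi[:, l]) for l in range(N)])
    G = HX - HX.mean(axis=1, keepdims=True)          # observation anomalies
    W = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    S = G @ G.T / (N - 1) + R                        # innovation covariance
    C = np.linalg.solve(S, HX - y[:, None] - W)      # S^{-1} (HX - y - W)
    T = np.eye(N) - (np.eye(N) - np.ones((N, N)) / N) @ G.T @ C / (N - 1)
    # the same transform updates every time block of the composite state
    return [X @ T for X in Xs]
```

Because the transform acts on member weights only, past blocks are smoothed at the cost of one $N\times N$ matrix product per block.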

EnKS-4DVAR

We apply the EnKS algorithm (Eqs. ) with the increments $\delta x$ in place of $x$ to solve the linearized auxiliary least-squares problem (Eq. ). Approximating by finite differences based at $x_{i-1}$ with step $\tau>0$, we get the action of the linearized model operator,
$$M_i\,\delta x_{i-1}+m_i\approx\frac{\mathcal{M}_i(x_{i-1}+\tau\,\delta x_{i-1})-\mathcal{M}_i(x_{i-1})}{\tau}+\mathcal{M}_i(x_{i-1})-x_i,$$
and of the linearized observation operator,
$$H_i\,\delta x_i\approx\frac{\mathcal{H}_i(x_i+\tau\,\delta x_i)-\mathcal{H}_i(x_i)}{\tau}.$$
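A directional finite difference of this kind can be written generically; the helper below (illustrative, with the step `tau` playing the role of $\tau$ in the text) approximates the action of a Jacobian $F'(x)$ on a direction $\delta x$ using forward evaluations only:

```python
import numpy as np

def fd_tangent(F, x, dx, tau=1e-3):
    """Directional finite-difference approximation of F'(x) dx.

    Only forward evaluations of F are needed; no tangent or adjoint code.
    """
    return (F(x + tau * dx) - F(x)) / tau
```

The truncation error is O(tau) times the curvature of F along dx, which is why the experiments below study the sensitivity to $\tau$.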

The Gauss–Newton method may diverge, but convergence to a stationary point of Eq. () can be recovered by controlling the step $\delta x$. Adding a constraint of the form $\|\delta x_i\|\le\varepsilon$ leads to globally convergent trust-region methods. Here, we add to Eq. () a Tikhonov regularization term of the form $\gamma\|\delta x_i\|^2_{S_i^{-1}}$, which controls the step size as well as rotates the step direction towards the steepest descent, and obtain the Levenberg–Marquardt method $x_{0:L}\leftarrow x_{0:L}+\delta x_{0:L}$, where
$$\|\delta x_0-\delta x_\mathrm{b}\|^2_{B^{-1}}+\sum_{i=1}^{L}\|\delta x_i-M_i\,\delta x_{i-1}-m_i\|^2_{Q_i^{-1}}+\sum_{i=1}^{L}\|d_i-H_i\,\delta x_i\|^2_{R_i^{-1}}+\gamma\sum_{i=0}^{L}\|\delta x_i\|^2_{S_i^{-1}}\to\min_{\delta x_{0:L}}.$$

Under suitable technical assumptions, the Levenberg–Marquardt method is guaranteed to converge globally if the regularization parameter $\gamma\ge 0$ is large enough. Convergence estimates also exist for the case when the linear system is solved only approximately.

We interpret the regularization term $\gamma\|\delta x_i\|^2_{S_i^{-1}}$ in Eq. () as arising from additional independent observations $\delta x_i\approx 0$ with covariance $\gamma^{-1}S_i$. The independent observation can be assimilated separately, resulting in a mathematically equivalent but often more efficient two-stage method: simply run the EnKF analysis twice. With the choice of $S_i$ as an identity or, more generally, a diagonal matrix, the implementation of these large observations can be made efficient. We use the notation $\delta x_{0:i|i-1/2}$ for the increments after the first half-step, conditioned on the original observations only, and $\delta x_{0:i|i}$ for the increments conditioned also on the regularization $\delta x_i\approx 0$. Note that, unlike a regularization applied to a nonlinear problem, where sequential data assimilation is only approximate, here the EnKS is run on the auxiliary linearized problem, so all distributions are Gaussian and the equivalence of assimilating the observations at the same time and sequentially is statistically exact.
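The second half-step, assimilating the extra observation $\delta x_i\approx 0$ with covariance $\gamma^{-1}S_i$ through the identity observation operator, might look as follows (an illustrative NumPy sketch of a stochastic analysis step; function and argument names are ours):

```python
import numpy as np

def regularization_step(dXs, i_block, S, gamma, rng):
    """Second analysis half-step: assimilate delta x_i ~ 0 with
    covariance S/gamma (identity observation operator on block i).

    dXs: list of (n, N) increment blocks after the first half-step.
    """
    Zi = dXs[i_block]
    n, N = Zi.shape
    Z = Zi - Zi.mean(axis=1, keepdims=True)          # anomalies of block i
    # perturbations of the zero "data", one per member
    V = rng.multivariate_normal(np.zeros(n), S / gamma, size=N).T
    Sm = Z @ Z.T / (N - 1) + S / gamma               # innovation covariance
    C = np.linalg.solve(Sm, Zi - V)                  # Sm^{-1} (dx_i - v)
    out = []
    for X in dXs:                                    # update every block
        A = X - X.mean(axis=1, keepdims=True)
        out.append(X - A @ Z.T @ C / (N - 1))
    return out
```

For large `gamma`, the extra observation becomes very accurate, pulling the increments toward zero, which is exactly the step-limiting effect of Levenberg–Marquardt.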

We obtain the following algorithm EnKS-4DVAR for Eq. ().

Initialize $x_0=x_\mathrm{b}$, $x_i=\mathcal{M}_i(x_{i-1})$, $i=1,\dots,L$, if not given already.

Incremental 4DVAR (Eq. ): given $x_0,\dots,x_L$, initialize the ensemble of increments $\delta x_{0|0}^{\ell}\sim N(0,B)$, $\ell=1,\dots,N$.

For $i=1,\dots,L$:

Advance the ensemble of increments $\delta x$ in time following Eq. (), with the linearized operator approximated from Eq. (),
$$\delta x_{i|i-1}^{\ell}=\frac{\mathcal{M}_i(x_{i-1}+\tau\,\delta x_{i-1|i-1}^{\ell})-\mathcal{M}_i(x_{i-1})}{\tau}+\mathcal{M}_i(x_{i-1})-x_i+v_i^{\ell},\qquad v_i^{\ell}\sim N(0,Q_i),\quad \ell=1,\dots,N.$$

Compute the anomalies of the ensemble in the 4-D state space and in the observation space:
$$A_{0:i}^N=[a_{0:i}^1,\dots,a_{0:i}^N],\qquad a_{0:i}^{\ell}=\delta x_{0:i|i-1}^{\ell}-\frac{1}{N}\sum_{j=1}^{N}\delta x_{0:i|i-1}^{j},$$
$$G_i^N=[g_i^1,\dots,g_i^N],\qquad g_i^{\ell}=\frac{1}{\tau}\Bigl(\mathcal{H}_i(x_i+\tau\,\delta x_{i|i-1}^{\ell})-\frac{1}{N}\sum_{j=1}^{N}\mathcal{H}_i(x_i+\tau\,\delta x_{i|i-1}^{j})\Bigr).$$

The first analysis step:
$$\delta x_{0:i|i-1/2}^{\ell}=\delta x_{0:i|i-1}^{\ell}-\frac{1}{N-1}A_{0:i}^N(G_i^N)^\mathrm{T}\Bigl(\frac{1}{N-1}G_i^N(G_i^N)^\mathrm{T}+R_i\Bigr)^{-1}\Bigl(\mathcal{H}_i(x_i)+\frac{\mathcal{H}_i(x_i+\tau\,\delta x_{i|i-1}^{\ell})-\mathcal{H}_i(x_i)}{\tau}-y_i-w_i^{\ell}\Bigr),$$
with $w_i^{\ell}\sim N(0,R_i)$, $\ell=1,\dots,N$.

If $\gamma>0$, compute the anomalies of the ensemble in the 4-D state space:
$$Z_{0:i}^N=[z_{0:i}^1,\dots,z_{0:i}^N],\qquad z_{0:i}^{\ell}=\delta x_{0:i|i-1/2}^{\ell}-\frac{1}{N}\sum_{j=1}^{N}\delta x_{0:i|i-1/2}^{j}.$$
The observation operator for the regularization is the identity, so the anomalies in the observation space are simply $Z_i^N$.

If $\gamma>0$, perform the regularization as the second analysis step, with zero data and data covariance $\gamma^{-1}S_i$:
$$\delta x_{0:i|i}^{\ell}=\delta x_{0:i|i-1/2}^{\ell}-\frac{1}{N-1}Z_{0:i}^N(Z_i^N)^\mathrm{T}\Bigl(\frac{1}{N-1}Z_i^N(Z_i^N)^\mathrm{T}+\frac{1}{\gamma}S_i\Bigr)^{-1}\bigl(\delta x_{i|i-1/2}^{\ell}-v_i^{\ell}\bigr),\qquad v_i^{\ell}\sim N\Bigl(0,\frac{1}{\gamma}S_i\Bigr),\quad \ell=1,\dots,N;$$
otherwise, $\delta x_{0:i|i}^{\ell}=\delta x_{0:i|i-1/2}^{\ell}$, $\ell=1,\dots,N$.

Complete the approximate incremental 4DVAR iteration by the update
$$x_{0:L}\leftarrow x_{0:L}+\frac{1}{N}\sum_{\ell=1}^{N}\delta x_{0:L|L}^{\ell}.$$

Note that for small $\gamma\to 0$, Eq. () asymptotically has no effect: $\delta x_{0:i|i}\to\delta x_{0:i|i-1/2}$. The computational cost of EnKS-4DVAR is one evaluation of the model $\mathcal{M}_i$ for the initialization, plus $N+1$ evaluations of the model $\mathcal{M}_i$ and $N$ evaluations of the observation operator $\mathcal{H}_i$ in each incremental 4DVAR iteration, in each of the $L$ observation periods. In comparison, the cost of the EnKF is $N$ evaluations of the model $\mathcal{M}_i$ and of the observation operator $\mathcal{H}_i$ in each observation period. In a reasonably efficient EnKF/EnKS implementation, running the model and evaluating the observation operator are the major costs in practical problems such as weather models, rather than the linear algebra of the EnKS itself.

It can be proven that for small $\tau$ and large $N$, the iterates $x_{0:L}$ converge to those of incremental 4DVAR. Surprisingly, it turns out that in the case $\tau=1$, we recover the standard EnKS applied directly to the nonlinear problem (Eq. ), as shown by the following theorem. In particular, EnKS-4DVAR does not converge when $\tau=1$ for nonlinear problems, because the result of each iteration is determined only by the starting value $x_0$. It is interesting that the ensemble transform approach mentioned above corresponds to our $\tau=1$, yet it does not seem to reduce to the standard EnKS.

Theorem 1. If $\tau=1$, then one step of EnKS-4DVAR (Eqs. ) becomes the EnKS (Eqs. ), modified by including the additional regularization observation if $\gamma>0$. In particular, in that case, the values of $x_{0:L}+\delta x_{0:L}$ do not depend on the previous values of $x_{1:L}$.

Proof: indeed, Eq. () becomes
$$\delta x_{i|i-1}^{\ell}=\frac{\mathcal{M}_i(x_{i-1}+\delta x_{i-1|i-1}^{\ell})-\mathcal{M}_i(x_{i-1})}{1}+\mathcal{M}_i(x_{i-1})-x_i+v_i^{\ell}=\mathcal{M}_i(x_{i-1}+\delta x_{i-1|i-1}^{\ell})-x_i+v_i^{\ell};$$
hence,
$$x_i+\delta x_{i|i-1}^{\ell}=\mathcal{M}_i(x_{i-1}+\delta x_{i-1|i-1}^{\ell})+v_i^{\ell},$$
which is the same as Eq. () with $x_{i-1}+\delta x_{i-1|i-1}^{\ell}$ in place of $x_{i-1|i-1}^{\ell}$. Similarly, Eq. () becomes, with $\tau=1$,
$$g_i^{\ell}=\mathcal{H}_i(x_i+\delta x_{i|i-1}^{\ell})-\frac{1}{N}\sum_{j=1}^{N}\mathcal{H}_i(x_i+\delta x_{i|i-1}^{j}),$$
which is again the same as Eq. () with $x_i+\delta x_{i|i-1}^{\ell}$ in place of $x_{i|i-1}^{\ell}$. Finally, the innovation term in Eq. () becomes, using Eq. (),
$$\mathcal{H}_i(x_i)+\frac{\mathcal{H}_i(x_i+\delta x_{i|i-1}^{\ell})-\mathcal{H}_i(x_i)}{1}-y_i=\mathcal{H}_i(x_i+\delta x_{i|i-1}^{\ell})-y_i,$$
which is again the same as in Eq. (), with $x_i+\delta x_{i|i-1}^{\ell}$ in place of $x_{i|i-1}^{\ell}$.

Computational results

In this section, we investigate the performance of the EnKS-4DVAR method described in this paper by solving the nonlinear least-squares problem (Eq. ), with the dynamical model chosen as either the Lorenz 63 system or the two-level quasi-geostrophic model. Most of the experiments assess the convergence of the incremental 4DVAR iterations, with the EnKS as the linear solver, in a single assimilation cycle (Sects. , ). We also demonstrate the overall long-term performance over a large number of assimilation cycles on the Lorenz 63 model in Sect. .

We first consider experiments where the regularization is not necessary to guarantee convergence (i.e., $\gamma=0$). The Lorenz 63 equations are used as the forecast model for these experiments. Section  describes the Lorenz 63 model and presents numerical results on the convergence. Using the same model, in Sect. , we investigate the impact of the finite-difference parameter $\tau$, used to approximate the derivatives of the model and observation operators, along the iterations.

Experiments where the regularization is necessary to guarantee convergence are shown in Sect. , where we analyze the impact of the regularization parameter $\gamma$ in the application to the two-level quasi-geostrophic model.

Note that for the experiments presented here, we do not use localization; hence, we choose large ensemble sizes. In all experiments, the regularization covariance is $S_i=I$.

Numerical tests using the Lorenz 63 model

The Lorenz 63 equations are given by the nonlinear system
$$\frac{\mathrm{d}x}{\mathrm{d}t}=-\sigma(x-y),\qquad \frac{\mathrm{d}y}{\mathrm{d}t}=\rho x-y-xz,\qquad \frac{\mathrm{d}z}{\mathrm{d}t}=xy-\beta z,$$
where $x=x(t)$, $y=y(t)$, $z=z(t)$, and $\sigma$, $\rho$, $\beta$ are parameters whose values are chosen as 10, 28, and 8/3, respectively, for the experiments described in this paper. These values result in chaotic behavior with two regimes, as illustrated in Fig. . This figure shows the Lorenz attractor, which has two lobes connected near the origin; the trajectories of the system in this saddle region are particularly sensitive to perturbations. Hence, slight perturbations can divert the subsequent path from one lobe to the other.
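For reference, the right-hand side of this system, together with one classical fourth-order Runge–Kutta step (the fixed-step scheme used later in the cycling experiments; the single-cycle experiments use MATLAB's adaptive ode45 instead), can be written as:

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system with the standard parameters."""
    x, y, z = state
    return np.array([-sigma * (x - y),
                     rho * x - y - x * z,
                     x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for dstate/dt = f(state)."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

A model cycle of length 0.1 time unit then corresponds to composing several such steps.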

The state at time $t$ is denoted by $X_t=[x(t),y(t),z(t)]^\mathrm{T}$, $X_t\in\mathbb{R}^3$.

The Lorenz attractor; initial values x(0)=1, y(0)=1, and z(0)=1.

To evaluate the performance of the EnKS-4DVAR method, we test it using the classical twin experiment technique, which consists of fixing an initial true state, denoted by $\mathrm{truth}_0$, and then integrating it in time using the model to obtain the true state $\mathrm{truth}_i=\mathcal{M}_i(\mathrm{truth}_{i-1})$ at each cycle $i$. We then build the data $y_i$ by applying the observation operator $\mathcal{H}_i$ to the truth at time $i$ and adding a Gaussian perturbation $N(0,R_i)$. Similarly, the background $x_\mathrm{b}$ is sampled from the Gaussian distribution with mean $\mathrm{truth}_0$ and covariance matrix $B$. Then, we try to recover the truth using the observations and the background.
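The twin-experiment setup can be sketched as follows (illustrative NumPy code; operator and variable names are our own):

```python
import numpy as np

def twin_experiment_data(truth0, M, Hfun, R, B, L, rng):
    """Generate synthetic truth, observations, and background.

    truth_i = M(truth_{i-1}); y_i = Hfun(truth_i) + N(0, R) noise;
    x_b ~ N(truth_0, B). Observations start at cycle 1.
    """
    truth = [np.asarray(truth0, dtype=float)]
    obs = [None]                      # no observation at time 0
    for _ in range(L):
        truth.append(M(truth[-1]))
        noise = rng.multivariate_normal(np.zeros(R.shape[0]), R)
        obs.append(Hfun(truth[-1]) + noise)
    xb = rng.multivariate_normal(truth[0], B)
    return truth, obs, xb
```

The assimilation method is then judged by how closely it recovers `truth` from `obs` and `xb` alone.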

Convergence of the iterations

We perform numerical experiments without model error. The initial truth is set to $\mathrm{truth}_0=[1,1,1]$, and the background covariance is chosen as the identity matrix of order 3, i.e., $B=I_3$. The model is advanced in cycles of 0.1 time unit. Within each cycle, the differential equations are solved by the adaptive Runge–Kutta method implemented as the MATLAB function ode45, with default parameter values. The assimilation time window length is $L=50$ cycles (5 time units total). The observation operator is defined as $\mathcal{H}_i(x,y,z)=[x^2,y^2,z^2]$. At each time $i$, the observations are constructed as $y_i=\mathcal{H}_i(\mathrm{truth}_i)+v_i$, where $v_i$ is sampled from $N(0,R)$ with $R=I_3$. Observations are taken in every cycle ($i=1,\dots,50$). The ensemble size is fixed at $N=100$.

Root square error given by Eq. () for the first five Gauss–Newton iterations for the Lorenz 63 problem. The initial conditions for the truth are $x(0)=1$, $y(0)=1$, and $z(0)=1$. The cycle length is $\mathrm{d}t=0.1$ time unit. The observations are the full state at each time step. The ensemble size is $N=100$. The assimilation window length is $L=50$ cycles. The finite-difference parameter is $\tau=10^{-3}$.

Box plots of objective function values for the Lorenz 63 problem. From left to right and from top to bottom, the panels correspond to the results of the first, second, third, and fourth iterations, respectively. The whole state is observed. The ensemble size is 50. The assimilation window is 50 cycles. In each box, the central line marks the median (red line), the edges are the 25th and 75th percentiles (blue lines), the whiskers extend to the most extreme data points that the plot algorithm considers not to be outliers (black lines), and the outliers are plotted individually (red dots).

Same as Fig. , but for the fifth, sixth, seventh and eighth iteration, respectively.

Figure  shows the root square error (RSE) for the first five iterations, defined as
$$\mathrm{RSE}_i^{(j)}=\sqrt{\frac{1}{n}\bigl(\mathrm{truth}_i-x_i^{(j)}\bigr)^\mathrm{T}\bigl(\mathrm{truth}_i-x_i^{(j)}\bigr)},\qquad j=1,\dots,5,$$
where $\mathrm{truth}_i$ is the true state vector at time $i$, $x_i^{(j)}$ is the $j$th iterate at time $i$, and $n$ is the length of $x_i$. Table  shows the root mean square error (RMSE) for each iterate, given by
$$\mathrm{RMSE}^{(j)}=\frac{1}{L}\sum_{i=0}^{L}\mathrm{RSE}_i^{(j)},\qquad j=1,\dots,5.$$
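The two error measures can be computed directly from the definitions (illustrative NumPy helpers; `rmse` here simply averages the per-time RSE values over the window):

```python
import numpy as np

def rse(truth, x):
    """Root square error between the truth and one iterate at one time."""
    d = np.asarray(truth) - np.asarray(x)
    return np.sqrt(d @ d / d.size)

def rmse(truths, xs):
    """Average of the per-time root square errors over the window."""
    return np.mean([rse(t, x) for t, x in zip(truths, xs)])
```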

From Table  and Fig. , it can be seen that the iterates converge to the solution, without using regularization. For these experiments, we observe that RMSE is reduced significantly in five iterations. Note that the error does not converge to zero because of the approximation and variability inherent in the ensemble approach.

The root mean square error given by Eq. () for the first six Gauss–Newton iterations for the Lorenz 63 problem. The whole state is observed. The ensemble size is 100. The assimilation window length is 50 cycles. The finite-difference parameter is $10^{-3}$.

Iteration:   1       2       3      4      5      6
RMSE:        20.16   15.37   3.73   2.53   0.09   0.09

Mean of the objective function from 30 runs of the EnKS-4DVAR algorithm for the Lorenz 63 problem and for different values of $\tau$ (the finite-difference parameter). The whole state is observed. The ensemble size is 50. The assimilation window length is 50 cycles.

Iter.   τ=1        τ=10⁻¹     τ=10⁻²     τ=10⁻³     τ=10⁻⁴     τ=10⁻⁵     τ=10⁻⁶
Init    5.61×10⁶   5.61×10⁶   5.61×10⁶   5.61×10⁶   5.61×10⁶   5.61×10⁶   5.61×10⁶
1       1.02×10⁶   1.39×10⁹   3.21×10⁹   3.54×10⁹   3.58×10⁹   3.58×10⁹   3.58×10⁹
2       1.39×10⁶   5.27×10⁷   1.70×10⁸   1.93×10⁸   1.96×10⁸   1.96×10⁸   1.96×10⁸
3       1.32×10⁶   4.14×10⁶   2.99×10⁶   3.69×10⁶   3.76×10⁶   3.77×10⁶   3.77×10⁶
4       1.38×10⁶   5699       3266       4431       4581.31    4594       4598
5       1.55×10⁶   1299       89.22      65.69      65.4442    65.41      65.26
6       1.34×10⁶   830.1      17.08      6.933      6.844      6.856      6.923
7       2.05×10⁶   826.8      10.75      1.885      1.89082    1.8        1.721
8       1.47×10⁶   847.4      10.82      1.68       1.63813    1.547      1.641
The impact of the finite difference parameter

Now we investigate the influence of the finite-difference parameter $\tau$ used to approximate the derivatives of the model and observation operators. We use the same experimental setup as described in the previous section. The numerical results are based on 30 runs with eight iterations for the Lorenz 63 problem, with the following choices of the parameter $\tau$: 1, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$, and $10^{-6}$.

Table  shows the mean of the objective function value as a function of the finite-difference step $\tau$ and the number of iterations. When $\tau=1$, the iterations after the first one do not improve the objective function. However, when $\tau\le 10^{-1}$, the objective function was overall decreasing along the iterations after a large initial increase. Because of the stochastic nature of the algorithm, the objective function does not necessarily decrease in every iteration, and its values eventually fluctuate randomly around a limit value. This stage was reached after at most six iterations, so only eight iterations are shown; further lines (not shown) exhibit the same fluctuating pattern in all columns. The limit value of the objective function decreases with smaller $\tau$ until it stabilizes for $\tau\le 10^{-3}$. Figures  and  show more detailed statistics as box plots of the objective function values. Each panel corresponds to one line of Table .

We can conclude that, for this toy test case at least, the method was insensitive to the choice of $\tau\le 10^{-3}$. This conclusion is similar to earlier findings; the parameter $\tau$ here plays the same role as the $\varepsilon$ used there. It should be noted that a very small $\tau$, for which the problem solved by the smoother is essentially the tangent problem, results in a large increase in the value of the objective function in the first iteration. This is not uncommon in Newton-type methods applied to highly nonlinear problems. Hence, an adaptive method, which decreases $\tau$ gradually, may be of interest. This issue will be studied elsewhere.

Cycling

So far, we have studied the impact of the use of the stochastic solver for a single assimilation window only. Now we test the overall long-term performance. Consider again the Lorenz 63 model (Eq. ), with the parameters $\sigma=10$, $\rho=28$, $\beta=8/3$. This time, we use the Runge–Kutta method of order 4 with a time step of 0.01 time unit, the same parameter setup and similar testing as in earlier work. We perform the usual twin model experiment. The initial truth state $Y_0$ is generated from the $N(0,I_3)$ distribution, and the initial forecast state is then simulated by sampling from $N(Y_0,I_3)$. Both states are advanced for a burn-in period of 50 000 model time steps. We use the nonlinear observation operator $h(x,y,z)=[x^3,y^3,z^3]$ with observation error generated from $N(0,\sigma^2I_3)$, where $\sigma^2=8$, and $\tau=10^{-4}$. The cycle length $\Delta t$ between two available observations varies from 0.05 time unit, when the model is nearly linear, to 0.55 time unit, when the model is strongly nonlinear. We use ensemble size 10. After running multiple simulations, we found suitable values of the parameters of the method to be 25 iterations and the penalty coefficient $\gamma=10^{-9}$ when $\Delta t=0.05$ and $\gamma=1000$ otherwise. The length of the assimilation window is $L=6$; i.e., six observation vectors are assimilated at once. Each observation vector is assimilated only once; i.e., the assimilation windows do not overlap. To create the initial ensemble at the beginning of each iteration, we use a background covariance formed as a weighted average of the sample covariance from the last iteration in the previous assimilation window and the identity matrix, with weights 0.99 for the sample covariance and 0.01 for the identity. The model error covariance in each cycle is $Q=0.01I_3$. The experiment was run for 100 000 observation cycles.

We also compare the proposed method with the standard EnKF with ensemble size 10, where the initial ensemble is created after the burn-in period by adding perturbations sampled from $N(0,I_3)$. For stability reasons and to preserve the covariance between ensemble members, we add noise sampled from $N(0,0.01I_3)$ after advancing the ensemble; the necessity of such covariance inflation has been pointed out before. The EnKF algorithm is run whenever new observations are available.

Figure  shows that the proposed method has a significantly smaller RMSE than the EnKF when the time between observations is larger and thus the behavior of the model is nonlinear. Only when the cycle length between observations is 0.05 time unit, i.e., the model behavior is nearly linear, does the EnKF give a result comparable to the proposed method.

Numerical tests using a two-layer quasi-geostrophic (QG) model

The EnKS-4DVAR algorithm has been implemented in the Object Oriented Prediction System (OOPS) , a data assimilation framework developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). Numerical experiments solving the weak-constraint data assimilation problem (Eq. ) by EnKS-4DVAR with regularization are performed by using the simple two-layer quasi-geostrophic model available in the OOPS platform. Numerical results are presented in Sect. .

Comparison of the RMSE between EnKF and EnKS-4DVAR from the twin experiment for the Lorenz 63 model. EnKS-4DVAR has better performance for larger time intervals between the observations, as the model becomes more nonlinear. See Sect.  for further details.

A two-layer quasi-geostrophic model

The two-layer quasi-geostrophic channel model is widely used in theoretical atmospheric studies, since it is simple enough for numerical calculations and it adequately captures an important aspect of large-scale dynamics in the atmosphere.

The two-layer quasi-geostrophic model equations are based on the non-dimensional quasi-geostrophic potential vorticity, whose evolution represents large-scale circulations of the atmosphere. The quasi-geostrophic potential vorticity on the first (upper) and second (lower) layers can be written, respectively, as
\begin{align}
q_1 &= \nabla^2 \psi_1 - \frac{f_0^2 L^2}{g' H_1}\,(\psi_1 - \psi_2) + \beta y, \\
q_2 &= \nabla^2 \psi_2 - \frac{f_0^2 L^2}{g' H_2}\,(\psi_2 - \psi_1) + \beta y + R_s,
\end{align}
where \psi_1 and \psi_2 are the stream functions, \nabla^2 is the 2-D Laplacian, R_s represents orography or heating, \beta is the (non-dimensionalized) northward variation of the Coriolis parameter at the fixed latitude y, and f_0 is the Coriolis parameter at the southern boundary of the domain. L is the typical length scale of the motion we wish to describe, H_1 and H_2 are the depths of the two layers, and g' = g\,\Delta\theta/\bar{\theta} is the reduced gravity, where \bar{\theta} is the mean potential temperature and \Delta\theta is the difference in potential temperature across the layer interface. The non-dimensional variables are defined by
\begin{equation}
t = \tilde{t}\,\frac{\bar{U}}{L}, \quad x = \frac{\tilde{x}}{L}, \quad y = \frac{\tilde{y}}{L}, \quad u = \frac{\tilde{u}}{\bar{U}}, \quad v = \frac{\tilde{v}}{\bar{U}}, \quad \beta = \frac{\beta_0 L^2}{\bar{U}},
\end{equation}
where t denotes time, \bar{U} is a typical velocity scale, x and y are the eastward and northward coordinates, respectively, u and v are the horizontal velocity components, \beta_0 is the northward derivative of the Coriolis parameter, and the tilde denotes the dimensional quantities.

Potential vorticity in each layer is conserved, as described by
\begin{equation}
\frac{D_i q_i}{Dt} = 0, \quad i = 1, 2,
\end{equation}
where D_i/Dt is the total derivative, defined by
\begin{equation}
\frac{D_i}{Dt} = \frac{\partial}{\partial t} + u_i \frac{\partial}{\partial x} + v_i \frac{\partial}{\partial y},
\end{equation}
and u_i = -\partial\psi_i/\partial y and v_i = \partial\psi_i/\partial x are the horizontal velocity components in each layer. Therefore, the potential vorticity at each time step is determined by using the conservation of potential vorticity given by Eq. (). In this process, time stepping consists of a simple first-order semi-Lagrangian advection of potential vorticity.
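A first-order semi-Lagrangian step of the kind described above can be sketched in one dimension, assuming a periodic grid and linear interpolation at the departure points; this is an illustration of the scheme, not the OOPS implementation.

```python
import numpy as np

def semi_lagrangian_1d(q, u, dt, dx):
    """First-order semi-Lagrangian advection of q by velocity u on a periodic grid.

    Each grid point traces its departure point x - u*dt backwards in time and
    takes the value of q there by linear interpolation.
    """
    n = q.size
    x = np.arange(n, dtype=float)
    depart = (x - u * dt / dx) % n        # departure points, in grid units
    i0 = np.floor(depart).astype(int)     # left neighbour of the departure point
    i1 = (i0 + 1) % n                     # right neighbour (periodic wrap)
    w = depart - np.floor(depart)         # linear interpolation weight
    return (1 - w) * q[i0] + w * q[i1]
```

With an exact Courant number of 1 (u*dt/dx = 1), the scheme shifts the field by exactly one grid point; for fractional displacements it interpolates, which introduces the diffusive error typical of first-order advection.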

Given the potential vorticity at a fixed time, Eq. () can be solved for the stream function at each grid point, and the velocity fields are then obtained through Eq. (). The equations are solved by using periodic boundary conditions in the west–east direction and Dirichlet boundary conditions in the north–south direction. For the experiments in this paper, we choose L = 10^6 m, \bar{U} = 10 m s^{-1}, H_1 = 6000 m, H_2 = 4000 m, f_0 = 10^{-4} s^{-1}, and \beta_0 = 1.5×10^{-11} m^{-1} s^{-1}. For more details on the model and its solution, we refer to .

The domain for the experiments is 12 000 km by 6300 km for both layers. The horizontal discretization consists of 40×20 points, so that the east–west and north–south resolution is approximately 300 km. The dimension of the state vector of the model is then 1600. Note that the state vector is defined only in terms of the stream function.

Experimental setup

The performance of EnKS-4DVAR with regularization is analyzed by using twin experiments (Sect. ).

The truth is generated from a model with layer depths of D1=6000m and D2=4000m, and the time step is set to 300s, whereas the assimilating model has layer depths of D1=5500m and D2=4500m, and the time step is set to 3600s. These differences in the layer depths and the time step provide a source of model error.

RMSE values calculated by Eq. () along the incremental 4DVAR and EnKS-4DVAR iterations for different values of the regularization parameter γ, for the two-level quasi-geostrophic model (Sect. ).

Iter.   4DVAR    γ=0      γ=10^{-3}  γ=0.1    γ=1      γ=10     γ=100    γ=500    γ=10^3
Init    5.3026   5.3026   5.3026     5.3026   5.3026   5.3026   5.3026   5.3026   5.3026
1       3.9666   3.9713   3.9716     4.0274   4.4051   4.7046   4.8194   4.8774   4.9028
2       3.8167   3.8879   3.8903     3.8388   4.1949   4.3618   4.7136   4.8233   4.8514
3       3.8394   3.9703   3.9539     4.0927   4.1092   4.4898   4.6993   4.8093   4.8222
4       4.3390   4.1093   4.1891     3.9588   4.0232   4.4697   4.7348   4.7781   4.7771
5       3.9726   3.7723   3.7337     3.9000   3.9490   4.3866   4.7104   4.7802   4.7729
6       3.8984   3.8202   3.7302     3.8222   3.8045   4.3587   4.6785   4.7800   4.7624
7       3.7553   3.8873   3.8004     3.8619   4.0068   4.3369   4.6562   4.7742   4.7533
8       4.0050   3.8183   4.1342     4.0614   3.7866   4.3147   4.6521   4.7578   4.7514
9       3.8429   3.7907   4.0450     3.7049   3.7159   4.2962   4.6358   4.7436   4.7409
10      3.8759   3.7177   4.0983     3.7242   3.6996   4.2805   4.6280   4.7239   4.7327

For all the experiments presented here, observations of non-dimensional stream function, vector wind, and wind speed were taken from a truth run of the model at 100 points randomly distributed over both levels. Observations were taken every 12 h. We note that the number of observations is much smaller than the dimension of the state vector. Observation errors were assumed to be mutually independent and uncorrelated in time. The standard deviations (SD) were chosen to be 0.4 for the stream function observation error, 0.6 for vector wind, and 1.2 for wind speed. The observation operator is the bilinear interpolation of the model fields to the horizontal observation locations.
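The bilinear observation operator can be sketched as follows for a single 2-D field. The fractional grid coordinates of the observation points and the clamping at the domain edge are our simplifications (the actual model is periodic in the west–east direction).

```python
import numpy as np

def bilinear_obs(field, xs, ys):
    """Bilinear interpolation of a 2-D gridded field to observation points.

    field: (ny, nx) array; xs, ys: fractional grid coordinates of the
    observation locations (0 <= xs <= nx-1, 0 <= ys <= ny-1).
    """
    i0 = np.floor(ys).astype(int)
    j0 = np.floor(xs).astype(int)
    i1 = np.minimum(i0 + 1, field.shape[0] - 1)  # clamp at the north edge
    j1 = np.minimum(j0 + 1, field.shape[1] - 1)  # clamp at the east edge
    wy, wx = ys - i0, xs - j0                    # interpolation weights
    return ((1 - wy) * (1 - wx) * field[i0, j0]
            + (1 - wy) * wx * field[i0, j1]
            + wy * (1 - wx) * field[i1, j0]
            + wy * wx * field[i1, j1])
```

Bilinear interpolation reproduces any field that is linear in x and y exactly, which makes the operator easy to verify.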

The background error covariance matrix (matrix B) and the model error covariances (matrices Qi) used in these experiments correspond to vertical and horizontal correlations. The vertical and horizontal structures are assumed to be separable. In the horizontal plane, covariance matrices correspond to isotropic, homogeneous correlations of a stream function with Gaussian spatial structure obtained from a fast Fourier transform approach . For the background covariance matrix B, the SD and the horizontal correlation length scale in these experiments were set to 0.8 and 10^6 m, respectively. For the model error covariance matrices Qi, the SD and the horizontal correlation length scale were set to 0.2 and 2×10^6 m, respectively. The vertical correlation is assumed to be constant over the horizontal grid, and the correlation coefficient value between the two layers was taken as 0.5 for Qi and 0.2 for B.
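A covariance with homogeneous Gaussian correlations can be sampled efficiently in Fourier space, since a stationary covariance on a periodic grid is diagonalized by the FFT. The following 1-D sketch illustrates the idea behind the FFT approach mentioned above; the 2-D, two-layer construction with a separable vertical correlation follows the same pattern. The function name and normalization are ours.

```python
import numpy as np

def sample_gaussian_field(n, length_scale, sd, rng):
    """Draw a periodic 1-D random field with Gaussian spatial correlation.

    The circulant covariance matrix is diagonal in Fourier space, so a sample
    is the inverse FFT of complex white noise scaled by the square root of the
    covariance spectrum.
    """
    r = np.minimum(np.arange(n), n - np.arange(n))             # periodic distance
    cov_row = sd**2 * np.exp(-0.5 * (r / length_scale) ** 2)   # first row of C
    spectrum = np.maximum(np.fft.fft(cov_row).real, 0.0)       # eigenvalues of C
    white = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    field = np.fft.ifft(np.sqrt(spectrum) * white).real
    return field * np.sqrt(n)  # normalization so that Var(field_j) = sd**2
```

The cost is O(n log n) per sample, compared with O(n^2) (after an O(n^3) factorization) for a dense Cholesky-based sampler, which is what makes the FFT approach attractive for gridded fields.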

Numerical results

We perform one cycle for the experiments. The window length is set to 10 days, over which the nonlinearity becomes significant (Fig. 2), with two sub-windows of 5 days (L=2). No localization is used in the experiments; consequently, the ensemble size is chosen to be large, N = 30 000. Therefore, this test is only a partial assessment. Localization and cycling in the QG model are beyond the scope of this paper. For the finite difference approximation, the parameter τ is set to 10^{-4} for all experiments. We have performed experiments for incremental 4DVAR and EnKS-4DVAR. The incremental 4DVAR method used conjugate gradients to solve the linearized problem with exact tangent and adjoint models in each iteration, with no ensembles involved. The numerical results are presented as follows.
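The role of the finite-difference parameter τ can be illustrated by the directional derivative approximation it controls; this is our simplified sketch of the idea, not the operator approximation as implemented in OOPS.

```python
import numpy as np

def fd_tangent(model, x0, dx, tau=1e-4):
    """Directional finite-difference approximation of the tangent linear model.

    M'(x0) dx is approximated by (M(x0 + tau*dx) - M(x0)) / tau. For tau = 1
    this reduces to a plain difference of perturbed and unperturbed runs, as
    used by the standard EnKS; small tau approaches the true tangent linear.
    """
    return (model(x0 + tau * dx) - model(x0)) / tau
```

The truncation error is O(τ) for a smooth model, so decreasing τ over the iterations trades ensemble-spread robustness for linearization accuracy.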

Figure  shows the objective function values along iterations of the incremental 4DVAR method. The objective function oscillates with the iteration number; therefore, the incremental 4DVAR method without regularization diverges. This divergence is due to the highly nonlinear behavior of the model for a long window (10 days). In such a case, as explained in Sect. , a convergence to a stationary point can be recovered by controlling the step, which is done by introducing an additional regularization term in this study. In order to see the effect of this regularization, we performed EnKS-4DVAR with different values of the regularization parameter γ. Figure  shows the objective function values along iterations for eight different choices of γ. RMSE values along the iterations for the same experiments performed with 4DVAR and EnKS-4DVAR are presented in Table .

Objective function values along incremental 4DVAR iterations for the two-level quasi-geostrophic problem from Sect. .

Objective function values along EnKS-4DVAR with regularization iterations for the two-level quasi-geostrophic problem (Sect. ).

It can be seen from Fig.  that when γ=0, the iterations diverge as expected, since we do not use regularization and we only approximate the linearized subproblem using ensembles. For small values of γ (e.g., γ ≤ 10^{-1}), the objective function is not monotonically decreasing; hence, the iterations still diverge even though we use the regularization. Therefore, small values of γ cannot guarantee convergence. For large values of γ (e.g., γ ≥ 10), we observe a decrease in the objective function along the iterations. Moreover, the fastest decrease in the objective function is obtained for γ=10.

Looking at the RMSE values in Table , we can see that increasing γ beyond an optimal value results in higher RMSE values, and the reduction in the RMSE becomes very slow. In all cases, the RMSE values oscillate along the iterations. We note that all RMSE values are lower than the initial RMSE value.

In conclusion, when regularization is used, the choice of the regularization parameter γ is crucial to ensure convergence. For small values of γ, the method can still diverge, while for large values of γ, the objective function decreases, but slowly (and many iterations may be needed to attain some predefined decrease). On the other hand, small γ values result in small RMSE values with oscillation along the iterations, and the RMSE decreases slowly for larger values of γ. Therefore, the regularization parameter should be neither "very small" nor "very large". A γ adapted over the iterations can be a better compromise, which will be explored in future studies.
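As an illustration of what such an adaptive choice could look like, the following sketch applies the gain-ratio test from standard Levenberg–Marquardt practice; the threshold and update factors are illustrative defaults and are not the scheme evaluated in this paper.

```python
def update_gamma(gamma, actual_decrease, predicted_decrease,
                 eta=0.25, shrink=0.5, grow=4.0):
    """Gain-ratio update of the Levenberg-Marquardt penalty parameter gamma.

    If the step reduced the true objective by at least the fraction eta of
    what the linearized model predicted, relax the regularization to allow
    larger steps; otherwise tighten it so the next iterate stays closer to
    the linearization point.
    """
    if predicted_decrease > 0 and actual_decrease / predicted_decrease > eta:
        return gamma * shrink   # good step: weaken the damping
    return gamma * grow         # poor step: strengthen the damping
```

Because the regularization is implemented as an additional observation in the EnKS, changing γ between iterations only changes the weight of that observation and leaves the solver structure intact.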

Conclusions

We have proposed a stochastic solver for the incremental 4DVAR weak-constraint method. The regularization term added to the Gauss–Newton method, resulting in a globally convergent Levenberg–Marquardt method, maintains the structure of the linearized least-squares subproblem, enabling us to use an ensemble Kalman smoother as a linear solver while simultaneously controlling the convergence. We have formulated the EnKS-4DVAR method and have shown that it is capable of handling strongly nonlinear problems. We have demonstrated that the randomness of the EnKS version used (with perturbed data) eventually limits the convergence to a minimum, but a sufficiently large decrease in the objective function can be achieved for successful data assimilation. Conversely, we suspect that the randomization may help to enrich the supply of search directions over the iterations, as opposed to deterministic methods locked into one low-dimensional subspace, such as the span of a given ensemble.

We have numerically illustrated the new method on the Lorenz 63 model and the two-level quasi-geostrophic model. We have analyzed the impact of the finite differences parameter τ used to approximate the derivatives of the model and observation operators. We have shown that for τ=1, the iterates obtained from EnKS-4DVAR are equivalent to those obtained from the standard EnKS. Based on computational experiments, it may be better to start with the EnKS (i.e., τ=1) and then to decrease τ in further iterations.

We have demonstrated long-term stability of the method on the Lorenz 63 model and shown that it achieves a lower RMSE than the standard EnKF for a highly nonlinear problem. This, however, required some parameter tuning, in particular of the data error variance.

For the second part of the experiments, we have shown the performance of the EnKS-4DVAR method with regularization on the two-level quasi-geostrophic problem, one of the widely used models in theoretical atmospheric studies, since it is simple enough for numerical calculations yet adequately captures an important aspect of large-scale atmospheric dynamics. We have observed that the incremental 4DVAR method does not converge for a long assimilation window, and that the regularization is necessary to guarantee convergence. We have concluded that the choice of the regularization parameter is crucial to ensure convergence, and different choices of this parameter change the rate of decrease in the objective function. In summary, an adaptive regularization parameter can be a better compromise to achieve an approximate solution in a reasonable number of iterations.

The choice of the parameters used in our approach is of crucial importance for the computational cost of the algorithm, for instance for the number of iterations needed to obtain a desired reduction. A more detailed exploration of the best strategies to adapt these parameters over the course of the iterations will be studied elsewhere.

The base method, used in the computational experiments here, uses the sample covariance. However, there is a priori nothing to prevent the use of more sophisticated variants of the EnKS with localization and covariance inflation, or of square root filters instead of the EnKS with data perturbation, as is done in related methods in the literature. These issues, as well as the performance on larger and more realistic problems, will be studied elsewhere.

Acknowledgements

This research was partially supported by Fondation STAE project ADTAO, the Czech Science Foundation under grant GA13-34856S, and the US National Science Foundation under grant DMS-1216481. A part of this work was done when Jan Mandel was visiting INP-ENSEEIHT and CERFACS, and when Elhoucine Bergou, Serge Gratton, and Ivan Kasanický were visiting the University of Colorado Denver. The authors would like to thank the editor, Olivier Talagrand, reviewer Emmanuel Cosme, and an anonymous reviewer for their comments, which contributed to the improvement of this paper.

Edited by: O. Talagrand
Reviewed by: E. Cosme and one anonymous referee