The ensemble Kalman smoother (EnKS) is used as a linear least-squares solver in the Gauss–Newton method for the large nonlinear least-squares system in incremental 4DVAR. The ensemble approach is naturally parallel over the ensemble members and no tangent or adjoint operators are needed. Furthermore, adding a regularization term results in replacing the Gauss–Newton method, which may diverge, by the Levenberg–Marquardt method, which is known to be convergent. The regularization is implemented efficiently as an additional observation in the EnKS. The method is illustrated on the Lorenz 63 model and a two-level quasi-geostrophic model.

Four-dimensional variational data assimilation (4DVAR) is
a dominant data assimilation method used in weather forecasting centers
worldwide. 4DVAR attempts to reconcile model and data variationally, by
solving a large weighted nonlinear least-squares problem. The unknown is
a vector of system states at the discrete points in time at which the data
are given. The objective function minimized is the sum of the squared
difference between the initial state and a known background state at the
initial time, plus the squared differences between the values of the
observation operator and the data at every observation time. In the
weak-constraint 4DVAR
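
The objective function described above can be written, in commonly used notation (the symbols here are generic assumptions, not the paper's exact ones: $x_b$ the background, $y_k$ the data, $H_k$ the observation operators, $M_k$ the model, and $B$, $R_k$, $Q_k$ the error covariances):

```latex
J(x_0,\dots,x_N)
= \tfrac12\,\| x_0 - x_b \|^{2}_{B^{-1}}
+ \tfrac12 \sum_{k=0}^{N} \| H_k(x_k) - y_k \|^{2}_{R_k^{-1}}
+ \tfrac12 \sum_{k=1}^{N} \| x_k - M_k(x_{k-1}) \|^{2}_{Q_k^{-1}} ,
```

where the last sum penalizes the model error and is the term that distinguishes the weak-constraint formulation.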

In the incremental approach

The Kalman filter is a sequential Bayesian estimation of the Gaussian state
of a linear system at a sequence of discrete time points. At each of the time
points, the use of the Bayes theorem results in an update of the state,
represented by its mean and covariance. The Kalman smoother considers all
states within an assimilation time window to be a large composite state.
Consequently, the Kalman smoother can be obtained from the Kalman filter by
simply applying the same update as in the filter to the past states as well.
However, historically, the focus was on efficient short recursions
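
The update described above can be sketched, in standard Kalman filter notation (a generic textbook form, not the paper's exact formulation):

```python
import numpy as np

def kf_update(x, P, y, H, R):
    """One Kalman filter analysis step: update mean x and covariance P
    of a Gaussian state with observation y, linear operator H, noise cov R."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_a = x + K @ (y - H @ x)              # updated mean
    P_a = P - K @ H @ P                    # updated covariance
    return x_a, P_a
```

The smoother then follows by stacking all states in the window into one composite state and applying the same update to it, which also updates the past states.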

It is well known that weak-constraint 4DVAR is equivalent to the Kalman
smoother in the linear case and when all observations are in the assimilation
window. Use of the Kalman smoother to solve the linear least squares in the
Gauss–Newton method is known as the iterated Kalman smoother, and
considerable improvements can be obtained over running the Kalman smoother
only once

The Kalman filter and smoother require maintenance of the covariance of the
state, which is not feasible for large systems, such as in numerical weather
prediction. Hence, the ensemble Kalman filter (EnKF) and ensemble Kalman
smoother (EnKS)

In this paper, we use the EnKS as a linear least-squares solver in 4DVAR. The
EnKS is implemented in the physical space and with randomization. The
ensemble approach is naturally parallel over the ensemble members. The rest
of the computational work is relatively cheap compared to the ensemble of
simulations, and parallel dense linear algebra libraries can be used;
however, in high-dimensional systems or for a large lag, the storage
requirements can be prohibitive

Combinations of ensemble and variational approaches have been of considerable
recent interest. Estimating the background covariance for 4DVAR from an
ensemble was one of the first connections

The first methods to use ensembles for more than computing the covariance
minimized the 3DVAR objective function in the analysis step.
The maximum likelihood ensemble filter (MLEF) method by

The iterated ensemble Kalman filter by

It is well known that, for good practical performance, ensemble methods need
to be modified by localization to reduce the sampling error. Ensemble
methods can be localized in multiple ways
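
One common way is covariance tapering: the sample covariance is multiplied elementwise (Schur product) with a distance-dependent correlation function. A minimal sketch on a 1-D grid (a Gaussian taper is used for brevity; the compactly supported Gaspari–Cohn function is the usual practical choice, and the names here are illustrative):

```python
import numpy as np

def localize(P_sample, coords, L):
    """Schur (elementwise) product of a sample covariance with a taper.

    Damps the spurious long-range covariances caused by sampling error
    while leaving the diagonal (the variances) unchanged.
    """
    d = np.abs(coords[:, None] - coords[None, :])   # pairwise distances (1-D grid)
    rho = np.exp(-0.5 * (d / L) ** 2)               # correlation taper in [0, 1]
    return rho * P_sample
```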

The paper is organized as follows. In Sect.

For vectors

The least-squares problem (Eq.

The function minimized in Eq. (

We present the EnKF and EnKS algorithms, essentially following

Initialize

For

advance in time:

The analysis step is

Denote by

The EnKS is obtained by applying the same analysis step as in EnKF
(Eq.

Initialize:

For

advance in time:

Compute the anomalies of the ensemble in the state space and in the
observation space.

The analysis step:

Comparing Eqs. (
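
As a concrete illustration, the perturbed-observation analysis step shared by this family of EnKF/EnKS methods can be sketched as follows (sample-covariance form with a linear observation operator; the names and interfaces are assumptions, not the paper's):

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """Perturbed-observation EnKF analysis step (sample-covariance form).

    X : (n, N) ensemble of states, columns are members
    y : (m,) observation,  H : (m, n) linear observation operator
    R : (m, m) observation error covariance
    """
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)          # state-space anomalies
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)       # observation-space anomalies
    # each member gets its own perturbed copy of the data
    D = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    S = HA @ HA.T / (N - 1) + R                    # innovation covariance
    K = (A @ HA.T / (N - 1)) @ np.linalg.inv(S)    # ensemble Kalman gain
    return X + K @ (D - HX)                        # updated ensemble
```

In the EnKS, the same gain is applied to the ensemble of past states as well, since the composite state over the whole window is updated.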

We apply the EnKS algorithm (Eqs.

The Gauss–Newton method may diverge, but convergence to a stationary point
of Eq. (

Under suitable technical assumptions, the Levenberg–Marquardt method is
guaranteed to converge globally if the regularization parameter
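
In generic notation (the symbols $r$, $J$, $\delta$, the parameter $\gamma$, and the regularization covariance $S$ are assumptions here), the Levenberg–Marquardt subproblem damps the Gauss–Newton linearized least squares, and the damping term has the form of one additional observation of the increment:

```latex
\min_{\delta}\;
\left\| r(x) + J(x)\,\delta \right\|^{2}
\;+\;\gamma\,\| \delta \|^{2}_{S^{-1}},
\qquad
\gamma\,\| \delta \|^{2}_{S^{-1}}
= \left\| 0 - \delta \right\|^{2}_{\left(\gamma^{-1} S\right)^{-1}} .
```

That is, the regularization acts as an extra observation stating that the increment is zero, with error covariance $\gamma^{-1}S$, which is why it can be appended to the EnKS as one more observation without changing the structure of the solver.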

Similarly as in

We obtain the following algorithm

Initialize

Incremental 4DVAR (Eq.

For

Advance the ensemble of increments

Compute the anomalies of the ensemble in the 4-D state space and in the
observation space.

The first analysis step:

If

If

Complete the approximate incremental 4DVAR iteration: update

Note that for small

It can be proven that for small

Proof: indeed, Eq. (

In this section, we investigate the performance of the EnKS-4DVAR method,
described in this paper, by solving the nonlinear least-squares problem
(Eq.

We first consider experiments where the regularization is not necessary to
guarantee the convergence (i.e.,

Experiments where the regularization is necessary to guarantee the
convergence are shown in Sect.

Note that for the experiments presented here, we do not use localization;
hence, we choose large ensemble sizes. In all experiments, the regularization
covariance

The Lorenz 63 equations

The state at time

The
Lorenz attractor; initial values
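
For reference, the Lorenz 63 system with the classical parameter values sigma = 10, rho = 28, beta = 8/3 (the paper's exact settings are assumed) can be integrated with a standard fourth-order Runge–Kutta scheme:

```python
import numpy as np

def lorenz63(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system."""
    x, y, z = u
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, u, dt):
    """One fourth-order Runge-Kutta step of du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```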

To evaluate the performance of the EnKS-4DVAR method, we will test it using
the classical twin experiment technique, which consists in fixing an initial
true state, denoted by
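
The data-generation part of a twin experiment can be sketched generically (all names here are illustrative, not the paper's; `step` advances the model by one time step and `h` is the observation operator):

```python
import numpy as np

def twin_experiment(step, h, x_truth, n_obs, steps_per_obs, obs_std, rng):
    """Run the truth forward and produce noisy synthetic observations of it.

    Returns the truth states and the observations at the observation times;
    the assimilation is then run on the observations and compared to the truth.
    """
    truths, obs = [], []
    x = x_truth.copy()
    for _ in range(n_obs):
        for _ in range(steps_per_obs):
            x = step(x)                              # advance the truth
        truths.append(x.copy())
        y = h(x) + obs_std * rng.standard_normal(np.shape(h(x)))
        obs.append(y)                                # perturbed observation
    return np.array(truths), np.array(obs)
```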

We perform numerical experiments without model
error. The initial truth is set to

Root mean square error given by Eq. (

Box plots of objective function
values for the Lorenz 63 problem. From the left to the right and from the top
to the bottom, the figures correspond to the results of the first, second,
third and fourth iterations, respectively. The whole state is observed.
Ensemble size is

Same as
Fig.

Figure

From Table

The root mean square error given by
Eq. (

Mean of the objective
function from 30 runs of the EnKS-4DVAR algorithm for the Lorenz 63 problem
and for different values of

Now we investigate the influence of the finite differences parameter
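
In this approach, the linearized model applied to an increment is approximated by a finite difference of nonlinear model runs; a generic sketch (`tau` stands for the finite-difference parameter studied here, and the names are illustrative):

```python
import numpy as np

def fd_directional(model, x, dx, tau):
    """Finite-difference approximation of the tangent linear model, M'(x) dx.

    tau too large gives linearization error; tau too small amplifies
    rounding error, hence the interest in its influence.
    """
    return (model(x + tau * dx) - model(x)) / tau
```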

Table

We can conclude that, for this toy test case at least, the method was
insensitive to the choice of

So far, we have studied the impact of the use of the stochastic solver for a
single assimilation window only. Now we test the overall long-term
performance. Consider again the Lorenz 63 model (Eq.

We also compare the proposed method with the standard EnKF with ensemble size

Figure

The EnKS-4DVAR algorithm has been implemented in the Object Oriented
Prediction System (OOPS)

Comparison of the RMSE
between EnKF and EnKS-4DVAR from the twin experiment for the Lorenz 63 model.
EnKS-4DVAR performs better for the larger time interval between the
observations, as the model becomes more nonlinear. See Sect.

The two-layer quasi-geostrophic channel model is widely used in theoretical atmospheric studies, since it is simple enough for numerical calculations yet adequately captures an important aspect of large-scale dynamics in the atmosphere.

The two-layer quasi-geostrophic model equations are based on the
non-dimensional quasi-geostrophic potential vorticity, whose evolution
represents large-scale circulations of the atmosphere. The quasi-geostrophic
potential vorticity on the first (upper) and second (lower) layers can be
written, respectively, as

Potential vorticity in each layer is conserved and thus is described by
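
In a commonly used non-dimensional form (the exact scaling, the coefficients $F_1$, $F_2$, $\beta$, and the lower-layer orography term $R_s$ are assumptions here, not the paper's exact formulation), the layer potential vorticities and their conservation read:

```latex
q_1 = \nabla^2 \psi_1 - F_1(\psi_1 - \psi_2) + \beta y, \qquad
q_2 = \nabla^2 \psi_2 - F_2(\psi_2 - \psi_1) + \beta y + R_s,
```

```latex
\frac{D q_i}{D t}
= \frac{\partial q_i}{\partial t}
+ u_i \frac{\partial q_i}{\partial x}
+ v_i \frac{\partial q_i}{\partial y} = 0,
\qquad
(u_i, v_i) = \left( -\frac{\partial \psi_i}{\partial y},\;
                     \frac{\partial \psi_i}{\partial x} \right),
```

where $\psi_i$ is the stream function in layer $i$ and the advecting wind is geostrophic.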

Given the potential vorticity at a fixed time, Eq. (

The domain for the experiments is 12 000

The performance of EnKS-4DVAR with regularization is analyzed by using twin
experiments (Sect.

The truth is generated from a model with layer depths of

RMSE values calculated by
Eq. (

For all the experiments presented here, observations of the non-dimensional
stream function, vector wind and wind speed were taken from a truth run of
the model at

The background error covariance matrix (matrix

We perform one cycle for the experiments. The window length is set to

Figure

Objective function values along incremental
4DVAR iterations for the two-level quasi-geostrophic problem from
Sect.

Objective function values along
EnKS-4DVAR with regularization iterations for the two-level quasi-geostrophic
problem (Sect.

It can be seen from Fig.

If we look at the RMSE values from Table

In conclusion, when the regularization is used, the choice of the
regularization parameter

We have proposed a stochastic solver for the incremental weak-constraint 4DVAR method. The regularization term added to the Gauss–Newton method, resulting in a globally convergent Levenberg–Marquardt method, maintains the structure of the linearized least-squares subproblem, enabling us to use an ensemble Kalman smoother as a linear solver while simultaneously controlling the convergence. We have formulated the EnKS-4DVAR method and have shown that it is capable of handling strongly nonlinear problems. We have demonstrated that the randomness of the EnKS version used (with perturbed data) eventually limits the convergence to a minimum, but a sufficiently large decrease in the objective function can be achieved for successful data assimilation. On the other hand, we suspect that the randomization may help to increase the supply of search directions over the iterations, as opposed to deterministic methods locked into one low-dimensional subspace, such as the span of one given ensemble.

We have numerically illustrated the new method on the Lorenz 63 model and the
two-level quasi-geostrophic model. We have analyzed the impact of the finite
differences parameter

We have demonstrated long-term stability of the method on the Lorenz 63 model and shown that it achieves lower RMSE than the standard EnKF for a highly nonlinear problem. This, however, required some parameter tuning, in particular of the data error variance.

For the second part of the experiments, we have shown the performance of the EnKS-4DVAR method with regularization on the two-level quasi-geostrophic problem, a model widely used in theoretical atmospheric studies. We have observed that the incremental 4DVAR method does not converge for a long assimilation window, and that the regularization is necessary to guarantee convergence. The choice of the regularization parameter is crucial to ensure the convergence, and different choices of this parameter change the rate of decrease in the objective function. In summary, an adaptive regularization parameter may be a good compromise to achieve the approximate solution in a reasonable number of iterations.

The choice of the parameters used in our approach is of crucial importance for the computational cost of the algorithm, for instance for the number of iterations needed to obtain a desired reduction of the objective function. A more detailed exploration of the best strategies to adapt these parameters over the course of the iterations will be studied elsewhere.

The base method, used in the computational experiments here, uses the sample covariance. However, there is a priori nothing to prevent the use of more sophisticated variants of the EnKS with localization and covariance inflation, or of square root filters instead of the EnKS with data perturbation, as is done in related methods in the literature. These issues, as well as the performance on larger and realistic problems, will be studied elsewhere.

This research was partially supported by Fondation STAE project ADTAO, the Czech Science Foundation under grant GA13-34856S, and the US National Science Foundation under grant DMS-1216481. A part of this work was done when Jan Mandel was visiting INP-ENSEEIHT and CERFACS, and when Elhoucine Bergou, Serge Gratton, and Ivan Kasanický were visiting the University of Colorado Denver. The authors would like to thank the editor, Olivier Talagrand, reviewer Emmanuel Cosme, and an anonymous reviewer for their comments, which contributed to the improvement of this paper.

Edited by: O. Talagrand
Reviewed by: E. Cosme and one anonymous referee