The behavior of an iterative ensemble-based data assimilation algorithm is discussed. The ensemble-based method for variational data assimilation problems, referred to as the 4D ensemble variational method (4DEnVar), is a useful tool for data assimilation. Although 4DEnVar is derived from a linear approximation, highly uncertain problems, in which the system nonlinearity is significant, are solved by applying the method iteratively. However, ensemble-based methods seek the solution within a lower-dimensional subspace spanned by the ensemble members, so it is not obvious how an algorithm that relies on this lower-dimensional approximation can solve high-dimensional problems. In the present study, an ensemble-based iterative algorithm is reformulated so that its behavior in high-dimensional nonlinear problems can be analyzed. The conditions for monotonic convergence to a local maximum of the objective function are discussed in a high-dimensional context. It is shown that the ensemble-based algorithm can solve high-dimensional problems by distributing the ensemble in a different subspace at each iteration. These findings are also supported experimentally.

The 4D ensemble variational method (4DEnVar;

The 4DEnVar algorithm is derived from a low-dimensional linear approximation of the high-dimensional nonlinear system model. If the uncertainties in the state variables are small, the solution can be found within the range where a linear approximation is valid. However, geophysical systems are often highly uncertain. If the scale of uncertainty is much larger than the range of linearity, a linear approximation is not justified. In atmospheric applications, uncertainty can usually be reduced by allowing a sufficient spin-up time. In some geophysical applications, however, it is difficult to obtain an observation sequence long enough to allow spin-up. For example, in data assimilation for the interior of the Earth, such as lithospheric plates (e.g.,

Several studies have suggested that estimation in nonlinear problems can be improved by iterative algorithms in which the ensemble is updated at each iteration (e.g.,

The present study aims to reformulate an ensemble-based iterative algorithm in order to analyze its behavior in high-dimensional nonlinear problems.
We then explore the conditions for achieving monotonic convergence to a local maximum of the objective function in a high-dimensional nonlinear context. Here, monotonic convergence means that the discrepancies between the estimates and the observations are reduced at every iteration. It is shown that the algorithm can attain a satisfactory result in high-dimensional problems if the ensemble is distributed in a different subspace at each iteration. This study was originally motivated by data assimilation into a geodynamo model to which the author contributed
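The point that an ensemble of small, fixed size can still reach every direction of a high-dimensional state space, provided it occupies a different subspace at each iteration, can be illustrated with a short NumPy sketch. It assumes, for illustration only, that a fresh Gaussian ensemble is drawn at every iteration; the helper `combined_rank` is hypothetical and is not taken from this paper. The anomalies of one ensemble of N members span at most N - 1 directions, but the accumulated span grows until it covers the whole space.

```python
import numpy as np

def combined_rank(n_state, n_members, n_iterations, seed=0):
    """Rank of the span of all ensemble anomalies accumulated over iterations.

    Hypothetical illustration: each iteration draws a fresh random ensemble,
    so each iteration contributes a (generically) new subspace.
    """
    rng = np.random.default_rng(seed)
    blocks, ranks = [], []
    for _ in range(n_iterations):
        ensemble = rng.standard_normal((n_state, n_members))
        # Removing the ensemble mean leaves at most n_members - 1 directions.
        anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
        blocks.append(anomalies)
        # Dimension of the subspace explored by all ensembles drawn so far.
        ranks.append(int(np.linalg.matrix_rank(np.hstack(blocks))))
    return ranks

print(combined_rank(n_state=40, n_members=10, n_iterations=6))
```

With 40 state variables and 10 members, each single ensemble spans only 9 directions, while the accumulated rank grows by 9 per iteration until it saturates at 40.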

In the following, the system state at time

The maximization of the objective function

For convenience, we define the following matrix

The approximate objective function
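For orientation, a single linearized ensemble-variational update can be sketched in NumPy. It follows the commonly used 4DEnVar form in which the initial-state correction is written as x = x_b + X w, the tangent-linear response along each ensemble perturbation is replaced by a finite difference of the nonlinear forward map, and the resulting quadratic cost in w is minimized exactly (equivalent to maximizing the corresponding Gaussian log-posterior). The function name `envar_step`, the scaling P ≈ X Xᵀ, and the combined forward map `g` (model integration followed by the observation operator) are assumptions of this sketch, not the paper's equations.

```python
import numpy as np

def envar_step(g, x_b, X, y, R):
    """One linearized ensemble-variational update (illustrative sketch).

    g   : forward map from an initial state (n,) to all observed values (m,),
          i.e. model integration followed by the observation operator (assumed).
    x_b : current estimate of the initial state, shape (n,).
    X   : ensemble perturbations, shape (n, N), with prior covariance ~= X @ X.T.
    y   : stacked observations, shape (m,); R : observation error covariance.
    """
    g_b = g(x_b)
    # A finite difference along each perturbation approximates the tangent-linear
    # response; it is accurate only while the perturbation stays in the linear range.
    Y = np.column_stack([g(x_b + X[:, i]) - g_b for i in range(X.shape[1])])
    d = y - g_b
    Rinv_Y = np.linalg.solve(R, Y)  # R^{-1} Y without forming R^{-1}
    # Minimize 0.5 w^T w + 0.5 (d - Y w)^T R^{-1} (d - Y w) over the weights w.
    A = np.eye(X.shape[1]) + Y.T @ Rinv_Y
    w = np.linalg.solve(A, Rinv_Y.T @ d)  # R is symmetric, so Rinv_Y.T = Y^T R^{-1}
    return x_b + X @ w
```

If `g` is linear and `X` is a square root of the background covariance B, this update coincides with the familiar gain form x_b + B Gᵀ(G B Gᵀ + R)⁻¹(y - G x_b); for nonlinear `g`, the finite differences are meaningful only while the perturbations remain within the range of linearity.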

Since Eqs. (

When the initially prepared ensemble is used, it is unlikely that a better solution than Eq. (

In the following, we combine the vectors of the whole time sequence from

The form of

We can consider various ways to obtain an ensemble satisfying Eq. (

If the ensemble is updated according to Eq. (

The iterative algorithm is summarized in Algorithm 1. The procedures in this iterative algorithm are similar to those in the ensemble-based multiple data assimilation method
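Algorithm 1 itself is defined by the paper's equations; the following is only a hedged, generic sketch of the idea behind such an iteration: at every iteration a fresh set of directions is drawn, the forward map is linearized along them by finite differences, and a Gauss-Newton step restricted to their span is taken. The random redraw, the finite-difference size `eps`, the backtracking safeguard, and the names `cost` and `iterate_ensemble` are illustrative assumptions, not the paper's ensemble-update rule or its convergence conditions.

```python
import numpy as np

def cost(x, x_b, B_inv, g, y, R_inv):
    """Negative log-posterior up to a constant (Gaussian prior and noise)."""
    dx, r = x - x_b, y - g(x)
    return 0.5 * dx @ B_inv @ dx + 0.5 * r @ R_inv @ r

def iterate_ensemble(g, x_b, B, y, R, n_members, n_iter, eps=1e-4, seed=0):
    """Gauss-Newton steps restricted to a freshly drawn random subspace each time."""
    rng = np.random.default_rng(seed)
    B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)  # acceptable for a toy problem
    x = np.array(x_b, dtype=float)
    history = [cost(x, x_b, B_inv, g, y, R_inv)]
    for _ in range(n_iter):
        # New random directions at every iteration, i.e. a different subspace.
        X = rng.standard_normal((x.size, n_members))
        g_x = g(x)
        # Finite-difference estimate of the Jacobian applied to each direction.
        Y = np.column_stack([(g(x + eps * X[:, i]) - g_x) / eps
                             for i in range(n_members)])
        # Normal equations of the quadratic model of the cost over x + X w.
        H = X.T @ B_inv @ X + Y.T @ R_inv @ Y
        rhs = Y.T @ R_inv @ (y - g_x) - X.T @ B_inv @ (x - x_b)
        step = X @ np.linalg.solve(H, rhs)
        # Backtracking: halve the step until the cost decreases; otherwise keep x.
        alpha = 1.0
        while alpha > 1e-4:
            trial = x + alpha * step
            if cost(trial, x_b, B_inv, g, y, R_inv) < history[-1]:
                x = trial
                break
            alpha *= 0.5
        history.append(cost(x, x_b, B_inv, g, y, R_inv))
    return x, history
```

The backtracking loop makes the recorded cost non-increasing by construction; it is a simple safeguard chosen for this sketch, whereas the conditions under which the paper's algorithm converges monotonically are analyzed in the text. With a linear `g` and as many members as state variables, a single iteration reaches the exact minimizer.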

Equation (

We hereinafter assume that

In practical cases, the Jacobian matrix

The fourth term on the right-hand side of Eq. (

At the

the surrogate function

and the Hessian of

Here we consider the following surrogate function

The above discussion is valid regardless of the choice of the ensemble

Based on the foregoing, convergence to a local maximum of the objective function

Our formulation refers to the result of a simulation run initialized at the

It is also important to appropriately choose the parameter

The algorithm in Sect.

There are various methods for updating the ensemble, including the methods mentioned in Sect.

As described in the previous section, the use of Eq. (

Previous studies have demonstrated the usefulness of ensemble-based iterative algorithms for various data assimilation problems. Estimation with the ensemble update in Eq. (

In this section, we employ the Lorenz 96 model
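A minimal NumPy implementation of the Lorenz 96 model, dx_k/dt = (x_{k+1} - x_{k-2}) x_{k-1} - x_k + F with cyclic indices, integrated with a fourth-order Runge-Kutta scheme, is sketched below. The forcing F = 8 and the time step dt = 0.01 are common choices assumed here; they are not necessarily the settings used in the experiments of this paper, and the function names are illustrative.

```python
import numpy as np

def l96_tendency(x, forcing=8.0):
    """Lorenz 96 right-hand side with cyclic boundary conditions."""
    # np.roll(x, -1)[k] = x[k+1], np.roll(x, 2)[k] = x[k-2], np.roll(x, 1)[k] = x[k-1]
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = l96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = l96_tendency(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(x0, n_steps, dt=0.01, forcing=8.0):
    """Return the trajectory as an array of shape (n_steps + 1, len(x0))."""
    trajectory = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], dt, forcing))
    return np.array(trajectory)
```

The functions accept a state of any length, so the same code can integrate, for example, a 400-variable state as in the higher-dimensional experiment. The uniform state x_k = F is a fixed point, which provides a quick sanity check.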

We compare two ensemble updating methods of Eqs. (

The value of the objective function

The value of the objective function

Figures

Figures

The value of the objective function

The value of the objective function

Figure

The temporal evolution starting from the initial guess (red line) and the estimated evolutions after the second (yellow line), 10th (green line), and 30th (blue line) iterations in one of the

The root mean square errors over all the

In order to closely investigate the effect of

The value of the objective function

The value of the objective function

We also conducted experiments with a higher-dimensional system. The method with a randomly generated ensemble was applied to the Lorenz 96 model with 400 variables (

The value of the objective function

In Fig.

The temporal evolution starting from the initial guess (red line) and the estimated evolutions at the second (yellow line), 10th (green line), and 30th (blue line) iterations in one of the

The root mean square errors over all the

The ensemble variational method is derived under the assumption that a linear approximation of the dynamical system model is valid over the range of uncertainty. This linear approximation breaks down in problems where the scale of uncertainty is much larger than the range of linearity. Even in such cases, however, a local maximum of the log-likelihood or log-posterior function can be attained by updating the ensemble iteratively. The present paper assessed the influence of system nonlinearity on this iterative algorithm by considering the nonlinear terms of the system function

In applying the iterative algorithm discussed in this paper, the choice of the parameter

One issue peculiar to ensemble-based methods is rank deficiency, which occurs when the ensemble size is smaller than the dimension of the initial state
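The rank deficiency can be checked numerically. The sketch below draws a random ensemble (an assumption for illustration; the helper `ensemble_covariance_rank` is hypothetical and not from the paper) and returns the rank of the sample covariance, computed through the anomaly matrix, which has the same rank. With N members the sample covariance has rank at most N - 1, so it is singular whenever N - 1 is smaller than the state dimension.

```python
import numpy as np

def ensemble_covariance_rank(n_state, n_members, seed=0):
    """Rank of the sample covariance of a random ensemble (illustration only)."""
    rng = np.random.default_rng(seed)
    ensemble = rng.standard_normal((n_state, n_members))
    # Subtracting the ensemble mean leaves at most n_members - 1 independent columns.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    # P = A A^T / (N - 1) has the same rank as A; ranking A directly avoids
    # squaring the condition number by forming P explicitly.
    return int(np.linalg.matrix_rank(anomalies))

print(ensemble_covariance_rank(n_state=400, n_members=20))
```

For a 400-variable state and 20 members, the sample covariance can represent at most 19 directions, which is why a single ensemble confines each update to a low-dimensional subspace.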

The adjoint method is a conventional approach to 4D variational problems, and the convergence of this iterative method is likely to be slower than that of the adjoint method because it employs an ensemble approximation within a lower-dimensional subspace at each iteration. Nonetheless, the iterative ensemble-based method is potentially useful because it is much easier to implement: the adjoint method requires an adjoint code, which is usually time-consuming to develop, whereas the ensemble-based method can solve the same problem without one. This paper mainly considers data assimilation problems. However, the framework of the iterative ensemble variational method is also applicable to general nonlinear inverse problems as long as the Gaussian assumption in Eq. (

The following describes how the iteration can be performed without computing the inverse of

where

The code for reproducing the experimental results shown in Sect.

The author declares that there is no conflict of interest.

This research has been supported by the Japan Society for the Promotion of Science (grant no. 17H01704) and by PRC JSPS CNRS, the Bilateral Joint Research Project “Forecasting the geomagnetic secular variation based on data assimilation”.

This paper was edited by Amit Apte and reviewed by three anonymous referees.