Spectral diagonal ensemble Kalman filters

A new type of ensemble Kalman filter is developed, which is based on replacing the sample covariance in the analysis step by its diagonal in a spectral basis. It is proved that this technique improves the aproximation of the covariance when the covariance itself is diagonal in the spectral basis, as is the case, e.g., for a second-order stationary random field and the Fourier basis. The method is extended by wavelets to the case when the state variables are random fields, which are not spatially homogeneous. Efficient implementations by the fast Fourier transform (FFT) and discrete wavelet transform (DWT) are presented for several types of observations, including high-dimensional data given on a part of the domain, such as radar and satellite images. Computational experiments confirm that the method performs well on the Lorenz 96 problem and the shallow water equations with very small ensembles and over multiple analysis cycles.

1. Introduction.Data assimilation consists of incorporating new data periodically into computations in progress, which is of interest in many fields, including weather forecasting (e.g., Kalnay, 2003;Lahoz et al., 2010).One data assimilation method is filtering (e.g., Anderson and Moore, 1979), which is a sequential Bayesian estimation of the state at a given time given the data received up to that time.The probability distribution of the system state is advanced in time by a computational model, while the data is assimilated by modifying the probability distribution of the state by an application the Bayes theorem, called analysis.In the methods considered here, data is assimilated in discrete time steps, called analysis cycles, and the probability distributions are represented by their mean and covariance (thus making a tacit assumption that they are at least close to gaussian).When the state covariance is given externally, bayesian estimation becomes the classical optimal statistical interpolation (OSI).The Kalman filter (KF) uses the same computation as OSI in the analysis, but it evolves the covariance matrix of the state in time along with the model state.Since the covariance matrix can be large, the KF is not suitable for high-dimensional systems.The ensemble Kalman filter (EnKF) (Evensen, 2009) replaces the state covariance by the sample covariance computed from an ensemble of simulations, which represent the state probability distribution.It can be proved that the EnKF converges to the KF in the large ensemble limit (Kwiatkowski and Mandel, 2014;Le Gland et al., 2011;Mandel et al., 2011) in the gaussian case, but an acceptable approximation may require hundreds of ensemble members (Evensen, 2009), because of spurious long-distance correlations in the sample covariance due to its low rank.Localization techniques (e.g., Anderson, 2001;Furrer and Bengtsson, 2007;Hunt et al., 2007), essentially suppress longdistance covariance terms (Sakov and Bertino, 2010), which improves EnKF performance for small ensembles.
FFT EnKF (Mandel et al., 2010a,b) was proposed as an alternative approach to localization, based on replacing the sample covariance in the EnKF by its diagonal in the Fourier space.This approach is motivated by the fact that a random field in cartesian geometry is second order stationary (that is, the covariance between the values at two points depends only on their distance vector) if and only if its covariance in the Fourier space is diagonal (e.g., Pannekoucke et al., 2007).On a sphere, an isotropic random field has diagonal covariance in the basis of spherical harmonics (Boer, 1983), so similar algorithms can be developed there as well.However, the stationarity assumption does not allow the covariance to vary spatially.For this reason, the FFT EnKF was extended to wavelet EnKF (Beezley et al., 2011).The use of wavelets results in an automatic localization, which varies in space adaptively.For wavelets, the effect of the diagonal spectral approximation is 1 Institute of Computer Science, Academy of Sciences of the Czech Republic 2 Department of Mathematical and Statistical Sciences, University of Colorado Denver 2. Notation.Vectors in R n or C n are typeset as u and understood to be columns.Random vectors are typeset as X.The entry i of X is denoted by (X) i .Matrices (random or deterministic) are typeset as A, and and A * is the transpose, or conjugate transpose in the complex case.The entry i, j of matrix A is denoted by (A) i,j or a i,j , and A = [a 1 , . . ., a n ] is the writing of a matrix as a collection of columns.Nonlinear operators are typeset as M. The mean value is denoted by E [•], and Var is the variance.N (0, 1) is the normal (gaussian) distribution with zero mean and unit variance, and N (m, C) is the multivariate normal distribution with mean m and covariance . The Frobenius norm of a matrix is 3. Kalman filter and ensemble Kalman filter.The state of the system at time t is described by a random vector X t of length n.The system evolution between two times t 1 and t 2 is given by a function M(., t 1 , t 2 ), so that The goal of the Kalman filter (KF) (Kalman, 1960) is to correct the forecast state of the system X f t to obtain the analysis estimate X a t of the true state X t , given noisy observations Y t = H t X t + t , where H t is an observation operator, i.e., a mapping from state space to a data space, and t ∼ N (0, R t ).When the distributions of the state X t and the data error are gaussian, the analysis satisfies where C t is the covariance of the forecast X f t .In the KF, the state is represented by its mean and covariance, and the mean is transformed also by (3.1) and (3.2).In the rest of the paper, we will drop the time index t and the superscript f, unless there is a danger of confusion.
In the EnKF, the analysis formulas (3.1) and (3.2) are applied to each ensemble member, with the covariance replaced by the sample covariance from the ensemble.The resulting ensemble, however, would underestimate the analysis covariance, which is corrected by a data perturbation by sampling from the data error distribution (Burgers et al., 1998).Denote by X 1 , . . ., X N the forecast ensemble, created either by a perturbation of a background state or by evolving each analysis ensemble member from the previous time step independently by (3.1).Then, the analysis ensemble members are where the sample covariance matrix is and Y j = Y + τ j are the perturbed observations, with τ j ∼ N (0, R) independent.
The advantage of the EnKF update formula (3.2) is that it can be implemented efficiently without having acces to the whole sample covariance matrix C N .On the other hand, the rank of matrix C N is at most N −1, and, in the usual case when N << n, the low rank of the approximation C N of the true forecast covariance C is the biggest drawback of the EnKF.
4. Spectral diagonal EnKF.Let F be an orthonormal transformation matrix, which transform each ensemble member to spectral space, and denote each transformed ensemble member by the additional subscript F, X j F = FX j , j = 1, . . ., N .Since the transformation is orthonormal, the inverse transformation is F * , so F * X j F = X j for each j = 1, . . ., N. The columns of the inverse transform matrix F * are the spectral basis elements u 1 , . . ., u n , i.e., F = [u 1 , . . ., u n ] * .We will also denote the sample covariance of the transformed ensemble with the additional subscript F, The idea of the spectral diagonal Kalman filter is to replace the sample covariance in the update formula (3.3) by only the diagonal elements of sample covariance in spectral space, where • stands for Schur product, i.e., element-wise multiplication.The entries c i,i are the sample variances, computed without forming the whole matrix C N F .The diagonal approximation is transformed back to physical space as and the proposed analysis update is then 5. Error analysis.We will now compare the expected errors of the sample covariance and its spectral diagonal approximation (4.1).Assume that the ensemble members X i ∼ N X, C are independent, and the columns of the inverse spectral transformation F * are eigenvectors u i of the covariance C with the corresponding eigenvalues λ i , (5.1) Equivalently, in the basis {u 1 , . . ., u n }, the covariance FCF * of FX i is diagonal, with the diagonal elements λ i .This is the situation, e.g., when X i are sampled from a second-order stationary random field on a rectangular mesh, and u i is the Fourier basis.In the EnKF, the ensemble members after the first analysis cycle are not independent, because the sample covariance in the analysis step ties them together, but they converge to independent random vectors as the ensemble size N → ∞ (Le Gland et al., 2011;Mandel et al., 2011).The following theorem shows that the spectral diagonal approximation has smaller expected error than the sample covariance, in Frobenius norm.Theorem 5.1 (Error of the spectral diagonal approximation).Let X k ∼ N X, C , k = 1, . . ., N , be independent, and the transformation F satisfy (5.1).Then, the expected squared errors in the Frobenius norm of the sample covariance C N (3.4) and its spectral diagonal approximation (5.3) Proof.Without loss of generality, assume that X = 0.The Frobenius norm of a square 3 in the Appendix now gives (5.2).To prove (5.3), we consider the diagonal entries in the spectral domain, and use Lemma A.3 again.Since the eigenvalues of covariance are always nonnegative, we have λ i λ j ≥ 0, therefore the spectral diagonal covariance decreases the expected squared error of sample covariance: with equality only if all λ i λ j = 0, i = j, that is, only in the degenerate case when the exact covariance C has rank at most one.To compare the error terms further, note that shows that the error of the sample covariance depends on the 1 norm of the eigenvalues sequence, while the error of the spectral diagonal approximation depends only on the 2 norm, which is weaker as the state dimension n → ∞.The improvement depends on the rate of decay of the eigenvalues as the index k → ∞.Note that the eigenvalues of the covariance (if it exists) of a random element in an infinitely dimensional Hilbert space must satisfy the trace condition , Da Prato (2006).The eigenvalues of the covariance in many physical systems obey a power law, λ k ≈ k −α with α > 1, e.g., Gaspari and Cohn (1999).Suppose that λ k = ck −α and n → ∞.Then, Other considerations of similar ratios can be found in Furrer and Bengtsson (2007).Theorem 5.1 is related to but different from the estimate in Furrer and Bengtsson (2007, eq. ( 12)), which applies to the case when the mean known exactly rather than the sample covariance here.Also, the analysis in Furrer and Bengtsson (2007) is in the physical domain rather than in the spectral domain.
6. Spectral EnKF algorithms.We will show that the analysis step can be implemented very efficiently in cases of practical interest.We drop the ensemble members index in all update formulas to make them more readable.Note that when using all the following formulas, it is necessary to perturb the observations.6.1.State consisting of only one variable, completely observed.Assume that the state consists of one variable, e.g., X ∈ R n , and that we can observe the whole system state, i.e., the observation function is the identity, H = I, and observations are Y ∈ R n .Assume also that the observation noise covariance matrix is cI, where c > 0 is a constant.In this special case, we can do the whole update in the spectral space, since it is possible to transform the innovation to the spectral space, and the analysis step (4.4) becomes Note that the matrices D N F and D N F + cI are diagonal, so any operation with them, such as inversion or multiplication, is very cheap.The matrix F is never formed explicitly.Rather, the multiplications of F and F * times a vector are implemented by the fast Fourier transform (FFT) or discrete wavelet transform (DWT).This is the base case of the FFT EnKF (Mandel et al., 2010a,b) and the wavelet EnKF (Beezley et al., 2011), respectively.6.2.Multiple variables on the same grid, one variable completely observed.In a typical model, such as numerical weather prediction, the state consist usually of more than one variable.Assume the state consist of m different variables all based on the same grid of length n.Then each variable can be transformed to the spectral space independently, and we have the state vector X ∈ R n•m and the transformation matrix in the block form where each block X 1 is a vector of length n and F is n by n transformation matrix.
Assume also that the whole state of the first variable X 1 is observed, and again the covariance of observation error is cI.In this case, the observation operator is one by m block matrix of the form H = [I 0 • • • 0].In the proposed method, we approximate the crosscovariancess between the variables also by the diagonal of the sample covariance in spectral space, , where D i,j is matrix containing only diagonal elements from the sample covariance matrix between transformed variables FX i and FX j .With this notation, the analysis step (4.4) becomes Note that again the matrix to be inverted is diagonal and full-rank, and the transformation F is implemented by call to FFT or DWT, so the operations are computationally very efficient.A related method using interpolation and projection was proposed for the case when the model variables are defined on non-matching grids (Beezley et al., 2011).
6.3.Multiple variables on the same grid, one variable observed at a small number of points.This situation occurs, e.g., when assimilated observations are from discrete stations.In this case, the observation matrix is H = H 1 0 • • • 0 , where H 1 has a small number of rows, one for each data points, and X and F are the same as in Eq. (6.1).We substitute the diagonal spectral approximation into the analysis step (4.4) directly, and (6.2) becomes The solution of a system of linear equations with the matrix H 1 F * D N 1,1 FH * 1 + R in (6.3) does not present a problem, because its dimension is small by assumption, and FH * 1 is easy to compute explicitly by the action of FFT on the columns of H * 1 .Note that in this case, the data noise covariance R may be arbitrary.
6.4.State consisting of more variables, one partly observed.Consider the situation when the number of observation points is too large for the method of Sect.6.2 to be feasible, but only one variable on a contiguous part of the mesh is observed.The typical example of this type may be radar images, which cover typically only a part of domain of the numerical weather prediction model.
Suppose that observations (Y ) j of the values of the first variable (X 1 ) j are available only for a subset of indices j ∈ M ⊂ {1, . . ., m}.Augment the forecast state by an additional variable X 0 .For j = 1, . . ., m, set (X 0 ) j = (X 1 ) j if j ∈ M , (X 0 ) j = (Y ) j = 0 if j / ∈ M .We can now use the analysis update (6.2) with the augmented state X = (X 0 , X 1 , . . ., X m ) and observation Y = (Y , 0, . . ., 0), to get the augmented analysis X a = (X a 0 , X a 1 , . . ., X a m ), and drop X a 0 .Note that the innovations to the original variables are propagated through the spectral diagonal approximation of cross covariance between the original and augmented variables.Since this covariance is not spatially homogeneous, a Fourier basis will not be appropriate, and computational experiments in Sect.7 confirm that wavelets indeed perform better.

Computational experiments.
In all experiments, we use the usual twin experiment approach.A run of the model from one set of initial conditions is used to generate a sequence of states, which plays the role of truth.Data values were obtained by applying the observation operator to the truth; the data perturbation was done only for ensemble members within the assimilation algorithm.A second set of initial conditions is used for data assimilation and for a free run, with no data assimilation, for comparison.The error of the free run should be an upper bound on the error of a reasonable data assimilation method.
We evaluate the filter by the root mean square error, RMSE = 1 where Xa is the analysis ensemble mean, X is the true state, and n is the number of the grid points x i .In the case when the state consist of more than one variable, such as in the shallow water equations, we evaluate the error of each variable independently.While the purpose of a single analysis step is to balance the uncertainties of the state and the data rather than minimalize the RMSE, the RMSE values over multiple time steps are used to evaluate how well the data assimilation fulfills its overall purpose to track the truth.
We evaluate the RMSE of the the standard EnKF, marked as EnKF in the legend of the figures, and the spectral diagonal EnKF with the discrete sine transform, discrete cosine transform, and the Coiflet 2,4 discrete wavelet transform (Daubechies, 1992), marked as DST, DCT, and DWT, respectively.
7.1.Lorenz 96.In the Lorenz 96 model (Lorenz, 2006), the state consists of one variable X t ∈ R K , X t = (x 1 , . . ., x K ), governed by the differential equations where the values of x j−K and x j+K are defined to be equal to x j for each j = 1, . . ., K, and F is a parameter.We set the parameter F = 8, which causes the system to be strongly chaotic.The timestep of model was set to 0.01 s and the analysis cycle was 1 s.The data covariance was diagonal, with diagonal entries equal to 0.04.The ensemble and the initial conditions for the truth were generated by sampling from N (0.0005, 0.01).The the ensemble and the truth were moved forward for 10 second, then the assimilation starts.
In the case when the whole state is observed, spectral filters with ensemble size N = 4 (Fig. 7.1a) already decrease the error significantly compared to a run with no assimilation, while the standard EnKF actually increases the error.For all filters, the error eventually decreases with the ensemble size at the standard rate N −1/2 , but spectral EnKF shows the error decrease from the start, while the EnKF lags until the ensemble size is comparable to the state dimension, and even then its RMSE is significantly higher (Fig. 7.1b).Next, consider the case when only the first m points of a grid are observed.In the legend, DCT-S and DWT-S are the method with the discrete cosine transform, and the Coiflet 2,4 discrete wavelet transform, respectively, with the standard analysis update (4.4), while DCT-A and DWT-A use the augmented state method from Sect.6.4. Figure 7.2 shows that the spectral diagonal method decrease the RMSE, while the standard EnKF is unstable.This observation is consistent with the result of Kelly et al. (2014), which shows that, for a class of dynamical systems, the EnKF remains within a bounded distance of truth if sufficiently large covariance inflation is used and if the whole state is observed.The augmented state method DWT-A with wavelet transformation gave almost the same analysis error as DCT-S, which is using the spectral diagonal filter with the exact observation matrix, while the cosine basis, which implies a homogenenous random field, resulted in a much larger error (method DCT-A).A similar behavior was seen with a smaller number of observed points as well, but the error reduction in spectral diagonal EnKF was smaller (not shown).
7.2.Shallow water equations.The shallow water equations can serve as a simplified model of atmospheric flow.The state Y = (h, u, v) consists of water level height h and momentum u, v in x and y directions, governed by the differential equations of conservation of mass and momentum, where g is gravity acceleration, with reflective boundary conditions, and without Coriolis force or viscosity.The equations were discretized on a rectangular grid size 64 × 64 with horizontal distance between grid points 150 km and advanced by the Lax-Wendroff method with the time step 1 s.
The initial values where water level h = 10 km, plus Gaussian water raise of height 1 km, width 32 nodes, in the center of the domain, and u = v = 0. See Moler (2011, Chapter 18) for details.
We have used two independent initial conditions, one used for the truth and another for the ensemble and the free run.The only difference was the location of the initial wave.Both states were moved forward for 4 hours.Then the ensemble was created by adding random noise (with prescribed background covariance).Then, all states were moved forward for another hour, and assimilation starts 5 h after the model initialization.All assimilation methods start with the same forecast in the first assimilation cycle.
The background covariance for initial ensemble perturbation was estimated using samples taken every second from time t start = 4 h to time t end = 6 h, and modified by tapering the sample covariance matrix C N as B = C N • T, where the tapering matrix T had the block structure where the entry between nodes (i a , j a ) and 2D tensor product FFT and DWT were used in the diagonal spectral EnKF.The observation error was taken with zero mean and variance 1000 m 2 in h and 1000 kg m s −1 in u and v.The forecast ensemble was created by adding random noise with the covariance B 4 h after the model initialization.To relax the ensemble members, the model was run for another hour before the assimilation started.So the first assimilation was performed 5 hours after the model initialization.After the first assimilation, another 4 assimilation cycles were performed every 60 s.
When the full state is observed, the spectral diagonal method decreased the RMSE in all variables dramatically (Fig. 7.3), unlike the standard EnKF.When only the water level is observed, the RMSE in spectral diagonal EnKF decreases less, but still much more that in the standard EnKF (Fig. 7.4).

Conclusions.
A version of the ensemble Kalman filter was presented, based on replacing the sample covariance by its diagonal in the spectral space, which provides a simple, efficient, and automatic localization.We have demonstrated efficient implementations for several classes of observation operators and data important in applications, including high-dimensional data defined on a continuous part of the domain, such as radar or satellize images.The spectral diagonal was proved rigorously to give a lower mean square error that the sample covariance.Computational experimens with the Lorenz 96 problem and the shallow water equations have shown that the method that the analysis error drops very fast for small ensembles, and the method is stable over multiple analysis cycles.The paper provides a new technology for data assimilation, which can work with minimal computational resources, because an implementation needs only an orthogonal transformation, such as the fast Fourier or discrete wavelet transform, and manipulation of vectors and diagonal matrices.Therefore, it should be of interest in applications.
Proof.The sample covariance is unbiased estimate of the true covariance, so from Lemma A.2, 3) The random variables θ i,k are i.i.d., so it follows that Together, we get The variance of the off-diagonal entry (C N F ) i,j , i = j, is Fig. 7.1.Mean RMSE from 10 realizations for Lorenz 96 problem, the whole state observed, state dimension 256 (a) increasing analysis cycles with ensemble size 4, (b) increasing ensemble size, analysis cycle 1.