A Mechanism for Catastrophic Filter Divergence in Data Assimilation for Sparse Observation Networks

We study catastrophic filter divergence in data assimilation procedures, whereby the forecast model develops severe numerical instabilities leading to a blow-up of the solution. Catastrophic filter divergence occurs in sparse observational grids with small observational noise for intermediate observation intervals and finite ensemble sizes. Using a minimal five-dimensional model we establish that catastrophic filter divergence is caused by the filtering procedure producing analyses which are not consistent with the true dynamics, and by stiffness caused by the fast attraction of the inconsistent analyses towards the attractor during the forecast step.


Introduction
Data assimilation is the procedure of finding the best estimate of the state of a dynamical system given a forecast model with possible model error and noisy observations at discrete observation intervals (Kalnay, 2002; Majda and Harlim, 2012). The often chaotic nature of the underlying nonlinear dynamics, as well as the sparseness of the observational network, significantly complicates this process. In the setting of ensemble-based filters (Evensen, 1994, 2006), finite ensemble sizes may introduce additional sources of error (see, for example, Ehrendorfer, 2007). Insufficient ensemble size typically causes an underestimation of the error variances, which may ultimately lead to filter divergence when the filter trusts its own forecast and ignores the information provided by the observations. This filter divergence is caused by ensemble members aligning with the most unstable directions (Ng et al., 2011) and is exacerbated by large observational noise. Finite size effects may also lead to spuriously overestimated correlations between otherwise uncorrelated variables (Hamill et al., 2001; Whitaker et al., 2004, 2009; Liu et al., 2008; Sacher and Bartello, 2008), spoiling the overall analysis skill. Harlim and Majda (2010) and Gottwald et al. (2011) documented a new type of filter divergence which is characterised by the forecast model diverging to machine infinity. It was shown that this catastrophic filter divergence occurs in sparse observational networks with small observational noise for moderate observation intervals, in contrast to the classical filter divergence described above.
We will establish here the mechanism leading to this instability in a minimal low-dimensional model: in a sparse observational grid, finite ensemble sizes cause the ensemble to align and, in the case of sufficiently small observational noise, to generate analyses which are not consistent with the actual dynamics and are located in phase space off the attractor. If the attraction towards the attractor is sufficiently strong, the subsequent forecast step attempts to integrate a stiff dynamical system, which may cause the integrator to develop numerical instabilities.
In Sect. 2 we introduce the minimal model for which catastrophic filter divergence is studied. We briefly describe ensemble Kalman filters in Sect. 3. Numerical results are presented in Sect. 4, and the mechanism for catastrophic filter divergence is established. We conclude with a discussion in Sect. 5.

Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
We study the Lorenz-96 model (Lorenz, 1996)

dz_i/dt = (z_{i+1} − z_{i−2}) z_{i−1} − z_i + F,   i = 1, …, D,   (1)

with z = (z_1, …, z_D) and periodic boundary conditions z_{i+D} = z_i, in a five-dimensional setting. We use negative forcing here, which allows strong mixing with small dimension D. We consider here D = 5 with F = −16 (Abramov and Majda, 2006). For these parameters we find the Lyapunov exponents λ = (2.72, 0.09, −0.09, −1.83, −5.89) for an integration lasting 250 time units. One of the Lyapunov exponents should be zero, corresponding to the flow direction; due to slow convergence this is only approximately satisfied. Note that Σ_{i=1}^{5} λ_i = lim_{t→∞} (1/t) ∫_0^t Tr M(s) ds, where M is the linearised vector field of Eq. (1), and hence Σ_{i=1}^{5} λ_i = −5. Using the Kaplan-Yorke dimension (see, for example, Schuster and Just, 2005), this suggests that the attractor has a fractal dimension of D_attr = 4.15, and trajectories are on average attracted to this manifold with the fast rate λ_5 = −5.89. The climatic mean and variance are estimated from a long trajectory as z̄ = −2.47 and σ²_clim = 33.7, respectively. We remark that the system in Eq. (1) has not been chosen to model any physical system but rather for its simplicity in addressing the phenomenon of catastrophic blow-up. We will also report on results with F = 8, which is less chaotic with σ²_clim = 13.1, λ = (0.474, 0.003, −0.523, −1.315, −3.636) and D_attr = 2.9.
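For readers who wish to reproduce the setting, the Lorenz-96 tendency of Eq. (1) and a fourth-order Runge-Kutta step (the integrator used later in the forecasting step) can be sketched as follows. This is a minimal illustration; the function names and the step size used below are our choices, not part of the original experiments.

```python
import numpy as np

def lorenz96(z, F):
    """Lorenz-96 tendency dz_i/dt = (z_{i+1} - z_{i-2}) z_{i-1} - z_i + F
    on a periodic ring of D = len(z) sites."""
    return (np.roll(z, -1) - np.roll(z, 2)) * np.roll(z, 1) - z + F

def rk4_step(z, dt, F):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1 = lorenz96(z, F)
    k2 = lorenz96(z + 0.5 * dt * k1, F)
    k3 = lorenz96(z + 0.5 * dt * k2, F)
    k4 = lorenz96(z + dt * k3, F)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

With D = 5 and F = −16 a trajectory started near the climatic mean remains bounded on the attractor for moderate step sizes.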
We assume that observations of the variables are given at equally spaced discrete observation times t_i with observation interval t_obs. We observe only one variable, z_1. It is well known that the Kalman filter is suboptimal for dynamical systems which are nonlinear and involve non-Gaussian statistics. It is pertinent to mention that although the five-dimensional Lorenz system in Eq. (1) is highly nonlinear, its probability density function is near-Gaussian for F = −16, but highly non-Gaussian for F = 8. The Lorenz system in Eq. (1) is assimilated using an ensemble transform Kalman filter (ETKF) (Tippett et al., 2003; Wang et al., 2004), which is briefly described in the following section.

Ensemble Kalman filter
In an ensemble Kalman filter (EnKF) (Evensen, 2006), an ensemble with k members z^(l)(t) ∈ R^D, l = 1, …, k, is propagated by the full nonlinear dynamics. The ensemble is split into its mean

z̄ = (1/k) Σ_{l=1}^{k} z^(l)

and its ensemble deviation matrix

Z = [z^(1) − z̄, …, z^(k) − z̄] = [z^(1), …, z^(k)] (I − (1/k) e e^T),

where e = [1, …, 1]^T ∈ R^k. The ensemble deviation matrix Z is used to provide a Monte Carlo estimate of the forecast covariance matrix

P_f(t) = (1/(k−1)) Z(t) Z(t)^T.

Note that P_f(t) is rank-deficient if the ensemble size k is smaller than the rank of the covariance matrix.
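The splitting into mean and deviations and the Monte Carlo covariance estimate can be sketched as follows (illustrative code; the function name is our choice):

```python
import numpy as np

def forecast_covariance(ens):
    """Estimate P_f = (1/(k-1)) Z Z^T from a D x k ensemble matrix,
    where Z holds the deviations of the members from the ensemble mean."""
    k = ens.shape[1]
    zbar = ens.mean(axis=1, keepdims=True)  # ensemble mean
    Z = ens - zbar                          # ensemble deviation matrix
    return Z @ Z.T / (k - 1)
```

Since the deviations sum to zero, the estimate has rank at most k − 1, which makes the rank deficiency for small ensembles explicit.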
Given the forecast ensemble Z_f = Z(t_i⁻) and the associated forecast error covariance matrix (or prior) P_f(t_i⁻), the actual Kalman analysis (Kalnay, 2002; Evensen, 2006; Simon, 2006) updates a forecast into a so-called analysis (or posterior). Variables at times t = t_i⁻ are evaluated before the observations y_o are taken into account in the analysis step, and variables at times t = t_i⁺ are evaluated after the analysis step, when the observations taken at t = t_i have been taken into account. Observations y_o ∈ R^n can be expressed as a perturbed truth according to

y_o = H z_t + r_o,

where the observation operator H : R^D → R^n maps from the whole space into observation space, and r_o ∈ R^n is i.i.d. observational Gaussian noise with zero mean and associated error covariance matrix R_o.
In the first step of the analysis, the forecast mean z̄_f is updated to the analysis mean

z̄_a = z̄_f − K (H z̄_f − y_o),   (3)

where the Kalman gain matrix is defined as

K = P_f H^T (H P_f H^T + R_o)^{−1}.

The analysis covariance P_a is given by

P_a = (I − K H) P_f.

To calculate an ensemble Z_a which is consistent with the analysis error covariance P_a, and which therefore needs to satisfy

P_a = (1/(k−1)) Z_a Z_a^T,

we use the method of deterministic ensemble square root filters (Simon, 2006), which expresses the analysis ensemble as a linear combination of the forecast ensemble. In particular we use the method proposed in Tippett et al. (2003) and Wang et al. (2004), the so-called ensemble transform Kalman filter (ETKF). Alternatively one could have chosen the ensemble adjustment filter (Anderson, 2001) or the continuous Kalman-Bucy filter, which does not require the computation of matrix inverses (Bergemann et al., 2009). A new forecast Z(t_{i+1}⁻) is then obtained by propagating Z_a with the full nonlinear dynamics to the next observation time. The numerical results presented in the next section are obtained with this method.
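A minimal sketch of one such deterministic square-root analysis step is given below. We use a symmetric square root of the transform matrix; the variable names and this particular symmetric choice are our assumptions and not necessarily the exact ETKF implementation used in the experiments.

```python
import numpy as np

def analysis_step(zf_ens, y, H, Ro):
    """One deterministic square-root analysis step (ETKF-style sketch).
    zf_ens: D x k forecast ensemble; y: n-vector observation;
    H: n x D observation operator; Ro: n x n observation error covariance."""
    D, k = zf_ens.shape
    zf = zf_ens.mean(axis=1, keepdims=True)
    Zf = (zf_ens - zf) / np.sqrt(k - 1)          # scaled deviations, Pf = Zf Zf^T
    S = H @ Zf                                    # deviations in observation space
    Pf = Zf @ Zf.T
    Sinv = np.linalg.inv(S @ S.T + Ro)
    K = Pf @ H.T @ Sinv                           # Kalman gain
    za = zf + K @ (y.reshape(-1, 1) - H @ zf)     # analysis mean, Eq. (3)
    # symmetric square-root transform so that Pa = (I - K H) Pf holds
    A = np.eye(k) - S.T @ Sinv @ S
    w, V = np.linalg.eigh(A)
    T = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    Za = Zf @ T * np.sqrt(k - 1)                  # analysis deviations
    return za + Za                                # analysis ensemble
```

Because e is an eigenvector of the transform with eigenvalue one, the analysis deviations still sum to zero and the returned ensemble reproduces both the analysis mean and the analysis covariance.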

The genesis of catastrophic filter divergence
We observe only one of the five variables z_i (without loss of generality we use z_1) and generate observations y_o from the truth by adding Gaussian observational noise with small observational error covariance R_o = 0.01 after equal observation intervals t_obs. In the following we use a fourth-order Runge-Kutta method to integrate the system in Eq. (1) forward in time during the forecasting step.
In Fig. 1 we show an instance of catastrophic filter divergence for dt = 0.025 and t_obs = 0.05, where we used k = 6 ensemble members (so the forecast error covariance matrix is not necessarily rank-deficient). Besides the maximal absolute amplitude of the analysis ensemble, we show the norm error E between the truth z_t and the ensemble mean z̄_a, evaluated at each analysis cycle t_i. After t_i = 62 the norm error becomes machine infinity, due to the forecast model developing a numerical instability. The genesis of the blow-up is clearly seen from Fig. 1: until t_i ≈ 55.5 the filter is stable and the analysis tracks the truth, with a norm error that may even be smaller than the observational error (see the inset in Fig. 1). This is followed by a non-tracking episode lasting until t_i ≈ 59, in which the norm error evolves around an approximately constant mean value, suggesting that the analysis is exploring the attractor, uncorrelated from the truth and no longer controlled by the observations. This episode precedes the actual blow-up episode of the forecast integrator, in which the norm error grows to machine infinity.
In order to quantify the propensity for blow-up, we count the number N_b of blow-ups that occur before a total of 5000 simulations have terminated without blow-up. A single successful simulation consists of n_a = 4000 analysis cycles. Simulations differ in the truth, the observations and the initial ensemble, which is drawn with variance 1. The proportion of blow-ups is then given by

S_b = N_b / (N_b + 5000).

Note that S_b depends on the number of data assimilation cycles n_a. In Fig. 2 we show S_b as a function of the observation interval t_obs for several values of the integration time step dt. It is seen that blow-up occurs for moderate observation intervals. No blow-up occurs for sufficiently small or sufficiently large values of t_obs. The percentage of blow-ups as well as the range of t_obs for which blow-up occurs is reduced by reducing the integration time step, establishing blow-up as a numerical instability of the forecast model. Additionally, we performed simulations with the forcing parameter in Eq. (1) chosen as F = 8, corresponding to less chaotic dynamics. We found similar behaviour; however, in line with the less chaotic nature of the system when compared to F = −16, blow-up develops for larger values of the integration time step dt. All results presented in the following are for F = −16.
To obtain meaningful statistics for blow-up which are independent of the number of analysis cycles n_a, we estimate the number of assimilation cycles τ_i before blow-up occurs. To generate statistics of these blow-up times, we numerically calculated τ_i for 10 000 instances of blow-up, where we allowed for a maximum of n_a = 250 000 assimilation cycles. In Fig. 3 we show the empirical cumulative distribution function P_b(τ_i) for the blow-up times. The results suggest that catastrophic blow-up is a Poisson process with cumulative probability distribution function

P_b(τ) = 1 − exp(−τ/τ̄_b),

where τ̄_b denotes the mean blow-up time. The Poisson character of blow-up suggests that blow-up is not an accumulative process, but rather a random process in which each assimilation has the same probability of blow-up, independent of previous assimilations. Linear regression of the curves in Fig. 3 yields average blow-up times τ̄_b = 62 633 (t_obs = 0.005), τ̄_b = 3324 (t_obs = 0.05) and τ̄_b = 6160 (t_obs = 0.1). This is consistent with the blow-up statistics S_b for a fixed number of assimilation cycles n_a and shows that the probability of blow-up has a maximum for an intermediate value of the observation interval t_obs for a specified integration time step dt (cf. red curve with circles in Fig. 2).
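The regression used to extract the mean blow-up time can be sketched as follows: under the exponential hypothesis, log(1 − P_b(τ)) is linear in τ with slope −1/τ̄_b. This is a synthetic illustration with a hypothetical function name, not the original analysis script.

```python
import numpy as np

def mean_blowup_time(taus):
    """Fit tau_bar in P_b(tau) = 1 - exp(-tau/tau_bar) by linear regression
    of the log empirical survival function against tau."""
    taus = np.sort(np.asarray(taus, dtype=float))
    n = taus.size
    cdf = np.arange(1, n + 1) / n
    # drop the last point, where the empirical survival function is zero
    slope = np.polyfit(taus[:-1], np.log(1.0 - cdf[:-1]), 1)[0]
    return -1.0 / slope
```

Applied to recorded blow-up times, this recovers τ̄_b up to sampling error.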
We now look at the dependence of the propensity for blow-up on the observational variance R_o. Figure 4 shows that catastrophic filter blow-up requires the observational noise to be sufficiently small but non-zero. For the smallest value of the observational noise we used, R_o = 10⁻⁶, we still observed S_b = 0.07 for n_a = 4000, dt = 0.0025 and t_obs = 0.05. This is in stark contrast to traditional filter divergence, which occurs for sufficiently large observational noise.
We propose that catastrophic filter divergence is caused by insufficient ensemble size paired with sufficiently small observational noise. We have checked that by increasing the ensemble size to the impractically high value of k = 400, we were able to avoid catastrophic blow-up. To monitor the ensemble spread, we consider the ensemble dimension D_ens as defined in Patil et al. (2001),

D_ens = (Σ_i √σ_i)² / Σ_i σ_i,

where the σ_i are the eigenvalues of Z Z^T. Note that D_ens takes values between 1 and min(k, D), depending on whether the ensemble members are all aligned or are orthogonal to each other. In Fig. 5 we show the ensemble dimension as a function of time for an ensemble with k = 6 members, corresponding to the blow-up presented in Fig. 1. It is seen that D_ens ≈ 2 during the stable tracking episode, indicating that the ensemble is not spanning all directions on the attractor (we recall the fractal attractor dimension to be D_attr = 4.15) but instead is aligning with the first two Lyapunov vectors (cf. Ng et al., 2011). This triggers the non-tracking period until t ≈ 59. Before the actual blow-up the ensemble dimension reaches values of almost D_ens = 1, indicating ensemble collapse. For ensemble sizes of k = 400, on the other hand, we observe that mostly D_ens > 4, and no blow-up occurs. Finite ensemble sizes and the associated loss of ensemble spread are known to cause non-catastrophic filter divergence, in which the filter trusts the wrong forecasts, ignoring error-correcting observations (Houtekamer and Mitchell, 1998; Hamill et al., 2001; Sacher and Bartello, 2008; Ng et al., 2011). Finite ensemble sizes cause the forecast error covariance P_f to exhibit, on the one hand, small diagonal variances and, on the other hand, off-diagonal entries of unrealistically large absolute value (Hamill et al., 2001). This leads to unrealistic innovations of the unobserved variables towards the observation of the distant observed variable. Gottwald et al. (2011) showed that catastrophic filter divergences are suppressed by a variance limiting Kalman filter (VLKF), which controls overestimation of the analysis error covariance.
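The ensemble dimension can be computed from the singular values of the ensemble deviation matrix; the sketch below is our reading of the Patil et al. (2001) definition (in particular, computing the σ_i from the mean-subtracted deviations is our assumption).

```python
import numpy as np

def ensemble_dimension(ens):
    """D_ens = (sum_i sqrt(sigma_i))^2 / sum_i sigma_i, with sigma_i the
    eigenvalues of Z Z^T and Z the ensemble deviation matrix (D x k)."""
    Z = ens - ens.mean(axis=1, keepdims=True)   # mean-subtracted deviations
    sigma = np.linalg.svd(Z, compute_uv=False) ** 2
    return np.sqrt(sigma).sum() ** 2 / sigma.sum()
```

A fully aligned ensemble gives D_ens = 1; with mean subtraction, k mutually orthogonal members of equal amplitude give D_ens = k − 1.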
The destructive interplay of sparse, sufficiently accurate observations and finite-size ensembles can be illustrated as follows. The Kalman filter produces analyses according to Eq. (3), which for our case, where only z_1 is observed, read

z̄a_i = z̄f_i − (P_f,1i / (P_f,11 + R_o)) (z̄f_1 − y_o)   (8)

for i = 1, …, 5. The combination of small ensemble sizes, causing small values of P_f,11 and large absolute values of P_f,1i for i > 1, and comparably small observational noise R_o leads to analyses for i > 1 which are significantly influenced by the observation y_o at site i = 1, irrespective of the actual physical correlations present in the dynamics. The resulting analysis may therefore not be dynamically consistent but may lie in phase space off the attractor. As a proxy for the distance of the analysis to the attractor, we measure the time τ_attr taken for the trajectory to reach a Euclidean distance θ from the attractor when propagating the analysis forward in time. We created an approximation of the attractor by storing 2 × 10⁶ data points sampled at intervals of 0.005 time units. We choose θ = 1. Figure 6 illustrates clearly that during the non-tracking episode (i.e. the period t_i ∈ (55, 59) in Fig. 1), one has (predominantly) τ_attr = 0, consistent with our previous observation that the analysis lies on the attractor but is uncorrelated from the truth and not controlled by the observations. The subsequent initiation of blow-up for t > 59, however, is characterised by non-zero values of τ_attr. It is clearly seen that blow-up is characterised by analyses lying off the attractor.
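The τ_attr proxy can be sketched as follows. This is illustrative only: the integrator, step size and cut-off time are our choices, the Lorenz-96 helpers are repeated here to keep the sketch self-contained, and a small surrogate point cloud stands in for the 2 × 10⁶ stored attractor samples.

```python
import numpy as np

def lorenz96(z, F):
    return (np.roll(z, -1) - np.roll(z, 2)) * np.roll(z, 1) - z + F

def rk4_step(z, dt, F):
    k1 = lorenz96(z, F)
    k2 = lorenz96(z + 0.5 * dt * k1, F)
    k3 = lorenz96(z + 0.5 * dt * k2, F)
    k4 = lorenz96(z + dt * k3, F)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def time_to_attractor(z0, attractor_pts, theta=1.0, dt=0.005, F=-16.0, tmax=5.0):
    """Proxy tau_attr: integrate from the analysis z0 and return the time until
    the trajectory first comes within Euclidean distance theta of the stored
    attractor point cloud (np.inf if this does not happen before tmax)."""
    z, t = np.array(z0, dtype=float), 0.0
    while t < tmax:
        d = np.min(np.linalg.norm(attractor_pts - z, axis=1))
        if d <= theta:
            return t
        z = rk4_step(z, dt, F)
        t += dt
    return np.inf
```

An analysis lying on the attractor yields τ_attr = 0, while an off-attractor analysis is first pulled back towards the attracting set for a finite time.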
It is pertinent to mention that alignment of the ensemble and the occurrence of off-attractor analyses (i.e. large values of τ_attr) do not necessarily cause catastrophic filter divergence (see, for example, Fig. 6 at t_i ≈ 56). The effect of off-attractor analyses is the following: the forecast model, initialised with such an analysis lying off the attractor, tries to follow the stable direction towards the globally attracting set, with a rate which in our case is on average very fast, with a Lyapunov exponent of −5.89. This renders the dynamical system stiff, so that it develops numerical instabilities for sufficiently large time steps dt, causing the filter to catastrophically diverge to machine infinity. As seen in Fig. 2, there is no filter divergence for sufficiently small and sufficiently large observation intervals t_obs. This can now be understood as follows: for too small observation intervals, the forecast model will not have propagated the analysis sufficiently far away from dynamically realistic values, whereas for values of t_obs large compared with the decorrelation time τ_corr the ensemble will have acquired sufficient spread, with D_ens ≥ 4, to explore the whole of the attractor.
The behaviour of the propensity S_b for blow-up as a function of the noise covariance R_o, as depicted in Fig. 4b, can also be readily understood from Eq. (8): for large observational noise with R_o > 2 we obtain analyses z̄_a ≈ z̄_f, which are forced by the dynamics to lie on the attractor, and the sampling error of finite ensemble sizes does not enter the analysis. For small observational noise the magnitude of the increments (P_f,1i / (P_f,11 + R_o)) (z̄f_1 − y_o) is given as a balance between small innovations z̄f_1 − y_o and large finite-ensemble-size-induced gains P_f,1i / (P_f,11 + R_o) ≈ P_f,1i / P_f,11, yielding a maximum at R_o ≈ 0.025.
We now address the question of how catastrophic blow-up can be controlled, other than by decreasing the time step of the numerical forecast model to control the numerical instability or by increasing the ensemble size to diminish the sampling effects. We found that catastrophic blow-up can be avoided by incorporating covariance localisation into the data assimilation procedure, which controls the unrealistic overestimation of off-diagonal entries of the forecast covariance matrix P_f. Houtekamer and Mitchell (2001) and Hamill et al. (2001) achieved covariance localisation by Schur multiplication of the forecast error covariance P_f with a localisation matrix C_loc. We used the compactly supported localisation function introduced by Gaspari and Cohn (1999), in conjunction with the DEnKF proposed by Sakov and Oke (2008), and found that catastrophic filter divergence is completely suppressed for a localisation radius of ρ_loc < 1.5 grid spacings. Other covariance limiting strategies, such as the ensemble filters suggested in Gottwald et al. (2011) and Mitchell and Gottwald (2012), were also able to suppress catastrophic filter divergence. We remark that the actual truth, however, does indeed exhibit nontrivial correlations between all variables for our parameters in this low dimension with D = 5. Multiplicative covariance inflation (i.e. the multiplication of the forecast error covariance matrix P_f by a constant inflation factor δ) is a standard remedy to control underestimation of covariances (Anderson and Anderson, 1999). In Fig. 7 we show the effect of multiplicative covariance inflation on the propensity for blow-up. We observe that increasing the inflation factor δ from δ = 1 initially significantly decreases the propensity for blow-up S_b; this is achieved by reducing occurrences of non-tracking periods (the classical filter divergence), which are the precursors of catastrophic blow-up. For sufficiently large values of δ, however, the instances of catastrophic filter divergence are drastically increased. Too large values of the inflation factor exacerbate the sampling errors of the forecast error covariances, with δP_f,1i / (δP_f,11 + R_o) ≈ P_f,1i / P_f,11. Covariance inflation, however, cannot completely suppress catastrophic filter divergence for all observation intervals, and more complicated behaviour can occur, as observed for example for t_obs = 0.04 in Fig. 7.
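The localisation and inflation just discussed can be sketched as follows. The fifth-order Gaspari and Cohn (1999) function is standard; the periodic-distance helper, the function names and the order in which inflation and localisation are applied are our choices.

```python
import numpy as np

def gaspari_cohn(d, c):
    """Gaspari-Cohn (1999) compactly supported correlation function;
    d: separation, c: localisation half-width (zero beyond 2c)."""
    r = np.abs(np.asarray(d, dtype=float)) / c
    g = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    g[m1] = (((-0.25 * r[m1] + 0.5) * r[m1] + 0.625) * r[m1]
             - 5.0 / 3.0) * r[m1] ** 2 + 1.0
    g[m2] = ((((r[m2] / 12.0 - 0.5) * r[m2] + 0.625) * r[m2]
              + 5.0 / 3.0) * r[m2] - 5.0) * r[m2] + 4.0 - 2.0 / (3.0 * r[m2])
    return g

def localise(Pf, rho_loc, delta=1.0):
    """Schur-multiply Pf with a Gaspari-Cohn localisation matrix on a
    periodic grid of D points and apply multiplicative inflation delta."""
    D = Pf.shape[0]
    i = np.arange(D)
    sep = np.abs(i[:, None] - i[None, :])
    dist = np.minimum(sep, D - sep)              # periodic grid distance
    Cloc = gaspari_cohn(dist, rho_loc)
    return delta * (Cloc * Pf)                   # Schur product, then inflation
```

The Schur product tapers the spurious long-range covariances to zero while the inflation factor δ counteracts the underestimated variances.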

Discussion
We have numerically established that catastrophic filter divergence is caused by the interplay of finite size effects and sparse observations with small observational noise, producing analyses which may be situated in phase space far away from the actual attractor. The subsequent attraction back towards the attractor by the forecast model may cause numerical instabilities if the attraction rate is sufficiently large. This suggests that blow-up is to be expected in sparse observational networks involving observables which exhibit a large degree of irregularity. If those high-variance fields are measured sufficiently accurately, catastrophic filter divergence is possible. This is, for example, the case in data assimilation of small-scale intermittent turbulent fields, or in situations where sparse accurate observations of variables exhibiting strong spatial gradients, such as jets, can cause numerical instabilities to occur (J. L. Anderson, personal communication, 2012).
We have checked that our results are independent of the numerical integration scheme used during the forecasting step. We have performed simulations with a first-order forward Euler scheme and a second-order implicit midpoint rule. The latter is unconditionally stable for the system in Eq. (1) (see Theorem 5.5.6 in Stuart and Humphries, 1996); however, in the case of the implicit midpoint rule, we observed a large increase in the number of iterations required to solve the nonlinear fixed-point equation, rendering the scheme impractical. Furthermore, we have performed simulations with the deterministic ensemble adjustment Kalman filter (EAKF) (Anderson and Anderson, 1999) and observed similar behaviour.
Our work has established the dynamical genesis of catastrophic filter divergence. Besides an impractical reduction of the integration time step (or an increase in the number of iterations allowed in an unconditionally stable implicit method) to control the stiffness of the dynamical system, or an impractical increase of the ensemble size to eliminate finite-size sampling errors, covariance localisation and appropriately tuned covariance inflation were found to be effective in mitigating catastrophic filter divergence. Our study further shows that choosing too small observation covariances can lead to filter blow-ups, as observed in numerical weather prediction models (A. Shlyaeva, personal communication, 2012).

Fig. 1. Top: maximal absolute value Z_amax of the analysis ensemble over all D = 5 components for dt = 0.025 and t_obs = 0.05, with R_o = 0.01 and k = 6 ensemble members. Bottom: the norm error E as a function of analysis cycles. The continuous line (blue online) in the inset shows the observational error √R_o.

Fig. 2. Percentage S_b of blow-ups as a function of the observation interval t_obs for several values of the integration time step dt, for a fixed number of analysis cycles n_a = 4000. We used R_o = 0.01 and k = 6 ensemble members.

Fig. 3. Log plot of the empirical normalised cumulative distribution function P_b(τ_i) of blow-up times τ_i for different observation intervals t_obs, for dt = 0.0025. We used R_o = 0.01 and k = 6 ensemble members.

Fig. 4. Top: percentage S_b of blow-ups as a function of the observation interval t_obs for several values of the observational noise R_o, for a fixed number of analysis cycles n_a = 4000 and integration time step dt = 0.0025. Bottom: percentage S_b of blow-ups as a function of the observational noise variance R_o for a fixed number of analysis cycles n_a = 4000, integration time step dt = 0.0025 and observation interval t_obs = 0.05.

Fig. 5. Ensemble dimension D_ens as a function of analysis cycles. Parameters as in Fig. 1.

Fig. 6. Distance between the analysis and the attractor as measured by τ_attr. Parameters as in Fig. 1.

Fig. 7. Percentage S_b of blow-ups as a function of the multiplicative inflation factor δ, for a fixed number of analysis cycles n_a = 4000, integration time step dt = 0.0025 and observational noise variance R_o = 0.01. We used k = 6 ensemble members.