Brief communication: An innovation-based estimation method for model error covariance in Kalman filters

We present a simple innovation-based model error covariance estimation method for Kalman filters. The method is based on that of Berry and Sauer (2013), and the simplification results from assuming a known observation error covariance. We carry out experiments with a prescribed model error covariance using the Lorenz (1996) model and an ensemble Kalman filter. The prescribed error covariance matrix is recovered with high accuracy.


1 Introduction
The Kalman filter is a state estimation method for combining model forecasts with noisy observations, and forms the basis for many data assimilation (DA) methods (Ghil et al., 1981; Kalnay, 2002). The estimation and incorporation of model error is an important aspect of the filtering problem. We briefly introduce the problem.
Following the notation of Ide et al. (1997), we assume that the true state evolution is given by

$$x^t(t_{i+1}) = M_i[x^t(t_i)] + \eta(t_i), \qquad (1)$$

where $x^t(t_i)$ is the true state at time $t_i$, $M_i$ is the dynamic model at time $t_i$, and $\eta$ is a model error with mean 0 and covariance $Q$. The observations are given by

$$y(t_i) = H_i[x^t(t_i)] + \epsilon(t_i), \qquad (2)$$

where $H_i$ is called the observation operator, and $\epsilon$ is an observation error with mean 0 and covariance $R$. The Kalman filter then combines the observations $y$ with model forecasts $x^f$ with covariance $P^f$, resulting in the analysis state $x^a$ with covariance $P^a$. The next forecast is then given by

$$x^f(t_{i+1}) = M_i[x^a(t_i)]. \qquad (3)$$

The forecast error covariance $P^f$ at time $t_{i+1}$ can be estimated by

$$P^p(t_{i+1}) = M_i P^a(t_i) M_i^T, \qquad (4a)$$

$$P^f(t_{i+1}) = P^p(t_{i+1}) + Q(t_i), \qquad (4b)$$

cf. Ghil and Malanotte-Rizzoli (1991) or Tandeo et al. (2020); this is exact for a linear model. The first term, $P^p$, in the above estimate can be identified as the one-step predictability error (Berry and Sauer, 2013). This error is due to the effect of the system's dynamics on the initial conditions. Kalman filters, without further modification, generally use only this term, and are thus prone to underestimating $P^f$. This has led to a variety of methods for estimating $Q$, often simultaneously with estimating $R$; see the reviews by Duník et al. (2017) and Tandeo et al. (2020).
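For concreteness, the forecast step of Eqs. (3)-(4) can be sketched as follows for a linear model matrix. This is a minimal illustration only; the function and variable names are ours, and it is not the implementation used in the experiments below.

```python
import numpy as np

def forecast_step(x_a, P_a, M, Q):
    """One forecast step of Eqs. (3)-(4), assuming a linear model matrix M."""
    x_f = M @ x_a          # Eq. (3): propagate the analysis state
    P_p = M @ P_a @ M.T    # Eq. (4a): one-step predictability error
    P_f = P_p + Q          # Eq. (4b): account for model error
    return x_f, P_f
```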

Adding the estimated $Q$ to $P^p$ is known in the DA literature as additive inflation, as opposed to multiplicative inflation, where $P^p$ is multiplied by a scalar greater than 1. Additive inflation generally works better than multiplicative inflation in accounting for model errors, since multiplicative inflation assumes that model errors span the same subspace as the errors due to initial conditions, which is not generally the case (Hamill and Whitaker, 2005).
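The contrast can be made explicit with a toy example; the matrices and the inflation factor below are illustrative values of our own choosing:

```python
import numpy as np

P_p = np.diag([1.0, 0.5])           # illustrative predictability error covariance
Q = np.array([[0.2, 0.1],
              [0.1, 0.2]])          # illustrative model error covariance

P_f_add = P_p + Q                   # additive inflation: Q can introduce error
                                    # directions that are absent from P_p
P_f_mult = 1.2 * P_p                # multiplicative inflation: same eigenvectors
                                    # as P_p, only rescaled
```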
Here, we suggest a method for estimating $Q$ that is closely related to that of Berry and Sauer (2013), but we assume that the observation noise covariance $R$ is known. This assumption allows us to derive a simple estimate for $Q$ that requires neither lagged innovations nor the gain matrix. Nor is model linearization required in the case of an ensemble Kalman filter applied to a nonlinear forward model.

2 Method
Many methods for estimating $Q$ rely on the statistics of the innovations $d(t_i) = y(t_i) - H x^f(t_i)$, which equal the difference between observations and forecasts. A standard result for the Kalman filter states that

$$E[d d^T] = H P^f H^T + R; \qquad (5)$$

see, for instance, Desroziers et al. (2005) or Simon (2006, Sec. 10.1).
If the state is not fully observed, as is usually the case in DA problems, then $H$ is not invertible. However, for idealized cases when $H$ is invertible, we can obtain an estimate $\hat{Q}$ of $Q$ by substituting Eq. (4b) into Eq. (5) and rearranging:

$$\hat{Q} = H^{-1}\left(E[d d^T] - R\right) H^{-T} - P^p. \qquad (6)$$

See Sect. 2.2 below for the general case in which $H$ is not invertible.
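Equation (6) translates directly into code. The following is a minimal sketch with hypothetical names; `dd_T` stands in for the current estimate of $E[dd^T]$ (e.g., the outer product of a single innovation):

```python
import numpy as np

def q_hat(dd_T, R, P_p, H):
    """Instantaneous estimate of Eq. (6); assumes H is square and invertible."""
    H_inv = np.linalg.inv(H)
    return H_inv @ (dd_T - R) @ H_inv.T - P_p
```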
In order to avoid abrupt changes in $\hat{Q}$ over time, and to preserve positive semidefiniteness (see below), a temporal smoothing needs to be applied:

$$Q(t_i) = (1 - \rho)\, Q(t_{i-1}) + \rho\, \hat{Q}(t_i), \qquad (7)$$

where $0 < \rho < 1$ is a tunable parameter (Berry and Sauer, 2013; Tandeo et al., 2020), and $Q$ is the smoothed estimate. Then, $P^f$ is estimated by adding $Q(t_i)$ to the $P^p$ estimated by the filter. In what follows, we drop the time indices for simplicity.
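The smoothing of Eq. (7) is a standard exponential moving average; a one-line sketch (function name ours):

```python
def smooth_Q(Q_prev, Q_hat, rho):
    """Temporal smoothing of Eq. (7); rho in (0, 1) is the tunable weight."""
    return (1.0 - rho) * Q_prev + rho * Q_hat
```

A small $\rho$ gives a long effective averaging window, damping the noisy instantaneous estimates at the cost of slower adaptation.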
Covariance matrices must be positive semidefinite: in other words, their smallest eigenvalue must satisfy $\lambda_{\min} \geq 0$. Due to the observation noise entering the $E[dd^T]$ term in Eq. (6), the estimate $\hat{Q}$ can often lack this property. To avoid this problem, a small enough $\rho$ must be chosen, and the "initial guess" $Q(t_0)$ should be positive semidefinite.

In general, the larger the observation noise relative to the model error, the smaller $\rho$ must be. However, if the estimated $Q$ does become indefinite at some $t_j$, definiteness can be restored. The matrix satisfying $\lambda_{\min} \geq \delta$ that is nearest in the Frobenius norm $\|\cdot\|_F$ (Horn and Johnson, 2013) to the problematic one at $t = t_j$ can be computed by using the spectral decomposition and setting all $\lambda_i < \delta$ to $\delta$ (Cheng and Higham, 1998).
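A sketch of this eigenvalue-flooring repair, under the stated Cheng and Higham (1998) construction (function name and the pre-symmetrization are ours):

```python
import numpy as np

def restore_psd(Q, delta=0.0):
    """Nearest matrix (in the Frobenius norm) with lambda_min >= delta:
    take the spectral decomposition and floor the eigenvalues at delta."""
    lam, V = np.linalg.eigh((Q + Q.T) / 2.0)   # symmetrize before decomposing
    return V @ np.diag(np.maximum(lam, delta)) @ V.T
```

With $\delta = 0$ this restores semidefiniteness; a small $\delta > 0$ yields a strictly positive definite result.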

2.1 Ensemble filters
In the case of an ensemble Kalman filter, we estimate $E[dd^T] \approx (y - H\bar{x}^f)(y - H\bar{x}^f)^T$, where $\bar{x}^f$ is the mean of the forecast ensemble.
In ensemble filters, $P^p$ is estimated as

$$P^p = \frac{1}{m-1} \sum_{i=1}^{m} \left(x^f_i - \bar{x}^f\right)\left(x^f_i - \bar{x}^f\right)^T, \qquad (8)$$

where $x^f_i$ is the $i$th ensemble member and $m$ is the ensemble size. We use this $P^p$ directly in Eq. (6), thus avoiding the need for a tangent linear model, as in Eq. (4a), when $M$ is nonlinear.
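The two ensemble statistics entering Eq. (6) can be computed as sketched below; the layout convention (one member per column) and the function name are our own:

```python
import numpy as np

def innovation_and_P_p(X_f, y, H):
    """Ensemble estimates for Eq. (6): instantaneous E[d d^T] and the
    sample covariance P^p of Eq. (8).
    X_f is an (n, m) array holding one forecast member per column."""
    m = X_f.shape[1]
    x_f_bar = X_f.mean(axis=1, keepdims=True)   # ensemble mean
    d = y.reshape(-1, 1) - H @ x_f_bar          # innovation w.r.t. the mean
    dd_T = d @ d.T                              # instantaneous E[d d^T]
    A = X_f - x_f_bar                           # forecast anomalies
    P_p = (A @ A.T) / (m - 1)                   # Eq. (8)
    return dd_T, P_p
```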
Furthermore, instead of adding $Q$ to $P^p$, samples drawn from $N(0, Q)$ can be added to each ensemble member. Mitchell and Carrassi (2015) found that this stochastic method performed better than directly adding $Q$ to $P^p$, although modified deterministic methods can work better for square-root filters (Raanes et al., 2015).
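A sketch of this stochastic treatment, assuming independent per-member draws as described above (function name ours):

```python
import numpy as np

rng = np.random.default_rng()

def perturb_ensemble(X_f, Q):
    """Add an independent N(0, Q) draw to each forecast member,
    instead of adding Q to P^p directly."""
    n, m = X_f.shape
    return X_f + rng.multivariate_normal(np.zeros(n), Q, size=m).T
```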

2.2 Rank-deficient observations
When $H$ is not invertible, we can find a solution that minimizes the Frobenius norm, as in Berry and Sauer (2013). We let $\hat{Q}$ in Eq. (7) be a linear combination of fixed matrices, $\hat{Q} = \sum_p q_p Q_p$. This formulation can be used to specify a simplified structure, such as a diagonal matrix or a block-constant one.
Let $q$ be the vector of coefficients $\{q_p\}$. Then,

$$\hat{q} = \arg\min_q \left\| H \Big(\sum_p q_p Q_p\Big) H^T - C \right\|_F, \quad \text{with } C = E[dd^T] - R - H P^p H^T. \qquad (9)$$

The minimization in Eq. (9) is carried out by finding the least-squares solution of $Aq \approx \mathrm{vec}(C)$, where the $p$th column of $A$ is $\mathrm{vec}(H Q_p H^T)$.
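The vectorized least-squares problem can be sketched as follows; `fit_q` and `Q_basis` are hypothetical names, and `C` is the residual defined after Eq. (9):

```python
import numpy as np

def fit_q(C, H, Q_basis):
    """Least-squares solution of A q ~ vec(C), Eq. (9): the p-th column
    of A is vec(H Q_p H^T); Q_basis is the list of fixed matrices Q_p."""
    A = np.column_stack([(H @ Qp @ H.T).ravel() for Qp in Q_basis])
    q, *_ = np.linalg.lstsq(A, C.ravel(), rcond=None)
    Q_hat = sum(qp * Qp for qp, Qp in zip(q, Q_basis))
    return q, Q_hat
```

Any consistent flattening convention may be used for vec, as long as the columns of $A$ and $\mathrm{vec}(C)$ use the same one.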
3 Experiments

We apply the proposed method to the Lorenz (1996) model,

$$\frac{dx_j}{dt} = (x_{j+1} - x_{j-2})\, x_{j-1} - x_j + F, \qquad (10)$$

with cyclic indices $j = 1, \ldots, 40$ and forcing $F$. The prescribed model error covariance $Q_1$ is shown in Fig. 2a. We carry out three distinct experiments (a minimal integration sketch for the model follows the list):

1. Model error is $Q_1$ and the state is fully observed.
2. Model error is $Q_1$ but only every second $x_j$ is observed. In this experiment, furthermore, we parameterize $Q$ as a block-constant matrix of $4 \times 4$ blocks.
3. Model error is $Q_2$ and the state is fully observed.
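The model in Eq. (10) and a standard fourth-order Runge-Kutta integrator can be sketched as follows. The forcing $F = 8$ and step size `dt = 0.05` are customary choices for this model, assumed here since the text does not state them:

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """Lorenz (1996) tendency of Eq. (10) with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step of the model."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```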

For each experiment, we use the ensemble transform Kalman filter (ETKF: Bishop et al., 2001) with 80 ensemble members.
At every timestep, we draw a vector from the multivariate normal distribution $N(0, Q)$ and add it to all the ensemble members.
We take $R = 0.4 I$, with $\rho = 10^{-3}$ for experiment 1 and $\rho = 10^{-4}$ for experiments 2 and 3. The estimate $Q$ is initialized to $0.1 I_{40}$ for experiments 1 and 3 and to $I_{40}$ for experiment 2. The number of DA cycles is 3 000 for experiment 1, 20 000 for experiment 2, and 15 000 for experiment 3.

4 Results
Figures 1 and 2b show the results of experiment 1. After about 2 000 DA cycles, the error in estimating $Q_1$ stabilizes (Fig. 1a), which also results in lower analysis errors when $Q_1$ is used in the DA process (Fig. 1b). Furthermore, $Q_1$ is recovered with high fidelity (Fig. 2b). Note that, for the analysis errors, we use the continuous ranked probability score (CRPS: Hersbach, 2000), a probabilistic error metric, to measure the discrepancy between the ensemble and the true probability distributions.

Figure 2c shows the estimate of $Q_1$ obtained in experiment 2, with partial observations and a block-constant formulation. In this case, too, the method successfully recovers the coarse structure of $Q_1$.
Finally, Figure 3 shows the results of experiment 3. Here, since $\rho$ is smaller, it takes longer for the error to stabilize. The structure of $Q_2$ is also recovered with high fidelity (not shown).