A Stochastic Covariance Shrinkage Approach to Particle Rejuvenation in the Ensemble Transform Particle Filter

Rejuvenation in particle filters is necessary to prevent the collapse of the weights when the number of particles is insufficient to sample the high probability regions of the state space. Rejuvenation is often implemented in a heuristic manner by the addition of stochastic samples that widen the support of the ensemble. This work aims to improve the canonical rejuvenation methodology by the introduction of additional prior information obtained from climatological samples: the dynamical particles used for importance sampling are augmented with samples obtained from stochastic covariance shrinkage. The ensemble transform particle filter and its second order variant are extended with the proposed rejuvenation approach. Numerical experiments show that the modified filters significantly improve the analyses for low dynamical ensemble sizes.


Introduction
Ensemble-based data assimilation (Asch et al., 2016; Law et al., 2015; Reich and Cotter, 2015) aims to estimate our uncertainty about the state of some dynamical system through an ensemble of possible states. The assumed underlying distribution of these states is taken to represent our uncertainty.
Oftentimes the ensemble cannot describe the distribution of our uncertainty to sufficient accuracy. The curse of dimensionality (Tan et al., 2018) ensures that faithful descriptions of general high-dimensional distributions stay out of our reach. Several techniques, such as the principle of maximum entropy (Jaynes, 2003), attempt to alleviate some of this burden by prescribing a distribution to a set of data, constraining the choice with known or estimated quantities and qualities. For instance, the ensemble Kalman filter (Burgers et al., 1998; Evensen, 1994, 2009) could be thought of as an abuse of the principle of maximum entropy: by discarding information about the underlying dynamical system, and assuming that the ensemble only provides information about a mean (that lives in R^n) and a covariance, the underlying distribution of our uncertainty is taken to be normal. In such a way, any application of Bayes' rule has to transform our assumed prior normal distribution into our assumed posterior normal distribution.
Previous work has focused on augmenting the information represented by the ensemble with information derived from covariance shrinkage through a surrogate ensemble in the ensemble transform Kalman filter. In this paper, we extend this idea to the ensemble transform particle filter (ETPF) (Reich, 2013). The ETPF uses importance sampling (Liu, 2008) not to solve the problem of Bayesian inference directly, but simply to transport a given ensemble into one that is equally sampled from some distribution whose moments, in the ensemble limit, approach the moments of the correct posterior distribution. This means that, for a finite ensemble, the underlying distribution from which our posterior is sampled could potentially be very far, in information distance, from the optimal posterior distribution.
This work explores a new approach to particle rejuvenation, which is necessary to prevent weight collapse in particle filters. Instead of heuristics, the approach makes use of prior information to enrich the ensemble subspace. Our contributions are as follows: we introduce an alternative way of performing particle rejuvenation in ETPF by incorporating climatological covariance information. We accomplish this by augmenting the dynamical (model) ensemble with synthetic anomalies with optimal scaling, accompanied by a statistically correct estimator. We show that this method of performing particle rejuvenation significantly improves the analysis RMSE for low dynamical ensemble sizes. This paper is organized as follows. Section 2 reviews the concept of Bayesian inference with the addition of prior information, and its use in importance sampling. Section 3 introduces the ensemble transform particle filter and its canonical rejuvenation heuristic. The concept of stochastic covariance shrinkage is proposed in Section 4, and ETPF is extended to make use of this shrinkage. Numerical experiment results are reported in Section 5. Finally, concluding remarks are drawn in Section 6.
Optimal Coupling with Prior Information and the Ensemble Transform Particle Filter

Bayesian inference (Jaynes, 2003) aims at transforming prior information about the state of a system (represented by the distribution of a random variable X^f), additional qualitative and quantitative information (P), and information obtained by observing the system (Y) into combined posterior information (X^a):

π_{X^a}(x | y, p) ∝ π_{Y|X^f}(y | x, p) π_{X^f}(x | p),    (1)

where π_{X^f}(x | p) represents the prior state probability density conditioned by all other relevant information, and π_{Y|X^f}(y | x, p) is the observational likelihood conditioned by the forecast and the prior information. Here we consider the finite dimensional case where X^f, X^a ∈ R^n, Y ∈ R^m, and the supports of the probability densities π_{X^f} and π_{X^a} are subsets of the respective spaces.
Classical particle filtering (Reich and Cotter, 2015) represents state distributions by collections of particles, i.e., ensembles of samples. Specifically, consider an ensemble X^f = [x_1^f, ..., x_{N_f}^f] ∈ R^{n×N_f} of N_f particles; the prior density is approximated weakly by the corresponding empirical measure (2), where w_j^f for 1 ≤ j ≤ N_f are the prior importance weights associated with each particle. Similarly, the posterior density is approximated weakly by an empirical measure (3) based on the same sample values (particle states) but with different posterior importance weights w_j^a for 1 ≤ j ≤ N_f. The posterior importance sampling weights are obtained from eq. (1), as in (4). The ensemble of weights is denoted by w = [w_1, ..., w_{N_f}]^T, and w^f and w^a refer to the forecast and analysis weights, respectively. Using (3) and (4), empirical estimates (5) of the posterior mean and covariance, respectively, are obtained by the importance sampling approach (Liu, 2008).
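As a concrete illustration, the importance weight update (4) can be sketched as follows, under the assumption of a Gaussian observation likelihood with a linear observation operator H and observation error covariance R (the function and variable names here are ours, not the paper's):

```python
import numpy as np

def analysis_weights(wf, Xf, y, H, R):
    """Sketch of the posterior importance weight update (4), assuming a
    Gaussian observation likelihood. wf are forecast weights, Xf is the
    n x N forecast ensemble, y the observation, H the observation operator.
    """
    innov = y[:, None] - H @ Xf                      # innovation per particle
    Rinv = np.linalg.inv(R)
    loglik = -0.5 * np.sum(innov * (Rinv @ innov), axis=0)
    w = wf * np.exp(loglik - loglik.max())           # stabilized exponentials
    return w / w.sum()                               # normalize to sum to one
```

Particles closer to the observation receive larger analysis weights, while the normalization keeps the weights a valid discrete probability distribution.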
The goal of particle filtering (with resampling) is to find an ensemble X^a ∈ R^{n×N_a} of N_a realizations of the random variable X^a that represents the posterior distribution π_{X^a} with equal weights. Specifically, the posterior density is approximated weakly by the empirical measure (6), where the importance sampling weights are uniform and equal to 1/N_a (so that all particles are equally likely). We impose that the empirical mean (5) is preserved by (6). The optimal coupling (McCann et al., 1995; Reich and Cotter, 2015) between the prior empirical distribution eq. (2) and the posterior empirical distribution eq. (6) can be defined as an ensemble transformation

X^a = N_a X^f T^*,    (8)

where T^* ∈ R^{N_f×N_a} is the solution to the optimal transport Monge-Kantorovich problem (Villani, 2003). The discrete optimal transportation problem is

T^* = arg min_{T ≥ 0} Σ_{i,j} T_{i,j} ||x_i^f − x_j^f||_2^2,  subject to  T 1_{N_a} = w^a,  T^T 1_{N_f} = N_a^{-1} 1_{N_a},    (9)

where the squared Euclidean distance is taken as the distance measure so that a provably unique solution to the Monge-Kantorovich problem exists (McCann and Guillen, 2011). The vector of ones of size q is represented by 1_q. The problem eq. (9) is a linear programming problem.
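Since (9) is a linear program, it can be solved with an off-the-shelf LP solver. The following is a minimal sketch for the square case N_f = N_a = N, using scipy's `linprog`; the function name and the dense constraint construction are our choices, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def etpf_transport(Xf, wa):
    """Solve the discrete Monge-Kantorovich problem (9) for N_f = N_a = N.

    Xf is the n x N forecast ensemble, wa the analysis weights (summing to
    one). Returns the equally weighted analysis ensemble and T*.
    """
    n, N = Xf.shape
    # cost c_ij = ||x_i^f - x_j^f||^2 between forecast particles
    C = ((Xf[:, :, None] - Xf[:, None, :]) ** 2).sum(axis=0)
    # equality constraints: row sums equal wa, column sums equal 1/N
    A_eq = np.zeros((2 * N, N * N))
    b_eq = np.concatenate([wa, np.full(N, 1.0 / N)])
    for i in range(N):
        A_eq[i, i * N:(i + 1) * N] = 1.0   # row-sum constraint for row i
        A_eq[N + i, i::N] = 1.0            # column-sum constraint for col i
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    T = res.x.reshape(N, N)
    return N * Xf @ T, T                   # analysis ensemble, eq. (8)
```

Note that when the analysis weights are already uniform the optimal coupling is the scaled identity, so the ensemble is returned unchanged; in general the transformation preserves the weighted ensemble mean.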
The discrete optimal transport formulation (8) begets a mapping X^a = Ψ_{N_f,N_a}(X^f) that, in the limit of infinite ensemble size, converges weakly to a mapping Ψ, such that X^a = Ψ(X^f), which has the exact desired distribution given by eq. (1) (Reich and Cotter, 2015, Theorem 5.19). We believe, but do not prove, that this convergence likely also holds when N_f and N_a are not equal.
The standard ETPF (Reich, 2013) makes the assumption that the prior and posterior ensemble sizes are the same, N := N_a = N_f in (9). A second order extension to the ETPF (which we will call ETPF2 here) (Acevedo et al., 2017) modifies the optimal transport equation (8) by an additional term D, a matrix that ensures that the empirical covariance estimate Σ_{X^a} from (5) is preserved by (6).

Particle Rejuvenation in ETPF
Particle and ensemble-based filters often underrepresent uncertainty (Asch et al., 2016) due to the relatively small number of samples when compared to the dimension of the state and data spaces. Over several data assimilation cycles, multiple particles start carrying either unimportant or redundant information, which leads to weight collapse or to ensemble degeneracy (Strogatz, 2018). To alleviate these effects, methods such as inflation (Anderson, 2001), rejuvenation (Reich, 2013), and resampling (Reich and Cotter, 2015) have been developed.
In order to avoid ensemble collapse, the ETPF employs a particle rejuvenation approach (Acevedo et al., 2017; Reich, 2013; Chustagulprom et al., 2016) that perturbs the analysis ensemble by random samples drawn from the ensemble of prior anomalies (11), where η ∼ (N(0,1))^{N×N} is a matrix of i.i.d. samples from the standard normal distribution, the factor τ is treated as a hyperparameter that controls the magnitude of the correction, and the ensemble anomalies (12) are defined as the ensemble of deviations from the sample mean. Of note is the fact that the extra term I_N − N^{-1} 1_N 1_N^T in (11) ensures that the introduction of the random matrix η does not modify the mean of X^a; this follows from (13). If we define the matrix B as in (14), it is possible to write the ETPF with rejuvenation (11) in the form (15), with the matrix B acting as a stochastic perturbation of the optimal transport operator T^*, such that B preserves the constraints 1_N^T B = 0_N^T and B 1_N = 0_N in eq. (9) because of the results in (13). This, of course, immediately calls into question the optimality of the transport for a finite ensemble, as adding this type of B matrix perturbs the transport mapping away from the optimum T^*.
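The canonical rejuvenation step (11) can be sketched as follows; the 1/sqrt(N−1) scaling folded into τ is our assumption, as conventions for absorbing the anomaly normalization into the hyperparameter vary:

```python
import numpy as np

def rejuvenate(Xa, Xf, tau, rng):
    """Sketch of canonical particle rejuvenation (11).

    Adds mean-preserving random combinations of the forecast anomalies to
    the analysis ensemble; the exact normalization inside tau is an
    assumption of this sketch.
    """
    n, N = Xf.shape
    Af = Xf - Xf.mean(axis=1, keepdims=True)   # forecast anomalies (12)
    eta = rng.standard_normal((N, N))          # i.i.d. N(0,1) matrix
    P = np.eye(N) - np.ones((N, N)) / N        # I_N - N^{-1} 1 1^T projector
    return Xa + (tau / np.sqrt(N - 1)) * Af @ eta @ P
```

Because the projector P annihilates the vector of ones, the added perturbation has zero row sums and the analysis ensemble mean is left untouched, mirroring the argument around (13).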
From a Bayesian perspective, covariance shrinkage seeks to incorporate additional prior information on error correlations into the analysis, in order to enhance the inference. In many data assimilation settings, climatological covariance information is available, i.e., it constitutes known prior information. Climatological covariances are typically precomputed or derived from climatological models and are often employed in variational data assimilation (Lorenc et al., 2015).
Following earlier work, we describe the stochastic covariance shrinkage technique. Instead of perturbing the transform matrix as in (15), we consider enhancing the dynamic ensemble X^f ∈ R^{n×N} with an M-member synthetic ensemble X̃^f ∈ R^{n×M} of samples independent of the dynamical ensemble. This leads to the total ensemble (16) of N + M members. We make the ansatz that each ensemble member is distributed according to (17), the full distribution of the forecast conditioned by the prior information that we have provided to the algorithm. Note that (17) is not the empirical measure distribution (2), which only carries information from the ensemble members, but rather the 'exact' distribution that is assumed to contain all the information from the forecast. This is because we are now incorporating additional prior information P in the form of climatological information.
Taking the anomalies (12) of the dynamic ensemble, together with the anomalies of the synthetic ensemble, the total empirical (unbiased) covariance can be written in terms of the constituent covariances, which are defined in terms of the weights below. In the covariance shrinkage approach, the sample mean of the synthetic ensemble is assumed to equal the sample mean of the dynamic ensemble, by construction and (13), thus only the synthetic ensemble anomalies need to be determined. Taking a covariance matrix P that represents a climatological estimate of the covariance, we sample the anomalies from some unbiased distribution whose covariance is a scaling, by a factor µ, of said covariance. In the Gaussian case the anomalies are sampled from N(0, µP), where µ is a scaling factor defined later. An alternate choice of distribution that we explore is the symmetric multivariate Laplace distribution (Kozubowski et al., 2013), whose pdf involves K, the modified Bessel function of the second kind (Olver et al., 2010). The choice of Laplace distribution is motivated by robust statistics techniques (Rao et al., 2017). The resulting sampled covariance is therefore an estimate of the scaled climatological covariance µP.

Remark 1. In order to stay consistent with the mean estimate, the anomalies are replaced with their sample mean zero counterparts.

The weights w^f are divided into two classes: those associated with the dynamic ensemble, and those associated with the synthetic ensemble, where the parameter γ is known as the covariance shrinkage factor.
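A sketch of the synthetic anomaly sampling, for both the Gaussian case and the Laplace case, follows. The Laplace sampler uses the standard normal-variance-mixture representation of the symmetric multivariate Laplace distribution (a Gaussian draw scaled by the square root of an exponential variable); the function name and interface are ours:

```python
import numpy as np

def synthetic_anomalies(P, mu, M, rng, dist="gaussian"):
    """Sample M synthetic anomalies with covariance mu * P (a sketch).

    In the Gaussian case the anomalies are drawn from N(0, mu*P); the
    Laplace case uses the normal-variance-mixture representation of the
    symmetric multivariate Laplace distribution. The anomalies are
    re-centered to exact sample mean zero (Remark 1).
    """
    n = P.shape[0]
    L = np.linalg.cholesky(mu * P)               # factor of the scaled target
    Z = L @ rng.standard_normal((n, M))
    if dist == "laplace":
        W = rng.exponential(1.0, size=M)         # exponential mixing variable
        Z = Z * np.sqrt(W)                       # E[W] = 1 keeps cov = mu*P
    return Z - Z.mean(axis=1, keepdims=True)     # enforce zero sample mean
```

Both variants have the same covariance µP in expectation, but the Laplace samples have heavier tails, which is the robustness property alluded to above.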
One choice to calculate γ is the Rao-Blackwell Ledoit-Wolf (RBLW) estimator (28) (Chen et al., 2009; Nino-Ruiz and Sandu, 2017, equation (9)), where the sphericity factor (29) represents the mismatch between the climatological covariance (called the "target" in the statistical literature) and the sample covariance matrix. Note that if P = Σ_{X^f}, then Û(P, Σ_{X^f}) = 0. In this framework the scaling parameter for the climatological covariance is defined to be µ = tr(C)/n.
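A sketch of the RBLW estimator for the identity target of the statistical literature (see Remark 3 below) is given next; the constants follow the formula of Chen et al. (2009) as we recall it, and the paper's estimator (28) further generalizes the target to a climatological covariance P, which is not reproduced here:

```python
import numpy as np

def rblw_shrinkage(X):
    """Sketch of the Rao-Blackwellized Ledoit-Wolf shrinkage factor
    (Chen et al., 2009) for an identity target. X is an n x N matrix of
    N samples; returns gamma clipped to at most one.
    """
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)
    S = (A @ A.T) / (N - 1)                      # sample covariance
    tr_S2 = np.trace(S @ S)
    tr2_S = np.trace(S) ** 2
    gamma = ((N - 2) / N * tr_S2 + tr2_S) / ((N + 2) * (tr_S2 - tr2_S / n))
    return min(gamma, 1.0)                       # shrink fully at most
```

For nearly spherical sample covariances the denominator approaches zero and the factor saturates at one, i.e., the estimate collapses entirely onto the target, which is consistent with the sphericity interpretation of (29).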
Remark 2. The RBLW estimator (28) makes the assumption that the underlying distribution of the dynamic ensemble is Gaussian. Typically this assumption is violated for dynamical systems of interest.
Remark 3. In the statistical literature, the target covariance is often taken to be the identity, P = I, which implies that C = Σ_{X^f} in (29). The assumption that the target is a climatological covariance is a natural generalization in the specific context of sequential data assimilation.
Remark 4. The scaling of the target matrix P is of no consequence. Let P' = βP be a scalar scaling of the target matrix; since the scaling factor µ = tr(C)/n then scales as β^{-1}, the product µP, and hence the distribution of the synthetic anomalies, remains unchanged.

The resulting analysis ensemble is obtained by transporting the N + M prior states with their importance sampling weights into an equally weighted posterior ensemble of N states through a transformation of the form (8), where the optimal transport matrix T^* ∈ R^{(N+M)×N} is computed by solving (9).
Recall that in the traditional method of rejuvenation (15), the optimal transport matrix is perturbed randomly into a nearby transport matrix; no new prior information is introduced. We take a fundamentally different approach by incorporating "unseen" prior information derived from a climatological covariance. To this end, we augment the empirical measure distribution (2) with samples from the climatological distribution, to accommodate the total ensemble (16), before the Monge-Kantorovich problem (9) is solved. In effect, we avoid ensemble collapse by enhancing the empirical measure distribution (32) with new prior information, as opposed to reweighting the old prior information. We denote our method FETPF, standing for 'foresight' ETPF, as we believe that including prior information in the analysis procedure is a type of foresight.

Multiple Climatological Covariance Matrices
It is conceivable that multiple climatological models give rise to multiple climatological covariances, or, alternatively, that one of several candidates for the most 'common' behavior of the model must be chosen.
Given a collection of target covariances, {P_j}_{j∈J}, we must choose the appropriate covariance from which to sample. We consider the sphericity of the mismatch between the target and forecast covariances, eq. (29). Based on the authors' numerical experience, we select the target covariance that corresponds to the highest sphericity of the mismatch. We can justify this choice by observing that the smaller the sphericity, the closer our samples are to those of canonical rejuvenation techniques. The aim of climatological shrinkage is to introduce unknown information into our inference procedure; thus the target covariance with the highest mismatch introduces the highest amount of outside information.
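The selection rule can be sketched as below. The mismatch measure used here, U = n·tr(C²)/tr(C)² − 1 with C = P^{-1} Σ_{X^f}, is our assumption: it vanishes when P = Σ_{X^f}, as required of (29), but it is a stand-in for the paper's exact sphericity formula, which is not reproduced here:

```python
import numpy as np

def select_target(P_list, Sigma_f):
    """Pick the target covariance with the largest sphericity mismatch
    against the forecast covariance (a sketch of the selection rule).
    """
    def mismatch(P):
        C = np.linalg.solve(P, Sigma_f)       # C = P^{-1} Sigma_f
        n = C.shape[0]
        # zero exactly when C is proportional to the identity
        return n * np.trace(C @ C) / np.trace(C) ** 2 - 1.0
    return int(np.argmax([mismatch(P) for P in P_list]))
```

With this rule, a target identical to the forecast covariance scores zero and is never preferred over a genuinely different climatological candidate.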
Remark 5. It is also possible to construct 'multi-target' shrinkage estimators (Lancewicki and Aladjem, 2014) that consider all target matrices simultaneously.

Numerical Experiments
For all our experiments we use the Lorenz '63 system (Lorenz, 1963),

x' = σ(y − x),  y' = x(ρ − z) − y,  z' = xy − βz,

with canonical parameter values σ = 10, ρ = 28, and β = 8/3. We observe only the first component, with unbiased Gaussian observation error of variance R = 8. The implementation is from our test problem suite (Computational Science Laboratory, 2020; Roberts et al., 2019).
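For reference, the right-hand side of the test system and the observation model can be written compactly as follows (a minimal sketch, independent of the test problem suite cited above):

```python
import numpy as np

def lorenz63(t, x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz '63 system with canonical parameters."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def observe(x, rng):
    """Observe the first state component with N(0, R) error, R = 8."""
    return x[0] + rng.normal(0.0, np.sqrt(8.0))
```

The signature matches the `(t, x)` convention of standard ODE integrators such as `scipy.integrate.solve_ivp`, so the model can be propagated between assimilation steps without modification.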
As discussed previously, the canonical choice for the shrinkage target covariance is the identity matrix. It has been the authors' experience that for most dynamical systems this choice is poor. Moreover, the sequential data assimilation problem typically provides ways to calculate climatological approximations to the covariance. We take advantage of such techniques in this paper.
The first type of climatological covariance that we investigate is that of the distribution over the whole manifold of the dynamics: the (trace-state normalized) matrix (35), obtained by taking the temporal covariance of 50000 sample points, with condition number 15.88.

For testing multiple covariances, we run an ETPF with N = 100 and 20000 evenly spaced samples over a time interval of 2400 time units, and calculate the trace-state normalized forecast covariances. Under the squared Frobenius norm distance, we cluster the empirical covariance matrices of the same ensemble at different times into two clusters using the k-means algorithm (Tan et al., 2018). The collection of climatological covariances for the Lorenz '63 system thus consists of the centroids of each cluster (36), with condition numbers 13.68 and 16.98, respectively. As can be seen, the clusters are mainly split by the sign of the correlation of z with the other variables.
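Because the Frobenius distance between matrices coincides with the Euclidean distance between their vectorizations, standard k-means applies directly to the flattened covariances. A sketch of this clustering step, using scipy's `kmeans2` (our choice of routine):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_covariances(covs, k=2, seed=0):
    """Cluster covariance matrices under the squared Frobenius distance
    (a sketch of the multi-target construction). Flattening each matrix
    reduces the problem to ordinary Euclidean k-means; the centroids,
    reshaped back, are the cluster-mean covariance matrices.
    """
    n = covs[0].shape[0]
    flat = np.stack([C.ravel() for C in covs])
    centroids, labels = kmeans2(flat, k, seed=seed, minit="++")
    return [c.reshape(n, n) for c in centroids], labels
```

The reshaped centroids then play the role of the climatological targets in (36).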
In order to stay in line with other particle rejuvenation techniques, a heuristic that we use is inflation of the synthetic ensemble anomalies by a factor α, so that it is possible to overcome deficiencies in the ensemble's descriptive power; this is equivalent to assuming an inflated scaling factor µ in (30). We therefore have two parameters to configure in our rejuvenation technique: M, the size of the surrogate synthetic ensemble, and α, the inflation applied to its realizations.
It is the authors' experience that inflation should only be applied to the synthetic ensemble, and not the dynamical ensemble, otherwise the shrinkage estimate (28) does not lead to a stable algorithm.
In our experiments we report the error of the analysis with respect to the truth, measured by the spatio-temporal RMSE

RMSE = sqrt( (1/(nT)) Σ_{t=1}^{T} ||x̄_t^a − x_t^true||_2^2 ),

where T stands for the relevant measured timeframe of the experiments and x̄_t^a is the analysis mean at time t.
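A one-line sketch of this diagnostic, under the common convention of averaging the squared error over both space and time:

```python
import numpy as np

def spatiotemporal_rmse(Xa_means, X_true):
    """Spatio-temporal RMSE of the analysis means against the truth.

    Both arguments are n x T arrays of states over T assimilation steps;
    squared errors are averaged over all n*T entries before the root.
    """
    return np.sqrt(np.mean((Xa_means - X_true) ** 2))
```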
Our first round of experiments, reported in Figure 1, compares the canonical method of rejuvenation in the ETPF (and its second order extension) using the optimal rejuvenation factor τ = 0.04 in (14) against the proposed shrinkage-based approach; the target covariance (35) is used. We perform 10000 assimilation steps, discarding the first 1000 as spinup. The time interval between successive observations is ∆t = 0.12. We perform 20 independent runs and take the mean of the results to obtain an accurate estimate of the expected error.
Results in Figure 1 show that the stochastic covariance shrinkage technique converges to the same 'correct' RMSE in the case of a large dynamic ensemble of N = 100. Moreover, in the small ensemble case of N = 5, methods that inflate the synthetic ensemble with factor α = 1.2 perform significantly better than those that do not. Laplace distributed synthetic samples also seem to slightly reduce the error.
Our second round of experiments reported in Figure 2 makes use of multiple values of the climatological covariance P, namely those in (36). The rest of the setup is identical to the previous experiment. Again, for a large value of dynamic ensemble size N = 100, the stochastic covariance shrinkage approach attains the correct error statistics. For a low ensemble size, however, there is virtually no difference between the Gaussian, Laplace, inflation, and no inflation stochastic covariance shrinkage methods as compared to the ETPF.
Our final round of experiments seeks to understand the effect of selecting the two free parameters, i.e., the synthetic ensemble size M and the synthetic ensemble inflation factor α. Figure 3 shows the spatio-temporal RMSE of a small dynamic ensemble (N = 5) for various values of M and α, with Gaussian synthetic samples using the single target matrix (35). The results clearly show that in many operationally useful cases, it is necessary to have a sufficiently expressive synthetic ensemble, whose anomalies are sufficiently inflated.
The results of the first two experiments show that adding additional synthetic information during assimilation is more effective than randomly perturbing the ensemble post-assimilation. The authors hypothesize that the results point strongly towards the need to choose the target covariance matrices intelligently and adaptively, and towards the need for better operational calculation of the covariance shrinkage factor γ.

Conclusions
This paper introduces a stochastic covariance shrinkage-based particle rejuvenation technique for the ensemble transform particle filter. Instead of reweighting existing prior information, the approach incorporates additional prior information into the ensemble through synthetic anomalies derived from climatological covariance information. Numerical experiments show that the use of climatological prior information to perform rejuvenation leads to reduced analysis errors for significantly smaller dynamical ensemble sizes than the original rejuvenation approach.
We believe that the stochastic covariance shrinkage approach to importance sampling can be used not just for particle rejuvenation in the ETPF, but in other particle filters as well.