A local particle filter (LPF) is introduced that outperforms traditional ensemble Kalman filters in highly nonlinear/non-Gaussian scenarios, in both accuracy and computational cost. The standard sampling importance resampling (SIR) particle filter is augmented with an observation-space localization approach, in which an independent analysis is computed locally at each grid point. The deterministic resampling approach of Kitagawa is adapted for local application and combined with interpolation of the analysis weights to smooth the transition between neighboring points. Gaussian noise is applied with magnitude equal to the local analysis spread to prevent particle degeneracy while maintaining the estimate of the growing dynamical instabilities. The approach is validated against the local ensemble transform Kalman filter (LETKF) using the 40-variable Lorenz-96 (L96) model. The results show that (1) the accuracy of the LPF surpasses that of the LETKF as the forecast length increases (thus increasing the degree of nonlinearity), (2) the cost of the LPF is significantly lower than that of the LETKF as the ensemble size increases, and (3) the LPF prevents the filter divergence experienced by the LETKF in cases with non-Gaussian observation error distributions.

The particle filter (PF) has been explored in the data assimilation
community since the introduction of its Gaussian linear variant, the
ensemble Kalman filter (EnKF), in the mid-1990s (Evensen, 1994). While general PFs have
been intractable for high-dimensional systems, the EnKF has experienced
great success in numerical weather prediction (NWP) (e.g., Kleist, 2012; Hamrud et al., 2014) and ocean data
assimilation (e.g., Penny et al., 2015). However, at least two limitations are on the horizon for
EnKFs. Perhaps counterintuitively, these limitations arise due to
increased computational resources, and have already become challenges at the RIKEN
Advanced Institute for Computational Science (AICS, e.g., Miyamoto et al., 2013; Miyoshi et al., 2014, 2015). First, global
models will be pushed to higher resolutions in which they begin to resolve
highly nonlinear processes. To maintain the Gaussian linear assumption
required for the EnKF, much smaller time steps are needed. For example, the
standard 6 h analysis cycles used for the atmosphere may need to be
decreased to 5 min or even 30 s. Second, large ensembles (e.g., with
ensemble size

The PF is generally applicable to nonlinear non-Gaussian systems, including cases with multimodal distributions or nonlinear observation operators. With little difficulty, PFs can explicitly include representation of model error, nonlinear observation operators (Nakano et al., 2007; Lei and Bickel, 2011), non-diagonal observation error covariance matrices, and non-Gaussian likelihood functions. For example, observed variables such as precipitation are inherently non-Gaussian and cannot be effectively assimilated by standard EnKF techniques (e.g., Lien et al., 2013, 2016). In the expansion to sea-ice and land data assimilation applications, non-Gaussian quantities such as ice concentration, ice thickness, snow cover, and soil moisture outnumber those that can be modeled with Gaussian error. Bocquet et al. (2010) further review the difficulties of using observations with non-Gaussian error distributions. All of these problem-specific variations can create great difficulties for standard methods, such as the EnKF or variational approaches (3D-Var/4D-Var), as used in current operational systems.

Sampling importance resampling (SIR) (also known as the bootstrap filter;
Gordon et al., 1993) is a commonly used enhancement to the basic sequential importance sampling
(SIS) particle filter. However, even with resampling, the number of ensemble
members required by the SIR particle filter to capture the high probability
region of the posterior in high-dimensional geophysical applications is too
large to make SIR usable (Ades and van Leeuwen, 2013). Snyder et al. (2008) found that the number of required ensemble
members scales exponentially with the size of the system, giving the example
that a 200-dimensional system would require 10
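The SIR cycle of weighting and resampling can be sketched in Python as follows; all names are illustrative, and a Gaussian likelihood with uncorrelated observation errors is assumed (this is a conceptual sketch, not the code used in the experiments):

```python
import numpy as np

def sir_step(particles, propagate, obs, obs_operator, obs_err_sd, rng):
    """One cycle of the SIR (bootstrap) particle filter.

    particles    : (n, dim) array of state particles
    propagate    : function advancing a batch of states one cycle
    obs          : observed values, shape (p,)
    obs_operator : maps states to observation space, (n, dim) -> (n, p)
    Assumes Gaussian, uncorrelated observation errors (illustrative).
    """
    forecast = propagate(particles)
    # Importance weights proportional to the likelihood p(y | x_i);
    # subtracting the minimum misfit avoids underflow in the exponent.
    misfit = np.sum((obs_operator(forecast) - obs) ** 2, axis=1)
    w = np.exp(-0.5 * (misfit - misfit.min()) / obs_err_sd ** 2)
    w /= w.sum()
    # Multinomial resampling: duplicate high-weight particles,
    # drop low-weight ones.
    idx = rng.choice(len(w), size=len(w), p=w)
    return forecast[idx]
```

The resampling step is what distinguishes SIR from plain SIS: without it, the weights degenerate onto a few particles after a handful of cycles.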

Techniques such as localization and inflation are typically applied as
modifications to make the EnKF operationally feasible. Inspired by this
practice, we introduce a local particle filter (LPF) designed for
geophysical systems that is scalable to high dimensions and has
computational cost

Localization is used in most operational NWP data assimilation systems, either through a direct scaling of the background error covariance matrix (e.g., Whitaker and Hamill, 2002) or by a scaling of the observation error covariance matrix (Hunt et al., 2007). Because the computation of a background error covariance matrix is not needed for the PF, the latter approach is applied here to develop an effective PF for high-dimensional geophysical systems. Localization reduces the dimensionality of the solution space, thus requiring fewer ensemble members to sample the phase space. Gaussian noise is applied as an additive inflation to prevent particle degeneracy.

There are many variations of the PF (Stewart and McCarty, 1992; Gordon et al.,
1993; Kitagawa, 1996; Hürzeler and Künsch, 1998; Liu and Chen, 1998). In essence, it is simply a Monte Carlo
estimation of Bayes' theorem, reformulated as a recursion (Doucet et al.,
2001):

For the experiments here, we construct two cases, each with a
different likelihood function. First, we use a Gaussian likelihood function
corresponding to that used for EnKFs:
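A minimal sketch of the corresponding weight computation, assuming a diagonal observation error covariance with a common standard deviation (names illustrative):

```python
import numpy as np

def gaussian_likelihood_weights(obs, hx, obs_err_sd):
    """Normalized particle weights under a Gaussian likelihood.

    obs        : observed values, shape (p,)
    hx         : model equivalents H(x_i), shape (n, p)
    obs_err_sd : common observation error standard deviation (diagonal R)
    """
    # Quadratic misfit of each particle in observation space.
    misfit = np.sum((hx - obs) ** 2, axis=1) / (2.0 * obs_err_sd ** 2)
    # Shift by the minimum before exponentiating for numerical stability;
    # the constant factor cancels on normalization.
    w = np.exp(-(misfit - misfit.min()))
    return w / w.sum()
```

Shifting the exponent by the minimum misfit is a common safeguard: the unshifted exponentials can underflow to zero for all particles when misfits are large.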

The PF can be interpreted similarly to the ETKF
of Bishop et al. (2001). The transform interpretation has been
explored by Reich (2013) and Metref et al. (2014). Namely, we define the PF solution as a transformation of
the background ensemble to the analysis ensemble:

Let

We further define the vector

For reference in the next section, the components of the analysis matrix

Snyder et al. (2008) note that when either the model dimension or observation count is large, the PF requires significantly more particles to give an adequate representation of the system. Localization, as introduced by Houtekamer and Mitchell (1998), reduces both the model and observation dimensions by dividing the problem into a series of subdomains, thus reducing the required number of particles for accurate filtering. Bengtsson et al. (2003) were among the first to point to spatially local updating, using a local subset of observations, as a solution to difficulties of high-dimensional non-Gaussian filtering. Lei and Bickel (2011) introduced the notion of computing local weights in a non-Gaussian filter. The LPF follows Hunt et al. (2007) to select nearby observations for independent analysis at each grid point. Nearby grid points thus assimilate nearly identical sets of observations to derive their analyses.
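As an illustration, the selection of nearby observations on a periodic one-dimensional domain such as L96 might be sketched as follows; a simple cutoff radius is assumed here, whereas the actual selection follows Hunt et al. (2007):

```python
import numpy as np

def local_obs_indices(grid_point, obs_locations, radius, domain_size):
    """Indices of observations within a cutoff radius of a grid point
    on a periodic one-dimensional domain (as in L96). Illustrative."""
    d = np.abs(obs_locations - grid_point)
    d = np.minimum(d, domain_size - d)  # wrap-around (periodic) distance
    return np.where(d <= radius)[0]
```

Because the observation subsets of adjacent grid points overlap heavily, neighboring local analyses remain similar, which is what makes the independent grid-point updates viable.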

We use the deterministic resampling of Kitagawa (1996), with complexity
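Kitagawa's deterministic resampling is closely related to systematic resampling; the following Python sketch illustrates the idea (the exact offset convention may differ from the original):

```python
import numpy as np

def deterministic_resample(weights):
    """Deterministic resampling in O(n): select particle indices by
    passing evenly spaced points through the cumulative weights."""
    n = len(weights)
    positions = (np.arange(n) + 0.5) / n  # ordered, evenly spaced points
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                  # guard against round-off
    return np.searchsorted(cumulative, positions)
```

Because the sampling points are ordered, a single pass through the cumulative weights suffices, which is the source of the linear complexity.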

For a given grid point, when the cumulative sums of the particle weights are
near one of the partition values, there may be sensitivity in neighboring
grid points that can lead to discontinuities between local analyses. The
analysis ensemble at this grid point consists of a subset of background
particle indices (1 through

Thus, the modified local analysis at a given model grid point

A new transform can then be defined for the LPF at each point in the model
domain to generate a set of

We define the concept of a “neighbor point” abstractly as a point near the
analyzed grid point based on a specified distance metric. Through
examination of Eq. (20), it is clear that the choice of neighbor points simply
informs the weighting of indices, and that the values at these neighbor
points otherwise have no impact on the analysis. If there are

A hypothetical example depicting the construction of a single analysis
member. Each level represents a different background ensemble member (particle),
with a model space composed of a 3

In summary, the smoothing in the resulting analysis is achieved
independently at each grid point through a combination of localization in
observation space and the formation of a convex combination of analysis
weights in the ensemble space. There is no explicit smoothing in the
physical model space such as the smoothing that might occur when using a Gaussian smoother applied
via a stencil of function values from neighboring grid points. Instead, the
neighbor points simply inform the choice of weights to apply to particle
indices at a single grid point. There is an implicit smoothing achieved by
applying the same procedure to many contiguous model grid points, each
generating a different set of weights that vary only slightly. This is
similar to the effect of observation-space localization. However, as with
most localization techniques, the more distant information suffers from the
poor sampling size of the ensemble. Thus, if holding the ensemble size

The particle selection process of the PF reduces the rank of the ensemble. For a linear deterministic system, this leads to a rapid collapse of the ensemble and divergence of the filter. For a sufficiently stochastic nonlinear system, the members become distinct after a single forecast step. If the nonlinear system is not sufficiently stochastic, then we must address the ensemble initialization problem at every analysis cycle. Pazó et al. (2010) discuss the desirable properties of an initial ensemble, namely that the members (1) should be well embedded in the attractor, (2) should be statistically equivalent but have enough diversity to represent a significant portion of the phase space, (3) should adequately represent the error between the analysis and true state, and (4) should sample the fastest-growing directions in phase space. We wish to avoid particle degeneracy while also engendering some of these qualities. Applying noise to the PF at the sampling step is a standard empirical technique; we employ a simple version of it: at each cycle, we add Gaussian noise with variance scaled locally to match the analysis error variance, applying it to each analysis member prior to the subsequent ensemble forecast. The amplitude of the additive noise thus conforms to the dynamics of the growing error subspace, as estimated by the analysis ensemble spread, and varies spatially and temporally. The results degraded when departing from this approach.
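A minimal sketch of this additive-noise step, with the noise standard deviation at each grid point set equal to the local analysis spread (names illustrative):

```python
import numpy as np

def additive_inflation(analysis, rng):
    """Perturb each analysis member with Gaussian noise whose standard
    deviation at each grid point equals the local analysis spread.

    analysis : (n, dim) analysis ensemble
    """
    spread = analysis.std(axis=0, ddof=1)  # per-grid-point ensemble spread
    noise = rng.standard_normal(analysis.shape) * spread
    return analysis + noise
```

Scaling by the local spread means the perturbations are largest where the analysis is most uncertain, which keeps the noise aligned with the growing error directions rather than imposing a fixed global amplitude.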

We caution that EnKFs have fundamentally different behavior from general PFs, in that the former maintain a forcing term that drives the DA system toward the observations even when the forecast starts far from the true state. The general PF with resampling essentially requires random chance to generate a state with sufficient probability to survive the resampling step. As the system size increases while the number of particles is held fixed, the probability of such an event declines. This is connected with the divergence of the PF when there are insufficient particles, which has been explored in detail by Snyder et al. (2008).

Analysis error using an analysis cycle window length d

A data assimilation system comprises many components. We simplify the
cost analysis in order to obtain an approximate relative measure of the
algorithms presented here. Let

We enumerate the benefits of the LPF vs. the benchmark LETKF, an ensemble square root filter that performs its analysis in the ensemble space at each grid point using a geospatially local selection of observations. The LETKF approach is very efficient as long as the ensemble size is small relative to the number of observations and the model dimension.

Analysis error for

We use LETKF as a proxy for a general EnKF. Nerger (2015) gives a comparison between LETKF and the ensemble square root filter (ESRF) of Whitaker and Hamill (2002), while Tippett et al. (2003) indicate that the ESRF is identical to the ensemble adjustment Kalman filter (EAKF) of Anderson (2001) when using serial single-observation processing.

We demonstrate the algorithms on the L96 system (Lorenz, 1996), composed of
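For reference, the L96 tendency and a classical fourth-order Runge-Kutta step may be sketched as follows; the forcing value F = 8 and the step size are conventional choices and may differ from the exact experimental settings:

```python
import numpy as np

def l96_tendency(x, forcing=8.0):
    """L96 tendency: dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F,
    with periodic indexing of the grid."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def l96_step(x, dt=0.05, forcing=8.0):
    """Advance the state one step with classical fourth-order Runge-Kutta."""
    k1 = l96_tendency(x, forcing)
    k2 = l96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = l96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = l96_tendency(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Note that the constant state x_j = F is an (unstable) fixed point of the tendency, which provides a quick sanity check on any implementation.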

The standard SIR PF performs poorly with any ensemble size

Lorenz (1996) introduced the d

To increase the degree of nonlinearity in a data assimilation system using
L96, it is typical to increase the analysis cycle length (e.g., Lei and Bickel, 2011). The LPF
has superior performance for more nonlinear regimes of the L96 system
(e.g., d

Forecast error for

Forecast error for

Elapsed time in seconds for

Exploring a more complete parameter space, we examine the forecast error for
LETKF over a range of observation coverage (

When using fewer than 20 observations per cycle in this case, both LETKF and LPF experience filter divergence. Due to the unconstrained nature of the LPF, as described in Sect. 2.4, large errors occur more frequently than for LETKF in this parameter regime. Concepts of observability from control theory indicate that with too few observations any filter will diverge. In the linear theory for autonomous systems, the conditions are straightforward; for chaotic systems, the observability of a system is difficult to identify analytically. For the particular experiment parameters given here, these results indicate that the minimum number of required observations is approximately 20 per analysis cycle, at update intervals equivalent to 60 h of model time. Abarbanel et al. (2009) reached analogous conclusions regarding observability using synchronization methods for L96, as did Whartenby et al. (2013) for shallow water flows.

When examining the computational cost of the LPF vs. LETKF, the relative costs reflect the analytical assessment given above in Sect. 2.5. Namely, the elapsed time of the LETKF experiments grows with the cube of the ensemble size, while the elapsed time of the LPF is significantly lower at large ensemble sizes (Fig. 6).

Analysis error for

The previous section examined the impacts of nonlinearity and
non-Gaussianity on the forecast. We now examine the impacts of non-Gaussian
observation error. Using a multivariate Gaussian mixture model (GM
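As an illustrative sketch of a bimodal (two-component) Gaussian mixture likelihood of this kind, with placeholder parameters that do not reproduce the exact mixture used in the experiments:

```python
import numpy as np

def gm2_likelihood(y, hx, sd, offset):
    """Likelihood for a symmetric two-component Gaussian mixture
    observation error, with modes at +/- offset about the model
    equivalent hx. Parameters are illustrative placeholders."""
    def gauss(z):
        return np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return 0.5 * gauss(y - hx - offset) + 0.5 * gauss(y - hx + offset)
```

A PF can use such a likelihood directly in its weight computation, whereas an EnKF implicitly replaces it with a single Gaussian, which is the source of the divergence discussed below.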

We consider the example of LPF applied to L96 with the analysis cycle
d

To evaluate the impact on ensemble quality, we consider the mean effective
ensemble size
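A standard diagnostic defines the effective ensemble size from the normalized weights as N_eff = 1 / Σ_i w_i²; assuming this definition, a minimal sketch:

```python
import numpy as np

def effective_ensemble_size(weights):
    """Degeneracy diagnostic N_eff = 1 / sum(w_i^2) for normalized
    weights: equals n for uniform weights and approaches 1 as the
    weight collects on a single particle."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)
```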

The 40-cycle moving average of the mean absolute forecast error, comparing the non-smoothed (dashed red) and smoothed (solid blue) LPF analyses. The ensemble space smoothing improves the forecast accuracy.

Illustrating the impact of ensemble space smoothing on the effective
ensemble size

The LPF has been shown to outperform a state-of-the-art ensemble Kalman filter (i.e., the LETKF) in scenarios that violate the Gaussian/linear assumptions of the Kalman filter. We showed the advantage of the LPF when the forecast is more nonlinear (via longer analysis cycles or less frequent observations) and when the observation error is non-Gaussian (using a bimodal error distribution). Further, upon transitioning to large ensembles, the LPF has a significant cost advantage relative to the LETKF.

The LPF maintains many of the attractive qualities that give PFs advantages over standard EnKFs. While the PF provides a means of assimilating observations with non-Gaussian errors (e.g., precipitation, sea-ice concentration), we caution that the covariances utilized by the EnKF play a critical role in constraining the unobserved variables. Thus, while the LPF is not optimal for all possible data assimilation scenarios, there is great potential for the LPF to be combined with more traditional approaches to create adaptive hybrid systems that can avoid catastrophic filter divergence and manage multimodal forecast distributions, nonlinear observation operators, and non-Gaussian observations.

We found that, given a large number of ensemble members (or particles) and
observations, the LPF matches or surpasses the accuracy of LETKF. The use of
large ensemble sizes is a relevant scenario for realistic systems running on
large supercomputers such as the

In a realistic system, some mechanism is needed to drive the ensemble toward the observations in the event of the ensemble drifting away from the true state. The PF itself has no inherent mechanism to do this other than the brute force generation of more particles. There are many techniques in the PF literature for managing filter divergence, but none of them are foolproof. Atkins et al. (2013) presented a promising extension of the use of an importance density that may connect effectively with the existing infrastructure of variational solvers used by most operational centers. Another popular mechanism to achieve this is regularization, which uses a kernel to sample from a continuous distribution at the resampling stage.
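The regularization idea can be sketched as follows: draw particle indices as in ordinary resampling, then jitter each selected copy with a Gaussian kernel so the new ensemble samples a continuous distribution (the bandwidth choice is application dependent; names illustrative):

```python
import numpy as np

def regularized_resample(particles, weights, bandwidth, rng):
    """Regularized resampling: draw particle indices as usual, then
    jitter each copy with a Gaussian kernel of the given bandwidth so
    the analysis ensemble samples a continuous distribution."""
    idx = rng.choice(len(weights), size=len(weights), p=weights)
    jitter = bandwidth * rng.standard_normal(particles[idx].shape)
    return particles[idx] + jitter
```

In contrast to the spread-scaled additive noise used here, the kernel bandwidth in regularization is typically tied to the weighted sample covariance of the particles.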

Finally, while the inflation mechanism used here was effective for the L96 system, it is not adequate for more realistic atmospheric or oceanic models. For such systems, either geospatially correlated noise or stochastic physics parameterizations may be capable of performing the same function. Stochastic physics parameterizations are an active area of research, and are under development for a number of operational center models, including NCEP (Hou et al., 2006, 2010; Kolczynski et al., 2015), ECMWF (Berner et al., 2009; Weisheimer et al., 2014; Watson et al., 2015), and the Met Office (Tennant et al., 2011; Sanchez et al., 2014; Shutts and Pallarès, 2014; Shutts, 2015).

We gratefully acknowledge the Japan Society for the Promotion of Science (JSPS), whose FY2013 fellowship supported this work. We would also like to thank the RIKEN Advanced Institute for Computational Science (AICS) for hosting Stephen G. Penny. Stephen G. Penny acknowledges additional support from NOAA award NA15NWS4680016 in support of NOAA's Next Generation Global Prediction System (NGGPS). Edited by: Z. Toth. Reviewed by: C. Snyder and one anonymous referee.