Over the years, data assimilation methods have been developed to estimate uncertain model parameters from a few observations of a model state. Markov chain Monte Carlo (MCMC) methods, the most reliable of these, are computationally expensive. Sequential ensemble methods such as ensemble Kalman filters and particle filters provide a favorable alternative. However, the ensemble Kalman filter assumes Gaussianity. The ensemble transform particle filter does not make this assumption and has proven highly beneficial for initial condition estimation and for estimating a small number of parameters in chaotic dynamical systems with non-Gaussian distributions. In this paper we employ the ensemble transform particle filter (ETPF) and the ensemble transform Kalman filter (ETKF) for parameter estimation in nonlinear problems with 1, 5, and 2500 uncertain parameters and compare them to importance sampling (IS). The large number of uncertain parameters is of particular interest for subsurface reservoir modeling, as it allows us to parameterize permeability on the grid. We prove that the updated parameters obtained by ETPF lie within the range of the initial ensemble, which is not the case for ETKF. We examine the performance of ETPF and ETKF in a twin experiment setup, where observations of pressure are created synthetically from known parameter values. For a small number of uncertain parameters (one and five), ETPF performs comparably to ETKF in terms of the mean estimation. For a large number of uncertain parameters (2500), ETKF is robust with respect to the initial ensemble, while ETPF is sensitive to it due to sampling error. Moreover, for the high-dimensional test problem, ETPF gives an increase in the root mean square error after data assimilation is performed. This is resolved by applying distance-based localization, which, however, deteriorates the posterior estimation of the leading mode by largely increasing the variance. This increase is due to a combination of three factors: the localized weights vary less, the bounds imposed on the modes via the Karhunen–Loeve expansion are no longer kept, and the main variability is explained by the leading mode. A possible remedy is, instead of applying localization, to use only the leading modes that are well estimated by ETPF, which demands knowledge of the mode at which to truncate.

An accurate estimation of subsurface geological properties such as permeability and porosity is essential in many fields, especially where such predictions can have a large economic or environmental impact, for instance the prediction of oil or gas reservoir locations. When the geological parameters are known, a so-called forward model is solved for the model state and a prediction can be made. Subsurface reservoirs, however, are buried thousands of feet below the Earth's surface and exhibit a highly heterogeneous structure, which makes it difficult to obtain their geological parameters. Usually prior information about the parameters is available, but it still needs to be corrected by observations of pressure and production rates. These observations are, however, corrupted by errors and known only at well locations, which are often hundreds of meters apart. Instead of a well-posed forward problem, this gives an ill-posed inverse problem of estimating uncertain parameters, since many possible combinations of parameters can match the observations equally well.

Different inverse problem approaches for groundwater and petroleum reservoir
modeling, generally termed history matching, have been developed over the
past years; e.g.,

For reservoir models the terms “data assimilation” and “history matching”
are used interchangeably, as the goal of data assimilation is the same as
that of history matching, where observations are used to improve a solution
of a model. Ensemble data assimilation methods such as ensemble Kalman
filters

Importance sampling (IS) is quite promising for such models as it does not
have any assumptions of Gaussianity. It is also an ensemble-based method in
which the probability density function is represented by a number of samples.
One sample corresponds to one configuration of uncertain model parameters.
The forward model is solved for each sample and predicted data are computed.
A weight is assigned to each sample based on the observations of the true physical system and the predicted data.
The drawback of IS is that it does not update the uncertain parameters, but
only their weights; thus, a computationally unaffordable ensemble is required.
In order to decrease this cost,
a family of particle filters
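The weighting step of IS described above can be sketched as follows. This is a minimal illustration, not the code used in the paper: it assumes independent Gaussian observation errors with a known standard deviation, and the names `importance_weights`, `weighted_mean`, and `obs_std` are hypothetical.

```python
import numpy as np


def importance_weights(predicted, observed, obs_std):
    """Normalized importance weights for an ensemble of samples.

    predicted : (n_samples, n_obs) predicted data, one row per sample
    observed  : (n_obs,) observations of the true system
    obs_std   : observation error standard deviation (assumed Gaussian)
    """
    residual = (np.asarray(predicted) - np.asarray(observed)) / obs_std
    log_w = -0.5 * np.sum(residual**2, axis=1)
    log_w -= log_w.max()  # shift before exponentiating to avoid underflow
    w = np.exp(log_w)
    return w / w.sum()


def weighted_mean(samples, weights):
    """Posterior mean estimate: the samples stay fixed, only weights change.

    samples : (n_samples, n_par) parameter samples drawn from the prior
    """
    return weights @ np.asarray(samples)
```

Because the samples themselves are never moved, the quality of the estimate depends entirely on how many prior samples happen to fall near the region supported by the data, which is why a very large ensemble is needed.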

The resampling in particle filtering is, however, stochastic. Ensemble
transform particle filter (ETPF) developed by

In this paper we investigate the performance of ETPF and ETKF for parameter
estimation in nonlinear problems and compare them to IS with a large
ensemble. This paper is organized as follows: in Sect.

We implement an ensemble transform Kalman filter and an ensemble transform
particle filter for estimating parameters of subsurface flow. Both of these
methods are based on a Bayesian framework. Assume we have an ensemble of

Assume we have initially an ensemble of
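For orientation, one ETKF analysis step for parameters can be sketched as below. The sketch uses a standard ensemble-space formulation with a symmetric square root and a diagonal observation error covariance; the exact variant used in the paper is not reproduced here, and the names `etkf_update` and `obs_var` are illustrative assumptions.

```python
import numpy as np


def etkf_update(params, predicted, observed, obs_var):
    """One ETKF analysis step in ensemble space.

    params    : (n_par, n_ens) prior parameter ensemble
    predicted : (n_obs, n_ens) forward-model output for each member
    observed  : (n_obs,) observations
    obs_var   : scalar or (n_obs,) observation error variance (diagonal R)
    """
    n_obs, n_ens = predicted.shape
    par_mean = params.mean(axis=1, keepdims=True)
    par_pert = params - par_mean
    pred_mean = predicted.mean(axis=1, keepdims=True)
    pred_pert = predicted - pred_mean

    r_inv = 1.0 / np.broadcast_to(np.asarray(obs_var, dtype=float), (n_obs,))
    c = pred_pert.T * r_inv  # Y^T R^{-1}, shape (n_ens, n_ens -> n_obs)
    a = (n_ens - 1) * np.eye(n_ens) + c @ pred_pert

    # Symmetric eigendecomposition gives both the inverse and its square root.
    evals, evecs = np.linalg.eigh(a)
    p_tilde = evecs @ np.diag(1.0 / evals) @ evecs.T
    w_mean = p_tilde @ c @ (observed - pred_mean[:, 0])
    w_pert = evecs @ np.diag(np.sqrt((n_ens - 1) / evals)) @ evecs.T

    # Each column holds the weights that build one posterior member.
    weights = w_pert + w_mean[:, None]
    return par_mean + par_pert @ weights
```

The weights in this update are not constrained to be non-negative, so posterior members can leave the range spanned by the prior ensemble.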

In particle filtering we represent the probability distribution function
using ensemble members (also called particles) as in ensemble Kalman filter.
We start by assigning prior (background) weights
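A minimal sketch of the deterministic ETPF transform is given below, assuming the importance weights have already been computed and normalized. It solves the usual linear transport problem with `scipy.optimize.linprog`; the function name `etpf_update` and the dense constraint matrices are illustrative choices, suitable only for small ensembles.

```python
import numpy as np
from scipy.optimize import linprog


def etpf_update(params, weights):
    """Deterministic ETPF resampling via a linear transport problem.

    params  : (n_par, n_ens) prior parameter ensemble
    weights : (n_ens,) normalized importance weights
    """
    n_ens = params.shape[1]
    # Squared Euclidean distance between every pair of particles.
    diff = params[:, :, None] - params[:, None, :]
    cost = np.sum(diff**2, axis=0)

    # Coupling T is flattened row-major: rows sum to w_i, columns to 1/n_ens.
    row_sum = np.kron(np.eye(n_ens), np.ones((1, n_ens)))
    col_sum = np.kron(np.ones((1, n_ens)), np.eye(n_ens))
    # The last column constraint is implied by the others, so it is dropped.
    a_eq = np.vstack([row_sum, col_sum[:-1]])
    b_eq = np.concatenate([weights, np.full(n_ens - 1, 1.0 / n_ens)])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)

    # Columns of the transform are non-negative and sum to one.
    transform = n_ens * result.x.reshape(n_ens, n_ens)
    return params @ transform
```

Because each column of the transform is non-negative and sums to one, every updated particle is a convex combination of prior particles, which is consistent with the range property stated in the abstract. The mean of the updated ensemble equals the importance-weighted mean of the prior ensemble.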

All variations of the ensemble Kalman filter and the particle filter are limited by
the ensemble size: even if the dimension of the problem is only a few thousand,
each ensemble member requires a model run, so a large ensemble is
computationally very expensive. This limit of a small ensemble size
introduces sampling errors. To deal with this issue, localized ETKF (LETKF)
was introduced by

For the local update of a model parameter
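As an illustration of distance-based localization, the sketch below implements the Gaspari–Cohn fifth-order taper, a common compactly supported choice. Whether the paper uses this particular taper is not visible in this section, so it is shown here only as an assumption; `gaspari_cohn` and `radius` are illustrative names.

```python
import numpy as np


def gaspari_cohn(distance, radius):
    """Gaspari-Cohn taper: 1 at zero distance, 0 beyond 2 * radius."""
    r = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / radius
    taper = np.zeros_like(r)

    inner = r <= 1.0
    ri = r[inner]
    taper[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                    - (5.0 / 3.0) * ri**2 + 1.0)

    outer = (r > 1.0) & (r < 2.0)
    ro = r[outer]
    taper[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                    + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return taper
```

In localized Kalman filters the observation error variance is commonly divided by such a taper, so that distant observations have little influence on the local update. A larger `radius` lets more observations contribute, at the price of more sampling error.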

Probability density functions for the one-parameter nonlinear
problem. Top: ETPF; bottom: ETKF.

First we consider a one-parameter nonlinear problem from

We consider a steady-state single-phase Darcy flow model defined over an
aquifer of a 2-D physical domain

We perform numerical experiments with synthetic observations, where instead
of a measuring device a model is used to obtain observations. We implement a
cell-centered finite difference method to discretize the domain

For our first numerical experiment with Darcy flow, we consider a
low-dimensional problem where the permeability field is defined by a mere
five parameters similarly to

Further, for this test case we assume piecewise constant permeability within
each of the subdomains; hence,

As was pointed out in Sect.

True permeability of the five-parameter nonlinear problem with dots representing the observation locations.

In Fig.

Probability density functions for the parameters

In order to check the sensitivity of the results to the initial parameter
ensemble, we perform

We compute an average of the relative error over all parameters

Next, we consider a high-dimensional problem where the dimension of the
uncertain parameter is

Mean, minimum, and maximum over 10 simulations after data
assimilation for the data misfit

We perform 10 different simulations based on a random draw of an initial
ensemble from the prior distribution. We conduct the numerical experiments
for ensemble sizes varying from

Log permeability field with dots representing the observation
locations. Truth is shown in

In Fig.

In Fig.

Variance of log permeability fields: obtained with ensemble size

Squared error between the true and mean estimated modes for

The posterior probability density function of parameters

In Fig.

Since the first modes are well estimated by ETPF and the last modes are not (not
shown), we use only the three leading modes in the Karhunen–Loeve expansion
given by Eq. (
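Truncating a Karhunen–Loeve expansion to its leading modes can be illustrated with the sketch below. It assumes the covariance matrix of log permeability is available as a dense symmetric matrix; the helper names `kl_modes` and `log_permeability` are hypothetical, and the covariance model is left to the caller because the one used in the paper is not shown here.

```python
import numpy as np


def kl_modes(cov, n_modes):
    """Leading eigenpairs of a covariance matrix, largest eigenvalue first."""
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_modes]
    # Clip tiny negative eigenvalues caused by round-off.
    return np.clip(evals[order], 0.0, None), evecs[:, order]


def log_permeability(mean, evals, evecs, coeffs):
    """Truncated KL expansion: mean + sum_i sqrt(lambda_i) * u_i * v_i.

    coeffs holds the uncertain mode coefficients u_i that are estimated.
    """
    return mean + evecs @ (np.sqrt(evals) * coeffs)
```

With only three modes, the estimation problem has three unknown coefficients instead of one per grid cell, but the field can then represent only the variability carried by those modes.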

Using only three leading modes in the KL expansion.

Same as Fig.

Next we apply LETPF and LETKF. The optimal localization radius, searched between 0.2
and 1.2, was chosen as the one giving the smallest RMSE and is shown in
Table

Optimal localization radius for LETPF and LETKF at different ensemble sizes

At small ensemble sizes both LETKF and LETPF give a smaller misfit and a smaller
RMSE but a larger variance than ETKF and ETPF. For large ensembles LETKF
performs worse than ETKF, which is due to the imposed range on the localization
radius, meaning that 1.2 is not optimal. Comparing the performance of LETPF
to (L)ETKF, we observe that at small ensemble sizes LETKF still outperforms
LETPF, but at large ensemble sizes LETPF now performs comparably to ETKF.
Moreover, LETPF overfits the data less often than ETPF:

In Figs.

The posterior estimation of the leading mode

Mean over 10 simulations after data assimilation for the data misfit

Same as Fig.

Same as Fig.

MCMC methods remain the most reliable methods for estimating the posterior distributions of uncertain model parameters and states. They, however, also remain computationally expensive. Ensemble Kalman filters such as ETKF provide computationally affordable approximations but rely on the assumption of Gaussian probabilities. For nonlinear models, even if the prior is Gaussian, the posterior is no longer Gaussian. Particle filtering, on the other hand, does not make such an assumption but requires a resampling step, which is usually stochastic. ETPF is a particle filtering method that deterministically resamples the particles based on their importance weights and covariance maximization among the particles.

ETPF clearly outperforms ETKF for the one-parameter nonlinear test case by giving a better posterior estimation. For the five-parameter test case, the mean estimations obtained by ETPF are not consistently better than the ones obtained by ETKF, and the spread is smaller. The Kullback–Leibler divergence from ETKF is smaller than from ETPF for all the parameters. When the number of uncertain parameters is large (2500), a decrease in degrees of freedom is essential. This is achieved by localization. At large ensemble sizes LETPF performs as well as LETKF, while at small ensemble sizes LETKF still outperforms LETPF. Even though LETPF overfits the data less often than ETPF, localization destroys the property of ETPF to retain the imposed bounds. This deteriorates the posterior estimation of the leading mode. Another plausible drawback of localization is the assumption that observations are local, which might not be the case for inverse modeling. An alternative approach to improving ETPF performance is, instead of applying localization, to use only the leading modes in the approximation of log permeability, as they are better estimated by the method. However, one needs to know at which mode to truncate, and this is highly dependent on the covariance matrix of log permeability.

To conclude, we believe that ETPF is promising for inverse modeling. However, more theoretical studies have to be performed for ETPF before it is considered for realistic applications. Plausible issues in realistic applications include, for example, numerous accurate observations, time dependency of the underlying model, and multiphase flow.

Data and MATLAB codes for generating the plots are available in Dubinkina and Ruchi (2018).

All authors contributed equally to this work.

The authors declare that they have no conflict of interest.

This work is part of the research programme Shell-NWO/FOM Computational Sciences for Energy Research (CSER) with project number 14CSER007, which is partly financed by the Netherlands Organization for Scientific Research (NWO).

Edited by: Takemasa Miyoshi
Reviewed by: two anonymous referees