Application of ensemble transform data assimilation methods for parameter estimation in reservoir modelling

Over the years, data assimilation methods have been developed to estimate uncertain model parameters from a few observations of a model state. The most reliable methods, based on Markov chain Monte Carlo (MCMC), are computationally expensive. Sequential ensemble methods such as ensemble Kalman filters and particle filters provide a favourable alternative. However, the Ensemble Kalman Filter assumes Gaussianity. The Ensemble Transform Particle Filter does not make this assumption and has proven highly beneficial for estimating initial conditions and a small number of parameters in chaotic dynamical systems with non-Gaussian distributions. In this paper we employ the Ensemble Transform Particle Filter (ETPF) and the Ensemble Transform Kalman Filter (ETKF) for parameter estimation in nonlinear problems with 1, 5, and 2500 uncertain parameters and compare them to importance sampling (IS). We prove that the updated parameters obtained by ETPF lie within the range of the initial ensemble, which is not the case for ETKF. We examine the performance of ETPF and ETKF in a twin experiment setup and observe that for a small number of uncertain parameters (1 and 5) ETPF performs comparably to ETKF in terms of the mean estimation. For a large number of uncertain parameters (2500) ETKF is robust with respect to the initial ensemble, while ETPF is sensitive due to sampling error. Moreover, for the high-dimensional test problem ETPF gives an increase in the root mean square error after data assimilation is performed. This is resolved by applying distance-based localization, which however deteriorates the posterior estimation of the leading mode by largely increasing the variance. A possible remedy is, instead of applying localization, to use only the leading modes that are well estimated by ETPF, which requires knowing at which mode to truncate.


Introduction
An accurate estimation of subsurface geological properties such as permeability and porosity is essential in many fields, especially where such predictions can have a large economic or environmental impact, for instance the prediction of oil or gas reservoir locations. Given the geological parameters, a so-called forward model is solved for the model state and a prediction can be made. Subsurface reservoirs, however, are buried thousands of feet below the earth's surface and exhibit a highly heterogeneous structure, which makes it difficult to obtain their geological parameters. Usually prior information about the parameters is available, which still needs to be corrected by observations of pressure and production rates. These observations are, however, known only at well locations that are often hundreds of meters apart, and they are corrupted by errors. Instead of a well-posed forward problem, this yields an ill-posed inverse problem of estimating uncertain parameters, since many possible combinations of parameters can result in equally good matches to the observations. Different inverse problem approaches for groundwater and petroleum reservoir modelling, generally termed history matching, have been developed over the past years. For example, in [13] the authors implemented Markov chain Monte Carlo methods with different perturbations and tested them on a 2-D reservoir model; [19] obtained reservoir parameter estimations using the Gauss-Newton method; [20] used the Levenberg-Marquardt method to characterize reservoir pore pressure and permeability. A review of history matching developments can be found in [12].
For reservoir models the terms data assimilation and history matching are used interchangeably, as the goal of both is to use observations to improve the solution of a model. Ensemble data assimilation methods such as ensemble Kalman filters [6] were originally developed in meteorology and oceanography for state estimation. The ensemble Kalman filter is now also one of the frequently employed approaches for parameter estimation in subsurface flow models, e.g. [14]. A detailed review of ensemble Kalman filter developments in reservoir engineering can be found in [1]. An ensemble Kalman filter efficiently approximates the true posterior distribution if that distribution is not far from Gaussian, as it corrects only the mean and the variance. For nonlinear models with multimodal distributions, however, an ensemble Kalman filter fails to correctly estimate the posterior, as shown in [5].
Importance Sampling (IS) is quite promising for such models, as it makes no assumption of Gaussianity. It is also an ensemble-based method, in which the probability density function is represented by a number of samples. One sample corresponds to one configuration of the uncertain model parameters. The forward model is solved for each sample and the predicted data is computed. A weight is then assigned to each sample based on the observations of the true physical system and the predicted data. The drawback of IS is that it does not update the uncertain parameters but only their weights, so that a computationally unaffordable ensemble is required. In order to decrease this cost, a family of particle filters [4] has been developed in which IS is supplemented with resampling and a sample is called a particle. Significant work on parameter estimation using particle filtering has been done in hydrology. In [11] the authors used it to estimate model parameters and state posterior distributions for a rainfall-runoff model. [21] compared an ensemble Kalman filter and a particle filter with different resampling strategies for a rainfall-runoff forecast and found that the particle filter outperforms the ensemble Kalman filter as the number of particles increases. [8] employed particle filtering to correct the soil moisture and to estimate hydraulic parameters.
The resampling in particle filtering is, however, stochastic. The Ensemble Transform Particle Filter (ETPF) [18] is a particle filtering method that deterministically resamples the particles based on their weights and on covariance maximization among the particles. ETPF has been used for initial condition estimation and for parameter estimation in chaotic dynamical systems with a small number of uncertain parameters (the Lorenz 63 model). It has not been applied, however, in subsurface reservoir modelling for estimating a large number of uncertain parameters. In this paper we employ it for estimating uncertain parameters in subsurface reservoir modelling. The equations of ETPF are solved in the space defined by the ensemble members. Therefore, for comparison we employ the Ensemble Transform Kalman Filter (ETKF) [2], which also transforms the state from the model space to the ensemble space, minimises the uncertainty in the ensemble space, and transforms the estimation back to the model space.
In this paper we investigate the performance of ETPF and ETKF for parameter estimation in nonlinear problems and compare them to IS with a large ensemble. The paper is organized as follows. In Sect. 2 we describe IS, ETPF, and ETKF for parameter estimation. We apply these methods in Sect. 3 to a one-parameter nonlinear test case, where the posterior can be computed analytically, and in Sect. 4 to a single-phase Darcy flow, where the number of parameters is 5 and 2500. In Sect. 5 we draw the conclusions.

Data assimilation methods
We implement an ensemble transform Kalman filter and an ensemble transform particle filter for estimating parameters of subsurface flow. Both methods are based on the Bayesian framework. Assume we have an ensemble of $M$ model parameters $\{u_m\}_{m=1}^{M}$. According to this framework, the posterior distribution $\pi(u_m \mid y_{\mathrm{obs}})$ of the model parameters $u_m$ given a set of observations $y_{\mathrm{obs}}$ can be estimated by the pointwise multiplication of the prior probability distribution $\pi(u_m)$ of the model parameters and the conditional probability distribution $\pi(y_{\mathrm{obs}} \mid u_m)$ of the observations given the model parameters, which is also referred to as the likelihood function:
\[
\pi(u_m \mid y_{\mathrm{obs}}) = \frac{\pi(y_{\mathrm{obs}} \mid u_m)\,\pi(u_m)}{\pi(y_{\mathrm{obs}})}.
\]
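The denominator $\pi(y_{\mathrm{obs}})$ represents the marginal of the observations and can be expressed as
\[
\pi(y_{\mathrm{obs}}) = \int \pi(y_{\mathrm{obs}} \mid u)\,\pi(u)\,\mathrm{d}u,
\]
which shows that $\pi(y_{\mathrm{obs}})$ is just a normalisation factor.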

Ensemble Transform Kalman Filter
Assume we initially have an ensemble of $M$ model parameters $\{u^b_m\}_{m=1}^{M}$, sampled from a chosen prior probability density function, where $b$ refers to the background (prior) ensemble. The ensemble Kalman estimate (or analysis) $\{u^a_m\}_{m=1}^{M}$ is then given by Eq. (2.1), in which diag denotes a diagonal matrix, $s_{lm}$ is the $(l, m)$ entry of a matrix $S$ and $q_l$ is the $l$-th entry of a column vector $q$. Here $I$ is the identity matrix of size $M \times M$, $1_M$ is a vector of size $M$ with all ones, $\bar{y}^b$ is the mean of the predicted data defined by
\[
\bar{y}^b = \frac{1}{M}\sum_{m=1}^{M} y^b_m,
\]
$A^b$ is the matrix of background ensemble anomalies of the predicted data defined as
\[
A^b = \left[\, y^b_1 - \bar{y}^b,\ \dots,\ y^b_M - \bar{y}^b \,\right],
\]
and $R$ is the measurement error covariance. The anomalies of the analysis remain zero centred, since $A^a 1_M = A^b S 1_M = 0$ given $S 1_M = 1_M$ and $A^b 1_M = 0$. The model parameters $u^b_m$ and the predicted data $y^b_m$ are related by $y^b_m = h(u^b_m)$, where $h$ is a nonlinear function that we assume to be known.
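The update described above can be illustrated with a minimal NumPy sketch. It uses the ETKF in its common linear-ensemble-transform form; the explicit expressions for $S$ and $q$ in the code are that standard formulation and are an assumption here, not a transcription of Eq. (2.1). The function name `etkf_update` and the array layout (one ensemble member per column) are hypothetical choices made for this illustration.

```python
import numpy as np

def etkf_update(U_b, Y_b, y_obs, R):
    """ETKF analysis ensemble in linear-ensemble-transform form (sketch).

    U_b   : (n, M) background parameters, one column per member
    Y_b   : (Ny, M) predicted data h(u_m^b), one column per member
    y_obs : (Ny,) observations
    R     : (Ny, Ny) symmetric measurement error covariance
    """
    M = U_b.shape[1]
    y_mean = Y_b.mean(axis=1)
    A_b = Y_b - y_mean[:, None]               # anomalies of the predicted data
    Rinv_A = np.linalg.solve(R, A_b)          # R^{-1} A^b

    # ASSUMED standard form: S = [I + (A^b)^T R^{-1} A^b / (M-1)]^{-1/2}
    G = np.eye(M) + A_b.T @ Rinv_A / (M - 1)
    evals, evecs = np.linalg.eigh(0.5 * (G + G.T))
    S = (evecs / np.sqrt(evals)) @ evecs.T    # symmetric inverse square root

    # ASSUMED standard form: q = 1/M - S^2 (A^b)^T R^{-1} (ybar - y_obs) / (M-1)
    q = 1.0 / M - S @ (S @ (Rinv_A.T @ (y_mean - y_obs))) / (M - 1)

    # u_j^a = sum_m u_m^b (s_mj + q_m - 1/M)
    D = S + (q - 1.0 / M)[:, None]
    return U_b @ D
```

For a linear scalar map $h(u) = u$ this transform reproduces the Kalman mean and variance computed from the sample statistics, which is a convenient sanity check.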

Ensemble Transform Particle Filter
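In particle filtering we represent the probability distribution function using ensemble members (also called particles), as in the ensemble Kalman filter. We start by assigning prior (background) weights $\{w^b_m\}_{m=1}^{M}$ to the $M$ particles and then compute new (analysis) weights $\{w^a_m\}_{m=1}^{M}$ using Bayes' formula and the observations $y_{\mathrm{obs}}$:
\[
w^a_m = \frac{w^b_m\,\pi(y_{\mathrm{obs}} \mid u^b_m)}{\sum_{j=1}^{M} w^b_j\,\pi(y_{\mathrm{obs}} \mid u^b_j)}. \tag{2.2}
\]
We assume that initially all particles have equal weight, thus $w^b_m = 1/M$ for $m = 1, \dots, M$, and that the likelihood is Gaussian with error covariance matrix $R$. Then from Eq. (2.2) $w^a_m$ is given by
\[
w^a_m = \frac{\exp\!\left[-\tfrac{1}{2}\,(h(u^b_m) - y_{\mathrm{obs}})^T R^{-1} (h(u^b_m) - y_{\mathrm{obs}})\right]}{\sum_{j=1}^{M} \exp\!\left[-\tfrac{1}{2}\,(h(u^b_j) - y_{\mathrm{obs}})^T R^{-1} (h(u^b_j) - y_{\mathrm{obs}})\right]}. \tag{2.3}
\]
In Importance Sampling (IS), which is used in this paper as the "ground truth", these weights define the posterior pdf. The mean parameter for IS is then
\[
\bar{u}^a = \sum_{m=1}^{M} w^a_m\, u^b_m.
\]
It is important to note that IS does not change the parameters $u$; it only modifies the weights of the particles (samples). Therefore a resampling step, which is usually stochastic, needs to be implemented for parameter estimation. Instead, particle filtering has been modified using a deterministic coupling methodology, which resulted in the ensemble transform particle filter of [18]. ETPF looks for a coupling between two discrete random variables $B_1$ and $B_2$ so as to convert the ensemble members belonging to the random variable $B_2$ with probability distribution $\pi(B_2 = u^b_m) = w^a_m$ to the random variable $B_1$ with uniform probability distribution $\pi(B_1 = u^b_m) = 1/M$. The coupling between these two random variables is an $M \times M$ matrix $T$ whose entries should satisfy
\[
t_{mj} \ge 0, \tag{2.4}
\]
\[
\sum_{m=1}^{M} t_{mj} = \frac{1}{M}, \tag{2.5}
\]
\[
\sum_{j=1}^{M} t_{mj} = w^a_m. \tag{2.6}
\]
An optimal coupling matrix $T^*$ with elements $t^*_{mj}$ minimizes the squared Euclidean distance
\[
J(T) = \sum_{m,j=1}^{M} t_{mj}\,\| u^b_m - u^b_j \|^2, \tag{2.7}
\]
and the analysis model parameters are obtained by the linear transformation
\[
u^a_j = M \sum_{m=1}^{M} t^*_{mj}\, u^b_m, \qquad j = 1, \dots, M. \tag{2.8}
\]
Then the mean parameter for ETPF is
\[
\bar{u}^a = \frac{1}{M}\sum_{m=1}^{M} u^a_m.
\]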
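As an illustration of the importance weights for a Gaussian likelihood with equal prior weights, and of the resulting IS mean, a minimal NumPy sketch could look as follows. The function names and the column-per-member array layout are hypothetical; the shift by the maximum log-weight is only a numerical safeguard and does not change the normalised weights.

```python
import numpy as np

def importance_weights(Y_b, y_obs, R):
    """Normalised analysis weights for a Gaussian likelihood (sketch).

    Y_b   : (Ny, M) predicted data h(u_m^b), one column per particle
    y_obs : (Ny,) observations
    R     : (Ny, Ny) measurement error covariance
    """
    d = Y_b - y_obs[:, None]                       # innovations per particle
    # -0.5 * d_m^T R^{-1} d_m for every particle m
    log_w = -0.5 * np.sum(d * np.linalg.solve(R, d), axis=0)
    log_w -= log_w.max()                           # avoid underflow in exp
    w = np.exp(log_w)
    return w / w.sum()

def is_mean(U_b, w_a):
    """Weighted mean of the unchanged background parameters, U_b is (n, M)."""
    return U_b @ w_a
```

Note that `is_mean` leaves the particles themselves untouched, which is exactly the limitation of IS discussed in the text.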
We use the FastEMD algorithm developed by Pele & Werman [15] to solve the linear transport problem and obtain the optimal transport matrix.
Remark: An important property of ETPF is that it preserves imposed interval bounds on the ensemble members. Consider an ensemble of parameters $\{u^b_m\}_{m=1}^{M}$ given by $u^b_m = (a^b_m\ \ b^b_m\ \ c^b_m)^T$, where we assume that all the parameters $\{a^b_m\}_{m=1}^{M}$, $\{b^b_m\}_{m=1}^{M}$ and $\{c^b_m\}_{m=1}^{M}$ are bounded between 0 and 1. Therefore, the following inequalities hold:
\[
0 \le a^b_m \le 1, \qquad 0 \le b^b_m \le 1, \qquad 0 \le c^b_m \le 1, \qquad m = 1, \dots, M.
\]
Now we assume that two discrete random variables $B_1$ and $B_2$ have probability distributions given by
\[
\pi(B_1 = u^b_m) = \frac{1}{M}, \qquad \pi(B_2 = u^b_m) = w^a_m,
\]
with $w^a_m \ge 0$, $m = 1, \dots, M$, and $\sum_{m=1}^{M} w^a_m = 1$. As ETPF looks for a matrix $T^*$ that defines a coupling between these two probability distributions, each entry of this coupling matrix satisfies the conditions given by Eq. (2.4)-(2.6). These conditions ensure that each entry of the coupling matrix is non-negative and not larger than 1. The analysis given by Eq. (2.8) is $u^a_j = M \sum_{m=1}^{M} t^*_{mj} u^b_m$ with $M t^*_{mj} \ge 0$ and $\sum_{m=1}^{M} M t^*_{mj} = 1$, so each analysis member is a convex combination of the background members and hence
\[
0 \le a^a_j \le 1, \qquad 0 \le b^a_j \le 1, \qquad 0 \le c^a_j \le 1, \qquad j = 1, \dots, M.
\]
Thus the coupling matrix bounds the analysis ensemble members to be in the desired range. This is not observed in ETKF, as the matrix $S$ given by Eq. (2.1) does not impose any of the inequality and equality constraints, so ETKF can produce values outside the bounds.
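The transport problem and the bound-preserving property above can be sketched in a few lines. Note the substitution: instead of FastEMD, this sketch solves the linear program with SciPy's general-purpose `scipy.optimize.linprog` (HiGHS backend), which is an assumption made for self-containedness and is only practical for small ensembles. The function name `etpf_update` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def etpf_update(U_b, w_a):
    """ETPF analysis via a generic LP solver (sketch, not FastEMD).

    U_b : (n, M) background parameters, one column per particle
    w_a : (M,) analysis weights, non-negative and summing to one
    Returns the analysis ensemble (n, M) and the coupling matrix T (M, M).
    """
    n, M = U_b.shape
    # cost_mj = ||u_m^b - u_j^b||^2
    diff = U_b[:, :, None] - U_b[:, None, :]
    cost = np.sum(diff**2, axis=0)

    # Unknowns t_mj flattened row-major (index m*M + j).
    A_rows = np.kron(np.eye(M), np.ones((1, M)))   # sum_j t_mj = w_m^a
    A_cols = np.kron(np.ones((1, M)), np.eye(M))   # sum_m t_mj = 1/M
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([w_a, np.full(M, 1.0 / M)])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    T = res.x.reshape(M, M)

    # u_j^a = M * sum_m t_mj u_m^b
    return M * (U_b @ T), T
```

Because every column of $M T$ is non-negative and sums to one, each analysis member is a convex combination of background members, so an ensemble drawn from $[0, 1]$ stays in $[0, 1]$ up to solver tolerance.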

Localization
All variations of the ensemble Kalman filter and the particle filter are limited by the ensemble size: even if the dimension of the problem is only a few thousand, a large ensemble makes the method computationally very expensive, since the model has to be run for every member. The resulting restriction to a small ensemble size introduces sampling errors. To deal with this issue, localized ETKF (LETKF) was introduced in [9] and localized ETPF (LETPF) in [18]. More recent approaches to particle filter localization include [16] and [17].
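For the local update of a model parameter $u_m(X_i)$ at a grid point $X_i$, we introduce a diagonal matrix $C_i \in \mathbb{R}^{N_y \times N_y}$ in the observation space with an element
\[
(C_i)_{ll} = \rho\!\left(\frac{\| X_i - r_l \|}{r_{\mathrm{loc}}}\right), \tag{2.9}
\]
where $i = 1, \dots, n^2$, $l = 1, \dots, N_y$, $n^2$ is the number of model parameters, $N_y$ is the dimension of the observation space, $r_l$ denotes the location of the observation, $r_{\mathrm{loc}}$ is the localisation radius and $\rho(\cdot)$ is a taper function, such as the Gaspari-Cohn function [7]. The estimated model parameter at the location $X_i$ is then given by the localized counterpart of Eq. (2.1), in which diag denotes a diagonal matrix and $s_{lm}(X_i)$ is the $(l, m)$ entry of the localized transformation matrix $S(X_i)$.
LETPF modifies the likelihood and thus the weights given by Eq. (2.3) to
\[
w^a_m(X_i) = \frac{\exp\!\left[-\tfrac{1}{2}\,(h(u^b_m) - y_{\mathrm{obs}})^T C_i R^{-1} (h(u^b_m) - y_{\mathrm{obs}})\right]}{\sum_{j=1}^{M} \exp\!\left[-\tfrac{1}{2}\,(h(u^b_j) - y_{\mathrm{obs}})^T C_i R^{-1} (h(u^b_j) - y_{\mathrm{obs}})\right]}, \tag{2.10}
\]
where $C_i$ is the diagonal matrix given by Eq. (2.9). Then the estimated model parameter $u^a_j(X_i)$ at the grid point $X_i$ is given by
\[
u^a_j(X_i) = M \sum_{m=1}^{M} t^*_{mj}\, u^b_m(X_i),
\]
where $t^*_{mj}$ is an element of an optimal coupling matrix $T^*$ which minimizes the squared Euclidean distance at the grid point $X_i$,
\[
J(T) = \sum_{m,j=1}^{M} t_{mj}\,\left( u^b_m(X_i) - u^b_j(X_i) \right)^2,
\]
which reduces LETPF to a univariate transport problem. It should be noted that localization can be applied only to grid-dependent parameters.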
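To make the taper concrete, here is a small sketch of a Gaspari-Cohn-type function and of the diagonal of the localization matrix at one grid point. The fifth-order piecewise polynomial below is the commonly used form with support on scaled distances in $[0, 2)$; whether the source uses exactly this scaling is an assumption, and the function names are hypothetical.

```python
import numpy as np

def gaspari_cohn(r):
    """Fifth-order piecewise rational taper; r is distance / r_loc (assumed scaling)."""
    r = np.abs(np.atleast_1d(np.asarray(r, dtype=float)))
    rho = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    rho[inner] = 1 - 5/3 * ri**2 + 5/8 * ri**3 + 1/2 * ri**4 - 1/4 * ri**5
    rho[outer] = (4 - 5 * ro + 5/3 * ro**2 + 5/8 * ro**3
                  - 1/2 * ro**4 + 1/12 * ro**5 - 2 / (3 * ro))
    return rho

def localization_diagonal(x_i, obs_locations, r_loc):
    """Diagonal of C_i: taper of the distance from grid point x_i to each observation.

    x_i           : (2,) grid point coordinates
    obs_locations : (Ny, 2) observation coordinates r_l
    r_loc         : localisation radius
    """
    dist = np.linalg.norm(obs_locations - x_i, axis=1)
    return gaspari_cohn(dist / r_loc)
```

Observations at the grid point itself receive weight 1, and observations farther than twice the localisation radius receive weight 0, so they do not influence the local update.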

One parameter nonlinear problem
First we consider a one-parameter nonlinear problem from [3]. The prior distribution is Gaussian with mean 4 and variance 1. The nonlinear forward model $h$ is taken from [3]. The true parameter $u^{\mathrm{true}}$ gives $h(u^{\mathrm{true}}) = 48$, and the observation error is drawn from a Gaussian distribution with zero mean and variance 16. In Fig. 1 we plot the posterior probability density functions estimated by ETPF (top) and ETKF (bottom) with ensemble sizes $10^2$ (left), $10^3$ (center), and $10^4$ (right). The prior distribution is shown in red and the posterior estimated by IS with ensemble size $10^5$ is shown in black. We can see that ETPF provides a better approximation of the true probability density function, while ETKF gives a skewed posterior. It should be noted that ETKF is able to give a non-Gaussian (though wrong) posterior due to the nonlinearity of the map between the uncertain parameters and the observations.
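A reference posterior for such a scalar problem can be approximated by IS with a large ensemble, as in the sketch below. The cubic `h` is a hypothetical stand-in, chosen only because it satisfies $h(6) = 48$; it is an assumption of this illustration and not taken from the text. The observation is set to the noise-free value 48 for simplicity.

```python
import numpy as np

def h(u):
    # ASSUMED stand-in forward model (hypothetical); it satisfies h(6) = 48.
    return 7.0 / 12.0 * u**3 - 3.5 * u**2 + 8.0 * u

rng = np.random.default_rng(0)
M = 100_000
u_b = rng.normal(4.0, 1.0, size=M)      # prior N(4, 1)
y_obs = 48.0                            # noise-free observation, for illustration
obs_var = 16.0                          # observation error variance

# Gaussian likelihood weights with equal prior weights
log_w = -0.5 * (h(u_b) - y_obs) ** 2 / obs_var
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = np.sum(w * u_b)             # IS estimate of the posterior mean
ess = 1.0 / np.sum(w**2)                # effective sample size
```

The effective sample size `ess` is typically a small fraction of `M` here, which illustrates why IS needs a very large ensemble to serve as a reference.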

Single-phase Darcy flow
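We consider a steady-state single-phase Darcy flow model defined over an aquifer with the two-dimensional physical domain $D = [0, 1] \times [0, 1]$, which is given by
\[
-\nabla \cdot \left( k(x, y)\, \nabla P(x, y) \right) = f(x, y), \qquad (x, y) \in D,
\]
together with a boundary condition on $\partial D$, where $\nabla = (\partial/\partial x\ \ \partial/\partial y)^T$, $\cdot$ denotes the dot product, $P(x, y)$ is the pressure, $k(x, y)$ the permeability, $f(x, y)$ the source term, which we assume to be $2\pi^2 \cos(\pi x)\cos(\pi y)$, and $\partial D$ the boundary of the domain $D$.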
The forward problem of this second-order elliptic equation is to find the pressure $P(x, y)$ for given $f(x, y)$ and $k(x, y)$. We, however, are interested in finding the permeability given noisy observations of pressure at a few locations. We perform numerical experiments with synthetic observations, where a model is used instead of a measuring device to obtain the observations. We implement a cell-centered finite difference method to discretize the domain $D$ into $n \times n$ grid cells $X_i$ of size $\Delta x^2$ and solve the forward model with the true parameters. Then the synthetic observations are obtained by
\[
y_{\mathrm{obs}} = L(P) + \eta,
\]
with an element of $L(P)$ being a linear functional of pressure, namely
\[
L_l(P) = \frac{1}{2\pi\sigma^2} \sum_{i=1}^{n^2} \exp\!\left( -\frac{\| X_i - r_l \|^2}{2\sigma^2} \right) P_i\, \Delta x^2, \qquad l = 1, \dots, N_y,
\]
where $n = 50$, $\sigma = 0.01$, $r_l$ denotes the location of the observation and $N_y = 16$ is the number of observations. The observation locations are spread uniformly across the domain $D$, and $\eta$ denotes the observation noise drawn from a normal distribution with zero mean and standard deviation 0.09. This form of the observation functional, together with the parametrization of the uncertain parameters given below, guarantees the continuity of the forward map from the uncertain parameters to the observations and thus the existence of the posterior distribution, as shown in [10].
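A cell-centred discretisation of the forward problem can be sketched as follows. This is a minimal dense-matrix illustration, not the authors' solver: it assumes a homogeneous Dirichlet condition $P = 0$ on $\partial D$ and harmonic averaging of the permeability on cell faces, both of which are assumptions of this sketch, and the function name `solve_darcy` is hypothetical.

```python
import numpy as np

def solve_darcy(k, f):
    """Solve -div(k grad P) = f on the unit square with cell-centred differences.

    k, f : (n, n) arrays of cell-centre permeability and source values.
    ASSUMPTION: P = 0 on the boundary; harmonic mean of k on interior faces.
    """
    n = k.shape[0]
    dx = 1.0 / n
    idx = np.arange(n * n).reshape(n, n)
    A = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            row = idx[i, j]
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    # interior face: harmonic mean of the two cell permeabilities
                    t = 2.0 * k[i, j] * k[ii, jj] / (k[i, j] + k[ii, jj]) / dx**2
                    A[row, idx[ii, jj]] -= t
                else:
                    # boundary face lies half a cell away from the centre, P = 0 there
                    t = 2.0 * k[i, j] / dx**2
                A[row, row] += t
    return np.linalg.solve(A, f.ravel()).reshape(n, n)
```

A quick check with a manufactured solution (not the source term of the text): for $k = 1$ and $f = 2\pi^2 \sin(\pi x)\sin(\pi y)$ the exact solution with zero boundary values is $\sin(\pi x)\sin(\pi y)$, and the discrete solution on a $20 \times 20$ grid agrees with it to within a few thousandths. For $n = 50$ a sparse solver would be preferable to the dense matrix used here.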

Five parameter nonlinear problem
For our first numerical experiment with Darcy flow, we consider a low-dimensional problem where the permeability field is defined by only 5 parameters, similarly to [10]. We assume that the entire domain $D = [0, 1] \times [0, 1]$ is divided into two subdomains $D_1$ and $D_2$ as shown in Fig. 2. Each subdomain of $D$ represents a layer and is assumed to have a permeability function $k(X)$, where an element of $X$ is defined by $X_i$ for $i = 1, \dots, n^2$. Parameters $a$ and $b$ denote the thickness of the bottom layer on either side, which correspondingly defines the slope of the interface. The parameter $c$ defines a vertical fault. The layer moves up or down depending on $c < 0$ or $c > 0$, respectively, and the location of the fault is assumed to be fixed at $x = 0.5$.
Further, for this test case we assume piecewise constant permeability within each of the subdomains, hence $k(X)$ is given by
\[
k(X) = k_1\, \delta_{D_1}(X) + k_2\, \delta_{D_2}(X),
\]
where $k_1$ and $k_2$ represent the permeability of the subdomains $D_1$ and $D_2$, respectively, and $\delta$ is the Dirac (indicator) function. Then the parameters defining the permeability field for this configuration are
\[
u = (a\ \ b\ \ c\ \ k_1\ \ k_2)^T.
\]
We assume that the true parameters are $a^{\mathrm{true}} = 0.6$, $b^{\mathrm{true}} = 0.3$, $c^{\mathrm{true}} = -0.15$, $k_1^{\mathrm{true}} = 12$ and $k_2^{\mathrm{true}} = 5$. These parameters are used to create the synthetic observations. Figure 2 shows the true permeability with dots representing the observation locations. Next, we assume that the five uncertain parameters are drawn from uniform distributions over specified intervals, namely $a, b \sim U[0, 1]$, $c \sim U[-0.5, 0.5]$, $k_1 \sim U[10, 15]$ and $k_2 \sim U[4, 7]$.
As pointed out in Sect. 2.2, ETPF updates the parameters within the original range of the initial ensemble, while ETKF does not. Therefore a change of variables has to be performed for ETKF so that the updated parameters are physically viable. In order to be consistent we perform the change of variables for ETPF as well. As the domain $D$ is $[0, 1] \times [0, 1]$, the parameters $a$ and $b$ should lie within the interval $[0, 1]$. To enforce this constraint we substitute $a$ according to
\[
a' = \log\!\left( \frac{a}{1 - a} \right),
\]
and similarly $b$ is substituted by $b'$. Thus the uncertain parameters are now $u' = (a'\ \ b'\ \ c\ \ \log(k_1)\ \ \log(k_2))^T$.
In Fig. 3 we plot probability density functions for the parameters $a$ (a)-(d), $c$ (e)-(h) and $\log(k_2)$ (i)-(l); the parameters $b$ and $\log(k_1)$ show similar results. The posterior obtained by IS with ensemble size $10^6$ is plotted as a black line and the true value of the parameters is plotted as a black line with crosses. The posterior of ETPF is shown at the top and the posterior of ETKF at the bottom. ETPF and ETKF used $10^3$ (odd columns) and $10^4$ (even columns) ensemble members. In order to perform an objective comparison between the probabilities we compute the Kullback-Leibler divergence, Eq. (4.1), of a posterior $\pi$ obtained by either ETPF or ETKF and the posterior $\pi^{\mathrm{IS}}$ obtained by IS, where $N_b = 20$ is the number of bins. The Kullback-Leibler divergence for the parameters $a$, $c$ and $\log(k_2)$ is displayed in the titles of Fig. 3, where we observe that ETKF outperforms ETPF.
In order to check the sensitivity of the results to the initial parameter ensemble we perform 10 simulations, each based on a random draw of an initial ensemble from the same prior distributions. We conduct the numerical experiments for ensemble sizes varying from 10 to $10^3$ with an increment of 50. In Fig. 4 we plot the true parameters, the mean estimated by IS, and the mean $\bar{u}^a$ and the spread $\bar{u}^a \pm \bar{u}^a_{\mathrm{std}}$ of the estimated parameters averaged over the 10 simulations, where $M$ is the ensemble size, $i = 1, \dots, 5$ is the parameter index, and the superscript $a$ stands for analysis. We observe that all the methods, including IS, have a bias in the estimation of the geometrical parameters, which is due to the small number of observations. ETPF and ETKF perform comparably in terms of the mean estimation, though some parameters are better estimated by ETKF and others by ETPF. Comparing the error in pressure of the mean parameters, we observe that the methods are equivalent (thus not shown), which is a manifestation of the ill-posedness of the problem. In Fig. 4 we see that the spread from ETPF is smaller than from ETKF for each parameter. Both methods are slightly underdispersive, as the spread to error ratio is below 1. For ensemble size $10^3$ ETKF gives (0.95 0.88 0.88 0.97 0.98) and ETPF gives (0.92 0.81 0.84 0.99 0.86) for $(a\ \ b\ \ c\ \ \log(k_1)\ \ \log(k_2))$. Thus ETKF gives a better ratio for all the parameters except $\log(k_1)$.
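The binned Kullback-Leibler comparison described above can be sketched as follows. Two choices in this sketch are assumptions rather than facts from the text: the divergence is taken in the direction from the IS reference to the filter posterior, and bins that are empty in either histogram are skipped so that the sum stays finite. The function name is hypothetical; the optional weights allow the IS reference to be a weighted sample.

```python
import numpy as np

def kl_divergence_hist(samples_ref, samples, n_bins=20, weights_ref=None):
    """Histogram estimate of sum_i p_i log(p_i / q_i) over shared bins (sketch).

    samples_ref : reference sample (e.g. IS particles), optionally weighted
    samples     : sample from the approximate posterior (e.g. ETPF or ETKF)
    """
    lo = min(samples_ref.min(), samples.min())
    hi = max(samples_ref.max(), samples.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(samples_ref, bins=edges, weights=weights_ref)
    q, _ = np.histogram(samples, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)      # ASSUMPTION: ignore bins empty in either histogram
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

Skipping empty bins understates the divergence when the approximate posterior misses regions that the reference covers, so values from this sketch should be read as indicative only.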
We compute the relative error averaged over all five parameters, $\mathrm{RE}^{a,r}$, and the data misfit for each simulation $r$, and consider the differences between their values after and before data assimilation as functions of ensemble size. ETPF is shown in blue and ETKF in red; the black line is at zero level. Positive values of the differences mean an increase of either the data misfit or the relative error after data assimilation. We observe a decrease of the data misfit for both ETPF and ETKF except at ensemble size 10. RE does not always decrease for ETPF: for some simulations ETPF is at zero level or slightly above it, while for ETKF the sole exception is at ensemble size 10.

High-dimensional nonlinear problem
Next, we consider a high-dimensional problem where the dimension of the uncertain parameter is $n^2 = 2500$. The domain $D$ is now not divided into subdomains. Unlike in the previous test case, however, here we implement a spatially varying permeability field. We assume that the log permeability is generated by a random draw from a Gaussian distribution $N(\log(5), C)$. Here 5 is a vector of size $n^2$ with all entries equal to 5, and $C$ is assumed to be an exponential correlation matrix, whose elements depend on the distance $h_{i,j}$ between two spatial locations and on the correlation range $v$, which is taken to be 0.5. For the log permeability we use a Karhunen-Loeve expansion of the form
\[
\log(k) = \log(5) + \sum_{i=1}^{n^2} \sqrt{\lambda_i}\, \nu_i\, Z_i, \tag{4.3}
\]
where $\lambda$ and $\nu$ are eigenvalues and eigenfunctions of $C$, respectively, and the vector $Z$ of dimension $n^2$ is iid from a Gaussian distribution with zero mean and variance one. With the eigenvalues sorted in descending order, $Z_i \sim N(0, 1)$ produces $\log(k) \sim N(\log(5), C)$. The uncertain parameter is thus $u = Z$ with the dimension $n^2 = 2500$.
We perform 10 different simulations, each based on a random draw of an initial ensemble from the prior distribution. We conduct the numerical experiments for ensemble sizes varying from 10 to $10^3$ with an increment of 50. We compute the root mean square error (RMSE) of the log permeability field. We also compute the data misfit for each simulation after data assimilation by Eq. (4.2). In Fig. 6 we plot the mean, minimum and maximum over 10 simulations after data assimilation for the data misfit (left), RMSE (center), and variance (right). ETPF is shown in blue and ETKF in red. We observe that ETPF is underdispersive compared to ETKF, as particle filters are far more prone to degeneracy than Kalman filters. The misfit given by ETPF is smaller than the one given by ETKF for almost all simulations at ensemble sizes greater than 150. The RMSE, on the contrary, is larger. In Fig. 7(a)-(b) we plot $(\mathrm{misfit}^{a,r} - \mathrm{misfit}^{b,r})$ and $(\mathrm{RMSE}^{a,r} - \mathrm{RMSE}^{b,r})$, respectively, as a function of ensemble size for a simulation $r = 1, \dots, 10$. The superscript $b$ denotes the metrics before data assimilation and the superscript $a$ the metrics after data assimilation. ETKF always provides a decrease in both the data misfit and RMSE except at ensemble size 10. ETPF gives a decrease in the data misfit but an increase in RMSE, which indicates that ETPF overfits the data. However, as the ensemble size increases this happens less often, as can be seen in Fig. 7(c), where we plot for ETPF the percentage of simulations that result in $(\mathrm{RMSE}^a - \mathrm{RMSE}^b) > 0$ and a linear fit as a function of ensemble size.
In Fig. 8 we plot log permeability fields. In Fig. 8(a) the true permeability is shown with dots representing the observation locations, and in Fig. 8(d) the mean permeability field obtained by IS with ensemble size $10^5$. The RMSE provided by IS is 32.62. In Fig. 8(b-e) and Fig. 8(c-f) we display mean permeability fields obtained with ensemble size $10^3$ by ETPF and ETKF, respectively. In Fig. 8(b-c) we plot the mean log permeabilities for the smallest RMSE over simulations, which is 30.51 for ETPF and 32.48 for ETKF. In Fig. 8(e-f) we plot the mean log permeabilities for the largest RMSE over simulations, which is 39.2 for ETPF and 33.87 for ETKF. We observe that ETKF as well as IS provide smooth mean permeability fields that have smaller absolute values than the true permeability. ETPF gives higher variations of the mean permeability field and is in excellent agreement with the true permeability for a good initial ensemble, shown in Fig. 8(b). It should be noted that IS with ensemble size $10^3$ and this good initial ensemble gives the RMSE 30.51 and the same mean log permeability field as ETPF shown in Fig. 8(b). This means that the sensitivity of ETPF to the initial sample and the spatial variability of the ETPF mean field are both results of sampling error. However, IS does not change the parameters, only their weights, while ETPF does change the parameters. Therefore ETPF shares the advantage of IS, namely representing the correct posterior, without its disadvantage of lacking a resampling step.
In Fig. 9 we plot the variance of the permeability fields obtained with ensemble size $10^5$ by IS (d), and with ensemble size $10^3$ by ETPF (b-e) and ETKF (c-f). Fig. 9(b-c) is for the smallest RMSE and Fig. 9(e-f) is for the largest RMSE. ETKF provides a smoother variance than ETPF due to smaller sampling errors. In Fig. 10 we show the squared error $(Z^a - Z^{\mathrm{true}})^2$ in blue for ETPF and in red for ETKF for the three leading modes $Z_1$ (a), $Z_2$ (b), and $Z_3$ (c), where the solid line is the median and the shaded area spans the 25th to 75th percentiles over 10 simulations. We observe that in terms of the estimation of the three leading modes ETPF outperforms ETKF. In Fig. 11 we plot the posterior of $Z_1$ (left), $Z_2$ (center), and $Z_3$ (right) obtained by IS with ensemble size $10^6$ and by ETPF (top) and ETKF (bottom) with ensemble size $10^4$. The posterior of these modes is roughly approximated by ETPF, as shown in Fig. 11(a)-(c). ETKF provides a skewed posterior of the modes, shown in Fig. 11(d)-(f), which was also observed in the one-parameter nonlinear problem, see Fig. 1(f). In order to perform an objective comparison between the probabilities we compute the Kullback-Leibler divergence of a posterior $\pi$ obtained by either ETPF or ETKF and the posterior $\pi^{\mathrm{IS}}$ obtained by IS according to Eq. (4.1). ETPF gives the Kullback-Leibler divergence 0.21, 0.42, and 0.6, while ETKF gives 0.16, 0.07, and 0.5 for the modes $Z_1$, $Z_2$, and $Z_3$, respectively. Thus ETKF gives a better approximation of the true pdf.
Since the first modes are well estimated by ETPF and the last modes are not (not shown), we use only the three leading modes in the Karhunen-Loeve expansion given by Eq. (4.3) when computing the estimated log permeability, keeping the number of uncertain parameters the same, namely 2500. In Fig. 12(a) we observe that ETPF outperforms ETKF for large ensemble sizes independently of the initial sample. Moreover, ETPF no longer overfits the data, since the RMSE always decreases after data assimilation except at small ensemble sizes, as shown in Fig. 12(b). In Fig. 13 we show the mean fields for the best and worst initial samples of size $10^4$. ETPF gives RMSE 31.1 at the best sample and 32.98 at the worst sample. Comparing these values to 30.51 and 39.2 obtained using the full Karhunen-Loeve expansion, we observe that the maximum RMSE over simulations decreased substantially, while the minimum RMSE increased only slightly. ETKF gives RMSE 32.27 at the best sample and 33.23 at the worst sample (compare to 32.48 and 33.9 using the full Karhunen-Loeve expansion). Thus ETKF slightly decreases both the maximum and the minimum RMSE over simulations. ETPF is more affected by sampling noise at small scales, so using a truncated representation of the fields significantly improves the results for ETPF. ETKF filters out the small scales that are not observed and is thus less affected by the truncation.
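The Karhunen-Loeve parametrisation of this section, including the truncation to a few leading modes, can be sketched as follows. The specific correlation function $\exp(-h_{i,j}/v)$ is an assumed form of the exponential correlation (other scalings of the range are in use), cell centres on the unit square are assumed as spatial locations, and the function names are hypothetical.

```python
import numpy as np

def kl_basis(n, v=0.5):
    """Eigenpairs of an exponential correlation matrix on an n x n grid (sketch)."""
    x = (np.arange(n) + 0.5) / n                       # cell centres in [0, 1]
    X, Y = np.meshgrid(x, x, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel()])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    C = np.exp(-dist / v)                              # ASSUMED form of the correlation
    lam, nu = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]                      # descending eigenvalues
    return np.clip(lam[order], 0.0, None), nu[:, order], C

def log_permeability(Z, lam, nu, mean_k=5.0, n_modes=None):
    """log(k) = log(mean_k) + sum_i sqrt(lam_i) nu_i Z_i, optionally truncated."""
    m = lam.size if n_modes is None else n_modes
    return np.log(mean_k) + nu[:, :m] @ (np.sqrt(lam[:m]) * Z[:m])
```

Calling `log_permeability(Z, lam, nu, n_modes=3)` keeps the full vector `Z` as the uncertain parameter but uses only the three leading modes to build the field, which mirrors the truncation experiment described above. For $n = 50$ the dense distance matrix and eigendecomposition are feasible but should be computed once and reused.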
Next we apply LETPF and LETKF. The optimal localization radius, searched between 0.2 and 1.2 and chosen as the one giving the smallest RMSE, is shown in Table 1. It should be noted that a smaller localization radius for LETPF than for LETKF was also observed by [?], and it is probably related to the noisier approximation of the posterior by LETPF than by LETKF. In Fig. 14 we plot the misfit, RMSE and variance.
At small ensemble sizes both LETKF and LETPF give a smaller misfit and a smaller RMSE but a larger variance than ETKF and ETPF. For large ensembles LETKF performs worse than ETKF, which is due to the imposed range on the localization radius, meaning that 1.2 is not optimal. Comparing the performance of LETPF to (L)ETKF, we observe that at small ensemble sizes LETKF still outperforms ETPF, but at large ensemble sizes LETPF now performs comparably to ETKF. Moreover, LETPF overfits the data less often than ETPF: 40% against 90% for ensemble size 10, and 0% against a non-zero percentage for ensemble sizes greater than 150 (not shown).
In Figs. 15-16 we observe that localization decreases the sampling noise, and the spatial variability of the mean field obtained by ETPF at ensemble size $10^3$ resembles that of IS at ensemble size $10^5$. The variance obtained by ETPF with localization, shown in Fig. 16(b-e), has also improved. The posterior estimation of the leading mode $Z_1$, however, degraded, while that of $Z_2$ and $Z_3$ improved. The Kullback-Leibler divergence for the leading mode is 0.73 (compare to 0.21 without localization), and for the second and third modes it is 0.2 and 0.18, respectively (compare to 0.42 and 0.6 without localization). The variance of the posteriors is larger for both methods when localization is applied. The localized weights given by Eq. (2.10) vary less than the non-localized weights given by Eq. (2.3). Therefore the localized pdf is less noisy than the non-localized one. However, localization applied in the form of the Karhunen-Loeve expansion given by Eq. (4.3) does not retain the imposed bounds on the modes $Z$, as we need to invert a matrix product of eigenvalue and eigenvector matrices to obtain the modes. Moreover, unlike LETKF, which recovers ETKF, LETPF does not converge to ETPF as the localization radius goes to infinity, because the transport problem is univariate for LETPF and multivariate for ETPF.

Conclusions
MCMC methods remain the most reliable methods for estimating the posterior distributions of uncertain model parameters and states. They, however, also remain computationally expensive. Ensemble Kalman filters provide computationally affordable approximations but rely on the assumption of Gaussian probabilities. For nonlinear models, even if the prior is Gaussian the posterior is no longer Gaussian. Particle filtering, on the other hand, does not make such an assumption but requires a resampling step, which is usually stochastic. The ensemble transform particle filter is a particle filtering method that deterministically resamples the particles based on their importance weights and on covariance maximization among the particles. ETPF certainly outperforms ETKF for the one-parameter nonlinear test case by giving a better posterior estimation. This conclusion also holds for the five-parameter test case, but it demands a substantially larger ensemble size. Moreover, the mean estimations obtained by ETPF are not consistently better than the ones obtained by ETKF. When the number of uncertain parameters is large (2500), a decrease of the degrees of freedom is essential. This is achieved by localization. At large ensemble sizes ETPF performs as well as ETKF, while at small ensemble sizes ETKF still outperforms ETPF. Even though localized ETPF overfits the data less often than non-localized ETPF, localization destroys the property of ETPF to retain the imposed bounds. This results in a deterioration of the posterior approximation of the first mode. Another approach to improve ETPF performance is, instead of applying localization, to use only the first modes in the approximation of the log permeability, as they are better estimated by the method. An advantage of this approach is that it is fully Bayesian. However, one needs to know at which mode to truncate, and this is highly dependent on the covariance matrix of the log permeability.