Fast hybrid tempered ensemble transform filter formulation for Bayesian elliptical problems via Sinkhorn approximation

Abstract. Identification of unknown parameters on the basis of partial and noisy data is a challenging task, in particular in high dimensional and non-linear settings. Gaussian approximations to the problem, such as ensemble Kalman inversion, tend to be robust and computationally cheap and often produce astonishingly accurate estimations despite the simplifying underlying assumptions. Yet there is a lot of room for improvement, specifically regarding a correct approximation of a non-Gaussian posterior distribution. The tempered ensemble transform particle filter is an adaptive Sequential Monte Carlo (SMC) method, whereby resampling is based on optimal transport mapping. Unlike ensemble Kalman inversion, it does not require any assumptions regarding the posterior distribution and hence has been shown to provide promising results for non-linear non-Gaussian inverse problems. However, the improved accuracy comes with the price of much higher computational complexity, and the method is not as robust as ensemble Kalman inversion in high dimensional problems. In this work, we add an entropy-inspired regularisation factor to the underlying optimal transport problem that allows the high computational cost to be reduced via the Sinkhorn approximation.


Introduction
If a solution of a considered partial differential equation (PDE) is highly sensitive to its parameters, accurate estimation of the parameters and their uncertainties is essential to obtain a correct approximation of the solution. Partial observations of the solution are then used to infer uncertain parameters by solving a PDE-constrained inverse problem. For instance, one can approach such problems via methods induced by Bayes' formula (Stuart, 2010). More specifically, the posterior probability density of the parameters given the data is then computed on the basis of a prior probability density and a likelihood, which is the conditional probability density associated with the given noisy observations. The well-posedness of an inverse problem and convergence to the true posterior in the limit of observational noise going to zero were proven for different priors and under assumptions on the parameter-to-observation map by Dashti and Stuart (2017), for example.
When aiming at practical applications as in oil reservoir management (Lorentzen et al., 2020) and meteorology (Houtekamer and Zhang, 2016), for example, the posterior is approximated by means of a finite set of samples. Markov chain Monte Carlo (MCMC) methods approximate the posterior with a chain of samples, i.e. a sequential update of samples according to the posterior (Robert and Casella, 2004; Rosenthal, 2009; Hoang et al., 2013). Typically, MCMC methods provide highly correlated samples. Therefore, in order to sample the posterior correctly, MCMC requires a long chain, especially in the case of a multi-modal or a peaked distribution. A peaked posterior is associated with very accurate observations. Therefore, unless a speedup is introduced in an MCMC chain (e.g. Cotter et al., 2013), MCMC is impractical for computationally expensive PDE models.
Adaptive Sequential Monte Carlo (SMC) methods are different approaches to approximate the posterior with an ensemble of samples by computing their probability (e.g. Vergé et al., 2015). Adaptive intermediate probability measures are introduced between the prior measure and the posterior measure to counteract method divergence due to the curse of dimensionality, following Del Moral et al. (2006) and Neal (2001). Moreover, sampling from an invariant Markov kernel with the target intermediate measure and the reference prior measure improves ensemble diversity despite the parameters' stationarity, as shown by Beskos et al. (2015). However, when the parameter space is high dimensional, adaptive SMC requires computationally prohibitive ensemble sizes unless we approximate only the first two moments (e.g. Iglesias et al., 2018) or we draw highly correlated samples (Ruchi et al., 2019).
Ensemble Kalman inversion (EKI) approximates primarily the first two moments of the posterior, which makes it computationally attractive for estimating high dimensional parameters (Iglesias et al., 2014). For linear problems, Blömker et al. (2019) showed well-posedness and convergence of the EKI for a fixed ensemble size and without any assumptions of Gaussianity. However, for non-linear problems it has been shown by Oliver et al. (1996), Bardsley et al. (2014), Ernst et al. (2015), Liu et al. (2017) and Le Gland et al. (2009) that the EKI approximation is not consistent with the Bayesian approximation.
We note that the EKI is an iterative ensemble smoother (Evensen, 2018). Iterative ensemble smoothers for inverse problems introduce a trivial artificial dynamics to the unknown static parameter and iteratively update an estimation of the parameter. Then the parameter-dependent model variables are recomputed using a forward model with a parameter estimation. Examples of iterative ensemble smoothers are ensemble randomised maximum likelihood (Chen and Oliver, 2012), multiple data assimilation (Emerick and Reynolds, 2013) and randomise-then-optimise (Bardsley et al., 2014).
As an alternative ansatz one can employ optimal transport resampling, which lies at the heart of the ensemble transform particle filter (ETPF) proposed by Reich (2013). An optimal transport map between two consecutive probability measures provides a direct sample-to-sample map with maximised sample correlation. Along the lines of an adaptive SMC approach, a probability measure is described via the importance weights, and the deterministic mapping replaces the traditional resampling step. A so-called tempered ensemble transform particle filter (TETPF) was proposed by Ruchi et al. (2019). Note that this ansatz does not require any distributional assumption for the posterior, and it was shown by Ruchi et al. (2019) that the TETPF provides encouraging results for non-linear high dimensional PDE-constrained inverse problems. However, the computational cost of solving an optimal transport problem in each iteration is considerably high.
In this work we address two issues that have arisen in the context of the TETPF: (i) the immense computational cost of solving the associated optimal transport problem and (ii) the lack of robustness of the TETPF with respect to high dimensional problems. More specifically, the performance of the ETPF has been found to be highly dependent on the initial guess. Although tempering prevents an outright failure of the importance sampling step due to a poor initial ensemble selection, the number of required intermediate steps and the efficiency of the ETPF still depend on the initialisation. The lack of robustness in high dimensions can be addressed via a hybrid approach that combines a Gaussian approximation with a particle filter approximation (e.g. Santitissadeekorn and Jones, 2015). Different algorithms along these lines were developed by Frei and Künsch (2013) and Stordal et al. (2011), for example. In this paper, we adapt the hybrid approach of Chustagulprom et al. (2016) that uses the EKI as a proposal step for the ETPF with a tuning parameter. Furthermore, it is well established that the computational complexity of solving an optimal transport problem can be significantly reduced via the Sinkhorn approximation of Cuturi (2013). This ansatz has been implemented for the ETPF by Acevedo et al. (2017).
Along the lines of Chustagulprom et al. (2016) and de Wiljes et al. (2020), we propose a tempered ensemble transform particle filter with Sinkhorn approximation (TESPF) and a tempered hybrid approach.
The remainder of the paper is organised as follows: in Sect. 2, the inverse problem setting is presented. There we describe the tempered ensemble transform particle filter (TETPF) proposed by Ruchi et al. (2019). Furthermore, we introduce the tempered ensemble transform particle filter with Sinkhorn approximation (TESPF), a tempered hybrid approach that combines the EKI and the TETPF (hybrid EKI-TETPF), and a tempered hybrid approach that combines the EKI and the TESPF (hybrid EKI-TESPF). We provide pseudocodes of all the presented techniques in Appendix A and corresponding computational complexities in Appendix B. In Sect. 3, we apply the adaptive SMC methods to an inverse problem of inferring high dimensional permeability parameters for a steady-state single-phase Darcy flow model. Permeability is parameterised following Ruchi et al. (2019), whereby one configuration of parameterisation leads to Gaussian posteriors, while another one leads to non-Gaussian posteriors. Finally, we draw conclusions in Sect. 4.
We consider the problem of recovering an unknown parameter u ∈ R^n from noisy observations

y_obs = G(u) + η,

where G is the forward operator, η ∼ N(0, R) and N(0, R) is a Gaussian distribution with zero mean and covariance matrix R. The aim is to determine or approximate the posterior measure µ(u) conditioned on observations y_obs and given a prior measure µ_0(u), which is referred to as a Bayesian inverse problem. The posterior measure is absolutely continuous with respect to the prior, i.e.

dµ/dµ_0 (u) ∝ g(u; y_obs),     (1)

where ∝ is up to a constant of normalisation, and g is referred to as the likelihood and depends on the forward operator G.
The Gaussian observation noise of the observation y_obs implies

g(u; y_obs) ∝ exp( -(1/2) (y_obs - G(u))^T R^{-1} (y_obs - G(u)) ),     (2)

where ^T denotes the transpose. In the following we will introduce a range of methods that can be employed to estimate solutions to the presented inverse problem under the overarching umbrella of tempered Sequential Monte Carlo filters. Alongside these methods we will also propose several important add-on tools required to achieve feasibility and higher accuracy in high dimensional non-linear settings.

Tempered Sequential Monte Carlo
We consider Sequential Monte Carlo (SMC) methods that approximate the posterior measure µ(u) via an empirical measure

µ^M(u) = Σ_{i=1}^M w_i δ(u - u_i).

Here, δ is the Dirac function, and the importance weights for the approximation of µ are

w_i ∝ g(u_i; y_obs),  with  Σ_{i=1}^M w_i = 1.
An ensemble U = {u_1, …, u_M} ⊂ Ũ consists of M realisations u_i ∈ R^n of a random variable u that are independent and identically distributed according to u_i ∼ µ_0.
When an easy-to-sample ensemble from the prior µ_0 does not approximate the complex posterior µ well, only a few weights w_i have significant value, resulting in a degenerate approximation of the posterior measure. Potential reasons for this effect are high dimensionality of the uncertain parameter, a large number of observations, or lack of accuracy of the observations. An existing solution to a degenerate approximation is an iterative approach based on tempering by Del Moral et al. (2006) or annealing by Neal (2001). The underlying idea is to introduce T intermediate artificial measures {µ_t}_{t=0}^T between µ_0 and µ_T = µ. These measures are bridged by introducing T tempering parameters {φ_t}_{t=1}^T that satisfy 0 = φ_0 < φ_1 < … < φ_T = 1. An intermediate measure µ_t is defined as a probability measure that has density proportional to g(u)^{φ_t - φ_{t-1}} with respect to the previous measure µ_{t-1}:

dµ_t/dµ_{t-1} (u) ∝ g(u; y_obs)^{φ_t - φ_{t-1}}.

Along the lines of Iglesias (2016), the tempering parameter φ_t is chosen such that the effective sample size (ESS),

ESS_t = ( Σ_{i=1}^M (w_{t,i})^2 )^{-1},

does not drop below a certain threshold 1 < M_thresh < M. Then, an approximation of the posterior measure µ_t is

µ_t^M(u) = Σ_{i=1}^M w_{t,i} δ(u - u_{t-1,i}),  with  w_{t,i} ∝ g(u_{t-1,i}; y_obs)^{φ_t - φ_{t-1}}.     (4)

A bisection algorithm on the interval (φ_{t-1}, 1] is employed to find φ_t. If ESS_t > M_thresh for φ_t = 1, we set φ_t = 1, which implies that no further tempering is required. The choice of the ESS to define a tempering parameter is supported by results of Beskos et al. (2014) on the stability of a tempered SMC method in terms of the ESS. Moreover, for a Gaussian probability density approximated by importance sampling, Agapiou et al. (2017) showed that the ESS is related to the second moment of the Radon-Nikodym derivative (Eq. 1).
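The adaptive choice of the tempering parameter can be sketched in a few lines; the following is a minimal illustration under our own naming conventions (log_g holds the log-likelihood values of the current ensemble), not the authors' implementation:

```python
import numpy as np

def ess(log_g, dphi):
    """Effective sample size of weights w_i proportional to exp(dphi * log_g[i])."""
    logw = dphi * log_g
    logw = logw - logw.max()        # stabilise the exponentials
    w = np.exp(logw)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def next_temperature(log_g, phi_prev, M_thresh, tol=1e-8):
    """Bisection on (phi_prev, 1] for the largest temperature increment
    that keeps ESS >= M_thresh; returns 1.0 if no tempering is needed."""
    if ess(log_g, 1.0 - phi_prev) >= M_thresh:
        return 1.0
    lo, hi = phi_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(log_g, mid - phi_prev) >= M_thresh:
            lo = mid                # ESS still above threshold: push further
        else:
            hi = mid
    return lo
```

Each call returns the next φ_t; the loop terminates once the full likelihood (φ = 1) can be absorbed without the ESS falling below the threshold.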
The SMC method with importance sampling (Eq. 4) does not change the sample {u_{t-1,i}}_{i=1}^M, which leads to a collapse of the method for a finite ensemble size. Therefore each tempering iteration t needs to be supplied with resampling. Resampling provides a new ensemble {ũ_{t,i}}_{i=1}^M that approximates the measure µ_t. We will discuss different resampling techniques in Sect. 2.3.

Mutation
Nonlin. Processes Geophys., 28, 23-41, 2021. https://doi.org/10.5194/npg-28-23-2021
S. Ruchi et al.: Fast hybrid tempered ensemble transform filter for Bayesian elliptical problems

Due to the stationarity of the parameters, SMC methods require ensemble perturbation. In the framework of particle filtering for dynamical systems, ensemble perturbation is achieved by rejuvenation, when ensemble members of the posterior measure are perturbed with a random noise sampled from a Gaussian distribution with zero mean and a covariance matrix of the prior measure. The covariance matrix of the ensemble is inflated, and no acceptance step is performed due to the associated high computational costs for a dynamical system. Since we consider a static inverse problem, for ensemble perturbation we employ a Metropolis-Hastings method (thus we mutate samples) but with a proposal that speeds up the MCMC method for estimating a high dimensional parameter. Namely, we use the ensemble mutation of Cotter et al. (2013) with the target measure µ_t and the reference measure µ_0. The mutation phase is initialised at v_{0,i} = ũ_{t,i}, and at the final inner iteration τ_max, we assign u_{t,i} = v_{τ_max,i} for i = 1, …, M.
For a Gaussian prior we use the preconditioned Crank-Nicolson MCMC (pcn-MCMC) method:

v_{τ+1,i}^prop = m + (1 - θ^2)^{1/2} (v_{τ,i} - m) + θ ξ_{τ,i}.     (5)

Here, m is the mean of the Gaussian prior measure µ_0, and {ξ_{τ,i}}_{i=1}^M are drawn from a Gaussian distribution with zero mean and the covariance matrix of the Gaussian prior measure µ_0.
For a uniform prior U[a, b] we use the following random walk proposal:

v_{τ+1,i}^prop = v_{τ,i} + θ ξ_{τ,i}.     (6)

Each proposal v_{τ+1,i}^prop, obtained from Eq. (5) for the Gaussian measure and from Eq. (6) for the uniform measure, is accepted with probability

min{ 1, g(v_{τ+1,i}^prop; y_obs)^{φ_t} / g(v_{τ,i}; y_obs)^{φ_t} },

and otherwise we set v_{τ+1,i} = v_{τ,i}; for the uniform measure, proposals outside [a, b] have zero prior density and are always rejected. The scalar θ ∈ (0, 1] in Eq. (5) controls the performance of the Markov chain. Small values of θ lead to high acceptance rates but poor mixing. Roberts and Rosenthal (2001) showed that for high dimensional problems it is optimal to choose θ such that the acceptance rate is between 20 % and 30 % by the last tempering iteration T. Cotter et al. (2013) proved that under some assumptions this mutation produces a Markov kernel with the invariant measure µ_t.
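A compact sketch of the pcn-MCMC mutation (Eq. 5) with the tempered acceptance step might look as follows; the function names and the explicit covariance square root are illustrative assumptions, not the authors' code:

```python
import numpy as np

def pcn_mutation(v, m, c_sqrt, log_g, phi, rng, theta=0.5, n_steps=20):
    """pCN mutation targeting mu_t, where dmu_t/dmu_0 is proportional to
    g(u)^phi and mu_0 = N(m, C) with C = c_sqrt @ c_sqrt.T. The prior is
    invariant under the proposal, so the acceptance ratio involves only
    the tempered likelihood g^phi."""
    lg = log_g(v)
    for _ in range(n_steps):
        xi = c_sqrt @ rng.standard_normal(v.size)
        v_prop = m + np.sqrt(1.0 - theta ** 2) * (v - m) + theta * xi
        lg_prop = log_g(v_prop)
        # Metropolis-Hastings accept/reject with the tempered likelihood
        if np.log(rng.uniform()) < phi * (lg_prop - lg):
            v, lg = v_prop, lg_prop
    return v
```

Note that the forward model is evaluated once per inner iteration inside log_g, which is what makes the mutation the dominant cost for expensive PDE models.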
Computational complexity. In each tempering iteration t the computational complexity of the pcn-MCMC mutation is O(τ_max M C), where C is the computational cost of the forward model G. For the pseudocode of the pcn-MCMC mutation, please refer to Algorithm 1 in Appendix A. Note that the computational complexity is not affected by the length of u, which is a very desirable property in high dimensions, as shown by Cotter et al. (2013) and Hairer et al. (2014).

Resampling phase
As we have already mentioned in Sect. 2.1, an adaptive SMC method with importance sampling needs to be supplied with resampling at each tempering iteration t. We consider a resampling method based on optimal transport mapping proposed by Reich (2013).

Optimal transformation
The origin of optimal transport theory lies in finding an optimal way of redistributing mass, first formulated by Monge (1781). Given a distribution of matter, e.g. a pile of sand, the underlying question is how to reshape the matter into another form such that the work done is minimal. A century and a half later the original problem was rewritten by Kantorovich (1942) in a statistical framework that allowed it to be tackled. Due to these contributions it was later named the Monge-Kantorovich minimisation problem. The reader is also referred to Peyré and Cuturi (2019) for a comprehensive overview.
Let us consider a scenario whereby the initial distribution of matter is represented by a probability measure µ on the measurable space U that has to be moved and rearranged according to a given new distribution ν, defined on the measurable space Ũ. In order to describe the link between the two probability measures µ and ν and to minimise a predefined cost associated with the transportation, one aims to find a joint measure on U × Ũ that is a solution to

inf_{ω ∈ Π(µ,ν)} ∫_{U×Ũ} c(u, ũ) ω(du, dũ),     (9)

where the infimum is computed over all joint probability measures ω on U × Ũ, denoted Π(µ, ν), with marginals µ and ν, and c(u, ũ) is a transport cost function on (u, ũ) ∈ U × Ũ. The joint measures achieving the infimum are called optimal transport plans. Let µ and ν be two measures on a measurable space (Ω, F) such that µ is the law of a random variable U : Ω → U and ν is the law of a random variable Ũ : Ω → Ũ. Then a coupling of (µ, ν) consists of a pair (U, Ũ). Note that couplings always exist; an example is the trivial coupling in which the random variables U and Ũ are independent. A coupling is called deterministic if there is a measurable function M : U → Ũ such that Ũ = M(U), and M is called a transport map. Unlike general couplings, deterministic couplings do not always exist. On the other hand, there may be infinitely many deterministic couplings. One famous variant of Eq. (9), whereby the optimal coupling is known to be a deterministic coupling, is given by the quadratic cost c(u, ũ) = ‖u - ũ‖²:

inf_{ω ∈ Π(µ,ν)} ∫_{U×Ũ} ‖u - ũ‖² ω(du, dũ).     (10)

The aim of the resampling step is to obtain equally probable samples. Therefore, in resampling based on optimal transport of Reich (2013), the Monge-Kantorovich minimisation problem (Eq. 10) is considered for the current posterior measure µ_t^M(u), given by its sample approximation (Eq. 4), and a uniform measure (here the weights in the sample approximation are set to 1/M). The discretised objective functional of the associated optimal transport problem is given by

S* = argmin_S Σ_{i,j=1}^M s_{ij} ‖u_{t-1,i} - u_{t-1,j}‖²,

subject to s_{ij} ≥ 0 and the constraints

Σ_{j=1}^M s_{ij} = w_{t,i},   Σ_{i=1}^M s_{ij} = 1/M,

where the matrix S describes a joint probability measure under the assumption that the state space is finite. Then samples {ũ_{t,j}}_{j=1}^M are obtained by a deterministic linear transform, i.e.

ũ_{t,j} = M Σ_{i=1}^M u_{t-1,i} s*_{ij}.     (11)
Reich (2013) showed weak convergence of the deterministic optimal transformation (Eq. 11) to a solution of the Monge-Kantorovich problem (Eq. 9) as M → ∞.

Computational complexity. The computational complexity of solving the optimal transport problem with an efficient earth mover's distance algorithm such as FastEMD of Pele and Werman (2009) is of order O(M³ log M). Consequently, the computational complexity of the adaptive tempering SMC with optimal transport resampling (TETPF) is

O(T (M³ log M + τ_max M C)),

where T is the number of tempering iterations, τ_max is the number of pcn-MCMC inner iterations and C is the computational cost of a forward model G. For the pseudocode of the TETPF, please refer to Algorithm 2 in Appendix A.
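For small ensembles the discretised transport problem can be solved directly with a generic linear-programming routine; the sketch below (our own illustration using scipy.optimize.linprog rather than the FastEMD solver used in the paper) implements the transform of Eq. (11):

```python
import numpy as np
from scipy.optimize import linprog

def etpf_resample(U, w):
    """Optimal-transport resampling: couple the weighted empirical measure
    (weights w, members as rows of U) with the uniform one (weights 1/M),
    minimising the squared Euclidean transport cost, then map the
    ensemble through the deterministic linear transform."""
    M = len(w)
    # cost c_ij = ||u_i - u_j||^2, flattened row-major for the LP
    C = ((U[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    # marginal constraints: sum_j s_ij = w_i and sum_i s_ij = 1/M
    A_eq = np.zeros((2 * M, M * M))
    for i in range(M):
        A_eq[i, i * M:(i + 1) * M] = 1.0   # row sums
        A_eq[M + i, i::M] = 1.0            # column sums
    b_eq = np.concatenate([w, np.full(M, 1.0 / M)])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert res.status == 0                 # LP solved to optimality
    S = res.x.reshape(M, M)
    return M * S.T @ U                     # u_new_j = M * sum_i s_ij u_i
```

The transform preserves the weighted ensemble mean by construction, which is easy to verify on small examples; for realistic M the cubic cost of the LP is exactly what the Sinkhorn approximation below avoids.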

Sinkhorn approximation
As discussed above, solving the optimal transport problem has a computational complexity of O(M³ log M) in every iteration of the tempering procedure. Thus the TETPF becomes very expensive for large M. On the other hand, an increase in the number of samples directly correlates with an improved accuracy of the estimation. In order to allow for as many samples as possible, one needs to reduce the associated computational cost of the optimal transport problem. This can be achieved by replacing the optimal transport distance with a Sinkhorn distance and subsequently exploiting the new structure to elude the immense computational time of the EMD (earth mover's distance) solver, as shown by Cuturi (2013). More precisely, the ansatz is built on the fact that the original transport problem has a natural entropic bound that is obtained for S = (1/M) I_M w^T, where w = [w_1, …, w_M]^T and I_M = [1, …, 1]^T ∈ R^M, which constitutes an independent joint probability. Therefore, one can consider the problem of finding a matrix S ∈ R^{M×M} that is constrained by an additional lower entropic bound (Sinkhorn distance). This additional constraint can be incorporated via a Lagrange multiplier, which leads to the regularised form, i.e.

S* = argmin_S Σ_{i,j=1}^M s_{ij} ‖u_{t-1,i} - u_{t-1,j}‖² - (1/α) h(S),     (12)

subject to the same marginal constraints as before, where h(S) = -Σ_{i,j} s_{ij} log s_{ij} is the entropy of S and α > 0. Due to the additional smoothness, the minimum of Eq. (12) is unique and has the form

S* = diag(b) Z diag(a),

where Z is a matrix with entries z_{ij} = exp(-α ‖u_{t-1,i} - u_{t-1,j}‖²) and b and a are non-negative vectors determined by employing Sinkhorn's fixed-point iteration described by Sinkhorn (1967). We will refer to this approach as the tempered ensemble Sinkhorn particle filter (TESPF).
Computational complexity. Solving this regularised optimal transport problem rather than the original transport problem given in Eq. (9) reduces the complexity to O(M² C(α)), where C(α) denotes a computational scaling factor that depends on the choice of the regularisation factor α. In particular, C(α) grows with α. Therefore, one needs to balance between reducing the computational time and finding a reasonable approximate solution of the original transport problem when choosing a value for α. For the pseudocode of the Sinkhorn adaptation of solving the optimal transport problem, please refer to Algorithm 3 in Appendix A. For the pseudocode of the TESPF, please refer to Algorithm 4 in Appendix A.
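Sinkhorn's fixed-point iteration alternately rescales the rows and columns of Z until both marginal constraints hold; a minimal sketch (illustrative names, dense cost matrix, fixed iteration count rather than a convergence test) might be:

```python
import numpy as np

def sinkhorn_coupling(C, w, alpha, n_iter=500):
    """Entropy-regularised transport plan between weights w and uniform
    marginals: S = diag(b) Z diag(a) with z_ij = exp(-alpha * c_ij),
    computed by Sinkhorn's fixed-point iteration."""
    M = len(w)
    Z = np.exp(-alpha * C)
    a = np.full(M, 1.0 / M)
    b = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        b = w / (Z @ a)            # enforce row marginals:    sum_j s_ij = w_i
        a = (1.0 / M) / (Z.T @ b)  # enforce column marginals: sum_i s_ij = 1/M
    return b[:, None] * Z * a[None, :]
```

The resulting plan is then used exactly as in Eq. (11), i.e. ũ_{t,j} = M Σ_i u_{t-1,i} s_{ij}; each iteration costs only two matrix-vector products, which is the source of the O(M² C(α)) complexity.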

Ensemble Kalman inversion
For Bayesian inverse problems with Gaussian measures, ensemble Kalman inversion (EKI) is one of the most widely used algorithms. The EKI is an adaptive SMC method that approximates primarily the first two statistical moments of a posterior distribution. For a linear forward model, the EKI is optimal in the sense that it minimises the error in the mean (Blömker et al., 2019). For a non-linear forward model, the EKI still provides a good estimation of the posterior (e.g. Iglesias et al., 2018). Here we consider the EKI method of Iglesias et al. (2018). With Δφ_t = φ_t - φ_{t-1}, the empirical covariances of the ensemble are

C_t^{uG} = (1/M) Σ_{i=1}^M (u_{t-1,i} - ū_{t-1}) ⊗ (G(u_{t-1,i}) - Ḡ_{t-1}),
C_t^{GG} = (1/M) Σ_{i=1}^M (G(u_{t-1,i}) - Ḡ_{t-1}) ⊗ (G(u_{t-1,i}) - Ḡ_{t-1}),

where ⊗ denotes the Kronecker product, and ū_{t-1} and Ḡ_{t-1} are the ensemble means of the parameters and of the forward-model evaluations, respectively. We recall that the non-linear forward problem is y = G(u), the observation y_obs has a Gaussian observation noise with zero mean and the covariance matrix R, and φ_t is a temperature associated with the measure µ_t. Since we are interested in an ensemble approximation of the posterior distribution, we update the ensemble members by

u_{t,i} = u_{t-1,i} + C_t^{uG} ( C_t^{GG} + Δφ_t^{-1} R )^{-1} ( y_{t,i} - G(u_{t-1,i}) ).     (13)

Here y_{t,i} = y_obs + η_{t,i} and η_{t,i} ∼ N(0, Δφ_t^{-1} R) for i = 1, …, M.
Computational complexity. The computational complexity of solving Eq. (13) is O(κ² n), where n is the parameter space dimension and κ is the observation space dimension. Then the computational complexity of the EKI is

O(T (κ² n + τ_max M C)),

where T is the number of tempering iterations, τ_max is the number of pcn-MCMC inner iterations and C is the computational cost of a forward model G. For the pseudocode of the EKI method, please refer to Algorithm 5 in Appendix A.
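One tempered EKI step (Eq. 13) can be sketched as follows; the ensemble layout (members as rows) and function names are our own conventions, not the authors' implementation:

```python
import numpy as np

def eki_update(U, G_U, y_obs, R, dphi, rng):
    """One tempered EKI step: ensemble Kalman update with the observation
    covariance inflated by 1/dphi, where dphi = phi_t - phi_{t-1}."""
    M = U.shape[0]
    u_mean = U.mean(axis=0)
    g_mean = G_U.mean(axis=0)
    # empirical cross- and forecast covariances (outer-product form)
    Cug = (U - u_mean).T @ (G_U - g_mean) / M
    Cgg = (G_U - g_mean).T @ (G_U - g_mean) / M
    K = Cug @ np.linalg.inv(Cgg + R / dphi)
    # perturbed observations y_i = y_obs + eta_i, eta_i ~ N(0, R/dphi)
    Y = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R / dphi, size=M)
    return U + (Y - G_U) @ K.T
```

For a linear forward map and a large ensemble this reproduces the Gaussian posterior moments, which is the sense in which the EKI is consistent only up to second order.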

Hybrid
Despite the underlying Gaussian assumption, the EKI is remarkably robust in non-linear high dimensional settings, as opposed to consistent SMC methods such as the TET(S)PF. For many non-linear problems it is desirable to have better uncertainty estimates while maintaining a level of robustness. This can be achieved by factorising the likelihood given by Eq. (2), e.g.

g(u; y_obs) = g_1(u; y_obs) · g_2(u; y_obs),

where

g_1(u; y_obs) ∝ exp( -(β/2) (y_obs - G(u))^T R^{-1} (y_obs - G(u)) )

and

g_2(u; y_obs) ∝ exp( -((1 - β)/2) (y_obs - G(u))^T R^{-1} (y_obs - G(u)) ).

Then it is possible to alternate between methods with complementing properties such as the EKI and the TET(S)PF updates; e.g. the likelihood g_1 is used for an EKI update, followed by an update with a TET(S)PF on the basis of g_2.
Note that β ∈ [0, 1] and should be tuned according to the underlying forward operator. This combination of an approximative Gaussian method and a consistent SMC method has been referred to as a hybrid filter in the data assimilation literature (Stordal et al., 2011; Frei and Künsch, 2013; Chustagulprom et al., 2016). This ansatz can also be understood as using the EKI as a more elaborate proposal density for the importance sampling step within SMC (e.g. Oliver et al., 1996).

Computational complexity. The computational complexity of combining the two algorithms is

O(T (κ² n + M² C(α) + τ_max M C))

for the hybrid EKI-TESPF. For the pseudocode of the hybrid methods, please refer to Algorithm 6 in Appendix A.
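The likelihood factorisation behind the hybrid is a one-liner in log space; the sketch below (our own illustrative names) also shows that g_1 is equivalent to a Gaussian likelihood with inflated covariance R/β, which is the form the EKI step consumes:

```python
import numpy as np

def log_likelihood(u, y_obs, G, R_inv):
    """Log of the Gaussian likelihood g(u; y_obs), up to an additive constant."""
    r = y_obs - G(u)
    return -0.5 * r @ R_inv @ r

def split_log_likelihood(u, y_obs, G, R_inv, beta):
    """Factorise g = g1 * g2 with g1 = g^beta (EKI proposal step) and
    g2 = g^(1 - beta) (TET(S)PF correction step)."""
    lg = log_likelihood(u, y_obs, G, R_inv)
    return beta * lg, (1.0 - beta) * lg
```

Setting β = 0 recovers the pure TET(S)PF and β = 1 the pure EKI, so the tuning parameter interpolates continuously between the two methods.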

Numerical experiments
We consider a steady-state single-phase Darcy flow model defined over an aquifer of a two-dimensional physical domain D:

-∇ · ( k(x, y) ∇P(x, y) ) = f(x, y),  (x, y) ∈ D,     (16)

where ∇ = (∂/∂x, ∂/∂y)^T, · is the dot product, P(x, y) the pressure, k(x, y) the permeability, f(x, y) the source term, which accounts for groundwater recharge, and (x, y) the horizontal dimensions. The boundary conditions (Eq. 17) are prescribed on ∂D, the boundary of the domain D. We implement a cell-centred finite-difference method and a linear algebra solver (backslash operator in MATLAB) to solve the forward model (Eqs. 16-17) on an N × N grid. We note that a single-phase Darcy flow model, though not a steady-state model, is widely used to model the flow in a subsurface aquifer and to infer uncertain permeability using data assimilation. For example, Zovi et al. (2017) used an EKI to infer the permeability of an existing aquifer located in north-east Italy. The area of this aquifer is 2.7 km² and exhibits several channels, such as the one depicted in Fig. 1. There, the size of a computational cell ranged from 2 m near the wells to 20 m away from the wells.
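A cell-centred finite-difference discretisation of Eq. (16) can be sketched as follows; we use homogeneous Dirichlet boundary conditions, a constant source and a dense solve purely for illustration (the paper's boundary conditions, source term and MATLAB solver differ):

```python
import numpy as np

def solve_darcy(k, f, h):
    """Cell-centred finite differences for -div(k grad P) = f on a uniform
    N x N grid of cell width h, with P = 0 on the boundary (an illustrative
    choice). Harmonic averaging gives the transmissibility on each face."""
    N = k.shape[0]
    A = np.zeros((N * N, N * N))
    b = (f * h * h).ravel()
    def idx(i, j):
        return i * N + j
    def trans(k1, k2):
        return 2.0 * k1 * k2 / (k1 + k2)   # harmonic mean of adjacent cells
    for i in range(N):
        for j in range(N):
            row = idx(i, j)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < N and 0 <= jj < N:
                    t = trans(k[i, j], k[ii, jj])
                    A[row, idx(ii, jj)] -= t
                else:
                    t = 2.0 * k[i, j]      # ghost cell on the boundary, P = 0
                A[row, row] += t
    return np.linalg.solve(A, b).reshape(N, N)
```

In practice one would assemble a sparse matrix; the dense version keeps the five-point stencil and the harmonic face-averaging easy to read.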

Parameterisation of permeability
We consider the following two parameterisations of the permeability function k(x, y):

F1: log permeability over the entire domain D, u(x, y) = log k(x, y);

F2: permeability over a domain D that has a channel,

k(x, y) = k_1(x, y) δ_{D^c}(x, y) + k_2(x, y) δ_{D∖D^c}(x, y),

as by Iglesias et al. (2014).
Here D^c denotes the channel, δ is the indicator function, and k_1 = exp(u_1(x, y)) and k_2 = exp(u_2(x, y)) denote the permeabilities inside and outside the channel, respectively. The geometry of the channel is parameterised by five parameters {d_i}_{i=1}^5: amplitude, frequency, angle, initial point and width, respectively. The lower boundary of the channel is given by y = d_1 sin(d_2 x / 6) + tan(d_3) x + d_4. The upper boundary of the channel is obtained by shifting the lower boundary upwards by the width d_5. These parameters are depicted in Fig. 1.
We assume that the log permeability for both F1 and F2 is drawn from a Gaussian distribution µ_0 = N(m, C) with mean m and covariance C. We define C via the Whittle-Matérn correlation function of Matérn (1986),

c(x, x′) = (2^{1-ν} / γ(ν)) (‖x - x′‖ / υ)^ν Υ_ν(‖x - x′‖ / υ),  with ν = 1,

where γ is the gamma function, υ = 0.5 is the characteristic length scale and Υ_1 is the modified Bessel function of the second kind of order 1.
With {λ_ℓ} and {V_ℓ} we denote the eigenvalues and eigenfunctions of the corresponding covariance matrix C, respectively. Then, following a Karhunen-Loève (KL) expansion, the log permeability is

u(x, y) = m + Σ_ℓ √λ_ℓ u_ℓ V_ℓ(x, y),  with u_ℓ ∼ N(0, 1).

For F1, the prior for log permeability is a Gaussian distribution with mean 5. The grid dimension is N = 70, and thus the uncertain parameter u = {u_ℓ}_{ℓ=1}^{N²} has dimension 4900. For F2, we assume the geometrical parameters d = {d_i}_{i=1}^5 are drawn from uniform priors. Furthermore, we assume independence between the geometric parameters and the log permeability. The prior for log permeability is a Gaussian distribution with mean 15 outside the channel and with mean 100 inside the channel. The grid dimension is N = 50, so the log-permeability fields together with the five geometrical parameters give an uncertain parameter of dimension 5005. Moreover, for F2 we use the Metropolis-within-Gibbs methodology following Iglesias et al. (2014) to separate the geometrical parameters and the log-permeability parameters within the mutation step, since it allows the structure of the prior to be better exploited.
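Sampling the log permeability via the KL expansion amounts to an eigendecomposition of C; the sketch below uses a simple exponential correlation (i.e. Matérn with smoothness 1/2) on a 1-D grid for illustration, with names of our own choosing:

```python
import numpy as np

def kl_sampler(m, C):
    """Return a function xi -> m + sum_l sqrt(lambda_l) * xi_l * V_l,
    the Karhunen-Loeve expansion with eigenpairs (lambda_l, V_l) of C."""
    lam, V = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None)   # guard against tiny negative eigenvalues
    A = V * np.sqrt(lam)            # scale each eigenvector column
    return lambda xi: m + A @ xi
```

Since the eigendecomposition is computed once, repeated draws cost only one matrix-vector product each, and truncating the expansion to the leading eigenpairs gives the usual dimension reduction.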

Observations
Both the true permeability and the initial ensemble are drawn from the same prior distribution, as the prior includes knowledge about geological properties. However, the initial guess is computed on a coarse grid, and the true solution is computed on a fine grid that has twice the resolution of the coarse grid. The synthetic observations of pressure are obtained by

y_obs = L(P_true) + η.

An element of L(P_true) is a linear functional of the pressure, namely

ℓ_j(P_true) = (1 / (2πσ²)) Σ_{i=1}^{N_f²} Δx² exp( -‖X_i - h_j‖² / (2σ²) ) P_true(X_i),

for j = 1, …, κ.
Here σ = 0.01, Δx² is the size of a grid cell centred at X_i = (X_i, Y_i), N_f is the resolution of the fine grid, h_j is the location of the observation and κ is the number of observations. This form of the observation functional and the parameterisations F1 and F2 guarantee the continuity of the forward map from the uncertain parameters to the observations and thus the existence of the posterior distribution, as shown by Iglesias et al. (2014). The observation noise η is drawn from a normal distribution with zero mean and known covariance matrix R. We choose the observation noise to be 2 % of the L²-norm of the true pressure. With such a small noise the likelihood is a peaked distribution. Therefore, a non-iterative data assimilation approach would require a computationally unfeasible number of ensemble members to sample the posterior.
To save computational costs, we choose an ESS threshold M thresh = M/3 for tempering and the length of the Markov chain τ max = 20 for mutation.

Metrics
We conduct numerical experiments with ensemble sizes M = 100 and M = 500 and 20 simulations with different initial ensemble realisations to check the robustness of the results. We analyse the methods' performance with respect to a pcn-MCMC solution, from here on referred to as the reference. The MCMC solution was obtained by combining 50 independent chains, each of length 10⁶, with a burn-in period of 10⁵ and a thinning factor of 10³. For log permeability, we compute the root mean square error (RMSE) of the mean,

RMSE = ( (1/n) Σ_{ℓ=1}^n ( ū_ℓ - u_ℓ^ref )² )^{1/2},     (18)

where ū is the ensemble mean and u^ref is the reference solution.
For the geometrical parameters d, we compute the Kullback-Leibler divergence

D_KL = Σ_{b=1}^{M_b} p^ref(d_i^b) log( p^ref(d_i^b) / p(d_i^b) ),     (19)

where p^ref(d_i) is the reference posterior, p(d_i) is approximated by the weights, both evaluated over histogram bins, and M_b = M/10 is the chosen number of bins.
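Both metrics are straightforward to compute from the ensemble; a sketch with illustrative names (the exact binning conventions in the paper may differ):

```python
import numpy as np

def rmse(u_mean, u_ref):
    """Root mean square error of the ensemble mean against the reference."""
    return np.sqrt(np.mean((u_mean - u_ref) ** 2))

def kl_divergence(d_ref, d, weights=None, n_bins=10):
    """Histogram-based KL divergence D(p_ref || p) between reference
    samples d_ref and (optionally weighted) ensemble samples d."""
    edges = np.histogram_bin_edges(np.concatenate([d_ref, d]), bins=n_bins)
    p_ref, _ = np.histogram(d_ref, bins=edges)
    p, _ = np.histogram(d, bins=edges, weights=weights)
    p_ref = p_ref / p_ref.sum()
    p = p / p.sum()
    mask = p_ref > 0
    # floor p to avoid division by zero in empty bins
    return np.sum(p_ref[mask] * np.log(p_ref[mask] / np.maximum(p[mask], 1e-12)))
```

Passing the importance weights to kl_divergence reproduces the weighted posterior approximation used for the TET(S)PF variants before resampling.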

Application to F1 inference
For F1, we perform numerical experiments using 36 uniformly distributed observations, which are displayed as circles in Fig. 3a. We plot a box plot of the RMSE given by Eq. (18) over 20 independent simulations in Fig. 2a using the Sinkhorn approximation and in Fig. 2b using optimal transport. The horizontal axis is for the hybrid parameter β, whose value 0 corresponds to the EKI and 1 to an adaptive SMC method with either a Sinkhorn approximation (TESPF) or optimal transport (TETPF). Ensemble size M = 100 is shown in red and M = 500 in green. First, we observe that at a small ensemble size M = 100 and a large β (namely β ≥ 0.6), the TESPF outperforms the TETPF, as the RMSE is lower. Since the Sinkhorn approximation is a regularisation of an optimal transport solution, the TESPF provides a smoother solution than the TETPF, as can be seen in Fig. 3c and f, respectively, where we plot the mean log permeability. Next, we see in Fig. 2 that the hybrid approach decreases the RMSE compared to the TET(S)PF: the smaller the β value, the smaller the median of the RMSE. The EKI gives the smallest error due to the Gaussian parameterisation of permeability. The advantage of the hybrid approach is most pronounced at a large ensemble size of M = 500 and optimal transport resampling. Furthermore, we note a discrepancy between the M = 100 and the M = 500 experiments at β = 0, i.e. for the pure EKI. This is related to the curse of dimensionality. It appears that the ensemble size M = 100 is too small to estimate an uncertain parameter of dimension 10³ using 36 accurate observations. However, at the ensemble size M = 500 the EKI alone (β = 0) gives an excellent performance compared to any combination (β > 0). We plot the mean log permeability at ensemble size M = 100 and the smallest RMSE over 20 simulations in Fig. 3b-f and of the reference in Fig. 3a. We see that, unlike the TET(S)PF and the TESPF(0.2), the EKI and the TETPF(0.2) estimate not only large-scale features but also small-scale features (e.g. the negative mean in the top right corner) well.

Application to F2 inference
For F2, we perform numerical experiments using nine uniformly distributed observations, which are displayed as circles in Fig. 9a. First, we display results obtained using the Sinkhorn approximation. In Fig. 4, we plot a box plot over 20 independent runs of the KL divergence given by Eq. (19) for the amplitude (a), frequency (b), angle (c), initial point (d) and width (e) that define the channel. We see that the EKI outperforms any TESPF(•), including the TESPF, for the amplitude (a) and the width (e). This is due to the Gaussian-like posteriors of these two geometrical parameters displayed in Fig. 6c and o, respectively. Due to the Gaussian-like posteriors, the hybrid approach decreases the KL divergence compared to the TESPF: the smaller the β value, the smaller the median of the KL divergence.
For the frequency, angle and initial point, whose KL divergence is displayed in Fig. 4b, c and d, respectively, the behaviour of the adaptive SMC is non-linear in terms of β. This is due to the non-Gaussian-like posteriors of these three geometrical parameters shown in Fig. 6f, i and l, respectively. Due to the non-Gaussian-like posteriors, the hybrid approach gives an advantage over both the TESPF and the EKI: there is a β ∈ (0, 1) for which the KL divergence is lowest, although it is inconsistent between the geometrical parameters.
When comparing the TESPF(•) to the TETPF(•), we observe the same type of behaviour in terms of β: linear for amplitude and width, whose KL divergence is displayed in Fig. 5a and e, respectively, and non-linear for frequency, angle and initial point, whose KL divergence is displayed in Fig. 5b, c and d, respectively. However, the KL divergence is smaller when optimal transport resampling is used instead of the Sinkhorn approximation.
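Equation (19) is likewise not reproduced in this excerpt. A common way to compare sampled posteriors of a scalar parameter against a reference posterior is a histogram-based KL-divergence estimate; the sketch below assumes that generic form and is not the paper's exact definition:

```python
import numpy as np

def kl_divergence_hist(samples, ref_samples, bins=20):
    """Histogram estimate of KL(approx || reference) for a scalar parameter.

    Both sample sets are binned on a shared grid; a small floor avoids
    log(0) in empty bins. This is an assumed, generic estimator.
    """
    samples = np.asarray(samples, dtype=float)
    ref_samples = np.asarray(ref_samples, dtype=float)
    lo = min(samples.min(), ref_samples.min())
    hi = max(samples.max(), ref_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(samples, bins=edges)
    q, _ = np.histogram(ref_samples, bins=edges)
    p = p + 1e-12
    q = q + 1e-12
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Identical sample sets give zero divergence, and any mismatch between the approximated and reference posteriors increases the value, which is the quantity summarised in the box plots.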
In Fig. 6, we plot the posteriors of the geometrical parameters: amplitude (a-c), frequency (d-f), angle (g-i), initial point (j-l) and width (m-o); on the left the TESPF(0.2) is shown, in the middle the TETPF(0.2) and on the right the EKI. In black is the reference, in red 20 simulations of ensemble size M = 100 and in green 20 simulations of ensemble size M = 500. The true parameters are shown by black crosses. We see that as the ensemble size increases, the posteriors approximated by the TET(S)PF converge to the reference posterior, unlike the EKI.

Now we investigate adaptive SMC performance for permeability estimation. First, we display results obtained using the Sinkhorn approximation. A box plot over 20 independent simulations of the RMSE given by Eq. (18) for the log permeability outside the channel is shown in Fig. 7a and inside the channel in Fig. 7b. Even though the log permeability is Gaussian-distributed, for a small ensemble size M = 100, there is a β ∈ (0, 1) that gives the lowest RMSE both outside and inside the channel. As the ensemble size increases, the methods' performance becomes equivalent.
Next, we compare the TESPF(•) to the TETPF(•) for log permeability estimation outside and inside the channel, whose RMSE is displayed in Fig. 8a and b, respectively. We observe the same type of behaviour in terms of β: non-linear for a small ensemble size M = 100 and equivalent for a larger ensemble size M = 500. Furthermore, at a small ensemble size M = 100, the TESPF outperforms the TETPF, as was the case for the F1 parameterisation (Sect. 3.4). In Fig. 9, we plot the mean log permeability over the channelised domain at ensemble size M = 100 and the smallest RMSE over 20 simulations for the TESPF(0.2) (b), TESPF (c), EKI (d), TETPF(0.2) (e) and TETPF (f), and for the reference in Fig. 9a. We see that the TESPF(0.2) does an excellent job at such a small ensemble size, estimating the log permeability outside and inside the channel well, as well as the parameters of the channel itself.

In particular, the TESPF outperforms the EKI for non-Gaussian distributed parameters (e.g. initial point and angle in F2). This makes the proposed method a promising option for the high dimensional non-linear problems one is typically faced with in reservoir engineering. Further, to counterbalance potential robustness problems of the TETPF and its Sinkhorn adaptation, a hybrid between the EKI and the TET(S)PF is proposed and studied by means of the two configurations of the steady-state single-phase Darcy flow model. The combination of the two adaptive SMC methods with complementing properties, i.e. β ∈ (0, 1), is superior to the individual adaptive SMC methods, i.e. β = 0 or 1, for all non-Gaussian distributed parameters and performs better than the pure TETPF and TESPF for the Gaussian distributed parameters in F1. This suggests that the hybrid approach has great potential to obtain robust and highly accurate approximate solutions of non-linear high dimensional Bayesian inference problems. Note that we have considered a synthetic case, where the truth is available, and thus chose β in terms of the accuracy of an estimate. However, in a realistic application the truth is not provided. In the context of state estimation with an underlying dynamical system, it has been suggested to adaptively change the hybrid parameter with respect to the effective sample size. As the tempering scheme is already changed according to the effective sample size, this ansatz would require the interplay between the two tuning variables to be defined. An ad hoc choice for β could be 0.2 or 0.3. This is motivated by the fact that the particle filter is too unstable in high dimensions, and it is therefore sensible to use a tuning parameter prioritising the EKI. The ad hoc choice is supported by the numerical results in Sect.
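The effective sample size mentioned above is the standard importance-sampling diagnostic. A minimal sketch is given below; the ESS-driven rule for β itself is not specified in the text, so the mapping shown is purely illustrative:

```python
import numpy as np

def effective_sample_size(w):
    # Standard diagnostic: ESS = 1 / sum_i w_i^2 for normalised weights,
    # ranging from 1 (fully degenerate) to M (uniform weights).
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def hybrid_beta_from_ess(w, low=0.2):
    # Hypothetical rule, not from the paper: when the ESS ratio is low
    # (weight degeneracy, particle filter unreliable), push beta towards
    # the EKI-prioritising ad hoc value; when weights are healthy, allow
    # more weight on the transport-based filter.
    M = len(w)
    ratio = effective_sample_size(w) / M
    return low + (1.0 - low) * ratio
```

Any such rule would still have to be reconciled with the tempering scheme, which already adapts to the effective sample size, as noted above.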

Figure 1. Geometrical configuration of channel flow: amplitude d1, frequency d2, angle d3, initial point d4 and width d5.

Figure 2. Application to F1 parameterisation, using the Sinkhorn approximation (a) and optimal transport resampling (b). Box plot over 20 independent simulations of the RMSE of mean log permeability. The horizontal axis is for the hybrid parameter, where β = 0 corresponds to the EKI and β = 1 to the TET(S)PF. The ensemble size M = 100 is shown in red and M = 500 in green. The central mark is the median, the edges of the box are the 25th and 75th percentiles, whiskers extend to the most extreme data points and crosses are outliers.

Figure 4. Application to F2 parameterisation using the Sinkhorn approximation. Box plot over 20 independent simulations of the KL divergence for the geometrical parameters: amplitude (a), frequency (b), angle (c), initial point (d) and width (e). The horizontal axis is for the hybrid parameter, where β = 0 corresponds to the EKI and β = 1 to the TET(S)PF. Ensemble size M = 100 is shown in red and M = 500 in green. The central mark is the median, the edges of the box are the 25th and 75th percentiles, whiskers extend to the most extreme data points and crosses are outliers.

Figure 5. The same as Fig. 4 but using optimal transport resampling.

Figure 6. Posteriors of the geometrical parameters for F2 inference: amplitude (a-c), frequency (d-f), angle (g-i), initial point (j-l) and width (m-o). On the left is the TESPF(0.2), in the middle the TETPF(0.2) and on the right the EKI. In black is the reference, in red 20 simulations of ensemble size M = 100 and in green 20 simulations of ensemble size M = 500. The true parameters are shown by black crosses.

Figure 7. Application to F2 parameterisation using the Sinkhorn approximation. Box plot over 20 independent simulations of the RMSE of mean log permeability outside the channel (a) and inside the channel (b). The horizontal axis is for the hybrid parameter, where β = 0 corresponds to the EKI and β = 1 to the TET(S)PF. The ensemble size M = 100 is shown in red and M = 500 in green. The central mark is the median, the edges of the box are the 25th and 75th percentiles, whiskers extend to the most extreme data points and crosses are outliers.

The Sinkhorn adaptation of the previously proposed TETPF, namely the TESPF, has been introduced and numerically investigated for a parameter estimation problem. The TESPF has similar accuracy to the TETPF (see Figs. 6, 7 and 8), while it can have considerably smaller computational complexity. Specifically, the TESPF has a complexity of O[T(MC + M²C(α) + τ_max MC)] and the TETPF of O[T(MC + M³ log M + τ_max MC)] (for a complete overview, see Table B1).
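The source of the complexity gap is the resampling step: exact optimal transport between the M weighted and M uniformly weighted particles costs O(M³ log M), whereas Sinkhorn scaling only needs O(M²) matrix-vector work per iteration. A minimal sketch of the generic entropy-regularised Sinkhorn iteration is given below, assuming the standard formulation (the paper's exact setup, stopping rule and choice of regularisation are not reproduced here; all names are illustrative):

```python
import numpy as np

def sinkhorn_coupling(w, C, eps=0.1, max_iter=10000, tol=1e-12):
    """Entropy-regularised OT coupling between particles with importance
    weights w (rows) and uniform target weights 1/M (columns).

    Each iteration is O(M^2): two matrix-vector products with the Gibbs
    kernel K, versus O(M^3 log M) for an exact OT solver.
    """
    M = len(w)
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost matrix
    u = np.ones(M)
    target = np.full(M, 1.0 / M)
    for _ in range(max_iter):
        v = target / (K.T @ u)            # scale to match column marginals
        u_next = w / (K @ v)              # scale to match row marginals
        if np.max(np.abs(u_next - u)) < tol:
            u = u_next
            break
        u = u_next
    return u[:, None] * K * v[None, :]    # approximate transport plan

rng = np.random.default_rng(0)
M = 5
X = rng.normal(size=(2, M))                             # d x M ensemble
w = rng.random(M); w /= w.sum()                         # importance weights
C = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. dists
C /= C.max()                                            # scale cost for stability
P = sinkhorn_coupling(w, C)
X_new = X @ (M * P)                       # deterministic transform step
```

Smaller ε drives the plan towards the exact optimal transport solution (the TETPF limit) at the cost of more iterations, which is the accuracy/cost trade-off discussed above.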