Articles | Volume 31, issue 3
Research article | 12 Jul 2024
https://doi.org/10.5194/npg-31-335-2024

Bridging classical data assimilation and optimal transport: the 3D-Var case

Marc Bocquet, Pierre J. Vanderbecken, Alban Farchi, Joffrey Dumont Le Brazidec, and Yelva Roustan
Abstract

Because optimal transport (OT) acts as displacement interpolation in physical space rather than as interpolation in value space, it can avoid double-penalty errors generated by mislocations of geophysical fields. As such, it provides a very attractive metric for non-negative, sharp field comparison – the Wasserstein distance – which could further be used in data assimilation (DA) for the geosciences. However, the algorithmic and numerical implementations of such a distance are not straightforward. Moreover, its theoretical formulation within typical DA problems faces conceptual challenges, resulting in scarce contributions on the topic in the literature.

We formulate the problem in a way that offers a unified view with respect to both classical DA and OT. The resulting OTDA framework accounts for both the classical source of prior errors, background and observation, and a Wasserstein barycentre in between states which are pre-images of the background state and observation vector. We show that the hybrid OTDA analysis can be decomposed as a simpler OTDA problem involving a single Wasserstein distance, followed by a Wasserstein barycentre problem that ignores the prior errors and can be seen as a McCann interpolant. We also propose a less enlightening but straightforward solution to the full OTDA problem, which includes the derivation of its analysis error covariance matrix. Thanks to these theoretical developments, we are able to extend the classical 3D-Var/BLUE (best linear unbiased estimator) paradigm at the core of most classical DA schemes. The resulting formalism is very flexible and can account for sparse, noisy observations and non-Gaussian error statistics. It is illustrated by simple one- and two-dimensional examples that show the richness of the new types of analysis offered by this unification.

1 Introduction

1.1 Data assimilation and the double-penalty issue

Geophysical data assimilation (DA) is a set of methods and algorithms at the intersection of Earth sciences, mathematics, and computer science that are designed to enhance our understanding and predictive capability with respect to the complex systems that govern our planet (Carrassi et al.2018). For example, these systems encompass the atmosphere, ocean, atmospheric chemistry and biogeochemistry, land surfaces, glaciology, hydrology, and climate system as a whole. DA is meant to optimally combine all sources of quantitative information, typically past and present observations, and numerical and statistical models of the system under consideration. DA is critical in forecasting chaotic geofluids by resetting the initial conditions of the flow, estimating physical and statistical parameters of the models, and providing a quantitative reanalysis of the past history of the climate system over decades. Because classical DA is applied to complex and high-dimensional dynamics, the DA algorithms often result from a compromise between the sophistication of the employed mathematical techniques and their numerical scalability and efficiency (Kalnay2003; Asch et al.2016; Evensen et al.2022). For instance, it is well-known that most DA methods are built around or from an update step – the analysis – where observations and background states are combined, an operation which often relies on Gaussian statistical assumptions.

Here, we would like to focus on one important issue that impacts classical DA, known as the double-penalty error in the geosciences. The double-penalty issue refers to the over-penalisation of errors in both the model and observational data (e.g. Amodei and Stein2009) and compromises the balance required for effective DA. It often stems from the mislocation of fields, which is caused by model error in either the forecasting or observation operator. A typical example is given by the slight mislocation of a plume of pollutant resulting in high predicted concentration values at positions where no pollutant is observed, while the model misses the observed concentration peaks (Farchi et al.2016). This mismatch is heavily penalised due to the use, over the same discretised space, of the root-mean-square error (RMSE) for a point-by-point comparison. Figure 1 shows an exemplar of double-penalty error resulting in the inability to properly evaluate a model and learn from an analysis increment. This double-penalty error, a very common contribution to the representation error (Janjić et al.2018), is ubiquitous in the geosciences, for example, in numerical weather prediction (in particular for water vapour), in atmospheric chemistry and air quality, in biogeochemistry, and in eddy-resolving ocean forecasting. This especially applies to sharp fields, whereas it may be of less relevance for smoother, larger-scale fields such as temperature.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f01

Figure 1 These two panels schematise the computation of the RMSE of two analysis increments. These increments are the difference between the truth (left mesh in between both pairs of the norm delimiters), concentrated here in the red grid cell, and the analysis, located in the green grid cells (right mesh in between both pairs of the norm delimiters). The increment in panel (a) is the outcome of a better analysis that is spatially closer to the truth, compared with that in panel (b); however, both increments yield the same RMSE. Hence, this verification metric is impacted by the double-penalty error and does not help discriminate location errors.


It has been recognised that, although it can handle amplitude and smoothness mismatch, the weighted Euclidean (Mahalanobis) distance cannot cope with mislocation error; thus, it cannot account for the full distortion between mismatched fields (Hoffman et al.1995). In the field of precipitation verification, one would alternatively speak of amplitude, structure, and location errors (e.g. Wernli et al.2008). Hence, even though tuning covariances of Gaussian error distributions as in classical DA, such as increasing the correlation length, might help mitigate the double-penalty error, it is insufficient. In Fig. 1, one might replace the Euclidean norm with a weighted Euclidean one with a large correlation length. This would yield similar norm values for both cases. Unfortunately, it is not difficult to show that, in this limit, this (almost singular) norm can only distinguish between the spatial means of both fields: it becomes blunt, with no discriminating power. With respect to DA, Feyeux et al. (2018, their Fig. 1) also illustrate why the Euclidean distance cannot properly cope with mislocation error. Note that, if Feyeux et al. (2018) had used a weighted Euclidean distance instead, with the same covariance matrix for the two contributions of the cost function, the resulting analysis state would have been the same and, in particular, independent of the covariance matrix. A similar but two-dimensional illustration is given by Vanderbecken et al. (2023, their Fig. 3).

1.2 Nonlocal verification metrics

The issue can be attributed to the use of a local verification metric, meaning that it compares, through the RMSE, values at the same site, i.e. within the same grid cell. Thus, this issue goes beyond DA and pertains to the use of local metrics.

To avoid being impacted by the double-penalty issue stemming from the use of local verification metrics, smarter nonlocal or multiscale metrics have been proposed. A typical metric of this kind consists of the combination of a displacement map followed by the use of a classical norm such as the RMSE (Hoffman et al.1995; Keil and Craig2009). In this vein, effective verification metrics can be based on optical flow-based warping or on deformed meshes, prior to using classical norms (Gilleland et al.2010a, b). These metrics can also be designed as scale-dependent and possibly multiscale, based on an empirical separation of scales, such as with fuzzy metrics (Ebert2008; Amodei and Stein2009) or wavelets (Briggs and Levine1997). They can be designed to grasp and quantify objects and features, such as lows and highs (Davis et al.2006a, b; Lack et al.2010). Metrics with similar capabilities (but not necessarily based on a displacement concept) have been introduced in computer vision, such as the structural similarity index (Zhou et al.2004), or in the verification of precipitation (Wernli et al.2008; Skok2023; Necker et al.2023).

One of the most elegant approaches is based on the theory of optimal transport (OT) and the associated Wasserstein distance, which sits on solid mathematical foundations and significant developments; these are the main reasons why we will focus on OT in the following. Examples of the application of OT to the verification of tracer and greenhouse gases models are given in Farchi et al. (2016) and Vanderbecken et al. (2023).

1.3 Optimal transport and the Wasserstein distance

Before mentioning applications of the Wasserstein distance in the field of geoscience, let us first give a very brief introduction to the concept and mathematical formulation of OT.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f02

Figure 2 Illustration of the earth mover problem introduced by Monge in 1781 (see the main text).


The OT concept stems from an engineering, although rather universal, problem. Gaspard Monge (Monge1781) considered the earth mover problem, the goal of which is to efficiently move rubble to an embankment of about the same volume (see Fig. 2). Each displacement of a bit of earth has a known cost, so that the goal is to find the cheapest deterministic map that completely moves the rubble to the embankment. In mathematical terms, the goal is to find the map of minimal cost that transports the origin measure ρo to the target measure ρb; measure here means that both of them are non-negative and integrable, with integral equal to 1. Note that the value 1 is arbitrary here and can be changed to m>0, provided that this is the mass of both ρo and ρb. The cost is defined by a non-negative function 𝒞bo of two variables (one for the origin space and the other for the target space). Let us assume a quadratic cost, defined for any couple of points (x,y) of a geometric domain Ω:

(1) $\mathcal{C}^{\mathrm{bo}}(x,y) = \| x - y \|_2^2,$

where ∥⋅∥2 is the Euclidean norm. Let us define the set of all admissible differentiable maps T that transport ρo to ρb:

(2) $\mathcal{U}^{\mathrm{bo}} = \left\{ T: \Omega \to \Omega,\ \rho^{\mathrm{o}} = \left|\partial_x T\right|\, \rho^{\mathrm{b}} \circ T \right\},$

where |∂xT| is the absolute value of the determinant of the Jacobian of T, a factor which accounts for the deformation of the measure by the globally mass-conserving T. The square of the Wasserstein distance WCbo is then defined by the following:

(3) $\mathcal{W}^2_{\mathcal{C}^{\mathrm{bo}}}\left(\rho^{\mathrm{o}},\rho^{\mathrm{b}}\right) = \min_{T \in \mathcal{U}^{\mathrm{bo}}} \int_\Omega \mathcal{C}^{\mathrm{bo}}\left(x, T(x)\right)\, \rho^{\mathrm{o}}(x)\, \mathrm{d}x.$

Here, the purpose is to minimise the total transport cost between ρo and ρb, and the optimal map T is often referred to as the Monge map. It can be shown that WCbo is indeed a proper mathematical distance. The mathematical formulation is deceptively simple, as it is elegant, concise, and easy to grasp, but its theoretical and numerical solutions are far from trivial.

In the 20th century, a breakthrough was made by Leonid Kantorovich, who promoted the Monge problem to a probabilistic formulation. From his point of view, a bit of earth can be split and moved to many sites of the target measure support. Thus, the deterministic map T is replaced with a probabilistic measure π defined over Ω×Ω. Hereafter, such a π is called a transference plan. An admissible transference plan is integrable and has ρo and ρb as one-variable marginals; hence, the definition of the admissible set is as follows:

(4) $\mathcal{V}^{\mathrm{bo}} = \left\{ \pi: \Omega\times\Omega \to \mathbb{R}_+,\ \rho^{\mathrm{o}}(x) = \int_\Omega \pi(x,y)\,\mathrm{d}y,\ \rho^{\mathrm{b}}(y) = \int_\Omega \pi(x,y)\,\mathrm{d}x \right\}.$

As opposed to the deterministic Monge maps, the transference plans offer a symmetrical view of the origin and target space and their measures. An illustration of a discrete transference plan is given in Fig. 3. From this view, the squared Wasserstein distance can be reformulated as follows:

(5) $\mathcal{W}^2_{\mathcal{C}^{\mathrm{bo}}}\left(\rho^{\mathrm{o}},\rho^{\mathrm{b}}\right) = \min_{\pi \in \mathcal{V}^{\mathrm{bo}}} \int_{\Omega\times\Omega} \mathcal{C}^{\mathrm{bo}}(x,y)\, \pi(x,y)\,\mathrm{d}x\,\mathrm{d}y.$

Equations (3) and (5) are the main continuous formulations of OT. In the rest of the paper, we will deal instead with discrete related formulations, which are more tangible and amenable to algorithmic and numerical implementations.

The field has attracted a lot of attention from pure and applied mathematicians as well as computer scientists. A complete introduction to the topic by experts can be found in the stimulating textbooks by Villani (2003, 2009) and Peyré and Cuturi (2019). Peyré and Cuturi (2019) provide concrete examples, numerical methods, and a broad coverage of the topic from the perspective of applied mathematicians and computer scientists; hence, their work will be referred to quite often in the rest of the paper.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f03

Figure 3 A representation of a discrete transport plan between two discrete origin (blue) and target (red) measures. The black dots represent the value of the transference plan. The radius of the dots is proportional to the values of these measures. This transference plan is checked to be admissible but is not necessarily optimal.


1.4 Nonlocal, multiscale metrics and data assimilation

Let us now go back to DA and narrow our focus to the use of advanced metrics in DA. Accounting for displacement error in DA, and hence relying on nonlocal verification metrics, has been advocated by Hoffman and Grassotti (1996), Ravela et al. (2007), and Plu (2013). Metrics built on a multiscale analysis of the fields to achieve a similar goal have been proposed by Ying (2019) and Ying et al. (2023).

The Wasserstein distance and closely related formulations have been advocated in the flow formulation of the analysis (DA update) to seamlessly transport the prior to the posterior (El Moselhy and Marzouk2012; Oliver2014; Marzouk et al.2017; Farchi and Bocquet2018; Tamang et al.2020). It can, for instance, be used to adjust the posterior discrete probability density functions (pdfs) in the particle filter. It has similarly been used to assist ensemble DA (Tamang et al.2021, 2022). Finally, it has also, very recently, been used to compare forecast ensembles for sub-seasonal prediction (Le Coz et al.2023; Lledó et al.2023) or precipitation (Duc and Sawada2024).

In the context of this paper, it is critical to be aware that the use of OT in practical DA has, thus far, focused on applying OT independently to the pdf of each single scalar variable. Quite often, OT is applied to the pdf of a single random variable for the following two reasons:

  • OT in one dimension (the space of the values taken by this random variable), with a quadratic cost, has a very simple solution that only relies on the cumulative distribution functions of the origin and target measures (see e.g. Remark 2.30 in Peyré and Cuturi2019), a technique known in statistics as quantile matching (a minimal sketch is given after this list).

  • Increasing the number of random variables is subject to the curse of dimensionality, necessitating an exponential increase in computational resources when increasing the resolution of the discretised fields.
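
To make the first point concrete, here is a minimal sketch (not taken from the paper) of quantile matching for two discrete one-dimensional measures: the squared 2-Wasserstein distance is obtained by comparing the two quantile functions, $\mathcal{W}_2^2 = \int_0^1 |F_1^{-1}(t) - F_2^{-1}(t)|^2\,\mathrm{d}t$. The positions, weights, and quadrature resolution below are illustrative assumptions.

```python
import numpy as np

def w2_squared_1d(x, p, y, q, n_quad=10_000):
    """Squared 2-Wasserstein distance between discrete 1D measures (x, p) and (y, q).

    x and y are positions sorted in ascending order; p and q are non-negative weights.
    The integral over quantiles is approximated with a midpoint rule on n_quad nodes.
    """
    p, q = p / p.sum(), q / q.sum()              # normalise to unit mass
    t = (np.arange(n_quad) + 0.5) / n_quad       # quadrature nodes in (0, 1)
    xq = x[np.searchsorted(np.cumsum(p), t)]     # quantile function of the first measure
    yq = y[np.searchsorted(np.cumsum(q), t)]     # quantile function of the second measure
    return np.mean((xq - yq) ** 2)

# Illustrative use: two shifted bumps on a regular grid of the unit interval
r = (np.arange(100) + 0.5) / 100
p = np.exp(-0.5 * ((r - 0.3) / 0.05) ** 2)
q = np.exp(-0.5 * ((r - 0.7) / 0.05) ** 2)
print(w2_squared_1d(r, p, r, q))                 # close to (0.7 - 0.3)**2 = 0.16
```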

This is very different from our context and objective where the objects dealt with by OT are (non-negative) physical field states, not the pdf of one of their scalar variables.

1.5 Feyeux et al. (2016) proposal

The present paper stands more in the wake of the seminal proposals of Ning et al. (2014), Feyeux (2016), and Feyeux et al. (2018). Their idea is to replace the local metrics of classical variational DA, typically the square of the Euclidean distance (hence related to the L2 norm), with the squared Wasserstein distance. This is intuitively what we are after in order to cope with mislocation errors mentioned in Sect. 1.1 in the context of DA. This should redefine the nature of the DA update step. Let us formalise this idea (Feyeux2016).

We will seize this opportunity to introduce some of our notation in the context of discrete DA, which is a widely adopted standpoint in the geosciences. Let us focus on a classical DA 3D-Var cost function (Daley1991):

(6) $\mathcal{G}_{\mathrm{cl}}\left(\mathbf{x}^{\mathrm{a}}\right) = \left\|\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{a}}\right\|_2^2 + \left\|\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{a}}\right\|_2^2,$

where $\|\cdot\|_2$ is the Euclidean norm, $\mathbf{y}^{\mathrm{b}} \in \mathbb{R}^{N_{\mathrm{b}}}$ is the first guess, $\mathbf{y}^{\mathrm{o}} \in \mathbb{R}^{N_{\mathrm{o}}}$ is the vector of observations, and $\mathbf{H}$ is the observation operator. $\mathbf{x}^{\mathrm{a}} \in \mathbb{R}^{N_{\mathrm{a}}}$ is the dummy variable of this optimisation problem whose optimal value corresponds to the DA state analysis. Now, the substitution of the Euclidean norm yields the new 3D-Var cost function:

(7) $\mathcal{G}_{\mathrm{w}}\left(\mathbf{x}^{\mathrm{a}}\right) = \mathcal{W}_2^2\left(\mathbf{y}^{\mathrm{b}}, \mathbf{x}^{\mathrm{a}}\right) + \mathcal{W}_2^2\left(\mathbf{y}^{\mathrm{o}}, \mathbf{H}\mathbf{x}^{\mathrm{a}}\right),$

where 𝒲2 is some discretisation of the Wasserstein distance based on the cost defined by the square of the Euclidean distance. Note that this 3D-Var case requires balancing two instances of a Wasserstein-based metric. The analysis state is known as a Wasserstein barycentre (abridged W-barycentre in the following).
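
For reference, and for later comparison with the classical analyses shown in the figures of Sect. 3, here is a minimal sketch (ours, not the authors' code) of the minimiser of the classical cost function in Eq. (6). With unit-weight Euclidean norms and a linear H given as a matrix, setting the gradient to zero yields the normal equations $(\mathbf{I} + \mathbf{H}^\top\mathbf{H})\,\mathbf{x}^{\mathrm{a}} = \mathbf{y}^{\mathrm{b}} + \mathbf{H}^\top\mathbf{y}^{\mathrm{o}}$.

```python
import numpy as np

def classical_3dvar(yb, yo, H):
    """Minimiser of Eq. (6): solve (I + H^T H) x_a = y_b + H^T y_o."""
    return np.linalg.solve(np.eye(yb.size) + H.T @ H, yb + H.T @ yo)
```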

Feyeux (2016) and Feyeux et al. (2018) explored the optimisation aspects of this DA problem. However, Feyeux (2016) ultimately pointed to a possible inconsistency in the definition of the DA problem formulated in Eq. (7), where the system is only partially observed (non-trivial H). In the case where the system is fully observed, typically when H is the identity operator, the outcome of the optimisation problem, i.e. the analysis, matches our expectations. However, when the system is partially observed, inconsistencies are observed. Let us see why.

Figure 4a considers the DA problem based on Eq. (7), assuming that only half of the domain is observed. We have solved the corresponding mathematical and numerical problem as raised by Feyeux (2016) and displayed its solution. However, one observes that the mass of the solution concentrates on the observed subdomain and neglects the rest of the domain where the prior mainly concentrates, an outcome suspected by Feyeux (2016). Instead, we would have intuitively preferred a solution close to the one offered by Fig. 4b, whose formulation and numerical solution differ and follow the new approach developed in the present paper (how we obtained this solution will be described in Sect. 2).

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f04

Figure 4 The panels illustrate the analysis of a 3D-Var case that relies on the Wasserstein distance rather than a local metric. The red dots represent the observations, while the dashed blue curve represents the background state. The observations are only focused on the left half of the domain. The solution of the optimisation problem in Eq. (7) is displayed as a solid green curve in panel (a). The solution of the optimisation problem that we will propose in this paper is displayed as a solid green curve in panel (b). The support of the observation is suggested using a wavy grey segment. These states are typically one-dimensional puff pollutant concentrations. They should not be confused with pdfs of a single random variable.


The main caveat of Eq. (7) comes from the fact that the system is only partially observed, as well as the requirement that OT is balanced; i.e. the origin and target densities must have the same mass. This mass balance applies to both OT terms in Eq. (7), between yb and xa and between yo and Hxa:

(8) $m\left(\mathbf{x}^{\mathrm{a}}\right) = m\left(\mathbf{y}^{\mathrm{b}}\right), \qquad m\left(\mathbf{H}\mathbf{x}^{\mathrm{a}}\right) = m\left(\mathbf{y}^{\mathrm{o}}\right).$

Here, the mass of a vector x∈ℝN is defined by

(9) $m(\mathbf{x}) = \mathbf{1}^\top \mathbf{x} = \sum_{i=1}^{N} x_i,$

with 1∈ℝN hereafter defined as the vector of entries 1.

Now, if we further assume, for simplicity, that yb and yo have the same mass (which is the case in Fig. 4), then

(10) $m\left(\mathbf{H}\mathbf{x}^{\mathrm{a}}\right) = m\left(\mathbf{y}^{\mathrm{o}}\right) = m\left(\mathbf{y}^{\mathrm{b}}\right) = m\left(\mathbf{x}^{\mathrm{a}}\right).$

As a result, we obtain m(Hxa)=m(xa), which is an undesired prior piece of information as to where to find the mass of xa. Simply put, unless the system is fully observed, this approach amounts to the streetlight effect. This is precisely what happens in Fig. 4a with the undesired concentration of the mass of xa close to the edge of the observed subdomain.

To overcome this caveat and find a proper alternative to Eq. (7), we need (i) to renounce comparing the fields in observation space (in the observation discrepancy term of the cost function) and (ii) to introduce unbalanced OT (i.e. we need to be able to accommodate states of distinct masses). In the computer science context of pure OT, the latter has been discussed by Chizat et al. (2018). However, our solution differs formally and will be DA-centric.

1.6 Objective and outline

The objective of this paper is to lift the objection of Feyeux (2016) and propose a DA framework based on the Wasserstein distance, thereby offering a consistent way to bridge OT and classical DA. The new formalism will be referred to as hybrid OTDA (hybrid optimal transport data assimilation) in the rest of this paper (or OTDA for brevity). We will focus on the definition of a 3D-Var DA problem and how to obtain its analysis state and the associated analysis error covariance matrix.

At least within the perimeter of this paper, some restrictions apply. Firstly, the physical fields considered in the DA problem are non-negative (concentration of tracer, pollutants, water vapour, hydrometeors, chemical and biogeochemical species, etc.). However, as opposed to Feyeux (2016), the methods of this paper do not require the (possibly noisy) background state yb and observation yo to be non-negative. We stress once again that the states of our DA problem are physical fields onto which OT is applied and are not meant to be a pdf of a random variable. Secondly, the observation operator H is assumed to be linear. This is only meant for convenience and to obtain a rigorous correspondence between the primal and dual cost functions of the 3D-Var case. Making this assumption is very common in geophysical DA: H can indeed be seen as the tangent linear of a nonlinear observation operator within the inner loop of a 3D-Var or a 4D-Var case (see, for instance, Courtier1997).

The outline of the paper is as follows. After the present introduction (Sect. 1), Sect. 2 discloses our main idea and discusses two mathematical paths to solve the underlying optimisation problem; the first path is enlightening but not necessarily practical, whereas the alternative path is direct and robust but hides some of the concepts behind it. Section 3 provides one- and two-dimensional illustrations of a 3D-Var analysis based on the new hybrid OTDA formalism. These illustrations will show the possibilities and flexibility of the new framework. Importantly, Sect. 3 will also depict classical DA as a limit case of the formalism. In Sect. 4, the second-order analysis, i.e. the uncertainty quantification of the OTDA 3D-Var case, is derived, discussed, and illustrated. Conclusions and perspectives are given in Sect. 5.

2 The main proposal

2.1 Notation and conventions

Non-negative vectors $\mathbf{x}$ of size $N$ are called discrete measures; they lie in the orthant $\mathcal{O}^+_N \stackrel{\Delta}{=} \mathbb{R}^N_+$. Although most mathematical OT theories work on normalised discrete measures, yielding probability vectors, this assumption will not be needed in this paper. The open subset of $\mathcal{O}^+_N$ of all the positive discrete measures will be denoted as $\mathcal{O}^{+,\ast}_N \stackrel{\Delta}{=} \mathbb{R}^N_{+,\ast}$.

We will distinguish the observations $\mathbf{y}^{\mathrm{b}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{b}}}$ and $\mathbf{y}^{\mathrm{o}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{o}}}$ from the observable states $\mathbf{x}^{\mathrm{b}} \in \mathcal{O}^+_{N_{\mathrm{b}}}$ and $\mathbf{x}^{\mathrm{o}} \in \mathcal{O}^+_{N_{\mathrm{o}}}$. $\mathbf{y}^{\mathrm{b}}$, which corresponds to the first guess of conventional DA, and $\mathbf{y}^{\mathrm{o}}$, which corresponds to the traditional observation vector, are known before solving the 3D-Var problem. These vectors embody all of the information processed in the analysis. By contrast, the observables $\mathbf{x}^{\mathrm{b}}$ and $\mathbf{x}^{\mathrm{o}}$, which are related to $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{y}^{\mathrm{o}}$, respectively, through an observation operator (the identity for $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{x}^{\mathrm{b}}$), are not known a priori. They will be estimated along with the analysis state $\mathbf{x}^{\mathrm{a}} \in \mathcal{O}^+_{N_{\mathrm{a}}}$. Note that these vectors may well lie in distinct vector spaces of different dimensions; hence the introduction of as many dimensions $N_{\mathrm{b}}$, $N_{\mathrm{o}}$, $\mathcal{N}_{\mathrm{b}}$, and $\mathcal{N}_{\mathrm{o}}$. $x^\bullet_i$ can be seen as the value taken by $\mathbf{x}^\bullet$ at site $r^\bullet_i$, for $\bullet = \mathrm{b}, \mathrm{o}$, and $\mathrm{a}$ and $i \in [\![1, N_\bullet]\!]$. Mind that the distinction between $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{x}^{\mathrm{b}}$ and the introduction of $\mathbf{x}^{\mathrm{o}}$ are novelties of OTDA compared with classical DA.

Like in classical DA, the vectors yb and yo are subject to (prior) errors whose statistics are specified by the likelihoods p(yb|xb) and p(yo|xo), respectively. Up to constants that do not depend on xb,xo,yb, and yo, we assume the existence of ζb and ζo, such that

(11a) $\ln p\left(\mathbf{y}^{\mathrm{b}}\,\middle|\,\mathbf{x}^{\mathrm{b}}\right) \stackrel{\Delta}{=} -\zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{b}}\right) + \mathrm{cst},$
(11b) $\ln p\left(\mathbf{y}^{\mathrm{o}}\,\middle|\,\mathbf{x}^{\mathrm{o}}\right) \stackrel{\Delta}{=} -\zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{o}}\right) + \mathrm{cst}.$

Thus, various error statistics can be considered. These errors are hypothesised to be mutually independent. The observation operator $\mathbf{H}: \mathcal{O}^+_{N_{\mathrm{o}}} \to \mathbb{R}^{\mathcal{N}_{\mathrm{o}}}$ used in the definition of $\zeta^{\mathrm{o}}$ is assumed to be linear. This qualification is for convenience and could be lifted if necessary. It is further assumed that $\zeta^{\mathrm{b}}$ and $\zeta^{\mathrm{o}}$ are strictly convex functions. This is, for instance, the case if we choose Gaussian error statistics yielding

(12) $\zeta^{\mathrm{b}}\left(\mathbf{e}^{\mathrm{b}}\right) = \frac{1}{2}\left\|\mathbf{e}^{\mathrm{b}}\right\|^2_{\mathbf{B}^{-1}}, \qquad \zeta^{\mathrm{o}}\left(\mathbf{e}^{\mathrm{o}}\right) = \frac{1}{2}\left\|\mathbf{e}^{\mathrm{o}}\right\|^2_{\mathbf{R}^{-1}}.$

Here, $\|\mathbf{e}\|^2_{\mathbf{A}} = \mathbf{e}^\top\mathbf{A}\,\mathbf{e}$. $\mathbf{B}$ is the positive definite background error covariance matrix and $\mathbf{R}$ is the positive definite observation error covariance matrix. Finally, in the following, the $m(\star)$ operator will act not only on vectors but also, more generally, on any tensor, and it will return the sum of all of its entries.

2.2 Formalism of discrete optimal transport

To discretise and solve the continuous Kantorovich optimisation problem introduced in Sect. 1.3, we will need two elementary pieces of information about OT. These are not the only techniques that we will leverage, but both represent cornerstones towards a numerical solution to our proposal and, hence, require a proper introduction.

2.2.1 The primal cost function

Let us consider two discrete measures $\mathbf{x}^{\mathrm{b}} \in \mathcal{O}^+_{N_{\mathrm{b}}}$ and $\mathbf{x}^{\mathrm{o}} \in \mathcal{O}^+_{N_{\mathrm{o}}}$ with the same mass:

(13) $m \stackrel{\Delta}{=} m\left(\mathbf{x}^{\mathrm{b}}\right) = m\left(\mathbf{x}^{\mathrm{o}}\right).$

For convenience, $\mathcal{O}^+_{\mathrm{b,o}}$ will be used as an alias for the set $\mathcal{O}^+_{N_{\mathrm{b}}\times N_{\mathrm{o}}}$. A cost matrix $\mathbf{C}^{\mathrm{bo}} \in \mathcal{O}^+_{\mathrm{b,o}}$ is given. The optimisation problem will be formulated using discrete Kantorovich transference plans $\mathbf{P}^{\mathrm{bo}} \in \mathcal{O}^+_{\mathrm{b,o}}$. The optimal discrete transference plan is given by the minimiser of the following optimisation problem:

(14a) $\mathcal{W}_{\mathbf{C}^{\mathrm{bo}}}\left(\mathbf{x}^{\mathrm{b}}, \mathbf{x}^{\mathrm{o}}\right) \stackrel{\Delta}{=} \min_{\mathbf{P}^{\mathrm{bo}} \in\, \mathcal{U}^{\mathrm{bo}}\left(\mathbf{x}^{\mathrm{b}}, \mathbf{x}^{\mathrm{o}}\right)} \operatorname{Tr}\left[\mathbf{C}^{\mathrm{bo}\top}\mathbf{P}^{\mathrm{bo}}\right].$

Here, the trace sums up the costs attached to each path, and the set of admissible transference plans is defined by

(14b) $\mathcal{U}^{\mathrm{bo}} \stackrel{\Delta}{=} \left\{ \mathbf{P} \in \mathcal{O}^+_{\mathrm{b,o}} :\ \mathbf{P}\,\mathbf{1}_{\mathrm{o}} = \mathbf{x}^{\mathrm{b}},\ \mathbf{P}^\top\mathbf{1}_{\mathrm{b}} = \mathbf{x}^{\mathrm{o}} \right\},$

which selects the discrete transference plans with the proper marginals. WCbo could be viewed as a discrete equivalent to the square of the Wasserstein distance WCbo2 introduced in Eq. (5).
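
As an aside, and only for small grids, Eq. (14) is a linear program and can be solved with a generic LP solver; the sketch below (an illustration, not the paper's method) assumes the two discrete measures have equal mass.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(xb, xo, C):
    """Solve the discrete Kantorovich problem of Eq. (14) by linear programming."""
    Nb, No = len(xb), len(xo)
    A_eq = np.zeros((Nb + No, Nb * No))
    for i in range(Nb):                          # row marginals: P 1_o = x^b
        A_eq[i, i * No:(i + 1) * No] = 1.0
    for j in range(No):                          # column marginals: P^T 1_b = x^o
        A_eq[Nb + j, j::No] = 1.0
    b_eq = np.concatenate([xb, xo])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    P = res.x.reshape(Nb, No)                    # optimal transference plan
    return P, float(np.sum(C * P))               # plan and transport cost W_C(x^b, x^o)

# Tiny example: three sites on a line, quadratic cost, equal total mass
xb = np.array([0.2, 0.5, 0.3])
xo = np.array([0.4, 0.1, 0.5])
C = (np.arange(3)[:, None] - np.arange(3)[None, :]) ** 2.0
print(discrete_ot(xb, xo, C)[1])
```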

2.2.2 Entropic regularisation

The optimisation problem in Eq. (14) is a linear program that is convex (Peyré and Cuturi2019, and references therein). However, it is not generally strictly convex and, hence, does not necessarily exhibit a single minimum. Adding to the difficulty, its cost function (Eq. 14a) is constrained. Entropic regularisation addresses these issues and is used here to lift the constraints and to render the problem strictly convex. In particular, it will force any state vector that is a solution of the problem to be positive. A comprehensive justification is given by Peyré and Cuturi (2019). More precisely, we will use a Kullback–Leibler divergence (KL) regularisation term that is inserted in Eq. (14a),

(15) $\operatorname{Tr}\left[\mathbf{C}^{\mathrm{bo}\top}\mathbf{P}^{\mathrm{bo}}\right] \;\longrightarrow\; \operatorname{Tr}\left[\mathbf{C}^{\mathrm{bo}\top}\mathbf{P}^{\mathrm{bo}}\right] + \varepsilon\,\mathcal{K}\left(\mathbf{P}^{\mathrm{bo}}\,\middle|\,\boldsymbol{\nu}^{\mathrm{bo}}\right),$

which incorporates some prior transference plan νbo and does not require m(Pbo)=1, whereas Peyré and Cuturi (2019) opted for a basic entropy term. The KL term (Boyd and Vandenberghe2004) is defined by

(16) $\mathcal{K}(\mathbf{p}\,|\,\mathbf{q}) \stackrel{\Delta}{=} \sum_i \left[ p_i \ln\frac{p_i}{q_i} - p_i + q_i \right].$

It can be checked that the Hessian of the regularised cost function (Eq. 15) is a diagonal matrix of coefficients $\varepsilon/P^{\mathrm{bo}}_{ij} \ge \varepsilon$ because $0 \le P^{\mathrm{bo}}_{ij} \le 1$, making the problem $\varepsilon$-strongly convex. We choose, for example, $\boldsymbol{\nu}^{\mathrm{bo}} = \mathbf{x}^{\mathrm{b}}\left(\mathbf{x}^{\mathrm{o}}\right)^\top/m$ and $\varepsilon > 0$, the latter of which is the regularisation scalar parameter. Note that this particular $\boldsymbol{\nu}^{\mathrm{bo}}$ is an admissible transference plan, i.e. it belongs to $\mathcal{U}^{\mathrm{bo}}$, and can be interpreted as a complete statistical decoupling of the transference plan with respect to the origin and target discrete measures. In the limit $\varepsilon \to 0^+$ of vanishing regularisation, the solution should not depend on the choice of $\boldsymbol{\nu}^{\mathrm{bo}}$. However, the convergence to the solution at finite $\varepsilon$ may depend on this choice. The primal cost function augmented with such an entropic regularisation is usually solved numerically with the iterative Sinkhorn algorithm (Sinkhorn1964). However, this is not the path followed in this paper, although we have used it as well.
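
For completeness, the following is a minimal sketch of the Sinkhorn iteration for the regularised problem of Eqs. (14)-(15), given as an illustration rather than the implementation used in this paper. The grid, the prior plan, and the stopping criterion are assumptions of the sketch, and a log-domain variant is preferable for small $\varepsilon$.

```python
import numpy as np

def sinkhorn(xb, xo, C, eps=1e-2, n_iter=5000, tol=1e-10):
    """Entropically regularised OT plan with marginals P 1 = xb and P^T 1 = xo."""
    nu = np.outer(xb, xo) / xb.sum()          # prior plan nu^bo = x^b (x^o)^T / m, as in the text
    K = nu * np.exp(-C / eps)                 # kernel combining the prior plan and the cost
    a, b = np.ones_like(xb), np.ones_like(xo)
    for _ in range(n_iter):
        a = xb / (K @ b)                      # enforce the row marginal
        b = xo / (K.T @ a)                    # enforce the column marginal
        P = a[:, None] * K * b[None, :]
        if np.abs(P.sum(axis=1) - xb).max() < tol:
            break
    return P

# Illustrative use: two bumps of equal (unit) mass on a 1D grid of 100 cells
r = (np.arange(100) + 0.5) / 100
xb = np.exp(-0.5 * ((r - 0.3) / 0.05) ** 2); xb /= xb.sum()
xo = np.exp(-0.5 * ((r - 0.7) / 0.05) ** 2); xo /= xo.sum()
C = (r[:, None] - r[None, :]) ** 2
P = sinkhorn(xb, xo, C)
print("regularised transport cost:", float(np.sum(C * P)))  # near (0.7 - 0.3)**2 = 0.16
```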

Finally note that the technique to convexify such an optimisation problem with a KL term has been introduced in DA by Bocquet (2009) and Bocquet et al. (2011) following principles of statistical physics.

2.3 From classical data assimilation to hybrid optimal transport data assimilation

Figure 5 is a schematic representation of the flow of information in a classical DA update (and in particular 3D-Var), using the notation introduced above. In this case, the observables xb and xo and the analysis state xa are the same by construction; hence, xb and xo are not needed. This diagram, which could also be seen as a Bayesian network, corresponds to the cost function

(17) $\mathcal{L}_{\mathrm{cl}}\left(\mathbf{x}^{\mathrm{a}}\right) = \zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{a}}\right) + \zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{a}}\right)$

to be minimised over xa. Now let us make use of the observables xb and xo as new degrees of freedom but bind them by OTs to xa, using the cost matrices Cba and Coa, respectively.

This yields the diagram in Fig. 6, which corresponds to the cost function

(18) $\mathcal{L}_{\mathrm{w}}\left(\mathbf{x}^{\mathrm{a}}\right) = \min_{\mathbf{x}^{\mathrm{b}} \in \mathcal{O}^+_{N_{\mathrm{b}}},\ \mathbf{x}^{\mathrm{o}} \in \mathcal{O}^+_{N_{\mathrm{o}}}} \left\{ \zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{b}}\right) + \zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{o}}\right) + \mathcal{W}_{\mathbf{C}^{\mathrm{ba}}}\left(\mathbf{x}^{\mathrm{b}}, \mathbf{x}^{\mathrm{a}}\right) + \mathcal{W}_{\mathbf{C}^{\mathrm{oa}}}\left(\mathbf{x}^{\mathrm{o}}, \mathbf{x}^{\mathrm{a}}\right) \right\}.$

It must be minimised over xa, yielding an analysis state xa; this analysis state can also be seen as the W-barycentre between xb and xo. Note that xb and xo are discrete measures of unknown mass. For the optimisation problem, they lie in ONb+ and ONo+, respectively.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f05

Figure 5 A diagrammatic representation of the classical 3D-Var update, with the observations yb (the first guess) and yo (the observation vector), the analysis state xa, and the observed analysis Hxa. A double-line arrow represents a deterministic map, whereas a single-line arrow represents a statistical binding between the origin and the target.


https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f06

Figure 6 A diagrammatic representation of the hybrid OTDA 3D-Var update, with the observations yb (the first guess) and yo (the observation vector); the observables xb and xo; and xa, which is the W-barycentre. A double-line arrow represents a deterministic map, a single-line arrow represents a statistical binding between the origin and the target, and a wavy line represents the weaker bindings of xb with xa and xo with xa through OTs. This diagram can be seen as an unfolding of that in Fig. 5.


Moving from Eq. (17) to Eq. (18) following the principles and guidance of the introductory Sect. 1.4 is empirical, although no more than in Ning et al. (2014) and Feyeux (2016). Showing the merits of this move from Eq. (17) to Eq. (18) is the goal of the present paper. As opposed to Feyeux et al. (2018), it can deal with sparse and noisy observations, i.e. non-trivial H. We will show that classical DA is embedded in this generalisation. Moreover, the merits of the new cost function will be a posteriori qualitatively supported by the outcome of the numerical experiments (to the expert's eyes), which improve over previous formalism's outcomes. We would like to point out that we have also developed a consistent probabilistic and Bayesian formalism fully supporting the introduction of Eq. (18). However, we felt that the derivation was too long and technical for this paper and would not be helpful in the exploration of the direct consequences of Eq. (18).

We call Eq. (18) a high-level primal cost function because the metrics WCba and WCoa have not yet been replaced by their transference plan expression, as opposed to, for example, Eq. (14a). Passing to a lower-level primal cost function would require expanding Eq. (18) using Eq. (14a) twice.

In the subsequent two subsections, we will investigate two pathways to solve the optimisation problem in Eq. (18). The first path (Sect. 2.4) unveils some of the key concepts behind its solution and partially disentangles the classical DA part from the W-barycentre part of the full analysis. This approach is enlightening but not necessarily practical. The second path is an alternative which is direct and robust but hides some of the fundamental principles underlying the solution. The busy reader could skip directly to the latter (i.e. Sect. 2.5).

2.4 Decomposition of the optimisation problem and effective cost metric

In this section, key ideas behind the minimisation of Eq. (18) are sketched and discussed. The level of mathematical rigour of this section is that of casual methodological DA in the geoscience literature. However, we stress that all of the algorithms discussed here have been successfully numerically tested on various configurations. The solution of Eq. (18) presented in this section is not necessarily robust, but it is enlightening and, hence, worth discussing.

Repeated contravariant indices – meaning the same tensor index is present as the upper and lower index – in tensor expressions will be understood as summed over, following Einstein's convention.

2.4.1 Dual formulation of the primal problem

One way, although not the only one, to write the explicit primal problem associated with Eq. (18) is through the use of a gluing transference plan $\mathbf{P}^{\mathrm{boa}} \in \mathcal{O}^+_{\mathrm{b,o,a}}$, where $\mathcal{O}^+_{\mathrm{b,o,a}} = \mathbb{R}^{N_{\mathrm{b}} N_{\mathrm{o}} N_{\mathrm{a}}}_+$ (see pp. 11–12 of Villani, 2009). $\mathbf{P}^{\mathrm{boa}} \in \mathcal{O}^+_{\mathrm{b,o,a}}$ is a 3-tensor whose marginals are $\mathbf{x}^{\mathrm{b}}$, $\mathbf{x}^{\mathrm{o}}$, and $\mathbf{x}^{\mathrm{a}}$ and that glues the transference plans $\mathbf{P}^{\mathrm{ba}}$ between $\mathbf{x}^{\mathrm{b}}$ and $\mathbf{x}^{\mathrm{a}}$ and $\mathbf{P}^{\mathrm{oa}}$ between $\mathbf{x}^{\mathrm{o}}$ and $\mathbf{x}^{\mathrm{a}}$:

(19a) $\mathcal{L}^\star = \min_{\mathbf{x}^{\mathrm{a}} \in \mathcal{O}^+_{\mathrm{a}}} \mathcal{L}_{\mathrm{w}}\left(\mathbf{x}^{\mathrm{a}}\right)$
(19b) $\phantom{\mathcal{L}^\star} = \min_{\mathbf{x}^{\mathrm{b}} \in \mathcal{O}^+_{\mathrm{b}},\ \mathbf{x}^{\mathrm{o}} \in \mathcal{O}^+_{\mathrm{o}},\ \mathbf{x}^{\mathrm{a}} \in \mathcal{O}^+_{\mathrm{a}}} \left[ \zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{b}}\right) + \zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{o}}\right) + \min_{\mathbf{P} \in \mathcal{U}^{\mathrm{boa}}} \left\{ P_{ijk}\,C^{\mathrm{ba}}_{ik} + P_{ijk}\,C^{\mathrm{oa}}_{jk} \right\} \right].$

Here, the admissible set of (glued) transference plans, the set of all 3-tensors of non-negative entries whose marginals are xb, xo, and xa, is defined by

(19c) $\mathcal{U}^{\mathrm{boa}} \stackrel{\Delta}{=} \left\{ \mathbf{P} \in \mathcal{O}^+_{\mathrm{b,o,a}} :\ \forall i,\ P_{ijk}\,\mathbf{1}^j_{\mathrm{o}}\,\mathbf{1}^k_{\mathrm{a}} = x^{\mathrm{b}}_i,\ \forall j,\ P_{ijk}\,\mathbf{1}^i_{\mathrm{b}}\,\mathbf{1}^k_{\mathrm{a}} = x^{\mathrm{o}}_j,\ \forall k,\ P_{ijk}\,\mathbf{1}^i_{\mathrm{b}}\,\mathbf{1}^j_{\mathrm{o}} = x^{\mathrm{a}}_k \right\}.$

Due to the hardly scalable dimensionality of the primal problem, based on either a 3-tensor or a couple of 2-tensors, we wish to derive a dual problem equivalent to the primal one, using Lagrange multipliers to lift the constraints with (as will be checked later) a significantly smaller dimensionality.

This leads to a series of transformations of the problem $\mathcal{L}^\star$, from a Lagrangian to a dual cost function, which is reported in Appendix A for the mathematically inclined reader. The outcome is a dual problem which reads

(20a) $\mathcal{L}^\star = \max_{\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}\right) \in\, \mathcal{U}^{\mathrm{bo}}_\star\left(\mathbf{C}^{\mathrm{ba}}, \mathbf{C}^{\mathrm{oa}}, \mathbf{H}\right)} \left\{ \mathbf{f}^{\mathrm{b}}\cdot\mathbf{y}^{\mathrm{b}} + \mathbf{f}^{\mathrm{o}}\cdot\mathbf{y}^{\mathrm{o}} - \zeta^{\mathrm{b}}_\star\left(\mathbf{f}^{\mathrm{b}}\right) - \zeta^{\mathrm{o}}_\star\left(\mathbf{f}^{\mathrm{o}}\right) \right\},$

where the symbol $\star$ refers to the dual and where the polyhedron $\mathcal{U}^{\mathrm{bo}}_\star\left(\mathbf{C}^{\mathrm{ba}}, \mathbf{C}^{\mathrm{oa}}, \mathbf{H}\right)$ is defined by

(20b) $\mathcal{U}^{\mathrm{bo}}_\star\left(\mathbf{C}^{\mathrm{ba}}, \mathbf{C}^{\mathrm{oa}}, \mathbf{H}\right) \stackrel{\Delta}{=} \left\{ \mathbf{f}^{\mathrm{b}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{b}}},\ \mathbf{f}^{\mathrm{o}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{o}}} :\ \forall i,j,k,\ f^{\mathrm{b}}_i + f^{\mathrm{o}}_l H_{lj} \le C^{\mathrm{ba}}_{ik} + C^{\mathrm{oa}}_{jk} \right\}.$

In Eq. (20), the maps $\zeta^{\mathrm{b}}_\star$ and $\zeta^{\mathrm{o}}_\star$ are the Legendre–Fenchel transforms of the maps $\zeta^{\mathrm{b}}$ and $\zeta^{\mathrm{o}}$, respectively. Let us recall that the Legendre–Fenchel transform $\mathbf{f} \mapsto \zeta_\star(\mathbf{f})$ of the map $\mathbf{e} \mapsto \zeta(\mathbf{e})$ is defined by $\zeta_\star(\mathbf{f}) = \sup_{\mathbf{e}}\left\{\mathbf{f}\cdot\mathbf{e} - \zeta(\mathbf{e})\right\}$. For instance, in the case of Gaussian error statistics (as in Eq. 12), these transforms are given by

(21) $\zeta^{\mathrm{b}}_\star\left(\mathbf{f}^{\mathrm{b}}\right) = \frac{1}{2}\left\|\mathbf{f}^{\mathrm{b}}\right\|^2_{\mathbf{B}}, \qquad \zeta^{\mathrm{o}}_\star\left(\mathbf{f}^{\mathrm{o}}\right) = \frac{1}{2}\left\|\mathbf{f}^{\mathrm{o}}\right\|^2_{\mathbf{R}}.$

Note that, in this section, we do not add the entropic regularisation to the cost functions for the sake of conciseness and because it does not play a role in the key ideas developed in this section; however, it would likely be added and employed in numerical applications.

2.4.2 Decomposition of the dual problem

These transformations allow us to trade the primal for the dual problem. Due to the fact that there are $N_{\mathrm{a}}$ constraints indexed by $k \in [\![1, N_{\mathrm{a}}]\!]$ for each $\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}\right)$ pair in $\mathcal{U}^{\mathrm{bo}}_\star$ and that the tightest of these constraints can account for the others, the problem in Eq. (20) should be equivalent to

(22a) $\mathcal{L}^\star = \max_{\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}\right) \in\, \mathcal{U}^{\mathrm{bo}}_\star\left(\mathbf{C}^{\mathrm{bo}}, \mathbf{H}\right)} \left\{ \mathbf{f}^{\mathrm{b}}\cdot\mathbf{y}^{\mathrm{b}} + \mathbf{f}^{\mathrm{o}}\cdot\mathbf{y}^{\mathrm{o}} - \zeta^{\mathrm{b}}_\star\left(\mathbf{f}^{\mathrm{b}}\right) - \zeta^{\mathrm{o}}_\star\left(\mathbf{f}^{\mathrm{o}}\right) \right\},$

where the polyhedron Ubo(Cbo,H) is defined by

(22b) $\mathcal{U}^{\mathrm{bo}}_\star\left(\mathbf{C}^{\mathrm{bo}}, \mathbf{H}\right) \stackrel{\Delta}{=} \left\{ \mathbf{f}^{\mathrm{b}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{b}}},\ \mathbf{f}^{\mathrm{o}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{o}}} :\ \forall i,j,\ f^{\mathrm{b}}_i + f^{\mathrm{o}}_l H_{lj} \le C^{\mathrm{bo}}_{ij} \right\}$

and the effective cost metric Cbo is given (in the absence of entropic regularisation) by

(22c) $\left[\mathbf{C}^{\mathrm{bo}}\right]_{ij} \stackrel{\Delta}{=} \min_k \left\{ \left[\mathbf{C}^{\mathrm{ba}}\right]_{ik} + \left[\mathbf{C}^{\mathrm{oa}}\right]_{jk} \right\}.$

According to Eq. (22c), this effective cost is given by the cost of the cheapest path(s), which is intuitive. The optimal gluing transference plan $\mathbf{P}$ can be connected to the optimal transference plan $\mathbf{P}^{\mathrm{bo}}$ between $\mathbf{x}^{\mathrm{b}}$ and $\mathbf{x}^{\mathrm{o}}$ with the cost $\mathbf{C}^{\mathrm{bo}}$ in Eq. (22c), by marginalising over the intermediate density, i.e. the W-barycentre,

(23) $P^{\mathrm{bo}}_{ij} = P_{ijk}\,\mathbf{1}^k_{\mathrm{a}}.$

The solution for the analysis state xa is given by

(24) $x^{\mathrm{a}}_k = P_{ijk}\,\mathbf{1}^i_{\mathrm{b}}\,\mathbf{1}^j_{\mathrm{o}},$

by the definition of the marginals of the gluing transference plan P (Eq. 19c). However, we do not have direct access to the optimal gluing P from the dual problem (Eq. 22). This will be made simpler later on when adding the entropic regularisation to the problem.

For now, let us find an alternative solution bypassing the need for the gluing P and define the map

(25) $\kappa^{\mathrm{bo}}:\ [\![1,N_{\mathrm{b}}]\!]\times[\![1,N_{\mathrm{o}}]\!] \to \mathcal{P}\left([\![1,N_{\mathrm{a}}]\!]\right), \qquad (i,j) \mapsto \kappa^{\mathrm{bo}}_{ij} = \operatorname*{arg\,min}_k \left\{ C^{\mathrm{ba}}_{ik} + C^{\mathrm{oa}}_{jk} \right\},$

where 𝒫(S) is defined as the set of all subsets of S. The set κijbo lists all of the indices k that are relays to the transport in between the sites corresponding to index i and index j. That is why the W-barycentre can be obtained from Pbo:

(26) $x^{\mathrm{a}}_k = P_{ijk}\,\mathbf{1}^i_{\mathrm{b}}\,\mathbf{1}^j_{\mathrm{o}} = \sum_{ij} P^{\mathrm{bo}}_{ij}\,\delta_{k \in \kappa^{\mathrm{bo}}_{ij}}.$

In the next section, we will show how to estimate Pbo using entropic regularisation and, hence, leverage Eq. (26) to compute xa. κbo is reminiscent of the so-called McCann interpolant in OT theory, as it is only related to the OT between xb and xo, bypassing xa and, hence, the transference plan Pbo. Please refer to Remark 7.1 by Peyré and Cuturi (2019) and to Gangbo and McCann (1996) for a description of the McCann interpolant, even when there is no Monge map. This suggests that the analysis xa is not an interpolation of xb and xo in the space of values, as for classical DA, but along a geodesic in a Riemannian space built on a metric derived from the Wasserstein distance.

Nonetheless, the above derivation shows that we can trade a W-barycentre problem characterised by a couple of OT problems for a single OT problem defined by an effective metric Cbo. This principle is schematically illustrated in Fig. 7.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f07

Figure 7 Trading a full hybrid OTDA problem, characterised by a W-barycentre defined by the cost metrics Cba and Coa, with a simplified hybrid OTDA problem, characterised by a single OT problem defined by an effective cost metric Cbo.


This suggests a simpler two-step algorithm, where the steps consist of the following: (i) solving a hybrid OTDA problem but with a single OT problem under an effective cost metric, which yields the analysed observables xb and xo, and (ii) computing the W-barycentre of xb and xo. To avoid making an overly large detour, the derivation of this algorithm is presented in Appendix B.
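
Numerically, for moderate grid sizes, the effective cost metric of Eq. (22c), which underpins step (i), can be evaluated by brute force, as in the sketch below (assuming one-dimensional grids of 100 cells and quadratic costs; these choices are illustrative only).

```python
import numpy as np

Nb = No = Na = 100
r = (np.arange(Na) + 0.5) / Na                   # shared 1D grid of cell centres
C_ba = (r[:, None] - r[None, :]) ** 2            # [C^ba]_ik between background and analysis sites
C_oa = (r[:, None] - r[None, :]) ** 2            # [C^oa]_jk between observation and analysis sites
# Eq. (22c): for each pair (i, j), the cheapest relay through an analysis site k
C_bo = (C_ba[:, None, :] + C_oa[None, :, :]).min(axis=2)   # shape (Nb, No)
```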

2.4.3 Classical data assimilation as a particular case

The primal problem (Eq. 17) of classical DA reads as follows:

(27) $\mathcal{L}^\star_{\mathrm{cl}} = \min_{\mathbf{x}^{\mathrm{a}} \in \mathcal{O}^+_{N_{\mathrm{a}}}} \left\{ \zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{a}}\right) + \zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{a}}\right) \right\}.$

Let us see how the OTDA formalism in Eq. (22) can account for classical DA. In the context of classical DA, the observable spaces for xb, xo, and xa are assumed to be the same by construction. Let us then define the cost matrices

(28) $C^{\mathrm{ba}}_{ij} \stackrel{\Delta}{=} C^{\mathrm{oa}}_{ij} \stackrel{\Delta}{=} \begin{cases} 0 & \text{if } i = j, \\ +\infty & \text{if } i \ne j; \end{cases}$

i.e. it is assumed that the cost of moving masses is as large as it can be. Looking back at Eq. (19) but with these costs, it is clear that, in order to avoid the primal cost function being $+\infty$, the transference plan entry $P_{ijk}$ must always be 0 unless $i=j=k$. However, this implies, from the definition of $\mathcal{U}^{\mathrm{boa}}$, that the observables coincide, $\mathbf{x}^{\mathrm{b}} = \mathbf{x}^{\mathrm{o}} = \mathbf{x}^{\mathrm{a}}$, and that their mass is given by $m(\mathbf{P})$. In this limit of the specific cost matrices $\mathbf{C}^{\mathrm{ba}}$ and $\mathbf{C}^{\mathrm{oa}}$ defined in Eq. (28), the OTDA primal problem becomes mathematically equivalent to classical DA. Hence, classical DA is a limit case of OTDA. Note that, from its definition (Eq. 22c), the effective cost $\mathbf{C}^{\mathrm{bo}}$ obtained from $\mathbf{C}^{\mathrm{ba}}$ and $\mathbf{C}^{\mathrm{oa}}$ coincides with them, $\mathbf{C}^{\mathrm{bo}} \stackrel{\Delta}{=} \mathbf{C}^{\mathrm{ba}} = \mathbf{C}^{\mathrm{oa}}$.

2.5 A direct algorithmic solution

The two-step approach of Sect. 2.4 has merit in that it connects to the traditional W-barycentre problem, by first estimating xb and xo, and later computes the W-barycentre in between both states. It also suggests the existence of the effective cost metric of the problem. However, going through its consecutive steps may not be necessary for pure computational purposes. Here, we describe a direct approach that yields the analysis of the OTDA problem. It is less enlightening, but it is practical and will be used in the subsequent illustrations of the present paper.

An alternative formulation to the primal problem in Eq. (19) relies on two transference plans Pba and Poa corresponding to the two transports of the underlying W-barycentre problem, instead of the gluing one. Moreover, entropic regularisation is now enforced via 𝒦(Pba|νba) and 𝒦(Poa|νoa). The corresponding optimisation problem reads as follows:

(29a) $\mathcal{L}^\star = \min_{\mathbf{x}^{\mathrm{b}} \in \mathcal{O}^+_{\mathrm{b}},\ \mathbf{x}^{\mathrm{o}} \in \mathcal{O}^+_{\mathrm{o}},\ \mathbf{x}^{\mathrm{a}} \in \mathcal{O}^+_{\mathrm{a}}} \bigg[ \zeta^{\mathrm{b}}\left(\mathbf{y}^{\mathrm{b}} - \mathbf{x}^{\mathrm{b}}\right) + \zeta^{\mathrm{o}}\left(\mathbf{y}^{\mathrm{o}} - \mathbf{H}\mathbf{x}^{\mathrm{o}}\right) + \min_{\mathbf{P}^{\mathrm{ba}} \in \mathcal{U}^{\mathrm{ba}},\ \mathbf{P}^{\mathrm{oa}} \in \mathcal{U}^{\mathrm{oa}}} \left\{ \varepsilon\,\mathcal{K}\left(\mathbf{P}^{\mathrm{ba}}\,\middle|\,\boldsymbol{\nu}^{\mathrm{ba}}\right) + \varepsilon\,\mathcal{K}\left(\mathbf{P}^{\mathrm{oa}}\,\middle|\,\boldsymbol{\nu}^{\mathrm{oa}}\right) + P^{\mathrm{ba}}_{ik}\,C^{\mathrm{ba}}_{ik} + P^{\mathrm{oa}}_{jk}\,C^{\mathrm{oa}}_{jk} \right\} \bigg],$

where the admissible sets of the respective transference plans Pba and Poa are defined by

(29b) $\mathcal{U}^{\mathrm{ba}} \stackrel{\Delta}{=} \left\{ \mathbf{P} \in \mathcal{O}^+_{\mathrm{b,a}} :\ \mathbf{P}\,\mathbf{1}_{\mathrm{a}} = \mathbf{x}^{\mathrm{b}},\ \mathbf{P}^\top\mathbf{1}_{\mathrm{b}} = \mathbf{x}^{\mathrm{a}} \right\},$
(29c) $\mathcal{U}^{\mathrm{oa}} \stackrel{\Delta}{=} \left\{ \mathbf{P} \in \mathcal{O}^+_{\mathrm{o,a}} :\ \mathbf{P}\,\mathbf{1}_{\mathrm{a}} = \mathbf{x}^{\mathrm{o}},\ \mathbf{P}^\top\mathbf{1}_{\mathrm{o}} = \mathbf{x}^{\mathrm{a}} \right\}.$

Following the same type of derivation as reported in the previous sections and Appendix B, the corresponding dual problem to be minimised is obtained as follows:

(30a) $\mathcal{J}^\star_\varepsilon = \min_{\mathbf{f}^{\mathrm{b}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{b}}},\ \mathbf{f}^{\mathrm{o}} \in \mathbb{R}^{\mathcal{N}_{\mathrm{o}}},\ \mathbf{f}^{\mathrm{a}} \in \mathbb{R}^{N_{\mathrm{a}}}} \mathcal{J}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}, \mathbf{f}^{\mathrm{a}}\right).$

Here, discarding the constant $-\varepsilon\,m\left(\boldsymbol{\nu}^{\mathrm{ba}}\right) - \varepsilon\,m\left(\boldsymbol{\nu}^{\mathrm{oa}}\right)$, the associated regularised Lagrangian is

(30b) $\mathcal{J}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}, \mathbf{f}^{\mathrm{a}}\right) = \varepsilon\,Z^{\mathrm{ba}}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{a}}\right) + \varepsilon\,Z^{\mathrm{oa}}_\varepsilon\left(\mathbf{f}^{\mathrm{o}}, \mathbf{f}^{\mathrm{a}}\right) + \zeta^{\mathrm{b}}_\star\left(\mathbf{f}^{\mathrm{b}}\right) + \zeta^{\mathrm{o}}_\star\left(\mathbf{f}^{\mathrm{o}}\right) - \mathbf{f}^{\mathrm{b}}\cdot\mathbf{y}^{\mathrm{b}} - \mathbf{f}^{\mathrm{o}}\cdot\mathbf{y}^{\mathrm{o}},$

with a partition function associated with each transport:

(30c) $Z^{\mathrm{ba}}_\varepsilon \stackrel{\Delta}{=} \sum_{ik} P^{\mathrm{ba}}_{ik},$ (30d) $Z^{\mathrm{oa}}_\varepsilon \stackrel{\Delta}{=} \sum_{jk} P^{\mathrm{oa}}_{jk},$

where

(30e) $P^{\mathrm{ba}}_{ik} = \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{b}}_i + f^{\mathrm{a}}_k - C^{\mathrm{ba}}_{ik}\right)/\varepsilon},$ (30f) $P^{\mathrm{oa}}_{jk} = \nu^{\mathrm{oa}}_{jk}\, e^{\left(f^{\mathrm{o}}_l H_{lj} - f^{\mathrm{a}}_k - C^{\mathrm{oa}}_{jk}\right)/\varepsilon},$

It turns out that the optimal $\mathbf{f}^{\mathrm{a}}$ can be obtained analytically as a function of $\mathbf{f}^{\mathrm{b}}$ and $\mathbf{f}^{\mathrm{o}}$, which we checked makes the optimisation numerically more efficient and robust. Indeed, let us introduce $\psi_k \stackrel{\Delta}{=} e^{f^{\mathrm{a}}_k/\varepsilon}$. We could optimise $\mathcal{J}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}, \mathbf{f}^{\mathrm{a}} = \varepsilon\ln\boldsymbol{\psi}\right)$ on $\boldsymbol{\psi}$:

(31a) $0 = \partial_{\psi_k} \mathcal{J}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}, \mathbf{f}^{\mathrm{a}}\right)$
(31b) $\phantom{0} = \sum_i \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{b}}_i - C^{\mathrm{ba}}_{ik}\right)/\varepsilon} - \frac{1}{\psi_k^2} \sum_j \nu^{\mathrm{oa}}_{jk}\, e^{\left(f^{\mathrm{o}}_l H_{lj} - C^{\mathrm{oa}}_{jk}\right)/\varepsilon},$

yielding the solution

(32a) $\psi_k^2 = Z^{\mathrm{oa}}_{\varepsilon,k} / Z^{\mathrm{ba}}_{\varepsilon,k},$
(32b) $Z^{\mathrm{oa}}_{\varepsilon,k} \stackrel{\Delta}{=} \sum_j \nu^{\mathrm{oa}}_{jk}\, e^{\left(f^{\mathrm{o}}_l H_{lj} - C^{\mathrm{oa}}_{jk}\right)/\varepsilon},$
(32c) $Z^{\mathrm{ba}}_{\varepsilon,k} \stackrel{\Delta}{=} \sum_i \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{b}}_i - C^{\mathrm{ba}}_{ik}\right)/\varepsilon}.$

Up to irrelevant constants, the resulting effective cost function using the optimal ψk is

(33) $\mathcal{J}_\varepsilon\left(\mathbf{f}^{\mathrm{b}}, \mathbf{f}^{\mathrm{o}}\right) = 2\varepsilon \sum_k \sqrt{Z^{\mathrm{ba}}_{\varepsilon,k}\, Z^{\mathrm{oa}}_{\varepsilon,k}} + \zeta^{\mathrm{b}}_\star\left(\mathbf{f}^{\mathrm{b}}\right) + \zeta^{\mathrm{o}}_\star\left(\mathbf{f}^{\mathrm{o}}\right) - \mathbf{f}^{\mathrm{b}}\cdot\mathbf{y}^{\mathrm{b}} - \mathbf{f}^{\mathrm{o}}\cdot\mathbf{y}^{\mathrm{o}}.$

Now, the optimal W-barycentre $\mathbf{x}^{\mathrm{a}}$ is given by either $x^{\mathrm{a}}_k = P^{\mathrm{ba}}_{ik}\,\mathbf{1}^i_{\mathrm{b}}$ or $x^{\mathrm{a}}_k = P^{\mathrm{oa}}_{jk}\,\mathbf{1}^j_{\mathrm{o}}$, i.e.

(34) $x^{\mathrm{a}}_k = \psi_k\, Z^{\mathrm{ba}}_{\varepsilon,k} = \frac{1}{\psi_k}\, Z^{\mathrm{oa}}_{\varepsilon,k},$

from which we can infer the ψk-free expression

(35) $x^{\mathrm{a}}_k = \sqrt{Z^{\mathrm{ba}}_{\varepsilon,k}\, Z^{\mathrm{oa}}_{\varepsilon,k}}.$

It is also useful to retrieve the optimal value of fa and obtain

(36) $f^{\mathrm{a}}_k = \varepsilon \ln \psi_k = \frac{\varepsilon}{2} \ln \frac{Z^{\mathrm{oa}}_{\varepsilon,k}}{Z^{\mathrm{ba}}_{\varepsilon,k}},$

so that we can compute the other two analysed observables, xb and xo, using

(37a) $x^{\mathrm{b}}_i = P^{\mathrm{ba}}_{ik}\,\mathbf{1}^k_{\mathrm{a}} = \sum_k \psi_k\, \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{b}}_i - C^{\mathrm{ba}}_{ik}\right)/\varepsilon} = e^{f^{\mathrm{b}}_i/\varepsilon} \sum_k \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{a}}_k - C^{\mathrm{ba}}_{ik}\right)/\varepsilon},$
(37b) $x^{\mathrm{o}}_j = P^{\mathrm{oa}}_{jk}\,\mathbf{1}^k_{\mathrm{a}} = \sum_k \frac{1}{\psi_k}\, \nu^{\mathrm{oa}}_{jk}\, e^{\left(f^{\mathrm{o}}_l H_{lj} - C^{\mathrm{oa}}_{jk}\right)/\varepsilon} = e^{f^{\mathrm{o}}_l H_{lj}/\varepsilon} \sum_k \nu^{\mathrm{oa}}_{jk}\, e^{\left(-f^{\mathrm{a}}_k - C^{\mathrm{oa}}_{jk}\right)/\varepsilon}.$

Note that most of these expressions can be assessed in a robust way in the log domain. For instance, in practice, we use, equivalently to Eqs. (35) and (37),

(38a) $\varepsilon \ln x^{\mathrm{a}}_k = \frac{\varepsilon}{2} \ln \sum_i \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{b}}_i - C^{\mathrm{ba}}_{ik}\right)/\varepsilon} + \frac{\varepsilon}{2} \ln \sum_j \nu^{\mathrm{oa}}_{jk}\, e^{\left(f^{\mathrm{o}}_l H_{lj} - C^{\mathrm{oa}}_{jk}\right)/\varepsilon},$
(38b) $\varepsilon \ln x^{\mathrm{b}}_i = f^{\mathrm{b}}_i + \varepsilon \ln \sum_k \nu^{\mathrm{ba}}_{ik}\, e^{\left(f^{\mathrm{a}}_k - C^{\mathrm{ba}}_{ik}\right)/\varepsilon},$
(38c) $\varepsilon \ln x^{\mathrm{o}}_j = f^{\mathrm{o}}_l H_{lj} + \varepsilon \ln \sum_k \nu^{\mathrm{oa}}_{jk}\, e^{\left(-f^{\mathrm{a}}_k - C^{\mathrm{oa}}_{jk}\right)/\varepsilon}.$
3 Numerical illustrations

In this section, we showcase a selection of OTDA 3D-Var analyses. These are meant to stress the versatility of the formalism and the diverse solutions it offers, with significantly more degrees of freedom than in classical DA. The OTDA state analysis is carried out using the process in Sect. 2.5 and its formulas. Unless specifically discussed, entropic regularisation is used with $\varepsilon = 10^{-3}$. The dual cost function in Eq. (33) is minimised using the quasi-Newton method L-BFGS-B (Liu and Nocedal1989), which yields the optimal $\mathbf{f}^{\mathrm{b}}$ and $\mathbf{f}^{\mathrm{o}}$. Then, Eq. (38) is employed to compute $\mathbf{x}^{\mathrm{b}}$, $\mathbf{x}^{\mathrm{o}}$, and $\mathbf{x}^{\mathrm{a}}$.
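
To make this process concrete, here is a self-contained sketch (ours, not the authors' code) that minimises the effective dual cost of Eq. (33) and evaluates the analysis of Eq. (35) for a fully observed ($\mathbf{H}=\mathbf{I}$) one-dimensional toy case with Gaussian statistics. The grid size, the uniform prior plans, and the values of $\sigma$ and $\varepsilon$ are illustrative assumptions, chosen looser than in the paper for numerical comfort in this naive implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

N = 50                                              # grid size (N_b = N_o = N_a)
r = (np.arange(N) + 0.5) / N                        # cell centres on [0, 1]
yb = np.exp(-0.5 * ((r - 0.35) / 0.06) ** 2); yb /= yb.sum()        # first guess, mass 1
yo = np.exp(-0.5 * ((r - 0.60) / 0.06) ** 2); yo *= 1.2 / yo.sum()  # observations, mass 1.2
sig_b = sig_o = 5e-2                                # error std devs (the paper uses 1e-2)
eps = 2e-2                                          # entropic parameter (the paper uses 1e-3)
C = (r[:, None] - r[None, :]) ** 2                  # quadratic cost, C^ba = C^oa here
log_nu = np.full((N, N), -np.log(N * N))            # uniform prior plans (a simple choice)

def log_Z(f, Cmat):
    # log of the partition vector of Eqs. (32b)-(32c): Z_k = sum_i nu_ik exp((f_i - C_ik)/eps)
    return logsumexp(log_nu + (f[:, None] - Cmat) / eps, axis=0)

def J(fbo):
    # effective dual cost of Eq. (33) with Gaussian statistics and H = I
    fb, fo = fbo[:N], fbo[N:]
    transport = 2.0 * eps * np.exp(0.5 * (log_Z(fb, C) + log_Z(fo, C))).sum()
    legendre = 0.5 * sig_b**2 * fb @ fb + 0.5 * sig_o**2 * fo @ fo
    return transport + legendre - fb @ yb - fo @ yo

res = minimize(J, np.zeros(2 * N), method="L-BFGS-B")
fb, fo = res.x[:N], res.x[N:]
xa = np.exp(0.5 * (log_Z(fb, C) + log_Z(fo, C)))    # Eq. (35): the OTDA analysis state
print("analysis mass:", xa.sum())                   # compare with m(yb) = 1 and m(yo) = 1.2
```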

3.1 One-dimensional examples

Considering the case in which the physical space of the fields is one-dimensional, we build bell-shaped observations $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{y}^{\mathrm{o}}$, related to an observable space of size $N_{\mathrm{b}} = N_{\mathrm{o}} = N_{\mathrm{a}} = 10^2$ shared by $\mathbf{x}^{\mathrm{b}}$, $\mathbf{x}^{\mathrm{o}}$, and $\mathbf{x}^{\mathrm{a}}$. As $\mathbf{y}^{\mathrm{b}}$ is a fully observed instance of $\mathbf{x}^{\mathrm{b}}$, we have $\mathcal{N}_{\mathrm{b}} = N_{\mathrm{b}} = 10^2$, while $\mathcal{N}_{\mathrm{o}}$ may differ from $N_{\mathrm{o}}$ depending on the definition of the observation operator $\mathbf{H}$. We choose (Gaussian statistics)

(39) $\zeta^{\mathrm{b}}\left(\mathbf{e}^{\mathrm{b}}\right) = \frac{1}{2\sigma_{\mathrm{b}}^2}\left\|\mathbf{e}^{\mathrm{b}}\right\|^2, \qquad \zeta^{\mathrm{o}}\left(\mathbf{e}^{\mathrm{o}}\right) = \frac{1}{2\sigma_{\mathrm{o}}^2}\left\|\mathbf{e}^{\mathrm{o}}\right\|^2.$

Here, $\sigma_{\mathrm{b}} = \sigma_{\mathrm{o}} = 10^{-2}$. The states are discretised over the interval $[0,1]$ at sites/grid cells $r^\bullet_i = (i - \tfrac{1}{2})/N_\bullet$ for $i \in [\![1, N_\bullet]\!]$, with $\bullet = \mathrm{b}, \mathrm{o}$, and $\mathrm{a}$. Unless otherwise specified, the cost metric has a quadratic dependence on the distance between sites, i.e. $\left[\mathbf{C}^{\mathrm{ba}}\right]_{ik} = \left|r^{\mathrm{b}}_i - r^{\mathrm{a}}_k\right|^2$ and $\left[\mathbf{C}^{\mathrm{oa}}\right]_{jk} = \left|r^{\mathrm{o}}_j - r^{\mathrm{a}}_k\right|^2$. This is our reference set-up. The observation operator and the mass of the observations $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{y}^{\mathrm{o}}$ will be specified for each experiment.

We consider four experiments in which we choose to vary key parameters in the OTDA set-up.

3.1.1 Varying the imbalance of the observation states

In the first experiment, the system is fully observed with H=I. We choose m(yb)=1 and the mass of yo to be in the set {0.5, 1, 1.5}, with all the other parameters being fixed to the reference. The results are displayed in Fig. 8. Figure 8a corresponds to the case m(yo)=0.5. The resulting mass of the analysed observables is then m(xa) = m(xb) = m(xo) = 0.79. The adjustment of xb compared with yb and the adjustment of xo compared with yo, which are required to balance xb and xo, are patent. Figure 8b corresponds to the case m(yo)=1. The resulting mass of the analysed observables is then m(xa) = m(xb) = m(xo) = 1. No adjustment is required here because m(yo)=m(yb), and xo and yo as well as xb and yb coincide. Finally, the mass of yo is set to m(yo)=1.5 in Fig. 8c. The resulting mass of the analysed observables is then m(xa) = m(xb) = m(xo) = 1.34. The adjustment of xb compared with yb and the adjustment of xo compared with yo, which are required to balance xb and xo, are visually obvious, but the balancing goes in the opposite direction compared with Fig. 8a, as expected.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f08

Figure 8 A hybrid OTDA 3D-Var analysis with one-dimensional physical states, where only the mass of yo is varied. Its mass is m(yo)=0.5 in panel (a), m(yo)=1 in panel (b), and m(yo)=1.5 in panel (c). The dashed blue curve corresponds to the first guess yb; the red dots correspond to the observations yo; the analysis state xa is the solid green curve; and the analysed observables xb and xo are blue and red dotted curves, respectively. The support of the observation is underlined by a wavy grey segment. The corresponding classical analysis is also plotted with a dot-dash orange curve. The x axis corresponds to the position in space; the y axis corresponds to the concentration value of the fields.


3.1.2 Varying the sparseness of the observation operator

In this second experiment, with all of the other parameters being fixed to their reference value, only a fraction of the domain is observed, over [0, 1/4], [0, 1/2], and [0, 3/4], where H ∈ O+𝒩o×No with 𝒩o = No/4, No/2, 3No/4, and Hlj = δl,j for l ∈ [[1, 𝒩o]] and j ∈ [[1, No]].

The masses of the states that are built to generate yb and yo, before applying any observation operator, are set to 1 and 1.5, respectively. As a result, we have m(yb)=1, but m(yo) may depart from 1.5 depending on H. The fully observed case corresponds to Fig. 8c. The results are displayed in Fig. 9. It shows how smooth the OTDA solution can be compared with that of classical DA. However, as in Fig. 9a, OTDA can also handle obviously diverging sources of information, as is the case when the support of H is [0, 1/4] and when yo and yb can be seen to be barely consistent. In that case, the OTDA solution is smooth but bimodal.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f09

Figure 9 A hybrid OTDA 3D-Var analysis with one-dimensional physical states, where the observation operator is increasingly sparse. The support of H is [0, 1/4] for panel (a), [0, 1/2] for panel (b), and [0, 3/4] for panel (c). See Fig. 8 for a description of the legend.


3.1.3 Changing the nature of the cost metric

In this third experiment, we choose the cost metric to be of the form [Cba]ik=|rib-rka|α and [Coa]jk=|rjo-rka|α. Only half of the domain is observed over [0, 1/2], as in the case of Fig. 9b. As the mass of the state used to produce yo is 1.5, we have a slightly different m(yo)=1.49, with the rest of the mass being located in the unobserved part of the domain. All of the other parameters follow the reference set-up. The results are displayed in Fig. 10. For Fig. 10a, α is set to 0.5. For Fig. 10b, α is set to 1. For Fig. 10c, the cost metric is piecewise; it is quadratic, i.e. α=2, for pairs of sites separated by less than 10−1, i.e. |rib-rka| ≤ 10−1 and |rjo-rka| ≤ 10−1, whereas the costs are chosen to be infinite for pairs of sites beyond this range. Hence, transport is prohibited beyond a distance of 10−1. The case of a pure quadratic cost corresponds to Fig. 9b. The impact on the shape of the OTDA analysis is very significant and suggests that one could easily tailor their own cost to suit their specific DA problem.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f10

Figure 10 A hybrid OTDA 3D-Var analysis with one-dimensional physical states, where the cost metrics are changed. See the bulk of the text for a definition of the three cost metrics. See Fig. 8 for a description of the legend.


3.1.4 Classical data assimilation as a sub-case of the hybrid optimal transport data assimilation

In the fourth experiment, we would like to numerically check the theoretical prediction of Sect. 2.4.3. Consider again the reference configuration; however, only half of the domain, over [0, 1/2], is observed, H ∈ O+𝒩o×No with 𝒩o = No/2 and Hlj = δl,j for l ∈ [[1, 𝒩o]] and j ∈ [[1, No]]. Most importantly, the cost metric has a quadratic dependence on the distance between sites, i.e. [Cba]ik = λ|rib-rka|2 and [Coa]jk = λ|rjo-rka|2. The case λ=1 corresponds to Fig. 9b. Figure 11 shows the results corresponding to λ=10^3 for panel (a), λ=10^4 for panel (b), and λ=10^6 for panel (c). When λ is increased, the OTDA analysis should tend to the classical DA solution. This is indeed corroborated by Fig. 11 and supports the claim of Sect. 2.4.3. Note that, as opposed to the three earlier experiments, we had to tune ε here, as the wide range of λ has a significant impact on the balance of the key terms of the cost function (transport cost, discrepancy errors, and regularisation).

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f11

Figure 11 Scaling up the cost metrics λCba and λCoa with increasing λ, the OTDA analysis converges to the classical DA analysis. Panels (a), (b), and (c) correspond to the scaling values λ=10^3, 10^4, and 10^6, respectively. See Fig. 8 for a description of the legend.


3.2 Two-dimensional examples

Considering the case in which the physical space of the fields is two-dimensional, we perform a couple of 3D-Var analyses on concentration fields (puffs of a pollutant). The states are discretised in the domain $[0,1]^2$ at sites/grid cells $\mathbf{r}^\bullet_{i,j} = \left((i-\tfrac{1}{2})/N^x_\bullet,\ (j-\tfrac{1}{2})/N^y_\bullet\right)$ for $(i,j) \in [\![1, N^x_\bullet]\!]\times[\![1, N^y_\bullet]\!]$, with $\bullet = \mathrm{b}, \mathrm{o}$, and $\mathrm{a}$. We choose $N^x_{\mathrm{b}} = N^y_{\mathrm{b}} = N^x_{\mathrm{o}} = N^y_{\mathrm{o}} = N^x_{\mathrm{a}} = N^y_{\mathrm{a}} = 10^2$, such that $N_{\mathrm{b}} = N_{\mathrm{o}} = N_{\mathrm{a}} = 10^4$. Hence, the number of control variables is $3\times10^4$. The observation vectors are $\mathbf{y}^{\mathrm{b}}$ and $\mathbf{y}^{\mathrm{o}}$. As $\mathbf{y}^{\mathrm{b}}$ is a fully observed instance of $\mathbf{x}^{\mathrm{b}}$, we have $\mathcal{N}_{\mathrm{b}} = N_{\mathrm{b}}$, while $\mathcal{N}_{\mathrm{o}}$ may differ from $N_{\mathrm{o}}$ depending on the definition of the observation operator $\mathbf{H}$. Moreover, we choose (Gaussian statistics)

(40) $\zeta^{\mathrm{b}}\left(\mathbf{e}^{\mathrm{b}}\right) = \frac{1}{2\sigma_{\mathrm{b}}^2}\left\|\mathbf{e}^{\mathrm{b}}\right\|^2, \qquad \zeta^{\mathrm{o}}\left(\mathbf{e}^{\mathrm{o}}\right) = \frac{1}{2\sigma_{\mathrm{o}}^2}\left\|\mathbf{e}^{\mathrm{o}}\right\|^2.$

Here, $\sigma_{\mathrm{b}} = \sigma_{\mathrm{o}} = 10^{-2}$. The entropic regularisation parameter is set to $\varepsilon = 10^{-3}$.

The first analysis is displayed in Fig. 12. The observation operator H is the identity, but its support is restricted to the subdomain [0,0.6]2. The plumes of pollutants yb and yo are generated from states formed as combinations of bell-like puffs. The system is unbalanced with m(yb)=1.35 and m(yo)=0.73. The cost metric has a quadratic dependence on the distance between sites, i.e. [Cba]ik=rib-rka22 and [Coa]jk=rjo-rka22. The OTDA analysis is clearly smoother than the classical solution. The classical solution does not cope very well with the seemingly disagreeing sources of information yb and yo, generating sharp transitions in the classical analysis. If yb and yo were consistently obtained from a truth perturbed with errors with short-range correlation, i.e. if they were drawn from the true prior distribution and in the absence of mislocation errors, then the classical analysis would be as good as it can be, whereas the OTDA solution may be too safe, i.e. too smooth. However, if one believes that structural errors and, in particular, location errors can impact yb and yo, then the classical solution is improper and the OTDA analysis preferable.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f12

Figure 12 Two-dimensional concentration maps (plumes) of a hybrid OTDA analysis for the first configuration. The observations yb and yo; the analysed observables xb and xo; the state analysis xa; and the corresponding classical DA analysis xcla are displayed. All fields are rescaled so that their joint maximum is 1. All concentration maps use the same scale. The colour bar represents a unified contrast scale for the diverse field concentrations.


The second analysis is displayed in Fig. 13. The support of the observation operator H is again contained within the subdomain [0,0.6]2, but only one of four grid cells is actually observed in this area. The observation states yb and yo are generated from the same states as for Fig. 12. The system is unbalanced with m(yb)=1.35 and m(yo)=0.18. The cost metric is defined to be the same as in Fig. 12. The OTDA analysis is even smoother in this case compared with the classical DA analysis. It is much less impacted by the sparseness of the observation operator. The classical solution has to account for the staggered observations in the top left corner of the domain because the first guess in that region is very uncertain. By contrast, the OTDA solution assumes that location errors are possible; hence, it moves around the mass corresponding to these observations so that the structure of the observation operator is not as impactful on the analysis.

https://npg.copernicus.org/articles/31/335/2024/npg-31-335-2024-f13

Figure 13 Two-dimensional concentration maps of a hybrid OTDA analysis for the second configuration. The observations yb and yo; the analysed observables xb and xo; the state analysis xa; and the corresponding classical DA analysis are displayed. Compared to Fig. 12, only H has changed. The level sets in panels (c) and (f) are omitted because they are driven by the staggered observation operator.


4 Uncertainty quantification

In this section, we compute the posterior error covariance matrix Pa associated with the state analysis xa, in order to complete the OTDA 3D-Var analysis description. There are many ways to proceed depending on the chosen regularisation and on the targeted degree of generality. Here, for the sake of consistency, we report on the way to derive Pa following the computation of the analysis state xa proposed in Sect. 2.5.

4.1 Mathematical results

Let us denote the compounded vectors of the observations, of the Lagrange multipliers, and of the observables as well as the compounded observation operator by

(41) $\mathbf{y} \stackrel{\Delta}{=} \begin{bmatrix} \mathbf{y}^{\mathrm{b}} \\ \mathbf{y}^{\mathrm{o}} \end{bmatrix}, \quad \mathbf{f} \stackrel{\Delta}{=} \begin{bmatrix} \mathbf{f}^{\mathrm{b}} \\ \mathbf{f}^{\mathrm{o}} \end{bmatrix}, \quad \mathbf{x} \stackrel{\Delta}{=} \begin{bmatrix} \mathbf{x}^{\mathrm{b}} \\ \mathbf{x}^{\mathrm{o}} \end{bmatrix}, \quad \mathbf{H} \stackrel{\Delta}{=} \begin{bmatrix} \mathbf{I}_{\mathrm{b}} & \mathbf{0} \\ \mathbf{0} & \mathbf{H} \end{bmatrix},$

of size $\mathcal{N}_{\mathrm{b}}+\mathcal{N}_{\mathrm{o}}$, $\mathcal{N}_{\mathrm{b}}+\mathcal{N}_{\mathrm{o}}$, $N_{\mathrm{b}}+N_{\mathrm{o}}$, and $(\mathcal{N}_{\mathrm{b}}+\mathcal{N}_{\mathrm{o}})\times(N_{\mathrm{b}}+N_{\mathrm{o}})$, respectively. Similarly, we define the sum of the error statistics by $\zeta(\mathbf{f}) \stackrel{\Delta}{=} \zeta^{\mathrm{b}}(\mathbf{f}^{\mathrm{b}}) + \zeta^{\mathrm{o}}(\mathbf{f}^{\mathrm{o}})$, whose Legendre–Fenchel transform is $\zeta_\star(\mathbf{f}) = \zeta^{\mathrm{b}}_\star(\mathbf{f}^{\mathrm{b}}) + \zeta^{\mathrm{o}}_\star(\mathbf{f}^{\mathrm{o}})$. Using this notation, we can recapitulate the key results of Sect. 2.5: the effective dual cost function is

(42a) $\mathcal{J}_\varepsilon(\mathbf{f}) \stackrel{\Delta}{=} \varepsilon\,Z_\varepsilon(\mathbf{f}) + \zeta_\star(\mathbf{f}) - \mathbf{f}\cdot\mathbf{y},$
(42b) $Z_\varepsilon(\mathbf{f}) \stackrel{\Delta}{=} 2\sum_k \sqrt{Z^{\mathrm{ba}}_{\varepsilon,k}\left(\mathbf{f}^{\mathrm{b}}\right)\, Z^{\mathrm{oa}}_{\varepsilon,k}\left(\mathbf{f}^{\mathrm{o}}\right)}.$

Here, the analysis state reads

(43) $x^{\mathrm{a}}_k(\mathbf{f}) = \sqrt{Z^{\mathrm{ba}}_{\varepsilon,k}\left(\mathbf{f}^{\mathrm{b}}\right)\, Z^{\mathrm{oa}}_{\varepsilon,k}\left(\mathbf{f}^{\mathrm{o}}\right)},$

where the dependence of the analysis state and the partition functions on f, fb, and fo is now emphasised and made explicit.

Any prior source of error in the system stems from the information vector $\mathbf{y}$ and hence drives the posterior error in the analysis $x^{a}$. That is why we are interested in the sensitivity of $x^{a}$ with respect to $\mathbf{y}$, i.e. $\delta x^{a} = \nabla_{\mathbf{y}} x^{a}\,\delta\mathbf{y}$, where $\nabla_{\mathbf{y}} x^{a}$ is the Jacobian of the analysis with respect to the information vector. Denoting the expectation operator by $\mathbb{E}$, the error covariance matrix is then defined by

(44a) $\mathbf{P}^{a} = \mathbb{E}\!\left[\delta x^{a}\,\delta x^{a\top}\right]$, (44b) $= \nabla_{\mathbf{y}} x^{a}\;\mathbb{E}\!\left[\delta\mathbf{y}\,\delta\mathbf{y}^{\top}\right]\left[\nabla_{\mathbf{y}} x^{a}\right]^{\top}$, (44c) $= \nabla_{\mathbf{y}} x^{a}\left(\nabla^{2}_{\mathbf{f}}\zeta^{\star}\right)\left[\nabla_{\mathbf{y}} x^{a}\right]^{\top}$,

from which a matrix factor $X^{a}$ of $\mathbf{P}^{a}$, i.e. a matrix which satisfies $\mathbf{P}^{a}=X^{a}X^{a\top}$ and whose expressions are usually much shorter than those of $\mathbf{P}^{a}$, can be extracted, up to the multiplication by an orthogonal matrix on the right:

(45) $X^{a} = \nabla_{\mathbf{y}} x^{a}\left(\nabla^{2}_{\mathbf{f}}\zeta^{\star}\right)^{\frac{1}{2}}.$

To compute the sensitivity matrix $\nabla_{\mathbf{y}} x^{a}$, we leverage the stationarity of the dual cost function at the minimum:

(46) $\nabla_{\mathbf{f}} J_\varepsilon\!\left(\mathbf{f}(\mathbf{y}),\mathbf{y}\right) = 0.$

We resort to the implicit function theorem:

(47) $0 = \mathrm{d}_{\mathbf{y}}\!\left[\nabla_{\mathbf{f}} J_\varepsilon\!\left(\mathbf{f}(\mathbf{y}),\mathbf{y}\right)\right] = \nabla^{2}_{\mathbf{f}} J_\varepsilon\,\nabla_{\mathbf{y}}\mathbf{f} + \nabla_{\mathbf{f}}\nabla_{\mathbf{y}} J_\varepsilon,$

which yields

(48) $\nabla_{\mathbf{y}}\mathbf{f} = -\left[\nabla^{2}_{\mathbf{f}} J_\varepsilon\right]^{-1}\nabla_{\mathbf{f}}\nabla_{\mathbf{y}} J_\varepsilon = \left[\nabla^{2}_{\mathbf{f}} J_\varepsilon\right]^{-1},$

as $\nabla_{\mathbf{f}}\nabla_{\mathbf{y}} J_\varepsilon = -\mathbf{I}^{bo}$, where $\mathbf{I}^{bo}$ is the identity matrix in the compounded observation space $\mathbb{R}^{\mathcal{N}_b+\mathcal{N}_o}$. The sensitivity $\nabla_{\mathbf{y}} x^{a}$ can now be computed using the Leibniz chain rule and Eq. (48):

(49) $\nabla_{\mathbf{y}} x^{a} = \nabla_{\mathbf{f}} x^{a}\,\nabla_{\mathbf{y}}\mathbf{f} = \nabla_{\mathbf{f}} x^{a}\left[\nabla^{2}_{\mathbf{f}} J_\varepsilon\right]^{-1}.$
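In practice, once the Hessian of the dual cost at the minimum is available, Eq. (49) amounts to linear solves with that Hessian rather than an explicit inversion. A minimal sketch (array names are illustrative) could read:

```python
import numpy as np

def analysis_sensitivity(grad_f_xa, hess_J):
    """Sensitivity nabla_y x^a = nabla_f x^a [nabla_f^2 J_eps]^{-1}, Eq. (49).

    grad_f_xa: Jacobian of x^a with respect to f (shape Na x (Nb + No)).
    hess_J: Hessian of the dual cost function at its minimum (symmetric).
    Solving with the Hessian avoids forming its inverse explicitly.
    """
    return np.linalg.solve(hess_J.T, grad_f_xa.T).T
```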

Let us now compute the Jacobian and Hessian on the right-hand side of Eq. (49). To that end, and in order to externalise the observation operator, we introduce $\widehat{Z}_\varepsilon$ and $\widehat{x}^{a}$, such that

(50) $\widehat{Z}_\varepsilon\!\left(\boldsymbol{\eta} = \mathcal{H}^{\top}\mathbf{f}\right) \triangleq Z_\varepsilon(\mathbf{f}), \qquad \widehat{x}^{a}\!\left(\boldsymbol{\eta} = \mathcal{H}^{\top}\mathbf{f}\right) \triangleq x^{a}(\mathbf{f}),$

and the related Jacobian and Hessians,

(51a) $\Omega^{bo,a} \triangleq \nabla_{\boldsymbol{\eta}}\,\widehat{x}^{a}$, (51b) $\Omega^{bo,bo} \triangleq \varepsilon\,\nabla^{2}_{\boldsymbol{\eta}}\widehat{Z}_\varepsilon$, (51c) $\Lambda^{bo,bo} \triangleq \nabla^{2}_{\mathbf{f}}\zeta^{\star}$.

$\widehat{Z}_\varepsilon$ and $\widehat{x}^{a}$ can be shown to exist; they can be read off from the explicit expressions of $Z_\varepsilon$ and $x^{a}$ as functions of $\mathbf{f}$. These Jacobians and Hessians depend on the choice of the regularisation operator and need to be computed analytically, which is simple but tedious; this is not reported here, as it is a regularisation-dependent calculation. The Hessian of the dual cost function Eq. (42a) can then be written as the sum

(52) $\nabla^{2}_{\mathbf{f}} J_\varepsilon = \Lambda^{bo,bo} + \mathcal{H}\,\Omega^{bo,bo}\,\mathcal{H}^{\top},$

while the sensitivity matrix now reads

(53) $\nabla_{\mathbf{f}} x^{a} = \Omega^{bo,a\top}\,\mathcal{H}^{\top}.$

Note that Ωbo,bo can be interpreted as the covariance matrix of x, the compounded observable vector as defined in Eq. (41) (although seen as a random vector), under the assumption that xb and xo are connected via the W-barycentre xa and the optimal transference plans Pba and Poa (all seen as random vectors). Combining Eqs. (52) and (53) with Eq. (45), we finally obtain the expression for a factor Xa of Pa:

(54) $X^{a} = \Omega^{bo,a\top}\,\mathcal{H}^{\top}\left[\Lambda^{bo,bo} + \mathcal{H}\,\Omega^{bo,bo}\,\mathcal{H}^{\top}\right]^{-1}\Lambda^{bo,bo\,\frac{1}{2}}.$

Alternatively, we can use the Sherman–Morrison–Woodbury transformation, under the assumption that Ωbo,bo is invertible:

(55) $X^{a} = \Omega^{bo,a\top}\,\Omega^{bo,bo\,-1}\left[\Omega^{bo,bo\,-1} + \mathcal{H}^{\top}\Lambda^{bo,bo\,-1}\mathcal{H}\right]^{-1}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}.$

These formulas are similar to the normal equations of classical DA. However, it should be noted that, in Eqs. (54) and (55), all of the prior error statistics are encapsulated in Λbo,bo, whereas the impact of OT is encoded in Ωbo,bo. To be concrete, note that, when using Gaussian statistics (Eq. 12), Λbo,bo would simply read

(56) $\Lambda^{bo,bo} = \begin{bmatrix}\Lambda^{bb} & 0 \\ 0 & \Lambda^{oo}\end{bmatrix} = \begin{bmatrix}\mathbf{B} & 0 \\ 0 & \mathbf{R}\end{bmatrix}.$
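To make the assembly of the factor concrete, a dense-algebra sketch of Eq. (54) in the Gaussian case of Eq. (56) could read as follows; the blocks $\Omega^{bo,a}$ and $\Omega^{bo,bo}$ are assumed to have been computed beforehand from the chosen regularisation, and all names are illustrative:

```python
import numpy as np
from scipy.linalg import block_diag, sqrtm

def analysis_factor(Omega_bo_a, Omega_bo_bo, Hcal, B, R):
    """Factor X^a of P^a = X^a X^a^T following Eq. (54), Gaussian case.

    Omega_bo_a: (Nb+No) x Na block, Omega_bo_bo: (Nb+No) x (Nb+No) block,
    Hcal: compounded observation operator of Eq. (41). Dense sketch only.
    """
    Lam = block_diag(B, R)                        # Lambda^{bo,bo}, Eq. (56)
    S = Lam + Hcal @ Omega_bo_bo @ Hcal.T         # bracketed term of Eq. (54)
    Xa = Omega_bo_a.T @ Hcal.T @ np.linalg.solve(S, sqrtm(Lam))
    return Xa                                     # P^a then follows as Xa @ Xa.T
```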

4.2 Interpretation

Further, we can perform a block decomposition of $\Omega^{bo,bo}$ onto the spaces of $x^{b}$ and $x^{o}$:

(57) $\Omega^{bo,bo} \triangleq \begin{bmatrix}\Omega^{bb} & \Omega^{bo} \\ \Omega^{bo\top} & \Omega^{oo}\end{bmatrix}.$

It can be shown that $\Omega^{bo}$ is proportional to the optimal transference plan of the effective transport between $x^{o}$ and $x^{b}$ and that the diagonal blocks are themselves diagonal matrices that depend on the observable states:

(58a) $\Omega^{bo} = \frac{1}{\varepsilon}\,\mathbf{P}^{bo}$, (58b) $\Omega^{bb} = \frac{1}{\varepsilon}\,\mathrm{diag}\!\left(x^{b}\right)$, (58c) $\Omega^{oo} = \frac{1}{\varepsilon}\,\mathrm{diag}\!\left(x^{o}\right)$.

For instance, this could be shown by the explicit computation of $\Omega^{bo,bo} = \varepsilon\,\nabla^{2}_{\boldsymbol{\eta}}\widehat{Z}_\varepsilon$.

Let us now examine the impact of OT on the analysis error covariance matrix. We first define

(59) $\Delta = \Lambda^{bo,bo\,-\frac{1}{2}}\,\mathcal{H}\,\Omega^{bo,bo\,\frac{1}{2}},$

whose thin singular value decomposition is $\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$, where $\mathbf{U}$ is an orthogonal matrix of size $(\mathcal{N}_b+\mathcal{N}_o)\times(\mathcal{N}_b+\mathcal{N}_o)$, $\boldsymbol{\Sigma}$ is a rectangular, diagonal matrix of size $(\mathcal{N}_b+\mathcal{N}_o)\times(N_b+N_o)$, and $\mathbf{V}$ is an orthogonal matrix of size $(N_b+N_o)\times(N_b+N_o)$. Then, we standardise Eq. (54) following, for example, Sect. 2.4.1 in Rodgers (2000):

(60a) $X^{a} = \Omega^{bo,a\top}\mathcal{H}^{\top}\left[\Lambda^{bo,bo}+\mathcal{H}\Omega^{bo,bo}\mathcal{H}^{\top}\right]^{-1}\Lambda^{bo,bo\,\frac{1}{2}}$,
(60b) $= \Omega^{bo,a\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}\left[\mathbf{I}^{bo}+\Lambda^{bo,bo\,-\frac{1}{2}}\mathcal{H}\Omega^{bo,bo}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}\right]^{-1}$,
(60c) $= \Omega^{bo,a\top}\Omega^{bo,bo\,-\frac{1}{2}}\Delta^{\top}\left[\mathbf{I}^{bo}+\Delta\Delta^{\top}\right]^{-1}$,
(60d) $= \Omega^{bo,a\top}\Omega^{bo,bo\,-\frac{1}{2}}\mathbf{V}\boldsymbol{\Sigma}^{\top}\left[\mathbf{I}^{bo}+\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{\top}\right]^{-1}\mathbf{U}^{\top}$.

Defining $\boldsymbol{\sigma} = \left(\boldsymbol{\Sigma}^{\top}\boldsymbol{\Sigma}\right)^{\frac{1}{2}}$, which is a square, diagonal matrix of size $(N_b+N_o)\times(N_b+N_o)$, we obtain, up to a multiplication by an irrelevant orthogonal matrix on the right, an equivalent factor for $\mathbf{P}^{a}$:

(60e) $X^{a} = \Omega^{bo,a\top}\Omega^{bo,bo\,-\frac{1}{2}}\mathbf{V}\boldsymbol{\sigma}\left[\mathbf{I}^{bo}+\boldsymbol{\sigma}^{2}\right]^{-1}.$

The diagonal values of $\boldsymbol{\sigma}$, denoted $\sigma_i\ge 0$, represent the independent degrees of freedom (dof) of information that can be extracted from the observations, which, in our case, are the first guess $y^{b}$ and the traditional observations $y^{o}$, in contrast to Rodgers (2000), who only considers the dof from $y^{o}$. The higher the $\sigma_i$, the more information is attached to the dof of index $i$ and the more squeezed the corresponding direction in $X^{a}$ and $\mathbf{P}^{a}$. From Eq. (59) and, in particular, its transpose, $\Delta^{\top} = \Omega^{bo,bo\,\frac{1}{2}}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}$, we can trace the flow of any piece of information. Such information stems from the observation vectors; hence, its flow starts in $\Delta^{\top}$ from $\Lambda^{bo,bo\,-\frac{1}{2}}$, the square root of the precision matrix $\Lambda^{bo,bo\,-1}$. It is then transferred from the observation spaces to the observable spaces through $\mathcal{H}^{\top}$. It is finally optimally transported across the spaces of $x^{b}$ and $x^{o}$ by $\Omega^{bo,bo}$, whose off-diagonal block is proportional to the transference plan $\mathbf{P}^{bo}$. Hence, OT is not a primary source of uncertainty, as $y^{b}$ and $y^{o}$ can be, but moves information in between the observable spaces.
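The dof spectrum itself only requires the singular values of $\Delta$; a minimal sketch (dense algebra, symmetric positive-definite $\Lambda^{bo,bo}$ and $\Omega^{bo,bo}$ assumed, names illustrative) is:

```python
import numpy as np
from scipy.linalg import sqrtm, svd

def dof_spectrum(Hcal, Omega_bo_bo, Lambda_bo_bo):
    """Singular values sigma_i of Delta (Eq. 59), i.e. the dof of the analysis."""
    Delta = np.linalg.solve(sqrtm(Lambda_bo_bo), Hcal @ sqrtm(Omega_bo_bo))
    return svd(Delta, compute_uv=False)   # sigma_i >= 0, in decreasing order
```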

Let us now check the OTDA analysis error covariance matrix $\mathbf{P}^{a}$ in the classical DA limit. To that end, we study Eq. (54) in the classical limit. Similarly to $\Omega^{bb}$ and $\Omega^{oo}$ in Eq. (58), $\Omega^{aa}$ is defined as the covariance matrix of $x^{a}$ when accounting for the two OTs only, and it can be shown that it reads

(61) $\Omega^{aa} = \frac{1}{\varepsilon}\,\mathrm{diag}\!\left(x^{a}\right).$

When the cost tends to $\mathbf{C}^{bo}$, following the same arguments as in Sect. 2.4.3, $x^{b}$, $x^{o}$, and $x^{a}$ must merge and, consequently, $\Omega^{bo}=\Omega^{aa}=\Omega^{bb}=\Omega^{oo}$. Hence, in this limit, $\Omega^{bo,bo}=\mathbf{1}_2\,\Omega^{aa}\,\mathbf{1}_2^{\top}$ and $\Omega^{bo,a}=\mathbf{1}_2\,\Omega^{aa}$, with $\mathbf{1}_2=[1\;\,1]^{\top}$, the products being understood blockwise. Then, substituting these expressions of $\Omega^{bo,bo}$ and $\Omega^{bo,a}$ into Eq. (54), we get

(62a) $X^{a} = \Omega^{aa}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\left[\Lambda^{bo,bo}+\mathcal{H}\mathbf{1}_2\Omega^{aa}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\right]^{-1}\Lambda^{bo,bo\,\frac{1}{2}}$,
(62b) $= \Omega^{aa}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\left[\mathbf{I}^{bo}+\Lambda^{bo,bo\,-1}\mathcal{H}\mathbf{1}_2\Omega^{aa}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\right]^{-1}\Lambda^{bo,bo\,-\frac{1}{2}}$,
(62c) $= \Omega^{aa}\left[\mathbf{I}^{a}+\mathbf{1}_2^{\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-1}\mathcal{H}\mathbf{1}_2\,\Omega^{aa}\right]^{-1}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}$,
(62d) $= \left[\Omega^{aa\,-1}+\mathbf{1}_2^{\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-1}\mathcal{H}\mathbf{1}_2\right]^{-1}\mathbf{1}_2^{\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-\frac{1}{2}}$.

Here, $\mathbf{I}^{a}$ is the identity matrix of size $N_a$. From Eq. (62b) to Eq. (62c), we relied on the shift matrix lemma (e.g. Asch et al.2016). For $\Omega^{aa\,-1}$ in Eq. (62d) to exist, it must be assumed that $x^{a}\in\mathcal{O}^{N_a}_{+,*}$, i.e. that all the entries of $x^{a}$ are positive. This is verified when using entropic regularisation with $\varepsilon>0$, no matter how small the entries of $x^{a}$ are. Moreover, if $x^{a}$ has zero entries, $x^{a}$ can be represented as the limit of a sequence of positive discrete measures.

Now, as we have

(63) $\mathbf{A}^{-1} \triangleq \mathbf{1}_2^{\top}\mathcal{H}^{\top}\Lambda^{bo,bo\,-1}\mathcal{H}\mathbf{1}_2 = \Lambda^{bb\,-1}+\mathbf{H}^{\top}\Lambda^{oo\,-1}\mathbf{H},$

we conclude, from Eq. (62d), that the classical limit of the analysis error covariance matrix is

(64a) $\mathbf{P}^{a} = X^{a}X^{a\top}$ (64b) $= \left[\Omega^{aa\,-1}+\mathbf{A}^{-1}\right]^{-1}\mathbf{A}^{-1}\left[\Omega^{aa\,-1}+\mathbf{A}^{-1}\right]^{-1}.$

If the limit of $x^{a}$ when $\varepsilon\to 0^{+}$ is in $\mathcal{O}^{N_a}_{+,*}$, then $\Omega^{aa\,-1}=\varepsilon\,\mathrm{diag}\!\left(x^{a}\right)^{-1}$ must vanish. In this case,

(65) $\mathbf{P}^{a} \xrightarrow[\varepsilon\to 0^{+}]{} \mathbf{A} = \left[\nabla^{2}_{f^{b}}\zeta^{b}+\mathbf{H}^{\top}\nabla^{2}_{f^{o}}\zeta^{o}\,\mathbf{H}\right]^{-1},$

which, assuming Gaussian errors, would read $\mathbf{P}^{a}=\left[\mathbf{B}^{-1}+\mathbf{H}^{\top}\mathbf{R}^{-1}\mathbf{H}\right]^{-1}$, as expected from classical DA. However, if some of the entries of $x^{a}$ vanish in the limit $\varepsilon\to 0^{+}$, we suspect that the limit of $\mathbf{P}^{a}$ will be the classical analysis error covariance matrix $\mathbf{A}$ but with the columns and rows associated with the vanishing entries of $x^{a}$ tapered to 0.
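This Gaussian classical limit is the familiar BLUE covariance and can serve as a numerical reference against which the OTDA $\mathbf{P}^{a}$ may be compared; a one-function sketch (names are illustrative) reads:

```python
import numpy as np

def classical_Pa(B, R, H):
    """Gaussian classical-DA limit of Eq. (65): P^a = (B^{-1} + H^T R^{-1} H)^{-1}."""
    return np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.solve(R, H))
```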

4.3 Numerical illustration

We consider the one-dimensional example where half of the domain, $[0,\tfrac{1}{2}]$, is observed. Here, $\mathbf{H}\in\mathcal{O}^{\mathcal{N}_o\times N_o,+}$ with $\mathcal{N}_o=N_o/2$ and $H_{lj}=\delta_{l,j}$ for $l\in[\![1,\mathcal{N}_o]\!]$ and $j\in[\![1,N_o]\!]$. Further, the observations are unbalanced, with $m(y^{b})=1$ and $m(y^{o})=1.49$; they have been generated through $\mathcal{H}$ by discrete measures of mass 1 and 1.5, respectively. Moreover, the cost metric has a quadratic dependence on the distance between sites, i.e. $[C^{ba}]_{ik}=\lambda|r^{b}_i-r^{a}_k|^{2}$ and $[C^{oa}]_{jk}=\lambda|r^{o}_j-r^{a}_k|^{2}$, where $\lambda=10^{3}$. We use the results of Sect. 4.1 to compute the analysis error covariance matrix $\mathbf{P}^{a}$, the transference plan $\mathbf{P}^{bo}$, and the Jacobian $\Omega^{bo,a}$. The numerical results are displayed in Fig. 14. The OTDA analysis state is bimodal, with some mass being left over to the right of the domain to account for the long tail of the first guess, which is far from the observation support. Hence, there is a vanishing-field region, roughly $[0.6,0.7]$, which separates the two components of the analysis state. As expected from OT theory, $\mathbf{P}^{bo}$ seems to converge towards a (non-trivial and barely differentiable) Monge map which, in this discrete context, has two branches separated by the gap created by the vanishing-field region. The analysis error covariance matrix $\mathbf{P}^{a}$ seems to converge to a diagonal matrix, with the exception of the vanishing-field region. Indeed, there seems to be an uncertainty with respect to how much mass should be transferred from the first-guess tail $[0.7,1]$ to the main region $[0,0.6]$. This is given away by variance peaks near the edges of the gap and by negative covariances between the two edge points of the gap.
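The observation operator and cost of this configuration are simple to reproduce; the sketch below uses an illustrative grid size and is not the paper's exact code:

```python
import numpy as np

No = 100                          # observable grid size (illustrative)
No_obs = No // 2                  # script-N_o = N_o / 2: left half of the domain observed
H = np.eye(No_obs, No)            # H_lj = delta_{l,j}

lam = 1.0e3                       # lambda = 10^3
r = (np.arange(No) + 0.5) / No    # cell centres on [0, 1]
Cba = lam * (r[:, None] - r[None, :]) ** 2
Coa = Cba.copy()                  # same quadratic dependence for C^oa
```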


Figure 14 Illustration of the second-order analysis of an OTDA 3D-Var. Panel (a) shows the same plot as Fig. 11a but with the addition of shaded regions delineated by plus and minus the standard deviations about the estimates of $x^{a}$, $x^{a}_{\mathrm{cl}}$, $x^{b}$, and $x^{o}$. These standard deviations are computed from the diagonal of the diagnosed posterior error covariance matrices associated with $x^{a}$, $x^{a}_{\mathrm{cl}}$, $x^{b}$, and $x^{o}$. Panel (b) displays the analysis error covariance matrix $\mathbf{P}^{a}$. Panel (c) shows the optimal transference plan $\mathbf{P}^{bo}$. Panel (d) shows the (b, a) block of the Jacobian matrix $\Omega^{bo,a}$, which is denoted $\Omega^{ba}$.


5 Conclusions

In this paper, we have introduced a theoretical framework for integrating nonlocal optimal transport (OT) metrics into data assimilation (DA), which we refer to as hybrid OTDA. This framework addresses the inconsistencies initially identified by Feyeux et al. (2016) when local metrics in classical DA are replaced with nonlocal ones based on OT.

Our focus has been on defining a 3D-Var approach for hybrid OTDA and deriving the first- and second-order moments of its analysis. The hybrid OTDA 3D-Var method blends classical DA and its background and observation error statistics with a Wasserstein barycentre problem involving the observables associated with the first guess and the observation vector. Importantly, our work demonstrates that classical DA is encompassed within this theoretical framework.

We have shown that this optimisation problem can be decomposed and simplified into a hybrid OTDA problem with a single OT problem based on an effective cost. This first problem yields the estimated xb and xo, followed by a pure W-barycentre problem involving these states, whose solution is known as the McCann interpolant. This W-barycentre computation serves as the final analysis step.

Our proposed method can be applied to sparsely and noisily observed systems, as expected from a robust DA method. It can also accommodate non-trivial error statistics typical of a 3D-Var approach. Furthermore, we have illustrated the method's flexibility in defining cost metrics through various one- and two-dimensional numerical examples, and we have empirically checked how, within the OTDA framework, the OTDA analysis shifts towards the classical DA analysis. Note that, for now, some limitations apply; mainly, the framework is presently meant for non-negative fields.

While we have looked into several other promising developments regarding our methodology, we have chosen not to report them in this paper. These developments will be the subject of a future publication and include the following:

  • the derivation of a Bayesian and probabilistic standpoint on OTDA;

  • a generalised formalism enforcing physical regularisation, such as smoothness of the field, on the analysis state;

  • a stochastic matrix formalism, which is a substitute to using transference plans but could offer more robustness in the presence of entropic regularisation;

  • employing cost matrices defined across several spaces, which is useful for realistic applications where xb and xo lie in very distinct spaces, such as the space of emission of a pollutant and the space of the pollutant concentrations, respectively.

While our primary focus in this paper was on the derivation and understanding of key cost functions within the hybrid OTDA framework, we did not delve much into the numerical challenges, algorithmic complexity, or computing acceleration. For these aspects, we would rather rely on contributions from OT experts, who are continuously improving the efficiency of numerical OT (e.g. Flamary et al.2021).

In addition to strengthening the developments mentioned above, our future research will explore the application of the hybrid OTDA formalism in a sequential DA framework, as this paper concentrated solely on static analysis. We are also interested in investigating the role played by error statistics and cost metrics ζb,ζo,Cba,Coa and their balancing in the hybrid OTDA analysis as well as in developing their objective tuning.

Appendix A: From the primal to the dual cost function for the full problem

This appendix is dedicated to the derivation of the transformation from a Lagrangian variant of the primal problem to the dual cost function (Eq. 20). It takes the form of the following series of transformations of the problem:

(A1a) $\mathcal{L} = \min_{x^{b},\,x^{o},\,x^{a}} \Big[ \max_{f^{b}} \big\{ (y^{b}-x^{b})^{\top}f^{b} - \zeta^{b\star}(f^{b}) \big\} + \max_{f^{o}} \big\{ (y^{o}-\mathbf{H}x^{o})^{\top}f^{o} - \zeta^{o\star}(f^{o}) \big\} + \min_{\mathbf{P}\in\mathcal{O}^{b,o,a}_{+}} \big\{ P_{ijk}C^{ba}_{ik} + P_{ijk}C^{oa}_{jk} + \max_{h^{b},\,h^{o},\,f^{a}} \big( h^{b}_{i}(x^{b}_{i}-P_{ijk}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k}) + h^{o}_{j}(x^{o}_{j}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k}) + f^{a}_{k}(x^{a}_{k}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j}) \big) \big\} \Big],$
(A1b) $= \min_{x^{a},\,x^{b},\,x^{o}} \Big[ \max_{f^{b},\,f^{o}} \big\{ (y^{b}-x^{b})^{\top}f^{b} - \zeta^{b\star}(f^{b}) + (y^{o}-\mathbf{H}x^{o})^{\top}f^{o} - \zeta^{o\star}(f^{o}) \big\} + \max_{h^{b},\,h^{o},\,f^{a}} \min_{\mathbf{P}\in\mathcal{O}^{b,o,a}_{+}} \big\{ P_{ijk}C^{ba}_{ik} + P_{ijk}C^{oa}_{jk} + h^{b}_{i}(x^{b}_{i}-P_{ijk}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k}) + h^{o}_{j}(x^{o}_{j}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k}) + f^{a}_{k}(x^{a}_{k}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j}) \big\} \Big],$
(A1c) $= \max_{h^{b},\,h^{o},\,f^{a},\,f^{b},\,f^{o}}\; \min_{x^{a},\,x^{b},\,x^{o},\,\mathbf{P}\in\mathcal{O}^{b,o,a}_{+}} \Big[ (y^{b}-x^{b})^{\top}f^{b} - \zeta^{b\star}(f^{b}) + (y^{o}-\mathbf{H}x^{o})^{\top}f^{o} - \zeta^{o\star}(f^{o}) + P_{ijk}C^{ba}_{ik} + P_{ijk}C^{oa}_{jk} + h^{b}_{i}(x^{b}_{i}-P_{ijk}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k}) + h^{o}_{j}(x^{o}_{j}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k}) + f^{a}_{k}(x^{a}_{k}-P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j}) \Big],$
(A1d) $= \max_{f^{b},\,f^{o}} \Big\{ f^{b\top}y^{b} + f^{o\top}y^{o} - \zeta^{b\star}(f^{b}) - \zeta^{o\star}(f^{o}) + \min_{h^{b},\,h^{o},\,f^{a},\,x^{a},\,x^{b},\,x^{o},\,\mathbf{P}\in\mathcal{O}^{b,o,a}_{+}} \Big[ (h^{b}-f^{b})^{\top}x^{b} + (h^{o}-\mathbf{H}^{\top}f^{o})^{\top}x^{o} + f^{a\top}x^{a} + P_{ijk}\big( C^{ba}_{ik} + C^{oa}_{jk} - h^{b}_{i}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k} - h^{o}_{j}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k} - f^{a}_{k}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j} \big) \Big] \Big\},$
(A1e) $= \max_{f^{b},\,f^{o}} \Big[ f^{b\top}y^{b} + f^{o\top}y^{o} - \zeta^{b\star}(f^{b}) - \zeta^{o\star}(f^{o}) + \min_{\mathbf{P}\in\mathcal{O}^{b,o,a}_{+}} P_{ijk}\big( C^{ba}_{ik} + C^{oa}_{jk} - f^{b}_{i}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k} - H_{lj}f^{o}_{l}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k} \big) \Big].$

In Eq. (A1a), the maps $\zeta^{b\star}$ and $\zeta^{o\star}$ are the Legendre–Fenchel transforms of the maps $\zeta^{b}$ and $\zeta^{o}$, respectively. From Eq. (A1d) to Eq. (A1e), taking the minimum over the observables $x^{b}$, $x^{o}$, and $x^{a}$ implies enforcing $h^{b}=f^{b}$, $h^{o}=\mathbf{H}^{\top}f^{o}$, and $f^{a}=0$. Hence, we obtain the dual problem, which only depends on the Lagrange multipliers:

(A2a) $\mathcal{L}^{\star} = \max_{(f^{b},f^{o})\in\mathcal{U}^{bo}(\mathbf{C}^{ba},\mathbf{C}^{oa},\mathbf{H})} \big\{ f^{b\top}y^{b} + f^{o\top}y^{o} - \zeta^{b\star}(f^{b}) - \zeta^{o\star}(f^{o}) \big\},$

where the symbol $\star$ refers to the dual and where the polyhedron $\mathcal{U}^{bo}(\mathbf{C}^{ba},\mathbf{C}^{oa},\mathbf{H})$ is defined by

(A2b) $\mathcal{U}^{bo}(\mathbf{C}^{ba},\mathbf{C}^{oa},\mathbf{H}) \triangleq \big\{ f^{b}\in\mathbb{R}^{N_b},\, f^{o}\in\mathbb{R}^{N_o} : \forall i,j,k,\; f^{b}_{i} + f^{o}_{l}H_{lj} \le C^{ba}_{ik} + C^{oa}_{jk} \big\}.$

The inequality constraints of the polyhedron $\mathcal{U}^{bo}$ stem from the positivity constraint $P_{ijk}\ge 0$ in Eq. (A1e). Very importantly, we have the coincidence of the minimum of the primal problem with the maximum of the dual problem, $\mathcal{L}=\mathcal{L}^{\star}$, a property called strong duality (see Sect. 5.2 in Boyd and Vandenberghe2004). Strong duality can, for instance, be achieved if both the primal and dual cost functions are convex, which is the case here.

Appendix B: Derivation of the two-step hybrid optimal transport data assimilation algorithm

Here, we derive the two-step algorithm elaborated in Sect. 2.4.2. Moreover, entropic regularisation is added to the problem.

B1 First step: simplified hybrid optimal transport data assimilation problem

The first step of the full OTDA algorithm is a simplified OTDA problem based on a single OT problem driven by the cost Cbo. The corresponding high-level primal cost function is

(B1) $\mathcal{L} = \min_{x^{b}\in\mathcal{O}^{b}_{+},\, x^{o}\in\mathcal{O}^{o}_{+}} \big\{ \zeta^{b}(y^{b}-x^{b}) + \zeta^{o}(y^{o}-\mathbf{H}x^{o}) + W_{\mathbf{C}^{bo}}(x^{b},x^{o}) \big\}.$

The associated (lower-level) primal cost function, adding entropic regularisation (ε>0), is then

(B2a) $\mathcal{L}_\varepsilon = \min_{x^{b}\in\mathcal{O}^{b}_{+},\, x^{o}\in\mathcal{O}^{o}_{+}} \Big[ \zeta^{b}(y^{b}-x^{b}) + \zeta^{o}(y^{o}-\mathbf{H}x^{o}) + \min_{\mathbf{P}\in\mathcal{U}^{bo}} \big\{ \varepsilon\,\mathrm{K}(\mathbf{P}|\boldsymbol{\nu}) + P_{ij}C^{bo}_{ij} \big\} \Big].$

In this optimisation problem, the admissible set of transference plans, i.e. the set of all 2-tensors of non-negative entries whose marginals are xb and xo, is defined by

(B2b) $\mathcal{U}^{bo} \triangleq \big\{ \mathbf{P}\in\mathcal{O}^{b,o}_{+} : \mathbf{P}\mathbf{1}^{o} = x^{b},\ \mathbf{P}^{\top}\mathbf{1}^{b} = x^{o} \big\}.$

As $x^{b}$ and $x^{o}$ are not predetermined, the prior transference plan $\boldsymbol{\nu}$ cannot be selected from $\mathcal{U}^{bo}$ a priori. Hence, the simplest choice, which we decided to implement, is to set $\nu_{ij}$ to a constant, which assumes some statistical prior independence of $x^{b}$ and $x^{o}$. A derivation of the dual problem equivalent to $\mathcal{L}_\varepsilon$ can be obtained in the exact same way as in Appendix A, although it is now less cluttered because there is only one OT to account for, instead of two. The associated Lagrangian is

(B3) $\mathcal{L}_\varepsilon = \max_{f^{b}\in\mathbb{R}^{N_b},\, f^{o}\in\mathbb{R}^{N_o}} \Big[ f^{b\top}y^{b} + f^{o\top}y^{o} - \zeta^{b\star}(f^{b}) - \zeta^{o\star}(f^{o}) + \min_{\mathbf{P}\in\mathcal{O}^{b,o}_{+}} \Big( \varepsilon \sum_{ij} \Big\{ P_{ij}\ln\frac{P_{ij}}{\nu_{ij}} - P_{ij} + \nu_{ij} \Big\} + P_{ij}\big\{ C^{bo}_{ij} - f^{b}_{i}\mathbf{1}^{o}_{j} - H_{lj}f^{o}_{l}\mathbf{1}^{b}_{i} \big\} \Big) \Big].$

Again, the maps $\zeta^{b\star}$ and $\zeta^{o\star}$ are the Legendre–Fenchel transforms of the maps $\zeta^{b}$ and $\zeta^{o}$, respectively. The variables $f^{b}$ and $f^{o}$ are Lagrange vectors; they are used to enforce the marginals of the transference plan associated with $W_{\mathbf{C}^{bo}}$. The unconstrained minimisation over $\mathbf{P}$, i.e. the inner minimisation problem in Eq. (B3), is obtained by cancelling the gradient with respect to $\mathbf{P}$, which yields

(B4) $P_{ij} = \nu_{ij}\, e^{\left(f^{b}_{i} + f^{o}_{l}H_{lj} - C^{bo}_{ij}\right)/\varepsilon}.$

Substituting this solution into minus the Lagrangian −ℒε gives the regularised dual problem

(B5a) $\mathcal{J}_\varepsilon = \min_{f^{b}\in\mathbb{R}^{N_b},\, f^{o}\in\mathbb{R}^{N_o}} J_\varepsilon(f^{b},f^{o}),$

with the associated Lagrangian

(B5b) $J_\varepsilon(f^{b},f^{o}) = \varepsilon\left(Z_\varepsilon - m_\nu\right) + \zeta^{b\star}(f^{b}) + \zeta^{o\star}(f^{o}) - f^{b\top}y^{b} - f^{o\top}y^{o},$

which relies on the partition function

(B5c) $Z_\varepsilon = \sum_{ij} P_{ij}.$

The notation $\mathcal{J}_\varepsilon$ and $J_\varepsilon$, rather than $\mathcal{L}_\varepsilon$ and $L_\varepsilon$, signifies that we work on the opposite of $\mathcal{L}_\varepsilon$ and $L_\varepsilon$ so as to obtain a dual problem to be minimised rather than maximised. Most importantly, we have, under conditions that will be satisfied in the following, the coincidence of the two minima, $\mathcal{J}_\varepsilon = -\mathcal{L}_\varepsilon$, i.e. strong duality. Assuming one can obtain a proper correspondence between the optimal $f^{b}$ and $f^{o}$ of the dual problem and $x^{b}$ and $x^{o}$ of the primal problem, this implies, once again, that the primal problem can be traded for the dual problem.

Even though the regularised optimisation problem is slightly different from the unregularised one, a difference which is controlled by the value of ε, the new dual optimisation problem is free, i.e. without constraints. It can be solved as it is, using, for instance, the L-BFGS-B minimiser (Liu and Nocedal1989). The advantage of the regularised dual formulation is twofold: (1) the dual cost function is unconstrained (free optimisation), and (2) we will trade a minimisation over Nb×No variables for a minimisation over Nb+No variables. This dual formulation can be viewed as a generalised physical-space statistical analysis system (PSAS) formalism (Courtier1997), an approach in which classical DA algebra is mostly carried out in observation space.

Once the optimal values for fb and fo are obtained, the optimal discrete Kantorovich transference plan P can be computed using Eq. (B4). As a result, as marginals of this transference plan, the solutions for the observables are

(B6) $x^{b}_{i} = P_{ij}\mathbf{1}^{o}_{j} = \sum_{j} P_{ij}, \qquad x^{o}_{j} = P_{ij}\mathbf{1}^{b}_{i} = \sum_{i} P_{ij}.$
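To fix ideas, this first step can be prototyped in a few lines of NumPy/SciPy. The sketch below assumes Gaussian error statistics, so that $\zeta^{b\star}(f)=\tfrac{1}{2}f^{\top}\mathbf{B}f$ and $\zeta^{o\star}(f)=\tfrac{1}{2}f^{\top}\mathbf{R}f$, a constant prior plan $\boldsymbol{\nu}$, and works directly in the exponential domain (a log-domain variant is preferable for very small $\varepsilon$); all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def otda_first_step(yb, yo, H, B, R, Cbo, eps):
    """Solve the regularised dual problem (B5) with L-BFGS-B, then return the
    plan of Eq. (B4) and its marginals xb, xo of Eq. (B6). Sketch only."""
    Nb, No = Cbo.shape
    nu = np.full((Nb, No), yb.sum() / (Nb * No))    # constant prior plan (illustrative scale)

    def plan(fb, fo):
        # P_ij = nu_ij exp((fb_i + (H^T fo)_j - Cbo_ij) / eps), Eq. (B4)
        return nu * np.exp((fb[:, None] + (H.T @ fo)[None, :] - Cbo) / eps)

    def cost_and_grad(f):
        fb, fo = f[:Nb], f[Nb:]
        P = plan(fb, fo)
        xb, xo = P.sum(axis=1), P.sum(axis=0)       # marginals, Eq. (B6)
        J = (eps * (P.sum() - nu.sum())             # eps (Z_eps - m_nu)
             + 0.5 * fb @ B @ fb + 0.5 * fo @ R @ fo
             - fb @ yb - fo @ yo)                   # Eq. (B5b), Gaussian case
        grad = np.concatenate([xb + B @ fb - yb, H @ xo + R @ fo - yo])
        return J, grad

    res = minimize(cost_and_grad, np.zeros(Nb + len(yo)),
                   jac=True, method="L-BFGS-B")
    fb, fo = res.x[:Nb], res.x[Nb:]
    P = plan(fb, fo)
    return P.sum(axis=1), P.sum(axis=0), P          # xb, xo, transference plan
```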

B2 Second step: Wasserstein barycentre

Now that we have obtained the observables xb and xo via Eq. (B6), we would like to compute their W-barycentre. The joint mass m of these observables can be computed as follows:

(B7) $m = m\!\left(x^{b}\right) = m\!\left(x^{o}\right).$

The high-level primal cost function of this W-barycentre problem is

(B8) $\mathcal{J}^{w} = \min_{x^{a}\in\mathcal{O}^{N_a}_{+}} \big\{ W_{\mathbf{C}^{ba}}(x^{b},x^{a}) + W_{\mathbf{C}^{oa}}(x^{o},x^{a}) \big\}.$

We have found and tested several ways to solve this problem. One way is to compute the McCann interpolant. This is theoretically elegant, but Eq. (26) does not leverage a regularisation of the W-barycentre problem. Instead, the approach reported here is to use the dual optimisation problem, in conjunction with entropic regularisation at finite $\varepsilon>0$. We leverage our knowledge of the mass $m$ resulting from the first step of the algorithm by enforcing this mass in the cost function, $m(\mathbf{P})=m$. This seems redundant, but it actually yields, by construction and very naturally, a numerically efficient algorithm comparable to the ad hoc log-domain scheme proposed in Sect. 4.4 of Peyré and Cuturi (2019).

Again, one way (although not the only way) to write the primal problem is to use a gluing transference plan, a 3-tensor whose marginals are xb, xo, and xa:

(B9a) $\mathcal{L}_\varepsilon = \min_{x^{a}\in\mathcal{O}^{N_a}_{+},\; \mathbf{P}\in\mathcal{U}^{boa}(x^{a})} \big\{ \mathbf{P}\odot\mathbf{C}^{boa} + \varepsilon\,\mathrm{K}(\mathbf{P}|\boldsymbol{\nu}) + f^{b\top}x^{b} + f^{o\top}x^{o} \big\},$

where $[\mathbf{C}^{boa}]_{ijk}=C^{ba}_{ik}+C^{oa}_{jk}$, the binary operator $\odot$ denotes the contraction of tensors, and

(B9b) $\mathcal{U}^{boa}(x^{a}) \triangleq \big\{ \mathbf{P}\in\mathcal{O}^{b,o,a}_{+} : \forall i,\ P_{ijk}\mathbf{1}^{o}_{j}\mathbf{1}^{a}_{k} = x^{b}_{i},\ \forall j,\ P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{a}_{k} = x^{o}_{j},\ \forall k,\ P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j} = x^{a}_{k} \big\}.$

The 3-tensor $\boldsymbol{\nu}$ is chosen to be $\nu_{ijk}=x^{b}_{i}x^{o}_{j}/(mN_a)$, which is uniform in $k$ and for which $m(\boldsymbol{\nu})=m$. The resulting dual problem is

(B10a) $\mathcal{J} = \min_{f^{b}\in\mathbb{R}^{N_b},\, f^{o}\in\mathbb{R}^{N_o}} J(f^{b},f^{o}),$

where the associated Lagrangian is

(B10b) $J(f^{b},f^{o}) = \varepsilon\left( m\ln\frac{Z_\varepsilon}{m} + m - m_\nu \right) - f^{b\top}x^{b} - f^{o\top}x^{o},$

with the partition function

(B10c) $Z_\varepsilon = \sum_{ijk} \nu_{ijk}\, e^{\left(f^{b}_{i} + f^{o}_{l}H_{lj} - C^{ba}_{ik} - C^{oa}_{jk}\right)/\varepsilon}.$

This partition function is elegant but impractical because, with high dimensionality, a 3-tensor might be too large to store and compute with. However, the partition function in Eq. (B10c) can be simplified by noticing that

(B11) $Z_\varepsilon = \sum_{ij} \nu_{ij}\, e^{\left(f^{b}_{i} + f^{o}_{l}H_{lj} - C^{bo}_{ij}\right)/\varepsilon},$

where we introduced the effective cost metric

(B12) $C^{bo}_{ij} = -\varepsilon \ln \sum_{k} \frac{\nu_{ijk}}{\nu_{ij}}\, e^{-\left(C^{ba}_{ik} + C^{oa}_{jk}\right)/\varepsilon},$

which is the regularised cost – known in statistics and machine learning as a soft-minimum (log-sum-exp) transform – of Eq. (22c). The 2-tensor $\nu_{ij}$ plays the same role as in the first step of the algorithm; we choose it as $\nu_{ij}=x^{b}_{i}x^{o}_{j}/m$, for which $m(\boldsymbol{\nu})=m$. The dual problem now only involves 2-tensors and becomes numerically more efficient. Given the optimal $f^{b}$ and $f^{o}$, the (glued) optimal transference plan $\mathbf{P}^{boa}$ is formally given by

(B13) $P^{boa}_{ijk} = \frac{\nu_{ijk}}{Z_\varepsilon}\, e^{\left(f^{b}_{i} + f^{o}_{l}H_{lj} - C^{ba}_{ik} - C^{oa}_{jk}\right)/\varepsilon}.$

The W-barycentre xa is then given as a marginal of Pboa:

(B14a) $x^{a}_{k} = P_{ijk}\mathbf{1}^{b}_{i}\mathbf{1}^{o}_{j}$ (B14b) $= \frac{1}{Z_\varepsilon}\sum_{ij} \nu_{ijk}\, e^{\left(f^{b}_{i} + f^{o}_{l}H_{lj} - C^{ba}_{ik} - C^{oa}_{jk}\right)/\varepsilon}.$

Because of the normalisation of the transference plan to $m$, the entropic regularisation exhibits an $\varepsilon m\ln Z_\varepsilon$ term instead of $\varepsilon Z_\varepsilon$. This systematically enforces normalisation in the computation of the gradients, as well as in the course of the numerical optimisation of the dual cost function, de facto working in the log domain. We experienced more stable computations and the ability to reach smaller $\varepsilon$, compared with the case without normalisation. This completes the solution of the hybrid OTDA problem through the two-step algorithm.
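A compact sketch of this second step, in the log domain and with the separable priors $\nu_{ij}=x^{b}_{i}x^{o}_{j}/m$ and $\nu_{ijk}=\nu_{ij}/N_a$ chosen above, is given below; the final rescaling to the joint mass $m$, the naive 3-tensor broadcast, and all names are illustrative conventions rather than the paper's implementation:

```python
import numpy as np
from scipy.special import logsumexp

def otda_second_step(fb, fo, H, xb, xo, Cba, Coa, eps):
    """Effective cost of Eq. (B12) and W-barycentre marginal of Eq. (B14b).

    fb, fo: optimal dual variables of the barycentre problem; H enters as in
    Eqs. (B13)-(B14b) (take H = identity if fo already lives on the observable grid).
    """
    m = xb.sum()
    Na = Cba.shape[1]

    # Effective cost, Eq. (B12): soft minimum over k since nu_ijk / nu_ij = 1 / N_a.
    S = Cba[:, None, :] + Coa[None, :, :]                    # shape (Nb, No, Na)
    Cbo = eps * np.log(Na) - eps * logsumexp(-S / eps, axis=2)

    # Log of the unnormalised glued plan, Eq. (B13).
    log_nu = (np.log(xb)[:, None, None] + np.log(xo)[None, :, None]
              - np.log(m) - np.log(Na))
    log_P = log_nu + (fb[:, None, None] + (H.T @ fo)[None, :, None] - S) / eps
    log_Z = logsumexp(log_P)                                 # partition function, Eq. (B10c)

    # Barycentre marginal, Eq. (B14b), rescaled to the joint mass m.
    xa = m * np.exp(logsumexp(log_P, axis=(0, 1)) - log_Z)
    return Cbo, xa
```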

Code availability

The products of this paper are exclusively optimisation problems and methods to solve them; their implementation (code) used in the illustrative sections relies on freely available software to solve the optimisation problems, mainly L-BFGS-B and its implementation in SciPy (https://github.com/scipy/scipy, SciPy2024) and the Python Optimal Transport library and its implementation (https://github.com/PythonOT, Optimal Transport2024).

Data availability

No data sets were used in this article.

Author contributions

MB, PJV, and AF developed the methodology. MB implemented the numerics. MB wrote the manuscript. MB, PJV, AF, JDLB, and YR revised the manuscript.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

The authors would like to thank Olivier Talagrand (the executive and handling editor) and the two anonymous reviewers for their remarks and suggestions that helped shape the paper. This research has been supported by the ANR-ARGONAUT (PollutAnt and gReenhouse Gases emissiOns moNitoring from spAce at high ResolUTion) national research project (grant no. ANR-19-CE01-0007). CEREA is a member of Institut Pierre-Simon Laplace (IPSL).

Financial support

This research has been supported by the Agence Nationale de la Recherche (grant no. ANR-19-CE01-0007).

Review statement

This paper was edited by Olivier Talagrand and reviewed by two anonymous referees.

References

Amodei, M. and Stein, J.: Deterministic and fuzzy verification methods for a hierarchy of numerical models, Meteorol. Appl., 16, 191–203, https://doi.org/10.1002/met.101, 2009. a, b

Asch, M., Bocquet, M., and Nodet, M.: Data Assimilation: Methods, Algorithms, and Applications, Fundamentals of Algorithms, SIAM, Philadelphia, ISBN 978-1-611974-53-9, https://doi.org/10.1137/1.9781611974546, 2016. a, b

Bocquet, M.: Towards optimal choices of control space representation for geophysical data assimilation, Mon. Weather Rev., 137, 2331–2348, https://doi.org/10.1175/2009MWR2789.1, 2009. a

Bocquet, M., Wu, L., and Chevallier, F.: Bayesian design of control space for optimal assimilation of observations. I: Consistent multiscale formalism, Q. J. Roy. Meteor. Soc., 137, 1340–1356, https://doi.org/10.1002/qj.837, 2011. a

Boyd, S. P. and Vandenberghe, L.: Convex optimization, Cambridge university press, ISBN 978-0521833783, 2004. a, b

Briggs, W. M. and Levine, R. A.: Wavelets and field forecast verification, Mon. Weather Rev., 125, 1329–1341, https://doi.org/10.1175/1520-0493(1997)125<1329:WAFFV>2.0.CO;2, 1997. a

Carrassi, A., Bocquet, M., Bertino, L., and Evensen, G.: Data Assimilation in the Geosciences: An overview on methods, issues, and perspectives, WIREs Clim. Change, 9, e535, https://doi.org/10.1002/wcc.535, 2018. a

Chizat, L., Peyré, G., Schmitzer, B., and Vialard, F.-X.: Scaling algorithms for unbalanced optimal transport problems, Math. Comput., 87, 2563–2609, https://doi.org/10.1090/mcom/3303, 2018. a

Courtier, P.: Dual formulation of four-dimensional variational assimilation, Q. J. Roy. Meteor. Soc., 123, 2449–2461, https://doi.org/10.1002/qj.49712354414, 1997. a, b

Daley, R.: Atmospheric Data Analysis, Cambridge University Press, New-York, ISBN 9780521458252, 1991. a

Davis, C., Brown, B., and Bullock, R.: Object-based verification of precipitation forecasts. Part I: Methodology and application to mesoscale rain areas, Mon. Weather Rev., 134, 1772–1784, https://doi.org/10.1175/MWR3146.1, 2006a. a

Davis, C., Brown, B., and Bullock, R.: Object-based verification of precipitation forecasts. Part II: Application to convective rain systems, Mon. Weather Rev., 134, 1785–1795, https://doi.org/10.1175/MWR3145.1, 2006b. a

Duc, L. and Sawada, Y.: Geometry of rainfall ensemble means: from arithmetic averages to Gaussian-Hellinger barycenters in unbalanced optimal transport, J. Meteor. Soc. Jpn., 102, 35–47, https://doi.org/10.2151/jmsj.2024-003, 2024. a

Ebert, E. E.: Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework, Meteorol. Appl., 15, 51–64, https://doi.org/10.1002/met.25, 2008. a

El Moselhy, T. A. and Marzouk, Y. M.: Bayesian inference with optimal maps, J. Comp. Phys., 231, 7815–7850, https://doi.org/10.1016/j.jcp.2012.07.022, 2012. a

Evensen, G., Vossepoel, F. C., and van Leeuwen, P. J.: Data Assimilation Fundamentals: A Unified Formulation of the State and Parameter Estimation Problem, Springer Textbooks in Earth Sciences, Geography and Environment, Springer Cham, ISBN 978-3-030-96708-6, https://doi.org/10.1007/978-3-030-96709-3, 2022. a

Farchi, A. and Bocquet, M.: Review article: Comparison of local particle filters and new implementations, Nonlin. Processes Geophys., 25, 765–807, https://doi.org/10.5194/npg-25-765-2018, 2018. a

Farchi, A., Bocquet, M., Roustan, Y., Mathieu, A., and Quérel, A.: Using the Wasserstein distance to compare fields of pollutants: application to the radionuclide atmospheric dispersion of the Fukushima-Daiichi accident, Tellus B, 68, 31682, https://doi.org/10.3402/tellusb.v68.31682, 2016. a, b

Feyeux, N.: Transport optimal pour l'assimilation de données images, Ph.D. thesis, Université Grenoble Alpes, https://inria.hal.science/tel-01480695 (last access: 7 July 2024), 2016. a, b, c, d, e, f, g, h, i

Feyeux, N., Vidard, A., and Nodet, M.: Optimal transport for variational data assimilation, Nonlin. Processes Geophys., 25, 55–66, https://doi.org/10.5194/npg-25-55-2018, 2018. a, b, c, d, e

Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., K., F., Fournier, N., Gautheron, L., Gayraud, N. T. H., Janati, H., Rakotomamonjy, A., Redko, I., Rolet, A., Schutz, A., Seguy, V., Sutherland, D. J., Tavenard, R., Tong, A., and Vayer, T.: POT: Python Optimal Transport, J. Mach. Learn. Res., 22, 1–8, http://jmlr.org/papers/v22/20-451.html (last access: 7 July 2024), 2021. a

Gangbo, W. and McCann, R. J.: The geometry of optimal transportation, Acta Math., 177, 113–161, https://doi.org/10.1007/BF02392620, 1996. a

Gilleland, E., Ahijevych, D. A., Brown, B. G., and Ebert, E. E.: Verifying forecasts spatially, B. Am. Meteorol. Soc., 91, 1365–1373, https://doi.org/10.1175/2010BAMS2819.1, 2010a. a

Gilleland, E., Lindström, J., and Lindgren, F.: Analyzing the image warp forecast verification method on precipitation fields from the ICP, Weather Forecast., 25, 1249–1262, https://doi.org/10.1175/2010WAF2222365.1, 2010b. a

Hoffman, R. N. and Grassotti, C.: A Technique for Assimilating SSM/I Observations of Marine Atmospheric Storms: Tests with ECMWF Analyses, J. Appl. Meteorol. Clim., 35, 1177–1188, https://doi.org/10.1175/1520-0450(1996)035<1177:ATFASO>2.0.CO;2, 1996. a

Hoffman, R. N., Liu, Z., Louis, J.-F., and Grassoti, C.: Distortion representation of forecast errors, Mon. Weather Rev., 123, 2758–2770, https://doi.org/10.1175/1520-0493(1995)123<2758:DROFE>2.0.CO;2, 1995. a, b

Janjić, T., Bormann, N., Bocquet, M., Carton, J. A., Cohn, S. E., Dance, S. L., Losa, S. N., Nichols, N. K., Potthast, R., Waller, J. A., and Weston, P.: On the representation error in data assimilation, Q. J. Roy. Meteor. Soc., 144, 1257–1278, https://doi.org/10.1002/qj.3130, 2018. a

Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, Cambridge, ISBN 9780521796293, 2003. a

Keil, C. and Craig, G. C.: A displacement and amplitude score employing an optical flow technique, Weather Forecast., 24, 1297–1308, https://doi.org/10.1175/2009WAF2222247.1, 2009. a

Lack, S. A., Limpert, G. L., and Fox, N. I.: An object-oriented multiscale verification scheme, Weather Forecast., 25, 79–92, https://doi.org/10.1175/2009WAF2222245.1, 2010. a

Le Coz, C., Tantet, A., Flamary, R., and Plougonven, R.: Optimal transport for the multi-model combination of sub-seasonal ensemble forecasts, EGU General Assembly 2023, Vienna, Austria, 24–28 Apr 2023, EGU23-13445, https://doi.org/10.5194/egusphere-egu23-13445, 2023. a

Liu, D. C. and Nocedal, J.: On the limited memory BFGS method for large scale optimization, Math. Programm., 45, 503–528, https://doi.org/10.1007/BF01589116, 1989. a, b

Lledó, L., Skok, G., and Haiden, T.: Estimating location errors in precipitation forecasts with the Wasserstein and Attribution distances, EMS Annual Meeting 2023, Bratislava, Slovakia, 4–8 Sep 2023, EMS2023-602, https://doi.org/10.5194/ems2023-602, 2023. a

Marzouk, Y., Moselhy, T., Parno, M., and Spantini, A.: An introduction to sampling via measure transport, in: Handbook of Uncertainty Quantification, edited by: Ghanem, R., Higdon, D., and Owhadi, H., chap. 23, Springer International Publishing, Cham, 785–825, https://doi.org/10.1007/978-3-319-12385-1_23, 2017. a

Monge, G.: Mémoire sur la théorie des déblais et des remblais, in: Histoire de l'Académie Royale des Sciences de Paris, 666–704, 1781. a

Necker, T., Wolfgruber, L., Kugler, L., Weissmann, M., Dorninger, M., and Serafin, S.: The fractions skill score for ensemble forecast verification, Authorea [preprint], https://doi.org/10.22541/au.169169008.89657659/v1, 2023. a

Ning, L., Carli, F. P., Ebtehaj, A. M., Foufoula-Georgiou, E., and Georgiou, T. T.: Coping with model error in variational data assimilation using optimal mass transport, Water Resour. Res., 50, 5817–5830, https://doi.org/10.1002/2013WR014966, 2014. a, b

Oliver, D. S.: Minimization for conditional simulation: Relationship to optimal transport, J. Comp. Phys., 265, 1–15, https://doi.org/10.1016/j.jcp.2014.01.048, 2014. a

Optimal Transport: Github [code], https://github.com/PythonOT, last access: 7 July 2024. a

Peyré, G. and Cuturi, M.: Computational Optimal Transport: With Applications to Data Science, Foundations and Trends in Machine Learning, 11, 355–607, https://doi.org/10.1561/2200000073, 2019. a, b, c, d, e, f, g, h

Plu, M.: A variational formulation for translation and assimilation of coherent structures, Nonlin. Processes Geophys., 20, 793–801, https://doi.org/10.5194/npg-20-793-2013, 2013. a

Ravela, S., Emanuel, K., and McLaughlin, D.: Data assimilation by field alignment, Physica D, 230, 127–145, https://doi.org/10.1016/j.physd.2006.09.035, 2007. a

Rodgers, C. D.: Inverse methods for atmospheric sounding, vol. 2, World Scientific, Series on Atmospheric, Oceanic and Planetary Physics, ISBN 978-981-02-2740-1, https://doi.org/10.1142/3171, 2000. a, b

SciPy: SciPy library main repository, Github [code], https://github.com/scipy/scipy, last access: 7 July 2024. a

Sinkhorn, R.: A relationship between arbitrary positive matrices and doubly stochastic matrices, Ann. Math. Stat., 35, 876–879, 1964. a

Skok, G.: Precipitation attribution distance, Atmos. Res., 295, 106998, https://doi.org/10.1016/j.atmosres.2023.106998, 2023. a

Talagrand, O.: Assimilation of Observations, an Introduction, J. Meteor. Soc. Jpn., 75, 191–209, https://doi.org/10.2151/jmsj1965.75.1B_191, 1997. a

Tamang, S. K., Ebtehaj, A., Zou, D., and Lerman, G.: Regularized variational data assimilation for bias treatment using the Wasserstein metric, Q. J. Roy. Meteor. Soc., 146, 2332–2346, https://doi.org/10.1002/qj.3794, 2020. a

Tamang, S. K., Ebtehaj, A., van Leeuwen, P. J., Zou, D., and Lerman, G.: Ensemble Riemannian data assimilation over the Wasserstein space, Nonlin. Processes Geophys., 28, 295–309, https://doi.org/10.5194/npg-28-295-2021, 2021. a

Tamang, S. K., Ebtehaj, A., van Leeuwen, P. J., Lerman, G., and Foufoula-Georgiou, E.: Ensemble Riemannian data assimilation: towards large-scale dynamical systems, Nonlin. Processes Geophys., 29, 77–92, https://doi.org/10.5194/npg-29-77-2022, 2022. a

Vanderbecken, P. J., Dumont Le Brazidec, J., Farchi, A., Bocquet, M., Roustan, Y., Potier, É., and Broquet, G.: Accounting for meteorological biases in simulated plumes using smarter metrics, Atmos. Meas. Tech., 16, 1745–1766, https://doi.org/10.5194/amt-16-1745-2023, 2023. a, b

Villani, C.: Topics in Optimal Transportation, vol. 58 of Graduate Studies in Mathematics, American Mathematical Society, Providence, Rhode Island, ISBN 9780821833124, 2003. a

Villani, C.: Optimal Transport: Old and New, vol. 338 of Die Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, Berlin Heidelberg, ISBN 978-3-540-71049-3, 2009. a, b

Wernli, H., Paulat, M., Hagen, M., and Frei, C.: SAL – A Novel Quality Measure for the Verification of Quantitative Precipitation Forecasts, Mon. Weather Rev., 136, 4470–4487, https://doi.org/10.1175/2008MWR2415.1, 2008. a, b

Ying, Y.: A Multiscale Alignment Method for Ensemble Filtering with Displacement Errors, Mon. Weather Rev., 147, 4553–4565, https://doi.org/10.1175/MWR-D-19-0170.1, 2019. a

Ying, Y., Anderson, J. L., and Bertino, L.: Improving Vortex Position Accuracy with a New Multiscale Alignment Ensemble Filter, Mon. Weather Rev., 151, 1387–1405, https://doi.org/10.1175/MWR-D-22-0140.1, 2023. a

Zhou, W., Bovik, A. C., Sheikh, H. R., and Simoncelli, E.: Image quality assessment: from error visibility to structural similarity, IEEE T. Image Process., 13, 600–612, https://doi.org/10.1109/TIP.2003.819861, 2004. a

1

The notation $y^{b}$ and $y^{o}$ is at variance with the more familiar $x^{b}$ and $\mathbf{y}$ notation of DA, respectively. However, this change will prove very useful in the following; it follows the idea that the full information vector is $\mathbf{y}=[(y^{b})^{\top},(y^{o})^{\top}]^{\top}$, whose components may benefit from homogeneous notation (Talagrand1997).