Physically constrained covariance inflation from location uncertainty

Zhen, Yicun; Resseguier, Valentin; Chapron, Bertrand

doi:https://doi.org/10.5194/npg-30-237-2023

Articles | Volume 30, issue 2

https://doi.org/10.5194/npg-30-237-2023

Articles | Volume 30, issue 2

Research article

29 Jun 2023

Research article |

| 29 Jun 2023

Physically constrained covariance inflation from location uncertainty

Yicun Zhen, Valentin Resseguier, and Bertrand Chapron

Abstract

Motivated by the concept of “location uncertainty”, initially introduced in Mémin (2014), a scheme is sought to perturb the “location” of a state variable at every forecast time step. Further considering Brenier's theorem (Brenier, 1991), asserting that the difference of two positive density fields on the same domain can be represented by a transportation map, we demonstrate that the perturbations consistently define a stochastic partial differential equation (SPDE) from the original PDE. It ensues that certain quantities, up to the user, are conserved at every time step. Remarkably, derivations following both the SALT (stochastic advection by Lie transport; Holm, 2015) and LU (location uncertainty; Mémin, 2014; Resseguier et al., 2017 a) settings can be recovered from this perturbation scheme. Still, it offers broader applicability since it does not explicitly rely on Lagrangian mechanics or Newton's laws of force. For illustration, a stochastic version of the thermal shallow water equation is presented.

Download & links

Article (PDF, 490 KB)

Download & links

How to cite.

Received: 06 Mar 2023 – Discussion started: 09 Mar 2023 – Revised: 13 May 2023 – Accepted: 23 May 2023 – Published: 29 Jun 2023

1 Introduction

Data assimilation is meant to extract information from measurements to improve the state estimate. Kalman-filter-based and particle-filter-based methods are now commonly used for academic studies and operational forecasts. For both methods, the estimate of a state variable and the uncertainty quantification of the estimate of a state variable are repeated at each data assimilation cycle. In the classical Kalman filter, this uncertainty is represented by a covariance matrix. In Monte Carlo-based methods (i.e., the ensemble Kalman filters and particle filters, etc.), it is represented by the spread of the ensemble members or particles. The uncertainty of the state estimate is further part of the input for the next data assimilation cycle. Frequently observed, the uncertainty can be underestimated in nonlinear numerical experiments when there is no model noise (Schlee et al., 1966; Harlim and Majda, 2010; Franzke et al., 2015). As a consequence, the state estimate in the subsequent time steps may not be efficiently adjusted by the physical measurements: the system is over-confident about its current state estimate. This phenomenon is usually referred to as filter divergence, possibly associated with the “curse of dimensionality” (see for instance Daum and Huang, 2003).

To address the latter issue, “covariance localization” has been developed for both Kalman-filter-based methods and particle filters (Houtekamer and Mitchell, 2001; Poterjoy, 2016). To further mitigate filter divergence, a practical strategy is to inflate the uncertainty estimate at each forecast time step or each data assimilation cycle (Anderson, 2007; Tibshirani and Knight, 1999; Li et al., 2009; Kotsuki et al., 2017; Ying and Zhang, 2015; Miyoshi, 2011; Raanes et al., 2019; Zhen and Harlim, 2015). For geophysical applications, the uncertainty is then often inflated by rescaling the ensemble covariance in order to match bias and variance. A natural alternative is the addition of noise in the dynamical equations.

In the context of ensemble-/particle-based methods, the uncertainty is usually inflated by artificially perturbing each ensemble member/particle. We refer the reader to Resseguier et al. (2021) for a review on the subject. It is then a natural question to ask whether there is a mathematical principle to guide this uncertainty inflation.

In the fluid dynamics community, random forcings are not introduced for inflation but to mimic the intermittent backscattering of energy from small scales toward large scales. Among those approaches, we may mention the stochastic Lagrangian models (Pope, 1994) and the Eulerian Gaussian backscatterings of the EDQNM (eddy damped quasi-normal Markovian; Orszag, 1970; Leith, 1971) model. Additive noise models, like the linear inverse models (Penland and Sardeshmukh, 1995), have then also been proposed for filtering purposes and have been thoroughly reviewed by Tandeo et al. (2020). Most methods mainly focus on comparing the estimated uncertainty and the statistics of the innovation process but ignore other mathematical/physical aspects (for instance, conservation laws). Other empirical approaches, referred to as the SPPT (stochastically perturbed parametrization tendency; Buizza et al., 1999) scheme and the SKEBS (stochastic kinetic energy backscatter; Berner et al., 2009) scheme, introduce multiplicative noise, with success in operational weather and climate forecast centers (Franzke et al., 2015). Still many drawbacks have been reported, above all violations of conservation laws (Reynolds et al., 2016; Leutbechner et al., 2016). Recently, the operational ocean circulation model NEMO has also been randomized (e.g. Leroux et al., 2022) but, again, without consideration of conservation.

Several authors proposed schemes specifically to enforce energy conservation or at least a given energy budget (e.g. Sapsis and Majda, 2013; Gugole and Franzke, 2019; Resseguier et al., 2021). To better constrain non-Gaussian schemes, many authors rely on physics and possibly on time-scale separation. Introduced by Hasselmann (1976), it is generally associated with the rigorous theories of averaging and homogenization. Majda et al. (1999) decomposed the state variable into slow-varying modes x_j and fast-varying modes y_j. The authors demonstrated that the interaction term between x_j and y_j, in the equation for x_j, can be modeled as a stochastic process solely in terms of x_j's, as the ratio of the timescales of x_j and y_j tends to 0. Nevertheless, homogenization methods, like Majda et al. (1999), may also lead to violation of energy conservation, even though some workarounds exist (Frank and Gottwald, 2013; Jain et al., 2014).

In Brzeźniak et al. (1991), later modified in Mikulevicius and Rozovskii (2004), Flandoli (2011), Mémin (2014), and Resseguier et al. (2017 a, 2021), preservation of kinetic energy is specifically emphasized. The true velocity of an incompressible flow is decomposed into a regular component and a turbulent one and the latter modeled by a stochastic noise. Mikulevicius and Rozovskii (2004) and Mémin (2014) further derived stochastic Navier–Stokes equations. For these two approaches, the large-scale advecting velocity differs, induced by different regularization of Newton' second law. Following another path, considering Hamilton’s principle with a stochastic advection constraint on Lagrangian fluid trajectories, Holm (2015) also proposed a consistent stochastic setting, i.e., stochastic advection by Lie transport (SALT). In particular, this derivation preserves Kelvin's circulation. Similarities and differences between these different stochastic frameworks are discussed in Resseguier et al. (2020).

From another perspective, the classical optimal transport theory suggests that the difference of two smooth positive density fields (ρ₁ and ρ₂) on a bounded domain Ω can be described by a transportation map: $T : Ω \to Ω$ . More specifically, there exists a diffeomorphism T of Ω to transform ρ₁ to ρ₂ under the diffeomorphism T with a minimal cost. Broadly speaking, T can be interpreted as how much ρ₂ differs from ρ₁, and T operates as a location correction. Indeed, starting from the same initial condition ρ(t), suppose that $ρ_{1} = ρ^{model} (t + Δ t)$ is the model forecast and $ρ_{2} = ρ (t + Δ t)$ is the true forecast. The additional uncertainty of ρ₁ due to model error can then be represented by a random T. It further suggests that the inflation of uncertainty can be achieved by casting a random T on each ensemble member/particle.

Motivated by such an optimal transport perspective and the concept of “location uncertainty”, proposed in Mémin (2014), a new strategy can thus seek to design a well-constrained “location perturbation” of the state variable.

Specifically, the idea of covariance inflation can be informally generalized to physical fields that are not always positive, i.e., physical fields other than the density field. Mathematically, a density field ρ is naturally associated with a differential n form θ_ρ, where n=dimΩ. The statement “ρ₁ transforms to ρ₂ under the diffeomorphism T” is equivalent to the mathematical relation $θ_{ρ_{1}} = T^{*} θ_{ρ_{2}}$ , where T^*, acting on all differential forms, is the pull-back operator induced by T or, equivalently, $θ_{ρ_{2}} = (T^{- 1})^{*} θ_{ρ_{1}}$ . Therefore, a random T (or equivalently, T⁻¹) could induce a perturbation of any differential k form.

To implement a physically constrained perturbation scheme, the state variable S under consideration must then be associated with some differential form θ, i.e., construct a one-to-one correspondence between snapshots of S and snapshots of θ. Note that this can be generalized to other types of tensor fields.

It will be demonstrated (Sect. 5) that it is indeed sometimes helpful to choose θ to be a contravariant tensor field other than differential forms. Yet, it must be stressed that associating the state variable S with a differential form θ is a key important step.

Correspondingly, at each forecast time step, the covariance inflation should follow four steps.

Step 1: find θ(t) based on S(t).
Step 2: construct a random diffeomorphism $T : Ω \to Ω$ .
Step 3: replace θ(t) with T^*θ(t) and calculate S(t) based on the new value of θ(t).
Step 4: calculate the forecast S(t+Δt) based on the new value of S(t).

Associating S with different θ shall then be constrained by different conservation laws for the perturbation scheme. More precisely, certain physical quantities are conserved in step 3, no matter how T is constructed or realized in step 2. We emphasize that the conservation law of the perturbation scheme merely depends on the choice of θ but is independent of the dynamics of the original deterministic system. A resulting stochastic partial differential equation (SPDE) will conserve a given quantity only if both the perturbation scheme and the original deterministic system conserve that quantity. We also remark that this scheme can not conserve all the physical quantities at the same time unless additional constraints upon the parameters are imposed. Hence the users must choose by themselves which physical quantity to conserve.

In summary, the key perspective of this paper is that the displacement vector field of physical state variables should be determined by the tensor fields associated with the physical fields. The advantage of this perspective is that certain physical quantities can be conserved while applying a displacement vector field to transfer the original physical field. A direct application of this perspective is the physically constrained covariance inflation scheme proposed in this paper. When the tensor fields are positive n forms on a bounded domain that have the same total mass, Brenier's theorem shows that the “optimal” displacement vector field exists and is unique, for a given cost function. In this case, the optimality of displacement vector field is well-defined. In other cases, the issue of “optimality” together with the existence and uniqueness of the optimal displacement vector field need to be carefully explored. We reserve this for future study.

This paper is organized as follows. Section 2 is a brief introduction of optimal transport theory. In Sect. 3 we present the perturbation scheme in detail, including the motivation, the specific techniques in derivation, and several examples. In Sect. 4, the resulting perturbation scheme is then compared with the stochastic advection by Lie transport (SALT) equations (Holm, 2015) and the location uncertainty (LU) equations (Mémin, 2014). For properly chosen θ and T_t, it is demonstrated that both SALT and LU settings are recovered within the proposed framework. To illustrate our purpose, a stochastic version of the thermal shallow water equation is then derived in Sect. 5. A final conclusion and discussion is given in Sect. 6.

The conventions of notation are as follows:

The letter i only refer to the ith independent Brownian motion. The letters p, q, j, and k refer to the components if p, q, j, and k are upper indices.
For Einstein's convention on summation (applies to all indices except i and j), if p is shown in both upper and lower indices, then the summation over p automatically applies.
Summation over i, j, and p automatically applies in all equations. For instance, e_i refers to ∑_ie_i, and y_j refers to ∑_jy_j.

2 Monge's formulation of optimal transport problem and Brenier's answer

Hereafter we briefly summarize some necessary concepts and results in optimal transport theory. Let Ω be a bounded domain in a n-dimensional Euclidean space.

Definition 2.0.1 (Monge's optimal transport problem). Given the cost function $c (x, y) \geq 0$ and probability measures $μ, ν \in P (Ω)$ ,

\begin{matrix} (1) & minimize M (T) = \int_{Ω} c (x, T (x)) d μ (x) \end{matrix}

over μ measurable maps $T : Ω \to Ω$ subject to $ν = T_{no .} μ$ .

Here the probability measures μ and ν are interpreted as mass distributions with total mass equal to 1. The map T is called a transport plan which moves the mass dμ(x) at location x to location T(x), with the cost c(x,T(x)) per unit of mass. Therefore the quantity 𝕄(T) is the total cost of the transport plan T. The constraint $ν = T_{no .} μ$ is interpreted as T transporting the mass distribution μ to the mass distribution ν. In the case that T is a diffeomorphism and that both ν and μ have smooth densities, i.e., assuming that dν(x)=f(x)dⁿx and dμ(x)=g(x)dⁿx for some smooth functions f,g on Ω,

\begin{matrix} (2) & ν = T_{no .} μ ⟺ g (x) = f (T (x)) | J_{T} (x) |, \end{matrix}

where J_T(x) refers to the Jacobian matrix of T at x. If we associate ν and μ with differential n forms $θ_{ν} = f d x^{1} \land \dots \land d x^{n}$ and $θ_{μ} = g d x^{1} \land \dots \land d x^{n}$ , then

\begin{matrix} (3) & ν = T_{no .} μ ⟺ θ_{μ} = T^{*} θ_{ν} . \end{matrix}

Brenier (1991) proved the existence and uniqueness of the solution to Monge's optimal transport problem for $c (x, y) = | x - y |^{2}$ . To better illustrate how optimal transport theory motivates us, we consider the following simplified version of Brenier's theorem.

Theorem 1.

Brenier, simplified version

Let μ and ν be measures with bounded smooth density on a bounded domain Ω⊂ℝⁿ. Let $c (x, y) = | x - y |^{2}$ . Then there is a convex function $ϕ : Ω \to R$ , such that $(\nabla ϕ)_{no .} μ = ν$ . And $\nabla ϕ : x \to x + \nabla ϕ |_{x}$ , defined μ− almost everywhere, is the unique solution to Monge's optimal transport problem.

The convexity of ϕ implies that the map ∇ϕ is one to one. Broadly speaking, Brenier's theorem implies that the difference of two density fields can be represented by a transportation map T.

3 The perturbation scheme

Consider a compressible flow on a bounded domain Ω. Let ρ denote the density field. Let ρ^model(t+Δt) and ρ^true(t+Δt) be the model forecast and the true forecast starting from the same density field at time t. If we assume that the model forecast and the truth have the same total mass, Brenier's theorem says that there exists a diffeomorphism $T : Ω \to Ω$ , so that

\begin{matrix} (4) & ρ^{true} (x, t + Δ t) = ρ^{model} (T (x), t + Δ t) J_{T} (x) . \end{matrix}

Note that the transportation T hereinafter is equivalent to the mapping T⁻¹ used in the Introduction. Equation (4) can further be written in terms of differential form. Let $θ_{ρ} = ρ d x^{1} \land \dots \land d x^{n}$ , then Eq. (4) is equivalent to

\begin{matrix} (5) & T^{*} θ_{ρ}^{model} (t + Δ t) = θ_{ρ}^{true} (t + Δ t) . \end{matrix}

For general differential forms θ, it is unclear whether a diffeomorphism T always exists that satisfies Eq. (5). However, Eq. (5) provides us with a tool for covariance inflation by constructing a random T at every infinitesimal time step.

At each time step we construct a small perturbation T:

\begin{matrix} (6) & T_{t} (x) = x + a (t, x) Δ t + e_{i} (t, x) Δ η_{i} (t), \end{matrix}

where $a (t, x), e_{i} (t, x) \in R^{n}$ , $Δ η_{i} (t) \sim N (0, Δ t)$ is a random number. Essentially, T_t(x)−x can be interpreted as a “location error” caused by the model error. In Eq. (6), a(t,x)Δt refers to a systematic location error, and e_iΔη_i refers to a random location error.

Stated in the Introduction, the state variable S must first be associated with a differential form θ. Then at every time step, T_t induces a perturbation of θ(t) by $θ (t) \to T_{t}^{*} θ (t)$ . It hence induces a perturbation of the state variable S(t). A forecast is then performed based on the perturbed state. Consequently, this perturbation scheme derives an SPDE from the original PDE.

This procedure can also be generalized to other types of tensor fields. We refer to Chern et al. (1999) for a rigorous definition of the tensor fields and the wedge algebra. For instance, we may choose $θ = ρ \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}}$ , where ${\frac{\partial}{\partial x^{i}}}_{i \leq n}$ forms a global basis of the tangent field. Then T_t induces a perturbation of θ by $θ (t) \to T_{t *} θ$ , where T_t* is the push-forward operator induced by T_t. In Sect. 5, such a generalization is found useful in the example of thermal shallow water equation.

Remark 1. When θ is a mixture of covariant and contravariant tensor fields, the perturbation scheme is slightly more complicated. Assume that $T_{t} : Ω_{1} \to Ω_{2}$ is a diffeomorphism, and $θ = v \otimes ω$ , where v and ω are contravariant or covariant tensor fields respectively on Ω₂. Then $T_{t}^{*} ω$ is a covariant tensor field on Ω₁. However, T_t can not directly induce a contravariant tensor field on Ω₁. In order to get a tensor field on Ω₁, we consider $T_{t}^{- 1} : Ω_{2} \to Ω_{1}$ and apply the push-forward operator on v. In sum, we may define the perturbation to be

\begin{matrix} (7) & θ (t) \to ((T_{t}^{- 1})_{*} v) \otimes (T_{t}^{*} ω) . \end{matrix}

Appendix A derives the expression of $T_{t}^{- 1}$ directly from the expression of T_t.

3.1 Calculation of $T_{t}^{} θ$ (or T_tθ)

A rigorous mathematical definition and calculation of T_t and $T_{t}^{*}$ should be given in terms of stochastic flows of diffeomorphisms and its Lie derivatives. A brief discussion of the relationship between $T_{t}^{*}$ and the Lie derivative is given in Sect. 4.1. We further refer to Leon (2021) for a detailed definition of the Lie derivative. Yet, to rapidly assess $T_{t}^{*} θ$ (or T_t*θ), a Taylor expansion and It $\hat{o}$ 's lemma can be used.

Given coordinates $(x^{1}, \dots, x^{n})$ , when θ is a differential k form, it can be written as

\begin{matrix} (8) & θ = \sum_{i_{1} < \dots < i_{k}} f^{i_{1}, \dots, i_{k}} d x^{i_{1}} \land \dots \land d x^{i_{k}} . \end{matrix}

Then

\begin{matrix} (9) & T_{t}^{*} θ = \sum_{i_{1} < \dots < i_{k}} f^{i_{1}, \dots, i_{k}} (T_{t} (x)) T_{t}^{*} (d x^{i_{1}} \land \dots \land d x^{i_{k}}) . \end{matrix}

Given in Appendix B, a Taylor expansion and Itô's lemma are applied to expand $T_{t}^{*} θ$ , leading us to compactly write

\begin{matrix} (10) & T_{t}^{*} θ = θ + M (θ) Δ t + N_{i} (θ) Δ η_{i}, \end{matrix}

for some differential k forms ℳ(θ) and 𝒩_i(θ). Hereafter, several examples of $T_{t}^{*} θ$ are presented.

The full derivation of these examples is skipped. We further express all the terms in coordinates. For instance, we replace 〈∇f〉a with $a^{j} \partial_{x^{j}} f$ , where, by convention of notation, $a^{j} \partial_{x^{j}} f = \sum_{j} a^{j} \frac{\partial f}{\partial x^{j}}$ . Similarly, $e_{i}^{⊤} H_{f} e_{i}$ is replaced with $e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f$ .

Remark 2. When $θ = f \frac{\partial}{\partial x^{i_{1}}} \land \dots \land \frac{\partial}{\partial x^{i_{k}}}$ is a contravariant tensor field,

\begin{matrix} (11) & T_{t *} θ = f (T_{t}^{- 1} (x)) T_{t *} (\frac{\partial}{\partial x^{i_{1}}} \land \dots \land \frac{\partial}{\partial x^{i_{k}}}) . \end{matrix}

The formula for $T_{t}^{- 1}$ is derived in Appendix A. Then the expression of $f (T_{t}^{- 1} (x))$ , $T_{t *} \frac{\partial}{\partial x^{i_{1}}} \land \dots \land \frac{\partial}{\partial x^{i_{k}}}$ and T_t*θ can be derived step by step in a similar way to that in the Appendix B.

Example 3.1.1 When θ=f is a function (differential 0 form),

\begin{matrix} (12) & \begin{aligned} (T_{t}^{*} θ) & = f + (a^{j} \partial_{x^{j}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ t \\ + e_{i}^{p} \partial_{x^{p}} f Δ η_{i} . \end{aligned} \end{matrix}

Example 3.1.2 When $θ = d x^{1} \land d x^{2} \land \dots \land d x^{n}$ ,

\begin{matrix} (13) & T_{t}^{*} θ = {1 + (\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) Δ t + \partial_{x^{p}} e_{i}^{p} Δ η_{i}} θ, \end{matrix}

where $J_{i} = \partial_{x^{p}} e_{i}^{p} \partial_{x^{q}} e_{i}^{q} - \partial_{x^{p}} e_{i}^{q} \partial_{x^{q}} e_{i}^{p}$ .

Example 3.1.3 When $θ = f d x^{1} \land \dots \land d x^{n}$ ,

\begin{matrix} (14) & \begin{aligned} T_{t}^{*} θ = & {f + ((\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) f + (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \partial_{x^{p}} f \\ + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ t \\ + (\partial_{x^{p}} e_{i}^{p} f + e_{i}^{p} \partial_{x^{p}} f) Δ η_{i}} d x^{1} \land \dots \land d x^{n} . \end{aligned} \end{matrix}

Example 3.1.4 When θ=f^jdx^j (note that by the convention of notation, $f^{j} d x^{j} = \sum_{j = 1}^{n} f^{j} d x^{j}$ ),

\begin{matrix} (15) & \begin{aligned} T_{t}^{*} θ = & {f^{j} + (a^{p} \partial_{x^{p}} f^{j} + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f^{j} + \partial_{x^{j}} a^{p} f^{p} \\ + \partial_{x^{j}} e_{i}^{p} e_{i}^{q} \partial_{x^{q}} f^{p}) Δ t \\ + (e_{i}^{p} \partial_{x^{p}} f^{j} + \partial_{x^{j}} e_{i}^{p} f^{p}) Δ η_{i}} d x^{j} . \end{aligned} \end{matrix}

Example 3.1.5 When $θ = f \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}}$ ,

\begin{matrix} (16) & \begin{aligned} T_{t *} θ = & {f + ((\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) f + (- (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \\ + \partial_{x^{q}} e_{i}^{p} e_{i}^{q}) \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ t \\ + (\partial_{x^{p}} e_{i}^{p} f - e_{i}^{p} \partial_{x^{p}} f) Δ η_{i}} \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}} \end{aligned} . \end{matrix}

3.2 Derivation of the SPDE

Suppose S is the full state variable of the dynamical system:

\begin{matrix} (17) & \frac{\partial S}{\partial t} = g (S) . \end{matrix}

Let f be a component or a collection of components of S. We then associate f with a differential form θ in the perturbation scheme; i.e., there is an invertible map ℱ that maps the space of f to the space of θ, such that ℱ(f)=θ. In the examples in this paper, the corresponding map ℱ is obvious. When f refers to a scalar quantity on the domain. We can choose to associate a differential n form $θ = f d x^{1} \land \dots \land d x^{n}$ with f, as in Example 3.1.3, an n vector $θ = f \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}}$ as in Example 3.1.5, or a differential 0 form (function) θ=f. When f refers to a vector-valued function $f = (f^{1}, \dots, f^{n})$ , we can associate the differential 1 form $f = f^{1} d x^{1} + \dots + f^{n} d x^{n}$ with f. It is not hard to see that ℱ is obvious once the type of tensor field is chosen. Suppose the propagation equation for f is

\begin{matrix} (18) & d f = g^{f} (S) d t . \end{matrix}

This implies a propagation equation for θ:

\begin{matrix} (19) & d θ = g^{θ} (S) d t . \end{matrix}

The discrete-time perturbed forecast at each time step consists of the following two steps:

\begin{matrix} (20) & \tilde{θ} (t + Δ t) = θ (t) + g^{θ} (S (t)) Δ t \\ (21) & θ (t + Δ t) = T_{t}^{*} \tilde{θ} (t + Δ t), \end{matrix}

with $T_{t}^{*} \tilde{θ} (t + Δ t) = \tilde{θ} (t + Δ t) + M (\tilde{θ} (t + Δ t)) Δ t + N_{i} (\tilde{θ} (t + Δ t)) Δ η_{i} + o (Δ t)$ for some differential forms $M (\tilde{θ})$ and $N_{i} (\tilde{θ})$ .

As the physical PDE (Eq. 20) is deterministic, $‖ \tilde{θ} (t + Δ t) - θ (t) ‖$ scales in O(Δt). Indeed, there is no noise term to induce a scaling in $O (\sqrt{Δ t})$ . Therefore, it can be assumed that there exists C>0 so that $‖ M (\tilde{θ} (t + Δ t)) - M (θ (t)) ‖ < C Δ t$ and $‖ N_{i} (\tilde{θ} (t + Δ t)) - N_{i} (θ (t)) ‖ < C Δ t$ , for Δt small enough. Then

\begin{matrix} (22) & \begin{aligned} T_{t}^{*} \tilde{θ} & (t + Δ t) = \tilde{θ} (t + Δ t) + (M (θ (t)) + O (Δ t)) Δ t \\ + (N_{i} (θ (t)) + O (Δ t)) Δ η_{i} + o (Δ t) \\ = & \tilde{θ} (t + Δ t) + M (θ (t)) Δ t + N_{i} (θ (t)) Δ η_{i} + o (Δ t) . \end{aligned} \end{matrix}

Therefore,

\begin{matrix} (23) & \begin{aligned} θ (t + Δ t) & = θ (t) + g^{θ} (S (t)) Δ t + M (θ (t)) Δ t \\ + N_{i} (θ (t)) Δ η_{i} + o (Δ t) . \end{aligned} \end{matrix}

This suggests the following stochastic propagation equation for θ:

\begin{matrix} (24) & d θ = g^{θ} (S) d t + M (θ) d t + N_{i} (θ) d η_{i} . \end{matrix}

Since there is a one-to-one correspondence between θ and f, Eq. (19) also suggests a stochastic propagation equation for f, which can be written as

\begin{matrix} (25) & d f = g^{f} (S) d t + M^{f} (f) d t + N_{i}^{f} (f) d η_{i} . \end{matrix}

We denote the additional terms in Eq. (25) by

\begin{matrix} (26) & d_{s} f := M^{f} (f) d t + N_{i}^{f} (f) d η_{i} . \end{matrix}

Then Eq. (25) can be written as

\begin{matrix} (27) & d f = g^{f} (S) d t + d_{s} f . \end{matrix}

Remark 3. (d_sf is not directly related to the original dynamics). d_sf is completely determined by $T_{t}^{*} θ$ but is not directly related to the original dynamics Eq. (18). Therefore, once the expression of T in Eq. (6) and the choice of θ are determined, the perturbation term d_sf is prescribed. However, the choice of θ is up to the user and may then be related to the original dynamics.

Remark 4. In particular, there is no noise in the original dynamics (Eq. 18) which could be correlated with the noise of the resulting stochastic scheme (21). That is why the Itō lemma directly applies in the Taylor development (B4) of f and then in the Eq. (22), leading to Eq. (23) and the final SPDE. Indeed, unlike the Itō–Wentzell formula (Kunita, 1997) – a cornerstone of the LU scheme – there is no additional cross-correlation term between $T_{t}^{*}$ and $\tilde{θ} (t + Δ t)$ . The final SPDE (24) makes the link between the solution θ and the Brownian motions η_i clear. But, at a given time step t, since Eq. (18) has no noise term, $\tilde{θ} (t + Δ t)$ is correlated with the $t^{'} \mapsto η_{i} (t^{'})$ for $t^{'} < t$ only and is independent of the new Brownian increment Δη_i(t) generating T_t. Therefore, there is no cross-correlation term between $T_{t}^{*}$ and $\tilde{θ} (t + Δ t)$ .

Remark 5. For a numerical implementation of our stochastic scheme, the time integration of the SPDE (25) may require a smaller time step Δt than the time integration of the deterministic PDE (18) for two reasons. First, the available SDE (stochastic differential equation) time integration schemes are often less accurate than their deterministic counterparts. Secondly, the modified dynamics may involve additional Courant–Friedrichs–Lewy (CFL) constraints, related to for instance noise-induced diffusion.

Example 3.2.1. When θ=f, Example (3.1.1),

\begin{matrix} (28) & \begin{aligned} T_{t}^{*} θ - θ & = (a^{p} \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ t \\ + e_{i}^{p} \partial_{x^{p}} f Δ η_{i} . \end{aligned} \end{matrix}

This implies that

\begin{matrix} (29) & d_{s} f = (a^{p} \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) d t + e_{i}^{p} \partial_{x^{p}} f d η_{i} . \end{matrix}

To physically interpret this equation, we rewrite

\begin{matrix} (30) & \frac{d_{s} f}{d t} + V^{p} \partial_{x^{p}} f = \partial_{x^{p}} ((\frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f), \end{matrix}

where

\begin{matrix} (31) & V^{p} = - a^{p} + \frac{1}{2} \partial_{x^{q}} (e_{i}^{p} e_{i}^{q}) - e_{i}^{p} \frac{d η_{i}}{d t} . \end{matrix}

Terms of advection and diffusion are recognized. The matrix $\frac{1}{2} e_{i} e_{i}^{T}$ is symmetric and non-negative and represents a diffusion matrix. The pth component of the advecting velocity V^p is composed of the drift −a^p, a correction $\frac{1}{2} \partial_{x^{q}} (e_{i}^{p} e_{i}^{q})$ , and a stochastic advecting velocity $- e_{i}^{p} \frac{d η_{i}}{d t}$ .

If the original deterministic PDE (18) is an advection–diffusion equation, with advecting velocity u and diffusion coefficient D, the final SPDE to simulate (Eq. 25) is now a stochastic advection–diffusion equation, with advecting velocity u+V and diffusion matrix $D I_{d} + \frac{1}{2} e_{i} e_{i}^{T}$ :

\begin{matrix} (32) & \frac{d f}{d t} + (u^{p} + V^{p}) \partial_{x^{p}} f = \partial_{x^{p}} ((D δ_{p q} + \frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f) . \end{matrix}

This type of SPDE appears in the LU framework, detailed in Sect. 4.2.1.

Example 3.2.2. When $θ = f d x^{1} \land \dots \land d x^{n}$ , Example 3.1.3,

\begin{matrix} (33) & \begin{aligned} T_{t}^{*} & θ - θ = {((\partial_{x^{p}} a^{p} \\ + \frac{1}{2} J_{i}) f + (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ \\ + (\partial_{x^{p}} e_{i}^{p} f + e_{i}^{p} \partial_{x^{p}} f) Δ η_{i}} d x^{1} \land \dots \land d x^{n} . \end{aligned} \end{matrix}

This implies that

\begin{matrix} (34) & \begin{aligned} d_{s} f = & ((\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) f + (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \partial_{x^{p}} f \\ + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) d t \\ + (\partial_{x^{p}} e_{i}^{p} f + e_{i}^{p} \partial_{x^{p}} f) d η_{i} . \end{aligned} \end{matrix}

Rewritten, it leads to

\begin{matrix} (35) & \frac{d_{s} f}{d t} + \partial_{x^{p}} ({\tilde{V}}^{p} f) = \partial_{x^{p}} ((\frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f), \end{matrix}

where

\begin{matrix} (36) & \begin{aligned} {\tilde{V}}^{p} & = V^{p} - (e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) = - a^{p} + \frac{1}{2} (\partial_{x^{q}} e_{i}^{p} e_{i}^{q} - e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \\ - e_{i}^{p} \frac{d η_{i}}{d t} . \end{aligned} \end{matrix}

Again an advection–diffusion equation is recognized but of a different nature. Indeed, as expected for an n form, the PDE is similar to a density conservation equation. Moreover, the advecting drift is slightly different to take into account the cross-correlations between f(T_t(x)) and $T_{t}^{*} (d x^{1} \land \dots \land d x^{n})$ .

Recall, in fluid dynamics, that the Reynolds transport theorem provides an integral conservation equation for the transport of any conserved quantity within a fluid, connected to its corresponding differential equation. The Reynolds transport theorem is central to the LU setting. The present example thus already outlines a closed link between the proposed perturbation approach and the LU formulation. Accordingly, the SPDE (35) naturally appears in the LU framework, as detailed in Sect. 4.2.2.

Example 3.2.3. When θ=f^jdx^j, Example 3.1.4,

\begin{matrix} (37) & \begin{aligned} T_{t}^{*} θ - θ = & {(a^{p} \partial_{x^{p}} f^{j} + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f^{j} + \partial_{x^{j}} a^{p} f^{p} \\ + \partial_{x^{j}} e_{i}^{p} e_{i}^{q} \partial_{x^{q}} f^{p}) Δ t \\ + (e_{i}^{p} \partial_{x^{p}} f^{j} + \partial_{x^{j}} e_{i}^{p} f^{p}) Δ η_{i}} d x^{j} . \end{aligned} \end{matrix}

For each j, the coefficients of dx^j in $T_{t}^{*} θ - θ$ and those in θ can be compared, to lead to

\begin{matrix} (38) & \begin{aligned} d_{s} f^{j} = & (a^{p} \partial_{x^{p}} f^{j} + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f^{j} + \partial_{x^{j}} a^{p} f^{p} \\ + \partial_{x^{j}} e_{i}^{p} e_{i}^{q} \partial_{x^{q}} f^{p}) d t \\ + (e_{i}^{p} \partial_{x^{p}} f^{j} + \partial_{x^{j}} e_{i}^{p} f^{p}) d η_{i} . \end{aligned} \end{matrix}

Regrouping the terms for physical interpretation, it reads

\begin{matrix} (39) & \begin{aligned} \frac{d_{s} f^{j}}{d t} & + V^{p} \partial_{x^{p}} f^{j} + \partial_{x^{j}} (- a^{p} - e_{i}^{p} \frac{d η_{i}}{d t}) f^{p} \\ - \partial_{x^{j}} e_{i}^{p} e_{i}^{q} \partial_{x^{q}} f^{p} = \partial_{x^{p}} ((\frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f^{j}) . \end{aligned} \end{matrix}

Two additional terms complete the advection–diffusion term. The first one, $\partial_{x^{j}} (- a^{p} - e_{i}^{p} \frac{d η_{i}}{d t}) f^{p}$ , is reminiscent of the additional terms appearing in SALT momentum equations (Holm, 2015; Resseguier et al., 2020). The second term, $- \partial_{x^{j}} e_{i}^{p} e_{i}^{q} \partial_{x^{q}} f^{p}$ , comes from the cross-correlation in Itô notation.

Example 3.2.4. When $θ = f \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}}$ , Example 3.1.5,

\begin{matrix} (40) & \begin{aligned} T_{t *} θ - & θ = {((\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) f + (- (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \\ + \partial_{x^{q}} e_{i}^{p} e_{i}^{q}) \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) Δ t \\ + (\partial_{x^{p}} e_{i}^{p} f - e_{i}^{p} \partial_{x^{p}} f) Δ η_{i}} \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}} . \end{aligned} \end{matrix}

This implies

\begin{matrix} (41) & \begin{aligned} d_{s} f = & ((\partial_{x^{p}} a^{p} + \frac{1}{2} J_{i}) f + (- (a^{p} + e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) \\ + \partial_{x^{q}} e_{i}^{p} e_{i}^{q}) \partial_{x^{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x^{p}} \partial_{x^{q}} f) d t \\ + (\partial_{x^{p}} e_{i}^{p} f - e_{i}^{p} \partial_{x^{p}} f) d η_{i} . \end{aligned} \end{matrix}

It can then be verified that

\begin{matrix} (42) & \frac{d_{s} f}{d t} + \partial_{x^{p}} {\tilde{V}}^{p} f - {\tilde{\tilde{V}}}^{p} \partial_{x^{p}} f = \partial_{x^{p}} ((\frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f), \end{matrix}

where

\begin{matrix} (43) & {\tilde{\tilde{V}}}^{p} = {\tilde{V}}^{p} - (e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) = V^{p} - 2 (e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) . \end{matrix}

We recognize a diffusion term, $\partial_{x^{p}} ((\frac{1}{2} e_{i}^{p} e_{i}^{q}) \partial_{x^{q}} f)$ , a velocity divergence term, $\partial_{x^{p}} {\tilde{V}}^{p} f$ , and the advection term, $- {\tilde{\tilde{V}}}^{p} \partial_{x^{p}} f$ . The divergence term is comparable to one appearing in the density equation.

However, the velocity fields appearing in the divergent and advecting terms do not coincide. Indeed, they are even opposite for divergence-free noise ( $\partial_{x^{q}} e_{i}^{q} = 0$ ). This type of equation may appear uncommon but will be shown useful when applied to randomized thermal shallow water equations.

3.3 Conservation laws related to d_sf

A major advantage of the proposed perturbation scheme is to possibly prescribe θ to ensure that certain quantities are conserved. Define the discrete time version of d_sf as

\begin{matrix} (44) & Δ_{s} f = M^{f} (f) Δ t + N_{i}^{f} (f) Δ η_{i} . \end{matrix}

In general, conservation laws can be derived from the following two identities about the pull-back operator:

\begin{matrix} (45) & (T_{t}^{*} θ_{1}) \land (T_{t}^{*} θ_{2}) = T_{t}^{*} (θ_{1} \land θ_{2}) \\ (46) & d T_{t}^{*} θ = T_{t} d θ, \end{matrix}

where d refers to the differential operator acting on differential forms. Hereafter, we present how to derive the conservation laws for two particular examples.

Example 3.3.1. Suppose $θ_{1} = f d x^{1} \land \dots \land d x^{n}$ and define

\begin{matrix} (47) & {\hat{θ}}_{1} = T_{t}^{*} θ_{1} \\ (48) & \hat{f} = f + Δ_{s} f . \end{matrix}

Then ${\hat{θ}}_{1} = \hat{f} d x^{1} \land \dots \land d x^{n}$ . Therefore,

\begin{matrix} (49) & \begin{aligned} \int_{Ω} \hat{f} d x^{1} \dots d x^{n} = \int_{Ω} {\hat{θ}}_{1} = \int_{Ω} T_{t}^{*} θ_{1} = \int_{T_{t} (Ω)} θ_{1} = \int_{Ω} θ_{1} \\ = & \int_{Ω} f d x^{1} \dots d x^{n} . \end{aligned} \end{matrix}

Equation (49) implies that the total integral of f is not changed by the perturbation scheme. Next, suppose that θ₂=g is a function. Similarly, we define

\begin{matrix} (50) & {\hat{θ}}_{2} = T_{t}^{*} θ_{2} \\ (51) & \hat{g} = g + Δ_{s} g . \end{matrix}

Applying Eq. (45),

\begin{matrix} (52) & \begin{aligned} \int_{Ω} & \hat{f} \hat{g} d x^{1} \dots d x^{n} = \int_{Ω} {\hat{θ}}_{1} \land {\hat{θ}}_{2} = \int_{Ω} T_{t}^{*} (θ_{1} \land θ_{2}) \\ = \int_{T_{t} (Ω)} θ_{1} \land θ_{2} = \int_{Ω} θ_{1} \land θ_{2} = \int_{Ω} f g d x^{1} \dots . d x^{n} . \end{aligned} \end{matrix}

The total integral of fg is thus also conserved by the perturbation scheme. Similarly for any integer m≥0, fg^m is conserved by the perturbation scheme.

Example 3.3.2. Suppose n=2 and $θ = u d x + v d y$ , where $u = (u, v)$ is the velocity field. The vorticity $ω = \partial_{x} v - \partial_{y} u$ corresponds to the differential 2 form dθ:

\begin{matrix} (53) & d θ = ω d x^{1} \land d x^{2} . \end{matrix}

Define $\hat{θ} := T_{t}^{*} θ = \hat{u} d x^{1} + \hat{v} d x^{2}$ and $\hat{ω} = \partial_{x} \hat{v} - \partial_{y} \hat{u}$ . Then $d \hat{θ} = \hat{ω} d x^{1} \land d x^{2}$ , and

\begin{matrix} (54) & \begin{aligned} \int_{Ω} & \hat{ω} d x^{1} d x^{2} = \int_{Ω} d \hat{θ} = \int_{ω} d T_{t}^{*} θ = \int_{Ω} T_{t}^{*} d θ \\ = \int_{T_{t} (Ω)} d θ = \int_{Ω} ω d x^{1} d x^{2} . \end{aligned} \end{matrix}

Therefore, the vorticity is conserved by the perturbation scheme.

Example 3.3.3. Suppose n=3 and $θ = u d x + v d y + w d z$ , where $u = (u, v, w)$ is the velocity field. The vorticity $ω = (\partial_{y} w - \partial_{z} v, \partial_{z} u - \partial_{x} w, \partial_{x} v - \partial_{y} u)$ corresponds to the differential 2 form dθ:

\begin{matrix} (55) & \begin{aligned} d θ & = (\partial_{y} w - \partial_{z} v) d y \land d z + (\partial_{x} v - \partial_{y} u) d z \land d x \\ + (\partial_{x} v - \partial_{y} u) d x \land d y . \end{aligned} \end{matrix}

The helicity $Θ = u (\partial_{y} w - \partial_{z} v) + v (\partial_{x} v - \partial_{y} u) + w (\partial_{x} v - \partial_{y} u)$ corresponds to the differential 3 form:

\begin{matrix} (56) & \begin{aligned} d θ \land θ & = (u (\partial_{y} w - \partial_{z} v) + v (\partial_{x} v - \partial_{y} u) \\ + w (\partial_{x} v - \partial_{y} u)) d x \land d y \land d z . \end{aligned} \end{matrix}

Similarly, we define $\hat{Θ}$ by $d \hat{θ} \land \hat{θ} = \hat{Θ} d x \land d y \land d z$ . Then,

\begin{matrix} (57) & \begin{aligned} \int_{Ω} \hat{Θ} d x d y d z & = \int_{Ω} d \hat{θ} \land \hat{θ} = \int_{Ω} (d T_{t}^{*} θ) \land (T_{t}^{*} θ) \\ = & \int_{Ω} (T_{t}^{*} d θ) \land (T_{t}^{*} θ) = \int_{Ω} T_{t}^{*} (d θ \land θ) \\ = \int_{T_{t} (Ω)} d θ \land θ = \int_{Ω} Θ d x d y d z . \end{aligned} \end{matrix}

Hence, in this case, the total amount of helicity is conserved.

Example 3.3.4. Suppose that $θ_{1} = f d x^{1} \land \dots \land d x^{n}$ and that $θ_{2} = g \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}}$ . There exists a pairing 〈〉 for the differential n forms and the contravariant n vectors; i.e., 〈θ₁θ₂〉=fg is a function on Ω. Define

\begin{matrix} (58) & {\hat{θ}}_{1} = T_{t}^{*} θ_{1} = \hat{f} d x^{1} \land \dots \land d x^{n} \\ (59) & {\hat{θ}}_{2} = (T_{t}^{- 1})_{*} θ_{2} = \hat{g} \frac{\partial}{\partial x^{1}} \land \dots \land \frac{\partial}{\partial x^{n}} . \end{matrix}

Then we have

\begin{matrix} (60) & \hat{f} \hat{g} (T_{t}^{- 1} (x)) = 〈 {\hat{θ}}_{1} {\hat{θ}}_{2} 〉 |_{T_{t}^{- 1} (x)} = 〈 θ_{1} θ_{2} 〉 |_{x} = f g (x), \end{matrix}

and

\begin{matrix} (61) & \begin{aligned} \int_{Ω} {\hat{f}}^{2} \hat{g} d x^{1} \dots d x^{n} & = \int_{Ω} 〈 {\hat{θ}}_{1} {\hat{θ}}_{2} 〉 θ_{1} = \int_{Ω} 〈 θ_{1} θ_{2} 〉 θ_{1} \\ = \int_{Ω} f^{2} g d x^{1} \dots d x^{n} . \end{aligned} \end{matrix}

Remark 6 (The conservation law of the perturbation scheme is independent of the conservation law of the original dynamical system). The derivation of Eqs. (49), (52), (54), (57), and (61) is based on the generic properties of the pull-back and push-forward operator of tensor fields. Since the choice of θ is not directly determined by the dynamical system, the conservation law of the perturbation scheme is independent of the original dynamical system. Recall that the perturbed forecast consists of two steps: Eqs. (20) and (21). The conservation law of the perturbation scheme implies that certain quantities are conserved in the second step. On the other hand, the original dynamical system (Eq. 20) might enjoy some other conservation law. If a quantity is conserved by both the original dynamical system and the perturbation scheme, then this quantity must be conserved by the final stochastic PDE. If a quantity is conserved by only one of Eqs. (20) and (21), then it can not be concluded that this quantity is conserved by the final SPDE.

4 Comparison with other perturbation schemes

In this section, we demonstrate that both the stochastic advection by Lie transport (SALT) equation (Holm, 2015) and the location uncertainty (LU) equation (Mémin, 2014; Resseguier et al., 2017 a, 2020) can be recovered using the proposed perturbation scheme and properly choosing θ and the parameters a,e_i.

Note that the original LU paper (Mémin, 2014) assumed strong smoothness properties (finite variations in time) of the stochastic Navier–Stokes equations solution, to eventually remove the noises terms of this original Navier–Stokes equations under location uncertainty. Since Resseguier et al. (2017 a, b), this assumption was removed, in order to keep the important noise terms. Accordingly, the original deterministic LU Navier–Stokes equations from Mémin (2014) have been referred to as pseudo-stochastic Navier–Stokes equations (Resseguier et al., 2021). Being deterministic, these pseudo-stochastic equations cannot be recovered by our stochastic scheme, whereas we can recover the stochastic LU Navier–Stokes equations that originated from Resseguier et al. (2017 a).

4.1 Comparison with SALT equation

The original SALT equation (Holm, 2015) is derived based on a stochastically constrained variational principle δS=0, for which

\begin{matrix} (62) & \{\begin{cases} S (u, q) = \int ℓ (u, q) d t \\ d q + L_{d x_{t}} q = 0, \end{cases} \end{matrix}

where ℓ(u,q) is the Lagrangian of the system, 𝔏 is the Lie derivative, and x_t(x) is defined by (using our notation)

\begin{matrix} (63) & x_{t} (x) = x_{0} (x) + \int_{0}^{t} u (x, s) d s - \int_{0}^{t} e_{i} (x) \circ d η_{i} (s), \end{matrix}

in which u is the velocity vector field, and the ∘ means that the integral is defined in the Stratonovich sense, instead of in the Itô sense. Hence, $d x_{t} = u (x, t) d t - e_{i} \circ d η_{i}$ refers to an infinitesimal stochastic tangent field on the domain. Broadly speaking, we can express $d x_{t} = T_{t} (x) - x + u d t$ . Note the difference between Itô's notation and Stratonovich's notation; i.e., $e_{i} \circ d η_{i} \neq e_{i} d η_{i}$ . Our expression of T_t essentially follows Itô's notation, and $T_{t} (x) \neq x - e_{i} Δ η_{i}$ in this subsection. Instead, it becomes $T_{t} (x) = x + \frac{1}{2} e_{i}^{p} \partial_{x_{p}} e_{i} Δ t - e_{i} Δ η_{i}$ .

In the second part of Eq. (62), q is assumed to be a quantity advected by the flow. q can correspond to any differential form that is not uniquely determined by the velocity (since the SALT equation for the velocity is usually determined by the first equation of Eq. 62). In Holm (2015), the Lie derivative $L_{d x_{t}} q$ is calculated using Cartan's formula:

\begin{matrix} (64) & L_{d x_{t}} q = d (i_{d x_{t}} q) + i_{d x_{t}} d q . \end{matrix}

Essentially, the Lie derivative $L_{d x_{t}} q$ corresponds to $T_{t}^{*} q - q + f^{q} (S) d t$ , if we assume that the deterministic forecast of q is simply the advection of q by u. More generally, $L_{d x_{t} - u d t} q = T_{t}^{*} q - q$ . Therefore, the SALT equation for q is the same as our equation for q. We remark that Cartan's formula can not be directly applied to calculate the Lie derivative if the expression of dx_t is in Itô's notation.

The SALT equation regarding the velocity u comes from the first equation of Eq. (62). For most cases, the velocity u is associated with the momentum, a differential 1 form $m = u^{j} d x^{j} = u^{1} d x^{1} + \dots + u^{n} d x^{n}$ . In the examples discussed in Holm (2015), it is observed that, when the Lagrangian includes the kinetic energy, the stochastic noise contributes a term $L_{d x_{t}} θ$ , where θ is a differential 1 form related to the momentum 1 form. For instance, θ=m in the example of “Stratonovich stochastic Euler–Poincaré flow” in Holm (2015), and $θ = m + R^{j} d x^{j}$ in the example of “stochastic Euler–Boussinesq equations of a rotating stratified incompressible fluid” in Holm (2015). Already pointed out, the operator $L_{d x_{t}}$ is closely related to $T_{t}^{*}$ , and the momentum equation in SALT can be derived using our proposed scheme by properly choosing θ.

Holm (2015) requires that q is a differential form since Cartan's formula is only useful for differential forms q. This restriction can be relaxed by employing the original definition of Lie derivative with respect to a deterministic/stochastic flow of diffeomorphism discussed in Leon (2021), so that $L_{d x_{t}} q$ can be generalized to the case where q is a mixed-tensor field. This corresponds to Eq. (7) of the current paper.

Compared with Holm (2015) and Leon (2021), the proposed perturbation approach seems more flexible and does not have to rely on the Lagrangian mechanics. In particular, the velocity field can be associated with other tensor fields than the momentum 1 form. The perturbation, not directly related to the physics, can then be applied to any PDE. Moreover, our approach provides a new interpretation of $L_{d x_{t} - u d t}$ in terms of the optimal transportation associated with the infinitesimal forecast error at each time step. This interpretation certainly suggests practical numerical methods to infer a,e_i. Given a long sequence of reanalysis data or simulated high-resolution data, the one-step forecast can be evaluated using the low-resolution model, with the high-resolution state at each time step being the initial condition. T_t is then estimated at each time step by comparing the low-resolution forecast and the high-resolution forecast. Finally, a and e_i could be learned from these samples of T_t.

4.2 Comparison with the LU equation

Mentioned above, the Reynolds transport theorem is central to the LU setting, and we already outlined a closed link between the proposed perturbation approach and the LU formulation. This link – related to differential n forms – will be described precisely later in this subsection. But, before this, we focus on another key ingredient of LU: the stochastic material derivative of functions (differential 0 forms).

4.2.1 0 forms in the LU framework

Dropping the forcing terms, the LU equation for compressible and incompressible flow reads (Resseguier et al., 2017 a)

\begin{matrix} (65) & \partial_{t} f + w^{⋆} \cdot \nabla f = \nabla \cdot (\frac{1}{2} a \nabla f) - σ \dot{B} \cdot \nabla f \\ (66) & w^{⋆} = w - \frac{1}{2} (\nabla \cdot a)^{⊤} + σ (\nabla \cdot σ)^{⊤}, \end{matrix}

where f can be any quantity that is assumed to be transported by the flow, i.e., $D f / D t = 0$ , where $D / D t$ is the Itō material derivative. For instance, f could be the velocity (dropping forces in the SPDE), the temperature, or the buoyancy. Compared to SALT notations, −e_idη_i is denoted $σ d B = σ_{• i} d B_{i}$ . We refer to Resseguier et al. (2020, Appendix A) for the complete table of SALT–LU notation correspondences. Derived in Resseguier (2017, Appendix 10.1) and Resseguier et al. (2021, 6.1.3), we can rewrite it as

\begin{matrix} (67) & \partial_{t} f + w_{S} \cdot \nabla f = \frac{1}{2} (σ_{• i} \cdot \nabla) (σ_{• i} \cdot \nabla f) - (σ \dot{B}) \cdot \nabla f, \\ (68) & = - (σ \circ \dot{B}) \cdot \nabla f, \\ (69) & w_{S} = w + w_{S}^{c} \\ (70) & w_{S}^{c} = - \frac{1}{2} (\nabla \cdot a)^{⊤} + \frac{1}{2} σ (\nabla \cdot σ)^{⊤}, \\ (71) & = - \frac{1}{2} (σ_{• i} \cdot \nabla) σ_{• i}, \end{matrix}

where $σ \circ \dot{B}$ is the Stratonovich noise of the SPDE, and w and w_S (denoted u in the SALT framework) are the Itō drift and the Stratonovich drift of the fluid flow respectively. Separating the terms of the SPDE related to the deterministic dynamics from the term associated with the stochastic scheme results in

\begin{matrix} (72) & d^{LU} f = g^{f} (S) d t + d_{s}^{LU} f, \end{matrix}

where

\begin{matrix} (73) & g^{f} (S) = - w \cdot \nabla f \\ (74) & \begin{aligned} d_{s}^{LU} f & = - w_{S}^{c} \cdot \nabla f d t + \frac{1}{2} (σ_{• i} \cdot \nabla) (σ_{• i} \cdot \nabla f) d t \\ - (σ d B) \cdot \nabla f . \end{aligned} \end{matrix}

Terms in Eqs. (65) and (66) translate to our notation in the following way:

\begin{aligned} - w_{S}^{c} \cdot \nabla f d t = & \frac{1}{2} e_{i}^{q} \partial_{x_{q}} e_{i}^{p} \partial_{x_{p}} f \\ \frac{1}{2} (σ_{• i} \cdot \nabla) (σ_{• i} \cdot \nabla f) = & \frac{1}{2} e_{i}^{p} \partial_{x_{p}} (e_{i}^{q} \partial_{x_{q}} f) \\ = & \frac{1}{2} (e_{i}^{p} \partial_{x_{p}} e_{i}^{q} \partial_{x_{q}} f + e_{i}^{p} e_{i}^{q} \partial_{x_{p}} \partial_{x_{q}} f) \\ - σ d B \cdot \nabla f = & e_{i}^{p} \partial_{x_{p}} f d η_{i} . \end{aligned}

Hence,

\begin{matrix} (75) & \begin{aligned} d_{s}^{LU} f & = (e_{i}^{q} \partial_{x_{q}} e_{i}^{p} \partial_{x_{p}} f + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x_{p}} \partial_{x_{q}} f) d t \\ + e_{i}^{p} \partial_{x_{p}} f d η_{i} . \end{aligned} \end{matrix}

Recall that Eq. (29) can be obtained by our perturbation scheme, while f is associated with a differential 0 form. Direct calculation means that Eq. (75) coincides with Eq. (29) when

\begin{matrix} (76) & \begin{aligned} T_{t} (x) & = x + e_{i}^{q} \partial_{x_{q}} e_{i} Δ t + e_{i} Δ η_{i} = x - w_{S}^{c} Δ t \\ + (- w_{S}^{c} Δ t - σ Δ B) . \end{aligned} \end{matrix}

The LU equation can thus be derived by choosing θ=f and T_t by Eq. (76). At first glance, it does not seem straightforward to make such a choice. Nevertheless, it can be recognized that the term $(- w_{S}^{c} Δ t - σ Δ B) = (\frac{1}{2} e_{i}^{q} \partial_{x_{q}} e_{i} Δ t + e_{i} Δ η_{i})$ is the Itō noise and its Itō-to-Stratonovich correction. Hence, it corresponds to the Stratonovich noise e_i∘dη_i of the flow associated with T_t. The additional drift $- w_{S}^{c} Δ t$ is different in nature. It is related to the advection correction $w_{S}^{c} \cdot \nabla f$ in the LU setting. Indeed, in the LU framework, the Itō drift, w, is seen as the resolved large-scale velocity. That is why, in this framework, the deterministic dynamics (74) involves the Itō drift, w. This is also the reason why, under the LU derivation, the advected velocity is assumed to be given by the Itō drift, w. It differs from the Stratonovich drift $w_{S} = w + w_{S}^{c}$ , used as an advection velocity in the SALT approach or in Mikulevicius and Rozovskii (2004) (where the Stratonovich drift is denoted u). Interested readers are referred to Resseguier et al. (2020, Appendix A) for a discussion on these assumptions. Note however that in all these approaches, the advecting velocity is always the Stratonovich drift. This can be seen, e.g., in the Stratonovich form of LU Eq. (68).

To also understand Eq. (76), the inverse flow can be considered. According to Appendix (A),

\begin{matrix} (77) & T_{t}^{- 1} (x) = x - e_{i} Δ η_{i} = x + σ Δ B . \end{matrix}

Considering T_t to represent how much the model forecast differs from the true forecast at every time step, $T_{t}^{- 1}$ can be understood to represent how much the true forecast differs from the model forecast at each time step. Therefore, the LU equation can be derived using the proposed perturbation scheme, choosing θ=f and assuming that the true forecast differs from the model forecast by a displacement prescribed by Eq. (77).

4.2.2 n forms in the LU framework

The LU physical justification relies on a stochastic interpretation of fundamental conservation laws, typically conservation of extensive properties (i.e., integrals of functions over a spatial volume) like momentum, mass, matter, and energy (Resseguier et al., 2017 a). These extensive properties can be expressed by integrals of differential n forms. For instance, the mass and the momentum are integrals of the differential n forms $ρ d x^{1} \land \dots \land d x^{n}$ and $ρ w d x^{1} \land \dots \land d x^{n}$ , respectively. In the LU framework, a stochastic version of the Reynolds transport theorem (Resseguier et al., 2017 a, Eq. 28) is used to deal with these differential n forms $θ = f d x^{1} \land \dots \land d x^{n}$ . Assuming an integral conservation $\frac{d}{d t} \int_{V (t)} f = 0$ on a spatial domain V(t) transported by the flow, that theorem leads to the following SPDE:

\begin{matrix} (78) & \begin{aligned} \frac{D f}{D t} & + \nabla \cdot (w^{⋆} + σ \dot{B}) f = \frac{d}{d t} 〈\int_{0}^{t} D_{t} f, \int_{0}^{t} \nabla \cdot σ \dot{B}〉 \\ = (\nabla \cdot σ_{• i}) (\nabla \cdot σ_{• i})^{T} f, \end{aligned} \end{matrix}

where $D / D t$ denotes the Itō material derivative. Here again, forcing terms are dropped for the sake of readability. This SPDE can be rewritten using the expression of that material derivative (Eqs. 9 and 10 of Resseguier et al., 2017 a):

\begin{matrix} (79) & \begin{aligned} \partial_{t} f & + \nabla \cdot (w_{S} f) = \frac{1}{2} \nabla \cdot (a \nabla f) + \frac{1}{2} \nabla \cdot (σ_{• i} (\nabla \cdot σ_{• i})^{T} f) \\ - \nabla \cdot (σ \dot{B} f) \end{aligned} \\ (80) & = \frac{1}{2} \nabla \cdot (σ_{• i} (\nabla \cdot (σ_{• i} f))^{T}) - \nabla \cdot (σ \dot{B} f) \\ (81) & = - \nabla \cdot (σ \circ \dot{B} f) . \end{matrix}

The original deterministic equation and stochastic perturbation correspond to

\begin{matrix} (82) & g^{f} (S) = - \nabla \cdot (w f) \\ (83) & \begin{aligned} d_{s}^{LU} f & = (- \nabla \cdot (w_{S}^{c} f) + \frac{1}{2} \nabla \cdot (a \nabla f) + \frac{1}{2} \nabla \\ \cdot (σ_{• i} (\nabla \cdot σ_{• i})^{T} f)) d t - \nabla \cdot (σ d B f) \end{aligned} \\ (84) & = \nabla \cdot (({(\frac{1}{2} \nabla \cdot a)}^{T} d t - σ d B) f) + \nabla \cdot (\frac{1}{2} a \nabla f) d t . \end{matrix}

Identifying $a = σ_{• i} σ_{• i}^{T} = e_{i} e_{i}^{T}$ and $σ \dot{B} = - e_{i} d η_{i}$ , Eq. (35) corresponds to Example 3.2.2 about n forms, with

\begin{matrix} (85) & \begin{aligned} \tilde{V} & = - a^{p} + \frac{1}{2} (\partial_{x^{q}} e_{i}^{p} e_{i}^{q} - e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) - e_{i}^{p} \frac{d η_{i}}{d t} \\ = - {(\frac{1}{2} \nabla \cdot a)}^{T} + σ \dot{B}; \end{aligned} \end{matrix}

i.e.,

\begin{matrix} (86) & a^{p} = \partial_{x^{q}} (e_{i}^{p} e_{i}^{q}) - (e_{i}^{p} \partial_{x^{q}} e_{i}^{q}) = e_{i}^{q} \partial_{x^{q}} e_{i}^{p} . \end{matrix}

Again, the remapping is obtained,

\begin{matrix} (87) & \begin{aligned} T_{t} (x) & = x + e_{i}^{q} \partial_{x_{q}} e_{i} Δ t + e_{i} Δ η_{i} = x - w_{S}^{c} Δ t \\ + (- w_{S}^{c} Δ t - σ Δ B), \end{aligned} \end{matrix}

previously derived for the differential 0 form in LU framework (Eq. 76). Therefore, the proposed approach also generalizes the LU framework for n forms and its capacity – given by the Reynolds transport theorem – to deal with extensive properties.

Remark 7. For incompressible flows, the LU equation further imposes that

\begin{matrix} (88) & \{\begin{cases} \nabla \cdot σ = 0 \\ \nabla \cdot \nabla \cdot a = 0 . \end{cases} \end{matrix}

Translating it into our notation, it reads

\{\begin{cases} \partial_{x_{p}} e_{i}^{p} = 0 for each i \\ \partial_{x_{p}} \partial_{x_{q}} (e_{i}^{p} e_{i}^{q}) = 0 . \end{cases}

Applying the result in Example 3.1.2, straightforward calculation means Eq. (88) is equivalent to $T_{t}^{*} θ = θ$ for $θ = d x^{1} \land \dots \land d x^{n}$ . Such a result was expected since constraints (Eq. 88) are obtained from the LU density conservation.

5 A stochastic version of thermal shallow water equation

In this section, the proposed approach is applied to derive a stochastic version of thermal shallow water equation. Another stochastic version of thermal shallow water equation can be found in Holm and Luesink (2021). The thermal shallow water equation is derived in Warneford and Dellar (2013):

\begin{matrix} (89) & \frac{\partial h}{\partial t} + \nabla \cdot (h \overline{u}) = 0, \\ (90) & \frac{\partial Θ}{\partial t} + (\overline{u} \cdot \nabla) Θ = - κ (h Θ - h_{0} Θ_{0}), \\ (91) & \frac{\partial \overline{u}}{\partial t} + (\overline{u} \cdot \nabla) \overline{u} + f \hat{z} \times \overline{u} = - \nabla (h Θ) + \frac{1}{2} h \nabla Θ . \end{matrix}

This model can be used to describe a two-layer system under equivalent barotropic approximation. The upper layer is active but with a spatiotemporal varying density ρ(x,t), while the lower layer is quiescent with a fixed constant density ρ₀. The state variable h represents the height of the active layer, and $Θ = g (ρ_{0} - ρ) / ρ_{0}$ is the density contrast. $\overline{u}$ is the averaged horizontal velocity of the active layer at each column. Note that ρ<ρ₀ (hence Θ>0) in the scenario of equivalent barotropic approximation (Warneford and Dellar, 2013).

Stated in Warneford and Dellar (2013), the following physical quantities are conserved up to the forcing.

\begin{matrix} (92) & Total energy: E = \int_{Ω} \frac{1}{2} (h | \overline{u} |^{2} + h^{2} Θ) d^{2} x . \\ (93) & Total mass: M = \int_{Ω} h d^{2} x . \\ (94) & Total momentum: M = \int_{Ω} h \overline{u} d^{2} x . \end{matrix}

The objective is thus to choose proper tensor fields $θ_{\overline{u}}, θ_{h}$ , and θ_Θ for the state variables $\overline{u}, h,$ and Θ, respectively, so that $E, M,$ and M are conserved by the perturbation scheme. Again, it must be emphasized that the conservation law of the perturbation scheme does not directly imply that the same quantities are conserved by the final SPDE.

The domain is 2-dimensional. To conserve mass, the only choice for θ_h is $θ_{h} = h d x^{1} \land d x^{2}$ , which is a differential 2 form. It plays the role of density. In order to conserve the momentum, we need the momentum to be a differential 2 form as well. Hence we must choose $θ_{\overline{u}}$ to be a function (differential 0 form). Therefore, the only choice for $θ_{\overline{u}}$ is $θ_{\overline{u}} = \overline{u}$ . This choice of $θ_{\overline{u}}$ and θ_h implies that $h | \overline{u} |^{2}$ also corresponds to a 2 form $| \overline{u} |^{2} θ_{h}$ . Hence the kinetic energy is automatically conserved by the perturbation scheme. This means that if we want E to be conserved, we must select θ_Θ so that h²Θ corresponds to a differential 2 form. Note that θ_h is already a 2 form. We must thus select θ_Θ so that hΘ corresponds to a function. The only choice for θ_Θ is the contravariant tensor $θ_{Θ} = Θ \frac{\partial}{\partial x^{1}} \land \frac{\partial}{\partial x^{2}}$ . In this case, hΘ corresponds to the differential 0 form 〈θ_hθ_Θ〉=hΘ, where 〈,〉 in this section is the natural pairing of covariant n-tensor fields and contravariant n-tensor fields.

In sum, we have chosen the following tensor fields:

\begin{matrix} (95) & θ_{h} = h d x^{1} \land d x^{2} \\ (96) & θ_{{\overline{u}}^{j}} = {\overline{u}}^{j} (for j = 1, 2) \\ (97) & θ_{Θ} = Θ \frac{\partial}{\partial x^{1}} \land \frac{\partial}{\partial x^{2}} . \end{matrix}

For

\begin{matrix} (98) & T_{t} (x) = x + a Δ t + e_{i} Δ η_{i}, \end{matrix}

we have

\begin{matrix} (99) & T_{t}^{- 1} (x) = x + (- a + e_{i}^{p} \partial_{x_{p}} e_{i}) Δ t - e_{i} Δ η_{i} . \end{matrix}

Then $T_{t}^{*} θ_{h}$ , $T_{t}^{*} θ_{\overline{u}}$ , and $(T_{t}^{- 1})_{*} θ_{Θ}$ can be calculated following Examples 3.1.3, 3.1.1, and 3.1.5. This further implies $d_{s} h, d_{s} \overline{u}$ , and d_sΘ, as shown in Examples 3.2.2, 3.2.1, and 3.2.4. Note that $T_{t}^{- 1}$ instead of T_t is applied to θ_Θ, as shown in Eq. (7). Finally, we end up with the following SPDE:

\begin{matrix} (100) & \begin{aligned} d h = & - \nabla (h \overline{u}) d t + (h (\partial_{x_{p}} a^{p} + \frac{1}{2} J_{i}) + a^{p} \partial_{x_{p}} h \\ + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x_{p}} \partial_{x_{q}} h + \partial_{x_{p}} h e_{i}^{p} \partial_{x_{q}} e_{i}^{q}) d t \\ + (h \partial_{x_{p}} e_{i}^{p} + \partial_{x_{p}} h e_{i}^{p}) d η_{i} \end{aligned} \\ (101) & \begin{aligned} d Θ & = {- (\overline{u} \cdot \nabla) Θ - κ (h Θ - h_{0} Θ_{0})} d t \\ + (Θ (- \partial_{x_{p}} a^{p} + \partial_{x_{p}} (\partial_{x_{q}} e_{i} e_{i}^{q})^{p} + \frac{1}{2} J_{i}) \\ + \partial_{x_{p}} Θ a^{p} + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x_{p}} \partial_{x_{q}} Θ - \partial_{x_{p}} Θ e_{i}^{p} \partial_{x_{q}} e_{i}^{q}) d t \\ - (Θ \partial_{x_{p}} e_{i}^{p} - \partial_{x_{p}} Θ e_{i}^{p}) d η_{i} \end{aligned} \\ (102) & \begin{aligned} d {\overline{u}}^{j} = & - {(\overline{u} \cdot \nabla) \overline{u} - f \hat{z} \times \overline{u} - \nabla (h Θ) + \frac{1}{2} h \nabla Θ}^{j} d t \\ + (\partial_{x_{p}} {\overline{u}}^{j} a^{p} + \frac{1}{2} e_{i}^{p} e_{i}^{q} \partial_{x_{p}} \partial_{x_{q}} {\overline{u}}^{j}) d t \\ + \partial_{x_{p}} {\overline{u}}^{j} e_{i}^{p} d η_{i}, \end{aligned} \end{matrix}

where $J_{i} = \partial_{x_{p}} e_{i}^{p} \partial_{x_{q}} e_{i}^{q} - \partial_{x_{q}} e_{i}^{p} \partial_{x_{p}} e_{i}^{q}$ . And the total mass, total momentum and the total energy shall all be conserved by the perturbation scheme.

6 Conclusions

The starting point of this work was to question how the location of the state variable can be consistently perturbed, motivated by Brenier's theorem (Brenier, 1991) which suggests that the difference of two density fields can be represented by a transport map T. Noting that optimal transportation has a clean representation in terms of differential n forms, we proposed to perturb the “location” of the state variable S, at every forecast time step, by perturbing the corresponding differential k forms θ by $θ \leftarrow T_{t}^{*} θ$ , where T_t is a random diffeomorphism which deviates from the identity map infinitesimally.

Under this framework, we end up with a stochastic PDE of the state variable S in the form

\begin{matrix} (103) & d S = f (S) d t + d_{s} S, \end{matrix}

where f(S)dt is the incremental of S given by the original deterministic system. The term d_sS is the additional stochastic incremental of S caused by the perturbation scheme.

In this paper, we generalize this scheme to a mixed type of tensor fields θ. A key point is indeed to link the state variable S with some tensor field θ. The choice of θ can then correspond to the conservation laws of certain quantities. We describe in detail how to calculate $T_{t}^{*}$ and T_t* and present results for several examples corresponding to different choices of θ. We also discussed about the conservation laws for these examples. We emphasize that Brenier's theorem merely serves as the motivation but not the theoretical foundation of the proposed scheme, since the optimality of the displacement vector field needs to be rigorously defined for general tensor fields θ that are not positive differential n forms.

Interestingly, similarities and differences can be studied between the proposed perturbation scheme and the existing stochastic physical SALT and LU settings (Holm, 2015; Mémin, 2014; Resseguier et al., 2017 a). In particular, both SALT and LU equations can be recovered using a prescribed definition of the random diffeomorphism T_t used by the perturbation scheme. For illustration, a stochastic version of the thermal shallow water equation is presented. Compared with SALT and LU settings (Holm, 2015; Mémin, 2014; Resseguier et al., 2017 a), the proposed perturbation scheme does not directly rely on the physics. Hence it is more flexible and can be applied to any PDE. Yet, the proposed derivation also provides interesting means to interpret the operator $L_{d x_{t} - u d t}$ , appearing in the SALT equation. In terms of the optimal transportation, this term represents the infinitesimal forecast error at every forecast time step.

In order to apply the proposed perturbation scheme to any specific model, the parameters a and e_i must be determined specifically. Hence it is necessary to learn these parameters from existing data, experimental runs, or additional physical considerations (Resseguier et al., 2020, 2021). We anticipate this framework naturally provides a new perspective on how to learn these parameters. Likely, this task will invoke the need of numerical algorithms to estimate the optimal transportation map for general differential k forms or even mixed type of tensor fields. This will be the subject of future investigations.

Appendix A: Calculation of

T_{t}^{- 1}

Suppose that

\begin{matrix} (A1) & T_{t} (x) = x + a Δ t + e_{i} Δ η_{i} . \end{matrix}

We assume that $T_{t}^{- 1}$ has the following form of expression:

\begin{matrix} (A2) & T_{t}^{- 1} (x) = x + z Δ t + b_{i} Δ η_{i} . \end{matrix}

Our goal is to find z and b_i. Then we have

\begin{matrix} (A3) & \begin{aligned} x = & T_{t} (T_{t}^{- 1} (x)) = T_{t} (x + z Δ t + b_{i} Δ η_{i}) \\ = & x + z Δ t + b_{i} Δ η_{i} + a |_{x + z Δ t + b_{i} Δ η_{i}} Δ t \\ + e_{i} |_{x + z Δ t + b_{i} Δ η_{i}} Δ η_{i} . \end{aligned} \end{matrix}

Similar to the derivation in Sect. 3.1, we apply Taylor expansion and Itô's lemma and drop the terms of the higher-order infinitesimal:

\begin{matrix} (A4) & \begin{aligned} a |_{x + a Δ t + b_{i} Δ η_{i}} Δ t = & a |_{x} Δ t + o (Δ t) \\ e_{i} |_{x + z Δ t + b_{i} Δ η_{i}} Δ η_{i} = & e_{i} |_{x} Δ η_{i} + e_{i p} b_{i}^{p} |_{x} Δ t + o (Δ t) . \end{aligned} \end{matrix}

Therefore,

\begin{matrix} (A5) & \begin{aligned} x & = T_{t} (T_{t}^{- 1} (x)) = x + (z + a + e_{i p} b_{i}^{p}) Δ t \\ + (b_{i} + e_{i}) Δ η_{i} + o (Δ t) . \end{aligned} \end{matrix}

This implies that

\begin{matrix} (A6) & b_{i} + e_{i} = 0 \\ (A7) & z + a + e_{i p} b_{i}^{p} = 0 . \end{matrix}

Therefore,

\begin{matrix} (A8) & b_{i} = - e_{i} \\ (A9) & z = - a + e_{i p} e_{i}^{p}, \end{matrix}

or equivalently,

\begin{matrix} (A10) & T_{t}^{- 1} (x) = x + (- a + e_{i p} e_{i}^{p}) Δ t - e_{i} Δ η_{i} . \end{matrix}

Appendix B: Derivation of

T_{t}^{*} θ

Given coordinates $(x^{1}, \dots, x^{n})$ , when θ is a differential k form, it can be written as

\begin{matrix} (B1) & θ = \sum_{i_{1} < \dots < i_{k}} f^{i_{1}, \dots, i_{k}} d x^{i_{1}} \land \dots \land d x^{i_{k}} . \end{matrix}

Since $T_{t}^{*}$ is linear, we may assume that

\begin{matrix} (B2) & θ = f d x^{i_{1}} \land \dots \land d x^{i_{k}} \end{matrix}

for some $1 \leq i_{1} < \dots < i_{k} \leq n$ . Let $T_{t} (x) = (T_{t}^{1} (x), \dots, T_{t}^{n} (x))$ , then

\begin{matrix} (B3) & (T_{t}^{*} θ) (x) = f (T_{t} (x)) d T_{t}^{i_{1}} \land \dots \land d T_{t}^{i_{k}} . \end{matrix}

We calculate f(T_t(x)) and $d T_{t}^{i_{1}} \land \dots \land d T_{t}^{i_{k}}$ separately. We denote $Δ x = T_{t} (x) - x = a Δ t + e_{i} Δ η_{i}$ and H_f the Hessian matrix of f. At a given time t, f is assumed independent from the noise Δη_i(t). Then

\begin{matrix} (B4) & \begin{aligned} f (T_{t} (x)) & = f (x + Δ x) = f (x) + 〈 \nabla f Δ x 〉 + \frac{1}{2} (Δ x)^{⊤} \\ H_{f} Δ x + o ((Δ x)^{2}) \end{aligned} \\ (B5) & = f (x) + 〈 \nabla f a Δ t + e_{i} Δ η_{i} 〉 + \frac{1}{2} e_{i}^{⊤} H_{f} e_{i} (Δ η_{i})^{2} \\ (B6) & \begin{aligned} + O & ((Δ t)^{2}) + O (Δ t Δ η_{i}) + o ((Δ t)^{2}) + o ((Δ η_{i})^{2}) \\ + o (Δ t Δ η_{i}) . \end{aligned} \end{matrix}

According to Itô's lemma, dηdη=dt, and we can replace (Δη_i)² with Δt. Hence,

\begin{matrix} (B7) & \begin{aligned} f (T_{t} (x)) & = f (x) + 〈 \nabla f a 〉 Δ t + 〈 \nabla f e_{i} 〉 Δ η_{i} \\ + \frac{1}{2} e_{i}^{⊤} H_{f} e_{i} Δ t + o (Δ t) \end{aligned} \\ (B8) & \begin{aligned} = f (x) & + (〈 \nabla f a 〉 + \frac{1}{2} e_{i} H_{f} e_{i}) Δ t + 〈 \nabla f e_{i} 〉 Δ η_{i} \\ + o (Δ t) . \end{aligned} \end{matrix}

Next,

\begin{matrix} (B9) & \begin{aligned} T_{t}^{*} (d x^{i_{1}} \land \dots \land d x^{i_{k}}) = d T_{t}^{i_{1}} \land \dots \land d T_{t}^{i_{k}} \\ = & (d x^{i_{1}} + d a^{i_{1}} Δ t + d e_{i}^{i_{1}} Δ η_{i}) \land \dots \land \\ (d x^{i_{k}} + d a^{i_{k}} Δ t + d e_{i}^{i_{k}} Δ η_{i}) . \end{aligned} \end{matrix}

Note that $d a^{i_{j}}$ and $d e_{i}^{i_{j}}$ refer to the spatial differentiation. Again, we apply the “discrete version” of Itô's rule (Δη_i)²=Δt and collect all the terms of order 𝒪(Δt) and 𝒪(Δη_i):

\begin{matrix} (B10) & \begin{aligned} T_{t}^{*} & (d x^{i_{1}} \land \dots \land d x^{i_{k}}) = d x^{i_{1}} \land \dots \land d x^{i_{k}} \\ + (\sum_{s = 1}^{k} d x^{i_{1}} \land \dots \land d a^{i_{s}} \land \dots \land d x^{i_{k}}) Δ t \\ + (\sum_{s = 1}^{k} d x^{i_{1}} \land \dots \land d e_{i}^{i_{s}} \land \dots \land d x^{i_{k}}) Δ η_{i} \\ + (\sum_{s < r} d x^{i_{1}} \land \dots \land d e_{i}^{i_{s}} \land \dots \land d e_{i}^{i_{r}} \\ \land \dots \land d x^{i_{k}}) Δ t \\ + o (Δ t) . \end{aligned} \end{matrix}

According to the chain rule, $d a^{i_{s}} = \partial_{x^{j}} a^{i_{s}} d x^{j}$ , $d e_{i}^{i_{s}} = \partial_{x^{j}} e_{i}^{i_{s}} d x^{j}$ . Note that $\partial_{x^{j}} e_{i}^{i_{s}}$ refers to the i_sth component of $\partial_{x^{j}} e_{i}$ , where $\partial_{x^{j}} e_{i} = \frac{\partial e_{i}}{\partial x^{j}}$ and e_i(x)∈ℝⁿ is the ith basis vector field of T_t. Hence,

\begin{matrix} (B11) & \begin{aligned} T_{t}^{*} & (d x^{i_{1}} \land \dots \land d x^{i_{k}}) \\ = & d x^{i_{1}} \land \dots \land d x^{i_{k}} + (\sum_{s = 1}^{k} \partial_{x^{j}} a^{i_{s}} d x^{i_{1}} \land \dots \land d x^{j} \\ \land \dots \land d x^{i_{k}}) Δ t \\ + (\sum_{s = 1}^{k} \partial_{x^{j}} e_{i}^{i_{s}} d x^{i_{1}} \land \dots \land d x^{j} \land \dots \land d x^{i_{k}}) Δ η_{i} \\ + (\sum_{s < r} \partial_{x^{j}} e_{i}^{i_{s}} \partial_{x^{l}} e_{i}^{i_{r}} d x^{i_{1}} \land \dots \land d x^{j} \land \dots \land d x^{l} \\ \land \dots \land d x^{i_{k}}) Δ t \\ + o (Δ t) . \end{aligned} \end{matrix}

Combining Eqs. (B8) and (B11), with application of Itô's lemma, all terms of order o(Δt) are then removed, to obtain

\begin{matrix} (B12) & \begin{aligned} T_{t}^{*} θ = & f (T_{t} (x)) T_{t}^{*} (d x^{i_{1}} \land \dots \land d x^{i_{k}}) \\ = & θ + {(〈 \nabla f a 〉 + \frac{1}{2} e_{i}^{⊤} H_{f} e_{i}) d x^{i_{1}} \land \dots \land d x^{i_{n}} \\ + \sum_{s = 1}^{k} f \partial_{x^{j}} a^{i_{s}} d x^{i_{1}} \land \dots d x^{j} \land \dots \land d x^{i_{k}} \\ + (\sum_{s < r} f \partial_{x^{j}} e_{i}^{i_{s}} \partial_{x^{l}} e_{i}^{i_{r}} d x^{i_{1}} \land \dots \land d x^{j} \land \dots \land \\ d x^{l} \land \dots \land d x^{i_{k}}) \\ + (\sum_{s = 1}^{k} 〈 \nabla f e_{i} 〉 \partial_{x^{j}} e_{i}^{i_{s}} d x^{i_{1}} \land \dots \land d x^{j} \\ \land \dots \land d x^{i_{k}})} Δ t \\ + {〈 \nabla f e_{i} 〉 d x^{i_{1}} \land \dots \land d x^{i_{k}} \\ + \sum_{s = 1}^{k} f \partial_{x^{j}} e_{i}^{i_{s}} d x^{i_{1}} \\ \land \dots \land d x^{j} \land \dots \land d x^{i_{k}}} Δ η_{i} \\ + o (Δ t) . \end{aligned} \end{matrix}

To simplify Eq. (B12), wedge algebra is applied, and the high-order infinitesimal o(Δt) is ignored. Accordingly, $T_{t}^{*} θ$ is more compactly written as

\begin{matrix} (B13) & T_{t}^{*} θ = θ + M (θ) Δ t + N_{i} (θ) Δ η_{i}, \end{matrix}

for some differential k forms ℳ(θ) and 𝒩_i(θ).

Data availability

No data sets were used in this article.

Author contributions

YZ contributed the main idea and mathematical derivation and wrote the first draft of the paper. VR contributed all the physical interpretations of the derived equations, a much more detailed introduction, and a more detailed comparison of the proposed scheme with the LU equation. BC contributed his insight and enthusiasm to this problem and worked on the writing of the draft. This work was done based on the fruitful discussion of all the three authors since a very early stage.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The authors would like to express their gratitude towards Wei Pan, Darryl Holm, Dan Crisan, Long Li, and Etienne Mémin for their patient explanation and insightful discussion. The research of Yicun Zhen was supported by the ANR Melody project when he was a postdoc at Ifremer. The research of Valentin Resseguier is supported by the company SCALIAN DS and by France Relance through the MORAANE project. The research of Bertrand Chapron is supported by ERC EU SYNERGY project no. 856408-STUOD and the ANR Melody project.

Financial support

This research has been supported by the Agence Nationale de la Recherche, Labex Immuno-Oncology (Melody), the European Research Council, H2020 (grant no. STUOD (856408)), the company SCALIAN DS and France Relance through the MORAANE project.

Review statement

This paper was edited by Jinqiao Duan and reviewed by Daniel Goodair and Baylor Fox-Kemper.

References

Anderson, J. L.: An adaptive covariance inflation error correction algorithm for ensemble filters, Tellus A, 59, 210–224, 2007. a

Berner, J., Shutts, G., Leutbecher, M., and Palmer, T.: A spectral stochastic kinetic energy backscatter scheme and its impact on flow-dependent predictability in the ECMWF ensemble prediction system, J. Atmos. Sci., 66, 603–626, 2009. a

Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions, Commun. Pur. Appl. Math., 44, 375–417, 1991. a, b, c

Brzeźniak, Z., Capiński, M., and Flandoli, F.: Stochastic partial differential equations and turbulence, Math. Models Methods Appl. Sci., 1, 41–59, 1991. a

Buizza, R., Miller, M., and Palmer, T.: Stochastic representation of model uncertainties in the ECMWF Ensemble Prediction System, Q. J. Roy. Meteor. Soc., 125, 2887–2908, 1999. a

Chern, S. S., Chen, W. H., and Lam, K. S.: Lectures on Differential Geometry, World Scientific, https://doi.org/10.1142/3812, 1999. a

Daum, F. E. and Huang, J.: Curse of dimensionality and particle filters, 2003 IEEE Aerospace Conference Proceedings (Cat. No.03TH8652), 4, 4_1979–4_1993, Big Sky, MT, USA, 8–15 March 2003, https://doi.org/10.1109/AERO.2003.1235126, 2003. a

Flandoli, F.: The interaction between noise and transport mechanisms in PDEs, Milan J. Math., 79, 543–560, 2011. a

Frank, J. E. and Gottwald, G. A.: Stochastic homogenization for an energy conserving multi-scale toy model of the atmosphere, Physica D, 254, 46–56, 2013. a

Franzke, C., O'Kane, T., Berner, J., Williams, P., and Lucarini, V.: Stochastic climate theory and modeling, Wiley Interdiscip. Rev.: Clim. Change, 6, 63–78, https://doi.org/10.1002/wcc.318, 2015. a, b

Gugole, F. and Franzke, C.: Numerical development and evaluation of an energy conserving conceptual stochastic climate model, Math. Clim. Weather Forecast., 5, 45–64, https://doi.org/10.1515/mcwf-2019-0004, 2019. a

Harlim, J. and Majda, A. J.: Catastrophic filter divergence in filtering nonlinear dissipative systems, Commun. Math. Sci., 8, 27–43, 2010. a

Hasselmann, K.: Stochastic climate models. Part I: theory, Tellus, 28, 473–485, 1976. a

Holm, D. D.: Variational principles for stochastic fluid dynamics, Proc. Math. Phys. Eng. Sci., 471, https://doi.org/10.1098/rspa.2014.0963, 2015. a, b, c, d, e, f, g, h, i, j, k, l, m, n

Holm, D. D. and Luesink, E.: Stochastic Wave–Current Interaction in Thermal Shallow Water Dynamics, J. Nonlinear Sci., 31, 29, https://doi.org/10.1007/s00332-021-09682-9, 2021. a

Houtekamer, P. and Mitchell, H. L.: A Sequential Ensemble Kalman Filter for Atmospheric Data Assimilation, Mon. Wather Rev., 129, 123–137, 2001. a

Jain, A., Timofeyev, I., and Vanden-Eijnden, E.: Stochastic mode-reduction in models with conservative fast sub-systems, arXiv [preprint], https://doi.org/10.48550/arXiv.1410.3004, 2014. a

Kotsuki, S., Ota, Y., and Miyoshi, T.: Adaptive covariance relaxation methods for ensemble data assimilation: Experiments in the real atmosphere, Q. J. Roy. Meteor. Soc., 143, 2001–2015, 2017. a

Kunita, H.: Stochastic flows and stochastic differential equations, vol. 24, Cambridge university press, 1997. a

Leith, C.: Atmospheric predictability and two-dimensional turbulence, J. Atmos. Sci., 28, 145–161, 1971. a

Leon, A. B. D.: On the effect of stochastic Lie transport noise on fluid dynamic equations, PhD thesis, Imperial College London, https://doi.org/10.25560/89936, 2021. a, b, c

Leroux, S., Brankart, J.-M., Albert, A., Brodeau, L., Molines, J.-M., Jamet, Q., Le Sommer, J., Penduff, T., and Brasseur, P.: Ensemble quantification of short-term predictability of the ocean dynamics at a kilometric-scale resolution: a Western Mediterranean test case, Ocean Sci., 18, 1619–1644, https://doi.org/10.5194/os-18-1619-2022, 2022. a

Leutbechner, M., Ollinaha, P., Lock, S.-J., Lang, S., Bechtold, P., Beljaars, A., Bozzo, A., Forbes, R., Haiden, T. Hogan, R., and Sandu, I.: Stochastic representations of model uncertainties in the IFS, in: ECMWF/WWRP Workshop: Model Uncertainty, ECMWF, Reading, 2016. a

Li, H., Kalnay, E., and Miyoshi, T.: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter, Q. J. Roy. Meteor. Soc., 135, https://doi.org/10.1002/qj.371, 2009. a

Majda, A. J., Timofeyev, I., and VandenEijnden, E.: Models for stochastic climate prediction, P. Natl. Acad. Sci. USA, 96 26, 14687–14691, 1999. a, b

Mémin, E.: Fluid flow dynamics under location uncertainty, Geophys. Astro. Fluid, 108, 119–146, 2014. a, b, c, d, e, f, g, h, i, j, k

Mikulevicius, R. and Rozovskii, B.: Stochastic Navier-Stokes Equations for Turbulent Flows, SIAM J. Math. Anal., 35, 1250–1310, 2004. a, b, c

Miyoshi, T.: The Gaussian Approach to Adaptive Covariance Inflation and Its Implementation with the Local Ensemble Transform Kalman Filter, Mon. Wather Rev., 139, 1519–1535, 2011. a

Orszag, S.: Analytical theories of turbulence, J. Fluid Mech., 41, 363–386, 1970. a

Penland, C. and Sardeshmukh, P.: The optimal growth of tropical sea surface temperature anomalies, J. Climate, 8, 1999–2024, 1995. a

Pope, S.: Lagrangian PDF methods for turbulent flows, Annu. Rev. Fluid Mech., 26, 23–63, 1994. a

Poterjoy, J.: A Localized Particle Filter for High-Dimensional Nonlinear Systems, Mon. Wather Rev., 144, 59–76, 2016. a

Raanes, P. N., Bocquet, M., and Carrassi, A.: Adaptive covariance inflation in the ensemble Kalman filter by Gaussian scale mixtures, Q. J. Roy. Meteor. Soc., 145, 53–75, 2019. a

Resseguier, V.: Mixing and fluid dynamics under location uncertainty, PhD thesis, Université Rennes 1, 2017. a

Resseguier, V., Mémin, E., and Chapron, B.: Geophysical flows under location uncertainty, Part I Random transport and general models, Geophys. Astro. Fluid, 111, 149–176, 2017a. a, b, c, d, e, f, g, h, i, j, k

Resseguier, V., Mémin, E., and Chapron, B.: Geophysical flows under location uncertainty, Part II Quasi-geostrophy and efficient ensemble spreading, Geophys. Astro. Fluid, 111, 177–208, 2017b. a

Resseguier, V., Pan, W., and Fox-Kemper, B.: Data-driven versus self-similar parameterizations for stochastic advection by Lie transport and location uncertainty, Nonlin. Processes Geophys., 27, 209–234, https://doi.org/10.5194/npg-27-209-2020, 2020. a, b, c, d, e, f

Resseguier, V., Li, L., Jouan, G., Dérian, P., Mémin, E., and Chapron, B.: New trends in ensemble forecast strategy: uncertainty quantification for coarse-grid computational fluid dynamics, Arch. Comput. Methods E., 28, 215–261, 2021. a, b, c, d, e, f

Reynolds, C., Leutbecher, M., Batté, L., Chen, S., Christensen, H., Klasa, C., Pegion, P., Plant, B., Raynaud, L., Roberts, N., Sandu, I., Singleton, A., Sommer, M., Swinbank, R., Tennant, W., and Theis, S.: Reports from working group 3 : What are the pros/cons of existing model uncertainty schemes and how should these be measured?, in: ECMWF/WWRP Workshop: Model Uncertainty, ECMWF, Reading, 2016. a

Sapsis, T. and Majda, A.: A statistically accurate modified quasilinear Gaussian closure for uncertainty quantification in turbulent dynamical systems, Physica D, 252, 34–45, 2013. a

Schlee, F. H., Standish, C. J., and Toda, N. F.: Divergence in the Kalman Filter, AIAA J., 5, 1114–1120, 1966. a

Tandeo, P., Ailliot, P., Bocquet, M., Carrassi, A., Carrassi, A., Miyoshi, T., Pulido, M., Pulido, M., and Zhen, Y.: A Review of Innovation-Based Methods to Jointly Estimate Model and Observation Error Covariance Matrices in Ensemble Data Assimilation, Mon. Wather Rev., 3973–3994, https://doi.org/10.1175/MWR-D-19-0240.1, 2020. a

Tibshirani, R. and Knight, K.: The Covariance Inflation Criterion for Adaptive Model Selection, J. R. Stat. Soc. B, 61, 529–546, https://doi.org/10.1111/1467-9868.00191, 1999. a

Warneford, E. S. and Dellar, P. J.: The quasi-geostrophic theory of the thermal shallow water equations, J. Fluid Mech., 723, 374–403, 2013. a, b, c

Ying, Y. and Zhang, F.: An adaptive covariance relaxation method for ensemble data assimilation, Q. J. Roy. Meteor. Soc., 141, 2898—2906, https://doi.org/10.1002/qj.2576, 2015. a

Zhen, Y. and Harlim, J.: Adaptive error covariances estimation methods for ensemble Kalman filters, J. Comput. Phys., 294, 619–638, 2015. a

Physically constrained covariance inflation from location uncertainty

3.1 Calculation of Tt*θ (or Tt*θ)

3.2 Derivation of the SPDE

3.3 Conservation laws related to dsf

4.1 Comparison with SALT equation

4.2 Comparison with the LU equation

4.2.1 0 forms in the LU framework

4.2.2 n forms in the LU framework

3.1 Calculation of $T_{t}^{} θ$ (or T_tθ)

3.3 Conservation laws related to d_sf