Data-driven versus self-similar parameterizations for stochastic advection by Lie transport and location uncertainty

Abstract. Stochastic subgrid parameterizations enable ensemble forecasts of fluid dynamic systems and ultimately accurate data assimilation (DA). Stochastic advection by Lie transport (SALT) and models under location uncertainty (LU) are recent and similar physically based stochastic schemes. SALT dynamics conserve helicity, whereas LU models conserve kinetic energy (KE). After highlighting general similarities between LU and SALT frameworks, this paper focuses on their common challenge: the
parameterization choice. We compare uncertainty quantification skills of a stationary heterogeneous data-driven parameterization and a non-stationary homogeneous self-similar parameterization. For stationary, homogeneous surface quasi-geostrophic (SQG; QG) turbulence, both parameterizations lead to high-quality ensemble forecasts. This paper also discusses a heterogeneous adaptation of the homogeneous parameterization targeted at a better simulation of strong straight buoyancy fronts.



Introduction
Geophysical fluid dynamics and other turbulent flows cover a wide range of spatial and temporal scales (see, e.g. . However, numerical simulations and observations are limited in the range of scales that can be simulated or observed. Accordingly, deterministic subgrid models or parameterizations (e.g. Margolin et al., 2006;San et al., 2013;Bachman et al., 2017;Voosen, 2018) and regularizations (e.g. nugget in optimal interpolation, Tandeo et al., 2014) are incorporated into numerical simulations and measurement processes. The limited resolution of these numerical simulations introduces unknown errors which are continuously amplified during the course of the simulation and which compound with the measurement errors to the extent that the simulations model the true chaotic nature of fluid dynamics. Improvements in accuracy of these simulations can sometimes be made by judicious use of data assimilation (DA) techniques. In the data assimilation process, observations (data) are integrated into the model state of the numerical simulation limited by imperfect physics and imperfect initial conditions. DA proceeds by alternating between forecast and analysis cycles. In each analysis cycle, observations are combined with the results from a prediction model (the forecast) to produce a so-called analysis. The analysis is expected to be more accurate than either predictions based on physical models without incorporating observations or predictions from physics-free interpolations of the observations. In order to optimize the analysis cycle, confidence in both the observations and the prediction needs to be quantified. Observational accuracy depends on the instrumental precision and sampling; and well-known uncertainty quantification (UQ) techniques are available to estimate the uncertainty associated with these limitations. In contrast, the uncertainty due to imperfect model physics is more difficult to quantify because often a "perfect model", exact solution, or direct numerical simulation is not available or feasible for the intended application.
Historically introduced to address the limited range of scales simulated (e.g. Orszag, 1970;Leith, 1971) tic subgrid parameterization is now a widely used UQ tool for assessing the impact of imperfect physics in numerical simulations. In particular, it is generally thought that building such stochastic parameterizations from physical principles promises accuracy and robustness (Berner et al., 2017). Several authors have addressed the issue of stochastic subgrid parameterization evaluation through averaging or homogenization procedures (e.g. Majda-Timofeyev-Vanden-Eijnden -MTV -method; see . Correlated additive and multiplicative noises usually play a key role in these approaches (e.g. Gottwald and Melbourne, 2013). Sardeshmukh andSura (2009), Sardeshmukh et al. (2015), and Pearson and Fox-Kemper (2018) highlight the relevance of skew-symmetric and multiplicative noises in geophysical fluid dynamic applications. Mimetic symmetries built into the fluid dynamics and stochastic subgrid model formulations may preserve some important physical invariants -a desired properties of many UQ models (e.g. Frank and Gottwald, 2013;Sapsis and Majda, 2013;Mitchell and Gottwald, 2012;Majda, 2015) and DA algorithms (Janjić et al., 2014). Other stochastic subgrid parameterizations involve memory terms, justified by the Mori-Zwanzig equations (Chekroun et al., 2011;Lu et al., 2017).
The dynamics under location uncertainty (LU) are a type of systematic stochastic subgrid parameterization introduced by Mikulevicius and Rozovskii (2004b) and Flandoli (2011) for the theory of well-posedness of stochastic partial differential equations and by Mémin (2014) for more applied purposes. More theoretical results about models under location uncertainty are discussed in Resseguier et al. (2017a). This framework has been successfully applied to uncertainty quantification (Resseguier et al., 2017b), geophysics (Resseguier et al., 2017c), attractor exploration (Chapron et al., 2018), and optical flow (Cai et al., 2018).
A similar class of models called stochastic advection by Lie transport (SALT) was first derived in Holm (2015) for the time-homogeneous subgrid parameterization and extended to the inhomogeneous case in Gay-Balmaz and Holm (2018). In the following, we will refer to this class of models as SALT. The approach used is Hamilton's variation principle with the constraint that fluid parcels move according to a Eulerian stochastic subgrid model; see Holm (2015) and Cotter et al. (2017). A local well-posedness result for the 3D SALT Euler equations is proved in Crisan et al. (2019), and a global well-posedness result for the 2D case is proved in Crisan and Lang (2019). Numerical studies for the implementation of the SALT subgrid parameterization are discussed in Cotter et al. (2019a for a 2D Euler model and two-layer 2D QG dynamics. A first study on employing SALT dynamics for data assimilation is provided in Cotter et al. (2019b).
In both LU and SALT frameworks, the velocity is decomposed into a random large-scale component, w, and a timeuncorrelated component, σḂ = σ dB t /dt. The latter is Gaussian (conditionally on some large-scale quantities), correlated in space, with possible heterogeneities and anisotropy.
Hereafter, this unresolved velocity component will further be assumed to be solenoidal. To parameterize those spatial correlations, we apply an infinite-dimensional linear operator, σ , to a d-dimensional space-time white noise,Ḃ. Holm and coauthors rely on a different but equivalent notation. It directly explicates the spectral decomposition of the timeuncorrelated velocity spatial covariance. Specifically, the small-scale velocity reads σ (x)Ḃ = p ξ p (x)dW p (t)/dt, where (W p ) p is a set of independent 1D Brownian motions and (ξ p ) p is a set of orthogonal functions.
That unresolved velocity is the central object of both SALT and LU approaches. The statistics of this velocitydefined either by σ or (ξ p ) p -constitute the parameterization of the random model. In this paper, we discuss several choices for this parameterization and dedicate significant discussion to the physical principles and assumptions about the flow regime underlying each choice. To better motivate this work, we will highlight the strong links and some formal differences between the LU and the SALT approaches. However, a full -theoretical and numerical -comparison of LU and SALT is beyond the scope of this paper.
The paper is structured as follows.
-Section 2.1 focuses on parametric non-data-driven choices. Specifically, we propose a non-stationary and tuning-free improvement of the work of Resseguier et al. (2017b) based on self-similar assumptions and possible heterogeneous modulation. Links with the Smagorinsky subgrid parameterization and non-linear energy fluxes are also discussed.
-Section 2.2 discusses a data-driven stationary but possibly heterogeneous parameterization implemented by Cotter et al. (2019a. The method relies on decomposition of empirical orthogonal functions (EOF) also known as proper orthogonal decomposition (POD) or principal component analysis (PCA).
-Section 3 begins by presenting the deterministic and stochastic surface quasi-geostrophic (SQG) models, their numerical setup, and simulation parameters. These are followed by detailed discussions of uncertainty quantification skills in Sect. 3.4.
-Sections 4 and 5 concludes this work.
Additionally, in Appendix A we include a mathematical summary of LU and SALT formulations for the interested readers. The aim here is to highlight the similarities and differences between the two approaches. Their differing sets of invariants are listed.
The choice of surface quasi-geostrophic flow is convenient, since the SALT and LU versions of the SQG dynamics coincide; thus the results presented apply to both SALT and LU models. Furthermore, present interest in oceanic submesoscale motions for which SQG dynamics are often a reasonable approximation (e.g. Lapeyre and Klein, 2006b;Cal-lies et al., 2016) and the fact that SQG offers the possibility of reconstructing subsurface flow properties from surface satellite observations (e.g. LaCasce and Mahadevan, 2006;Isern-Fontanet et al., 2006Klein et al., 2009;Wang et al., 2013) have resulted in a recent surge in the use of the SQG equations for geophysics. Accordingly, the discussion can focus on the parameterization choices in this timely application.
2 Parameterization of SALT and LU models: the statistics of the unresolved velocity 2.1 Parametric and self-similar model for the unresolved velocity In this section, we propose parametric non-data-driven models for the unresolved velocity statistics. Similar to the Kraichnan model (Kraichnan, 1968(Kraichnan, , 1994Gawedzky and Kupiainen, 1995;Majda and Kramer, 1999), Resseguier et al. (2017b) introduce an unresolved velocity which is homogeneous and isotropic. Even though this method leads to good numerical results, some parameters need to be tuned. To overcome this drawback, we propose here a parameter-free improvement based on what we call the absolute diffusivity spectral density (ADSD). New subgrid statistics will be defined at each time step from the resolved velocity kinetic energy (KE) spectrum. Finally, we also propose a heterogeneous modulation of the previous method based on the local energy flux and discuss the link with Smagorinsky-like subgrid parameterizations.

Parameter-free and non-stationary homogeneous model
We first define the statistics of the small-scale velocity through what we call the ADSD. Then, we explicate how to sample the unresolved velocity from this ADSD. In Resseguier et al. (2017b), the absolute diffusivity (i.e. KE times correlation time; Falkovich et al., 2001;Penland, 2003;Klyatskin, 2005;Vallis, 2006;Keating et al., 2011) of the unresolved velocity is twice the variance tensor trace, tr(a) = E σ dB t 2 /dt, whereas the unresolved kinetic energy is tr(a)/dt. Clearly, this kinetic energy has no physical meaning. Indeed, it depends on the simulation time step, and one should have the possibility to choose the time step as close as possible to zero. Thus, the unresolved velocity amplitude is specified through an absolute diffusivity rather than through a KE. In the mathematics literature of homogenization, Kubo-type formulas may be seen as what physicists call absolute diffusivities. More generally, since the variance of a time-continuous white noise is infinite, it is more relevant to deal with absolute diffusivity rather than kinetic energy in order to describe the statistics of the time-uncorrelated velocity. Thus, keeping a spectral approach, we define -for any spatiotemporal field -an absolute diffusivity spectral den-sity denoted A(κ) at the wavenumber κ. We will rely on this ADSD rather than on the KE spectrum, E(κ). Since the absolute diffusivity is the variance multiplied by the correlation time, it is naturally to define the ADSD as follows: is the eddy turnover time at the scale 1/κ and v κ is at characteristic velocity at this scale. Accordingly, we have If in addition we assume a KE self-similar distribution, and we obtain where r = (s + 3)/2. We aim at defining the unresolved velocity ADSD from the large-scale velocity. For this purpose, we will assume the self-similar model (Eq. 4) is valid at all spatial scales. At each time step, we compute the ADSD of the large-scale velocity, A w , from Eq. (2). Then, we fit the coefficients C and r of Eq. (4), as illustrated by Fig. 1. Let us denote these coefficients with C w and r w . Note that they are time-dependent because they depend on w which is. More precisely, we estimate the coefficients C w and r w in a wavenumber interval which approximately represents a inertial range of fully resolved scales (i.e. before the spectrum roll-off). In the left panel of Fig. 1, this interval is delimited by two vertical lines.
From there, we can define the unresolved velocity ADSD in such a way that the total velocity -resolved plus unresolved -meets Eq. (4) at small spatial scales with the same coefficients (C w and r w ). Since the two velocity components are not correlated, the total ADSD is the sum of the ADSD of each velocity component. In the inertial range, this reads Leading at large scales Leading at small scales Therefore, the unresolved ADSD is chosen to compensate the resolved ADSD roll-off -introduced by the deterministic subgrid parameterization -at small scales. Specifically, the unresolved ADSD is set to As previously, f 2 BP is a band-pass filter between κ m and κ M . In practice, we set κ M to the theoretical resolution, Figure 1. Buoyancy field (m s −2 ) (a, b), KE spectrum (m 2 s −2 /(rad m −1 )) (c, d), and ADSD (m 2 s −1 /(rad m −1 )) (e, f) of the resolved velocity, at t = 0 d (a, ce) and t = 15 d (b, d, f), in blue; ADSD of the unresolved velocity (without a multiplicative constant), in green; and slope − 5 3 (c, d) and corresponding ADSD (e, f) in a black solid line. The two dashed vertical lines define the interval where coefficients C w and r w are fitted. The right-hand-side dashed vertical line is fixed (see Appendix B for its specification). The left-hand-side dashed line corresponds to the energy-injecting scale. This scale is estimated -at each time step -as corresponding to the maximum of the compensated KE spectrum (i.e. KE spectrum divided by a theoretical spectrum). The dashed oblique line is the resulting fit (it is set to meet A w in the left bound of the fitting interval). The unresolved velocity is still restricted to a narrow spectral band, specifically between the dotted dashed line and the right end of the plot. Nevertheless, the form of its spectrum varies in time and is determined by the spectrum of the large-scale velocity. This particular initial condition corresponds to case 2 studied by Constantin et al. (1994Constantin et al. ( , 1999Constantin et al. ( , 2012 at a resolution of 128 2 . and κ m to the effective resolution, which is estimated as follows: where p is the order of the Laplacian used as a deterministic subgrid tensor (i.e. Db Dt = −ν(− ) p b). The justification of the above formula is left in Appendix B. Compared to the work of Resseguier et al. (2017b), the value of κ m is less critical. Indeed, Eq. (6) implies a weaker unresolved ADSD at larger scales where the resolved ADSD, A w , is stronger. This softens the threshold effect introduced by the band-pass filter f BP .
In practice, we set an upper bound for the estimation of r w . Without this upper bound, a concentration of energy at relatively large wavenumbers -scales smaller than κ m -in the resolved fields can become unstable. Indeed, this localized energy concentration would decrease the r w estimation, and hence increase the unresolved ADSD A σḂ through Eq. (6) at large wavenumbers -larger than κ m . This implies a larger noise intake, which can induce a larger concentration of energy at relatively large wavenumbers in the resolved fields, resulting in a positive feedback loop. To prevent these unphysical instabilities, the slope r w is bounded.
We have proposed a way to compute the unresolved ADSD A σḂ from the large-scale velocity statistics. From that ADSD, it is possible to sample the unresolved velocity, as explained in the following.
where denotes a spatial convolution. This velocity field can be easily sampled in Fourier space.
where dB t is the spatial Fourier transform of dB t and dB t √ t is a discrete scalar white-noise process of unit variance in Nonlin. Processes Geophys., 27, 209-234, 2020 www.nonlin-processes-geophys.net/27/209/2020/ space and time. Thus, the small-scale velocity is defined by the Fourier transform of the streamfunction kernel, ψ σ . In order to link the unresolved ADSD to the kernelσ which defines the unresolved velocity, we note that where µ( ) is the surface of the spatial domain and θ k is the angle of the wave vector k. From Eqs. (6)-(11), we can finally express the unresolved velocity as follows: Again the simulated unresolved velocity ADSD is physically relevant, while the KE spectrum is not. Indeed, the simulated unresolved velocity ADSD is expected to match the true (time-correlated) unresolved velocity ADSD, whereas the KE spectra of the simulated and true unresolved velocities differ. Indeed, the true unresolved velocity correlation time spectral distribution τ (κ) is not restricted to the time step dt.
The tuning-free Eq. (12) is attractive because it is not a priori restricted to stationary flows or to a particular type of 2D incompressible dynamics. In order to appreciate that, we can numerically compare it with the previous method of Resseguier et al. (2017b). For stationary, fully developed turbulence and after including a tuning stage to optimize the match, both parameterizations give approximately the same results. Therefore, in order to perceive improvements between the two simulations, we must either work with nonstationary flows or with erroneous tuning. This latter scenario is of course the meaningful one for the application of such models to the real world, where turbulence is non-stationary and heterogeneous, and so region-by-region and season-byseason tuning is impossible to do with quality checks in place.
We believe that non-stationarity may yield a fair, yet idealized, comparison. So, we here compare the two parameterizations in a non-stationary case. Figure 2 below shows simulated buoyancy fields initialized with "case 2" of Constantin et al. (1994) and corresponding errors. After 2 d of advection, there is no turbulence yet, and a 128 × 128 resolution is sufficient to correctly resolve every scale. Therefore, no stochastic subgrid parameterization is needed. The selfsimilar method automatically adapts to the situation, whereas the method of Resseguier et al. (2017b) introduces spurious buoyancy isoline roughness by randomly folding the isolines. Accordingly, the method of Resseguier et al. (2017b) introduces more errors.

Heterogeneity
Our stochastic parameterization randomly folds tracer isolines. This process is often desirable. For instance, it can trigger physically relevant instabilities, such as the filament instabilities highlighted by Resseguier et al. (2017b). After these instabilities have been randomly triggered, eddies are formed by non-linear processes. In a similar way, Figs. 3 and 4 show that our stochastic dynamics enable a more-realistic eddy distribution than deterministic simulations. However, a homogeneous small-scale velocity may also perturb the tracer isolines which should remain still (i.e. which remain still in high-resolution deterministic simulations), e.g. sharp, straight, and coherent fronts. Figure 4 also highlights this drawback. A typical application of this problem in more realistic flow simulations is the simulation of jets like the Gulf Stream and regions of the Antarctic Circumpolar Current. These real-world jets are associated with diffusivity suppression (Ferrari and Nikurashin, 2010), and this effect is not present in our formulation so far. If we seek to preferentially perturb some tracer gradients, a heterogeneous small-scale velocity is required. Note that the heterogeneity discussed here needs to be non-stationary and thus cannot be represented by the stationary EOF presented later in this paper. Besides, in a small ensemble of realizations, relevant heterogeneity of the small scales may make the spreading more accurate for UQ and enable comparable ensemble forecast accuracy with fewer members. We here propose a possible heterogeneous version of the previous method.
In order to obtain a heterogeneous model of the unresolved velocity, we need a heterogeneous version of the ADSD (Eq. 4). Since the wavenumber, κ, cannot depend on the position, x, the constant, C w , and/or the spectrum slope, r, should do. A spatially varying spectrum slope is probably difficult to estimate. Hence, we restrict ourselves to a spatially varying constant, C w , and a spatially homogeneous spectrum slope. The constant may also varies with the time and the wavenumber. According to the Kolmogorov theory (e.g. Frisch, 1995) and Eq. (3): where F is the energy flux through the spatial scales and p = 1/3 for a SQG flow (cst. is an abbreviation for constant).
More specifically, the energy flux describes the energy moving from scales larger than 1/κ toward smaller scales and can be computed as follows:  This particular initial condition corresponds to case 2 studied by Constantin et al. (1994Constantin et al. ( , 1999Constantin et al. ( , 2012) at a resolution of 128 2 . The lowresolution deterministic dynamics shows too many filaments and not enough eddies.
where q is the transported (up to possible source terms) quantity, g < κ is the low-pass filtered version of g (setting to zero the Fourier modes of g which have frequencies larger than κ), and • stands for the spatial average (Frisch, 1995). For a SQG flow, q corresponds to the buoyancy normalized by the stratification: q = b/N. The energy flux is essentially a third-order moment. It is very important because it describes the cascade of the flow by non-linear energy transfers (Frisch, 1995).
If the energy flux through scale is understood locally in space (as indeed Smagorinsky, 1963, also assumes), Eq. (14) provides a natural parameterization of the unresolved ve- , KE spectrum (m 2 s −2 /(rad m −1 )), ADSD (m 2 s −1 /(rad m −1 )), and variance tensor at t = 15 d for 128 2 -resolution stochastic dynamics with (from top to bottom) homogeneous unresolved velocity, unresolved velocity modulated by ∇b 1/4 , and unresolved velocity modulated by 1/6 F . This particular initial condition corresponds to case 2 studied by Constantin et al. (1994Constantin et al. ( , 1999Constantin et al. ( , 2012) at a resolution of 128 2 . The low-resolution stochastic simulations do not show too many filaments and show enough eddies. Among the stochastic simulations, the energy-flux modulation better preserves the sharp straight fronts (e.g. front from (0 m, 2 × 10 5 m) to (2.5 × 10 5 m, 5 × 10 5 m)). locity heterogeneities. We simply modulate the unresolved ADSD (Eq. 6) by the heterogeneous ratio p F / p F , averaged over the resolved inertial range wavenumbers.
Heterogeneous modulation where P = I d − −1 ∇∇ T is the projector onto the space of free-divergence functions. This parameterization is physically meaningful, since, locally in space, a stronger direct cascade at large scales (larger F and thus larger C w ) suggests that the unresolved velocity (large) should maintain this cascade by folding smaller-scale tracer structures. Furthermore, considering that the energy flux is a third-order structure makes this parameterization relevant to differentiate between strait fronts and curved structures (e.g. eddies). Indeed, at least three points are needed to define a curvature and differentiate between these structures. Figure 4 confirms that this modulation enables a more accurate spatial distribution of the stochastic folding.
In order to keep a divergence-free velocity and the ensuing properties (e.g. energy conservation), the modulated velocity is projected onto the space of free-divergence functions, using the operator P. Because of that we do not consider the advection correction w * − w = − 1 2 (∇ · a) T of the LU formalism. Indeed, here the variance tensor has the simple form a = 1 d tr(a)I d . As such, the advection correction is a gradient field and is hence removed by the projection onto the space of free-divergence functions.
Although relevant, the comparison of Figs. 3 and 4about front dynamics -remains qualitative. For a quantitative demonstration, adapted metrics should be used. Indeed, simple isotropic, homogeneous second-order statistics (e.g. KE spectra) cannot distinguish between eddies and filaments. Bi-spectra may overcome this drawback, since they express three-point statistics. This quantitative analysis would necessitate studies of the metrics themselves. Therefore, these analyses will be addressed in future work.
Many other closures rely on the Kolmogorov (Kolmogorov, 1941) model (Eqs. 3-14): in particular, the famous Smagorinsky model (Smagorinsky, 1963) and its variants (e.g. Fox-Kemper and Menemenlis, 2008;Bachman et al., 2017) provide a path to developing deterministic and dissipative scale-aware subgrid models. Typically, these models result in a Laplacian dissipation which involves a heterogeneous eddy diffusivity or viscosity coefficient, ν Sm . Aside from their heuristic theoretical justification, these Smagorinsky-type subgrid terms are formally equivalent to the turbulent dissipation of our stochastic model: ∇· 1 2 a∇q (Resseguier et al., 2017a), where a = 2ν Sm I d and q is a transported quantity. This similarity suggests that a Smagorinskytype model could provide a good estimate for our variance tensor and thus for the heterogeneity of the unresolved velocity.
The goal of the Smagorinsky model is to optimize the KE spectrum by targeting a specific turbulent diffusive scale adapted to the simulation resolution. A turbulent dissipation coefficient expression can be derived from the Kolmogorov model (Eqs. 3 and 14) and the closure, where D is the dissipation, a second-order moment related to the molecular or turbulent diffusion. To develop a Smagorinsky-type model, the resolved flux of the energylike conserved invariant is equated to the dissipation of the energy invariant such that From there, one can obtain an eddy diffusivity or viscosity coefficient proportional to ∇q h . For an SQG flow, the exponent is h = 1/2, where q = b is the buoyancy .
In the Kolmogorov theory of homogeneous and stationary turbulence, the energy flux is a constant of the flow, and the closure (Eq. 19) is an exact result of a simple energy budget over an ensemble mean. Indeed, there is no accumulation of energy at any scales in a stationary regime. This closure is very useful since the dissipation (Eq. 19) is generally much simpler to compute than the energy flux. Nevertheless, in every flow realization, the energy flux and the diffusion vary with space and time, and they do not match each other locally. For instance, a strong straight front of an SQG flow involves a large dissipation but no energy cascade because the velocity is aligned with front (see Eq. 15). Moreover, in any bounded, limited resolution situation, the inertial cascade range is limited so the energy flux through scale varies with the wavenumber, especially outside the inertial range.
The discrepancy between energy flux and dissipation is not so much of an issue for the Smagorinsky model because its aim is the optimization of a second-order statistics at small scales. Unfortunately, this closure cannot be used to simplify the modulation computation in our parametric random model. Indeed, the stochastic dynamics relies on processessuch as folding -associated with higher-order statistics. Figure 4 (middle right) illustrates this statement. The following unresolved velocity parameterization is used there: Heterogeneous modulation Along sharp straight fronts, the dissipation will be larger. Accordingly, the associated modulation ∇b 1/4 enhances the stochastic folding where one would need it to weaken.

Data-driven model for the unresolved velocity
In this section, we detail a procedure proposed by Cotter et al. (2019a) for the estimation of the (weighted) EOFs, (ξ k ) k , involved in the unresolved velocity definition, If we denote λ k as the L 2 norm of those EOFs, their normalized versions (ξ k = ξ k /λ k ) k are the eigenvectors of the self-adjoint operator, defined by the small-scale velocity covariance σ (x)σ T (y)/dt . (λ k /dt) k is the set of eigenvalues of this operator.
The data-driven methods of Cotter et al. (2019a) relies on Lagrangian paths defined at two "resolutions". The first paragraph of this section defines these two types of Lagrangian Nonlin. Processes Geophys., 27, 209-234, 2020 www.nonlin-processes-geophys.net/27/209/2020/ paths. Then, we propose several ways to identify the time increments of the infinite-dimensional Brownian motion, σ B t , from these Lagrangian paths. Preprocessing of the increments are needed in order to meet some structural assumptions. After this, we relate the increment covariance to the EOFs.

Preliminary definitions
We introduce two types of velocity field: a high-resolution velocity, v, on a fine mesh-grid (512 2 ), a low-resolution velocity, v, on a coarse mesh-grid (64 2 ). This velocity field is a spatially low-pass-filtered version of f .
Then, two types of Lagrangian path are defined: a "high-resolution flow", X ij (t 0 , t), defined by the highresolution velocity, u: with x ij = (i x, j y).
a "low-resolution flow", X ij (t 0 , t), defined by the lowresolution velocity u:

Candidate for the increment realization
In order to estimate the EOFs, (ξ i ) i , involved in Eq. (A5), we assume that we can observe increments We will interpret the following residual flow increments as a realization of the above:

Preprocessing
The increments are supposed to be centred and divergence free (as ∇ · ξ i = 0, ∀i). Therefore, after computing the residual flow increments, X m ij m , they are centred, with the estimator and projected onto the space of divergence-free functions,

Covariance, quadratic covariation, and EOF
Then, we can define the EOFs by an estimate of the spatial covariance of the residual flow increments (averaging over the time index, m).
In order to properly define the EOFs, we must add the following orthogonal constraint: Finally, after estimating the (ξ k ) k offline, the ensemble forecast can generated online with Eq. (22).

Numerical simulations and uncertainty quantification
3.1 Surface quasi-geostrophic model

Deterministic SQG
This is a simplified model to describe the ocean surface dynamics at mesoscales (i.e. horizontal length scale of the order of 100 km) (Blumen, 1978;Held et al., 1995;Lapeyre and Klein, 2006a;Constantin et al., 1994Constantin et al., , 1999Constantin et al., , 2012Lapeyre, 2017). The buoyancy, b, is transported at the ocean surface by a horizontal velocity field, where D Dt stands for the horizontal (deterministic or stochastic) material derivative and S represents possible sources and sinks. As the potential vorticity is assumed to be zero in the fluid interior, the Fourier transform of the velocity streamfunction,ψ, is related to the Fourier transform of the buoyancy,b, by the following SQG relationship: where N is the stratification and k is the wave vector.

Stochastic SQG
The LU and SALT versions of the SQG dynamics are formally similar to the deterministic model. However, the buoyancy transport (Eq. 34) has to be understood in the stochastic sense (Eq. A4). Furthermore, the SQG relationship (Eq. 35) must be interpreted with in the SALT context, and w = ∇ ⊥ ψ = ∇ ⊥ (− ) −1/2 (b/N ), in the LU one. Besides, in the LU SQG, the advecting drift w has to be divergence-free. We enforce this constraint by projecting the drift correction (w * − w) onto the space of free-divergence functions. So, we have where P = I d − −1 ∇∇ T . Indeed, the SQG model is derived from a QG model with a transport of the buoyancy at the surface. Then, the potential vorticity (PV) is assumed to be zero inside the fluid. In the SALT framework, the PV reads 0 Nevertheless, the slight difference between the SQG SALT and the SQG LU models are not considered in this section since we neglect the advection correction of the LU framework (w * −w). So, we simulate the stochastic transport (Eq. 34) coupled with the SQG relation (Eq. 36). The unresolved velocity statistics encoded in σ are specified either by the self-similar method of Sect. 2.1.1 or by the data-driven method of Sect. 2.2.

Flow simulation
We perform a high-resolution simulation (512 2 spatial grid) of the deterministic SQG model with the following initial condition: b(x, t = 0) = 0.2B 0 cos 2π L (x + y) with The source term, S, involves a hyperviscosity, a linear drag, and an additive stationary forcing such that The parameters of the simulations are summed up in Table 1. The influence of the initial condition remains for about a month. Then, the turbulence is maintained by the forcing as illustrated by Fig. 5. From t = 50 d to t = 100 d, the EOFs of the data-driven method are learned. From t = 100 d to t = 130 d, the deterministic high-resolution simulation is used as a reference. In this time interval, several stochastic low-resolution (64 2 ) simulations are performed and compared. These simulations are initialized at t = 100 d with the reference high-resolution simulation projected at low resolution (i.e. keeping only the Fourier modes associated with a coarse-resolution grid).

Learning and analysis of the EOF
Here, we describe the convergence of the EOF estimation. The accuracy of this estimation is in particular a function of the number of snapshots used and of the Lagrangian advection time, T . We first describe the convergence with the number of snapshots and then -as a remark -the convergence with the Lagrangian advection time, T .
The EOFs are learned between t = 50 d and t = 100 d from 12 465 spatial fields of residual flow increments. Even though a large number of snapshots are used, the correlation time of the residual flow increments is about 1 d. This is not negligible compared to the estimation time window: 50 d. Accordingly, the EOFs are not fully converged. But, we expect this convergence to be sufficient for the present work. Moreover, for real applications at this spatial scale, we expect the unresolved velocity statistics to be non-stationary on temporal scales larger 50 d. Thus, learning a unresolved velocity stationary statistical representation on a larger time window might be difficult in practice. Therefore, even if our EOF estimation is not converged, it represents a realistic test case.
Remark 1. The integration time of the flows, T , is a critical parameter for the definition of the EOF. Indeed, for small advection time, T , the length of an increment of a Lagrangian flow path is proportional to the velocity and to the advection time. This is the so-called ballistic regime (Falkovich et al., 2001;Vallis, 2006). In particular, residual flow increments squared norm would be proportional to T 2 ; the variance tensor estimator would be proportional to T ; and the estimated EOFs would be proportional to √ T . With a larger advection time, T , the Lagrangian velocity decorrelates -along the flow path -from the initial Lagrangian velocity. When the advection time becomes larger than the correlation time of the Lagrangian velocity, the flow path begins to act as a Brownian motion (Falkovich et al., 2001;Penland, 2003;Klyatskin, 2005;Vallis, 2006;Keating et al., 2011). The length of a displacement scales as √ T ; i.e. the Lagrangian velocity acts as a white noise in time. This is the so-called diffusive regime. Figure 6 illustrates this convergence with the average tensor a 0 = 1 d tr(a) with T . In this paper, the Lagrangian advection time T is computed from a CFL (Courant-Friedrichs-Lewy condition) at the coarse resolution of 64 2 -about 300 s. Although it corresponds to the ballistic regime (i.e. the flow increments are correlated), this choice is coherent with the work of Cotter et al. (2019a) and gives very good UQ results. Moreover, the residual flow increments -and hence the EOFs -are T . This is the relevant regime for the estimation of the EOF, since we meet the fundamental assumption of our model: a Brownian behaviour for the small-scale flow. For method 1, the diffusive regime is not visible, at least in this advection time window, and the EOF estimation is theoretically not possible. In contrast, method 2 provides a converged estimator of the variance tensor for an advection time T > 1 d. spatially aliased, since a large part of those increments lives at small spatial scales but are spatially sampled on a coarse spatial grid, x ij . Figure 7 confirms this idea. If all the 2 × 64 2 − 1 = 8191 EOFs are considered, the ADSD of the unresolved velocity reveals a strong spatial aliasing.
Both the data-driven and the non-data driven methods exhibit large unresolved velocities at the smallest scales of the coarse resolution grid (see Fig. 7). Nevertheless, the two ADSDs are distinct. In particular, the data-driven method involves some large spatial scales. One could think that these large-scale components would disappear if the flow X and X are integrated in a Eulerian way rather than in a Lagrangian way. However, with the advection time, T , being very small, few differences are expected between a Eulerian and a Lagrangian advection. New numeric experiments confirm this idea (not shown). A complete study of these effects is beyond the scope of this paper.

One realization
We now simulate the LU SQG dynamics (equivalent to SALT SQG) (Eqs. 34-36), at low resolution (64 2 ), with two possible parameterizations for the unresolved velocity: either the data-driven model (see Sect. 2.2) or the parametric and self-similar model (see Sect. 2.1). For the data-driven model, we keep 200 EOFs. This choice will be explained in the next section. For all simulations, there is a unique reference initial condition at t = 100 d. This initial condition is the low-resolution (64 2 ) projection of the reference deterministic high-resolution (512 2 ) simulation (i.e. the Fourier modes associated with large wave vectors are set to zero, and the obtained spatial field is subsampled at the low resolution). Spatial fields, KE spectra and ADSD are plotted in Figs. 8 and 10. For comparison purposes, we also show the low-resolution deterministic SQG simulations and the low-resolution (64 2 ) projection of the reference deterministic high-resolution simulation. As already pointed out by Resseguier et al. (2017b) for free-decaying turbulence, the realizations of the LU dynamics are no worse than a lowresolution deterministic simulation. For the short-term simulations, our stochastic subgrid parameterizations have often weak improvements on the low-resolution simulations, even though, sometimes, the stochastic subgrid parameterization can improve the simulation. Indeed, Resseguier et al. (2017b) show that the LU dynamics at a resolution of 128 × 128 can trigger filament instabilities by random destabilization and hence obtain a more realistic proportion of eddies and filaments. This is confirmed by Figs. 3 and 4, also at a resolution of 128×128. In Fig. 8, the resolution is coarser (64×64). Therefore, the stabilizing deterministic subgrid tensor (hyper viscosity) is stronger. This may explain an inhibition of filament instabilities here and hence less difference between deterministic and stochastic coarse simulations. Nevertheless, our main goal is not improving a single simulation. Our main goal is improving the uncertainty quantification -as developed below -without deteriorating single simulations.
At t = 110 d, all the spatial fields of Figs. 8 and 9 are still similar, whereas at t = 120 d, the spatial fields strongly differ. Thus, the predictability timescale -i.e. the time over which initial conditions are forgotten -is between 10 and 20 d. The larger features evolve more slowly, leading to a longer predictability timescale. This effect can be seen in the similar lo-cation across simulations of larger vortices in Fig. 10, while the filaments between the vortices have lost all coherence. In Fig. 11, the larger, more-coherent features persist in the ensemble mean of the coarse-resolution simulations, while the smaller scales cancel. It is the persistent features which are the basis for robust forecasts as will become clear in the next section, and it is the improvement of these robust features that is the goal of the parameterizations, not the improvement of individual filaments in individual ensemble members.

Uncertainty quantification
We now forecast two ensembles of 200 realizations following the stochastic SQG dynamics (Eqs. 34-36), with the same unique reference initial condition at t = 100 d. Again, the ensembles members evolve at low resolution (64 2 ). The first ensemble is generated with the data-driven model for the unresolved velocity (see Sect. 2.2), while the second ensemble is generated with the parametric and self-similar model for the unresolved velocity (see Sect. 2.1). For the data-driven model, we keep only 200 EOFs, since the number of EOFs cannot be larger than the ensemble size without increasing the complexity of the algorithm.
With such ensemble forecasts, we aim at representing the variety of possible behaviours of the fluid dynamic system. In particular, the spreading (i.e. the variance increase) of an ensemble is expected to make an ensemble be closer to the reference. In such a case, the standard deviation -at each point and at each time -is expected to be of the order of the bias. Figure 12 shows that both the data-driven and the self-similar parameterizations achieve this goal everywhere and for every time. The pointwise biases of the data-driven method, of the self-similar method, and of a deterministic simulation (at the same low resolution) are very similar (not shown). Therefore, we only plot the pointwise bias of the data-driven method. In Fig. 12, the represented biases and error estimations (mean ±1.96 times the pointwise standard deviation 1 ) are normalized by the squared energy of the reference solution. Note that those relative pointwise biases increase very quickly with time.
After t = 105 d and especially after t = 115 d, a bifurcation plays a large role in the simulation error and its estimation. This bifurcation is due to the chaotic trajectory of an eddy -centred on (525, 300 km) at t = 105 d. Due to the incorrect trajectory in the ensemble mean, a large yellow spot develops in the bias images. Both ensembles capture well this variability by creating similar spots. According to the doubling of those spots, there are probably only two likely trajectories for this eddy, at least until t = 118 d.
Similar UQ results have been obtained on a freeturbulence flow (Resseguier et al., 2017b). This close anal-ysis has also compared the UQ potential of LU/SALT algorithms against that of a deterministic dynamics with random initial conditions. The latter has shown an underestimation of errors by one order of magnitude. Figure 12 offers a visual validation of our methods' UQ potential in the whole spatial domain. Nevertheless, the pointwise laws of the ensembles are not Gaussian, since we consider a non-linear evolution law with multiplicative noise. Therefore, the pointwise confidence interval is not in general a symmetric interval centred on the pointwise mean with width 2 × 1.96 times the pointwise standard deviation. Such an interval is only a first approximation. In order to be more accurate in our analysis, we now focus on few spatial points. There, we compute -at each time -the true ensemble-based confidence intervals at 95 % and at 50 % and compare them with the reference value. Figure 13 display the results at two grid points. Again, the results are impressively good. Similar UQ proxies have been presented by Cotter et al. (2019a) with a SALT 2D Euler dynamics with Dirichlet boundary conditions. This work also highlights the excellent UQ skills of our frameworks.
To conclude this quantitative UQ analysis, we study the effect of the number of realizations needed. How many is a main issue in operational weather forecast centres, since each realization is costly. Accordingly, we start again the UQ analysis but with 20 ensemble members only. Figure 14 shows that the 97.5 % and the 2.5 % quantiles get closer, compared to the 200-ensemble-member case. It means that the probability density tail estimations -i.e. the representations of extremes -are slower to converge than the more likely values. Nevertheless, the ensembles still capture well the reference dynamics of the centre of the distribution.

Conclusion
This paper develops the SALT and LU models which coexist in a single family of stochastic schemes. In addition to their general theoretical properties, we have discussed and compared possible parameterizations for this new scheme family.
Appendix A highlights the strong theoretical similarities between SALT and LU stochastic subgrid tensors at the heart of their parameterizations. These frameworks assume that tracers are transported by the sum of a resolved large-scale velocity and an unresolved time-uncorrelated velocity. As already mentioned in the literature, these subgrid models can conserve some but not all invariants. The SALT framework imposes helicity and circulation conservation in two and three dimensions and enstrophy conservation in two dimensions, whereas LU dynamics strictly conserves kinetic energy. Yet, for a homogeneous unresolved turbulence, we have proved that -on average -LU dynamics also conserve the helicity and circulation.
This paper mainly focuses on numerical parameterization. We have formulated and described several parameterizations Figure 7. Buoyancy field (m s −2 ) (a), kinetic energy spectrum (m 2 s −2 /(rad m −1 )) (b), and ADSD (m 2 s −1 /(rad m −1 )) (c) of w, at t = 105 d, ADSD of σḂ from the non-data-driven method (without a multiplicative constant); ADSD of σḂ from the data-driven method, with 2, 20, 200, 2000, and 8000 EOFs, and a slope of − 5 3 (b); and a corresponding ADSD (c) in a black solid line. The two dashed vertical lines define the interval where coefficients C w and r w are fitted. The dashed oblique line is the resulting fit (it is set to match A w in the left bound of the interval). Figure 8. Buoyancy fields (m s −2 ) (left), kinetic energy spectra (m 2 s −2 /(rad m −1 )) (middle), and ADSDs (m 2 s −1 /(rad m −1 )) (right) of w, in blue, ADSDs of σḂ, in green, at t = 110 d, for (from top to bottom) the low-resolution (64 2 ) projection of the reference deterministic high-resolution (512 2 ) SQG dynamics; the low-resolution (64 2 ) deterministic SQG dynamic; one realization of the low-resolution (64 2 ) LU SQG dynamics (equivalent to SALT SQG) with the self-similar parameterization; and one realization of the low-resolution (64 2 ) LU SQG dynamics (equivalent to SALT SQG) with data-driven parameterization. Figure 9. Buoyancy fields (m s −2 ) (a, b), kinetic energy spectra (m 2 s −2 /(rad m −1 )) (c, d), and ADSDs (m 2 s −1 /(rad m −1 )) (e, f) of w, at t = 110 d, for ensemble mean of the low-resolution (64 2 ) LU SQG dynamics (equivalent to SALT SQG) with self-similar parameterization (a, c, e) and with data-driven parameterization (b, d, f).
which apply to both SALT and LU frameworks. Parameterization for SALT or LU means choosing a spatial covariance for the unresolved small-scale velocity (i.e. choosing σ ). The stationary homogeneous self-similar parameterization of Resseguier et al. (2017b) has been improved to make it non-stationary and tuning-free. Spectral properties are learned "on the fly" from the resolved large-scale velocity as is common in the practice of large-eddy simulation. A heterogeneous modulation of that parameterization -based on the energy flux through scales -is also proposed. This modulation naturally comes into play when considering Kolmogorov-like energy cascades and spectra. Because the energy flux quantifies the energy cascade, which is assumed to be constant across scales, the modulated unresolved velocity acts only where there is a large-scale energy cascade. This parameterization improvement enables a better simulation of straight strong fronts in SQG dynamics. However, the usual convenient approximation produced in the Smagorinsky diagnosis of the energy flux -the dissipation -cannot be used accurately here. We also recall the stationary heterogeneous data-driven parameterization method of Cotter et al. (2019a. Here, small Lagrangian displacement increments are computed at distinct resolutions from highresolution simulation outputs. The spectral decomposition of the Lagrangian displacements covariance leads to a light and convenient representation of the unresolved velocity statistics: the Lagrangian displacement EOFs.
Finally, two tuning-free parameterizations have been numerically compared: the new non-stationary homogeneous self-similar parameterization for LU and the stationary heterogeneous data-driven one of Cotter et al. (2019a for SALT. The test case is a homogeneous and stationary forced SQG turbulence whose LU and SALT versions are equiva-lent. Single realizations of the LU test case are found to be -at least -as good as the result of the corresponding deterministic simulation at the same resolution. For both parameterizations, the ensemble forecasts are found to predict the right amplitudes and positions of numerical errors. The uncertainty skills of the ensemble forecasts remain impressively good even with only 20 realizations.

Discussion
A lot of theoretical and practical questions remain about SALT and LU schemes.
The numerical explorations of this paper have focused on SQG dynamics because -except for the neglected advection correction -SALT and LU SQG models coincide. This simplification has enabled a clearer parameterization comparison. An interesting dual study would be the comparison of distinct SALT and LU models but with a fixed parameterization (i.e. fixed σ model).
For this purpose, the simplest dynamics to consider would be 2D incompressible Navier-Stokes equations. The 3D or quasi-geostrophic (QG) dynamics would be appropriate, although they possess a larger parameter space and more costly computation. The SALT version will conserve helicity and circulation, while the LU one will conserve kinetic energy. It would be a very interesting exercise to examine the inertial cascades of energy, enstrophy, and potential enstrophy in these systems and how they are affected by the SALT versus LU assumptions. Nonetheless, even an understanding of these cascades is probably not sufficient to objectively conclude which framework is more appropriate in general, as the answer will likely depend on the application, model resolu- Figure 10. Buoyancy fields (m s −2 ) (left); kinetic energy spectra (m 2 s −2 /(rad m −1 )) (middle); and ADSDs (m 2 s −1 /(rad m −1 )) (right) of w, in blue, and ADSDs of σḂ, in green, at t = 120 d, for (from top to bottom) the low-resolution (64 2 ) projection of the reference deterministic high-resolution (512 2 ) SQG dynamics, the low-resolution (64 2 ) deterministic SQG dynamic, one realization of the low-resolution (64 2 ) LU SQG dynamics with self-similar parameterization, and one realization of the low-resolution (64 2 ) LU SQG dynamics (equivalent to SALT SQG) with data-driven parameterization. tion, etc. A variety of idealized and applied numerical studies will probably be necessary to gain insight into how to optimize in a variety of settings. And even for one specific situation, the quality and skill metrics of choice are not obvious. Even in the simple cases studied here, the usual second-order statistics are probably not sufficient to discriminate between these dynamics. Furthermore, it is important to note that the exact conservation of vorticity by the SALT scheme and energy by the LU scheme may be counterproductive in certain heterogeneous settings where the small-scale features are required to exhibit systematic property transport. One interesting example is the oceanic wind-driven gyre, whose circulation magnitude depends critically on the gyre and its turbulence to relocate the source of vorticity and energy from the wind to the regions where dissipation occurs. An eddy vortic-ity transport across large-scale streamlines (i.e. affecting the interpretation of the Kelvin circulation theorem) is needed (Fox-Kemper and Pedlosky, 2004). A cross-streamline energy transport is also part of the system equilibration (Scott and Straub, 1998). In the real ocean, the eddies shed in the Agulhas retroflection are a classic example of organized small-scale transport (Gordon, 1985) as are the "eddy cannons" that fire mesoscale eddies across the Antarctic Circumpolar Current (Hallberg and Gnanadesikan, 2006).
In the presence of turbulence heterogeneity, another distinction between SALT and LU models is the advection correction. It is also the single distinction between LU dynamics and Mikulevicius and Rozovskii (2004a). So far, it is not clear what constitutes a "large-scale velocity" in practice. Again, this probably depends on the situation. Even though Figure 11. Buoyancy fields (m s −2 ) (a, b), kinetic energy spectra (m 2 s −2 /(rad m −1 )) (c, d), and ADSDs (m 2 s −1 /(rad m −1 )) (e, f) of w, at t = 120 d, for the ensemble mean of the low-resolution (64 2 ) LU SQG dynamics (equivalent to SALT SQG) with self-similar parameterization (a, c, e) and with data-driven parameterization (b, d, f). Cotter et al. (2017) have provided some first theoretical clues, further explorations are needed. Numerical and experimental work will probably help if one identifies and studies simple heterogeneous flows meeting the main assumption of SALT and LU models: the velocity timescale gap.
The parameterizations for SALT and LU presented in this study (e.g. the ADSD method) could be naturally extended to physical-domain-based implementation, as is an important step in evaluating schemes for operational use, e.g. Pearson et al. (2017). Indeed, Sect. 2.1 details this tuning-free parameterization in the Fourier space only. A convenient dual of self-similar spectra (i.e. spectra scaling as k α ) in the physical space is the widely used Matérn covariance (Williams and Rasmussen, 2006;Lim and Teo, 2009;Lilly et al., 2017). Thus, the spatial filterσ -appearing in σ dB t =σ * dB t -of a possible future 2D physical-domain-based ADSD parameterization would probably be built from the curl of a Matérn covariance.
Nevertheless, the homogeneity of the ADSD parameterization is a drawback. Heterogeneous solutions need to be developed. Unfortunately, the ADSD heterogeneous modulation of this paper remains expensive to compute, even making use of the Fourier space. Moreover, even though its third-order physical meaning and its natural association with SALT and LU is very instructive, its additive value for UQ is not clear.
The heterogeneous parameterization method of Cotter et al. (2019a is valuable in the presence of some turbulence heterogeneities (e.g. heterogeneities induced by fixed boundary conditions). However, the practical usefulness of this parameterization remains debatable, because its heterogeneity is assumed to be stationary. Other ways for calibrat-ing the subgrid unresolved statistics using machine learning is also on-going research.
Many parameterizations for SALT and LU are now available. Regardless of their differences, the resulting stochastic dynamics and associated UQ skill seems to be relatively independent of the particular parameterization choice. Notwithstanding, we think that learning-free heterogeneous and non-stationary parameterizations may further improve SALT and LU UQ skills and/or enable smaller ensemble size and thus more efficient computation.
Finally, the main goal of all these studies is SALT-and LUbased data assimilation. Cotter et al. (2019b) have opened the way, using a particle filter and the heterogeneous parameterization method of Cotter et al. (2019a. Observed data from a dynamical system of O(10 6 ) degrees of freedom were successfully assimilated into the parameterized system of O(10 4 ) degrees of freedom. We expect that many other data assimilation studies will follow.   www.nonlin-processes-geophys.net/27/209/2020/ Nonlin. Processes Geophys., 27, 209-234, 2020 Appendix A: Theoretical motivations Here, we briefly highlight the similarities and differences between the location uncertainty model and the stochastic Lie transport model. To simplify the comparison, we work in the Stratonovich representation and restrict ourselves to the incompressible case.

A1 Lagrangian path
From a large-scale under-resolved point of view, SALT and LU methods assume that the fluid velocity is partially uncorrelated in time. Hence, the position of a Lagrangian particle X t evolves in time according to where σḂ is time-uncorrelated and w * is the Stratonovich large-scale drift term. Formally, for a spatial domain ⊂ R d , the process (t −→ B t ) is a cylindrical I d (L 2 ( )) d -Wiener process (see Da Prato andZabczyk, 1992, andPrévôt andRöckner, 2007, for more information on infinite-dimensional Wiener processes and cylindrical I d -Wiener processes). Recently, Cotter et al. (2017) have rigorously shown that such a decomposition corresponds to the limit of a deterministic flow when the correlation time of the small-scale velocity goes to zero.

A2 Notation correspondences
The Lagrangian equation can also be written with Itō notations as follows: Note the difference in the drift term when compared with Eq. (A1). Table A1 summarizes the notations differences between SALT and LU both for Itō and Stratonovich notations.

A3 Scalar transport
For a -possibly active -scalar-tracer denoted q, the SALT prescribes the same type of evolution equation as the LU models.
where the material derivative, Dq dt , (in Stratonovich notations) is simply In Itō form, the transport Eq. (A3) makes explicit the turbulent diffusion and the centred anti-symmetric multiplicative noise (Resseguier et al., 2017a).
We again highlight that this analysis holds for both SALT and LU approaches.

A4 Euler models
Nonetheless, the incompressible stochastic transports of velocity and vorticity differ between LU and SALT. First, the SALT approach considers the transport -through a specific Lagrangian choice and up to some forcings -of the Stratonovich large-scale linear momentum, ρw * , whereas the LU Euler derivation assumes the transport of the Itō large-scale linear momentum, ρw. Furthermore, due to its geometrical approach, the SALT Euler equation involves an additional term.
Specifically, the LU Euler equation with neither viscosity nor Coriolis force reads D Dt w Due to the transport of ρw whereas the SALT version is Unlike in the SALT model, the large-scale transported velocity of the LU Euler, w, differs from the large-scale transporting velocity, w * . The latter implicitly appears in the stochastic transport operator, D Dt , of both the SALT and LU equations. Thus, in the LU equations, we can see the correction Table A1. Notation equivalences between LU and SALT approaches.
T as a modification of the large-scale advection. Note that this modification cancels for a homogeneous small-scale velocity. Otherwise, whether the Itō drift, or the Stratonovich drift, On top of this large-scale advection difference between SALT and LU modelling, the SALT additional term, ∇ẋ T t w * , has major consequences on the dynamics invariants, as explained in Sect. A6.

A5 Other dynamical models
The differences between LU and SALT momentum equations -Eqs. (A6) and (A7) respectively -resemble the distinctions between other classical and geophysical fluid dynamic models. Table A2 recalls some of these classical models in SALT and LU stochastic formulations. The LU Euler equations can be found in Mémin (2014) without noise and in Resseguier (2017) including noise. The vorticity equation in each case is easily derived by taking the curl of the Euler equation. The LU QG equations under moderate influence of turbulence has been derived by Resseguier et al. (2017b) with Itō notation. In Table A2, we have written the same equation in Stratonovich notation. The SALT Euler, vorticity and quasi-geostrophic equations are derived by Holm (2015).
Both SALT and LU assumes the stochastic transport of tracers (e.g. temperature and salinity). For QG models, it implies the transport of buoyancy, b, at the surface (z = 0). Therefore, up to the drift correction w * − w in the streamfunction definition, SALT and potential vorticity (PV) coincide. In the SQG framework, the PV is assumed to be zero in the ocean interior (z < 0). This leads to the same relationship between buoyancy and streamfunction, ψ, in LU and SALT dynamics. Therefore, in the Sect. 3 of this paper, we will choose this model to compare two parameterization (i.e. choice of σ ) for SALT and LU methods in a common ground.

A6 Invariants
To interpret these results which apply more generally, mutatis mutandis the KE is conserved by the LU scheme and its variants (Resseguier et al., 2017a), while the enstrophy, its generalizations (e.g. PV), the circulation, and the helicity are conserved by the SALT scheme and its variants (Holm, 2015). In the specific case of homogeneous small-scale velocity, the LU dynamics also conserve circulation and helicity in average (see Appendix A7 for circulation mean conservation; the proof of helicity mean conservation is similar). However, in the homogeneous case, this stochastic dynamics increases enstrophy mean and its generalizations (Resseguier et al., 2017b), while SALT models increase KE mean (Resseguier, 2017).
These distinctions between conservation of vorticity in the SALT approach and conservation of energy in the LU approach would be of critical importance when choosing a model for 2D or QG turbulence, as Kraichnan (1967) and Charney (1971) show that the conservation of energy and enstrophy lead to turbulence typified by an inverse energy cascade at large scales and a direct (potential) enstrophy cascade at small scales (Fox-Kemper and Menemenlis, 2008). In SQG, Blumen (1982) shows that a dual cascade results from conservation of depth-integrated energy and available potential energy on level boundaries. The former is not singled out for special treatment in the SALT or LU SQG formulation, but the latter is exactly conserved in both formulations.

A7 Kelvin theorem under location uncertainty
The Kelvin theorem describes the variation of the circulation, which is defined as follows for LU dynamics as follows: w(x, t) · dl = C(0) w(X t (l 0 ), t) T J M dl 0 , where l = X t (l 0 ) is the Lagrangian path labelled by the initial position l 0 , C(t) = X t (C(0)) is a material loop at time www.nonlin-processes-geophys.net/27/209/2020/ Nonlin. Processes Geophys., 27, 209-234, 2020 Table A2. Some LU and SALT dynamics models with Stratonovich notations.
This equation is the equivalent of the Reynolds transport theorem for vorticity Fox-Kemper and Pedlosky (2004) but in the stochastic framework. This noise term is a priori not centred, since it is a Stratonovich noise. However, in the homogeneous case, its ensemble mean cancels. Indeed, using Itō notations, we simply need to compute a quadratic cross variation. This calculus is possible in Lagrangian coordinates (i.e. when functions are composed by x −→ X t ) by noticing that Let us assume the simulated evolution law is D t b = −ν(− ) p bdt. The deterministic subgrid model −ν(− ) p b acts, in a finite time t, as a low-pass filter. In Fourier space, this filter is If the hyperviscosity ν is well chosen, we may expect that at the Shannon resolution π/ x = κ M , only 10 % of the energy is left by the filter, i.e. F (κ M ) = 1/10.
A ratio smaller than 10 % may lead to an over-damped simulation. Moreover, the precise value of this ratio does not influence much our final estimate.
We may define the effective resolution as the scale κ = κ m , where the deterministic subgrid model influence is negligible. There, we may expect the filter to be equal to 95 %, i.e. F (κ m ) = 95/100.
Data availability. No experimental data were collected nor used in this study. All data used were numerically generated from model implementations. The complete code can be found at the GitHub repository https://github.com/vressegu/LU_SALT_SelfSim (last access: 27 March 2020, GitHub, 2020).
Author contributions. VR and WP decided to perform this SALT versus LU numerical study. BFK obtained the funding and supervised the project. VR and BFK designed the ADSD methodology and its heterogeneous variants. VR developed the code and performed the simulations. WP helped to reproduce the data-driven parameterization of Cotter et al. (2019a). All authors prepared the paper.
Competing interests. One of the author is currently employed by a private company named SCALIAN.