Multivariate localization functions for strongly coupled data assimilation in the bivariate Lorenz ’96 system

Localization is widely used in data assimilation schemes to mitigate the impact of sampling errors on ensemble-derived background error covariance matrices. Strongly coupled data assimilation allows observations in one component of a coupled model to directly impact another component through inclusion of cross-domain terms in the background error covariance matrix. When different components have disparate dominant spatial scales, localization between model domains must properly account for the multiple length scales at play. In this work we develop two new multivariate localization functions, one of which is a multivariate extension of the fifth-order piecewise rational Gaspari-Cohn localization function; the within-component localization functions are standard Gaspari-Cohn with different localization radii while the cross-localization function is newly constructed. The functions produce positive semidefinite localization matrices, which are suitable for use in both Kalman Filters and variational data assimilation schemes. We compare the performance of our two new multivariate localization functions to two other multivariate localization functions and to the univariate and weakly coupled analogs of all four functions in a simple experiment with the bivariate Lorenz '96 system. In our experiments the multivariate Gaspari-Cohn function leads to better performance than any of the other multivariate localization functions.


Introduction
An essential part of any data assimilation (DA) method is the estimation of the background error covariance matrix P b. The background error covariance statistics stored in P b provide a structure function that determines how observed quantities affect the model state variables, which is of particular importance when the state space is not fully observed (Bannister, 2008).
A poorly designed P b matrix may lead to an analysis estimate, after the assimilation of observations, that is worse than the prior state estimate (Morss and Emanuel, 2002). In ensemble DA schemes the P b matrix is estimated through an ensemble average. Using an ensemble to estimate P b allows the estimates of the background error statistics to change with the model state, which is desirable in many geophysical systems (Smith et al., 2017; Frolov et al., 2021). However, this estimate of P b will always include noise due to sampling errors because the ensemble size is finite. In practice, ensemble size is limited by computational resources and hence sampling errors can be substantial. The standard practice to mitigate the impact of these errors is localization. A number of different localization methods exist in the DA literature (e.g. Gaspari and Cohn, 1999; Houtekamer and Mitchell, 2001; Bishop and Hodyss, 2007; Anderson, 2012; Ménétrier et al., 2015). In this study we concentrate on distance-based localization, which uses physical distance as a proxy for correlation strength and sets correlations to zero when the distance between the variables in question is sufficiently large. Localization is typically incorporated into the data assimilation in one of two ways: either through the P b matrix or through the observation error covariance R (Greybush et al., 2011). We focus on Schur (or elementwise) product localization applied directly to the P b matrix. The Schur product theorem (Horn and Johnson, 2012, Theorem 7.5.3) guarantees that if the localization matrix is positive semidefinite, then the localized estimate of P b is also positive semidefinite. Positive semidefiniteness of estimates of P b is essential for the convergence of variational schemes and for the interpretability of schemes like the Kalman Filter, which are intended to minimize the statistical variance of the estimation error.
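The Schur-product construction can be sketched numerically (a minimal illustration on a hypothetical one-dimensional grid with a toy ensemble; the taper is the standard fifth-order piecewise rational Gaspari-Cohn function, Eq. 4.10 of Gaspari and Cohn, 1999):

```python
import numpy as np

def gaspari_cohn(d, R):
    """Fifth-order piecewise rational Gaspari-Cohn taper with support R = 2c."""
    c = R / 2.0
    z = np.abs(np.asarray(d, dtype=float)) / c
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    taper[inner] = -0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3 - (5.0 / 3.0) * zi**2 + 1.0
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - (2.0 / 3.0) / zo)
    return taper

# Toy 1-D grid and a small ensemble: the sample covariance is noisy, but the
# Schur (elementwise) product with a PSD localization matrix remains PSD.
rng = np.random.default_rng(0)
n, n_ens = 40, 10
ens = rng.standard_normal((n, n_ens))
Pb = np.cov(ens)
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
L = gaspari_cohn(dist, R=10.0)
Pb_loc = L * Pb                       # localized background covariance estimate
min_eig = np.linalg.eigvalsh(Pb_loc).min()
```

Up to floating-point rounding, `min_eig` is nonnegative, as the Schur product theorem guarantees.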
The localization functions presented in this work are suitable for use in coupled DA, where two or more interacting large-scale model components are assimilated in one unified framework. Coupled DA is widely recognized as a burgeoning and vital field of study. In Earth system modeling in particular, coupled DA shows improvements over single domain analyses (Sluka et al., 2016; Penny et al., 2019). However, coupled DA systems face unique challenges as they involve simultaneous analysis of multiple domains spanning different spatiotemporal scales. Cross-domain error correlations in particular are found to be spatially inhomogeneous (Smith et al., 2017; Frolov et al., 2021). Schemes that include cross-domain error correlations in the P b matrix are broadly classified as strongly coupled, as distinguished from weakly coupled schemes, where P b does not include any nonzero cross-domain error correlations. The inclusion of cross-domain correlations in P b offers advantages, particularly when one model domain is more densely observed than another (Smith et al., 2020). Strongly coupled DA requires careful treatment of cross-domain correlations, with special attention to the different correlation length scales of the different model components. Previous studies, discussed below, show that appropriate localization schemes are vital to the success of strongly coupled DA.
As in single domain DA, there is a broad suite of localization schemes proposed for use in strongly coupled DA. Lu et al. (2015) artificially deflate cross-domain correlations with a tunable parameter. Yoshida and Kalnay (2018) use an offline method, called correlation-cutoff, to determine which observations to assimilate into which model variables and the associated localization weights. The distance-based multivariate localization functions developed in Roh et al. (2015) allow for different localization functions for each component and are positive definite, but require a single localization scale across all components. Other distance-based localization schemes allow for different localization length scales for each component, but are not necessarily positive semidefinite (Frolov et al., 2016; Smith et al., 2018; Shen et al., 2018). Frolov et al. (2016) report that their proposed localization matrix is experimentally positive semidefinite for some length scales. Smith et al. (2018) use a similar method and find cases in which their localization matrix is not positive semidefinite.
In this work, we build on these methods and investigate distance-based, multivariate, positive semidefinite localization functions and their use in strongly coupled DA schemes. We introduce a new multivariate extension of the popular fifth-order piecewise rational localization function of Gaspari and Cohn (1999) (hereafter GC). This function is positive semidefinite for all length scales and hence appropriate for Ensemble-Variational (EnVar) schemes. We compare this to another newly developed multivariate localization function that extends Bolin and Wallin (2016), and to two other functions from the literature (Daley et al., 2015). We investigate the behavior of these functions in a simple bivariate model proposed by Lorenz (1996). In particular, we look at the impact of variable localization on the cross-domain localization function. We show that these functions are compatible with the variable localization schemes of Lu et al. (2015) and Yoshida and Kalnay (2018). We find that, in some setups, artificially decreasing the magnitude of the cross-domain correlation hinders the assimilation of observations, while in other setups the best performance comes when there are no cross-domain updates. We compare all of the multivariate functions to their univariate and weakly coupled analogs and observe that the new multivariate extension of GC outperforms all multivariate competitors. This paper is organized as follows. In Sect.
to P b, the localization matrix L may be decomposed into a 2 × 2 block matrix so that the localized estimate of the background covariance matrix is given by

\hat{P}^b = L \circ P^b, \qquad L = \begin{pmatrix} L_{XX} & L_{XY} \\ L_{XY}^T & L_{YY} \end{pmatrix}, \qquad (1)

where \circ denotes a Schur product. In distance-based localization, the elements of the L matrix are evaluated through a localization function L with a specified localization radius R, beyond which L is zero. For example, if P b_ij is the sample covariance Cov(η_i, η_j), where η_i = η(s_i) denotes the background error in process X at spatial location s_i ∈ R^n, then the associated localization weight is L(‖s_i − s_j‖). Often different model components will have different optimal localization radii, and hence one may wish to use a different localization function for each model component (Ying et al., 2018). That is, we may wish to use a different localization function to form each submatrix of L in (1). Since we seek a symmetric L matrix, it suffices to construct L_XX, L_YY, and L_XY. We define the cross-localization weight factor, β ≥ 0, as the value of the cross-localization function at distance d = 0, i.e. β := L_XY(0). The cross-localization weight factor β is restricted to be less than or equal to 1 in order to ensure positive semidefiniteness (Genton and Kleiber, 2015), and smaller values of β lead to smaller analysis updates when updating the X model component using observations of Y, and vice versa. Each function we consider has a different maximum allowable cross-localization weight, which we denote β_max. Values of β greater than β_max lead to functions that are not necessarily positive semidefinite, while values of β less than β_max are allowable and may be useful in attenuating undesirable correlations between blocks of variables (Lu et al., 2015). Note that while this example shows model space localization for a coupled model with two model components, the localization functions we develop and investigate may also be used in observation space localization, and can be extended to an arbitrary number of model components.
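The block structure and the role of β can be sketched numerically (a toy illustration with hypothetical triangular tapers, which are valid covariance functions in one dimension; the helper names are ours, not the paper's):

```python
import numpy as np

def tri(d, R):
    """Triangular taper (1 - d/R)_+, a valid 1-D covariance function."""
    return np.maximum(1.0 - np.abs(d) / R, 0.0)

def block_localization(Lxx, Lyy, Lxy, beta):
    """Assemble the 2x2 block localization matrix with cross blocks scaled by beta."""
    return np.block([[Lxx, beta * Lxy], [beta * Lxy.T, Lyy]])

x = np.arange(30.0)                               # collocated toy grids for X and Y
d = np.abs(x[:, None] - x[None, :])
Lxx, Lyy = tri(d, 16.0), tri(d, 4.0)              # different within-component radii
Lxy = tri(d, 10.0)                                # cross radius = average of the two

def min_eig(beta):
    # The minimum eigenvalue diagnoses positive semidefiniteness of the choice
    # of beta: beta = 0 (weakly coupled) is always safe; large beta may not be.
    return np.linalg.eigvalsh(block_localization(Lxx, Lyy, Lxy, beta)).min()
```

Scanning `min_eig(beta)` over candidate values of β is a simple empirical check of the β_max threshold for a given construction.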

Kernel convolution
Localization functions created through kernel convolution, such as GC, may be extended to multivariate functions in the following way. Define

L_{XX} = k_X * k_X, \qquad L_{YY} = k_Y * k_Y, \qquad L_{XY} = k_X * k_Y,

where * denotes convolution over R^n, and k_X, k_Y are square integrable functions. For ease of notation let the kernels k_X and k_Y be normalized such that L_XX(0) = L_YY(0) = 1, which is appropriate for localization functions. Then the set {L_XX, L_YY, L_XY} is a valid multivariate localization function. As a proof, we define two processes Z_j, where j can represent either X or Y, as the convolution of the kernel k_j with a white noise field W:

Z_j(s) = \int_{R^n} k_j(s - u) \, W(u) \, du. \qquad (2)

It is straightforward to show that the localization functions we have defined are exactly the covariance functions for these two processes, L_ij(d) = Cov(Z_i(s), Z_j(t)), with i, j = X, Y, locations s, t ∈ R^n, and distance d = ‖s − t‖. Thus {L_XX, L_YY, L_XY} is a multivariate covariance function, and hence a multivariate, positive semidefinite function (Genton and Kleiber, 2015).
For localization functions created through kernel convolution, the localization radii are related to the kernel radii. Suppose the kernels k_X and k_Y have radii c_X and c_Y, i.e. k_j(d) = 0 for all d > c_j. The convolution of two kernels is zero at distances greater than the sum of the kernel radii. Thus the implied within-component localization radii are R_jj = 2c_j for processes j = X, Y. Further, the implied cross-localization radius is the sum of the two kernel radii, R_XY = c_X + c_Y. Equivalently, the cross-localization radius is the average of the two within-component localization radii, R_XY = (R_XX + R_YY)/2, which is how we will write it going forward. Interestingly, this is exactly the cross-localization radius used in Frolov et al. (2016) and Smith et al. (2018).
Unlike within-component localization functions, cross-localization functions created through kernel convolution will always have L_XY(0) < 1 whenever k_X ≢ k_Y. The maximum allowable cross-localization weight factor (β := L_XY(0)) is exactly the value produced through the convolution, i.e. β_max = [k_X * k_Y](0). Smaller cross-localization weight factors also lead to positive semidefinite functions, since if {L_XX, L_YY, L_XY} is a multivariate, positive semidefinite function, then so is {L_XX, L_YY, βL_XY} for β < 1 (Roh et al., 2015). To aid in comparisons with other cross-localization functions, we re-define kernel-based cross-localization functions as

L_{XY}(d) = \frac{\beta}{\beta_{\max}} \, [k_X * k_Y](d),

with β ≤ β_max. In this way L_XY(0) = (β/β_max)[k_X * k_Y](0) = β, which is consistent with our definition of the cross-localization weight factor in the previous section. We will experiment with the impact of varying β, but must always ensure β ≤ β_max to maintain positive semidefiniteness.
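These kernel-convolution properties can be verified numerically in one dimension (a sketch with hypothetical tent kernels; the exact values differ from the R³ construction used in the paper, but the compact support and β_max ≤ 1 properties are the same):

```python
import numpy as np

r = np.linspace(-5.0, 5.0, 1001)         # 1-D grid, symmetric about zero
dx = r[1] - r[0]
cX, cY = 2.0, 1.0                        # kernel radii

def tent(r, c):
    return np.maximum(1.0 - np.abs(r) / c, 0.0)

def normalize(k):
    # Scale so that [k * k](0) = 1, i.e. the within-component taper is 1 at d = 0.
    return k / np.sqrt(np.sum(k**2) * dx)

kX, kY = normalize(tent(r, cX)), normalize(tent(r, cY))
Lxy = np.convolve(kX, kY, mode="same") * dx   # cross-localization function on the grid
beta_max = Lxy[len(r) // 2]                   # value at distance zero
```

By the Cauchy-Schwarz inequality `beta_max` is at most 1, and `Lxy` vanishes beyond distance c_X + c_Y, the average of the two within-component radii.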
For most kernels, closed-form analytic expressions for the convolutions above are not available. In the following two sections we present two cases where a closed form is available. The kernels used in these two cases are the tent function (GC) and the indicator function (Bolin-Wallin).

Multivariate Gaspari-Cohn
The standard univariate GC localization function is constructed through convolution over R^3 of the kernel

k(r) \propto \left(1 - \frac{r}{c}\right)_+

with itself. Here we define r = ‖r‖ with r ∈ R^3 and z_+ = max{z, 0}. The kernel has radius c and is normalized so that the resulting localization function equals one at distance zero. As discussed in the previous section, the localization radius, R, is related to the kernel radius through R = 2c. We develop a multivariate extension of this function through convolutions with two such kernels, k_X(r) ∝ (1 − r/c_X)_+ and k_Y(r) ∝ (1 − r/c_Y)_+. The resulting within-component localization functions are exactly equal to GC, Eq. (4.10) in Gaspari and Cohn (1999). The formula for the cross-localization function L^(GC)_XY is quite lengthy and is thus included in Appendix A.
Recall from the previous section that the maximum cross-localization weight factor is β_max = [k_X * k_Y](0). For multivariate GC, β_max depends only on the relative sizes of the two kernel radii; for convenience we define κ as a ratio of the within-component localization radii.

Multivariate Bolin-Wallin
We derive our second multivariate localization function through convolution of normalized indicator functions over a sphere in R^3. As in the previous section, the kernels are supported on spheres of radii c_X and c_Y,

k_j(r) \propto I_{c_j}(r), \qquad j = X, Y,

where I_{c_j}(r) is an indicator function which is 1 if r ≤ c_j and 0 otherwise. The resulting within-component localization function is

L^{(BW)}_{jj}(d) = 1 - \frac{3d}{2R_{jj}} + \frac{d^3}{2R_{jj}^3} \quad \text{for } d \le R_{jj} = 2c_j, \text{ and } 0 \text{ otherwise.}

This is commonly referred to as the spherical covariance function. The label (BW) references Bolin and Wallin, who performed the convolutions necessary to create the associated cross-localization function in a work aimed at a different application of covariance functions (Bolin and Wallin, 2016). While Bolin and Wallin never developed multivariate covariance (or, in our case, localization) functions, the algebra is the same. We present only the localization functions that result from the convolution over R^3, though similar functions for R^2 and R^n are available in Bolin and Wallin (2016). Note that there is a typo in Bolin and Wallin (2016), which has been corrected below.
Let c_X > c_Y be the kernel radii; the resulting cross-localization function is expressed in terms of spherical cap volumes. Here V_3(r, x) denotes the volume of the spherical cap with triangular height x of a sphere with radius r, which (with x measured from the sphere's center) is given by

V_3(r, x) = \frac{\pi}{3}(r - x)^2 (2r + x).

As with multivariate GC, it is convenient to define a ratio of within-component localization radii, κ. Then we can write the maximum cross-localization weight factor as β_max = κ^{−3}. The cross-localization radius for BW is R_XY = c_X + c_Y = (R_XX + R_YY)/2, as for any kernel-based construction.
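A small sketch of the within-component BW taper (the standard spherical covariance) and of the maximum cross-localization weight evaluated as an overlap integral of normalized indicator kernels (our own closed-form evaluation; the function names are hypothetical):

```python
import math
import numpy as np

def spherical(d, R):
    """Spherical covariance taper: 1 - 3d/(2R) + d^3/(2R^3) for d <= R, else 0."""
    u = np.abs(np.asarray(d, dtype=float)) / R
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def bw_beta_max(cX, cY):
    """[k_X * k_Y](0) for indicator kernels on balls of radii cX, cY in R^3,
    each normalized so the within-component taper equals 1 at distance zero."""
    vol = lambda c: 4.0 / 3.0 * math.pi * c**3
    return vol(min(cX, cY)) / math.sqrt(vol(cX) * vol(cY))
```

For equal radii `bw_beta_max` returns 1, and it decays as the radii separate, consistent with β_max ≤ 1.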

Wendland-Gneiting functions
We compare the two functions of the preceding sections to the Wendland-Gneiting family of multivariate, compactly-supported, positive semidefinite functions. This family is not generated through kernel convolution, but rather through Montée and Descente operators (Gneiting, 2002). The simplest univariate function in this family is the Askey function,

A(d) = \left(1 - \frac{d}{R}\right)_+^{\nu},

with shape parameter ν and localization radius R. Other functions in this family are called Wendland functions. Several examples of univariate Wendland functions are displayed in Table 1. Porcu et al. (2013) developed a multivariate version of the Askey function, where the exponent in Eq. (9) can be different for each process while the localization radius R is constant across all processes; Roh et al. (2015) used this multivariate Askey function for localization in an ensemble Kalman filter. The general form for multivariate Wendland functions is given in Eq. (11) in terms of a function ψ defined via the beta function, B(x, y) = Γ(x)Γ(y)/Γ(x + y). The parameters ν and {γ_ij} are related to the shape of the localization functions, and are necessary to guarantee positive semidefiniteness in a given dimension. The parameter k determines the differentiability of the Wendland functions at lag zero (Gneiting, 2002). Note that the Askey function in Eq. (10) is a special case of the Wendland function (11) which corresponds to the case k = 0. Daley et al. (2015) gave sufficient conditions on the parameters ν, k, R_ij, γ_ij, and β to guarantee that Eq. (11), and hence (10), is positive semidefinite; in particular, for two processes X and Y they give conditions under which Eq. (11) is positive semidefinite on R^3. Going forward we consider the multivariate Askey function (10) and the multivariate Wendland function with k = 1 in (11).
Note that with both of these functions the cross-localization radius depends only on the smallest localization radius. Table 2 summarizes important shape parameters for the four cross-localization functions.
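For reference, univariate members of this family can be sketched as follows (a sketch; the Askey exponent ν = 2 and the Wendland form (1 − u)_+⁴(4u + 1) are standard choices valid on R³, though the paper's multivariate parameters follow Daley et al., 2015):

```python
import numpy as np

def askey(d, R, nu=2.0):
    """Askey function (1 - d/R)_+^nu; the k = 0 member of the family."""
    return np.maximum(1.0 - np.abs(np.asarray(d, dtype=float)) / R, 0.0) ** nu

def wendland_k1(d, R):
    """A Wendland function with k = 1 (twice differentiable at lag zero)."""
    u = np.abs(np.asarray(d, dtype=float)) / R
    return np.maximum(1.0 - u, 0.0) ** 4 * (4.0 * u + 1.0)
```

Both equal 1 at distance zero and vanish at distance R; the k = 1 function is smoother at the origin, which is one of the shape differences examined in the experiments below.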

Experiments
In this section we investigate the performance of a data assimilation scheme using each of the four multivariate localization functions presented in Sect. 2. We choose a setup which isolates the impact of the cross-localization functions and relate the filter performance to important cross-localization shape parameters. As a baseline for comparison, we also test two simple approaches to localization for coupled DA. The first method uses a single localization function and radius to localize all within- and cross-component blocks of the background error covariance matrix, i.e. L_XX ≡ L_YY ≡ L_XY. We call this approach univariate localization. In systems with very different optimal localization radii this type of univariate localization is likely to perform poorly; however, it does provide a useful comparison point. The second approach uses different localization functions for each process and then zeroes out all cross-correlations between processes, i.e. L_XX ≠ L_YY and L_XY ≡ 0. We call this approach weakly coupled localization, as it leads to a weakly coupled data assimilation scheme. All of the experiments are run with the bivariate Lorenz 96 model, which is described below (Lorenz, 1996).

Bivariate Lorenz model
The bivariate Lorenz 96 model is a conceptual model of atmospheric processes comprising two coupled processes with distinct temporal and spatial scales. The "small" process can be thought of as rapidly-varying small-scale convective fluctuations, while the "large" process can be thought of as smooth large-scale waves. The model is periodic in the spatial domain, as a process on a fixed latitude band would be.
The "large" process, X, has K distinct variables, X_k for k = 1, ..., K. The "small" process, Y, is divided into K sectors, with each sector corresponding to one "large" variable X_k. There are J "small" process variables in each sector, for a total of JK "small" variables. We represent the variables on a circle where the arc length between neighboring Y variables is 1. Equivalently, the radius of the circle is r = JK/(2π). Variable Y_{j,k} is located at (r cos(θ_{j,k}), r sin(θ_{j,k})), where θ_{j,k} = (2π/(JK))(J(k − 1) + j). We choose to place the variable X_k, located at (r cos(φ_k), r sin(φ_k)), in the middle of the sector whose variables are coupled to it; e.g. if J = 10 then X_k is halfway between Y_{5,k} and Y_{6,k} and φ_k = (2π/(10K))(10(k − 1) + 5.5). The placement of these variables is illustrated in Fig. 2. The chord distance between any two variables is 2r sin(∆θ/2), where ∆θ is the angle increment; e.g. the angle increment between Y_{j1,k1} and Y_{j2,k2} is ∆θ = |θ_{j1,k1} − θ_{j2,k2}|.
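The geometry above translates directly into code (a sketch; the function names are ours):

```python
import numpy as np

K, J = 36, 10
r = J * K / (2.0 * np.pi)        # circle radius: neighboring Y's are arc length 1 apart

def theta(j, k):
    """Angle of "small" variable Y_{j,k} (1-based indices, as in the text)."""
    return 2.0 * np.pi / (J * K) * (J * (k - 1) + j)

def phi(k):
    """Angle of "large" variable X_k, placed mid-sector."""
    return 2.0 * np.pi / (J * K) * (J * (k - 1) + (J + 1) / 2.0)

def chord(angle1, angle2):
    """Chord distance between two variables on the circle."""
    return 2.0 * r * np.sin(abs(angle1 - angle2) / 2.0)
```

For J = 10, `phi(k)` reduces to the (2π/(10K))(10(k − 1) + 5.5) placement given above; note that 2r sin(∆θ/2) is automatically periodic, since sin((2π − ∆θ)/2) = sin(∆θ/2).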
The governing equations are

\frac{dX_k}{dt} = X_{k-1}(X_{k+1} - X_{k-2}) - X_k - \frac{ha}{b}\sum_{j=1}^{J} Y_{j,k} + F, \qquad (14)

\frac{dY_{j,k}}{dt} = ab\, Y_{j+1,k}(Y_{j-1,k} - Y_{j+2,k}) - a Y_{j,k} + \frac{ha}{b} X_k, \qquad (15)

with indices extended periodically. We follow Lorenz (1996) and let K = 36, J = 10, so there are 36 sectors and 10 times more "small" variables than "large" variables. We let a = 10 and b = 10, indicating that convective scales fluctuate 10 times faster than the larger scales, while their amplitude is around 1/10 as large. For the forcing we choose F = 10, which Lorenz (1996) found to be sufficient to make both scales behave chaotically. All simulations are performed using an adaptive fourth-order Runge-Kutta method with relative error tolerance 10^−3 and absolute error tolerance 10^−6 (Dormand and Prince, 1980; Shampine and Reichelt, 1997). The solutions are output each assimilation cycle. Unless otherwise specified, the assimilation cycles are separated by a time interval of 0.005 model time units (MTU), which Lorenz (1996) found to be similar to 36 minutes in more realistic settings. This time scale is 10 times shorter than the time scale typically used in the univariate Lorenz 96 model. The factor of 10 is consistent with the understanding that the "small" process evolves 10 times faster than the "large" process, where the "large" process is akin to the univariate Lorenz 96 model. In choosing the coupling strength, we follow Roh et al. (2015) and set h = 2, which is twice as strong as the coupling used by Lorenz. Varying the coupling strength h across values {1/2, 1, 4} changes the magnitude of the analysis errors, but does not change the relative performance of different localization functions in our experiments.
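The two-scale tendencies can be sketched as follows (a sketch using the standard Lorenz, 1996, two-scale form with the paper's parameter names a, b, h, F; sign conventions vary slightly across references, so treat this as illustrative rather than the paper's exact implementation):

```python
import numpy as np

K, J, F, a, b, h = 36, 10, 10.0, 10.0, 10.0, 2.0

def tendencies(X, Y):
    """Coupled tendencies: X has shape (K,), Y has shape (J*K,) with
    Y[J*k : J*(k+1)] the sector coupled to X[k]. Indices wrap periodically."""
    dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
          - X + F
          - (h * a / b) * Y.reshape(K, J).sum(axis=1))
    dY = (a * b * np.roll(Y, -1) * (np.roll(Y, 1) - np.roll(Y, -2))
          - a * Y
          + (h * a / b) * np.repeat(X, J))
    return dX, dY
```

With this form, the damping and coupling terms restore each Y variable toward (h/b) times the X variable of its sector, which is relevant to the results discussed later.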

Assimilation scheme
In our experiments we use the stochastic Ensemble Kalman Filter (EnKF) (Evensen, 1994; Houtekamer and Mitchell, 1998; Burgers et al., 1998). The EnKF update formula for a single ensemble member is

x^a = x^b + K\left(y + \eta - Hx^b\right), \qquad (16)

where x^a is the analysis vector, x^b is the background state vector, y is the observation, each element of η is a random draw from the probability distribution of observation errors, and H is the linear observation operator. The Kalman gain matrix K is

K = P^b H^T \left(H P^b H^T + R\right)^{-1},

where P^b is the background error covariance matrix and R is the observation error covariance matrix. The background covariance matrix is approximated by a sample covariance matrix from an ensemble, x_i for i = 1, ..., N_e, where N_e is the ensemble size. In this experiment we use the adaptive inflation scheme of El Gharamti (2018) and inflate each prior ensemble member through

x^b_{\lambda,i} = \bar{x}^b + \Lambda^{1/2}\left(x^b_i - \bar{x}^b\right),

where Λ is a diagonal matrix with each element on the diagonal containing the inflation factor for one variable and \bar{x}^b is the background ensemble mean. We then use x^b_λ in place of x^b in Eq. (16) and in the estimation of P^b. Note that estimating P^b with the inflated ensemble is equivalent to estimating it with the original ensemble and then multiplying by Λ^{1/2} on the left and right, Λ^{1/2} P^b Λ^{1/2}. The Bayesian approach to adaptive inflation in El Gharamti (2018) uses observations to update the inflation distribution associated with each state variable. The inflation prior has an inverse gamma distribution with shape and rate parameters determined from the mode and prior inflation variance. In this study we initialize the inflation factors with Λ = 1.1 I, where I is the identity matrix. The localization matrix L is incorporated into the estimate of the background covariance matrix through a Schur product as in Eq. (1).
We use N_e = 20 ensemble members, except where otherwise noted. The small ensemble size is chosen to accentuate the spurious correlations and hence the need for effective localization functions. We run each DA scheme for 3,000 analysis cycles, discarding the first 1,000 cycles and reporting statistics from the remaining 2,000 cycles. Each experiment is repeated 50 times with independent reference states, which serve as the "truth" in our experiments. We generate "observations" by adding independent Gaussian noise to the reference state.
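One localized stochastic EnKF analysis step can be sketched as follows (a minimal toy implementation, not the paper's code; here `L` is any localization matrix and the inflation step is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

def enkf_update(ens, y, H, R_obs, L):
    """Stochastic EnKF analysis with Schur-product localization (a sketch).
    ens: (n, Ne) background ensemble; y: (p,) observations; H: (p, n) linear
    observation operator; R_obs: (p, p) obs error covariance; L: (n, n) taper."""
    Pb = L * np.cov(ens)                               # localized sample covariance
    K = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R_obs)
    analysis = np.empty_like(ens)
    for i in range(ens.shape[1]):
        eta = rng.multivariate_normal(np.zeros(len(y)), R_obs)   # perturbed obs
        analysis[:, i] = ens[:, i] + K @ (y + eta - H @ ens[:, i])
    return analysis

# Toy usage: a fully observed state with small obs error pulls the ensemble to y.
n, Ne = 8, 20
ens = rng.standard_normal((n, Ne)) + 3.0
xa = enkf_update(ens, np.zeros(n), np.eye(n), 1e-4 * np.eye(n), np.ones((n, n)))
```

Each member is updated with its own perturbed observation, which is what makes the filter "stochastic."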

Experimental design
The experiments described in this section compare the performance of each of the four multivariate localization functions presented in Sect. 2. Performance is measured through the root-mean-square distance between the analysis mean and the true state, which we refer to as the root-mean-square error (RMSE). We present scaled analysis errors to aid in comparison between the "large" and "small" components. We test the performance of the multivariate localization functions using three different observation operators. First we observe all "small" variables and none of the "large" variables. In this setup we isolate the impact of the cross-localization function, which allows us to make conjectures about important cross-localization shape parameters. Next we flip the setup and observe all "large" variables and none of the "small" variables, and we compare and contrast our findings with those from the previous case.
Finally, we observe both processes and observe behavior reminiscent of both of the previous cases. The experimental setups are grouped by observation operator below. The source code for all experiments is publicly available (see Code Availability).
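The performance metric can be sketched as (our helper name; the scaling by the climatological standard deviation follows the description above):

```python
import numpy as np

def scaled_rmse(analysis_mean, truth, clim_std):
    """Root-mean-square distance between the analysis mean and the truth,
    scaled by the climatological standard deviation of the process."""
    err = np.asarray(analysis_mean, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(err**2)) / clim_std)
```

Computing this separately for the X and Y variables puts the "large" and "small" components on a comparable, dimensionless scale.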

Observe only the "small" process
To isolate the impact of the cross-localization functions we fully observe the "small" process and do not observe the "large" process at all. In this configuration, analysis increments of the "large" process can be fully attributed to the cross-domain assimilation of observations of the "small" process. The treatment of cross-domain background error covariances plays a crucial role in the analysis of the "large" process, so we expect that changes in the cross-localization function will lead to changes in the "large" process analyses. All observations are assimilated every 0.005 MTU. We use an observation error variance of σ_Y² = 0.005 both in the generation of synthetic observations from the reference state and in the assimilation scheme. The observation error variance is chosen to be about 5% of the climatological variance of the Y process. We also run the experiment with σ_Y² = 0.02, or about 20% of climatological variance, and find that the analyses are degraded, but the relative performance of the different localization functions is the same.
The localization parameters we use in this experiment are given in Table 3. For all functions we use the maximum allowable cross-localization weight factor, β = β_max. In estimating the optimal cross-localization weight factor we find that, since the only updates to X are through observations of Y, smaller cross-localization weight factors lead to degraded performance (Appendix B).

Table 3. Localization parameters for the experiment observing only the "small" process. Weakly coupled parameters are not shown in this table because they are exactly equal to the multivariate parameters except with β = 0.

Observe only the "large" process
Next we investigate the impact of the different localization functions when we fully observe the "large" process and do not observe the "small" process at all. The "large" process fluctuates about 10 times more slowly than the "small" process, so we use an assimilation cycle length that is 10 times longer than the one in the previous configuration. All observations are assimilated every 0.05 MTU. We use an observation error variance of σ_X² = 0.28 both in the generation of synthetic observations from the reference state and in the assimilation scheme. The observation error variance is chosen to be about 5% of the climatological variance of the X process. We also run the experiment with σ_X² = 1.1, or about 20% of climatological variance, and find that the X analyses are degraded, but the relative performance of the different localization functions is the same. The localization parameters we use in this experiment are given in Table B1. We find that the analysis errors are similar with all values of β.
For consistency with the previous experiment we use the maximum allowable cross-localization weight factor, β = β max .

Observe both processes
Finally, we observe both processes and note the impact of the different localization functions. In this configuration we observe 75% of the variables in each process, with the observation locations chosen randomly for each trial. All observations are assimilated every 0.005 MTU, in line with the analysis cycle length for the observation of the "small" process only. We use observation error variances of σ_Y² = 0.01 and σ_X² = 0.57 in the generation of observations and in the assimilation scheme. The observation error variance is chosen to be about 10% of the climatological variance of each process. We also run the experiment with σ_Y² = 0.02 and σ_X² = 1.1, or about 20% of climatological variance, and find that the performance is similar.
The localization parameters we use in this experiment are given in Table B2. We find that the analysis errors grow with increasing β. Nonetheless, to distinguish between multivariate localization, which allows for cross-domain information transfer, and weakly coupled localization, which does not, we use β = β_max for all multivariate functions.

Observe only the "small" process
Figure 3 shows the distribution of analysis errors for the configuration described in Sect. 3.3.1. With weakly coupled localization functions no information is shared in the update step between the observed Y process and the unobserved X process. This leads to no updates of the X variables and eventually to catastrophic filter divergence. In principle the system might be able to synchronize the unobserved ("large") process through dynamical couplings with the observed ("small") process, but in our setup this does not happen. Hence weakly coupled localization functions are not included in the figure. The analysis error distributions for the observed Y process are similar for all functions except multivariate Wendland. For the unobserved X process, the analysis errors are comparable across all of the univariate localization functions, which is consistent with the fact that all of the univariate localization functions have similar shapes. The multivariate cross-localization functions differ more in peak height, cross-localization radius, and smoothness near zero. For example, while the Askey cross-localization function peaks higher than GC, GC is generally smoother near zero and has a larger cross-localization radius. All of these differences in shape impact how much information propagates across model domains. Based on its height and width, we hypothesize that GC allows for sufficient cross-domain information propagation at both small and long distances, and this is why multivariate GC outperforms all other functions in this experiment.

Observe only the "large" process
When we observe only the "large" process (as described in Sect. 3.3.2), we find that all localization functions lead to very similar performance. In this case the shape of the localization function is not important; rather, the dynamics of the bivariate Lorenz model drive the behavior. In this configuration, the true background error cross-correlations are very small (less than 0.1 even at small distances). The Y variables are restored towards (h/b)X_k in their sector (Eq. 15). Thus even when the assimilation does not update the Y variables, we expect to recover the mean of the Y process. Based on climatology we find that the conditional mean of Y_{j,k} given X_k = x is E[Y_{j,k} | x] ≈ 0.0559x. The median root-mean-square difference between Y and its conditional mean is 0.294. Our results show that the median RMSE in the Y process ranges from 0.294 to 0.297. Thus, the filter does not improve upon a simple linear conditional-mean prediction, which is perhaps unsurprising given the small magnitude of the error cross-correlations. Figure 4 shows the distribution of analysis errors for univariate, weakly coupled, and multivariate GC. The distributions for the other functions are nearly identical and hence are not shown.

Figure 3 (Hoffmann, 2015). Analysis errors are calculated as RMS deviations from the "truth" and are scaled by the climatological standard deviation of the associated process. All four univariate localization functions perform similarly, while there is a greater range in performance for the multivariate versions of these functions. Multivariate Gaspari-Cohn shows improvement over its univariate counterparts. Univariate and multivariate Bolin-Wallin and Askey functions appear to perform similarly. For Wendland, the multivariate function performs significantly worse than the univariate function.

Figure 4 (Hoffmann, 2015). Analysis errors are calculated as RMS deviations from the "truth" and are scaled by the climatological standard deviation of the associated process. Left: results from the experiment where we observe only the "large" process; all functions perform similarly. Right: results from the experiment where we observe both processes; the weakly coupled localization functions appear to lead to the best performance, but are highly unstable.

Observe both processes
When we observe both processes, the precise shape of the localization function appears to have little impact, but we do see differences between univariate, weakly coupled, and multivariate localization functions. Figure 4 shows analysis error distributions for the three different versions of GC, which are broadly representative of the behavior seen in the other functions as well. This configuration is quite unstable: about 80% of the trials with weakly coupled localization functions lead to catastrophic filter divergence, while trials with univariate and multivariate localization diverge less often, but still about 20% of the time. Figure 4 shows results from only the trials (out of 50 total) that did not diverge. Weakly coupled localization appears to lead to the best performance when the filter does not diverge. There is some variation in the results across the different localization functions; in particular, multivariate Askey appears to lead to better performance than weakly coupled Askey, but this may be attributable to the issues with stability. Catastrophic filter divergence is a well-documented but poorly understood phenomenon (Gottwald and Majda, 2013; Houtekamer and Zhang, 2016, Appendix A.b). The mechanism is understood in highly idealized models (Kelly et al., 2015), but the dynamics of instability in models as simple as the bivariate Lorenz '96 model remain unclear and are outside the scope of the present investigation.
The complicated story with the weakly coupled schemes indicates that, in this configuration, filter performance is highly sensitive to the treatment of cross-domain background error covariances. The small ensemble size combined with small true forecast error cross-correlations can lead to spuriously large estimates of background error cross-covariances. Meanwhile, we have nearly complete observations of both processes, so within-component updates are likely quite good. Thus, zeroing out the cross terms, as in weakly coupled schemes, may improve state estimates. On the other hand, inclusion of some cross-domain terms appears to be important for stability.
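The sampling-error mechanism described above is easy to demonstrate. The ensemble size and state dimensions below are illustrative, not the paper's experimental configuration:

```python
import numpy as np

# Two truly independent components: every entry of the true P^b_XY
# block is exactly zero. A small ensemble still produces nonzero
# sample cross-covariances, which weakly coupled localization zeroes.
rng = np.random.default_rng(1)
n_ens, nx, ny = 20, 40, 40          # hypothetical ensemble/state sizes

X = rng.normal(size=(n_ens, nx))    # component X ensemble
Y = rng.normal(size=(n_ens, ny))    # component Y ensemble, independent of X

Z = np.hstack([X, Y])
Pb = np.cov(Z, rowvar=False)        # ensemble estimate of P^b (80 x 80)

cross = Pb[:nx, nx:]                # sample P^b_XY block; truth is zero
print(np.max(np.abs(cross)))        # spurious values of order 1/sqrt(n_ens)
```

The spurious cross-covariances scale like 1/sqrt(n_ens), which is why small ensembles make the treatment of the cross blocks so consequential.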

Conclusions
In this work we developed a multivariate extension of the oft-used GC localization function, where the within-component localization functions are standard GC with different localization radii, while the cross-localization function is newly constructed to ensure that the resulting localization matrix is positive semidefinite. A positive semidefinite localization matrix guarantees, through the Schur product theorem, that the localized estimate of the background error covariance matrix is positive semidefinite (Horn and Johnson, 2012, Theorem 7.5.3). We compared multivariate GC to three other multivariate localization functions (including one other newly presented multivariate function), and to their univariate and weakly coupled counterparts. We found that the performance of different localization functions is highly dependent on the observation operator. When we observed only the "large" process, all localization functions performed similarly. In the experiment where we observed both processes, weakly coupled localization led to the smallest analysis error. When we observed only the "small" process, multivariate GC led to better performance than any of the other localization functions we considered. We hypothesized that the shape of the GC cross-localization function allows for larger cross-domain assimilation than the other functions. There is still an outstanding question of how multivariate GC will perform in other, perhaps more realistic, systems.
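The Schur product theorem invoked here can be illustrated numerically. The taper function, matrix size, and rank below are illustrative choices, not the paper's configuration:

```python
import numpy as np

# Elementwise (Schur) product of two positive semidefinite matrices is
# positive semidefinite, so Schur-product localization preserves the
# properties P^b needs for Kalman and variational schemes.
rng = np.random.default_rng(2)
A = rng.normal(size=(30, 8))
P = A @ A.T                               # a PSD "covariance" (rank 8)

# A simple compactly supported PSD taper on a 1-D grid (illustrative):
# the triangle (1 - d/c)_+ is PSD, and so is its square.
d = np.abs(np.subtract.outer(np.arange(30.0), np.arange(30.0)))
L = np.maximum(1.0 - d / 10.0, 0.0) ** 2

P_loc = L * P                             # Schur-product localization

print(np.linalg.eigvalsh(P_loc).min())    # >= 0 up to roundoff
```

Note that the guarantee requires L itself to be positive semidefinite, which is exactly what the multivariate localization constructions in this paper are designed to ensure.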
We found that choosing an appropriate cross-localization weight factor, β, is crucial to the performance of the multivariate localization functions. This parameter determines the amount of information that is allowed to propagate between co-located variables in different model components. We found that this parameter should be as large as possible when observing only the "small" process. By contrast, the parameter should be small or even zero when both processes are well observed. This is consistent with other studies which have shown the value of deflating cross-domain updates between non-interacting processes (Lu et al., 2015; Yoshida and Kalnay, 2018).
A natural application of this work is localization in a coupled atmosphere-ocean model. The bivariate Lorenz '96 model has a linear relationship between the large and small scales. Hence the results presented here are relevant to linear coupling in climate models, e.g. the sensible heat exchange between ocean and atmosphere, which is approximately proportional to the temperature difference. Multivariate GC allows within-component covariances to be localized with GC exactly as they would be in an uncoupled setting, using the optimal localization length scale for each component (Ying et al., 2018).
The cross-localization length scale for GC is the average of the two within-component localization radii, which is the same as the cross-localization radius proposed in Frolov et al. (2016). We hypothesize that the cross-localization radius is important in determining filter performance. However, the functions considered here did not allow for a thorough investigation of the optimal cross-localization radius, which is an important area for future research.

In both cases, the notation is significantly simplified when we let c_X = κ²c_Y, where β_max = (5/2)κ⁻³ − (3/2)κ⁻⁵ and β ≤ β_max. Note that when we take c_X → c_Y, which implies κ → 1, this multivariate function converges to the standard univariate GC function, as we would expect.
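The formula for β_max and its κ → 1 limit can be checked directly. The sketch assumes κ² = c_X/c_Y, so that with GC's localization radius equal to twice the kernel radius, κ² also equals the ratio R_XX/R_YY:

```python
import numpy as np

# Maximum cross-localization weight for multivariate GC as a function
# of kappa, where kappa**2 = c_X / c_Y (assumed equal to R_XX / R_YY).
def beta_max_gc(kappa):
    return 2.5 * kappa**-3 - 1.5 * kappa**-5

# Equal kernel radii recover the univariate GC weight of 1, and the
# admissible cross weight shrinks as the two radii grow apart.
print(beta_max_gc(1.0))           # exactly 1.0
print(beta_max_gc(3.0 ** 0.5))    # ~0.385 for R_XX/R_YY = 3, consistent
                                  # with the beta_max ~ 0.38 quoted later
```

The derivative of β_max in κ is 7.5κ⁻⁶(1 − κ²) ≤ 0 for κ ≥ 1, so the maximum admissible cross weight decreases monotonically as the scales separate.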
The second case to consider is c_X ≥ 2c_Y. Again, let c_X = κ²c_Y. In this case the cross-localization function is given by a different piecewise formula [equation not recovered in this extraction], where, as in the above case, β ≤ β_max.

A2 Derivation of the multivariate Gaspari-Cohn cross-localization function L_XY^(GC)
The multivariate GC cross-localization function is created through the convolution of two kernels over r ∈ R³. Theorem 3.c.1 from Gaspari and Cohn (1999) provides a framework for evaluating the necessary convolutions: for radially symmetric functions k_j(r) = k_j^0(||r||), j = X, Y, each compactly supported on a sphere of radius c_j with c_Y ≤ c_X, it gives the convolution over R³. With this framework, we are able to compute the cross-localization function using the GC kernels. We first compute the normalization factor P_jj^0(0) using the GC kernels; plugging in k_j^0(r) = (1 − r/c_j)_+ gives the denominator in Eq. (A6). To compute the numerator in Eq. (A6), which is precisely (A4), we consider eight different cases, four for each formula presented above.
The case c_X ≥ 2c_Y and 0 ≤ |d| < c_Y is shown in detail here; the other cases are derived similarly and are not shown. The inner integral in Eq. (A4) is evaluated first. Substituting (A11) into (A4) and applying the proper normalization yields the cross-localization function, and making the substitution κ² = c_X/c_Y gives the form quoted in Appendix A1. The other cases are calculated similarly, with careful consideration of the bounds of the kernels and integrals.

[Figure B1 caption, continued: Cross-correlations (pink and light blue) are small everywhere, but still significant out to at least 20 spatial units.]
Using this univariate localization radius, we now estimate ν for univariate Askey and Wendland. To maintain positive semidefiniteness we require ν ≥ 1 for Askey and ν ≥ 2 for Wendland. We compare analysis errors for processes X and Y with ν ∈ {1, 1.5, 2, 2.5} for Askey and ν ∈ {2, 2.5, 3, 3.5} for Wendland. In general we find that smaller values of ν lead to less peaked localization functions and better performance, so we choose ν to be as small as possible, i.e. ν = 1 for Askey and ν = 2 for Wendland.
Next we estimate the optimal multivariate localization radii. We want to eliminate as much ambiguity as possible in our comparison of univariate and multivariate localization functions. [Text garbled in source; it lists candidate ratio values 3, 4, 5, 6, 7, 9 and refers to the guarantee of positive semidefiniteness.] In this setup, the best performance generally comes when the cross-localization weight is at or near its maximum allowable value, as shown in Fig. B3. Figure B3 shows visually that the GC maximum cross-localization weight is always greater than the BW value, which is easily verified analytically since κ⁻³ ≤ (5/2)κ⁻³ − (3/2)κ⁻⁵ for all κ ≥ 1 (true by the definition of κ). Similarly, we see that the cross-localization weight factor for Askey is greater than the cross-localization weight factor for Wendland across the range of parameters considered here.
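The inequality between the GC and BW maximum cross weights can be verified numerically over a range of κ (here taking κ⁻³ as the BW value, as stated above):

```python
import numpy as np

# Check that the BW maximum cross weight kappa**-3 never exceeds the
# GC value 2.5*kappa**-3 - 1.5*kappa**-5 for kappa >= 1.
kappa = np.linspace(1.0, 4.0, 301)
beta_bw = kappa**-3
beta_gc = 2.5 * kappa**-3 - 1.5 * kappa**-5

# Algebraically: beta_gc - beta_bw = 1.5 * kappa**-5 * (kappa**2 - 1),
# which is nonnegative exactly when kappa >= 1.
assert np.all(beta_gc >= beta_bw - 1e-12)
print(float((beta_gc - beta_bw).min()))   # 0 at kappa = 1
```

The gap vanishes only at κ = 1, i.e. when the two kernel radii coincide and both functions reduce to their univariate forms.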
The localization parameters for the other two observation operators are estimated following the same procedure. The localization parameters for the experiment where we observe only the "large" process are given in Table B1.

In Sect. 2 we present two new multivariate localization functions and two multivariate localization functions from the literature. In Sect. 3 we describe experiments with the bivariate Lorenz '96 model. We conclude in Sect. 4.

2 Multivariate localization functions

2.1 Multivariate localization background

Consider the background error covariance matrix P^b of a strongly coupled DA scheme with interacting model components X and Y. The P^b matrix may be written in terms of within-component background error covariances for components X and Y (P^b_XX and P^b_YY) and cross-domain covariances between X and Y (P^b_XY and P^b_YX). Strongly coupled DA is characterized by the inclusion of nonzero cross-domain covariances in P^b_XY and P^b_YX. Here P^b_XY controls the effect of system X on Y, and vice versa for P^b_YX. Since P^b is symmetric, P^b_XY is necessarily equal to the transpose of P^b_YX, i.e. P^b_XY = (P^b_YX)^T. Similarly, L_XY is a compatible cross-localization function in the sense that, taken together, {L_XX, L_YY, L_XY} is a multivariate, positive semidefinite function.

Figure 1. Four multivariate localization functions are shown in three panels. The first panel shows the function L_XX used to localize the "large" process, X. The second panel shows the function L_YY used to localize the "small" process, Y. The third panel shows the cross-localization function L_XY. In each panel, the color of the line shows the different multivariate functions: Gaspari-Cohn (green, solid), Bolin-Wallin (dark, dashed), Askey (dark, dotted), and Wendland (dark, dash-dot). In the case of univariate localization, the functions in the middle panel are used to localize all processes. The within-component localization radii are R_YY = 15 and R_XX = 45 for all functions. The cross-localization radii are R_XY = 30 for Gaspari-Cohn and Bolin-Wallin and R_XY = 15 for Askey and Wendland.
as a ratio of the within-component localization radii. As with all localization functions created through kernel convolution, the cross-localization radius is the average of the within-component localization radii, R_XY = (1/2)(R_XX + R_YY). An example multivariate GC function with R_XX = 45, R_YY = 15, and β = β_max is shown in Fig. 1. The multivariate GC localization function for three or more model components is discussed in Appendix A3.
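The within-component blocks of multivariate GC are the standard fifth-order piecewise rational Gaspari-Cohn function. A minimal sketch of that univariate function (Gaspari and Cohn, 1999, Eq. 4.10), with c the kernel radius so that the localization radius is 2c (e.g. c = 22.5 gives R_XX = 45):

```python
import numpy as np

# Standard univariate Gaspari-Cohn taper: fifth-order piecewise
# rational on [0, c] and [c, 2c], zero beyond 2c.
def gaspari_cohn(d, c):
    z = np.abs(np.asarray(d, dtype=float)) / c
    out = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    out[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                  - (5.0 / 3.0) * zi**2 + 1.0)
    out[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                  + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - (2.0 / 3.0) / zo)
    return out

print(gaspari_cohn([0.0, 22.5, 45.0], 22.5))  # 1 at zero, 0 at and beyond 2c
```

The cross block of multivariate GC is not this function: it is the separately derived cross-localization function of Appendix A, with R_XY the average of the two within-component radii.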
because it is created through kernel convolution. An example multivariate BW function with R_XX = 45, R_YY = 15, and β = β_max is shown in Fig. 1.

Figure 2. Inspired by the Wilks (2005) paper in QJRMS; the figure can be adapted for other values of K and J.
RMS errors are divided by the climatological standard deviation for each process. To standardize the comparison of the different shapes, we use the same within-component localization radii for all multivariate functions. We also investigate the performance of univariate and weakly coupled localization functions. The univariate localization functions are chosen to be equal to the within-component localization function for Y, i.e. L ≡ L_YY. The within-component weakly coupled localization functions are equal to the within-component multivariate localization functions. However, the weakly coupled cross-localization functions are identically zero. The free localization function parameters are chosen to balance performance of the univariate, multivariate, and weakly coupled forms of each localization function. We estimate these parameters through a process which we describe in Appendix B.

The multivariate localization functions, on the other hand, show great diversity of performance. The Wendland function leads to significantly worse performance with the multivariate function when compared to the univariate function. BW and Askey functions perform similarly in both the multivariate and univariate cases. Out of all of the localization functions we consider, the best performance is achieved with multivariate GC. To understand the improved performance with multivariate GC, we consider two different shape parameters. Recall from Sect. 3.3.1 that smaller cross-localization weights led to worse performance when holding all other localization parameters fixed. Extending this finding, we hypothesize that functions with a larger β_max will allow more information to propagate across model domains, thereby improving performance in this setup. With the chosen localization parameters, the multivariate Askey function has the largest cross-localization weight factor with β_max ≈ 0.46, followed by GC with β_max ≈ 0.38. A visual representation of the cross-localization weight factor is the height of the cross-localization function at zero in the third panel of Fig. 1. The shape of each cross-localization function varies not only in its height at zero, but also in its radius.

Figure 3. Violin plots show the distribution of analysis errors for the X and Y processes with different localization functions (Hoffmann, 2015).

Figure 4. Violin plots show the distribution of analysis errors for all versions of the Gaspari-Cohn localization function (Hoffmann, 2015).
Appendix A: Derivation of multivariate Gaspari-Cohn

A1 Multivariate Gaspari-Cohn cross-localization function

Let c_X, c_Y be the kernel radii associated with model components X and Y. Without loss of generality, we take c_X > c_Y. The formula depends on the relative sizes of c_X and c_Y, with two different formulas for the cases (i) c_X < 2c_Y and (ii) c_X ≥ 2c_Y.

Figure B1. True forecast error correlations for variables in the middle of each sector, X_k and Y_{5,k}. Correlations between Y variables (dark blue) decay to zero after about 5 spatial units, while correlations between X variables (dark red) are significant up to 40 spatial units away.
For this reason we choose to set the univariate localization radius equal to one of the within-component localization radii. From Fig. B1 we know that the univariate localization radius R = 15 is closer to the range of significant true forecast error correlations for process Y than for process X, so we set R_YY = R = 15. For the within-X localization radius, we consider the following values: R_XX ∈ {30, 40, 45, 50, 60, 75}. For Askey and Wendland we use R_XY = min{R_XX, R_YY}, and γ_ij = 0 for all i, j = X, Y. For all functions we use β_max as the cross-localization weight factor. The analysis errors for both processes are minimized with values of R_XX between 40 and 50 (not shown). Informed by the true forecast error correlations, we pick R_XX = 45. Now we turn to the cross-localization radius. For Gaspari-Cohn and Bolin-Wallin, R_XY is determined by R_XX and R_YY, with R_XY = (1/2)(R_XX + R_YY). For Askey and Wendland we require R_XY ≤ min{R_XX, R_YY} to maintain positive semidefiniteness. From the true forecast error correlations we see that the correlation length scale for X is larger than the cross-correlation length scale, which is in turn larger than the length scale for Y. Ideally, then, we would have R_YY < R_XY < R_XX. However, because of the requirement of positive semidefiniteness in Askey and Wendland, the closest we can come is R_YY = R_XY < R_XX. We could choose a smaller cross-localization radius, but the true forecast error correlations indicate that this would be a mistake, as there are non-negligible cross-correlations out past 15 units. Thus we choose R_XY = R_YY = min{R_XX, R_YY}.
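The cross-radius rules described above can be collected in a small helper. The function name and interface are illustrative, not from the paper:

```python
# Hypothetical helper encoding the cross-localization radius rules:
# kernel-convolution constructions (GC, BW) fix R_XY to the average of
# the within-component radii, while Askey and Wendland require
# R_XY <= min(R_XX, R_YY) for positive semidefiniteness.
def cross_radius(family, r_xx, r_yy):
    if family in ("GC", "BW"):
        return 0.5 * (r_xx + r_yy)
    if family in ("Askey", "Wendland"):
        return min(r_xx, r_yy)      # largest admissible choice
    raise ValueError(f"unknown family: {family}")

# With R_XX = 45 and R_YY = 15 as in the experiments:
print(cross_radius("GC", 45, 15), cross_radius("Askey", 45, 15))  # 30.0 15
```

This makes the structural difference explicit: for the same within-component radii, GC and BW always couple over a longer cross range (30 units here) than Askey and Wendland (15 units).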

Figure B2. Analysis errors for different univariate localization radii. Analysis errors are calculated as RMS deviations from the "truth" and are scaled by the climatological standard deviation of the associated process. Considering all functions, the best performance comes when R = 15.

Figure B3. Left: maximum cross-localization weight factor as a function of R_XX/R_YY. Right: average analysis errors are shown on the y-axis for different multivariate functions; the top (bottom) plot shows analysis errors for the X (Y) process. For all functions, as the cross-localization weight factor increases, the analysis errors decrease. Analysis errors are calculated as RMS deviations from the "truth" and are scaled by the climatological standard deviation of the associated process.
The remaining submatrix L_YX is determined through L_YX = L_XY^T. Let L_XX and L_YY be the localization functions associated with model components X and Y respectively. A fundamental difficulty in localization for strongly coupled DA is how to propose a cross-localization function L_XY to populate L_XY, and hence L.

Similar block localization matrices are used in scale-dependent localization, where X and Y are not components, but rather a decomposition of spectral wavebands from a single process. Scale-dependent localization aims to use a different localization radius for each waveband, which leads to the same question of how to construct the between-scale localization blocks. Buehner and Shlyaeva (2015) constructed L_XX and L_YY through evaluation of localization functions with different radii. They then constructed the cross-localization matrix through L_XY = (L_XX)^{1/2}(L_YY)^{T/2}, with L_YX defined analogously. This is appropriate for scale-dependent localization, where X and Y are defined on the same grid and hence L_XX and L_YY have the same dimension. It is still an open question how to extend this construction to the strongly coupled application, where different components are defined on different grids. The multivariate localization functions we construct below could also be used in scale-dependent localization.

In our comparison of multivariate localization functions, we investigate the impact of the shape parameters: the cross-localization radius and the cross-localization weight factor. The cross-localization radius, R_XY, is the smallest distance such that for all d > R_XY we have L_XY(d) = 0. For all of the functions in this study, the cross-localization radius is related to the within-component localization radii R_XX and R_YY (Genton and Kleiber, 2015). The block localization matrix L is formed through evaluation of {L_XX, L_YY, L_XY}. We call this collection of component functions a multivariate positive semidefinite function if
it always produces a positive semidefinite L matrix (Genton and Kleiber, 2015). We refer to multivariate positive semidefinite functions as multivariate localization functions when they are used to localize background error covariance matrices. In this study we compare four different multivariate localization functions, including one that extends GC.
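The block structure of L can be sketched for two components on a shared 1-D grid (an assumption for illustration; the paper's components need not share a grid). The taper and sizes are illustrative; only the weakly coupled case, with zero cross blocks, is guaranteed PSD by construction here:

```python
import numpy as np

# Block localization matrix L = [[L_XX, L_XY], [L_XY^T, L_YY]].
# With L_XY = 0 (weakly coupled), L is block diagonal and positive
# semidefinite whenever L_XX and L_YY are.
def taper(d, R):
    return np.maximum(1.0 - d / R, 0.0) ** 2   # illustrative PSD taper

n = 40
d = np.abs(np.subtract.outer(np.arange(float(n)), np.arange(float(n))))
L_xx, L_yy = taper(d, 45.0), taper(d, 15.0)

Z = np.zeros((n, n))
L = np.block([[L_xx, Z],
              [Z, L_yy]])                      # weakly coupled: L_XY = 0

print(np.linalg.eigvalsh(L).min() >= -1e-8)    # True
```

Filling the off-diagonal blocks with an arbitrary taper would not in general preserve positive semidefiniteness; supplying a compatible nonzero L_XY with β ≤ β_max is precisely what the multivariate constructions in this paper provide.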

Table 1. Selected univariate Wendland functions.

localization function outperforms common univariate localization methods when assimilating observations into the bivariate Lorenz '96 model. Daley et al. (2015) extended the work of Porcu et al. (2013) and constructed a multivariate version of general Wendland-Gneiting functions that allows for different localization radii for different processes. The multivariate Askey function from Daley et al. (2015) has the form [equation not recovered in this extraction]. In fact, we choose R_XY = min{R_XX, R_YY}, although smaller values of R_XY also produce positive semidefinite functions. Thus for given R_XX and R_YY, the cross-localization radius for Askey and Wendland functions is always smaller than the cross-localization radius for GC and BW. With the choice R_XY = min{R_XX, R_YY}, we see that β_max depends on the ratio max{R_XX, R_YY}/min{R_XX, R_YY}, as in GC and BW. Examples of multivariate Askey and Wendland functions with R_XX = 45, R_YY = 15, R_XY = 15, and β = β_max are shown in Fig. 1. Important parameters for the four multivariate localization functions presented in this section are summarized in Table 2.