Multivariate localization functions for strongly coupled data assimilation in the bivariate Lorenz 96 system

Stanley, Zofia; Grooms, Ian; Kleiber, William

doi:10.5194/npg-28-565-2021

Articles | Volume 28, issue 4

https://doi.org/10.5194/npg-28-565-2021

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 15 Oct 2021


Final revised paper (published on 15 Oct 2021)
Preprint (discussion started on 03 Mar 2021)

Interactive discussion

Status: closed

RC1:
'Comment on npg-2021-8', S.G. Penny, 27 Mar 2021

General Comments:

The development of localization schemes for coupled dynamics is an important activity that needs increased attention as operational forecast centers transition to greater reliance on coupled Earth system forecast models. The authors provide a promising advancement to address the localization of cross-domain error correlations. I believe the work should be published after the authors explore a larger parameter space for their experimental results, as described below. In exploring a larger parameter space, it may be sufficient to focus on one or two leading methods (e.g. GC and BW).

One concern is the choice of model, and how well the results can transfer to more realistic scenarios, given the near linear relationship between the slow and fast components in this system (e.g. see S. Rasp note referenced below). Do the authors have confidence that the results can translate in some way to more sophisticated systems? I would be interested to know how the results change as the coupling strength between the slow and fast components is weakened or strengthened from the baseline state used by the authors.

A second concern is the restriction to observing only the fast dynamics. I would like to see a complete investigation examining the observation of fast-only, slow-only, and the full slow-fast coupled system. I’d like to see Figure 3 repeated for a few different scenarios, including those just mentioned, but also potentially varying parameters of the EnKF, such as the frequency of observations, the density of observations, the amount of observation noise, the length of the analysis cycle, etc. Not all results need to be reported in figures, but some indication that the authors have explored more variations in the problem specification would help to build confidence in the robustness of the final reported results.

Minor issue: There are a few instances where the present tense is used when it should be past tense.

Specific Comments:

L 8:

“The functions produce non-negative definite localization matrices, which are suitable for use in variational data assimilation schemes.”

I think the term ‘positive semidefinite’ is more common, and the one originally used by Gaspari and Cohn (1999). I would suggest changing all instances of this throughout the manuscript.

L 14-16:

“The background error covariance statistics stored in B dictate how information from observations propagates through the domain during the assimilation step (Bannister, 2008)”

The term ‘propagates’ seems appropriate for 4D-Var, but perhaps not for all DA methods. More generally, the background error covariance provides a structure function that determines how observed quantities affect the model state variables, which is of particular importance when the state space is not fully observed.

L 25:

“Localization is typically incorporated into an ensemble estimate of B through a Schur (or element-wise) product.”

I would change this to say that localization is typically incorporated into the data assimilation in one of two ways: either through the B matrix using a Schur product, or through the observation error covariance R (e.g. Greybush et al., 2011). You are focusing on localization applied directly to the B matrix.

Greybush et al., 2011: Balance and Ensemble Kalman Filter Localization Techniques.

https://journals.ametsoc.org/view/journals/mwre/139/2/2010mwr3328.1.xml
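
To make the B-matrix option concrete, the following is a minimal sketch of Schur-product localization with the fifth-order Gaspari and Cohn (1999) taper. It is an illustration only and not the authors' code: the grid size, ensemble size, half-width `c`, and all variable names are assumptions chosen for the example, and the domain is a non-periodic line for simplicity.

```python
import numpy as np

def gaspari_cohn(r, c):
    """Fifth-order Gaspari-Cohn taper with half-width c (zero for r >= 2c)."""
    z = np.abs(np.asarray(r, dtype=float)) / c
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    taper[inner] = -0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3 - (5.0 / 3.0) * zi**2 + 1.0
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return taper

# Hypothetical setup: 40 grid points, 10 ensemble members of pure noise.
rng = np.random.default_rng(0)
n, n_ens = 40, 10
ensemble = rng.standard_normal((n_ens, n))
B_ens = np.cov(ensemble, rowvar=False)          # noisy sample covariance (n x n)

grid = np.arange(n)
dist = np.abs(grid[:, None] - grid[None, :])    # pairwise distances on a line
L = gaspari_cohn(dist, c=4.0)                   # localization matrix
B_loc = L * B_ens                               # Schur (element-wise) product
```

Because the taper matrix and the ensemble covariance are both positive semidefinite here, their element-wise product is too (Schur product theorem), which is the property the quoted sentence relies on.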

L 32-33:

“In Earth system modeling in particular, coupled DA shows improvements over single domain analyses (Penny et al., 2017; Zhang et al., 2020)”

Additional sources that demonstrate this point clearly are Sluka et al. (2016) and Penny et al. (2019):

Sluka, T., S.G. Penny, E. Kalnay, and T. Miyoshi, 2016: Using Strongly Coupled Ensemble Data Assimilation to Assimilate Atmospheric Observations into the Ocean. Geophys. Res. Lett., 43, doi:10.1002/2015GL067238.

Penny, S.G., E. Bach, K. Bhargava, C-C. Chang, C. Da, L. Sun, T. Yoshida, 2019: Strongly coupled data assimilation in multiscale media: experiments using a quasi-geostrophic coupled model. Journal of Advances in Modeling Earth Systems, 11. https://doi.org/10.1029/2019MS001652

L 35-37:

“Schemes that include cross-domain [error] correlations in the B matrix are broadly classified as strongly coupled, which is distinguished from weakly coupled schemes where B does not include any nonzero cross-domain [error] correlations. The inclusion of cross-domain [error] correlations in B offers advantages”

To be more precise, the term “cross-domain error correlations” should be used when referring to the error covariance matrix B.

L 55:

I’ll note that Lorenz himself cited this as (Lorenz, 1996). See comment below regarding line 461.

L 57-58:

“We find that, in our set up, artificially decreasing the magnitude of the cross-domain correlation hinders the assimilation of observations.”

This is a positive sign for the advancement of strongly coupled DA, but I wonder if this could be partly due to the use of the Lorenz system II, which has some highly linear relationships between the small and large scale systems. Some discussion was given, for example, in this blog post by Stephan Rasp:

https://raspstephan.github.io/blog/lorenz-96-is-too-easy/

L 61:

“localization function[s] from the literature.”

L 77-78:

“A fundamental difficulty in localization for strongly coupled DA is how to propose a cross-localization function LXY to populate both LXY and LYX”

It might be useful to explain at this point which term controls the effect of system X on Y, and Y on X.

L 102:

“we define two processes Zj , j = X,Y”

I understood this on the third read-through. Perhaps the authors could reword this sentence slightly to make it clearer. For example,

“we define two processes Zj, where j can represent either X or Y”

Or simply,

“we define two processes Zj, with j=X,Y”

L 105-106:

“ Thus LXX, LYY ,LXY form a multivariate covariance function, and hence a multivariate, non-negative definite function”

Based on the terminology defined so far, I’m not sure how to interpret the triple (LXX,LYY,LXY) forming a single function. Perhaps a line or two could be added to explain this step.

L 118:

The way I am interpreting the notation is that the term (1-r/c)_+ is zero when the term in parentheses is less than or equal to 0, which would occur when r>=c. Can the authors explain the comment in line 120 about the convolution being zero at distances greater than 2c? It is not immediately obvious.
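
One possible reading of the 2c statement, offered as a sketch rather than as the authors' argument: if the kernel f vanishes beyond distance c, then the self-convolution of f vanishes beyond 2c by the triangle inequality. The symbols f, x, and y below are assumptions introduced for this illustration.

```latex
% Assumption: f is a radial kernel with f(\mathbf{y}) = 0 whenever \|\mathbf{y}\| \ge c,
% for example f(\mathbf{y}) = (1 - \|\mathbf{y}\|/c)_+ .
\begin{align}
  (f * f)(\mathbf{x}) &= \int f(\mathbf{y})\, f(\mathbf{x} - \mathbf{y})\, \mathrm{d}\mathbf{y}, \\
  % The integrand is nonzero only where both factors are nonzero:
  f(\mathbf{y})\, f(\mathbf{x} - \mathbf{y}) \neq 0
    &\;\Longrightarrow\; \|\mathbf{y}\| < c \ \text{and}\ \|\mathbf{x} - \mathbf{y}\| < c \\
    &\;\Longrightarrow\; \|\mathbf{x}\| \le \|\mathbf{y}\| + \|\mathbf{x} - \mathbf{y}\| < 2c .
\end{align}
% Hence (f * f)(\mathbf{x}) = 0 whenever \|\mathbf{x}\| \ge 2c.
```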

L 139:

“who perform[ed] the”

L 140:

“in never develop[ed] multivariate”

L 156:

“Porcu et al. (2013) develop[ed] a multivariate version”

“et al.” is short for the Latin term “et alia,” meaning “and others.” It is strange to reference the actions of Porcu “and others” in the year 2013 using present tense.

L 157:

“Roh et al. (2015) [found] that”

L 159:

“Daley et al. (2015) extend[ed] the work”

L 166:

“with B the beta function”

Could you define this here for clarity?

L 168:

“Daley et al. (2015) [gave]”

L 184:

“This approach leads to a “weakly” coupled scheme, which is not the focus of this work.”

I understand this may not be the focus, but it seems that it would be appropriate to compare to this approach given that the weakly coupled DA scheme is the standard approach for current operational forecast systems.

L 184-186:

“Additionally, in our setup we observe only one of the two processes and we find that when the assimilation is not allowed to update the unobserved process the result is prone to catastrophic divergence”

It might be appropriate to perform a few experiments where both components are observed, and results are compared using weakly and strongly coupled DA.

L 200/202/205:

“ Lorenz (199[6])”

See comment below for line 461.

L 203:

“using an adaptive fourth-order Runge-Kutta method”

Perhaps provide a citation for the method.

L 204-205:

“The solutions are output with a time interval of 0.005 nondimensional units, or 36 minutes”

It seems strange to say there are non-dimensional units and then indicate that the interval is the same as 36 minutes. Perhaps repeat some of the justification from Lorenz relating the model's error growth rates to those of more realistic applications, under which this interval would be approximately equivalent to 36 minutes in operational prediction in the early 1990s.
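
For reference, the 36-minute figure is consistent with the scaling in Lorenz (1996), where one nondimensional time unit is associated with roughly 5 days on the basis of error-doubling times. A minimal sketch of the arithmetic, assuming that scaling (the constant and variable names are chosen for this illustration):

```python
# Assumption: 1 nondimensional time unit ~ 5 days, following Lorenz (1996).
DAYS_PER_TIME_UNIT = 5.0
MINUTES_PER_DAY = 24 * 60

dt_nondim = 0.005                                   # output interval quoted from the manuscript
dt_minutes = dt_nondim * DAYS_PER_TIME_UNIT * MINUTES_PER_DAY
print(f"{dt_nondim} time units is about {dt_minutes:.0f} minutes")
```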

Figure 2 caption:

“setup” is a noun that means “the way in which something… is organized, planned, or arranged.” This should probably be used in most places where the authors currently use two words: “set up”.

L 209:

“Increasing the coupling strength leads to larger covariances between the forecast errors in processes X and Y , thereby making the effect of cross-localization more pronounced and easier to study.”

I believe this is the case. However, I would like to see some sensitivity study of how the benefit of the strongly coupled DA paradigm breaks down as the coupling strength between the two components weakens and asymptotes to 0.
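
For readers who want to experiment with the coupling strength, here is a minimal sketch of the bivariate Lorenz 96 tendencies in their commonly used form, with the coupling coefficient written as h·c/b. It is not the authors' code. The function name and the defaults F = 10, c = 10, b = 10 and the sizes K = 36, J = 10 are assumptions (the values commonly attributed to Lorenz); h = 2 is the coupling strength the authors report for their figure.

```python
import numpy as np

def bivariate_lorenz96_tendency(x, y, forcing=10.0, h=2.0, c=10.0, b=10.0):
    """Time derivatives of the slow (x, length K) and fast (y, length K*J) variables."""
    n_slow = x.size
    n_fast_per_slow = y.size // n_slow
    coupling = h * c / b
    # Sum of the fast variables belonging to each slow variable's sector.
    y_sector_sum = y.reshape(n_slow, n_fast_per_slow).sum(axis=1)
    dx = (-np.roll(x, 1) * (np.roll(x, 2) - np.roll(x, -1))
          - x + forcing - coupling * y_sector_sum)
    # The fast variables form a single periodic ring of length K*J.
    dy = (-c * b * np.roll(y, -1) * (np.roll(y, -2) - np.roll(y, 1))
          - c * y + coupling * np.repeat(x, n_fast_per_slow))
    return dx, dy

K, J = 36, 10                       # assumed numbers of slow variables and fast variables per sector
x = np.ones(K)
y = np.zeros(K * J)
dx, dy = bivariate_lorenz96_tendency(x, y)
```

Setting `h=0.0` decouples the two processes entirely, which is the limit the sensitivity study above would approach.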

L 213:

“We choose to place the variable Xk in the middle”

Does the placement of the X variable have any influence on the results of localization? Is there any sensitivity here, or are the results generally the same regardless of how the placement of the X and Y variables is interpreted?

L 218:

“We develop localization functions for EnVar schemes where non-negative definiteness of the localization matrix is essential to ensure convergence of the numerical optimization. Since the minimizer of the 3D-EnVar objective function is the same as the EnKF analysis mean in the case of linear observation (Lorenc, 1986), in this experiment we make use of the EnKF rather than implement an ensemble of 3D-EnVar assimilation scheme (Evensen, 1994; Houtekamer and Mitchell, 1998; Burgers et al., 1998)”

This discussion is a bit confusing. I think it could be cleaned up with a little reorganization, e.g.

We develop localization functions for data assimilation schemes that rely on Schur product modification of the background error covariance matrix B. In our experiments we use the stochastic EnKF (Evensen, 1994; Houtekamer and Mitchell, 1998; Burgers et al., 1998). However, because the minimizer of the 3D-EnVar objective function is the same as the EnKF analysis mean in the case of linear observations (Lorenc, 1986), our results translate to EnVar schemes as well. The positive semi-definiteness of the localization matrix is essential to ensure convergence of the numerical optimization methods used to implement EnVar <cite>.
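
The analysis step described in this suggested wording can be sketched as follows. This is a minimal perturbed-observation (stochastic) EnKF update with a Schur-product localized covariance, written under stated assumptions: the function name, argument names, and the toy dimensions are hypothetical, inflation is omitted, and it is not the authors' implementation.

```python
import numpy as np

def stochastic_enkf_update(ens_f, y_obs, H, R, L, rng):
    """One stochastic EnKF analysis step.

    ens_f : (n_ens, n) forecast ensemble
    y_obs : (m,) observations
    H     : (m, n) linear observation operator
    R     : (m, m) observation error covariance
    L     : (n, n) localization matrix, applied by Schur product
    """
    n_ens = ens_f.shape[0]
    B_loc = L * np.cov(ens_f, rowvar=False)            # localized background covariance
    S = H @ B_loc @ H.T + R                            # innovation covariance
    K = np.linalg.solve(S, H @ B_loc).T                # Kalman gain B H^T S^{-1}
    # Each member assimilates its own perturbed copy of the observations.
    perturbed = y_obs + rng.multivariate_normal(np.zeros(y_obs.size), R, size=n_ens)
    innovations = perturbed - ens_f @ H.T
    return ens_f + innovations @ K.T

# Toy example: 6 state variables, only the last 3 observed.
rng = np.random.default_rng(1)
ens_f = rng.standard_normal((20, 6))
H = np.eye(6)[3:]
R = 1e-8 * np.eye(3)
y_obs = np.array([1.0, -2.0, 0.5])
ens_a = stochastic_enkf_update(ens_f, y_obs, H, R, np.eye(6), rng)
```

Passing an identity matrix for `L`, as in the toy example, removes all cross-variable covariances, so the unobserved variables are left untouched. That is the limiting case in which the assimilation cannot update the unobserved process.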

L 230-231:

“In this experiment we use the adaptive inflation scheme of El Gharamti (2018) and apply the inflation to the prior estimate.”

Can this be added to the EnKF equations above for clarity?

L 233-234:

“We run each DA scheme for 3,000 time steps, discarding the first 1,000 time steps and reporting statistics from the remaining 2,000 time steps.”

Is this referring to model time steps, or the number of analysis cycles?

L 235-237:

“The observation operator H is such that all of the Y variables are observed, and none of the X variables are observed. In this way we can isolate the effect of the localization on the performance of the filter for the X variable.”

This means you are observing the fast dynamics and using this to update the slow dynamics through the error covariance statistics. This has been shown to be effective in a number of studies exploring strongly coupled DA. Penny et al. (2019) showed that the reverse was also possible, particularly if the size of the analysis window is decreased (or the frequency of observation updates is increased).

L 256:

“, so that we hypothesize that GC allows”

Change to:

“, so we hypothesize that GC allows”

L 266-267:

“ By contrast, the BW and Askey functions show virtually no difference between the multivariate and univariate versions”

The BW method looks slightly improved, and the Askey method slightly degraded.

L 280:

It is up to the authors, but generally the conclusions read more clearly if they are written in past tense. E.g.

“In this work, we develop[ed]…”

“We compare[d] multivariate GC to three…”

“We [found] that, in a toy model…”

“In this work we investigate[d]…”

“We [found] that this…”

L 284:

“the localized estimate of the background [error] covariance matrix”

L 294:

“A natural application of this work is localization in a coupled atmosphere-ocean model. Multivariate GC allows for within component covariances to be localized with GC exactly as they would be in an uncoupled setting, using the optimal localization length scale for each component Ying et al. (2018). In this work we discuss the importance of the cross-localization radius in determining performance. However, this work does not address the question of optimal cross-localization radius selection, which is an important area for future research”

This is certainly of interest. Are there any conjectures that can be made about the applicability of the results here extending to an application like a coupled atmosphere-ocean model?

While the interpretation of localization is clear in the within-component covariances, how would you interpret localization on the cross-component covariances? Could a situation in which it might be desirable for an atmospheric observation to have an influence on the ocean state but not vice versa create difficulties with the symmetry relied on above in forming LXX, LYY, and LXY as a triple, and the need for maintaining positive semi-definiteness?

L 461:

The full citation for Lorenz-96 is not given. It should be Lorenz (1996):

Lorenz, E.N., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, Vol. 1, Reading, Berkshire, United Kingdom, ECMWF, 1–18.

Note that Lorenz cited it himself this way in:

Lorenz, E.N., 2005. Designing chaotic models. J. of the Atmos. Sci. 62, 1574–1587. DOI:10.1175/JAS3430.1.

Citation: https://doi.org/10.5194/npg-2021-8-RC1
- AC1: 'Reply on RC1', Zofia Stanley, 10 Jun 2021
  
  Thank you for your review and careful reading of our manuscript. Our responses are attached.
  
  Citation: https://doi.org/10.5194/npg-2021-8-AC1
RC2:
'Comment on npg-2021-8', Anonymous Referee #2, 30 Mar 2021
The paper by Stanley et al. develops multi-scale extensions to the traditional families of parameterized localization functions, such as the Gaspari-Cohn fifth-order polynomial. A key contribution of the paper is to note that each of these polynomial expressions can be represented as a product of square-root kernels that can be cross-multiplied to achieve a positive-definite cross-scale localization. The authors correctly draw out the relevance of these new techniques to the problem of cross-scale localization encountered in coupled data assimilation. They choose the right type of test problem: one that is useful enough to test the mathematics of the developed extensions, yet not so complex as to obscure the interpretation of the results. I agree with the authors that there is no need to over-interpret the results of this simple experiment with regard to its relevance to more complex ocean-atmosphere problems.

I found this article very relevant, well-written, with adequate experimental plan, and appropriate interpretation of the experimental results. I congratulate the authors on a nice contribution to the literature and suggest this paper for publication after minor revision.

Summary of suggested changes:

This contribution parallels the work of Mark Buehner on multi-scale localization. I suggest that the authors draw this parallel by referencing some of his work, such as [https://doi.org/10.3402/tellusa.v67.28027]. By including these references, the authors can then discuss how their work can also be related to the problem of multi-scale localization (e.g. assimilation in convective-resolving models).

I made some minor suggestions on how the authors might consider changing or extending other references in the introduction section (see pdf attached).

I tried to read the paper from the perspective of someone who might want to implement some of the localization formulas discussed by the authors. I made some suggestions on clarifications. I highly appreciate that the authors published the source code for their work. I suggest that the authors mention that in the main body of the paper.
Citation: https://doi.org/10.5194/npg-2021-8-RC2
- AC2: 'Reply on RC2', Zofia Stanley, 10 Jun 2021
  
  Thank you for your review and careful reading of our manuscript. Our responses are attached.
  
  Citation: https://doi.org/10.5194/npg-2021-8-AC2
RC3:
'Comment on npg-2021-8', Anonymous Referee #3, 19 Apr 2021
General Comments:

This manuscript presents a new multivariate extension of the standard Gaspari-Cohn localization function which is compared with 3 other multivariate functions, plus their univariate versions. These techniques are extremely relevant to the problem of cross-domain localization in strongly coupled data assimilation and this work is an encouraging step towards developing appropriate methods for such systems. The localization techniques are illustrated using the bivariate Lorenz 96 system.

Whilst it would be nice to have an example illustrating how the new multivariate GC method translates to a more realistic system I appreciate that it is always important to test new ideas like this in a relatively simple system, where results can be more easily interpreted. I hope that the authors have the opportunity to extend this work to a more complex coupled model system in the future.

It is always good to see new work addressing issues related to the application of coupled DA. The article is timely, highly relevant and clearly written; it will make a nice contribution to the coupled DA literature. I suggest it is published after minor revisions.

Specific comments:

It is a shame that results were only shown for the case where the fast (Y) component is fully observed, and further that the performance of each method was only measured and reported in terms of the RMSE of the X (slow, unobserved) component. I would like to see some results from experiments where only the X (slow) component is observed, and also where both the X and Y components are observed, both fully and partially. I appreciate that this would potentially increase the number of figures/length of the manuscript, but it may not be necessary to explicitly show all the results. A brief discussion of the results in order to confirm that the general conclusions still hold under different observing scenarios would give the reader greater confidence in the performance of the new GC method.

I am not entirely clear on how the univariate localization functions were implemented. Lines 181-182 state:

"We compare the four multivariate localization functions in Sect. 2 to a simple approach to localization in coupled DA, which is to use the same localization function for all model components. We call this approach univariate localization."
I think this means that each block of the localization matrix L uses the same localization function and radius, rather than a different radius for the X and Y blocks and a different function and radius for the cross X,Y block. Is this correct? I think what is confusing is that you are calling it univariate localization, but you are actually localizing the cross XY blocks of the matrix B. Perhaps this needs to be stated more explicitly somewhere. In systems with very different error correlation scales, this type of univariate localization function could not really be expected to perform well.
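
The interpretation described above can be sketched in code. The example builds one localization matrix from a single function and a single radius, then reads off the XX, XY, and YY blocks. Everything here is an assumption made for illustration: the Askey-type taper with exponent 3, the radius, the placement of variables on a unit circle with chordal distance, and the sizes K = 36 and J = 10.

```python
import numpy as np

def askey(r, c, nu=3.0):
    """Askey-type taper (1 - r/c)_+^nu; compactly supported on r < c."""
    return np.clip(1.0 - np.asarray(r, dtype=float) / c, 0.0, None) ** nu

def chordal_distance(theta_a, theta_b):
    """Straight-line distance between points at the given angles on the unit circle."""
    return 2.0 * np.abs(np.sin(0.5 * (theta_a[:, None] - theta_b[None, :])))

K, J = 36, 10                                            # assumed sizes of the slow and fast grids
fast_index = np.arange(K * J)
theta_y = 2.0 * np.pi * fast_index / (K * J)
# Each X_k sits in the middle of its sector of J fast variables.
theta_x = 2.0 * np.pi * (np.arange(K) * J + 0.5 * (J - 1)) / (K * J)

theta = np.concatenate([theta_x, theta_y])               # state ordering: all X, then all Y
dist = chordal_distance(theta, theta)
L_uni = askey(dist, c=0.5)                               # same function and radius everywhere

L_xx, L_xy, L_yy = L_uni[:K, :K], L_uni[:K, K:], L_uni[K:, K:]
min_eig = np.linalg.eigvalsh(L_uni).min()                # non-negative up to round-off
```

Because a single positive definite function is evaluated on all pairwise distances, the full block matrix is positive semidefinite automatically. Using different radii per block does not come with that guarantee, which is the gap the multivariate functions are designed to fill.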

Minor comments:

The references are a bit strange – there are multiple web links for a lot of the papers; the https://doi.org/xxx link will be sufficient in most cases.

Further minor comments and technical corrections are marked in the attached pdf.
Citation: https://doi.org/10.5194/npg-2021-8-RC3
- AC3: 'Reply on RC3', Zofia Stanley, 10 Jun 2021
  
  Thank you for your review and careful reading of our manuscript. Our responses are attached.
  
  Citation: https://doi.org/10.5194/npg-2021-8-AC3

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Zofia Stanley on behalf of the Authors (12 Jun 2021)

ED: Referee Nomination & Report Request started (15 Jun 2021) by Olivier Talagrand

RR by Anonymous Referee #2 (17 Jun 2021)

RR by S.G. Penny (05 Jul 2021)

Suggestions for revision or reasons for rejection

I appreciate that the authors have expanded their experiments to include a more thorough assessment of the coupled data assimilation problem, including varying observation errors, observation coverage, and coupling strength. The updated results appear to provide a stronger case for the use of the multivariate Gaspari-Cohn localization.

“The note of S. Rasp is in reference to subgrid-scale parameterization with this model, so it is not directly relevant. ”

Certainly this was his focus application, but that does not negate his point of the highly linear relationships between the slow/fast variables in this model. I disagree that this dynamic is not directly relevant.

“However, important couplings between atmosphere and ocean can be linear, e.g. their exchange of sensible heat, which is approximately linearly proportional to the temperature difference.”

Perhaps this dynamic in the Lorenz model, and the potential application mentioned here, can be highlighted in the manuscript. This would give the reader a better understanding of the potential applicability of the results.

“the cross-assimilation decreases”

I’m not sure what this means.

“The coupling strength is h = 2 in the figure in the paper. The biggest change we saw is that the magnitude of the analysis errors in the unobserved X process increased with decreasing h. This is not surprising”

But the errors also decreased in the observed variables as you reduced the coupling strength. Perhaps some attention should be given to the complete coupled state estimate rather than only the unobserved variables. It does appear that the overall RMSE of the coupled XY state may reduce with stronger coupling, but it might still be worth calculating and reporting.

“Multivariate Gaspari-Cohn still led to better performance than any of the other functions”

Yes, it seems the case for the multivariate Gaspari-Cohn has been strengthened by the further experiments.

“We have included experiments observing only the “long” X process and the full coupled system.”

Lorenz uses the terminology “large” and “small” scales. He also designed the system to represent different timescales, spanning slower growing instabilities associated with planetary and synoptic scales, and the fast evolving mesoscale motions and convective clouds at smaller scales. I think it would be preferable to use one of these two terminologies rather than adding ‘short’ and ‘long’ as new descriptors for such an old and frequently studied model.

Further, I would suggest reviewing a worthwhile analysis performed by Ginelli and collaborators that might provide additional insights:

Carlu, Ginelli, Lucarini, Politi, 2019: Lyapunov analysis of multiscale dynamics: The slow bundle of the two-scale Lorenz ’96 model. https://arxiv.org/pdf/1809.05065.pdf

“When we observed only the long process, all localization functions led to very similar performance (Fig. 4).”

The errors might need to be scaled with some reference here. Using the absolute errors is less informative when working with different scales. Perhaps, for example, you could rescale the errors as a percentage of climatological variability.
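
One simple way to carry out such a rescaling is sketched below. The function name, the choice of the climatological standard deviation as the reference, and the toy numbers are all assumptions for illustration, not something specified in the review or the manuscript.

```python
import numpy as np

def rmse_percent_of_climatology(analysis, truth, climatology):
    """RMSE of the analysis, expressed as a percentage of the climatological standard deviation."""
    rmse = np.sqrt(np.mean((np.asarray(analysis) - np.asarray(truth)) ** 2))
    clim_std = np.std(climatology)          # spread of a long free model run
    return 100.0 * rmse / clim_std

# Toy numbers: climatological standard deviation of 1, constant analysis error of 0.5.
climatology = np.array([-1.0, 1.0])
truth = np.zeros(4)
analysis = truth + 0.5
score = rmse_percent_of_climatology(analysis, truth, climatology)
```

Computing the score separately for the X and Y processes, each against its own climatology, would put errors from the two scales on a comparable footing.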

“Observing both processes, at least in our configuration, was quite unstable and often led to filter divergence.”

This is concerning, and could point to a problem in the DA approach (perhaps because of the use of the stochastic EnKF? Could the presence of multiple scales make the system more sensitive to the magnitude of random noise applied to different components?). It would be useful to understand why this is the case. Could there be some relationship to the imbalance of the effects on the observed versus unobserved variables as mentioned above?

“ filter performance is highly sensitive to the treatment of cross-domain background error covariances.”

Yes we have seen similar results in more complex models.

“Thus, zeroing out the cross terms, as in weakly coupled schemes, may improve state estimates. On the other hand, inclusion of some cross-domain terms appears to be important for stability.”

It would be interesting to develop a strategy to approach this more rigorously.

“The BW method looks slightly improved, and the Askey method slightly degraded.
This is true, however the difference is not statistically significant. ”

The dichotomy (significant vs. non-significant) is problematic since it is based on a somewhat arbitrary threshold for the p-value and can set misguided incentives in the evaluation and interpretation of a study. I suggest a review of the ASA’s guidance on the use of the term, and taking care in discarding a result that may have some value.

For example, from The American Statistician special issue on the Statistical Inference in the 21st Century:

• Don’t base your conclusions solely on whether an association or effect was found to be “statistically significant”
• Don’t believe that an association or effect exists just because it was statistically significant.
• Don’t believe that an association or effect is absent just because it was not statistically significant.
• Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.
• Don’t conclude anything about scientific or practical importance based on statistical significance (or lack thereof).

Wasserstein et al., 2019: “Moving to a World Beyond “p < 0.05””
https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913

“Could you clarify your interpretation of within-component localization?… They are both tapering a covariance between variables based on distance”

Localization is needed across different variables, and when two different components of a system have different length scales, then it is not clear that a simple Euclidean distance metric is sufficient. Applying the localization in a more abstract space, or with an appropriately tailored distance metric, may be more appropriate.

“A setup where the atmosphere influences the ocean state, but not vice versa would necessarily be associated with a background error covariance matrix which is not symmetric (which is not possible).”

This might then indicate that this is not the best approach for the coupled data assimilation problem, e.g. if such more strongly one-way interactions are necessary for constraining coupled systems. Could this have any relevance to the failures of the fully observed cases?


ED: Publish subject to minor revisions (review by editor) (12 Jul 2021) by Olivier Talagrand

AR by Zofia Stanley on behalf of the Authors (22 Jul 2021)

ED: Publish subject to technical corrections (26 Jul 2021) by Olivier Talagrand

AR by Zofia Stanley on behalf of the Authors (03 Aug 2021)

Short summary

In weather forecasting, observations are incorporated into a model of the atmosphere through a process called data assimilation. Sometimes observations in one location may impact the weather forecast in another faraway location in undesirable ways. The impact of distant observations on the forecast is mitigated through a process called localization. We propose a new method for localization when a model has multiple length scales, as in a model spanning both the ocean and the atmosphere.