the Creative Commons Attribution 4.0 License.
A Comparison of Two Nonlinear Data Assimilation Methods
Abstract. Advanced numerical data assimilation (DA) methods, such as the four-dimensional variational (4DVAR) method, are elaborate and computationally expensive. Simpler methods exist that take time variability into account, providing the potential of accurate results with a reduced computational cost. Recently, two of these DA methods were proposed for a nonlinear ocean model. The first method is Diffusive Back and Forth Nudging (DBFN), which has previously been implemented in several complex models, most notably an ocean model. The second is the Concave-Convex Nonlinearity (CCN) method provided by Larios and Pei that has a straightforward implementation and promising results. DBFN is less costly than a traditional variational DA system, but it requires integrating the model forward and backward in time over a number of iterations, whereas CCN only requires integration of the forward model once. This paper will investigate if Larios and Pei's CCN algorithm can provide competitive results with the already tested DBFN within simple chaotic models. Results show that observation density and/or frequency, as well as the length of the assimilation window, significantly impact the results for CCN, whereas DBFN is fairly adaptive to sparser observations, predominately in time.
Status: final response (author comments only)

RC1: 'Comment on npg-2024-3', Anonymous Referee #1, 12 Mar 2024
General Comments:
The authors provide a set of methods that seek to provide some time dependence to their fit to observations, like 4DVar, but with a potentially lower computational burden.
The introduction is a bit imprecise, and would benefit from an extended literature review on the concepts and categorizations of DA and filtering/smoothing methods. Further, the works from Wang, Pu, Kalnay, etc. (citations below) should all be discussed in the context of alternative methods since they were proposed over 20 years ago and seem to have relevance to the DBFN method and have been tested in a more operational context. There is a lot of historical work from the mathematical field “synchronization of chaos” using Lorenz-63 in particular, but also more recently with Lorenz-96 and the shallow water equations. A starting point would be to look at the works of Pecora and Carroll, as well as Abarbanel and co-authors. I suggest the authors review these works and put some of their discussion in context of these efforts, which largely used nudging methods to achieve synchronization and are highly related to DA methods.
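The nudging-based synchronization the reviewer refers to can be illustrated with a minimal sketch (the gain, step size, and initial states below are illustrative assumptions, not values from the manuscript): nudging only the observed x-component of Lorenz-63 toward a reference trajectory is enough to synchronize the full three-dimensional state.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def step(x, dt, forcing=None):
    """One forward-Euler step, optionally adding a nudging forcing term."""
    dx = lorenz63(x)
    if forcing is not None:
        dx = dx + forcing
    return x + dt * dx

dt, k = 0.005, 20.0                    # step size and scalar nudging gain (assumed)
truth = np.array([1.0, 1.0, 1.0])      # reference ("nature") trajectory
est = np.array([8.0, -3.0, 25.0])      # badly initialised estimate

for _ in range(4000):
    # Nudge only the observed x-component of the estimate toward the truth.
    nudge = np.array([k * (truth[0] - est[0]), 0.0, 0.0])
    truth = step(truth, dt)
    est = step(est, dt, forcing=nudge)

# After 20 model time units the estimate has synchronised with the truth.
print(np.abs(truth - est).max())
```

This is the classic Pecora-Carroll setting with x-driving, where the conditional Lyapunov exponents of the driven subsystem are negative, so the unobserved y and z components converge as well.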
It would be very useful to apply a conventional DA method as a baseline for comparison, e.g. either 3DVar or 4DVar would be a good choice. 4DVar would be ideal since the authors propose the methods here as alternatives to 4DVar, but at least a simple 3DVar (with a reasonably well calibrated background error covariance matrix B) could serve as a good control. Ideally, the authors would provide both and show where these new methods fall in relation to those. There is also not really any discussion of the costs of the proposed methods in comparison to 4DVar and 3DVar, though this is included as an advantage of these methods in the manuscript text, and it would be helpful to put the proposed methods in that context for potential consideration for future research and development.
The description of the experiment setup should be more precise. The details are unclear and require guessing on the part of the reader. In addition, it would be useful to have a section describing the observation sampling strategy, the density/sparsity of the observations, and the noise applied to the observations, prior to describing the DA setup and results.
More investigation would be useful to show how robust the methods are to sparsity in observations in both space and in time, and with increasing noise in the observations. Further, the discussion and results should be further separated between a spin-up period (i.e. what conditions are required to have the different DA methods drive the state estimate toward the true state), the “online DA” period (i.e. once the system is spun up, can it maintain the state estimates with reasonable accuracy without filter divergence), and the forecast. At the moment these are all combined into single experiments, making it difficult to establish the behavior of the methods. For example, after spin-up, long forecasts can be initialized from every DA analysis initial condition in order to produce forecast statistics, as opposed to a single forecast at the end of a cycled DA period. This would be more consistent with how operational forecasts are conducted. Overall, additional efforts like this are needed to improve the statistical reliability of the experiment results reported.
Specific Comments:
L 5:
“The second is the Concave-Convex Nonlinearity (CCN) method provided by Larios and Pei that has a straightforward implementation and promising results.”
Promising results in what context? Toy models? Full scale ocean models?
L 8:
“integrating the model forward and backward in time”
It might be worth mentioning whether this means with the TLM and Adjoint, as done with 4DVar, or it requires a different model to describe the “backward in time” integration.
L 11:
“is fairly adaptive to sparser observations, predominately in time.”
I think you mean ‘robust’ not ‘adaptive’:
“is fairly robust to sparser observations, predominately in time.”
L 14:
“There are generally two classes of data assimilation (DA) methods: filters and smoothers.”
I’m not sure that I agree with this characterization. Sequential DA methods can be applied as smoothers; for example, the ensemble Kalman filter can be applied as a smoother to update states throughout an analysis window.
It is more typical to separate the classes of conventional DA methods into ‘sequential’ and ‘variational’.
L 19:
“Filters assume that all the data within the observation window are collected and valid at the analysis time.”
I don’t think this is the key distinguishing factor between a filter and a smoother.
For example, from Doucet and Johansen (2008):
https://www.stats.ox.ac.uk/~doucet/doucet_johansen_tutorialPF2011.pdf
“filtering methods have become a very popular class of algorithms to solve these estimation problems numerically in an online manner, i.e. recursively as observations become available,”
They define filtering as:
“the problem of filtering: characterising the distribution of the state of the hidden Markov model at the present time, given the information provided by all of the observations received up to the present time”
While they define smoothing as:
“smoothing corresponds to estimating the distribution of the state at a particular time given all of the observations up to some later time”
In that sense, even 3DVar is a smoother if observations are used throughout a window and the analysis is formed at the center of that window (a common operation at NOAA, for example).
L 23:
“Smoothers on the other hand assimilate all observations collected within the observation window at their respective time and provide a correction to the entire model trajectory over the assimilation window.”
Again, this is not specific to a smoother. I suggest the authors just remove the filter/smoother distinction and focus on the key property that observations are assimilated throughout a time window (which all modern operational forecast systems do at this stage, either using 3DVar FGAT, a 4DEnKF, or 4DVar).
L 25-26:
“The former refers to the time window over which a correction to the model is computed, while the latter refers to the time window over which observations are collected/considered for assimilation.”
I don’t know of many systems that don’t have these two windows coincide, but I am aware that the SODA system at UMD uses a longer observation window for each analysis, which might be worth citing here as an example ocean DA system that follows this strategy.
L 29-31:
“There are a few known smoother methods such as the four-dimensional variational (4DVAR) (Fairbairn et al., 2013; Le Dimet and Talagrand, 1986), the Kalman Smoother (KS) (Bennett and Budgell, 1989), and the Ensemble Kalman Smoother (EnKS) (Evensen and Van Leeuwen, 2000). Of these three, 4DVAR is the one that is most used in geosciences problems.”
By your definition, the 4D Local Ensemble Transform Kalman Filter (LETKF) is also a smoother. I’d argue that the EnKF is potentially used more in geoscience problems due to its ease of implementation compared to 4DVar. So I would just say:
“Of these three, 4DVAR is considered one of the leading stateoftheart methods for geosciences problems.”
L 32-33:
“[4DVAR] does, however, require the development of a tangent linear (TLM) and adjoint of the dynamical model being used. This development of the TLM and the adjoint model is both cumbersome and tedious, [and requires regular maintenance as the base model undergoes continued development].”
L 38-39:
“the backward integration of the nonlinear model costs less than the adjoint integration”
How exactly do you propose to integrate a model backwards in time, particularly one that has diffusive processes?
Over 20 years ago, Kalnay et al. (2000) proposed the ‘quasi-inverse’ method to do something similar. While they made parallels to 3DVar, it was actually similar to what is being described here as an alternative to 4DVar. I think it would be worthwhile to compare the quasi-inverse method since they faced similar challenges with reverse-propagation of the nonlinear system, and gave an example applying this in the NOAA operational forecast system (e.g. running the TLM backwards with the sign of surface friction and horizontal diffusion changed).
Kalnay, E., S. K. Park, Z. Pu, and J. Gao, 2000: Application of the Quasi-Inverse Method to Data Assimilation. Mon. Wea. Rev., 128, 864–875, https://doi.org/10.1175/1520-0493(2000)128<0864:AOTQIM>2.0.CO;2.
This built on the work of Wang et al. (1995) and Pu et al. (1997a/b)
Wang, Z., I. M. Navon, X. Zou, and F. X. Le Dimet, 1995: A truncated Newton optimization algorithm in meteorology applications with analytic Hessian/vector products. Comput. Optim. Appl., 4, 241–262.
Pu, Z.-X., E. Kalnay, J. Sela, and I. Szunyogh, 1997a: Sensitivity of forecast errors to initial conditions with a quasi-inverse linear model. Mon. Wea. Rev., 125, 2479–2503.
Pu, Z.-X., E. Kalnay, J. Derber, and J. Sela, 1997b: An inexpensive technique for using past forecast errors to improve future forecast skill. Quart. J. Roy. Meteor. Soc., 123, 1035–1054.
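The difficulty raised here, that backward-in-time integration of a diffusive model is ill-posed, can be sketched with a 1D heat equation (the grid size, diffusivity, and step count below are illustrative assumptions): the same explicit step that damps small scales forward in time amplifies them without bound when the sign of the time step is reversed.

```python
import numpy as np

n, nu, dx, dt = 64, 0.1, 1.0, 1.0     # grid size, diffusivity, spacing, step (assumed)

def diffuse(u, dt):
    """Explicit step of du/dt = nu * d2u/dx2 with periodic boundaries."""
    lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    return u + dt * nu * lap

rng = np.random.default_rng(0)
# Smooth signal plus tiny grid-scale noise, standing in for observation error.
u0 = np.sin(2 * np.pi * np.arange(n) / n) + 1e-6 * rng.standard_normal(n)

forward, backward = u0.copy(), u0.copy()
for _ in range(50):
    forward = diffuse(forward, +dt)    # stable: small scales are damped
    backward = diffuse(backward, -dt)  # ill-posed: small scales are amplified

print(np.abs(forward).max(), np.abs(backward).max())
```

The grid-scale Fourier mode is multiplied by 1 + 4*nu*dt/dx^2 = 1.4 per backward step here, so the 1e-6 noise grows by roughly seven orders of magnitude over 50 steps, which is exactly why sign changes on diffusion (as in the quasi-inverse method) or other regularization are needed.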
L 80-90, equations 2, 3, 4:
It seems that K and K’ are constants here. Much of the work in modern DA is formulating K as a matrix operator, for example the Kalman gain matrix. In that case, K contains all of the information about the forecast error, observation error, and potentially model error.
The use of a constant here bears more resemblance to the methods used in the mathematical field focused on the “synchronization of chaos” (for example see works from Pecora and Carroll, or work by Abarbanel et al.). In those works, however, typically the observations (while they may be sparse in space) are available frequently in time. This scenario seems more akin to an observing buoy in an ocean DA context (unlike, for example a satellite measurement or Argo surfacing profiler).
Louis M. Pecora, Thomas L. Carroll; Synchronization of chaotic systems. Chaos 1 September 2015; 25 (9): 097611. https://doi.org/10.1063/1.4917383
Henry D. I. Abarbanel, Nikolai F. Rulkov, and Mikhail M. Sushchik; Phys. Rev. E 53, 4528 – Published 1 May 1996.
These works largely used nudging methods to achieve synchronization and are highly related to DA methods. For example, Penny (2017) showed the connections between modern DA methods and concepts from synchronization of Chaos:
Penny, S.G.; Mathematical foundations of hybrid data assimilation from a synchronization perspective. Chaos 1 December 2017; 27 (12): 126801. https://doi.org/10.1063/1.5001819
L 110:
I think it would be helpful to plot what the eta coefficient looks like as defined by equation 6.
It is a bit unclear from the equations (5) and (6): is the departure X_obs - H(X) the input argument to the function eta(x) as defined in equation (6), or is eta a constant that is the nudging coefficient applied to the departure? If the latter, what is the input ‘x’ value to eta(x)?
L 115:
“if the results can still be achieved with sparse observations”
I’d be interested if this investigation includes both sparsity in space and in time.
L 148-161:
It might be worth justifying the interpretation of the time units (approx. time) for each model, e.g. the Lorenz-96 timescale is based on the rate of error growth in operational weather prediction models of the time, and was described by Lorenz (1996).
L 169, Figure 1:
On the panel (b) it says:
“Truth is shown in teal whereas the orange line is a test run with no DA that started with the same background initial condition.”
There should have been some difference to cause the divergence in the nature run ’truth’ and the free run.
In the text (L 166-168) it says figure 1b is the first two months of truth, so it does not look like they have the same initial condition.
Some more clarity is needed here to describe the experiment setup.
L 170:
Again, for Lorenz-96, the experiment setup is a bit unclear. You are running the nature run for 1 year and then using the end of the first year as the initial state for the DA experiment. Does that mean the initial ’true’ state for the DA experiments, or the background first guess to be corrected to a different truth that is sampled by observations?
L 179 Figure 2 caption:
Typo:
‘Truth IC’ and ‘DA IC’  the first apostrophe is backwards in both cases.
L 190-192:
“Here, we provide a few remarks. The first is that the "best choice" for the value chosen can be different depending on the model being used. There are other cases discussed in the results section below where the optimal value had to be changed to adapt to the parameters given.”
I think it should be mentioned here or earlier in the text that the “best choice” coefficient is typically derived in modern DA methods as a matrix formed by a combination of information about the background error and observation error. The nudging approach here assumes a diagonal error covariance in both and simply replaces the ratio of background to observation (or more accurately summed) error with a simple constant coefficient.
It would be interesting to take a step closer to a more realistic application by having two or more sets of data with different observation errors associated with them, or to account for uncertainties in the model by expanding the constant coefficient to a full Kalman gain matrix.
In the latter case, the nudging techniques may be more effective in the presence of sparse data if the timedependent Kalman gain is provided and reasonably accurate.
The authors might consider reviewing the Ensemble Transform KalmanBucy filters proposed, for example, by:
Amezcua, J., Ide, K., Kalnay, E. and Reich, S. (2014), Ensemble transform Kalman–Bucy filters. Q.J.R. Meteorol. Soc., 140: 9951004. https://doi.org/10.1002/qj.2186
The relevance of this method is probably closer to that of the compared CCN method.
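The reviewer's point that the constant nudging coefficient stands in for a full Kalman gain can be made concrete with a minimal sketch (the matrices B, R, H and all values below are illustrative assumptions, not the paper's setup): the gain K = B H^T (H B H^T + R)^(-1) weights the innovation by the background- and observation-error covariances, and reduces to a scalar only when both are diagonal with uniform variances.

```python
import numpy as np

# Observation operator picking out state components 0 and 2 (assumed setup).
n_state, n_obs = 4, 2
H = np.zeros((n_obs, n_state))
H[0, 0] = H[1, 2] = 1.0

B = 0.5 * np.eye(n_state)              # assumed background-error covariance
R = 0.1 * np.eye(n_obs)                # assumed observation-error covariance

# Kalman gain: K = B H^T (H B H^T + R)^(-1).
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

x_b = np.array([1.0, 2.0, 3.0, 4.0])   # background state
y = np.array([1.5, 2.5])               # observations of components 0 and 2
x_a = x_b + K @ (y - H @ x_b)          # analysis update

# With diagonal B and R, the gain on observed components reduces to the
# scalar 0.5 / (0.5 + 0.1), which a constant nudging coefficient approximates.
print(x_a)
```

Replacing the scalar coefficient with such a (possibly time-dependent) gain is the step toward the Kalman-Bucy formulation the reviewer cites.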
L 201:
“Several experiments are carried out with different lengths of DA [analysis] windows. The length of the forecast is the same as the time window chosen for [the] DA [analysis].”
Please be precise when discussing the DA “analysis window” or DA “analysis cycle window”. The term “DA window” is unclear, as it could mean the entire DA experiment period since DA is typically a cycled process.
L 218, Figure 5:
It seems that the results here are combining the DA spin-up period, the DA performance period, and the forecast into a single assessment. Since spin-up for DA methods is a problem in its own right, I’d suggest the authors compare these methods separately: (1) how well the model spins up the state estimate to be close to the true state, and separately (2) how well the algorithms perform once the systems are spun up, and (3) the skill of the resulting forecasts.
For example, it has been shown that EnKF methods often take longer to spin up since the ensemble members themselves have to stabilize and converge to the unstable manifold of the system, but after that point can have similar accuracy to more sophisticated variational methods like 4DVar.
Since in practice the spin-up is usually only performed once, this does not seem to have high relevance to an operational environment. Rather, it is more interesting to know how the systems perform after this spin-up is achieved.
That being said, the caption indicates: “All DA experiments assimilated all observations (i.e., all grid points at every timestep/6 minutes).” In that case, I would be very interested to know more about the spin-up process, and how robust it is to sparsity in the observations in both space and time.
L 221:
Backwards apostrophe on ‘1gp2ts’.
Also in Table 2 caption ‘all obs’ and others.
The results in Table 2 are difficult to interpret since they appear to combine MAE of the spin-up and performance periods.
L 233:
“CCN did not do well with even [fewer] observations”
L 237-238:
“The conclusion from these results was that a larger nudging coefficient was needed for DBFN in cases with sparse observations and/or longer time windows.”
I wouldn’t consider skipping observations for 1 timestep (as in Table 3), within the range of linear dynamics of the model, to be ’sparse observations’. Not until the results of Table 4 would this characterization be more appropriate.
A comparison to a simple 3DVar method (using a reasonable background error covariance B matrix) could be useful as a benchmark. If the target is to produce a method that can be somewhat competitive with 4DVar, then it seems fair to at least reference a simple 3DVar as a benchmark, if not 4DVar itself.
L 300-303:
“While DBFN is able to retain accuracy for observations that are sparse in time, due to the advantage of spreading these corrections through the back and forth iterations, we observed that the results from CCN decayed as the density and/or frequency of observations were reduced. ”
This is a reasonable conclusion, but I’d like to see it demonstrated a bit more rigorously. For example, a full grid search of different combinations of sparsity of observations in both space and time, or with the addition of increased observational noise, and how the methods respond in the ‘ideal’ scenarios and the more extreme sparse and noisy observation scenarios.
Citation: https://doi.org/10.5194/npg-2024-3-RC1
RC2: 'Comment on npg-2024-3', Brad Weir, 08 May 2024
GENERAL COMMENTS:
I have basically the same general comments as the other reviewer. The introduction would benefit from more depth and context. More importantly, I think the inclusion of 1-2 other data assimilation (DA) methods, especially at least one filter, is essential. The introduction begins by highlighting the distinction between filters and smoothers, then analyzes two smoothers, showing that the one which is more costly (in terms of both person and computational effort) outperforms the other. This result is perhaps unsurprising, but there is very little discussion about when and why this cost-performance tradeoff might be acceptable beyond showing that longer assimilation windows reduce the tradeoff. Including other approaches, especially filters, as counterexamples could provide additional context. Perhaps the less costly but less performant smoother still outperforms a well-calibrated filter. Perhaps it doesn't. The results of such a comparison would go a long way to helping the reader understand when and why they may want to choose one DA method over the other. Finally, given the methodological development in the paper, it is unclear if these approaches can be applied in cases where the observations are not full-rank. Since this is very often the case, some discussion needs to be provided about how one would confront observations that are a lower-rank subset of the state space.
SPECIFIC/TECHNICAL COMMENTS:
Line 14, "generally two classes": There are other ways to cut the DA pie. For example, variational vs. ensemble.
Line 15, "model background state": Very technical point here, but the model background state is typically conditioned on all past observations and the description up to this point seems somewhat misleading regarding this point.
Line 21, "suppressing the time variability in the observations": This isn't really true. The 3DVar systems used at many DA centers use interpolation across the beginning, middle, and end of the time window. This makes the comparison to filters here seem like a bit of a straw man. To address this concern, I would very much like to see an example of 3DVar with this type of time interpolation compared to the methods presented in this paper.
Line 39, "seems to": I would hope we could be more precise in peerreviewed scientific literature.
Line 44, "BFN can however": I would suggest perhaps a new paragraph here and a few more details about AOT. There are a lot of acronyms introduced in a short time (BFN, CDA, AOT, CCN, etc.), and it would be best to either reduce them or make the distinctions between all of these clear. See also comment on lines 117-119.
Lines 78-79, "M is used ... the forcing": I don't understand why this is worth noting. M and F are just variable names.
Equation 2: How would this be applied if H is not full rank? In fact, it seems like H is assumed to be the identity matrix here. The latter would only be true in test cases and never in practice.
Line 86, "state": Again, this is the observation operator applied to the state and is only the state when H is the identity.
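One common way to handle a non-identity H in a nudging update, sketched here under assumed values (the indices, gain, and states below are illustrative, not the paper's formulation), is to map the observation-space departure back to state space with H^T, so that only observed components receive a direct correction:

```python
import numpy as np

n_state = 6
obs_idx = [0, 3]                       # only components 0 and 3 are observed
H = np.zeros((len(obs_idx), n_state))
for row, col in enumerate(obs_idx):
    H[row, col] = 1.0

k = 0.2                                # assumed constant nudging gain
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 4.5])               # observations of components 0 and 3

# Map the observation-space departure back to state space with H^T, so the
# unobserved components receive no direct correction.
x_nudged = x + k * H.T @ (y - H @ x)
print(x_nudged)
```

Under this convention the unobserved components are corrected only indirectly through the model dynamics, which is why the rank of H matters for convergence.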
Line 95, "2.1.1": Suggest 2.2
Line 96, "linear AOT method": I don't really understand what this is and some description would help.
Lines 97-98, "The first approach ... linear AOT method": I don't think there's been enough introduction to this point for the reader to have a sense of why this is meaningful.
Lines 101-102, "Concave-Convex Nonlinearity": Acronym has already been defined. Suggest either writing CCN or including the abbreviation again.
Line 110, "eta(x) = eta_3(x)": This additional notation seems to not be needed.
Lines 116-117, "not explicitly stated ... every timestep": Seems like it would be better to contact the authors rather than speculating.
Lines 117-119, "This paper investigates ... as in BFN.": This seems introductory and maybe should go in the Introduction.
Lines 136-137, "For clarification ... DBFN": Again, these are just variable names. This note seems to conflate the variable name with the thing it denotes.
Line 115, "create sufficient chaos": I have no idea what "sufficient chaos" is.
Line 162, "6 minutes": I'm confused what these times correspond to. Is this wall time or the time variable of the equations?
Line 165, "significantly different": Suggest either being more precise or removing this comment.
Figure 1: While it's spelled out in the caption, an inset legend in the figure might be helpful.
Figure 2, "the top figure": Suggest using the term "panel" here and elsewhere instead of "figure" since the entire thing is "Figure 2".
Figure 4: The figure caption and labeling are insufficient for me to understand what's going on. Does each segment in panels a and c correspond to the entire frame of panels b and d? Please clarify.
Line 221, "1gp2ts": Seems like these acronyms could be improved. Perhaps 1GP2TS?
Line 231, "Other experiments ... should be discussed": If the experiments were relevant they should be provided, perhaps in a supplement/appendix.
Table 2, "Fcast": Suggest "FC" for consistency with "DA".
Table 2, all obs, CCN, 1m and 2m Fcast: I don't understand how forecast errors can be smaller, and so much so, than the DA errors. This seems like a typo/mistake. Could you please explain? See also Table 6.
Citation: https://doi.org/10.5194/npg-2024-3-RC2
EC1: 'Comment on npg-2024-3', Juan Restrepo, 25 May 2024
Dear Drs. Montiforte, Ngodock, and Souopgui
There are two very helpful reviews for you to work with. Both referees are highly skilled technical experts as well as seasoned researchers, and they suggest a number of ways the paper will improve with regard to what and how to present the results. Their reviews should be fully addressed in the revised manuscript. Readability and impact need to be addressed. My suggestions for the revision are bolstered by my difficulty in finding experts willing to evaluate this manuscript. I am confident that addressing the referee concerns will target the issues that will make this paper more interesting to the reader of NPG as well as increase its potential to be cited.
Please include a detailed listing of the referee comments and how these are being addressed in a revision of the paper.
Best Wishes
Juan M. Restrepo
Citation: https://doi.org/10.5194/npg-2024-3-EC1