the Creative Commons Attribution 4.0 License.
A Comparison of Two Nonlinear Data Assimilation Methods
Abstract. Advanced numerical data assimilation (DA) methods, such as the four-dimensional variational (4DVAR) method, are elaborate and computationally expensive. Simpler methods exist that take time variability into account, providing the potential of accurate results with a reduced computational cost. Recently, two of these DA methods were proposed for a nonlinear ocean model. The first method is Diffusive Back and Forth Nudging (DBFN), which has previously been implemented in several complex models, most notably an ocean model. The second is the Concave-Convex Nonlinearity (CCN) method provided by Larios and Pei that has a straightforward implementation and promising results. DBFN is less costly than a traditional variational DA system, but it requires integrating the model forward and backward in time over a number of iterations, whereas CCN only requires integration of the forward model once. This paper will investigate if Larios and Pei's CCN algorithm can provide competitive results with the already tested DBFN within simple chaotic models. Results show that observation density and/or frequency, as well as the length of the assimilation window, significantly impact the results for CCN, whereas DBFN is fairly adaptive to sparser observations, predominately in time.
Status: final response (author comments only)

RC1: 'Comment on npg-2024-3', Anonymous Referee #1, 12 Mar 2024
General Comments:
The authors provide a set of methods that seek to provide some time dependence to their fit to observations, like 4DVar, but with a potentially lower computational burden.
The introduction is a bit imprecise, and would benefit from an extended literature review on the concepts and categorizations of DA and filtering/smoothing methods. Further, the works from Wang, Pu, Kalnay, etc. (citations below) should all be discussed in the context of alternative methods since they were proposed over 20 years ago and seem to have relevance to the DBFN method and have been tested in a more operational context. There is a lot of historical work from the mathematical field “synchronization of chaos” using Lorenz-63 in particular, but also more recently with Lorenz-96 and the shallow water equations. A starting point would be to look at the works of Pecora and Carroll, as well as Abarbanel and co-authors. I suggest the authors review these works and put some of their discussion in context of these efforts, which largely used nudging methods to achieve synchronization and are highly related to DA methods.
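The nudging-based synchronization the reviewer refers to can be illustrated with a minimal sketch (the gain, step size, and initial states below are illustrative assumptions, not values from the manuscript): nudging only the observed x-component of Lorenz-63 toward a reference trajectory is enough to synchronize the full three-dimensional state.

```python
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def step(x, dt, forcing=None):
    """One forward-Euler step, optionally adding a nudging forcing term."""
    dx = lorenz63(x)
    if forcing is not None:
        dx = dx + forcing
    return x + dt * dx

dt, k = 0.005, 20.0                    # step size and scalar nudging gain (assumed)
truth = np.array([1.0, 1.0, 1.0])      # reference ("nature") trajectory
est = np.array([8.0, -3.0, 25.0])      # badly initialised estimate

for _ in range(4000):
    # Nudge only the observed x-component of the estimate toward the truth.
    nudge = np.array([k * (truth[0] - est[0]), 0.0, 0.0])
    truth = step(truth, dt)
    est = step(est, dt, forcing=nudge)

# After 20 model time units the estimate has synchronised with the truth.
print(np.abs(truth - est).max())
```

This is the classic Pecora-Carroll setting with x-driving, where the conditional Lyapunov exponents of the driven subsystem are negative, so the unobserved y and z components converge as well.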
It would be very useful to apply a conventional DA method as a baseline for comparison, e.g. either 3DVar or 4DVar would be a good choice. 4DVar would be ideal since the authors propose the methods here as alternatives to 4DVar, but at least a simple 3DVar (with a reasonably well calibrated background error covariance matrix B) could serve as a good control. Ideally, the authors would provide both and show where these new methods fall in relation to those. There is also not really any discussion of the costs of the proposed methods in comparison to 4DVar and 3DVar, though this is included as an advantage of these methods in the manuscript text, and it would be helpful to put the proposed methods in that context for potential consideration for future research and development.
The description of the experiment setup should be more precise. The details are unclear and require guessing on the part of the reader. In addition, it would be useful to have a section describing the observation sampling strategy, the density/sparsity of the observations, and the noise applied to the observations, prior to describing the DA setup and results.
More investigation would be useful to show how robust the methods are to sparsity in observations in both space and in time, and with increasing noise in the observations. Further, the discussion and results should be further separated between a spin-up period (i.e. what conditions are required to have the different DA methods drive the state estimate toward the true state), the “online DA” period (i.e. once the system is spun up, can it maintain the state estimates with reasonable accuracy without filter divergence), and the forecast. At the moment these are all combined into single experiments, making it difficult to establish the behavior of the methods. For example, after spin-up, long forecasts can be initialized from every DA analysis initial condition in order to produce forecast statistics, as opposed to a single forecast at the end of a cycled DA period. This would be more consistent with how operational forecasts are conducted. Overall, additional efforts like this are needed to improve the statistical reliability of the experiment results reported.
Specific Comments:
L 5:
“The second is the Concave-Convex Nonlinearity (CCN) method provided by Larios and Pei that has a straightforward implementation and promising results.”
Promising results in what context? Toy models? Full scale ocean models?
L 8:
“integrating the model forward and backward in time”
It might be worth mentioning whether this means with the TLM and Adjoint, as done with 4DVar, or it requires a different model to describe the “backward in time” integration.
L 11:
“is fairly adaptive to sparser observations, predominately in time.”
I think you mean ‘robust’ not ‘adaptive’:
“is fairly robust to sparser observations, predominately in time.”
L 14:
“There are generally two classes of data assimilation (DA) methods: filters and smoothers.”
I’m not sure that I agree with this characterization. Sequential DA methods can be applied as smoothers; for example, the ensemble Kalman filter can be applied as a smoother to update states throughout an analysis window.
It is more typical to separate the classes of conventional DA methods into ‘sequential’ and ‘variational’.
L 19:
“Filters assume that all the data within the observation window are collected and valid at the analysis time.”
I don’t think this is the key distinguishing factor between a filter and a smoother.
For example, from Doucet and Johansen (2008):
https://www.stats.ox.ac.uk/~doucet/doucet_johansen_tutorialPF2011.pdf
“filtering methods have become a very popular class of algorithms to solve these estimation problems numerically in an online manner, i.e. recursively as observations become available,”
They define filtering as:
“the problem of filtering: characterising the distribution of the state of the hidden Markov model at the present time, given the information provided by all of the observations received up to the present time”
While they define smoothing as:
“smoothing corresponds to estimating the distribution of the state at a particular time given all of the observations up to some later time”
In that sense, even 3DVar is a smoother if observations are used throughout a window and the analysis is formed at the center of that window (a common operation at NOAA, for example).
L 23:
“Smoothers on the other hand assimilate all observations collected within the observation window at their respective time and provide a correction to the entire model trajectory over the assimilation window.”
Again, this is not specific to a smoother. I suggest the authors just remove the filter/smoother distinction and focus on the key property that observations are assimilated throughout a time window (which all modern operational forecast systems do at this stage, either using 3DVar FGAT, a 4DEnKF, or 4DVar).
L 25-26:
“The former refers to the time window over which a correction to the model is computed, while the latter refers to the time window over which observations are collected/considered for assimilation.”
I don’t know of many systems that don’t have these two windows coincide, but I am aware that the SODA system at UMD uses a longer observation window for each analysis, which might be worth citing here as an example ocean DA system that follows this strategy.
L 29-31:
“There are a few known smoother methods such as the four-dimensional variational (4DVAR) (Fairbairn et al., 2013; Le Dimet and Talagrand, 1986), the Kalman Smoother (KS) (Bennett and Budgell, 1989), and the Ensemble Kalman Smoother (EnKS) (Evensen and Van Leeuwen, 2000). Of these three, 4DVAR is the one that is most used in geosciences problems.”
By your definition, the 4D Local Ensemble Transform Kalman Filter (LETKF) is also a smoother. I’d argue that the EnKF is potentially used more in geoscience problems due to its ease of implementation compared to 4DVar. So I would just say:
“Of these three, 4DVAR is considered one of the leading stateoftheart methods for geosciences problems.”
L 32-33:
“[4DVAR] does, however, require the development of a tangent linear (TLM) and adjoint of the dynamical model being used. This development of the TLM and the adjoint model is both cumbersome and tedious, [and requires regular maintenance as the base model undergoes continued development].”
L 38-39:
“the backward integration of the nonlinear model costs less than the adjoint integration”
How exactly do you propose to integrate a model backwards in time, particularly one that has diffusive processes?
Over 20 years ago, Kalnay et al. (2000) proposed the ‘quasi-inverse’ method to do something similar. While they made parallels to 3DVar, it was actually similar to what is being described here as an alternative to 4DVar. I think it would be worthwhile to compare the quasi-inverse method since they faced similar challenges with reverse-propagation of the nonlinear system, and gave an example applying this in the NOAA operational forecast system (e.g. running the TLM backwards with the sign of surface friction and horizontal diffusion changed).
Kalnay, E., S. K. Park, Z. Pu, and J. Gao, 2000: Application of the Quasi-Inverse Method to Data Assimilation. Mon. Wea. Rev., 128, 864–875, https://doi.org/10.1175/1520-0493(2000)128<0864:AOTQIM>2.0.CO;2.
This built on the work of Wang et al. (1995) and Pu et al. (1997a/b)
Wang, Z., I. M. Navon, X. Zou, and F. X. Le Dimet, 1995: A truncated Newton optimization algorithm in meteorology applications with analytic Hessian/vector products. Comput. Optim. Appl., 4, 241–262.
Pu, Z.-X., E. Kalnay, J. Sela, and I. Szunyogh, 1997a: Sensitivity of forecast errors to initial conditions with a quasi-inverse linear model. Mon. Wea. Rev., 125, 2479–2503.
Pu, Z.-X., E. Kalnay, J. Derber, and J. Sela, 1997b: An inexpensive technique for using past forecast errors to improve future forecast skill. Quart. J. Roy. Meteor. Soc., 123, 1035–1054.
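The difficulty raised here, that backward-in-time integration of a diffusive model is ill-posed, can be sketched with a 1D heat equation (the grid size, diffusivity, and step count below are illustrative assumptions): the same explicit step that damps small scales forward in time amplifies them without bound when the sign of the time step is reversed.

```python
import numpy as np

n, nu, dx, dt = 64, 0.1, 1.0, 1.0     # grid size, diffusivity, spacing, step (assumed)

def diffuse(u, dt):
    """Explicit step of du/dt = nu * d2u/dx2 with periodic boundaries."""
    lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    return u + dt * nu * lap

rng = np.random.default_rng(0)
# Smooth signal plus tiny grid-scale noise, standing in for observation error.
u0 = np.sin(2 * np.pi * np.arange(n) / n) + 1e-6 * rng.standard_normal(n)

forward, backward = u0.copy(), u0.copy()
for _ in range(50):
    forward = diffuse(forward, +dt)    # stable: small scales are damped
    backward = diffuse(backward, -dt)  # ill-posed: small scales are amplified

print(np.abs(forward).max(), np.abs(backward).max())
```

The grid-scale Fourier mode is multiplied by 1 + 4*nu*dt/dx^2 = 1.4 per backward step here, so the 1e-6 noise grows by roughly seven orders of magnitude over 50 steps, which is exactly why sign changes on diffusion (as in the quasi-inverse method) or other regularization are needed.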
L 80-90, equations 2, 3, 4:
It seems that K and K’ are constants here. Much of the work in modern DA is formulating K as a matrix operator, for example the Kalman gain matrix. In that case, K contains all of the information about the forecast error, observation error, and potentially model error.
The use of a constant here bears more resemblance to the methods used in the mathematical field focused on the “synchronization of chaos” (for example see works from Pecora and Carroll, or work by Abarbanel et al.). In those works, however, typically the observations (while they may be sparse in space) are available frequently in time. This scenario seems more akin to an observing buoy in an ocean DA context (unlike, for example a satellite measurement or Argo surfacing profiler).
Louis M. Pecora, Thomas L. Carroll; Synchronization of chaotic systems. Chaos 1 September 2015; 25 (9): 097611. https://doi.org/10.1063/1.4917383
Henry D. I. Abarbanel, Nikolai F. Rulkov, and Mikhail M. Sushchik; Phys. Rev. E 53, 4528 – Published 1 May 1996.
These works largely used nudging methods to achieve synchronization and are highly related to DA methods. For example, Penny (2017) showed the connections between modern DA methods and concepts from synchronization of Chaos:
Penny, S.G.; Mathematical foundations of hybrid data assimilation from a synchronization perspective. Chaos 1 December 2017; 27 (12): 126801. https://doi.org/10.1063/1.5001819
L 110:
I think it would be helpful to plot what the eta coefficient looks like as defined by equation 6.
It is a bit unclear from the equations (5) and (6): is the departure X_obs - H(X) the input argument to the function eta(x) as defined in equation (6), or is eta a constant that is the nudging coefficient applied to the departure? If the latter, what is the input ‘x’ value to eta(x)?
L 115:
“if the results can still be achieved with sparse observations”
I’d be interested if this investigation includes both sparsity in space and in time.
L 148-161:
It might be worth justifying the interpretation of the time units (approx. time) for each model, e.g. the Lorenz-96 timescale is based on the rate of error growth in operational weather prediction models of the time, and was described by Lorenz (1996).
L 169, Figure 1:
On the panel (b) it says:
“Truth is shown in teal whereas the orange line is a test run with no DA that started with the same background initial condition.”
There should have been some difference to cause the divergence in the nature run ’truth’ and the free run.
In the text (L 166-168) it says figure 1b is the first two months of truth, so it does not look like they have the same initial condition.
Some more clarity is needed here to describe the experiment setup.
L 170:
Again, for Lorenz-96, the experiment setup is a bit unclear. You are running the nature run for 1 year and then using the end of the first year as the initial state for the DA experiment. Does that mean the initial ’true’ state for the DA experiments, or the background first guess to be corrected to a different truth that is sampled by observations?
L 179 Figure 2 caption:
Typo:
‘Truth IC’ and ‘DA IC’  the first apostrophe is backwards in both cases.
L 190-192:
“Here, we provide a few remarks. The first is that the "best choice" for the value chosen can be different depending on the model being used. There are other cases discussed in the results section below where the optimal value had to be changed to adapt to the parameters given.”
I think it should be mentioned here or earlier in the text that the “best choice” coefficient is typically derived in modern DA methods as a matrix formed by a combination of information about the background error and observation error. The nudging approach here assumes a diagonal error covariance in both and simply replaces the ratio of background to observation (or more accurately summed) error with a simple constant coefficient.
It would be interesting to take a step closer to a more realistic application by having two or more sets of data with different observation errors associated with them, or to account for uncertainties in the model by expanding the constant coefficient to a full Kalman gain matrix.
In the latter case, the nudging techniques may be more effective in the presence of sparse data if the timedependent Kalman gain is provided and reasonably accurate.
The authors might consider reviewing the Ensemble Transform KalmanBucy filters proposed, for example, by:
Amezcua, J., Ide, K., Kalnay, E. and Reich, S. (2014), Ensemble transform Kalman–Bucy filters. Q.J.R. Meteorol. Soc., 140: 9951004. https://doi.org/10.1002/qj.2186
The relevance of this method is probably closer to that of the compared CCN method.
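The reviewer's point that the constant nudging coefficient stands in for a full Kalman gain can be made concrete with a minimal sketch (the matrices B, R, H and all values below are illustrative assumptions, not the paper's setup): the gain K = B H^T (H B H^T + R)^(-1) weights the innovation by the background- and observation-error covariances, and reduces to a scalar only when both are diagonal with uniform variances.

```python
import numpy as np

# Observation operator picking out state components 0 and 2 (assumed setup).
n_state, n_obs = 4, 2
H = np.zeros((n_obs, n_state))
H[0, 0] = H[1, 2] = 1.0

B = 0.5 * np.eye(n_state)              # assumed background-error covariance
R = 0.1 * np.eye(n_obs)                # assumed observation-error covariance

# Kalman gain: K = B H^T (H B H^T + R)^(-1).
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

x_b = np.array([1.0, 2.0, 3.0, 4.0])   # background state
y = np.array([1.5, 2.5])               # observations of components 0 and 2
x_a = x_b + K @ (y - H @ x_b)          # analysis update

# With diagonal B and R, the gain on observed components reduces to the
# scalar 0.5 / (0.5 + 0.1), which a constant nudging coefficient approximates.
print(x_a)
```

Replacing the scalar coefficient with such a (possibly time-dependent) gain is the step toward the Kalman-Bucy formulation the reviewer cites.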
L 201:
“Several experiments are carried out with different lengths of DA [analysis] windows. The length of the forecast is the same as the time window chosen for [the] DA [analysis].”
Please be precise when discussing the DA “analysis window” or DA “analysis cycle window”. The term “DA window” is unclear, as it could mean the entire DA experiment period since DA is typically a cycled process.
L 218, Figure 5:
It seems that the results here are combining the DA spin-up period, the DA performance period, and the forecast into a single assessment. Since spin-up for DA methods is a problem in its own right, I’d suggest the authors compare these methods separately: (1) how well the model spins up the state estimate to be close to the true state, and separately (2) how well the algorithms perform once the systems are spun up, and (3) the skill of the resulting forecasts.
For example, it has been shown that EnKF methods often take longer to spin up since the ensemble members themselves have to stabilize and converge to the unstable manifold of the system, but after that point can have similar accuracy to more sophisticated variational methods like 4DVar.
Since in practice the spin-up is usually only performed once, this does not seem to have high relevance to an operational environment. Rather, it is more interesting to know how the systems perform after this spin-up is achieved.
That being said, the caption indicates: “All DA experiments assimilated all observations (i.e., all grid points at every timestep/6 minutes).” In that case, I would be very interested to know more about the spin-up process, and how robust it is to sparsity in the observations in both space and time.
L 221:
Backwards apostrophe on ‘1gp2ts’.
Also in Table 2 caption ‘all obs’ and others.
The results in Table 2 are difficult to interpret since they appear to combine MAE of the spin-up and performance periods.
L 233:
“CCN did not do well with even [fewer] observations”
L 237-238:
“The conclusion from these results was that a larger nudging coefficient was needed for DBFN in cases with sparse observations and/or longer time windows.”
I wouldn’t consider skipping observations for 1 timestep (as in Table 3), within the range of linear dynamics of the model, to be ’sparse observations’. Not until the results of Table 4 would this characterization be more appropriate.
A comparison to a simple 3DVar method (using a reasonable background error covariance B matrix) could be useful as a benchmark. If the target is to produce a method that can be somewhat competitive with 4DVar, then it seems fair to at least reference a simple 3DVar as a benchmark, if not 4DVar itself.
L 300-303:
“While DBFN is able to retain accuracy for observations that are sparse in time, due to the advantage of spreading these corrections through the back and forth iterations, we observed that the results from CCN decayed as the density and/or frequency of observations were reduced. ”
This is a reasonable conclusion, but I’d like to see it demonstrated a bit more rigorously. For example, a full grid search of different combinations of sparsity of observations in both space and time, or with the addition of increased observational noise, and how the methods respond in the ‘ideal’ scenarios and the more extreme sparse and noisy observation scenarios.
Citation: https://doi.org/10.5194/npg-2024-3-RC1
RC2: 'Comment on npg-2024-3', Brad Weir, 08 May 2024
GENERAL COMMENTS:
I have basically the same general comments as the other reviewer. The introduction would benefit from more depth and context. More importantly, I think the inclusion of 1-2 other data assimilation (DA) methods, especially at least one filter, is essential. The introduction begins by highlighting the distinction between filters and smoothers, then analyzes two smoothers, showing that the one which is more costly (in terms of both person and computational effort) outperforms the other. This result is perhaps unsurprising, but there is very little discussion about when and why this cost-performance tradeoff might be acceptable beyond showing that longer assimilation windows reduce the tradeoff. Including other approaches, especially filters, as counterexamples could provide additional context. Perhaps the less costly but less performant smoother still outperforms a well-calibrated filter. Perhaps it doesn't. The results of such a comparison would go a long way to helping the reader understand when and why they may want to choose one DA method over the other. Finally, given the methodological development in the paper, it is unclear if these approaches can be applied in cases where the observations are not full-rank. Since this is very often the case, some discussion needs to be provided about how one would confront observations that are a lower-rank subset of the state space.
SPECIFIC/TECHNICAL COMMENTS:
Line 14, "generally two classes": There are other ways to cut the DA pie. For example, variational vs. ensemble.
Line 15, "model background state": Very technical point here, but the model background state is typically conditioned on all past observations and the description up to this point seems somewhat misleading regarding this point.
Line 21, "suppressing the time variability in the observations": This isn't really true. The 3DVar systems used at many DA centers use interpolation across the beginning, middle, and end of the time window. This makes the comparison to filters here seem like a bit of a straw man. To address this concern, I would very much like to see an example of 3DVar with this type of time interpolation compared to the methods presented in this paper.
Line 39, "seems to": I would hope we could be more precise in peerreviewed scientific literature.
Line 44, "BFN can however": I would suggest perhaps a new paragraph here and a few more details about AOT. There are a lot of acronyms introduced in a short time (BFN, CDA, AOT, CCN, etc.), and it would be best to either reduce them or make the distinctions between all of these clear. See also comment on lines 117-119.
Lines 78-79, "M is used ... the forcing": I don't understand why this is worth noting. M and F are just variable names.
Equation 2: How would this be applied if H is not full rank? In fact, it seems like H is assumed to be the identity matrix here. The latter would only be true in test cases and never in practice.
Line 86, "state": Again, this is the observation operator applied to the state and is only the state when H is the identity.
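One common way to handle a non-identity H in a nudging update, sketched here under assumed values (the indices, gain, and states below are illustrative, not the paper's formulation), is to map the observation-space departure back to state space with H^T, so that only observed components receive a direct correction:

```python
import numpy as np

n_state = 6
obs_idx = [0, 3]                       # only components 0 and 3 are observed
H = np.zeros((len(obs_idx), n_state))
for row, col in enumerate(obs_idx):
    H[row, col] = 1.0

k = 0.2                                # assumed constant nudging gain
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 4.5])               # observations of components 0 and 3

# Map the observation-space departure back to state space with H^T, so the
# unobserved components receive no direct correction.
x_nudged = x + k * H.T @ (y - H @ x)
print(x_nudged)
```

Under this convention the unobserved components are corrected only indirectly through the model dynamics, which is why the rank of H matters for convergence.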
Line 95, "2.1.1": Suggest 2.2
Line 96, "linear AOT method": I don't really understand what this is and some description would help.
Lines 97-98, "The first approach ... linear AOT method": I don't think there's been enough introduction to this point for the reader to have a sense of why this is meaningful.
Lines 101-102, "Concave-Convex Nonlinearity": Acronym has already been defined. Suggest either writing CCN or including the abbreviation again.
Line 110, "eta(x) = eta_3(x)": This additional notation seems to not be needed.
Lines 116-117, "not explicitly stated ... every timestep": Seems like it would be better to contact the authors rather than speculating.
Lines 117-119, "This paper investigates ... as in BFN.": This seems introductory and maybe should go in the Introduction.
Lines 136-137, "For clarification ... DBFN": Again, these are just variable names. This note seems to conflate the variable name with the thing it denotes.
Line 115, "create sufficient chaos": I have no idea what "sufficient chaos" is.
Line 162, "6 minutes": I'm confused what these times correspond to. Is this wall time or the time variable of the equations?
Line 165, "significantly different": Suggest either being more precise or removing this comment.
Figure 1: While it's spelled out in the caption, an inset legend in the figure might be helpful.
Figure 2, "the top figure": Suggest using the term "panel" here and elsewhere instead of "figure" since the entire thing is "Figure 2".
Figure 4: The figure caption and labeling are insufficient for me to understand what's going on. Does each segment in panels a and c correspond to the entire frame of panels b and d? Please clarify.
Line 221, "1gp2ts": Seems like these acronyms could be improved. Perhaps 1GP2TS?
Line 231, "Other experiments ... should be discussed": If the experiments were relevant they should be provided, perhaps in a supplement/appendix.
Table 2, "Fcast": Suggest "FC" for consistency with "DA".
Table 2, all obs, CCN, 1m and 2m Fcast: I don't understand how forecast errors can be smaller, and so much so, than the DA errors. This seems like a typo/mistake. Could you please explain? See also Table 6.
Citation: https://doi.org/10.5194/npg-2024-3-RC2
EC1: 'Comment on npg-2024-3', Juan Restrepo, 25 May 2024
Dear Drs. Montiforte, Ngodock, and Souopgui
There are two very helpful reviews for you to work with. Both referees are highly skilled technical experts as well as seasoned researchers, and they suggest a number of ways the paper will improve with regard to what and how to present the results. Their reviews should be fully addressed in the revised manuscript. Readability and impact need to be addressed. My suggestions for the revision are bolstered by my difficulty in finding experts willing to evaluate this manuscript. I am confident that addressing the referee concerns will target the issues that will make this paper more interesting to the reader of NPG as well as increase its potential to be cited.
Please include a detailed listing of the referee comments and how these are being addressed in a revision of the paper.
Best Wishes
Juan M. Restrepo
Citation: https://doi.org/10.5194/npg-2024-3-EC1