Precision variational approximations in statistical data assimilation



Introduction
In a broad spectrum of scientific fields, transferring the information in L observed data time series y_l(t_n); l = 1, ..., L; n = 0, ..., m to a physically based model of the processes producing those observations allows estimation of unknown parameters and of unobserved states of the model within an observation window {t_0, ..., t_m}. As a sample of these fields we note applications in meteorology (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007), geochemistry (Elbern and Schmidt, 1999), fluid dynamics (Zadeh, 2008), and plasma physics (Mechhoud et al., 2013), among many others. The probability distribution P(X) ∝ exp[−A_0(X)] on paths X = {x(0), ..., x(m)} allows evaluation of expected values of physically interesting functions G(X) along the path. A_0(X) has terms giving each measurement's influence on P(X), terms propagating the model between the measurement times t_n, and a term −log[P(x(0))] representing the uncertainty in the initial state (Abarbanel, 2013).
We discuss the familiar case where measurement errors are independent at each t_n and Gaussian with covariance R_m^{−1}(l, l'); l, l' = 1, ..., L. The model is a physical differential equation, discretized in space and time and satisfying the D-dimensional stochastic discrete time map x_a(n+1) = f_a(x(n)) + R_f^{−1/2}(a, b) η_b(n); a, b = 1, ..., D, with Gaussian noise η_a(n) ∼ N(0, 1). We take the error matrices diagonal: R_m(l, l') = R_m δ_{ll'} and R_f(a, b) = R_f δ_{ab}. With these conditions the action takes the familiar form

A_0(X) = (R_m/2) Σ_{n=0}^{m} Σ_{l=1}^{L} [x_l(n) − y_l(n)]² + (R_f/2) Σ_{n=0}^{m−1} Σ_{a=1}^{D} [x_a(n+1) − f_a(x(n))]².  (1)

Expected values along the path are given by the ratio

⟨G(X)⟩ = ∫ dX G(X) exp[−A_0(X)] / ∫ dX exp[−A_0(X)].  (2)

One interesting function G(X) is X itself. Its expected value gives us the average path over the measurement window [t_0, t_m = T]. Estimates of the parameters and P(x(T)) permit prediction for t > T.
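As a concrete illustration, the action of Eq. (1) is straightforward to evaluate numerically. The sketch below is a minimal Python version assuming scalar R_m and R_f, observations of the first L state components, and a one-step model map f; the function name and array layout are our own illustrative choices, not part of the formalism above.

```python
import numpy as np

def action(X, Y, f, Rm, Rf):
    """A_0(X) of Eq. (1): Gaussian measurement term plus Gaussian model term.

    X  : (m+1, D) candidate path through state space.
    Y  : (m+1, L) observations of the first L components at each t_n.
    f  : one-step discrete-time model map, x(n+1) = f(x(n)).
    """
    L = Y.shape[1]
    measurement = 0.5 * Rm * np.sum((X[:, :L] - Y) ** 2)
    model = 0.5 * Rf * np.sum((X[1:] - f(X[:-1])) ** 2)
    return measurement + model
```

An optimizer minimizing this function over the flattened path X performs the variational (weak 4DVar) estimation discussed in the text.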

To approximate the integral defining ⟨G(X)⟩ we follow the stationary-path method of Laplace (Laplace, 1774) and seek extremum paths X_q, labeled by q = 0, 1, ..., satisfying ∂A_0(X)/∂X_α = 0. The index α = {a, n} labels both state component and time. Paths are sorted by increasing action level A_0(X_q): A_0(X_0) ≤ A_0(X_1) ≤ ···.

2 Evaluating ⟨G(X)⟩
We take the usual data assimilation technique (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007) several steps further by (1) showing how to find a consistent global minimum of the action using an annealing method, (2) demonstrating the importance of the number of measurements at each observation time, and (3) explaining how to make systematic perturbative corrections to the value of G(X_0).
In the neighborhood of X_q, we expand A_0(X) as

A_0(X) ≈ A_0(X_q) + (1/2) (X − X_q)_α [∂²A_0(X_q)/∂X_α ∂X_β] (X − X_q)_β + higher-order terms.  (3)

Changing variables to U_α = γ_αβ(X_q)(X − X_q)_β, with γ(X_q) chosen so the quadratic form in the exponent becomes U², leads to a contribution to the numerator of ⟨G(X)⟩ in Eq. (2) from each X_q of

[exp(−A_0(X_q)) / det γ(X_q)] ∫ dU exp[−U² − V(X_q)] [G(X_q) + W(X_q)],  (4)

Back Close
where V(X_q) collects the terms of third and higher order in U from the expansion of the action, and W(X_q) the corresponding higher-order terms in the expansion of G(X) about G(X_q). In the denominator of Eq. (2), G(X) is replaced by unity. We sum over the contribution of each X_q to evaluate ⟨G(X)⟩.
If the lowest action level A_0(X_0) is much smaller than all others, it totally dominates the integral, and ⟨G(X)⟩ ≈ G(X_0) plus exponentially small corrections.
3 Annealing to find a consistent global minimum action level A_0(X_0)

We now turn to an annealing method to find the path X_0, and within this method examine the importance of the number L of measurements at each observation time t_n. We then present an argument that in the integral Eq. (4) the terms V(X_q), W(X_q) behave as inverse powers of R_f as R_f/R_m becomes large. This would leave us with ⟨G(X)⟩ ≈ G(X_0), with corrections which are a power series in R_f^{−1}. The implication is that all statistical data assimilation questions, as R_f/R_m becomes large, can be approximated by the contribution of the path X_0 with the lowest action level A_0(X_0), with corrections one can evaluate via standard perturbation theory. Variations about G(X_0) would be small and computable. The term P(x(0)) in the action is often written assuming Gaussian variation about some base state x_base, so −log[P(x(0))] ∝ (x(0) − x_base)² R_base/2, and this has the form of the measurement term evaluated at n = 0. We incorporate this expression into the term with coefficient R_m in the action and no longer display it.
The annealing method is based on the observation (Quinn, 2010) that the equation for the extrema X_q simplifies at R_f = 0. The solution is x_l(n) = y_l(n); l = 1, 2, ..., L at all observation times. The other (D − L) components of the model state vector are undetermined, and the solution is degenerate. As we increase R_f, the action levels split, and depending on R_m, R_f, L, and the precise form of the dynamical vector field f(x), there will be 1, 2, ... minima of A_0(X).
As R_f/R_m grows large, the errors in the model diminish relative to the measurement errors and impose the model dynamics x(n+1) ≈ f(x(n)) with high precision. The noise in the measurements has been taken to be Gaussian N(0, σ²), so the measurement error term in the action follows a χ² distribution with mean R_m σ² (m+1)L/2 and standard deviation R_m σ² √((m+1)L/2).
For large R_f the lowest action level therefore approximately approaches

A_0(X_0) ≈ R_m σ² (m+1)L/2,  (6)

where σ² is the noise level in the y_l(n) and (m+1) is the number of measurement times t_n. This provides a consistency condition on the identification of the path X_0 by giving a consistent global minimum action value A_0(X_0). If the action levels revealed by our annealing procedure do not satisfy this, it is a sign that the data are inconsistent with the model.

Annealing procedure
The annealing process proceeds as follows. With a very small initial R_f, call it R_f0, solve the (m+1)D-dimensional search problem with an optimization algorithm that seeks minima of A_0(X). Start the search with a set of trial paths whose components are selected from a uniform distribution within limits suggested by examining the time series generated by the model x → f(x) (or any other selection process for the initial guess). This will generate a collection of approximate paths X_q. Increase R_f by a small increment (we choose R_f = {R_f0 2^β}, where β = 0, 1, ...), and, using the paths found for the previous R_f as initial guesses, find a new set of approximate X_q. Continue this process until the lowest action level path X_0 produces an A_0(X_0) near Eq. (6). If the action level A_0(X_0) is substantially less than the action level on the next path, A_0(X_0) ≪ A_0(X_1), all expected values ⟨G(X)⟩ are given by X_0 and corrections due to fluctuations about that path. The contribution to the expected values of the path X_1 with the next action level is exponentially small, of order exp[−(A_0(X_1) − A_0(X_0))].
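The loop above can be sketched compactly in Python. This is a minimal sketch, not our production code: it uses SciPy's L-BFGS-B minimizer with numerical gradients in place of a BFGS routine with analytic gradients, and the function names and defaults are illustrative. `A0(X, Rf)` is any callable returning the action for a flattened path X at model precision Rf.

```python
import numpy as np
from scipy.optimize import minimize

def anneal(A0, X_inits, Rf0=1e-4, n_beta=20):
    """Track local minima of A0 as Rf = Rf0 * 2**beta, beta = 0, 1, ...

    Each path found at one value of Rf seeds the search at the next, larger
    Rf. Returns paths and action levels at the final Rf, sorted so that
    A0(X_0) <= A0(X_1) <= ... as in the text.
    """
    paths = [np.asarray(x, dtype=float) for x in X_inits]
    for beta in range(n_beta):
        Rf = Rf0 * 2.0 ** beta
        paths = [minimize(A0, p, args=(Rf,), method="L-BFGS-B").x
                 for p in paths]
    levels = np.array([A0(p, Rf) for p in paths])
    order = np.argsort(levels)
    return [paths[i] for i in order], levels[order]
```

The sorted action levels returned here are what is plotted against β to check the consistency condition Eq. (6).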
The annealing procedure discussed above is different from standard simulated annealing (Aguiar e Oliviera et al., 2012). We call this annealing because varying R_f is similar to varying a temperature in a statistical physics problem, with R_f inversely proportional to the temperature. At high temperatures (small R_f), the dynamics among the degrees of freedom x_a(n) is essentially irrelevant, and we have a universal solution in which the observed degrees of freedom match the observations. As the temperature is decreased, the dynamics plays a more and more significant role, and allowed paths "freeze out." Action levels play the role of energy levels in statistical physics, and as the action is positive definite, the paths are directly analogous to instantons in the Euclidean field theory represented by A_0(X) (Zinn-Justin, 2002).

We now present the results of a set of calculations on the Lorenz96 model (Lorenz, 2006), used frequently in geophysical data assimilation discussions as a testbed for proposed methods. The model has D degrees of freedom x_a(t) satisfying the differential equation ẋ_a(t) = x_{a−1}(t)(x_{a+1}(t) − x_{a−2}(t)) − x_a(t) + f, with the index a taken cyclically. We set the forcing f = 8.17, for which the x_a(t) are chaotic (Kostuk, 2012). We studied D = 20 and added f as an additional degree of freedom satisfying ḟ = 0.
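The Lorenz96 vector field is a one-liner with cyclic indexing; the sketch below is an illustrative Python version (the naming is ours) using `np.roll` to implement the cyclic neighbors.

```python
import numpy as np

def lorenz96(x, f):
    """Lorenz96 tendencies: dx_a/dt = x_{a-1} (x_{a+1} - x_{a-2}) - x_a + f,
    with the index a taken cyclically over the D components of x."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + f
```

At the constant state x_a = f every tendency vanishes, which provides a quick sanity check of the roll convention.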
We performed a twin experiment in which we solved these equations with an arbitrary choice of initial conditions using a fourth-order Runge-Kutta solver with ∆t = 0.025 over 160 steps in time. Here t_0 = 0 and t_m = T = 4. We then added iid Gaussian noise with zero mean and variance σ² = 0.25 to each time series. L = 1, 2, ... of these data time series were then represented in the action at each measurement time t_n during our annealing procedure.
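A compact sketch of this twin-experiment data generation, assuming the Lorenz96 form above with a scalar forcing; the RK4 stepper is standard, while the initial-condition range and random seed are our own illustrative choices.

```python
import numpy as np

def rk4_step(deriv, x, dt):
    """One fourth-order Runge-Kutta step for dx/dt = deriv(x)."""
    k1 = deriv(x)
    k2 = deriv(x + 0.5 * dt * k1)
    k3 = deriv(x + 0.5 * dt * k2)
    k4 = deriv(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def twin_data(D=20, f=8.17, dt=0.025, steps=160, sigma2=0.25, seed=0):
    """Integrate Lorenz96 over [0, T = steps*dt], then add iid N(0, sigma2)
    noise to every component time series.

    Returns (clean, noisy) arrays of shape (steps+1, D)."""
    rng = np.random.default_rng(seed)
    deriv = lambda x: np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + f
    x = rng.uniform(-5.0, 5.0, D)          # arbitrary initial condition
    path = [x]
    for _ in range(steps):
        x = rk4_step(deriv, x, dt)
        path.append(x)
    clean = np.array(path)
    noisy = clean + rng.normal(0.0, np.sqrt(sigma2), clean.shape)
    return clean, noisy
```

The noisy series play the role of the observations y_l(t_n); any L of the D columns can be presented to the action as data.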
In the action we selected R_m = 4, the inverse variance of the noise added to the data in our twin experiment, so the minimum action level we expect is 80.5L ± 8.97√L. The paths are (m+1)(D+1) = 3381-dimensional. Our search for minimum paths used a BFGS routine (Press et al., 2012) to which we provided an analytical form of the gradient of A_0(X). The search was initialized 100 times with initial paths drawn from a uniform distribution on the interval [−10, 10].
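The expected floor and its spread follow directly from the χ² statistics of the measurement term; the small helper below (our own naming, not the authors' code) reproduces the 80.5L ± 8.97√L quoted above.

```python
import numpy as np

def action_floor(Rm, sigma2, n_times, L):
    """Mean and standard deviation of the chi^2-distributed measurement term:
    mean = Rm*sigma2*n_times*L/2, std = Rm*sigma2*sqrt(n_times*L/2),
    where n_times = m + 1 is the number of measurement times."""
    mean = Rm * sigma2 * n_times * L / 2.0
    std = Rm * sigma2 * np.sqrt(n_times * L / 2.0)
    return mean, std
```

With R_m = 4, σ² = 0.25, and m + 1 = 161 this gives 80.5L and 8.97√L, the consistency target of Eq. (6).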
In Fig. 1 we display the computed action levels for L = 5, 7, 8, and 9. For L = 5 there are many close action levels associated with the extremum paths of the action Eq. (1), and we would have to sum over the contributions of many paths X_q to evaluate the expected value ⟨G(X)⟩. At L = 7 or higher, X_0 dominates the integral. Our estimate for the forcing parameter, set to 8.17, was 8.22 at large β.
It is important to note that if we begin our search for the extremum states X_q at large values of R_f, we are almost sure to miss the actual path X_0 which gives the lowest action level, since the Hessian matrix of A_0 is ill-conditioned when R_f is large; see Fig. 4.6 in Quinn (2010).
The real test of a data assimilation estimation procedure is not accuracy in the estimation, but accuracy in prediction beyond the observation window. As this is a twin experiment, we show in Fig. 2 the data, the estimated state variable, and the predicted state variable for an observed variable x_3(t) and for an unobserved variable x_12(t) at L = 8. In a real experiment, we could not compare our estimates for the parameters or the unobserved state variables to known values. Although the estimation along the path X_0 with the minimum action value is rather good, in estimating 12 unobserved states and one parameter there are, of course, errors in our knowledge of the full state x(t = 4). The predictions lose their accuracy in time because of the chaotic nature of the trajectories at f = 8.17.
To see how well our procedure works for several unknown parameters, we introduced 10 different forcing parameters f_a into the Lorenz96 model at D = 10: ẋ_a(t) = x_{a−1}(t)(x_{a+1}(t) − x_{a−2}(t)) − x_a(t) + f_a. For D = 10, the lowest action level stands out from the rest at L = 4. In Table 1 we show our estimates for the ten forcing parameters for L = 4, 5, and 6, as well as the actual values used in the calculation of the data. In these estimates, and for the single forcing parameter reported above for D = 20, there is a known source of bias (Kostuk et al., 2012). As one can see in the examples, it is small here.
To give some sense of what one might expect if the model were totally wrong, we presented data from a collection of 1963 Lorenz model oscillators (Lorenz, 1963) to a Lorenz96 D = 10 model. Twelve time series of data are generated by four individual Lorenz63 systems with different initial conditions. Gaussian white noise with zero mean and standard deviation σ = 0.5 is added to each time series. All these "data" y_l(t) are rescaled to lie within [−10, 10].
We then place these signals as "data" in the action, with the model taken as Lorenz96 with D = 10; the single forcing parameter is treated as a time-dependent state variable obeying ḟ = 0. We use L = 5 and 6 measurements, taking the data time series in the order y_1(t), y_3(t), y_5(t), y_7(t), y_9(t), y_2(t), y_4(t) for the two cases. In Fig. 3 we display the action levels associated with this for L = 5 and 6. Results for other values of L are consistent with these. This example provides a graphic illustration of the inconsistency of the data and the model, and of how this makes its appearance in the annealing procedure.
We have now seen that a consistent global minimum action level can be identified via an annealing process, and the dependence of the action levels on the number of measurements has been demonstrated. We use "consistent" as we have no formal proof it is a global minimum, only the consistency condition Eq. (6).

Corrections to the contribution of the extremum path to ⟨G(X)⟩
We turn back to the evaluation of the path integral for ⟨G(X)⟩. In that integral the terms V(X) and W(X) collect contributions of order U^r with r ≥ 3; the functions h(X) and g(X) appearing in their coefficients are derivable from the form of A_0(X). These are to be evaluated at X = X_q for the qth extremum path.
In the form of the integral Eq. (4) we see that the term in the exponential of order r in U has magnitude of order R_f/(R_m + R_f h(X_q))^{r/2} for r ≥ 3. Similarly, in the expansion of G(X), the term of order k has a denominator of order 1/(R_m + R_f h(X_q))^{k/2} for k ≥ 1. Taking into account that only terms with even powers of U are nonzero in the integral because of symmetry, we have a collection of terms which, as R_f becomes large with respect to R_m, decrease as 1/R_f or faster. As a rule of thumb, in the calculations we presented for Lorenz96 D = 20, we see that for β = 15 or larger, or R_f/R_m ≈ 100 or larger, the terms in the path integral beyond the quadratic term in the expansion of the action become small. A stronger estimate would come from evaluating the leading term in 1/R_f.

In summary, we have examined the path integral formulation of data assimilation and asked how well a variational approximation, which seeks Laplace-like estimates of the path integral through extremum paths of the action, works to evaluate the expected values of functions on the path. This approximation is widely used in meteorology and other fields, where it is known as weak 4DVar (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007). The action is the cost function which is minimized.
In previous work with variational principles for data assimilation, we are unaware of any procedure such as our annealing method in R_f to identify a consistent global minimum action. Nor are we aware of a systematic exploration of the dependence of the action levels on the number of measurements at each observation time. (Of course, there is the discussion in Quinn, 2010, which generated this work.) Finally, we do not know of a discussion of the corrections to the variational approximation, which here is shown to consist of small perturbations when the precision of the model error term in the action is increased, namely as R_f becomes large.

Conclusions
We have worked within a framework where the measurement errors and the model errors in the data assimilation are Gaussian, with the inverse covariance of the model errors taken to be of order R_f. We have shown that the path which gives a consistent global minimum action can be traced by an annealing procedure starting with a setting where the dynamical model essentially plays no role, R_f ≈ 0, then systematically increasing the influence of the model dynamics. As R_f reaches about 100 times R_m, the path X_0 associated with a consistent global minimum action level dominates the path integral, with corrections of order 1/R_f. The important role of the number of measurements L at each measurement time is also demonstrated.
In this paper we do not address the typical situation where the number of measurements actually available is less than that needed to allow the ground level of the action to lie well below the next level. For a solution to that problem we have used information from the waveform of the measurements, as shown in some detail in Rey et al. (2014).
The results here justify the use of the variational approximation in data assimilation, focus attention on the number of measurements one requires for accuracy in that approximation, and permit the evaluation of systematic corrections to the approximation. We anticipate that our use of Gaussian model error and measurement error is a convenience and that other distributions of these errors will permit the same set of statements about the value of variational approximations to statistical data assimilation.

Figure 1 (caption fragment): as L increases, the lowest action level visibly separates from the others. At the bottom of each panel, we indicate the lowest action level value and its standard deviation. The next-lowest action level A_0(X_1), for L = 5, 7, 8, and 9, is at 403.8, 749.8, 1161.6, and 2256.1, respectively; for L = 5 we would have to sum over the contributions of many paths.

Figure 3 (caption fragment): data from Lorenz63 oscillators presented to a Lorenz96 D = 10 model.