Improved Variational Methods in Statistical Data Assimilation

Data assimilation transfers information from an observed system to a physically based model system with state variables x(t). The observations are typically noisy, the model has errors, and the initial state x(t_0) is uncertain: the data assimilation is statistical. One can ask about expected values of functions G(X) on the path X = {x(t_0), ..., x(t_m)} of the model state through the observation window {t_0, ..., t_m}. The conditional (on the measurements) probability distribution P(X) ∝ exp[−A_0(X)] determines these expected values. Variational methods using saddle points of the "action" A_0(X), known as 4DVar (Talagrand and Courtier, 1987; Evensen, 2009), are utilized for estimating ⟨G(X)⟩. In a path integral formulation of statistical data assimilation, we consider variational approximations in a realization of the action where measurement errors and model errors are Gaussian. We (a) discuss an annealing method for locating the path X_0 giving a consistent minimum of the action A_0(X_0), (b) consider the explicit role of the number of measurements at each t_n in determining A_0(X_0), and (c) identify a parameter regime for the scale of model errors which allows X_0 to give a precise estimate of G(X_0) with computable, small higher-order corrections.


Introduction
In a broad spectrum of scientific fields, transferring the information contained in L observed data time series y_l(t_n); l = 1, ..., L; n = 0, ..., m to a physically based model of the processes producing those observations allows the estimation of unknown parameters and unobserved states of the model within an observation window {t_0, ..., t_m}. As a sample of these fields, we note applications in meteorology (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007), geochemistry (Elbern and Schmidt, 1999), fluid dynamics (Zadeh, 2008), and plasma physics (Mechhoud et al., 2013), among many others.
The conditional probability distribution P(X) ∝ exp[−A_0(X)] allows evaluation of expected values of physically interesting functions G(X) along the path. P(X) is conditioned on the measurements y_l(t_n); l = 1, ..., L; n = 0, ..., m. These are L-dimensional, while the model state is D-dimensional; it is usually the case that D ≫ L. Data assimilation seeks to use the information in the y(t_n) to estimate unknown parameters in the model and unobserved states of the model. If this is accomplished, one uses prediction for t > t_m to examine the validity of the model.
The action A_0(X) contains terms giving a measurement's influence on P(X), terms propagating the model between the measurement times t_n, and a term −log[P(x(0))] representing the uncertainty in the initial state (Abarbanel, 2013). We discuss the familiar case where additive measurement errors are independent at each t_n and Gaussian with covariance R_m^{-1}(l, l′, t); l, l′ = 1, ..., L. The model is a physical differential equation, discretized in space and time and satisfying the D-dimensional stochastic discrete-time map x_a(n+1) = f_a(x(n)) + R_f^{-1/2}(a, b) η_b(n); a, b = 1, ..., D, with iid Gaussian noise η_a(n) ∼ N(0, 1). We take R_m(l, l′, n) = R_m(n) δ_{l,l′} and R_f(a, b) = R_f δ_{a,b}. R_m(n) is zero except near observation times t_n.
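The stochastic discrete-time map above is straightforward to simulate. A minimal sketch follows; the linear map f(x) = 0.9x is a hypothetical stand-in for a discretized physical model, not a model used in the text:

```python
import numpy as np

def stochastic_step(f, x, R_f, rng):
    """One step of the stochastic map x(n+1) = f(x(n)) + R_f^{-1/2} eta(n),
    with eta ~ N(0, I): larger R_f means smaller model error."""
    return f(x) + rng.normal(size=x.shape) / np.sqrt(R_f)

# Hypothetical linear model; with R_f = 1e4 the dynamics is followed
# to within noise of standard deviation 0.01 per component.
rng = np.random.default_rng(0)
x = np.ones(3)
x_next = stochastic_step(lambda x: 0.9 * x, x, R_f=1.0e4, rng=rng)
```

In the limit R_f → ∞ the map becomes the deterministic model x(n+1) = f(x(n)), which is the regime the annealing procedure below approaches.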
Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.

J. Ye et al.: Variational approximations in statistical data assimilation
With these conditions, the action takes a familiar form:

A_0(X) = Σ_{n=0}^{m} Σ_{l=1}^{L} [R_m(n)/2] [x_l(n) − y_l(n)]^2 + Σ_{n=0}^{m−1} Σ_{a=1}^{D} (R_f/2) [x_a(n+1) − f_a(x(n))]^2 − log[P(x(0))].   (1)

The conditional expected value ⟨G(X)⟩ of a function G(X) on the path X = {x(t_0), ..., x(t_m)} is given as

⟨G(X)⟩ = ∫ dX G(X) exp[−A_0(X)] / ∫ dX exp[−A_0(X)].   (2)

One interesting function G(X) is X itself, whose expected value gives us the average path over the measurement window [t_0, t_m]. Estimates of the parameters and P(x(t_m)) permit prediction for t > t_m; in this window, one compares observations and predictions using model output with a user-selected metric. No information is passed to the model in the prediction window. Importantly, good estimation of observed state variables (a "good fit") is not sufficient to produce confidence in the quality of the model; good prediction is critical.
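For the diagonal Gaussian case, the action Eq. (1) reduces to two sums of squares. A sketch of its evaluation, assuming for illustration a constant R_m, observed variables taken as the first L components of the state, and the −log P(x(0)) term folded into the n = 0 measurement term:

```python
import numpy as np

def action(X, y, fmodel, R_m, R_f):
    """Gaussian action A_0(X) of Eq. (1).
    X: (m+1, D) model path; y: (m+1, L) data, with the observed variables
    taken (for illustration) as the first L components of the state.
    The -log P(x(0)) prior is assumed folded into the n = 0 measurement term."""
    L = y.shape[1]
    measurement = 0.5 * R_m * np.sum((X[:, :L] - y) ** 2)       # data mismatch
    model = 0.5 * R_f * np.sum((X[1:] - fmodel(X[:-1])) ** 2)   # dynamics mismatch
    return measurement + model
```

A path that matches the data exactly and obeys the deterministic dynamics x(n+1) = f(x(n)) gives A_0 = 0; any mismatch raises the action by the corresponding quadratic penalty.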
To approximate the integral ⟨G(X)⟩, we follow the stationary path method of Laplace (Laplace, 1774) and seek saddle points in path space, X_q, labeled by q = 0, 1, ..., satisfying ∂A_0(X)/∂X_α = 0. The index α is a label for the state and time: α = {a, n}. To determine their importance in evaluating the integral Eq. (2), paths are sorted by increasing action level: A_0(X_0) ≤ A_0(X_1) ≤ ⋯.

Evaluating ⟨G(X)⟩
We take the usual data assimilation technique (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007) several steps further by

1. showing how to find a consistent path X_0 for the minimum action level using an annealing method,

2. demonstrating the importance of the number of measurements L at each observation time t_n, and

3. explaining how to make systematic perturbation corrections to ⟨G(X_0)⟩, i.e., G(X) evaluated on the minimum action level path.
For nonlinear problems of interest, there may be many paths X_q; q = 0, 1, ... satisfying the saddle point condition. To assess their contributions to ⟨G(X)⟩, we expand A_0(X) in the neighborhood of each X_q as (note that all variables are notated in Table 2)

A_0(X) = A_0(X_q) + (1/2) γ^{(2)}_{α1 α2}(X_q) (X − X_q)_{α1} (X − X_q)_{α2} + Σ_{r≥3} (1/r!) γ^{(r)}_{α1⋯αr}(X_q) (X − X_q)_{α1} ⋯ (X − X_q)_{αr},   (3)

where γ^{(r)}_{α1⋯αr}(X_q) = ∂^r A_0(X)/∂X_{α1} ⋯ ∂X_{αr}, evaluated at X = X_q; there is no linear term because X_q is a saddle point. There is an implied sum over all paired Greek indices α_j. A sum over all terms with comparable action level A_0(X_q) then gives an approximation to ⟨G(X)⟩.
Changing variables to U_α = γ_{αβ}(X_q) (X − X_q)_β leads to a contribution to the numerator of ⟨G(X)⟩ in Eq. (2) from each saddle path X_q proportional to exp[−A_0(X_q)] [G(X_q) + W(X_q)], Eq. (4), where W(X_q) comprises the fluctuation corrections about X_q. The contributions to the denominator of Eq. (2) are identical to Eq. (4), with the factor [G(X_q) + W(X_q)] replaced by unity. We sum over the contribution of each X_q to evaluate ⟨G(X)⟩.
If the lowest action level A_0(X_0) is much smaller than all others, then exp[−A_0(X_0)] ≫ exp[−A_0(X_{q≠0})], and its contribution to ⟨G(X)⟩ totally dominates the integral. We then have ⟨G(X)⟩ ≈ G(X_0) + W(X_0), plus exponentially small corrections from the action levels associated with the other saddle paths X_{q≠0}.
Annealing to find a consistent minimum action level A_0(X_0)

We now turn to an annealing method to find the path X_0 where exp[−A_0(X_0)] ≫ exp[−A_0(X_{q≠0})]. Within this method, we first examine the importance of the number L of measurements at each observation time t_n. We then present an argument that, in the integral Eq. (4), contributions to ⟨G(X)⟩ from the terms V(X_q), W(X_q) behave as inverse powers of R_f as R_f/R_m becomes large. This would leave us with ⟨G(X)⟩ ≈ G(X_0), with corrections that form a power series in R_f^{-1}. The implication is that all statistical data assimilation questions, as R_f/R_m becomes large, can be well approximated by the contribution of the path X_0 with the lowest action level A_0(X_0), along with corrections one can evaluate via standard perturbation theory. Variations about G(X_0) would be small and computable.
The term in the action expressing uncertainty in the initial model state x(0), P(x(0)), is often written assuming Gaussian variation about some base state x_base, so −log[P(x(0))] ∝ (x(0) − x_base)^2 R_base/2, and this has the form of the measurement term evaluated at n = 0. We incorporate this expression into the term with coefficient R_m in the action and no longer display it. This term, often called a prior, also appears as the initial condition for the time evolution of the conditional probability distribution P(X).

Annealing details
The annealing method starts with the observation (Quinn, 2010) that the equation for the saddle points X_q simplifies at R_f = 0. The action at R_f = 0 has no information about the model, and relies solely on the measurements. The saddle point solution is x_l(n) = y_l(n); l = 1, 2, ..., L at all observation times. The other (D − L) components of the model state vector are undetermined, and the solution is quite degenerate. As we increase R_f, the action levels split, and, depending on R_m, R_f, L, and the precise form of the dynamical vector field f(x), there will be 1, 2, ... saddle points of A_0(X). The saddle points of interest are local minima.
For any R_f > 0, the search for saddle points of the action requires solving an unconstrained numerical optimization problem: minimize A_0(X). This is 4DVar in its "weak" formulation (Talagrand and Courtier, 1987; Evensen, 2009). The methods we have utilized for performing this numerical optimization range from Newton and quasi-Newton methods (Press et al., 2012) to more sophisticated interior point methods (Wächter, 2002). Each search for saddle paths requires an initial guess X^{(0)}, which the algorithm iterates to produce X^{(1)}, then X^{(2)}, and so on to a final path X^{(final)} for each value of R_f.
We begin the annealing process by choosing the initial R_f, denoted R_f0, to be very small but nonzero; indeed, if R_f0 were chosen to vanish, the set of optimal paths would be all paths whose measured components match the data, and whose unmeasured components are entirely unconstrained. Such a solution is infinitely degenerate and gives no more insight than the data themselves. Therefore, we begin the search at some R_f0 ≪ 1, with the first L(m+1) components of X^{(0)} taken as the observations y(t_n); n = 0, 1, ..., m. The remaining components of X^{(0)} are chosen from a uniform distribution reflecting the dynamical range of the unmeasured state variables.
Because the search algorithm is an iterative process with potentially many basins of attraction, it is not evident which minimum or saddle point this initial optimization will reach. Accordingly, we actually start N_I copies of this procedure, with independent initial choices X^{(0;r)}; r = 1, 2, ..., N_I at R_f0; in other words, we initiate many such annealing procedures in parallel. These N_I choices differ in the unmeasured components of the path, independently drawn from a uniform distribution over the range of each variable.
In the next step of the annealing process, the value of R_f is increased to R_f0 ξ, where ξ > 1. Using as our initial paths the final paths X^{(final;r)} from the previous optimization at R_f = R_f0, we perform the numerical optimization procedure once again, independently for each of the N_I paths. This results in a new set of N_I saddle paths X^{(final;r)}; r = 1, 2, ..., N_I. This process is repeated as many times as desired, increasing R_f by a factor of ξ from one iteration to the next, so that R_f = R_f0 ξ^β. We use the N_I final paths from each value of R_f as the initial paths for the next value of R_f. The output is a set of N_I paths and action values for each β = 0, 1, ..., β_max, and the action level for each of the N_I final paths is plotted for each R_f. Specific examples of this will be presented below.
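The annealing schedule just described can be sketched as follows. This is a toy illustration, not the authors' code: the model, dimensions, and scipy's BFGS optimizer are stand-ins for the optimizers cited in the text:

```python
import numpy as np
from scipy.optimize import minimize

def anneal(y, fmodel, D, R_m, R_f0=0.01, xi=2.0, beta_max=10, N_I=5, seed=0):
    """Track N_I candidate saddle paths while R_f = R_f0 * xi**beta grows.
    Returns an array of shape (beta_max+1, N_I) of action levels."""
    rng = np.random.default_rng(seed)
    m1, L = y.shape                       # m+1 observation times, L data series

    def action(flat, R_f):                # Gaussian action of Eq. (1)
        X = flat.reshape(m1, D)
        meas = 0.5 * R_m * np.sum((X[:, :L] - y) ** 2)
        model = 0.5 * R_f * np.sum((X[1:] - fmodel(X[:-1])) ** 2)
        return meas + model

    # Initial paths: measured components set to the data, unmeasured
    # components drawn uniformly over a guessed dynamical range.
    paths = []
    for _ in range(N_I):
        X0 = rng.uniform(-1.0, 1.0, size=(m1, D))
        X0[:, :L] = y
        paths.append(X0.ravel())

    levels, R_f = [], R_f0
    for beta in range(beta_max + 1):
        results = [minimize(action, p, args=(R_f,), method="BFGS") for p in paths]
        paths = [r.x for r in results]    # final paths seed the next, larger R_f
        levels.append([r.fun for r in results])
        R_f *= xi                         # anneal: increase R_f by a factor xi
    return np.array(levels)
```

Plotting the rows of the returned array against β reproduces the action-level diagrams discussed below: degenerate levels at small R_f that split as R_f grows.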
Each of these optimizations, for every r and every β, is a 4DVar calculation, for which we have used various algorithms (Press et al., 2012; Wächter, 2002). In a sense, we can think of this as an ensemble version of 4DVar, with the important aspect that we perform N_I 4DVar calculations at each β.
As R_f/R_m grows large, the errors in the model diminish relative to the measurement errors and impose x(n+1) ≈ f(x(n)) with high precision on the model dynamics. The noise in the measurements has been taken to be Gaussian N(0, σ^2), so the measurement error term in the action follows χ^2 statistics (NIST Handbook of Statistical Methods, 2012) with mean R_m σ^2 (m+1)L/2 and standard deviation R_m σ^2 √[(m+1)L/2]. We note that this observation was made by Bennett (2002) in the setting where the weak form of 4DVar, here using the action as the objective function, results in an accurate representation of the dynamics.
Using properties of the χ^2 distribution, the action level for large R_f approaches the lowest value

A_0(X_0) ≈ R_m σ^2 (m+1) L / 2,   (6)

where σ^2 is the noise level in the y_l(n) and (m+1) is the number of measurement times t_n. This provides a consistency condition on the identification of the path X_0 by giving a consistent minimum action value A_0(X_0). If the action levels revealed by our annealing procedure do not give this result for A_0(X_0), it is a sign that the data are inconsistent with the model. At the beginning of the annealing procedure, at R_f = R_f0, the action values were degenerate. As R_f increases, this degeneracy is broken and the action levels split. If the action level A_0(X_0) is substantially smaller than that on the next saddle path, A_0(X_0) ≪ A_0(X_1), all expected values ⟨G(X)⟩ are given by G(X_0), plus corrections due to fluctuations about that path. The contribution to the expected values of the path integral for ⟨G(X)⟩ from X_1, with the next action level, is exponentially smaller, of order exp[−(A_0(X_1) − A_0(X_0))].
The annealing procedure discussed above is different from standard simulated annealing (Aguiar e Oliveira et al., 2012). We call this annealing because varying R_f is similar to varying a temperature in a statistical physics problem, with R_f inversely proportional to the temperature. At high temperatures (small R_f), the dynamics among the degrees of freedom x_a(n) is essentially irrelevant, and we have a universal solution where the observed degrees of freedom match the observations. As the temperature is decreased, the dynamics plays a more and more significant role, and allowed paths "freeze" out. Action levels play the role of energy levels in statistical physics, and, as the action is positive definite, the paths are directly analogous to instantons in the Euclidean field theory represented by A_0(X) (Zinn-Justin, 2002).
The calculations used in the annealing process may seem formidable. We have chosen N_I = 100 in the examples presented below because we wish to make clear the scope of the calculations and their implications, rather than concentrating here on optimizing the algorithms; that will come with experience with different model structures. Nonetheless, given the manner in which the action levels space themselves and split as R_f is increased, it might prove less demanding to use a large N_I for the first few values of R_f, and then significantly decrease the number of paths. Our goal is to trace accurately just the lowest action value path X_0 and the next, X_1, to estimate the scale of the dominance of A_0(X_0) in performing the path integral.

Examples from the Lorenz96 model
We now present the results of a set of calculations on the Lorenz96 model (Lorenz, 2006), used frequently in geophysical data assimilation discussions as a test bed for proposed methods. The model has D degrees of freedom x_a(t); a = 1, ..., D, satisfying the differential equations

dx_a(t)/dt = x_{a−1}(t) [x_{a+1}(t) − x_{a−2}(t)] − x_a(t) + f,

with the indices cyclic in {1, ..., D}. We chose the forcing f = 8.17, for which the x_a(t) are chaotic (Kostuk, 2012). We studied D = 20 and added f as an additional degree of freedom satisfying df/dt = 0. The number of model equations in the action is therefore D + 1.
We performed a twin experiment in which we solved these equations, with no dynamical noise (ζ_a(t) = 0) and with an arbitrary choice of initial conditions x(0), using a fourth-order Runge–Kutta solver with Δt = 0.025 over 160 steps in time. Here, t_0 = 0 and t_m = T = 4. We then added iid Gaussian noise with zero mean and variance σ^2 = 0.25 to each time series. L = 1, 2, ... of these data series were then represented in the action at each measurement time t_n during our annealing procedure. In our calculations, R_f0 = 0.01 and ξ = 2.
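The twin-experiment data generation can be sketched as follows, using the standard Lorenz96 vector field with cyclic indices; the particular initial condition and noise seed here are arbitrary choices of ours:

```python
import numpy as np

def lorenz96(x, f):
    """Lorenz96 vector field dx_a/dt = x_{a-1}(x_{a+1} - x_{a-2}) - x_a + f,
    with indices cyclic in a = 1..D (np.roll implements the wraparound)."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + f

def rk4_trajectory(x0, f, dt=0.025, steps=160):
    """Fourth-order Runge-Kutta integration, as in the twin experiment."""
    xs = [np.asarray(x0, float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = lorenz96(x, f)
        k2 = lorenz96(x + 0.5 * dt * k1, f)
        k3 = lorenz96(x + 0.5 * dt * k2, f)
        k4 = lorenz96(x + dt * k3, f)
        xs.append(x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(xs)

# Twin-experiment "data": integrate D = 20 Lorenz96 at f = 8.17,
# then add iid Gaussian noise N(0, sigma^2 = 0.25), i.e. scale 0.5.
rng = np.random.default_rng(0)
truth = rk4_trajectory(rng.uniform(-1.0, 1.0, size=20), f=8.17)
data = truth + rng.normal(scale=0.5, size=truth.shape)
```

Selecting the first L rows of `data` at each time then plays the role of the y_l(t_n) in the action.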
In the action, we selected R_m = 4, the inverse variance of the noise added to the data in our twin experiment, so the minimum action level we expect from Eq. (6) is 80.5 L ± 8.97 √L. The paths are (m+1)(D+1) = 161 × 21 = 3381-dimensional. Our search for minimum paths used a BFGS routine (Press et al., 2012), to which we provided an analytical form of the gradient of A_0(X). The search was initialized N_I = 100 times, with initial paths drawn from a uniform distribution of values in the interval [−10, 10].
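The expected floor and its spread follow directly from the χ^2 statistics of the measurement term; a quick check of the numbers quoted above:

```python
import math

def expected_action_floor(R_m, sigma2, m_plus_1, L):
    """Large-R_f floor of the action from chi-squared statistics of the
    measurement error term: mean R_m*sigma2*(m+1)*L/2 and
    standard deviation R_m*sigma2*sqrt((m+1)*L/2)."""
    mean = R_m * sigma2 * m_plus_1 * L / 2.0
    std = R_m * sigma2 * math.sqrt(m_plus_1 * L / 2.0)
    return mean, std

# D = 20 twin experiment: R_m = 4, sigma^2 = 0.25, 161 measurement times.
# Per data series this gives 80.5*L with spread 8.97*sqrt(L).
mean, std = expected_action_floor(R_m=4.0, sigma2=0.25, m_plus_1=161, L=1)
```

Comparing the annealed action levels against this floor is the consistency test used throughout the examples.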
We also investigated using the IPOPT public-domain numerical optimization package (Wächter, 2002) and found that it also worked well, giving essentially the same results as BFGS. Any existing method for 4DVar should work equally well within this annealing procedure.
In Fig. 1, we display the computed action levels for L = 5, 7, 8, and 9 at each value of log R_f ∝ β. For L = 5, there are many close action levels associated with the extremum paths of the action Eq. (1); as L increases, the lowest action level visibly separates from the others. At the bottom of each panel, we indicate the lowest action level value and its standard deviation. The next-lowest action level A_0(X_1), for L = 5, 7, 8, and 9, is at 403.8, 749.8, 1161.6, and 2256.1, respectively. This means that for L = 5 we would have to sum over the contributions of many paths X_q to evaluate the expected value ⟨G(X)⟩. At L = 7 or higher, X_0 dominates the integral. Our estimate for the forcing parameter, set to 8.17, was 8.22 at large β.
We think it important to note that had we begun our search for the saddle paths X_q at large values of R_f, we would almost surely have missed the actual path X_0 giving the lowest action level, since the Hessian matrix of A_0 is ill conditioned when R_f is large; see Fig. 4.6 in Quinn (2010).
Another sense of why beginning a search at large values of R_f may fail to find the action level identified in the annealing approach is that the basin of attraction of the minimum action level is likely to be so small, relative to the size of path space, that "falling into it" through an arbitrary initial path used in any version of a variational procedure is unlikely. The annealing method systematically tracks the known minimum from very small R_f and, in that manner, starts in and remains in the appropriate basin of attraction.
The real test of an estimation procedure in data assimilation is not accuracy in the estimation of measured states and fixed parameters, but accuracy in prediction beyond the observation window. The predictions require accurate estimation of the unobserved model state variables at the end of the observation window. Indeed, one can achieve good "fits" of observed variables that lead to inaccurate predictions for t > t_m = T (Abarbanel, 2013).
As this is a twin experiment, we show in Fig. 2 the data, the estimated state variable, and the predicted state variable for an observed variable x_3(t) and for an unobserved variable x_12(t) at L = 8. In a real experiment, we could not make such comparisons for the parameters or the unobserved state variables. Although the estimation procedure for the path X_0 with the minimum action value is rather good, both in estimating the 12 unobserved states and the one parameter, errors nevertheless remain in our knowledge of the full state at the end of the observation window, x(t = 4). The predictions thus lose their accuracy in time because of the chaotic nature of the trajectories at f = 8.17.
In the Lorenz96 equations, one usually has a single forcing parameter f. To see how well our procedure works for several unknown parameters, we introduced ten different forcing parameters f_a into the Lorenz96 model at D = 10:

dx_a(t)/dt = x_{a−1}(t) [x_{a+1}(t) − x_{a−2}(t)] − x_a(t) + f_a;

that is, with no fluctuations in the dynamics. Noise was added to the solutions of the Lorenz96 system before the x_a(t) were used as data. For D = 10, the lowest action level stands out from the rest at L = 4. In Table 1, we show our estimates of the ten forcing parameters for L = 4, 5, and 6, as well as the actual values used in generating the data. In these estimates, as for the single forcing parameter reported above for D = 20, there is a known source of bias (Kostuk et al., 2012). As one can see in the examples, it is small here.
In the twin experiments, we presented noisy data from the known model as L = 1, 2, ... data series of observations. To examine what happens when the data are inconsistent with the model, we generated data from four Lorenz63 (Lorenz, 1963) systems with different initial conditions. Gaussian white noise with zero mean and standard deviation σ = 0.5 was added to each time series. All these "data" y_l(t) were rescaled to lie within [−10, 10].
We then place these signals as "data" in the action, with the model taken as Lorenz96 at D = 10; the single forcing parameter is treated as a time-dependent state variable obeying df/dt = 0. We use L = 5 and L = 6 measurements, taking the data time series in the order y_1(t), y_3(t), y_5(t), y_7(t), y_9(t) for L = 5, and y_1(t), y_3(t), y_5(t), y_7(t), y_9(t), y_2(t) for L = 6. In Fig. 3 we display the associated action levels versus β ∝ log[R_f/R_f0] for L = 5 and L = 6. Results for other values of L are consistent with these. This example provides a graphic illustration of the inconsistency of the data and the model, and of how this inconsistency makes its appearance in the annealing procedure.
We also investigated, but do not display here, the action levels when the parameter f is held at a value in the model different from that used for generating the data. In particular, we generated the data with f = 8.17 and then held f = 18 in evaluating the action levels for a Lorenz96 model with D = 5 and L = 1 and 2. In each case, no minimum action level split off from the collection of X_q, and no level was close to the consistent χ^2 condition discussed above. This emphasizes the importance of allowing the variational method to include all parameters as state variables with a zero vector field, allowing the estimation of the parameters along with the unmeasured state variables.
We have now seen that a consistent smallest action level can be identified via an annealing process, and the dependence of the action levels on the number of measurements, L, has been demonstrated. We have no formal proof that the path X_0 gives a global minimum of the action; our criteria of consistency with Eq. (6) and excellent predictions after the observation window are functionally useful features of the procedure.

Corrections to the contribution of a saddle path to ⟨G(X)⟩
We turn back to the evaluation of the path integral for ⟨G(X)⟩ in Eq. (2). In that integral, for our action, the second-order expansion coefficient has the structure γ^{(2)}_{αβ}(X) = R_m δ_{αβ} + R_f h_{αβ}(X), and the coefficients of order r ≥ 3 have the structure γ^{(r)}(X) = R_f g^{(r)}(X). The functions h(X) and g(X) are derivable from the form of A_0(X). These are to be evaluated at X = X_q for the qth saddle path.
In the form of the integral Eq. (4), we see that the term in the exponential of order r in U has a magnitude of about R_f/(R_m + R_f h(X_q))^{r/2} for r ≥ 3. Similarly, in the expansion of G(X), the term of order k carries a factor of order 1/(R_m + R_f h(X_q))^{k/2} for k ≥ 1. Since, by symmetry, only terms with even powers of U are nonzero in the integral, we have a collection of terms that, as R_f becomes large with respect to R_m, decrease as 1/R_f or faster. As a rule of thumb, in the calculations we presented for Lorenz96 at D = 20, we see that for β = 15 or larger, i.e., R_f/R_m ≈ 100 or larger, the terms in the path integral beyond the quadratic term in the expansion of the action become small. A stronger estimate would come from evaluating the leading term in 1/R_f.

Conclusions
We have examined the path integral formulation of data assimilation, Eq. (2), and asked how well a variational approach to the conditional expected values of functions G(X) on the path X = {x(0), x(1), x(2), ..., x(m)} can approximate the integral. We use Laplace's method, which estimates the path integral by seeking saddle paths of the action, where ∂A_0(X)/∂X_α = 0. This approximation is widely used in meteorology and other fields, where it is known as weak 4DVar (Talagrand and Courtier, 1987; Evensen, 2009; Lorenc and Payne, 2007); the action is the cost function that is minimized.
When measurement errors and model errors are distributed as iid Gaussian noise, we have described an annealing method in R_f, the strength with which the model errors enter the action. It permits following a collection of action levels from R_f = 0, where the saddle paths are known precisely, through a set of increasing values of R_f, at each of which a numerical optimization algorithm produces a set of saddle paths that serve as the initial conditions for the solution of the variational approximation at the next larger value of R_f. This permits the identification of a lowest action level associated with a saddle path X_0, where A_0(X_0) splits from the action values of the other possible saddle paths at each R_f: A_0(X_0) < A_0(X_{q>0}). The contribution of X_0 then dominates the integral.
We also explored the dependence of the action levels revealed through the annealing method on the number of measurements L made at each observation time. When L reaches and exceeds a critical value, related to the number of unstable directions in the deterministic chaotic dynamics of the model, the path X_0 is either clearly the dominant contribution to the conditional expected value of G(X), or it is the only path satisfying the saddle path condition. By expanding the integrand of Eq. (2) about X_0, we argued that the resulting corrections to the contribution of X_0 form a power series in R_f^{-1}. In previous work with variational principles for data assimilation, we are unaware of any procedure such as our annealing method in R_f for identifying a consistent minimum action, nor are we aware of a systematic exploration of the dependence of the action levels on the number of measurements at each observation time. (Of course, there is the discussion in Quinn, 2010, which suggested this work.) Finally, we do not know of any previous discussion of the corrections to the variational approximation, which here are shown to be small perturbations when the resolution of the model error term in the action is increased, namely, when R_f becomes large.
The relation of the annealing method to familiar 4DVar calculations (Talagrand and Courtier, 1987; Evensen, 2009) is simple to state: each set of calculations during the annealing in R_f, at each value of R_f, is a 4DVar calculation. The annealing has the added advantage of allowing one to establish an identified lowest action value path X_0, which gives the dominant contribution to the quantities of interest, namely, the conditional expected values of functions G(X) on the path.
A similar annealing procedure in the importance of the model errors, as represented by R_f, can be incorporated into a Monte Carlo calculation of the path integral. The relation between that method of approximating the high-dimensional integral and the variational method we have focused on here is through the Langevin equation in "algorithmic time" described in Abarbanel (2013). In each approach, the statistics of moments about X_0 or marginal distributions of selected state variables are obtained by the choice of G(X). This paper has compared those evaluations in the variational method to standard 4DVar by using 4DVar at each stage of the annealing, adding the identification of the path X_0 with the lowest action level. By performing the annealing procedure in a Monte Carlo context, one could compare it to other standard ensemble methods, such as the ensemble Kalman filter (EnKF) (Evensen, 2009). However, that is a subject for another investigation of the annealing approach; we have focused here only on variational methods.
We have worked within a framework where the measurement errors and the model errors in the data assimilation are Gaussian, with the inverse covariance of the model errors taken to be of order R_f. We have shown that the path that gives a consistent minimum action level can be traced by an annealing procedure, starting from a setting where the dynamical model essentially plays no role, R_f ≈ 0, and then systematically increasing the influence of the model dynamics.
As the precision R_f of the model error reaches about 100 times the precision R_m of the measurement error, the path X_0 associated with a consistent global minimum action level dominates the path integral, with corrections of order 1/R_f. The important role of the number of measurements L at each measurement time has also been demonstrated.
If the noise terms represented by the model errors are not Gaussian, one can still use the annealing method to identify a path with the lowest action level, but showing that perturbation theory about the path X_0 giving that lowest action level yields a power series in R_f^{-1} may not succeed; the precise way R_f enters the matrix γ^{(2)}_{αα′}(X_0) determines that. In this paper, we do not address the typical situation where the number of measurements actually available is less than that needed to allow the ground level of the action A_0(X_0) to lie well below the next level. For a solution to that problem, we have used information from the waveform of the measurements, as shown in some detail in Rey et al. (2014).
The results here justify the use of the variational approximation in data assimilation, focus attention on the number of measurements one requires for accuracy in that approximation, and permit the evaluation of systematic corrections to the approximation when the action takes the form of Eq. (1). We anticipate that our use of Gaussian model error and measurement error is a convenience, and that other distributions of these errors will permit many of the same statements about the value of variational approximations to statistical data assimilation.

Figure 1 .
Figure 1. Action levels as a function of R_f for the Lorenz96 model, D = 20, R_f0 = 0.01. (a) At L = 5, we used y_1(t), y_3(t), y_5(t), y_7(t), and y_9(t) in the action; (b) at L = 7, y_11(t) and y_13(t) are added; (c) at L = 8, y_15(t) is added; and (d) at L = 9, y_17(t) is added. The expected values of the lowest action level are denoted by black dashed lines.

Figure 3 .
Figure 3. Action levels as a function of R_f for the Lorenz96 model, D = 10, R_f0 = 0.01, for (a) L = 5 and (b) L = 6, when the wrong data are used for the Lorenz96 model. We actually used data from four realizations of the Lorenz63 model (Lorenz, 1963). The structure of the action levels versus R_f shows no trace of the minimum allowed level, Eq. (6). This indicates that the data and the model are incompatible. The action levels are also quite large and, for L = 6, numerous and not well separated.

Table 1 .
Known and estimated forcing parameters for the Lorenz96 model at D = 10, and L = 4, 5, and 6.

Table 2 .
List of important mathematical notations (in alphabetical order).