Diagnostics on the cost-function in variational assimilations for meteorological models

Several consistency diagnostics have been proposed to evaluate variational assimilation schemes. The "Bennett-Talagrand" criterion in particular shows that the cost-function at the minimum should be close to half the number of assimilated observations when statistics are correctly specified. It has been further shown that sub-parts of the cost function also had statistical expectations that could be expressed as traces of large matrices, and that this could be exploited for variance tuning and hypothesis testing. The aim of this work is to extend those results using standard theory of quadratic forms in random variables. The first step is to express the sub-parts of the cost function as quadratic forms in the innovation vector. Then, it is possible to derive expressions for the statistical expectations, variances and cross-covariances (whether the statistics are correctly specified or not). As a consequence it is proven in particular that, in a perfect system, the values of the background and observation parts of the cost function at the minimum are positively correlated. These results are illustrated in a simplified variational scheme in a one-dimensional context. These expressions involve the computation of the trace of large matrices that are generally unavailable in variational formulations of the assimilation problem. It is shown that the randomization algorithm proposed in the literature can be extended to cover these computations, yet at the price of additional minimizations. This is shown to provide estimations of background and observation errors that improve forecasts of the operational ARPEGE model.


Introduction
Most operational data assimilation schemes in meteorology are loosely based on statistical linear estimation (e.g.Talagrand, 2010.The variational formulation (Le Dimet and Talagrand, 1986;Courtier et al., 1998;Rabier et al., 2000) has proven very effective in assimilating directly and massively satellite observations, which has been a source of huge progress in numerical weather prediction (Simmons and Hollingsworth, 2002;Rabier, 2005).The cost function to be minimized measures the distances of the atmospheric state to the observations and to the background, weighted by the inverse of their error covariances.These error statistics are unfortunately not very well known.Common methods to estimate observation errors include the innovation approach (Hollingsworth and Lonnberg, 1986;Daley, 1993), where observation error statistics are taken to be spatially uncorrelated and are deduced from the innovation covariance.Background error statistics are estimated within an existing data assimilation scheme, for instance using forecasts at different ranges valid at the same time (Parrish and Derber, 1992;Derber and Bouttier, 1999) or, more recently and inspired by the Ensemble Kalman Filter (e.g.Evensen, 2003), ensembles of perturbed variational assimilations (Fisher, 2003;Kucukkaraca and Fisher, 2006;Belo-Pereira and Berre, 2006).
Despite this recent progress, it is still necessary and useful to check for the consistency between the prescribed statistics and the a posteriori error statistics.Any inconsistency could help to point out imperfections in the specification of the statistics.Objective assessment of the quality of assimilation schemes can only be performed against independent observations that are not used in the estimation process (Talagrand, 1999).Yet internal consistency diagnostics can also prove useful.Two applications of these kinds of diagnostics have been proposed: hypothesis testing (Muccino et al., 2004) and variance tuning (Desroziers and Ivanov, 2001;Chapnik et al., 2006).
Hypothesis testing formally checks whether the obtained solution is consistent with the hypotheses made concerning Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
Y. Michel: Diagnostics on the cost-function the errors in the innovation vector.In the perfect case, the minimum value of the cost function has a χ 2 p distribution where p is the number of scalar observations assimilated in the system (Bennett et al., 1993;Talagrand, 1999).However, the values of the cost function were found to differ significantly from their expectation in a series of data assimilation experiments with an ocean model (Bennett et al., 2000).Muccino et al. (2004) also studied how various errors in the specified error covariance matrices were affecting the distribution of the value of the cost function at the minimum.A Kolmogorov-Smirnov test was used to identify departures from the χ 2 distribution, with significant skill in some cases.
When certain sets of observations are taken to have uncorrelated errors, it is possible to split the observation term further as the sum of sub-functions.The expectation of these sub-functions can still be expressed (Talagrand, 1999) as the trace of large matrices.Desroziers and Ivanov (2001) (hereafter DI01) proposed to estimate these traces using a randomization algorithm.They also proposed an iterative approach for tuning the variances.It enforces consistency between the observed value of the sub-parts of the cost functions and their theoretical expectations as computed by the randomized trace estimation.This method was later shown (Chapnik et al., 2004) to be similar to a maximum-likelihood approach.Operational implementation of the scheme in the global model ARPEGE 1 from Météo-France was achieved by Chapnik et al. (2006), and the tuning of observation error variances yielded a positive impact on both analyses and forecasts.The implementation in a limited area model was discussed by Sadiki and Fischer (2005).They proposed in particular the use of time or space averages to improve sampling.Finally, the ensemble of variational assimilations with perturbed observations as implemented at Météo-France and ECMWF (Kucukkaraca and Fisher, 2006) typically uses assimilations with perturbed backgrounds and observations.Desroziers et al. (2009) then showed that the previous statistical expressions were a direct by-product of such an ensemble.
More precisely, Dee (1995) and Dee and da Silva (1999) have introduced the maximum-likelihood approach for the estimation of covariance parameters in meteorology.The likelihood function to be optimized with respect to these covariance parameters is the sum of a quadratic form in the innovation vector and the log-determinant of a precision (inverse covariance) matrix.Because of the very large size of the innovation vector (currently of the order of 10 6 elements in the ARPEGE 4D-Var), a direct computation (using for instance Choleski decomposition) is not feasible.The approach that was favored in meteorology is to evaluate the derivatives (rather than the absolute values) of the likelihood function and then using stochastic trace estimation techniques, as demonstrated by Purser and Parrish (2003) in an ideal-1 "Action de Recherche Petite Echelle Grande Echelle", see Pailleux et al., 2000 for a historical description ized context.This stochastic approximation is nearly optimal in a well-define sense as proven by Stein et al. (2013).It is also possible that direct evaluation of the log-likelihood may become feasible even in high dimensions as shown by Aune et al. (2012).This required in particular (i) more efficient evaluation of the trace with probing vectors (Bekas et al., 2007) and (ii) Krylov subspace methods for matrix inversions (Saad, 2003).These approaches look very promising, but as recognized by Purser and Parrish (2003), it is still necessary to experiment with various aspects in meteorology including non-linear observation operators and introduction of a prior term (to stabilize the solution when the number of estimated parameters become large).This study rather builds upon the simpler but convenient approach taken in DI01 for variance tuning.However, it should be stressed that this approach is unlikely to be easily extended to the estimation of other paramaters such as localisation lengthscales (Houtekamer and Mitchell, 2001), or to the heteromorphic case where there is a need to compare log-likelihoods associated with covariances that use completely different sets of parameters (Purser and Parrish, 2003).
The aim of this paper is to present a different way of deriving the Bennett-Talagrand criterion.The sub-parts of the cost function at the minimum are expressed as quadratic forms in the innovation vector.This allows the application of the theory of quadratic forms in random variables as summarized in Mathai and Provost (1992) (hereafter M92).The statistical expectation, variances and cross-covariances between subparts of the cost-function at the minimum are shown to be related to the trace of matrices in the assimilation.This relies on the assumption that the innovation vector is normally distributed, but some extensions to the Non-Gaussian case have also been derived for some specific distributions.The derivation of these expressions is outlined in Sect. 2. Practical estimation of these terms based on the randomized trace algorithm is demonstrated in Sect.3. The use of these formulas for variance tuning in the operational ARPEGE model is illustrated in Sect. 4.

Variational framework
The best estimate of the atmospheric state, x a , is obtained as the solution of the minimization of the cost function: where any time index has been dropped for convenience.(Courtier et al., 1994).Here we are only interested in the final value of the cost function (at the minimum), but of course preconditioning in minimizing Eq. ( 1) is a crucial problem in practice (e.g.Lorenc, 1997;Gauthier et al., 1999).A Bayesian derivation of Eq. ( 1) is provided by Lorenc (1986).
Introducing the innovation vector the linearized cost-function (still written J ) is expressed as: where H is the linearized observation operator and where every term is now quadratic.
A common derivation of the Bennett-Talagrand criterion casts the problem (3) into a single term using a generalized observation operator (Talagrand, 1999;Desroziers and Ivanov, 2001).This is not necessary for deriving the statistical moments of J = J (x a ).An alternative way is to write down the solution of Eq. (3) in the standard form involving the data assimilation operator (or gain matrix) K: K is a representation of performing variational assimilation for the analysis increments.The background term at the minimum J b is therefore: and similarly, the observation term at the minimum J o is: From Eq. (3), it is also possible to express the observation term before the minimization J o as and J b = 0.This explicitly shows that these terms are (halves of) quadratic forms in the innovation vector d.
In general, observation errors are taken to be diagonal or block-diagonal, with each block corresponding to a given instrument, and observation errors are taken to be uncorrelated between instruments.Following Chapnik et al. (2004), this is written as a projection operator o i operating on the observation vector y and giving a subset y i = o i y of the observations, with i = 1, . . ., P and where P is the number of such subsets.Up to a reordering, the matrix R can be written in the block-diagonal form: T are observation error covariance matrices for the ith-subset of observations.The inverse covariance matrix R −1 is also block-diagonal, with elements R −1 i , and the observation cost-function can be split into P contributions J i o (either at the minimum or before the minimization): which are quadratic forms in the innovation vector.
In general the background error covariance matrix is not block-diagonal as the different variables are correlated.This was neglected in DI01, but it can be taken into account by introducing the parameter transform K p that changes variables into uncorrelated ones (Derber and Bouttier, 1999;Bannister, 2008).Different spatial autocovariance blocks B i are specified for each unbalanced variable.This can be written as: where usually there are S = 3-5 control variables (usually vorticity or streamfunction, unbalanced divergence or velocity potential, and some form of unbalanced temperature and humidity, see Bannister (2008) for a review and comparison between various implementations).K p is generally formulated as a lower-triangular non-singular matrix with leftinverse K −1 p .The projection matrix b i is used to extract the ith-component of the background vector in unbalanced space and can be written explicitely with blocks being either the zero matrix 0 or the identity matrix I in ith position: Using this formulation allows us to split the background term into S contributions J i b that are, again, quadratic forms in d: There may be additional penalty terms in the cost function, related for instance to the coupling information in a limited area model (Guidard and Fischer, 2008) or to the weak constraint digital filter (Gauthier and Thépaut, 2001).These terms will not be considered explicitly here, but they can be put into similar quadratic forms.The digital filter expresses a constraint on the increments which are directly related to the innovation vector.The coupling information requires an extended definition of the information vector, and this issue may be addressed in future work.

Moments
The innovation vector is a random variable, so are the background and observation terms at the minimum, and it is possible to explicitly derive their expectation, but also their variances and covariances in the Gaussian case.Note that this Gaussian assumption can be relaxed for some classes of distributions; see e.g.Mathai et al. (1995) for the case of elliptically contoured distributions and Genton et al. (2001) for skew-normal ones.

Expectations
Without any loss of generality in this section (other than existence of second order moments), we will denote the first two moments of the innovation vector by: where E(•) is the expectation and Cov(•) the covariance.
Here, µ and are the actual (i.e.measured) mean and covariance of the innovation vector that may differ from their assumed values given in Eqs. ( 14) and ( 15).Then, it can be seen by applying the trace operator (M92, Corollary 3.2b.1)that the expectation of any quadratic form in d is: where Tr(•) is the trace (which is a linear operator).This result is robust as it holds for any distribution of d with finite first-two moments (Ménard et al., 2000).Equation ( 11) can be readily applied to the previously derived quadratic forms.
In particular the expectation of background and observation terms follows: No hypothesis has been made on the distribution of the innovation yet.These equations are valid for any data assimilation scheme with possibly incorrect error covariance matrices (and hence K).Now, in a well specified system, we would have unbiased innovation and well-prescribed statistics: The gain matrix can also be written as: which allows us to write the following expressions: Thus, when = D, the expectation of background and observation terms take the form as previously derived by Bennett et al. (1993), Talagrand (1999), and Desroziers and Ivanov (2001).New results are obtained by applying a similar calculation with sub-terms of the background cost function at the minimum: The expectation of any sub-term of the observation cost function at the minimum is derived in a similar way: Analogous expressions can be obtained for the sub-term of the observation cost function before the minimization (Eqs.7 and 9).

Variances
In general, the variance of the quadratic forms will depend on the fourth-order moment of d.If the innovation vector has a Gaussian distribution, then its fourth-order moment can be expressed as a function of the first two ones, and the variance of the quadratic forms can be expressed following M92 (theorem 3.2b.2): This formula can be readily applied to the previous quadratic forms (without assuming perfect statistics).Now, in the particular case of a system with well-specified statistics verifying Eqs. ( 14)-( 15) we have: The results do not seem to appear in the standard literature on data assimilation for the atmosphere, yet analogous formulas appear in Bennett et al. (2000) within the formalism of the representer method.They can be extended to sub-parts of the cost function (not shown).
Before deriving cross-covariances in the next section, one can point out that J b and J o should be positively correlated.Indeed, using the fact that 2 J has a χ 2 p distribution, we can deduce that: The matrices HK and I − HK have eigenvalues between 0 and 1 (?), therefore: Thus it is necessary that In other words, in a system with well-specified statistics, J b and J o are positively correlated.This may sound like a counter-intuitive result, as one could expect the random fluctuations of J to be "split" (with anticorrelations) between J b and J o , but this is not the case.Rather, the random fluctuations go in the same direction.

Covariances
The covariance between (sub)terms of the cost function can be calculated as follows (M92, theorem 3.2d.4): In particular, in a well-specified system: The magnitude of the correlation depends on how Tr(HK) and Tr[(HK) 2 ] differ.This highlights that it could be useful to estimate Tr (HK) 2 in operational schemes.Also, further expressions can be derived for the cross-covariances between sub-parts of the cost function (not shown).

Distribution
Beyond the statistical moments, it may be of some interest to get information on the distribution of the sub-parts of the cost function at the minimum.The random variable 2 J has a variance that is double of its expectation, as expected for a χ 2 p -distributed variable.In view of Eqs. ( 16)-( 21) and ( 17)-( 22), this is generally not the case for 2 J o , 2 J b (nor their subparts).Thus they do not follow a χ 2 distribution -see also the necessary and sufficient conditions in the chapter 5 of M92.However, they are linear combinations of χ 2 distributions in the Gaussian case (M92).
Albeit the previous expressions may be thought to be restricted to variational assimilation, they would also apply to an ensemble variant of the Kalman filter provided that the associated cost function can be computed (Ménard and Chang, 2000;Talagrand, 2010).

Practical estimation and potential use
This section introduces the randomized trace algorithm to compute the previous statistical expressions.They could be used for hypothesis testing.Rather, we focus on their use when tuning global scaling factors for matrices B and R.

Application of the randomized trace estimator
The previous expressions involve the computation of the trace of large matrices that are unavailable in large-size data assimilation schemes used in operations in meteorology.Following DI01, it is however possible to estimate them with a randomized trace estimator: where η i are independent random vectors whose components follow a standard N (0, 1) normal distribution (Girard, 1991) or Rademacher (±1 with probability 1/2) distribution (Hutchinson, 1989).The Hutchinson estimator has lower variance than Girard's but requires more samples to achieve a given relative precision (Avron and Toledo, 2011).In addition, using the Gaussian distribution is more natural in data assimilation and allows Tr(HK) to be computed as a by-product of an ensemble of variational assimilations (Desroziers et al., 2009).It is shown next that randomized estimation of traces in new formulas (21,22, and 25) is possible through perturbed minimizations (as in DI01), yet at the cost of an additional minimization.Indeed, K (the assimilation algorithm) has to be applied three times.Instead of the direct application of Eq. ( 26), DI01 take the approach of further introducing the square root forms of the error covariance matrices B = B 1/2 B T/2 and R = R 1/2 R T/2 .Then, using the fact that the trace is invariant under cyclic permutations (not general ones), the randomized estimate (26) required in Eqs. ( 21), ( 22), and (25) follows: and δy o i = R 1/2 η i are sets of perturbations of observations (accordingly to the error covariance matrix R).Thus, at least three (2M +1) minimizations are necessary for the computation of this trace: first with original observations to give the unperturbed analysis increment δx a| yo , then with perturbed observations to give the perturbed analysis increment δx * a , and finally the difference between those analysis increments is brought into observation space and used as a perturbation of observation in a third analysis giving the analysis increment δx * * a .The trace computation then follows, provided that R −1 is available as an operator, which is easy if observations errors are taken uncorrelated as it is usually the case.
DI01 have provided an independent computation of Tr(KH) = Tr (HK).It is also possible to derive a similar computation of Tr (HK) 2 = Tr (KH) 2 , as: where and δx b i = B 1/2 i are sets of background perturbations.Thus, at least three minimizations are necessary for the computation of this trace: first with original background; second with a perturbed background, third with a background perturbed by differences of analysis increments.This last formula may be simplified when working with B 1/2 -preconditioning (e.g.Bannister, 2008 as it can be expressed in terms of the control variable.Also, those computations only involve analysis increments where observations or background have been perturbed, and they can still be computed when using a weakly non-linear algorithm. Finally, another way to compute directly the expectations and covariances of the sub-parts of the cost function is the "simulated optimal innovation technique" introduced by Chapnik et al. (2006).In this approach, pseudo-observations and pseudo-backgrounds are generated with errors that are exactly consistent with the prescribed error statistics.Then, assimilation of these synthetic observations will result into sub-parts of the cost function that exactly follow the previous expressions.This is a Monte-Carlo approach to compute the expectations and covariances in the specific case where assimilation statistics are correctly specified.

A simplified framework
If one wants to compare the theoretical statistics to the observed ones, the obvious problem lies in the fact that there is only one realization of the observations (and thus of the innovation vector) in the real atmosphere.The expectation of the cost-function can be estimated from a single realization (Chapnik et al., 2004), yet for some observation types the use of space or time averages to build samples large enough is required (Sadiki and Fischer, 2005).Obviously, computing the variance of the cost-function requires many samples, thus the use of space or time averages.This kind of ergodicity assumption raises questions in itself, such that we will start investigating the use of these formulas first in a simplified framework where independent Monte-Carlo realizations of the innovation vector can be easily generated.Such an approach was also used in Muccino et al. (2004) with the same purpose.
Nonlin.Processes Geophys., 21, 187-199, 2014 www.nonlin-processes-geophys.net/21/187/2014/ For that purpose, a simplified one-dimensional variational assimilation framework has been set up in a similar way as in DI01.The goal is to analyze a single variable (say temperature) on a circular domain (say on an earth meridian), minimizing Eq. (3).A conjugate gradient algorithm with B 1/2preconditioning is used for the minimization.It involves a maximum of 100 iterations, which is sufficient for convergence unless correlated observation errors are specified (this is verified a posteriori).Background errors have homogeneous standard deviation σ b = 1 K.The correlation function c is chosen to belong to the Matérn class (Rasmussen and Williams, 2006) with "smoothness" parameter ν = 1, corresponding to: where the length-scale is chosen to be = 250 km; see Michel (2013) for details.The observation operator is a periodic linear interpolation.Observations are taken to be equally spaced, which allows easy inclusion of spatial correlations if needed, using the same model as B but in observation space.In order to roughly simulate the characteristics of operational data assimilation, the number of variables is taken as n = 1000 and there are p = 500 observations with precision σ o = 1 K.The fields are illustrated in Fig. 1.In this particular case, the value of the cost-function at the minimum is J = 245.8≈ p/2 as the specified error statistics are fully consistent with the ones of the innovation vector.

Validation of the randomized trace estimation
The distribution of the observation and background terms at the minimum have been obtained from running 10 4 independent realizations of the assimilation scheme.The distribution of the obtained values is shown in Fig. 2. The total cost function has a mean of 250.11 which is consistent with the theoretical expectation p/2 = 250 (within the bounds of a standard confidence interval with 10 4 realizations).The estimated variances of the observation and background terms are lower than their mean, which is consistent with the discussion in Sect. 2.
The trace estimation algorithm relies on randomization, thus its results depend on the sample size.The algorithm has been applied to compute Tr(HK), Tr (HK) 2 , and also independently Tr(KH) and Tr (KH) 2 for various ensemble (sampling) sizes M between 1 and 100.The procedure has been applied 200 times in order to evaluate the statistical properties of the randomized trace estimator.Results are presented in Fig. 3. Almost identical results are obtained among ensemble sizes.Averaging over the ensemble sizes and the 200 realizations gives the estimations Tr(HK) 80.15, Tr(KH) 80.19, Tr (HK) 2 56.7,Tr (KH) 2 56.86 which are obviously consistent whenever Eq. ( 26) or Eq. ( 27) is used.The comparison with  the independent realizations of the assimilation scheme (the simulated optimal innovation method, Fig. 2) is shown in Table 1.Table 1.Comparison of the simulated optimal innovation method with the randomized trace estimation, following Eqs.( 16), ( 17 Differences between theoretical and estimated values arise from sample size.Moving to cross-correlation between the values of J o and J b , the simulated optimal innovation method gives an estimate which is also in rather good agreement with the estimation from the trace algorithm: Tr [HK (I − HK)] as also shown in Fig. 3; J o and J b are positively correlated.
Figure 4 shows how the trace estimator becomes more and more precise as the ensemble size goes larger.More precisely, the variance of the estimator ( 26) is (Chapnik et al., 2006): Thus, it is possible to estimate the precision of the randomized trace computation as a function of ensemble size ] (b, circles), Tr[(KH) T KH] (b, plus signs), as a function of the ensemble size used for the randomization.Solid line: expectation; dashed lines: expectation plus or minus one standard deviation, as determined from a Monte-Carlo approach with 200 independent realizations.
M and as a function of traces Tr (HK) 2 , Tr (HK) T HK , Tr (KH) 2 and Tr (KH) T KH which themselves can be computed with the randomized approach proposed in Sect.3.1.This also implies that while Eqs.( 25) and ( 26) can be applied to estimate Tr(HK) = Tr (KH), both estimators have different precisions, as apparent in Fig. 4.

Variance tuning
DI01 proposed to adjust multiplicative scaling factors s b and s o for the covariance matrices B and R, assuming that at least approximately where B t and R t are the true covariance matrices.Then DI01 proposed to estimate the scaling factors as the ratio of Nonlin.Processes Geophys., 21, 187-199, 2014 www.nonlin-processes-geophys.net/21/187/2014/ the realization of the cost function at the minimum to their expectations: These equations can be seen as a fixed point relation that is solved iteratively.Alternatively, if the conditions ( 31)-( 32) are satisfied, then it is also possible to derive an alternative formulation for those coefficients.Indeed, the innovation vector d has covariance Using Eq. ( 35) in Eq. ( 12) with µ = 0 gives: Using Eq. (36) in Eq. ( 13) with µ = 0 gives which was already been obtained by Chapnik et al. (2006) when discussing the first iteration of the algorithm designed by DI01.In view of Eqs. ( 21)-( 22), the random fluctuations of J b and J o are small and the expectation can usually be replaced by a single realization.This turns Eqs. ( 37)-( 38) into a linear system of two unknowns s o , s b which can be written as: and where the notations α = Tr (HK), β = Tr (HK) 2 have been introduced for simplification.If the system is not singular, then the unique solution is: At the price of computing Tr (HK) 2 , the final solution of DI01's algorithm can be directly obtained, without any further iteration in the case where specified and true error covariances differ by a scaling factor as indicated by Eqs. ( 31)-( 32).This may however not be the case when observation operators are non-linear or when this condition is not met.
In our setting, Tr (HK) 2 is lower but comparable to Tr(HK) in magnitude (see Sect. 3.3 for details).Both traces are smaller by about one order of magnitude with respect to the number of observations, which is consistent with recent estimates in ARPEGE (Desroziers et al., 2009, Fig. 1).DI01's formulation neglects the term (s o − s b )Tr [HK(I − HK)] (but the correct estimate is recovered through the iterations).This term is indeed likely to be negligible in Eq. ( 38) but not so in Eq. ( 37).Thus, more iterations of the scheme may be needed to recover the appropriate value of s b , rather than s o , as already noticed by DI01 (Tables 3  and 4).In contrast, the correct estimate can be obtained by solving the linear system with the direct solution given by Eqs. ( 39)-( 40).The iterative tuning of the coefficients following DI01's approach is shown in Fig. 5  This suggests that it may be useful to use the formulation (39)-( 40) when trying to diagnose a multiplicative factor for the background term.This happens for instance for the inflation of variances in an ensemble of data assimilations (Raynaud et al., 2012).Extension of this formulation to the simultaneous tuning of several parameters (corresponding to subparts of the background and the observation cost-functions) is possible but more complicated, as a linear system of order S + P (the number of sub-terms) has to be solved.This will be the topic of further studies.

Application in the ARPEGE 4D-Var
This section describes the application of the variance tuning factors to the operational model of Météo-France.The global model ARPEGE is a spectral model with currently 70 vertical levels and features a stretched geometry that allows higher resolution over Europe (about 10 km in physical space).The assimilation scheme is based on the multiincremental formulation (Courtier et al., 1994)

Estimation of tuning factors
Randomized estimates of the traces are performed in the same way as in the idealized 1-D model using the assimilation with perturbed observations.For the background term, we estimate a single scaling factor with Eq. ( 39).Following the result of the previous section, it is clear that there is no need for such a refinement in the observation terms (e.g.inclusion of (s b − s o ) in Eq. 38).Observation scaling  (2009), the obtained values for observation types AIR-CRAFT, BRIGHT.TEMP.(satellite measurements) and GP-SRO are suspiciously low and this is probably due to the fact that these observations indeed have correlated errors (thus, additional parameters describing these correlations should be included in the maximum-likelihood approach).In our study the corresponding coefficients will just be discarded for the tuning as in Desroziers et al. (2009).

Impact in terms of forecast performance
We have investigated the impact of tuning the background and observation variances in the global 4D-Var scheme.Figure 7 shows the differences between root-mean-square errors for the geopotential between the reference and the ex- periment (with tuned values of variances).The error is computed with respect to independent ECMWF analyses.The verification period is one month, which is generally sufficient to obtain reliable results.Bootstrap tests (not shown) indicate that the results are statistically significant during the first 36 h, with positive impact over the whole atmospheric column except for the surface.Other scores however point out to a degradation of the wind analyses in the Tropics with respect to the observations.A posteriori diagnostics can help to detect misspecifications in the statistics of a data assimilation scheme.However, it is generally not possible to tell where the misspecification lies unless some further hypotheses are made (Talagrand, 2010).
Among those diagnostics, one is particularly simple.The value of the cost-function at the minimum should be close to its expected value p/2 where p is the number of observations (Bennett et al., 1993).It is possible to refine this diagnostic to sub-parts of the cost function (Talagrand, 1999).Despite care taken in the estimation of error covariance matrices, implementation of this diagnostic in early operational data assimilation schemes showed that the amplitude of the innovation vector was a priori overestimated by a factor 2-3 as reported by Talagrand (1999), possibly because of the misspecification of error correlations for satellite observations.This paper introduces some additional results using the fact that the sub-parts of the cost function at the minimum can be expressed as quadratic forms of the innovation vector.For some specific distributions of the innovation vector, including the multivariate normal, expressions for the expectation, variances and cross-covariances can be given.In particular, the observation and background terms at the minimum are positively (generally weakly) correlated (when the innovation vector follows the expected statistics).
The expressions involve the trace of larges matrices such as Tr[(HK) 2 ].It is shown that these matrices can be estimated from a randomized method (when K is not explicitly available) by applying the assimilation scheme twice.The randomized method is shown to yield results that are consistent with the "simulated optimal innovation" method (Chapnik et al., 2006).Finally, it is advocated that the computation of Tr[(HK) 2 ] could be useful for variance tuning, in particular for the inflation factor of the background error covariance matrix.Further work will attempt to estimate additional covariance parameters following Purser and Parrish (2003) and also make use of recent advances in randomized linear algebra (Aune et al., 2012;Stein et al., 2013)

Fig. 1 .
Fig. 1.True signal x t (solid black line), observations y o (circles) and the analysis x a (dashed black line) simulated with the prescribed statistics for background and observation errors (see text).The background x b is zero.

Fig. 2 .
Fig. 2. Histograms obtained with 10 4 realizations of the assimilation scheme shown in Fig. 1 for the values of (a) the background cost function and (b) the observation cost function at the minimum.

Fig. 3 .
Fig. 3. Scatterplot of the values of the background cost function and the observation cost function at the minimum.

2E
J o = s o Tr (I − HK) + s b − s o Tr [HK(I − HK)] for overestimated or underestimated σ o and σ b .They confirm that generally, a single iteration is necessary for tuning σ o (panels a, c, e, and g).In contrast, around four iterations can be necessary for recovering a useful value of σ b especially when s o − s b is large (panels b and f), as the term (s o − s b )Tr [HK(I − HK)] cannot be neglected.The formulation (39)-(40) recovers the correct factor within one iteration (shown only in panels a and b).

Fig. 5 .
Fig. 5. Iterative tuning of multiplicative coefficients for the error covariance matrices.σ o (left) and σ b (right) are tuned simultaneously with (a, b) overestimated σ o , underestimated σ b ; (c, d) overestimated σ o and σ b ; (e, f) underestimated σ o and overestimated σ b ; (g, h) underestimated σ o and σ b =.Solid lines and square marks: iterative tuning following DI01's approach; dashed lines and circle marks: new approach (shown only in a, b).Traces are estimated with the randomization method using 100 independent realizations.

Fig. 7 .
Fig.7.Impact of tuning on forecast scores: differences between the root mean squared errors of the reference and of the experiment compared to independent ECMWF analyses (geopotential error in m), as a function of forecast range (in h).NORD20 is the domain on the globe that is north of 20 • N, TROPIQ is the domain between 20 • N and 20 • S, and SUD20 is south of 20 • S. Blue (red) indicates better (worse) performance.The verification period is one month (September 2013).