Interactive comment on "Improving the ensemble transform Kalman filter using a second-order Taylor approximation of the nonlinear observation operator"

1) Introduction: Although the title of the paper indicates it is focusing on the ETKF applications and improvements, it would be beneficial to describe the treatment of nonlinearity in general ensemble data assimilation outside of ETKF, including the Maximum Likelihood Ensemble Filter (Zupanski 2005) and the particle filters (van Leeuwen 2009). Also, the proposed methodology implicitly assumes the use of incremental minimization (e.g. a form of truncated Newton method), with outer and inner loops. This should be clearly stated, since this is only one possible approach to iterative minimization, with many more efficient methods available in mathematical optimization and control theory.


Introduction
The spatial and temporal distribution of observations is continuously changing with the improvement in numerical models and observation techniques. Sounding data, remote sensing observations, satellite radiance data and other indirect information bring both opportunities and challenges in data assimilation. How to assimilate these indirect observations is an important research topic in data assimilation (Reichle, 2008).
The observation operators for indirect observations are often nonlinear. For example, radiative transfer codes (e.g., RTTOV, CRTM; Saunders et al., 1999; Han et al., 2006) can be treated as observation operators by mapping air temperature and moisture to the microwave brightness temperature (McNally, 2009). Because the relationship of these observations with modeled variables may be strongly nonlinear (Liou, 2002) and the observation errors may be spatially correlated (Miyoshi et al., 2013), data assimilation schemes have to be appropriately designed to handle such indirect observations.
Most data assimilation methods are fundamentally based on linear theory, but they respond differently to departures from linearity (Lawson and Hansen, 2004). Conceptually, variational data assimilation schemes (VAR; e.g., Parrish and Derber, 1992; Courtier et al., 1994; Lorenc, 2003) can assimilate data with nonlinear observation operators and spatially correlated observation errors. However, a drawback of VAR is that it has to calculate the adjoint of a dynamical model, which is not an easy task in practice. Moreover, VAR does not give a direct estimate of the background error covariance matrix, which is crucial for the performance of any data assimilation scheme. In general ensemble data assimilation, the maximum likelihood ensemble filter (MLEF) estimates the state vector by minimizing a cost function that depends on a general nonlinear observation operator, which is equivalent to maximizing the likelihood of the posterior probability distribution (Zupanski, 2005). The particle filter uses a set of weighted random samples (particles) to approximate the posterior probability distribution, which may depend on a nonlinear observation operator (van Leeuwen, 2009).
The ensemble Kalman filter (EnKF) scheme has a strategy to optimize forecast error statistics without using the adjoint of the dynamical model (e.g., Evensen, 1994a, b; Burgers et al., 1998; Anderson and Anderson, 1999; Wang and Bishop, 2003; Wu et al., 2013). It is also conceptually applicable to data assimilation with nonlinear observation operators. However, it has been demonstrated that when the observation operator is strongly nonlinear, using the linear approximation of the observation operator to derive the error covariance evolution equation can result in an oversimplified closure and dubious performance of the EnKF (e.g., Miller et al., 1994; Evensen, 1997; Yang et al., 2012).
The ensemble transform Kalman filter (ETKF) was first introduced into atmospheric assimilation by Bishop and Toth (1999) and Bishop et al. (2001). Wang and Bishop (2003) transformed the forecast perturbations into analysis perturbations by multiplying by a transformation matrix. They also proposed an efficient way to construct the transform matrix through eigenvector decomposition of a matrix of the ensemble size. Hunt et al. (2007) extended the ETKF method to deal with a general nonlinear observation operator using the cost function. In addition to the reduction of computational cost compared with the EnKF, another advantage of the ETKF proposed by Hunt et al. (2007) is that it can assimilate observations with strongly nonlinear observation operators (Chen et al., 2009) and with spatially correlated observation errors (Stewart et al., 2013).
However, there are still problems associated with the ETKF when the observation operator is strongly nonlinear. First, the current ETKF is based on the minimization of a cost function similar to that in VAR for nonlinear observation operators (Hunt et al., 2007), and the direct calculation of the minima requires iterative evaluation of the nonlinear operators and their tangent-linear operators. Using a linear approximation of the nonlinear observation operators (e.g., Hunt et al., 2007) can effectively reduce the computational burden, but at the cost of increasing the analysis error. Second, a tangent-linear approximation of the observation operator is used for the forecast error inflation in the ETKF (e.g., Li et al., 2009). If the observation operators are strongly nonlinear, the inflation factors and hence the forecast error covariance matrices may be estimated erroneously, leading to an eventual increase in the analysis error.
In this study, we propose two alternative approaches to improving assimilation quality when the observation operator is strongly nonlinear. First, in an effort to reduce computational cost without significantly reducing estimation quality, we use the second-order Taylor expansion of the observation operator to estimate both the inflation factors and the analysis states. Second, for the case where the inflation factor is constant in space, we propose a new forecast error inflation method for general nonlinear observation operators that does not use the tangent-linear approximation. It is worth pointing out that the proposed methodology implicitly assumes the use of incremental minimization with outer and inner loops; other efficient methods are available in mathematical optimization and control theory.
The potential use of second-order information has been noted by several authors. For example, Hunt et al. (2007) noted that the second-order derivatives of the objective function might be used to estimate the covariance of the analysis weights, which is an important step in the ETKF with a nonlinear observation operator. Moreover, Le Dimet et al. (2002) and Daescu and Navon (2007) noted that second-order information in nonlinear variational data assimilation is important to the issue of solution uniqueness.
In the conventional ETKF scheme, a linear approximation of nonlinear observation operators is used to reduce the computational cost compared with conventional methods of searching for the minima of nonlinear cost functions (Hunt et al., 2007). This study also aims to investigate the changes in analysis errors when a nonlinear observation operator is substituted by its first-order and second-order Taylor approximations. However, we focus on the formulation of the forecast error inflation method in the case of a nonlinear observation operator, and on the improved accuracy of the second-order approximation relative to the first-order (linear) approximation. Further studies on the performance of the proposed schemes in practical data assimilation are needed, and should be performed in the future.
The rest of the paper is organized as follows. Our modified ETKF schemes are described in Sect. 2. The assimilation results in a Lorenz 96 model with a nonlinear observation system are presented in Sect. 3. The discussions are given in Sect. 4, and the conclusions are in Sect. 5.

Hunt et al. (2007) gave a comprehensive description of the ETKF with a nonlinear observation operator, but without procedures for forecast error inflation. In this section, we propose an inflation scheme for general nonlinear observation operators.

ETKF with forecast error inflation
Using the notations of Ide et al. (1997), a nonlinear discrete-time forecast and observation system can be written as

x^t_i = M_i(x^a_{i-1}) + η_i,
y^o_i = H_i(x^t_i) + ε_i,

where i is the time step index; x^a_{i-1} is the n-dimensional analysis state vector that is an estimate of the true state x^t_{i-1}; M_i is the nonlinear forecast operator; H_i = (h_{1,i}, h_{2,i}, ..., h_{p_i,i})^T is the nonlinear observation operator, where each h_{k,i} is a multivariate function of the n-dimensional state; and η_i and ε_i are the forecast and observation error vectors, which are assumed to be statistically independent of each other, uncorrelated in time, and to have mean zero and covariance matrices P_i and R_i, respectively. The detailed procedure of the ETKF with a nonlinear observation operator (Hunt et al., 2007), together with the proposed inflation scheme, is as follows.
Step 1. Calculate the j-th perturbed forecast state at time i as

x^f_{i,j} = M_i(x^a_{i-1,j}),

where x^a_{i-1,j} is the j-th perturbed analysis state at time i - 1. Then, the mean forecast state is defined as

x̄^f_i = (1/m) Σ_{j=1}^{m} x^f_{i,j},

where m is the total number of ensemble members.
Step 2. Assume the forecast errors to be of the form λ_i X^f_i (X^f_i)^T / (m - 1), where X^f_i = (x^f_{i,1} - x̄^f_i, ..., x^f_{i,m} - x̄^f_i) is the matrix of forecast perturbations. The inflation factor λ_i can be estimated by minimizing the objective function L_i(λ), the second-order distance between d_i d_i^T and its λ-dependent expectation. Here, I is the p_i × p_i identity matrix appearing in that expectation, and d_i = R_i^{-1/2} [y^o_i - H_i(x̄^f_i)] is the innovation vector normalized by the square root of the observation error covariance matrix (Wang and Bishop, 2003). (See Appendix A for details.)

Step 3. Calculate the analysis state as

x̄^a_i = x̄^f_i + √λ_i X^f_i w^a_i,

where w^a_i is estimated by minimizing the objective function

J_i(w) = (m - 1) w^T w + [y^o_i - H_i(x̄^f_i + √λ_i X^f_i w)]^T R_i^{-1} [y^o_i - H_i(x̄^f_i + √λ_i X^f_i w)].

Step 4. Calculate the perturbed analysis states as

x^a_{i,j} = x̄^a_i + √λ_i X^f_i W^a_{i,j},

where W^a_{i,j} is the j-th column of the matrix W^a_i = [(m - 1)(J̈_i|_{w^a_i} / 2)^{-1}]^{1/2}, and J̈_i|_{w^a_i} is the second-order derivative of J_i(w) at w^a_i (see Appendix B for details). Lastly, set i = i + 1 and return to Step 1 for the next iteration.
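The steps above can be sketched numerically. The following is a minimal ETKF analysis step in the spirit of Hunt et al. (2007), using the ensemble (statistical linearization) approximation of the observation operator in place of the full nonlinear minimization of J_i(w); the function names and the setup are illustrative assumptions, not the paper's code.

```python
import numpy as np

def etkf_analysis(xf_ens, yo, h, R, lam=1.0):
    """One ETKF analysis step (sketch after Hunt et al., 2007).

    xf_ens : (n, m) forecast ensemble;  yo : (p,) observation vector;
    h      : observation operator mapping an n-vector to a p-vector;
    R      : (p, p) observation error covariance;  lam : inflation factor.
    Uses the ensemble approximation Y of the tangent-linear operator,
    not the full nonlinear minimization of J(w).
    """
    n, m = xf_ens.shape
    xf_mean = xf_ens.mean(axis=1)
    Xf = np.sqrt(lam) * (xf_ens - xf_mean[:, None])     # inflated perturbations
    hx = np.column_stack([h(xf_ens[:, j]) for j in range(m)])
    Y = np.sqrt(lam) * (hx - hx.mean(axis=1)[:, None])  # obs-space perturbations
    Rinv = np.linalg.inv(R)
    d = yo - hx.mean(axis=1)                            # innovation
    # Half-Hessian of J(w): (m - 1) I + Y^T R^{-1} Y
    A = (m - 1) * np.eye(m) + Y.T @ Rinv @ Y
    wa = np.linalg.solve(A, Y.T @ Rinv @ d)             # analysis weights
    evals, evecs = np.linalg.eigh(A)                    # symmetric square root
    Wa = evecs @ np.diag(np.sqrt((m - 1) / evals)) @ evecs.T
    xa_mean = xf_mean + Xf @ wa
    return xa_mean[:, None] + Xf @ Wa                   # analysis ensemble
```

The symmetric square root of (m - 1) A^{-1} yields the analysis perturbation weights W^a, matching the role of the second-order derivative of J_i(w) in Step 4.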
To estimate the inflation factor, Li et al. (2009) proposed a scheme that requires the tangent-linear operator of the observation operator (see Sect. 2.2.1 for the definition). In an effort to reduce the computational cost of searching for the minima of the objective function (Eq. 10), Hunt et al. (2007) suggested the linear approximation

H_i(x̄^f_i + √λ_i X^f_i w) ≈ H̄_i + √λ_i Y^f_i w,

where H̄_i = (1/m) Σ_j H_i(x^f_{i,j}) and the j-th column of Y^f_i is H_i(x^f_{i,j}) - H̄_i. In this study, this traditional ETKF approach is validated against the other approaches.
G. Wu et al.: Improving the ETKF using second-order information

Simplified estimation methods in special cases
To compute the variational minimization in Eq. (10) operationally, one can directly compute the explicit solution of the minima and iterate the process as in the 4D-Var outer loop (Lorenc, 2003; Liu et al., 2008). However, doing so still requires repeatedly calculating the nonlinear function H_i(x̄^f_i + √λ_i X^f_i w) and its tangent-linear operator (see Sect. 2.2.1 for the definition), which depend on w and x̄^f_i. In this subsection, we propose an alternative procedure for the case where the observation operator H_i can be approximated by its Taylor expansions.

First-order Taylor approximation for H i
The first-order Taylor approximation of H_i at the forecast state vector x̄^f_i is defined as

H_i(x) ≈ H_i(x̄^f_i) + Ḣ_i|_{x̄^f_i} (x - x̄^f_i),

where Ḣ_i|_{x̄^f_i} is the first-order derivative of H_i evaluated at the forecast state x̄^f_i (the tangent-linear operator). Then, λ_i can be estimated by minimizing the resulting quadratic function, which has an explicit analytic solution. Similarly, w^a_i can be estimated by minimizing a multivariate quadratic function, whose analytic solution is also explicit. (See Appendix C for details.)

Second-order Taylor approximation for H i
The second-order Taylor approximation of H_i at x̄^f_i is defined as

H_i(x) ≈ H_i(x̄^f_i) + Ḣ_i|_{x̄^f_i} (x - x̄^f_i) + (1/2) Ḧ_i|_{x̄^f_i} [(x - x̄^f_i) ⊗ (x - x̄^f_i)],

where Ḣ_i|_{x̄^f_i} is the tangent-linear operator defined in Eq. (15), and Ḧ_i|_{x̄^f_i} is the second-order derivative of H_i at x̄^f_i, a p_i-dimensional collection whose k-th element is the Hessian matrix of h_{k,i}. Here, ⊗ is the outer product operator; i.e., for two arbitrary n-dimensional vectors x and y, x ⊗ y = xy^T. The k-th element of the quadratic term is (1/2)(x - x̄^f_i)^T ∇²h_{k,i}|_{x̄^f_i} (x - x̄^f_i), so the quadratic term is a p_i-dimensional vector. Then, λ_i can be estimated by minimizing a polynomial objective function whose coefficients involve two m × m matrices, and w^a_i can be estimated by minimizing a multivariate polynomial objective function (see Appendix D for details).
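To make the component-wise quadratic term concrete, here is a small numerical sketch of the second-order expansion of a vector operator; the toy operator and its hand-coded Jacobian and Hessians are illustrative assumptions. For a quadratic operator the expansion is exact, which gives a convenient check.

```python
import numpy as np

def taylor2_apply(H, jac, hessians, x0, dx):
    """Second-order Taylor approximation of a vector operator H at x0:
    H(x0 + dx) ≈ H(x0) + J dx + 0.5 * [dx^T Hess_k dx]_k  (quadratic term)."""
    quad = np.array([dx @ Hk @ dx for Hk in hessians])
    return H(x0) + jac @ dx + 0.5 * quad

# Toy operator H(x) = (x1*x2, x1**2) with exact Jacobian and Hessians
def H(x):
    return np.array([x[0] * x[1], x[0] ** 2])

x0 = np.array([1.0, 2.0])
jac = np.array([[x0[1], x0[0]],
                [2 * x0[0], 0.0]])
hessians = [np.array([[0.0, 1.0], [1.0, 0.0]]),   # Hessian of x1*x2
            np.array([[2.0, 0.0], [0.0, 0.0]])]   # Hessian of x1**2
dx = np.array([0.1, -0.2])
# For a quadratic operator the second-order expansion is exact:
approx = taylor2_apply(H, jac, hessians, x0, dx)
```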

Validation statistics
In the following experiments, the "true" state x^t_i is known by experimental design, and is non-dimensional. In this case, we can use the root mean square error of the analysis state (A-RMSE) to evaluate the accuracy of the assimilation results. The A-RMSE at the i-th step is defined as

A-RMSE_i = ‖x̄^a_i - x^t_i‖ / √n,

where ‖·‖ denotes the Euclidean norm, and n is the dimension of the state vector. A smaller A-RMSE indicates a better performance of the assimilation scheme. Following Anderson (2007) and Liang et al. (2012), the root mean square error of the forecast state (F-RMSE) and the spread of the forecast state (F-Spread) at the i-th step are defined as

F-RMSE_i = ‖x̄^f_i - x^t_i‖ / √n and F-Spread_i = [Σ_{j=1}^{m} ‖x^f_{i,j} - x̄^f_i‖² / (n(m - 1))]^{1/2}.

Roughly speaking, if x^f_{i,j} and x^t_i are identically distributed with mean value x̄^f_i, then F-RMSE and F-Spread should be consistent with each other. This is more likely the case if the model error is small. In general, the F-RMSE can be decomposed into an F-Spread component and a model error component, so it is larger than F-Spread (see Appendix B of Wu et al., 2013, for a detailed proof). Besides model error, the nonlinearities and the sampling error may also affect the consistency between F-RMSE and F-Spread, as is discussed later in this paper.
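The three diagnostics above can be sketched directly; the normalizations by √n and m - 1 follow the definitions as reconstructed here, an assumption since the display equations were lost in extraction.

```python
import numpy as np

def a_rmse(xa_mean, x_true):
    """Root mean square error of the analysis state: ||xa - xt|| / sqrt(n)."""
    return np.linalg.norm(xa_mean - x_true) / np.sqrt(x_true.size)

def f_rmse(xf_mean, x_true):
    """Root mean square error of the forecast state."""
    return np.linalg.norm(xf_mean - x_true) / np.sqrt(x_true.size)

def f_spread(xf_ens):
    """Ensemble spread: rms distance of members from the ensemble mean,
    with the (m - 1) sample-variance normalization."""
    n, m = xf_ens.shape
    pert = xf_ens - xf_ens.mean(axis=1, keepdims=True)
    return np.sqrt(np.sum(pert ** 2) / (n * (m - 1)))
```

For example, a unit offset in every component gives an A-RMSE of exactly 1, and a degenerate ensemble (all members identical) has zero spread.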

Experiments with the Lorenz 96 model
In Sect. 2.1, we outlined the general ETKF assimilation scheme with second-order least squares (SLS) error covariance matrix inflation. In Sect. 2.2, we proposed simplified estimation methods for two special cases where H_i either is tangent linear (Sect. 2.2.1) or can be approximated by the second-order Taylor expansion (Sect. 2.2.2). In this section, we apply these assimilation schemes to the Lorenz 96 model (Lorenz, 1996) with model errors and a nonlinear observation system, because it is a nonlinear dynamical system with properties relevant to realistic forecast problems.

Description of the dynamic and observation system
The Lorenz 96 model (Lorenz, 1996) is a strongly nonlinear dynamical system with quadratic nonlinearity, governed by the equation

dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F,

where k = 1, 2, ..., K (K = 40, so there are 40 variables).
We apply the cyclic boundary conditions X_{-1} = X_{K-1}, X_0 = X_K, and X_{K+1} = X_1. The dynamics of Eq. (31) are "atmosphere-like" in that the three terms on the right-hand side consist of a nonlinear advection-like term, a damping term and an external forcing term, respectively. These terms can be thought of as a given atmospheric quantity (e.g., zonal wind speed) distributed on a latitude circle. We solve Eq. (31) using the fourth-order Runge-Kutta time integration scheme (Butcher, 2003), with a time step of 0.05 non-dimensional units to derive the true state. This is equivalent to about 6 h in real time, assuming that the characteristic timescale of the dissipation in the atmosphere is 5 days (Lorenz, 1996). In our assimilation schemes, we set F = 8, so that the leading Lyapunov exponent implies an error-doubling time of approximately 8 time steps (i.e., 0.4 non-dimensional time units), and the fractal dimension of the attractor is 27.1 (Lorenz and Emanuel, 1998). The initial condition is chosen to be X_k = F for k ≠ 20 and X_20 = 1.001F.
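The model and integration scheme described above can be sketched as follows; this is a minimal implementation with illustrative variable names.

```python
import numpy as np

K, F = 40, 8.0

def lorenz96(x, F=F):
    """dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, cyclic in k."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=F):
    """One fourth-order Runge-Kutta step (dt = 0.05 ~ 6 h)."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Initial condition: X_k = F everywhere except X_20 = 1.001 F
x = np.full(K, F)
x[19] = 1.001 * F
for _ in range(100):
    x = rk4_step(x)
```

Note that the uniform state X_k = F is an equilibrium of the model; the small perturbation of X_20 grows chaotically, which is what makes the system a useful assimilation testbed.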
Because the microwave brightness temperature is an exponential function of soil temperature, we use an exponential observation function to mimic the radiative transfer model in this study. Suppose the synthetic observation generated at the k-th model grid point is

y^o_{k,i} = x^t_{k,i} exp(α x^t_{k,i}) + ε_{k,i},

where k = 1, ..., p_i, and (ε_{1,i}, ..., ε_{p_i,i})^T is the observation error vector with mean zero and covariance matrix R_i. Here, α is a parameter controlling the nonlinearity of the observation operator, and α = 0 corresponds to the linear case. All 40 model variables are observed in our experiments. Suppose the observation errors are spatially correlated. The leading-diagonal elements of R_i are σ_o² = 1, and the off-diagonal elements at a site pair (j, k) decrease with the distance between the two sites. With the exponential observation function and spatially correlated observation errors, the proposed scheme may potentially be applied to assimilate remote sensing observations and radiance data.
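A sketch of the synthetic observation generator follows. Since the paper's off-diagonal formula for R_i was lost in extraction, the exponential decay exp(-|j - k|/L) below is purely an assumed stand-in for a spatially correlated structure.

```python
import numpy as np

def make_obs(x_true, alpha=0.1, sigma_o=1.0, corr_len=2.0, rng=None):
    """Synthetic observation y_k = x_k exp(alpha x_k) + eps_k, with
    spatially correlated errors eps ~ N(0, R).

    The correlation model R_jk = sigma_o^2 exp(-|j-k|/corr_len) is an
    assumption: the paper's off-diagonal formula is not recoverable here.
    """
    if rng is None:
        rng = np.random.default_rng()
    p = x_true.size
    j, k = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    R = sigma_o ** 2 * np.exp(-np.abs(j - k) / corr_len)
    eps = rng.multivariate_normal(np.zeros(p), R)
    return x_true * np.exp(alpha * x_true) + eps, R
```

The AR(1)-like form exp(-|j - k|/L) = ρ^|j-k| with ρ = e^{-1/L} guarantees a symmetric positive semi-definite R, so the multivariate normal draw is well defined.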
We added model errors to the Lorenz 96 model because they are inevitable in real dynamical systems. The model is a forced dissipative model with a parameter F that controls the strength of the forcing (Eq. 31). It behaves quite differently for different values of F, and it produces chaotic systems for integer values of F larger than 3. Thus, we used various values of F to simulate a wide range of model errors, while retaining F = 8 when generating the "true" state. The observations generated from this true state were then assimilated with F = 4, 5, ..., 12. We simulated observations every 4 time steps for 100 000 steps to ensure robust results (Sakov and Oke, 2008; Oke et al., 2009). The ensemble size is 30.

Assimilation results
In this section, we examine the following five data assimilation methods, corresponding to five different treatments of nonlinearity in inflation factor estimation and optimization:
ETKF: traditional ETKF with linear approximation (Eq. 12) and optimization (Eq. 10);
TT: tangent-linear approximation in both inflation (Eq. 17) and optimization (Eq. 20);
TN: tangent-linear approximation in inflation (Eq. 17) and nonlinearity in optimization (Eq. 10);
SS: second-order Taylor approximation in both inflation (Eq. 24) and optimization (Eq. 27);
NN: nonlinearity in both inflation (Eq. 5) and optimization (Eq. 10).
The corresponding time-mean A-RMSEs of these assimilation schemes with α = 0.1 and F = 4, 5, . . ., 12, over 100 000 time steps, are plotted in Fig. 1a.First, the figure clearly shows that for each estimation method, the A-RMSE increases as F becomes increasingly distant from the true value of 8.
On the other hand, the A-RMSEs of methods SS and TN are close, and smaller than that of method TT, suggesting that the second-order Taylor approximation method is comparable to the partial nonlinear method and is better than the first-order Taylor approximation method. Lastly, the traditional ETKF method has the largest A-RMSE, which implies that although the linear approximation is computationally more efficient, it may introduce a larger analysis error.
For the Lorenz 96 model with a large error (F = 12), the time-mean A-RMSEs and F-RMSEs of the five methods are given in Table 1, as well as the time-mean values of the objective functions. The objective function represents the second-order distance from the squared innovation statistic (d_i d_i^T) to its expectation. Generally speaking, for a more accurate assimilation scheme, the realization of d_i d_i^T should be closer to its expectation, and therefore the value of the objective function should be smaller. It can be seen that the full nonlinear method (NN) has both the smallest A-RMSE and F-RMSE, while the traditional linear approximation method (ETKF) has the largest RMSEs. The second-order Taylor approximation method (SS) performs similarly to the partial nonlinear method (TN), but better than the first-order Taylor approximation method (TT). In the majority of the cases, a smaller error corresponds to a smaller value of the objective function L. The ratios of F-RMSE to A-RMSE are also listed in Table 1; they can be considered a measure of the improvement gained at the analysis step. All the ratios are larger than 1, which indicates that the analysis state is better than the forecast state. Among all methods, the ratio is largest for the TN method, which indicates the largest error reduction at the analysis step.
To illustrate the variation in A-RMSE with respect to the parameter α, the corresponding time-mean A-RMSEs of the different assimilation schemes with F = 12 and α = 0, 0.02, 0.04, 0.06, 0.08, 0.1 are plotted in Fig. 1b. It shows that all the schemes have the same A-RMSE when α = 0 (i.e., when the observation operator is linear), indicating that there is no difference between them in this case. For each scheme, the A-RMSE increases as the parameter α increases from 0 to 0.1. The relative ranking of the schemes is basically consistent with that in Fig. 1a, and the larger the parameter α, the larger the differences between the schemes.
To investigate the consistency between F-RMSE and F-Spread, we present the time-mean values of the five methods for cases F = 12 and F = 8 in Tables 2 and 3, respectively, as well as the ratios of F-RMSE to F-Spread.It is easy to see that in all cases, the F-RMSEs are larger than the F-Spreads, and therefore, all the ratios are greater than 1.However, the ratio of the full nonlinear method (NN) is the smallest, while the ratio of the linear approximation method is the largest.The ratio of the second-order approximation method (SS) is comparable to that of the partial nonlinear method (TN), but smaller than that of the first-order approximation method (TT).This suggests that the ensemble-perturbed predictions are the most (least) reasonable for method NN (ETKF).Moreover, the ratios with F = 8 are much closer to 1 than those with F = 12, because the model error with F = 12 is much larger than that with F = 8 (see Sect. 2.3).
Table 1. The time-mean values of A-RMSE and F-RMSE, the ratio of F-RMSE to A-RMSE, and the objective function (second-order distance from the squared innovation statistic to its expectation) in the five ETKF methods for the Lorenz 96 model, with forcing parameter F = 12 and observation operator parameter α = 0.1. ETKF: traditional ETKF with linear approximation (Eq. 12) and optimization (Eq. 10); TT: tangent-linear approximation in both inflation (Eq. 17) and optimization (Eq. 20); TN: tangent-linear approximation in inflation (Eq. 17) and nonlinearity in optimization (Eq. 10); SS: second-order Taylor approximation in both inflation (Eq. 24) and optimization (Eq. 27); NN: nonlinearity in both inflation (Eq. 5) and optimization (Eq. 10).

Impacts of Taylor approximations
In Sect. 3.2, we saw that the A-RMSEs derived from the five ETKF assimilation schemes are close when F is close to the true value of 8, but differ when F departs from 8. This effect may depend on how well the Taylor expansions approximate the nonlinear observation operator H_i. For example, the Taylor expansion of the k-th component of the observation operator H_i(x) = x exp(αx) (Eq. 32) with α = 0.1 around the forecast state x̄^f_{k,i} is

x^t_{k,i} exp(0.1 x^t_{k,i}) = x̄^f_{k,i} exp(0.1 x̄^f_{k,i}) + exp(0.1 x̄^f_{k,i})(1 + 0.1 x̄^f_{k,i})(x^t_{k,i} - x̄^f_{k,i}) + (1/2) exp(0.1 x̄^f_{k,i})(0.2 + 0.01 x̄^f_{k,i})(x^t_{k,i} - x̄^f_{k,i})² + residual.

To verify how well the Taylor expansions approximate the nonlinear observation operator H_i, we calculate the ratios of the Taylor expansion residuals to x^t_{k,i} exp(0.1 x^t_{k,i}). If a ratio falls outside the interval [-0.1, 0.1], then the corresponding residual cannot be regarded as a higher-order infinitesimal, and hence cannot be ignored. Therefore, a larger proportion of ratios falling outside the interval [-0.1, 0.1] indicates a worse Taylor expansion, and vice versa.
The proportions of the ratios that fall outside the interval [-0.1, 0.1] are plotted in Fig. 2, which shows that when F = 8, the proportions are 0.0169 and 0.0006 for the first-order and second-order Taylor expansions, respectively. This result indicates that at almost all times and locations, both the first-order and second-order Taylor expansions are good approximations of x^t_{k,i} exp(0.1 x^t_{k,i}). However, when F = 12, at approximately 47 % (19 %) of the times and locations, x^t_{k,i} exp(0.1 x^t_{k,i}) cannot be approximated adequately by its first-order (second-order) Taylor expansion. Therefore, the A-RMSEs derived by the five ETKF schemes are quite different. This example also indicates that the success of the Taylor approximation method depends on both the smoothness of H_i and the range of forecast states. It seems that for the same strongly nonlinear observation operator, the larger the model error, the less successful the Taylor approximation.
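The residual-ratio diagnostic can be sketched as follows. The surrogate distributions for the true and forecast states below are assumptions chosen only to exercise the check, not the paper's experimental values; the derivatives of h(x) = x exp(0.1x) used here are exact.

```python
import numpy as np

ALPHA = 0.1

def h(x):
    return x * np.exp(ALPHA * x)

def h1(x):  # first derivative (tangent-linear)
    return np.exp(ALPHA * x) * (1.0 + ALPHA * x)

def h2(x):  # second derivative
    return np.exp(ALPHA * x) * (2.0 * ALPHA + ALPHA ** 2 * x)

def outside_fraction(truth, forecast, order, tol=0.1):
    """Fraction of points whose Taylor residual exceeds tol of h(truth)."""
    approx = h(forecast) + h1(forecast) * (truth - forecast)
    if order == 2:
        approx += 0.5 * h2(forecast) * (truth - forecast) ** 2
    ratio = (h(truth) - approx) / h(truth)
    return np.mean(np.abs(ratio) > tol)

rng = np.random.default_rng(0)
truth = rng.normal(2.0, 1.0, 10_000)              # surrogate "true" states
forecast = truth + rng.normal(0.0, 0.5, 10_000)   # forecasts with moderate error
f1 = outside_fraction(truth, forecast, order=1)
f2 = outside_fraction(truth, forecast, order=2)
```

As in the paper's Fig. 2, the second-order residuals should fall outside the tolerance band no more often than the first-order residuals, and the gap widens as the forecast error grows.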

Inflation
It is widely recognized that the initial estimates of ensemble forecast errors should be inflated to improve assimilated results.To date, however, all of the existing adaptive inflation schemes in ETKF are based on the assumption that the observation operator is linear or tangent linear (e.g., Li et al., 2009;Miyoshi, 2011).In this study, a method to estimate the multiplicative inflation factors is proposed for general nonlinear observation operators.
Historically, in systems such as the Met Office ETKF (Flowerdew and Bowler, 2011), the need for inflation arises primarily due to spurious correlations that cause the raw analysis ensemble to be severely underspread even when the background ensemble is well spread.In this case, therefore, inflation must be applied to the analysis ensemble to respond correctly to the actual analysis uncertainty in the nonlinear forecast step.Inflation of the background ensemble may be more appropriate when the inflation primarily represents forecast model error, although stochastic physics or additive inflation may also be appropriate in this case (Hamill and Whitaker, 2005;Wu et al., 2013).
Our choice to inflate the background ensemble is crucial to the ability to find a direct nonlinear solution for Eqs. (5)-(7), because of the way the inflation factor appears in these equations. The objective function for estimating the multiplicative inflation factors is the second-order distance between the expectation of the squared innovation and its realization, which also makes the rms spread equal to the rms error (e.g., Palmer et al., 2006; Wang and Bishop, 2003; Flowerdew and Bowler, 2011).
The proposed nonlinear method is tested using the Lorenz 96 model with nonlinear observation systems (Sect. 3.2). The resulting A-RMSEs are clearly smaller than those obtained with the first-order Taylor approximation in the estimation of the inflation factor. This indicates that the proposed full nonlinear inflation method is better than the first-order Taylor approximation inflation method in the case of nonlinear observation operators (i.e., method NN is better than method TN). In addition, the F-RMSE and F-Spread of the proposed nonlinear method are more consistent than those of the first-order Taylor approximation method. The second-order approximation method for estimating inflation factors while using the nonlinear optimization scheme is also investigated. The corresponding A-RMSE is 2.20 for the forcing parameter F = 12 and the observation operator parameter α = 0.1, which is larger than that of method TN and smaller than that of method NN.
The proposed inflation methods work well in cases where the observation errors are spatially correlated. Some data assimilation schemes assume the observation error covariance matrix to be diagonal for simplicity and ease of computation (e.g., Anderson, 2007, 2009), but in many applications the observation error covariance matrix has nonzero off-diagonal entries (Miyoshi et al., 2013). The inflation method proposed in this study can be applied to assimilate such observations.
In many practical experiments, the inflation factor is constant in time, and is chosen by trial and error to give the assimilation the most favorable statistics (e.g., Anderson and Anderson, 1999). To test this fixed-tuned inflation method, suppose x̄^a_i(λ) and x̄^f_i(λ) are the analysis state and the forecast state obtained using a time-invariant inflation factor λ. Then, two statistics (based on the analysis and forecast errors, respectively) are minimized to tune λ. When Eq. (10) is minimized to estimate the weights of the perturbed analysis states, the corresponding A-RMSEs of the two fixed-tuned methods are 2.97 and 2.85, respectively, which are larger than that of method SS (2.29). The ratios of F-RMSE to F-Spread are 3.14 and 3.45, respectively, which are also larger than the 1.80 of method SS (see Table 1). All these facts indicate that the empirical estimation method for the inflation factor is not as good as method SS.

Second-order Taylor approximation
In Sect. 3.2, we showed that the ETKF scheme equipped with our proposed nonlinear inflation method leads to the smallest A-RMSE among all ETKF schemes analyzed in this study. However, this ETKF scheme requires repeated calculation of the nonlinear observation functions H_i(x^f_{i,j}) and H_i(x̄^f_i + √λ_i X^f_i w) when minimizing the objective functions L_i(λ) and J_i(w). To reduce the computational cost, a commonly used approach is to substitute H_i by its tangent-linear operator (i.e., its first-order Taylor expansion). However, this approach comes at the cost of losing estimation quality, as we have shown in this study.
In an effort to strike a balance between estimation quality and computational cost, the nonlinear observation operator H_i in the objective functions L_i(λ) and J_i(w) is substituted by its second-order Taylor expansion. This is because (1) the second-order Taylor expansion is a better approximation of H_i than its tangent-linear operator; (2) with the second-order Taylor expansion, the inflation factor λ and the weight vector w are moved outside of H_i, so the objective functions (Eqs. 24 and 27) become polynomials, for which a minimum is easier to derive; and (3) the second-order derivative of H_i is already required to estimate the ensemble analysis states (Eq. 11) in the ETKF scheme, so its computation is not an additional task.
The accuracy of the ETKF scheme with the second-order Taylor approximation is examined in Sect. 3.2. The results suggest that the scheme is more accurate than the ETKF scheme based on the first-order Taylor approximation, and is comparable with the scheme based on nonlinear optimization and tangent-linear multiplicative inflation. However, it is less accurate than the ETKF scheme with nonlinear optimization and nonlinear inflation estimation proposed in this study. On the other hand, both schemes have similar ratios of F-RMSE to F-Spread.
Despite the advantage that the objective functions (Eqs. 24 and 27) are easier to minimize, the computational cost of the ETKF with the second-order Taylor approximation may increase because of the computation of the second-order derivative of H_i. Because the most typical nonlinear observation operator in numerical weather prediction is the radiative transfer model (RTTOV), the related computational issue is discussed and documented in Appendix E. In fact, unlike forecast operators, observation operators are usually localized, and therefore the computation of the second-order term remains tractable. For observation operators that are not localized, the computation of the second-order term may be complex.
In addition, there are other ways to address this problem. For example, in the deterministic variational framework, the Met Office re-linearizes the observation operator every 10 iterations (Rawlins et al., 2007), and ECMWF uses a nonlinear outer loop. Both approaches retain the efficiency of a tangent-linear approximation in the inner loop, while allowing for nonlinearity at a higher level. To better understand the efficacy of the ETKF scheme with the second-order Taylor approximation, a more careful comparison with alternative assimilation schemes is necessary. We plan to address this challenge in the near future.

Caveats
This study assumes the inflation factor to be constant in space, but this is apparently not the case in many practical applications, specifically when observations are sparse. Applying the same inflation value to all state variables may overinflate the forecast errors of the state variables without observations (Hamill and Whitaker, 2005; Anderson, 2009; Miyoshi et al., 2010; Miyoshi and Kunii, 2012). If the forecast model has a large error, multiplicative inflation alone may not be effective enough to improve the assimilation results. In this case, additive inflation and the localization technique may be applied to improve the assimilation quality further (Wu et al., 2013).
This study also assumes that the analysis increment can be expressed as a linear combination of the ensemble forecast errors (Eq. 8). This assumption is true if the observation operator is tangent linear, but a nonlinear observation operator can affect the combination of possible increments that produce the optimal analysis (Yang et al., 2012). For general nonlinear or even non-smooth radiative transfer operators (Steward et al., 2012), the utility of higher-order elements in a Taylor expansion may be questionable. Also, the development of the second-order term may be time-consuming and difficult in the case of complex observation operators, especially when the observation operators cannot be localized.
Last but not least, the results of this study are obtained with the Lorenz 96 experiment and should not be regarded as general rules. However, they can serve as counterexamples with which to test some ideas.

Conclusions
In this study, a new approach to inflating the ensemble forecast errors is proposed for the ETKF with a nonlinear observation operator. For an idealized model, it is shown that the proposed inflation approach can reduce the analysis error compared with the tangent-linear multiplicative inflation, despite being computationally more expensive. An ETKF scheme with the second-order Taylor approximation is also proposed. In terms of analysis error, this scheme outperforms the first-order Taylor approximation ETKF scheme and the traditional ETKF scheme, especially when the model error is larger. However, it is comparable to the scheme based on nonlinear optimization and tangent-linear multiplicative inflation. The proposed ETKF scheme with nonlinear optimization and nonlinear inflation was found to be the best among all schemes presented in this study. Finally, the proposed method is computationally feasible for assimilating satellite observations with radiative transfer models as the nonlinear observation operators (see Appendix E), which are broadly used in atmospheric, ocean and land data assimilation.
In future studies, we plan to investigate further the computational efficiency of the proposed ETKF schemes and to validate them using more sophisticated dynamical models and observation systems.

Appendix A: Derivation of Eq. (6)
The estimation of the inflation factor $\lambda$ is based on the innovation statistic normalized by the square root of the observation error covariance matrix, where $y_i^o$, $x_i^f$ and $x_i^t$ are the observation, forecast and true state vectors at the $i$th time step, respectively, and $H_i$ is the observation operator. The mean value of $d_i d_i^T$ is then taken, where $E$ is the expectation operator. In particular, if the observation operator is a linear matrix ($\mathbf{H}_i$), Eq. (A2) can be simplified, where $I$ is the $p_i \times p_i$ identity matrix. Then, the covariance matrix of the random vector $d_i$ can be expressed as a second-order regression equation (Wang and Leblanc, 2008), where the residual is a zero-mean error matrix. The expectation in Eq. (A4) can be decomposed. Assuming the forecast and observation errors are statistically independent, and noting from Eq. (2) that the corresponding term is the observation error at the $i$th time step, the cross terms vanish. In a perfect system, the truth would be statistically indistinguishable from one of the ensemble forecast states, but in a real system this is not guaranteed. Hence, an inflation factor is used to adjust the ensemble forecast states $x_{i,j}^f$. Because the ensemble forecast states may be regarded as sample points of $x_i^t$ (Anderson, 2007), substituting Eqs. (A5)-(A9) into Eq. (A4) shows that the second-order moment statistic of the error can be expressed in terms of the first-order derivative of $H_i$ evaluated at $x_i^f + \lambda_i X_i^f w$. Then, the second-order derivative of $J_i(w)$ follows, where $A$ is an $m \times m$ matrix whose $(k, l)$ entry is as given. The notation "$\otimes$" denotes an outer product operator of the block matrix defined in Eq. (23).
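For the linear-operator special case above, the innovation statistic yields a simple moment-matching estimate of the inflation factor. The following is a minimal Python sketch of that idea, not the paper's exact algorithm; the function and variable names are illustrative, and a linear observation operator is assumed.

```python
import numpy as np

def estimate_inflation(d, HXf, R):
    """Moment-matching estimate of a multiplicative inflation factor,
    sketching the linear-operator special case of Appendix A.

    d   : normalized innovation R^{-1/2} (y_i^o - H_i x_i^f), shape (p,)
    HXf : ensemble perturbations mapped to observation space, scaled
          by 1/sqrt(m - 1), shape (p, m)
    R   : observation error covariance matrix, shape (p, p)
    """
    p = d.size
    # R^{-1/2} via the Cholesky factor R = L L^T, so L^{-1} R L^{-T} = I
    L_inv = np.linalg.inv(np.linalg.cholesky(R))
    S = L_inv @ HXf
    # E[d d^T] = lambda^2 S S^T + I  implies  E[d^T d] = lambda^2 tr(S S^T) + p
    lam_sq = (d @ d - p) / np.trace(S @ S.T)
    return np.sqrt(max(lam_sq, 0.0))
```

A single innovation vector gives a noisy estimate; in practice the statistic is accumulated over many assimilation steps before the factor is applied.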
is the second-order derivative of $H_i$ at $x_i^f + \lambda_i X_i^f w$.

Appendix C: Details of the first-order approximation method in Sect. 2.2.1

Suppose $H_i$ can be approximated by its first-order Taylor expansion at $x_i^f$. The term $C_i(\lambda)$ in Eq. (6) can then be simplified, and it follows that the objective function $L_i(\lambda)$ of Eq. (5) also simplifies. Because $L_{1,i}(\lambda)$ is a quadratic function of $\lambda$ with a positive quadratic coefficient, the inflation factor can easily be expressed in closed form. Similarly, substituting Eq. (C3) into Eq. (8), the objective function $J_i(w)$ simplifies to $J_{1,i}(w)$. Setting its first-order derivative to zero and solving yields the analysis weights, and the second-order derivative of $J_{1,i}(w)$ follows directly.

Appendix D: Details of the second-order approximation method in Sect. 2.2.2

Suppose $H_i$ can be approximated by its second-order Taylor expansion at $x_i^f$. The notation "$\otimes$" is defined as in Eq. (23). The term $C_i(\lambda)$ in Eq. (7) can then be simplified in terms of two $p_i \times p_i$ matrices that are independent of $\lambda$.
It follows that the objective function $L_i(\lambda)$ of Eq. (5) can be expressed accordingly. Similarly, substituting Eq. (D6) into Eq. (10), the objective function $J_i(w)$ simplifies to $J_{2,i}(w)$. The first-order derivative of $J_{2,i}(w)$ involves $B_1$, a $p_i \times m$ matrix whose $(k, l)$ entry is as given, and the second-order derivative of $J_{2,i}(w)$ involves $B_2$, an $m \times m$ matrix whose $(k, l)$ entry is as given.
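For reference, the closed-form weight solution of the first-order (tangent-linear) case in Appendix C coincides with the standard ETKF update. Below is a minimal Python sketch with illustrative names, assuming the background term of the objective is $(m-1)\,w^T w / 2$; it is a sketch of the standard solution, not a transcription of the paper's code.

```python
import numpy as np

def etkf_weights(dy, HX, Rinv):
    """Closed-form analysis weights for the tangent-linear case
    (a standard ETKF solution, sketching Appendix C).

    dy   : innovation y_i^o - H_i(x_i^f), shape (p,)
    HX   : observation-space ensemble perturbations H'_i X_i^f, shape (p, m)
    Rinv : inverse observation error covariance, shape (p, p)
    """
    m = HX.shape[1]
    # Setting the gradient of the quadratic objective J_1(w) to zero gives
    # w = [(m - 1) I + (HX)^T R^{-1} HX]^{-1} (HX)^T R^{-1} dy
    A = (m - 1) * np.eye(m) + HX.T @ Rinv @ HX  # second derivative of J_1
    return np.linalg.solve(A, HX.T @ Rinv @ dy)
```

The matrix `A` here is the second-order derivative of the quadratic objective, so it is symmetric positive definite and the linear solve is well posed.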

Appendix E: Computational feasibility
We take RTTOV as an example of an observation operator in numerical weather prediction to discuss the computational feasibility of the ETKF with the second-order approximation assimilation method. Generally speaking, the ensemble size $m$ ranges from tens to hundreds, the dimension of observations (including gauge observations and Advanced Microwave Sounding Unit, AMSU, brightness temperatures) $p_i$ is hundreds of thousands, and the dimension of the state vector $n$ is tens of millions. If neither the storage nor the number of multiplications for computing any array scales as $n \times n$, $n \times p_i$ or $p_i \times p_i$, the computation should be feasible.
In our proposed ETKF with the second-order approximation, the most expensive part is computing the array in Eq. (E1). Therefore, we only discuss the problems related to the computation of $(X_i^f w)^T \ddot{H}_{k,i}|_{x_i^f}\, X_i^f w$.

E1 Storage problems
By the matrix multiplication rule, the matrix in the middle of the right-hand side of Eq. (E2), given in Eq. (E3), is of dimension $m \times m$. Because the subscript $k$ runs from 1 to $p_i$, the size of the array in Eq. (E1) is $m \times m \times p_i$. Therefore, there is no storage problem in saving this array.

E2 The computational cost of Eq. (E3)
Usually, $mn(m + n)$ multiplications are required to compute a matrix such as the one in Eq. (E3). However, in the case of the RTTOV observation operator, $\ddot{H}_{k,i}|_{x_i^f}$ is a sparse matrix with a large number of zeros, and the non-zero part has a simple regular structure. This is because a microwave sounding unit (MSU) brightness temperature measurement at a grid point, denoted by $y_i^o(k)$, is only related to the meteorological state variables on the transmission route. Suppose the meteorological model has 50 layers and 6 types of variables; the number of state variables on the transmission route of the MSU brightness temperature $y_i^o(k)$ is then approximately $s = 300$. For the variables not on the transmission route, the corresponding entries of $\ddot{H}_{k,i}|_{x_i^f}$ (Eq. 22) are zero. Therefore, the computation of Eq. (E3) only requires $ms(m + s)/2$ multiplications.
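The cost saving from this sparsity can be sketched as follows: restricting the ensemble perturbations to the $s$ state variables on the transmission route before forming the quadratic form replaces an $O(mn(m+n))$ computation with an $O(ms(m+s))$ one. The Python sketch below uses illustrative names and index layout, not RTTOV's actual interface.

```python
import numpy as np

def hessian_quadratic_form(Xf, H_k_sub, idx):
    """Compute X_f^T Hdd_k X_f for one observation k, exploiting the
    sparsity described in Appendix E2: the n x n second derivative of
    the k-th observation is nonzero only on the s state variables along
    the transmission route (a sketch; names are illustrative).

    Xf      : ensemble perturbations, shape (n, m)
    H_k_sub : dense s x s block of the second derivative on the route
    idx     : indices of the s state variables on the route
    """
    Xs = Xf[idx, :]            # restrict to the s relevant rows: (s, m)
    return Xs.T @ H_k_sub @ Xs # m x m result at O(ms(m + s)) cost
```

Repeating this over all $p_i$ observations produces exactly the $m \times m \times p_i$ array discussed in Appendix E1, without ever forming the full $n \times n$ second derivative.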
On the other hand, computing the first and second derivatives requires additional operations, but the cost is manageable.

Figure 1. (a) Time-mean values of the A-RMSE as a function of forcing $F$ for different assimilation methods in the Lorenz 96 model with the observation operator (Eq. 32) and parameter $\alpha = 0.1$. (b) Time-mean values of the A-RMSE as a function of parameter $\alpha$ for different assimilation methods in the Lorenz 96 model with $F = 12$. ETKF: traditional ETKF with linear approximation (Eq. 12) and optimization (Eq. 10) (cyan line); TT: tangent-linear approximation in both inflation (Eq. 17) and optimization (Eq. 20) (red line); TN: tangent-linear approximation in inflation (Eq. 17) and nonlinearity in optimization (Eq. 10) (green line); SS: second-order Taylor approximation in both inflation (Eq. 24) and optimization (Eq. 27) (blue line); NN: nonlinearity in both inflation (Eq. 5) and optimization (Eq. 10) (black line). The ensemble size is 30.

Figure 2. The proportions of residual ratios of the first-order (solid line) and second-order (dotted line) Taylor expansions of the nonlinear observation operator $x_{k,i}^t \exp(0.1\, x_{k,i}^t)$ that fall outside the interval $[-0.1, 0.1]$, as a function of forcing $F$.

Table 2. The time-mean values of F-RMSE and F-Spread, and the ratio of F-RMSE to F-Spread, in the four ETKF schemes for the Lorenz 96 model, with forcing parameter $F = 12$ and observation operator parameter $\alpha = 0.1$.

Table 3. Similar to Table 2, but with $F = 8$.