Strategies for coupling global and limited-area ensemble Kalman filter assimilation

This paper compares the forecast performance of four strategies for coupling global and limited area data assimilation: three strategies propagate information from the global to the limited area process, while the fourth strategy feeds back information from the limited area to the global process. All four strategies are formulated in the Local Ensemble Transform Kalman Filter (LETKF) framework. Numerical experiments are carried out with the model component of the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) and the NCEP Regional Spectral Model (RSM). The limited area domain is an extended North-America region that includes part of the north-east Pacific. The GFS is integrated at horizontal resolution T62 (about 150 km in the mid-latitudes), while the RSM is integrated at horizontal resolution 48 km. Experiments are carried out both under the perfect model hypothesis and in a realistic setting. The coupling strategies are evaluated by comparing their deterministic forecast performance at 12-h and 48-h lead times. The results suggest that the limited area data assimilation system has the potential to enhance the forecasts at 12-h lead time in the limited area domain at the synoptic and subsynoptic scales (in the global wave number range of about 10 to 40). There is a clear indication that between the forecast performance of the different coupling strategies those that cycle the limited area assimilation process produce the most accurate forecasts. In the realistic setting, at 12-h forecast time the limited area systems produce more modest improvements compared to the global system than under the perfect model hypothesis, and at 48-h forecast time the global forecasts are more accurate than the limited area forecasts. Correspondence to: D. Merkova (dagmar.merkova@nasa.gov)


Introduction
An atmospheric limited area model uses time-dependent lateral boundary conditions provided by a global atmospheric model. In current practice, the initial conditions for the limited area model are either analyses prepared using the global model and interpolated to the higher resolution grid of the limited area model, or analyses prepared by using a data assimilation system specifically designed to produce initial states for use by the limited area model. In the latter case, the analysis inside the limited area domain is obtained independently of the global analysis (e.g., Torn et al. 2006;Zhang et al. 2006;Huang et al. 2009). The aforementioned two approaches are motivated by the practical constraint that most weather prediction centers and research groups who run limited area models have access to global analysis products, but do not have the capability to produce global analyses. The only exceptions are a handful of operational NWP centers, e.g., the National Centers for Environmental Prediction (NCEP), who prepare both global and limited area analyses, but, mainly for practical reasons, follow one of the two aforementioned approaches.
In this paper, we consider the scenario in which we have access to both the global and the limited-area model and a model-independent data assimilation system. Our goal is to begin to address the problem of finding that configuration of the coupling between these three components of the forecast system, which provides the best global and limited area model forecasts. In particular, we compare the forecast performance of the system for different coupling strategies using both simulated and operationally used observations of the atmosphere. In our experiments, the global model is the model component of the Global Forecast System (GFS) of the National Centers for Environmental Prediction (NCEP) (Sela 1980) integrated at a T62L28 (about 150 km ) horizontal resolution, the limited area model is the Regional Spectral Model (RSM) of NCEP (Juang 1992;Juang and Kanamitsu 1994;Juang et al. 1997;Juang and Hong 2001) integrated at 48 km and L28 resolution, while the data assimilation system is the Local Ensemble Transform Kalman Filter (Ott et al. 2004;Hunt et al. 2007;Szunyogh et al. 2008). We choose the NCEP RSM for this study, because it has the most consistent dynamics, among all limited area models, with that of the NCEP GFS model. In particular, the two models share the same physical parametrization packages and the GFS model solution affects the RSM solutions not only at the lateral boundaries, but also in the entire limited area domain. We design numerical experiments to start assessing the forecast value added by the limited area assimilation.
The structure of the paper is as follows. Section 2 describes the coupling strategies we consider in this study. Section 3 explains the design of the numerical experiments that we carry out to assess the performance of the different coupling strategies. The results of the numerical experiments obtained for the perfect model scenario are presented in section 4, while the results obtained with assimilating observations of the real atmosphere are reported in section 5. Our conclusions are summarized in section 6.

Coupling Strategies
To design strategies for the coupling of a global and a limited area data assimilation system, we assume that the higher resolution limited area model provides a more accurate representation of the atmospheric dynamics in the limited area than does the global model.
Our goal is to take advantage of the availability of this presumed better model information within the limited area to improve the quality of the analyses. We introduce our strategies assuming that the data assimilation component is based on an ensemble transform algorithm (e.g., Bishop et al. 2001;Hunt et al. 2007). While Strategy 1 is a conventional uncoupled approach, which could be easily implemented using any data assimilation algorithm, and Strategy 2 and 3 could be implemented using any ensemble-based schemes, Strategy 4 takes advantage of the fact that ensemble transform algorithm provides a straightforward way to propagate information from the limited area data assimilation process to the global process.

a. Global and limited area model dynamics
The global model dynamics g, defined by propagates an estimate x g (t) of the global atmospheric state between an initial time t i and a final time t f . The components of x g (t) are the spatially discretized atmospheric state variables (e.g., temperature, components of the wind vector, surface pressure, humidity variables, etc.). The limited area model dynamics f, defined by propagates an estimate x l (t) of the atmospheric state in a limited area sub-domain of the globe at a resolution that is higher than that of x g (t). We introduce the notation for the difference between the high resolution and the global state estimate in the limited area domain. In Eq. 3, L is the mapping from the state space of the global model onto the state space of the limited area model. In practice, this mapping is an interpolation from the lower resolution grid of the global model to the higher resolution grid of the regional model.
While the limited area model resolves motions at scales that are smaller than the smallest scales resolved by the global model, there are scales that contribute to both x l (t) and x g (t).
Thus, x ′ l (t) cannot be simply considered to be a small scale perturbation to the global state vector in the limited area domain.

b. The motivation for coupled data assimilation
The derivation of the version of the LETKF which is considered in this study is based on the assumption that a model can provide a perfect representation of the dynamics of the observed system (Ott et al. 2004;Hunt et al. 2007). An implementation of the scheme on a numerical weather prediction model inevitably violates this assumption. One particular source of the error is the spatial discretization of the dynamics: the atmospheric state at time t is represented by a spatially continuous vector field u(t), while a model uses a finitedimensional discretization x(t) of u(t) assuming that a suitable projection P: x t (t) = P [u] exists. (Here, the superscript t indicates that x t (t) is the model state representation of the true atmospheric state.) Thus the finite-dimensional model dynamics g and f ignore an infinite number of interactions associated with the unresolved flow components. While parametrization of the sub-grid (unresolved) processes are designed to account for the effects of the unresolved scales on the resolved scale (e.g., Kalnay 2002), in general, a higher resolution model is expected to provide a more accurate representation of the atmospheric dynamics. The motivation for employing a limited area model is to provide a more accurate representation of the atmospheric dynamics in a limited area domain of particular interest.
Our intended purpose in coupling the global and limited area data assimilation processes is to take advantage of the presumed superiority of the limited area model in the limited domain to improve the accuracy of the limited area analyses.

c. Ensemble transform data assimilation schemes
An ensemble-based data assimilation system obtains the state estimate at analysis time t n in two steps: (i) in the forecast step, a prior estimate of the state, called the background, and an estimate of the uncertainty in the background are obtained by propagating information from the previous analysis time t n−1 to t n = t n−1 + ∆t using the model dynamics; and (ii) in the state update step, the prior estimates of the state and its uncertainty are updated based on the observations collected in the time window [t n − ∆t/2, t n + ∆t/2].
Formally, the forecast step involves preparing a K-member ensemble of background forecasts {x b(k) (t n ), k = 1, . . . , K}. For instance, in a global data assimilation system where {x a(k) g (t n−1 ), k = 1, . . . , K} are the members of the analysis ensemble at the previous analysis time t n−1 . The background statex b (t n ) is defined by the ensemble mean, while the uncertainty in the estimatex b (t n ) is described by the ensemble based estimate of the background error covariance matrix. Here X b (t n ) is the matrix whose k-th column is In an ensemble-transform-based data assimilation scheme the ensemble mean analysis is obtained byx where the "weight vector" w a (t n ) is the value of w that minimizes the quadratic cost function Here, y o (t n ) is the vector of observations assimilated at time t n and the observation operator h(x) maps the model representation of the atmospheric state to observables at observation times. The observation operator is assumed to satisfy where the vector of Gaussian random variable e(t n ) with mean 0 and covariance matrix R(t n ) represents the observation noise. In practice, h(x) is an interpolation of the model variables from the model grid points to the locations and times of the observations and a conversion of the model variables to the observed quantities. 1 In addition to the analysisx a (t n ), an ensemble transform scheme also generates an en-1 Because the observations assimilated at time t n are collected in the time window t ∈ [t n − ∆t/2, t n + ∆t/2], the model is integrated for a time 3 2 ∆t from t n−1 to provide a background trajec- semble of analysis perturbations by The analysis perturbations, which are the columns of X a (t n ), are added tox a (t n ) to obtain the members of the analysis ensemble x a(k) (t n ); k = 1, ..., K. One approach to compute the weight vector w a (t n ) and the weight matrix W a (t n ) is through a square-root Kalman filter algorithm (e.g., Tippett et al. (2003)).

d. Coupling strategies
In all four configurations of the coupling considered in this paper, the global background ensemble is obtained by equation 4. In the first three strategies we describe, the coupling is in one direction: the limited area data assimilation process uses information provided by the global analysis at the current or the previous analysis time, but the the limited area analysis has no effect on the global analysis at the current or future analysis times. In the fourth strategy, the global analysis within the limited area domain is prepared using information from the limited area analysis, thus feeding back information from the limited area data assimilation process to the global data assimilation process. In our description of coupling strategy 4, we make use of the fact that both the mean analysis and the analysis ensemble members can be computed by linearly combining the background ensemble perturbations.

1) Strategy 1: Limited area analysis by spectral interpolation
The limited area analysisx a l (t n ) is obtained by interpolating the global ensemble mean analysis to the higher resolution grid of the limited area model: In this configuration, although the global model is run in an ensemble mode, only a single limited area run is prepared using the mean of the global ensemble solution to provide the large scale forcing. In this configuration, the limited area model can outperform the global model if it can develop predictable flow features in response to the higher resolution boundary forcing terms in the limited area domain.

2) Strategy 2: Non-cycled limited area analysis
Members of the global analysis ensemble at t n−1 are interpolated to the higher resolution model grid of the limited area model to obtain a limited area analysis ensemble: This limited area analysis ensemble is then propagated forward in time by the limited area model to obtain the limited area background ensemble: A limited area analysisx a l (t n ) is then prepared by applying Eqs.
; applying the same weights to the limited area and global ensemble members, we increase the chance of preserving the dynamical balance between the global ensemble member x a(k) g (t n ) and the high resolution perturbation x ′ (k) l (t n ) during the data assimilation process.
The feedback may also improve the global analysis in the area near and within the limited area domain. In particular using the high-resolution model fields to obtain the global analysis may reduce the effect of the representativeness errors in the observations. There are two practical issues that have to be addressed when implementing Strategy 4. First, using the weights from the higher resolution limited area analysis in the global analysis requires an algorithm to map the weights from the high resolution grid to the lower resolution global grid. Second, abrupt changes may occur in the weights near the boundaries of the limited area domain. This can be addressed by implementing a blending process that smoothes the changes in the weights near the boundary of the limited area domain (see section 3.b).

Experiment design
First, we briefly introduce the three main components of our coupled analysis-forecast system: the global GFS model, the limited area RSM model and the LETKF data assimilation system. Then, we describe the design of the numerical experiments, the observational data sets we assimilate, and the verification scores we use to evaluate the different coupling strategies.

1) The model component of the GFS
The GFS consists of a model and a data assimilation component, but in this study we use only the model component. The dynamical core of the model is described in Sela (1980). The model has been upgraded numerous times since the nineteen-eighties, mainly to improve the physical parametrization and the computational performance, but the general solution strategy of the dynamical core has remained the same. In particular, the model uses the spectral transform technique to solve the model equations; that is, the nonlinear terms and the terms associated with parametrized physical processes are computed on a grid, while the spatial derivatives are computed in spectral space using spherical harmonics for the representation of the atmospheric fields. For vertical discretization the model uses a sigma coordinate system. We integrate the model using a triangular truncation with a cut-off wave-number of 62 and 28 vertical sigma levels (T62L28). At this spectral resolution the nominal resolution of the model (the grid spacing) is about 150 km in the midlatitudes, but, because of the use of a scale dependent diffusion to maintain a realistic kinetic energy spectrum , the small-scale components of the fields are artificially dampened.

2) RSM
The RSM is a perturbation model. That is, it predicts the evolution of a high resolution perturbation to the lower resolution global model solution and obtains the high-resolution model forecast by adding the forecast perturbation to the global forecast (Juang 1992;Juang et al. 1997). For the computation of the sum of the perturbation and the global fields, the spherical harmonics that represent the global fields in spectral space are directly transformed to the grid points of the NCEP RSM.
In the RSM, the time evolution of the perturbation is governed by the nonlinear inter-

b. LETKF
In the LETKF, the state update step of the Kalman filter is performed independently for each component of the state vector (Ott et al. 2004;Hunt et al. 2007). A key step of the LETKF algorithm is the selection of the set of observations that are considered when updating the estimate of a given state vector component. In practice, the different state vector components at a given grid point are analyzed in one step and in situ observations are selected for assimilation if they are closer to the grid point than a given distance. The assimilation of nonlocal radiance observations with the LETKF is also possible, but for those observations the observation selection is done in a different way (Fertig et al. 2008). In this study, we use the same set of LETKF parameters in both the global and the limited area data assimilation system as we used in the global system described in Szunyogh et al. (2008).
The number of ensemble members is K = 40, observations are assimilated if they are in a 800 km radius of the grid point, and the inverse of the assumed observational error variance is tapered linearly from its original value to zero between a distance of 500 and 800 km (thus tapering the effects of observations on the analysis that are further away than 500 km). The initial ensemble members are sampled from a free run with the NCEP GFS.
The one important difference between our implementation of the LETKF on the GFS and the RSM is that in the GFS implementation we employ a digital filter (Lynch and Huang 1992) to control free gravity wave oscillations, but we cannot perform such a filtering of the high-resolution limited area fields, because a digital filtering capability is not available for the RSM.
In our implementation of strategy 4, we compute the weights w a g (t n ) and W a g (t n ) for the global analysis within the limited area domain by taking the algebraic mean of the weights at the four closest grid point of the high resolution grid. We found that the blending procedure applied by the RSM to the model fields results in a sufficiently smooth transition of the weights of the global system near the boundaries. Thus, applying a blending algorithm directly to the weights was deemed not necessary.

c. Observational data set
The observational data set is identical to the one used in Szunyogh et al. (2008). It includes all conventional (non-radiance) measurements that were operationally assimilated at NCEP between 1 January 2004

Results for the perfect model scenario
We first compare the performance of data assimilation systems based on Strategies 1-3.
Then we compare the performance of the two systems based on Strategy 3 and 4.
a. The comparison of Strategies 1, 2 and 3 In Figures 5 and 6, we show the vertical profile of the root-mean-square error at 12-hr and 48-hr forecast times for the temperature and the two horizontal components of the wind.
The results suggest that all three limited area strategies provide forecasts, which are more accurate than the global forecast. On average, Strategy 3 provides more accurate forecasts than Strategy 2, and Strategy 2 provides more accurate forecasts than Strategy 1. While all three limited area forecast systems maintain their large advantage over the global system for the entire 48-hr, the difference between the performance of the three limited area systems is smaller at 48-hr than at 12-hr forecast time.
We show the spatial distribution of the forecast improvements introduced by the increas-ingly more sophisticated limited area data assimilation process for the geopotential height at the 300 and 500 hPa (Figures 7 and 8) and for the temperature at 850 hPa pressure level ( Figure 9). Using the limited area model only to prepare the forecasts (Strategy 1) consistently improves all verified forecast parameters in the verification domain (see panels a and d of the Fig. 7-9 c. Spectral analysis Figure 11 shows the spectral distribution of the error in the zonal wind forecasts for strategies 1, 2 and 3 at both the 12-h and the 48-h forecast times. This figure is produced the same way as Fig. 4, except that the Fourier transform is applied to the errors in the high-resolution wind forecast perturbation. Since the error in the large scale component of the forecast is the same for all three strategies, this figure illustrates the difference between the spectral distribution of the errors in the high resolution forecast for three strategies.
Results are not shown for Strategy 4, because in that case the difference between errors in the large scale forecasts also contributes to the difference between the errors for Strategy 4 and not for the other strategies.
The most striking feature of Fig. 11 is the large advantage of the system that cycles the limited area analysis (strategy 3) at 12-hr forecast time in the wave number range 10-30.
This result indicates that the LETKF coupled with the RSM can more skillfully predict the covariance in the wave number range 10 − 30 when the analysis is cycled. At 12-hr forecast time, the difference between the performance of the different configurations of the system is small at the longest resolved scales (wave numbers larger than 10) and at the shortest resolved scales (wave numbers larger than 60). There is no real difference at 48-hr between the performance of the three configurations, with the exception of a slight advantage of the cycled system (strategy 3).

Results with observations of the real atmosphere
a. Comparison of Strategies 1, 2, and 3 Verification results for our analysis-forecast experiments using observations of the real atmosphere are shown in Figs. 12 and 13. Overall, the limited area systems perform slightly better than the global system at 12-hr forecast time, while the global system performs better than the limited area systems at 48-hr forecast time. The difference between the performance of the limited area systems and the global system is larger for the two components of the wind than for the temperature. In particular, the clear advantage of the limited area systems for the zonal component of the wind below the 300 hPa level at 12-hr lead time turns into a clear disadvantage by the 48-hr forecast time. Another interesting feature of the verification results for the two components of the wind is the big advantage of the global system in the upper troposphere (above 300 hPa), most of which disappears by the 48-hr forecast time.
One possible explanation is that the poorer performance of the limited area systems at 12hr forecast time may be due to vertically propagating spurious gravity waves. Such waves may play a more important role in the limited area model than in the global model either because of the the lack of initialization or because of a less careful tuning of the mountain drag parametrization for the higher resolution orography of the RSM. A further investigation of this issue is beyond the scope of the present paper.
Similar to the results for the perfect model scenario, there is not much difference between the systems based on the different coupling strategies at 48-hr lead time. However, the picture is very different from what we observed for the perfect model scenario at 12-hr forecast time: Strategy 3, which performed the best under the perfect model scenario, performs the worst in the realistic case, while Strategy 2 maintains its slight advantage over Strategy 1.
This suggest that the RSM at the tested resolution is not a sufficiently better model than the GFS in the limited area to compensate for the problems that arise at the boundaries in Strategy 3.

b. Comparison of the Strategy 3 and 4
The comparison of the performance of strategies 3 and 4 with real observations is shown in Figs. 14 and 15. In these figures, we show the impact of the feedback on both the limited area and global forecasts (the two curves without feedback are the same as in Figs. 14 and 15).
We note that some caution should be exercised when interpreting the results shown in this pair of figures: the difference between the errors shown in these figures are statistically not significant when tested using the approach of Szunyogh et al. (2008). That test compares the time (sample) mean of the instantaneous differences between the root-mean-square-errors for variance of the two configurations at the different verification times to the variance of the same differences. The failure of the test indicates that the differences in the errors are not due to consistent differences at the different verification times. Instead, they are the net result of differences that are highly variable in magnitude and sign.
Interestingly, at 12-hr forecast time, the feedback has a much larger effect on the performance of the global forecast than on the performance of the limited area forecast. In particular, while the global forecasts of the temperature is clearly degraded by the feedback above 300 hPa and below 700 hPa, and the two horizontal wind components above 500 hPa are clearly degraded, the feedback improves the global forecasts of the two wind components

Conclusions
This paper documents our first attempt at exploring the potential benefits of coupling the global and limited area ensemble Kalman Filter data assimilations. To the best of our knowledge, ours is the first study that considers a feedback from the limited area data assimilation process to the global process. We carried out analysis-forecast experiments under a perfect model scenario, where the limited are model was considered to be perfect in the limited area domain and the global model errors is considered to be perfect elsewhere.
We also carried out experiments in a realistic setting. Our most important findings for the perfect model scenario are the following: • In the limited area domain, the limited area systems based on the different coupling strategies perform better than the global system. The advantage of the limited area systems is much larger at 12-hr lead time than at 48-hr lead time.
• Preparing a limited area analysis with a cycled limited area system enhances the performance of the limited area forecast system. The main benefit of cycling the limited area analysis may be that, through the inverse energy cascade in the two-dimensional inertial range, it can provide information about the effects of uncertainties of the smallest resolved scales on the uncertainties at synoptic and sub-synoptic scales (global wave number 10-30). A single analysis cycle does not provide sufficient time to achieve a similar effect.
Our additional findings from our tests using real atmospheric data are the following: • The results with observations of the real atmosphere confirmed that the limited-area data assimilation has potentially larger benefits at the shorter forecast times (12-hr vs. 48-hr in our experiments). The advantage of the limited area systems is smaller than in the perfect model scenario at 12-hr forecast time and has a disadvantage at 48-hr forecast time.
• Our attempt to feed back information from the limited area analysis to the global analysis led to mixed results. The feedback improved the 48-hr high resolution wind forecast under the perfect model scenario and the meridional large scale wind forecast at 48-hr in the realistic scenario, but also led to considerable degradation of some of the other verified atmospheric variables.
We emphasize that we consider the current study only to be the first step toward exploring the benefits of coupling the global and limited area data assimilation process. One potential extension of this study would be to increase the ratio of the resolution of the two models from the current 1:3 ratio (48 km vs. about 150 km). Since, in the current system the cutoff wave number for both models is within the inertial range of two-dimensional turbulence, the regional model does not really bring in new physics compared to the global model.
Increasing the resolution of the limited area model to a range where some of the nonhydrostatic processes are explicitly resolved would bring in a new source of kinetic energy (convection), as well as the effects of three-dimensional turbulence. Bringing in new physics could reduce the representativeness component of the observation errors with respect to the limited area model dynamics. This, in turn, could be expected to increase the potential benefits of feeding back information from the limited area data assimilation system to the global data assimilation system. One particular area of research where we expect such an approach to be especially beneficial is in the verification of interaction between a tropical cyclone and the large scale flow. We are currently in the process of testing our coupled data assimilation system for such a scenario.      5. Vertical profile of the root-mean-square forecast error in the limited area domain at 12-hr forecast time for the global forecast (solid) and for the limited area forecasts with coupling Strategies 1 (long dashes and dots), 2 (short dashes) and 3 (dots). Fig. 7. The difference between the root-mean square errors of the geopotential height forecasts at the 300 hPa level for the different configurations of the analysis system at 12-hr and 48-hr lead times. Shown are the difference between the forecasts started from the global analysis and the limited area analysis of strategy 1 (panel a and d), from the limited area analyses of strategies 1 and 2 (panel b and e), from the limited area analyses of strategies 2 and 3 (panel c and f ). Where the values are positive the forecast from latter analysis is more accurate. Also shown is the mean flow at the 300 hPa level for the "true states" in the verification period (contours). Fig. 8. Same as Fig. 7, except for the geopotential height forecast at the 500 hPa level. Fig. 9. Same as Fig. 7, except for the temperature at the 850 hPa level. Fig. 10. The difference between the root-mean square errors of the 48-hr forecasts started from the analyses obtained by Strategy 4 and Strategy 3. Results are shown for the limited area geopotential height forecasts at the 300 hPa (panel a) and the 500 hPa (panel b), and the limited area temperature forecasts at 850 hPa (panel c); for the global geopotential height forecasts at 300 hPa (panel d) and 500 hPa (panel e) and the global temperature forecast at 850 hPa (panel f ). Strategy 4 provides more accurate forecasts where the shades indicate positive values. Contour show the time mean of the true geopotential height at the given level. Fig. 11. The kinetic energy spectrum of the forecast error with respect to the global wave number at 12-hr and 48-hr forecast lead times in a log-log scale. Shown is the error for strategy 1 (blue dashes), strategy 2 (green dashes and dots) and strategy 3 (red dots). The straight solid line with slope -3 indicates the scaling law for the kinetic energy in the inertial range for two-dimensional turbulence. Fig. 12. Vertical profile of the root-mean-square forecast error in the limited area domain at 12-hr forecast time for the global forecast (solid) and for the limited area forecasts with coupling strategies 1 (dashes and dots), 2 (dashes) and 3 (dots) assimilating observations of the real atmosphere.