A framework for variational data assimilation with superparameterization

Superparameterization (SP) is a multiscale computational approach wherein a large scale atmosphere or ocean model is coupled to an array of simulations of small scale dynamics on periodic domains embedded into the computational grid of the large scale model. SP has been success5 fully developed in global atmosphere and climate models, and is a promising approach for new applications. The authors develop a 3D-Var variational data assimilation framework for use with SP; the relatively low cost and simplicity of 3D-Var in comparison with ensemble approaches makes 10 it a natural fit for relatively expensive multiscale SP models. To demonstrate the assimilation framework in a simple model, the authors develop a new system of ordinary differential equations similar to the two-scale Lorenz-‘96 model. The system has one set of variables denoted {Yi}, with large 15 and small scale parts, and the SP approximation to the system is straightforward. With the new assimilation framework the SP model approximates the large scale dynamics of the true system accurately.


Introduction
Superparameterization (SP) is a multiscale computational method for parameterizing small scale effects in large scale atmosphere and ocean models.It was originally developed and has been particularly effective as a cloud parameterization in atmosphere models (Grabowski and Smolarkiewicz, 1999;Randall et al., 2003), and has been implemented in global atmosphere and climate models (Khairoutdinov and Randall, 2001;Tao et al., 2009;Randall et al., 2013).SP cou-ples a large scale, low resolution model to an array of local small scale, high resolution simulations embedded within the computational grid of the large scale model.The computational cost is kept down through a variety of methods, most prominently by reducing the dimensionality of the small scale simulations, e.g., using one vertical and one horizontal coordinate in the aforementioned atmospheric applications.Although atmosphere and climate models with SP are particularly successful at producing a realistic Madden-Julian oscillation and diurnal cycle of convection over land (Khairoutdinov et al., 2005), there are as yet no data assimilation systems designed for use with these models.Instead, the large scale variables are initialized from state estimates generated with non-SP models and the small scale variables are initialized with small-amplitude noise (Khairoutdinov et al., 2005).Once the SP model has been initialized, there is no practical framework for combining observational data with the multiscale model forecast to produce a new initial condition.
The authors recently developed an ensemble Kalman filter framework for data assimilation with SP (Grooms et al., 2014, hereafter GLM14).This framework was developed in the context of stochastic SP, a variant of SP that reduces computational cost by replacing the small scale simulations of SP with quasilinear stochastic models (Grooms and Majda, 2013;Majda and Grooms, 2014).Stochastic SP has only been developed for idealized turbulence models (Grooms andMajda, 2013, 2014a, b;Grooms et al., 2015), and is not yet implemented in global atmosphere, ocean, or climate models.The relatively high cost and computational complexity of global atmosphere and climate models with SP and the extra cost associated with an ensemble-based data assimilation system makes it unlikely that it will be possible to use Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
I. Grooms and Y. Lee: A framework for variational data assimilation with superparameterization these models with the framework of GLM14 in the near future.Here we develop a 3D-Var variational data assimilation framework for SP that builds on and modifies the framework of GLM14.
Observations of physical variables have large scale and small scale parts, the former of which is equated with the large scale model variables, and the latter with the variables of the small scale embedded simulations.A key feature of SP is that the small scale simulations are periodic, so a location on the small scale computational grid does not correspond precisely to any location in the real physical domain; as a result, the small scale simulations provide only statistical information about the small scales, and this information can be used as a prior in the data assimilation context.In GLM14 an ensemble of SP simulations provides prior information on the large scale variables, but in the present approach the prior information on the large scales comes from a single SP simulation and a time-independent "background" covariance matrix for the large scale variables.When the observation operator is linear the analysis estimates of the large and small scale variables can be computed independently of each other, and the small scale covariance information effectively provides a time-and state-dependent estimate of representation error.When the observation operator is nonlinear the large and small scale analysis must be computed simultaneously by minimizing an objective function.Although analysis estimates of the small scale variables can be computed with linear observations, and must be computed with nonlinear observations, our framework does not at this time use the small scale analysis estimate to update any of the small scale SP variables because the latter cannot be unambiguously associated with any real physical location.A key update of the GLM14 framework is that we here compute a small scale analysis estimate at locations where observations are available, rather than at every coarse grid point.This can result in significant computational savings in the case of a nonlinear observation operator.We also update the GLM14 framework to better handle observations at locations off the coarse grid.
The complex superparameterized atmosphere and climate models mentioned above are not particularly convenient for the development of a new data assimilation framework, and existing toy models of SP are of limited utility for this purpose.In Sect. 2 we develop a new system of ordinary differential equations based on the two-scale Lorenz-'96 (L96) model (Lorenz, 1996(Lorenz, , 2006)), and an SP approximation to that system.This new model serves as a test bed in which to demonstrate our new SP 3D-Var framework.The 3D-Var framework with SP is presented in Sect.3, and assimilation experiments using the new framework and the new system are described in Sect.4, followed by conclusions.

A multiscale Lorenz-'96 model with superparameterization
In this section we develop a new simple model for SP in which to demonstrate our data assimilation framework.Majda and Grote ( 2009) developed an idealized model of SP, but the system suffers from one major drawback: it does not consist of an SP approximation to an idealized system, but rather consists only of an idealized SP model.Harlim and Majda (2013) used the model of Majda and Grote (2009) to develop a data assimilation strategy for SP, but with the assumption that direct observations of the large scale variables were available, rather than having both large and small scale contributions to the observations.Lee and Majda (2015) have recently investigated a range of multiscale assimilation methods in a highly condensed model where the "large scale" consists of a single scalar with no spatial extent.Wilks (2012) developed an SP approximation for the two-scale Lorenz-'96 system, which has the following form (Lorenz, 1996(Lorenz, , 2006)): The X k variables have periodicity X k = X k+K , and the Y j,k variables have periodicity Y j +J,k = Y j,k+1 and Y j,k+K = Y j,k , where j = 1, . . ., J and k = 1, . . ., K. The combined index j + J (k − 1) is naturally associated with spatial location along a latitude circle, and the local aver- serves to separate large and small spatial scales.
This system is primarily useful as a two-timescale model, since for large c the Y j,k variables are faster than the X k variables.Wilks's SP approximation to this system reflects this fact by treating the Y j,k variables as purely small-scale; also, in his SP approximation the periodicity of the Y j,k variables is replaced by defining Y 0,k = Y J +1,k = Y J +2,k , which are set to a constant value.Considering that the multiscale nature of SP is primarily based on spatial scale separation rather than timescale separation, a more natural SP approximation to the two-scale Lorenz-'96 system might make the Y j,k variables locally periodic: Y j +J,k = Y j,k .Nevertheless, there would still be two sets of large scale variables (X k , and the j -average of Y j,k ) but only one set of small scale variables (Y j,k minus its j average).Rather than bend the twotimescale Lorenz-'96 model to our two-space-scale purpose, we develop a new two-space-scale version of the Lorenz-'96 model that is more naturally suited to an SP approximation.
The new model is defined by the following equation: where Y = {Y i } J K i=1 , where 1 J K is a vector of length J K with all elements equal to 1, T is a matrix in R K×J K , and the index i, which is periodic Y i+J K = Y i , is analogous to spatial location on a latitude circle, similar to the original L96 model (Lorenz, 1996(Lorenz, , 2006)).The nonlinear functions N Y and N X are defined as where Eqs. ( 4) and ( 5) are evaluated assuming periodicity for the vectors X = {X k } K k=1 and Y : X k+K = X k and Y i+J K = Y i .The matrix T extracts the large scale part of Y ; we choose to let T be defined as the projection onto the first K discrete Fourier modes, followed by evaluation on an equispaced grid of K points.The large scale dynamics are obtained by applying T to Eq. (3) from the left: where we define the large scale component X = T Y , and use that J T T T is the identity matrix and that T 1 J K = 1 K (these are true for our choice of a Fourier projection, but other choices of T are possible).Note that when h = 0 the dynamics are those of the single-scale Lorenz-'96 model with K modes, and when h = 0 the nonlinearity N Y (Y ) couples large and small scales.Energy conservation for the nonlinear terms in Eq. ( 3) is obtained by noting that Eq. ( 4) implies Y T N Y (Y ) = 0, and that Eq. ( 5) implies , and it is convenient to define notation for the small scale part of Y : The superparameterization approximation is governed by where Y j,k , and there is local as well as global periodicity: Y j +J,k = Y j,k and X k+K = X k .The large scale dynamics in the SP approximation are obtained by javeraging Eq. ( 8), which gives When h = 0 the large scale dynamics of the SP approximation and the true system are equivalent.As in more complex SP applications, the small scale variables (here Y j,k − X k ) are locally periodic, and are coupled to the large scale using a local average over a periodic domain in a manner analogous to the coupling in more complex SP models (e.g., Grabowski, 2004).The X k variables in the SP model attempt to accurately model the dynamics of X in the true system, but the small scale variables of the SP approximation are only statistically related to the small scale variables of the true system; i.e., one does not expect an SP variable Y j,k to be a direct approximation of any of the true system variables Y i .The purpose of this research is not to study the SP approximation in this system, but rather to use the system as a test bed for our data assimilation framework.We therefore choose to focus on parameter regimes where the SP approximation is reasonably accurate, setting J = 128 so that there is a good scale separation (the SP approximation should break down for small J ).The number of large scale modes is set to K = 41; we choose 41 rather than the usual 40 so that the discrete Fourier modes associated with the large scale variables are 0, ±1, . . ., ±20, and the twentieth mode is not split between large and small scales.It remains to choose F and h.In general, for fixed nonzero h the small scale variables become more chaotic and larger amplitude as F increases, and similarly for fixed F as h increases.As the small scales become more chaotic and larger amplitude, the large scale variables become less chaotic.This behavior is perhaps counterintuitive, but similar behavior has been observed in the two-scale Lorenz-'96 system by Abramov (2012).Balancing the desire for complex large scale dynamics and turbulent small scale dynamics, we choose to focus on two parameter regimes: Some characteristics of the dynamics in regimes I and II are presented in Figs. 1 and 2, respectively.In regime I the large scale dynamics consist of a train of eight propagating and nonlinearly interacting "waves", as seen in the time series of the X variables in Fig. 1a.The large scale dynamics of the SP approximation are qualitatively similar, as shown in Fig. 1b.The time-lagged autocorrelation function of the X k variables (averaged over k) is shown in Fig. 1e, and displays an oscillatory structure associated with the wave train.The initial decay of the time-lagged autocorrelation is approximated by an exponential of the form exp{−(λ + iω)t} with decorrelation time λ −1 = 0.84 and oscillation period 2π/ω = 0.71; the resurgence of correlation between 6 and 8 time units is associated with the time it takes a single wave to propagate once around the domain.The regularity of the wave train is also reflected in the space-lagged autocorrelation function for the X k variables shown in Fig. 1f, which is well approximated by the SP dynamics.Figure 1c shows the Y i variables at an instant of time (blue), along with the large scale part (red; the projection onto the first 41 Fourier modes) and the X k variables (yellow circles).There is clearly strong small scale variability, but not so strong that it completely obscures the large scale pattern, and the amplitude of the small scale variability varies over the domain.Figure 1d shows the time-averaged energy spectrum | Ŷκ | 2 , where Ŷκ is the discrete Fourier coefficient of Y i with wave number κ.There is a clear separation in amplitude between the large scale Fourier modes (κ ≤ 20) and the small scale modes, showing that the large scale energy is concentrated near wave numbers κ = 7 and 8, while the small scale energy is more broadly distributed among Fourier modes.The broad distribution of small scale energy among Fourier modes is indicative of the strongly chaotic small scale dynamics, as is the rapid temporal decorrelation of the small scale variables y i shown in Fig. 1e.The decorrelation time of the small scale variables y i is estimated as 0.2 using the integral of the time-lagged autocorrelation function.
In regime II the large scale dynamics are more chaotic, though wave trains are still evident in the time series of X in Fig. 2a.The large scale dynamics of the SP approximation are again qualitatively similar, as shown in Fig. 2b.The time-lagged autocorrelation function of the X k variables in Fig. 2e decays much more rapidly than in regime I.The initial decay of the time-lagged autocorrelation is approximated by an exponential of the form exp{−(λ + iω)t} with decorrelation time λ −1 = 0.38 and oscillation period 2π/ω = 0.95, and there is no resurgence of correlations at long lag times.The decreased regularity of the wave train is reflected in the space-lagged autocorrelation function for the X k variables shown in Fig. 2f, which is again well approximated by the SP dynamics.The snapshot of the Y i variables in Fig. 2c shows a diminished level of small scale variability overall, with some regions having almost no small scale activity and others having strong small scale variability.The energy spectrum in Fig. 2d shows that the energy is more broadly distributed among large scale Fourier modes, though there is still a peak at wave number κ = 8.The broad distribution of small scale energy among Fourier modes is again indicative of the strongly chaotic small scale dynamics, as is the rapid temporal decorrelation of the small scale variables y i shown in Fig. 2e.The decorrelation time of the small scale variables y i is estimated as 0.23 using the integral of the time-lagged autocorrelation function.
The Y i variables have a uniform time mean of 3.8 and 3.6 in regimes I and II, respectively, which is accurately reproduced by the SP approximation.The X k variables have variance 31 and 32 in regimes I and II, respectively, and their SP counterparts have slightly higher variances of 33 and 34.The small scale variables y i have climatological variance of 70 in regime I and 29 in regime II, though Figs.1c and 2c show that this variability is unevenly distributed over the physical domain at any given instant.

Variational data assimilation with superparameterization
The primary difficulty in developing a data assimilation framework for an SP model is that observations of the true system include contributions from large and small scales, and it is necessary to relate the observations to the large and small scale variables of the SP model.GLM14 provided a frame-work for relating observations to SP model variables, and we improve on this framework below.
Let the large scale variables of the SP simulation be denoted u (the overbar does not denote a statistical mean), and let the small scale variables be denoted u.In the context of the new Lorenz-'96 model, u = X and u = {Y j,k − X k } j,k .In most SP applications there is a set of small scale variables at every point of the large scale computational grid.The small scale variables exist on local periodic domains so that the small scale variables at each coarse grid point are disconnected from those at surrounding coarse grid points, and the small scale variables have zero average across the periodic directions.Each location in the small scale periodic domains does not correspond to a different location in the real physical domain.Instead, all points in a given periodic domain are best thought of as existing at one physical location: the associated coarse grid point.
In GLM14, observations are related to the SP model variables using the following observation model where H is the observation operator and ε is a vector of zeromean normal random variables associated with observation error.The vector u has the same size as u, and models the small scale contribution to physical variables at the coarse grid points; i.e., u = u + u is the vector of real physical variables at the coarse model grid points.The physical variables u are interpolated to the location of the observations by L. The vector u is not the same as the small scale SP variables u.Instead, the mean and covariance of u are computed from the statistics of the small scale SP variables u.Although the true small scale variables u can in principle have nonzero statistical mean, the small scale SP variables u always have zero mean because their average over the local periodic domains is always zero by definition.For example, in the context of the new Lorenz-'96 model the GLM14 version of the vector u has length K, has zero mean, and has a diagonal covariance with entries As noted in GLM14, it is unrealistic to use the same interpolation operator for both the large and small scale variables because it assumes that the small scale variables vary smoothly between the coarse grid points, whereas the small scale variables should by definition vary over shorter distances.(Observations in GLM14 were taken only on the coarse grid points, avoiding the issue.)Instead of specifying an alternative interpolation operator for the small scales, we update the framework by altering the definition of u to include small scale variables only at the points where observations are taken.We also assume that the statistics of the small scale variables vary on large scales and can therefore be smoothly interpolated from the coarse grid points, where small scale SP statistics are available, to the locations of the observations.
Let P denote the number of different physical locations where observations are available (for simplicity of exposition, we assume that there is only one observation per location, i.e., v ∈ R P ).The updated observation model for the pth location is where L p interpolates the large scale model variables u to the observation location and ε p is a zero-mean Gaussian random variable.There is thus one vector u p of small scale variables per observation location.The updated observation model for all P observations can be written in vector form as where u is no longer defined as in GLM14, but according to the discussion above.
The covariance of the small scale variables P is computed from the small scale variables of the SP model, and thus changes from one assimilation cycle to the next.Specifically, it is first assumed that the small scale variables at different observation locations are uncorrelated from each other so that one needs only compute the covariance matrices P p of the u p variables.This assumption is reasonable as long as the observations are taken at locations reasonably well separated compared to the correlation length of the small scale variables.(The framework could be updated for situations where the observations are closer than this, e.g., by using spatial correlation information for the small scale variables computed from the SP simulation, but this is beyond the scope of the present investigation.)To compute P p we begin by computing auxiliary small scale sample covariance matrices P k using the small scale SP variables u at each coarse grid point.Let { u k,j } J j =1 be the small scale SP variables located in a periodic domain at the kth coarse grid point, where there are J grid points in the periodic embedded domain.Then, recalling that their average over J is zero, the auxiliary small scale sample covariance matrix is where the superscript T denotes a vector transpose.It is typically the case that J is large enough that P k is full rank, and we do not consider exceptions here.In the context of the new Lorenz-'96 model the auxiliary small scale sample covariances are given by Eq. ( 11).Finally, the small scale covariance matrices at the observation locations P p are obtained by interpolating the elements of the matrices P k from the coarse grid to the locations of the observations, which assumes that the small scale statistics vary smoothly on the large scale.The interpolation method used to interpolate the small scale covariance matrices need not be the same as L, and should have positive coefficients in order to ensure that the small scale covariance matrices remain positive definite.(It may not be necessary to compute sample covariance matrices P k at every coarse grid point; one only needs to compute them at points needed in the interpolation.)For comparison, in GLM14 the covariance of the small scale variables P is the same size as the large scale background covariance B, and consists of the auxiliary small scale sample covariance matrices P k arranged in block-diagonal form.When observations are taken at every coarse grid location the GLM14 formulation is equivalent to the new one.
To complete the specification of the 3D-Var framework we specify a prior joint distribution for u and u with mean and covariance As typical in a 3D-Var setting, the background covariance matrix B for the large scale variables is independent of time, and the prior mean for the large scales is given by a single forecast of the large scale part of the SP model.The small scale variable u is assumed to be uncorrelated with the large scale variable.In practice, the large and small scale variables are certainly not independent, but as shown in GLM14 the assumption that they are uncorrelated is reasonable within the context of an SP model where the small scale variables have zero mean.To wit, the joint probability distribution of large and small scale variables can be factored into the large scale marginal and the small scale conditional distributions p(u, u ) = p M (u) p C (u |u).The cross-covariance between large and small scale variables is where the term in square brackets is zero because the small scale variables are assumed to have zero mean regardless of the state of the large scale variables.
Having thus specified the observation model and prior mean and covariance, the 3D-Var analysis estimate of the system state minimizes the following objective function (Talagrand, 2010): where R is the covariance matrix of the observation error vector ε.
When the observation operator is linear, H = H, the analysis can be computed from the Kalman filter formulas (Talagrand, 2010), which in this case gives where the superscript "a" denotes the analysis estimate.A key feature of these formulas is that the large scale and small scale estimates can be computed independently.In particular, the large scale estimate can also be computed as the minimizer of the following objective function: In cases where the small scale estimate is not used and the observation operator is linear, the small scale estimate does not need to be computed.It can be seen from Eqs. ( 20) and ( 23) that the observed small scale covariance matrix H P H T acts as a time-varying estimate of the representation error since it inflates the measurement error covariance R.
In GLM14 the small scale covariance matrix P is defined differently (as described above) and the small scale vector u is the same size as the large scale vector u.In the GLM14 formulation the final term in the objective function Eq. ( 18) is replaced by For linear observations the GLM14 versions of the Kalman filter formulas are In the new approach there is one set of small scale variables for each location where observations are available, whereas in GLM14 there are small scale variables at each coarse grid point.In global atmosphere and climate models there are typically fewer observations than coarse grid points; when the observation operator is nonlinear the new formulation is more efficient because the objective function has fewer degrees of freedom.Another key difference is in the assumptions that go into the specification of the small scale background covariance: in GLM14 the small scale variables are tacitly assumed to vary smoothly over the physical domain, since they are smoothly interpolated between coarse grid points, whereas in the present approach only the small scale covariance is assumed to vary smoothly over the domain.

Assimilation experiments
In this section we describe data assimilation experiments in both regimes of the test model using the 3D-Var framework from Sect. 3.
Observations are taken at P = MK equispaced points with M = 1, 2, and 4; specifically, observations are taken at i p = 1 + pJ /M for p = 1, . . ., P .Observations are either linear, with v p = Y i p + ε p , or nonlinear, with v p = (Y i p + 30) 2 /50 + ε p .In both cases the observation errors ε p are iid Gaussians with zero mean and variance 0.1.Observations are assimilated every t time units.In regime I we test t = 0.2 and 0.6; for comparison the decorrelation times of the small scale and large scale variables in this regime are 0.2 and 0.84.In regime II we test t = 0.2 and 0.4, which are close to the decorrelation times of the small scale and large scale variables, respectively.
Specification of the background covariance matrix is a crucial aspect of any 3D-Var assimilation system.We consider the simplest possible estimate B = σ 2 I K where I K is the K × K identity matrix and σ 2 is a tunable parameter.Assimilation experiments are run over a range of σ 2 and the optimal value is chosen based on rms (root mean square) errors; the results are very weakly sensitive to σ 2 as long as it is within a factor of 2 of the diagnosed forecast error variance.Since our observing system includes at least one observation for every X k variable, it is less important to build a background covariance matrix with correlations between the X k variables.
A single assimilation experiment consists of 1000 cycles, where the SP variables for the first forecast are initialized directly from the true model variables.Although the assimilation system provides estimates of the small scale part of the true system at the location of the observations, this information is far from sufficient to provide an estimate of the full state Y of the true system.We view the 3D-Var assimilation as primarily aimed at estimating the large scale model variables X k , and error statistics are tracked only for the large scale variables.We track two performance metrics for the large scale variables, the time averaged rms error and the time averaged pattern correlation both for the forecast and for the analysis.As a point of comparison for the performance of the forecast in the assimilation experiments, we consider climatological values of rms error and pattern correlation defined using As a point of comparison for the performance of the analysis estimate in the assimilation experiments we take a "smoothed observation" estimate that is obtained by projecting the observations onto the largest K Fourier modes.For example, when M = 1 there are K observations and the "smoothed observation" estimate of the X k variables is simply X k ≈ v p for the linear case and X k ≈ 50v p − 30 for the nonlinear case.The rms errors in the smoothed observation estimate are tracked over the course of each assimilation experiment, rather than computing climatological values.The 3D-Var should at a minimum perform better than the smoothed observations.The results for both regimes are presented in Tables 1 and 2 in the format Forecast→Analysis.In all cases the errors decrease as M increases, and the analysis significantly improves over the forecast.
We also compare to the performance of an ensemble adjustment Kalman filter using the true system dynamics.These experiments and their results are described in Section "Comparison to EAKF".
The large scale dynamics are more predictable in regime I than in regime II, but the small scale variance is larger as well, making it harder to obtain an accurate estimate of the large scales.With a short observation time t = 0.2, the forecast and analysis for linear and nonlinear observations both have rms errors smaller than both the climatological error In regime II the results with the linear and nonlinear observations are very similar in all cases.With a short observation time t = 0.2, the forecast is always more accurate than the climatological mean, and the analysis is always more accurate than the smoothed observations.With a longer observation time t = 0.4 the forecasts are no more accurate than climatology, but the analysis is still more accurate than the smoothed observations, though at M = 4 the analysis is only slightly more accurate.

Comparison to EAKF
To put the foregoing results into perspective we compare to the results of an ensemble adjustment Kalman filter (EAKF; Anderson, 2001) using an ensemble of 100 simulations of the true, non-SP model dynamics.The experiments were run with relatively frequent ( t = 0.2), relatively plentiful (M = 4), linear observations in an effort to obtain the best possible results.Experiments were run with multiplicative covariance inflation factors from 0 to 20 % and covariance localization radii of two, four, and six grid points (Gaspari and Cohn, 1999), and optimal results were obtained with 5 % inflation and a localization radius of four grid points.
In regime I the rms forecast errors of the X k variables were 5.1, decreasing to 4.6 after the analysis; the rms forecast pattern correlation was 0.61, improving to 0.69 after the analysis.In regime II the rms forecast errors of the X k variables were 5.9, decreasing to 5.6 after the analysis; the rms forecast pattern correlation was 0.53, and remained essentially unchanged at 0.52 after the analysis.
In both regimes the EAKF estimates the large scale part of the solution very poorly, much worse than the SP 3D-Var.This poor performance is presumably associated with the fact that the EAKF is attempting to estimate the full system state Y , whereas the SP 3D-Var is only estimating the large scale part.From the point of view of the EAKF the observations are very sparse, since there is only one observation for every 32 variables, whereas from the point of view of the SP 3D-Var there are four observations for every large scale X k variable.The significant improvement in both cost and accuracy of using the SP 3D-Var instead of a perfect-model ensemble Kalman filter underscores the utility of the present approach, though it bears noting that one should be very hesitant to extrapolate results such as these to the far more complex setting of SP atmosphere models.Furthermore, whether or not SP 3D-Var will be more accurate than a state of the art ensemble Kalman filter in an atmospheric model context is somewhat beside the point since the goal here is to provide a practical framework for data assimilation with SP models where no such framework currently exists.

Conclusions
Superparameterization (SP) is a multiscale computational approach that has been successfully applied to modeling atmospheric dynamics, and that shows promise for more general applications (Tao et al., 2009;Randall et al., 2013;Majda and Grooms, 2014).Grooms et al. (2014) have developed an ensemble Kalman filter framework for use with SP.However, the standard approach to SP in global atmosphere and climate models, where small scale nonlinear dynamics are simulated on an array of periodic domains embedded in the computational grid of a large scale model, is too computationally demanding for use in an ensemble framework.As a result, there is at present no practical framework for data assimilation with SP models.We here develop a 3D-Var variational data assimilation framework for SP that builds on and modifies the framework of GLM14.The main update to the GLM14 framework, in addition to using a variational as opposed to ensemble Kalman filter setting, is that small scale estimates are computed at locations where observations are taken, rather than at every point of the large scale model's computational grid.The computational costs of the new framework are such that it could be used with computationally demanding global atmosphere and climate SP models.
The data assimilation framework is demonstrated in a new system of ordinary differential equations based on the twoscale Lorenz-'96 model (Lorenz, 1996(Lorenz, , 2006)).Unlike the two-scale Lorenz-'96 model the new model has only one set of variables, Y i , and these variables have large and small scale parts.An SP approximation to the new system is developed, which is perhaps the simplest idealized model of SP.The new data assimilation framework is tested in two regimes of the new model, with both linear and nonlinear observation operators.In regime I the large scale dynamics consist of a weakly chaotic wave train, with relatively strong small scale variability superposed.In regime II the large scale dynamics are more strongly chaotic, and there is less small scale variability.In both regimes the data assimilation performs as expected (and better than an ensemble Kalman filter using 100 simulations of the true dynamics), with increased accuracy as the number of observations increases.
Our work lays a foundation for 3D-Var data assimilation with existing SP models.In order to implement our framework with an SP atmosphere or climate model, it would be necessary to specify an appropriate background covariance matrix for the large scale model, but this should be straightforward given the extensive use of the 3D-Var approach in atmosphere and ocean data assimilation (e.g., Kalnay, 2002;Kleist et al., 2009).In addition, the new framework removes one of the difficulties associated with development of a 3D-Var framework for large scale models: the small scale simulations in the multiscale SP computation provide direct information on the small scale statistics, obviating, or at least simplifying, the need to develop models of representation error.

Figure 1 .
Figure 1.Climatological statistics in regime I. (a) Time series of the X k variables.(b) Time series of the X k variables from the SP approximation.(c) A snapshot showing Y i (blue), the large scale part of Y i defined by projection onto the first 41 discrete Fourier modes (red), and the X k variables (yellow circles).(d) Time-averaged energy spectrum | Ŷκ | 2 where Ŷκ is the discrete Fourier coefficient of Y i with wave number κ. (e) Time-lagged autocorrelation functions for X k (blue) and the small scale part of Y i (red), defined by projecting out the first 41 Fourier modes.(f) Space-lagged autocorrelation functions for X k from the true dynamics (blue) and the SP approximation (red).

Table 1 .
Results of the assimilation experiments for regime I.There are P = MK equispaced observations, assimilated at time intervals of t , and σ 2 is the amplitude of the background covariance matrix.For comparison, the climatological rms error and pattern correlation are 5.6 and 0.57.= 3.8 in regime I and X k = 3.6 in regime II.The climatological rms error is simply the square root of the climatological variance: 5.6 in regime I and 5.7 in regime II.The climatological pattern correlation is the time averaged pattern correlation between X k and its uniform climatological mean value: the climatological pattern correlation is 0.57 in regime I and 0.53 in regime II.If the forecast has a larger rms error or smaller pattern correlation than the climatological values, then the forecast is of very limited utility.

Table 2 .
Results of the assimilation experiments for regime II.There are P = MK equispaced observations, assimilated at time intervals of t , and σ 2 is the amplitude of the background covariance matrix.For comparison, the climatological rms error and pattern correlation are 5.7 and 0.53. the error in the smoothed observation estimate.The nonlinear observations generate slightly more accurate results than the linear observations when M = 1, and the linear observations generate slightly more accurate results for M = 4, but overall the results are similar.With a longer observation time t = 0.6 the results are, naturally, less accurate.In every case the analysis is more accurate than both the climatological error and the smoothed observations, but the forecasts are more accurate than the climatological mean only with M = 4.With M = 1 and 2, the rms forecast errors are worse than the climatological error, but the forecast pattern correlations are still a bit better than the climatological pattern correlation.As with the shorter observation time, the results are more accurate with the nonlinear observations.