An ETKF approach for initial state and parameter estimation in ice sheet modelling

. Estimating the contribution of Antarctica and Greenland to sea-level rise is a hot topic in glaciology. Good estimates rely on our ability to run a precisely calibrated ice sheet evolution model starting from a reliable initial state. Data assimilation aims to provide an answer to this problem by combining the model equations with observations. In this paper we aim to study a state-of-the-art ensemble Kalman ﬁlter (ETKF) to address this problem. This method is implemented and validated in the twin experiments framework for a shallow ice ﬂowline model of ice dynamics. The results are very encouraging, as they show a good convergence of the ETKF (with localisation and inﬂation), even for small-sized ensembles.


Introduction
Antarctica and Greenland account for a significant fraction of today's sea-level rise (Shepherd et al., 2012).Good estimates of their future contribution are therefore crucial to producing sea-level change forecasts, as underlined by Hanna et al. (2013).Producing pertinent estimates of polar ice sheet contribution to sea-level change relies on our ability to run a precisely calibrated ice sheet evolution model starting from a reliable initial state.Calibrating the model means identifying as best we can the unknown parameters of the model (bedrock topography, basal friction law), and finding a reliable initial state means of identifying the model variables that vary in time (ice thickness).
In this work we will not address calibration of the full comprehensive model.Indeed, this problem covers a wide range of research fields: numerical analysis, mass-balance modelling, thermo-mechanical modelling.We will address the problem of initial state and parameter estimation.There exists a wide range of methods to solve this problem, all of them looking to produce a good fit between the estimated present state and the available observations.
There is a variety of models dealing with ice flow; all of them are based on the quasi-static equilibrium equation and consider ice as a viscous fluid with a nonlinear viscosity.Most models take advantage of the very small aspect ratio of ice sheets and use a thin layer approximation, but they differ on the order of the approximation.We use here the simplest approximation (shallow ice approximation).Although simple and fast, it preserves the nonlinearity of the system and is sensitive to the parameters we want to estimate, bedrock elevation and basal drag.
The Greenland and Antarctic ice sheets are huge ice bodies, more than 1000 km wide with harsh climatic conditions that limit field observations.We rely consequently on satellite observations that provide a global coverage.The two major characteristics that can be used for our study are surface elevation and ice velocities at the ice sheet surface.Although in this work we use synthetic experiments and generate our own observations, we stay as close as possible to the availability, resolution and uncertainties of real observations.Ice sheet surface elevation has been monitored by radar (ERS-1, ENVISAT) or laser (ICESat) satellite altimetry for approximately 20 years.The spatial resolution depends on the instrument but is typically kilometric (smaller along track).The measurement errors for these observations are usually small: for instance for the DEM (digital elevation model) of Antarctica, 42 % of the map has an RMS smaller than 2 m, but RMS can be up to 130 m in mountainous regions; see Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.Griggs and Bamber (2009).Ice velocity is obtained by radar interferometry and maps are available over the major part of both ice sheets with a kilometric resolution (Rignot et al., 2011for Antarctica, Joughin et al., 2010 for Greenland).The uncertainty ranges from 1 to 17 m per year (Rignot et al., 2011).Beside satellite observations, there are also localised measurements of the bedrock topography generally obtained by radar measurements from planes and thus restricted to flight lines.A bedrock topography map has been recently published gathering all the measurements from any country (Fretwell et al., 2013).The coverage is however still heterogeneous, from closely spaced flight lines in some places to huge regions with no flight at all.The map comes thus with an associated error map, ranging from 20 m (on measured points) up to 1000 m (in unobserved areas), and it is worth noting that this error map is obtained thanks to the kriging method, with no ice flow consideration.
Data assimilation (DA) covers the methods used to combine model and data in order to estimate initial states or parameters.There are two main classes of DA algorithms: variational, based on optimal control theory (the prototype being 4D-Var), and sequential, based on optimal statistical estimation (the prototype being the Kalman filter).DA is widely known in weather and oceanography forecasting, but its introduction in glaciology is fairly recent, in particular for the initial state estimation problem for sea-level rise.MacAyeal (1992) and MacAyeal (1993) introduced control methods to infer basal drag in ice-stream models, using in particular the self-adjoint property of such models, leading to many application papers (Rommelaere and MacAyeal, 1997;Vieli and Payne, 2003), and later for full Stokes models (Morlighem et al., 2010;Jay-Allemand et al., 2011).Later on, many DA and inverse methods were introduced in glaciology.The Best Linear Unbiased Estimation (BLUE) and Optimal Interpolation (OI) methods were introduced by Arthern (2003) and Berliner et al. (2008).The Robin inverse method due to Chaabane and Jaoua (1999) has been introduced by Arthern and Gudmundsson (2010) for ice sheet models, and finally, Heimbach and Bugnion (2009) presented the first adjoint ice sheet model derived automatically.
As the ice surface elevation is pretty well (in terms of accuracy and data density) observed by satellite sensing, initial state estimation focuses on bedrock topography and basal sliding law estimation.The joint identification of bedrock topography and basal friction law is a difficult problem.Indeed, different configurations (e.g. a low bed and no sliding versus a higher bed with sliding) may lead to identical surface observations.This is why most of the previous works on this subject chose to investigate the basal sliding identification only, usually for local modelling (glaciers or ice streams).Recently, a couple of papers addressed the ice-sheet initialisation problem: Arthern and Hindmarsh (2006) 2012) use a more general approach called Markov Chain-Monte Carlo using neural networks.Contrary to the variational approach or Kalman, this Monte Carlo approach solves the Bayes theorem exactly, which is the basis of every DA system.However, this method cannot be applied in our context due to the too high dimension of the problem: Tarasov et al. ( 2012) needed over 50 000 runs of their model in order to estimate accurately the statistics of 39 parameters.
Variational and Kalman DA methods are equivalent under the condition of linearity, in which they achieve leastvariance linear estimation.If in addition errors are Gaussian, they achieve Bayesian estimation.Of course in the framework of realistic applications, models are seldom linear and errors are not Gaussian, so that both methods have different advantages and drawbacks, and may compare differently depending on the application.In this paper we made the choice to start our investigation with a Kalman-based method.The Kalman filter (KF) was introduced by Kalman (1960) for linear systems.Later on it was extended to nonlinear systems.Because of the curse of dimensionality the extended Kalman filter is not practical in large realistic systems, as the size of the state vector prevents the computation of the time-evolving covariances.Evensen (1994) then introduced the ensemble Kalman filter (EnKF), which is a Monte Carlo approximation of the KF that avoids the computation of the covariance matrices.Following the corrected version of the algorithm by Burgers et al. (1998), numerous authors used the EnKF and proposed improvements and variants.
To the best of our knowledge, neither KF nor EnKF has ever been used in glaciology, and we propose to implement and investigate its performance to tackle the sea-level rise problem.This paper is organised as follows: in Sect. 2 we give some details about the DA methods, in Sect. 3 we describe the ice-sheet model and in Sect. 4 we describe our numerical results.Finally we discuss the results and conclude in Sect. 5.

Methods
First introduced by Evensen (1994) and later corrected by Burgers et al. (1998), the ensemble Kalman filter (EnKF) is a common approach in data assimilation for state and parameter estimation.In the traditional Kalman filter, the state of a physical system at a time t k is represented by two quantities: x k , the augmented state vector of size n x which contains an estimation of all variables and parameters we wish to estimate, and P k , the error covariance matrix of this estimation.
k , i = 1, . . ., N e .The estimation of the extended state vector is given by the ensemble mean x (i) k (1) and P k by the ensemble covariance matrix with − x k the ensemble perturbation matrix.
EnKF is a two-step algorithm: forecast and analysis.Quantities produced during the forecast step (analysis step) are denoted with a superscript f (superscript a ).
The forecast consists in propagating the ensemble using a numerical model: with M k a nonlinear model which propagates the state vector from time t k−1 to time t k (of course, the model does not change the parameters) and η (i) k a sample vector following the model error pdf.In this paper we assume a perfect model so that η (i) k = 0.At time t k observations are available.We denote them y o k , which is a vector of size n y .The mapping between observations and model state is given by the following relation: with H k a possibly nonlinear observation operator, x t k the true extended model state vector and k the observation error.
k is assumed unbiased (zero mean) and Gaussian with the covariance matrix R k .
The aim of the analysis step is to correct the forecast ensemble thanks to the observations.Several versions of EnKF exist.Here we use a deterministic one called the ensemble transform Kalman filter (ETKF).In the following, we omit the time index k for readability.

ETKF formulation
The ETKF was first introduced by Bishop et al. (2001).The main idea of this formulation is the following: the analysis covariance matrix P a can be written as a transformation of the forecast ensemble perturbations as with P a an N e by N e matrix.The same idea (in a slightly different formulation) was developed for the SEIK filter (Pham, 1996;Pham et al., 1998;Pham, 2001).Here we follow the ETKF implementation given by Hunt et al. (2007).In order to formulate the analysis step, we first define the forecast ensemble projected in observation space as follows, and we denote y f the mean of the above ensemble and Y f the associated perturbation matrix.The analysis step then consists in minimising the following cost function: where w is a vector of size N e .The minimiser of J is given by where is the inverse of the Hessian of J at the minimum.The ensemble covariances after the analysis are given by Eq. ( 5) and the ensemble mean by The analysis ensemble perturbation matrix is obtained by where U is an arbitrary orthogonal matrix which preserves the ensemble mean.For the sake of simplicity, we set U as the identity matrix in the sequel.Using Eqs. ( 10) and ( 11) we can then proceed to the next forecast step using Eq.(3).

Inflation and ETKF
EnKF is known for suffering from under-sampling issues (the so-called curse of dimensionality).Two well-known procedures have been developed to correct their symptoms: inflation and localisation.The former scales up by a factor the ensemble perturbation matrix to avoid the underestimation of the variances (Anderson and Anderson, 1999).The scaling factor is also known as the forgetting factor in Pham et al. (1996).In the ETKF formulation of Hunt et al. (2007), the inflation consists in replacing Eq. ( 9) by with ρ > 1 the inflation factor.It is equivalent to rescale Y f by √ ρ.If the observation operator is linear, it is equivalent to rescale the ensemble perturbation matrix.This procedure is very popular due to its simplicity but can only resolve the variance underestimation due to EnKF.

Localisation and ETKF
Another under-sampling issue of EnKF is the problem of long-range spurious correlations.These can be overcome by the use of localisation.The ETKF configuration in Hunt et al. (2007) does not allow the use of localisation on a forecast error covariance matrix as in Hamill et al. (2001) or Houtekamer and Mitchell (2001).However, localisation on R is still possible.One way is to perform local analysis by assimilating a subset of observations on a given location (Ott et al., 2004).It has been implemented by Hunt et al. (2007) and is known as LETKF.This method can lead to discontinuities in the analysis when an observation leaves the subset of used observations for a given location.To solve this, Hunt et al. (2007) select a subset of observations and increase error variance for observations close to the boundary of the subset.It is equivalent for a given location to replace R with ρ −1 l •R, with ρ l some distance-based correlation matrix (such as the one suggested by Gaspari and Cohn, 1999), the origin being the given location and • denoting the elementwise (Hadamard or Schur) product.Note that ρ −1 l • R must be changed for each given location even if it uses the same observations as for another location.We use this corrected version of LETKF for our experiments.
Note that (L)ETKF can produce negative ice thicknesses.In those (rare) cases, we set back ice thickness to zero.

Basis of ice sheet model
The goal of an ice sheet model is to simulate the evolution of ice thickness H (x, t) over a supposedly fixed bedrock topography B soc (x) (Fig. 1).The total has a surface elevation denoted S(x, t), according to Table 1.List of parameter values involved in the parameterisation of the surface mass balance.As F clim is a time-evolving climatic parameter, its value will be specified for each experiment in Sect.4.1.
Parameter Value where t represents the time, and the x axis represents the latitude.
Ice sheet thickness H (x, t) is governed by the balance between surface mass budget, precipitation or surface melting, and ice flow that drains ice accumulated in the central parts toward the edges of the ice sheet.All these components can evolve with time in response to climatic changes and internal feedbacks.In the context of model initialisation, we will focus on ice flow, surface mass balance being assumed as known.
In this study, we use a parameterisation of surface mass balance b m as a function of atmospheric temperature T s .The values of the different parameters involved in this parameterisation are detailed in Table 1.The surface mass balance is the sum of accumulation Acc ≥ 0 (snowfalls) and Abl ≤ 0 (melting).These two terms are parameterised as follows: Ablation occurs when the temperature exceeds the melting point, represented by T melt .The surface temperature T s depends on surface elevation of the ice sheet S, distance along the x axis (representing latitude with x = 0 the pole) and climate temperature F clim , according to the formula: F clim evolves in time; its value will be specified for each experiment in Sect.4.1.This parameterisation (values are detailed in Table 1) reproduces qualitatively the typical surface mass balance over an ice sheet and the feedbacks associated with its possible changes in geometry.
Ice flow is gravity driven: it follows the quasi static equilibrium equation, i.e. the momentum conservation equation in which the only body force is due to gravity and acceleration is negligible.
Incompressibility is assumed for the whole domain.This is a classical approximation in ice sheet modelling.Although Nonlin.Processes Geophys., 21, 569-582, 2014 www.nonlin-processes-geophys.net/21/569/2014/ at the surface snow slowly densifies in firn and ice, this process is restricted to the top 100 m of the ice sheet (over about 3000 m).Thus, for large-scale modelling, this layer is replaced by an ice equivalent one with density equal to ice density.
Ice behaves as a viscous fluid with a nonlinear viscosity.It is usual among glaciologists to use the empirical "Glen" flow law (Duval, 1979) with an exponent 3 (see below for more details).In the following however we will use a polynomial law associating the Glen flow law and a Newtonian one.This law was suggested by Lliboutry (1993) and Cuffey and Paterson (2010), and can better represent deformation at very low stresses existing for instance in the upper part of the ice sheet.Moreover, it prevents infinite viscosity in the case of no deformation.It is written as: where ˙ ij is the train rate tensor component, u i the velocity in the direction i, τ ij is the deviatoric stress component, A and φ are the coefficients of, respectively, the Glen flaw law (n = 3) and the linear (Newtonian) flow law and τ is the effective stress, or formally the second invariant of the deviatoric stress tensor.Boundary conditions at the surface are atmospheric pressure and no tangential stress.At the ice bed interface, a dragging law (usually called sliding law by glaciologists) must be prescribed to derive the tangential stress τ b (also called basal drag).
The dragging law depends on the substratum, and especially on whether it is hard-bed or soft-bed (sediment).However, many of the sliding/dragging laws proposed in glaciology can be written as follows (see Cuffey and Paterson, 2010): where U slid is the basal tangential velocity, K and a are respectively coefficient and exponent of the dragging law and their value depends on the characteristics of the rock-ice interface, basal temperature, sediment thickness, water pressure, roughness, etc.In this article, we will restrict our work to a linear law (a = 1).Note that when the ice sheet reaches the ocean, ice begins to float and forms structures called ice shelves (or floating tongues if they are narrow).Basal drag there is zero and Eq. ( 18) is not valid.In this work we will only consider an ice sheet resting on the bedrock (grounded ice sheet).
Associating quasi static equilibrium and viscous behaviour leads to the full Stokes equations.There is a whole hierarchy of approaches to solve this problem (see for instance the ISMIP inter-comparison project, Pattyn et al., 2008).Recently, a few ice sheet models were designed to solving rigorously the full Stokes problem, but given the numerical cost they are usually restricted to a regional scale or can only afford a few century simulations; see Gillet-Chaulet et al. (2012), Larour et al. (2012).All the other ice sheet models use approximations based on the very small aspect ratio of ice sheets.Ice sheets are indeed a few kilometres thick and more than a thousand kilometres wide, so that asymptotic developments are possible.In this work, we will use an approximation called shallow ice approximation that preserves the main nonlinearity of the problem, linked to the deformation law, and the dependency on the dragging law coefficient that is one of the characteristics we want to estimate.

Shallow ice approximation (SIA) for a flowline model
This approximation (Hutter, 1983) describes ice deformation in the vertical plane and allows us to calculate the vertical profile of horizontal velocity.It is based on an asymptotic approach and is vertically integrated.We use here a model, Winnie, which is a flowline and isothermal version of a 3-D ice sheet model, GRISLI (see Ritz et al., 2001;Peyaud et al., 2007;Quiquet et al., 2012).According to the SIA, ice flow is in the direction of the steepest slope of the surface and can be thus relatively well represented by a flow line following that direction.The isothermal assumption allows uniform deformation coefficients, A and φ (their values are given in Table 2).It is taken as a first step in this study because taking thermal processes into account would require us to estimate a third poorly known parameter that is the geothermal heat flow.Such a topic is beyond the scope of this study.
The vertically averaged horizontal velocity, U , is given by U def comes from ice deformation in the vertical plane, and according to the SIA equations and our constitutive equation (polynomial) reads as where x is the horizontal coordinate, ρ is ice density, g gravity, H and S are ice thickness and ice sheet surface elevation.
Note that the nonlinearity of the constitutive equation is implicitly taken into account by the exponents for surface slope and ice thickness.
The horizontal velocity at the surface U s is one of the observed characteristics and is given by To derive the basal velocity U slid , we assume that basal drag and basal velocity are linked through a linear dragging law and that the basal shear stress τ b is given by the SIA: Parameter Value where β is positive and is one of the characteristics we want to determine.Because this parameter covers several orders of magnitude, we often use α = log 10 (β).
The previous equation is one example of the sliding laws described in Eq. ( 18).Although simple, this method reproduces the various types of ice flow found in ice sheets: slow flow in the central parts where ice deformation is dominant, and fast flow in ice streams for low values of β.Such a simple sliding relation cannot reflect the large variety of sliding processes and their nonlinearities, but is sufficient for our purpose, especially with the isothermal assumption (Bueler and Brown, 2009).Moreover, due to the thin layer approximations, one should be careful in the interpretation of sharp horizontal changes in the bedrock topography or basal drag coefficient.
Finally, the evolution of ice sheet geometry is derived from the mass conservation equation: In our model, mass conservation is the only time-dependent equation and H is the only prognostic variable.Ice velocities are diagnostic (completely determined by the ice sheet geometry) because acceleration terms are neglected.

Numerics
The velocity field is obtained analytically from Eqs. ( 19), (20), and ( 22) as a function of ice thickness and surface slope.
The mass conservation equation ( 23) is solved by a finite difference method with a staggered regular grid, velocities and slopes grid points are taken at the midpoint between thickness grid points.Because velocity is strongly dependent on the ice sheet geometry, the time scheme is semi-implicit; see Hindmarsh (2001).We use for every experiment a fixed time step of 0.01 year.

Numerical results
We now present our numerical results.This section is divided into four paragraphs: firstly we present the numerical setup of the various experiments.Secondly we explain how to build the initial ensemble members.Thirdly we present the preliminary results obtained with the ETKF and a large ensemble.
Finally, we study the impact of smaller ensemble sizes, for ETKF with inflation and localisation.
We recall here that our problem is to estimate jointly the ice thickness H , the bedrock topography B soc , and the basal sliding coefficient β = 10 α .As this particular problem has never been studied before with ensemble filtering, this work is focused on twin experiments, because it will allow quantitative assessment of the method performance.

Numerical setup
Twin experiments consist in comparing three model states: the reference (or true) state, which is used to generate observations; the background state, which is another model state, different from the reference; the assimilated state, also called analysis, resulting from the DA procedure.We can then compare the misfit between the assimilation and the reference, and the misfit between the reference and the background (no assimilation).
We performed the twin experiments over a 20-year time window.This is realistic for this problem, as the available observation records roughly span 20 years.Note that this small window size makes our proposed DA experimental setup quite challenging.
We first explain how to choose the reference state.We set up the Winnie model with 241 grid points and a space step x = 5 km, so that the state vector (H (x), B soc (x), α(x)) length is 743.The reference run is initialised at t = 0, with the ice thickness H and bedrock topography B soc presented in Fig. 2.This ice thickness is chosen as the stable state for a constant climate forcing (F clim = 8 • C in Eq. 15).In practice, it is produced from a very long (50 000 years) model run, starting from H = 0. Also at t = 0, the sliding parameter coefficients β(x) = 10 α(x) are defined as in Fig. 3.
With this reference initial state, we now impose a linear climate change (+0.2 • C for 20 years, so F clim (t) = 8 + 0.01t with t in years) and we perform a 20-year model run in order to get the reference state over the assimilation window.Note that after 20 years the ice thickness is very similar to the initial one: the dynamics, highly nonlinear, are also very slow.
We then use the reference state to generate synthetic observations.We choose t = 1 year as a realistic observation interval, therefore our assimilation experiments will have 20 observation times, and thus 20 analysis steps.Every year, we observe the surface elevation S and the surface velocity U s at each grid point (we detailed the reference surface velocity profile with its sliding and deformation components in Fig. 4).We also observe the bedrock topography every year and every 30 grid points.To simulate observation errors, we add noise to these (perfect) observations.More precisely, we add a Gaussian white noise to each observation independently, with realistic standard deviations: 2 m for surface elevation, 3 m year −1 for surface velocity and 20 m for bedrock topography.Every DA experiment presented in this section uses the same background state (see Figs. 2 and 3) as an initial ensemble mean.The next section describes how to build the ensemble from the background state.The same process has been used to produce the background state and parameters from the reference.

Initial ensembles
As we said before, the initial ensemble mean is set to the background state.Before we describe the procedure, we have to acknowledge that some observations were used twice: once to build initial ensembles (i.e. as a priori information), and another time in the DA system (i.e. as observations).This violates a crucial hypothesis required by Kalman-based filters: the independence between the a priori estimation and the observations.However, this is a characteristic of realistic frameworks: indeed, the state-of-the-art a priori bedrock topography Bedmap2 (Fretwell et al., 2013) is produced using surface altitude observations (as well as all available bedrock measurements).Now let us describe how to build initial ensemble members.For β = 10 α , we first generate an ensemble of α(x) parameters drawn from a Gaussian distribution with a given covariance matrix (a squared exponential covariance function).Then we transform this ensemble of α into an ensemble of β.
Then we generate an ensemble of bedrock topographies according to a Gaussian distribution, whose mean is the background and whose covariance matrix is consistent with observation statistics.The covariance matrix requires two ingredients: the variances and a correlation matrix, according to the formula B = C , where is the square root of the diagonal matrix of variances, and C is a correlation matrix.Where bedrock topography observations are available, the variance is set to the observation variance (20 m in standard deviation).Where there is no ice, the variance is set to the surface altitude observation variance (2 m in standard deviation).Elsewhere the variance increases with the distance to the closest observation (bedrock or surface) available.Figure 5 presents the standard deviations that form the diagonal of .At worst the standard deviation is equal to 320 m, which is accurate compared to available real uncertainty estimation on B soc .
For the correlation matrix, we assume that the correlation of bedrock topography between two grid points only depends on distance (isotropy).Classically a Gaussian function is taken for the correlation function.However, we choose a sum of two Gaussian functions (see Fig. 6), as using only one Gaussian function produced too smooth or too rough bedrock topographies.Indeed, this combination allows us to ensure the smoothness of simulated bedrock topographies at a large scale and their slight roughness at shorter scales.This behaviour seems to produce realistic bedrock topographies.
We now have an ensemble of bedrock topographies.We then generate an ensemble of surface elevations according to a Gaussian distribution.As before, its mean is the true surface elevation (from the reference run) and its covariance matrix is consistent with surface elevation observation statistics (white noise with a standard deviation of 2 m).To obtain the initial ensemble for ice thickness, we subtract the ensemble of bedrock topography from the ensemble of surface elevation: H = S − B soc .Then we correct this new ensemble to avoid negative ice thicknesses (we just set them back to zero).Finally we run the model for each ensemble member during 1 year in order to obtain more physically balanced ice sheets and we perform a multiplicative rescaling on the produced ensemble so that its mean remains equal to the background.An example of 50 ensemble members is presented in Fig. 7.

First results with a large size of ensemble
We first perform twin experiments with a large ensemble (size N e = 1000) compared to the state vector dimension, in order to validate our ETKF approach, without having to deal immediately with under-sampling issues.We therefore run the ETKF without inflation or localisation.The bedrock topography obtained after 20 years is presented in Figs. 8 (mean topography with or without assimilation) and 9 (absolute difference from the reference with or without assimilation).This clearly shows that the final results on ice thickness and bedrock topography are very accurate, as the average RMS error for bedrock topography has decreased from 207.5 to 45.9 m and the maximum standard deviation from 671.6 to 150.5 m.The average RMS error for ice thickness has also decreased from to 45.5 m and the maximum absolute difference from 671.6 to 150.5 m.
Figure 10 presents the results for the β parameter.We can see that the accuracy after assimilation is quite good at the edges of the ice sheet and worse in the centre, where β is large, and also where there is no ice.However, poor results on large β are not meaningful.Indeed, we are mostly interested in recovering the sliding component of the velocity, and it is well known among glaciologists that its sensitivity to variations of β for large β is very low: β ≥ 10 5 leads to zero sliding velocity in any cases, so that β = 10 6 or 10 7 does not make any difference.Similarly, β in areas where there is no ice is meaningless.This is confirmed in Fig. 11, which shows the sliding counterpart of the surface velocity: we can see that the analysis is much closer to the reference than the background.Precisely, Fig. 12 shows the absolute difference from the reference on surface sliding velocity with or without assimilation, and we can see that it is much reduced with assimilation.However, U slid is poorly retrieved at some points around x = 1000 km.This corresponds to a zone where ∂S ∂x is much smaller, which is the onset of the sliding.Absolute difference between analysis and reference (in purple x) after 20 years of the 1000-member ETKF is compared to the absolute difference between background and reference (in green crosses) for bedrock topography.The average RMS error is decreased from 207.5 m for the background to 45.9 m for the analysis and the maximum absolute difference from 671.6 to 150.5 m.
the dynamics are close to being discontinuous, and therefore highly nonlinear.
We then apply a slight inflation of 1.01 to the 1000member ETKF. Figure 13 (to be compared to Fig. 9) shows the improved results for the ice sheet geometry and Fig. 14 (to be compared to Fig. 12) for the sliding velocity.The improvements due to inflation are particularly pronounced in areas where sliding and deformation are both predominant and the dynamics highly nonlinear.

Dealing with small sizes of ensemble
The results obtained with a large ensemble are satisfying.Nevertheless we will not be able to perform such experiments with a full 3-D large-scale ice sheet model.Indeed, in that case the state vector dimension is larger than 100 000, and it would be impossible to use an ensemble that large.With this remark in mind we now perform ETKF experiments with smaller ensembles: 100, 50, and 30.Without localisation and/or inflation the ETKF is known for its divergence for small ensembles.To check this fact, we perform one experiment with 100 members, without localisation or inflation.
Figure 15 presents the final analysed bedrock topography, which is clearly degraded with respect to the background.Actually, RMS error on bedrock topography increases from The background (green) is compared to the reference (blue) and the analyses for various ensemble sizes: 100 members (purple), 50 members (cyan) and 30 members (orange).RMS error evolution is synthesised in Table 3. See also Fig. 17 for detailed results.
207.5 m for the background state to 302.4 m for the final analysis step, which is a 50 % increase.
We then use inflation, as we did with 1000 members.Best results are obtained in that case with an inflation of 1.10.Compared to the results without inflation, RMS errors are improved to 121.2 m for the bedrock topography and to 235.4 m year −1 for the sliding velocity; maximum absolute differences from the reference are also improved (see Table 3).However, those results are still quite poor.In such a context localisation is mandatory.Consequently, for the remaining part of this section, we use LETKF with manually tuned inflation and localisation.Fig. 17.Absolute difference between results and reference for the bedrock topography after 20 years of the LETKF with inflation.The background (green) is compared to the analyses for various ensemble sizes: 100 members (purple), 50 members (cyan), and 30 members (orange).RMS error evolution is synthesised in Table 3.
Let us recall briefly how localisation is performed.The observation error covariance matrix R is modified into ρ −1 l • R, with ρ l some distance-based correlation matrix (Gaspari and Cohn, 1999).The distance l must be tuned manually in order to achieve good results.It corresponds to the maximal distance between a given grid point and the observations used in analysis at this point.
We performed numerous experiments in order to determine by hand the best l and the best inflation parameter for each ensemble.We chose the bedrock topography RMS error as a score to measure the performance of inflation and localisation.Regarding localisation, the best results were obtained with l between 16 x (ensemble size 30) and 24 x (ensembles size 50 and 100), so that this distance l is quite insensitive to ensemble sizes.In contrast, the optimal value for the inflation parameter proved to be much more sensitive to ensemble sizes (ranging from 0.98 to 1.14).
Table 3 presents the RMS error and maximal absolute difference from the reference for B soc and U slid for the optimally tuned LETFK with 30, 50, and 100 members, as well as the ETKF for 100 and 1000 members.We can see that despite the small ensemble sizes the results are pretty good.Figures 16  and 17 present the bedrock topography and its absolute difference from the reference with 30-, 50-, and 100-member ensembles and confirm the good performance of the filters.As before, the performance on β itself is not significant, so we show only results on the sliding velocity.Figure 18 shows the sliding velocity, and Fig. 19 presents the absolute difference from the reference for the sliding velocities.As previously, these figures enlighten two different regimes.First, where the ice is either grounded or in full sliding the filters perform quite well.Second, where the ice just starts to slide (where the proportion between the sliding and deformation counterparts of the velocity changes) the filters fail and the RMS is larger, as already noticed in the case of the 1000member ETKF without inflation.Fig. 19.Absolute difference between results and reference for the sliding component U slid of the surface velocity after 20 years of the LETKF with inflation.The background (green) is compared to the analyses for various ensemble sizes: 100 members (purple), 50 members (cyan), and 30 members (orange).RMS error evolution is synthesised in Table 3.
We also performed LETFK (with inflation) experiments with smaller ensembles (sizes 10 and 20), but the results were very dissatisfying because of filter divergence (not shown here).

Conclusions and further directions
In this article, we developed an ensemble transform Kalman filter (Hunt et al., 2007) in order to estimate jointly the bedrock topography, the ice thickness and the basal sliding parameter of an ice sheet.The originality of this work is the application of an EnKF approach in ice sheet modelling.We performed twin experiments with a flowline simplified ice sheet model using the shallow ice approximation and a  sliding law for basal velocities.Every experiment used surface elevation and surface velocity observations all over the ice sheet and a couple of observations of bedrock topography.First we successfully tested our DA approach with a large ensemble to validate the use of ETKF.Then we tried with smaller ensembles.In those cases, localisation and inflation are mandatory.Obtained performances were good even for ensembles with a size as small as 30.However, localisation and inflation were manually tuned.Localisation distance was not very sensitive to ensemble size but inflation was.In order to avoid the manual tuning of optimal inflation parameters, we could use online estimation.This is a growing interest in the DA community.Some works such as Bocquet (2011) or Bocquet and Sakov (2012) provided convincing theoretical arguments for the use of inflation and its automatic computation.
Finally, results shown here are preliminary, as we used a flowline model.The logical choice to improve the model complexity would be to use a hybrid shallow ice-shallow shelf model as in GRISLI (Ritz et al., 2001).
with a BLUE/OI method and Gillet-Chaulet et al. (2012) with control and Robin methods.More recently still, researchers have started to investigate coupled inversion of bedrock topogra-phy and basal drag simultaneously; see Raymond-Pralong and Gudmundsson (2011) (using the Bayes theorem) and van Pelt et al. (2013) (with a simple Picard iteration to reduce the observation-model misfit).In the slightly different context of calibrating climatic parameters used by an ice sheet model, Tarasov et al. (

Fig. 2 .Fig. 3 .
Fig. 2.Ice sheet geometry for the reference and background states.The x axis represents the horizontal extent of the ice sheet (in km).The y axis is the elevation in metres from the sea level.The reference bedrock topography is in blue, the background bedrock is in green crosses.Reference and background surface altitude are identical, in red.

Fig. 4 .
Fig.4.Ice velocity at the surface of the ice sheet for the reference state, in metres per year, as a function of the ice sheet extent x.The total surface velocity U s is represented in red, its sliding component U slid is in orange crosses, its deformation component U s,def with cyan x (we recall that U s = U slid + U s,def ).

Fig. 5 .Fig. 6 .
Fig. 5. Standard deviations (in metres) for the matrix used to generate the initial bedrock topography ensembles, as a function of the extent x.Local minimums correspond to either "no ice" points (close to the boundaries of the domain) or observation points.

Fig. 7 .
Fig. 7. Example of an initial ice sheet ensemble (50 members): topographies B soc (x) and surface elevation S(x) as functions of the horizontal distance x.

Fig. 8 .
Fig.8.Ice sheet geometries after 20 years, with the 1000-member ETKF.The analysis bedrock topography (final mean ensemble in purple x) is compared to the background (green crosses) and the reference (blue).The analysis result and the reference are really close except for around 150 km from the origin (see also Fig.9for detailed results).The surface elevation is accurately observed (in red), so that background, reference and analysis are similar.

Fig. 10 .Fig. 11 .Fig. 12 .Fig. 13 .Fig. 14 .
Fig.10.β parameter after 20 years of the 1000-member ETKF.The analysis (purple x) is compared to the background (green crosses) and the reference (blue).Values of β above 10 5 are all equivalent in terms of sliding velocity.Similarly, β is meaningless where there is no ice (close to boundaries).

Fig. 15 .Fig. 16 .
Fig.15.Bedrock topography after 20 years, with the 100-member ETKF (no inflation, no localisation).The analysis (final mean ensemble in purple x) is compared to the background (green crosses) and the reference (blue).RMS error increases from 207.5 m for the background state to 302.4 m for the final analysis step and the maximal absolute difference from 671.6 to 1093.7 m.

Fig. 18 .
Fig.18.Sliding component U slid of the surface velocity after 20 years of the LETKF with inflation.The mean of the final analysis ensemble is compared to the background (green) and the reference (blue), for various ensemble sizes: 100 members (purple), 50 members (cyan), and 30 members (orange).Localisation and inflation parameters are described in Table3.See also Fig.19for detailed results.

The EnKF as a Monte Carlo method approximates both Nonlin. Processes Geophys., 21, 569-582, 2014 www.nonlin-processes-geophys.net/21/569/2014/ quantities
by the use of an ensemble of N e realisations x Ice sheet geometry: an ice cap of thickness H (x, t) lies on a bedrock (in blue) whose topography is B soc (x).S(x, t) represents the surface altitude of the ice sheet.

582, 2014 574 B. Bonan et al.: ETKF for state and parameter estimation in ice sheet modellingTable 2 .
List of parameter values involved in the computation of ice velocities.

Table 3 .
Summary of best LETKF + inflation results in term of RMS and maximal absolute difference and comparison with standard ETKF results.Experiment Size of ensemble Loc.l Inflation RMSE B soc Max.error B soc RMSE U b Max.error U b