The impact of nonlinearity in Lagrangian data assimilation

The focus of this paper is on how two main manifestations of nonlinearity in low-dimensional systems – shear around a center fixed point (nonlinear center) and the differential divergence of trajectories passing by a saddle (nonlinear saddle) – strongly affect data assimilation. The impact is felt through their leading to non-Gaussian distribution functions. The major factors that control the strength of these effects are the time between observations and the covariance of the prior relative to the covariance of the observational noise. Both these factors – less frequent observations and larger prior covariance – allow the nonlinearity to take hold. To expose these nonlinear effects, we compare exact posterior distributions conditioned on observations with the ensemble Kalman filter (EnKF) approximation of these posteriors. We discuss the serious limitations of the EnKF in handling these effects.


Introduction
The assimilation of Lagrangian or pseudo-Lagrangian data from fluid flow, obtained by surface and subsurface drifters and floats in the ocean as well as by gliders and autonomous underwater vehicles (AUV), has now become an indispensable and widely used tool for studying the oceans and other natural water bodies (Chyba, 2009; The Argo Science Team, 2001). The set of mathematical techniques that use the data from these instruments for the purpose of state estimation is generally referred to as Lagrangian data assimilation (LaDA) (Ide et al., 2002; Kuznetsov et al., 2003). The emphasis in the recent past on studying various aspects of LaDA has been prompted mainly by the rapidly increasing availability and density of such data in a variety of situations. Another aspect of the LaDA problem that has received much less attention is its suitability for studying certain fundamental aspects of the data assimilation problem in general – namely, the effects of nonlinearity and the consequent non-Gaussianity of probability densities involved in assimilation problems.
The main emphasis of this paper will be precisely on these issues of nonlinearity using the Lagrangian data assimilation framework. We will use the Bayesian viewpoint of the data assimilation problem (Apte et al., 2007, 2008a) that is summarized later in Sect. 3. The central object of study in the Bayesian framework is the posterior probability distribution function (PDF) of the dynamic state of the system, i.e., the conditional PDF of the state conditioned on the observations. This posterior is a compilation of the available information, and approximating it as closely as possible is the main goal of data assimilation. We take the viewpoint that valuable insights into the interplay between the dynamics and statistics can be gained through comparisons between this posterior PDF and the results of other data assimilation methods that produce, in effect, different approximations of this posterior.
In particular, we focus on comparing the Markov chain Monte Carlo (MCMC) sampling of this posterior PDF with a commonly used data assimilation technique – namely, the ensemble Kalman filter (EnKF) (Evensen, 2009), which generates an ensemble of the states of the system, incorporating the observations sequentially using a Kalman-filter-based method, explained in detail later in Sect. 3. The ensemble produced by the EnKF method also samples, albeit approximately, the posterior PDF. In the limit of a large ensemble size, the ensemble of states resulting from the "update" step of the EnKF, or its various variants, is an exact representation of the posterior PDF if the ensemble prior to the update is Gaussian (Lei et al., 2010). If the EnKF is initialized with a Gaussian ensemble, the nonlinearity of the dynamical model may result in a significantly non-Gaussian prior ensemble that will update to a poor approximation of the posterior.
We argue that this discrepancy between the exact MCMC sampling of the posterior and a possibly poor approximation of it produced by the EnKF can be fruitfully exploited to analyze and understand the effects of nonlinearity on data assimilation strategies. In particular, we will utilize this method to illustrate that the nonlinear shear and differential divergence effects, explained in detail below, can lead to the failure (or divergence) of the EnKF method precisely because of the non-Gaussianity of the prior distributions.
To understand the nonlinear shear effect, we will consider the situation occurring in a low-dimensional dynamical system wherein a fixed point is surrounded by periodic orbits. In the linear case (harmonic oscillator) the orbits rotate around the center fixed point with the same angular velocity. This synchronicity is destroyed by nonlinearity, which results in a differential speed of rotation around the center. The resulting prior distribution will then tend to spread around the center in an annular shape.
A similar story holds in the saddle case, although of course with different-shaped distributions emerging. In the case of a saddle fixed point of a linear system, nearby orbits diverge from the separatrix at an exponential rate that is the same for all the orbits. But different orbits near a saddle point of a nonlinear system diverge away from the separatrix at different rates, and this differential divergence gives rise to non-Gaussian distributions. Since the saddle case has been studied more extensively in previous works (Kuznetsov et al., 2003; Salman et al., 2006), we put greater focus in this paper on the nonlinear shear phenomenon. Both of these effects are discussed and illustrated in greater detail in Sect. 2.1.
These nonlinear effects may be mitigated in two ways: (1) if the frequency of observations is high enough, i.e., the time between observations is small enough, then the shear effect will be less pronounced, or (2) if the covariance of the prior is small enough, then it will be dominated by the (Gaussian) observational likelihood. In both of these cases, the linear approximations implicit in any Kalman filter scheme will be effective. The point is, however, that when working with a specific system, a priori control of the observational frequency and/or the prior covariance may be lacking. In the former case, the issue is that the specification of the needed observational frequency will depend on a (potentially unknown) period in the system. As for the prior covariance, in sequential data assimilation it will have been built up through repeated applications of the dynamics and assimilation steps, and there is no guarantee that it will have any particular size.
In general, the role of data assimilation is not only to give information about the observed variables but also about the unobserved ones. A novel phenomenon that our studies illustrate concerns the effects of nonlinearity on unobserved variables. In particular, we will see cases where the marginal posterior given by the EnKF for the observed variables and some of the unobserved variables agrees well with the exact marginal posterior, but the marginal posterior for some of the other unobserved variables disagrees. This is because the information contained in the nonlinearity about the latter type of unobserved variables is not captured by the EnKF.
The saddle effect (Ide et al., 2002; Kuznetsov et al., 2003; Salman et al., 2006) has been known to be a cause of the failure of the EnKF, and the effect of the observational time on filtering has also been studied (Apte et al., 2008b; Nurujjaman et al., 2012). The four main contributions of the present work are a detailed study of the precise origin of the EnKF divergence, the new phenomenon of EnKF failure due to the shear around a center, the exact role played by the prior covariance in these phenomena, and the effect of the nonlinearity on unobserved variables.
A brief outline of the paper is as follows. The next section contains a description of the specific dynamical model we use, along with an illustration of the shear and the saddle effects. In Sect. 3 we review the Bayesian framework and the EnKF method. Section 4 is devoted to describing how the shear manifests itself in the failure of the EnKF and, more generally, its effects on data assimilation. The saddle effect is described more briefly in Sect. 5, followed by Sect. 6, which contains a summary and a discussion of some directions for further studies.

Shear and saddle effects in linear shallow water Lagrangian dynamics
The specific dynamical system under consideration is the same as that used in Apte et al. (2008b), i.e., the following two Fourier modes of the velocity (u, v)(x, y, t) and the height h(x, y, t) of the linear shallow water (LSW) equations, with M Lagrangian floats.
The dynamics for the velocity field Fourier modes (u_0, u_1, v_1, h_1) and the positions (x_i, y_i), i = 1, . . ., M of the M Lagrangian drifters is given by the following set of ODEs,
where the functions u and v on the right-hand side are as given in Eq. (1). The LaDA problem is to infer the velocity field (the Fourier modes (u_0, u_1, v_1, h_1) in the above model) from the measurements of the positions (x_i, y_i), i = 1, . . ., M of the M Lagrangian drifters.

Fig. 1. Top panel shows the velocity field (arrows) and the height field (shaded) of the linear shallow water equations for typical initial conditions used in this paper. We will focus on Lagrangian dynamics near the saddle at (x, y) = (0.5, 0) and the center at (x, y) = (0.25, 0.25). The Poincaré plot at the bottom shows a number of different orbits – some quasi-periodic orbits on invariant circles, some periodic orbits, and some chaotic orbits near the two separatrices at x = 0.5 and y = 0.5. This plot shows that some parts of the phase space for the Lagrangian trajectories are chaotic, while some are regular.

The following key features of this model make it suitable for studying the effects of nonlinearity in data assimilation.
1. As with other Lagrangian data assimilation problems, the dynamical model has a skew-product structure – the drifter dynamics depends on the velocity field but not the other way around – and only the drifter positions are observed.
2. For the LSW model the velocity dynamics is linear, while all the nonlinearity is contained in the Lagrangian drifter dynamics, which is what is observed. Thus the observed component is highly nonlinear. This helps us isolate the precise nonlinear effects.
3. Most importantly, the model shows the dynamical behavior of interest for our purposes – namely, the presence of saddles and centers. Figure 1 shows typical velocity and height fields along with a representative Poincaré plot of the Lagrangian trajectories. Note that the velocity dynamics alone, i.e., the solutions of the first four equations of Eq. (2), are periodic in time with period T_flow = 2π/√(1 + 4π²) ≈ 1, and the Poincaré plot shows the positions of a number of drifters at times that are multiples of T_flow.

Illustrative examples
In order to understand the effects of shear and saddle on ensembles of initial conditions, we start with a simple ensemble of Lagrangian drifters along a straight line. Figure 2 shows the positions of these Lagrangian drifters after every 0.05 time units (which is small compared to the period of the velocity flow T_flow ≈ 1). The blue dots show the positions of Lagrangian drifters after every 0.05 time step for trajectories in the time-independent velocity flow (i.e., with u_0 = 1, (u_1, v_1, h_1) = (0, 0, 0)), the red ones in a time-dependent flow (with initial conditions u_0 = 1, (u_1, v_1, h_1) = (0.5, 0.34, 0.26)), and the black ones in an ensemble of velocity flows (with initial conditions u_0 = 1, h_1 = 0 and (u_1, v_1) initial conditions chosen randomly to be uniform in the interval (0, 1)). (Note that here and in the rest of the paper, the dependence of u_1, v_1, h_1 on time will be understood, and the plots show the values of these variables at the corresponding time relevant to that discussion.)

1. The figure clearly shows the strong shear effect – the trajectories near the center rotate much faster than those away from it, leading to strong distortion of the initial line of drifters. We will see later that this effect leads to initial Gaussian distributions being distorted into "annulus"-shaped non-Gaussian distributions (see, e.g., Fig. 4).
2. The shear effect is the basic feature of the time-independent flow, and the time-dependent flow modifies it only slightly. Thus, trajectories near the center contain information mostly about the first Fourier mode u_0 and much less about the other modes u_1, v_1, h_1. This will again be revealed by the posterior distributions of these variables.
3. The middle panel shows dynamics near the separatrix at x = 0.5. We note that the origin of the saddle effect in data assimilation is not just the divergence near the saddle, which is a purely linear effect, but the "differential divergence" (as shown by the distortion of the line of initial conditions in the small inset at the top right). It is the latter of the two that leads to non-Gaussian distributions.

Fig. 2. The effects of shear and saddle on an ensemble of initial conditions of the Lagrangian drifter near the centre at (x, y) = (0.25, 0.25) (top panel) and near the saddle at (x, y) = (0.5, 0.5) with separatrices x = 0.5 and y = 0.5 (middle panel). The trajectories starting near the saddle, within a time span of 0.50 in an ensemble of velocities, are shown in the bottom panel.

4. The middle panel clearly shows that the trajectories near the saddle are strongly affected by the time-dependent flow, since even small perturbations of the time-independent flow lead to drastically different dynamics near the saddle at (0.5, 0.5). (This is seen from the fact that there is a large difference between the blue, red, and black lines, which are the trajectories in time-independent, time-dependent, and an ensemble of time-dependent flows, respectively.) Thus, the trajectories near the saddle contain information about precisely those modes u_1, v_1, h_1 of the velocity about which the trajectories near the center do not contain much information. Thus, a judicious mix of trajectories near the centers and saddles will be required to gain enough knowledge of the full velocity field. This has implications for so-called "drifter placement strategies", as discussed in Salman et al. (2008).
5. The divergence near the different saddles leads to drastically different dynamics of the drifters that are initially nearby, as shown in the panel at the bottom. This sensitivity to initial conditions poses problems for data assimilation methods in general, as discussed later. (For clarity, the positions at different times are shown in different colors. Note also that the velocity flow is different for each drifter.) We will see later in Sects. 4–5 the implications of all these basic facts about the dynamics of nonlinear systems for data assimilation. Before beginning that discussion, we describe the Bayesian formulation of data assimilation and the ensemble Kalman filter in the following section.
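The shear mechanism behind the first of these observations can be reproduced in a few lines of code. Since Eq. (1) is not reproduced in this excerpt, the sketch below uses an assumed steady cellular flow with the same geometry as Fig. 1 – a center at (0.25, 0.25) and separatrices at x = 0.5 and y = 0.5 – rather than the LSW modes themselves; the stream function, the RK4 stepper, and the `resultant_length` diagnostic are illustrative choices of ours, not part of the paper.

```python
import numpy as np

def velocity(p):
    # Assumed stand-in flow with streamfunction psi = sin(2*pi*x)*sin(2*pi*y)/(2*pi):
    # a center at (0.25, 0.25) and separatrices at x = 0.5, y = 0.5,
    # mimicking the geometry of Fig. 1 (this is NOT Eq. 1 of the paper).
    x, y = p[:, 0], p[:, 1]
    u = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)
    v = -np.cos(2 * np.pi * x) * np.sin(2 * np.pi * y)
    return np.column_stack([u, v])

def advect(p, dt, nsteps):
    # RK4 time stepping of the drifter positions.
    for _ in range(nsteps):
        k1 = velocity(p); k2 = velocity(p + 0.5 * dt * k1)
        k3 = velocity(p + 0.5 * dt * k2); k4 = velocity(p + dt * k3)
        p = p + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

def resultant_length(p):
    # Mean resultant length of the drifters' polar angles about the center:
    # 1 for a radial line, near 0 once shear has wound the line into an annulus.
    ang = np.arctan2(p[:, 1] - 0.25, p[:, 0] - 0.25)
    return np.abs(np.mean(np.exp(1j * ang)))

# A straight line of drifters extending outward from the center.
r = np.linspace(0.01, 0.15, 50)
line0 = np.column_stack([0.25 + r, 0.25 * np.ones_like(r)])
lineT = advect(line0, 0.01, 2000)   # integrate to t = 20

R0, RT = resultant_length(line0), resultant_length(lineT)
```

Because the rotation period grows with distance from the center, the initially radial line (R0 = 1) is wound around the center and its angular distribution spreads toward uniformity (RT small) – the same distortion that turns a Gaussian cloud into an annulus.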

Posterior distribution: exact sampling and ensemble Kalman filter approximation
We will use the Bayesian viewpoint outlined in detail in Apte et al. (2007, 2008a). Here we will only state a few key points for reference. We will consider deterministic models with solution x(t) = Φ(x_0; t_0, t) for the n-dimensional state vector x(t) ∈ R^n, with initial conditions drawn from a prior with probability density function p_ζ(x_0). The data, or the observations, are taken at discrete times and are subject to noise whose statistical characteristics, i.e., the probability densities, are assumed to be known. Thus, the observation y_k ∈ R^m at time t_k is modeled as a random vector y_k = h(x(t_k)) + η_k, where h : R^n → R^m is the observation operator and η_k is the observational noise.

Remark 1
For the specific dynamical model used in this paper – namely, the linear shallow water equations with M Lagrangian drifters – the state vector or the control vector is x(t) ≡ (u_0, u_1, v_1, h_1, x_1, y_1, . . ., x_M, y_M)(t). Since the observations are only of the drifter locations, the observation vector is y_k = (x_1, y_1, . . ., x_M, y_M)(t_k) + η_k, which is a projection onto the drifter position components of the state x(t). In this section we will only discuss the general setting, and hence, in this section, instances of x will refer to the state vector of the model and those of y to the observations, consistent with generally accepted notation in the data assimilation literature (Ide et al., 1997). In the rest of the paper, (x, y) will refer to the positions of the Lagrangian drifters.
We concatenate the observations at the various times t_1, . . ., t_K to write the total observational random vector y^T = (y_1^T, . . ., y_K^T), with the corresponding observation operator H(x_0) = (h(x(t_1)), . . ., h(x(t_K))) and noise vector η^T = (η_1^T, . . ., η_K^T). We assume that we are given the probability density function p_η : R^{mK} → R of the random vector η. Thus, for a given set of observations ŷ, which are a realization of the random vector y, the posterior probability P_ex(x(t_0) | ŷ_1, . . ., ŷ_K) for the initial condition x_0 conditioned on observations up to time t_K is obtained from Bayes' theorem:

P_ex(x(t_0) | ŷ_1, . . ., ŷ_K) ∝ p_ζ(x_0) p_η(ŷ − H(x_0)),

where the omitted normalization p(ŷ) is a function of the observations alone, and hence is a constant for a given realization ŷ. In particular, the constant of proportionality in Eq. (5) does not depend upon x_0. This distribution can be pushed forward to obtain the conditional distribution P_ex(x(t) | ŷ_1, . . ., ŷ_K) for the state at any later time t > 0. Note that when t < t_K, this is a smoothing distribution, whereas P_ex(x(t_K) | ŷ_1, . . ., ŷ_K) is the filtering distribution.
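As a concrete, if trivial, instance of this construction, the sketch below builds the unnormalized log-posterior log p_ζ(x_0) + log p_η(ŷ − H(x_0)) for an assumed scalar model x(t) = x_0 e^{at} with a Gaussian prior and Gaussian observational noise; the model, parameter values, and function names are ours, chosen only so the posterior can be checked against the known linear-Gaussian answer.

```python
import numpy as np

# Assumed toy setup: scalar linear dynamics x(t) = x0 * exp(a*t) standing in
# for the flow map, Gaussian prior on x0, direct noisy observations of x(t_k).
a = -0.5
t_obs = np.array([0.1, 0.2, 0.3])
sigma_prior, sigma_obs = 1.0, 0.1
rng = np.random.default_rng(4)
x0_true = 0.7
y_hat = x0_true * np.exp(a * t_obs) + sigma_obs * rng.standard_normal(3)

def log_posterior(x0):
    # log P_ex(x0 | y_hat) up to the constant -log p(y_hat):
    # Gaussian log-prior plus log-likelihood of the concatenated observations.
    log_prior = -0.5 * (x0 / sigma_prior) ** 2
    resid = y_hat - x0 * np.exp(a * t_obs)
    log_lik = -0.5 * np.sum((resid / sigma_obs) ** 2)
    return log_prior + log_lik
```

For this linear-Gaussian case the posterior is itself Gaussian, so the maximizer of `log_posterior` can be verified against the closed-form posterior mean; in the nonlinear LaDA setting no such closed form exists, which is what motivates the sampling methods discussed next.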
In this paper we have used Markov chain Monte Carlo (MCMC) methods to draw samples from this posterior distribution π(z) := P_ex(x(t_0) | ŷ_1, . . ., ŷ_K) for the initial condition z := x_0. The details of the method are given in Apte et al. (2008b). Here we only give a brief description of the Metropolis–Hastings algorithm we use. The Markov chain is generated by using a proposal of the form

z* = z_n + (α δ_n²/2) Σ ∇ log π(z_n) + δ_n Σ^{1/2} ω_n,

where the ω_n are iid standard Gaussian random variables. Thus,

q(z_n, z*) ∝ exp( −‖z* − z_n − (α δ_n²/2) Σ ∇ log π(z_n)‖²_Σ / (2 δ_n²) ),

where ‖z‖²_Σ = z^T Σ^{-1} z. The standard Metropolis–Hastings criterion is used for accepting or rejecting the proposed state: if min{π(z*) q(z*, z_n) / (π(z_n) q(z_n, z*)), 1} > u_n, then z_{n+1} = z*; otherwise z_{n+1} = z_n, where the u_n ∼ U(0, 1) are iid uniform random variables. The parameter value α = 1 corresponds to the Metropolis-adjusted Langevin algorithm (Robert and Casella, 1999; Roberts and Rosenthal, 2001), while α = 0 gives the usual random walk Metropolis–Hastings (Robert and Casella, 1999). The appropriate choice of the proposal covariance Σ, step size δ_n, and their adaptation is discussed in detail in Apte et al. (2008b, Sect. 5.2).
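The α = 0 (random walk) case of this algorithm can be sketched in a few lines. This is a minimal illustration on an assumed toy target with known moments, not the adaptive sampler of Apte et al. (2008b); the function names and tuning values are ours.

```python
import numpy as np

def rwmh(logpi, z0, n, delta, rng):
    # Random-walk Metropolis-Hastings (the alpha = 0 proposal):
    # z* = z_n + delta * omega_n with omega_n ~ N(0, I), so the symmetric
    # proposal density q cancels in the acceptance ratio.
    z = np.array(z0, float)
    lp = logpi(z)
    chain = np.empty((n, z.size))
    accepted = 0
    for i in range(n):
        zstar = z + delta * rng.standard_normal(z.size)
        lpstar = logpi(zstar)
        if np.log(rng.uniform()) < lpstar - lp:
            z, lp = zstar, lpstar
            accepted += 1
        chain[i] = z
    return chain, accepted / n

# Assumed toy target: a standard 2-D Gaussian, so the moments are known exactly.
rng = np.random.default_rng(1)
chain, acc = rwmh(lambda z: -0.5 * z @ z, [3.0, -3.0], 20000, 1.0, rng)
post = chain[5000:]   # discard burn-in
```

The retained samples should reproduce the target's mean and standard deviation; in the paper's setting the same loop runs with `logpi` equal to the unnormalized log of the Bayesian posterior, which requires one forward model integration per proposal.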
Continuing the study begun in Apte et al. (2008b), we argue in this paper that this Bayesian posterior density P_ex(x(t) | ŷ_1, . . ., ŷ_K) – henceforth referred to as the exact posterior – can be used as a key tool in understanding the effects of nonlinearity on data assimilation problems (Zhou et al., 2006). The method we use for this purpose is to compare this exact posterior with an approximation of it given by the ensemble Kalman filter (EnKF) algorithm (Evensen, 2009), which we also outline below for further reference, using the same notation as the above discussion.
The EnKF begins with an ensemble {x_u^1(t_0), . . ., x_u^N(t_0)} of N samples of the state at the initial time, drawn from the prior density p_ζ(x_0). For each of the observational time instances t_1, . . ., t_K, the following steps are performed.

1. Evolve the "updated" ensemble {x_u^1(t_{i−1}), . . ., x_u^N(t_{i−1})} from time t_{i−1} to t_i to get the "forecast" ensemble {x_f^1(t_i), . . ., x_f^N(t_i)}, i.e., x_f^k(t_i) = Φ(x_u^k(t_{i−1}); t_{i−1}, t_i).

2. Compute the forecast ensemble mean x̄_f(t_i) and covariance P_f(t_i) from the anomalies δ_f^k = x_f^k(t_i) − x̄_f(t_i). In practice, for large system size n, typical EnKF algorithms avoid calculating the n × n covariance P_f(t_i) by using appropriate matrix identities involving the much smaller n × N matrix formed with the δ_f^k as columns.

3. Find an updated ensemble at time t_i in such a fashion that the ensemble mean x̄_u(t_i) and covariance P_u(t_i) of this updated ensemble are the same as the mean and covariance given by the following equations, which are basically the update equations of the Kalman filter:

x̄_u(t_i) = x̄_f(t_i) + K_i [ŷ_i − h(x̄_f(t_i))],
P_u(t_i) = (I − K_i H) P_f(t_i),

where K_i = P_f(t_i) H^T [H P_f(t_i) H^T + R]^{-1} is the Kalman gain, R is the observational error covariance, and H = ∇h(x̄_f(t_i)).

Thus, it is clear that this update process is nonunique, since only the first two moments of the updated or "posterior" density are specified by the above two equations. This nonuniqueness gives rise to the many different versions of the EnKF available in the literature, such as the ensemble transform Kalman filter (ETKF) (Bishop et al., 2001) or the ensemble adjustment Kalman filter (EAKF) (Anderson, 2001). In the numerical experiments described below, we implement the perturbed-observations EnKF (Burgers et al., 1998; Evensen, 2009, pp. 41-44) with a large ensemble size, between 10^4 and 10^5 in many of the cases shown below, chosen so that increasing the ensemble size further does not affect the results. Also note that because of the large ensemble size, the results are expected to be independent of the choice of the filter variant, such as the perturbed-observations EnKF, EAKF, or ETKF.

Fig. 3. The trajectory used to study the effect of shear on data assimilation.
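A minimal sketch of the perturbed-observations update (Burgers et al., 1998) is given below, applied to an assumed two-dimensional linear-Gaussian toy problem where the exact posterior is known in closed form; the variable names and the test problem are ours, and the update uses the full sample covariance rather than the low-rank identities mentioned in step 2.

```python
import numpy as np

def enkf_update(Xf, y_obs, Hop, R, rng):
    # Perturbed-observations EnKF update (Burgers et al., 1998).
    # Xf: forecast ensemble, shape (N, n); Hop: linear observation matrix (m, n);
    # R: observation-error covariance (m, m); y_obs: observed vector (m,).
    N = Xf.shape[0]
    xbar = Xf.mean(axis=0)
    A = Xf - xbar                      # forecast anomalies
    Pf = A.T @ A / (N - 1)             # sample forecast covariance
    K = Pf @ Hop.T @ np.linalg.inv(Hop @ Pf @ Hop.T + R)   # Kalman gain
    # Each member assimilates a perturbed copy of the observation so that the
    # updated sample covariance matches (I - K H) Pf in expectation.
    eps = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=N)
    return Xf + (y_obs + eps - Xf @ Hop.T) @ K.T

# Toy problem: prior N(0, I) in 2-D, observe the first component with
# noise variance 0.25 and observed value 1.0. The exact posterior is
# N(0.8, 0.2) in x1 and unchanged N(0, 1) in x2.
rng = np.random.default_rng(2)
Xf = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50000)
Hop = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
Xu = enkf_update(Xf, np.array([1.0]), Hop, R, rng)
```

With a Gaussian forecast ensemble the updated ensemble matches the exact posterior, which is precisely the property that fails once nonlinearity makes the forecast ensemble non-Gaussian, as studied in the next sections.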

Comparison between EnKF and exact sampling
For any observation time t_i, we will consider the following four ensembles:

1. the exact prior P_ex(x(t_i) | ŷ_1, . . ., ŷ_{i−1}), denoted p_i^{ex,pr} and shown in black;

2. the EnKF forecast prior P_kf(x(t_i) | ŷ_1, . . ., ŷ_{i−1}), denoted p_i^{kf,pr} and shown in magenta;

3. the exact posterior P_ex(x(t_i) | ŷ_1, . . ., ŷ_i), denoted p_i^{ex,po} and shown in red;

4. the EnKF updated posterior P_kf(x(t_i) | ŷ_1, . . ., ŷ_i), denoted p_i^{kf,po} and shown in blue.
Note that the first two of these are obtained by pushing forward, from time t_{i−1} to time t_i, the ensembles of samples drawn from P_ex(x(t_{i−1}) | ŷ_1, . . ., ŷ_{i−1}) and from P_kf(x(t_{i−1}) | ŷ_1, . . ., ŷ_{i−1}), respectively. The EnKF is considered to "fail" when P_kf(x(t_i) | ŷ_1, . . ., ŷ_i) is a poor approximation of P_ex(x(t_i) | ŷ_1, . . ., ŷ_i). This is to be contrasted with what is commonly known as divergence of the EnKF (or of filtering/smoothing methods in general), which refers to comparing the results with a "true" state of the system that is known (Brett et al., 2011). We will later also comment on the approximation of the "true" state by both the exact and the EnKF posteriors.
We will see later that in many cases, including those when the EnKF fails, the latter two distributions, i.e., the posteriors, are approximately Gaussian. The main deciding factor for failure or success of the EnKF will be seen to be the Gaussianity, or lack thereof, of the prior P_kf(x(t_{i−1}) | ŷ_1, . . ., ŷ_{i−1}). This is the main reason for considering the above four distributions. Of course, if the dynamical system is linear, a Gaussian P_kf(x(t_{i−1}) | ŷ_1, . . ., ŷ_{i−1}) will lead to a Gaussian P_kf(x(t_i) | ŷ_1, . . ., ŷ_{i−1}), and thus the EnKF will not fail, as is well known.

The shear effect near the center
In this section we will focus on comparing the exact Bayesian posterior and the EnKF posterior for a trajectory, shown in Fig. 3, that is near the center at (x, y) = (0.25, 0.25). The true initial condition is (u_0, u_1, v_1, h_1, x, y) = (1.0, 0.5, 0.8, 0.7, 0.23, 0.33). In all these experiments, the observational error covariance R is taken so that √R = diag(0.005, 0.003). For this specific trajectory, we will discuss the following three different numerical experiments.
- In the first case, we choose a relatively large time between the observations, T_obs^L = t_i − t_{i−1} = 0.1; these observations are shown by filled circles in Fig. 3. The prior in this case is Gaussian with mean (0.9, 0.2, 0.2, 0.2, 0.2, 0.3) and broad covariance Σ_ζ^L, where Σ_ζ^L = diag(1.0, 0.7, 0.7, 0.7, 0.005, 0.005). Note that we have chosen the velocity covariance to be large but the drifter position covariance to be small. None of the remarks we make below about the posterior are affected by this choice, since the drifter positions are observed with an error covariance comparable to this prior covariance.
- In the second case, we choose the same time T_obs^L = 0.1 between observations as in the first case (and also the same realization of the observations), but choose a tight prior covariance Σ_ζ^S, with Σ_ζ^S = diag(0.1, 0.07, 0.07, 0.07, 0.005, 0.005).
- In the last case, we choose the broad prior covariance Σ_ζ^L as in the first case, but choose a smaller time between the observations: T_obs^S = 0.01. These frequent observations are shown by open triangles in Fig. 3.
We will see that in the first case the EnKF fails to approximate the posterior, whereas in the latter two cases it provides a fairly accurate representation of the posterior. We will see in Sect. 4.1 that this is precisely because of the shear near the center. The above choice of the specific trajectory is only representative of this phenomenon, which is observed in all trajectories that show qualitatively the same feature, namely the shear. Our studies so far do not quantify the effect of varying T_obs or the prior covariance Σ_ζ, but we are currently working on this aspect.

The failure of EnKF update steps due to shear effect
Figure 4 shows the exact and EnKF priors and posteriors p^{ex,pr} (black), p^{kf,pr} (magenta), p^{ex,po} (red), and p^{kf,po} (blue) (see Sect. 3.1 for the notation) for the update steps involving the first and second observations, i.e., at times t = 0.1, 0.2, for the first of the three cases described above, i.e., a broad prior with covariance Σ_ζ^L at the initial time and infrequent observations with T_obs^L = 0.1. Note that for the first observation, p_1^{kf,pr} = p_1^{ex,pr}, since this is simply the distribution at time t = 0 pushed forward up to time t = T_obs^L. Since the posterior after incorporating the first observation using the EnKF is different from the exact posterior, i.e., p_1^{ex,po} ≠ p_1^{kf,po}, the priors at later observation times differ between the EnKF and the exact sampling method, i.e., p_i^{ex,pr} ≠ p_i^{kf,pr} for i > 1. It is clear from this figure that the main cause of the failure of the EnKF in this case is the highly non-Gaussian nature of the prior distribution p_1^{kf,pr}. Clearly, a Gaussian approximation of this prior will have a mean that is near the center (x, y) = (0.25, 0.25), which is a very low probability state. Of course, since the drifter position is observed with high accuracy, the EnKF posterior in the position variables is very close to the exact posterior. But the EnKF fails in the update of the unobserved velocity components – the EnKF posterior in the velocity variables is a poor approximation of the exact posterior. We also see that since the shear around the center is affected mainly by the time-independent mode u_0 and much less by the time-dependent modes (u_1, v_1, h_1), the update in u_0 is the worst for the EnKF, whereas the updates in the other velocity components are reasonably good.
In the case of infrequent observations with T_obs^L = 0.1 but with a prior that is also broad in the drifter position components, e.g., Σ_ζ = diag(1.0, 0.7, 0.7, 0.7, 0.5, 0.5), the distributions discussed above (not shown) are almost identical to those shown in Fig. 4, except those at the first observation time t = T_obs^L (where the prior in the position variables has a very large covariance). This is because the large prior covariance in the position components only affects the prior at the first observation time in this case. But by assimilating the observation of the position, which has a much smaller error covariance, the posterior p^{ex,po}_1 is seen to be identical to that shown in the above figure. In contrast, Fig. 5 shows the prior and posterior distributions for the second of the three cases, i.e., still with infrequent observations with T_obs^L = 0.1 but using a prior that is narrow in the velocity components. We see that even at the first observation time, the prior p^{kf,pr}_1 is not as severely non-Gaussian as in the previous case, leading to a good approximation of the posterior, i.e., p^{kf,po}_1 ≈ p^{ex,po}_1. Because of this, the EnKF posterior gives a much better approximation of the exact one at later times, as seen in the second row.

Fig. 5. Same as Fig. 4, except that the prior distribution at time t = 0 has a "narrow" covariance with Σ_ζ^S = diag(0.1, 0.07, 0.07, 0.07, 0.005, 0.005).

Fig. 6. Same as Fig. 4, except that T_obs^S = 0.01 and only the distributions at the first observation time t = T_obs^S = 0.01 are shown.
Another way in which the shear-induced non-Gaussianity is suppressed is, obviously, by decreasing the time between the observations, as shown in Fig. 6. The prior chosen is the same as in the first case presented above, but the time between the observations is T_obs^S = 0.01. The resemblance to Fig. 5 is quite striking. In fact, the posterior and prior distributions at the second observation time are not qualitatively different from those shown in Fig. 5 and hence are not shown.
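To see how shear alone manufactures non-Gaussianity, consider a toy differential rotation about a center. The angular-speed profile ω(r) = 2π(1 − r) below is an assumed stand-in, not the paper's shallow water flow. An initially Gaussian cloud stays nearly Gaussian over a short advection time but wraps into a curved filament over a long one, which is precisely the regime a mean-and-covariance summary cannot capture:

```python
import numpy as np

def shear_rotate(pts, t, center=(0.25, 0.25)):
    """Advect points around a center with radius-dependent angular speed.

    A toy stand-in for the nonlinear center discussed in the text;
    omega(r) = 2*pi*(1 - r) is an assumed profile, not the paper's flow.
    """
    c = np.asarray(center)
    d = pts - c
    r = np.hypot(d[:, 0], d[:, 1])
    # Inner points rotate faster than outer ones: differential rotation.
    theta = np.arctan2(d[:, 1], d[:, 0]) + 2.0 * np.pi * (1.0 - r) * t
    return c + np.c_[r * np.cos(theta), r * np.sin(theta)]
```

Because each point keeps its radius, the map only shears the cloud along circles; the longer the time between observations, the more the angular spread (and hence the filamentation) grows.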
In order to quantify this effect, we compare the so-called degrees of freedom (DOF) for signal, denoted d_s(t), which in the case of data assimilation has the form (Zupanski et al., 2007)

d_s(t) = tr( I − Σ(p) Σ_ζ^{-1} ).

Here, Σ_ζ is the covariance of the prior distribution at time t = 0, while Σ(p) is the covariance of the distributions p^{ex,pr}, p^{ex,po}, p^{kf,pr}, or p^{kf,po}. The larger the value of d_s, the more information is gained through assimilation, and it asymptotes to tr(I). This is shown in Fig. 7 for both the broad prior (top row) and the narrow prior (bottom row). The left column shows the degrees of freedom for the distributions on all six variables of the model, while the right column shows it for the marginal distributions on only the velocity variables (u_0, u_1, v_1, h_1). We notice that in the case of a broad prior, the exact posterior has more information than the prior at all times. In fact, in our numerical experiments, at time t = 5.0 the value of d_s for the posterior approaches 6, which is the theoretical maximum, as shown in Table 1. On the other hand, the EnKF posterior is unable to gain such information.
As indicated before, when the prior is narrow, the EnKF posterior has as much information as the exact posterior.
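The DOF for signal is straightforward to compute from the two covariances. The form below, d_s = tr(I − Σ(p) Σ_ζ^{-1}), is a sketch consistent with the surrounding discussion (it vanishes for the prior itself and asymptotes to tr(I) as the posterior collapses), rather than a verbatim reproduction of Zupanski et al. (2007):

```python
import numpy as np

def dof_signal(sigma_zeta, sigma_p):
    """Degrees of freedom for signal, d_s = tr(I - Sigma(p) Sigma_zeta^{-1}).

    sigma_zeta : covariance of the prior at t = 0
    sigma_p    : covariance of the distribution being scored
                 (prior or posterior, exact or EnKF)
    """
    n = sigma_zeta.shape[0]
    return np.trace(np.eye(n) - sigma_p @ np.linalg.inv(sigma_zeta))
```

For the prior itself d_s = 0 (no information gained), while a posterior whose covariance shrinks to zero reaches d_s = n = tr(I), matching the theoretical maximum of 6 quoted for the six-variable model.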
Fig. 7. The degrees of freedom Eq. (9) for various distributions described in the text. The left column is for the full distributions on all six variables of the model (u_0, u_1, v_1, h_1, x, y), while the right column is for the velocity variables (u_0, u_1, v_1, h_1). The top row is for the case of the broad prior (see Fig. 4), the bottom row for the narrow prior (see Fig. 5).
Table 1. The degrees of freedom Eq. (9) for various distributions described in the text.

Time   Broad prior: p^{ex,pr}  p^{ex,po}  p^{kf,pr}  p^{kf,po}   Narrow prior: p^{ex,pr}  p^{ex,po}  p^{kf,pr}  p^{kf,po}
0.0    0.000  0.000  0.000  0.000   0.000  0.000  0.000  0.000

Effect of nonlinearity on assimilation over longer time periods
Figure 8 shows the posterior distributions at time t = 5.0, conditioned on 50 observations in the first two of the above three cases with T_obs^L = 0.1 (top and middle rows, with broad and narrow priors at t = 0, respectively) and conditioned on 500 observations in the last of the three cases with T_obs^S = 0.01 (bottom row). We see that in the last of the three cases, when the prior is broad and the time between observations is short, the EnKF approximates the posterior very well. At the other extreme, when the prior is broad and the time between observations is long, the EnKF gives a very poor approximation of the posterior. In the case when the prior is narrow but the time between observations is long, the EnKF does produce distributions whose covariance is comparable to that of the exact distributions, but the mean is systematically biased compared to the exact distribution.

Saddle effect
The nonlinear divergence of trajectories, as illustrated in Fig. 2, poses a challenge for data assimilation methods in general. In this section we compare the EnKF and the exact posteriors in order to understand in detail the origin of the saddle effect, using a chaotic trajectory, shown in Fig. 9, which passes near the saddle point at (x, y) = (0.5, 0). The true initial condition is (u_0, u_1, v_1, h_1, x, y) = (1.0, 0.5, 0.8, 0.7, 0.458, 0.4), and the observational error covariance R is fixed so that √R = diag(0.005, 0.003).

Fig. 8. The four posterior distributions for the first two of the cases described above, with T_obs^L = 0.1, for the update step of the last observation i = 50 at t = 50 T_obs^L = 5.0: broad prior (top) and narrow prior (middle row). The bottom row is for the third case with T_obs^S = 0.01, but for the update of the 500th observation, thus at the same t = 500 T_obs^S = 5.0. Note that the true values of the variables at this time are (u_0, u_1, v_1, h_1, x, y) = (1.0, 0.539, 0.442, 0.946, 0.21862, 0.18650), shown by the cyan dot in the figures.

As in the case of the center, Fig. 10 shows the posterior and prior distributions when assimilating frequent and infrequent observations with a broad and a narrow prior, using the EnKF as well as exact sampling. The broad conclusions we can draw from these are the same as in the case of the center, except that the cause of these effects is different. In particular, the prior at each observation time is still non-Gaussian in the case when the time between observations is large, T_obs^L = 0.1, and the prior has a broad covariance Σ_ζ^L = diag(1.0, 0.7, 0.7, 0.7, 0.005, 0.005) (first and second rows of Fig. 10 for the assimilation of the first and second observations, respectively). In contrast with the data for the trajectory around the center, the observations in this case do contain information about the time-dependent modes (u_1, v_1, h_1), as noted in Sect. 2.1. Thus, the exact posteriors in all the components are much narrower than in the assimilation of a trajectory near a center. Hence, we see here an example of how information about different aspects of the velocity field is obtained through different trajectories, either near the saddle or near the centers. Similar conclusions have been discussed and used to propose drifter placement strategies in Salman et al. (2008).
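The splitting of an ensemble that straddles the stable manifold can already be seen in the linearization of the saddle; the sketch below uses the saddle location from the text but an illustrative rate, not the paper's model. Members on opposite sides of x = 0.5 exit in opposite directions, so the cloud becomes bimodal and strongly non-Gaussian, and in the full nonlinear flow the saturation of the expanding direction distorts it further:

```python
import numpy as np

def saddle_flow(pts, t, saddle=(0.5, 0.0), lam=1.0):
    """Push an ensemble through the linearization of a hyperbolic point.

    Expansion at rate lam along x, contraction along y; the rate is
    illustrative, not taken from the paper's shallow water model.
    """
    s = np.asarray(saddle)
    d = pts - s
    # Exponential stretching/contraction along the unstable/stable axes.
    return s + np.c_[d[:, 0] * np.exp(lam * t), d[:, 1] * np.exp(-lam * t)]
```

The exponential stretching also means that the differential speed of divergence across the ensemble grows with the time between observations, mirroring the shear mechanism near the center.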
Recall that in the case of the trajectory near the center, we looked at the effect of assimilating observations over long time periods on both the exact and the EnKF posteriors. In contrast, for chaotic trajectories such as the one shown in Fig. 9, we were unable to carry out this comparison conditioned on more than a few observations. This is clearly because of the sensitive dependence of such trajectories on initial conditions. It is known that the assimilation of such trajectory data may lead to the divergence of the EnKF, the so-called saddle effect (Kuznetsov et al., 2003), in the sense that the EnKF estimates and the true trajectory diverge from each other. Neither our current work nor previous studies of the Lagrangian data assimilation problem have given sufficient indication of the conditions under which the EnKF distribution converges to the exact distribution in the case of trajectories near the saddle. There are, however, previous results proving convergence of the EnKF distribution to the exact distribution when the dynamics is linear, and to a distribution that may not be the exact Bayesian distribution in the case of nonlinear dynamics (Le Gland et al., 2011).

Discussion and future directions

This paper discusses the effects of nonlinearity on data assimilation. Adopting the Bayesian framework, we work with two different DA methods, namely the exact Markov chain Monte Carlo sampling of the posterior distribution for the initial conditions of the model conditioned on observations over a given time period, and the ensemble Kalman filter (EnKF) approximation of this distribution. Since the EnKF update step is exact only when the prior distribution at each step is Gaussian, the comparison of the exact posterior and the EnKF posterior gives us information about which aspects of nonlinearity play a significant role in DA. Some of the main conclusions we draw from this comparison are summarized below.
The flow near an elliptic fixed point, or center, in a nonlinear dynamical system generically has a shear, or differential speed of rotation. Specifically, the initial conditions near the center rotate faster than those farther away. The probability densities on the phase space evolving under such a flow naturally develop non-Gaussianity.

www.nonlin-processes-geophys.net/20/329/2013/ Nonlin. Processes Geophys., 20, 329-341, 2013
The flow near a hyperbolic fixed point, or saddle, in a nonlinear system is qualitatively and quantitatively similar to the flow near a saddle in the linearized system. But in the nonlinear case, the saturation of the diverging parts leads to the eventual development of characteristic non-Gaussian features of the densities.
The presence of such non-Gaussian densities leads to the failure of data assimilation techniques such as the ensemble Kalman filter (EnKF). We emphasize our viewpoint, also presented in Apte et al. (2008b), that the failure or success of a data assimilation scheme such as the EnKF is measured by comparing its outcome with the exact posterior distribution Eq. (5) given by Bayes' rule. This is to be contrasted with the so-called divergence of the EnKF (or filter divergence in general), which refers to the growing difference between the filter estimate and the true state of the system.
The comparisons between the exact posterior and the EnKF posterior reveal that the shear effect is one of the main causes of EnKF failure. In fact, even in the case of the failure of the EnKF for chaotic trajectories near the saddle, it is the differential speed of divergence, and ultimately the shear, that causes the failure of the EnKF.
We note that the EnKF fails when the observations are infrequent and the prior at the initial time is broad. We show using examples that the effects of nonlinearity are less prominent either when the observations are frequent or when the prior is narrow. We also emphasize the effects of nonlinearity on the marginal posterior distributions of the unobserved variables. In our numerical experiments, we have taken the observations to be relatively accurate, because of which the marginal posterior distribution in the observed variables is very close to the observational likelihood; but the nonlinearity leads to a poor EnKF approximation of the marginal posterior of the unobserved variables. Based on our numerical experiments for the case of trajectories near the center, we surmise that this is precisely the cause of EnKF divergence.
Finally, we have hinted at the connection between the failure of the EnKF (the mismatch between the exact posterior and the EnKF posterior) and filter divergence (the mismatch between the filter estimate and the true state of the system) over long periods of time. But because of the sensitivity to initial conditions, sampling the exact (smoothing) posterior conditioned on data over long time periods becomes prohibitively difficult for the MCMC method we have used for the studies in this paper.
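For completeness, the exact-sampling route discussed above can be sketched with a generic random-walk Metropolis sampler over the initial condition. The forward map, step size, and observation setup below are placeholders, not the sampler or the shallow water flow used in the paper:

```python
import numpy as np

def mcmc_posterior(x_obs, forward, obs_times, sqrtR, prior_mean, prior_cov,
                   n_steps=5000, step=0.05, rng=None):
    """Random-walk Metropolis sampling of p(x0 | observations).

    A generic sketch: forward(x0, t) must return the observed
    quantities at time t; sqrtR holds observation-error std devs.
    """
    if rng is None:
        rng = np.random.default_rng()
    inv_prior = np.linalg.inv(prior_cov)
    inv_R = np.diag(1.0 / sqrtR**2)

    def log_post(x0):
        # Gaussian prior on the initial condition ...
        lp = -0.5 * (x0 - prior_mean) @ inv_prior @ (x0 - prior_mean)
        # ... times the Gaussian likelihood of each observation.
        for t, y in zip(obs_times, x_obs):
            r = y - forward(x0, t)
            lp += -0.5 * r @ inv_R @ r
        return lp

    x = prior_mean.copy()
    lp = log_post(x)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)
        lp_prop = log_post(prop)
        # Metropolis accept/reject on the log-posterior ratio.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x.copy())
    return np.array(samples)
```

For chaotic dynamics the likelihood surface becomes extremely rugged as more observations are conditioned on, which is the practical obstruction to long-window smoothing noted above.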
The velocity field (arrows) and the height field (shaded) of the linear shallow water equations.

Fig. 2.The effect of shear and saddle on an ensemble of initial conditions of the Lagrangian drifter near a center at (x, y) = (0.25, 0.25) (top panel) and near the saddle at (x, y) = (0.5, 0.5) with separatrices x = 0.5 and y = 0.5 (middle panel).The divergence of trajectories starting near the saddle, within a time span of 0.50 in an ensemble of velocities, is shown in bottom panel.
Fig. 4. The four distributions p^{kf,pr}_i (magenta), p^{ex,pr}_i (black), p^{ex,po}_i (red), and p^{kf,po}_i (blue) for the update steps at the first two observation times (see Sect. 3.1 for the notation).



Fig. 9. The trajectory used to study the effect of saddles on data assimilation. The Poincaré plot of this trajectory, shown by black dots, indicates that it is chaotic.

Fig. 10.Same as Figs. 4 (broad prior and infrequent observations, top row is update step at the first observation, second row at second observation) and 5 (narrow prior and infrequent observations, third row is update step at the first observation, bottom row at second observation), except that the observations from the trajectory near the saddle are assimilated.
