Toward the assimilation of images

. The equations that govern geophysical ﬂuids (namely atmosphere, ocean and rivers) are well known but their use for prediction requires the knowledge of the initial condition. In many practical cases, this initial condition is poorly known and the use of an imprecise initial guess is not sufﬁcient to perform accurate forecasts because of the high sensitivity of these systems to small perturbations. As every situation is unique, the only additional information that can help to retrieve the initial condition are observations and statistics. The set of methods that combine these sources of heterogeneous information to construct such an initial condition are referred to as data assimilation. More and more images and sequences of images, of increasing resolution, are produced for scientiﬁc or technical studies. This is particularly true in the case of geophysical ﬂuids that are permanently observed by remote sensors. However, the structured information contained in images or image sequences is not assimilated as regular observations: images are still (under-)utilized to produce qualitative analysis by experts. This paper deals with the quantitative assimilation of information provided in an image form into a numerical model of a dynamical system. We describe several possibilities for such assimilation and identify associated difﬁculties. Results from our ongoing research are used to illustrate the methods. The assimilation of image is a very general framework that can be transposed in several scientiﬁc domains.


Introduction
For more than six decades, following the works of J. Von Neumann and J. Charney, the fluid envelope of the Earth has been described by mathematical models giving the evolution of its state variables: wind, temperature, pressure and moisture for the atmosphere; current, temperature, salinity and surface elevation for the sea.Models are routinely used for prediction and the level of prediction has been dramatically improved over the last few years.
For more than five decades, the fluid envelope of the Earth has been observed by satellite providing a long time and total coverage of the ocean and of the atmosphere.Billions of images have been produced, some of which are exhibited in art galleries showing the beauty of our Earth.These images and their dynamics show complex structures in different areas: tropical depressions, storms at mid-latitudes, but also temperature, salinity and phytoplankton blooming in the ocean.These images are often used in meteorological bulletins on TV to illustrate the evolution of the weather.Thus they are important for a qualitative understanding of the evolution of the weather.
Images and models describe the same objects but with different tools.Images are often used to verify models -in general in fluid dynamics and turbulence -but it is done in a qualitative way rather than in a quantitative one.Both models and experiments display, for instance, images of Kelvin waves showing that models can mimic nature.But to what extent?How is it possible to quantitatively compare images Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
of Kelvin waves observed from experiments and images from numerical models?
Over more than two decades, data assimilation has progressed into a very important development which is considered as the main reason for the improvement of forecasts.By data assimilation, we mean all the methods able to link together all the available information on geophysical fluids: 1. mathematical information provided by models; 2. physical information provided by in situ or remote observations; 3. statistical information issued both from observations and from past predictions; 4. a priori information (e.g., the regularity of the fields).
For many years, numerical models and images have both been used for a qualitative prediction.However, the structured information borne by images and by models are not presently used together in a quantitative framework.The topic of this paper is twofold: Direct Image Assimilation, which can be summarized as a solution to the problem of how to couple the information provided by numerical models and the information provided by images; and the dual problem: how to validate mathematical models of flows using images and their dynamics.By validating, we mean a quantitative validation and not just a qualitative one.We must deal with a very general problem arising in many fields out of geophysical problems.In nature, every situation is unique; steady-state or asymptotic solutions do not exist.Most of the time, this assumption of uniqueness is implicitly used in modeling.Because of the enhancement of modeling, this fact will become crucial.In many cases, the initial condition and/or boundary conditions cannot be experimentally controlled and consequently mathematical models are not sufficient to give an accurate representation of the situation.More information must be added and inserted into models.Images are, in some situations, candidates for this purpose.We would like to point out the ambiguity of the nature of some sequences of images.Some elements are clearly Lagrangian -as in the case of small cumulus humilis clouds drifting with the wind: under the tropics they are used, by operational centers, as Lagrangian markers.As such, they give an estimation of the wind that can be used in a data assimilation scheme.Some elements are clearly Eulerian -as in the case of lenticularis clouds, they seem to be quasi-steady, but in fact they are the signature of a strong wind.Estimating wind velocity from the shift of these clouds would lead to erroneous data.In between these examples, many visual elements in geophysical fluids dynamics have both an Eulerian and a Lagrangian character.The methods developed for assimilating images could be helpful for a better understanding of the underlying physics.
For engineering problems, the unknown conditions may be some parameters which have to be identified as a solu-tion of an inverse problem, a methodology which can be included in data assimilation as it is.For example, in Chapelle et al. (2013) the methodology of data assimilation is used to solve Biological Engineering problems.Nevertheless, this paper will be more oriented towards applications to geophysical fluids.The remainder of the paper is organized as follows: Sect. 2 gives a brief description of the observation by satellites.Section 3 is devoted to a brief introduction to data assimilation using variational methods.It also describes the characteristics of satellite observations in the sense of data assimilation.Section 4 describes the use of images as the source of pseudo-observations in data assimilation.Section 5 gives a methodology for direct assimilation of images and introduces the notion of an observation operator for images.Examples of such operators are given in Sect.6 as well as associated numerical results.Section 7 concludes the paper.

Satellite observations
At the present time, more than 40 satellites are continuously scanning the atmosphere and the ocean.As an illustration, Fig. 1a gives the number of observations provided by satellites and its evolution from 1996 to 2010.

Classification of satellites
Satellites can be classified according to many criteria: usage and orbit characteristics are the major criteria.In terms of usage, we are interested in Earth observation and weather satellites.The next section gives a brief description of data provided by those satellites.In terms of orbit characteristics, the most important are the altitude and the inclination (in reference to the equatorial plane).The altitude and the inclination define the resolution, the coverage and the acquisition conditions (local solar time at the acquisition point) of the measurement instruments on board the satellite.Most of the Earth observation and weather satellites can be classified as geostationary or polar orbiting: the Earth.Most of the visual information displayed on weather bulletins on TV are issued from these satellites.They clearly show the evolution of the large-scale air masses, the birth of tropical depressions and hurricanes and, even at a local scale, the development of thunderstorms. Figure 1b shows the coverage of the Earth by observations from geostationary satellites.
2. Polar orbiting satellites.These satellites have an altitude between 400 and 800 km.At that elevation, they can cover the broad spectrum of radiation including the microwave range as opposed to geostationary satellites.Also, the spatial resolution of measurements is very fine thanks to the low altitude.However, the geographical coverage is quite narrow at a given time.Also, the time resolution is coarse.In some orbits, it takes several days for the satellite to travel above the same point.They need several orbits to cover the entire globe.These satellites pass over the poles at each revolution, making it possible to get information more frequently in those areas not covered by geostationary satellites and almost inaccessible by conventional instruments.According to their inclination, they can be divided into subclasses.The subclass that draws attention is the subclass of sunsynchronous orbit.The polar sun-synchronous satellites pass the equator at the same local time on every pass.These satellites are useful for imaging and weather.Figure 1c shows the distribution of observations from polar orbiting satellites equipped with the AMSU-A sensor.

Content of satellite measurements
Satellite sensors measure radiation reflected or emitted by the Earth, the seas or the atmosphere.The measured radiations are reflected light for visible channels and radiance for infrared channels.Depending on the wavelength employed, the measured radiations quantify a variable or a set of variables of the studied system.They can therefore be considered as observations in the sense of data assimilation.
In visible channels, satellites measure the reflective properties of the observed system (see Fig. 3).This is often limited to the upper layer of clouds.If the atmosphere is not cloudy, the observed surface can be extended to the Earth and sea surface.The observation of the sea in the visible channel produces the sea surface color (see Fig. 2).This shows the concentration of the phytoplankton in a thin upper layer of the sea.In infrared channels, satellites measure an integration over a certain thickness of the emissive properties of the observed system.Examples are water vapor images in the atmosphere and the sea surface temperature (SST) images.The integration thickness is highly dependent on the observed system.The sea is impervious to electromagnetic waves causing measurements to be limited to a very thin layer of the sea surface.SST measurements, for example, are limited to only a few millimeters of the sea surface.This thickness is negligible compared to the thickness of the top layer in numerical models of the sea, which can extend to some hundred meters.Data like the SST are thus more related to interactions between the sea and the atmosphere than to the sea state variables.For the atmosphere, the probed layer can extend to its full thickness under the satellite.In the case of water vapor, the thickness of the probed layer is more important than in the case of the SST, but depends on the distribution of moisture in the atmosphere.Some specialized satellites provide more complex data -an example is the Jason type satellites that give the sea surface elevation with a precision of some centimeters.
As opposed to in situ measurement devices that usually acquire observations at a single point at each time, satellite measurements provide observation of a large area at the same time.Thus, these satellites provide the image of their coverage area by the radiation function at each time.For this reason, satellite observations are usually called images.Subsequently, the expression "satellite image" means satellite observation.
3 Data assimilation

Definition of data assimilation
By data assimilation, we mean the methods permitting the best retrieval of the state of the environment, a mandatory step prior to prediction.From the formal point of view, the problem is to link heterogeneous sources of information, the heterogeneity bearing on the nature, the quality and density.Basically we have 1.mathematical information: this is the model which is used to describe the flow; 2. physical information: it is given by data issued from in situ or remote measurements such as images; 3. statistical information: it could be produced from statistics on the observations as well as statistics on the outputs of the model; 4. a priori information, for instance on the regularity of the fields or the existence of singularities; more generally, qualitative information used in the analysis.
Basically, there are two approaches of data assimilation methods for combining all the previously mentioned information: 1. Approaches derived from the Kalman filter: these are based on Bayesian estimation and are of great theoretical importance.Having to deal with a huge covariance matrix, the traditional Kalman filters are not implemented in operational centers.The Ensemble Kalman filter of Evensen (1994) overcomes that limitation by introducing an ensemble approximation of the covariance matrix.
2. Variational approaches: these are based on optimal control and the calculus of variations.These methods are presently used by most important operational centers for weather prediction.They seem well adapted for the assimilation of images and, in the sequel, we will only consider the variational approaches.

Variational data assimilation
The ingredients of variational data assimilation are as follows: -A model describing the evolution of the state variable X ∈ X .The model is usually given as a system of partial differential equations (PDEs) of the form where the initial condition U ∈ X is unknown, X is the state space and M is the model operator pressure, and the dynamic model M describes the set of physical laws that the variables must respect over time -thermodynamics laws, conservation laws, etc.
-A set of observations Y o given by physical measurement (direct or indirect) of the system state.For the sake of simplicity, we will assume observations to be continuous in time: -An operator of observation H: observations are usually made up of partial or indirect measurements of the state variables.The observation space O is not necessarily the same as the state space X .The observation operator H is defined as the mapping operator from X onto O: -A background estimation U b of the initial state U .
In operational meteorology, this background estimation can be deduced from previous forecasts.
-Statistical information, for instance the error covariance matrix Q of the observation error and the covariance matrix B of the background estimation.
Variational data assimilation (VDA) defines the optimal initial condition U a as where the so-called cost function J is defined as with the norms The cost function contains two terms: the first one is the discrepancy between observations and the solution of the model associated with the initial condition U .The second one is the background term.This will require the solution to be located in the vicinity of U b .It is also a regularization term in the sense of Tikhonov (1963).This term is mandatory due to the ill-posedness of the problem.In operational meteorology, the dimension of the state vector, and consequently the dimension of the initial condition, is of the order of 1 billion while the number of daily observations is of the order of 10 million.Therefore we would have to deal with a severely ill-posed problem (as defined by Hadamard) if the regularization were not introduced.A necessary condition for the optimality is given by the Lagrange-Euler equation: This is also a sufficient condition if J is strictly convex and coercive.This is the case if we have a linear model but realistic models are nonlinear in general.Solving the Lagrange-Euler equations requires the gradient ∇J of the cost function.
The main difficulty is that J is an implicit function of U .In VDA, ∇J is computed through the adjoint variable P , which is defined as the solution of the adjoint model: where the * denotes the adjoint operator.The adjoint model is deduced from the direct model Eq. ( 1) using calculus of variations based on the Gateaux derivatives, see Le Dimet and Talagrand (1986) for details.The gradient of the cost function ∇J (U ) is given by The gradient is then used in an optimization algorithm (truncated or quasi-Newton methods, L-BFGS) to compute an estimate of the optimal solution.

Satellite observations and data assimilation
Observations that are quantitatively used in data assimilation are usually limited to measurements of the state variables, such as wind, moisture and pressure given by terrestrial centers for meteorological prediction.These observations will be named conventional in what follows.Apart from conventional observations, there exists another class of observations that is mainly used only for qualitative purposes: these are images.Among the various sources of images, satellites plays an important role for the observation of the atmosphere and seas.
As mentioned in Sect.2.2, satellite observations or satellite images are indirect measurements of the state variables of observed systems like the atmosphere or the sea.Thanks to post-processing, they can be converted into observations of the associate variables.As an example, Fig. 2 shows the SST and the chlorophyll concentration derived from MODIS (on board satellite Aqua) observation of the Gulf Stream.MODIS stand from Moderate-resolution Imaging Spectroradiometer; it is a 36-channel scientific instrument that equips NASA satellites Terra and Aqua.
Thanks to their high resolution and their spatial coverage, satellite images also provide information on structures ranging from the mesoscale to synoptic scale.Structure refers to the spatial organization of individual measurements.A sequence of images shows the dynamical evolution of the structures.As an example, in addition to the SST and the chlorophyll concentration, Fig. 2 shows a couple of large Gulf Stream eddies.The similarity of observed structures between SST (infrared channel) and Chlorophyll concentration (visible channel) shows that such information can be obtained from different measurements.The example of Fig. 3 shows a depression over western Europe and its evolution from 28 April to 29 April 2008.The observed structures (eddies for the sea, depression for the atmosphere, etc.) represent Lagrangian information and are clearly useful for the prediction of the observed system.From the above description, we can distinguish two major types of satellites observations: 1. indirect measurements of the state variables of observed systems; 2. characteristic structures of the observed system and their dynamics.

Satellite observations as indirect measurements of the state variables
From this point of view, one can derive two approaches for using satellite observations.The first approach consists in extracting the variables that are indirectly observed and use them as conventional observations in a model that contains those variables in the system state.The second approach consists in modeling an appropriate observation operator that computes radiance from the system state given by the model.In both cases, satellite observations are used as conventional observations; this consideration will not be taken into account in the rest of this paper.The two cases are subject to some problems including: the difficulty of extracting variables from indirect measurements or modeling the appropriate observation operator, and the sensitivity of satellite measurements to acquisition conditions.For example, a substantial cloud cover makes the error rate prohibitive in the observations of temperature and moisture of the atmosphere.In these cases, measurements are used to derive other products such as velocity fields (atmospheric motion vector -AMV).However, the combination of 4-D variational data assimilation (4D-VAR) and the use of satellite measurements has significantly improved the forecasts, as shown by Fig. 4.This figure shows the anomaly correlation at 500 hPa height for 3, 5 and 7 days forecast between the years 1992 to 2007.Before the year 2000, there was a significant difference between the forecast in the Northern Hemisphere (high performance) and Southern Hemisphere (poor performance).The difference was due to the lack of conventional observations in the Southern Hemisphere.In the early 2000s, the introduction of satellite observations in data assimilation made it possible to get the same performance in both hemispheres.

Satellite observations as characteristic structures of the studied system and their dynamics
This is the approach that will be developed in the rest of this paper.In this case, satellite measurements cannot simply be used as conventional observations.In fact, as structures refer to the spatial organization of individual measurements, a single measurement is useless.Similarly, as the dynamics refers to the evolution of measurements in time, a single image is not sufficient.However, the observed structures are indirectly present in the model output, provided with appropriate initial conditions and other parameters of the model.The question that arises is, how should one use such information in data assimilation?The answer to this question is the assimilation of images.This is a technique that emerged recently with the aim of using images as observations in data assimilation.There are two basic approaches: 1. Assimilation of pseudo-observations: in a first step, the images are analyzed.The result of the process is a field of velocities obtained by the comparison of two or several successive images.In a second step, these velocities are assimilated as conventional observations in a classical method of data assimilation.
2. Direct assimilation of images: images (image structures) are considered as conventional observations and assimilated as such.To do so, depending on the application, we need to define an adequate mathematical space in which images or image structures will be modeled.
Corresponding observation operators that map the control space into the structure space should be constructed.The structure space must conserve and extract the most pertinent information of the images.If we want to remain in the framework of optimal control methods, then the space must be defined in such a way that the rules of differential calculus can be applied.It is also important to underline that, for computing purposes, the space dimension should not be too large.
In both cases, a preliminary step for using images in data assimilation is the identification of the underlying process.However, this paper does not focus on this preliminary step; instead, it focuses on the mathematical aspects of the use of images in data assimilation.Before moving forward, let us give some illustrations of the importance of that preliminary step.Lenticular clouds may be observed under the wind over a mountain; they are an Eulerian property of an area where there is condensation of water vapor.These clouds appear to be quasi-stationary, consequently if they were used as a Lagrangian tracer, they would lead to a small wind velocity.Such an analysis would be a misinterpretation of reality as these clouds are actually the signature of a strong wind.This is also the case for some small cumulus clouds that can appear at the vertical edge of some crops with strong radiative properties.They are the signature of a local vertical convection and therefore are not useful for retrieving horizontal velocities.It is important to mention that phase errors and joint phase-amplitude should be considered in the assimilation of remote measurements.This issue is not addressed in this paper.However, there is a significant literature on the topic that may be of interest to the reader (Hoffman et al., 1995;Hoffman and Grassotti, 1996;Brewster, 2003a, b;Ravela et al., 2007).

Principle of assimilation of pseudo-observations
Since the early 1980s and the work of Horn and Schunck (1981), research has been carried out to derive velocity fields from images sequences, with applications to fluid dynamics mainly (and very recently to movie compression and medical imagery).The velocity field derived from the image processing techniques can be used as pseudo-observations of wind in an assimilation system, for instance in a regular VDA scheme.
The left panel of Fig. 5 shows the principle of assimilation of pseudo-observations of velocity fields.From a sequence of images, a velocity field is estimated and used as observation of the velocity field in a regular scheme of data assimilation.There are several methods to extract a velocity field from a sequence of images.They can be divided into two categories: the frame-to-frame motion estimation and the so-called image model technique.Motion estimation or frame-to-frame motion estimation is a technique from image processing that aims at estimating the velocity field that transports an image to another.A mathematical description of this technique is given in Sect.4.2.From frame-to-frame motion estimation, one gets a velocity field between each pair of successive images of a sequence, but there is no guarantee of consistency in the resulting sequence of fields if it is applied to many pairs of images of the same sequence.In such cases, the image model technique can be more appropriate.It couples the frame-to-frame technique with an evolution model for the velocity field.For details of this technique, the reader is referred to Herlin et al. (2006), Huot et al. (2006) and Korotaev et al. (2007).

Frame-to-frame motion estimator
The description of motion estimation in this paper is limited to optical flow.It is a variational method and is well suited for image sequences in geophysics.There also exist statistical methods based on the correlation between successive images.For more information on those methods, the reader is referred to Adrian (1991) who describes the commonly used one: particle image velocimetry (PIV).Optical flow is a classical method of motion estimation.It is based on the conservation of the global luminance between two images (Horn and Schunck, 1981).Let I : × R → R be the luminance function defined on the pixel grid ⊂ R 2 and the time t ∈ R, then the optical flow is the vector field V (x, y) that satisfies the luminance conservation given by According to the nature of the images, the law of conservation of the luminance Eq. ( 11) can be replaced by a specific law: for example, with images of the ocean's color, conservation of chlorophyll (with source and sink terms) can be considered; with images of SST, the Boussinesq approximation can be used.In many cases, the validity of these laws, from the optical point of view, is dubious.For instance, between two satellite images, the enlightenment will have changed and some corrective term will have to be added to the equation.As a consequence, it is necessary to carry out a preliminary study of the images to detect structures on which the information borne by the equation of conservation and the images is maximized.For instance, if we are working with SST, filaments are important structures which have to be identified in the analysis.They are characterized by elongated structures, constant temperature, significant contrast with the surrounding area, and motion by translation.To identify filaments, it is necessary to use the tools of mathematical morphology (Serra, 1988;Najman and Talbot, 2010).In the images, it will also be necessary to discard points with a weak spatial gradient or with a weak temporal evolution.Detecting and/or eliminating structures from the images requires the application of a thresholding operator (e.g., on the norm of the gradient of the SST).Of course, the analysis will be sensitive to the threshold value chosen.The choice of the threshold is usually empirical.For a 2-D problem, the velocity field V = (u, v) is determined as the solution of an optimization problem.To this end, one defines a cost function J to be minimized as follows: A necessary condition for optimality is expressed by the Euler-Lagrange equations which involve the gradient of J with respect to u and v.For the cost function of Eq. ( 12), the Euler-Lagrange equations give the solution V * = (u * , v * ) as the solution of the linear system The determinant "det" of this system and the determinants det u and det v relative to unknowns u and v are all zero, meaning that the solution is not unique.The problem is ill-posed.
In fact, let V = (u, v) be a solution of the non-regularized problem, and W = (w 1 , w 2 ) a vector field, orthogonal to the image gradient, i.e., W , ∇I = 0. We have As a consequence, it is impossible to determine the motion in the direction orthogonal to the image gradient: this is the aperture problem that is well known in computer vision.It is a source of ill-posedness.To address the ill-posedness, regularization techniques are used.The literature on the regularization for image processing is very large.The references Tikhonov (1963), Horn and Schunck (1981), Alvarez et al. (1999), Nagel (1983), Schnörr (1994), Suter (1994), Weickert and Schnörr (2001), Black and Anandan (1991), Hinterberger et al. (2002), and Mémin and Perez (1998) give a start point for the interested reader.

Mathematical formulation
By direct assimilation of images, we mean using image observations directly in the cost function of variational data assimilation.In this case, image observations are jointly used with conventional observations to compute the optimal control variable of numerical models.The right panel of Fig. 5 shows a schematic representation of the direct assimilation of images.Images are used directly in the optimality system jointly with conventional observations.This direct use of images in the optimality system requires the definition of a mathematical space for the images with adequate topology and the associated images observation operator.An images observation operator is a mapping from the space of the numerical solution of the model toward the space of images.No prior step to extract pseudo-observations of state variables is needed.Direct assimilation of images requires the modification of the cost function in order to take into account the image observations.The cost function that takes into account images can be written as follows: www.nonlin-processes-geophys.net/22/15/2015/ Nonlin.Processes Geophys., 22, 15-32, 2015 where f (t) is the image function at time t, | F is the appropriate norm in the image space F, and H X→F is the observation operator for images; subsequently, it will be called the model to image operator.In Eq. ( 16), the background and regularization terms are omitted for sake of clarity.The regularization term will be the canonical one in VDA.

Observation operators for images
A first consideration of the image cost function is to take the norm of the left-hand side of the equation of the optical flow Eq. ( 11).The image is considered as a passive tracer moving with respect to the dynamics of the system, and more precisely with the motion field V .This approach, proposed in Béréziat and Herlin (2011), Papadakis and Mémin (2008) and Gorthi et al. (2011), leads to the following image cost function: This cost function cannot be turned easily into the form suggested by Eq. ( 16).The covariance matrix Q is defined with respect to the image gradients ∇I in order to restrict image information to pertinent areas containing discontinuities.If the model M is monotone and ensures the spatio-temporal continuity of the state X(t), the regularization of the flow V at time t now only depends on the regularity of the background condition V 0 .Due to the characteristics of images, they should not be used directly as an array of pixels in the cost function.Specific structures of the image, such as the lenticular clouds mentioned above, may have their own dynamics.In such cases, image observations cannot simply be considered as a passive tracer moving under the dynamics of the studied system.It is also important to point out that from a dynamical point of view, information in an image sequence are located in discontinuities and the dynamics of those discontinuities.Even with advanced covariance matrices, the pixel representation of images is not suitable to describe such phenomena in data assimilation.Additional operators should be used to isolate structures of interest from the image.In this case, the cost function Eq. ( 16) takes the form where S is the space of features of interest to be isolated from the image.This notation is borrowed from Titaud et al. (2009) where such a space is called the "space of structures".H F →S is the image to structure operator and H X→S is the model to structure operator.Image to structure operator: the goal of such an operator is to extract features of interest from image observations.As we said previously, the main information obtained by human vision from the image is located in the discontinuities.A definition of S must be related to the discontinuities in the image function.Discontinuities are well characterized in spectral spaces.Thus, the basic definition of S may be based on a spectral decomposition such as Fourier, wavelet or curvelet.
Model to structure operator: this operator extracts features of interest from the system state given by the model.It can be defined as where H X→F is the model to image operator previously defined.Setting the image to structure operator to be the identity (H F →S = Id), we get the cost function given by Eq. ( 16).Another approach of using image observation in laboratory data assimilation can be found in Ravela et al. (2010).The authors used a computer vision system to extract measurements from the physical simulation with parallel computing and decomposition to account for observation in real time as well as using the numerical model to adapt the observing system.

Adjoint model in direct image assimilation
When the image term is added to the cost function Eq. ( 18), the adjoint model of variational data assimilation Eq. ( 9) becomes Nonlin.Processes Geophys., 22, 15-32, 2015 www.nonlin-processes-geophys.net/22/15/2015/ image forcing term The expression is the Jacobian of the model to structure operator.The presence of this expression means that the model to structure operator must be differentiable.We can then compute its Jacobian and the gradient of the cost function in order to be able to carry out an optimization algorithm and identify the optimal initial condition.

Examples of direct image assimilation techniques
In this section, we describe two tools that can be used to construct observation operators.The first method uses the advection of a passive tracer whose concentration map is considered as the image.This method is well adapted for assimilating a sequences of images.The second method uses the computation of Lagrangian coherent structure (LCS) of the flow.This method exploits the integrated information contained in tracer images and is well suited for single image assimilation.We will also discuss different examples of mathematical spaces for image structures.All the associated topologies will be of L 2 -type.The main purpose of this choice is its convenience; other choices, for example the L 1 that is commonly used in image processing, could be considered.Also the question of introducing some covariance matrix into the definition of the topology remains open.The choices shown below are not exhaustive.Many other potential spaces could be considered.The choice of the mathematical space for images defines the image to structure operator introduced in the previous section.

Observation operators based on the advection of passive tracer
In this section, we consider the case where the model to structure operator can be decomposed into a model to image operator and an image to structure operator.We focus only in the image to structure operator that is the most important as stated in Sect. 5. A simple example of the model to image operator can be defined by considering images as observations of a passive tracer.Image evolution is then modeled by a transport equation, the initial distribution of the passive tracer being given by the first image.Interpolation from the grid points of the numerical model toward the grid points of the image may be necessary.In this case, an image is considered as the concentration of the passive tracer.

Pixel representation of image
The pixel representation of a 2-D image is a discretization (numerical representation) of a mathematical function of two variables that defines the image.It is usually given as a 2-D array, each entry of the array being the value of the image at the associated grid cell of the discretization.The simple case of image assimilation is to consider the identity image to structure operator.In this case, the cost function associated with the image will take the form where H X→F defines the concentration of the passive tracer from the system state and f is the observed concentration (image) at time t.In this case, the image is considered as an array of Eulerian observations of the tracers and the features of the dynamics (fronts, vortices, etc.) are not explicitly taken into account.

Multiscale analysis of images: curvelets
Recent years have seen a rapid development of new tools for harmonic analysis.For general fluid dynamics and also for geophysical flows, there are coherent structures evolving in an incoherent random background.If the flow is considered as an ensemble of structures, then the geometrical representation of flow structures might seem to be restricted to a welldefined set of curves along the singularities in the data.The first step in using images as observations in data assimilation is to separate the resolved structures, which are large, coherent and energetic, from the unresolved ones, which are supposed to be small, incoherent and bearing little energy.One of the first studies in this sense can be found in Farge (1992).It shows that the coherent flow component is highly concentrated in wavelet space.Wavelet analysis is a particular space-scale representation of signals which has found a wide range of applications in physics, signal processing and applied mathematics in the last few years.The literature is rich regarding wavelets.The interested reader is referred to Mallat (1989), Coifman (1990) and Cohen (1992) for example.A major inconvenience of wavelets is that they tend to ignore the geometric properties of the structure and do not take into account the regularity of edges.This issue is addressed by the curvelet transform.The curvelet transform is a multiscale directional transform that allows an almost optimal nonadaptive sparse representation of objects with edges.It was introduced by Candes andDonoho (2004, 2005a, b) and Candes et al. (2006).In R 2 , the curvelet transform allows an optimal representation of structures with C 2 -singularities.As curvelets are anisotropic, they have a high directional sensitivity and are very efficient in representing vortex edges.
A function f ∈ L 2 (R 2 ) is expressed in terms of curvelets as follows: where j,l,k is the curvelet function at scale j , orientation l and spatial position k (with k = (k 1 , k 2 )).The orientation parameter is the one that makes the major difference with the wavelet transform.The set of curvelet functions j,l,k does not form an orthonormal basis as is the case for some families of wavelets.However, the curvelet transform satisfies the Parseval relation so that the L 2 -norm of the function f is given by where c j,l,k are the curvelet coefficients given by In Fig. 6 from Ma et al. (2009) the supports of some wavelets and curvelets are presented.The figure shows the strong anisotropy curvelets and suggests that curvelet representation will give a better adjustment for 2-D curves.
Figure 7 shows an illustrative comparison of the approximation of a circle by wavelets and by curvelets.The curvelets provide a better approximation of this perfectly anisotropic object.The convergence of curvelets is also better: the best m-term approximation f m of a function f has the representation error ||f − f m || ≈ m −1 for wavelets and ||f − f m || ≈ Cm −2 (ln m) 3 for curvelets.Another interesting property of curvelets in the framework of variational data assimilation is that the adjoint of the curvelet transform is the inverse of the curvelet transform.Therefore, to represent an image, we will consider the truncation of its expression in a curvelet frame.

Numerical experiments
In this section, we present numerical experiments of direct image assimilation with observation operators based on the advection of a passive tracer.We used images from experimental physics; the drift of a vortex is studied through physical experiment in the Coriolis platform: it is a circular rotating tank with a diameter of 14 m, located at Laboratoire des Écoulements Géophysiques et Industriels (LEGI), Grenoble, France.The rotation of the tank recreates the effect of the Coriolis force in a thin layer of fluid.The vortex is generated by stirring the fluid and made visible for optical images thanks to the addition of the fluorescein that is a passive tracer.Pictures are taken from above the turntable at regular time intervals to study the evolution of the vortex.A full description of a similar experiment can be found in Flór and Eames (2002).A sequence of two images from that experiment is used for the motion estimation experiment in this paper.This sequence is named Coriolis sequence after the name of the platform.

Experimental framework
In the configuration of the Coriolis platform as described above, the state variable is X = (u, v, h), which satisfies the shallow-water equations Unknowns are the zonal component u(t, x, y) and meridional component v(t, x, y) of the current velocity and the surface elevation h(t, x, y).They depend on time t and the two horizontal directions x and y.We define the relative vorticity ζ = ∂ x v − ∂ y u and the Bernoulli potential , where g is gravity.The Coriolis parameter on the β-plane is given by f = f 0 + βy, ν is the diffusion coefficient and r the bottom friction coefficient.In this paper, the following numerical values are used for the parameters: r = 0.9 × 10 −7 s −1 , ν = 0 m 2 s −1 , f 0 = 0.The domain is discretized on a N × N = 128 × 128 uniform Arakawa C-type square grid.A finite difference scheme is used for space discretization.Time integration is performed using a fourth-order Runge-Kutta scheme.The time step is set to 0.01 s in the turntable experiment, which corresponds to 14.4 s in the atmosphere.

Assimilation procedure
We consider the problem of recovering the initial state of the fluid U (x, y) = X 0 (x, y) = (u, v, h)(0, x, y) which constitutes our control variable.Only images are used as observations.We use image to structure operators based on pixels.
Edge structures of images are extracted by applying a threshold operator on their curvelet coefficients.More precisely, let c i,j,k be the curvelet coefficients of the expression of a given function f in the frame of curvelets, see Eq. ( 25): we consider the scale-dependent hard thresholding operator τ defined as where σ j is the threshold value for the scale j .The σ j are predefined and depend on the problem and on the data.We mention two particular cases: 1. hard thresholding τ h with the same threshold for all scales: σ j = σ for a given σ , 2. hard thresholding zeroing coarse-scale coefficients τ z ; this is similar to hard thresholding with the exception that the coefficient associated with each curvelet function of the coarsest scale is set to zero.
Curvelet thresholding for edge extraction can also be found in Ma et al. (2006).

Numerical results
Figure 8 shows the initial analyzed velocity field with different observation operators.With the identity observation operator (pixels), the analyzed velocity field shows a nonsymmetric vortex and large motion where there must be no dynamics.With the hard thresholding of the curvelet decomposition, the problem of parasitic motion is solved.On the other hand, the order of magnitude is underestimated.Using hard thresholding with the coarsest-scale coefficients set to zero, the problem of order of magnitude is solved, although the problem of parasitic motion arises again with less significance.Using scale-by-scale thresholding of the curvelet decomposition, the main problems (parasitic motion, underestimation of order of magnitude) encountered with other operators are solved.The result of this set of experiments illustrates the importance of an adequate observation operator in direct image assimilation.

Observation operators based on finite-time Lyapunov exponents and vectors computation
Ocean tracer images (SST and Ocean Color for instance) show patterns, like fronts and filaments, that characterize the flow dynamics.They are closely related to the underlying flow dynamics and are referred to as Lagrangian coherent structures (LCSs).They are material curves which exhibit locally the strongest attraction, repulsion or shearing in the flow over a finite-time interval (Haller and Yuan, 2000;Haller, 2011).Their location and shape are the signature of integrated dynamic information that should be exploited in a data assimilation scheme.LCSs are usually identified in a practical manner as maximizing ridges in finitetime Lyapunov exponent (FTLE) fields (Haller, 2001).FTLE is a scalar local notion that represents the rate of separation of initially neighboring particles over a finite-time window [t 0 , t 0 + T ], T = 0.It is defined as the largest eigenvalue of the Cauchy-Green strain tensor of the flow map.The corresponding eigenvector is called the finite-time Lyapunov vector (FTLV).Let X(t) = X(t; X 0 , t 0 ) be the position of a Lagrangian particle at time t, which started at X 0 at t = t 0 and was advected by the time-dependent fluid flow U(X, t), t ∈ [t 0 , t 0 + T ].An infinitesimal perturbation δX(t) started at t = t 0 from δ 0 = δX(t 0 ) around X 0 then satisfies, for all t ∈ [t 0 , t 0 + T ], Let λ max be the largest eigenvalue of the Cauchy-Green strain tensor where φ t t 0 : X 0 −→ X(t; X 0 , t 0 ) represents the flow map of the system (it links the location X 0 of a Lagrangian particle at t = t 0 to its position X(t; X 0 , t 0 ) at time t = t 0 ).The forward FTLE at the point X 0 ∈ and for an advection time T starting at t = t 0 is defined as FTLV is the eigenvector associated with λ max .The FTLE thus represents the growth factor of the norm of the perturbation δX 0 started around X 0 and advected by the flow after the finite advection time   (BFTLE and BFTLV) are similarly defined, with the time direction being inverted, in Eq. ( 29).BFTLE (BFTLV) is a scalar (vector) that is computed at a given location X 0 .Seeding a domain with particles initially located on a grid leads to the computation of discretized scalar (BFTLE) and vector (BFTLV) fields.Ridges of the BFTLE field approximate attracting LCSs (Haller, 2011).An example of a BFTLE and corresponding BFTLV orientation maps, computed from a mesoscale (1/4 • ) time-dependent surface velocity field coming from a simulation of the North Atlantic ocean, is given in Fig. 9.The BFTLE field shows contours that correspond reasonably well to the main structures such as filaments, fronts and spirals that appear in the SST field of the same simulation (see Fig. 10, left panel).Note that this field can be distinguished by spatial observations.Also the BFTLVs align with the gradients of this tracer field: Figs. 9 and 10 (right panels) show that BFTLVs and SST gradients have similar orientations.These similarities illustrate the strong link between the tracer field patterns and the underlying flow dynamics.BFTLE and BFTLV properties may thus be exploited to identify the appropriate structure space to be used in a direct image assimilation framework (Titaud et al., 2011).
First, thanks to its almost-Lagrangian nature (Lekien et al., 2005) the BFTLE field can be considered as an image of the tracer field.Then H X→F : U −→ BFTLE(U) defines a model-to-image operator which is composed of an imageto-structure operator H F →S : The image-to-structure operator can be a contour extraction function: as an example, Fig. 12 shows the binarisation of the gradient of the FTLE and SST fields presented in Figs. 9 (left panel) and 10 (left panel) respectively.In that case, the structure space is the set of binarised images.Note that BFTLE produces images with stronger discontinuities than operators based on passive tracer advection: in this latter case, the numerical diffusion softens the discontinuities which makes the comparison with high-resolution satellite images less accurate.
Secondly, the alignment of the BFTLV with the tracer gradients (Lapeyre, 2002;d'Ovidio et al., 2009) allows the construction of a strict model-to-structure operator: structures are identified as the orientation of the gradient of the image and the observation operator is simply defined as H X→S (U) = BFTLV(U). (34) Several studies showed that model-to-structure operators based on BFTLE-V are sensitive to perturbations of the velocity field in a direct image assimilation framework: Fig. 11 (left panel) (resp.right panel) shows a set of misfits in the image ridges space (resp. in the image gradient orientation space) between BFTLE (resp.BFTLV) and SST fields with respect to the amplitude of random perturbations: each misfit admits a unique minimum close to the non-perturbed state.Moreover BFTLV shows a more robust behavior than BF-TLE: misfits are smoother and minima are identical.These studies clearly illustrate the feasibility of the use of such operators in direct image assimilation to control surface velocity fields.For more details about the theoretical framework and experimental setup for the use of BFTLE-V as model-tostructure operator see Gaultier et al. (2013) and Titaud et al. (2011).other fields like agronomy, economy or medicine.Data assimilation is a universal problem if we want to understand and predict the evolution of a system governed by a corpus of deterministic or random equations.This is especially true if, in reality, any realization of the system is unique.More and more, information is available as images or image sequences of the observed system.Their dynamics often permit a better understanding of the system.However, the information contained in images is still mainly used in a qualitative way by experts of the application domain.
In this paper, we described two frameworks whereby data assimilation schemes can deal with image information.First, images and sequence of images may be post-processed in order to extract some indirect (pseudo) observations that are related to the state variables of the model.The most common example is the motion vector field which can be inferred using motion estimation techniques.The result of post-processing is then used as a conventional observation in the data assimilation scheme.This approach has several limitations which should be overcome by the Direct Image Assimilation approach.In this framework, we consider the image or the image sequence as regular observations which must be linked to the control variable using an appropriate observation operator.For dynamical systems, the pertinent information that should be observed is brought by the structures of the image (e.g., the discontinuities).The observa-Nonlin.Processes Geophys., 22, 15-32, 2015 www.nonlin-processes-geophys.net/22/15/2015/ tion operator must then map the control space into the image structure space.We show some examples of direct assimilation techniques.The corresponding results are very encouraging.However, we are still far from an operational use of the assimilation of images.We need to keep in mind that almost two decades were necessary to make variational data assimilation operational in the primary meteorological centers worldwide.Many questions and difficulties remain both from the theoretical and practical points of view: 1. What are the most adapted structure spaces defining images?From the computational point of view, images have to live in a reduced space with respect to the trivial definition as an ensemble of pixels.
2. What topology should be used in the space of images?
In this paper we have used L 2 type metrics which tend to regularize the estimated control variable.We have to keep in mind that the information in images is borne by their singularities, so that other metrics, such as L 1 , have to be considered.
3. How to use images to guide nesting of models?
Outside of geophysics, there are many fields of application: aeronautics, especially for non-stationary flows, medicine and other fields for which images are an important source of information.

Figure 3 .
Figure 3. Evolution of a storm on western Europe: 28 April 2008 (left panel) and 29 April 2008 (right panel).

Figure 4 .
Figure 4. Performance of the forecast: anomaly correlation at 500 hPa height forecast (courtesy of ECMWF).

Figure 5 .
Figure 5. Schematic representation of the use of images in data assimilation: assimilation of pseudo-observations (left panel); direct assimilation of images (right panel).

Figure 7 .
Figure 7. Schematic view of a single-scale approximation of a circle with multiscale decomposition wavelet (left panel) and curvelet (right panel).

Figure 8 .
Figure 8. Analyzed initial velocity field computed by direct image sequence assimilation with different image observation operators: identity operator (top left panel); curvelet decomposition and hard thresholding (top right panel); curvelet decomposition and scale-by-scale thresholding (bottom left panel); curvelet decomposition and hard thresholding zeroing coarsest scale (bottom right panel).

Figure 9 .Figure 10 .
Figure 9. Backward FTLE (day −1 ) (left panel) and corresponding backward FTLV orientations (angular degree) (right panel) computed from the surface velocity of a simulation of the North Atlantic Ocean.

Figure 11 .
Figure11.Sensitivity of the misfit between BFTLEs (BFTLVs) and SST fields with respect to the amplitude of nine random perturbations applied to a reference velocity field.Left panel: misfit between BFTLE and SST fields computed in the space of binary images (after the application of the image-to-structure operator).Right panel: angular misfit between BFTLV and SST gradients.

Figure 12 .
Figure 12.Structures extraction: binarization of the FTLE (left panel) and SST (right panel) gradient fields of Figs. 9 (left panel) and 10 (right panel) using a basic threshold technique.