In numerical weather prediction, the problem of estimating initial conditions with a variational approach is usually based on a Bayesian framework combined with the assumption that the probability density functions of both observation and background errors are Gaussian. In practice, Gaussianity of errors is tied to linearity, in the sense that a nonlinear model will generally yield non-Gaussian probability density functions. In this context, standard methods relying on the Gaussian assumption may perform poorly.

This study aims to describe some aspects of the non-Gaussianity of forecast and
analysis errors in a convective-scale model using a Monte Carlo approach
based on an ensemble of data assimilations. For this purpose, a 90-member
ensemble of cycled perturbed assimilations has been run over a heavy
precipitation case of interest. Non-Gaussianity is measured using the

Results confirm that specific humidity is the least Gaussian variable according to that measure and also that non-Gaussianity is generally more pronounced in the boundary layer and in cloudy areas. The dynamical control variables used in our data assimilation, namely vorticity and divergence, also show distinct non-Gaussian behaviour. It is shown that while non-Gaussianity increases with forecast lead time, it is efficiently reduced by the data assimilation step, especially in areas well covered by observations. Our findings may have implications for the choice of the control variables.

In data assimilation, the analysis step may be seen as finding a maximum
likelihood of the probability density functions (PDFs) of the state

The time integration of the model nonlinear dynamics leads inevitably to
non-Gaussian forecast errors

In NWP, the analysis of humidity may be the most problematic with respect to
non-Gaussianity (NG). This is due to the condensation effects near saturation
and the intrinsic positivity of humidity. The choice of the control variable
for humidity is a long-standing debate

The 4D-Var (4-dimensional variational) algorithm commonly used in NWP
(e.g.

The PDF of observation errors is also non-Gaussian in general. In NWP,
quality control is performed to exclude observations that are outliers
with respect to the model, based on statistical knowledge

The main goal of this paper is to use a Monte Carlo approach to document
the spatial variations of the non-Gaussianity of background and analysis
errors for a particular meteorological case, in the context of convective-scale NWP. For this purpose, a large ensemble of perturbed cycled
assimilations has been set up with the AROME-France

Application de la Recherche à l'Opérationnel à Méso-Echelle (Application of Research to Operations at Mesoscale)

The paper is organized as follows: Sect.

In NWP, dimensions of the state and observation vectors, including satellite
and radar, are huge (respectively around

The D'Agostino test

The theoretical skewness and kurtosis are respectively estimated over an
ensemble by the sample third (
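The sample moments can be sketched as follows. This is a minimal illustration in plain Python, assuming the standard biased moment estimators (central moments divided by the ensemble size); the exact normalization used in the study is not recoverable from this text, and the function name `sample_skewness_kurtosis` is hypothetical. For a Gaussian distribution the skewness is 0 and the kurtosis is 3.

```python
import math
from typing import Sequence, Tuple


def sample_skewness_kurtosis(members: Sequence[float]) -> Tuple[float, float]:
    """Return the (biased) sample skewness and kurtosis of an ensemble.

    Skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2, where mk is the
    k-th central sample moment. A Gaussian gives (0, 3).
    """
    n = len(members)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    mean = sum(members) / n
    # Central sample moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in members) / n
    m3 = sum((x - mean) ** 3 for x in members) / n
    m4 = sum((x - mean) ** 4 for x in members) / n
    if m2 == 0.0:
        raise ValueError("ensemble has zero spread")
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

For example, a symmetric two-valued ensemble `[-1.0, 1.0] * 45` has zero skewness and a kurtosis of 1, far below the Gaussian value of 3.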

The efficiency of the

The POD of the

The POD is estimated for three non-Gaussian distributions: uniform,
log-normal, and a Gaussian mixture. The Gaussian mixture is defined through
its PDF as

PODs are estimated over
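A sketch of how the D'Agostino K² statistic and its POD could be estimated by Monte Carlo, in plain Python. It assumes the classical transforms of sample skewness and kurtosis into approximately standard normal deviates (the same formulas as SciPy's `scipy.stats.skewtest` and `scipy.stats.kurtosistest`); under the Gaussian hypothesis K² approximately follows a χ² distribution with 2 degrees of freedom, so Gaussianity is rejected at level α when K² > −2 ln α. The POD is interpreted here as the rejection rate over samples drawn from a non-Gaussian distribution. The Gaussian mixture parameters (equal weights, means ±2, unit standard deviations), the 5 % level, the number of trials and all function names are hypothetical, since the mixture PDF used in the study is not given in this text.

```python
import math
import random
from typing import Callable, Dict, Sequence


def dagostino_k2(x: Sequence[float]) -> float:
    """D'Agostino K^2 = Z_skew^2 + Z_kurt^2 (about chi^2 with 2 dof if Gaussian)."""
    n = len(x)
    if n < 8:
        raise ValueError("the skewness transform needs at least 8 samples")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0.0:
        raise ValueError("constant sample")
    b1 = m3 / m2 ** 1.5  # sample skewness
    b2 = m4 / m2 ** 2    # sample kurtosis (3 for a Gaussian)

    # Skewness -> approximately standard normal deviate.
    y = b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)

    # Kurtosis -> approximately standard normal deviate.
    e_b2 = 3.0 * (n - 1) / (n + 1)
    var_b2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    xk = (b2 - e_b2) / math.sqrt(var_b2)
    sb1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
           * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / sb1 * (2.0 / sb1 + math.sqrt(1.0 + 4.0 / sb1 ** 2))
    denom = 1.0 + xk * math.sqrt(2.0 / (a - 4.0))
    if denom == 0.0:
        return math.inf  # the transform diverges: Gaussianity is clearly rejected
    term2 = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    z_kurt = (1.0 - 2.0 / (9.0 * a) - term2) / math.sqrt(2.0 / (9.0 * a))
    return z_skew ** 2 + z_kurt ** 2


def probability_of_detection(sampler: Callable[[random.Random], float],
                             rng: random.Random, n_members: int = 90,
                             n_trials: int = 200, level: float = 0.05) -> float:
    """Fraction of n_members-samples for which Gaussianity is rejected."""
    threshold = -2.0 * math.log(level)  # chi^2(2) critical value, 5.99 at 5 %
    hits = 0
    for _ in range(n_trials):
        sample = [sampler(rng) for _ in range(n_members)]
        if dagostino_k2(sample) > threshold:
            hits += 1
    return hits / n_trials


SAMPLERS: Dict[str, Callable[[random.Random], float]] = {
    "gaussian": lambda r: r.gauss(0.0, 1.0),  # reference: rate close to level
    "uniform": lambda r: r.uniform(0.0, 1.0),
    "log-normal": lambda r: r.lognormvariate(0.0, 1.0),
    # Hypothetical mixture: equal weights, means -2 and +2, unit std.
    "gaussian mixture": lambda r: r.gauss(-2.0 if r.random() < 0.5 else 2.0, 1.0),
}

if __name__ == "__main__":
    rng = random.Random(0)
    for name, sampler in SAMPLERS.items():
        print(f"{name:17s} POD = {probability_of_detection(sampler, rng):.2f}")
```

The Gaussian entry is included as a control: its rejection rate should stay close to the chosen level, which checks that the χ² threshold is consistent with the ensemble size.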

A review of other well-established tests for Gaussianity is presented in

AROME-France is an operational nonhydrostatic model covering France with a
2.5

Action de Recherche Petite Échelle Grande Échelle (Research Project on Small and Large Scales)

Background and analysis errors are simulated by Monte Carlo sampling,
known in the NWP context as ensemble data assimilation (EDA). A 90-member
EDA is first run for the global model
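The principle of a perturbed-observation EDA can be illustrated with a toy scalar sketch: every member runs the same forecast-analysis cycle, but assimilates observations perturbed with random draws from the observation-error distribution, so that the deviations of the members from their mean mimic background and analysis errors. The nonlinear model `toy_model`, the static gain standing in for a variational analysis with fixed background- and observation-error variances, and all numerical values are hypothetical; this is not the AROME-France or global-model system.

```python
import math
import random
from typing import List, Tuple


def toy_model(x: float) -> float:
    """Hypothetical nonlinear scalar model standing in for the NWP forecast."""
    return x + 0.3 * math.sin(x) + 0.05 * x * x


def perturbed_obs_eda(n_members: int = 90, n_cycles: int = 4,
                      obs_value: float = 1.0, obs_std: float = 0.5,
                      b_var: float = 1.0,
                      seed: int = 0) -> Tuple[List[float], List[float]]:
    """Cycle a scalar ensemble of perturbed assimilations.

    Returns the background and analysis members of the last cycle.
    """
    rng = random.Random(seed)
    # Static gain of a scalar variational analysis with fixed error variances.
    gain = b_var / (b_var + obs_std ** 2)
    members = [rng.gauss(0.0, math.sqrt(b_var)) for _ in range(n_members)]
    background: List[float] = []
    for _ in range(n_cycles):
        # Forecast step: propagate every member with the nonlinear model.
        background = [toy_model(x) for x in members]
        # Analysis step: each member assimilates its own perturbed observation.
        members = [xb + gain * (obs_value + rng.gauss(0.0, obs_std) - xb)
                   for xb in background]
    return background, members


def perturbations(ensemble: List[float]) -> List[float]:
    """Deviations from the ensemble mean, used as a proxy for the errors."""
    mean = sum(ensemble) / len(ensemble)
    return [x - mean for x in ensemble]
```

The background and analysis perturbations returned by this sketch could then be passed to a Gaussianity statistic to compare the two steps of the cycle.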

The case of interest is 4 November 2011 between 00:00 and 06:00 UTC (Coordinated Universal Time).
A strong southerly convergent flow occurs at low levels over southern France
(Fig. 2). Warm and moist air from the Mediterranean Sea is advected over
land, which triggers deep convection. Such high-intensity events, known as
cévenol events, are studied by the HyMeX research program

Vertical profiles of

Background-error standard deviations of

Time evolution of the vertical profiles of

The vertical profiles of quantities related to NG are shown in
Fig.

Vertical profiles of

Vertical profiles of

Comparison of

In the troposphere,

The range, defined as the difference between the 95th and the 5th
percentiles, can be used to roughly describe the horizontal spatial
variability at each vertical level. Vertical profiles of ranges of
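The range diagnostic can be computed per level with the standard library. This is a minimal sketch, assuming linear interpolation between order statistics (the `"inclusive"` method of `statistics.quantiles`); the percentile convention used in the study is not stated here, and the function names and the level-indexed dictionary layout are hypothetical.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def p95_p5_range(values: Sequence[float]) -> float:
    """Difference between the 95th and 5th percentiles of a sample."""
    # With n=20, quantiles() returns the 19 cut points 5 %, 10 %, ..., 95 %.
    cuts = quantiles(values, n=20, method="inclusive")
    return cuts[-1] - cuts[0]


def range_profile(field_by_level: Dict[int, List[float]]) -> Dict[int, float]:
    """Range of a horizontal field (e.g. an NG statistic per grid point) at each level."""
    return {level: p95_p5_range(values) for level, values in field_by_level.items()}
```

For instance, the values 0, 1, ..., 100 have a 5th percentile of 5 and a 95th percentile of 95, hence a range of 90.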

Supporting the conclusion drawn from Fig.

It may be interesting to compare NG with the variance of the ensemble, as

NG of the surface pressure is not shown in this study since, according to our
diagnostics, it is a mostly Gaussian variable (averaged

For each member of the ensemble, 18

To gain insight into the processes that may be involved in NG
development, the diagnostics have been computed separately for cloudy and
clear-sky areas, following an approach similar to that of
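Stratifying a diagnostic into cloudy and clear-sky areas amounts to averaging it under a mask. The sketch below assumes a hypothetical cloud criterion (a condensate content, for example cloud water plus ice in kg/kg, exceeding a hypothetical threshold); the criterion actually used in the study is not recoverable from this text.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def cloudy_clear_means(ng: Sequence[float], condensate: Sequence[float],
                       threshold: float = 1e-6) -> Tuple[float, float]:
    """Average an NG diagnostic separately over cloudy and clear-sky points.

    A point is flagged cloudy when its condensate content (hypothetical
    criterion) exceeds `threshold`; otherwise it counts as clear sky.
    """
    if len(ng) != len(condensate):
        raise ValueError("ng and condensate must have the same length")
    cloudy = [g for g, c in zip(ng, condensate) if c > threshold]
    clear = [g for g, c in zip(ng, condensate) if c <= threshold]
    # NaN signals an empty category rather than raising an error.
    return (fmean(cloudy) if cloudy else math.nan,
            fmean(clear) if clear else math.nan)
```

Applying the same mask at each vertical level and lead time would give separate cloudy and clear-sky profiles.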

During the first 6 h of forecasts, NG quickly increases. For

Different behaviours are found for
diagnostics computed over cloudy and clear-sky areas. For

For the wind components, behaviours close to

Based on comparisons of NG diagnostics between successive background and analysis errors, this section focuses on the evolution of NG through cycled 3D-Var assimilations. Analysis errors are examined for both model and control variables, and the link between assimilated observations and NG reduction is shown.

An overview of the NG evolution during the analysis process is given in
Fig.

Geographical variations of NG are illustrated in Fig.

The previous results document the NG of four prognostic model variables:

Vertical profiles of NG for control variables are presented in
Fig.

Negative values of

Those results agree with one of the conclusions of

While very similar, the horizontal structures of

As for Fig.

To further discuss the Gaussianity of the control variables, this
section compares the

According to Fig.

It is suggested to use the

According to our diagnostic, among model variables,

Although this work attributes non-Gaussian behaviours to well-known
nonlinear processes, such as microphysical or boundary-layer processes,
it does not precisely address the causes of NG. Nevertheless, it highlights
two important questions for variational data assimilation. First, regarding
the control variables of the assimilation, according to our diagnostic, the most
non-Gaussian variables are vorticity and divergence. Yet, most of the
effort has been put into the “Gaussianization” of specific humidity

This study uses a convective-scale ensemble that does not represent model
error in either the analysis or the forecast step. The conclusions might
differ if stochastic noise drawn explicitly from a Gaussian distribution were
added to the model states during the forecasts, as
stated by

This work has been supported by the French Agence Nationale de la Recherche (ANR) via the IODA-MED Grant ANR-11-BS56-0005 and by the MISTRALS/HyMeX program. The authors thank Benjamin Ménétrier, Gérald Desroziers, and Loïk Berre for their scientific advice and their careful reading of the manuscript, which greatly helped to improve it.

Edited by: O. Talagrand
Reviewed by: C. Pires and two anonymous referees