Data assimilation (DA) aims at optimally merging observational data and model outputs to create a coherent statistical and dynamical picture of the system under investigation. Indeed, DA seeks to minimize the effect of observational and model error and to distil the correct ingredients of the system's dynamics. DA is of critical importance for the analysis of systems featuring sensitive dependence on the initial conditions, as chaos wins over any finitely accurate knowledge of the state of the system, even in the absence of model error. Clearly, the skill of DA is guided by the properties of the dynamical system under investigation, as optimally merging observational data and model outputs is harder when strong instabilities are present. In this paper we reverse the usual angle on the problem and show that it is indeed possible to use the skill of DA to infer some basic properties of the tangent space of the system, which may be hard to compute in very high-dimensional systems. Here, we focus our attention on the first Lyapunov exponent and the Kolmogorov–Sinai entropy and perform numerical experiments on the Vissio–Lucarini 2020 model, a recently proposed generalization of the Lorenz 1996 model that is able to describe in a simple yet meaningful way the interplay between dynamical and thermodynamical variables.

We split the Introduction into three parts. The first two are proper introductory discussions providing the context. In part three, we provide the motivations and describe the goals of the present work.

The dynamics of several natural systems, including the atmosphere and the
ocean, are chaotic, which, roughly speaking, means that they are sensitive to
their initial states. Hence, even in the presence of a perfect model, small
errors in the initial conditions grow in size with time, until the forecast
becomes de facto useless.

In the words of Ed Lorenz, “Chaos:
When the present determines the future, but the approximate present does not
approximately determine the future”; see

It is possible to associate each LE with a physical mode.
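For readers who wish to compute Lyapunov exponents (LEs) on a small model, the standard QR-based (Benettin-type) algorithm can be sketched as follows. This is a minimal illustration on the Lorenz 1996 (L96) model, not the code used in this work; the dimension n = 20, forcing F = 8, time step and run lengths are illustrative assumptions. Tangent vectors are propagated with the tangent linear model (using the same RK4 stages as the state), re-orthonormalized by a QR decomposition at every step, and the logarithms of the diagonal of R are time-averaged to give the LEs. Under the assumptions of Pesin's identity, the sum of the positive LEs estimates the Kolmogorov–Sinai entropy. The columns of V are the Gram–Schmidt (backward) Lyapunov vectors; covariant Lyapunov vectors require an additional backward pass that is not shown here.

```python
import numpy as np


def l96(x, F):
    """Lorenz 1996 tendency: (x[i+1] - x[i-2]) * x[i-1] - x[i] + F, periodic."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F


def l96_tangent(x, V):
    """Jacobian of l96 at x applied to every column of V (tangent linear model)."""
    return ((np.roll(V, -1, axis=0) - np.roll(V, 2, axis=0)) * np.roll(x, 1)[:, None]
            + (np.roll(x, -1) - np.roll(x, 2))[:, None] * np.roll(V, 1, axis=0)
            - V)


def rk4_step(x, V, dt, F):
    """One RK4 step of the state and, with the same stages, of the tangent vectors."""
    k1x, k1v = l96(x, F), l96_tangent(x, V)
    x2, v2 = x + 0.5 * dt * k1x, V + 0.5 * dt * k1v
    k2x, k2v = l96(x2, F), l96_tangent(x2, v2)
    x3, v3 = x + 0.5 * dt * k2x, V + 0.5 * dt * k2v
    k3x, k3v = l96(x3, F), l96_tangent(x3, v3)
    x4, v4 = x + dt * k3x, V + dt * k3v
    k4x, k4v = l96(x4, F), l96_tangent(x4, v4)
    x_new = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    V_new = V + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, V_new


def lyapunov_spectrum(n=20, F=8.0, dt=0.01, n_spinup=2000, n_steps=5000, seed=0):
    """Benettin-type QR estimate of the full LE spectrum (sorted, largest first)."""
    rng = np.random.default_rng(seed)
    x = F + 0.01 * rng.standard_normal(n)
    V = np.eye(n)
    for _ in range(n_spinup):          # reach the attractor and align tangent vectors
        x, V = rk4_step(x, V, dt, F)
        V, _ = np.linalg.qr(V)
    log_growth = np.zeros(n)
    for _ in range(n_steps):
        x, V = rk4_step(x, V, dt, F)
        V, R = np.linalg.qr(V)         # re-orthonormalize; |diag(R)| = local stretching
        log_growth += np.log(np.abs(np.diag(R)))
    return np.sort(log_growth / (n_steps * dt))[::-1]


les = lyapunov_spectrum()
ks_entropy = les[les > 0].sum()        # Pesin-type estimate: sum of positive LEs
print(les[0], ks_entropy, les.sum())
```

For L96 the phase-space contraction rate is exactly n, so the sum of all exponents should be close to -n; this is a cheap sanity check of the tangent integration.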

The properties of the dynamical models have large implications for data
assimilation

Numerical and analytic evidence has emerged recently showing that, under certain observational conditions (data types, spatio-temporal distribution, and accuracy), the performance of DA with chaotic dynamics relates directly to the instability properties of the dynamical model into which the data are assimilated. One can thus, in principle, use knowledge of the dynamical features to inform not only the design of the DA scheme best suited to the specific application (e.g. how many model realizations to use in Monte Carlo-based DA methods, or the length of the assimilation window in variational DA) but also the best possible observational deployment.

A stream of research has shed light on the mechanisms driving the response of
the ensemble-based DA

The picture above changes slightly in the presence of a degenerate spectrum of
LEs, which often arises in multiscale systems owing to the coupling between
subsystems with different characteristic dynamical timescales

In the stochastic scenario, noise is usually injected irrespective of the
flow-dependent modes of instability. Consequently, with a non-zero
probability, error is also injected onto stable directions that would
otherwise not have been influential in the long term. The trade-off between the
frequency and amplitude of the noise injection on the one hand, and the
dissipation rate of the stable modes on the other, determines the amplitude of
the long-term error along stable modes.
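The trade-off just described can be illustrated with a toy calculation. Assume, for illustration only, a single stable mode whose error decays as exp(-gamma t) and which receives independent Gaussian noise of variance q every tau time units. The long-term error variance on that mode is then q / (1 - exp(-2 gamma tau)): it grows when the injection is more frequent (smaller tau) or the dissipation weaker (smaller gamma), and behaves like q / (2 gamma tau) when gamma tau is small. The sketch below compares this formula with a direct simulation; gamma, tau and q are hypothetical values, not those of the cited studies.

```python
import numpy as np


def stationary_variance(gamma, tau, q):
    """Long-term error variance on a stable mode decaying at rate gamma,
    kicked by independent noise of variance q every tau time units."""
    a2 = np.exp(-2.0 * gamma * tau)      # squared decay factor between two kicks
    return q / (1.0 - a2)


def simulated_variance(gamma, tau, q, n_kicks=100_000, seed=0):
    """Monte Carlo check: iterate e <- a * e + noise and return the sample variance."""
    rng = np.random.default_rng(seed)
    a = np.exp(-gamma * tau)
    noise = np.sqrt(q) * rng.standard_normal(n_kicks)
    e, samples = 0.0, np.empty(n_kicks)
    for k in range(n_kicks):
        e = a * e + noise[k]             # decay between kicks, then new noise
        samples[k] = e
    return samples[n_kicks // 10:].var() # drop the initial transient


# Same noise variance per kick, injected more or less often
for tau in (0.1, 0.5, 2.0):
    print(tau, stationary_variance(1.0, tau, 0.1), simulated_variance(1.0, tau, 0.1))
```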

The knowledge of the LEs and their associated Lyapunov vectors (LVs) can be used
to make key choices in the implementation of ensemble-based DA schemes
aimed at enhancing accuracy at the smallest possible computational
cost. This point of view is at the core of DA algorithms that operate a
reduction in the dimension of the model

While extremely appealing theoretically and practically useful in
low-to-moderate dimensional problems, dynamically informed DA
approaches are difficult to use in high dimensions, where even just computing the
asymptotic spectrum of LEs, let alone the very relevant state-dependent local
LEs (LLEs), is very difficult or simply impossible. A major but not exclusive
issue is that LE estimation algorithms require computation of the tangent space
of the dynamical system, a task usually infeasible for high-dimensional
systems, or impossible when the model equations are not explicitly
accessible. On the other hand, the existence of a relationship between DA
and the unstable–neutral subspace suggests reversing the viewpoint: using DA as a tool for estimating properties of a given system that would otherwise be
very difficult to compute. As a model-agnostic technique, DA, and in particular ensemble-based methods such as the EnKF, can be applied to any
model without the need to compute the tangent space. This makes the EnKF a
potentially powerful instrument for revealing the stability properties of a
dynamical system. This is the goal of this work. Specifically, we shall
investigate whether we can use DA to infer the spectrum of the LEs and the
Kolmogorov–Sinai entropy.
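To make the model-agnostic character of ensemble DA concrete, here is a minimal sketch of a stochastic (perturbed-observation) EnKF analysis step. It assumes a linear observation operator H and Gaussian observation errors with covariance R. Only the forecast ensemble and the observations enter the update, so no tangent linear model is required. This is the basic EnKF, not the EnKF-N used in our experiments, which additionally accounts for the finite ensemble size; the function name and the test values below are hypothetical.

```python
import numpy as np


def enkf_analysis(E, y, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    E: forecast ensemble, shape (n, m); y: observation, shape (p,);
    H: linear observation operator, shape (p, n); R: obs-error covariance (p, p).
    Only ensemble statistics are used: no tangent linear model is needed.
    """
    _, m = E.shape
    X = (E - E.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)  # normalized anomalies
    Y = H @ X                                                   # observed anomalies
    S = Y @ Y.T + R                                             # innovation covariance
    K = np.linalg.solve(S, Y @ X.T).T                           # gain X Y^T S^{-1}
    L = np.linalg.cholesky(R)                                   # R = L L^T
    D = y[:, None] + L @ rng.standard_normal((len(y), m))       # perturbed observations
    return E + K @ (D - H @ E)


rng = np.random.default_rng(1)
Ef = rng.standard_normal((3, 2000))          # prior ensemble drawn from N(0, I)
Ea = enkf_analysis(Ef, np.ones(3), np.eye(3), 0.01 * np.eye(3), rng)
print(Ea.mean(axis=1), Ea.var(axis=1))
```

With a large ensemble and accurate observations, the analysis mean should be close to the Kalman filter value 1/1.01 (about 0.99) and the analysis variance close to 0.01/1.01 (about 0.0099).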

The paper is structured as follows. In Sect.

We are interested in searching for a further relation between the skill of
EnKF-like methods applied to perfect (no model error) chaotic dynamics and the
spectrum of LEs. We shall build our derivations on the results mentioned in
Sect.

At time

In general, we can write

Let us define the

The asymptotic mean squared error of the forecast (MSEF) of the KF solution is
given by the trace of Eq. (

Equation (

Let us substitute the SVD of

By defining the maximum of

Given that

We can thus finally use Eq. (

As alluded to at the beginning of the section, a direct expression (e.g. an
equality in place of a bound) relating the model instabilities and the error
can be obtained under strongly simplifying and somewhat unrealistic assumptions
on the form of the model dynamics and of the data, for example, if the linear
dynamics

In the next sections we will perform numerical experiments under controlled scenarios to investigate the conditions under which the bound holds. In particular, we will study the conditions leading to the smallest possible upper bound, such that the output of a converged DA, i.e. its asymptotic MSEF, can be used to infer the LE spectrum of the model dynamics.
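The kind of direct expression available under strongly simplified assumptions can be checked numerically in its simplest form. Assume, for illustration only, diagonal linear dynamics with per-step growth factors lambda_i = exp(Lambda_i dt), full observation (H = I), observation error covariance R = r I and no model error. Each mode then obeys the scalar Riccati recursion p <- lambda_i^2 p r / (p + r), whose attracting fixed point is p = r (lambda_i^2 - 1) for |lambda_i| > 1 and p = 0 otherwise. The asymptotic MSEF is therefore r times the sum of (lambda_i^2 - 1) over the unstable modes only and, to first order in Lambda_i dt, equals 2 r dt times the sum of the positive exponents. The sketch iterates the full matrix Kalman filter recursion and compares it with this closed form; the growth factors are hypothetical.

```python
import numpy as np


def asymptotic_msef(M, H, R, n_iter=2000):
    """Trace of the forecast covariance after iterating the perfect-model KF recursion."""
    n = M.shape[0]
    Pf = np.eye(n)                                # initial forecast error covariance
    for _ in range(n_iter):
        S = H @ Pf @ H.T + R                      # innovation covariance
        K = np.linalg.solve(S, H @ Pf).T          # Kalman gain Pf H^T S^{-1}
        Pa = (np.eye(n) - K @ H) @ Pf             # analysis error covariance
        Pf = M @ Pa @ M.T                         # forecast step, no model error added
    return np.trace(Pf)


# Hypothetical diagonal dynamics: growth factors exp(Lambda_i * dt), one stable mode
growth = np.array([1.5, 1.2, 0.8])
r = 1.0
numeric = asymptotic_msef(np.diag(growth), np.eye(3), r * np.eye(3))
closed_form = r * np.sum(np.maximum(growth**2 - 1.0, 0.0))  # unstable modes only
print(numeric, closed_form)
```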

Our test bed for numerical experiments is the low-order model recently developed by

In the VL20 model it is possible to introduce a notion of kinetic energy
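The energy bookkeeping behind the physical mechanisms of VL20 can be sketched as follows. The tendencies below are an assumption, to be checked against the original model definition of Vissio and Lucarini (2020): dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - alpha theta_k - gamma X_k + F and dtheta_k/dt = X_{k+1} theta_{k+2} - X_{k-1} theta_{k-2} + alpha X_k - gamma theta_k + G, with periodic indices, kinetic energy K = sum of X_k^2 / 2 and potential energy P = sum of theta_k^2 / 2. Under this assumed form, advection conserves K and P separately, the alpha terms only convert energy between them, and the total energy E = K + P obeys dE/dt = -2 gamma E + F sum X_k + G sum theta_k, which the code verifies numerically. The dimension 36 and the parameter values are arbitrary test choices.

```python
import numpy as np


def vl20_tendency(X, theta, alpha, gamma, F, G):
    """ASSUMED VL20 right-hand side (periodic indices via np.roll)."""
    dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))   # L96-type advection
          - alpha * theta                                     # conversion to theta
          - gamma * X + F)                                    # dissipation, forcing
    dtheta = (np.roll(X, -1) * np.roll(theta, -2)             # advection of theta
              - np.roll(X, 1) * np.roll(theta, 2)
              + alpha * X                                     # conversion from X
              - gamma * theta + G)                            # dissipation, forcing
    return dX, dtheta


def energy_tendency(X, theta, **params):
    """dE/dt with E = K + P, computed directly from the tendencies."""
    dX, dtheta = vl20_tendency(X, theta, **params)
    return np.sum(X * dX) + np.sum(theta * dtheta)


def energy_budget(X, theta, alpha, gamma, F, G):
    """Closed-form budget: only dissipation and forcing change the total energy."""
    E = 0.5 * (np.sum(X**2) + np.sum(theta**2))
    return -2.0 * gamma * E + F * np.sum(X) + G * np.sum(theta)


rng = np.random.default_rng(0)
X, theta = rng.standard_normal(36), rng.standard_normal(36)
params = dict(alpha=0.5, gamma=1.0, F=10.0, G=10.0)
print(energy_tendency(X, theta, **params), energy_budget(X, theta, **params))
```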

Instability features of the VL20 model for the three forcing configurations;

In all the following experiments, we set

Synthetic observations are generated according to Eq. (

In line with previous studies

The performance of DA experiments will be assessed primarily using the root mean square error of the analysis, normalized by the observation variance:
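A minimal sketch of how such a metric can be computed from stored trajectories follows. The exact normalization is the one in the equation defining nRMSEa; here we assume, for illustration, that the spatial RMSE at each analysis time is divided by the observation-error standard deviation (the square root of the observation variance), so that a value of 1 corresponds to an error as large as the observational one. The function names and the burn-in length discarded before time averaging are hypothetical.

```python
import numpy as np


def nrmse_series(xa, xt, obs_var):
    """Per-time RMSE of the analysis xa against the truth xt, both of shape (T, n).

    ASSUMPTION: normalization by sqrt(obs_var), the observation-error std.
    """
    rmse = np.sqrt(np.mean((xa - xt) ** 2, axis=1))
    return rmse / np.sqrt(obs_var)


def time_averaged_nrmse(xa, xt, obs_var, burn_in=0):
    """Average of the series after discarding a hypothetical spin-up of burn_in steps."""
    return nrmse_series(xa, xt, obs_var)[burn_in:].mean()


xt = np.zeros((10, 4))
xa = xt + 0.5
xa[:5] = 5.0                                 # large errors during the spin-up
print(nrmse_series(xa, xt, 0.25), time_averaged_nrmse(xa, xt, 0.25, burn_in=5))
```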

Our analysis focuses on how the filter accuracy relates to the observational
design and to the model instabilities. By exploiting the novel
dynamical–thermodynamical feature of VL20 over its L96 precursor, we will also
study the EnKF-N under observational scenarios that alternatively measure the
dynamical variable,
Figure

Time series of nRMSEa over the first

The first connection between the filter performance and the model
instabilities is drawn from Fig.

The time-averaged nRMSEa for all experiment configurations. The vertical dashed lines indicate the dimension of the unstable–neutral subspace,

As mentioned above, the VL20 model represents four main physical mechanisms: (i) conversion between kinetic and potential energy, (ii) energy injection from external
forcing, (iii) advection, and (iv) dissipation. Although all these
processes participate in the evolution of the model, their non-linear interplay
cannot be straightforwardly disentangled. Nevertheless, we shall try to refer
to them when interpreting the outcome of the DA experiments. In particular, in
each experiment we will attempt to identify which of the four mechanisms
prevails. We perform three experiments, where we observe the full
system state (i.e.

The nRMSEa with varying energy transfer coefficient

Overall, and as expected, the analysis error is smaller in the observed
variables (cf. the left and middle columns and corresponding colour lines) and
attains the smallest level when

Finally, the effects of energy transfer and advection can be revealed by
looking at the partially observed experiments (left and middle columns). Both
mechanisms involve the momentum, making it more effective to observe
Further insight into the role of the driving (unstable) variable and into the interplay between the prevailing physical mechanisms and the analysis error is
given by looking at the CLVs

Normalized time-averaged amplitude of CLV components (the absolute value) for

As discussed above, changes in the value of

The behavior changes substantially when the energy exchange is the dominant physical mechanism (

The results in Fig.

Along with

The nRMSEa with varying dissipation coefficients

Normalized time-averaged amplitude of CLV components for

With the leading CLVs strongly affected by the external forcing, the amplitude
of the CLVs along the system's components is similar to that found for a low
energy exchange rate in Fig.

The results of Sect.

The derivation in Sect.

A first insight into the existence of a direct relation between the model instabilities and the skill of the EnKF-N is already provided in Figs.

The nature of this relation is further studied in
Fig.

Scatter plots of

The scatter plots also demonstrate the validity of the upper bound (red
markers) of Eq. (

The linear relation does not hold for numerical experiments when

The error bounds in Sect.

The effect of the first is studied in Fig.

Scatter of

Partially observing the system causes a weakly quadratic relationship to
emerge between the analysis error and either

We study the effect of changing the amplitude of the observational error in Fig.

Scatter plots of

Scatter plots of

Finally, the impact of varying the observation frequency is explored in
Fig.

The larger sensitivity to the observation frequency than to observation noise
(cf. Figs.

It is sometimes of great importance to be able to obtain information on the
instability of a system of interest by analysing data of suitably
defined observables. This is especially the case when one does not have direct
access to the evolution equations of the system or when the analysis of its
tangent space is too computationally burdensome. As an example, quantitative
information on the degree of instability of a chaotic system can be extracted
using extreme value theory by studying the statistics of close dynamical
recurrences as well as of extremes of so-called physical observables

In this study, we have addressed this problem from the angle of DA. The
relation between DA and the instability of the dynamical system to which it is
applied has long been studied

The existence of a relation between

We demonstrate that the skill of the EnKF-N is directly linked to both

The error bound and the linear/quasi-linear relation between the error and

Our numerical experiments indicate a second way to estimate

The linear relation between error and

We are currently considering how these results will change when performing DA
for state and parameter estimation. In this context, a relevant recent study
has shown how the minimum number of ensemble members,

The Python scripts for the data assimilation experiments and the plotting are available at

All data used in this paper are generated by Python scripts provided in the code availability section.

YC designed and conducted the experiments and prepared the manuscript. AC and VL both provided the original idea and wrote the manuscript. All authors contributed to the development of the work.

At least one of the (co-)authors is a member of the editorial board of

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors are thankful to Patrick Raanes (NORCE, NO) for his support with the use of the data assimilation Python platform DAPPER. Yumeng Chen and Alberto Carrassi are thankful for the funding by the UK Natural Environment Research Council. Valerio Lucarini is thankful for the support from the EPSRC and the EU Horizon 2020. We thank two anonymous reviewers for their valuable comments and suggestions for our paper.

This research has been supported by the National Centre for Earth Observation (grant no. NCEO02004), the Horizon 2020 (TiPES (grant no. 820970)), and the Engineering and Physical Sciences Research Council (grant no. EP/T018178/1).

This paper was edited by Takemasa Miyoshi and reviewed by two anonymous referees.