Existing methods to identify linear response functions from data require tailored perturbation experiments, e.g., impulse or step experiments, and if the system is noisy, these experiments need to be repeated several times to obtain good statistics. In contrast, for the method developed here, data from only a

To gain understanding of a physical system, it is very helpful to know how it responds to perturbations. Considering a small time-dependent perturbation

Linear response functions have been successfully applied within different contexts in many fields of science and technology. In physics, for example,
material constants like the magnetic susceptibility or the dielectric function must be understood as linear response functions that can be obtained by
Kubo's theory of linear response

From a theoretical point of view, the existence of a linear response is by no means obvious: structurally stable dynamical systems are the exception

In practical applications where the response function must be recovered from data, its identification may be a challenging task. The reason is that
the identification problem is generally ill-posed, so that classical numerical methods yield a recovery severely deteriorated by noise (see below). In addition, existing methods to identify these functions from data require special perturbation experiments. In the present study, we develop a method to identify linear response functions, taking data from

The generality of our method allows for the derivation of response functions in cases where this was hardly possible before. Examples are problems where performing perturbation experiments is computationally expensive, so that one must use data that were not designed for the purpose of deriving these functions. In the geosciences, this may be the case when one is interested in using response functions to characterize the dynamics of Earth system models – extremely complex systems employed to simulate climate and its coupling to the carbon cycle. In principle, with our method one can derive these functions, taking
simulation data from Earth system model intercomparison exercises such as

In the field of climate science, the typical method to identify linear response functions is by means of the impulse response function, which is the
response to a Dirac delta-type perturbation

Other studies have proposed to identify linear response functions by making use of other types of perturbations.

As noted by

Deriving

To remedy these noise problems, a method intended to “damp” the noise in the response is usually employed. In

Instead of trying to improve the signal-to-noise ratio of the data by improved experiment design, here we are interested in deriving

As a preparation for introducing our method in Sect.

The linearity assumption is deliberate: in the present approach to derive the linear response function (see next section), hereafter called the

In addition, although we derived Eq. (

In this section we derive the RFI method. As mentioned above, the aim of this method is to obtain the linear response function using data from a
single realization of a given perturbation experiment. For this purpose, an essential step is our novel estimation of the noise term

Starting from the ansatz Eq. (

This section is organized as follows. In the first subsection, we introduce the assumption for the functional form of the linear response function. In
Sect.

In general, the identification of linear response functions from data may be performed either pointwise

Assuming this ansatz, the question of the functional form of

Therefore, to avoid the complication of determining

Accordingly, we assume that the response is dominated by relaxing exponentials, meaning that potential contributions from oscillatory modes are not
distinguishable from noise. By this approach the timescale

This approach has an additional advantage. By prescribing the distribution of timescales, one need not solve a

In view of applications to geophysical systems like the climate or the carbon cycle (Part 2 of this study) that are known to cover a wide range of
timescales

Hereafter,

In order to apply the basic Eq. (

Matrix

Unfortunately, it turns out that solving Eq. (

To treat the ill-posedness of Eq. (

To deal with the ill-posedness, it is useful to perform a singular value decomposition (SVD) of the matrix

In practice, when an SVD is applied to a discrete version of a Fredholm equation of the first kind, the components of the singular vectors

It is well known that when applying the solution (Eq.

Regularization remedies this problem by suppressing the problematic high-frequency components. This approach assumes that the main information on the
solution is contained in the low-frequency components, so that the high-frequency contributions to the sum (Eq.

To perform such filtering, we employ the Tikhonov–Phillips regularization method
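The filtering effect of Tikhonov-type regularization can be illustrated with a minimal sketch in generic notation. It is not the implementation used in this study: the matrix `A`, the data `y`, the regularization parameter `lam`, and the Gaussian test kernel are hypothetical choices, and the standard form with penalty `lam**2 * ||x||**2` is assumed. In this form, each SVD component is multiplied by a filter factor that is close to 1 for singular values much larger than `lam` and close to 0 for singular values much smaller than `lam`.

```python
import numpy as np

def tikhonov_svd(A, y, lam):
    """Standard-form Tikhonov solution of A x = y, computed from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Filter factors: ~1 where s_i >> lam (component kept),
    # ~0 where s_i << lam (component suppressed).
    filt = s**2 / (s**2 + lam**2)
    # filt / s written as s / (s^2 + lam^2) to avoid dividing by tiny s_i.
    coeffs = s / (s**2 + lam**2) * (U.T @ y)
    return Vt.T @ coeffs, filt

# Hypothetical ill-conditioned test problem (assumption, not from the paper):
# a smoothing Gaussian kernel acting on a smooth "true" solution.
rng = np.random.default_rng(0)
n = 50
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01) / n
x_true = np.sin(np.pi * t)
y = A @ x_true + 1e-4 * rng.standard_normal(n)

x_reg, filt = tikhonov_svd(A, y, lam=1e-3)
x_naive = np.linalg.lstsq(A, y, rcond=None)[0]  # no regularization

err_reg = np.linalg.norm(x_reg - x_true)
err_naive = np.linalg.norm(x_naive - x_true)
print(f"error with regularization:    {err_reg:.3e}")
print(f"error without regularization: {err_naive:.3e}")
```

In this sketch the unregularized least-squares solution is dominated by amplified noise in the components with very small singular values, whereas the filtered solution stays close to `x_true`.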

The standard Tikhonov–Phillips regularization yields the regularized solution in the simple form

Therefore, now the problem boils down to determining

By construction it is clear that

To introduce our approach, in the following we assume that data from an unforced experiment (control experiment) are available – as is typically the
case in applications to Earth system models (see Part 2) – that allow for an independent estimate of the noise level

A naive way to invoke these data to determine

Formally in Eq. (

Accordingly, the first term in the sum gives the “true” solution

Therefore,

This equation determines the high-frequency components of the noise

For this purpose, we take advantage of the data from the control experiment. The control experiment is an experiment performed under the same conditions
as the perturbed experiment, with the only difference being that the forcing

After these considerations,

In this way, the magnitude of the high-frequency components of

Compared to taking for

In the application to the land carbon cycle in Part 2 of this study, we show that certain response functions

Final RFI algorithm (see text for details).

The idea is to adjust the low-frequency components of noise independently of the high-frequency components iteratively until the solution obeys the
monotonicity constraint. To understand how to do so, several things must be explained.

A sufficient condition for

From Eq. (

As seen from Eqs. (

To obtain larger values of

Following the reasoning of the previous section, in order to obtain a larger value for

Summarizing these considerations, we have to increase the level of low-frequency contributions to

This leads to the overall algorithm listed in Fig.

For the RFI algorithm to be applicable, two conditions must be met: (1) a linear response exists for sufficiently weak perturbations, and (2) in addition to the response experiment, a control experiment is also available. The assumptions needed for the successful application of the algorithm are
summarized in Table

Summary of assumptions underlying the RFI algorithm.

In application to real data, the presence of noise and nonlinearities may complicate the recovery of linear response functions. Therefore, by using artificial data generated from a toy model, in the present section we analyze the robustness of the RFI method in the presence of such complications. Robustness for real data is studied in Part 2.

As a toy model we take

Here the matrix

To complete the description of the toy model, one has to specify its parameters. For the dimension

Experiments considered in this study. Forcings are shown in Fig.

Forcings for the experiments considered in this study. To standardize the type of experiments considered here and in Part 2, we select forcing functions that mimic those employed in the climate change simulation experiments whose data the RFI method is applied to in Part 2. Note that in principle any type of forcing could be employed.

In our experiments we explore how

To apply the RFI method, we choose

To gain trust in the numerics of our implementation of the RFI method, we present in this section a technical test considering conditions under which it is known that the linear response function should be almost perfectly recoverable. Such ideal conditions are characterized by perfect linearity and
absence of noise. Hence we use the presented toy model (which is linear anyway) in the absence of noise (

Figure

Applying Eq. (

Demonstration of robust recovery for noise-free data from the toy model:

The presence of noise may severely hinder the detailed recovery of

To study how the quality of the recovery depends on the noise level, we introduce the signal-to-noise ratio (SNR) of the response data from a perturbation experiment as

Mean prediction error (Eq.

To demonstrate the dependence of the mean prediction error (Eq.

Demonstration of the operation of the RFI algorithm in the presence of noise using toy model data from a 1 % experiment and a control experiment. To demonstrate the relevance of the noise-level adjustment (step 3 from Fig.

In Fig.

How the estimation of the noise in the data and the resulting regularization affects the projection coefficients of the spectrum

It is important to note that in the situation of Fig.

Demonstration of the additional noise-level adjustment in the presence of a monotonicity constraint using toy model data from a 1 % experiment and a control experiment:

Finally in this section, we demonstrate that by accounting for monotonicity of the linear response function, one may obtain a better estimate of the low-frequency components of the noise whereby the recovery of the response function is improved. In
Fig.

This further adjustment is the purpose of step 6 of the RFI algorithm (see Fig.

The second difficulty in recovering the linear response function

To understand how contributions from nonlinearities affect the recovery of the response function, we write the nonlinear terms in Eq. (

Accordingly, the nonlinear contributions can be understood as an additional noise in the spectrum

However, for the RFI algorithm to give good results, a second condition is that the contributions from

All this is demonstrated in the following by toy model experiments. For this purpose, we artificially consider the response of the toy model not in

Mean prediction error (Eq.

In Fig.

Demonstration of how nonlinearities affect the recovery of the response function:

More insight into how nonlinearities affect the recovery is obtained from the more detailed SVD analysis shown in
Fig.

In the second row of Fig.

In the third row, we demonstrate for this type of nonlinearity that by accounting for monotonicity one can remove from the recovered solution all
components dominated by noise. For this purpose, we set the nonlinearity parameter to the same value as for the second row (

As a last test of the quality of the results given by the RFI method in application to the toy model, in this section we compare our method against
two existing methods from the literature to identify response functions in the time domain. The comparison is performed for the particular case where
the response function is known to be monotonic and also for the more general case where it is not. As a side issue, this section also gives some insight into the relation between the quality of the recovery of

In climate science, the most commonly used method is to obtain

The second method consists of deriving the linear response function from a step response, i.e., the response to a Heaviside-type perturbation

These two methods therefore share two limitations: first, they require a special perturbation experiment; second, because of noise in the data they might yield a response function with large errors. In principle, the second limitation may be overcome by using instead of a single response the ensemble average over multiple responses. However, this comes at the expense of the numerical burden of performing multiple experiments, which is especially large when dealing with complex models such as state-of-the-art Earth system models.
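The trade-off described above can be made concrete with a small sketch. It assumes additive, independent Gaussian noise and a hypothetical noise-free response (neither taken from the toy model of this study) and compares the error of a single noisy response with that of an ensemble average. The ensemble size of 200 matches the number of simulations used in this section; `signal`, `sigma`, and the relaxation time are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

t = np.arange(200)
signal = 1.0 - np.exp(-t / 30.0)  # hypothetical noise-free response
sigma = 0.5                       # hypothetical noise standard deviation
n_members = 200                   # ensemble size

# Each row is one realization of the same experiment with independent noise.
ensemble = signal + sigma * rng.standard_normal((n_members, t.size))

single = ensemble[0]
mean_response = ensemble.mean(axis=0)

rms_single = np.sqrt(np.mean((single - signal) ** 2))
rms_mean = np.sqrt(np.mean((mean_response - signal) ** 2))

print(f"rms error, single response:  {rms_single:.3f}")
print(f"rms error, ensemble average: {rms_mean:.3f}")
print(f"ratio: {rms_mean / rms_single:.3f}, "
      f"1/sqrt(N) = {1.0 / np.sqrt(n_members):.3f}")
```

For independent noise the error of the ensemble average decreases only as the inverse square root of the number of members, which is why a large reduction of the noise requires many repeated experiments.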

The main advantages of the RFI method lie precisely in overcoming these two limitations: it recovers the response function from any type of perturbation experiment and automatically filters out the noise by regularization.
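How regularization can filter noise once a noise level is available may be sketched with the discrepancy principle in generic notation. This is an illustration under stated assumptions, not the RFI algorithm itself: the test problem, the function names, and the bisection bounds are hypothetical, and the noise norm `delta` is simply taken from the synthetic noise, whereas in practice it must be estimated. The regularization parameter is chosen such that the residual norm of the regularized solution matches `delta`.

```python
import numpy as np

def tikhonov(U, s, Vt, y, lam):
    """Standard-form Tikhonov solution from a precomputed SVD."""
    return Vt.T @ (s / (s**2 + lam**2) * (U.T @ y))

def discrepancy_lambda(A, y, delta, lam_lo=1e-12, lam_hi=1e2, iters=100):
    """Find lam with ||A x_lam - y|| = delta by bisection in log(lam).

    Relies on the residual norm increasing monotonically with lam.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    for _ in range(iters):
        lam = np.sqrt(lam_lo * lam_hi)  # geometric midpoint
        resid = np.linalg.norm(A @ tikhonov(U, s, Vt, y, lam) - y)
        if resid > delta:
            lam_hi = lam  # too much smoothing
        else:
            lam_lo = lam  # too little smoothing
    lam = np.sqrt(lam_lo * lam_hi)
    return lam, tikhonov(U, s, Vt, y, lam)

# Hypothetical ill-conditioned test problem (assumption, not from the paper).
rng = np.random.default_rng(2)
n = 50
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.01) / n
x_true = np.sin(np.pi * t)
noise = 1e-3 * rng.standard_normal(n)
y = A @ x_true + noise

delta = np.linalg.norm(noise)  # known here only because the data are synthetic
lam_dp, x_dp = discrepancy_lambda(A, y, delta)
x_naive = np.linalg.lstsq(A, y, rcond=None)[0]

print(f"lambda from discrepancy principle: {lam_dp:.3e}")
print(f"residual norm: {np.linalg.norm(A @ x_dp - y):.3e} (delta = {delta:.3e})")
print(f"error regularized: {np.linalg.norm(x_dp - x_true):.3e}")
print(f"error naive:       {np.linalg.norm(x_naive - x_true):.3e}")
```

The sketch makes explicit why the quality of the noise-level estimate matters: `delta` directly determines how strongly the solution is smoothed.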

For the results of this section, we performed ensembles of 200 simulation experiments with the toy model (see
Sect.

We computed the response function by the pulse and step method as follows. For a pulse experiment the forcing is

Therefore, for the pulse method we took the response from the pulse experiment and obtained the response function by

The recovery by the step method was calculated by taking the response from the step experiment and applying Eq. (

To obtain comparable results with these two methods, we recovered the response function by the RFI method from the same pulse and step experiments. To compare the quality of the results also for an experiment not specifically tailored for the identification, we additionally include the recovery from the 1 % experiment.
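The pulse and step methods can be illustrated for noise-free data with a minimal sketch. The discretization below is an assumption made for illustration and is not necessarily the one used for the equations of this study: the response is modeled as a discrete convolution with time step `dt`, the pulse is a forcing of amplitude `amp` applied at the first time step only, and the step is a constant forcing of the same amplitude. The function `respond` and the two-exponential `chi_true` are hypothetical.

```python
import numpy as np

def respond(chi, forcing, dt):
    """Discrete linear response R_k = dt * sum_{j<=k} chi_{k-j} * f_j."""
    return dt * np.convolve(chi, forcing)[: len(forcing)]

dt = 1.0
t = np.arange(0.0, 100.0, dt)
# Hypothetical "true" response function: sum of two relaxing exponentials.
chi_true = 0.7 * np.exp(-t / 5.0) + 0.3 * np.exp(-t / 40.0)

amp = 2.0
pulse = np.zeros_like(t)
pulse[0] = amp               # forcing only at the first time step
step = np.full_like(t, amp)  # constant forcing from the first time step on

r_pulse = respond(chi_true, pulse, dt)
r_step = respond(chi_true, step, dt)

# Pulse method: the response is a scaled copy of the response function.
chi_pulse = r_pulse / (amp * dt)
# Step method: the response function is the scaled time derivative of the
# step response, here a backward difference with R = 0 before the step.
chi_step = np.diff(r_step, prepend=0.0) / (amp * dt)

print(np.max(np.abs(chi_pulse - chi_true)))
print(np.max(np.abs(chi_step - chi_true)))
```

Without noise both methods recover `chi_true` up to rounding error in this discrete setting. With noisy data, the division by a small pulse amplitude and the differencing of the step response both pass the noise directly on to the recovered function.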

To obtain a quantitative comparison for the quality of the recovery for each method, we define the recovery error:

Quality of response function recovery by the full RFI method (including step 6 in Fig.

First, we compare the pulse and step methods against the full RFI algorithm, i.e., the RFI algorithm taking monotonicity into account (step 6 in Fig.

In the second row, we compare results by taking only a single response for the recovery. Since the quality of the recovery by the different methods
may vary depending on the particular noise realization, we again performed 200 simulations to obtain better statistics but this time deriving the linear response function for each ensemble member separately. Figure

This difference can be better understood as follows (see

Overall, the analysis of Fig.

Second, by taking only a single response – and not the ensemble average – the full RFI algorithm gives on average smaller recovery and prediction errors than the pulse and step methods when comparing results obtained from the same experiment.

Quality of response function recovery by our RFI method excluding step 6 in Fig.

However, the results above cover only the case where the full RFI algorithm is employed. In the following, we also analyze the case where monotonicity is not taken into account. For this purpose, we repeated in full detail the exercise that led to Fig.

Yet the improvement brought by the additional noise-level adjustment is clear when looking at the recovery error for the 1 % experiment. Compared to Fig.

Nevertheless, we find that, although extreme, such poor recoveries are not frequent. In fact, extreme cases with recovery error

Existing methods to identify linear response functions from data require tailored perturbation experiments. Here, we developed a method to identify
linear response functions from data using only information from an arbitrary perturbation experiment and a control experiment. The RFI method addresses the ill-posedness inherent to the identification problem by applying Tikhonov–Phillips regularization. The regularization parameter is computed by
the discrepancy method, which involves the estimation of the noise level. For this purpose, we take advantage of information given by a spectral
analysis of the perturbation experiment and by the control experiment. Assuming that the Picard condition holds, we estimate from the perturbation
experiment the high-frequency components of the noise. Then, assuming that the spectral distribution of noise is approximately the same for the
perturbed and control experiments (spectral similarity assumption), we estimate from the control experiment the low-frequency components of the
noise. The obtained noise-level estimate can be further adjusted if the linear response function is known to be monotonic. The robustness of the method in the presence of noise and nonlinearity was demonstrated in Sect.

As discussed in Sect.

The main novelty of the method is the estimation of the noise level (steps 1–3 of Fig.

Because our noise-level estimation is not particularly related to the problem of identifying response functions, it can in principle also be applied to solve other types of linear ill-posed problems

A problem of the type

Data from a situation similar to the control experiment, where

The singular values of

Then, as long as both the Picard condition and the spectral similarity assumption hold, the method gives a reasonable noise estimate – since then, by
assumption, the noise estimate is simply a scaling of the noise in the control experiment (see Sect.

While the Picard condition is necessary for a solution to be recoverable from an ill-posed problem, the validity of the spectral similarity assumption
is less clear. An intuitive justification for this assumption is as follows. Since here the interest lies in identifying linear response
functions, the perturbation to the system must be sufficiently weak so that the response can be considered linear. If the noise in the control
experiment depends on the perturbation, a sufficiently weak perturbation will modify its characteristics only slightly. The RFI method accounts
partially for this change by adjusting the overall level by which the noise increases. Nevertheless, it assumes that, since the characteristics of the
noise change only slightly, the spectral components of the noise in the perturbed experiment have the same relative contributions as those in the control experiment. When in addition the response function is known to be monotonic, the estimate of the noise can be
further improved (step 6 of Fig.

Although it is assumed that

In the present paper the robustness of our method has been investigated only for artificial data taken from toy model experiments. In this analysis, we not only knew the “true” response function underlying the data, but also had control over the two complications that may hinder its recovery, namely the level of background noise and nonlinearities. Under these ideal conditions, we could carefully examine the quality of the response functions identified by our RFI method. Nevertheless, such conditions are hardly met in practice. Therefore, the applicability of our method must also be investigated for real problems. Such an investigation is presented in Part 2 of this study.

In this Appendix we show that Eqs. (

A Fredholm equation of the first kind is an equation of the type

Clearly, by setting

That Eq. (

Since Eq. (

Now, entering Eq. (

This Appendix complements Sect.

We start by defining the nondimensional timescale

Due to the wide range of timescales of the systems of interest such as climate and the carbon cycle (Part 2 of this study), calculations are facilitated if the timescales are evenly distributed on a logarithmic scale. To do so, the following change of variables is performed in
Eq. (

Thus, Eq. (

A convenient choice for the reference value is

For convenience of notation we use simply

For the discretization the support of

Taking a constant step

Naming

Plugging Eq. (

Assuming constant steps

In order to simplify the notation, Eq. (

This Appendix is referred to in Sect.

Let

Since

In this Appendix it is shown how the linear response function and the noise terms are computed in Sect.

We first demonstrate how to obtain the linear response function. Plugging Eq. (

Therefore,

Now, by taking

To define the combined noise

Then, the noise term consists of the remaining terms of the nonlinear response

Hence, the combined noise is given by

In this Appendix, it is shown that as long as the extent and resolution of the discrete distribution of timescales approximate the spectrum sufficiently densely, the derived spectrum

Response function

Figures

The scripts employed to produce the results in this paper as well as information on how to obtain the underlying data can be found at

The ideas for this study were jointly developed by all the authors. GLTM conducted the study and wrote the first draft. All the authors contributed to the final manuscript.

The authors declare that they have no conflict of interest.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank Andreas Chlond, two anonymous referees, and Valerio Lucarini for very helpful suggestions on the manuscript.

The article processing charges for this open-access publication were covered by the Max Planck Society.

This paper was edited by Ilya Zaliapin and reviewed by three anonymous referees.