Identiﬁcation of linear response functions from arbitrary perturbation experiments in the presence of noise Part I. Method development and toy model demonstration

. Existent methods to identify linear response functions from data require tailored perturbation experiments, e.g. impulse or step experiments. And if the system is noisy, these experiments need to be repeated several times to obtain a good statistics. In contrast, for the method developed here, data from only a single perturbation experiment at arbitrary perturbation is sufﬁcient if in addition data from an unperturbed (control) experiment is available. To identify the linear response function for this ill-posed problem we invoke regularization theory. The main novelty of our method lies in the determination of the 5 level of background noise needed for a proper estimation of the regularization parameter: This is achieved by comparing the frequency spectrum of the perturbation experiment with that of the additional control experiment. The resulting noise level estimate can be further improved for linear response functions known to be monotonic. The robustness of our method and its advantages are investigated by means of a toy model. We discuss in detail the dependence of the identiﬁed response function on the quality of the data (signal-to-noise ratio) and on possible nonlinear contributions to the response. The method development 10 presented here prepares in particular for the identiﬁcation of carbon-cycle response functions in Part II of this study. But the core of our method, namely our new approach to obtain the noise level for a proper estimation of the regularization parameter, may ﬁnd applications in solving also other types of linear ill-posed problems.


Introduction
To gain understanding of a physical system it is very helpful to know how it responds to perturbations. Considering a small 15 time-dependent perturbation f : R → R, the resulting time-dependent response R : R → R can from a very general point of view be written as where the linear response function χ : R → R is a characteristic of the considered system. In fact, under a number of assumptions -among which smoothness and causality are the most important -, Eq. (1) is the first term of a functional expansion of the 20 response R into the perturbation f around the unperturbed state f (·) = 0, known as Volterra series (Volterra, 1959;Schetzen, 2010). In this framing, the key to gain insight into the system is the linear response function χ : By knowing this function one has at hand not only a powerful tool to predict the response for sufficiently small but otherwise arbitrary perturbations, but also a means to study the internal dynamic modes of the unperturbed system by analyzing the temporal structure of the response function. 25 Linear response functions have been successfully applied within different contexts in many fields of science and technology.
Yet another perspective is that from engineering sciences, in which the impulse response -that to a large extent corresponds to the linear response function -and the closely related transfer function (or system function) characterize linear time-invariant (LTI) systems, widely applied in fields such as signal processing and control theory (Kuo, 1966;Rugh, 1981;Beerends et al., 2003;Boulet and Chartrand, 2006). Regardless of which viewpoint a particular community takes to investigate the linear 40 response of a system, a fundamental step in this investigation is the identification of the appropriate linear response function, the topic of the present study.
From a theoretical point of view, the existence of a linear response is by no means obvious: Structurally stable dynamical systems are the exception (Abraham and Marsden, 1982) so that already small parameter changes typically lead to topological changes in their sets of stable and unstable solutions. Not every such bifurcation must prevent a linear response in macroscopic principle be computed by the inverse Laplace transform 90 where L{·} is the Laplace transform operator. In fact, a first step towards the derivation of χ (t) from the general Eq. (1) was taken by Pongratz et al. (2011), although the problem was not systematically discussed.
Deriving χ (t) from perturbation experiment data is not a trivial problem. For the general case where the perturbation is different from a Dirac delta-type function, the problem is ill-posed (e.g., Bertero, 1989;Landl et al., 1991;Lamm, 1996;Engl et al., 1996). This basically means that attempts to recover the exact χ (t) yield a solution with large errors due to an 95 amplification of the noise in the data. On the other hand, when f (t) is a Dirac delta-type function with sufficiently small perturbation strength so that the response can be considered linear, the impulse response gives directly the linear response function, i.e. χ (t) = R(t). But even in this case noise may hinder the recovery: Because the perturbation is only one "pulse" with small perturbation strength, the response may have a too low signal-to-noise ratio because of internal variability (Joos et al., 2013), giving once more a recovery with large errors.

100
To remedy these noise problems, a method intended to "damp" the noise in the response is usually employed. In MacMartin and Kravitz (2016), a step experiment with large perturbation strength is used to obtain a better signal-to-noise ratio in the response, but at the cost of enhancing the effect of nonlinearities. An alternative approach is employed by Ragone et al. (2016) and Lucarini et al. (2017), who employ an ensemble of simulation experiments and take the ensemble averaged response so that the level of noise is reduced. But especially for complex models such as Earth System Models, ensembles of simulations 105 can be computationally extremely expensive so that such a procedure may not be feasible.
Instead of trying to improve the signal-to-noise ratio of the data by improved experiment design, here we are interested in deriving χ (t) from a single realization of a given experiment by accounting for the ill-posedness of the problem. For this purpose, we employ regularization theory. Although this theory offers a variety of methods to solve ill-posed problems (see e.g. Groetsch, 1984;Bertero, 1989;Bertero et al., 1995;Engl et al., 1996;Hansen, 2010), currently no general all-purpose 110 method exists. Typically, methods rely on some type of prior information about the problem (Istratov and Vyvenko, 1999).
Hence, they must be tailored according to the particularities of each application. Here, we develop a method that under certain assumptions solves the ill-posed problem when in addition to the data from a single arbitrary perturbation experiment also data from an associated unperturbed -or control -experiment are given to obtain independent information about the noise level (section 3). First, we assume that the response function is well approximated by a sum of decaying exponentials, meaning 115 that potential oscillatory contributions to the response function are so small that they can be considered as part of the noise.
The response function is obtained by applying Tikhonov-Phillips regularization. The regularization parameter is chosen via the discrepancy method. An essential ingredient of the discrepancy method is the noise level, which is usually not known a priori. For this reason, we propose a method to estimate the noise level by taking advantage of the information given by a spectral analysis of the perturbation experiment and the control experiment. If the desired response function is known to be 120 monotonic, the noise estimate can be further adjusted. In section 4, the method is demonstrated to give reliable results under appropriate conditions of noise and nonlinearity. In section 5, we compare the derived method with two existent methods in the literature to identify the response function in the time domain. Results and technical details are discussed in section 6.
Additional calculations are shifted to appendices.
2 Linear response theory and basic ansatz of the method 125 As a preparation for introducing our method in section 3, in the present section we derive its basic ansatz, which takes into account in addition to the response formula (1) also the noise in the data. Depending on the application context, the noise may arise for different reasons, such as errors in the measurements, stochastic components in the system, etc. As will be seen, our basic ansatz is in principle applicable to all those cases. But to make the connection to modern applications of linear response functions that arise in the context of Ruelle's developments (e.g., climate), here we derive this ansatz starting from 130 considerations of linear response theory (Ruelle, 2009). Ruelle considered systems of type where x(t) is the possibly infinite dimensional state vector and the perturbation f (t) couples to the unperturbed system d dt x = A 0 (x) via the field A 1 (x). In the present context Eq. (3) could e.g. represent the dynamics of the Earth system perturbed by anthropogenic emissions f (t). Considering an observable Y (x), Ruelle proved that the ensemble average of its deviation from 135 the unperturbed system ∆Y can be expanded in the perturbation f (t): where the order symbol O(f 2 ) represents terms that vanish in the limit f (·) → 0 faster than the leading linear term. This expansion describes the response of a system that is noisy as a result of its chaotic evolution: Starting from different initial states one obtains different values for ∆Y (x(t)). Compared to Eq. (1), in Eq. (4) the linear response function does not describe the 140 response in observables directly but only in their ensemble average, i.e. in an average over the initial states of the unperturbed system. For the recovery of linear response functions from numerical experiments, this would mean that one had to perform many experiments starting from different initial states to obtain the appropriate ensemble averages. Using tailored perturbations experiments, it was demonstrated in several studies (e.g., Ragone et al., 2016;Lucarini et al., 2017;Bódai et al., 2020) that linear response functions can indeed be obtained in this way, but at the expense of a large numerical burden from the need to 145 perform many experiments. Instead, the aim here is to obtain the linear response functions from a given experiment and only from a single realization. Since we are dealing with a single realization, Eq. (4) becomes where η(t) is a noise term that must show up as a consequence of dropping the ensemble average. Accordingly, the noise η(t) is introduced here as the difference between the noisy response in a single realization ∆Y (t) and the response in the ensemble 150 average ∆Y (x(t)) (compare Eq. (4)). In addition, we assume linearity in the perturbation. As a consequence, the present study is based on the ansatz where now the response ∆Y (t) is divided into a deterministic term t 0 χ (t − s)f (s)ds and a noise term η(t).
The linearity assumption is by purpose: In the present approach to derive the linear response function (see next section), 155 hereafter called RFI method (Response Function Identification method), we first use Eq. (6) to obtain χ (t) and justify the linearity assumption a posteriori by analyzing how robustly the response can be recovered for different perturbation strengths.
Dropping the nonlinear terms has the advantage that one can use the corpus of linear methods to derive χ (t) from Eq. (6).
Note that in practice, however small the perturbation may be, the nonlinear terms do not vanish. Therefore, the contribution of nonlinearities is in this way distributed between χ (t) and η(t), which will be different from the previous χ (t) and η(t) in 160 Eq. (5). How strongly nonlinearities affect the numerical identification of χ (t) depends on the estimation of η(t), which is a crucial part of our RFI method and the main novelty introduced here to deal with the ill-posedness of the problem to identify In addition, although we derived Eq. (6) starting from considerations of linear response theory, it is clear that this ansatz can also be employed in any other context where it may be assumed that the response formula (1) applies and that the data is 165 contaminated by additive noise.

Identification of linear response functions from arbitrary perturbation experiments
In this section we derive the RFI method. As mentioned above, the aim of this method is to obtain the linear response function using data from a single realization of a given perturbation experiment. For this purpose, an essential step is our novel estimation of the noise term η(t), which requires additional data from an unperturbed (control) experiment.

170
Starting from the ansatz (6), the method is based on the idea that the noise term η(t) can be estimated using information on the internal variability from the control experiment in combination with a spectral analysis of the perturbation experiment. The identification of the linear response function proceeds as follows: First, we assume that the linear response function decays multi-exponentially, i.e. we take a functional form for χ (t). Second, Eq. (6) is discretized for application to the discrete set of time series data, which results in a matrix equation. Then, assuming that the solution obeys the Picard condition (see below), 175 we estimate the high-frequency components of the noise term η(t) in Eq. (6) via a spectral analysis of the matrix equation applied to the data from the perturbation experiment. Next, assuming that the spectral distribution of noise is similar in the control and in the perturbation experiment, we estimate also the low-frequency components of η(t). The final estimate of η(t) is then used in a regularization procedure to determine the regularization parameter and thereby find an approximate solution for χ (t). In case χ (t) is known to be monotonic, the approximated solution is further adjusted by checking for monotonicity.

180
In the first subsection, we introduce the assumption for the functional form of the linear response function. In subsection 3.2, we present the discretized problem. In subsection 3.3 we shortly review some elements of regularization theory employed in our method, in particular Tikhonov-Phillips regularization (subsection 3.3.1) and the discrepancy method (subsection 3.3.2).
In subsection 3.4 we present our noise estimation procedure by which the regularization parameter is determined. Finally, in subsection 3.5 we show how this procedure can be further improved in the presence of a monotonicity constraint.

Functional form of the linear response function
In general, the identification of linear response functions from data may be performed either pointwise (e.g., Ragone et al., 2016) or assuming a functional form (e.g., Maier-Reimer and Hasselmann, 1987). Both approaches usually lead to an ill-posed problem, and therefore to similar difficulties to find the solution (see more details in subsection 3.3.1). Although the RFI method may be applied in either case, here we assume that the response function consists of an overlay of exponential modes. By this 190 ansatz we guarantee from the outset that the response relaxes to zero for t → ∞, which is consistent with the expectation that real systems have finite memory. Besides constraining the function space for the derivation of the response function, another added value of this approach is that in principle it also gives the spectrum of internal time scales of the response.
where the τ i values are interpreted as characteristic time scales and the g i values are their respective weights. τ i and g i are then obtained by applying some fitting technique taking a fixed number of terms M . Thus, an important step in this type of 200 approach is to determine a suitable value for M . A common practice is to initially take only a small number of terms M , solve the problem and then add terms progressively, until the addition of a new term does not anymore improve the fit according to some quality-of-fit criterion (e.g., Kumaresan et al., 1984;Maier-Reimer and Hasselmann, 1987;Hasselmann et al., 1993;Pongratz et al., 2011;Colbourn et al., 2015;Lord et al., 2016). Thereby it is assumed that once results stabilize the information in the data has been already fully exploited so that fitting of additional terms would be artificial. Nevertheless, finding the 205 parameters τ i and g i either from a given χ (t) by Eq. (7) or from ∆Y (t) by inserting Eq. (7) into Eq. (1) means to solve a special case of a Fredholm equation of the first kind (see Appendix A), which is an ill-posed problem (Groetsch, 1984). This implies that even though the obtained solution may give a very good fit to the data, it may significantly differ from the exact solution (see e.g. the famous example from Lanczos, 1956, p. 272).
Therefore, to avoid the complication of determining M , we assume instead that the response function is characterized by a 210 continuous spectrum g(τ ) (Forney and Rothman, 2012): Accordingly, we assume that the response is dominated by relaxing exponentials, meaning that potential contributions from oscillatory modes are not distinguishable from noise. By this approach the time scale τ is not anymore an unknown, but is given after discretization by a prescribed distribution with M terms covering a wide range of τ i values. Thus, only a discrete 215 approximation to the spectrum g(τ ) needs to be found. In this way the functional representation is made independent from the question of information content as long as the spectrum of discrete time scales is chosen sufficiently large and dense to widely cover the spectrum of internal time scales of the considered system. This approach has an additional advantage. By prescribing the distribution of time scales one must not solve a nonlinear ill-posed problem (by solving Eq. (7) for τ i and g i ) but only a linear ill-posed problem (by solving Eq. (8) only for g(τ )), for 220 which the mathematical theory is fairly well developed (Groetsch, 1984;Engl et al., 1996). Because the problem is linear, the solution is even given analytically (see section 3.3.1), which makes the method very transparent.

Discretized problem
In view of applications to geophysical systems like the climate or the carbon cycle (Part II of this study) that are known to cover a wide range of time scales (Ghil and Lucarini, 2020;Ciais et al., 2013, Box 6.1), it is useful to switch to a logarithmic 225 scale (Forney and Rothman, 2012) by rewriting Eq. (8) in terms of log 10 τ : Hereafter, q(τ ) and its discrete version q (see below) will be called spectrum.
In order to apply the basic Eq. (6) together with Eq. (9) to experiment data, the whole problem needs to be discretized in time and also with respect to the spectrum of time scales. Here we assume the data to be given at equally spaced time steps 230 t k = t 0 + k∆t, k = 0, 1, . . . , N − 1, where N is the number of data, while the time scales are assumed to be equally spaced at a logarithmic scale between maximum and minimum values τ max and τ min , i.e.
log 10 τ j = log 10 τ min + j∆ log 10 τ, j = 0, 1, . . . , M − 1, with ∆ log 10 τ := log 10 τ max − log 10 τ min M where M is the number of time scales. As shown in Appendix B, the resulting discretized equations corresponding to Eq. (6) and Eq. (9) are and χ k = ∆ log 10 τ M −1 j=0 q j e −k∆t/τj , k = 0, ..., N − 1, where η k stands for the noise. Combining the response data ∆Y k , the spectral values q j , and the noise values η j into vectors ∆Y ∈ R N , q ∈ R M , and η ∈ R N , these equations can be written in vector form as with the components of matrix A given by The matrix A is known from the prescribed spectrum of time scales τ i and the forcings f i . Considering η as a fitting error, in principle one can apply standard linear methods to solve Eq. (13) for the desired spectrum by minimizing where || · || denotes the Euclidean norm, i.e. ||x|| = i x 2 i . Here we denoted the spectrum as q η instead of q to emphasize that the spectrum found in this way can only be an approximation to the original q depending on the noise present in the data.
Unfortunately, it turns out that solving Eq. (15) is not a trivial task. The first difficulty is that the finite information provided by the data makes the problem underdetermined: Ideally one wants to obtain a spectrum q(τ ) defined for τ ∈ [0, +∞[ , but the 250 data ∆Y is discrete and covers only a limited time span. However, the most serious issue in identifying χ (t) arises because Eq. (1) is a special case of a Fredholm equation of the first kind (Groetsch, 1984(Groetsch, , 2007, see also Appendix A), where the quest for the integral kernel is well-known to be an ill-posed problem (see e.g. Bertero, 1989;Hansen, 1992). This basically means that any solution q η of Eq. (15) obtained via classical numerical methods such as LU or Cholesky decomposition will be extremely sensitive to even small errors in the data (Hansen, 1992). Therefore, to solve Eq. (15) for the spectrum q η we 255 invoke regularization.

Regularization
As discussed above, the problem of solving Eq. (15) must be expected to be underdetermined and ill-posed. Strictly, the underdetermination cannot be overcome. But the ill-posedness can be treated: Our RFI method combines techniques from regularization theory with a novel approach to estimate the noise level in the data. To facilitate the understanding of the 260 method, in this section we briefly review these techniques along with some other aspects of the theory that are relevant for our method development.

Regularized solution
To deal with the ill-posedness, it is useful to perform a Singular Value Decomposition (SVD) of the matrix A: 265 with A ∈ R N ×M , U ∈ R N ×N , Σ ∈ R N ×M , and V ∈ R M ×M . Σ is a diagonal (not necessarily square) matrix with diagonal entries σ 0 ≥ σ 1 ≥ ... ≥ σ M −1 ≥ 0 known as singular values, and are orthonormal matrices with u 0 , u 1 , ..., u N −1 being the left singular vectors and v 0 , v 1 , ..., v M −1 the right singular vectors 270 of A. In practice, assuming that there is more data than prescribed time scales, i.e. N ≥ M the singular values σ i computed numerically are nonzero (see Golub and Van Loan, 1996, section 5.5.8). In this case, Eq. (15) has the unique solution (see Golub and Van Loan, 1996, Theorem 5.5.1) where • denotes the usual scalar product.

275
In practice, when a SVD is applied to a discrete version of a Fredholm equation of the first kind, the components of the singular vectors v i and u i tend to have more sign changes with increasing index i, as observed by Hansen (1989Hansen ( , 1990. This observation justifies that in the following we dub low-index terms in Eq. (19) as low-frequency contributions, and high-index terms as high-frequency contributions.
It is well-known that when applying solution (19) one encounters certain numerical problems. Regularization is a means to 280 handle these problems. These problems arise -even in the absence of noise -as follows. From the Riemann-Lebesgue lemma (see e.g. Groetsch, 1984) it is known that the high-frequency components of the data ∆Y (t) must approach zero. In the discrete case, by Hansen's observation this means that the projections u i • ∆Y should approach zero for increasing index values i. But due to machine precision or the noise η contained in ∆Y , numerically the absolute values |u i • ∆Y | do not approach zero but settle at a certain non-zero level for large i or, in the presence of noise, may even increase. Due to the ill-posedness also 285 the singular values σ i in the denominator of Eq. (19) tend to zero so that these high-frequency contributions to q η are strongly amplified. Hence applying Eq. (19) naively would not give a stable solution for q η because its value would depend critically on numerical errors and the noise present in the data.
Regularization remedies this problem by suppressing the problematic high-frequency components. This approach assumes that the main information on the solution is contained in the low-frequency components so that the high-frequency contributions 290 to the sum (19) can be ignored. This assumption is consistent with the very nature of ill-posed problems because in such problems information on high frequencies is anyway supressed so that only low-frequency components of the solution are recoverable (Groetsch, 1984, section 1.1).
To perform such filtering, we employ the Tikhonov-Phillips Regularization method (Phillips, 1962;Tikhonov, 1963)although the method is most famously known as Tikhonov regularization, here we choose this different name in recognition 295 of Phillips earlier work (see Groetsch, 2003). Besides being mathematically well-developed (see e.g. Groetsch, 1984;Engl et al., 1996), the Tikhonov-Phillips Regularization method gives an explicit solution in terms of the SVD expansion, which allows for a clear interpretation of the filtering. In addition, it provides a smooth filtering of the solution, in contrast to the also well-known Truncated Singular Value Decomposition method (Hansen, 1987). For additional regularization methods, see e.g. Bertero (1989), Bertero et al. (1995) and Palm (2010).

300
The standard Tikhonov-Phillips Regularization yields the regularized solution in the simple form (Hansen, 2010;Bertero, 1989) where the f i (λ) are the filter functions

305
By adding the filter function indeed the high-frequency components are suppressed: With λ properly chosen, at large index i, where σ 2 i λ, f i (λ) approaches σ 2 i /λ so that the terms under sum sign are proportional to σ i meaning that the terms for large i do not contribute significantly to the sum. In contrast, for small i, λ σ 2 i so that f i (λ) is about 1 and the terms under the sum are almost unchanged. In this way the filter function indeed selects only the low-frequency components. Therefore, now the problem boils down to determining λ (see next section). Once λ is determined, the solution q λ is obtained by Eq. (20) and 310 the desired linear response function χ (t) finally follows from Eq. (12).

Determining the regularization parameter λ
By construction it is clear that q λ as computed from Eq. (20) strongly depends on the regularization parameter λ. Accordingly, much effort has been put in developing methods to determine suitable values for λ (e.g., Engl et al., 1996;Hansen, 2010). Of special interest are methods that give solutions converging with decreasing noise level to the "true" solution. One such method 315 known to conform to this condition while uniquely determining the regularization parameter has been proposed by Morozov (1966). His discrepancy method is based on the idea that the solution to the problem allows the data to be recovered with an error of the magnitude of the noise (Groetsch, 1984): Let δ denote an upper bound of the noise level ||η||, i.e. δ ≥ ||η||. Then, λ should be chosen such that the discrepancy matches δ, i.e.
320 Groetsch (1983) motivates the choice of this method by demonstrating that determining λ from Eq. (22) minimizes a natural choice for an upper bound of the error in the solution given by regularization. Unfortunately, in practical applications the noise level δ is usually not known. To try to solve this problem for the application of interest, in the next section we propose an approach to estimate δ.

Estimating the noise level δ 325
To introduce our approach, in the following we assume that data from an unforced experiment (control experiment) are available -as is typically the case in applications to Earth System Models (see Part II) -that allow for an independent estimate of the noise level δ.
A naive way to invoke these data to determine λ would be to take δ essentially as the standard deviation of the control experiment -more precisely: δ := σ √ N = ||∆Y ctrl ||. Technically, to find λ, one way is to start with a large value for λ and 330 decrease it until the left hand side of Eq. (22) matches δ (as suggested by Hämarik et al., 2011). That this procedure works is explained by the fact that the function λ → ||Aq λ − ∆Y || is continuous, increasing and contains δ in its range (Groetsch, 1984, Theorem 3.3.1). Having found λ in this way, the desired solution q λ is then obtained from Eq. (20). But this approach is not as straightforward as one may think: Because of the forcing, the noise in the perturbed experiment may have different characteristics from that in the control experiment. Therefore in the following we devise a method how to account for this 335 problem.
Formally in Eq. (13) ∆Y is split into a "clean" part and noise η. Entering this into Eq. (19) gives Accordingly, the first term in the sum gives the "true" solution q while the second term gives the noise contribution to q η .
As already pointed out when discussing regularization, the "true" solution of ill-posed problems can only be recovered if it is 340 dominated by the projection onto the first singular vectors. This requirement is formally stated by the discrete Picard condition (Hansen, 1990), which demands that the size of the projection coefficients |u i • Aq| drops sufficiently fast to zero so that they get smaller than σ i before σ i levels off to a finite value because of numerical errors. To find a good estimate for the noise level δ we use this in the following way. Let i max be the value of the index i where the singular values σ i start to level off. Assuming that the Picard condition holds, one can infer that Therefore, This equation determines the high-frequency components of the noise η. It remains to determine also the low-frequency components to obtain an estimate for δ.

350
For this purpose, we take advantage of the data from the control experiment. The control experiment is an experiment performed for the same conditions as the perturbed experiment, with the only difference that the forcing f is zero so that the resulting ∆Y ctrl can be understood as pure noise; therefore we write η ctrl := ∆Y ctrl . While in the forced experiment the low-frequency noise is obscured by the low-frequency response induced by the forcing, the low-frequency part of the control experiment data can to first order be expected to give an estimate of the low-frequency noise present in the forced experiment.

355
Nevertheless, it is clear that due to the forcing the spectral characteristics of noise may be different in the forced and unforced experiments. More precisely, the spectrum of noise may differ in its overall level and spectral distribution (i.e. the "shape" of the spectrum). In the following, we account for a possible difference in the overall level. However, we will assume that the spectral distribution is approximately the same for η ctrl and η; we call this the spectral similarity assumption.
After these considerations, λ can be determined as follows: Take i max as the last index i before the plateau σ i ≈ 0. This i max distinguishes high-frequency (i > i max ) from low-frequency (i ≤ i max ) components. Then are the levels of high-frequency noise in the perturbed (see Eq. (25)) and in the control experiment, respectively. We now scale the spectral components of η ctrl so that its high-frequency level matches the high-frequency level of ∆Y : In this way, the magnitude of the high-frequency components of η matches that of ∆Y , and because of Eq. (25) also that of η. On the other hand, the spectral distribution of η is the same as for η ctrl , and by the spectral similarity assumption approximately the same as for η. Because η and η have similar spectral distributions, the fact that the magnitude of the highfrequency components of η matches that of η implies that also the magnitude of their low-frequency components matches.

370
Therefore, η can be seen as an estimate of the noise η in the perturbed system not only at high but also at low frequencies.
Hence this corrected noise vector η can be used to obtain an estimate of the noise level of the perturbed system by setting Compared to taking for δ simply the noise level from the unperturbed experiment (as was insinuated above), taking it in this scaled way assures that the high-frequency components are consistent with the Picard condition that must hold for q to be 375 recoverable from the ill-posed problem tackled here. Having determined δ, λ can now be computed from Eq. (22) as described above, from which the q follows (Eq. (20)) and hence χ (t) (Eq. (12)).

Additional noise level adjustment in the presence of a monotonicity constraint
In the application to the land carbon cycle in Part II of this study we show that certain response functions χ (t) decrease monotonically to zero. In attempts to recover such response functions by employing the noise level adjustment described in 380 the previous section, it may turn out that the numerically found response function fails to be monotonic. There may be several reasons for this failure (strong nonlinearities, signal too obscured by noise, etc.). But one additional reason may be that the lowfrequency level of the noise was not properly estimated by assuming that the spectral distribution in the unperturbed experiment reflects the distribution in the perturbed experiment. For such cases one may try to improve the result by further adjustment of the low-frequency noise level to obtain a more reasonable result.

385
The idea is to adjust the low-frequency components of noise independently of the high-frequency components iteratively until the solution obeys the monotonicity constraint. To understand how to do so, several things must be explained: 1. A sufficient condition for χ (t) being monotonic is that all components q i have the same sign (see Appendix C). Therefore, starting out from a numerical solution for χ (t), it would develop towards monotonicity if one could come up with a sequence of vectors q λ having less and less sign changes. From Eq. (20) it is seen that because of Hansen's observation explained in section 3.3.1, that singular vectors v i are less noisy for lower i, q λ has fewer sign changes the fewer v i contribute to the sum.
3. As seen from Eq. (20) and Eq. (21), this is the case the more components the filter function is suppressing, i.e. the larger the value of λ.
4. To obtain larger values of λ, one sees from the discrepancy method (22) that one has to increase δ. The proof for this can be 395 found in Groetsch (1984) (Theorem 3.3.1) but it can also be made plausible as follows: Starting from λ = 0, q λ = q η , which is the solution of the minimization problem (15). Hence, for λ = 0 the discrepancy in the left-hand side of Eq. (22) is minimal.
By increasing λ one decreases all components of q λ (Eq. (20)), increasing thereby the discrepancy.
5. Following the reasoning of the previous section, in order to obtain a larger value for δ one must increase the noise level ||η || (compare Eq. (29)). In doing so, one must keep the high-frequency components of ||η || unchanged because they must keep 400 matching the level of the high-frequency components of the noise in the perturbed experiment η (given by Eq. (25)). Hence, to increase δ one sees from Eq. (29) that this is achieved by scaling up the low-frequency components of ||η ||.
Summarizing these considerations, we have to increase the level of low-frequency contributions to η to develop a given solution for χ (t) towards monotonicity.
This leads to the overall algorithm listed in Fig. 1. The first five steps have been already explained at the end of section 405 3.3.2. To account for monotonicity the additional step 6 combined with the loop back to step 4 has to be iteratively executed.
To enhance the low-frequency noise level as explained above, we calculate in step 6 a new noise vector η new by keeping the high-frequency part from η and enhancing its low-frequency components by a factor c > 1. Then we recompute χ (t) from steps 4 and 5 and once more check for monotonicity.
For the RFI algorithm to be applicable, two conditions must be met: 1) a linear response exists for sufficiently weak pertur-410 bation; and 2) in addition to the response experiment also a control experiment is available. The assumptions needed for the successful application of the algorithm are summarized in Table 1. 5. If the response function χ (t) is known to be monotonic: The correct response function χ (t) can be recovered by iteratively adjusting the noise level estimate η to account for monotonicity. Subsection 3.5 1. Take η ctrl := ∆Y ctrl from the control experiment.
2. Determine i max as the last i before the plateau σ i ≈ 0.
This is the final result except a monotonicity should be accounted for. In that case the algorithm proceeds as follows: 6. Check if the resulting χ (t) decays monotonically to zero. If so, we are done. Else, enhance the low-frequency noise level by setting where c is some value larger than 1. Then set η := η new and repeat calculations starting from step 4.

Applicability in the presence of noise and nonlinearities
In application to real data the presence of noise and nonlinearities may complicate the recovery of linear response functions.
Therefore, by using artificial data generated from a toy model, in the present section we analyze the robustness of the RFI 415 method in the presence of such complications. Robustness for real data is studied in Part II.

Toy model and artificial experiments
As toy model we take Here the matrix M ∈ R D×D describes the relaxation of the unperturbed model. The second right-hand side term represents 420 the deterministic forcing constructed from the time-dependent forcing strength f : R → R and the coupling vector a ∈ R D . Additionally, the system is perturbed by the stochastic forcing n : R → R D , which for simplicity is assumed to be white noise.
To make the relation to the carbon cycle considered in Part II, the components of x may be understood as the carbon stored in plant tissues and soils at the different locations worldwide, so that the observable Y (t) := i x i (t) is the analogue of globally stored land carbon. The solution of the system is 425 We assume from the outset M to be diagonal with eigenvalues −1/τ * i , the τ * i being the relaxation time scales. Then with the linear response function χ * (t) and the noise term η * (t) given by To complete the description of the toy model one has to specify its parameters. For the dimension D we take 70 and the time scales are assumed to be distributed logarithmically between τ * min = 0.01 and τ * max = 1000, i.e. τ * i = 0.01 × 10 i∆ log τ with ∆ log τ = (log 10 10 3 − log 10 10 −2 )/70. With carbon cycle applications in mind, the distribution of the components of the coupling vector is adapted from the log-normal rate distribution found by Forney and Rothman (2012) for the decomposition 435 of soils: with µ and σ chosen so that the peak time scale is around τ = 5 and the limits of the log-normal distribution are approximately within τ = 0.1 and τ = 200 (see "true" spectrum in Fig. 3). The components of n are are taken as uncorrelated, i.e. n i (0)n j (t) = ξδ ij δ(t), with standard deviation ξ being chosen differently in different experiments.

440
In our experiments we explore how Y (t) behaves as a function of the forcing f (t). To this end, we choose a forcing function f (t) (see Table 2 and Fig. 2). The most obvious way to perform the toy model experiments would be to integrate Eq. (30). But to have a better control over the noise it is for our purpose more appropriate to use the analytical solution (32)-(34). Hence, we numerically integrate Eq. (32), using the representation (33) and (34). The data from these experiments are then used to investigate the performance of the RFI algorithm to recover χ * (t). Since all a i values are non-negative, the response function 445 (33) is monotonic, so that we apply the extended version of the algorithm (see Fig. 1 including step 6). In all experiments we generate N = 140 data points, to have a time series of similar length as in the climate change simulations analyzed in Part II (140 years, one value for each year). To apply the RFI method also the noise from an associated control experiment is needed. This is obtained from Eq. (34) by using another realization n i (t) of white noise for each system dimension i.
450 Table 2. Experiments considered in this study. Forcings are shown in Fig. 2. To standardize the type of experiments considered here and in Part II, we select forcing functions that mimic those employed in climate change simulation experiments to whose data the RFI method is applied in Part II. Note that in principle any type of forcing could be employed.  To standardize the type of experiments considered here and in Part II, we select forcing functions that mimic those employed in climate change simulation experiments to whose data the RFI method is applied in Part II. Note that in principle any type of forcing could be employed.

Choice of parameters for the RFI method
To apply the RFI method, we choose M = 30 time scales for the recovery of χ * . Using τ min = 0.1 and τ max = 10 5 we distribute the spectrum of time scales according to Eq. (10). These parameters are also used for the application on the carbon cycle in Part II and for the comparison with previous methods in section 5.

Ideal conditions
To gain trust in the numerics of our implementation of the RFI method we present in this section a technical test considering conditions under which it is known that the linear response function should be quite perfectly recoverable. Such ideal conditions are characterized by perfect linearity and absence of noise. Hence we use the presented toy model (which is anyway linear) in the absence of noise (n = 0) for this test. Actually, this will not be a full test of the algorithm, but only of the implementation of its basic apparatus (sections 3.2 and 3.3.1) culminating in Eq. (20) since in the absence of noise the method to determine the 460 regularization parameter λ (sections 3.3.2 and 3.5) is not applicable. One might think that in the absence of noise one could use Eq. (19) to determine the linear response function, but even under such ideal conditions the ill-posedness of the problem calls for regularization to suppress the numerical noise that prevents from obtaining a sensible solution from Eq. (19) (see discussion in the paragraph after Eq. (19)). But choosing the small value of λ = 10 −8 for the regularization parameter when evaluating Eq. (20) is sufficient for this technical test.

465
Figure 3(c) shows the response of the noiseless toy model to the forcings shown in Fig. 2, i.e. we performed the experiments listed in Table 2, although for the present test the control experiment is not needed.
Applying Eq. (20) to the experiment data gives the spectrum q λ shown in Fig. 3(a). Here, we derived the spectrum q λ for each experiment separately, although in the figure only single dots are seen, because all results coincide so closely and are almost indistinguishable from the "true" solution q * as was expected for this ideal case. The next Fig. 3(b) shows the response 470 function obtained from the spectra q λ using Eq. (12). Obviously from Fig. 3(a) the "true" response function is reconstructed perfectly from whatever experiment used. As a final test we predict using in Eq.
(1) the response function obtained from the 1% experiment the responses of other experiments. And indeed, these predicted responses are indistinguishable from the responses obtained directly from the experiments (see Fig. 3(c)). This latter result demonstrates perfect robustness of the numerical approach to recover the responses in this ideal case.

First complication: noise
The presence of noise may severely hinder the detailed recovery of χ * (t) due to the ill-posed nature of the problem. In order to demonstrate the effect of the addition of noise on the quality of the derived χ (t), we define a relative error for the prediction of the responses from different experiments. Consider a particular experiment -which is in our case the 1% experimentfrom which we have obtained by the RFI method the response function, which we call here χ 0 (t). The relative error for the 480 prediction of the response from an experiment "k" by the recovered χ 0 (t) via the convolution (1) is where stands for the discrete form of the convolution operation (1) used to predict the responses, i.e. ∆t i χ 0 j−i f k i . In the following we denote ε 0 k the prediction error for the experiment "k". To measure the quality of the prediction across multiple experiments we define also the mean prediction error where K is the number of predicted responses. The reader may wonder why we quantify the quality of the recovery only indirectly from the responses found in different experiments and not directly from the recovery of χ (t). The reason is that in real applications the "true" χ (t) is not known but the responses are. The reliability of this indirect measure for the quality of the recovery is discussed in section 5.

490
To study how the quality of the recovery depends on the noise level we introduce the signal-to-noise ratio (SNR) of the response data from a perturbation experiment as where δ is the final noise level estimate obtained by the RFI method, as described in section 3.3.2 (see Eq. (29)).
To demonstrate the dependence of the mean prediction error (37) on the SNR, we performed 1% experiments using different 495 noise levels. The resulting dependence is shown in Fig. 4. As expected, for a small error a sufficiently large SNR is needed, i.e. a good recovery may be hindered by a too large noise level.
In Fig. 5 we demonstrate how the overall noise level adjustment in step 3 of the RFI algorithm (see Fig. 1) affects regularization to recover the correct response function. To guarantee that the overall level of the noise spectrum is indeed substantially different in the control and perturbed experiment (so that the adjustment is really needed), we take for the noise in the control 500 experiment a standard deviation ten times smaller than that for the noise in the perturbed experiment. To demonstrate how the adjustment works it is helpful to consider the so-called "Picard plot". This type of plot was originally introduced to analyze the spectral characteristics of an ill-posed problem (see e.g. Hansen, 1992). In Fig. 5(a) we show the Picard plot for data obtained from a 1% experiment with the toy model using a SNR ≈ 520 to assure a good recovery. The singular values σ i decrease to extremely small values as the index i increases. This demonstrates that indeed the problem to solve for the response function is 505 ill-posed and therefore regularization is needed for its solution (compare Eq. (19) with Eq. (20)). The data labelled by |u i • η| are the "true" noise coefficients, obtained by subtracting the "clean" response Aq, known analytically from the toy model description, from the noisy toy model response ∆Y . Comparing them to the projection coefficients of the response |u i • ∆Y | one sees that with exception of the first few coefficients the response is dominated by its noise content. Accordingly, only the information contained in these first few coefficients is recoverable from this ill-posed problem whatever method is used. The 510 data labelled by |u i • η est | have been added to the Picard plot to demonstrate how the RFI algorithm operates: These data are the projection coefficients of the estimated noise content in the data, where η est is the final value of η obtained by the RFI method. Obviously, the RFI algorithm correctly estimates the "true" noise level not only at high frequencies -where it is correct by the noise level adjustment in step 3 of the RFI algorithm (see Fig. 1) -but also at low frequencies, where it is predicted from the adjusted low-frequency components of the control experiment (also step 3). Acordingly, in this case the 515 spectral similarity assumption holds and there is no need to further adjust the noise level (step 6).
How the estimation of the noise in the data and the resulting regularization affects the projection coefficients of the spectrum q can be seen in Fig. 5(b): Only those few coefficients not dominated by noise contribute to the regularized solution. In this case these few coefficients selected by determinining the regularization parameter λ from the noise level are sufficient for an almost perfect recovery of the response function, as seen in Fig. 5(c).

520
It is important to note that in the situation of Fig. 5 where the overall noise level differs considerably in the control and in the perturbed experiment, a naive noise estimate taken from the control experiment without the adjustment in step 3 (as first suggested in section 3.3.2) would severely underestimate the noise actually in the data. This would in turn lead to an underestimation of the regularization parameter (see Groetsch, 1984, Theorem 3.3.1). As a result, the wrong filtering by regularization would leave projection coefficients dominated by noise in the solution, likely leading to large errors in the recovered response 525 function. This example therefore demonstrates the relevance of the noise adjustment in step 3.
Finally in this section, we demonstrate that by accounting for monotonicity of the linear response function one may obtain a better estimate of the low-frequency components of the noise whereby the recovery of the response function is improved. In the recovery when the additional noise level adjustment was not used. Because the spectral similarity assumption does not hold, the estimated low-frequency components of the noise |u i •η est | do not match those of the "true" noise |u i •η| (subfigure (a 1 )).
Ideally, only those four projection coefficients of the data |u i • ∆Y | which are larger than the projection coefficients of the "true" noise |u i •η| should contribute to the recovered response function. Instead, as seen in subfigure (b 1 ), the coefficients with index between i = 4 and i = 7 give the dominant contributions because they are larger than the estimated noise coefficients 535 |u i • η est | (compare subfigure (a 1 )). Therefore, the recovery of the response function is poor (subfigure (c 1 )). But since in this case the low-frequency components of noise are such that the recovered response function is non-monotonic although the "true" response function is known to be monotonic, one may further adjust the noise level to improve the results.
This further adjustment is the purpose of step 6 of the RFI algorithm (see Fig. 1). Its effect is demonstrated by the second-row plots of Fig. 6: The estimated noise components match now better the "true" noise components that had been underestimated in  (20); (c) "true" and recovered linear response functions. Since the RFI algorithm correctly adjusted the noise level to the "true" noise in the data, the resulting regularized solution has contributions only from the first few projection coefficients which are not completely obscured by noise. Overall, the recovery is almost perfect, because the SNR (chosen as about 520) is still sufficiently good and because the noise was chosen to conform with the spectral similarity assumption. The regularization parameter determined by the algorithm is λ ≈ 30364. Because the noise level adjustment (step 3 from Fig. 1) already gave a good estimate to the "true" noise in the data, no monotonicity check was needed (step 6 from Fig. 1).
the first row (compare subfigures (a 2 ) and (a 1 )) so that only those four components that carry information (compare in subfigure (a 2 ) the projections |u i • ∆Y | for low index i with |u i • η|) survive the regularization (subfigure (b 2 )). As a result, the quality of the recovery of the response function has considerably improved (subfigure (c 2 )).

Second complication: nonlinearity
The second difficulty in recovering the linear response function χ (t) from a perturbation experiment may arise from nonlin-545 earities present in the considered system. Generally it must be suspected that nonlinearities are present so that they should not hurt as long as they are small. And indeed, from the viewpoint of regularization, contributions from nonlinearities can be considered as an additional noise so that in principle they can also be filtered out. But as with noise, when getting stronger they cause a deterioration of the recovery of the response function. In the following, we show this more formally and discuss in detail how the RFI algorithm behaves in the presence of nonlinearities.

550
To understand how contributions from nonlinearities affect the recovery of the response function we write the nonlinear terms in Eq. (5) collectively as η(t). This formally gives instead of Eq. (13). Plugging this into Eq. (20) the spectrum is obtained as 555 Accordingly, the nonlinear contributions can be understood as an additional noise in the spectrum q λ so that the theory of regularization fully applies when replacing η by the combined noise η + η. Hence, as in their absence, nonlinearities do not prevent the application of regularization as long as the signal is not buried under this combined noise.
But for the RFI algorithm to give good results a second condition is that the contributions from η must not be large compared to those from η. To understand this, one must realize that the response and with it the nonlinear contributions η are dominated 560 by low-frequency components because of the low-frequency nature of the forcing for the problems of interest (for instance in %-experiments). The RFI algorithm uses an estimate for the noise level in the perturbation experiment obtained from the control experiment assuming that the spectral distribution is approximately the same in the noise from the control experiment and the noise in the data from the perturbation experiment (spectral similarity assumption; step 3 of Fig. 1). But the control experiment does not contain any contributions from nonlinearities because the forcing is zero. Therefore, if in the data from the 565 perturbation experiment the contributions from nonlinearities η are not small compared to those from η, the spectral similarity assumption does not hold. Since this assumption is at the heart of the RFI algorithm, its breakdown leads to a poor recovery of the linear response function.
All this is demonstrated in the following by toy model experiments. For this purpose, we artificially consider the response of the toy model not in Y but in its nonlinear transform where the parameter a determines the strength of the nonlinearity. Indeed, in such a way the nonlinearity does not result from nonlinearity of the underlying dynamics (the toy model is linear), but from the way the response is looked at. But this distinction is artificial since in practice a response experiment is an indivisible unity of system and observation so that the origin of the nonlinearity is irrelevant. The particular functional form chosen for Y nonlin (t) mimics the nonlinear effect of saturation 575 encountered for instance in the land carbon sink when atmospheric CO 2 rises to high values. In the following, to demonstrate the effect of nonlinearities, we set the noise level in the toy model experiments to a rather small value in order to have a good SN R in the experiments considered.
In Fig. 7 we show by plotting the mean prediction error (see Eq. (37)) how the recovery of the response function deteriorates as the nonlinearity parameter a increases. To demonstrate that this is indeed caused by a breakdown of the spectral similarity assumption we plot in addition the ratio || η||/||η||. It is seen that indeed, as claimed above, the recovery works well only when this ratio is not large, i.e. when the contributions from nonlinearities η are not large compared to those from the noise η.
More insight into how nonlinearities affect the recovery is obtained from the more detailed SVD analysis shown in Fig. 8.
The first row of subfigures was obtained from the toy model assuming a rather small nonlinearity (a = 10 −10 ). In the Picard plot (subfigure (a 1 )) it is seen that in this case both conditions necessary for a good recovery are met: First, the signal |u i •∆Y | 585 is well visible above the combined noise |u i • (η + η)| (see the first four components). Second, in this case |u i • η|/|u i • η| is small over the whole spectrum, i.e. the contributions from η are small compared to those from η. As explained above, because this second condition is also met, the noise estimate from the RFI algorithm η est is a good approximation to the combined noise across all frequencies (compare in the Picard plot |u i •(η + η)| to |u i •η est |). As a result, the four components selected by the regularization for the recovered solution (subfigure (b 1 )) are precisely those dominated by the signal (compare . This example demonstrates that as long as these two conditions are met, small contributions from nonlinearities do not prevent a good recovery of the response function (see subfigure (c 1 )).
In the second row of Fig. 8, we demonstrate how the violation of the second condition obstructs the recovery. In this case the nonlinearity parameter has been given a larger value (a = 2.5 × 10 −5 ). As a consequence, one sees in the Picard plot that the low-frequency components of the combined noise are enhanced. The first condition is still met: The signal |u i • ∆Y | is 595 visible above the combined noise |u i • (η + η)| (see the first two components). But now the ratio |u i • η|/|u i • η| gets large at low frequencies, violating the second condition. As explained, the violation of the second condition leads to the breakdown of the spectral similarity assumption. As a result, the RFI algorithm underestimates the combined noise at low frequencies (compare in the Picard plot |u i •(η + η)| to |u i •η est |). Using this wrong noise estimate, regularization selects components for the recovered solution that are to a large extent dominated by the combined noise (see components i = 2 to i = 6 in subfigure 600 (b 2 )). The result is that the strong low-frequency contributions from nonlinearities deteriorate the recovery of the response function at long time scales (subfigure (c 2 )).
In the third row, we demonstrate for this type of nonlinearity that by accounting for monotonicity one can remove from the recovered solution all components dominated by noise. For this purpose, we set the nonlinearity parameter to the same value as for the second row (a = 2.5 × 10 −5 ) but employ the additional noise level adjustment (step 6 of Fig. 1), i.e. the low-frequency 605 range of the noise estimate is now automatically adjusted in order to recover a response function that decays monotonically to zero. As seen in the Picard plot, the additional noise level adjustment results in an artificial enhancement of the low-frequency components of the noise estimate, with a large jump separating the low-from the high-frequency range. In this case, such enhancement is able to better estimate the largest components of the combined noise (first few components in the Picard plot).
As a consequence, regularization correctly selects for the recovered solution only the two first components which are not 610 dominated by noise (subfigure (b 3 )). Unfortunately, as seen in subfigure (c 3 ), these two first components do not contain enough information for a perfect recovery, since the quality improves at long time scales, but deteriorates at short time scales (compare subfigures (c 3 ) and (c 2 )). This is a consequence of how regularization works: It filters out components dominated by noise (or in this case, nonlinearity) at the expense of removing also useful information contained in those components.
Also interesting to note from this SVD analysis is that although in general the presence of nonlinearities cannot be detected 615 from only the two experiments needed for our RFI method, it can be detected in cases where the response function is known to be monotonic but the nonlinearity is such that the recovered response function is non-monotonic. This is shown in the last example above, where strong nonlinearities result in a large jump between the low-and high-frequency components of the noise estimate. Such jump arises because strong nonlinearities cause the response function to be non-monotonic, and this enforces the additional adjustment of the noise estimate by the RFI algorithm. This effect is obviously a result of the particular 620 type of nonlinearity considered for that example. Nevertheless, such jump may be a relevant indication of strong nonlinearities in applications to the land carbon cycle because this type of nonlinearitiy mimics precisely the saturation behaviour observed in the land carbon sink under high values of atmospheric CO 2 .

Comparison with previous methods
As a last test of the quality of the results given by the RFI method in application to the toy model, in this section we compare 625 our method against two existent methods in the literature to identify response functions in the time domain. The comparison is performed for the particular case where the response function is known to be monotonic and also for the more general case where it is not. As a side issue this section reveals also some insight into the relation between the quality of the recovery of χ (t) as measured by the prediction of responses, and the quality of the recovery of χ (t) itself.
In climate science, the most commonly used method is to obtain χ (t) from an impulse response, i.e. the response to a 630 perturbation of Dirac delta-type (e.g., Siegenthaler and Oeschger, 1978;Maier-Reimer and Hasselmann, 1987; Joos and Bruno,  Nonlinearity factor a = 2.5 × 10 −4 (no monotonicity check); Third row: Nonlinearity factor a = 2.5 × 10 −4 (with monotonicity check). The noise is overestimated in the low-frequency spectrum in the third row because nonlinearities yield a derived χ (t) that does not obey the monotonicity constraint. As a consequence, the method increases the level of low-frequency components until the monotonicity constraint is obeyed. The failure to obey the monotonicity constraint and consequent large overestimation of noise in this case can be taken as an indication of the presence of nonlinearities in the response. Note that the "true" linear response function in this nonlinear case a = 0 is obtained analytically from the linear case a = 0 via Eq. (41) (see Appendix D). The regularization parameter determined by the algorithm is λ ≈ 3120 for the first row, λ ≈ 74 for the second, and λ ≈ 14611873 for the third. For more details see text.
1996; Thompson and Randerson, 1999;Joos et al., 2013). We call it here pulse method. Although this method is conceptually straightforward, in some cases it might not yield satisfactory results. Since the perturbation is only one "pulse", depending on the observable of interest it may give a response with small SNR. As a consequence, the recovered response function may be severely affected by noise. On the other hand, if the strength of the pulse is made large to obtain a good SNR, 635 the linear regime may be exceeded. In this case, the impulse response does not correspond anymore to the linear response function.
The second method consists of deriving the linear response function from a step response, i.e. the response to a Heavisidetype perturbation (e.g., Hasselmann et al., 1993;Ragone et al., 2016;MacMartin and Kravitz, 2016;Lucarini et al., 2017;Van Zalinge et al., 2017;Aengenheyster et al., 2018). We call it here step method. Due to the special form of this "step" 640 perturbation, the linear response function can in principle be derived from where f step is the step perturbation and ∆Y step is the corresponding response. Unfortunately, such derivation involves numerical differentiation, which is known to be an ill-posed problem (Anderssen and Bloomfield, 1974;Engl et al., 1996). Because the problem is ill-posed, noise is amplified, potentially resulting in large errors in the derived linear response function.

645
These two methods therefore share two limitations: First, they require a special perturbation experiment; second, because of noise in the data they might yield a response function with large errors. In principle, the second limitation may be overcome by using instead of a single response the ensemble average over multiple responses. But this comes at the expense of the numerical burden of performing multiple experiments, which is especially large when dealing with complex models such as state-of-the-art Earth System Models.

650
The main advantages of the RFI method lie precisely in overcoming these two limitations: It recovers the response function from any type of perturbation experiment and automatically filters out the noise by regularization.
For the results of this section, we performed ensembles of 200 simulation experiments with the toy model (see section 4.1).
Each ensemble member is defined by a realization of the noise η * (t) with a fixed standard deviation (see Eq. (34)). Each realization was added via Eq. (32) to three experiments: 1%, step (2×f 0 ), and pulse (4×f 0 ). Note that because of the issue 655 with the SNR mentioned above, we had to employ for the pulse experiment twice the forcing strength employed for the step experiment. Further, for each ensemble member an additional realization of the noise was generated to serve as a control experiment to compute the noise estimate for the RFI method (step 1 of Fig. 1).
We computed the response function by the pulse and step method as follows. For a pulse experiment the forcing is f (t) = aδ(t) with forcing strength a, so that the response is given by Therefore, for the pulse method we took the response from the pulse experiment and obtained the response function by The recovery by the step method was calculated by taking the response from the step experiment and applying Eq. (42). The derivative was computed by forward difference.
To obtain comparable results with these two methods, we recovered the response function by the RFI method from the same pulse and step experiments. To compare the quality of the results using also an experiment not decidedly tailored for the identification, we include additionally the recovery from the 1% experiment.
To obtain a quantitative comparison for the quality of the recovery for each method, we define the recovery error

670
where χ is the recovered response function and χ * is the "true" response function, which is known because we use the toy model. In contrast to the prediction error, that measures the quality of the recovery of χ (t) by means of the response (see Eq. (45)), the recovery error ε r measures the quality of the recovery of χ (t) itself. Another reason for introducing the recovery error is to compare its results with results from the prediction error. By doing that, we can gain insight into how much the prediction error can be trusted as an indirect measure of the quality of recovery in real applications, where the "true" response 675 function is not known.
First, we compare the pulse and step methods against the full RFI algorithm, i.e. the RFI algorithm taking monotonicity into account (step 6 in Fig. 1). Results are shown in Fig. 9. In the first row of subfigures, we took for the recovery the ensemble average over the 200 responses for each experiment. For the RFI method, we took the ensemble average over the control experiments as well to estimate the noise (step 1 of Fig. 1). As shown in subfigure (a 1 ), with this approach all methods recover 680 the response function almost perfectly. The quality of the recovery is quantified by the recovery error in subfigure (b 1 ). The RFI method shows the smallest values for the step and pulse experiments when compared to the step and pulse methods.
Overall, the step method clearly shows the largest value. To quantify the quality of the prediction, we plot in subfigure (c 1 ) the prediction error (36). As seen, values are even smaller than for the recovery error. Overall, we see a similar pattern: the step method again stands out, with other methods showing much smaller error values.

685
In the second row, we compare results by taking only a single response for the recovery. Since the quality of the recovery by the different methods may vary depending on the particular noise realization, we again performed 200 simulations to obtain better statistics, but this time deriving the linear response function for each ensemble member separately. Subfigure (a 2 ) shows an example of recovery for one of the ensemble members. As expected, the recoveries by the pulse and step methods largely deviate from the true response function. For the pulse method, the large errors result from the low SNR of the pulse response:

690
Even taking twice the forcing strength of the step experiment, the SNR of the pulse response is of order 10 0 against order 10 1 for the step and 1% responses. For the step method, on the other hand, the large errors are not a result of low SNR, but of the noise amplification associated with the ill-posedness of numerical differentiation. In contrast to the recovery by these two methods, because of regularization the recoveries by the RFI method are smoother and visually seem to better fit the true response function. To quantitatively check these results, we plot in subfigure (b 2 ) for each method the average and standard is probably related to the low SNR in the response from the pulse experiment. The results from the 1% and pulse experiments by the RFI method are better, showing comparable error magnitudes. The smallest average recovery error is obtained from the RFI method using the step experiment. In subfigure (c 2 ) we show the average and standard deviation over the 200 values 700 of the prediction error (36). The smallest average prediction errors are obtained from the RFI method using the 1% and step experiments. The largest errors are obtained for the pulse method and the RFI method using the pulse experiment. In contrast to the situation for the recovery error, for the prediction error no substantial difference between the two is found. Note also that when comparing recoveries from the same response (i.e. comparing "Pulse" with "RFI pulse " and " Step" with "RFI step "), the RFI method gives better results than both the pulse and step methods. Another interesting point is that prediction errors 705 for the step method remain approximately unchanged by taking the ensemble mean and a single response (compare " Step" in subfigures (c 1 ) and (c 2 )). Overall, as in the first row, the prediction error shows for each individual method values smaller than the recovery error. But now there is a difference between the plots for the recovery and prediction error: Although the pulse and step methods show the largest averages with values of comparable size for the recovery error, for the prediction error the pulse method has the largest average with a value much larger than the step method.

710
This difference can be better understood as follows (see MacMartin and Kravitz (2016) for more details including the influence of the forcing scenario). Because Eq. (1) is ill-posed, the convolution operator acts on χ (t) as a "low-pass filter" (see e.g. Bertero et al., 1995;Istratov and Vyvenko, 1999). This means that high frequencies in χ (t) are suppressed by convolution and show up damped in the response ∆Y (t). Hence, recoveries with large errors only at high frequencies tend to give relatively small prediction errors. Because of the low SNR, the pulse method yields a recovery of χ (t) with large errors both at high and 715 low frequencies. Although the errors at high frequencies are damped in the prediction, errors at low frequencies are not. Hence, the large recovery error results in a large prediction error. On the other hand, because of the good SNR for the step response, the step method gives a relatively good recovery of χ (t) at low frequencies, with large errors concentrated at high frequencies.
As a result, the large recovery error results only in a small prediction error. This suppression of high-frequency errors might also explain why the prediction error for the step method remains unchanged when recovering the response function from a 720 single response instead of the ensemble average. By comparing the recovery of χ (t) by the step method in Fig. 9(a 1 ) and (a 2 ), one sees that the main difference is indeed at high frequencies (the recovery in Fig. 9(a 2 ) is quite "noisy" but follows the long term trend). This is because the noise amplification has a larger effect on the recovery from the single response due to its larger noise level. But since low frequencies are well recovered in both cases, the resulting prediction errors are almost the same.
Overall, the analysis of Fig. 9 suggests two main conclusions. First, as expected, the prediction error gives indeed an indi-725 cation of the quality of the recovery, since good recoveries result in good predictions. But care should be taken when judging the recovery only from the prediction error, because a good prediction does not necessarily imply a good recovery: Due to the ill-posedness, Eq. (1) might damp large high-frequency recovery errors so that they do not show up in the prediction. Nevertheless, from a good prediction error one can still infer a good recovery at low frequencies, because at these frequencies large recovery errors result in large prediction errors. Since regularization filtering leaves only low-frequency terms in the recovery, 730 the RFI method shows in Fig. 9 small prediction errors associated to small recovery errors.
Second, by taking only a single response -and not the ensemble average -the full RFI algorithm gives on average smaller recovery and prediction errors than the pulse and step methods when comparing results obtained from the same experiment.
But the results above cover only the case where the full RFI algorithm is employed. In the following, we analyze also the case where monotonicity is not taken into account. For this purpose, we repeated in all detail the exercise that led to Fig. 9 735 but did not apply the additional noise level adjustment to enforce monotonicity of the response function. Figure 10(a) shows the results for the recovery error. Once more, the RFI method gives smaller values than the step and pulse methods when comparing the recovery from the same responses. In addition, now the recovery for the RFI method using the step experiment even improved in comparison to Fig. 9(b 2 ). The reason may be related to the numerical check for monotonicity: Depending on the tolerance value that is used to judge whether the recovered response function is monotonic, the additional adjustment 740 might actually overestimate the noise level, leading to slightly worse results.
Yet, the improvement brought by the additional noise level adjustment is clear when looking at the recovery error for the 1% experiment. Compared to Fig. 9(b 2 ), the average error increases substantially, and the spread is much larger (see inset for the whole value). As explained in section 4.4, this deterioration results from cases where the noise in the response is such that the spectral similarity assumption does not hold. Since here the noise estimate resulting from this assumption is not further 745 improved by the monotonicity check, the result is a poor recovery (see subfigure (b) for an example). But because the large errors are mostly at high frequencies, even poor recoveries are still sufficiently good for predictions, as shown by the small mean prediction error in subfigure (c) (see "RFI 1% "). Therefore, in contrast to the case where monotonicity is taken into account, here some small prediction errors are associated to large recovery errors.
Nevertheless, we find that although extreme, such poor recoveries are not frequent. In fact, extreme cases with recovery error  As discussed in section 5, the developed method to identify linear response functions is an alternative approach to existent methods in the literature, which require special perturbation experiments and often give results with large errors caused by noise. In contrast, the RFI method accounts in a systematic way for the noise and can be directly applied to data from any type of perturbation experiment once also a control experiment is given. Because it filters out the noise, its results show in the cases analyzed here a higher quality compared to results from previous methods when applied to the same data from a toy model.

775
And because it can identify response functions from any type of perturbation experiment, the method is particularly suitable for application to data from the C 4 MIP carbon cycle model intercomparison as shown in Part II of this study.
The main novelty of the method is the estimation of the noise level (steps 1-3 of Fig. 1), which is known to be critical for the application of regularization theory. When solving a problem by regularization, the most crucial step is the computation of the regularization parameter. To compute this parameter in a way that the solution converges to the "true" solution for decreasing 780 noise, methods need to account for the noise level (Bakushinskii, 1984;Engl et al., 1996). But in practical applications the noise level is rarely known. Therefore, methods to obtain good estimates are needed. Our new method to estimate the noise level consists essentially of two steps: First, estimating the high-frequency components from data and then the low-frequency components from the control experiment. While the second step is completely novel, the main idea behind the first step was already brought up in earlier studies (e.g., Hansen, 1990) and has recently been further developed by methods to compute the 785 "Picard parameter" (Taroudaki and O'Leary, 2015;Levin and Meltzer, 2017), which is different from the i max that we use in our method. The Picard parameter is computed as the index for which the components |u i • ∆Y | start to level off. Typically, from this index onwards the data can be interpreted as noise. For this reason, one might think that from the Picard parameter one can obtain all data components dominated by noise and thereby estimate the noise level. But this is not generally true: For instance, if the noise has large low-frequency components such as in Fig. 8(a 2 ), then the components |u i • ∆Y | level off at 790 an index larger than that at which the data starts to be dominated by noise, so that in this case the Picard parameter does not determine all data components dominated by noise. In our RFI method, the interest lies in obtaining not all data components dominated by noise, but only enough components to obtain the overall level of the high-frequency noise. For this purpose, we define instead of the "Picard parameter" the more conservative index i max , above which the singular values are zero and by the Picard condition also the "true" data components must be zero. In this way, we unambiguously identify data components 795 that contain only noise (see Eq. (24)). These components give the high-frequency noise level so that in the second step also the remaining low-frequency noise components can be estimated from the control experiment (step 3 of Fig. 1).
Because our noise level estimation is not particularly related to the problem of identifying response functions, it can in principle be applied to solve also other types of linear ill-posed problems (see e.g. Engl et al., 1996). In general, all one needs for the application is: 800 1. A problem of the type where given the matrix A and the noisy data y one is interested in finding x.
2. Data from a situation similar to the control experiment, where Ax = 0, so that the resulting y ctrl gives the noise term

805
3. The singular values of A decaying to values sufficiently close to zero to obtain i max .
Then, as long as both the Picard condition and the spectral similarity assumption hold, the method gives a reasonable noise estimate -since then, by assumption, the noise estimate is simply a scaling of the noise in the control experiment (see section 3.3.2) -by which the regularization parameter can be determined.
While the Picard condition is necessary for a solution to be recoverable from an ill-posed problem, the validity of the spectral 810 similarity assumption is less clear. An intuitive explanation for this assumption can be thought as follows. Since here the interest lies in identifying linear response functions, the perturbation to the system must be sufficiently weak so that the response can be considered linear. If the noise in the control experiment depends on the perturbation, a sufficiently weak perturbation will modify its characteristics only slightly. The RFI method accounts partially for this change by adjusting the overall level by which the noise increases. Nevertheless, it assumes that since the characteristics of the noise change only slightly, then the 815 spectral components of the noise in the perturbed experiment can be thought as having the same relative contributions as those in the control experiment. When in addition the response function is known to be monotonic, the estimate of the noise can be further improved (step 6 of Fig. 1), this time by adjusting the relative contribution of the spectral components: Since the high-frequency region is known from the spectral analysis of the response, then the components of the noise are adjusted in the low-frequency region; this is done iteratively until the resulting response function gets monotonic. Such additional adjustment 820 has been demonstrated to give good results in the applications in the present study and subsequent Part II for the special case where the response function can be considered monotonic.
Although it is assumed that χ (t) is given by the spectral form (8), this is not essential for our method. In principle, any functional form can be assumed for χ (t), or even none -in which case one would recover χ (t) pointwise. But compared to the simpler pointwise recovery of χ (t), assuming Eq. (8) has some advantages. The most obvious is that in contrast to the pointwise approach, with Eq. (8) both χ (t) and the spectrum can be recovered together. If χ (t) is recovered pointwise, the spectrum has to be derived in a second step from χ (t), which is also an ill-posed problem (Istratov and Vyvenko, 1999). Further, the description (8) restricts the function space for the recovered χ (t), forcing lim t→∞ χ (t) = 0 as is expected for most problems of interest, which greatly simplifies the problem compared to the case where χ (t) can assume any form. Our ansatz (8) has also advantages in comparison with the typical multi-exponential ansatz (7) assumed in most previous studies 830 (see discussion in section 3.1). When assuming that χ (t) is given by a sum of few exponents, an important problem is how to choose the number of exponents. The typical methods to choose this number rely on "quality-of-fit" criteria; but for illposed problems these criteria can be unreliable because in these problems a good fit does not mean that the derived parameters are close to the "true" parameters (see e.g. the famous example from Lanczos, 1956, p. 272). In our approach, as long as the distribution of time scales is appropriately prescribed and the data quality is sufficiently good, numerical results indicate 835 that the solution is approximately independent of the number of exponents (Appendix E). Moreover, compared to the multiexponential approach, our ansatz (8) has two additional advantages: The first is that it leads to the linear problem of finding only the spectrum q(τ ) -in contrast to the nonlinear problem of finding both the time scales τ i and the weights g i from Eq. (7) -, which permits an analytical solution and thereby gives more transparency to the method. The second is that compared to the assumption of only a few time scales, the ansatz of a continuous spectrum of time scales is typically more realistic for real 840 systems, which is e.g. the case for the carbon cycle study presented in Part II. One limitation is however that our ansatz (8) restricts the solution to systems with exponentially relaxing responses and vanishing oscillatory contributions.
In the present paper the robustness of our method has been investigated only for artificial data taken from toy model experiments. In this analysis, we not only knew the "true" response function underlying the data but also had control over the two complications that may hinder its recovery, namely the level of background noise and nonlinearities. Under these ideal 845 conditions, we could carefully examine the quality of the response functions identified by our RFI method. Nevertheless, such conditions are hardly met in practice. Therefore, the applicability of our method must be investigated as well for real problems.
Such an investigation is presented in Part II of this study.
Appendix A: Basic equations in this study are Fredholm equations of the first kind In this appendix we show that Eq. (1), Eq. (7), and Eq. (1) with χ (t) given by Eq. (7) are indeed special cases of the Fredholm 850 equation of the first kind, as claimed in sections 3.1 and 3.2. Since inverse problems in the form of this equation are well-known to be ill-posed (e.g., Groetsch, 1984;Bertero, 1989;Hansen, 2010), this clarifies the inherent difficulties in identifying linear response functions from perturbation experiment data.
A Fredholm equation of the first kind is an equation of the type (Groetsch, 1984) Clearly, by setting a := 0, k(t, s) := 0 ∀ s > t, and k(t, s) := k(t − s), one obtains the form of Eq. (1) -which can also be seen as a Volterra equation of the first kind (Olshevsky, 1930;Polyanin and Manzhirov, 1998;Groetsch, 2007).
That Eq. (7) is a special case of Eq. (A1) can be seen (Istratov and Vyvenko, 1999) by noting that Eq. (7) can be written in integral form as Since Eq. (A2) is a particular case of Eq. (A1) and Eq. (7) is a particular case of Eq. (A2), Eq. (7) is also a particular case of which is a special case of Eq. (A1). Thus, Eq. (1), Eq. (7), and Eq. (1) with χ (t) given by Eq. (7) can all be understood as Fredholm equations of the first kind. particular the logarithmic transformation (9) and a discretization of the representation (8) for the response function by means of a spectrum of time scales. Since χ (t) is assumed to be given by a spectrum of time scales according to Eq. (8), the discretization 875 must be performed both in the time and time scale domain.
Appendix C: Spectrum q(τ ) positive or negative for all τ implies χ (t) monotonic

925
This appendix is referred to on section 3.5 with the claim that a sufficient condition for χ (t) being monotonic is that all components q i have the same sign. The proof is as follows.
Now, by taking f = 0 in Eq. (D1) one obtains for this nonlinear case the noise from the control experiment: To define the combined noise η(t) + η(t), one must first define the nonlinear term η(t) from Eq. (39). For the nonlinearized 945 response from the toy model, this term is given by the nonlinear term in Eq. (D1), i.e.
Then, the noise term consists of the remaining terms of the nonlinear response Y nonlin after subtracting the "clean" linear response and the nonlinear term η, i.e.
Hence, the combined noise is given by Appendix E: Sensitivity of the recovered response function and spectrum to the parameters M , log τ min and log τ max of the RFI algorithm In this appendix, it is shown that as long as the extent and resolution of the discrete distribution of time scales approximates 955 the spectrum sufficiently densely, the derived spectrum q λ and the derived linear response function χ (t) are approximately independent of the number of time-scales M and on the limits of the distribution log τ min and log τ max . To isolate the effect of changes in M , log τ min and log τ max from the effect of noise, a relatively high SN R ∼ O(10 5 ) is taken. For the computations we took data from 1% experiments performed with the toy model described in section 4.1. No monotonicity needed to be accounted for (step 6 of Fig. 1).

960
Figs. E1-E5 show the recovery taking the same limits used throughout the paper (log τ min = −1 and log τ max = 5) but different number of time scales M . Figs. E6-E8 show the recovery keeping the number of time scales and the lower limit used throughout the paper (M = 30 and log τ min = −1) but changing the upper limit log τ max . Figs. E9-E11 show the recovery keeping the number of time scales and the upper limit used throughout the paper (M = 30 and log τ max = 5) but changing the lower limit log τ max . As expected, the results are approximately independent of the changes in the prescribed parameters.

965
The only substantial differences are found in the recovered spectra at time scales smaller than the time step ∆t = 1, thus time scales over which anyway only little information is given by data. These small time scales are also problematic because of the ill-posedness of the problem that suppresses high-frequency information from the solution (see Groetsch, 1984, section 1.1).           Code and data availability. The scripts employed to produce the results in this paper as well as information on how to obtain the underlying data can be found at http://hdl.handle.net/21.11116/0000-0008-0F02-6 (Torres Mendonca et al., 2021).