Identiﬁcation of linear response functions from arbitrary perturbation experiments in the presence of noise Part II. Application to the land carbon cycle in the MPI Earth System Model

. The Response Function Identiﬁcation method introduced in the ﬁrst part of this study is applied here to investigate the land carbon cycle in the Max Planck Institute for Meteorology Earth System Model. We identify from standard C 4 MIP 1% experiments the linear response functions that generalize the land carbon sensitivities β and γ . The identiﬁcation of these generalized sensitivities is shown to be robust by demonstrating their predictive power when applied to experiments not used for their identiﬁcation. The linear regime for which the generalized framework is valid is estimated, and approaches to improve 5 the quality of the results are proposed. For the generalized γ -sensitivity, the response is found to be linear for temperature perturbations until at least 6 K. When this sensitivity is identiﬁed from a 2 × CO 2 experiment instead of the 1% experiment, its predictive power improves, indicating an enhancement in the quality of the identiﬁcation. For the generalized β -sensitivity, the linear regime is found to extend up to CO 2 perturbations of 100 ppm. We ﬁnd that nonlinearities in the β -response arise mainly from the nonlinear relationship between Net Primary Production and CO 2 . By taking instead of CO 2 (cid:58) as (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) forcing the 10 resulting Net Primary Production as forcing (cid:58)(cid:58)(cid:58)(cid:58)(cid:58) instead (cid:58)(cid:58)(cid:58) of (cid:58)(cid:58)(cid:58)(cid:58) CO 2 , the response is approximately linear until CO 2 perturbations of about 850 ppm. Taking Net Primary Production as forcing also substantially improves the spectral resolution of the generalized β -sensitivity. For the best recovery of this sensitivity, we ﬁnd a spectrum of internal time scales with two peaks, at 4 and 100 years. Robustness of this result is demonstrated by two independent tests. We ﬁnd that the two-peak spectrum can be explained by the different characteristic time scales of functionally different elements of the land carbon cycle. The peak at 4 years results from the collective response of carbon pools whose dynamics is governed by fast processes, namely pools representing living vegetation tissues (leaves, ﬁne roots, sugars, and starches) and associated litter. The peak at 100 years results from the collective response of pools whose dynamics is determined by slow processes, namely the pools that represent the wood in stem and coarse roots, the associated litter, and the soil carbon (humus). Analysis of the response functions that characterize these two groups of pools shows that the pools with fast dynamics dominate the land carbon response only for times below 20 2 years. For times above 25 years the response is completely determined by the pools with slow dynamics. From 100 years onwards only the humus pool contributes to the land carbon response. taking data from standard C 4 MIP 1% experiments. To make sure the identiﬁed generalized sensitivities are indeed characteristics of the land carbon cycle in the MPI-ESM, we demonstrate their predictive power by applying them to predict the response of the model in several experiments that were not used for their identiﬁcation. In preparation for future studies applying these generalized sensitivities to study the dynamics of the carbon-climate system in C 4 MIP models, we also investigate various ideas to improve the quality of the recovery of the response functions by using additional types of data routinely available in C 4 MIP simulations or using log-transformed data to account at least partially for process-immanent nonlinearities that hinder the usage of experiment data from larger levels of forcing.


Introduction
In Part I of this study we developed a method to identify linear response functions from arbitrary perturbation experiments.
The RFI method (Response Function Identification method) was tested by means of artificial toy model simulations. Here, we 25 demonstrate the applicability of our method to a practical problem: We investigate the dynamics of the land carbon cycle in the Max Planck Institute for Meteorology Earth System Model (MPI-ESM; see Appendix A). In particular, we show how our RFI method provides insight into two aspects of central relevance to carbon-cycle research: The sensitivity of the land carbon cycle to changes in atmospheric CO 2 and its distribution of internal time scales.
To quantify this sensitivity in a more systematic way and thereby gain deeper insight into the land carbon dynamics one needs 85 a more general formalism. For small changes in atmospheric CO 2 one can shown :::: show : that by accounting for the memory of the carbon cycle these values are generalized :::::::: generalize : to linear response functions, which in turn can be seen as :::: turn ::: out :: to :: be properties of the land carbon cycle itself (Rubino et al., 2016;Enting and Clisby, 2019, see also Appendix B). As a result, these linear response functions characterize the land carbon sensitivities for any perturbation scenario. For this reason these functions will here be called land carbon generalized sensitivities.

90
The essential step to investigate the carbon-cycle dynamics within this general formalism is the identification of the generalized sensitivities. Typically, the γ and β values proposed by Friedlingstein et al. (2003) are obtained taking data from standardized C 4 MIP simulation experiments where, starting from an equilibrium state, atmospheric CO 2 concentration is increased by 1% each year :::: (e.g., ::::::: Since data from such 1% experiments performed with several Earth System Models ::::: C 4 MIP ::::::: models 95 are readily available in international databases, one would be interested in identifying the generalized γand β-sensitivities as well from these experiments. But methods in the literature to identify response functions from data require special perturbation experiments, and C 4 MIP experiments were not tailored for this purpose. It is here that our RFI method is useful: Because it was . :::: Our :::::: method :: is : designed to derive response functions from experiments driven by any arbitrary perturbation : . :::: Thus, in the present study we show that by this method one can robustly derive the land carbon generalized sensitivities for the MPI-ESM 100 taking data from standard C 4 MIP 1% experiments. To make sure the identified generalized sensitivities are indeed characteristics of the land carbon cycle in the MPI-ESM, we demonstrate their predictive power by applying them to predict the response of the model in several experiments that were not used for their identification. In preparation for future studies applying these generalized sensitivities to study the dynamics of the carbon-climate system in C 4 MIP models, we also investigate various ideas to improve the quality of the recovery of the response functions by using additional types of data routinely available in 105 C 4 MIP simulations or using log-transformed data to account at least partially for process-immanent nonlinearities that hinder the usage of experiment data from larger levels of forcing.
Apart from giving a systematic quantification of the sensitivities, : as :::::: hinted ::::: above linear response functions can be a powerful tool to gain insight into the internal dynamics of the carbon cycle. Because response functions fully characterize the linear response of a system, they contain information on its distribution of internal time scales, i.e. the weights with which character-110 istic time scales from internal processes contribute to the macroscopic response of the system. These weights may shed light into which are the most relevant processes to the response at different time scales. For our application to the land carbon, such information may give hints :::::: insight into the main processes influencing the model spread found in the C 4 MIP results.
But ::::::: However, : while several studies have tried to obtain the weights of different time scales in the carbon cycle by fitting response functions to a sum of few exponents (Maier-Reimer and Hasselmann, 1987;Enting and Mansbridge, 1987;Enting, 115 1990; Joos et al., 1996;Pongratz et al., 2011;Joos et al., 2013;Colbourn et al., 2015;Lord et al., 2016), in principle it is not clear to what extent such results can be trusted, let alone interpreted. The reason is that finding these weights from data is a severely ill-posed problem that requires special methods to be dealt with (Istratov and Vyvenko, 1999). In contrast to the classical fitting procedures, by employing a regularization technique (Groetsch, 1984;Engl et al., 1996) combined with a novel estimation of the noise level in the data (see Part I) our RFI method accounts for this ill-posedness. In addition, instead of 120 assuming that the response functions result from only few time scales, the RFI method recovers a continuous spectrum of time scales, in agreement with what one would expect when studying the carbon cycle response (Forney and Rothman, 2012a). In the present work we show that, in contrast to results obtained with classical fitting procedures, spectra recovered by the RFI method may be reliable and even interpretable. For this purpose, we investigate a relatively detailed spectrum of time scales that arises from a high-quality recovery of the generalized β-sensitivity. We examine (i) the robustness of the obtained spectrum 125 and (ii) the explanation for its time-scale structure.
An additional novelty introduced here is a simple procedure to estimate the linear regime of the response, i.e. the range of perturbation strengths for which the response of the system can be considered linear. As discussed in Part I, the presence of traces of nonlinear responses in the data can severely deteriorate the recovery of the response function. Hence, to make sure that the data from which the response function is recovered contain no strong nonlinearities, one must be able to estimate the 130 linear regime of the response. Because the response functions will be derived from 1% experiments, we introduce a technique to estimate with the aid of additional simulations the linear regime from this type of experiment. By this technique the linear regime of the response of land carbon to changes in CO 2 and climate for the MPI-ESM will be estimated.
The outline of the paper is as follows. In the next section we introduce the methodology of the study including the RFI method, the C 4 MIP experiments, and the technique to estimate the linear regime of the response from "percent" experiments.

135
In sections 3 and 4 we identify and investigate the generalization of the γand β-sensitivities in the MPI-ESM. In section 5 we investigate the detailed spectrum of time scales obtained for the generalized β-sensitivity. In section 6 the results are summarized and discussed.

Methodology
In this section we introduce the methodology employed throughout the study. We start by briefly introducing the RFI method 140 (for a detailed description please refer to Part I), the C 4 MIP-type experiments considered here, and technical details for the identification of the generalized sensitivities. Then, we present our procedure to estimate the linear regime of the response from "percent" experiments.

RFI method and C 4 MIP experiments
The RFI method identifies the response function χ (t) taking data from the response Y (t) -in our example the global land 145 carbon -and the perturbation f (t) -atmospheric CO 2 or temperature (see below) -assuming the following ansatz based on linear response theory :::::: relation : (see Part I): where η(t) is a noise term. Here ∆Y (t) and ∆f (t) mean that we are taking only the change in the variables with respect to their equilibrium values from a control simulation. Following Forney and Rothman (2012a), the spectrum of internal time 150 scales is obtained by assuming that the response function χ (t) can be represented by an overlay of exponential modes: To account for the large range of time scales in the carbon cycle (Ciais et al., 2013) it is useful to rewrite Eq. (4) in terms of log 10 τ , so that The spectrum of time scales is then given by q(τ ), which following the terminology from Part I we call simply spectrum. The problem is solved by discretizing Eq. (3) and Eq. (5), prescribing a distribution of time scales τ , taking the data on ∆Y (t) and ∆f (t), and solving a minimization procedure for the spectrum q(τ ). The parameters to prescribe the distribution of time scales are taken identically to those chosen for the application to the toy model in Part I. To treat the ill-posedness we employ Tikhonov-Phillips regularization (Phillips, 1962;Tikhonov, 1963) in a Singular Value Decomposition (SVD) framework that 160 gives the solution by the expansion (Hansen, 2010;Bertero, 1989) where M is the number of time scales, u i and v i are the singular vectors, σ i are the singular values, λ is the regularization parameter, and f i (λ) are the filter functions 165 The regularization parameter λ is determined by the discrepancy method (Morozov, 1966) with noise level estimated from a SVD analysis of the data combined with information from the control simulation. Once the spectrum q λ is found, the response function χ (t) follows from Eq. (5).
All linear response functions are identified by the RFI method taking data from C 4 MIP-type experiments performed with the MPI-ESM. We focus on identifying the response functions from standard 1% experiments that are widely available in 170 international databases. In addition, to examine the quality of the results, we identify as well some response functions taking data from additional experiments. To investigate the robustness of the identified response functions, we employ them to predict the response of the MPI-ESM in several experiments not used for the identification. A summary of the experiments considered in the study is given in Table 1, with forcing scenarios shown in Fig. 1.
The variables taken for the identification of the response functions are the ones relevant for the quantification of the land 175 carbon sensitivities γ and β, respectively: (a) The change in global land carbon in response to changes in global land temperature; (b) The change in global land carbon in response to changes in atmospheric CO 2 .
Global land carbon is computed as the sum of the total land carbon over all grid cells of the model. Global land temperature is calculated as the mean near-surface air temperature over land at 2 m height. The changes are computed as with Y P I being the mean value of observable Y from a control simulation at pre-industrial conditions. Since the main interest lies in long-term variations, annual mean data are used.
As demonstrated in Appendix C, the generalization of the β-sensitivity can be shown to be monotonic. Therefore, in the following we will derive it employing the additional noise level adjustment in the RFI algorithm (step 6 of Fig. 1 in Part I).
Since the generalization of the γ-sensitivity is not known to be monotonic, for this sensitivity the RFI algorithm will be applied 185 without the additional adjustment. 2.2 Estimating the linear regime from "percent" experiments As described in Part I, the recovery of response functions is cursed :::::::::: complicated by the presence of noise and nonlinearities.
The RFI method is designed to cope with the former. In the present section we present a technique to cope with the latter, but 190 at the expense of performing additional response experiments. This technique will serve as a complement to the RFI method in the application to the land carbon cycle in the following sections. By these additional experiments we will determine the range of forcing strengths for which the response can be considered linear -an analysis in general not possible using only the control and perturbation experiments on which the RFI method is based.

210
:: To :::::::: introduce ::: our ::::::::: technique :: we : use the example of simulations with the linear toy model employed in Part I. To demonstrate the effect of nonlinearities on the recovery of χ (t), following Part I we artificially add a nonlinear term −aY 2 (t) to the linear response Y (t) of the toy model: In this way we obtain a nonlinear response Y nonlin (t), with nonlinearity strength controlled by the parameter a, from which 215 χ (t) is derived. In addition to including this nonlinear term in the toy model response, to introduce the technique we will need to quantify the quality of the recovery of the response function. Since in our application to the land carbon the "true" response function is not known a priori, following Part I we quantify the quality of the recovery indirectly by measuring the quality with which the recovered response function can predict the response of the model in experiments not used for the recovery itself.
For this purpose we introduce the :::::: relative : prediction error where stands for the convolution operation, ∆Y k and ∆f k are the response and the perturbation in experiment "k", and χ is the response function recovered from the 1% experiment.
We can now present our technique. Taking first a purely linear situation (a = 0) we show in Fig. 2(a) the :::::: relative : prediction error (10) when using the response function obtained from a 1% simulation to predict the response from two other 225 %-simulations with smaller growth rate. More precisely, performing a sequence of 1% experiments for increasingly longer simulation periods, we calculated for each experiment the response function and used it to predict the response for a 0.5% and 0.75% experiment covering the same simulation period. Then we plotted in Fig. 2(a) the :::::: relative : prediction error against the final forcing strength of the 1% experiment. As a result, it is seen that with increasing final forcing strength the :::::: relative prediction error decreases. This happens because in this linear case the SNR is increasing with increasing simulation period, i.e. with increasing final forcing strength, so that the recovery of the response function continuously improves.
Calculating the :::::: relative prediction error only for experiments with smaller growth rate gets important in the next case where nonlinearities are considered ( Fig. 2(b)). This plot was obtained by the same procedure except that we took for the nonlinearity parameter a value a > 0. As seen, in this nonlinear case the ::::: relative : prediction error is first improving but deteriorating afterwards. For small forcing, nonlinearities are small and therefore the :::::: relative prediction error behaves as in the linear case, i.e. it 235 decreases with final forcing strength. But when forcing strengths get larger, nonlinearities start to contribute substantially to the response, thereby causing a deterioration of the recovery of the response function and consequently the :::::: relative prediction error once more increases. This increase of the prediction error can be unambiguously traced back to the presence of nonlinearities in the 1% simulation because the prediction error was calculated only for experiments with smaller growth rate, i.e. smaller forcing strength throughout the whole simulation. Therefore, nonlinearities contribute already substantially to the response in 240 the 1% simulation before they get relevant in the other experiments. Accordingly, with this type of experiment setup we can be sure that the increase in the prediction error comes solely from the deterioration of the recovery of the response function and not from nonlinearities in the additional experiments used for calculating the prediction error.  Part I). Therefore, for these special cases we also modify the algorithm to interpret the two smallest singular values as a plateau, since their small magnitude makes them have a similar effect to those singular values belonging to the plateau itself. In addition, to illustrate the most general case where χ (t) is not known to be monotonic, we exclude here the monotonicity check (step 6 of Fig. 1 in Part I).
Obviously, for forcing strengths smaller than at the minimum, nonlinearities do not hinder the recovery of the response function so that one can consider this to be the regime of linear system behaviour. In view of the trade-off between noise and 245 nonlinearities, for the 1% experiment the response function is thus optimally recovered when taking as final forcing strength the value at the minimum of the prediction error curve. Similarly, if the error curve has no minimum (as in the linear case shown in Fig. 2(a)) the optimal recovery is obtained from the experiment with the largest forcing strength.
With the presentation of this additional technique to identify the linear regime we are ready for the application to the MPI-ESM in the next sections.
3 Generalized sensitivity χ γ In this section we identify from MPI-ESM simulations the linear response function χ γ (generalization of the ::::::::: generalized γ-sensitivity), defined by where now the response is ∆Y (t) := ∆C rad (t) and the forcing is ∆f (t) := ∆T (t), with ∆C rad (t) being the change in global 255 land carbon obtained in the "rad" experiment (see Table 1) and ∆T (t) the change in global land temperature.
That χ γ indeed generalizes γ can be understood by considering that γ is defined by for a negligible noise η(t). From Eq. (12) it is clear that by knowing χ γ (t) one can compute the response ∆C rad (t) and thereby γ(t) for any time-dependent perturbation ∆T (t), as long as the perturbation strength is small. Hence, χ γ (t) can be 260 seen as a property of the land carbon system and a generalization of γ(t).

Estimating the linear regime
As a first step in obtaining a proper approximation of χ γ (t), we investigate what maximum forcing strength can be used to assure that the recovered χ γ (t) is not spoiled by the presence of nonlinearities. Using the technique introduced in section 2.2, we show in Fig. 3 the :::::: relative prediction error (10) for χ γ (t) recovered from the 1% rad experiment as a function of the final 265 forcing strength in the 1% rad experiment. There is no clear minimum so that for the data available the recovery of χ γ (t) seems not to be limited by nonlinearities. For optimal recovery we thus take the full time series, i.e. the maximum final forcing strength.
χ γ and the quality of its recovery The quality of the recovery can in principle be improved by taking an experiment with better SNR. To investigate if the recovery Figure 3. Prediction :::::: Relative :::::::: prediction : error (10) for the 0.5% and 0.75% rad experiments using χ γ (t) recovered from the 1% rad experiment. The error is shown as function of the final forcing strength used for the recovery of χ γ (t). No clear minimum is found so that the recovery seems not to be limited by nonlinearities. rad experiment, the temperature behaves approximately as a step function (ignoring the transient in the first 20 years), which gives a Laplace transform ∆ T (p) proportional to 1/p. This means that for the same χ γ , the first term on the right-hand side 280 of Eq. (13) -the "clean" response -decays to zero faster for the 1% rad experiment than for the 2×CO 2 rad experiment.
Hence, assuming the same noise η for both cases, the response from the 2×CO 2 rad experiment gets buried in the noise only at larger p, meaning that this experiment carries useful information until higher rate values p than the response from the 1% rad experiment.
The response function χ γ (t) recovered from the two types of experiments is shown in Fig. 5(a). As expected, the different 285 experiments indeed result in different recoveries. Because we know from the analysis of Fig. 3 that nonlinearities do not limit the recovery, the difference between the two response functions results probably from the difference in the quality of the data from the two types of experiments. To compare the robustness of each recovery, we analyze how well they predict the response from other experiments (Figs. 5(b) and (c)). If the response function is correctly recovered, it should be able to predict not only experiments with smaller but also experiments with higher forcing rate. Therefore, we include in the analysis also 1.5% 290 and 2% rad experiments. To exclude errors that may be caused by nonlinearity, we take as a conservative estimate of the linear regime forcing strengths smaller than the final forcing strength at the end of the 1% rad experiment (which is approximately the maximum forcing strength; see temperature value at t = 140 years for the 1% rad experiment in Fig. 4). We take these values as an estimate of the linear regime because the 1% rad experiment has only 140 years so that no estimate for higher forcing strengths is available. Hence, for the 1.5% and 2% rad experiments the responses are expected to be reasonably predictable at 295 least until the values marked with circles, where their respective forcing strengths reach this maximum forcing strength. All other experiments have forcing strengths smaller or equal to this maximum forcing strength so that they should be predictable for the whole time series.
Figure 5(b) shows the quality of the prediction using χ γ (t) recovered from the 1% rad experiment. Visually, model response and prediction seem to have a comparable quality of agreement across the 1.1×CO 2 , 0.5%, 0.75% and 1.5% rad experiments, 300 while for the 2% and 2×CO 2 rad experiments there are larger discrepancies. For a quantitative analysis, we compute for the estimated linear regime the :::::: relative prediction error (10) for each experiment (right side of the plot). It is seen that the error varies from less than 10% for the 0.75% and 1.5% rad experiments to values between 10-20% for the 0.5%, 2% and 2×CO 2 rad experiments, and a significantly larger value of 57% for the 1.1×CO 2 rad experiment. To better understand these differences, it is important to note that as long as nonlinearities are small, experiments with higher forcing strength are expected to have 305 smaller prediction error because they have higher SNR. This can be made plausible by considering that a perfectly recovered response function predicts a "clean" linear response (infinite SNR) with zero error, whereas the same response function can predict a noisy response only with some finite error. Therefore, if χ γ (t) is well recovered, we expect large prediction errors for experiments with small forcing strengths such as the 1.1×CO 2 rad -which is indeed the case -, but small errors for experiments with large forcing strengths but still well within the linear regime such as the 2×CO 2 rad (compare the forcing 310 strengths for the 2×CO 2 rad experiment and the maximum forcing strength for the 1% rad experiment in Fig. 4). Since contrarily to the expectation the prediction error is not particularly small for the 2×CO 2 rad experiment, probably the recovery of χ γ (t) derived from the 1% rad experiment is not completely accurate and may still be further improved. As suggested above, such improvement may be achieved by taking data with better quality from the 2×CO 2 rad experiment.
Figure 5(c) shows the quality of the prediction using χ γ (t) recovered from the 2×CO 2 rad experiment. As expected, results

315
indicate an improvement in the recovery (compare to subfigure ::: (a) χ γ derived from 1% and 2×CO 2 rad (b) Prediction from χ γ derived from 1% rad (c) Prediction from χ γ derived from 2×CO 2 rad This section therefore suggests two main conclusions: First, for χ γ (t) the response seems to be approximately linear for temperature perturbations up to at least 6 K. Second, the overall improvement of the prediction in Fig. 5(c) compared to 320 Fig. 5(b) confirms the expectation from the Laplace transform analysis that indeed the step-like 2×CO 2 rad experiment carries more information on the response function than the 1% rad experiment. This suggests that step-like experiments may be more appropriate than the standard 1% experiment for the identification of linear response functions.
That χ β indeed generalizes β can be understood analogously to section 3 by considering that β is defined by :::::::: :::::::: for a negligible noise η(t). Hence, χ β (t) can be seen as a generalization of β(t) and a property of the land carbon cycle that characterizes the response ∆C bgc (t) to any time-dependent perturbation ∆c(t) ::::::::: ∆C atm (t), as long as the perturbation strength is small.
For the second and third approaches, we take advantage of the fact that in the "bgc" setup, perturbations in atmospheric CO 2 affect land carbon only via changes in photosyntetic productivity ::::::: primary :::::::::: productivity 1 . Therefore, we use the CO 2 forcing only indirectly via its relationship to NPP. By doing this, the hope is to account for some of the nonlinear contributions to 350 the response that arise from this relationship. The advantage of accounting for these nonlinear contributions is that one can recover the response function from experiments with perturbation strengths larger than those possible when not accounting for nonlinear contributions (as in the first approach). Experiments with larger perturbation strengths give responses with higher SNR, which makes it possible to recover the response function with better quality.
Accordingly, by this approach we compute χ β (t) in three steps: First, we identify the response function χ N P P (t) using Eq. (17); second, we take the first derivative of NPP with respect to CO 2 at c = c P I ::::::::::: C atm = C 0 atm ; third, we apply formula (21) to obtain χ β (t) from χ N P P (t).
Checking nonlinearities with the three approaches 370 Before analyzing the recovery of χ β (t) employing the three approaches, one must verify that indeed these two additional approaches account for some of the nonlinearities in the response. If this is true, response formulas (16) and (17) should be able to predict the response to larger perturbation strengths than Eq. (14). To verify this expectation, in the following we compare the :::::: relative : prediction error (10) by applying Eqs. (14), (16) and (17).
Figure 6(a) shows the :::::: relative prediction error for χ β (t) recovered from the 1% bgc experiment with the first approach 375 (Eq. (14)). The prediction is also computed via Eq. (14). The clear minima indicate the presence of strong nonlinearities for forcing strengths above around 100 ppm (94 ppm for the 0.5% and 133 ppm for the 0.75%). Therefore, in contrast to the case of χ γ discussed in the last section, we see here that indeed one has to cope with the additional difficulty of nonlinearities. Figure 6(b) shows the prediction error when using χ ln β (t) recovered from the 1% bgc experiment with the second approach (Eq. (16)). To check how well nonlinearities are accounted for by taking the logarithmic forcing, the prediction is as well Figure 6(c) shows the prediction error for χ N P P (t) recovered via Eq. (17) from the 1% bgc experiment (first step of the 385 third approach). To check how well nonlinearities are accounted for by taking the NPP forcing, we employ as well Eq. (17) for the prediction. Here, we see a substantial improvement in the results. The response is almost completely linear, with very "flat" (14) Figure 6. Prediction :::::: Relative :::::::: prediction : error (10) for the 0.5% and 0.75% bgc experiments obtained when using χ β (t), χ ln β (t) and χ N P P (t) obtained from the 1% bgc experiment to predict the response. The error is shown as a function of the CO2 final forcing strength.
minima. This indicates that indeed a large part of the nonlinearity encountered in the response of the land carbon to changes in CO 2 can be explained by the nonlinear relationship between NPP and CO 2 . Accordingly, by employing Eq. (17)  χ β and the quality of its recovery So far, we have only considered the ability of Eqs. (14), (16) and (17) to predict the land carbon response. Now, we analyze how well the generalized sensitivity χ β (t) can be identified by the three approaches. For the identification we took data from the 1% bgc experiment until the CO 2 forcing strength reaches the first minimum for each case in Fig. 6: ∆c ::::::: ∆C atm = 94 ppm 395 for the first approach (30 years); ∆c :::::: ∆C atm : = 133 ppm for the second approach (40 years); and ∆c :::::: ∆C atm = 279 ppm for the third approach (70 years). Since ∆c ::::::: ∆C atm = 279 ppm is approximately the forcing strength for the 2×CO 2 bgc experiment and results from last section suggested that this type of experiment may carry more information for the identification, we employ the third approach also taking the 2×CO 2 bgc experiment. For the present application where the recovery is limited by nonlinearities, taking the 2×CO 2 bgc experiment has the additional advantage that because its forcing strength has a constant 400 value throughout the whole experiment, we can use the full time series (140 years). To compute the first derivative of NPP with respect to CO 2 (second step of the third approach), we fitted the function N P P = N P P (c) ::::::::::::::::::: N P P = N P P (C atm ) to polynomials of order 4, 5 and 6, and then took the first derivative from the fits.
The results from the three approaches are shown in Fig. 7(a). At short time scales there is an overall agreement among all recoveries with only small discrepancies. To be able to compare the results also for longer time scales, we extend the response 405 functions recovered from the 1% bgc experiment -obtained from time series with 30, 40 and 70 years respectively for the first, second and third approaches -until 140 years (extensions are indicated by dotted lines). This can be done because with the RFI method we derive the response function from the ansatz (5), which formally gives the values of the response function for all times. The result is that all response functions recovered from the 1% bgc experiment present relatively small discrepancies even at long time scales. Response functions derived from the 2×CO 2 bgc experiment with the third approach show a similar behaviour among themselves, but differ from the recoveries using the 1% bgc experiment. The reason for this difference will be investigated below.
To quantitatively compare the quality of the recoveries, we plotted in Fig. 7(b) the :::::: relative : prediction error (10). Since the response functions were derived using different time series lengths, for a fair comparison we compute the error only at the minimal time series length of 30 years (the time series length used for the first approach). Results show no large discrepancies 415 among the different approaches. For the third approach, there seems to be a small advantage in using a polynomial of order 6 for the computation of the derivative.
But results from subfigure :::: Fig. : 7(b) reflect only the quality of the recoveries at short time scales, for which anyway no large discrepancies were encountered in subfigure ::: Fig. :: 7(a). To evaluate the quality of the recovery also at long time scales, one must take the extended version of the response functions that were derived from shorter time series. Since the only substantial 420 difference at long time scales in Fig. 7(a) is found between the response functions recovered from the 1% and 2×CO 2 bgc experiments, we take for this analysis exemplarily only the response functions recovered with the first approach (1% bgc experiment) and the third approach (2×CO 2 bgc experiment). By choosing the response function recovered with the first approach, we evaluate for the worst case scenario (where only 30 years are used for the recovery) how reliable predictions are for time scales longer than the time series used for recovery. In contrast, by choosing the response function recovered with 425 the third approach from the 2×CO 2 bgc experiment, we check whether the different values at long time scales are actually an improvement in the recovery. As mentioned above, such improvement is expected because this response function was recovered taking the full time series (140 years).
Following the same procedure as in the last section, in Fig. 8 we show the quality of the prediction for different experiments using the aforementioned recoveries of χ β (t). Subfigure ::::: Figure :: 8(a) shows the results for the predictions calculated via Eq. (14) 430 using χ β (t) recovered with the first approach. We take as an estimate of the linear regime forcing strengths smaller than the forcing strength at the first minimum in Fig. 6(a). The response values corresponding to this forcing strength are marked in Fig. 8(a) with circles. It is seen that although χ β (t) was recovered using a time series of only 30 years, it predicts the 1.1×CO 2 bgc experiment with only 3% error over 140 years. Other experiments are predicted within the linear regime with error smaller than 10% with exception of the 2% bgc experiment, for which the error is around 14%. Since the 2×CO 2 has a constant forcing 435 strength outside the linear regime already from the beginning, its prediction fails as expected for the whole time series.
In subfigure ::: Fig. :: 8(a) we could only evaluate the quality of the long time scales of χ β (t) by the prediction of the 1.1×CO 2 bgc experiment, because this is the only experiment which has forcing strengths within the linear regime over the whole time series. To check the ability of χ β (t) to predict also other experiments at long time scales, we account for some of the nonlinearity in the response by taking NPP instead of CO 2 as forcing (see discussion of Fig. 6). Therefore, we perform the 440 prediction by employing Eq. (17) instead of Eq. (14). Since for employing Eq. (17) one needs χ N P P (t) and not χ β (t), we take the χ β (t) derived with the first approach and compute χ N P P (t) from it via Eq. (21). Because the conversion from χ β (t) to χ N P P (t) is a simple scaling, the time scales structure is maintained. Hence, we can evaluate the quality of the recovered χ β (t) from the results given by the obtained χ N P P (t). The prediction results computed via Eq. (17) with the obtained χ N P P (t) are ∆C 1% atm : recovery with first approach from 1% bgc experiment; log(c) 1% :::::::::: log(Catm) 1% : recovery with second approach from the 1% bgc experiment; N P P 1% polx : recovery with third approach from the 1% bgc experiment using for the derivative a polynomial fit of order x; N P P 2× polx : recovery with third approach from the 2×CO2 bgc experiment using for the derivative a polynomial fit of order x. Continuous lines denote values of χ (t) that are within the time series length used for the recovery (30 years for ∆c 1% ::::: ∆C 1% atm , 40 years for log(c) 1% :::::::::: log(Catm) 1% and 70 years for N P P 1% polx ). Dotted lines denote extended parts of the response function, i.e. values not covered by the time series used for the recovery but obtained from the recovered spectrum using Eq. (5). Circles denote the response functions derived taking the full time series (N P P 2× polx ). For more details see text.
(a) Prediction by Eq. (11) ( χ β (t) derived with first approach) (b) Prediction by Eq. (14) ( χ β (t) derived with first approach, converted to χ NPP (t) via Eq. (18) shown in subfigure ::: Fig. : 8(b). Because errors at the minima in Fig. 6(c) are not substantially smaller than those at the maximum 445 final forcing strength, we take as an estimate of the linear regime all values smaller than the last value of NPP for the 1% bgc experiment (marked with circles in the responses). Once again it is seen that although χ β (t) was recovered using a time series of 30 years, after conversion to χ N P P (t) almost all experiments can be predicted with less than 10% error for the whole time series. The 1.5% and 2% bgc experiments are predicted with errors of 17% and 4% within the linear regime. Results from subfigures :::: Figs. :: 8(a) and : 8(b) therefore suggest that although nonlinearities do restrict the recovery from Eq. (14), taking 450 experiments with forcing strengths within the linear regime for the recovery leads to reliable prediction results even for times reasonably longer than the length of the time series from which the response function was recovered.
Finally, we investigate whether the different values seen in Fig. 7(a) for the χ β (t) recovered from the 2×CO 2 bgc experiment indeed reflect a better quality of recovery. Following the same reasoning that led to subfigure ::: Fig. :: 8(b), since in the third approach χ β (t) is obtained from a scaling of χ N P P (t), we evaluate the quality of χ β (t) from the results given by χ N P P (t).

455
Accordingly, in subfigure :::: Fig. : 8(c) we plot the prediction via Eq. (17) using the χ N P P (t) recovered from the 2×CO 2 bgc experiment. As expected, the overall prediction indeed improves compared to subfigure ::: and : 8(c) differ only at long time scales, this improvement suggests that indeed obtaining χ β (t) from the 2×CO 2 bgc experiment gives a better recovery over these time scales. Further, because all recoveries are similar at short time scales (see Fig. 7(a)), 460 overall this recovery shows the best quality.

Spectrum of land carbon time scales
A response function obtained with high quality not only results in more accurate predictions, but may also provide valuable information about the internal dynamics of the system. For the case of χ β (t), we find that the recovery with best quality gives a relatively detailed description of the spectrum of land carbon time scales (see Eq. (5)). In this section, we investigate (i) 465 the robustness of this result and (ii) the explanation for the structure of the obtained spectrum. The robustness of this detailed spectrum must be analyzed because, as explained in Part I, the problem to recover the spectrum of time scales from data is ill-posed so that in principle it is not clear to what extent the recovered spectrum can be trusted. To investigate this robustness, we check whether the main characteristics of the spectrum recovered by our RFI algorithm can as well be obtained by two independent methods: a Gregory-plot approach (Gregory et al., 2004) and an approach that combines regional responses for 470 the tropics and extra-tropics. Since the best recovery of χ β (t) was obtained from the response to NPP for the 2×CO 2 bgc experiment, in our investigations we will study only this case.

Obtaining the detailed spectrum
But before we investigate the recovered spectrum, we demonstrate how such a detailed structure may arise from a better recovery of the response function. In Fig. 9(a) we plot the spectrum q(τ ) of the response function used for Fig. 8(b), i.e. χ β (t) 475 recovered with the first approach (see beginning of section 4) and converted to χ N P P (t) via Eq. (21). Because the data used for the recovery have a relatively low quality -the response function was recovered from the 1% bgc experiment taken for only 30 years -, regularization filters out most of the SVD components of the spectrum in Eq. (6). Since the low-index SVD components that are not filtered out tend to be smooth (Hansen, 1989(Hansen, , 1990, see also Part I), the final result of this filtering is the smooth spectrum seen in Fig. 9(a). Obviously, although this smooth spectrum is a sufficiently good approximation to make 480 the predictions shown in Fig. 8(b), it is not very informative about the internal time scale structure. Instead, the spectrum of the response function χ N P P (t) used for Fig. 8(c) has a more detailed structure ( Fig. 9(b)). In this case, the higher quality of the data used for the identification (2×CO 2 bgc experiment taken for 140 years) results in less filtering by regularization, thereby revealing more details of the underlying spectrum. The result is a spectrum with two peak time scales, at around 4 and 100 years 2 .

490
idea here is to try to identify from an independent method important time scales in the response so that they can be compared against the spectrum in subfigure ::: As seen, also the Gregory plot ::: this :::::::::::::: linear-regression :::::::: procedure : reveals a time scale structure 500 dominated by two time scales. While the shortest time-scale peak of the spectrum partially overlaps with the corresponding time-scale range from the Gregory :::::::::::::: linear-regression plot, the longest time-scale peak is almost perfectly matched 3 . This result suggests that indeed the recovered spectrum reflects internal characteristics of the global response of the land carbon cycle. 2 We ignore the negative values obtained for time scales smaller than 1 year (data time step) and larger than 140 years (time series length) because spectral contributions at time scales much longer or much shorter than the time scales covered by data cannot be correctly recovered. Yet, as demonstrated in Appendix D, their wrong recovery does not strongly affect the recovered response function so that they can be safely ignored. Other slightly negative values between the two peaks are probably small recovery errors that inevitably appear as a consequence of ill-posedness and the filtering by regularization (such slightly negative values have also been observed in recovered spectra shown in Appendix E). 3 A possible reason for the better matching at long time scales is that these time scales contribute to the response at short and long times, while contributions at short time scales decay rapidly and therefore contribute only at short times. This gets clear by considering for example a response function χ (t) = a 1 e −t/τ 1 + a 2 e −t/τ 2 , τ 2 τ 1 . While a 2 e −t/τ 2 contributes to χ (t) at small and long times, a 1 e −t/τ 1 contributes only at small times.
(a) First approach taking CO 2 as forcing (1% bgc experiment) (b) Third approach taking NPP as forcing (2×CO 2 bgc experiment) (c) Gregory-type plot for land carbon Figure 9. Spectra associated to χ N P P derived with different resolutions and Gregory : a :::::::::: Gregory-type : plot for :: to :::::: identify ::: the :::::::: time-scale Checking the robustness of the spectrum via regional response functions The robustness of this detailed spectrum can be further checked by a different method. In the following, we test this robustness 505 by checking the consistency between the time scales of global and regional carbon responses.
To study regional carbon responses we considered two regions: Tropics and extra-tropics. Tropics were defined as the region between latitudes 30°south and 30°north, and extra-tropics as the remaining part of the globe. We then determined separately the linear response functions that characterize the land carbon response to NPP in the tropics and extra-tropics, defined respectively by 510 ∆C bgc,tr (t) = t 0 χ tr N P P (t − s)∆N P P tr (s)ds + η tr (t), The data were taken once more from the 2×CO 2 bgc experiment.
Before assessing the robustness of the spectrum, we check the consistency between the regional and global response functions. In this test, we show that the global response function χ N P P can be reconstructed by combining χ tr N P P and χ et N P P . For 515 this purpose, we write the global land carbon as the sum of the land carbon in the tropics and extra-tropics: ∆C bgc (t) = ∆C bgc,tr (t) + ∆C bgc,et (t).
Plugging Eqs. (17), (22) and (23) in Eq. (24) and recognizing that Applying a Laplace transform to both sides of Eq. (25) and reorganizing the resulting equation gives 520 χ N P P (p) = χ tr N P P (p) ∆N P P tr (p) + χ et N P P (p) ∆N P P et (p) ∆ N P P (p) .
(26) Therefore, the Laplace transformed χ N P P (p) can be obtained from combining the NPP forcings and the response functions for the tropics and extra-tropics. Hence, if the response functions are correctly recovered by the RFI algorithm, Eq. (26) should hold at least approximately. In order to check this, we computed the Laplace transforms analytically by approximating the NPP forcings by step functions (since we take the 2×CO 2 bgc experiment), and using Eq. (5) for the response functions. Figure   525 10(a) shows the quality of the result. The small discrepancy between χ N P P (p) obtained from the global carbon response and from combining the regional responses can be at least partially explained by the approximation error made to represent the forcings by a step function.
Hence, one can infer that Therefore, the global response function χ β (t) can be obtained from χ tr β (t) and χ et β (t). But in addition, since χ (t) is given by 540 Eq. (5), Eq. (30) implies that one can also obtain the global spectrum q β (τ ) by combining the regional spectra: Therefore, using the recovered response functions for tropical and extra-tropical carbon one can obtain the global response function χ β and its associated spectrum q β . Accordingly, in this test we check Eqs. (30) and (31). For the calculation of the derivatives in Eqs. (27) and (28) we fitted N P P tr = N P P tr (c) and N P P et = N P P et (c) ::::::::::::::::::::: N P P tr = N P P tr (C atm ) :::: and 545 ::::::::::::::::::::: N P P et = N P P et (C atm ) : once again by a polynomial of order 6 (which obtained the best results for global NPP in Fig. 7(b)) and took the derivatives from the fits. For χ β (t) we used the recovery with best quality from Fig. 7 ("NPP 2× pol6 "). The spectra q β (τ ), q tr β (τ ) and q et β (τ ) are obtained by scaling the spectra from χ N P P (t), χ tr N P P (t) and χ et N P P (t) by the respective derivatives ∂N P P ∂c c=c P I , ∂N P P tr ∂c c=c P I and ∂N P P et ∂c c=c P I :::::::::::::::: ∂N P P ∂Catm Catm=C 0 atm , :::::::::::::::: ∂N P P tr ∂Catm Catm=C 0 atm :::: and ::::::::::::::::: sults are shown for χ β in Fig. 10(b) and for q β in Fig. 10(c). As seen in subfigure ::: perfectly the values of χ β for times beyond about 20 years. Likewise, the spectrum q β is almost perfectly reconstructed for time scales above 6 years, with a slight discrepancy between 15 and 25 years (see Fig. 10(c)). A larger disagreement is seen below 20 years for χ β and below 6 years for q β . One of the reasons is that only little information is available for time scales smaller than 1 year because this is the time step taken for the data. However, Appendix D shows that time scales much shorter than the time step affect only χ (0). The main reason for the disagreement is that high frequencies (and thus small time scales) are the 555 most problematic to recover due to the ill-posedness of the problem that obscures the signal particularly at high frequencies.
Despite the disagreement at small time scales, Figs. 10(b) and (c) add confidence that: 1) The response functions for global, tropical and extra-tropical carbon can be trusted over mid-to-long time scales (Fig. 10(b)); and 2) the two-peak spectrum obtained for global land carbon indeed characterizes the response, since its computation from two independent approaches (combining regional spectra in Fig. 10(c) and the Gregory ::::::::::: Gregory-type plot in Fig. 9(c)) yield similar results with characteristic 560 time scales matching the peaks of the spectrum.
Investigating the two-peak structure of the spectrum The reasons for the two-peak structure of the land carbon spectrum are conceivable. In principle, one possibility could be that the short time scales reflect the carbon dynamics in the tropics, a region with higher temperatures and thus larger heterotrophic respiration rates (see e.g. Raich and Potter, 1995), while the long time scales may originate from the carbon dynamics in the 565 extra-tropics, where respiration rates are smaller due to lower temperatures. This hypothesis may be checked by examining Fig. 10(d), which shows the contribution from the tropics and the extra-tropics to the land carbon spectrum (Eq. (31)). But as seen, the two peaks arise both in the tropical and in the extra-tropical spectrum, so that one cannot attribute each peak to a particular region. Therefore, this cannot be the explanation.
An alternative hypothesis is that the different peak time scales originate from the very different characteristic time scales of 570 functionally different elements in the land carbon cycle such as leaves, wood, litter and soil. In the following we investigate whether this hypothesis can explain the two-peak structure.
For this purpose, we split the land carbon pools of the MPI-ESM into two groups (see Fig. 11). In the first group are the pools whose dynamics is governed by fast processes such as shedding and decomposition of leaves, thus the pools that respond at short time scales. These are the pools representing non-woody tissues in living vegetation (leaves, fine roots, sugars, and 575 starches) and the associated litter. In the second group are the pools with dynamics determined by slow processes such as the decomposition of woody plant parts, hence the pools that respond at long time scales. These are the pools representing the wood in stems and coarse roots, the associated litter pools, and the soil carbon (humus). Now, we separate the land carbon spectrum into contributions arising from each of these two groups. The land carbon response can be described as the sum of the collective responses from the pools with fast and those with slow dynamics: In the linear response framework, the response of each term to NPP is given by which implies χ N P P (t) = χ f N P P (t) + χ s N P P (t).
(34) 585 Finally, assuming that each response function in Eq. (34) is described by Eq. (5), we obtain the separation of the land carbon spectrum in terms of the contribution from each pool group: Hence, if our hypothesis is correct, then the peak at short time scales should be explained by q f N P P and the peak at long time scales should be explained by q s N P P .

590
To proceed with the analysis we now need to obtain the spectra q f N P P and q s N P P . When investigating the tropical and extra-tropical land carbon, we obtained each regional spectrum individually by applying the RFI algorithm to the data from the tropical and extra-tropical responses. In principle one could proceed in the same way to separate contributions from the two pool groups, but, as will get clear below, this approach introduces slight inconsistencies between the separate recoveries that makes a quantitative comparison of the two contributions less reliable than the alternative method that we use in the following 595 to separate the fast and slow components of q N P P . The idea of this alternative approach is the following. Numerically the land carbon spectrum is given by the regularized solution (6), i.e.
where the regularization parameter λ is determined by the RFI algorithm. The land carbon response ∆C bgc is given by the 600 sum (32) of the responses from the pools with fast and those with slow dynamics. Entering Eq. (32) into Eq. (36) yields Therefore, by deriving the spectra q f N P P,λ and q s N P P,λ using the same regularization parameter λ employed for the land carbon spectrum q N P P,λ , we can in principle accurately reconstruct q N P P,λ from q f N P P,λ and q s N P P,λ . By obtaining q f N P P,λ and q s N P P,λ in this way and combining them via Eq. (35) we show that indeed q N P P,λ can be very 605 accurately reconstructed (Fig. 12(a)). This approach leads as well to an almost perfect reconstruction of the response function χ N P P via Eq. (34), as shown in Fig. 12(b). Compared to our previous approach employed to reconstruct the global spectrum from the regional spectra (Eq. (31)), this alternative approach gives a more accurate reconstruction because it takes the same regularization parameter λ for all q N P P,λ , q f N P P,λ , and q s N P P,λ , in contrast to the previous approach where λ was separately calculated by the RFI algorithm for each spectrum in Eq. (31). (c) separation of spectrum qNP P into q f N P P and q s N P P ; (d) separation of response function χ N P P into χ f N P P and χ s N P P ; (e) separation of χ N P P into contributions from the individual pools. For more details see text.
Now, the question is whether the spectra q f N P P and q s N P P can indeed explain each peak in the land carbon spectrum q N P P . To check this, in Fig. 12(c) we plot the three spectra. As seen, clearly the peak at short time scales of the land carbon spectrum arises mostly from the large peak in the fast-dynamics spectrum, while the peak at long time scales follows closely the large peak in the slow-dynamics spectrum. This result thus indicates that our hypothesis is correct, so that indeed the two-peak spectrum originates from the different characteristic time scales of functionally different elements in the land carbon cycle.

615
More insight into the dynamics of the land carbon can be gained by analyzing the different contributions to the response function χ N P P shown in Figs. 12(d) and 12(e). Subfigure ::::: Figure :: 12(d) shows the contribution of the pools with fast dynamics χ f N P P and that of the pools with slow dynamics χ s N P P . We see that χ f N P P only dominates the response at times smaller than about 2 years. Further, for times larger than 25 years only χ s N P P contributes to the response. In subfigure ::: Fig. ::: 12(e) we further separate the response functions into the contributions from the individual pools, i.e. 620 χ f N P P (t) = χ nwt N P P (t) + χ nwl where χ nwt N P P is the response function for the non-woody tissues in living vegetation, χ nwl N P P for non-woody litter, χ wt N P P for woody tissues in living vegetation, χ wl N P P for woody litter, and χ h N P P for the humus pool. This more detailed separation shows that for long times the humus pool starts to dominate the response so that at times larger than 100 years it gives the only 625 contribution.

Summary, discussion and outlook
Although the γ and β values introduced by Friedlingstein et al. (2003) provide a useful framework for model intercomparison, they only characterize the sensitivity of a model for a particular perturbation scenario. Instead, one would like to characterize the sensitivity as such :: an :::::::::: independent ::::::: property ::: of :: the :::::: carbon ::::: cycle. The dependence of γ and β on the considered scenario arises 630 because of the internal memory of the system, i.e. because of the dependence of the response on past values of the perturbation.
When the memory is taken into account, linear response functions arise as natural generalization of the γ and β sensitivities.
But a fundamental step for applying this generalized framework is to identify the appropriate linear response functions. In Part I, we developed a method to identify linear response functions from data using only information from an arbitrary perturbation experiment and a control simulation. In that study, the robustness of the method in the presence of noise and nonlinearity was 635 demonstrated in applications to data from perturbation experiments performed with a toy model.

Generalized land carbon sensitivities in the MPI-ESM
In the present study, we demonstrated that our RFI method can as well be employed to derive response functions from C 4 MIP data. Here, we identified for the MPI-ESM, using data from standard 1% experiments, the land carbon generalized sensitivities χ γ and χ β , i.e. the linear response functions that generalize the γand β-sensitivities for land carbon. The robustness of the identification (sections 3 and 4). With the aid of additional experiments, we estimated the linear regime that gives the range of forcing strengths for which each generalized sensitivity can predict model responses. For χ γ , results indicate a linear response at least until the end of the 1% experiment, corresponding to temperature perturbations of around 6 K. For χ β , the estimate is for CO 2 perturbation strengths up to 100 ppm. In addition, we analyzed different approaches that may be employed to improve 645 the quality of the recovery of the response functions. For χ γ , taking the response from a 2×CO 2 experiment demonstrated to give smaller prediction errors for every response evaluated, suggesting that this type of experiment gives also a better recovery.
For χ β , we used the knowledge of the relationship between NPP and CO 2 to account for some of the nonlinearities in the response. We found that nonlinearities can to a large extent be explained by the nonlinear relationship between NPP and CO 2 : By using NPP instead of CO 2 as forcing an approximately linear response is found over the whole 1% experiment, i.e. until 650 CO 2 perturbations of about 850 ppm. In addition, this approach yielded the best recovery for χ β . Evidences for this conclusion are the quality of its prediction and the detailed spectrum it yields for the response.

Spectrum of land carbon time scales
Obtaining the spectrum of the response is an additional advantage of the RFI method. Most methods in the literature either recover χ (t) pointwise -and therefore do not give the spectrum -, or try to fit it with few exponents without accounting 655 for ill-posedness, which does not give reliable results for the spectrum (see e.g. the famous example from Lanczos, 1956, p. 272). In the application to the MPI-ESM, obtaining the spectrum has proven advantageous for two reasons. First, it allows to formally extend the recovery of the response function beyond the time range of the length of the underlying time series.
Results from such an extension (Figs. 8(a) and (b)) demonstrated that the recovered response function contains information on times reasonably longer than the time series length used for the recovery.

660
Second, the spectrum gives valuable insight into the internal dynamics of the system. In particular, for our application the spectrum gives the most relevant time scales for the land carbon response on a global or regional level. The spectrum associated with the best recovery of χ β showed two peak time scales for the global response: One around 4 and another around 100 years (section 5).
To obtain evidence for the robustness of this result, we showed that it is possible to reconstruct the global spectrum from the 665 recovered tropical and extra-tropical spectra (see Fig. 10(c)), and that similar time scale ranges can be obtained via a "Gregory plot" :::::::::: Gregory-type : approach (see Fig. 9(b) and (c)). Further, we demonstrated that the recovered tropical and extra-tropical response functions combine to the identified global response functions, indicating consistency between regional and global recovery (see Fig. 10(a) and (b)).
We then proceeded to investigate the reason for the two-peak structure of the spectrum. To this end, we separated the land 670 carbon response into the response from pools with fast dynamics (non-woody vegetation tissues and associated litter) and the response from pools with slow dynamics (woody vegetation tissues, woody litter, and humus). By analyzing the spectrum for each of these responses we showed that the peak at short time scales of the land carbon spectrum arises mostly from the contribution of the pools with fast dynamics, while the peak at long time scales follows closely the contribution from the pools with slow dynamics. This investigation therefore suggests that the two-peak spectrum results from the different contributions 675 of functionally different elements of the land carbon cycle. Analysis of the response functions showed that the pools with fast dynamics dominate the land carbon response only for times below 2 years. Further, for times larger than 25 years only the pools with slow dynamics contribute to the response, and from 100 years onwards the contribution comes solely from the humus pool.
We remark that results for the spectrum should be always interpreted with care, since deriving the spectrum may be a more 680 complicated problem than deriving the response function. This is partially explained by the degree of ill-posedness of each problem. In deriving the response function from Eq.
(3), one is solving a deconvolution problem, which is known to be illposed because of the smoothing property of the convolution operator (e.g., Landl et al., 1991;Bertero et al., 1995;Hansen, 2002). On the other hand, by deriving the spectrum from Eqs. (3) and (5), one is solving not only a deconvolution (Eq. (3)) but also a type of inverse Laplace transform (to obtain the spectrum from the response function (5)), which is known to be 685 extremely ill-posed because of the smoothing property of the Laplace operator (e.g., McWhirter and Pike, 1978;Istratov and Vyvenko, 1999). As a result, deriving q(τ ) from the data ∆Y (t) involves two smoothing operations, namely the Laplace transform and convolution, whereas deriving χ (t) from ∆Y (t) involves only convolution, so that the problem of finding q(τ ) may be more ill-posed than that of finding χ (t). This difficulty was exemplified by results from appendices D and E, which discuss cases where the response function can be perfectly recovered but the recovered spectrum is relatively poor.

690
Another worthwhile comment is that, in comparison to the widely used assumption that the spectrum can be described by only a few exponents (Maier-Reimer and Hasselmann, 1987;Enting and Mansbridge, 1987;Enting, 1990;Joos et al., 1996;Pongratz et al., 2011;Joos et al., 2013;Colbourn et al., 2015;Lord et al., 2016), our assumption of a continuous spectrum (ansatz (5)) has some advantages. First, by our approach one circumvents the complication of determining the number of exponents. This leads to a linear problem and thereby to a more transparent method (see Part I, section 3.1). Second, our ansatz 695 better describes the expectation that variables integrated over many different climate zones are composed by a large range of time scales. If, in contrast to these expectations, the underlying spectrum has only few time scales, results in Appendix E show that our approach may still yield a reasonable recovery.

Outlook
In this study we investigated the generalized γand β-sensitivities that solve the scenario dependence problem noted e.g. by 700 Gregory et al. (2009) and Arora et al. (2013) to linear order in the perturbation -an approach that can in principle be extended to higher orders (Ruelle, 1998;Lucarini, 2009). By demonstrating how to successfully recover generalized sensitivities, this study paves the way for their future application in studying the dynamics of the combined carbon-climate system in different Earth System Models. Applying our RFI method, we showed for the MPI-ESM that one can obtain these sensitivities from standard C 4 MIP 1% simulations. In addition, the estimates for the linear regime obtained by employing the recovered sensitivities to 705 predict additional experiments give a hint into the range of perturbation strengths for which this generalized framework might be valid as well for other models, for which data from additional experiments necessary to determine this linear regime are not available. Since the process descriptions of the land carbon cycle are quite similar in different models, it may be assumed that the linear range estimates obtained for MPI-ESM in the present study apply also to other models. Considering the radiative land carbon response, thereby also for other models their χ γ should fully characterize the response to temperature up to at least 710 6K. As a consequence, in models with temperature response similar or weaker than the MPI-ESM, χ γ can as for the MPI-ESM be recovered taking the full time series from the 1% experiment. Analogously, considering the generalized sensitivity χ β , the experience presented in the present study for the MPI-ESM suggests that also for other models the response to CO 2 perturbations is linear up to 100 ppm. As a consequence, χ β can be recovered taking data from the 1% experiment for the first 30 years. Alternatively, as discussed in section 4, the time range for the recovery of χ β can be extended if one takes 715 as forcing NPP instead of CO 2 (third approach investigated in section 4). For models with NPP response similar or weaker than the MPI-ESM, by this approach χ β can be recovered taking the full time series from the 1% experiment. Still, as the relatively successful predictions in Fig. 8 TCR are shown to be only particular values encoded in a linear response function.
Linear response functions can also be useful in studying "committed changes" (Wigley, 2005;Plattner et al., 2008;Jones et al., 2009;Mauritsen and Pincus, 2017). As shown by Ragone et al. (2016), the concept of climate inertia (closely related to commited changes) can be explicitly described within the linear response framework. Since the linear response function describes the delayed response of the system to a perturbation, by deriving this function one has at hand also the information 730 of how the system reacts once the perturbation -for instance CO 2 emissions -ceases.
Further, linear response functions can help understanding the concept of "emergent constraints" (Nijsse and Dijkstra, 2018).
Recent studies have shown how to obtain from response functions derived from conceptual models emergent constraints for the real Earth System (Cox et al., 2018;Williamson et al., 2019). With the method developed here, in principle such type of analyses may be carried out employing instead response functions derived from Earth System Models.

735
As a result of accounting for the memory of the system, the linear response function gives information on the strength of the response at all internal time scales covered by the time series underlying its recovery and probably even slightly longer (see Fig. 8(b)). Using our method, in principle one can even compare the spectra of time scales from models to those from observations (e.g., Forney and Rothman, 2012b). The method presented here may also be applied to analyze changes in age and transit time distributions of carbon in different models, since these distributions can be derived directly from linear response 740 functions (Thompson and Randerson, 1999).
In all of the mentioned examples, the generality of the RFI method allows for the derivation of the appropriate linear response functions for any model by taking only data from an arbitrary perturbation experiment and a control experiment.
Such generality opens the possibility of combining the linear response framework, which has been gaining increasing attention due its wide applicability in climate science (e.g., Lucarini, 2009;Lucarini and Sarno, 2011;Lucarini et al., 2014;Ragone 745 et al., 2016;Lucarini et al., 2017;Aengenheyster et al., 2018;Ghil and Lucarini, 2020;Lembo et al., 2020;Bódai et al., 2020), with model intercomparison studies, hopefully leading to a deeper understanding of critical differences encountered across models. Earth System Model (MPI-ESM; for more details see Giorgetta et al., 2013). The MPI-ESM consists of the coupled general circulation models ECHAM6 (Stevens et al., 2013) for atmosphere, at T63/1.9 • horizontal resolution with 47 vertical levels, and MPIOM  for ocean, with a nominal resolution of 1.5 • with 40 vertical levels. In addition, the MPI-ESM includes the subsystems JSBACH Schneck et al., 2013), a land and vegetation model, and the marine biogeochemistry model HAMOCC (Ilyina et al., 2013). JSBACH and HAMOCC describe respectively the land and ocean 755 carbon cycle in the MPI-ESM. Thus, of particular interest for our study is the subsystem JSBACH. JSBACH simulates fluxes of energy, water, momentum and CO 2 between the land surface and atmosphere. To represent subgrid scale heterogeneity, each grid cell of the land surface is divided into tiles, being each tile associated with a vegetation type (or "plant functional type").
The photosynthesis scheme follows Farquhar et al. (1980) and Collatz et al. (1992). The land carbon structure is divided into three vegetation pools (living tissues, carbohydrate and starch storage, and wood), four aboveground and belowground pools 760 for litter from woody and non-woody parts and a pool for soil carbon (humus). In addition, JSBACH includes a dynamic vegetation scheme that allows for simulating changes in vegetation cover driven by climate.

Appendix B: Generalization of climate-carbon cycle sensitivities
In this appendix we show that indeed linear response functions generalize Friedlingstein α-β-γ sensitivities. We explain this by taking the β-sensitivity of land carbon uptake as an example. The calculation of β is based on an experiment where the 765 CO 2 -rise acts only biogeochemically, i.e. concerning the land carbon via the photosynthesis of plants. Calling ∆C bgc (t) the difference in land carbon between the perturbed and the control simulation, β is defined as :::::::: , ∆cC atm :::: (t) = c P I C 0 atm :::: where t is the time in years elapsed since the perturbation was switched on, and ∆c(t) :::::::: ∆C atm (t) : is the change in CO 2 concentration when increasing atmospheric CO 2 each year by 1% starting from its pre-industrial value c P I ::::: C 0 atm of the control 770 simulation. In the framework of linear response, one can understand ∆c(t) :::::::: ∆C atm (t) : as perturbation and ∆C bgc (t) as response so that the response formula reads ∆C bgc (t) = This equation defines χ β (t) as the linear response function describing the biogeochemical response of land carbon to any type of CO 2 perturbation ∆c(t) ::::::::: ∆C atm (t), as long as the perturbation is sufficiently weak. Employing in particular the percent 775 perturbation from Eq. (B1), β(t) is thus obtained from the linear response function χ β (t) as β(t) = 1 ∆c(t) 1 ∆C atm (t) :::::::: t 0 χ β (t − s)∆cC atm :::: (s)ds. (B3) In this way, χ β (t) can be understood as a generalization of β(t), that gives not only the response to percent-type perturbations, but also to other perturbations. Accordingly, χ β (t) characterizes system properties independent of the type of perturbation, in contrast to β(t).

780
Appendix C: Monotonicity of χ β (t) In this appendix it is shown that the response function χ β (t) is monotonic, as claimed in subsection 2.1. The argument here is actually more general, namely that the response function to changes in the input of the system (for land carbon, the Net Primary Production) is monotonic. Since χ β (t) is related to χ N P P (t) by Eq. (21), the monotonicity property transfers to χ β (t).
Let the linear response of land carbon be described by Eq. (17): where for simplicity we assume that η(t) is small so that it can be neglected. If NPP is a Dirac delta function ∆N P P (t) = δ(t), then the response is given by ∆C bgc (t) = χ N P P (t). Therefore we can interpret the response function χ N P P (t) as follows: If a certain number of carbon atoms enter the biosphere at time t = 0, the response function χ N P P (t) gives the fraction of these atoms still left in the biosphere at time t.

790
Let p(t)dt be the probability that an atom that entered the system at time t = 0 will leave it at time t. Then is the probability that an atom that entered the system at time t = 0 will leave the system until time t. Hence from the interpretation of χ N P P (t) above From Eqs. (C2) and (C3) it follows that But since the probability density function p(t) ≥ 0 ∀ t, then d dt χ N P P (t) ≤ 0 ∀ t, i.e. χ N P P (t) decays monotonically towards zero. Therefore, because χ β (t) is simply a scaling of χ N P P (t) given by Eq. (21), χ β (t) also decays monotonically towards zero.

800
Appendix D: Derivation of spectrum and χ (t) when the response contains time scales much longer or much shorter than the time scales covered by data Time scales much longer or much shorter than the time scales covered by data cannot be correctly recovered in the spectrum.
Nevertheless, in this appendix we show that the wrong recovery of these extreme time scales does not strongly affect the recovery of χ (t). These considerations add to the footnote remark in section 5 where we claim that such extreme time scales 805 can be safely ignored.
First, we consider the case where the response has time scales much longer than those covered by data. Let χ at time T be given by Let T be the time series length and assume that q(τ ) has significant contributions at time scales τ ≥ τ L with τ L >> T . Then 810 (D1) can be written as where e −T /τ ≈ 1 was used in the second integral because τ L >> T . Thereby the second term in the right-hand side of (D2) is just a constant offset ∞ log 10 τ L q(τ )d log 10 τ = k, k constant. (D3) 815 Hence, for internal time scales much longer than the time series length T , the correct recovery of the individual q(τ ) values is not relevant for the derivation of χ (t), as long as those values combine to the correct offset in Eq. (D3). Note that because this argument is based on the condition τ L >> T , it applies not only to χ (T ) but also to all χ (t), t < T . Now, we consider the case where the response has time scales much smaller than those covered by data. Let t =: i∆t, where ∆t is the time step and i ∈ IN . If q(τ ) has significant contributions at time scales τ ≤ τ S with τ S << ∆t, then for i > 0 Eq. (5) 820 can be written as where e −i∆t/τ ≈ 0 was used for τ < τ S because τ S << ∆t. As a result, values of q(τ ) are irrelevant to χ (t) for almost every i. The only exception is i = 0, where one has

825
Hence, for time scales much shorter than the time step ∆t, the correct recovery of the individual q(τ ) values is not relevant for the derivation of χ (t) as long as those values combine with the other recovered values of q(τ ) to the correct χ (0) in Eq. (D5). As shown by Eq. (D4), if they combine to the wrong value, only the recovery of χ (0) is affected. Therefore, in principle χ (t) can be correctly recovered also when the response contains much longer and much shorter time scales than those covered by data, as long as the recovered q(τ ) values at these extreme time scales combine to the correct 830 values of the offset in Eq. (D3) and χ (0) in Eq. (D5). In Fig. D1 we show an example of recovery of the spectrum and response function in such a case. For the recovery we took data with SN R = 6 × 10 4 from the 1% experiment with our toy model (see Part I). As seen, the spectrum can only be partially recovered (subfigure ::: Fig. ::: D1(a)). The recovery wrongly estimates spectral contributions at time scales longer and shorter than those covered by data, i.e. time scales larger than the time series length T = 140 and smaller than the time step ∆t = 1. In addition, the existence of such spectral contributions at time scales 835 not covered by data seems to also deteriorate the recovery of the spectrum at the time scales actually covered by data: In the region 1 < τ < 140, despite the high SNR the recovered spectrum matches the true spectrum only partially. This is probably a compensation effect where wrong information shows up in the recovered spectrum to compensate missing information on the response function. For instance, to obtain the correct χ (0) Therefore, as claimed in section 5, this numerical example shows that even though very long and very short time scales cannot be correctly recovered in the spectrum, they do not strongly influence the recovery of χ (t). This is because they only influence the offset in Eq. (D3) and χ (0) in Eq. (D5), and those two values seem to be reasonably well recovered numerically.

850
Appendix E: Recovering a discrete spectrum q(τ ) This appendix gives numerical evidence for the claim in section 6 that although the RFI method assumes that the response has a continuous spectrum of many time scales, for sufficiently good signal-to-noise ratio the recovered continuous spectrum may also give a reasonable approximation to an underlying discrete spectrum of only few time scales.
Before we give numerical examples, one must first understand the limitations of the recovery of the spectrum. As explained in 855 Part I, the spectrum q(τ ) can only be completely recovered (for sufficiently high SNR) if it is dominated by the first components of expansion (6). By Hansen's observation (Hansen, 1989(Hansen, , 1990, see Part I), this means that the spectrum must be dominated by low frequencies, i.e. it must be to some extent smooth. However, an underlying discrete spectrum implies a spectrum that  Figure D1. Recovery of spectrum and χ (t) taking a true spectrum with contributions at time scales much longer and much shorter than the time scales covered by data. For the recovery, data with SN R = 6 × 10 4 from the 1% experiment with the toy model (see Part I) was taken (times series length T = 140 and time step ∆t = 1). RFI parameters are taken as in Part I except for M = 140. The recovered spectrum matches only partially the true spectrum. Nevertheless, the response function χ (t) is almost perfectly recovered.
is instead given by "spikes" in the time scale domain. Such "spikes" can only be described by high-frequency components of Eq. (6). Therefore, a discrete spectrum can be only partially recovered. To obtain a sufficiently good recovery, the solution (6) 860 must contain many components. Hence, the data must have a sufficiently high signal-to-noise ratio. The explanation for this conclusion is the following: If many components in Eq. (6) must be recovered, the regularization parameter must be small; but a small regularization parameter requires a small noise level (see Theorem 3.3.1 in Groetsch, 1984).
With this in mind, we show in Figs. E1-E9 that at least smooth approximations to an underlying discrete spectrum can be obtained. For the results we took data with SN R ∼ O(10 4 )−O(10 5 ) from different experiments performed with the toy model 865 described in Part I. Since the aim is to recover discrete spectra, a larger number of time scales M gives a better resolution.
Therefore, we take M = 140. All other RFI parameters are taken according to Part I. Also, monotonicity needed not to be taken into account (step 6 of Fig. 1 in Part I).
Figures E1-E6 show results for taking data from a 1% experiment and Figs. E7-E9 from a 2×f 0 experiment. We start with one time scale τ = 37 (Fig. E1). As expected, the spike cannot be perfectly recovered, but the recovery gives a smooth 870 approximation to it, with peak coinciding approximately with the "true" value. On the other hand, the response function is almost perfectly recovered. This is a result of the ill-posedness of Eq. (5): In the same way that high frequencies of the spectrum are suppressed in the data by Eq. (3), they are also suppressed in the response function by Eq. (5) (see Groetsch, 1984, section 1.1). Therefore, both the true spectrum and its smooth recovery result in practically the same response function.
In Fig. E2, the true spectrum has two time scales, this time at τ = 7 and τ = 100. Similarly to Fig. E1, the time scales are 875 recovered by a smooth approximation with peaks approximately at the true values. Also similarly to Fig. E1, the response function is almost perfectly recovered. A similar result is obtained if we take time scales that are a bit closer together, as seen in Fig. E3 (τ = 7 and τ = 37). Nevertheless, now the peak for the longer time scale is a worse approximation, and there is a slightly pronounced negative peak that does not reflect the true spectrum.
In Fig. E4, we take instead time scales τ = 37 and τ = 100. Here, the resolution is not sufficiently high for a recovery 880 of each time scale separately. Instead, the recovered spectrum shows only one mode that spans both spikes. Once more the response function can be almost perfectly recovered. Taking the three time scales τ = 7, τ = 37 and τ = 100 (Fig. E5), the smooth recovery shows only two modes: One at τ = 7 with peak almost coinciding with the true value, and another with peak in between τ = 37 and τ = 100. If one in addition considers a fourth time scale τ = 1 (Fig. E6), once more the recovery shows only two modes. But now the mode at shorter time scales displays a longer tail that spans both τ = 7 and τ = 1.

885
The situation changes when we take for the recovery the 2×CO 2 experiment. According to the Laplace transform analysis in section 3, data from this experiment should give more information on small time scales. Figure E7 shows that this is indeed the case: In contrast to the recovery from Fig. E6, now the time scale τ = 1 is also recovered. On the other hand, now there are two small negative peaks that do not reflect information from the true spectrum. Also, the resolution is not sufficiently good to recover separately the two peaks at long time scales. However, if the longest time scale is increased from τ = 100 to τ = 228 890 (Fig. E8), a small peak can be seen around this time scale. In addition, now the peak at τ = 37 matches slightly better the true value. In Fig. E9 we add another time scale, now at τ = 518. We see that although this time scale cannot be recovered separately from τ = 228, the peak at long time scales is more pronounced. This is in contrast to the fact that the time series used for the recovery reaches only t = 140, indicating that the method can recover information on time scales even longer than the time series length. This is in agreement with the conclusions from section 4, where in one case χ β (t) was recovered from 895 a time series with only 30 years but could recover responses for t = 140 years (see Fig. 8(b)).          Figure E9. Spectrum q λ and response function χ (t) recovered from a 2×f0 experiment performed with the toy model described in Part I taking an underlying discrete spectrum with five time scales τ = 1, τ = 7, τ = 37, τ = 228, τ = 518. The data were taken with SN R ∼ O(10 4 ) − O(10 5 ).