Comparing estimation techniques for timescale-dependent scaling of climate variability in paleoclimate time series

Raphaël Hébert¹, Kira Rehfeld², and Thomas Laepple¹,³

¹Alfred-Wegener-Institut Helmholtz-Zentrum für Polar- und Meeresforschung, Telegrafenberg A45, 14473 Potsdam, Germany
²Institut für Umweltphysik, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 229, 69120 Heidelberg, Germany
³University of Bremen, MARUM – Center for Marine Environmental Sciences and Faculty of Geosciences, 28334 Bremen, Germany

Correspondence: Raphaël Hébert (raphael.hebert@awi.de)

justified for temperature and precipitation timeseries at timescales longer than the turbulent weather regime (i.e. timescales longer than weeks to a few years, depending on the location and medium; Lovejoy and Schertzer, 2012). On the other hand, highly intermittent archives which clearly display multifractality, such as volcanic series (Lovejoy and Varotsos, 2016) or dust flux (Lovejoy and Lambert, 2019), would require the "intermittency correction" from the moment scaling function K(q).
Under the quasi-Gaussian assumption, we can simplify equation 2 since this implies K(2) ≈ 0 and therefore:

β ≈ 1 + 2H.

This allows us to convert estimated spectral exponents β̂ (where "ˆ" denotes an estimator of the given quantity) into their equivalent estimated fluctuation scaling exponents Ĥ = (β̂ − 1)/2. We use the fluctuation scaling exponent H for inter-comparison between the methods in this study. H is a natural choice as it directly describes the behaviour of fluctuations in real space and is the usual parameter in functions used to generate fractional noises (Mandelbrot and Van Ness, 1968; Mandelbrot, 1971; Molz et al., 1997). While the fluctuation exponent H takes its origin in the so-called Hurst exponent (Hurst, 1956), its meaning and definition have evolved over time; see Graves et al. (2017) and Lovejoy et al. (2021) for historical summaries.
The process of estimating the scaling exponents can be divided into three steps. First, if the series is irregularly spaced, it must be regularized in order to be usable with the CPG and the MTM. Regularization is not necessary for the HSF and the LSP, which have the advantage that they can be calculated directly on the irregular timeseries. Second, the fluctuations proper to each method are calculated as a function of timescale, and finally, the scaling exponent is fitted to the result.

Power Spectrum
For an ergodic, weakly stationary stochastic process X, the power spectrum P is given by the Fourier transform of the autocovariance function γ, or equivalently by the squared modulus of the Fourier transform of the signal (Storch and Zwiers, 1984):

P(τ) = F{γ}(τ) = |F{X}(τ)|²,

where τ is the timescale and F denotes the Fourier transform operator. Equivalently, the power spectrum can be given as a function of the frequency f = τ⁻¹ instead of the timescale τ. We choose to write it as a function of timescale to allow for visual comparison with the Haar method below, and also because it is more intuitive: a non-expert can more easily grasp what the 1000-year timescale means than the equivalent 0.001 year⁻¹ frequency.

Classical Periodogram
The power spectrum of a discrete process X with N timesteps at regular intervals can be estimated using the classical periodogram (Chatfield, 2013):

P̂(τ) = (1/N) |Σ_{t=1}^{N} X_t e^{−i2πt/τ}|².

If log P(τ) behaves linearly over a range of timescales τ, the timeseries is considered to be power-law scaling over this timescale band, i.e. P(τ) ≈ Aτ^β for an arbitrary constant A.
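As a concrete illustration, the following minimal Python sketch computes the classical periodogram of a regularly sampled series and fits the exponent β in log-log coordinates. The function names and the simple ordinary-least-squares fit are illustrative choices, not the paper's implementation (which uses a generalized linear model, see below).

```python
import numpy as np

def classical_periodogram(x, dt=1.0):
    """Periodogram |FFT|^2 / N of a regularly sampled, mean-removed series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the mean (f = 0 component)
    n = len(x)
    power = np.abs(np.fft.rfft(x))**2 / n     # squared Fourier amplitudes
    freq = np.fft.rfftfreq(n, d=dt)
    return 1.0 / freq[1:], power[1:]          # timescales tau = 1/f, drop f = 0

def fit_beta(tau, power, tau_min, tau_max):
    """OLS estimate of beta in P(tau) ~ A * tau**beta over a timescale band."""
    sel = (tau >= tau_min) & (tau <= tau_max)
    beta, _ = np.polyfit(np.log(tau[sel]), np.log(power[sel]), 1)
    return beta

# White noise should give beta ~ 0, hence H = (beta - 1) / 2 ~ -0.5
tau, p = classical_periodogram(np.random.randn(1024))
print((fit_beta(tau, p, 2, 1024 / 3) - 1) / 2)
```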

Multitaper Spectrum
The MTM improves upon the CPG by producing independent estimates using a set of orthogonal functions: the prolate spheroidal wave functions h_{t,k} (Slepian and Pollak, 1961), also known as the Slepian tapers, which have the desirable property of minimizing spectral leakage (Thomson, 1982). The individual spectral estimate for the k-th taper can be written in a form similar to the periodogram above:

P̂_k(τ) = |Σ_{t=1}^{N} h_{t,k} X_t e^{−i2πt/τ}|².

The estimator can then be expressed as the mean of the K tapered estimates:

P̂_MTM(τ) = (1/K) Σ_{k=1}^{K} P̂_k(τ).
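A minimal multitaper sketch is given below, using the Slepian tapers provided by SciPy. The time-bandwidth product NW = 3 and K = 5 tapers are illustrative assumptions, as the paper does not specify its taper parameters here.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(x, dt=1.0, NW=3, K=5):
    """Average of K tapered periodograms using Slepian (DPSS) tapers."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    tapers = dpss(n, NW, Kmax=K)                        # shape (K, n)
    spectra = [np.abs(np.fft.rfft(h * x))**2 for h in tapers]
    power = np.mean(spectra, axis=0)                    # mean of the K estimates
    freq = np.fft.rfftfreq(n, d=dt)
    return 1.0 / freq[1:], power[1:]
```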

Interpolation
To produce the power spectrum with the methods above, it is necessary to interpolate the series to a regular resolution. Following Laepple and Huybers (2013), the data were first linearly interpolated to 10 times the mean resolution, then low-pass filtered using a finite-response filter with a cut-off frequency of 1.2 divided by the target mean resolution in order to avoid aliasing.
Linear interpolation corresponds to a convolution in the temporal domain with a triangular window. The Fourier transform of the triangular window is sinc², where the sinc function is defined as sinc(x) = sin(x)/x; therefore, linear interpolation multiplies the power spectrum by sinc⁴, modulated by the resolution of the interpolation (Smith, 2011), resulting in a power loss near the Nyquist frequency.
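The interpolation step described above could be sketched as follows; the FIR filter length (numtaps = 101) and the exact cut-off convention are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

def regularize(t, x, target_dt):
    """Linear interpolation to a fine grid, anti-alias low-pass, resampling."""
    fine_dt = np.mean(np.diff(t)) / 10.0        # 10x the mean resolution
    t_fine = np.arange(t[0], t[-1], fine_dt)
    x_fine = np.interp(t_fine, t, x)            # linear interpolation
    cutoff = 1.2 / target_dt                    # cut-off as stated above
    b = firwin(numtaps=101, cutoff=cutoff, fs=1.0 / fine_dt)
    x_filt = filtfilt(b, [1.0], x_fine)         # zero-phase FIR filtering
    t_reg = np.arange(t[0], t[-1], target_dt)   # target regular grid
    return t_reg, np.interp(t_reg, t_fine, x_filt)
```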

Lomb-Scargle Periodogram
As an alternative to the classical periodogram, which requires regular data, Scargle (1982) introduced the LSP as a generalized periodogram for irregularly sampled series:

P̂_LS(τ) = (A²/2) (Σ_j X_j cos(2π(t_j − T)/τ))² + (B²/2) (Σ_j X_j sin(2π(t_j − T)/τ))².

For stationary Gaussian white noise, the CPG estimates at a given timescale are distributed like a chi-square distribution with two degrees of freedom (i.e. an exponential distribution), and therefore A and B will be chosen to retain this property. For independent and identically distributed white noise, it is the case when:

A = (Σ_j cos²(2π(t_j − T)/τ))^{−1/2},  B = (Σ_j sin²(2π(t_j − T)/τ))^{−1/2}.

The CPG is invariant to time translation since shifting the timesteps by a constant value only affects the phase of the complex exponential inside the absolute value. The function T is thus introduced for the generalized periodogram to retain that property, which is the case when:

tan(4πT/τ) = (Σ_j sin(4πt_j/τ)) / (Σ_j cos(4πt_j/τ)).

The LSP has mostly been used in astronomy to detect periodic components in noisy irregular data. As outlined above, the functions A, B and T are chosen under the assumption that the process X is approximately white noise with no temporal correlation. This assumption is of course problematic from the perspective of climate timeseries, which usually exhibit long-range temporal correlation, but, as we will see later, good estimates of scaling exponents can nonetheless be recovered over a wide range of H.
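In practice, LSP estimates can be obtained directly on the irregular timesteps, for example with scipy.signal.lombscargle as in this minimal sketch; the log-spaced frequency grid bounded by the series length and twice the mean resolution is an assumption for illustration.

```python
import numpy as np
from scipy.signal import lombscargle

def lsp(t, x, n_freq=100):
    """Lomb-Scargle power as a function of timescale for irregular samples."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float) - np.mean(x)
    tau_mu = np.mean(np.diff(t))                   # mean resolution
    length = t[-1] - t[0]
    freqs = np.logspace(np.log10(1.0 / length),    # log-spaced frequencies
                        np.log10(1.0 / (2 * tau_mu)), n_freq)
    power = lombscargle(t, x, 2 * np.pi * freqs)   # expects angular frequencies
    return 1.0 / freqs, power
```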

Haar Structure Function
The HSF allows the scaling analysis to be performed in real space, i.e. without performing a Fourier transform. When used to define a structure function, Haar wavelets are appropriate for estimating the fluctuation exponent of processes with H ∈ (−1, 1) (Lovejoy and Schertzer, 2012), a range covering most geophysical processes. Together with its readiness to be applied to irregular series and its ease of interpretation, Lovejoy and Schertzer (2012) argued that this makes it a convenient choice for the scaling analysis of geophysical timeseries.
Haar fluctuations ΔX are simple to implement for irregular sampling. They can be defined for a given interval of length τ as the difference between the mean of the second half of the interval and that of the first half:

ΔX(t, τ) = ⟨X(t_i)⟩_{t_i ∈ [t+τ/2, t+τ)} − ⟨X(t_i)⟩_{t_i ∈ [t, t+τ/2)}.

The first-order structure function is then given by the mean absolute fluctuation, S₁(τ) = ⟨|ΔX(t, τ)|⟩, which scales as τ^H for power-law processes. The discrete sampling t_i does not have to be regular since the average of the available values in each half-interval is taken. This is of course an approximation, since the fluctuations available might not exactly correspond to the expected timescale.
Higher-order structure functions can be defined by considering other values of q in order to analyze multifractal processes, but since we are only dealing with the quasi-Gaussian case in this manuscript, q = 1 is sufficient. For simplicity, we refer to the first-order HSF simply as the HSF.
Since we take the average of the absolute value of the fluctuations, if the fluctuations are distributed in a quasi-Gaussian fashion, then we expect the distribution of Haar fluctuations at each scale to be a half-normal distribution, i.e. a zero-mean Gaussian distribution folded along zero. The mean µ_HG of such a half-Gaussian distribution, obtained by taking the absolute value of a Gaussian distribution, is proportional to the standard deviation σ_G of the original Gaussian distribution, i.e. µ_HG = √(2/π) σ_G. Since the sign of the Haar fluctuation before taking the absolute value is arbitrary, we can also consider the negative of those fluctuations to create an ensemble which is even closer to a Gaussian distribution and has mean zero by construction. We then estimate the mean absolute Haar fluctuation from the standard deviation of this ensemble using the relation above. This procedure marginally improves the estimation over directly averaging the absolute values of the available Haar fluctuations.
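The following sketch implements the first-order HSF on an irregular series, including the sign-symmetrization described above; the half-overlapping window placement and the requirement of at least one sample per half-interval are simplifying assumptions.

```python
import numpy as np

def haar_structure_function(t, x, tau):
    """Mean absolute Haar fluctuation S1(tau), expected to scale as tau**H."""
    t, x = np.asarray(t, dtype=float), np.asarray(x, dtype=float)
    fluct = []
    for start in np.arange(t[0], t[-1] - tau, tau / 2):   # half-overlapping windows
        first = (t >= start) & (t < start + tau / 2)
        second = (t >= start + tau / 2) & (t < start + tau)
        if first.any() and second.any():
            fluct.append(x[second].mean() - x[first].mean())
    if not fluct:
        return np.nan
    f = np.asarray(fluct)
    # The sign-symmetrized ensemble [f, -f] has zero mean by construction;
    # the half-normal relation mu_HG = sqrt(2/pi) * sigma_G then gives S1(tau)
    return np.sqrt(2.0 / np.pi) * np.std(np.concatenate([f, -f]))
```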

Slope Estimation
For a power-law relationship between variables x and y such as y = Ax^B, it is usual to use standard least-squares fitting to find the coefficient A and the exponent B as the linear coefficients of the equivalent linear relationship obtained by taking the logarithm of the equation: log y = B log x + log A. Least-squares fitting assumes the residuals are normally distributed, which is often a good approximation for the logarithm of the power spectral density of a stochastic process at a given frequency.

In the case of stationary Gaussian processes, it can be shown that the CPG and LSP estimates at a given timescale are distributed like a chi-square distribution with two degrees of freedom (i.e. an exponential distribution); for the MTM, the number of degrees of freedom is approximately twice the number of tapers, albeit slightly less depending on their degree of dependence. The logarithm of these distributions is similar enough to a normal distribution to obtain reasonable estimates with an ordinary least-squares fit. Another option is to use a generalized linear model with a gamma distribution, of which the chi-square is a special case. While very similar results are obtained with both fitting methods, we chose the generalized linear model since it is theoretically better justified.
For a Gaussian process, the first-order Haar fluctuations will follow a normal distribution, but as we are fitting the absolute value of the fluctuations, the estimates at a given timescale will follow a half-normal distribution, as mentioned above.
Although the half-normal distribution is not a special case of the gamma distribution, but rather of the generalized gamma distribution, we also use the generalized linear model to estimate its scaling exponent for practical purposes.
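A sketch of the slope estimation via a generalized linear model is given below, here with statsmodels, which is an assumed tooling choice (the paper does not name a library). The response s(τ) can be either spectral power (yielding β̂) or mean absolute Haar fluctuations (yielding Ĥ).

```python
import numpy as np
import statsmodels.api as sm

def fit_scaling_exponent(tau, s, tau_min, tau_max):
    """Gamma GLM with log link: log E[s] = const + exponent * log(tau)."""
    sel = (tau >= tau_min) & (tau <= tau_max)
    X = sm.add_constant(np.log(tau[sel]))              # intercept + log timescale
    family = sm.families.Gamma(link=sm.families.links.Log())
    res = sm.GLM(s[sel], X, family=family).fit()
    return res.params[1]                               # the fitted exponent
```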

Timescale Band for Slope Estimation
For all methods, we need to select a range of timescales over which to estimate the scaling exponent using the generalized linear model. The maximum fitting timescale τ_max was always one third of the largest scale available, in order to maximize the range of timescales used while avoiding the longest timescales, which have poor statistics in the case of the HSF (Lovejoy and Schertzer, 2012) and underestimate variance in the case of the MTM because of its known bias (Prieto et al., 2007) and the usual linear detrending. Since the impact of irregularity is not important at long timescales, we chose this same τ_max for all methods for consistency. The optimal minimum timescale τ_min, on the other hand, can vary depending on the method used and requires careful treatment.
For the CPG and the MTM, we fitted above the timescale corresponding to the Nyquist frequency, i.e. twice the resolution, for regular series. In the case of irregular data, since the interpolation brings about a power loss at small timescales (Schulz and Stattegger, 1997; Rhines and Huybers, 2011; Kunz et al., 2020), fitting over the small timescales produces a positive bias on the slope estimation. Therefore, when estimating the scaling exponents for the irregular series, we consistently fitted above three times the mean resolution τ_µ as a compromise (i.e. 1.5 times the Nyquist timescale). This choice of τ_min was informed (as for the methods below) by the results using the paleoclimate database (see section 3.3).

For both the HSF and the LSP, we used twice the resolution as τ_min for the regular cases, and τ_min = 2τ_µ for the irregular cases.

Surrogate data
To test the methods, we produce surrogate data with the same characteristics as the paleoclimate archives, namely a given scaling behaviour and an irregular sampling in time.

Simulation of power-law noise
A classical example of a non-stationary process (H > 0) is normal Brownian motion (H = 1/2), which is produced by (integer-order) integration of a normal Gaussian white noise (H = −1/2). A process with a given scaling behaviour can be obtained by fractional, rather than integer, order integration (or differentiation) of a given set of innovations, i.e. a random series of uncorrelated values drawn from a certain distribution. To generate our surrogate data, we consider the simplest case, when the innovations are drawn from a normal Gaussian distribution. This leads to the classical fractional Gaussian noise (fGn) and fractional Brownian motion (fBm) (Mandelbrot and Van Ness, 1968; Mandelbrot, 1971; Molz et al., 1997). For any fGn process there is a related fBm process which can be obtained by (integer-order) integration, which increases the scaling exponent of the process by +1. It is usual to define the associated fGn and fBm processes by the same scaling exponent h ∈ (0, 1), which directly describes the scaling behaviour of the HSF for the fBm, or for the fGn after (integer-order) integration. However, it is inconvenient to use the same parameter for both fGn and fBm when considering them in a common framework, and we prefer to also identify the fGn by the scaling behaviour of its HSF, rather than that of its integral, such that it has H ∈ (−1, 0). In the current paper, we generally refer to both fGn and fBm as "fractional noise" described by an exponent H ∈ (−1, 1). They are generated using an algorithm developed by Franzke et al. (2012).
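For readers who wish to experiment, a simple spectral-synthesis sketch for power-law surrogates follows; note that this Fourier-filtering shortcut is only an approximate stand-in for the Franzke et al. (2012) algorithm actually used here.

```python
import numpy as np

def fractional_noise(n, H, seed=None):
    """Approximate fractional noise with fluctuation exponent H in (-1, 1)."""
    rng = np.random.default_rng(seed)
    beta = 2 * H + 1                               # spectral exponent beta = 2H + 1
    spec = np.fft.rfft(rng.standard_normal(n))     # white-noise Fourier amplitudes
    freq = np.fft.rfftfreq(n)
    spec[1:] *= freq[1:]**(-beta / 2)              # shape the spectrum to f**-beta
    spec[0] = 0.0                                  # enforce zero mean
    x = np.fft.irfft(spec, n)
    return x / x.std()                             # normalize to unit variance

x = fractional_noise(5120, H=-0.3)                 # e.g. an SST-like exponent
```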

Series with H ∈ (−0.5, −0.3) are typical of monthly land air-surface temperature up to decadal timescales, while series with H ∈ (−0.3, 0) are more typical of sea-surface temperature over similar timescales. Non-stationary behaviour with H > 0 is typically observed in pre-Holocene series comprising Dansgaard-Oeschger events (Nilsen et al., 2016).

Generation of irregular paleoseries
To produce irregularly sampled series akin to paleoclimate archives, an "annual resolution" series with a given scaling exponent is first produced with the above method, and then degraded to the desired resolution using two different methods. The first is simple block averaging; the second is to sub-sample a low-pass filtered version of the series.
For the block-averaging method, boundaries are determined as the midpoints between subsequent timesteps, and all data in between are averaged. This corresponds, in the temporal domain, to a convolution with a square window, or equivalently, in the frequency domain, to multiplying the Fourier transform by the sinc function (sinc(x) = sin(x)/x). The power spectrum is therefore multiplied by sinc², which brings about a loss of power at high frequencies. For the second method, the timescale of the low-pass filter is taken as twice the mean resolution of the series, and the filtered series is simply sub-sampled at the desired timesteps. In the frequency domain, the filtering corresponds to multiplying the spectrum by a square function which cuts off variability below the specified timescale; this is therefore useful to reduce the aliasing of higher frequencies.
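Both degrading methods can be sketched as follows; the brick-wall frequency cut mirrors the square filter function described above, while the exact boundary handling in the block average is a simplifying assumption.

```python
import numpy as np

def block_average(t_annual, x_annual, t_out):
    """Average all annual values between midpoints of the output timesteps."""
    edges = np.concatenate(([t_annual[0]],
                            (t_out[1:] + t_out[:-1]) / 2,   # midpoint boundaries
                            [t_annual[-1] + 1]))
    idx = np.digitize(t_annual, edges) - 1
    return np.array([x_annual[idx == i].mean() for i in range(len(t_out))])

def lowpass_subsample(t_annual, x_annual, t_out):
    """Brick-wall low-pass at twice the mean resolution, then sub-sample."""
    tau_mu = np.mean(np.diff(t_out))
    spec = np.fft.rfft(x_annual)
    freq = np.fft.rfftfreq(len(x_annual), d=np.mean(np.diff(t_annual)))
    spec[freq > 1.0 / (2 * tau_mu)] = 0.0          # cut variability below 2*tau_mu
    x_filt = np.fft.irfft(spec, len(x_annual))
    return np.interp(t_out, t_annual, x_filt)      # sample at the irregular steps
```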
The first method would correspond to an archive sampled without gaps and where there is no smoothing of the signal, for example speleothems and varved sediments if samples containing several layers are taken, or marine records when the sample distance is smaller than the typical mixing depth in the sediment (Berger and Heath, 1968). The second method would correspond to archives with spaced-out sampling (for example 1 cm every 10 cm) and including processes such as bioturbation and diffusion which smooth out the high-frequency signal, for example biochemical and ecological data extracted from sediment cores (Dolman and Laepple, 2018; Dolman et al., 2020). We show the second method as our main result since it is more common to have archives with such smoothing processes, but also because it removes most aliasing effects from our results. It therefore leaves a clearer picture of the other effects inherent to each method. It is however an idealization, since the smoothing timescale of the physical processes involved is not related to the resolution; it should be independently estimated and accounted for in applied studies, although it is seldom reported. All the results were also computed for the block-averaging method and are shown in the supplementary information.

Previous studies have shown that the distribution of sampling times for typical sedimentary records can be approximated by a gamma distribution (Reschke et al., 2019a). To test systematically the impact of increasingly irregular sampling, we thus draw the time steps from a gamma distribution defined by its shape parameter k (or skewness ν such that k = ν⁻¹) and mean parameter µ, which corresponds to the mean time-step τ_µ. When the skewness parameter is ν = 0, we have regular sampling of width µ. Figure 1 shows an example of such a surrogate series at annual resolution and an irregularly degraded version (with timesteps drawn from a gamma distribution with ν = 1), along with the results of applying the three scaling estimation methods considered for the main analysis: the MTM, the HSF and the LSP. We omit a discussion of the results computed using the CPG as they are generally very similar to those of the MTM, albeit with higher variance.
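Drawing the irregular timesteps could then be sketched as follows, with shape k = 1/ν and the scale set so that the mean time-step is τ_µ; treating ν = 0 as the regular-sampling limit mirrors the description above.

```python
import numpy as np

def gamma_timesteps(n, tau_mu, nu, seed=None):
    """Sampling times with gamma-distributed steps of mean tau_mu."""
    rng = np.random.default_rng(seed)
    if nu == 0:
        dt = np.full(n, float(tau_mu))             # regular sampling limit
    else:
        k = 1.0 / nu                               # shape parameter k = 1/nu
        dt = rng.gamma(shape=k, scale=tau_mu / k, size=n)  # mean k*scale = tau_mu
    return np.cumsum(dt)

t = gamma_timesteps(128, tau_mu=40.0, nu=1.0)      # e.g. the nu = 1 case
```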

Performance metrics and performance plots
Our aim is to evaluate how the different methods perform in the estimation of scaling exponents for irregularly sampled paleoclimate data X(t) with different scaling behaviour, and of variable length and irregularity. In order to assess the accuracy and precision of our scaling exponent estimator Ĥ for a given set of parameters, we generate an ensemble of surrogate data and analyze its statistics. We evaluate three measures: the bias B for the accuracy of the estimator Ĥ with respect to the input H, the standard deviation σ for the precision of the estimator, and the root-mean-square error RMSE, which combines both previous measures:

B = ⟨Ĥ⟩ − H,  σ = (⟨(Ĥ − ⟨Ĥ⟩)²⟩)^{1/2},  RMSE = (B² + σ²)^{1/2}.

We exploit the geometric relation between the three to easily visualize the results. We summarize the results for a set of parameters by one point on a bias-standard deviation diagram (Fig. 3), where the x-axis gives the bias, the y-axis the standard deviation, and the distance from the origin then corresponds to the root-mean-square error. The bias and standard deviation for each parameter combination are calculated from a large ensemble of surrogate data (10 000 realizations).
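The three measures are straightforward to compute from an ensemble of estimates, as in this minimal sketch:

```python
import numpy as np

def performance(H_hat, H_true):
    """Bias, standard deviation and RMSE of an ensemble of estimates H_hat."""
    H_hat = np.asarray(H_hat, dtype=float)
    bias = H_hat.mean() - H_true                   # accuracy
    sigma = H_hat.std()                            # precision
    rmse = np.sqrt(bias**2 + sigma**2)             # distance from the origin
    return bias, sigma, rmse
```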

In order to test the methods with the sampling properties of typical paleo-proxy data, we consider an available database of marine and terrestrial proxy records for temperature (Rehfeld et al., 2018), which was also used for signal-to-noise ratio assessments by Reschke et al. (2019b). Each of the 99 sites covers at least 4000 years in the interval of the Holocene (8-0 kyr ago, 88 time series in total) and/or the LGM (27-19 kyr ago, 39 time series total) at a mean sampling interval of 225 years or lower. These records are irregularly sampled in time, and come with different sampling characteristics (Fig. 2).

Results

Using the methods described above to generate synthetic data, we can evaluate the ability of each method to recover the input scaling exponents. For each method, we calculate the bias and standard deviation over the ensembles of surrogate data, for input H between −0.9 and 0.9 in increments of 0.1. This wide range of H values covers all values encountered later in the multi-proxy database; the vast majority of climatic series fall within the even more restricted range of H ∈ [−0.5, 0.5].

First, we consider the ideal case of regular sampling, then we study the effect of irregular sampling and finally, the impact of the length of the time series.

Effect of Regular and Irregular Sampling
We evaluate the estimators for four cases pertaining to the resolution of the data, with a fixed length of 128 data points (Fig. 3). The first case considers "annual data" which was directly simulated and not degraded afterwards; it is therefore regular. It is shown for comparison with the second case, where we simulate 5120-year long series and then degrade them to 40-year resolution. This allows us to study the impact of the degrading method, which is necessary for producing irregular series. The third and fourth cases deal with series of 128 irregular timesteps drawn from a gamma distribution with skewness ν = 0.5 and ν = 1, respectively, and a mean parameter of 40 years, so that the resulting series have the same mean resolution as the second (regular) case. To illustrate the contribution of different frequency ranges to the precision and accuracy of the estimators, the scaling exponents were also fitted on sub-ranges of equal width in the logarithm of the timescales, which we refer to as the shorter, intermediate and longer timescales, corresponding to, respectively, 2-9.2 τ_µ, 4.3-19.9 τ_µ and 9.2-42.7 τ_µ, where τ_µ is the mean resolution of the timeseries (Figure 4).
For the "annual data" series, the scaling exponent H can generally be recovered for all values of H ∈ [−0.9, 0.9] with a standard devation below 0.13 and an absolute bias less than 0.06 (Fig. 3, top  q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q The second case of regular 40-year series yields similar results as the previous case, although there is no more aliasing since we sub-sampled the 5120-year long series at 40-year intervals after an 80-year low-pass filter was applied (Fig. 3, top right).
The MTM returns a consistent estimate with a variance which remains almost constant over the range of H. The estimator has little bias for stationary series with H < 0, but the bias steadily increases for higher values as the lower frequencies bend upward (Fig. 4). This seems to be related to the characteristic low bias of the MTM at the longest timescales (Prieto et al., 2007), which creates an inflection point (around ∆t ≈ 3000 years in Fig. 1); the power lost to its right appears redistributed to its left. Therefore, as H increases, the amount of power lost is more important and the inflection stronger. The LSP estimates are also largely unbiased until about H = 0.5; above this, a strong negative bias develops, particularly on the side of the smaller timescales (Fig. 4, Fig. 1).

The problem of irregularity is considered using irregular surrogate data with skewness parameter ν = 0.5 and ν = 1 (Fig. 3, bottom; Fig. 4). The positive bias which appears at the shorter timescales for the MTM corresponds very closely to the expected bias from the linear interpolation step, and it could potentially be accounted and corrected for. While it is outside the scope of this study to formally develop such a method, if we apply a first-order correction by dividing the MTM spectra by a sinc⁴ with time constant corresponding to the mean sampling resolution, we can greatly reduce the bias down to the Nyquist frequency (Fig. 1, Fig. S7).
The negative bias of the LSP evoked above for H > 0.5 amplifies with irregularity and even significantly affects a wider range of H values, down to H = −0.5 for the most irregular case (ν = 1). This is a consequence of higher-than-expected variance over the smaller timescales in the LSP when the slope is steep, as already identified by Schulz and Mudelsee (2002).

Effect of Time series length
While the irregularity had a larger impact on the bias of the estimators, the length of the time series mostly influences their variance (Fig. S8).
In the case of the MTM estimates, all values of H result in a similar standard deviation for a given resolution. When the resolution is doubled, the standard deviation decreases by a factor close to the expected √2 (by ∼1.4-1.7), going from σ ≈ 0.27 at the 160-year resolution to σ ≈ 0.07 at the 20-year resolution (Fig. S8). The bias also improves when the resolution increases, particularly for the higher values of H: for H = 0.9 it goes from B ≈ 0.27 at the 160-year resolution to B ≈ 0.10 at the 20-year resolution, while for negative H values it goes from B ≈ 0.05-0.07 to |B| < 0.01 for the same resolution change.
In the case of the LSP, up to H ≈ 0.1, the standard deviation of the estimates also decreases by a factor close to √2 (by ∼1.4-1.8, Fig. S8) with each doubling of resolution, going from σ ≈ 0.35 at the 160-year resolution to σ ≈ 0.08 at the 20-year resolution, and the bias improves only slightly since it was already small (|B| < 0.04). For the higher H values, the standard deviation improves less (by ∼1.2-1.4 at H = 0.9, Fig. S8), going from σ ≈ 0.43 at the 160-year resolution to only σ ≈ 0.19 at the 20-year resolution. At the same time, the bias change becomes more positive, thus slightly compensating the overall strong negative bias characteristic of the LSP estimates for very high H values.

In the case of the HSF, when increasing the resolution, the standard deviation of the estimates improves more for the lower H values than for the higher values, improving by a factor of ∼1.4-1.8 for H = −0.9 compared to ∼1.2-1.4 for H = 0.9 (Fig. S8). Overall, the standard deviation improves from σ ≈ 0.20-0.25 at the 160-year resolution to σ ≈ 0.06-0.09 at the 20-year resolution. The slight negative bias found for the H > −0.5 estimates (|B| < 0.04) practically vanishes, but the positive one for H < −0.5 only decreases slightly.

Application to database
In order to see how these results translate to typical proxy records, we perform the analysis with surrogate data with sampling characteristics and scaling behaviour directly extracted from the database of Holocene and LGM records. We make the assumption that the series are approximately power-law scaling and that they can therefore be modelled by fractional noise described by a scaling exponent H. Since the best H to describe the approximate scaling behaviour of a series is unknown, we make an initial approximation with each method and take the median of the three results as the reference value for the given series, with which we generate an ensemble of surrogates with the same sampling scheme as the given series. We use the real timesteps in order to evaluate the impact of each specific sampling scheme. Again, we consistently fitted up to one third of the length (see section 2.1.3), but determined empirically the best minimum fitting scale τ_min such that it minimizes the RMSE of the estimator with respect to the reference H used to generate the surrogate data.

We found the HSF to yield the smallest τ_min of the three methods, with τ_min below twice the mean resolution τ_µ for 121 out of 127 series, and even below τ_µ for 43 out of 127. The LSP yielded similar results, with 109 out of 127 series having τ_min below twice τ_µ, and 41 out of 127 having τ_min even below τ_µ. This contrasts with the MTM, which almost never suggests best results for τ_min below twice τ_µ, but rather an average minimum fitting scale of τ̄_min = 3.3 τ_µ, compared to τ̄_min = 1.2 τ_µ for the HSF and τ̄_min = 1.3 τ_µ for the LSP. This underlines that, because interpolation-free methods give more reliable estimates at shorter timescales, they allow better usage of the full data.
The HSF yields the best results of all three methods, both in terms of variance of the estimates and in terms of bias, with a mild tendency towards positive biases. The MTM also tends to show positive biases, albeit higher, while the LSP tends to show negative biases. Both the MTM and the LSP generally show higher variance of their estimators than the HSF (Fig. 7), although for different reasons. The MTM has higher variance because the higher τ_min does not allow the smaller timescales to be used in the estimation procedure, while the LSP shows higher variance because of the timeseries with high input H, for which it performs poorly, especially when the data are irregular (Fig. 3). On average, the HSF method gave the lowest RMSE = 0.11 ± 0.04, compared to RMSE = 0.30 ± 0.15 and RMSE = 0.19 ± 0.15 for the MTM and the LSP, respectively. The poor performance of the LSP compared to the HSF stems from the higher H values, since it performs as well as the HSF when H < 0, with RMSE = 0.12 ± 0.07 (Fig. 8).

Discussion
Our comparative study indicates that for irregular timeseries the methods requiring interpolation, i.e. the CPG and the MTM, are less effective at evaluating variability across timescales than the methods without interpolation, i.e. the LSP and the HSF. In the case of regular timeseries, however, all methods were found to perform similarly (Fig. 3b) for a wide range of input H, and only showed bias at the fringes of the H range considered. In addition, we found that the choice of method should also be informed by the characteristics of the underlying process measured, since there was an observed dependence between the performance metrics and the scaling exponent H which generated the surrogate data. As such, the LSP may only be appropriate for irregular series suspected of having H near or below H = −0.5, while the HSF shows better reliability even when H is near or above H = 0. It might not be so surprising that the LSP performed poorly for higher H values, since it was developed to approximate the CPG for stationary noise processes (Scargle, 1982) and processes with H > 0 are non-stationary, i.e. their variance grows with the length of the series considered.

In addition to the irregularity, we also compared how the resolution of the timeseries affects the estimators of H. While making the series more irregular mostly affected the accuracy of the estimates, i.e. their bias, without affecting their precision, i.e. their standard deviation, increasing the resolution mostly improved the precision, and the accuracy only to a lesser extent.
Increasing the resolution naturally improves the estimates as it provides more information, and also because it decreases the relative weight of the most uncertain timescales: the smallest timescales, near the sampling resolution, and the longest timescales, near the length of the series.
Our conclusions, however, rely on the real data having similar properties to the surrogate data used to validate the methods, and the conclusions may differ if the real data have properties different from the assumed power-law scaling. It is difficult to say, for example, which method would perform best if the analyzed data contained two different scaling regimes, one with low H and the other with high H (Lovejoy and Schertzer, 2013). Furthermore, in order to degrade the simulated annual timeseries into irregular paleoclimate-like data, we had to use simplified numerical methods to mimic the physical processes and manipulations leading to the recording and retrieval of paleoclimate archives. For our purpose, we chose to low-pass filter the data and sub-sample, since this is very common in sedimentary data and ice cores because of, respectively, bioturbation and diffusion (Dolman and Laepple, 2018; Dolman et al., 2020; Kunz et al., 2020). However, reality is more complex, and other processes could significantly alter the recorded signal and bias the variability estimates in ways not accounted for by our surrogate model. For specific case studies, it is advisable to develop forward models (Stevenson et al., 2013; Dee et al., 2017; Dolman and Laepple, 2018; Casado et al., 2020) of the studied paleoclimate archives in order to understand potential biases generated by the recording and the specific sampling.
In summary, it is difficult to ascertain which method is the "best", since the answer depends on the irregularity, the resolution and, most importantly, the underlying H values. In this respect, the HSF is possibly the safest method for paleoclimate applications since it only performs poorly for series with H < −0.5, a behaviour almost unseen in climatic timeseries at timescales longer than decadal. Although the LSP gives equally good results for H ≲ 0, and even better ones near H = −0.5, i.e. white noise, its increasing bias and standard deviation for higher H values make it an uncertain choice for timescales longer than centennial, since climate timeseries at those timescales often show H ≈ 0 and greater. The MTM produces good estimates for relatively regular series, but struggles with highly irregular timeseries since it must rely on interpolation. While this produces a large bias at the shorter timescales, which we have thus discarded, the bias is rather consistent and methods could be developed to correct for it (e.g. Fig. S7). On the other hand, the longer timescales of the MTM are relatively well estimated, and it is appropriate for estimating the absolute variance over a timescale band well above the mean resolution (Rehfeld et al., 2018; Hébert et al., 2021).
Further, the effects of proxy biases are well studied for the power spectrum (Dolman et al., 2020), mainly due to the known properties of the Fourier transform. Our study is not comprehensive, and other methods exist, such as the bias-corrected version of the Lomb-Scargle periodogram (REDFIT; Schulz and Mudelsee, 2002) and the z-transform wavelet (Zhu et al., 2019). More precise estimates of the scaling exponent can be obtained by parametric methods based on maximum likelihood, but they require strong assumptions about the underlying process (Del Rio Amador and Lovejoy, 2019). We aimed to cover the most commonly used methods that also allow for a simple interpretation of variability across timescales, either as they represent the well-studied power spectrum, or the Haar fluctuations which directly provide changes in amplitude.

Conclusions
Characterizing the variability across timescales is important for understanding the underlying dynamics of the Earth system. It remains challenging to do so from paleoclimate archives, since they are more often than not sampled irregularly in time, and traditional methods for producing timescale-dependent estimates of variability, such as the classical periodogram and the multitaper spectrum, generally require regular time sampling.
We have compared those traditional methods using interpolation with interpolation-free methods, namely the Lomb-Scargle periodogram and the first-order Haar structure function. The ability of those methods to produce timescale-dependent estimates of variability when applied to irregular data was evaluated in a comparative framework. The metric we chose to compare them is the scaling exponent, i.e. the linear slope in log-transformed coordinates, since it summarizes the behaviour of the variability over a given timescale band. Doing so assumes power-law scaling, a behaviour which is often observed, at least approximately, in geophysical timeseries (Mandelbrot and Wallis, 1968; Cannon and Mandelbrot, 1984; Pelletier and Turcotte, 1999; Malamud and Turcotte, 1999; Fedi, 2016; Corral and González, 2019). To evaluate our estimators, we generated fractional noise as surrogate annual timeseries characterized by a given scaling exponent H, also known as fractional Gaussian noise when stationary (H < 0) and fractional Brownian motion when non-stationary (H > 0). The annual timeseries were then degraded to resolutions characteristic of paleoclimate archives for the recent Holocene.
We found that for scaling estimates in irregular timeseries, the interpolation-free methods are to be preferred over the methods requiring interpolation, as they allow the information from shorter timescales to be used without introducing additional bias. In addition, our results suggest that the Haar structure function is the safer choice of interpolation-free method, since the Lomb-Scargle periodogram is unreliable when the underlying process generating the timeseries is not stationary.

This conclusion was reinforced by the application to the proxy database, which indeed showed the Haar structure function to give more reliable estimates over a wide range of H. Given that we cannot know a priori what kind of scaling behaviour is contained in a paleoclimate timeseries, and that this behaviour may also change as a function of timescale, it is a desirable characteristic for the method to handle both stationary and non-stationary cases alike.
Code and data availability. The data are available as a supplement to Rehfeld et al. (2018).