Comparing estimation techniques for temporal scaling in palaeoclimate time series

. Characterizing the variability across timescales is important for understanding the underlying dynamics of the Earth system. It remains challenging to do so from palaeoclimate archives since they are more often than not irregular, and traditional methods for producing timescale-dependent estimates of variability, such as the classical pe-riodogram and the multitaper spectrum, generally require regular time sampling. We have compared those traditional methods using interpolation with interpolation-free methods, namely the Lomb–Scargle periodogram and the ﬁrst-order Haar structure function. The ability of those methods to produce timescale-dependent estimates of variability when applied to irregular data was evaluated in a comparative framework, using surrogate palaeo-proxy data generated with realistic sampling. The metric we chose to compare them is the scaling exponent, i.e. the linear slope in log-transformed co-ordinates, since it


Introduction
Palaeoclimate archives are crucial for improving our understanding of climate variability on decadal to multi-centennial timescales and beyond (Braconnot et al., 2012).They provide independent data for the evaluation of the climate models which are used to project future climate change.To this end, quantitative estimates of variability based on palaeoclimate records allow us to compare past changes in variance over a wide range of timescales (e.g.Laepple and Huybers, 2014b, a;Rehfeld et al., 2018;Zhu et al., 2019) and to compare how variance is distributed across different timescales (e.g.Mitchell, 1976;Huybers and Curry, 2006;Lovejoy, 2015;Nilsen et al., 2016;Shao and Ditlevsen, 2016).
We generally refer to scaling as the statistical characterization of the changes in climate variability as a function of timescale τ or, equivalently, as a function of frequency f , such that f = τ −1 .The term scaling often implies, although not necessarily, power law scaling.A stochastic process is said to be power law scaling over a range of timescales [τ 1 , τ 2 ] if a timescale-dependent statistical metric S(τ ) approximately follows a power law relationship such that S(τ ) ∝ τ a , where a is a general power law scaling ex-Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
ponent.Therefore, the power spectrum of such processes will appear linear on a log-log plot over a given range of timescales (Percival and Walden, 1993).The corresponding power law scaling exponent can be informative of the underlying dynamics of the system, such as the degree of temporal auto-correlation, i.e. the system's memory (Mandelbrot and Wallis, 1968;Lovejoy et al., 2015;Graves et al., 2017;Fredriksen and Rypdal, 2015;Del Rio Amador and Lovejoy, 2019).While assuming power law scaling may be a simplification, it is a rather accurate first-order description for a vast range of geophysical processes (Cannon and Mandelbrot, 1984;Pelletier and Turcotte, 1999;Malamud and Turcotte, 1999;Fedi, 2016;Corral and González, 2019).
Methods used for scaling analysis generally assume that the process under investigation has been sampled at regular time steps.This is appropriate for some instrumental observations and annually resolved palaeoclimate archives such as tree and coral rings.However, since most palaeoclimate archives are the product of slow and intermittent accumulation in sediments or ice sheets, sampling them at regular depth intervals translates to irregular time intervals (Bradley, 2015).In addition, the irregular accumulation process usually has to be reconstructed and necessarily introduces age uncertainty (Rehfeld and Kurths, 2014), although it does not affect the scaling estimates strongly (Rhines and Huybers, 2011).
Therefore, the primary challenge is that scaling analysis methods need to be adapted for the analysis of sparse and irregular series.There are two approaches to this problem that can be distinguished.
First, an interpolation routine can be employed prior to the analysis in order to regularize the series.Once the series are regular, traditional methods such as the classical periodogram (CPG; Schuster, 1898; see Table 1 for a list of all the acronyms used in this paper) or the multitaper spectrum method (MTM; Thomson, 1982) can be used.See this approach used in a palaeoclimatology context in Huybers and Curry (2006), Laepple and Huybers (2014a, b) and Rehfeld et al. (2018) and an implementation of these methods in R for regular climate data, including functions for statistical testing, scaling exponent estimation and trend estimations for different residual models provided by Vyushin et al. (2009).
Second, the estimator can be adjusted for arbitrary sampling times.The so-called Lomb-Scargle periodogram (LSP; Lomb, 1976;Scargle, 1982;Horne and Baliunas, 1986;Van-derPlas, 2018) was developed in the field of astronomy to identify periodic components in noisy astronomical time series with sampling hiatus and was often used to analyze laser Doppler velocimetry experiments, which produce irregularly sampled velocity data (Benedict et al., 2000;Munteanu et al., 2016;Damaschke et al., 2018), and for the detection of biomedical rhythms (Schimmel, 2001).The LSP has sometimes been used in a palaeoclimatological context (Schulz and Stattegger, 1997;Trauth, 2020), although it may introduce additional bias and variance (Schulz and Stattegger, 1997;Schulz and Mudelsee, 2002;Rehfeld et al., 2011).More recently, Lovejoy and Schertzer (2012) advocated for the use of the Haar structure function (HSF), based on Haar wavelets (Haar, 1910), in geophysics due to its ease of interpretation and accuracy.Incidentally, the HSF can easily be adapted for irregular time series.
Another method which is often used for scaling analysis is the detrended fluctuations analysis (DFA Peng et al., 1995;Kantelhardt et al., 2001Kantelhardt et al., , 2002)), which has also been applied to climatic and palaeoclimatic time series (Koscielny-Bunde et al., 1996;Rybski et al., 2006;Shao and Ditlevsen, 2016).However, a prestudy showed that it was less efficient for irregular time series, and we decided to omit it for clarity.In addition, it underestimates variance at any given timescale because of the necessary detrending (Nilsen et al., 2016), and it is therefore of limited use beyond the estimation of scaling exponents.
In the present work, we compare the different methods for the scaling of variability and make them accessible in a single software package.Our main aim is to assess their ability to estimate variability across timescales in a palaeoclimatological context which often entails scarce and irregular time series.In order to benchmark the methods, we evaluate them on surrogate data with known properties similar to those of palaeoclimate records without abrupt transitions.Finally, we apply the methods to a database comprising Last Glacial maximum (LGM) and Holocene records to evaluate their performances on real data.

Data and methods
In this section, we outline the different methods considered to evaluate scaling and how they can be compared.We also provide a method to generate palaeo-proxy surrogate data with realistic variability and sampling, which is then used to test and compare the scaling methods.

Scaling estimation methods
Scaling generally refers to the behaviour of a quantity S(τ ) as a function of scale τ (or frequency f such that f = τ −1 ) for a given process X(t).In the current work, we exclusively consider time series, but the same methods can be used to investigate spatial scaling relationships.The quantity S(τ ) considered can be the statistical moments of an appropriately defined fluctuation X, such as the power spectral density or Haar fluctuations.It is usual to assume such a form to define a structure function for the statistical moments (Schertzer and Lovejoy, 1987;Lovejoy and Schertzer, 2012) of the process under investigation as follows: where " . . ." denotes ensemble averaging over all fluctuations j available at the given scale τ , q is the statistical mo- Fractional Brownian motion Mandelbrot and Van Ness (1968) ment (i.e.q = 1 corresponds to the mean, q = 2 to the variance, q = 3 to the skewness and so on), and ξ(q) = qH − K(q) is the exponent function, where H is the fluctuation scaling exponent and K(q) is the moment scaling function.K(q) is zero for some monofractals such as the well-studied case of Gaussian processes and, thus, for these specific cases, all statistical moments scale similarly, i.e. ξ(q) ∝ qH .The equivalent metric for the power spectrum method is the spectral scaling exponent β (see Sect. 2.1.1 for a formal definition).The two scaling exponents can be related by the following: where K(q) at q = 2 is used since the spectrum is a secondorder statistic.This relation can be understood intuitively since β describes the scaling of the power spectral density obtained via the Fourier transform of the auto-covariance function, whereas H describes the scaling of the real space fluctuations.Therefore, since the auto-covariance is proportional to the expectation value of the (zero mean) time series squared, the exponent H is multiplied by two, while the integration in the definition of the Fourier transform increases the scaling exponent by +1.
In this work, we will focus on the (monofractal) quasi-Gaussian case, i.e. when statistics approximately follow the Gaussian distribution, in order to minimize the number of estimated parameters.Palaeoclimate archives often lack the resolution and/or length to estimate many parameters with confidence.This assumption is rather well justified for temperature and precipitation time series at timescales longer than the turbulent weather regime (i.e.timescales longer than weeks or a few years, depending on the location and medium; Lovejoy and Schertzer, 2012), but it is unclear over what range of time and timescales it may hold.For a series on glacial-interglacial timescales comprising a Dansgaard-Oeschger event, multifractality has been observed (Schmitt et al., 1995;Shao and Ditlevsen, 2016), and the Gaussian approximation does not hold.This is also the case for highly intermittent archives which clearly display multifractality, such as volcanic series (Lovejoy and Varotsos, 2016) or dust flux (Lovejoy and Lambert, 2019), which would also require the intermittency correction from the moment scaling function K(q).
Under the quasi-Gaussian assumption, we can simplify Eq. (2) since this implies K(2) ≈ 0 and, therefore, the following: (3) This allows us to convert estimated β (where ∧ denotes an estimator for the given quantity) into their equivalent estimated fluctuation scaling exponents Ĥ .We use the fluctuation scaling exponent H for intercomparison between the methods in this study.H is a natural choice as it directly describes the behaviour of fluctuations in real space and is the usual parameter in functions for generating fractional noises (Mandelbrot and Van Ness, 1968;Mandelbrot, 1971;Molz et al., 1997).While the fluctuation exponent H takes its origin in the so-called Hurst exponent (Hurst, 1956), its meaning and definition have evolved and changed over time (see Graves et al. (2017) and Lovejoy et al. (2021) for historical summaries).
The process for estimating the scaling exponents can be divided into three steps.First, if the series is irregularly spaced, it requires regularization in order to be usable for the CPG and the MTM.The regularization is not necessary for the HSF and the LSP, which have the advantage that they can be calculated directly on the irregular time series.Second, the fluctuations proper to each method are calculated as a function of timescale, and finally, the scaling exponent is fitted on the result.

Power spectrum
For an ergodic, weakly stationary stochastic process, the power spectrum P of a single realization X is given by the Fourier transform of the auto-covariance function γ and, equivalently, the squared Fourier transform of the signal, as follows (von Storch and Zwiers, 1984): where τ is the timescale, T is the temporal coverage of X, and F denotes the Fourier transform operator.Equivalently, https://doi.org/10.5194/npg-28-311-2021Nonlin.Processes Geophys., 28, 311-328, 2021 the power spectrum can be given as a function of the frequency f = τ −1 instead of the timescale τ .We choose to write it as a function of timescale to allow for a visual comparison with the Haar method below, and also because it is more intuitive; a non-expert can easily grasp what the 1000year timescale means rather than the equivalent 0.001-year −1 frequency.

Classical periodogram
The power spectrum for a discrete process X with N time steps at regular intervals τ 0 can be estimated using the classical periodogram as follows (Chatfield, 2013): If log(P (τ )) behaves linearly over a range of logtransformed timescales log(τ ), the time series is considered to be power law scaling over this timescale band, i.e.P (τ ) ≈ Aτ β for an arbitrary constant A.

Multitaper spectrum
The MTM method improves upon the CPG by producing independent estimates using a set of orthogonal functions.The prolate spheroidal wave functions h t,k (Slepian and Pollak, 1961), also known as the Slepian tapers, have the desirable property of minimizing spectral leakage (Thomson, 1982).The spectral individual estimates for the kth taper can be written in a form similar to the periodogram above, as follows: The estimator can then be expressed as the mean of the K tapered estimates as follows:

Interpolation
To produce the power spectrum with the methods above, it is necessary to interpolate the series at a regular resolution.Following Laepple and Huybers (2013), the data were first linearly interpolated to 10 times the mean resolution and then low-pass filtered, using a finite response filter with a cut-off frequency of 1.2 divided by the target mean resolution in order to avoid aliasing.Linear interpolation corresponds to a convolution in the temporal domain with a triangular window.The Fourier transform of the triangular window is sinc 2 , where the sinc function is defined as sinc = x −1 sin(x), and therefore, a linear interpolation multiplies the power spectrum by sinc 4 modulated by the resolution of the interpolation (Smith, 2011), resulting in a power loss near the Nyquist frequency.

Lomb-Scargle periodogram
As an alternative to the classical periodogram which requires regular data, Scargle (1982) introduced the LSP as a generalized form of the CPG, as follows (Eq.5): where A, B and T are arbitrary functions of timescale τ and sampling times t j , which can be irregular.We see that, if A = B = 2 N and T = 0, we would recover the classical periodogram in case of regular sampling, but the reduction is not unique and other choices of A and B can be made.The periodogram estimates are chi-squared distributed with 2 degrees of freedom (i.e. an exponential distribution), and therefore, A and B will be chosen to retain this property.For independent and identically distributed white noise, it is the case when, as in the following: The CPG is invariant to time translation since shifting the time steps by a constant value only affects the phase of the complex exponential inside the absolute value.The function T is thus introduced for the generalized periodogram to retain that property, which is the case when, as in the following: The LSP has been mostly used in astronomy to detect periodic components in noisy irregular data.As outlined above, the functions A and B are chosen following the assumption that the process X is approximately white noise with no temporal correlation.This assumption is, of course, problematic from the perspective of climate time series which usually exhibit long-range temporal correlations, but as we will see later, good estimates of scaling exponents can nonetheless be recovered over a wide range of H .

Haar structure function
The HSF allows us to perform the scaling analysis in real space, i.e. without performing a Fourier transform.When used to define a structure function, Haar wavelets are appropriate for estimating the fluctuation exponent of processes with H ∈ (−1, 1) (Lovejoy and Schertzer, 2012), a range covering most geophysical processes.Therefore, and also because of its readiness to be applied to irregular series and its ease of interpretation, Lovejoy and Schertzer (2012) argued that it makes it a convenient choice for the scaling analysis of geophysical time series.
Haar fluctuations H are simple to implement for irregular sampling.They can be defined for a given interval of length τ as the difference between the mean of the first half of the interval with the second half, as follows: The discrete sampling t i does not have to be regular since the average of the available fluctuations in the intervals is taken.This is, of course, an approximation since the fluctuations available might not exactly correspond to the expected timescale.
The first-order HSF S H,q=1 can now be defined by using the Haar fluctuation defined in equation 1 and letting q = 1, as follows: Higher-order structure functions can be defined by considering other values of q and can be used to analyze multifractal processes, but since we are only dealing with the (monofractal) quasi-Gaussian case in this paper, q = 1 is sufficient.For simplicity, we refer to the first-order HSF simply as HSF.Since we are taking the average value of the absolute of the fluctuations, if the fluctuations are distributed in a quasi-Gaussian fashion, then we expect the distribution of Haar fluctuations at each scale to be a half-normal distribution, i.e. a zero-mean Gaussian distribution folded along zero.The mean µ HG of such half-Gaussian distribution, which can be obtained by taking the absolute of a Gaussian distribution, is proportional to the standard deviation σ G of the original Gaussian distribution, i.e. µ HG = 2 π σ G .Since the sign of the Haar fluctuation before taking the absolute is arbitrary, we can also consider the negative of those fluctuations to create an ensemble which is even closer to a Gaussian distribution and has mean equal zero by construction.We then estimate the mean absolute Haar fluctuation using the definition with the standard deviation given above.This procedure marginally improves the estimation procedure over directly taking the average of the absolute of the Haar fluctuations available.

Slope estimation
For a power law relationship between variables x and y, such as y = Ax B , it is usual to use standard least square fitting to find the coefficient A and the exponent B as the linear coefficients of the equivalent linear relationship after taking the logarithm of the equation, such that log y = B log x + log A. Least square fitting assumes the residuals are normally distributed, which is often a good approximation for the logarithm of the power spectral density of a stochastic process at a given frequency.
In the case of stationary Gaussian processes, it can be shown that the CPG and LSP estimates at a given timescale are distributed like a chi-square distribution with degrees of freedom equal to 2 (i.e. an exponential distribution), and for the MTM, it is approximately twice the number of tapers, albeit slightly less depending on their degree of dependence.The logarithm of the distributions mentioned above is similar enough to a normal distribution to obtain reasonable estimates with an ordinary least-square fit.Another option is to use a generalized linear model with a gamma distribution model of which the chi-square is a special case.While very similar results are obtained for both fitting methods, we chose the generalized linear model method, since it is theoretically more justified.
For a Gaussian process, the first-order Haar fluctuations will follow a normal distribution, but as we are fitting the absolute value of the fluctuations, the estimates at a given timescale will follow a half-normal distribution, as mentioned above.Although the half-normal distribution is not a specific case of the gamma distribution but rather the generalized gamma distribution, we also use the generalized linear model to estimate its scaling exponent for practical purposes.

Timescale band for slope estimation
For all methods, we need to select a range of timescales over which to estimate the scaling exponent using the generalized linear model.The maximum fitting timescale τ max used was always one-third of the largest scale available in order to maximize the range of timescales used, while avoiding the longer timescales which have poor statistics, in the case of the HSF (Lovejoy and Schertzer, 2012), and underestimate variance, in the case of the MTM because of its known bias (Prieto et al., 2007) and the usual linear detrending.Since the impact of irregularity is not important at long timescales, we chose this same τ max for all methods for consistency.The optimal minimum timescale τ min , on the other hand, can vary, depending on the method used, and requires careful treatment.
For the CPG and the MTM, we fitted above the timescale corresponding to the Nyquist frequency, i.e. twice the resolution, for regular series.In the case of irregular data, since the interpolation brings about a power loss at small timescales (Schulz and Stattegger, 1997;Rhines and Huybers, 2011; https://doi.org/10.5194/npg-28-311-2021 Nonlin.Processes Geophys., 28, 311-328, 2021 R. Hébert et al.: Scaling of palaeoclimate time series Kunz et al., 2020), fitting over the small timescales produces a positive bias on the slope estimation.Therefore, when estimating the scaling exponents for the irregular series, we consistently fitted above 3 times the mean resolution τ µ as a compromise (i.e.1.5 times the Nyquist frequency).This choice of τ min was informed (as for the methods below) from the results using the palaeoclimate database (see Sect. 3.3).
For both the HSF and LSP, we used twice the resolution as τ min for the regular cases, and we used τ min = 2τ µ for the irregular cases.

Surrogate data
To test the methods, we produce surrogate data with the same characteristic as the palaeoclimate archives, namely a given scaling behaviour and an irregular sampling in time.

Simulation of power law noise
A classical example of a non-stationary process (H > 0) is normal Brownian motion (H = 1 2 ), which is produced by (integer order) integration of a normal Gaussian white noise (H = − 1 2 ).A process with a given scaling behaviour can be obtained by fractional, rather than integer, order integration (or differentiation) of a given set of innovations, i.e. a random series of uncorrelated values obtained from a certain distribution.To generate our surrogate data, we consider the simplest case for which the innovations are drawn from a normal Gaussian distribution.This leads to the classical fractional Gaussian noise (fGn) and fractional Brownian motion (fBm; Mandelbrot and Van Ness, 1968;Mandelbrot, 1971;Molz et al., 1997).For any fGn process, there is a related fBm process which can be obtained by (integer order) integration, which increases the scaling exponent of the process by +1.It is usual to define the associated fGn and fBm processes by the same scaling exponent h ∈ (0, 1), which directly describes the scaling behaviour of the HSF for the fBm or for the fGn after (integer order) integration.However, it is inconvenient to use the same parameter for both fGn and fBm when considering them in a common framework, and we prefer to also identify the fGn by the scaling behaviour of its HSF rather than that of its integral, such that it has H ∈ (−1, 0).In the current paper, we generally refer to both fGn and fBm as "fractional noise" described by an exponent H ∈ (−1, 1).They are generated using an algorithm developed by Franzke et al. (2012).
Series with H ∈ (−0.5, −0.3) are typical of monthly land air surface temperature up to decadal timescales, while series with H ∈ (−0.3, 0) are more typical of sea surface temperature over similar timescales.Non-stationary behaviour with H > 0 is typically observed in pre-Holocene series comprising Dansgaard-Oeschger events (Nilsen et al., 2016).

Generation of irregular palaeo-series
To produce irregularly sampled series akin to palaeoclimate archives, an annual resolution series with a given scaling exponent is first produced, with the above method, and then degraded at the desired resolution using two different methods.The first one is to simply block average, and the second one is to subsample a low-pass filtered version of the series.
For the block averaging method, boundaries are determined as the midpoint between subsequent time steps, and all data in between are averaged.This corresponds, in the temporal domain, to a convolution with a square window, or, equivalently in the Fourier domain, to a multiplication of the Fourier transform with the sinc function (sinc(x) = x −1 sin(x)).Therefore, the power spectrum is multiplied by a sinc 2 , and this brings about a loss of power at the high frequencies.For the second method, the timescale of the lowpass filter is taken as twice the mean resolution of the series, and the filtered series is simply subsampled at the desired time steps.In the frequency domain, the filtering corresponds to multiplying the spectrum by a square function, which cuts off variability below the specified timescale, and therefore, this is useful for reducing the aliasing of higher frequencies.
The first method would correspond to an archive sampled without gaps and where there is no smoothing of the signal, for example speleothem and varved sediments, if samples containing several layers are taken or marine records when the sample distance is smaller than the typical mixing depth in the sediment (Berger and Heath, 1968).The second method would correspond to archives with spacedout sampling (for example, 1 cm every 10 cm) and including processes such as bio-turbation and diffusion which smooth out the high-frequency signal, for example biochemical and ecological data extracted from sediment cores (Dolman and Laepple, 2018;Dolman et al., 2021).We show the second method as our main result, since it is more common to have archives with such smoothing processes, but also because it removes most of the aliasing effect from our results.It, therefore, leaves a clearer picture of the other effects inherent to each method.It is, however, an idealization since the smoothing timescale of the physical processes involved is not related to the resolution; it should be independently estimated and accounted for in applied studies, although it is seldom reported.All the results were also computed for the block average method and are shown in the Supplement (Figs.S1-S5 and S7-S10).
Previous studies have shown that the distribution of sampling times for typical sedimentary records can be approximated by a gamma distribution (Reschke et al., 2019a).To systematically test the impact of increasingly irregular sampling, we thus draw the time steps from a gamma distribution defined by its shape parameter k (or skewness ν, such that k = ν −1 ) and mean parameter µ which corresponds to the mean time step τ µ .When the skewness parameter is ν = 0, then we have regular sampling of width µ. Figure 1 shows an example of such a surrogate series at annual resolution and an irregularly degraded version (with time steps drawn from a gamma distribution with ν = 1), along with the results of applying the three scaling estimation methods considered for the main analysis, i.e. the MTM, the HSF and the LSP.We omit a discussion of the results computed using the CPG as they are generally very similar to those of the MTM, albeit with higher variance.

Performance metrics and performance plots
Our aim is to evaluate how the different methods perform in the estimation of scaling exponents for irregularly sampled palaeoclimate data X(t), with different scaling behaviour and of variable length and irregularity.In order to assess the accuracy and precision of our scaling exponent estimator Ĥ for a given set of parameters, we generate an ensemble of surrogate data X(t) and analyze its statistics.We evaluate three measures, namely the bias B for the accuracy of the estimator Ĥ with respect to the input H , the standard deviation σ for the precision of the estimator and the root mean square error (RMSE), which combines both previous measures as follows: We exploited the geometric relation between the three to easily visualize the results.We summarize the results for a set of parameters by one point on a bias-standard deviation diagram (Fig. 4), where the x axis gives the bias, the y axis the standard deviation and the distance from the origin then corresponds to the root mean square error.The bias and standard deviation for each combination are calculated from a large ensemble of surrogate data (10 000 realizations).

Data
In order to test the methods with the sampling properties of typical palaeo-proxy data, we consider an available database of marine and terrestrial proxy records for temperature (Rehfeld et al., 2018), which was also used for signal-to-noise ratio assessments by Reschke et al. (2019b).Each of the 99 sites covers at least 4000 years in the interval of the Holocene (8-0 kyr ago; 88 time series in total) and/or the LGM (27-19 kyr ago; 39 time series total) at a mean sampling interval of 225 years or lower.These records are irregularly sampled in time and come with different sampling characteristics (Fig. 2).

Results
Using the methods described above to generate synthetic data, we can evaluate the ability of each method to recover the input scaling exponents.For each method, we calculate the bias and standard deviation over the ensembles of surrogate data, for input H between −0.9 and 0.9 in increments of 0.1.This wide range of H values covers all values encountered later in the multiproxy database; the vast majority of climatic series even fall within an even more restricted range of H ∈ [−0.5, 0.5].First, we consider the ideal case of regular sampling, then we study the effect of irregular sampling and finally, the impact of the length of the time series.

Effect of regular and irregular sampling
We evaluate the estimators for four cases pertaining to the resolution of the data and always a fixed length of 128 data points (Figs. 3 and 4).The first case considers annual data, which was directly simulated and not degraded afterwards; it is, therefore, regular.It is shown for comparison with the second case, where we simulate 5120-year-long series and then degrade them to 40-year resolution.This allows us to study the impact of the degrading method, which is necessary for producing irregular series.The third and fourth cases deal with series of 128 irregular time steps drawn from a gamma distribution with skewness ν = 0.5 and ν = 1, respectively, and a mean parameter of 40 years so that the resulting series have the same mean resolution as the second (regular) case.
To illustrate the contribution of different frequency ranges to the precision and accuracy of the estimators, the scaling exponents were also fit on subranges of equal width in the log of the timescales, which we refer to as the shorter timescales, the intermediate timescales and the longer timescales, corresponding to, respectively, 2-9.2 τ µ , 4.3-19.9τ µ and 9.2-42.7 τ µ , where τ µ is the mean resolution of the time series (Fig. 5).
For the annual data series, the scaling exponent H can generally be recovered for all values of H ∈ [−0.9, 0.9], with a standard deviation below 0.13 and an absolute bias less than 0.06 (Fig. 4; top left), except for the LSP when H > −0.1, as it increasingly underestimates higher values of H .The small bias observed in the MTM and LSP estimates stems from the shorter timescales (Fig. 5), which are sensitive to aliasing of the power below the Nyquist frequency, and, therefore, the measured scaling slope of series with negative β (i.e.H < −0.5) are biased high, whereas increasingly negative bias is obtained for increasingly positive values of β (H > −0.5).It follows that the flat case of β = 0 (H = −0.5) is unbiased by aliasing since the aliased power is likewise flat.The HSF-based estimates also have a significant bias in the other direction when the series considered have an input H decreasing below H = −0.5 (Fig. 3).
The second case of regular 40-year series yields similar results to the previous case, although there is no more aliasing since we subsampled the 5120-year-long series at 40year intervals after an 80-year low-pass filter was applied (Fig. 4; top right).The MTM returns a consistent estimate with a variance which remains almost constant over the range https://doi.org/10.5194/npg-28-311-2021 Nonlin.Processes Geophys., 28, 311-328, 2021    of H .The estimator has little bias for stationary series with H < 0, but it steadily increases for higher values as the lower frequencies bend upward (Fig. 5).This seems to be related to the biased low characteristic of the MTM at the longest timescales (Prieto et al., 2007), which creates an inflection point (around t ≈ 3000 years on Fig. 1), and the power lost to its right appears redistributed to its left.Therefore, as H increases, the amount of power lost is more important and the inflection stronger.The LSP estimates are also largely unbiased until about H = 0.5, and above it, a strong negative bias is developed, particularly on the side of the smaller timescales (Figs. 5, 3 and 1).
The problem of irregularity is considered, using irregular surrogate data skewness parameter ν = 0.5 and ν = 1 (Fig. 4; bottom row).In the case of the MTM, we observe a consistent bias for all H of about 0.5 and 0.1 for the shorter and intermediate timescales, respectively, for the weakly irregular case (ν = 0.5) and practically no bias (0.02) at the longer timescales (Fig. 5).This corresponds very closely to the expected bias from the linear interpolation step, and it could potentially be accounted and corrected for.While it is outside of the scope of this study to formally develop such a method, if we apply a first-order correction by dividing the MTM spectra by a sinc 4 with time constant corresponding to the mean sampling resolution, we can greatly reduce the bias down to the Nyquist frequency (Figs. 1 and S11).
The negative bias for the LSP evoked above for H > 0.5 amplifies with irregularity and even significantly affects a wider range of H values, down to H = −0.5 for the most irregular case (ν = 1.).This is a consequence of higher than expected variance over the smaller timescales in the LSP when the slope is steep, as already identified by Schulz and Mudelsee (2002) using red noise.The HSF estimates, on the other hand, remain mostly insensitive to the irregular sampling for all H > −0.5, but we already identified the positive bias in the regular case when H < −0.5 amplifies for more negative values of H . Similarly, all methods yield increasing positive bias for such an anti-persistent time series (i.e. with H < −0.5) when the sampling is irregular.This conjuncture seems to indicate that the anti-correlation characteristic of H < −0.5 processes is lost, to some extent, when degraded in an irregular manner to produce the surrogate data, as they all show a similar increase of their bias (by about 0.15 for the H = −0.9case) with respect to their bias for the H = −0.5 case (Figs. 3 and 4).

Effect of time series length
While the irregularity had a larger impact on the bias of the estimator, the length of the time series mostly influences the variance of the estimator (Figs. 6 and S12). .Timescale dependence of the bias and variance for regular and irregular series.We evaluate the following three methods: MTM (circles), HSF (squares) and LSP (triangles).The colours correspond to the input H value for each simulation, ranging from −0.9 to 0.9 in increments of 0.1.The rows correspond to different types of surrogate series, namely (a-c) annual data, (d-f) regular data, (g-i) mildly irregular data (ν = 0.5) and (j-l) strongly irregular data (ν = 1.0); see Sect.2.2.1.The columns correspond to three different fitting ranges in terms of the mean resolution τ µ .(a, d, g, j) Shorter timescales -2-9.2 τ µ .(b, e, h, k) Intermediate timescales -4.3-19.9τ µ .(c, f, i, l) Longer timescales -9.2-42.7 τ µ .
In the case of the MTM estimates, all values of H result in a similar standard deviation for a given resolution.When the resolution is doubled, the standard deviation increases by a factor close to the expected √ 2 (by ∼ 1.4-1.7),going from σ ≈ 0.27 at the 160-year resolution to σ ≈ 0.07 at the 20-year resolution (Fig. S12).The bias also improves when the resolution increases, particularly for the higher values of H , such as for H = 0.9, which goes from B ≈ 0.27 at the 160year resolution to B ≈ 0.10 at the 20-year resolution, while https://doi.org/10.5194/npg-28-311-2021 Nonlin.Processes Geophys., 28, 311-328, 2021 for negative H values it goes from B ≈ 0.05-0.07 to |B| < 0.01 for the same resolution change (Figs. 6 and S6).
In the case of the LSP, up to H ≈ 0.1, the standard deviation of the estimates also decrease by a factor close to √ 2 (by ∼ 1.4-1.8;Fig. S12) with each doubling of resolution, going from σ ≈ 0.35 at the 160-year resolution to σ ≈ 0.08 at the 20-year resolution, and the bias improves only slightly since it was already small (|B| < 0.04).For the higher H values, the standard deviation improves less (by ∼ 1.2-1.4 at H = 0.9; Fig. S12), going from σ ≈ 0.43 at the 160-year resolution to only σ ≈ 0.19 at the 20-year resolution.At the same time, the bias change becomes more positive, thus compensating slightly better for the overall strong negative bias characteristic of the LSP estimates for very high H values.
In the case of the HSF, when increasing the resolution, the standard deviation of the estimates improves more for the lower H values than for the higher values, improving by a factor of ∼ 1.4-1.8 for H = −0.9 to ∼ 1.2-1.4 for H = 0.9 (Fig. S12).Overall, the standard deviation improves from σ ≈ 0.20-0.25 at the 160-year resolution to σ ≈ 0.06-0.09at the 20-year resolution.The slight negative bias found for the H > −0.5 estimates (|B| < 0.04) practically vanishes but not the positive one for H < −0.5, which only decreases slightly.

Application to database
In order to see how these results translate to typical proxy records, we perform the analysis with surrogate data with sampling characteristic and scaling behaviour directly extracted from the database of Holocene and LGM records (see Sect. 2.3).We make the assumption that the series approximately follow power law scaling, and that they can, therefore, be modelled by fractional noise described by a scaling exponent H . Since the best H to describe the approximate scaling behaviour of the series is unknown, we make an initial approximation with each method and take the median of the three results as the reference value for the given series, with which we generate an ensemble of surrogates with the same sampling scheme as the given series.We use the real time steps in order to evaluate the impact of each specific sampling scheme.Again, we consistently fitted up to one-third of the length (see Sect. 2.1.3)but empirically determined the best minimum fitting scale τ min such that it minimizes the RMSE of the estimator with respect to the reference H used to generate the surrogate data.
We found the HSF to yield the smallest τ min of the three methods, with τ min below twice the value of the mean resolution τ µ for 121 out of 127 series and even below τ µ for 43 out of 127 (Fig. 7).The LSP yielded similar results, with 109 out of 127 series having τ min below twice the value of τ µ and 41 out of 127 having τ min even lower than τ µ .This contrasts with the MTM, which almost never suggests best results for τ min below twice the value of τ µ but rather an average minimum resolution τ min = 3.3 τ µ , compared to τ min = 1.2 τ µ for the HSF and τ min = 1.3 τ µ for the LSP.This underlines the point that, because interpolation-free methods give more reliable estimates at shorter timescales, they allow for a better usage of the full data.
The HSF yields the best results of all three methods, both in terms of variance of the estimates and in terms of bias, with a mild tendency towards positive biases.The MTM also tends to show positive biases, albeit higher, while the LSP tends to show negative biases.Both the MTM and the LSP generally show a higher variance of their estimators than the HSF (Fig. 8), although for different reasons.The MTM has a higher variance because the higher τ min does not allow us to use the smaller timescales in the estimation procedure, while the LSP shows higher variance because of the time series with high input H for which it performs poorly, especially when the data are irregular (Fig. 4).
On average, the HSF method gave the lowest RMSE = 0.11 ± 0.04 compared to RMSE = 0.30 ± 0.15 and RMSE = 0.19±0.15for the MTM and the LSP, respectively.The poor performance of the LSP compared to the HSF stems from the higher H , since it performs as well as the Haar when H < 0, with RMSE = 0.12 ± 0.07 (Fig. 9).

Discussion
Our comparative study indicates that, for irregular time series, the methods with interpolation, i.e. the CPG and the MTM, are less efficient for evaluating variability across timescales than the methods without interpolation, i.e. the LSP and the HSF.In the case of regular time series, however, all methods were found to perform similarly (Fig. 4b) for a wide range of input H and only showed bias on the fringes of the H range considered.In addition, we found that the choice of method should also be informed by the characteristics of the underlying process measured since there was an observed dependence between the performance metrics and the scaling exponent H which generated the surrogate data.As such, the LSP may only be appropriate for irregular series suspected of having H near or below H = −0.5, while the HSF shows better reliability even when H is near or above H = 0.It might not be so surprising that the LSP performed poorly for higher H values since it has been developed to approximate the CPG for stationary noise processes (Scargle, 1982), and processes with H > 0 are non-stationary, i.e. their mean is not stable and changes with time.
In addition to the irregularity, we also compared how the resolution of the time series affects the estimators of H .While making the series more irregular mostly affected the accuracy of the estimates, i.e. their bias, without affecting their precision, i.e. their standard deviation, increasing the resolution mostly improved the precision and the accuracy only to a lesser extent.Increasing the resolution naturally improves the estimates as it provides more information and also because it decreases the relative weight of the most uncertain timescales, i.e. the smaller timescales, near the sampling resolution, and the longer timescales, near the length of the series.
Our conclusions, however, rely on the real data having similar properties to the surrogate data used to validate the methods, and the conclusions may be different if the real data have properties different than the assumed power law scaling.It is difficult to say, for example, which method would perform best if the data analyzed contained two different scaling regimes, i.e. one with low H and the other with high H (Lovejoy and Schertzer, 2013).Another limitation of our surrogate data is that it is Gaussian distributed, whereas real palaeoclimate data can exhibit multifractality (Schmitt et al., 1995;Shao and Ditlevsen, 2016).Furthermore, in order to degrade the simulated annual time series into irregular palaeoclimate-like data, we had to use simplified numerical methods to mimic the physical processes and manipulations leading to the recording and retrieval of palaeoclimate archives.For our purpose, we chose to low-pass filter the data and subsample, since this is very common in sedimentary data and ice cores because of, respectively, bio-turbation and diffusion (Dolman and Laepple, 2018;Dolman et al., 2021;Kunz et al., 2020).However, reality is more complex, and other processes could significantly alter the recorded signal https://doi.org/10.5194/npg-28-311-2021 Nonlin.Processes Geophys., 28, 311-328, 2021  and bias the variability estimates in ways not accounted for by our surrogate model.For specific case studies, it is advisable to develop forward models (Stevenson et al., 2013;Dee et al., 2017;Dolman and Laepple, 2018;Casado et al., 2020) of the studied palaeoclimate archives in order to understand potential biases generated by the recording and specific sampling.
In summary, it is difficult to ascertain which method is the best since the answer depends on the irregularity, the resolution and, most importantly, the underlying H values.In this respect, the HSF is possibly the safer method for palaeoclimate applications since it only performs poorly for series with H < −0.5, an almost unseen behaviour in climatic time series at timescales longer than decadal.Although the LSP gives equally good results for H 0 and even better near H = −0.5, i.e. white noise, its increasing bias and standard deviation for higher H values makes it an uncertain choice for timescales longer than centennial timescales since climate time series at those timescales often show values of H ≈ 0 and greater.The MTM produces good estimates for relatively regular series but struggles with highly irregular time series, since it must rely on interpolation.While this produces a large bias on the shorter timescales which we have thus discarded, the bias is rather consistent, and methods could be developed to correct for it (e.g.Fig. S11).On the other hand, the longer timescales of the MTM are relatively well estimated, and it is appropriate to estimate the absolute variance over a timescale band well above the mean resolution (Rehfeld et al., 2018;Hébert et al., 2021).Furthermore, the effect of proxy biases are well studied for the power spectrum (Kunz et al., 2020;Dolman et al., 2021), mainly due to the known properties of the Fourier transform.However, the MTM, and the LSP, do not allow for the characterization of intermittency through the study of all statistical moments; this is in contrast to the HSF, which is, thus, better suited for the analysis of time series displaying multifractality, such as palaeoclimate time series at glacial-interglacial timescales comprising Dansgaard-Oeschger events (Schmitt et al., 1995;Shao and Ditlevsen, 2016) Our study is not comprehensive, and other methods such as the bias-corrected version of Lomb-Scargle (Schulz and Mudelsee, 2002, REDFIT) and the z transform wavelet (Zhu et al., 2019) exist.More precise estimates of the scaling exponent are obtained by parametric methods based on maximum likelihood, but they require strong assumptions about the underlying process (Del Rio Amador and Lovejoy, 2019).We aimed to cover the most commonly used methods that also allow for a simple interpretation of variability across timescales, either because they represent the well-studied power spectrum or the Haar fluctuations directly provide changes in amplitude.

Conclusions
Characterizing the variability across timescales is important for understanding the underlying dynamics of the Earth system.It remains challenging to do so from palaeoclimate archives, since they are more often than not irregular, and traditional methods for producing timescale-dependent estimates of variability, such as the classical periodogram and the multitaper spectrum, generally require regular time sampling.
We have compared those traditional methods using interpolation with interpolation-free methods, namely the Lomb-Scargle periodogram and the first-order Haar structure function.The ability of those methods to produce timescaledependent estimates of variability when applied to irregular data was evaluated in a comparative framework.The metric we chose to compare them is the scaling exponent, i.e. the linear slope in log-transformed coordinates, since it summarizes the behaviour of the variability over a given timescale band.Doing so assumes power law scaling, a behaviour which is often observed in geophysical time series, at least approximatively (Mandelbrot and Wallis, 1968;Cannon and Mandelbrot, 1984;Pelletier and Turcotte, 1999;Malamud and Turcotte, 1999;Fedi, 2016;Corral and González, 2019).To evaluate our estimators, we generated fractional noise as surrogate annual time series characterized by a given scaling exponent H , also known as fractional Gaussian noise when they are stationary (H < 0) and fractional Brownian motion when they are non-stationary (H > 0).The annual time series were then degraded to resolution characteristics of palaeoclimate archives for the recent Holocene.
We found that, for scaling estimates in irregular time series, the interpolation-free methods are to be preferred over the methods requiring interpolation as they allow for the utilization of the information from shorter timescales without introducing additional bias.In addition, our results suggest that the Haar structure function is the safer choice of interpolation-free method, since the Lomb-Scargle periodogram is unreliable when the underlying process generating the time series is not stationary.This conclusion was reinforced by the application to the proxy database which indeed showed the Haar structure function to give more reliable estimates over a wide range of H .Given that we cannot know a priori what kind of scaling behaviour is contained in a palaeoclimate time series, and that it is also possible that this changes as a function of timescale, it is a desirable characteristic for the method to handle both stationary and nonstationary cases alike.
Author contributions.All authors participated in the conceptualization of the research and the methodology.RH developed the software and visualization and conducted the formal analysis and investigation.KR and TL provided supervision.RH prepared the original draft, and all authors contributed to reviewing and editing the final paper.
Competing interests.The authors declare that they have no conflict of interest.
Disclaimer.Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Special issue statement.This article is part of the special issue "A century of Milankovic's theory of climate changes: achievements and challenges (NPG/CP inter-journal SI)".It is not associated with a conference.
Acknowledgements.This study was undertaken by members of Climate Variability Across Scales (CVAS), following discussions at several workshops.CVAS is a working group of the Past Global Changes (PAGES) project, which, in turn, received support from the Swiss Academy of Sciences and the Chinese Academy of Sciences.We particularly thank Torben Kunz and Shaun Lovejoy, for the in-depth discussions, and Timothy Graves and Christian Franzke, for freely sharing the software for generating fractional noise.We also acknowledge discussions with Mathieu Casado, Lenin del Rio Amador, Andrew Dolman, Igor Kröner and Thomas Münch.We thank all the original data contributors who made their proxy data available.This is a contribution to the SPACE and GLACIAL LEGACY ERC projects. https://doi.org/10.5194/npg-28-311-2021 Nonlin.Processes Geophys., 28, 311-328, 2021

Figure 1 .
Figure1.(a, e, i, m, q, u) Surrogate time series generated with a given H are shown at annual resolution (brown), degraded to a regular 50-year resolution (blue) and degraded to an irregular and random spacing drawn out of a gamma distribution, with skewness ν = 1 and a mean of 50 years.(b, f, j, n, r, v) Shown are the mean power spectra, estimated using the MTM of 100 realizations of surrogate time series, generated as in the first column and shown with the same colour scheme.The irregular case is also shown after dividing the power spectra by the expected sinc 4 bias due to interpolation (dashed pink line).Also shown are the bounds for the fitting range considered later (vertical dashed blue line) at the Nyquist frequency corresponding to 100 years, at 1.5 times the Nyquist frequency corresponding to 150 years and at one-third of the length of the time series at 2000 years.(c, g, k, o, s, w) Same as the second column but for the LSP instead of the MTM.(d, h, l, p, t, x) Same as the second column but showing HSF instead of MTM power spectra.

Figure 2 .
Figure 2. Sampling characteristics of the 127 palaeoclimate time series (Rehfeld et al., 2018).The 88 Holocene series (teal) and 39 Last Glacial maximum series (pink) are evaluated along the following characteristics: (a) the skewness ν of the distribution of time steps, (b) the number of points, i.e. samples, and (c) the mean resolution τ µ of the time steps.

Figure 3 .
Figure 3. Mean estimate Ĥ from the surrogate time series with input values of H ∈ (−1, 1) (i.e.β ∈ (−1, 3)).Deviations from the oneto-one line (dashed grey) correspond to the bias B of the mean estimate Ĥ .Different types of data are evaluated, namely (a) regular annual data, i.e. it was directly simulated and not degraded after, (b) regular surrogate data degraded at regular 40-year intervals and irregular surrogate data with time steps drawn from a gamma distribution with (c) skewness ν = 0.5 or (d) skewness ν = 1.

Figure 4 .
Figure 4. Bias-standard deviation diagram for ensembles of H estimate surrogate time series, with input values of H ∈ (−1, 1) (i.e.β ∈ (−1, 3)).Different types of data are evaluated, namely (a) regular annual data, i.e. it was directly simulated and not degraded after, (b) regular surrogate data degraded at regular 40-year intervals and irregular surrogate data with time steps drawn from a gamma distribution with (c) skewness ν = 0.5 or (d) skewness ν = 1.

Figure 6 .
Figure 6.The performance of the estimators is shown for regular surrogate data of different resolutions at (a) 160 years, (b) 80 years, (c) 40 years and (d) 20 years.For each case, the series spanned 5120 years, and therefore, each case contains 32, 64, 128 and 256 data points, respectively.

Figure 7 .
Figure 7.The best minimum fitting timescales τ min (minimizing the RMSE in H ) are shown for each method as a function of the mean resolution τ µ for ensembles of surrogate data generated with the same sampling scheme as the corresponding palaeoclimate time series from the database.

Figure 8 .
Figure8.Bias-standard deviation diagram for the surrogate time series generated using time steps from the palaeoclimate database.The input H , used to generate the time series, is indicated by the colour inside the markers.The shaded polygons contain all the points for a given method (see the legend for colours).

Figure 9 .
Figure 9. RMSE of the H estimates as a function of the input H for the surrogate time series based on the proxy database.The RMSEs are given for the three estimation methods as a function of the H used to generate the surrogates.Also shown is a Gaussian smoothing of the points for each method (thick line) and the 1 standard deviation confidence interval (shaded).

Table 1 .
Table of acronyms used in this paper and references.