We develop a general framework for the frequency analysis of irregularly
sampled time series. It is based on the Lomb–Scargle periodogram, but
extended to algebraic operators accounting for the presence of a polynomial
trend in the model for the data, in addition to a periodic component and a
background noise. Special care is devoted to the correlation between the
trend and the periodic component. This new periodogram is then cast into the
Welch overlapping segment averaging (WOSA) method in order to reduce its
variance. We also design a test of significance for the WOSA periodogram,
against the background noise. The model for the background noise is a
stationary Gaussian continuous autoregressive-moving-average (CARMA) process,
more general than the classical Gaussian white or red noise processes. CARMA
parameters are estimated following a Bayesian framework. We provide
algorithms that compute the confidence levels for the WOSA periodogram and
fully take into account the uncertainty in the CARMA noise parameters.
Alternatively, a theory using point estimates of CARMA parameters provides
analytical confidence levels for the WOSA periodogram, which are more
accurate than Markov chain Monte Carlo (MCMC) confidence levels and, below
some threshold for the number of data points, less costly in computing time.
We then estimate the amplitude of the periodic component with least-squares
methods, and derive an approximate proportionality between the squared
amplitude and the periodogram. This proportionality leads to a new extension
for the periodogram: the weighted WOSA periodogram, which we recommend for
most frequency analyses with irregularly sampled data. The estimated signal
amplitude also permits filtering in a frequency band. Our results generalise
and unify methods developed in the fields of geosciences, engineering,
astronomy and astrophysics. They also constitute the starting point for an
extension to the continuous wavelet transform developed in a companion
article

In many areas of geophysics, one has to deal with irregularly sampled time
series. However, most state-of-the-art tools for the frequency analysis
are designed to work with regularly sampled data. Classical methods include
the discrete Fourier transform (DFT), jointly with the Welch overlapping
segment averaging (WOSA) method, developed by

In order to deal with non-interpolated astronomical data,

The periodogram is often accompanied by a test of significance for the
spectral peaks, which relies on the choice of an additive background noise.
Two traditional background noises are used in practice. The first one is the
Gaussian white noise, which has a flat power spectral density, and which is a
common choice with astronomical data sets, e.g. in

Estimating the percentiles of the distribution of the weighted WOSA
periodogram of an irregularly sampled CARMA process is the core of this
paper. This gives the confidence levels for performing tests of significance
at every frequency, i.e. test if the null hypothesis – the time series is a
purely stochastic CARMA process – can be rejected (with some percentage of
confidence) or not. We aim at developing a very general approach. Let us
enumerate some key points.

1. Estimation of CARMA parameters is performed in a Bayesian framework and relies on state-of-the-art algorithms provided by

2. Based on 1, we provide confidence levels computed with Markov chain Monte Carlo (MCMC) methods that fully take into account the uncertainty
of the parameters of the CARMA process, because we work with a

3. As an alternative to 2, if we opt for the traditional choice of a unique set of values for the parameters of the CARMA background noise, we develop
a theory providing

4. Confidence levels are provided for any possible choice of the overlapping factor for the WOSA method, extending the traditional 50 % overlapping choice

5. In the case of a white noise background, without WOSA segmentation and without tapering, we define the

The paper is organised as follows. In Sect.

Let us introduce the notations for the time series. The measurements

Let

Let

We use the terminology

A sequence of independent and identically distributed random variables is denoted by “iid”.

The orthogonal projection on a vector space spanned by the

Let

The biggest time step for which

The GCD is usually defined on the integers, but we can extend it to rational numbers.
In practice,

A suitable and general enough model to analyse the periodicity at frequency

We follow here the definitions and conventions of

The background noise term,

A CARMA(

A CARMA(

In practice, only CARMA processes of low order are useful in our framework,
typically,

When

When

Note that, if the time series is regularly sampled,

The solution of Eq. (

Generation of a CARMA

The model for the trend must be as general as possible and compatible with a
formalism based on orthogonal projections (see Sect.

Consider the orthogonal projection of the data

Now, rescale

Some properties of the LS periodogram are presented in Appendix

The LS periodogram applies well to data which can be modelled as

Schematic view of the linear rescaling in

Signal average and sampling.

In order to deal with the mean in a suitable way, we define the periodogram
as

If we have

We now perform a Gram–Schmidt orthonormalisation, as in

If we want to work with the full model, Eq. (

It may happen that, for large

Similarly to Sect.

A finite-length signal can be seen as an infinite-length signal multiplied by
a rectangular window. This implies, among other things, that a mono-periodic signal
will have a periodogram characterised by a peak of finite width, possibly with
large side lobes, instead of a Dirac delta function. This is called

This phenomenon has been studied in depth in the case of regularly sampled data.
Leakage may be controlled by choosing alternatives to the default rectangular
window. This is called

In the case of irregularly sampled data, building windows for controlling the
leakage is a much more challenging task. Even with the default rectangular
window, leakage is very irregular and is data and frequency dependent, due to the
long-range correlations in frequency between the vectors on which we do the
projection. To our knowledge, no general and stable solution for that issue
is available in the literature. We thus recommend using the default
rectangular window, i.e. do no tapering, if

Besides spectral leakage, another issue with the periodogram is consistency.
Indeed, for regularly sampled time series, the periodogram is known not to be
a consistent estimator of the true spectrum as the number of data points
tends to infinity
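The lack of consistency can be illustrated numerically. The following is a minimal sketch, assuming regularly sampled Gaussian white noise of unit variance and the classical DFT periodogram; the function and variable names are illustrative and are not taken from WAVEPAL. The relative scatter (standard deviation divided by mean) of the periodogram ordinates stays close to 1 whether 256 or 4096 data points are used, i.e. adding data does not reduce the variance of the estimate at a given frequency.

```python
import numpy as np

def dft_periodogram(x):
    """Classical periodogram |DFT|^2 / N at the positive Fourier frequencies
    (the zero and Nyquist frequencies are excluded)."""
    n = x.size
    spec = np.abs(np.fft.rfft(x)) ** 2 / n
    return spec[1:(n - 1) // 2 + 1]

rng = np.random.default_rng(0)
relative_scatter = {}
for n in (256, 4096):
    p = dft_periodogram(rng.standard_normal(n))
    # For a consistent estimator this ratio would shrink as n grows.
    relative_scatter[n] = p.std() / p.mean()

print(relative_scatter)
```

Averaging over segments, as done by the WOSA method, is one way of trading frequency resolution for a reduction of this scatter.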

Multitaper methods are certainly not generalisable to the case of irregularly
sampled data, except in very specific cases that are not of interest in
geophysics, such as in

The time series is divided into overlapping segments. The tapered LS
periodogram is computed on every segment, and the WOSA periodogram is the
average of all these tapered periodograms. This technique relies on the fact
that the signal is stationary, as always in spectral analysis
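The segment-averaging idea can be sketched in a few lines. This is a simplified illustration and not the operator defined in this paper: it assumes a trendless signal, uses the rectangular window (no tapering), and defines the periodogram on each segment as the squared norm of the least-squares projection onto the cosine and sine vectors. The names `projection_periodogram`, `wosa_periodogram`, `seg_length`, `overlap` and `min_points` are hypothetical.

```python
import numpy as np

def projection_periodogram(t, x, freqs):
    """Squared norm of the least-squares projection of x onto
    span{cos(2*pi*f*t), sin(2*pi*f*t)}, for each frequency f."""
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        basis = np.column_stack((np.cos(2 * np.pi * f * t),
                                 np.sin(2 * np.pi * f * t)))
        coeffs, *_ = np.linalg.lstsq(basis, x, rcond=None)
        power[k] = np.sum((basis @ coeffs) ** 2)
    return power

def wosa_periodogram(t, x, freqs, seg_length, overlap=0.5, min_points=10):
    """Average the periodograms of overlapping segments of fixed length.
    Segments with fewer than min_points data points are skipped."""
    step = seg_length * (1.0 - overlap)
    starts = np.arange(t[0], t[-1] - seg_length + 1e-12, step)
    spectra = []
    for s in starts:
        mask = (t >= s) & (t <= s + seg_length)
        if mask.sum() >= min_points:
            spectra.append(projection_periodogram(t[mask], x[mask], freqs))
    return np.mean(spectra, axis=0), len(spectra)

# Synthetic example: a sine wave at frequency 0.1 plus white noise,
# observed at 600 irregularly spaced times.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 200.0, 600))
x = np.sin(2 * np.pi * 0.1 * t) + rng.standard_normal(t.size)
freqs = np.linspace(0.02, 0.3, 141)
wosa, n_segments = wosa_periodogram(t, x, freqs, seg_length=50.0)
peak_frequency = freqs[np.argmax(wosa)]
```

Shorter segments give more terms in the average, hence a smaller variance, at the price of wider peaks.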

Basically, the

For regularly sampled data, each segment of fixed length has the same number
of data points. In the irregularly sampled case, this no longer holds,
and we have two options.

Take segments with a fixed number of points and thus a variable length. In the non-tapered case, the periodogram on each segment provides deterministic peaks (coming from the deterministic sine–cosine components) with more or less the same height. But variable length segments will give deterministic peaks of variable width.

Take segments of fixed length but with a variable number of data points. The periodogram on each segment provides deterministic peaks with more or less the same width, except if there is a big gap at the beginning or at the end of the segment, such that its effective length is reduced. But they will have variable height since the number of data points is not constant.
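The two options can be compared on a synthetic sampling. The sketch below assumes a time axis that is densely sampled in its first half and sparsely sampled in its second half; the helper names `segments_fixed_count` and `segments_fixed_length` are hypothetical. With a fixed number of points the segment durations vary strongly, whereas with a fixed length it is the number of points per segment that varies.

```python
import numpy as np

def segments_fixed_count(t, n_points, step):
    """Option 1: index slices holding n_points each (variable duration)."""
    return [slice(i, i + n_points)
            for i in range(0, t.size - n_points + 1, step)]

def segments_fixed_length(t, length, shift):
    """Option 2: boolean masks covering [start, start + length]
    (variable number of points)."""
    starts = np.arange(t[0], t[-1] - length + 1e-12, shift)
    return [(t >= s) & (t <= s + length) for s in starts]

rng = np.random.default_rng(2)
# Irregular sampling: dense first half, sparse second half.
t = np.sort(np.concatenate((rng.uniform(0, 50, 400),
                            rng.uniform(50, 100, 100))))

# 50 % overlap in both cases.
durations = [t[s][-1] - t[s][0] for s in segments_fixed_count(t, 100, 50)]
counts = [int(m.sum()) for m in segments_fixed_length(t, 20.0, 10.0)]

print("durations with fixed count:", np.round(durations, 1))
print("counts with fixed length:  ", counts)
```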

The only difference with the previous case is that, for each segment, we
consider the projection on

Two parameters are required: the length of WOSA segments,

In Formula (

First, note that the Gram–Schmidt orthonormalisation process requires at
least

Second, as we want to get deterministic peaks with more or less the same
width on every segment, a WOSA segment is kept in the average if the data
cover some percentage of its length

Third, the frequency range on the

In practice,

Fourth, in order to have a reliable average of the periodograms, we only
represent the periodogram at the frequencies for which the number of WOSA
segments is above some threshold. In WAVEPAL, the default value for the threshold
at frequency

Significance testing allows us to test for the presence of periodic
components in the signal. It is mathematically expressed as a hypothesis
testing

To perform significance testing, we thus need

to estimate the parameters of the process under the null hypothesis (this is studied in
Sect.

to estimate the distribution of the periodogram under the null hypothesis (this is studied in Sect.

Under the null hypothesis, the signal is

Estimation of CARMA parameters is done in a Bayesian framework. We analyse
separately the case of the white noise, which is done analytically, and the
case of CARMA(

We want to estimate the two parameters of the white noise, the mean

Since we do not actually need to estimate

For other cases than the white noise,

Under the null hypothesis, the signal is

For each frequency, we need the distribution of the WOSA periodogram, Eq. (

We are thus able to estimate confidence levels for the WOSA periodogram, taking into account the uncertainty in the parameters of the background noise.
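The principle can be sketched as follows. This is an illustrative Monte Carlo scheme under simplifying assumptions, not the algorithm implemented in WAVEPAL: the background noise is a red noise, i.e. a CAR(1) process, there is no trend, no tapering and no WOSA segmentation, and the posterior samples of the parameters are hypothetical stand-ins drawn from log-normal distributions instead of coming from an actual MCMC run. One noise series is simulated per posterior sample, so that the spread of the parameters propagates into the percentiles of the periodogram.

```python
import numpy as np

def orthonormal_bases(t, freqs):
    """Orthonormal basis (via QR) of span{cos, sin} at each frequency."""
    bases = []
    for f in freqs:
        design = np.column_stack((np.cos(2 * np.pi * f * t),
                                  np.sin(2 * np.pi * f * t)))
        q, _ = np.linalg.qr(design)
        bases.append(q)
    return bases

def simulate_car1(t, tau, sigma, rng):
    """Exact simulation, at the times t, of a stationary Gaussian CAR(1)
    process with e-folding time tau and standard deviation sigma."""
    x = np.empty(t.size)
    x[0] = sigma * rng.standard_normal()
    rho = np.exp(-np.diff(t) / tau)
    innovations = (sigma * np.sqrt(1.0 - rho ** 2)
                   * rng.standard_normal(t.size - 1))
    for i in range(1, t.size):
        x[i] = rho[i - 1] * x[i - 1] + innovations[i - 1]
    return x

rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 100.0, 300))
freqs = np.linspace(0.05, 1.0, 20)
bases = orthonormal_bases(t, freqs)

# Hypothetical stand-in for MCMC posterior samples of (tau, sigma).
n_samples = 500
tau_samples = rng.lognormal(mean=np.log(2.0), sigma=0.2, size=n_samples)
sigma_samples = rng.lognormal(mean=0.0, sigma=0.1, size=n_samples)

# One simulated noise series, and its periodogram, per posterior sample.
periodograms = np.empty((n_samples, freqs.size))
for j in range(n_samples):
    noise = simulate_car1(t, tau_samples[j], sigma_samples[j], rng)
    periodograms[j] = [np.sum((q.T @ noise) ** 2) for q in bases]

# Confidence levels at every frequency.
level_95 = np.percentile(periodograms, 95, axis=0)
level_99 = np.percentile(periodograms, 99, axis=0)
```

High percentiles need many simulations to be estimated reliably, since only a small fraction of the simulated values lies in the tail of the distribution.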

If we consider constant CARMA parameters, we show in this section that
analytical confidence levels can be computed, even in the very tail of the
distribution of the periodogram of the background noise. An example is given
in Fig.

It provides confidence levels converging to the exact solution, as the number of conserved moments increases (see below). From a certain number of conserved
moments, we can consider that convergence is numerically reached (see Fig.

As a consequence, for a given percentile, computing time is usually shorter with the analytical method than with the MCMC method. We note, however, that the
MCMC approach generally needs less computing time when the number of data points becomes large, as shown in Appendix

If the marginal posterior distribution of each CARMA parameter is unimodal,
we take the parameter value at the maximum of its PDF (white noise case, see
Eq.

For CARMA processes with

The WOSA periodogram, defined in Eq. (

If the background noise is white, we have

The variance of the distribution of the periodogram, Eq. (

Going back to Eq. (

We approximate the linear combination of independent chi-square
distributions, conserving its first

Analytical variance of the WOSA periodogram for a Gaussian red noise
with

We require the expected value of the process to be conserved, which is
satisfied with the following approximation:

The approximate distribution of the linear combination of the chi-square
distributions must have two parameters, and we conserve the expected value
and variance. A chi-square distribution with

We apply here the formulas presented in

The gamma-polynomial approximation can be extended to the

Finally, we mention that there exists an alternative expression to the above
development, in terms of Laguerre polynomials

We have shown in Eq. (

The background noise is assumed to be white.

There is no WOSA segmentation.

There is no tapering.

With a WOSA segmentation, the projections in the numerator and in the denominator are no longer performed on orthogonal spaces, so this result cannot be applied.

The above results are a generalisation of formulas in

Going back to Eq. (

The estimated amplitudes we look for,

Like with the periodogram, leakage also appears in the amplitude periodogram.
Consequently, it may be better to work with the projection on tapered cosine
and sine if the data are not too irregularly sampled, as explained in
Sect.

Illustration of the quality of the approximations

Note that the approach we follow does not correspond to the classical least-squares problem as above since, in Eq. (

Similarly to the non-tapered case, we now determine an approximate
proportionality between the amplitude periodogram and the tapered
periodogram. We start with the model (Eq.

We now work with the full model (Eq.

The signal being stationary, we can estimate the amplitude on overlapping
segments and take the average. That gives a better estimation, more robust
against the background noise, but it has the disadvantage of widening the
peaks and thus reducing the resolution in frequency. We simply take Eq. (

We remind the reader that the vectors

So far, we have studied in detail the periodogram and its confidence levels
as well as the estimated amplitude. Of course, confidence levels can also be
determined for the amplitude, with Monte Carlo simulations, or with an
analytical approximation similar to Sect.

In the regularly sampled case, at Fourier frequencies, the cosine and sine vectors are orthogonal, so that, in the non-tapered case and with a constant trend, there is no difference between the periodogram and the amplitude periodogram, up to a multiplicative constant. Even with WOSA segmentation, the number of data points being identical on each segment, that multiplicative constant remains invariant.
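The orthogonality property is easy to check numerically. The sketch below assumes a unit time step and uses illustrative names. It computes the normalised inner product between the cosine and sine vectors at a Fourier frequency, first on a regular grid and then on irregularly spaced times covering the same interval. The value is zero up to rounding errors in the regular case and, in general, non-zero in the irregular case.

```python
import numpy as np

def cos_sin_correlation(t, f):
    """Normalised inner product between the cosine and sine vectors
    at frequency f, evaluated at the times t."""
    c = np.cos(2 * np.pi * f * t)
    s = np.sin(2 * np.pi * f * t)
    return float(c @ s / (np.linalg.norm(c) * np.linalg.norm(s)))

n = 128
t_regular = np.arange(n, dtype=float)   # regular grid, unit time step
fourier_frequency = 5 / n               # k / (N * dt) with k = 5

rng = np.random.default_rng(4)
t_irregular = np.sort(rng.uniform(0.0, n, n))

corr_regular = cos_sin_correlation(t_regular, fourier_frequency)
corr_irregular = cos_sin_correlation(t_irregular, fourier_frequency)

print("regular sampling:  ", corr_regular)
print("irregular sampling:", corr_irregular)
```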

In the irregularly sampled case, choosing one or the other depends on what
one wants to conserve. On the one hand, the periodogram conserves the flatness of the white
noise pseudo-spectrum (see Eq.

Taking into account the approximate linearity between the amplitude
periodogram and the tapered periodogram, Eq. (

When filtering is to be performed, the amplitude periodogram must be computed as well. This is the topic of the next section.

We want to reconstruct the deterministic periodic part,

Note that, in theory, reconstruction could be done segment by segment, using
the WOSA method. But, in practice, we observe that it does not give good
results with stationary signals. Of course, if the signal is not stationary,
reconstruction segment by segment is a sensible choice, but, with such signals,
it is better to use more appropriate tools such as the wavelet transform. See
the second part of this study

The time series we use to illustrate the theoretical results is the benthic
foraminiferal

We first look at the sampling;

We choose the order of the background noise CARMA process. We opt for the
traditional red noise background

The age step,

The time series and its 7th-degree polynomial trend.

CARMA(1,0) background noise analysis. Panels

Frequency analysis.

The marginal posterior distributions of the CARMA parameters are shown in
Fig.

As explained in Sect.

We compute the weighted WOSA periodogram of Sect.

The weighted WOSA periodogram and its 95 and 99.9 % confidence
levels are presented in Fig.

Convergence check of the analytical percentiles at six particular frequencies.

We also compute the amplitude periodogram, Eq. (

We show in Fig.

Comparison between the amplitude periodogram (

Note that we do not apply here the Akaike information criterion (AIC)

Figure

The weighted WOSA periodogram and its 95 % confidence levels for
different orders (

WAVEPAL is a package, written in Python 2.X, that performs frequency and
time–frequency analyses of irregularly sampled time series, significance
testing against a stationary Gaussian CARMA(

We proposed a general theory for the detection of the periodicities of
irregularly sampled time series. This is based on a general model for the
data, which is the sum of a polynomial trend, a periodic component and a
Gaussian CARMA stochastic process. In order to perform the frequency
analysis, we designed new algebraic operators that match the structure of our
model, as extensions of the Lomb–Scargle periodogram and the WOSA method. A
test of significance for the spectral peaks was designed as a hypothesis
test, and we investigated in detail the estimation of the percentiles of
the distribution of our algebraic operators under the null hypothesis.
Finally, we showed that the least-squares estimation of the squared amplitude
of the periodic component and the periodogram are no longer proportional if
the time series is irregularly sampled. Approximate proportionality relations
were proposed and are at the basis of the weighted WOSA periodogram, which is
the analysis tool that we recommend for most frequency analyses. The general
approach presented in this paper allows an extension to the continuous
wavelet transform, which is developed in Part 2 of this study

The Python code generating the figures of this article is available in the Supplement.

We present some properties of the LS periodogram, defined in Sect.

The LS periodogram and all its generalisations (e.g. Eq.

Integrating the orthogonal projection

As stated in

We show here the equivalence between some published formulas, with notations that are a mix between those of the cited articles and those of the present one in order to facilitate the reading.

We define the pseudo-spectrum as the expected value of the WOSA
periodogram under the null hypothesis (see Sect.

When dealing with a trendless signal, we can perform the WOSA on the
classical tapered periodogram, and the pseudo-spectrum becomes

In that book, the authors work with the projection on
complex exponentials,

In the case of irregularly sampled data, the spectrum

We extend the gamma-polynomial approximation of Sect.

We work with the generalised gamma distribution, which has three parameters,

In

We extend here the formulas

In

A comparison between the computing times, for generating the WOSA
periodogram, with the analytical and with the MCMC significance levels, based
on the hypothesis of a red noise background, is presented in Fig.

CPU type: SandyBridge 2.3 GHz. RAM: 64 GB.

We see that the analytical approach is faster than the MCMC approach as long
as the number of data points is below some threshold, the latter increasing
with the level of confidence. Indeed, the analytical approach delivers
computing times of the same order of magnitude regardless of the percentile
(the two blue curves in Fig.

Computing times for generating the WOSA periodogram with analytical
(blue) and MCMC (green) confidence levels, as a function of the number of data
points (placed on a regular time grid). Log–log scale.

The formula of the F periodogram (Eq.

A slightly different formula was published in

The supplement related to this article is available online at:

The authors declare that they have no conflict of interest.

The authors are very grateful to Reik Donner, Laurent Jacques, Lilian Vanderveken, and Samuel Nicolay for their comments on a preliminary version of the paper. This work is supported by the Belgian Federal Science Policy Office under contract BR/12/A2/STOCHCLIM. Guillaume Lenoir is currently supported by the FSR-FNRS grant PDR T.1056.15 (HOPES).

Edited by: Jinqiao Duan
Reviewed by: two anonymous referees