This study is concerned with the intrinsic temporal scales of the variability in the surface solar irradiance (SSI). The data consist of decennial time series of daily means of the SSI obtained from high-quality measurements of the broadband solar radiation impinging on a horizontal plane at ground level, issued from different Baseline Surface Radiation Network (BSRN) ground stations around the world. First, embedded oscillations sorted in terms of increasing timescales of the data are extracted by empirical mode decomposition (EMD). Next, Hilbert spectral analysis is applied to obtain an amplitude-modulation–frequency-modulation (AM–FM) representation of the data. The time-varying nature of the characteristic timescales of variability, along with the variations in the signal intensity, is thus revealed. A novel, adaptive null hypothesis based on the general statistical characteristics of noise is employed in order to discriminate between the different features of the data: those that have a deterministic origin and those that are realizations of various stochastic processes. The data have a significant spectral peak corresponding to the yearly variability cycle and feature quasi-stochastic high-frequency variability components, irrespective of the geographical location or of the local climate. Moreover, the amplitude of this latter feature is shown to be modulated by variations in the yearly cycle, which is indicative of nonlinear multiplicative cross-scale couplings. The study has possible implications for the modeling and the forecast of the surface solar radiation, by clearly discriminating the deterministic from the quasi-stochastic character of the data, at different local timescales.

The power of the electromagnetic radiation from the Sun that
reaches the surface of the Earth is estimated at around

Temporally, the SSI exhibits a very wide dynamic range. Its short-term
timescales of variability, such as clouds briefly obscuring the Sun, are
observed over seconds. At the opposite extreme, timescales of thousands or even
millions of years are involved, as related to the change of the orbital parameters of
the Earth–Sun system or to stellar evolution

To do so, first a decomposition of the time series into uncorrelated
sub-constituents with distinct characteristic timescales should be
preferred. Analysis would then ensue in a like manner for each scale. The
timescales, or characteristic periods of a time series, can be identified
with the inverse of the frequency at which the processes that generate the
data occur. It then follows that methods portraying the changes of the
spectral content of a time series with respect to time are potentially good
candidates. This would enable both the identification of the periodicities
and of the dynamic evolution of the processes generating the data. A general
class of useful signal processing techniques can thus be identified in the
so-called time-frequency distributions that depict the intensity (or energy)
of a signal in the time and the frequency domains simultaneously

Another factor to be taken into account is the nonlinear and non-stationary
characteristics of the measured solar radiation data

The study at hand will make use of the Hilbert–Huang transform (HHT), an
adaptive, data-driven analysis technique designed specifically for
investigating nonlinear and non-stationary data

Regardless of the methods used, when analyzing data there is always the need
to discriminate between deterministic signals and what are assumed to be
background stochastic realizations

At this point, the general outline of our study can be summarized as follows. We analyze measurements of daily means of SSI at different geographical locations. We focus on identifying and analyzing the intrinsic modes of the temporal variability in the SSI, as revealed by the HHT. We also investigate the physical and statistical significance of these modes. We show that the HHT is able to discriminate between a deterministic yearly cycle and multiple high-frequency (quasi-)stochastic components. We also find a non-null, statistically significant rank correlation between the amplitude envelopes of the high-frequency scales and the yearly cycle. We then discuss the possible implications of our findings on the modeling and forecast of the SSI.

The study is organized as follows. Section

Ground measurement stations listing.

The data under scrutiny in this study consist of 10-year time series of daily
means of SSI obtained from high-quality measurements performed at four
different locations (Table

The four decennial SSI time series investigated in this study, spanning 2001 through 2010. From top to bottom: BOU, CAR, PAY, and TAT. Each point corresponds to a daily mean of SSI. Time markers on the abscissa indicate the start of the corresponding year.

The four time series for the period 2001–2010 have been quality checked
according to

Two measuring stations are located in Europe, one in Japan, and one in North
America in order to capture various climatic conditions. Boulder (hereafter
abbreviated as BOU) experiences a midlatitude steppe, cool type of climate
(Köppen–Geiger: BSk), while at Carpentras (abbreviated as CAR) the
climate is a humid subtropical, Mediterranean one (Köppen–Geiger: Csa).
Both sites experience many sunny days during the year. As a rule of thumb,

Any further reference to seasons and seasonal phenomena shall be understood as occurring in the Northern Hemisphere since the stations are situated at boreal latitudes.

Histograms of the daily clearness index

Ideally, data analysis methods should require that no assumptions be made
about the nature of the scrutinized time series, i.e., neither linearity nor
stationarity should be presumed. This is because the true character of the
underlying processes that have generated the data is usually not known
beforehand. Adaptivity to the analyzed data would also be a sought-after
feature, in the sense of not imposing a set of patterns against which data
would be decomposed, but rather letting the data themselves drive the
decomposition. This latter criterion ensures both that the extracted
components carry physical meaning and that the influence of the mathematical artifacts inherent to the method on the
rendered picture of temporal variability is kept to a minimum

The Hilbert–Huang transform (HHT) is an adaptive data analysis technique built with the previous considerations in mind. It involves two distinct steps – the empirical mode decomposition (EMD) followed by Hilbert spectral analysis. An in-depth discussion of each step is carried out within the dedicated subsections that follow.

The first step of the HHT is the empirical mode decomposition (EMD), an
algorithmic procedure in essence, by which oscillations that present a common
local timescale are iteratively extracted from the data. These oscillatory
components of the data are called intrinsic mode functions (IMFs). An IMF is
any function that satisfies two criteria: (1) its number of extrema and zero
crossings differs at most by one and (2) at any data point the mean value of
its upper and lower envelopes is zero. These two properties ensure that IMFs
have a well-behaved Hilbert transform
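The first of the two IMF criteria lends itself to a direct numerical check. The following is a minimal sketch, not the routine used in this study: the function names are hypothetical, the test signals are synthetic rather than SSI data, and the second criterion (zero mean of the envelopes) is deliberately left out because it requires the envelope construction of the sifting loop.

```python
import math

def count_extrema_and_zero_crossings(x):
    """Return (number of local extrema, number of zero crossings) of a sequence."""
    extrema = 0
    for i in range(1, len(x) - 1):
        # The slope changes sign at sample i: a strict local maximum or minimum.
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:
            extrema += 1
    # A zero crossing occurs wherever two consecutive samples differ in sign.
    crossings = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return extrema, crossings

def meets_counting_criterion(x):
    """IMF criterion (1): extrema and zero-crossing counts differ by at most one."""
    extrema, crossings = count_extrema_and_zero_crossings(x)
    return abs(extrema - crossings) <= 1

# Synthetic illustration (assumed signals): a zero-centred sine and the same
# sine shifted upwards so that it never crosses zero.
n = 1000
sine = [math.sin(2 * math.pi * 5 * i / n + 0.3) for i in range(n)]
shifted = [v + 2.0 for v in sine]
print(meets_counting_criterion(sine), meets_counting_criterion(shifted))
```

The shifted sine keeps all of its extrema but loses its zero crossings, which is the kind of riding-wave behavior that the sifting procedure is meant to remove.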

Lines 6–12 of the EMD algorithm represent the so-called “sifting loop”
which has a two-fold purpose – to discard any riding waves and to render the
IMFs more symmetric. The stoppage criterion for the sifting loop is closely
related to how the latter controls the filter character of the EMD. On the
one hand, an infinite number of sifting iterations would asymptotically
approach the result of the Fourier decomposition (i.e., constant amplitude
envelopes)

It is also worth noting that the preferred interpolation method in the EMD,
i.e., in lines 8 and 9 of Algorithm 1, is cubic splines
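To make the sifting step more concrete, a single sifting pass may be sketched as follows. This is an illustration under stated assumptions and not the EMD routine used in this study: the function name `sift_once` is hypothetical, the signal is synthetic, and the envelopes are built by piecewise-linear interpolation (`numpy.interp`) purely to keep the sketch short, whereas the text above refers to cubic splines (for which `scipy.interpolate.CubicSpline` could be substituted). No stoppage criterion or boundary treatment is included.

```python
import numpy as np

def sift_once(t, x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    # Interior local maxima and minima (strict comparison with both neighbours).
    interior = np.arange(1, len(x) - 1)
    is_max = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    is_min = (x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])
    maxima, minima = interior[is_max], interior[is_min]
    if len(maxima) < 2 or len(minima) < 2:
        raise ValueError("too few extrema to build envelopes")
    # ASSUMPTION: linear envelopes instead of cubic splines. Outside the first
    # and last extremum, np.interp holds the end values constant.
    upper = np.interp(t, t[maxima], x[maxima])
    lower = np.interp(t, t[minima], x[minima])
    local_mean = 0.5 * (upper + lower)
    return x - local_mean, local_mean

# Synthetic illustration: a fast oscillation riding on a slow one.
t = np.linspace(0.0, 1.0, 2001)
fast = np.sin(2 * np.pi * 40 * t)
slow = 0.5 * np.sin(2 * np.pi * t)
x = fast + slow
candidate, local_mean = sift_once(t, x)
print("max |local mean - slow|:", float(np.max(np.abs(local_mean - slow))))
```

In this idealized case the local mean of the envelopes tracks the slow component, so that the candidate mode left after subtraction is close to the fast oscillation. In practice the pass is repeated until the stoppage criterion is met.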

The eight IMFs obtained by decomposing the BOU time series; from top to bottom: IMF1–IMF8. The panels plot SSI (ordinate) versus time (abscissa). Time markers on the horizontal axes indicate 1 January of the corresponding year. The zero-centered oscillatory nature of the modes can be clearly seen. Also apparent is the local timescale increase with mode number.

One of the drawbacks of the original EMD is that it may introduce a
phenomenon known as “mode mixing”. This is the manifestation of
oscillations with dissimilar timescales in the same IMF or the presence of
oscillations with similar timescales in different IMFs. A workaround was
proposed by

The power spectral density (PSD) of the eight IMFs for BOU (solid
line) on a logarithmic scale normalized with respect to the power of the
highest spectral peak. The period, or inverse frequency, runs on the abscissa
in a base-2 logarithm. The individual spectra are shown in the same colors
as the IMFs from Fig.

To illustrate the workings of the EMD, the eight IMFs of the BOU time series
are presented in Fig.

Once the empirical mode decomposition is completed, the second and last step of the HHT consists in the Hilbert spectral analysis of the previously obtained IMFs. Each IMF and its Hilbert transform are used to construct a complex analytic signal, described by an amplitude-modulation–frequency-modulation (AM–FM) model. This decomposition into two time-varying parts corresponding, respectively, to instantaneous amplitude and instantaneous frequency is very useful for the purpose of this study. It enables the identification, in a time-varying sense, of how much power (i.e., the square of amplitude) occurs at which timescale (i.e., the inverse of frequency).
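As an illustration of the AM–FM idea, the sketch below builds the analytic signal of a synthetic, amplitude-modulated yearly oscillation and recovers its instantaneous amplitude and timescale. It is only a sketch under assumptions: the function names are hypothetical, the signal is not SSI data, the analytic signal is obtained through the usual FFT construction (doubling the positive-frequency half of the spectrum), and the instantaneous frequency is estimated by finite differences of the unwrapped phase. The toolkit used in this study may proceed differently.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H[x], computed by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the mean as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def am_fm(x, dt):
    """Instantaneous amplitude, phase and frequency (cycles per unit time)."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.gradient(phase, dt) / (2.0 * np.pi)
    return amplitude, phase, frequency

# Synthetic illustration: ten "years" of daily samples, a 365-day carrier
# whose amplitude varies slowly over the whole record.
dt = 1.0                                  # days
t = np.arange(3650) * dt
envelope_true = 1.0 + 0.3 * np.sin(2 * np.pi * t / 3650.0)
x = envelope_true * np.cos(2 * np.pi * t / 365.0)

amplitude, phase, frequency = am_fm(x, dt)
timescale = 1.0 / frequency               # local timescale in days
print("median timescale (days):", float(np.median(timescale)))
```

The squared amplitude then gives the local power, and the inverse of the instantaneous frequency gives the local timescale at which that power occurs.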

The Hilbert transform of each real-valued IMF

Hilbert spectral analysis of the fifth IMF of the BOU time series. The intrinsic mode function (IMF5 panel) is the product of its constituent slowly varying amplitude-modulation part (AM panel) and of its rapidly changing frequency-modulation component (FM panel). The time-varying local timescale, extracted from the FM component, is also depicted (timescale panel). Time markers on the abscissa denote the beginning of the corresponding year.

Figure

The Hilbert spectrum

The original time-series

The square of the instantaneous amplitude and the instantaneous frequency of
the IMFs can then be used to represent the data as an energy density
distribution overlaid on the time-frequency space, as in
Eq. (

The time-integrated version of Eq. (

An example of Hilbert spectral representation is given in Fig.

Interpretation of Hilbert spectral features at data boundaries must be done
with care due to possible oscillations of the spline interpolants used in the
EMD (see Algorithm 1). This effect is similar to the “cone of influence” in
the popular wavelet transform

The plot in Fig.

What confidence can be attributed to the information extracted by the EMD? More specifically, how can one ascertain that a certain IMF is the result of a real physical process as opposed to being a stochastic manifestation of background processes?

In the past, several investigations have been carried out in order to
identify the effects of the EMD when applied to time series issued from
various models, such as white, red, or fractional Gaussian noise,

Nevertheless, the rejection of a null hypothesis based on an a priori assumed model of the background does not preclude the possibility that the signals now deemed statistically significant originate from a stochastic process of a different kind. Furthermore, as the EMD is an adaptive, data-driven decomposition, it would be desirable to also employ a null hypothesis that shares the same characteristics, making no prior assumptions about the character of the background processes.

Following P. Flandrin (personal communication, 2015) and

A Hilbert marginal spectrum

Since the EMD is an efficient dyadic filter, frequency deviation from the
unity line will occur for IMFs generated by stochastic processes. It follows
that when

Box plot of the instantaneous timescales of the IMFs for the four
stations. The top and the bottom edges of the boxes represent the first (Q1)
and, respectively, the third (Q3) quartiles. The bars inside boxes denote the
second quartile (Q2), i.e., the median. The whisker length is set to at most
1.5 times the interquartile range, i.e.,

The IMFs obtained from the BOU time series from Fig.

From the Fourier spectra of the IMFs in Fig.

Finally, the last two components, IMF7 and IMF8, having median periods of
783.3 and 1457.4 days (Fig.

With the FM components obtained, it becomes possible to illustrate the
frequency contents of each time series in terms of its individual IMFs, as
shown in Fig.

Statistical descriptors of the instantaneous timescales of the IMFs, expressed in days.

For all time series, IMF1–IMF5 have very similar median periods
(Fig.

At this point, the Hilbert frequency distribution of the IMFs for BOU may be
compared to the Fourier one from the PSD in Fig.

Box plot of the instantaneous amplitudes of the IMFs for the four
stations. The bottom and the top edges of the boxes represent the first (Q1)
and, respectively, the third (Q3) quartiles. The bars inside boxes denote the
second quartile (Q2), i.e., the median. The whisker length is set to at most
1.5 times the interquartile range, i.e.,

Resuming the discussion of the IMF timescales from
Fig.

With the scrutiny of these low-frequency components, the discussion of the timescale distribution of the
IMFs from Fig.

The BOU Hilbert spectrum from Fig.

Thus far, all time series have been shown to share a high-frequency constituent between 2 and 100 days composed of five IMFs with mean periods following a dyadic sequence, and an IMF around 365 days that captures the yearly variability. For BOU, CAR, and PAY, a low-power region can be found in the 100- to 300-day band. Beyond the 1-year timescale, the low-frequency variability in the 1.5- to 6-year band is captured by another two (BOU and CAR) or three (PAY and TAT) components. TAT is the only time series that has an IMF in the low-power band between the high-frequency feature and the yearly cycle (median period 143.2 days).

The previously identified features of the SSI time series will now be discussed in terms of their intrinsic temporal scales of variability and their physical and statistical meaning.

At this point, having identified the spectral characteristics of the SSI
time series by means of the HHT, a question arises with regard to their
physical and statistical significance, namely how one can ascertain which
features represent the expression of real, deterministic physical phenomena
and which ones can be attributed to random realizations of background
processes. Such a method, proposed by

The drift of normalized SWMF

At this point, several precautionary notes are compulsory. First, the rule of
inference used here is

This section investigates whether the first five IMFs can be modeled as
purely uncorrelated, random noise or whether they also contain any other form of
information. To test this, the rank correlation between the yearly and
sub-yearly IMFs and their envelopes, e.g., the AM part in the middle panel of
Fig.
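The rank correlation test described above can be sketched in a few lines. The excerpt does not state here which rank statistic is used, so Spearman's coefficient is assumed purely for illustration. The function names are hypothetical, the data are synthetic, and the computation of the associated significance level is omitted.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = 0.5 * (start + end) + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Synthetic illustration: a yearly cycle and a high-frequency envelope that
# shrinks whenever the yearly cycle grows (an assumed, idealized modulation).
yearly = [math.sin(2 * math.pi * d / 365.0) for d in range(365)]
envelope = [math.exp(-v) for v in yearly]
rho_demo = spearman_rho(envelope, yearly)
print("rank correlation:", rho_demo)
```

Because only ranks enter the statistic, any monotonic relation between the envelope and the yearly cycle is detected, whether or not it is linear. This is the reason for preferring a rank-based measure when the coupling between scales is suspected to be nonlinear.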

The resulting rank correlation coefficients and the associated

Values of

For the BOU and CAR datasets the first row (AM1) exhibits blue and dark blue
cells for IMF3–IMF5 at the statistically significant level. This
indicates a negative rank correlation. Similar, but lighter, amplitude
modulation is observed on the second row (AM2), but only by IMF4 and IMF5.
For the PAY series, this negative rank correlation is greatly reduced for the
first row (light blue tones) and is absent in the second row. For TAT no such
correlation can be observed. At this point it is interesting to note that, in
a similar way to the discussion from Sect.

It should be mentioned that the amplitude modulation of high-frequency
components by lower-frequency ones is also found in the sunspot number time
series

Firstly, the median periods of the IMFs composing the high-frequency band are
revisited. It is shown in Fig.

Rank correlations between IMFs and their AM components for
BOU

Secondly, in the 100- to 300-day band, two of the stations, BOU and CAR, do
not exhibit any variability. For PAY, the support of yearly IMF6 protrudes in
this region, although its first quartile rests well below the 200-day mark.
As mentioned before, the power of the portion of this IMF that extends into
the high-frequency range is very small (not shown). Hence, while not totally
devoid of spectral features, this band contains negligible power. A distinct
mode is present at TAT in this band, whose median period of 143.2 days
seems to continue the dyadic sequence of the previous five modes.
Since a similar transitional mode has also been found for two locations in
Europe

Thirdly, the median periods detected around the 1-year mark in all the datasets can be explained by the revolution of the Earth around the Sun and the associated orbital parameters. The interpretation of these components is unambiguous, with one notable exception for the PAY time series, whose IMF6 exhibits mode mixing; i.e., it has a total range that overlaps some of the modes in the high-frequency band. Nevertheless, it will be subsequently shown that it is indeed these components that account for variability at the 1-year timescale.

Lastly, the components indicative of low-frequency variability on timescales
greater than 1 year are discussed. The intrinsic timescales found in these
IMFs seem to match once more those pertaining to the so-called quasi-biennial
oscillations (QBOs) that have been
observed in solar activities and proxies with periodicities between 0.6 and
4 years

It is shown in Sect.

It can be noted that the IMF6 for both BOU and CAR has a well-defined period
(Fig.

PAY and TAT need four IMFs to account for the low-frequency variability,
i.e., one IMF more than BOU and CAR. IMF6 in PAY has a median period of 356.6 days,
close to 1 year (Fig.

Similar to PAY, TAT also has a low median clearness index

To sum up, the HHT analysis of decennial time series of daily means of measurements of the SSI from distinct BSRN stations has revealed the following: the presence of a high-frequency band (2–100 days) consisting of quasi-stochastic IMFs that have been shown to be amplitude modulated by the yearly cycle; a low-power spectral band in the 100- to 300-day region; a well-defined spectral peak at the 1-year mark accounting for the yearly variability; and multiple QBO-like components whose character has been, inconclusively, attributed to quasi-stochastic random processes.

This separation of the (quasi-)periodic components of the signal from the
apparently random realizations of a stochastic background has been shown to
significantly augment accuracy in time-series modeling

We have shown that the adaptive Hilbert–Huang transform is a versatile tool for analyzing SSI datasets that exhibit significant nonlinearity and non-stationarity. First, we have employed it to extract the intrinsic modes of variability in the SSI at distinct timescales. Second, the HHT has been used to discriminate between the deterministic yearly cycle and the quasi-stochastic high-frequency components. The same methodology could also be employed on different geophysical signals, such as wind speed time series and river discharge datasets.

When modeling climate processes as dynamical systems with low-frequency
oscillations and noise effects,

We have also proposed that a classification of the measuring stations
according to climate and/or insolation conditions may be possible,
based on the Hilbert spectral features of the data. Thus, one future research
pathway could consist in creating a catalogue of the variability in the solar
resource, at different timescales, on a global scale via satellite estimates
of the SSI. Current meteorological reanalyses are too noisy in their
estimates of the SSI to form the basis for such a catalogue

The software used for this study, comprising general EMD and Hilbert spectral analysis routines, is publicly available online.

The fast EMD routine used in this study is provided by

Methods pertaining to Hilbert spectral analysis are part of a general HHT
toolkit provided by

The code for the ICEEMD(AN) algorithm

The raw BSRN datasets employed in this study are made
available by

All authors contributed equally to this work.

The authors declare that they have no conflict of interest.

The authors wish to thank Patrick Flandrin from École Normale Supérieure de Lyon, France, and Gerard Thuiller from Laboratoire Atmosphères, Milieux, Observations Spatiales in Guyancourt, France, for the fruitful conversations and their insightful comments that sparked the development of this study. Dmitrii Kolotkov from the University of Warwick, United Kingdom, is also acknowledged for the personal communications pertaining to the stochastic nature of the high-frequency variability band. The authors thank all ground station operators of the BSRN network for their valuable measurements and the Alfred Wegener Institute (AWI) for hosting the BSRN website.

Edited by: Ioulia Tchiguirinskaia
Reviewed by: two anonymous referees