Many geophysical quantities, such as atmospheric temperature, water levels in rivers, and wind speeds, show evidence of long memory (LM). LM implies that these quantities experience non-trivial temporal memory, which can not only enhance their predictability but also hamper the detection of externally forced trends. It is therefore important to reliably identify whether or not a system exhibits LM. In this paper we present a modern and systematic approach to the inference of LM. We use the flexible autoregressive fractionally integrated moving average (ARFIMA) model, which is widely used in time series analysis and of increasing interest in climate science. Unlike most previous work on the inference of LM, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g., short-memory effects) can be integrated over in order to focus more directly on long-memory parameters and hypothesis testing. We illustrate our new methodology on the Nile water level data and the central England temperature (CET) time series, with favorable comparisons to the standard estimators. For the CET we also extend our method to seasonal long memory.

Many natural processes are sufficiently complex that a stochastic model is
essential, or at the very least an efficient description

The asymptotic power-law form of the ACF corresponds to an absence of a
characteristic decay timescale, in striking contrast to many standard
(stationary) stochastic processes where the effect of each data point decays
so fast that it rapidly becomes indistinguishable from noise. An example of
the latter is the exponential ACF, where the
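To make the contrast concrete, here is a small numerical illustration of our own (with arbitrary values d = 0.3 and phi = 0.6): the power-law ACF of a long-memory process is not even summable, whereas the exponential ACF of a short-memory process sums to a finite constant.

```python
import numpy as np

# Illustrative contrast (arbitrary parameters, not from the paper):
# a long-memory ACF decays like a power law, rho(k) ~ k^(2d-1),
# while a short-memory AR(1)-type ACF decays exponentially, rho(k) = phi^k.
d, phi = 0.3, 0.6
lags = np.arange(1, 10001)
acf_lm = lags ** (2 * d - 1.0)   # k^{-0.4}: no characteristic decay timescale
acf_sm = phi ** lags             # e-folds every ~1/log(1/phi) lags

# The power-law ACF is not summable: its partial sums keep growing,
# whereas the exponential sum converges to phi / (1 - phi) = 1.5.
partial_5k = acf_lm[:5000].sum()
total_10k = acf_lm.sum()
sm_total = acf_sm.sum()
```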

The study of long memory originated in the 1950s in the field of hydrology,
where studies of the levels of the Nile

For a detailed exposition of this period of
mathematical history, see

Most research into long memory and its properties has been based on classical
statistical methods, spanning parametric, semi-parametric, and non-parametric
modeling

To ease the computational burden, we focus on the autoregressive fractionally integrated moving average (ARFIMA) class of processes

Here we present a Bayesian framework for the efficient and systematic
estimation of the ARFIMA parameters. We provide a new approximate likelihood
for ARFIMA processes that can be computed quickly for repeated evaluation on
large time series, and which underpins an efficient Markov chain Monte Carlo (MCMC) scheme for Bayesian
inference. Our sampling scheme is best described as a modernization of a
blocked MCMC scheme proposed by

The aim of this paper is to introduce an efficient Bayesian algorithm for the
inference of the parameters of the ARFIMA(

We provide here a brief review of the ARFIMA model. More details are given in
Appendix

An ARFIMA model is given by

For now, we restrict our attention to a Bayesian analysis of an
ARFIMA(0,

Here we develop a new, efficient scheme for evaluating the (log)
likelihood, via approximation. Throughout, the reader should suppose that we have observed the
vector

Our proposed likelihood approximation uses a truncated
autoregressive model (AR) (
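The idea can be sketched as follows; this is a minimal illustration of a truncated-AR likelihood for the ARFIMA(0, d, 0) case only, not the paper's exact construction, which includes further refinements.

```python
import numpy as np

# Minimal sketch of a truncated-AR(P) likelihood approximation for
# ARFIMA(0,d,0). The AR(infinity) coefficients of (1 - B)^d satisfy
#   pi_0 = 1,  pi_k = pi_{k-1} * (k - 1 - d) / k.
def ar_inf_coeffs(d, P):
    pi = np.empty(P + 1)
    pi[0] = 1.0
    for k in range(1, P + 1):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    return pi

def approx_loglik(x, d, sigma2, P=100):
    # Filter the data with the truncated AR polynomial, then treat the
    # residuals as i.i.d. N(0, sigma2) innovations.
    pi = ar_inf_coeffs(d, P)
    n = len(x)
    e = np.array([np.dot(pi[:min(t, P) + 1], x[t::-1][:min(t, P) + 1])
                  for t in range(n)])
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (e @ e) / sigma2
```

Because the recursion for the coefficients is computed once per parameter value, repeated likelihood evaluations over long series stay cheap.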

It is now possible to write down a

This is still of little use because the

Evaluating this expression efficiently depends upon efficient calculation of

We are now ready to consider Bayesian inference for
ARFIMA(0,

We follow earlier work

The MH algorithm, applied alternately in a Metropolis-within-Gibbs fashion to
the parameters
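Schematically, a Metropolis-within-Gibbs sweep updates one block of parameters at a time, each by its own Metropolis-Hastings step conditional on the rest. The toy sketch below targets a correlated bivariate Gaussian standing in for the joint posterior; the actual conditionals in our scheme involve the ARFIMA likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Metropolis-within-Gibbs: two parameter blocks, each updated in turn
# with a Gaussian random-walk proposal. The target is an unnormalized
# correlated bivariate Gaussian (a stand-in for a real posterior).
def log_target(theta):
    a, b = theta
    return -0.5 * (a**2 + b**2 - a * b)

def mwg(n_iter=5000, step=(1.0, 1.0)):
    theta = np.zeros(2)
    lp = log_target(theta)
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        for j in range(2):                      # one block at a time
            prop = theta.copy()
            prop[j] += step[j] * rng.normal()   # random-walk step
            lp_prop = log_target(prop)
            if np.log(rng.uniform()) < lp_prop - lp:  # MH accept/reject
                theta, lp = prop, lp_prop
        out[i] = theta
    return out

samples = mwg()
```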

A useful cancellation in

For a full-MH approach, we recommend an independence sampler to backward
project the observed time series. Specifically, first relabel the observed
data:

Besides simplicity, justification for this approach lies primarily in its preservation of the autocorrelation structure; this is clear since the ACF is symmetric in time. The proposed vector has a low acceptance rate, but the potential remedies (e.g., multiple-try methods) seem unnecessarily complicated given the success of the simpler method.
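The symmetry argument is easy to verify numerically: reversing a series leaves every sample autocovariance unchanged (a small self-contained check of our own).

```python
import numpy as np

# Reversing a series preserves its sample ACV exactly, because each
# lag-k product x_t * x_{t+k} appears in both time orderings; this is
# the symmetry that justifies the backward projection.
def sample_acv(x, k):
    xc = x - x.mean()
    return np.dot(xc[:len(xc) - k], xc[k:]) / len(xc)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
```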

Simple ARFIMA(0,

Such approaches, especially ones allowing larger

Below we show how the likelihood may be calculated with extra short-memory
components when

Recall that short-memory components of an ARFIMA process are defined by the
AR and moving average (MA) polynomials,

We combine the short-memory parameters

An exact likelihood evaluation requires an explicit calculation of the ACV

To focus the exposition, consider the simple, yet useful,
ARFIMA(1,

To propose parameters in the manner described above, a two-dimensional,
suitably truncated Gaussian random walk, with covariance matrix aligned with
the posterior covariance, is required. To make proposals of this sort, and
indeed for arbitrary
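One simple way to realize such a proposal, sketched below with assumed numerical values for the covariance and an illustrative truncation region, is to draw correlated Gaussian steps via a Cholesky factor and reject draws that fall outside the valid region.

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated random-walk proposal aligned with an (assumed) posterior
# covariance Sigma, truncated by rejection to an illustrative region
# |phi| < 1, 0 < d < 0.5; the exact region depends on the model.
Sigma = np.array([[0.02, -0.012],
                  [-0.012, 0.015]])
L = np.linalg.cholesky(Sigma)

def propose(current, max_tries=100):
    for _ in range(max_tries):
        prop = current + L @ rng.normal(size=2)  # correlated Gaussian step
        phi, d = prop
        if abs(phi) < 1 and 0 < d < 0.5:         # truncation by rejection
            return prop
    return current                               # fall back to current value
```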

The only technical difficulty is the choice of proposal covariance matrix

We now expand the parameter space to include models

A potential approach is to parametrize in terms of the inverse roots (poles)
of
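The mapping can be sketched as follows (our illustration): convert the AR coefficients to the inverse roots of phi(z) and back; stationarity is then simply the requirement that every inverse root lies strictly inside the unit circle.

```python
import numpy as np

# Parametrizing the AR part by the inverse roots (poles) of
# phi(z) = 1 - phi_1 z - ... - phi_p z^p. The inverse roots are the
# roots of z^p - phi_1 z^(p-1) - ... - phi_p; stationarity holds iff
# they all lie inside the unit circle.
def ar_to_poles(phi):
    return np.roots(np.concatenate(([1.0], -np.asarray(phi))))

def poles_to_ar(poles):
    # np.poly rebuilds the monic polynomial from its roots
    return -np.poly(poles)[1:]
```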

We therefore propose reparametrization

Besides mathematical convenience, this bijection has a very useful property

We now use this reparametrization to efficiently propose new parameter
values. Firstly, it is necessary to propose a new memory parameter

Now consider the between-model transition. We must first choose a model
prior

Now, denote the probability of jumping from model

Now suppose the current (

Posterior outputs;

Here we provide empirical illustrations for the methods above: for classical
and Bayesian analysis of long-memory models, and extensions for short memory.
To ensure consistency throughout, the location and scale parameters will
always be chosen as

Standard MCMC diagnostics were used throughout to ensure, and tune for, good
mixing. Because

We start with the null case; i.e., how does the algorithm perform when the data
are not from a long-memory process? One hundred independent
ARFIMA(0,0,0), or Gaussian white noise, processes are simulated,
from which marginal posterior means, standard deviations, and credibility
interval end points are extracted. Table

The average estimate for each of the three parameters is less than a quarter
of a standard deviation away from the truth. Credibility intervals are nearly
symmetric about the estimate and the marginal posteriors are, to a good
approximation, locally Gaussian (not shown). Upon applying a proxy
credible-interval-based hypothesis test, one would conclude in 98 of
the cases that

Next, consider the more interesting case of

Posterior outputs:

Posterior summary statistics for an ARFIMA(0,0,0) process. Results are
based on averaging over 100 independent ARFIMA(0,0,0) simulations for
the long-memory parameter

From the figure it is clear that the estimator for

Next, the corresponding plots for the parameters

It appears that the marginal posterior standard deviation

Posterior outputs from an ARFIMA(0,0,0) series:

Table: mean difference of estimates

We now analyze the effect of changing the time series length. For this we
conduct a similar experiment but fix

Observe that all three marginal posterior standard deviations are
proportional to

In many practical applications, the long-memory parameter is estimated using
non-/semi-parametric methods. These may be appropriate in many situations,
where the exact form of the underlying process is unknown. However, when a
specific model form is known (or at least assumed) they tend to perform
poorly compared with fully parametric alternatives
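As a representative example of the semi-parametric approach, the Geweke-Porter-Hudak (GPH) estimator regresses the log periodogram on a function of the low Fourier frequencies; a minimal sketch of our own (the bandwidth choice m = sqrt(n) is common but by no means universal):

```python
import numpy as np

# GPH log-periodogram regression: regress log I(lambda_j) on
# log(4 sin^2(lambda_j / 2)) over the first m Fourier frequencies;
# minus the fitted slope estimates the memory parameter d.
def gph(x, m=None):
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))                    # common bandwidth choice
    lam = 2 * np.pi * np.arange(1, m + 1) / n  # low Fourier frequencies
    I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
    X = np.log(4 * np.sin(lam / 2) ** 2)
    slope = np.polyfit(X, np.log(I), 1)[0]
    return -slope
```

For Gaussian white noise the true d is 0, so the estimate should hover near zero, with the sampling variability that motivates the comparison above.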

Each of these four methods will be applied to the same 100 time series with
varying

Observe that all four methods have a much larger variance than our Bayesian
method, and moreover the

In general, this method works very well; two example outputs are presented in
Fig.

Comparison of Bayesian estimator with common classical estimators:

Posterior samples of (

Spectra for processes in Fig.

Figure

Marginal posterior density of

In cases where there is significant correlation between

For both series, kernel density for the marginal posterior for

Notice how the densities obtained via the RJ method are very close to those
obtained assuming

Posterior model probabilities for time series from
Figs.

Marginal posterior densities

As a test of the robustness of the method, consider a complicated short-memory input combined with a heavy-tailed

Performance looks good despite the complicated structure. The posterior
estimate for

We conclude with the application of our method to two long data sets: the Nile
water level minima data and the CET. The
Nile data are part of the R package “longmemo” and the CET time series
can be downloaded from

Posterior model probabilities based on simulations of model
Eq. (

Annual Nile minima time series.

Marginal posterior densities for Nile minima;

Because of the fundamental importance of the Nile river to the civilizations
it has supported, local rulers kept measurements of the annual maximal and
minimal heights attained by the river at certain points (called gauges). The
longest uninterrupted sequence of recordings is from the Roda gauge (near
Cairo), between AD 622 and 1284 (

There is evidence

We immediately observe the apparent low-frequency component of the data. The
data appear to be on the “verge” of being stationary; however, the general
consensus amongst the statistical community is that the series

Table: summary posterior statistics for Nile minima.
Plots: marginal posterior densities for Nile minima –

Posterior model probabilities for Nile minima time series for
the autoregressive parameter

Summary posterior statistics for Nile minima time series for
the long-memory parameter

The posterior summary statistics and marginal densities of

It is interesting to compare these findings with other literature.

We note that the interpretation as persistence of the

CET time series (deseasonalized).

CET time series;


In conclusion, our findings agree with all published Bayesian long-memory
results (except for the anomalous finding of

There is increasing evidence that surface air temperatures possess
long memory

The estimated seasonal function

Applying this model, the marginal posterior statistics are presented in
Table

Joint posterior samples of (

CET time series; posterior estimate (solid line) and 95 %
credibility interval (dotted line) for four blocks (black) and whole index
(red) for

Posterior summary statistics for CET index for the
long-memory parameter

To compare these results with those of other publications, it is important to
note that to remove annual seasonality from the CET, the series of annual
means is often used instead of the monthly series. This of course reduces the
fidelity of the analysis.

Of course all these studies assume the time series is stationary and, in
particular, has a constant mean. The validity of this assumption was
considered by

In order to consider the stationarity of the time series, we divided the
series into four blocks of length 1024 months (chosen to maximize
efficiency of the fast Fourier transform) and analyzed each block
independently. The posterior statistics for each block are presented in
Table

Posterior summary statistics for four blocks of CET index for
the long-memory parameter

It is interesting to note that the degree of (conventional) long memory is
roughly constant over the last three blocks but appears to be larger in the
first block. Of particular concern is that there is no value of

Interestingly, the seasonal long-memory parameter

We have provided a systematic treatment of efficient Bayesian inference for ARFIMA models, the most popular parametric model combining long- and short-memory effects. Through a mixture of theoretical and empirical work we have demonstrated that our method can handle the sorts of time series data with possible long memory that we are typically confronted with.

Many of the choices made throughout, but in particular those leading to our
likelihood approximation, stem from a need to accommodate further extension.
For example, in future work we intend to extend them to cope with
heavy-tailed innovation distributions. For more evidence of potential in this
context, see

Finally, an advantage of the Bayesian approach is that it provides a natural
mechanism for dealing with missing data, via data augmentation. This is
particularly relevant for long historical time series, which may, for a myriad
of reasons, have recording gaps. For example, some of the data recorded at
other gauges along the Nile have missing observations although
otherwise span a similarly long time frame. For a demonstration of how this
might fit within our framework, see Sect. 5.6 of

We define an autocovariance ACV

A process

Many authors define

The process

Before turning to long memory, we require one further result. Under some
extra conditions, stationary processes with ACV

Since the ACV of a stationary process is an even function of the lag, the
above equation implies that the associated SDF is also an even function. One
therefore only needs to consider positive arguments: 0
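This is easy to verify numerically; the sketch below (our illustration, using a simple AR(1) process rather than an ARFIMA one) recovers the SDF from the ACV by a cosine sum and checks it against both the closed form and evenness.

```python
import numpy as np

# SDF as the Fourier transform of the ACV: because gamma(-k) = gamma(k),
#   f(lambda) = (1/(2*pi)) * (gamma(0) + 2 * sum_{k>=1} gamma(k) cos(k*lambda)),
# which is manifestly even in lambda, so 0 <= lambda <= pi suffices.
# AR(1) illustration: gamma(k) = phi^k / (1 - phi^2) with sigma^2 = 1.
phi = 0.5

def sdf_from_acv(lam, K=400):
    k = np.arange(1, K + 1)
    gamma0 = 1.0 / (1 - phi**2)
    gammak = phi**k / (1 - phi**2)
    return (gamma0 + 2 * np.sum(gammak * np.cos(k * lam))) / (2 * np.pi)

def sdf_closed(lam):
    # Known AR(1) spectral density: 1 / (2*pi*(1 - 2*phi*cos(lam) + phi^2))
    return 1.0 / (2 * np.pi * (1 - 2 * phi * np.cos(lam) + phi**2))
```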

For an ARFIMA process (Eq.

There are a number of alternative definitions of LM, one of which is
particularly useful, as it considers the frequency domain: a stationary
process has long memory when its SDF follows

The simplest way of

In practice this model is of limited appeal to time series analysts because
the entire memory structure is determined by just one parameter,

Practical utility from the perspective of (Bayesian) inference demands
finding a representation in the temporal domain. To obtain this, consider the
operator (1

Finally, to connect back to our first definition of long memory, consider the
ACV of the ARFIMA(0,
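Using the standard closed form for this ACV, a short sketch (our illustration, with unit innovation variance) computes it stably via a ratio recursion and exhibits the power-law tail.

```python
import math
import numpy as np

# ARFIMA(0,d,0) ACV with unit innovation variance (standard closed form):
#   gamma(k) = Gamma(1-2d) Gamma(k+d) / (Gamma(d) Gamma(1-d) Gamma(k+1-d)),
# computed stably from gamma(0) via the ratio
#   gamma(k) / gamma(k-1) = (k - 1 + d) / (k - d).
def arfima_acv(d, K):
    g = np.empty(K + 1)
    g[0] = math.exp(math.lgamma(1 - 2 * d) - 2 * math.lgamma(1 - d))
    for k in range(1, K + 1):
        g[k] = g[k - 1] * (k - 1 + d) / (k - d)
    return g

# The tail decays like k^(2d-1): doubling the lag scales gamma by ~2^(2d-1),
# the power-law behavior that defines long memory in the time domain.
```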

We define a seasonal differencing operator (1
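A minimal sketch (our illustration) of seasonal fractional differencing: the same binomial coefficients as in (1 - B)^d, applied at multiples of the seasonal lag s; with D = 1 it reduces to ordinary seasonal differencing.

```python
import numpy as np

# Seasonal fractional differencing (1 - B^s)^D: apply the binomial
# coefficients pi_j of (1 - B)^D at lags s*j,
#   y_t = sum_j pi_j x_{t - s*j},  pi_0 = 1, pi_j = pi_{j-1}*(j-1-D)/j.
def seasonal_fracdiff(x, D, s, J=50):
    pi = np.empty(J + 1)
    pi[0] = 1.0
    for j in range(1, J + 1):
        pi[j] = pi[j - 1] * (j - 1 - D) / j
    y = np.zeros(len(x))
    for j in range(J + 1):
        if s * j < len(x):
            y[s * j:] += pi[j] * x[:len(x) - s * j]
    return y
```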

The generalization to include both seasonal and non-seasonal short-memory
components is obvious

Focusing on the first of these issues,

A process

The term “Gegenbauer” derives from the close relationship to the Gegenbauer
polynomials, a set of orthogonal polynomials useful in applied mathematics.
The Gegenbauer polynomials are most usefully defined in terms of their
generating function. The Gegenbauer polynomial of order
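Concretely, the generating function (1 - 2xt + t^2)^(-nu) yields a three-term recurrence, which gives a compact way to evaluate the polynomials (our sketch; with nu = 1/2 they reduce to the Legendre polynomials, and with nu = 1 to the Chebyshev polynomials of the second kind).

```python
# Gegenbauer polynomials C_n^{nu}(x) via the standard three-term recurrence
# implied by the generating function (1 - 2xt + t^2)^(-nu):
#   C_0 = 1,  C_1 = 2*nu*x,
#   n*C_n = 2x(n + nu - 1)*C_{n-1} - (n + 2nu - 2)*C_{n-2}.
def gegenbauer(n, nu, x):
    c0, c1 = 1.0, 2.0 * nu * x
    if n == 0:
        return c0
    for k in range(2, n + 1):
        c0, c1 = c1, (2 * x * (k + nu - 1) * c1 - (k + 2 * nu - 2) * c0) / k
    return c1
```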

The spectral density function of the Gegenbauer(

Note that Gegenbauer(

The spectral density function of the

Indeed,

Although

We thank one anonymous reviewer and M. Crucifix for their comments, which helped to improve this manuscript. C. L. E. Franzke is supported by the German Research Foundation (DFG) through the cluster of excellence CliSAP (EXC177), N. W. Watkins is supported by ONR NICOP grant N62909-15-1-N143, and both are supported by the Norwegian Research Council KLIMAFORSK project 229754. N. W. Watkins thanks the University of Potsdam for hospitality.

Edited by: Z. Toth
Reviewed by: M. Crucifix and another anonymous referee