A propagation-separation approach to estimate the autocorrelation in a time-series

The paper presents an approach to estimating the parameters of a locally stationary AR(1) time series model by maximization of a local likelihood function. The method is based on a propagation-separation procedure that leads to data-dependent weights defining the local model. Using free propagation of weights under homogeneity, the method is capable of separating the time series into intervals of approximate local stationarity, with parameters that differ significantly between intervals. The method therefore also serves as a test for a stationary AR(1) model. Its performance is illustrated by applications to both synthetic data and real time series: reconstructed NAO and ENSO indices and GRIP stable isotopes.


Introduction
The frequent use of the first-order autoregressive model, AR(1), in climate applications stems from its simplicity and efficiency in capturing the inertial nature of climatic phenomena (Hasselmann, 1976). Whereas the choice of a global parametric structure for the fitted model is common, it is often worthwhile to know whether such an approximation provides an adequate description of the analyzed series. This can become an issue, for example, in null-hypothesis testing, when the fitted model is used to assess whether or not the variability recorded in a time series is consistent with a stochastic origin of this type.
In this paper we propose a method which, using the idea of structural adaptation, is capable of isolating the periods in the analyzed data where the parameters of the fitted AR(1) model differ significantly. The method is largely based on ideas implemented in the adaptive weights smoothing (AWS) procedure for local constant modeling, introduced earlier in Polzehl and Spokoiny (2000) in the context of image denoising. The AWS technique was later successfully generalized to the case of an arbitrary local linear parametric structure (Chen et al., 2008) and extended to a broad class of nonparametric models, including e.g. the regression, density, Poisson and binary response models. The same idea was applied to estimation of the tail index parameter, classification, and density and volatility estimation (Polzehl and Spokoiny, 2002). Theoretical results for exponential families were obtained in Polzehl and Spokoiny (2006).
The paper is organized as follows. In Sect. 2 we recall the global likelihood approach to the estimation of the sought parameters in the AR(1) model. Section 3 introduces the concept of local likelihood. A definition of weights is given in Sect. 4. Sections 5 and 6 present a numerical implementation of the method and explain the choice of parameters used, respectively. Examples demonstrating the performance of the method are shown in Sect. 7, followed by brief conclusions in Sect. 8.

Global likelihood estimation
The model of the discrete AR(1) process is formulated as

$$y_t = \phi\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2), \quad t = 2, \ldots, n.$$

Given parameters $\phi$ and $\sigma^2$, the likelihood function of the vector $Y = (y_1, \ldots, y_n)$ has the form (Brockwell and Davis, 1998, see 8.7.2-4)

$$L(Y \mid \phi, \sigma^2) = (2\pi\sigma^2)^{-n/2}\,(1 - \phi^2)^{1/2} \exp\!\left\{-\frac{S(\phi)}{2\sigma^2}\right\},$$

where

$$S(\phi) = (1 - \phi^2)\, y_1^2 + \sum_{t=2}^{n} (y_t - \phi\, y_{t-1})^2.$$

The global maximum likelihood estimates of $\phi$ and $\sigma^2$ are defined by maximization of the log-likelihood $\log L(Y \mid \phi, \sigma^2)$. Solving for $\sigma^2$ and $\phi$ yields, after some algebra, approximate likelihood estimates of $\phi$ and $\sigma^2$ given as

$$\hat\phi = \frac{\sum_{t=2}^{n} y_t\, y_{t-1}}{\sum_{t=2}^{n} y_{t-1}^2}, \qquad \hat\sigma^2 = \frac{1}{n}\, S(\hat\phi).$$

The asymptotic variance estimates of the model parameters can be determined using the Fisher information matrix for the log-likelihood $\log L(Y \mid \phi, \sigma^2)$. Taking expectations leads to

$$I_{\phi,\sigma^2} = n \begin{pmatrix} (1 - \phi^2)^{-1} & 0 \\ 0 & (2\sigma^4)^{-1} \end{pmatrix}.$$

Inverting $I_{\phi,\sigma^2}$ yields the asymptotic covariance matrix for $\{\phi, \sigma^2\}$ as

$$\Sigma_{\phi,\sigma^2} = \frac{1}{n} \begin{pmatrix} 1 - \phi^2 & 0 \\ 0 & 2\sigma^4 \end{pmatrix}.$$

Respective asymptotic confidence intervals (CI) for $\phi$ and $\sigma^2$ can now be obtained employing the asymptotic normality of the estimates.
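As an illustration, the approximate global estimates and their asymptotic variances can be computed directly from the formulas above. The following sketch (function and variable names are ours, not part of any published implementation) estimates an AR(1) model from a simulated series:

```python
import numpy as np

def ar1_global_mle(y):
    """Approximate (conditional) ML estimates of the AR(1) parameters and
    their asymptotic variances from the Fisher information; a sketch of the
    formulas in the text."""
    y = np.asarray(y, float)
    n = len(y)
    # approximate likelihood estimates
    phi = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    sigma2 = np.mean((y[1:] - phi * y[:-1]) ** 2)
    # asymptotic variances: Var(phi) ~ (1 - phi^2)/n, Var(sigma2) ~ 2*sigma2^2/n
    return phi, sigma2, (1.0 - phi ** 2) / n, 2.0 * sigma2 ** 2 / n

# usage: estimate a simulated stationary AR(1) series and form a 95% CI for phi
rng = np.random.default_rng(0)
n, phi_true, sigma_true = 5000, 0.6, 1.0
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + sigma_true * rng.standard_normal()
phi_hat, sigma2_hat, v_phi, v_s2 = ar1_global_mle(y)
ci_phi = (phi_hat - 1.96 * np.sqrt(v_phi), phi_hat + 1.96 * np.sqrt(v_phi))
```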

Local likelihood and its estimation using structural adaptation
The global parametric structure considered in the previous section is generally too restrictive. To allow for possible variability in the model coefficients we introduce a local parametric structure to the model. The most general way of describing a local model is based on weights assigned to each observation used. Let, for a fixed $t$, a nonnegative weight $w_i = w_i(t) \le 1$ be assigned to the observation $y_i$ at $t_i$, $i = 1, \ldots, n$. The local, or weighted, maximum log-likelihood estimate is, in this case, introduced as

$$\left(\hat\phi(t), \hat\sigma^2(t)\right) = \arg\max_{\phi, \sigma^2} \sum_{i=2}^{n} w_i(t)\, \log p(y_i \mid \phi, \sigma^2),$$

where $\log p(y_i \mid \phi, \sigma^2)$ denotes the contribution to the global log-likelihood function at any given point $t_i$, given by

$$\log p(y_i \mid \phi, \sigma^2) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(y_i - \phi\, y_{i-1})^2}{2\sigma^2}.$$

The principal idea of the approach proposed in Polzehl and Spokoiny (2000) is to use a structural assumption of local homogeneity to determine data-dependent weights $w_i(t_j)$ that define the local likelihood. The weights $w_i(t_j)$ are obtained using a sequence of likelihood ratio tests for the hypothesis that the parameters at times $t_j$ and $t_i$ coincide.
For any local model characterized by a set of weights $W(t) = (w_1(t), \ldots, w_n(t))$, the local log-likelihood defined by these weights is

$$\log L(W(t); \phi, \sigma^2) = \sum_{i=2}^{n} w_i(t)\, \log p(y_i \mid \phi, \sigma^2).$$

Maximization of this weighted log-likelihood with respect to $\phi$ and $\sigma^2$ yields the local maximum likelihood estimates

$$\hat\phi(t) = \frac{\sum_{i=2}^{n} w_i(t)\, y_i\, y_{i-1}}{\sum_{i=2}^{n} w_i(t)\, y_{i-1}^2}, \qquad \hat\sigma^2(t) = \frac{\sum_{i=2}^{n} w_i(t)\, \left(y_i - \hat\phi(t)\, y_{i-1}\right)^2}{\sum_{i=2}^{n} w_i(t)}.$$

Let us assume that the time series is stationary within some local vicinity $U(t_i)$ of time $t_i$. If we knew the neighborhood $U(t_i)$ in advance, we would define local weights as $w_{ij} = w_j(t_i) = I_{t_j \in U(t_i)}$, where $I$ denotes the indicator function, and use them to estimate the model parameters. Such knowledge is usually not available, but information on a suitable local model $W(t_i)$ can be inferred from the estimates $\hat\phi(t), \hat\sigma^2(t)$. Assume that we have estimates at all observed time points $t_j$. We can use this information to infer the neighborhood $U(t_i)$ by testing the hypothesis

$$H_0: \ \left(\phi(t_i), \sigma^2(t_i)\right) = \left(\phi(t_j), \sigma^2(t_j)\right).$$

A weight $w_{ij}$ can be assigned based on the value of a test statistic $T_{ij}$, assigning zero weight if $\hat\phi(t_i), \hat\sigma^2(t_i)$ and $\hat\phi(t_j), \hat\sigma^2(t_j)$ are significantly different. This can be embedded into an iterative procedure. At each iteration $k$ we restrict positive weights to observations at times $t_j$ with $|t_j - t_i| < h^{(k)}$, starting with a very small initial bandwidth $h^{(0)}$ and increasing the bandwidth with each iteration. Testing along the entire time series yields a set of weights $W(t_i)$ that determines a local model at $t_i$ and hence a new estimate of $\hat\phi(t_i), \hat\sigma^2(t_i)$.
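The weighted maximum likelihood estimates for one local model have a closed form analogous to the global case. The following sketch (names are illustrative) computes them:

```python
import numpy as np

def ar1_local_mle(y, w):
    """Weighted (local) ML estimates of the AR(1) parameters for one local
    model, given nonnegative weights w[i] <= 1 assigned to the observations."""
    y = np.asarray(y, float)
    w = np.asarray(w, float)[1:]  # likelihood contributions exist only for t_2..t_n
    phi = np.sum(w * y[1:] * y[:-1]) / np.sum(w * y[:-1] ** 2)
    sigma2 = np.sum(w * (y[1:] - phi * y[:-1]) ** 2) / np.sum(w)
    return phi, sigma2
```

With indicator weights covering the whole series these estimates reduce to the global approximate MLE of the previous section.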
We define a test statistic $T_{ij}$ by a local likelihood ratio test for the hypothesis of equal parameters at times $t_i$ and $t_j$.
Given estimates $\hat\phi_i, \hat\sigma^2_i$ and $\hat\phi_j, \hat\sigma^2_j$ at times $t_i$ and $t_j$, the difference between the local log-likelihoods evaluated for the two estimates employing the weighting scheme $W_i$ is

$$T_{ij} = \log L(W_i; \hat\phi_i, \hat\sigma^2_i) - \log L(W_i; \hat\phi_j, \hat\sigma^2_j).$$

The value of $T_{ij}$ is used to define a statistical penalty. That is, if $T_{ij}$ is sufficiently larger than some prescribed value $\lambda$, the parameters at times $t_i$ and $t_j$ are significantly different and the corresponding weight is therefore set to zero. The process of finding the local models and estimating the local likelihoods is implemented in an iterative procedure, formally presented below in Sect. 5. Approximate confidence limits for the parameters $\phi, \sigma^2$ at every $t_i$ can be established using an equivalent of Eq. (7) for the local likelihood and assuming asymptotic normality of the estimates. This yields approximate variances of $\hat\phi(t)$ and $\hat\sigma^2(t)$ as

$$V_\phi(t) = \frac{1 - \hat\phi^2(t)}{N(t)}, \qquad V_{\sigma^2}(t) = \frac{2\hat\sigma^4(t)}{N(t)}, \qquad N(t) = \sum_{i} w_i(t).$$

The corresponding $100(1-\alpha)\%$ CIs are given by $\hat\phi(t) \pm Z_{\alpha/2}\sqrt{V_\phi(t)}$ and $\hat\sigma^2(t) \pm Z_{\alpha/2}\sqrt{V_{\sigma^2}(t)}$, respectively, where $Z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard Gaussian distribution. These intervals are approximate in the sense that they are centered at mean parameter values within the set of time points with positive weights $w_i(t_j)$.
They are, as usual for nonparametric estimates, not adjusted for a potential bias. Additionally, the asymptotic approximation for $\sigma^2$ converges very slowly. Note that in a homogeneous situation with free extension of every local model during the iterative process, the local MLEs $\hat\phi(t), \hat\sigma^2(t)$ and their variances converge asymptotically to the global MLE estimates, as given by Eqs. (5) and (7).

Definition of weights
For every pair $(i, j)$, the weight $w_{ij}$ is constructed from two independent quantities: a location penalty

$$l_{ij} = \rho(t_i, t_j)^2 / h^2$$

and a statistical penalty

$$s_{ij} = T_{ij} / \lambda.$$

Here $h$ and $\rho(t_i, t_j) = |t_i - t_j|$ denote, respectively, the bandwidth and the Euclidean distance between the design points $t_i$ and $t_j$. The parameter $\lambda$ can be treated as the critical value for the test statistic $T_{ij}$ and is defined empirically so as to satisfy a propagation criterion in the homogeneous case (see below). The weights are thereby defined as

$$w_{ij} = K_{loc}(l_{ij})\, K_{st}(s_{ij}),$$

with $K_{loc}$ and $K_{st}$ being two positive kernel functions. We present the definitions of the kernels later in Sect. 6.
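The construction of a single weight can be sketched as follows; we use the kernels quoted in Sect. 6, and the squared scaled distance as the location penalty is our reading of the text:

```python
import numpy as np

def k_loc(u):
    # localization kernel K_loc(u) = (1 - u)_+
    return np.maximum(1.0 - u, 0.0)

def k_st(u):
    # statistical kernel K_st(u) = exp(-u) * 1{u <= 5} (see Sect. 6)
    return np.exp(-u) * (u <= 5.0)

def weight(t_i, t_j, T_ij, h, lam):
    """Weight w_ij built from a location penalty l_ij = (|t_i - t_j| / h)^2
    and a statistical penalty s_ij = T_ij / lambda."""
    l_ij = (abs(t_i - t_j) / h) ** 2
    s_ij = T_ij / lam
    return k_loc(l_ij) * k_st(s_ij)
```

The weight is 1 for coinciding points with identical parameter estimates, decays with distance, and vanishes either outside the bandwidth or when the likelihood-ratio statistic is large.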

Numerical implementation
The computational steps to analyze a time series using the proposed technique are as follows:

1. Initialization. Specify the initial bandwidth $h^{(0)}$ and compute, for every $i = 2, \ldots, n$, the weights $w_{ij}^{(1)}$ as defined in Eq. (10). Obtain initial estimates of $\hat\phi_i^{(1)}$ and $\hat\sigma_i^{2(1)}$ using Eq. (11).

2. Adaptation
At each iteration step $k$ compute, for every pair $i, j = 2, \ldots, n$, the location and statistical penalties

$$l_{ij}^{(k)} = \rho(t_i, t_j)^2 / \left(h^{(k)}\right)^2, \qquad s_{ij}^{(k)} = T_{ij}^{(k)} / \lambda,$$

and obtain a new set of weights as

$$w_{ij}^{(k)} = K_{loc}\!\left(l_{ij}^{(k)}\right) K_{st}\!\left(s_{ij}^{(k)}\right).$$

This specifies the local model $W^{(k)}(t_i)$ at $t_i$.

3. Local estimation
Compute new estimates of the sought parameters $\hat\phi_i^{(k)}$ and $\hat\sigma_i^{2(k)}$ (Eq. 11) for iteration step $k$. Increase the actual bandwidth $h^{(k)}$ by some specified factor $a > 1$ to widen the search radius for the next iteration.

4. Stopping

Stop the procedure if the bandwidth reaches the specified limit $h_{max}$; otherwise continue with the adaptation step.
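The initialization, adaptation, local estimation and stopping steps can be combined into a compact, deliberately unoptimized sketch. All names, defaults and the exact bandwidth bookkeeping are our illustrative choices, not the authors' acoraws implementation:

```python
import numpy as np

def local_loglik(y, w, phi, sigma2):
    # weighted conditional AR(1) log-likelihood at parameters (phi, sigma2)
    r = y[1:] - phi * y[:-1]
    return np.sum(w[1:] * (-0.5 * np.log(2 * np.pi * sigma2) - r ** 2 / (2 * sigma2)))

def local_fit(y, w):
    # closed-form weighted ML estimates (see the local-likelihood section)
    ww = w[1:]
    phi = np.sum(ww * y[1:] * y[:-1]) / (np.sum(ww * y[:-1] ** 2) + 1e-300)
    sigma2 = np.sum(ww * (y[1:] - phi * y[:-1]) ** 2) / (np.sum(ww) + 1e-300)
    return phi, max(sigma2, 1e-12)

def aws_ar1(y, h0=3.0, a=1.25, lam=2.41, h_max=None):
    """Propagation-separation sketch: start from a small bandwidth, test
    parameter equality between points via a likelihood ratio, and down-weight
    points with significantly different parameters.
    lam ~ chi-squared(2) quantile at p_lambda = 0.7 (see Sect. 6)."""
    y = np.asarray(y, float)
    n = len(y)
    h_max = h_max if h_max is not None else float(n)
    t = np.arange(n)
    # initialization: purely location-based weights at bandwidth h0
    W = np.maximum(1.0 - (np.abs(t[:, None] - t[None, :]) / h0) ** 2, 0.0)
    est = np.array([local_fit(y, W[i]) for i in range(n)])
    h = h0
    while h < h_max:
        h *= a
        newW = np.zeros_like(W)
        for i in range(n):
            L_i = local_loglik(y, W[i], est[i, 0], est[i, 1])
            for j in range(max(0, int(i - h)), min(n, int(i + h) + 1)):
                # likelihood-ratio statistic for equal parameters at t_i, t_j,
                # evaluated under the weighting scheme of point t_i
                T = L_i - local_loglik(y, W[i], est[j, 0], est[j, 1])
                l_ij = ((i - j) / h) ** 2                      # location penalty
                s_ij = max(T, 0.0) / lam                       # statistical penalty
                w_loc = max(1.0 - l_ij, 0.0)                   # K_loc(u) = (1-u)_+
                w_st = np.exp(-s_ij) if s_ij <= 5.0 else 0.0   # K_st(u) = e^-u 1{u<=5}
                newW[i, j] = w_loc * w_st
        W = newW
        est = np.array([local_fit(y, W[i]) for i in range(n)])
    # parameter functions and the sum of weights N(t) at the last iteration
    return est[:, 0], est[:, 1], W.sum(axis=1)
```

On a series with a jump in $\phi$, the returned $\hat\phi(t)$ should be close to piecewise constant, and $N(t)$ drops near the change point where weights cannot propagate.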

Choice of parameters
The proposed method involves several parameters and kernel functions that are to be specified. The most important is $\lambda$, which scales the statistical penalty and defines its contribution to the weighting scheme at each particular point. If $\lambda$ is too large, this contribution is negligible and leads to a kernel estimate with bandwidth $h_{max}$. Too small values of $\lambda$, in turn, lead to overpenalization and may result in a random segmentation. We therefore suggest choosing the minimal value of $\lambda$ satisfying a propagation condition (Polzehl and Spokoiny, 2006). This implies, in a completely homogeneous situation, a free extension of every local model during the iterative process. More specifically, if $\phi(t) \equiv \phi$ and $\sigma^2(t) \equiv \sigma^2$, we request at each iteration step $k$ and for a specified constant $\alpha > 0$ that the adaptive estimates coincide with the nonadaptive kernel estimates $\bar\phi^{(k)}, \bar\sigma^{2(k)}$ with probability of at least $1 - \alpha$. Here $\bar\phi^{(k)}$ and $\bar\sigma^{2(k)}$ denote the nonadaptive kernel estimates employing the bandwidth $h^{(k)}$ from step $k$. Then, if $h_{max}$ is sufficiently large, the resulting local estimates of $\phi$ and $\sigma^2$ will employ the same local weighting scheme and will, with high probability, coincide at every point with the global estimates. Note that the sought $\lambda$ does not depend on the unknown parameter $\phi$ and can therefore be found approximately by simulations. For convenience, the value of $\lambda$ is introduced in the programming code via the quantile of the $\chi^2$ distribution with two degrees of freedom and probability $p_\lambda$. A value of $p_\lambda = 0.7$ was found by simulation in a homogeneous situation to obey the propagation condition for a prespecified value of $\alpha = 0.1$.
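The parameterization of $\lambda$ through the $\chi^2$ quantile can be written out directly ($p_\lambda = 0.7$ as quoted above; a minimal sketch using SciPy):

```python
import numpy as np
from scipy.stats import chi2

# lambda expressed via the chi-squared quantile with two degrees of freedom
# and probability p_lambda, as described in the text
p_lambda = 0.7
lam = chi2.ppf(p_lambda, df=2)
# for df = 2 the quantile has the closed form -2*log(1 - p)
lam_closed_form = -2.0 * np.log(1.0 - p_lambda)
```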
The minimal bandwidth $h^{(0)}$, which determines the method's resolution, should be reasonably small. The minimal admissible value of $h^{(0)}$ in the model is 2, to ensure identifiability of the parameters. Nevertheless, to improve stability we suggest using $h^{(0)} = 3$, which is set as a default in the implementation. The maximal bandwidth $h_{max}$ controls the numerical complexity of the algorithm. Ideally, a sufficient value of $h_{max}$ is comparable to the size of the maximal homogeneous region in the analyzed data. However, since this is not known a priori, $h_{max}$ equal to the length of the time series can be used instead. This ensures convergence of the algorithm in the sense that the statistical kernel values $K_{st}(s_{ij})$ converge to 1 if $t_i$ and $t_j$ are in the same homogeneous region, and to 0 if they belong to regions with different parameters. The factor $a$ controls the growth rate of the local neighborhood in the course of the iterative process for every point $t_i$. By default we use the value $a = 1.25$.
The kernel functions used in constructing the local weighting scheme should satisfy two main criteria: they must be nonnegative and non-increasing. For calculating the statistical penalty we use $K_{st}(u) = e^{-u}\, I_{u \le 5}$. A good alternative is $K_{st}(u) = I_{u \in [0, 1/4)} + \tfrac{4}{3}(1 - u)\, I_{u \in [1/4, 1)}$, leading to better sensitivity and an earlier stabilization of the estimates. The localization kernel used in calculating the location penalty is $K_{loc}(u) = (1 - u)_+$. Numerical experiments with different kernel designs have demonstrated that the shape of the location kernel has only a minor influence on the resulting estimates.

Case studies
In this section we present results obtained by applying the AWS method to both synthetic and real data sets. The goal is to test the behavior of the algorithm and demonstrate the overall performance of the method. We first run a series of simulations using synthetic data, assessing how the goodness of fit varies depending on the properties of the test series (Sect. 7.1). Then two simple examples with synthetic time series, where the parameters $\phi$ and $\sigma^2$ are piecewise constant and linearly changing, are presented in Sect. 7.2. The other three examples, in Sects. 7.3, 7.4 and 7.5, show the application of the proposed method to time series widely used in climate studies: the NAO and ENSO Niño-3 indices and GRIP oxygen isotope data (see further details below).
The appropriateness of the AR(1) model is often taken for granted in geophysical applications. We fitted the global AR(1) model to the sample time series using the armcov MATLAB function, which utilizes a modified covariance method. The cumulative periodogram test for randomness of the residuals (Box et al., 1994) did not reject stationarity of the proposed model for the analyzed time series of the NAO and Niño-3 indices, although it did for the last example, the oxygen isotope record from the Greenland ice core shown in Sect. 7.5. This contrasts with our findings and indicates that the default test is not sensitive enough.
The results of our analysis are visualized in a four-component visual display; see Figs. 4-8. It comprises the analyzed time series itself (top left panel), the estimated $\phi$ and $\sigma^2$ of the signal (top right and bottom left panels, respectively) and the sum of weights $N(t)$ at the last iteration step (bottom right panel).
A goodness-of-fit measure is established on the basis of the corrected Akaike information criterion (AICC; Burnham and Anderson, 2002). We ran the program twice for each particular synthetic series, with $p_\lambda = 0.7$ and $p_\lambda = 1.0$, where the second choice implies a global MLE estimate of $\phi(t), \sigma^2(t)$. The corresponding $AICC_{AWS}$ and $AICC_g$ measures were calculated from the residual sum of squares (RSS) between the series and its fitted values as

$$AICC = n \log(RSS/n) + 2k + \frac{2k(k+1)}{n - k - 1},$$

where $k$ denotes the effective number of parameters in the fitted model. For the global AR(1) model we have $k = 2$, while for the AWS approach this number is random. Instead of this random value we used the effective number of parameters in the true underlying model, which is 4. This number is larger than or equal to the expected effective number of parameters, so that our estimate will in most cases be larger than or equal to the AICC using the correct value of $k$. Each numerical experiment with $\{n, \phi\}$ yielded in total 1000 values of the test statistic $AICC_{AWS} - AICC_g$, scaled, for convenience, by the length of the series $n$. We then used the 95% quantile of the empirical distribution of $(AICC_{AWS} - AICC_g)/n$ as a conservative estimate of the relative skill of the method in identifying the autocorrelation structure of the data. Figure 1 shows that the AICC-derived goodness of fit is preferentially negative, suggesting that within this particular ensemble of numerical experiments the proposed technique is at least as appropriate as the common global MLE method. Usually, for small $n$ and small $\phi$ we do not detect nonstationarity, and the resulting estimate coincides with the global estimate. What we see in these cases in Fig. 1 is the effect of overestimating $AICC_{AWS}$ by using $k = 4$ instead of the correct $k = 2$ when nonstationarity was not detected.

This approach, however, does not demonstrate whether the method detects the presence of jumps in the parameters of the modeled series. Using our prior knowledge of the experimental design, we define a second goodness-of-fit measure from the RMS error of the estimated $\phi$ relative to the true one. For each run $m = 1, \ldots, M$ of AWS with $\{n, \phi\}$, the root mean square error $RMSE^{\{n,\phi\}}_m$ in $\phi$ of the estimated model $\hat\phi(t), \hat\sigma^2(t)$ was calculated. A simple approach to quantify the capability of the method to capture a sudden change in $\phi$ for a given $\{n, \phi\}$ is to compare $RMSE^{\{n,\phi\}}$ with $\phi$. The respective goodness-of-fit statistic is defined by the condition

$$\phi \,/\, Q_{0.95}\!\left(RMSE^{\{n,\phi\}}\right) > g,$$

with $g > 1$ being a critical value, implying that the procedure has successfully separated the two segments with different correlation structures. Figure 2 displays the family of curves with estimated values of $Q_{0.95}(RMSE^{\{n,\phi\}})$ for different values of $\phi$; here $Q_{0.95}(RMSE^{\{n,\phi\}})$ denotes the 95% quantile of the empirical distribution of $RMSE^{\{n,\phi\}}$ for a given $\{n, \phi\}$. We see that for a relatively short (about 200 points) time series the method guarantees separation only if the step in the autocorrelation coefficient is sufficiently large (above 0.7). These numerical estimates are in reasonably good agreement with a simple inference about the expected performance of the method, which can be drawn from the asymptotic 95% confidence intervals for $\phi$ in the case of the global MLE (see Eq. 7 and Fig. 3). One should note, however, that this numerical experiment disregards the actual values of $\phi_1$ and $\phi_2$, whereas the width of the confidence interval for $\phi$ varies substantially with $\phi$ and $n$. Figure 2 therefore provides mean estimates of the goodness of fit, as the whole range of possible $(\phi_1, \phi_2)$ is considered.
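For Gaussian residuals the AICC used above reduces to a one-line function of the residual sum of squares (the least-squares form from Burnham and Anderson, 2002):

```python
import numpy as np

def aicc(rss, n, k):
    """Corrected Akaike information criterion for a least-squares fit with
    residual sum of squares `rss`, series length n and k effective parameters."""
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# comparing the AWS fit (k = 4, the effective number in the true two-segment
# model) with the global AR(1) fit (k = 2): the AWS fit is preferred when
# (aicc(rss_aws, n, 4) - aicc(rss_g, n, 2)) / n is negative
```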

Synthetic time-series
A simple synthetic series containing n=1500 data points is constructed by concatenating segments of different lengths with locally constant $\phi$ and $\sigma^2$. The AWS results are shown in Fig. 4, which demonstrates the ability of the method to successfully localize discontinuities in the model parameters through the propagation of weights within homogeneous regions. Note that despite some discrepancy between the true (dotted lines) and estimated (solid lines) values of $\phi$ and $\sigma^2$, good agreement is generally reached.
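A series of this kind can be generated as follows; the segment boundaries and the carry-over of the state across segments are our illustrative choices, as the paper does not spell out its exact construction:

```python
import numpy as np

def piecewise_ar1(segments, rng):
    """Concatenate AR(1) segments with locally constant (phi, sigma);
    `segments` is a list of (length, phi, sigma) tuples. The state is carried
    over across segment boundaries, a choice made here for continuity."""
    y = [0.0]
    for length, phi, sigma in segments:
        for _ in range(length):
            y.append(phi * y[-1] + sigma * rng.standard_normal())
    return np.array(y[1:])

# a 1500-point series with three homogeneous regimes
rng = np.random.default_rng(1)
y = piecewise_ar1([(500, 0.2, 1.0), (600, 0.8, 1.0), (400, -0.5, 2.0)], rng)
```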
Figure 5 shows the second example, where the method is applied to a series of length n=1000 with $\sigma^2 = 1$ and $\phi$ changing linearly from -0.99 to 0.99. For $h_{max} = n$ the method yields piecewise-constant estimates of $\phi$ and $\sigma^2$, in accordance with the structural assumption of local stationarity. Note that the magnitudes of the jumps in $\phi$ and the corresponding segment lengths are close to the minimal detectable ones for the considered series length, as Fig. 2 shows.

Reconstructed winter NAO index
Figure 6 shows the AWS analysis of the winter (December through March) North Atlantic Oscillation (NAO) index reconstruction for the period 1500-1997 published in Luterbacher et al. (2002) (the series is available in the World Data Center for Paleoclimatology at http://www.ncdc.noaa.gov/paleo/wdc-paleo.html). The NAO is the dominant pattern of atmospheric circulation variability over the North Atlantic basin, having large impacts on weather and climate in the North Atlantic region and the surrounding continents. The NAO index, which is defined as the time-averaged difference of sea level pressure (SLP) between Iceland and the Azores, reflects the strength of the westerly flow across the Atlantic basin into Europe (note that alternative definitions of the NAO index also exist; see for example van Loon and Rogers, 1978). Luterbacher et al. (2002) provide a reconstruction of the seasonal and monthly NAO indices based on proxy and early instrumental data. We used winter means for the earlier part of the record, 1500-1668, and derived mean December through March indices from the respective monthly means for the rest of the period.
The estimated $\phi$ is close to zero for the first four hundred years of the reconstruction period. Around the turn of the 20th century, however, the lag-1 autocorrelation coefficient undergoes an excursion from zero towards more positive values. A tendency for the winter NAO to behave close to Gaussian noise during the 19th century has already been reported earlier: Hurrell and van Loon (1997) came to this conclusion when analyzing the spectrum of the instrumental station-based winter-mean NAO SLP index for the shorter period 1864-1996. The weak positive $\phi$ in the 20th century reflects a tendency for the NAO to have a slightly "red" spectrum, which can be attributed to the enhanced variability at the decadal scale in the 20th century (Hurrell and van Loon, 1997). Note that the change in $\phi$ is accompanied by an increase in the estimated variance, which can also be ascribed to the strengthening of decadal variability in the NAO. A simple AR(1) model fitted to the data, despite being adequate for the analyzed time series on average, is not capable of capturing the quasi-oscillatory behavior in the data. Yet it may provide an indication that the character of the dependence in the analyzed series has changed.

Figure 7 reveals a substantial decrease in $\phi$ between the two intervals of non-rejected local stationarity with respect to the AR(1) model fitted to the reconstructed Niño-3 index (Sect. 7.4). The change most likely occurred around 1800 and is accompanied by a rise in the estimated value of $\sigma^2$. We suggest that this provides an indication of a change in the correlation structure of the analyzed series. During the first half of the considered period the reconstructed Niño-3 index exhibits a negative trend, which levels out in the early 19th century. We note that the presence of this long-term trend component is not consistent with the simple model fitted to the data and partly accounts for the elevated values of $\phi$ before 1800.
A tendency towards decreased ENSO variability before 1850 has been argued for in previous proxy-based ENSO studies (see for example Stahle and Cleaveland, 1993). Mann et al. (2000), in turn, came to a similar conclusion by applying evolutive spectral analysis to the same Niño-3 index reconstruction. Their analysis revealed an enhanced interannual variability after the mid 19th century, which also agrees well with our estimates.

GRIP oxygen isotope series
In the last example we apply the AWS technique to the GRIP (Greenland Ice Core Project) ice core δ18O series (Johnsen, 1999), which serves, to a large extent, as an indicator of past temperature changes at the core site. The analyzed part of the record covers some 60 000 years and is originally unevenly sampled in time. We resampled the time series using 100-year bins and calculated the normalized anomalies. The time increment of 100 years for the procedure is chosen so that at least one data point falls within the resampling interval in the oldest part of the series under consideration. In the upper part of the core the data density is higher, varying between 25 data points per bin in the 20th century and 5 at the termination of the last glacial period around 12 000 years BP.
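The binning step can be sketched as follows; the bin placement and normalization convention are our assumptions, since the text only specifies the bin width:

```python
import numpy as np

def bin_resample(t, x, dt=100.0):
    """Average an unevenly sampled record (t, x) into regular bins of width dt
    and return normalized anomalies. Empty bins, which the chosen dt is meant
    to avoid, are silently dropped here."""
    t = np.asarray(t, float)
    x = np.asarray(x, float)
    edges = np.arange(t.min(), t.max() + dt, dt)
    idx = np.digitize(t, edges) - 1          # bin index of every sample
    means = np.array([x[idx == i].mean()
                      for i in range(len(edges) - 1) if np.any(idx == i)])
    # normalized anomalies: zero mean, unit variance
    return (means - means.mean()) / means.std()
```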
Figure 8 suggests an increase in the serial autocorrelation coefficient of GRIP δ18O with the onset of the Holocene, accompanied by a decrease in the estimated variance. This result represents a combined effect of actual climate variability and a bias introduced into the series by data interpolation. On the one hand, the Holocene is characterized by a more stable climate compared with the last glacial period, which was punctuated by a series of abrupt warmings, the so-called Dansgaard-Oeschger oscillations (Dansgaard et al., 1993). On the other hand, the resampling is known to alter the autocovariance structure of a series by introducing additional dependence into the data (Schulz and Stattegger, 1997). The latter effect will naturally be more pronounced in the uppermost part of the ice core series, where the sampling density in the time domain is higher and each point in the resampled series is an average over a number of original observations. As a reference, we estimated the autocorrelation using the RedFit package (Schulz and Mudelsee, 2002), which fits the AR(1) model directly to an unevenly spaced time series. The separate analysis of the Holocene and glacial parts of the original GRIP δ18O series yields φ values of 0.01 and 0.87, respectively, which are below the corresponding AWS estimates.

Conclusions
We have presented a method which employs the idea of structural adaptation for fitting an AR(1) model to a time series. The approach utilizes the assumption of a locally stationary AR process. This assumption is used to simultaneously generate local weighting schemes, i.e. a local model, and to estimate the parameters of the AR(1) model as functions of time by weighted maximum likelihood. The proposed procedure leads, for a large maximum bandwidth $h_{max}$, to a local constant approximation of these parameter functions. This seems appropriate if periods of local stationarity are separated by shorter periods of rapid change. The approach implicitly provides a test for global stationarity, i.e. it leads to the global AR(1) model if stationarity cannot be rejected.
An implementation of the AWS method is available from the authors as a package (acoraws) for the R-Project for Statistical Computing (R Development Core Team, 2005).


Fig. 1. A family of curves showing the 95% quantiles of the distribution of 1000 AICC-based goodness-of-fit values (see text for details) for AWS estimates of the autocorrelation structure in synthetic series. The series of length n are constructed by concatenating equal-length segments with σ²=1 and AR(1) autocorrelation coefficients differing in φ (shown at the right) between the segments.

Fig. 2. Same as Fig. 1 but for the goodness-of-fit measure based on the RMSE of the estimated φ (see text for details).


Fig. 3. Double width of the 95% confidence interval (CI 95%) for the global maximum likelihood estimate of φ for different values of the time-series length n and the autocorrelation coefficient φ.

Fig. 4. Synthetic time series with piecewise-constant φ and σ² (top left); AWS estimates of the autocorrelation coefficient (solid lines, top right) and variance (solid lines, bottom left); sum of weights, N(t) (bottom right). Dotted lines show the true values of φ and σ² used when modeling the time series. Dashed lines outline the confidence limits for the estimates of φ and σ². The p-value of the Shapiro-Wilk test for normality of the residuals is shown above the bottom left panel.

Reconstructed boreal winter Niño-3 index

The next example shows the application of AWS to the reconstructed boreal winter Niño-3 index (Mann et al., 2000) covering the period 1650-1980 (data from http://www.ncdc.noaa.gov/paleo/wdc-paleo.html). The Niño-3 index is based on eastern tropical Pacific sea surface temperatures and serves as one of the indicators of global El Niño/Southern Oscillation (ENSO) variability. Positive and negative values of the Niño-3 index indicate El Niño (warm) and La Niña (cold) episodes, respectively.

Fig. 6. Reconstructed winter NAO index (top left); AWS estimates of the autocorrelation coefficient (top right) and variance (bottom left); sum of weights, N(t) (bottom right). Dashed lines outline the confidence limits for the estimates of φ and σ².