Comparison of correlation analysis techniques for irregularly sampled time series
- 1Potsdam-Institute for Climate Impact Research, P.O. Box 60 12 03, 14412 Potsdam, Germany
- 2Department of Physics, Humboldt-University of Berlin, Newtonstr. 15, 12489 Berlin, Germany
- 3Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB243UE, UK
Abstract. Geoscientific measurements often provide time series with irregular time sampling, requiring either data reconstruction (interpolation) or sophisticated methods to handle irregular sampling. We compare the linear interpolation technique and different approaches for analyzing the correlation functions and persistence of irregularly sampled time series, as Lomb-Scargle Fourier transformation and kernel-based methods. In a thorough benchmark test we investigate the performance of these techniques.
All methods have comparable root mean square errors (RMSEs) for low skewness of the inter-observation time distribution. For high skewness, very irregular data, interpolation bias and RMSE increase strongly. We find a 40 % lower RMSE for the lag-1 autocorrelation function (ACF) for the Gaussian kernel method vs. the linear interpolation scheme,in the analysis of highly irregular time series. For the cross correlation function (CCF) the RMSE is then lower by 60 %. The application of the Lomb-Scargle technique gave results comparable to the kernel methods for the univariate, but poorer results in the bivariate case. Especially the high-frequency components of the signal, where classical methods show a strong bias in ACF and CCF magnitude, are preserved when using the kernel methods.
We illustrate the performances of interpolation vs. Gaussian kernel method by applying both to paleo-data from four locations, reflecting late Holocene Asian monsoon variability as derived from speleothem δ18O measurements. Cross correlation results are similar for both methods, which we attribute to the long time scales of the common variability. The persistence time (memory) is strongly overestimated when using the standard, interpolation-based, approach. Hence, the Gaussian kernel is a reliable and more robust estimator with significant advantages compared to other techniques and suitable for large scale application to paleo-data.