Application of Levy Processes in Modelling (Geodetic) Time Series With Mixed Spectra

Recently, various models have been developed, including the fractional Brownian motion (fBm), to analyse the stochastic properties of geodetic time series, together with the extraction of geophysical signals. The noise spectrum of these time series is generally modeled as a mixed spectrum, with a sum of white and coloured noise. Here, we are interested in modelling the residual time series, after deterministically subtracting geophysical signals from the observations. This residual time series is then assumed to be a sum of three random variables (r.v.), with the last r.v. belonging to the family of Levy processes. This stochastic term models the remaining residual signals and other correlated processes. Via simulations and real time series, we identify three classes of Levy processes: Gaussian, fractional and stable. In the first case, residuals are predominantly constituted of short-memory processes. Fractional Levy process can be an alternative model to the fBm in the presence of long-term correlations and self-similarity property. Stable process is characterized by a large variance, which can be satisfied in the case of heavy-tailed distributions. The application to geodetic time series imply potential anxiety in the functional model selection where missing geophysical information can generate such residual time series.


Introduction
Among the geodetic data, time series of daily position of Global Navigation Satellite System (GNSS) receiver have been of particular interest for the study of geophysical phenomenon at regional and global scales (e.g., study of the crustal deformation due to large Earthquakes, sea-level rise - (Bock and Melgare , 2016;Herring et al. , 2016;He et al. , 2017). However, these time series contain white noise and long-memory processes (i.e. coloured noise). The scientific community agrees with the existence of a trade-off in estimating both the stochastic and functional models (He et al. , 2017). More precisely, the choice of the stochastic model directly influences the estimation of the geophysical signals included in the functional model (i.e., tectonic rate, seasonal variations, slow-slip events - (Bock and Melgare , 2016;He et al. , 2017)). To name a few, it includes the First Order Gauss-Markov (FOGM) model, the white noise with power-law noise (Williams , 2003;Williams et al. , 2004), the Generalized Gauss Markov noise model (GGM), or the Band-pass noise (Langbein , 2008;Langbein and Svarc , 2019). The optimal selection of the stochastic model in GNSS time series analysis remains a hot topic in the scientific community (Bock and Melgare , 2016;Herring et al. , 2016;He et al. , 2017He et al. , , 2019. It is widely accepted in the geodesy community (Montillet and Bos , 2019) that most GNSS time series contain flicker noise which is non-stationary. In addition, recent studies (Langbein and Svarc , 2019;He et al. , 2019) have also advocated the introduction of a random-walk to model small jumps and residual transient signals which is also a non-stationary stochastic processes. Thus, several studies, e.g., Montillet and Yu (2015), proposed the use of the fBm, first developed by Mandelbrodt et al. (1968), in order to model longmemory processes. Botai et al. (2011) and Montillet and Yu (2015) focused on modelling (residual) geodetic time series using the family of Levy α-stable distributions (Nolan , 2009). The application of this family of distribution was supported by the ability to model long-memory processes and the existence of impulsive signals/noise bursts in the data sets suggesting deviations from Gaussian distribution (Botai et al. , 2011).
This work discusses several statistical assumptions (i.e. stationary properties, presence of long-term correlations, Gaussianity of the increments) on the underlying processes in the GNSS time series, justifying the application of the fractional Brownian motion (fBm) and the family of Levy α-stable distributions introduced in Montillet and Yu (2015). A significant difference between Gaussian and Levy stable distributions is that the latter have heavy tails and their variance is infinite. This means that much larger jumps or flights are possible for Levy stable distributions, which causes their variance to diverge. Many natural processes follow Levy stable distributions. Therefore this work aims at understanding when the Levy processes can be applied to model geodetic time series.
The next section starts with the definition of the residual geodetic time series, the fBm and the relationship with the Fractional Autoregressive Integrated Moving Average (FARIMA) model. From financial analysis, we introduce the family of Levy processes (Panas , 2001) and the assumptions in order to relate to other models (i.e. FARIMA, fBm). Section 3 presents the assumptions on the use of the Levy processes in the model of the residual time series. To do so, we model the residual geodetic time series as a sum of three random variables (r.v.), with the hypothesis that the third one is a Levy process. It involves some justifications compared with established models in the scientific community developed in Section 3.1. In Section 3.2, we develop a N steps method based on the variations of the stochastic and functional models when varying the time series' length. Section 3.3 is an application to simulated and real time series. Finally, Section 3.4 discusses the limits of modelling geodetic time series with Levy processes. GNSS time series are generally regarded as a sum of geophysical signals (i.e. seasonal signal, tectonic rate) and stochastic processes (e.g., white noise, coloured noise) (Williams et al. , 2004;Davis et al. , 2012). Modelling the stochastic processes within the geodetic time series is crucial in order to estimate the geophysical signal parameters with reliable uncertainties ((Montillet and Bos , 2019) Chapter 1 and 2, (He et al. , 2017)).
Here, the residual time series are defined as the remaining time series after subtracting deterministically modeled tectonic rate and seasonal components (i.e. the functional model), from the GNSS observations. The functional model of those signals is based on the polynomial trigonometric method (Li et al. , 2000;Williams , 2003;Tregoning and Watson , 2009) with s 0 (t) the sum of the tectonic rate (with coefficient a and b in Eq. (1)) and the seasonal signal (sum of cos and sin functions in Eq. (1)) at the epoch t. Note that d j is equal to 2πj/N , and N can be equal up to 7 (He et al. , 2017). If x(t) is the residual time series after subtracting the GNSS time series (s(t)) with the functional model (s 0 (t)) of the geophysical processes (e.g., seasonal signal, tectonic rate), it is generally formulated the hypothesis that the residual time series is a sum of a residual signal and a noise. Following (Williams , 2003;He et al. , 2017;Montillet and Bos , 2019), the stochastic noise model is described with the variance: where the vector n = [n(t 1 ), n(t 2 ), ..., n(t L )] is a multivariate noise with t i the time at the i-th epoch. Note that n(t i ) = n 0 (t i )+n 1 (t i ), with n 0 (t i ) and n 1 (t i ) the white noise and the coloured noise sample respectively at the i-th epoch. † is the transpose operator, I the identity matrix, J is the variance-covariance matrix of the coloured noise. Finally, σ 2 n0 and σ 2 n1 are the variance of the white noise and coloured noise respectively. Therefore, this type of time series belongs to the family of mixed spectra, where the mixed spectrum results from the sum of the spectra corresponding to the two kinds of noise (Li , 2013). Note that the length of the time series L is much larger than the number of frequencies N defining s 0 (t).
In the modelling of GNSS time series, a strong assumption is the so-called Gauss-Markov hypothesis ( (Montillet and Bos , 2019) Chapter 2) which states that the noise is Gaussian distributed and wide sense stationary (WSS). Therefore, we assume the white noise to be zero-mean and Gaussian, whereas the coloured noise with a mean equal to µ C (t), slowly varying with time and satisfying the WSS hypothesis (Kasdin , 1995;Haykin , 2002). The distribution of the coloured noise is one of the key objective of this study, making various assumptions on the type of processes generating this noise.
Finally, the residual signal is considered to be the remaining geophysical signals (i.e. seasonal component and tectonic rate) not completely estimated due to the mismodelling of the stochastic properties of the time series and other small amplitude (i.e. sub millimeter) short time duration transient signals (i.e. local signals, subsidence, ... ) Herring et al. , 2016;He et al. , 2017).

Relationship between the Power-law Noise, fBm and FARIMA
The error spectrum of the GNSS time series is best characterised by a stochastic process following a power-law with index β. A power-law noise model means that the frequency spectrum is not flat but is governed by long-range dependencies. If the probability density function of the noise is Gaussian or has a different density function with a finite value of variance, its fractal properties can be described by the Hurst parameter (H). Montillet et al. (2013) has proposed to use the fractional Brownian motion (fBm) model in order to model the statistical properties of the residual time series. The essential features of this process are its self-similar behavior -meaning that magnified and rescaled versions of the process appear statistically identical to the original -together with its nonstationarity, implying a never-ending growth of variance with time (Mandelbrodt et al. , 1968). It is worth mentioning that a damped version of the fBm exists and known as the Matérn process, defined having a sloped spectrum that matches fBm at high frequencies and taking on a constant value in the vicinity of zero frequency (Lilly et al. , 2017).
Following the definition of the fBm from Mandelbrodt et al. (1968), if H < 0.5, the process behaves as a Gaussian variable (anti-persistent); if H > 0.5 the process exhibits long-range dependence (persistent); while the case of H equal to 0.5 corresponds to a pure Brownian motion (white noise). Previous studies (Mandelbrodt et al. , 1968;Montillet et al. , 2013) showed that H is directly connected with β by the relation: With this definition, flicker noise corresponds to β equal to 1 or H equal to 1, random walk to β equal to 2 or H equal to 1.5, and white noise to β equal to 0 (H equal to 0.5). Thus, the random walk and the flicker noise are classified as long-term dependency phenomena (Montillet et al. , 2013). Based on the Hurst exponent, one can favor similar approaches as in financial analysis to deal with modeling stochastic processes. Long-memory processes are modeled with a specific class of ARIMA models called FARIMA by allowing for non-integer differentiating. A comprehensive literature on the application of FARIMA can be found in financial analysis (Granger and Joyeux , 1980;Panas , 2001). This model can generate longmemory processes based on the value of the different values of the fractional index d (Granger and Joyeux , 1980). When d equal to 0 it is an ARMA process exhibiting short memory; when −0.5 ≤ d < 0 the FARIMA process is said to exhibit intermediate memory or anti-persistence. This is very similar to the description of the Hurst parameter in the fBm. There is a relationship between d and H such as H = d + 0.5, well-known in financial time series analysis in the presence of aggregation processes (Panas , 2001).

α Stable Random Variable and the Levy α-Stable Distributions
In financial analysis, several models are used, including the fBm and the fractional Levy distribution Panas (2001); Wooldridge (2010). The fractional Levy distribution models the Levy processes and in particular the general family of α stable Levy processes which can be self similar and stationary. Let us recall the definition of a stable random variable.
Definition (Nolan , 2009), chap. 1, definition, 1.6 A random variable X is stable if and only if X d = aZ + b, where 0 < α ≤ 2, −1 ≤ k ≤ 1, a = 0, b ∈ R and Z is a random variable with characteristic function φ(u) = E{exp (iuZ)} = ∞ −∞ exp (iuz)F (z) dz. F (z) is the distribution function of Z. E{.} is the expectation operator. The characteristic function is: Where sign is the signum function, α is the characteristic exponent which measures the thickness of the tails of these distributions (the smaller the values of α, the thicker the tails of distribution are), k ∈ [−1, 1] is the symmetry parameter which set the skewness of the distribution. In general, no closedform expression exists for these distributions, except for the Gaussian (α equal to 2), Pearson (α equal to 0.5, k equal to −1) and Cauchy (α equal to 1, k equal to 0) distributions. Note that the distribution is called a symmetric α-stable if k = 0 (Nolan , 2009;Wang et al. , 2008;Montillet and Yu , 2015). Various methods exist to estimate the parameters (Koutrouvelis , 1980;Nolan , 2009).
In the remainder of this paper, we use the maximum-likelihood method of Nikias and Shao (1995). Now, if a stochastic process is self-similar, then one can model it with the fBm (see (Cont and Tankov , 2004), Definition 7.1). Following (Weron et al. , 2005), the most commonly used extension of the fBm to the α-stable case is the fractional Levy stable motion (fLsm). This process is defined by the integral representation (see appendices). The fLsm is H-self-similar and has stationary increments, with H the Hurst parameter described before. Note that this definition of the Fractional Levy process is different from Benassi et al. (2004) which is not a self-similar process. In the remainder, we use the fLsm definition from ( (Weron et al. , 2005), Eq. (6)-recall in the appendices).
Moreover, the relationship between the fLsm and the fBm is obtained from their definition when α = 2 (see appendices). If H = 1/α, we obtain the Levy α-stable motion which is an extension of the Brownian motion to the α-stable case. The Gaussian case (Brownian motion) is then obtained with α = 2 (see Weron et al. (2005) for a comprehensive definition of the fLsm). Further definitions such as the fractional stable noise can be established with the fLsm, but there are out of the scope of this work.
Finally, the family of Levy α-stable distributions is of a particular interest in this work as the α index is equal to the inverse of the Hurst parameter, therefore in the particular case of the fLsm. Panas (2001) stated that for 1/α < H, positive increments tend to be followed by positive increments and long-range dependence (persistence); whereas for 0 < H < 1/α positive increments tend to be followed by negative increments (anti-persistence). As a consequence, this family of distributions should be suited when modeling the residual time series with a large amplitude coloured noise with long-memory processes. With the previous definition of the FARIMA and the relationship to H, one can assume that the FARIMA model is then favoured over the ARMA process in the case of large coloured noise within the time series. If the white noise is predominant (or H = 1/2), the time series should be fitted with a Gaussian distribution following our assumptions in Section 2.1, and the ARMA model is favoured over the FARIMA.

Levy Processes Applied to Geodetic Time Series Analysis
This section models the residual GNSS time series as a sum of three r.v. together with the statistical assumptions. We then develop a N -steps method to verify our assumptions on simulated and real time series.

Assumptions on the Residual Time Series and the Three Types of Levy Processes
The residual time series is here modeled as a sum of three random variables (r.v.). Namely, it is the sum of a white noise, a coloured noise and a third r.v. It is a similar approach used in previous works looking at the presence of a random-walk component in the stochastic model (Langbein , 2008;Davis et al. , 2012;Langbein and Svarc , 2019;He et al. , 2019). The stochastic properties of the third r.v. should tell us how well is the choice of our initial models (i.e. functional and stochastic). To recall the definition of the Levy processes in Section 2.3, we postulate that the third r.v. belongs to the Levy processes. We then list the type of Levy processes (Wooldridge , 2010;Cont and Tankov , 2004) depending on the assumptions on the underlying stochastic process: 1-(Levy Gaussian) The Levy process is a Gaussian Levy process if the r.v.
follows the properties of a pure Brownian motion also called a Wiener process (identity variance-covariance matrix, zero-mean, stationary process - (Haykin , 2002;Wooldridge , 2010)). That is the special case of the fLsm and fBm with H = 1/2. The residual time series is assumed to contain mostly short-term correlations modeled with an ARMA process. The residual time series should be modeled with a Gaussian distribution. 2-(Fractional Levy) The residual time series exhibits self-similarity with possibly long-term correlations. The Fractional Levy process is described by the model of the fLsm for the specific case reduced to the fBm (see previous section). The long-term correlation process is mostly due to the presence of coloured noise (He et al. , 2017). As explained in Montillet and Yu (2015), the ratio of the amplitude of the coloured over white noise determines which stochastic model of the residual time series should be the most suitable between the FARIMA and ARMA processes. The residual time series should be modeled with a Gaussian distribution following the Gauss-Markov assumption. 3-(Stable Levy) The Levy process is a Levy α-stable motion. That is to generalize important misfit between the selected (stochastic and functional) model (s 0 (t)) and the observations. If small jumps (or Markov jumps), outliers or other unknown processes are presents, it results in a distribution of the residual time series potentially (severely) skewed, not symmetric, with possibly heavy tails, hence modeling with a Levy α-stable distribution.
With the relationship between the Levy α-stable motion, the fBm and the FARIMA, we assume that the stochastic properties of the residual time series should be described with the FARIMA, especially in the presence of high amplitude coloured noise.
The assumption of modelling jumps as Markov jumps in the residual GNSS time series may not be intuitive, because the general model is a Heaviside step function (Herring et al. , 2016;He et al. , 2017). Those jumps result from equipment changes (i.e. antenna, radom) to the receiver, sudden events (bumps to the antenna), geophysical nature (co-seismic offsets) and variations in the environment of the receiver occasioning multipath (e.g., growing trees, buildings) (Montillet and Bos , 2019). In financial time series, the jumps are often resulting from the randomness of the stock prices and modeled as randomwalk. In addition, the presence of temporal aggregation processes can affect the persistence in the time series, and sometimes changing suddenly the mean depending on the amplitude of the processes (Working , 1960). That is why in order to assume a Levy α-stable motion as the underlying stochastic model in geodetic time series, we restrict our assumption to small undetectable offsets, modelling them potentially as random-walk. For a complete discussion about this topic, we invite readers to refer to Gazeaux et al. (2013) and He et al. (2017).

The N Steps Process
Let us describe the functional model and the stochastic noise model described in Equation (1) and (2) as a functional interpretation called F(θ 1 ) and G(θ 2 ). The functional model described in Equation (1) is then equal in functional form as whereas the stochastic noise model described using the variance-covariance matrix in Equation (2) is equal to G(θ 2 ). We define θ 1 = [a, b, (c j , d j ) j={1,N } ] and θ 2 = [a wh , b cl , β], the vector parameters for the functional and stochastic noise model respectively. For simplification, we have not included in the functional model the estimation of possible offsets in the time series (see Appendix B for the model). Also, a wh and b cl are the amplitude of the white and coloured noise respectively. The stochastic noise model is here based on the sum of a white and power-law noise (P L + W N ).
Here, our method is based on varying the length of the time series resulting in the variations of the stochastic and functional models, which they allow classifying the type of Levy process. The variations of the length of the time series should take into account that the coloured noise is a non-stationary signal, and thus the properties (i.e. b cl , β) vary non-linearly. However, varying the length of the time series over several years is not realistic taking into account that real time series can record undetectable transient signals, undocumented offsets and other non-deterministic signals unlikely to be modeled precisely . That is why we restrain the variations of the time series length to 1 year.
Let us call the geodetic time series s = [s(t 1 ), ..., s(t L )] and s = [s(t 1 ), ..., s(t L+N )] at the first and N -th variations respectively. The method can be described as: where. corresponds to the estimated vector or observations.
[.] j means the j-th iteration of the estimated quantity. ∆ 1 s and ∆ N s are the residual time series after the first and N -th variation of the length of the time series. res 1 and res N are the unmodeled signals and stochastic processes after the first and N -th step respectively.
To recall the assumptions in Section 3.1, the residual time series ∆ N s is modeled as a sum of three r.v. corresponding to the white noise, coloured noise and a Levy process. Using N iterations and the definition of the various Levy processes in the previous section (i.e., Levy Gaussian, Fractional Levy and Stable Levy) in the previous section, we make several assumptions on the estimated parameters and selected stochastic models in order to characterize this third r.v. Table 1 summarises the assumptions for these three cases. We use specific mathematical symbols to differentiate between them. means the equality in terms of distribution. , ∼ and = are related to the variations of the estimated parameters of the stochastic model associated with the first and the N -th iteration. This variation is calculated using the sum of the difference in absolute value between the parameters between the first and the N -th iteration. Then, a percentage is deduced based on the initial value of the parameters (at first iteration). Now specifically, the symbol means that there are little differences (less than 3%) between the estimated parameters of the stochastic model associated with the first and the N -th iteration. The symbol ∼ means that we allow bigger differences up to 20% . With much larger values, we use the symbol =.
Moreover, the estimation of the model parameters is carried out using the Hector software . We have restrained our processing to the stochastic model corresponding to the flicker noise (with white noise -F N + W N ) and power-law (with white noise P L + W N ). The optimal choice of the stochastic model is a current research topic in GNSS time series analysis including recent studies such as He et al. (2017), He et al. (2019) and Montillet and Bos (2019). To simplify our study, we have preliminarily applied the method based on the Akaike information criterion developed in He et al. (2019) on the real time series to select the stochastic noise model. Therefore we have selected real time series with stochastic models F N + W N and P L + W N . We are not going to develop further this selection process in this study, but readers can refer to He et al. (2019).
Furthermore, the fitting of the ARMA(p,q) and FARIMA(p,d,q) model to the residual time series is carried out by maximum likelihood following Sowell (1991), varying the lags p and q within the interval [0,5]. Note that the fractional parameter d is an output of the software Hector  when fitting the stochastic model during the N iterations. Also, the ARMA/FARIMA model which best fits the residual time series, is selected in order to minimize the Bayesian Information Criterion (BIC) following Montillet and Yu (2015). Finally, one can wonder if the anxiety in the model selection (ARMA, FARIMA) in presence of heavy-tails can modify the performance of the BIC. This topic is currently debated in the statistical community (see Panahi (2016)). Large tails should be detected in the fitting of the Levy α-stable distribution via the maximum-likelihood method of Nikias and Shao (1995). Due to the direct relationship between the index α and H, we assume that the FARIMA should be chosen defacto over the ARMA model. Table 1 Assumptions on the functional model and the stochastic parameters estimated via N iterations (see,N -Step method) to characterize the type of Levy processes within the geodetic time series. The symbols and notations are explained in Section 3.2

Type of Process
Levy Gaussian Fractional Levy Stable Levy Model To Characterize ARMA(p,q) ARMA(p,q) or FARIMA(p,d,q) Processes FARIMA(p,d,q)

Simulated Time Series
The definition of the Levy processes together with the assumptions in Table 1 are applied to the residual of simulated geodetic time series. The simulations of the geodetic time series follow Williams et al. (2004) and the routines associated with Hector . The estimations of the ARMA and FARIMA models follow Section 3.2. We simulate 10 years long time series fixing a wh to 1.6 mm, a varying between [1 − 3] mm/yr, b equal 0, and (c 1 , d 1 ) equal to (0.4, 0.2) mm/yr. According to Table 1, we vary the amplitude of coloured noise b cl following three scenarios: (A) from low value (i.e. b cl < 0.1 mm/yr β/4 ); (B) intermediate (i.e. 1mm/yr β/4 > b cl > 0.1 mm/yr β/4 ); and (C) high value (i.e. 1mm/yr β/4 < b cl < 4mm/yr β/4 ). In the case of the large amplitude of the coloured noise, the process is unlikely zero-mean stationary. Also, β is equal to 1 (flicker noise) or 1.5 (power-law noise) in the simulations. year (see X-axis). Each figure corresponds to the different coloured noise amplitude following the three scenarios described above. We show both the variations of the stochastic and functional models. On the Y-axis, these variations are basically the statistics (mean and standard deviation) over the percentage estimated between the parameters of either the stochastic or functional model between the first and N -th iteration. For each run, this percentage is calculated using the sum of the differences in absolute value of the various parameters described in Section 3.2.
In Hector, we use the P L + W N model . The first result which is common to all three figures, is that the variations in the functional model starts earlier than for the stochastic model. Previous studies have shown that there is some part of the noise amplitude absorbed in the functional model (Williams , 2003;. In our scenario, the estimation of the linear trend may fit partially into the power-law noise, hence reducing the variations of the stochastic model. This effect can be amplified with higher spectral indexes. Now, Figure 1 shows that over 1 year the variations of the stochastic and functional model are less than 4% (mean value) for small amplitude coloured noise, whereas when increasing the coloured noise the variations increase quickly (e.g., more than 20% for the large coloured noise amplitude for the functional model (c)) . Knowing that Hector assumes only stationary signals , it means that part of the large variations of the coloured noise is wrongly included in the estimation of the functional model.  Now, Table 2 shows the standard deviation of the difference (M ean Square Error) between the ARMA /FARIMA model and the residuals (i.e. res i in Equation (5)). We do not display any mean, because the fit of the models are done on the zero-mean residuals. Note that the value is averaged over the 50 simulations, together with the variations of the length of the time series following the same processing as before. The table also displays the averaged correlation between the distribution of the residuals and the Normal or Levy α-stable distribution. In agreement with the theory, we can see that the ARMA model fits well residuals with small amplitude coloured noise, whereas with the increase of b cl the FARIMA model fits better than the ARMA model. Looking at Table 3 in terms of correlation, the Levy α-stable distribution fits as good as the Normal distribution as long as the distribution of the residuals is Gaussian without large tails or asymmetry. In Section 2.3, we emphasized that the family of Levy α-stable distributions includes the Normal distribution with specific values of its driving parameters (see Equation 4). Thus, the results show that for the amplitude of coloured noise, not very large (i.e. Intermediate -case B -in Table 2 and 3) compared with the white noise, the two distributions show similar results. However, the scenario with large coloured noise amplitude (C), which can generate some aggregation processes thus introducing an asymmetry or large tails in the distribution of the residuals, emphasizes that the family of Levy α-stable distributions perform the best in modelling the residuals' distribution. Note that the asymmetry in the residuals' distribution is relatively limited. Much Larger coloured noise amplitude could produce greater asymmetry in the distribution as seen in financial time series with aggregation processes of high amplitude (Wooldridge , 2010). Finally, those three scenarios support ideally the theory where in the case of small amplitude coloured noise, the stochastic noise properties are dominated by the Gaussian noise, hence supporting a third r.v. defined as a Gaussian Levy. However, the increase of the coloured noise amplitude shows that it is much more difficult to discriminate between the fractional Levy and the stable Levy. The results point out that the third r.v. can be modeled as a stable Levy process when mostly the FARIMA fits the residuals due to large amplitude long-memory processes, hence creating a heavy-tail distribution. This result is restrictive for the application to geodetic time series.

Real Time Series
We process the daily position time series of three GNSS stations namely DRAO, ASCO and ALBH retrieved from the UNAVCO website (UNAVCO , 2009). The functional model includes the tectonic rate, the first and second harmonic of the seasonal signal, and the occurrence time of the offsets. This occurrence time is obtained from the log file of each station. However, ALBH is known to record slow-slip events from the Cascadia subduction zone (Melbourne et al. , 2005). Thus, we include the offsets provided by the Pacific Northwest Geodetic Array (Miller et al. , 1998). In this scenario we do not know which stochastic model could fit the best the observations. Thus, we use two models: the P L + W N together with the F N + W N .
Similar to the previous section, Figure 2 displays the percentage of variations of the stochastic and functional models averaged over the East and North coordinates of each station. Note that the average over the three coordinates is displayed in the appendices (see Figure 3). Because the Up coordinate contains much more noise than the East and North coordinates (Williams et al. Looking at Figure 2, the first result is that for all the stations, there is a strong dependence with the selected noise model. When selecting the powerlaw noise over the flicker noise model, there is an additional variable to estimate (i.e. the power-law noise exponent β in Equation (3) ) within the stochastic noise model. Even though our results show a relationship between modelling the residuals and the choice of the stochastic model, our current work does not deal with this issue. Readers interested in this topic can refer to He et al. (2017He et al. ( , 2019.
The second result is the large variations of the functional model compared with the stochastic model. As explained in the simulations, the functional model partially absorbs the variations of the noise, i.e. the tectonic rate partially fits into the power-law noise. In addition, to some extend at ASCO, the sudden increase in the functional model variations at 0.5 year may be explained due to the absorption of some of the noise with the second harmonic of the seasonal signal.
When comparing the variations of the stochastic and functional models with amplitude below 20% for the stations DRAO and ASCO, the results agree with the definition of the fractional Levy process defined in Table 1 as third r.v. modelling the residuals of the East and North components. The variations of the functional model associated with ALBH are much larger than the other two stations, especially for the P L + W N model with variations up to 50%. Those large variations can be explained due to the slow slip events and the difficulty to model the post-seismic relaxations between two consecutive events. In He et al. (2019), the authors justified the selection in the stochastic noise model of a random-walk component together with a F N + W N in order to model the mismatch between the functional model and the observations. Now Table 4 displays the statistics on the error when fitting the ARMA and FARIMA models to the residuals estimated with the PL+WN stochastic noise model. Note that Table 5 displays in the Appendices the results when using the F N + W N stochastic noise model. The FARIMA and ARMA models perform closely for the whole three stations. The large value for the Up coordinate is due to the amplitude of the noise much larger for this coordinate than for the East and North components (Montillet et al. , 2013). In terms of correlating the distribution of the residuals with the Normal and the Levy α-stable distribution, the correlation value is relatively the same for all stations which indicates that the distribution of the residuals are Gaussian with the absence of large tails. Those results further support the selection of the fractional Levy process as the third r.v. However, the study of real time series also underlines the difficulty to characterize statistically this third r.v.

Discussion on the Limits of Modeling with Levy Processes
As discussed in the previous sections, the stable Levy process is characterized by a very large (or infinite) variance. In Montillet and Yu (2015), it was assumed that the infinite variance of the residual time series comes from large tails of the distribution (also called heavy tails - (Wooldridge , 2010)), generated by a large amplitude of coloured noise, outliers and other remaining geophysical signals. The same study implied that the values of the noise variance should be bounded, excluding extreme values. This is an important assumption to decide whether or not (symmetric) α-stable distributions can be used to model any geodetic time series. Here, we are investigating how the variance due to residual tectonic rate or seasonal signal evolves with the length of the residual time series (i.e. L epochs).
To recall Section 2.1 and the assumption on the noise properties, let us estimate the mean and variance of the residual time series. Here, we call the residual time series after the first iteration s 1 = [s 1 (t 1 ), ..., s 1 (t L )] = ∆ 1 s as defined in the previous section. The mean < s 1 (L) > and variance σ 2 (L) are computed over L epochs (i.e. considering the L-th epoch defined as t L = Ldt, with the sampling time dt equal 1 for simplification and without taking into account any missing epoch in order to have a continuous time series). Based on Papoulis and Unnikrishna Pillai (2002), one can estimate the mean over L epochs < s 1 (L) > in the cases of remaining linear trend, such as: where a r and b r are the amplitude and the intersect of the residual trend (i.e. remaining tectonic rate). Note that the subscript r designates residual in the remaining section. is the approximation for L >> 1. For a time series with L epochs, the variance σ 2 (L) is: Note that Cross is the cross term between a r t i and the noise term n(t i ). Now, if we assume that the remaining seasonal signal S r (t) is a pseudo periodic function at frequencies similar to the seasonal signal in Equation (1), hence taking the form S r (t) = N j=1 c r,j cos (d j t) + e r,j sin (d j t). Thus, we can do the same estimation as above in the case of a remaining pseudo periodic component in the residual time series, such as: where δ is the average of the remaining seasonal signal. It is assumed to be independent of L and bounded such as a periodic function. The variance is equal to: with Cross is the cross term between S r (t) and n(t). In the Eq. (6) to (9), the deterministic signals and the noise are assumed completely uncorrelated, which is valid only with white Gaussian noise (i.e. Wiener process) in signal processing (Papoulis and Unnikrishna Pillai , 2002). As previously discussed in Section 2.1, coloured noise can generate long-memory processes, hence producing non-zero covariance with residual signals. Due to the varying amplitude of the coloured noise in geodetic time series with mixed spectra, the uncorrelated assumption is currently debated within the community (Herring et al. , 2016;He et al. , 2017). Therefore, recent works have introduced a random component together with a deterministic signal: nonlinear rate Dmitrieva et al. , 2017), non-deterministic seasonal signal (Davis et al. , 2012;Chen et al. , 2015;Klos et al. , 2018). Thus, strictly speaking, σ 2 should be seen as an upper bound. The closed-form solution of the variance σ 2 (L) shows that the variance is unbounded in the case of a residual linear trend. To recall the discussion in Section 3.1, if this residual trend originates from various sources not welldescribed in the functional and stochastic model (i.e. undetected jumps, small amplitude random-walk component) of the geodetic time series, the amplitude of this trend should be rather small (a < 1 mm/yr) considering the length of GNSS time series available until now (L < 30 years). Unless this nonlinear residual trend has a large amplitude, a correction of the functional model must be done a posteriori due to possible anxiety between the models and the observations. The same remarks can be applied to the variance of the remaining seasonal signal where a large amplitude would imply a misfit with the functional model. Thus, we expect rather small amplitude of the coefficients c r,j and e r,j ∼ 0.1 to ∼ 0.001 mm. Also, in the Appendix B, we have develop a similar formula to take into account undetected offsets, where we show that the variance is also bounded. In this case, a large value would mean that one or several large offsets have not been included in the functional model.

Conclusions
We have investigated the statistical assumptions behind using the fBm and the family of α-stable distributions in order to model the stochastic processes within the residual GNSS time series. We model the residual time series as a sum of three r.v. The first two are defined from the stochastic model and assumptions on the noise properties of the geodetic time series. The third r.v. is assumed to belong to the Levy processes. We then distinguish three cases. In the case of a residual time series containing only short-term processes, the r.v. is a Gaussian Levy process. In the presence of long-term correlations and exhibiting self-similarity property, fractional Levy processes can be seen as an alternative model of using the fBm. Due to the linear relationship between the Hurst parameter and the fractional parameter of the FARIMA, it is likely that the FARIMA can fit the residual time series under specific conditions (i.e. amplitude of the coloured noise). The third case is the stable Levy process, with the presence of long-term correlation processes, high amplitude aggregation processes or random-walk.
In order to check our model, we have simulated mixed spectra time series with various levels of coloured noise. We have then developed a N steps methodology based on varying the length of the time series (limited to 1 year) to study the variations of the stochastic and functional models and to model the distribution of the residuals. The results emphasize the difficulty to separate the fractional Levy process and the stable Levy process mainly due to the absorption of the variations of stochastic processes by the functional model, unless the distribution of the residuals exhibits heavy-tails. Another difficulty is the dependence of the results with the stochastic noise model. The use of real GNSS time series supports the results based on simulated ones.
However, the discussion on the limits of modeling the stochastic properties of the residuals with the stable Levy process underlines that the infinite variance property can only be satisfied in the case of heavy-tailed distributions. This condition is generally satisfied if there is a large amplitude random-walk (e.g., temporal aggregation in financial time series) or an important misfit between the models (i.e. functional and stochastic) and the observations, which means that there is anxiety in the choice of the functional model (e.g., unmodeled large jumps, large outliers). Finally, with longer and longer time series, one may be able to statistically characterize more precisely the third r.v.

Estimation of the Variance in the Presence of Offsets
We model here the offsets in the time series as Heaviside step functions according to He et al. (2017). Following Section 3.4, the residual time series in presence of remaining offsets can be written such as Where H is the Heaviside step function. One can estimate the mean over L epochs: The variance is equal to In the presence of small (undetectable) offsets ( g k < 1 mm), we can further assume that < s 1 (L) >∼ µ C and σ 2 (L) ∼ σ 2 n (L) − µ 2 C . For multiple large uncorrected offsets (i.e. noticeable above the noise floor), the variance can be large, but the distribution of the residual time series should look like various Gaussian distributions overlapping each other corresponding to the segments of the time series defined by those noticeable offsets. This case is not taken into account in our assumptions summarized in Table 1, because it supposes that there is a large anxiety about the chosen functional model -obviously missing some large offsets.