Hydrological forecasting must rely on finite hydrological time series. Most time series analysis approaches presume that a data series is ergodic without justifying this assumption. This paper presents a practical approach for analyzing the mean ergodic property of hydrological processes by combining a stationarity test (autocorrelation function evaluation or the Augmented Dickey Fuller test), a radial basis function neural network, and the definition of mean ergodicity. The approach is applied to the precipitation processes at the Lanzhou Rain Gauge Station in the Yellow River basin and the Ankang Rain Gauge Station in the Han River basin, both in China, and at Newberry, MI, USA. The results indicate that the precipitation of March, July, and August in Lanzhou, and of May, June, and August in Ankang, has mean ergodicity, whereas the precipitation of every other calendar month at these two rain gauge stations does not. The precipitation of February, May, July, and December in Newberry shows the ergodic property, although the precipitation of each month shows a clear increasing or decreasing trend.

Introduction

A hydrological process can usually be regarded as a stochastic process, and any observation is just a realization of a random variable representing that process. A realization of a stochastic process is defined as the outcome of an experiment in which the process is observed (Shahin et al., 1993). For example, a time series of observed precipitation data at a gauge station is a realization of the precipitation process over the area that the gauge station covers. The collection of all possible realizations of a stochastic process, i.e. the ensemble, is used to represent the process.

Given that the variations of a hydrological variable representing a certain hydrological process are usually very complicated and affected by random factors, the statistical properties of the process, such as the phase mean function of a data series $\{\xi(t);\ t=1,2,\ldots,N\}$, $m(t)=E\{\xi(t)\}$, and the correlation function $R(s,t)=\mathrm{Cov}\{\xi(s),\xi(t)\}$, must be known in order to describe and thus analyze this stochastic process. In reality, however, only finite realizations, i.e. a finite set of records $\{\xi(t);\ t=1,2,\ldots,N\}$ of the random series $\xi_t$, and most of the time only one single realization, are available from observation. It would be very helpful if the statistical properties of a process, such as its mean and its standard deviation or variance, could be estimated from the observation of a single realization. In fact, it is common in practice for hydrologists to use the statistical properties of a single realization of a hydrologic process as the statistical properties of the process itself. For example, the mean value and the probability distribution of a certain number of years of observation data at a flow rate gauge station are often used for further hydrological analysis, e.g. flood frequency analysis. By doing this, we actually assume that the statistical properties of the flow rate at this station can be estimated from the finite number of observations and that their values are identical, or more strictly, close enough, to the corresponding statistical properties of the flow process. Is it possible to estimate the statistical properties of the population process using finite observations? How reliable is such an estimate? These questions have been framed in terms of the asymptotic convergence of the average over time (sample average) to the phase mean (population mean). This property is called the ergodic property or ergodicity (Chick et al., 1996).
Ergodicity is the property by which each realization of a given process is a complete and independent representative of all possible realizations of the process (Shahin et al., 1993). Thus, the ergodic property allows scientists and researchers to determine the statistical properties of a process from a single realization. In this sense, the current common practice actually assumes that hydrological processes have ergodicity. This raises two questions: do hydrological processes really have ergodicity, as assumed in practice, and how can one justify that a process has ergodicity? The desirability of answering these questions has been recognized for a long time, but rigorous and practical approaches have not yet been available.

To date, only limited discussions about the application of time series ergodicity (Domowitz and El-Gamal, 2001; Morvai and Weiss, 2005) have been reported. Most studies of time series applications, such as those in the fields of hydrology, hydrodynamics, and noise (Jiang and Zheng, 2005; Oliveira et al., 2006; Veneziano and Tabaei, 2004), discuss statistical characteristics simply by assuming that the time series have ergodicity, without justifying this assumption with a rigorous approach. There have been a few discussions concerning ergodicity in the field of hydrological research. Liu (1998) assumed that ergodicity exists between the spatial distribution and the temporal propagation of hydrological factors of a water exchange system, i.e. these processes are restricted by ergodicity. Xia (2005) used power-weighted Markov chains to predict “plum rain” intensity (an East Asian rainy season usually lasting from June to July) and concluded that this process has ergodicity. In general, the ergodicity of time series refers to the ergodicity of stationary processes, which means that the process averaged over time behaves identically to the process averaged over space.

To date, no dedicated studies on ergodic property analysis for hydrological processes have been performed. However, the study of ergodicity itself is not only significant but also indispensable because it is a fundamental presumption for many time series problems (Ding and Deng, 1988; Fiori and Janković, 2005; Hsu, 2003; Liu, 1998; Mitosek, 2000; Wang et al., 2004). This study proposes a practical approach for mean ergodic property analysis using the autocorrelation function (ACF) or the Augmented Dickey Fuller (ADF) test and a radial basis function (RBF) neural network. The terms ergodic, ergodic property, and ergodicity are used mainly in mathematical physics, e.g. dynamics, and in the theory of stationary stochastic processes. This study focuses on ergodicity analysis for stationary stochastic processes, which are commonly applied in hydrology.

Definition of ergodicity

A process is said to be ergodic if its statistical properties (such as its mean and variance) can be deduced from a single, sufficiently long sample (realization) of the process. A stochastic process shows ergodicity when its mean and covariance functions are ergodic, i.e. when it has both mean ergodicity and covariance ergodicity. Since the ergodicity of the covariance function, which usually relates to the fourth-order moments of the process, is difficult to verify, only the mean ergodic property of the process (or sequence) is discussed in this paper. For a given stochastic sequence $\{\xi(t);\ t=1,2,\ldots\}$ and its sample mean sequence $\{M_T;\ T=1,2,\ldots\}$, $M_T=\frac{1}{T}\sum_{t=1}^{T}\xi(t)$, if the variance series of the sample mean sequence satisfies $\lim_{T\to\infty}D(M_T)=0$, the sample mean series is ergodic and the process $\{\xi(t)\}$ is said to have the mean ergodic property, where $D(M_T)=\frac{1}{T}\sum_{t=1}^{T}\left(M_t-\bar{M}_T\right)^2$ and $\bar{M}_T$ is the mean of the sample mean sequence $M_T$.
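The two series in this definition can be computed directly from a single realization. The following is a minimal sketch, assuming that the mean inside $D(M_T)$ is taken over $M_1,\ldots,M_T$; the function names and the synthetic gamma-distributed data are illustrative assumptions and are not the observations used in this paper.

```python
import numpy as np

def sample_mean_sequence(xi):
    """Running sample means M_T = (1/T) * sum_{t=1..T} xi(t), for T = 1..N."""
    xi = np.asarray(xi, dtype=float)
    return np.cumsum(xi) / np.arange(1, xi.size + 1)

def variance_series(m):
    """D(M_T): population variance of M_1..M_T, for T = 1..N."""
    m = np.asarray(m, dtype=float)
    # np.var uses ddof=0 by default, i.e. it divides by T as in the definition.
    return np.array([np.var(m[:T]) for T in range(1, m.size + 1)])

# Illustrative synthetic data (an assumption, not observed precipitation).
rng = np.random.default_rng(0)
xi = rng.gamma(shape=2.0, scale=20.0, size=2000)

m_t = sample_mean_sequence(xi)
d_mt = variance_series(m_t)
print(d_mt[[9, 99, 999, 1999]])  # D(M_T) at T = 10, 100, 1000, 2000
```

For a finite record the limit itself is never observed, so only the behaviour of the computed values as $T$ grows can be inspected.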

It has been proved that only stationary processes can have ergodicity (Davis et al., 1994). Stationarity implies that the statistical parameters of the series computed from different samples do not change except for sampling variations. A time series is said to be strictly stationary if its statistical properties do not vary with changes of the time origin. A less strict type of stationarity, called weak stationarity or second-order stationarity, requires only that the first- and second-order moments depend on time differences alone (Chen and Rao, 2002). In nature, strictly stationary time series do not exist, and weakly stationary time series are treated as stationary in practice. In addition to stationarity, another necessary condition for ergodicity analysis is that the samples from the single realization be taken over a sufficiently long period of time.

A practical approach to ergodic property analysis

Currently there are no statistical tests designed specifically for ergodic property analysis; we therefore perform the mean ergodicity analysis based on its definition and demonstrate a practical approach with a series of case studies using monthly precipitation data series collected from two rain gauge stations in China and one in the US. Whereas the definition of mean ergodicity is simple and straightforward, the practical analysis of mean ergodicity can be complicated. As discussed in the previous section, a stochastic process is not ergodic unless it is stationary, so a stationarity test of the data series representing the process is a prerequisite for further ergodicity analysis. Another challenge lies in the fact that the infinitely long data series required by the definition of mean ergodicity cannot be obtained in reality. We propose to address this challenge by predicting the $D(M_T)$ series with an artificial intelligence approach, for example a radial basis function (RBF) neural network, assuming that the predicted $D(M_T)$ series represents the characteristics of the population data series.

Stationarity test

Given that the commonly used statistical inference is no longer valid for a non-stationary data series, it is necessary to examine the stationarity of a data series. The standard method for testing stationarity is a unit root test, i.e. a time series is non-stationary if it has a unit root. The stationarity of a stochastic process is determined by the roots of its characteristic equation. If all the characteristic roots are located outside the unit circle, the process is stationary, whereas the process is non-stationary if one or more roots lie on or within the unit circle. A characteristic root with a value of one is called a unit root. The Dickey Fuller (DF) test and the Augmented Dickey Fuller (ADF) test are two commonly used unit root tests. The ADF test, which is an extension of the DF test, eliminates the autocorrelation of the residuals by adding lagged differences of the time series variable.
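The root condition can be checked numerically when the autoregressive coefficients are known. The sketch below assumes an AR(p) process written as $x_t=\phi_1x_{t-1}+\cdots+\phi_px_{t-p}+\mu_t$, whose characteristic equation is $1-\phi_1z-\cdots-\phi_pz^p=0$; the function names and the coefficient values are illustrative assumptions.

```python
import numpy as np

def ar_characteristic_roots(phi):
    """Roots z of 1 - phi_1*z - ... - phi_p*z**p = 0 for an AR(p) process."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    return np.roots(coeffs)

def is_stationary_ar(phi, tol=1e-8):
    """True if every characteristic root lies strictly outside the unit circle."""
    roots = ar_characteristic_roots(phi)
    return bool(np.all(np.abs(roots) > 1.0 + tol))

print(is_stationary_ar([0.5]))       # AR(1) with coefficient 0.5, root at z = 2
print(is_stationary_ar([1.0]))       # random walk, unit root at z = 1
print(is_stationary_ar([0.5, 0.3]))  # AR(2) example
```

In practice the coefficients are unknown and must be estimated from data, which is why unit root tests such as DF and ADF are used instead of a direct root calculation.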

Without loss of generality, the DF test can be illustrated with a simple AR(1) process. Consider a stochastic process $x_t=\rho x_{t-1}+\mu_t$ (Eq. 2), where $\mu_t$ is white noise. If $|\rho|<1$, the data series is stationary; if $|\rho|=1$, the series has a unit root and is non-stationary; and if $|\rho|>1$, the series is explosive and also non-stationary. In practice, Eq. (2) can be rewritten as $\Delta x_t=\gamma x_{t-1}+\mu_t$ (Eq. 3), where $\Delta$ is the first difference operator and $\gamma=\rho-1$. The DF test examines the null hypothesis $H_0:\gamma=0$ against the alternative $H_1:\gamma<0$ (Eq. 4). For a time series in which the random disturbance may be autocorrelated at higher-order lags, e.g. an AR(p) process, the ADF test examines the null hypothesis described by Eq. (4) for the AR(p) process as follows: $\Delta x_t=\gamma x_{t-1}+\xi_1\Delta x_{t-1}+\xi_2\Delta x_{t-2}+\cdots+\xi_{p-1}\Delta x_{t-p+1}+\mu_t$. Besides the DF or ADF test, the stationarity of a data series can also be determined by evaluating its autocorrelation. For a series of variables $\{\xi(t);\ t=1,2,\ldots,n\}$, the autocorrelation coefficient function (ACF) is defined as $r_k=\frac{\sum_{t=1}^{n-k}\left(\xi_t-\bar{\xi}\right)\left(\xi_{t+k}-\bar{\xi}\right)}{\sum_{t=1}^{n}\left(\xi_t-\bar{\xi}\right)^2}=\frac{\mathrm{Cov}\left(\xi_t,\xi_{t+k}\right)}{\mathrm{Var}\left(\xi_t\right)}$, where $\bar{\xi}$ is the mean. If the ACF rapidly approaches 0 (i.e. falls into the stochastic domain), the time series is stationary; otherwise it is non-stationary (Cline and Pu, 1998, 1999).
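Both checks can be sketched with NumPy alone. The code below is an illustrative assumption rather than the implementation used in this study: `acf` evaluates the sample autocorrelation coefficient defined above, and `df_statistic` computes the t-statistic of $\gamma$ in the simple DF regression without a constant and without the lagged differences that the ADF test adds. For this form of the test, the 5% critical value is roughly −1.95, so values well below it reject the unit root hypothesis. Library routines such as `adfuller` in `statsmodels.tsa.stattools` provide the full ADF test with tabulated critical values.

```python
import numpy as np

def acf(xi, max_lag):
    """Sample autocorrelation coefficients r_k for k = 0..max_lag."""
    xi = np.asarray(xi, dtype=float)
    dev = xi - xi.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: xi.size - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])

def df_statistic(x):
    """t-statistic of gamma in  delta x_t = gamma * x_{t-1} + mu_t  (no constant)."""
    x = np.asarray(x, dtype=float)
    lagged, delta = x[:-1], np.diff(x)
    gamma = np.sum(lagged * delta) / np.sum(lagged ** 2)   # OLS estimate of gamma
    resid = delta - gamma * lagged
    sigma2 = np.sum(resid ** 2) / (delta.size - 1)         # residual variance
    return gamma / np.sqrt(sigma2 / np.sum(lagged ** 2))

# Illustrative synthetic series (assumptions, not observed precipitation).
rng = np.random.default_rng(1)
noise = rng.normal(size=500)   # stationary white noise
walk = np.cumsum(noise)        # random walk, i.e. rho = 1

print(acf(noise, 5).round(2))  # drops to near zero after lag 0
print(acf(walk, 5).round(2))   # decays very slowly
print(df_statistic(noise), df_statistic(walk))
```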

RBF neural network

Since the $D(M_T)$ series is nonlinear, we adopt the RBF neural network approach (see Nørgaard, 2000, for details) to extend the data series and predict the trend of $D(M_T)$. The RBF neural network is a well-performing feedforward neural network model. It is computationally simple, has good extrapolation capacity, and provides strong nonlinear mapping capability (Alp and Cigizoglu, 2007). The computation required by an RBF neural network is relatively small; even with only a few hidden cells, a good approximation can be obtained as long as the centers are properly selected. Moreover, because data with ergodic properties are unlikely to oscillate dramatically (Zhou et al., 2001), the RBF neural network is a good choice for predicting the future changes of ergodic properties.

The original data series $\{\xi(t)\}$ is first normalized by $\xi'=\xi/\max(\xi)$, forming a new data series $\{\xi'(t)\}$. A 3-layer RBF network is then constructed following Nørgaard (2000), with $n_1$ neurons in the input layer and $m$ neurons in the output layer. In this RBF neural network model, the first $n_1$ values of $\{\xi'(t)\}$ from the observation series are used to predict the $(n_1+1)$th value; this process is then continued by sliding the input window forward, producing a longer series of $\xi'$ values.
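The recursive extension described above can be sketched as follows. This is not the network of Nørgaard (2000); it is a minimal stand-in under stated assumptions: Gaussian basis functions, centers taken as evenly spaced input windows from the record, a single shared width set to the mean distance between centers, one output neuron, and output weights fitted by linear least squares. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def make_windows(series, n1):
    """Inputs: n1 consecutive values; target: the value that follows them."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n1] for i in range(series.size - n1)])
    return X, series[n1:]

def rbf_design(X, centers, width):
    """Gaussian hidden-layer activations plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(series, n1, n_centers):
    """Fit the output weights; requires n_centers >= 2 and distinct centers."""
    X, y = make_windows(series, n1)
    idx = np.linspace(0, len(X) - 1, n_centers).astype(int)
    centers = X[idx]
    diffs = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    width = dist[np.triu_indices(n_centers, k=1)].mean()   # width heuristic
    weights, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return centers, width, weights

def extend_series(series, model, n1, n_ahead):
    """Predict one step ahead repeatedly, feeding predictions back as inputs."""
    centers, width, weights = model
    out = [float(v) for v in series]
    for _ in range(n_ahead):
        window = np.array(out[-n1:])[None, :]
        out.append(float((rbf_design(window, centers, width) @ weights)[0]))
    return np.array(out)

# Illustrative synthetic record (an assumption, not observed precipitation).
rng = np.random.default_rng(2)
xi = rng.gamma(shape=2.0, scale=20.0, size=120)
xi_norm = xi / xi.max()                      # normalization xi' = xi / max(xi)
model = fit_rbf(xi_norm, n1=5, n_centers=10)
xi_ext = extend_series(xi_norm, model, n1=5, n_ahead=200)
```

Because each prediction is fed back as an input, errors can accumulate over long horizons, so the extended series is best read as an indication of trend rather than as a forecast of individual values.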

A practical approach for ergodicity analysis

The following procedure is proposed for a practical analysis of the ergodicity of a data series: (i) test the stationarity of the data series by evaluating its autocorrelation function or applying the ADF test, since a data series has no ergodicity unless it is stationary; (ii) calculate the sample mean series $\{M_T;\ T=1,2,\ldots\}$, $M_T=\frac{1}{T}\sum_{t=1}^{T}\xi(t)$, and the variance series $D(M_T)$ of the mean value $M_T$; (iii) simulate $D(M_T)$ using an approach such as a radial basis function (RBF) neural network and predict the trend of $D(M_T)$ as $T$ approaches $\infty$; and (iv) determine whether the original series $\{\xi(t)\}$ has mean ergodicity according to $\lim_{T\to\infty}D(M_T)$. It should be noted that the proposed approach examines whether the assumption of mean ergodicity is consistent with the time series; it does not mathematically prove the ergodicity of the time series.
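Step (iv) requires a judgment about whether the extended $D(M_T)$ series approaches zero. In this paper that judgment is made from the plotted series; the sketch below shows one hypothetical way to make it reproducible. The criterion (a decreasing tail whose mean is small relative to the peak of the series) and both thresholds are assumptions introduced for illustration only.

```python
import numpy as np

def tends_to_zero(d_series, tail_fraction=0.25, ratio_tol=0.05):
    """Heuristic check that the tail of D(M_T) is decreasing and near zero.

    tail_fraction and ratio_tol are hypothetical tuning parameters.
    """
    d = np.asarray(d_series, dtype=float)
    n_tail = max(2, int(d.size * tail_fraction))
    tail = d[-n_tail:]
    slope = np.polyfit(np.arange(n_tail), tail, 1)[0]   # linear trend of the tail
    peak = d.max()
    if peak == 0.0:
        return True                                      # D(M_T) is identically zero
    return bool(slope < 0.0 and tail.mean() / peak < ratio_tol)

T = np.arange(1, 1001)
print(tends_to_zero(1.0 / T))          # decays toward zero
print(tends_to_zero(0.1 + 1.0 / T))    # levels off at a positive value
print(tends_to_zero(np.linspace(0.0, 1.0, 1000)))  # keeps increasing
```

A rule of this kind can only flag series that are consistent with convergence over the available horizon, in line with the caveat stated at the end of the procedure.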

Ergodicity analysis of precipitation process

To demonstrate the proposed approach, ergodicity analysis is performed for the monthly precipitation data series of three sites: Lanzhou in Gansu Province and Ankang in Shaanxi Province, China, and Newberry, Michigan, USA.

Ergodicity analysis of precipitation series of Newberry, USA

A mean ergodicity analysis is performed for each individual monthly precipitation data series of the 121-year precipitation record (1893–2013) collected at the NOAA station at Newberry Correctional Facility, MI, USA (46.35 N, 85.5 W, Elev. 240 m). The climatic region of Newberry is typified by large seasonal temperature differences, with warm to hot (and often humid) summers and cold (sometimes severely cold) winters. According to the Köppen Climate Classification system, it has a humid continental climate. Newberry receives an average of 820.3 mm of precipitation per year. Its driest month is February, with an average of 40.1 mm of precipitation, and its wettest month is August, with an average of 94 mm.

The statistics of each individual monthly precipitation series of Newberry are given in Table 1. The ACF plots, shown in Fig. 1, and the ADF test indicate that all 12 individual monthly precipitation data series at Newberry are stationary. The $M_T$ and $D(M_T)$ for each monthly precipitation data series are calculated and plotted in Fig. 2. Clearly, $T\to\infty$ cannot be achieved because the number of samples is limited, and the trend of $D(M_T)$ therefore cannot be determined directly from a data series that is not long enough. The monthly precipitation data series of each data group is first normalized by $\xi'=\xi/\max(\xi)$ and then extended using the RBF neural network. The mean series of the new data series, $M_T=\frac{1}{T}\sum_{t=1}^{T}\xi'(t)$, and its variance series $D(M_T)$ are then calculated. The $D(M_T)$ for each extended monthly data series is plotted in Fig. 3, which shows that only the $D(M_T)$ of the monthly precipitation data series of February, May, July, and December has a clear trend approaching zero; the precipitation of these four months is therefore mean ergodic.

Ergodicity analysis of precipitation series of Lanzhou, China

Fifty years (1951–2000) of monthly precipitation data are collected from the Lanzhou Rain Gauge Station (103.70 E, 35.90 N) in the Yellow River basin of China. The statistics of each individual monthly series are given in Table 1. The autocorrelation of each monthly precipitation data series indicates that all the data series of the Lanzhou rain gauge station are stationary. The $M_T$ and $D(M_T)$ of the data series for the Lanzhou rain gauge station are then calculated. The $D(M_T)$ of the data series extended by the RBF neural network shows that only the $D(M_T)$ of the precipitation of March, July, and August approaches 0. Therefore, only these three monthly precipitation series have mean ergodicity.

Ergodicity analysis of precipitation series of Ankang, China

Similarly, the ergodicity analysis is performed for the monthly precipitation data series of each calendar month for the Ankang rain gauge station. Seventy years (1929–1998) of precipitation data from the Ankang Rain Gauge Station (109.03 E, 32.72 N) in the Han River basin of China are collected for the ergodic property analysis. The statistics of each individual monthly series are given in Table 1. The stationarity analysis, performed by evaluating the autocorrelation of each monthly data series, shows that the precipitation data series of every month is stationary. The mean ergodicity analysis is then performed for each monthly data series after extending it with the RBF neural network. The results indicate that the monthly precipitation of May, June, and August is ergodic.

Results and discussions

The coefficient of variation (CV) has been widely used to measure the dispersion of a data series. The more concentrated the distribution of a random variable is, the more obvious its regularity, and vice versa. The coefficients of variation for each monthly precipitation data series of the Lanzhou rain gauge station, the Ankang rain gauge station, and Newberry are calculated and shown in Table 1. As all the monthly data series in our study are stationary, we synthesize a set of stationary and non-stationary data series by clustering the monthly data series of the Ankang station into four classes, as shown in Table 2, in order to investigate stationarity, non-stationarity, and ergodicity simultaneously. The stationarity test and mean ergodicity analysis are then performed on the clustered data using the proposed methodology. The stationarity analysis, performed by evaluating the autocorrelation of each group of data series, shows that the (Ankang: January, February, March, November, December) series is non-stationary, whereas the other three groups of data series are stationary. The coefficients of variation of the clustered Ankang data series that are stationary are smaller than that of the non-stationary clustered series. This indicates that monthly precipitation with a small coefficient of variation has a greater tendency to be stationary, or more regular.
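For reference, the coefficient of variation is the ratio of the standard deviation to the mean. The sketch below assumes the sample standard deviation (divisor $n-1$); whether Table 1 uses the sample or the population form is not stated, and the numbers in the example are made up for illustration.

```python
import numpy as np

def coefficient_of_variation(xi):
    """CV = sample standard deviation / mean (ddof=1 is an assumption)."""
    xi = np.asarray(xi, dtype=float)
    return xi.std(ddof=1) / xi.mean()

# Hypothetical monthly totals in mm, not taken from Table 1.
example = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coefficient_of_variation(example))
```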

Although a small coefficient of variation gives a data series a greater tendency to be stationary, it does not give a clear indication of the ergodicity of the data series. The coefficient of variation of a precipitation time series with ergodicity is not necessarily smaller than that of a series without ergodicity. Among the monthly precipitation data series of Newberry, the September series has the smallest coefficient of variation but does not have the ergodic property. Moreover, although the coefficients of variation of Newberry (September), Lanzhou (September), and Ankang (April) are smaller than those of Newberry (July), Lanzhou (August), and Ankang (June), respectively, the latter series have ergodicity while the former do not. Furthermore, an ergodic process is stationary, while the converse may not be true. The stationarity of a data series is a prerequisite for its ergodicity rather than a guarantee of it; there are stationary processes which are not ergodic. In other words, strictly speaking, a process with ergodicity is necessarily stationary. Therefore, neither the coefficient of variation nor the stationarity test, which is commonly performed in time series analysis, can replace the ergodicity analysis in making sure that the statistics of a realization, such as the mean, can be safely used as those of its population.

A comparison between the ergodicity of the May and June precipitation of Ankang analyzed from the individual monthly data series and that analyzed from the clustered data series indicates that the ergodic property can change when new data are introduced. The analysis of the May precipitation in Ankang shows that it has ergodicity, whereas the analysis using the clustered data series Ankang (May, June, September) indicates that the clustered series does not have the mean ergodic property. A possible explanation is that the Ankang (May) monthly precipitation may become non-ergodic when new data, for example Ankang (June, September), are introduced. These data can be regarded as new observations of Ankang (May), and the difference between Ankang (May) and Ankang (June, September) could be due to changes in the natural or man-made factors that affect precipitation processes.

A linear trend analysis is also performed following Vamos and Craciun (2012) for some of the ergodic monthly precipitation data series of the three rain gauge stations, as shown in Fig. 4. The August precipitation shows no obvious periodicity at either the Lanzhou or the Ankang station. The August precipitation at the Lanzhou rain gauge station shows an overall decreasing trend, while the August precipitation at the Ankang rain gauge station remains almost stable around its mean value. In Newberry, only the precipitation of May shows a relatively stable trend; the precipitation of February has had a clear decreasing trend since the 1970s, and the precipitation of both July and December has an increasing trend. However, according to the ergodicity analysis, the precipitation of the months having ergodicity at the three rain gauge stations will, in the long run, fluctuate around its mean value rather than keep varying as shown in Fig. 4.
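A linear trend of this kind can be estimated with an ordinary least-squares line. The sketch below is a simple stand-in and does not reproduce the procedure of Vamos and Craciun (2012); the function name and the synthetic series are assumptions used only to show the calculation.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope (per year) and intercept (value at the first year)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    t = years - years[0]                     # center on the first year for conditioning
    slope, intercept = np.polyfit(t, values, 1)
    return slope, intercept

# Synthetic series with a known decline of 0.2 mm per year (illustrative only).
years = np.arange(1951, 2001)
values = 50.0 - 0.2 * (years - 1951)
slope, intercept = linear_trend(years, values)
print(slope, intercept)
```

A negative slope indicates a decreasing trend and a positive slope an increasing one; a slope near zero corresponds to a series that remains stable around its mean.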

In this study, the 50 years of monthly precipitation data from the Lanzhou rain gauge station and the 70 years of precipitation data from the Ankang rain gauge station are insufficient for direct analysis of ergodicity, and even the 121 years of monthly precipitation data from Newberry, MI, USA, are insufficient. The RBF neural network is therefore used to extend the data series for the practical analysis of ergodicity. The observation data for many other hydrological processes are likewise not expected to be long enough for direct ergodicity analysis; we therefore suggest using a reliable tool, such as the RBF neural network used in this study, to extend the data series and performing the ergodicity analysis on the combined observed and predicted data series. Until a more rigorous method becomes available, extending the data series with a “black-box” type of method might be controversial, but we believe it is still helpful, at least for giving a sense of whether the data series under analysis has the ergodic property.

Conclusions

Ergodic property analysis for hydrological processes is difficult but worthy of discussion. One may argue that whether or not a data series representing a hydrological process is ergodic does not actually affect the practical analysis of this process, and that the test of ergodicity can therefore be neglected completely. Some researchers (Duan and Goldys, 2001; Koutsoyiannis, 2005; Liu, 1998), however, have pointed out that hydrological processes may have ergodic properties, although no specific ergodicity analysis was performed in these works. This study presents a practical approach to analyzing the mean ergodicity of hydrological processes, which bridges the concept of ergodicity and its application in hydrological process analysis. The approach consists of three main parts: the stationarity test of the data series through its ACF or the ADF test, which avoids the difficulty of analyzing stationarity directly from its definition; the extension of the data series, via the RBF network in this study; and the ergodicity analysis based on the sample mean sequence and its variance series. Three case studies, the ergodicity analyses of the monthly precipitation of Lanzhou in the Yellow River basin of China, Ankang in the Han River basin of China, and Newberry, MI, USA, are conducted using the proposed approach.

Our research reveals that the precipitation of March, July, and August in Lanzhou, and of May, June, and August in Ankang, has ergodicity; the stochastic and statistical analysis of the precipitation of these months based on the observations (samples) at these two stations is therefore expected to be more reliable than the analysis of the precipitation of any other calendar month at the two stations. The ergodicity analysis of the precipitation data series of each individual month in Newberry, MI, USA, which has a relatively long observation history, indicates that the precipitation of February, May, July, and December shows the ergodic property, although not every one of these monthly series shows a tendency to converge to its mean value.

This study focuses mainly on mean ergodicity analysis; approaches to the covariance ergodicity analysis of hydrological processes need to be developed in the future and would provide more useful information. In addition, as discussed, the application of ergodicity still seems controversial, although its concept and properties have been applied commonly in hydrology by presuming that hydrological processes are automatically ergodic. More discussion of, and more methodologies for, ergodicity analysis would help bridge the gap between its concept and its application.

Acknowledgements

The paper is supported by the National Science and Technology Support Projects (Grant No. 2006BAB04A08).