the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Statistical hypothesis testing in wavelet analysis: theoretical developments and applications to Indian rainfall
Justin A. Schulte
Statistical hypothesis tests in wavelet analysis are methods that assess the degree to which a wavelet quantity (e.g., power and coherence) exceeds background noise. Commonly, a pointwise approach is adopted in which a wavelet quantity at every point in a wavelet spectrum is individually compared to the critical level of the pointwise test. However, because adjacent wavelet coefficients are correlated and wavelet spectra often contain many wavelet quantities, the pointwise test can produce many false positive results that occur in clusters or patches. To circumvent the pointwise test drawbacks, it is necessary to implement the recently developed areawise, geometric, cumulative areawise, and topological significance tests, which are reviewed and developed in this paper. To improve the computational efficiency of the cumulative areawise test, a simplified version of the testing procedure is created based on the idea that its output is the mean of individual estimates of statistical significance calculated from the geometric test applied at a set of pointwise significance levels. Ideal examples are used to show that the geometric and cumulative areawise tests are unable to differentiate wavelet spectral features arising from singularitylike structures from those associated with periodicities. A cumulative arcwise test is therefore developed to strictly test for periodicities by using normalized arclength, which is defined as the number of points composing a cross section of a patch divided by the wavelet scale in question. A previously proposed topological significance test is formalized using persistent homology profiles (PHPs) measuring the number of patches and holes corresponding to the set of all pointwise significance values. Ideal examples show that the PHPs can be used to distinguish time series containing signal components from those that are purely noise. To demonstrate the practical uses of the existing and newly developed statistical methodologies, a first comprehensive wavelet analysis of Indian rainfall is also provided. An R software package has been written by the author to implement the various testing procedures.
 Article
(14236 KB)  Fulltext XML

Supplement
(1079 KB)  BibTeX
 EndNote
Time series describing the evolution of physical quantities such as streamflow, sea surface temperature (SST), rainfall, and wind speed often contain nonstationary and timescaledependent characteristics. A better understanding of these characteristics is facilitated through the application of various statistical and signal processing methods that account for them. One such method is wavelet analysis, which is a time–frequency analysis method for extracting timelocalized and scaledependent features from time series. This method contrasts with the widely known Fourier analysis that assumes stationarity. The shorttime Fourier transform (STFT) addresses the problem of time localization, but wavelet analysis is still preferred over the STFT because wavelet analysis uses a variable window width that more effectively separates signal components. An additional attractive aspect of wavelet analysis is that it can be used to quantify the relationship or coherence between two time series at an array of timescales in a nonstationary setting (Grinsted et al., 2004). More recently, the frequency domain analogs of partial and multiple correlation (Ng and Cha, 2012; Hu and Si, 2016) have been developed in wavelet analysis, making the method an even more powerful exploratory tool for researchers. Given these desirable aspects of wavelet analysis, it is not surprising that wavelet analysis has been applied to a broad range of topics, including climatology (Gallegati, 2018), hydrology (Schaefli et al., 2007; Labat, 2010; Schulte et al., 2017), forecast model verification (Lane, 2007; Liu et al., 2011), ensemble forecasting (Schulte and Georgas, 2018), climate network analysis (Agarwal et al., 2018; Paluš, 2018; Ramana et al., 2013; Sahay and Srivastava, 2014; Elsanabary and Gan, 2004), and biomedicine (Addison, 2005).
One application of wavelet analysis is the estimation of a sample wavelet spectrum and the subsequent comparison of the sample wavelet spectrum to a background noise spectrum. To make such comparisons, one must implement statistical tests such as the pointwise (Torrence and Compo, 1998), areawise (Maraun et al., 2007), geometric (Schulte et al., 2015), and cumulative areawise (Schulte, 2016a) tests. Torrence and Compo (1998) were the first to place wavelet analysis in a statistical hypothesis testing framework using pointwise significance testing. In the pointwise approach, the statistical significance of wavelet quantities associated with points in a wavelet spectrum is assessed individually without considering the correlation structure among wavelet coefficients. For wavelet power spectra of climate time series, theoretical rednoise spectra are often the noise background spectra against which sample wavelet power spectra are tested. Monte Carlo methods are used to estimate the background noise spectra for wavelet coherence (Grinsted et al., 2004), partial coherence (Ng and Cha, 2012), multiple coherence (Hu and Si, 2016), and autobicoherence (Schulte, 2016b).
Despite its wide use, the pointwise approach has two drawbacks; the first of which is that the test will frequently generate many false positive results because of the simultaneous testing of multiple hypotheses (Maraun et al., 2007; Schulte et al., 2015). The second drawback is that spurious results occur in clusters because wavelet coefficients are correlated. To account for these deficiencies, Maraun et al. (2007) developed an areawise test to reduce the number of spurious results. Additional tests were subsequently developed by Schulte et al. (2015) and Schulte (2016a) to address the deficiencies of the pointwise test and computational inefficiencies of the areawise test. Although these tests were demonstrated as being effective in reducing spurious results, the pointwise testing procedure is still more frequently adopted. Furthermore, there are numerous papers surveying general aspects of wavelet analysis (Meyers et al., 1993; Kumar and FoufoulaGeorgiou, 1997; Torrence and Compo, 1998; Labat, 2005, Lau and Weng, 1995; Addison, 2005; Schaefli et al., 2007; Sang et al., 2012) but no papers surveying the recent developments in statistical hypothesis testing. This observation underscores the need for a paper that summarizes theoretical and practical aspects of statistical hypothesis testing.
A physical application to which wavelet methods have been applied is the understanding of Indian rainfall (Adarsh and Reddy, 2014; Maheswaran and Khosa, 2014). Indian rainfall variability is a complex, nonstationary, and timescaledependent phenomenon, making wavelet analysis a wellsuited tool for studying it. Recognizing the nonstationary behavior of the Indian monsoon phenomena, Torrence and Webster (1999) used wavelet coherence analysis to show that the relationship between the El Niño–Southern Oscillation (ENSO) and Indian rainfall is strong and nonstationary in the 2year to 7year period band. Narasimha and Bhattacharyya (2010) used wavelet crossspectral analysis to link the solar cycle to changes in the Indian monsoon. Other studies have used wavelet analysis to understand the temporal characteristics of the Indian rainfall time series (Nayagam et al., 2009). Fasullo (2004) found biennial oscillations in the allIndia rainfall time series, while Yadava and Ramesh (2007) found significant longterm periodicities in an Indian rainfall proxy time series. Terray et al. (2003) found that a time series describing late summer (September–August) Indian rainfall is associated with significant wavelet power in the 2year to 3year period band and suggested that such power is related to the tropospheric biennial oscillation. Common to all the studies noted above is the use of pointwise significance testing. Recent work highlighting the pitfalls of the pointwise testing approach raises the question as to whether the features identified in previous wavelet studies of Indian rainfall are statistical artifacts or ones distinguishable from the background noise. To address this question, an additional study is needed that applies the new statistical hypothesis tests in wavelet analysis.
In this paper, a first review of the theoretical and practical aspects of recent advances in statistical significance testing of wavelet estimators is presented in Sect. 2. Section 2 also includes discussions about the modifications of existing tests designed to make them more computationally efficient than the original formulations. A cumulative arcwise test is also proposed for testing for the presence of periodicities embedded in noise in a strict sense. Section 3 is devoted to the presentation of a comprehensive wavelet analysis of Indian rainfall time series using recently developed wavelet methods. The paper concludes with a summary and discussion in Sect. 4.
2.1 Basic overview
The continuous wavelet transform of a time series X(t) is given by
where a is wavelet scale, ψ is an analyzing wavelet, and b is time. The sample wavelet power spectrum is W(b,a)^{2} and measures the energy content of a signal at time b and scale a. Thus, the wavelet transform of a time series produces a 2D representation of it. In this paper, the set consisting of all points in the 2D representation will be denoted by H and referred to as the timescale plane. To simplify the discussion of results in this paper, the commonly used Morlet wavelet with angular frequency ω=6 is used throughout. For more details about wavelet analysis, the reader is referred to Torrence and Compo (1998).
Unlike the Fourier analysis where neighboring frequencies are uncorrelated, the wavelet coefficients at neighboring points in H are intrinsically correlated. The intrinsic correlation between wavelet coefficients at (b, a) and (b^{′}, a^{′}) is represented by the reproducing kernel K(b,a, b^{′}, a^{′}) whose mathematical expression is
where c_{ψ} is an admissibility constant. The redundancy between the values W(a,b) and $W\left(a{}^{\prime},b{}^{\prime}\right)$ is expressed as
Equation (3) says that a wavelet coefficient at W(b,a) captures information from neighboring points; the degree to which this occurs depends on the weight $K\left(b,a;b{}^{\prime},a{}^{\prime}\right)$. Even for uncorrelated noise, wavelet coefficients will be correlated in H (Maraun and Kurths, 2004), a theoretical result that has important implications for significance testing.
The normalized reproducing kernel for the Morlet wavelet is shown in Fig. 1, where normalization means that the reproducing kernel is divided by its maximum value so that the maximum of the normalized reproducing kernel is equal to unity and located at the point at which the reproducing kernel is centered. In Fig. 1a, the reproducing kernel is dilated and translated to the scale a=32 and time b=500 and indicates that a wavelet coefficient at (500, 32) will be correlated with other wavelet coefficients surrounding it. The reproducing kernel shown in Fig. 1b is dilated and translated to (500, 128) and seen as being wider in the time direction than the reproducing kernel centered at (500, 32). The widening reflects how the reproducing kernel expands linearly in both the time and scale (in a nonlogarithmic scale) direction.
2.2 Statistical significance tests
2.2.1 Pointwise significance
In pointwise significance testing, one individually compares a wavelet quantity at every point in H to the critical level of the pointwise test, which depends on the chosen pointwise significance level α_{pw} and usually on the wavelet scale a. For wavelet power spectra of climate time series, pointwise test critical values are often determined from a theoretical rednoise background (Torrence and Comp, 1998). For wavelet coherence, partial coherence, multiple coherence, and autobicoherence, Monte Carlo methods need to be implemented to estimate the critical values (Grinsted et al., 2004; Ng and Chan, 2012; Hu and Si, 2016; Schulte, 2016b). However, a parametric bootstrap method may be required for determining the critical values of an arbitrary background model if analytical background models are not readily available (Maraun et al., 2007). Using the pointwise test, one assigns to each point in H a p value, ρ_{pw}, representing the probability of finding the observed or more extreme wavelet quantity (power, coherence, etc.) when the null hypothesis is true. The result of the pointwise test is the subset
of H, representing regions where pointwise significant wavelet quantities have been identified.
To better understand the utility of the pointwise testing procedure, consider two example time series, where the first time series, R(t), corresponds to a realization of a rednoise process with a lag1 autocorrelation coefficient equal to 0.4 (Fig. 2a). The second time series, X(t), shown in Fig. 3a, is given as
where
and N(t) is a realization of a rednoise process with a lag1 autocorrelation coefficient equal to 0.1. The constants σ_{s} and σ_{s} are the standard deviations of S(t) and N(t), respectively. The real number m is a measure of the signaltonoise ratio, with larger values indicating a relatively higher signal.
The outcomes of the pointwise test applied to the (rectified; Liu et al., 2007) wavelet power spectrum of X(t) and R(t) are shown in Figs. 2b and 3b. For R(t), the pointwise test applied at α_{pw}=0.05 identified many statistically significant wavelet power coefficients, all of which being spurious by construction. The spurious results are seen occurring in contiguous regions, and the union of all such regions is P_{pw}. For the wavelet power spectrum of X(t), statistically significant wavelet power at the periods 64, 128, and 256 is seen clustering in narrow bands, reflecting the periods of the individual sinusoids (Fig. 3b). The singularity at t=1000 emerges as a scaleelongated region of pointwise significance, extending from a period of 2 to a period of approximately 32. What about the other features seen in the wavelet power spectrum? All the other significance regions are spurious by construction because they are associated with the noise term N(t), implying that they resulted from stochastic fluctuations. Thus, without further investigation, it would be impossible to know without prior knowledge of X(t) if the features associated with N(t) are part of the signal or not. These examples highlight how the number of spurious results can be large and how they could impede the differentiation between an underlying signal and background noise. It is therefore important to reduce the number of spurious results using other statistical methods.
2.2.2 Areawise and geometric testing
As noted by Maraun et al. (2007), the application of the pointwise test N_{pw} times will on average produce N_{pw}α_{pw} spurious results, with the spurious results occurring in clusters or patches. These socalled pointwise significance patches arise from the reproducing kernel of the analyzing wavelet that represents intrinsic correlations among wavelet coefficients. As noted by Schulte (2016a), patches can be rigorously defined using ideas from topology. That is, a patch is a pathconnected component of P_{pw}, where a pathconnected component is an equivalent class of P_{pw} resulting from an equivalence relation ∼ on P_{pw} that makes points $x,y\phantom{\rule{0.125em}{0ex}}\in {P}_{\mathrm{pw}}$ equivalent (written x∼y) if they can be connected by a continuous path $f:\left[\mathrm{0}\phantom{\rule{0.125em}{0ex}}\mathrm{1}\right]\to {P}_{\mathrm{pw}}$ such that f(0)=x and f(1)=y (Fig. 4). The equivalence relation ∼ reduces the original large set of points in P_{pw} to a smaller set of patches; the implications of which will be described later.
Because patches arise from the reproducing kernel, patches inherit the geometric characteristics of the reproducing kernel, such as convexity, area, length, and width. As an example, consider the set shown in Fig. 1b entirely consisting of points enclosed by the thick black contour. The contoured region is a subset of H for which the normalized reproducing kernel dilated and translated to (500, 128) exceeds 0.1. The set is convex (i.e., contains no concavities) because any two points in it can be connected by a line that remains entirely in the set (Fig. 4). The convexity of the set suggests that typical patches will be convex, which can be confirmed by generating a large ensemble of patches found in the wavelet power spectra of realizations of a rednoise process (Schulte et al., 2015). The convexity of patches plays an important role in understanding how different statistical tests perform. Another important geometric property is area, which reflects the dilated reproducing kernel so that patches at greater scales will be generally larger than those located at smaller scales, as Fig. 1 suggests.
Recognizing that a typical patch area is given by the reproducing kernel, Maraun et al. (2007) developed an areawise test, which is conducted as follows. First choose a critical area P_{crit}(b,a) defined as a subset of H for which the reproducing kernel dilated and translated to (b, a) exceeds a certain critical level K_{crit}. That is,
The set of points whose associated wavelet quantities are also areawise significant is then the set
representing the union of all critical areas that lay completely inside P_{pw}. The larger the critical area, the larger a patch needs to be for it to be deemed areawise significant so that P_{crit}(b,a) is related to α_{aw}, the significance level of the areawise test. As discussed by Maraun et al. (2007), the critical area that corresponds to α_{aw} is determined using a rootfinding algorithm, which is a nontrivial step that can be circumvented by performing a geometric test instead, as discussed below.
While the areawise test effectively addresses the multipletesting pitfall of the pointwise test, the use of the rootfinding algorithm renders the practical implementation of the test difficult. To overcome this drawback, Schulte et al. (2015) constructed a geometric test whose test statistic is normalized area, which is defined as the patch area, A_{patch}, divided by the square of the patch's mean scale coordinate, $\widehat{a}$. That is,
where the division by the mean scale coordinate accounts for how the reproducing kernel results in the scaledependent expansion of patches in the time and scale direction. Thus, the normalized areas of patches can be readily compared regardless of where the patches are in H. The critical value of this test is assessed using Monte Carlo methods as follows: (1) generate wavelet spectra under some null hypothesis (e.g., red noise), (2) create a null distribution of normalized area using the patches found in the wavelet spectra, and (3) estimate the critical level of the test corresponding to the geometric significance level α_{geo} by computing the percentile of the null distribution corresponding to 1α_{geo}. This null distribution calculation can be performed rapidly because wavelet spectra often contain many patches. However, for wavelet coherence, Monte Carlo methods must be applied twice because the critical levels of the pointwise test must be also empirically estimated. Fortunately, the length of the noise realizations needed to generate the null distribution of normalized areas need not be the same length of the input time series because the patch area is unrelated to the time series length; it is related to the reproducing kernel. As such, the realizations can be of shorter length, improving the efficiency of the null distribution computation. The ability to efficiently generate a null distribution allows p values associated with the geometric test to be further adjusted to account for multiple testing. For example, the false discovery rate of the geometric test can be controlled at a desired level if the number of patches to which the test is applied is large (Schulte et al., 2015).
It is important to note that the geometric and areawise tests differ in the way they assign statistical significance to points. For the geometric test, all points in a patch will be deemed insignificant or significant if the patch is deemed statistically insignificant or significant. However, strictly speaking, the areawise test evaluates the statistical significance of wavelet quantities associated with points in P_{pw} based on the geometric properties of the pathconnected subsets of P_{pw} to which they belong. That is, for a wavelet quantity at (b, a) to be areawise significant means that there must be a P⊆P_{pw} containing (b, a) that is pathconnected, sufficiently convex, and large enough to have ${P}_{\mathrm{crit}}\left(b,a\right)\subset P$. If P is a patch, then no subset of P can be areawise significant if P is not, consistent with the patchsorting interpretation. On the other hand, P may contain both areawise and nonareawise significant subsets, opposing the patchsorting interpretation. As discussed earlier, the equivalent relation ∼ reduces the initial large set of points to a set of fewer patches, implying that the number of spurious results arising from the geometric and areawise test is less than that of the pointwise test.
Figures 2 and 3 show the wavelet power spectra of X(t) and R(t) after the application of the areawise and geometric tests at the 0.1 significance level, where the areawise test was performed using an existing R software package (available at https://rdrr.io/github/Dasonk/SOWAS/, last access: 12 January 2018). Both tests are seen reducing the number of spurious results arising from the pointwise significance test applied at the α_{pw}=0.05 level. However, as shown in Fig. 3c, the areawise test also deems features associated with the signal S(t) to be noise. For example, the patch located at a period of 256 extending from t=100 to t=1700 is no longer statistically significant. Note also that the areawise test largely filters out the scaleelongated feature associated with the singularity of X(t) at t=1000, while the geometric test deems the feature to be statistically significant. This difference in performance is expected because patches must be long with respect to the reproducing kernel to be areawise significant, whereas there is no such constraint for the geometric test. Thus, the areawise test is better suited for situations in which only periodicities are sought because periodicities induce temporally long patches. In general, the areawise and geometric tests will reduce the signal detected by the pointwise test because the areawise and geometric tests are theoretically constrained to detect a smaller amount of signal than the pointwise test (Table A1). This theoretical constraint raises a natural question: can a test simultaneously reduce statistical artifacts and increase signal detection (true positive results) relative to the pointwise test? The fact that patches enlarge with increasing α_{pw} suggests that examining the areas of patches across a set of pointwise significance levels could improve signal detection relative to the pointwise test.
2.2.3 Cumulative areawise testing
Maraun et al. (2007) and Schulte et al. (2015) demonstrated that the areawise and geometric tests are procedures for reducing the number of spurious results. However, both tests suffer from the drawback of having to select both an areawise (or geometric) significance level and a pointwise significance level. This dual significance level selection is a cause for concern because a single wavelet quantity can have different levels of areawise or geometric significance depending on the chosen pointwise significance level, leading to uncertainty as to whether the wavelet quantity is statistically artificial or distinguishable from background noise.
To address this concern, Schulte (2016a) suggested that changes in the geometric and topological characteristics of patches should be assessed over a finite set of pointwise significance levels ${\mathit{\alpha}}_{\mathrm{1}},\phantom{\rule{0.125em}{0ex}}{\mathit{\alpha}}_{\mathrm{2}},\mathrm{\dots},{\mathit{\alpha}}_{N}$. If the pointwise significance levels ${\mathit{\alpha}}_{\mathrm{1}},\phantom{\rule{0.125em}{0ex}}{\mathit{\alpha}}_{\mathrm{2}},\mathrm{\dots},{\mathit{\alpha}}_{N}$ are chosen such that
then the sets
will form a filtration
of H for a sufficiently small α_{1} and large α_{N} value. Equation (13) expresses the intuitive idea that one can start with an empty set (no statistically significant wavelet quantities) and build the complete timescale plane by increasing the pointwise significance level to a value that renders all wavelet quantities statistically significant.
An idealized example of a wavelet domain filtration is shown in Fig. 5a. The application of the pointwise test at the α_{1} level results in no statistically significant regions, whereas the application of the test at α_{2} results in a relatively small region of pointwise significance (dark gray). In contrast, setting the pointwise significance level to α_{5} results in nearly all wavelet quantities being statistically significant, as indicated by the light gray region. At α_{6}, all wavelet quantities are statistically significant, and ${P}_{\mathrm{pw}}^{\mathrm{6}}$ is H.
Although filtrations like Eq. (13) have proven to be useful in a data analysis method called persistent homology (Edelsbrunner and Harer, 2008), this method focuses on the sets of all pointwise significant values. At present, the concern is the statistical significance of a wavelet quantity at a point, and thus a more local approach is needed, though the utility of Eq. (13) will become more apparent in Sect. 2.2.5. To localize the persistence approach, a local filtration about x called a geometric pathway is needed (Schulte, 2016a). Mathematically, a geometric pathway G_{x} corresponding to a point x is a nested sequence
where
A concrete example of a geometric pathway G_{x} is shown in Fig. 5b, where the geometric pathway corresponds to the wavelet domain filtration shown in Fig. 5a. In this example, ${P}_{\mathrm{1}}^{x}=\mathit{\varnothing}$ and ${P}_{\mathrm{2}}^{x}=\mathit{\varnothing}$, where ${P}_{\mathrm{2}}^{x}=\mathit{\varnothing}$ because x is not in the dark gray region shown in Fig. 5a representing ${P}_{\mathrm{pw}}^{\mathrm{2}}$. Although the set ${P}_{\mathrm{pw}}^{\mathrm{3}}$ comprises two path components or patches, only the component containing both x and y corresponds to the geometric pathway member at α_{3}. The reason is because any point equivalent to w is not equivalent to x on ${P}_{\mathrm{pw}}^{\mathrm{3}}$, as those points cannot be connected to x by a path fully contained in ${P}_{\mathrm{pw}}^{\mathrm{3}}$. Thus, in this example,
$y\in {P}_{\mathrm{3}}^{x},$ and $w\notin {P}_{\mathrm{3}}^{x}$. A similar situation occurs at α_{4} because ${P}_{\mathrm{pw}}^{\mathrm{4}}$ has three pathconnected components containing v, w, x, y, and z, but x is only equivalent to points that are equivalent to y. At α_{5}, the set ${P}_{\mathrm{pw}}^{\mathrm{5}}$ is pathconnected ($x\sim y\sim z\sim v\sim w$) and is thus the fifth member of G_{x}. The sixth and last member of G_{x} is H.
Using the definition of a geometric pathway, Schulte (2016a) constructed a cumulative areawise test that assesses the statistical significance of wavelet quantities by associating, to each point x in H, the total normalized area integrated over its geometric pathway G_{x}. The first step of the procedure is to calculate the normalized areas ${A}_{\mathrm{1}}^{x},{A}_{\mathrm{2}}^{x},\mathrm{\dots},{A}_{N}^{x}$ corresponding to the N members of G_{x}, where ${A}_{i}^{x}$ is assumed to be zero if ${P}_{i}^{x}=\mathit{\varnothing}$ or ${P}_{i}^{x}=\left\{x\right\}$. The second step is to compute the sum,
and compare A^{x} to a critical area A_{crit} determined by Monte Carlo methods. In the Monte Carlo approach, wavelet spectra and corresponding geometric pathways are generated under some null hypothesis, and a null distribution is calculated from all the computed total normalized areas. The critical area corresponding to the α_{c} level of the test is the percentile of the null distribution corresponding to 1α_{c}. Performing these steps for all x∈H results in an adjusted p value at every point that accounts for multiple testing and the correlation among adjacent wavelet coefficients. As noted by Schulte (2016a), the computational efficiency of the cumulative areawise test can be improved by setting α_{1}=0.02 and α_{N}=0.18, though this selection means that the cumulative areawise test may no longer adjust p values at every point in H because complete filtrations like Eq. (13) may no longer be under consideration. Nevertheless, using a limited range of pointwise significance levels generally results in a better performance of the cumulative areawise test relative to the geometric test performed at a single pointwise significance level (Appendix A and Table A1).
A disadvantage of the cumulative areawise testing procedure is that it involves the computation of nested sequences, which renders the test computationally inefficient for long time series. Fortunately, the computation of nested sequences can be avoided as follows. First note that the test statistic for the cumulative areawise test can be replaced by the mean normalized area without changing the outcome of the test. The criterion for a point to be cumulative areawise significant is then
where ${\widehat{A}}_{\mathrm{Crit}}$ is the critical level of the test calculated by computing the percentile of the null distribution comprising mean normalized areas corresponding to 1−α_{c}. Furthermore, there is a onetoone relationship between the normalized area of a patch and its geometric test p value because a greater normalized area always means a lower p value. Thus, if ${\mathit{\rho}}_{i}^{x}$ is the p value associated with ${A}_{i}^{x}$, then the criterion for a wavelet quantity to be cumulative areawise significant is
because there is also a onetoone relationship between α_{c} and the critical level of the test. Equation (19) implies that to perform the cumulative areawise test, one can individually perform the geometric test at each pointwise significance level and then compute ${\widehat{\mathit{\rho}}}^{x}$, avoiding the computation of nested sequences as in the original formulation of the cumulative areawise test.
This interpretation of the cumulative areawise test also allows the testing method to be put in an ensemble forecasting framework. That is, the individual geometric test p values are identified as ensemble members estimating the statistical significance of a wavelet quantity. The mean p value is identified with the ensemble mean that smooths out the unpredictable aspects of the forecast. In the present context, ${\widehat{\mathit{\rho}}}^{x}$ smooths out the features in the wavelet spectra resulting from noise and leaves the features that most of the geometric tests agree to be significant. This smoothing results in the suppression of noise but enhancement of the signal, which is consistent with how Schulte (2016a) found the cumulative areawise test to perform superiorly to the geometric test in terms of signal detection, at least for periodic signals (Appendix A and Table A1).
Using Eq. (19), we now devise a simplified approach to the cumulative areawise test. In the simplified approach, all points in H are initially assigned a value of 0. Then, for a fixed pointwise significance level, the geometric test is performed on all patches. The points falling in those patches that are geometrically significant at the α_{c} level are then assigned the value 2. The values for the points not in geometrically significant patches are left unchanged. This assignment step is then repeated for each pointwise significance level, and the values are stored separately so that each point in H is assigned a set of values, the number of values equaling the number of pointwise significance levels used to implement the cumulative areawise test. Then, the average value at each point is computed, with regions where the average value exceeds unity deemed to be cumulative areawise significant.
The application of the simplified cumulative areawise test to the wavelet power spectra of R(t) and X(t) (Figs. 2e and 3e) reveals that the test is effective at reducing the number of spurious results arising from the pointwise test when applied at the α_{c}=0.05 level. Like the areawise test, the reduction in spurious features is accompanied by a loss in signal detection. For example, the cumulative areawise test, like the areawise test, deems the wavelet power associated with the pointwise patch located at a period of 256 extending from t=100 to t=1700 in Fig. 3b to be statistically insignificant. On the other hand, the scaleelongated feature associated with the singularity of X(t) at t=1000 more clearly emerges from the application of the cumulative areawise test than the areawise test, which is not surprising because the cumulative areawise test assesses statistical significance based on area without regard to the specific shape of patches.
Although this example suggests that the pointwise test detects a higher amount of signal than the cumulative areawise test, a simplified experiment in Appendix A revealed that the simplified cumulative areawise test can detect a higher amount of signal than the pointwise test when the signaltonoise ratio is relatively high (Table A1). At low pointwise significance levels, the tests were found to detect a similar amount of the signal on average. Thus, a key strength of the cumulative areawise test is that it can retain the signal while also reducing the number of spurious results arising from the pointwise test.
2.2.4 Cumulative arcwise testing
While the areawise, geometric, and cumulative areawise tests can effectively reduce the number of spurious results, they do not test for the existence of periodicities embedded in a time series. This deficiency is especially true for the geometric and cumulative areawise tests because they only consider the area of patches in the assessment of statistical significance. For example, the cumulative areawise test can identify features that are long in scale but short in time as statistically significant (Fig. 3e). Thus, a new test needs to be constructed that is specifically designed to distinguish randomly stable fluctuations from those that are periodicities. As periodicities are determined by patch length, a strict test for periodicities should consider patch length and not patch area.
Using the idea that patch length is more closely linked to the presence of periodicities, we construct a cumulative arcwise test that can be implemented using the following steps. For Step 1, fix a scale and pointwise significance level and identify pointwise significance arcs, where pointwise significance arcs are contiguous strings of points whose associated wavelet quantities are pointwise significant. Step 2 is to compute normalized arclength, defined as the number of points composing each arc divided by the scale in question. Step 3 is to repeat Steps 1 and 2 for all wavelet scales associated with the wavelet spectrum in question. Repeating the procedure for a finite set of pointwise significance levels allows for the assignment of total arclength to each point in the timescale plane. The null distribution can be computed using the Monte Carlo approach adopted for the cumulative areawise test.
The application of the cumulative arcwise test at the 0.05 significance level to the wavelet power spectra of X(t) and R(t) is shown in Figs. 2f and 3f, where pointwise significance levels ranging from 0.02 to 0.18 were used. The discrete spacing between adjacent pointwise significance was set to 0.02 to be consistent with the cumulative areawise test applied earlier (Figs. 2e and 3e). Like with the other statistical tests, the number of spurious pointwise test results for both time series is seen as being dramatically reduced. For X(t), much of the signal detected by the pointwise test remains despite the greater stringency of the cumulative arcwise test. A comparison of Fig. 3c and f reveals that the geometric and cumulative arcwise tests detect a similar amount of the signal associated with the three sinusoids. As expected, the cumulative arcwise test, like the areawise test, largely filters out the scaleelongated feature shown in Fig. 3b, supporting the idea that both the arcwise and areawise tests should be preferred to the geometric and cumulative areawise tests in situations in which features associated with periodic signals are sought. In fact, in Appendix A it is shown that the cumulative arcwise test can detect a higher amount of signal than the pointwise test for high signaltonoise ratios.
2.2.5 Topological significance testing
A common theme among all the statistical procedures discussed so far is that they evaluate the statistical significance of wavelet quantities based on the geometric properties of the patches to which the corresponding points belong. However, as demonstrated by Schulte et al. (2015), topological properties are also important to statistical hypothesis testing. How do topological properties differ from geometric ones? Topological properties of an object are those that remain unchanged no matter how much the object is twisted or bent. Using topological characteristics of pointwise significance regions, additional information about the time series in question can be gained.
The basic principal behind topological significance testing is that topologically equivalent (that is, homeomorphic) objects have identical topological invariants (Ferri, 2017). In the present context, the object of interest is P_{pw} and the topological invariants of concern are the 0 and 1D Betti numbers denoted by β_{0} and β_{1}, respectively. The invariant β_{0} measures the number of path components or patches composing P_{pw}, whereas β_{1} measures the number of 1D holes in P_{pw}. A region in P_{pw} has a hole if there is a closed path in it such that the path cannot be continuously deformed into a point (Fig. 4). Equivalently, holes are regions in H fully surrounded by points in P_{pw}. Although counterintuitive, Schulte et al. (2015) showed that the presence of these holes is related to the statistical significance of the patch to which they correspond. To better understand how these holes are related to statistical significance, consider the reproducing kernel shown in Fig. 1b together with the set of points enclosed by the black contour. As shown in Fig. 1b, the closed path, g, in the contoured region can be continuously (e.g., no tearing or cutting) deformed into a point within the set by contracting or shrinking it, with this property holding for any closed path so that the set enclosed by the contour has no holes. As patches arise from the reproducing kernel, typical patches arising from isolated fluctuations will tend not to have holes at low pointwise significance levels. However, if two fluctuations are located at nearby frequencies and times, then the resulting patches in the wavelet power spectrum may contain holes (Schulte et al., 2015).
Using Betti numbers, one can assess whether P_{pw} is homeomorphic to ${P}_{\mathrm{noise}}=\phantom{\rule{0.125em}{0ex}}\left\{\left(b,a\right):{\mathit{\rho}}_{\mathrm{noise}}\left(b,a\right)<{\mathit{\alpha}}_{\mathrm{pw}}\right\}$, where ρ_{noise} denotes the p value resulting from the application of the pointwise test to a noise spectrum such as the wavelet power spectrum associated with a realization of a rednoise process or the coherence spectrum associated with a pair of rednoise realizations. More specifically, a topological significance test is performed as follows: (1) compute wavelet spectra under some noise model, (2) compute P_{noise} for each of the wavelet spectra, and (3) compute the null distribution of β_{0} and β_{1}. The critical values associated with the twosided topological significance test applied at the α_{topo} significance level are the percentiles of the null distribution corresponding to 1α_{topo}∕2 and α_{topo}∕2. The twosided test accounts for how the number of patches or holes can be unusually high or low relative to noise. However, as shown by Schulte et al. (2015), a typical patch found in the wavelet power spectrum of red noise will typically have no holes at pointwise significance levels less than 0.2, suggesting that a onesided topological test may be better than a twosided test at low pointwise significance levels.
Topological equivalence at a single pointwise significance does not necessarily mean that the time series in question is topological indistinguishable from background noise. That is, the result of the topological significance test may be sensitive to the chosen pointwise significance level. To address this concern, it is necessary to adopt the persistent homology approach (Edelsbrunner and Harer, 2008) in which a topological space is distinguished from another topological space homeomorphic to it through an examination of filtered versions of the spaces (Ferri, 2017). In the wavelet analysis context, the spaces are the timescale planes associated with noise and the times series in question, which are rectangular and topological equivalent. To distinguish the timescale planes, it is necessary to examine the filtered versions of the timescale planes using the filtration represented by Eq. (13). Computing β_{o} and β_{1} at each step in the filtration results in 0 and 1D persistent homology profiles (PHPs; Qaiser et al., 2016). The more comprehensive topological significance then involves comparing PHPs for a time series in question to that of noise by applying the topological significance test at each pointwise significance level.
The utility of the topological significance testing procedure is demonstrated using the wavelet power spectra of X(t) and R(t). The 1D PHP corresponding to R(t), shown in Fig. 6a, indicates that β_{1} is maximum at α_{pw}=0.7 and decreases rapidly until reaching 0 at α_{pw}=0.1. The overall curve is the same for realizations of a rednoise process with any lag1 autocorrelation coefficient, though the number of holes is greater for larger lag1 autocorrelation coefficients (not shown). The 1D PHP for X(t) is like that of R(t), suggesting that these time series are topologically indistinguishable. The application of the topological significance test with α_{topo}=0.05 further supports the topological equivalence of X(t) with red noise because the PHP associated with X(t) falls in the grayshaded envelope representing the test nonrejection region, where the nonrejection region was obtained by generating 100 PHPs associated with 100 realizations of a rednoise process with a lag1 autocorrelation coefficient equal to that of X(t). However, X(t) is not noise by construction, and thus the similarity of 1D PHPs does not exclude the possibility that time series under consideration are indistinguishable from noise.
In contrast to the 1D PHPs, the 0D PHPs for X(t) and R(t) differ substantially (Fig. 6b). The PHP for X(t) is seen to reach a global maximum around α_{pw}=0.25, while the one for R(t) peaks around α_{pw}=0.18. The application of the topological significance test at the 0.05 level strongly supports the idea that X(t) is distinguishable from noise because β_{0} far exceeds the critical level of the test at α_{pw}=0.25. Thus, there are features that are distinguishable from the background noise. These features are precisely those identified from the areawise, geometric, cumulative arcwise, and cumulative areawise tests.
3.1 Data
To understand the temporal behavior and spatial variability of Indian rainfall, monthly rainfall data for five homogenous regions (Parthasarathy et al., 1994) extracted from Indian Institute of Tropical Meteorology website (https://www.tropmet.res.in/Data Archival51Page, last access: 15 March 2019) were analyzed. The five divisions called the Peninsula, Northwest, Northeast, Central Northeast, and West Central regions were devised based on continuity of area, contribution to annual amount, and global and regional circulation parameters (Parthasarathy et al., 1994; Azad et al., 2010). The rainfall data for each region were obtained by areaweighted averaging the subdivisional data corresponding to the meteorological subdivisions composing the homogenous region, where the subdivisional data were obtained by averaging the rainfall data associated with representative rainfall gauges. An allIndia (Parthasarathy et al., 1995) rainfall time series was also examined, as it is commonly reported in the literature. The monthly allIndia rainfall time series is constructed by averaging the subdivisional data after assigning a weight to each subdivision, which is determined by the subdivisional area (Parthasarathy et al., 1994). These rainfall data span the long time period from 1871 to 2016 and are continuous, making them wellsuited for a wavelet analysis. Each rainfall time series was converted into an anomaly time series by subtracting the 1871–2016 mean for each month from the individual monthly values. After the conversion to an anomaly time series, the time series were standardized by dividing them by their respective 1871–2016 standard deviations. The resulting time series are shown in Fig. 7.
The monthly Niño 3.4 index (available at https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.data, last access: 15 March 2019) from 1871 to 2016 was used to characterize the state of the ENSO system. The Niño 3.4 index is defined as the average SST in the region bounded by 5^{∘} N and 5^{∘} S and by 170 and 120^{∘} W. The seasonal cycle was removed from this time series in the same way as it was removed from the rainfall time series.
3.2 Wavelet power spectrum
The wavelet power spectra corresponding to the rainfall time series are shown in Fig. 8, where α_{pw}=0.05 in this case. Statistically significant features are seen for all six time series, with a large swath of pointwise significance being seen around a period of 256 months for the Peninsula, Northeast, and Northwest time series after 1950. For all the time series, most of the pointwise significance patches are seen as being located at lower periods and appear to have varying geometries. For example, the wavelet power spectrum for the Northwest time series contains a scaleelongated significance patch at around 1920, extending from a period of 4 months to a period of 64 months. A similar but less prominent feature is also evident in the wavelet power spectrum of allIndia rainfall. For the remaining time series, no such features are readily identifiable from an inspection of the wavelet power spectra.
To account for how spurious results may result from multiple testing, the cumulative areawise test was applied at the α_{c}=0.05 level to the wavelet power spectrum shown in Fig. 8. For these cases, the simplified cumulative areawise test was applied using pointwise significance levels ranging from 0.02 to 0.18, with the spacing between adjacent pointwise levels equaling 0.02. Like with the pointwise test, statistically significant wavelet power is located around a period of 256 months for the Northwest and Northeast regions, raising the possibility that these time series contain periodicities at that timescale (Fig. 9). For all the time series, most of the statistically significant results are seen at periods less than 64 months. Two of the most prominent features are the scaleelongated significance patches around 1920 in the allIndia and Northwest wavelet power spectra. Because the wavelet power associated with these scaleelongated features remain statistically significant after the application of the cumulative areawise, there is strong evidence that the corresponding fluctuations exceed the rednoise background.
To better differentiate patches arising from singularities from those associated with periodicities, the cumulative arcwise test was applied to wavelet power spectra shown in Fig. 8 at the same pointwise significance levels used for the cumulative areawise test. As shown in Fig. 10, the application of the cumulative arcwise test at the 5 % significance level reveals two timeelongated regions of statistical significance around a period of 256 months in the Peninsula and Northwest wavelet power spectra. However, the wavelet power around a period of 256 months is no longer statistically significant for the Northeast time series, suggesting that the pointwise significant wavelet power at that period is associated with a randomly stable oscillation. The scaleelongated features for the allIndia and Northwest power spectra are largely filtered out like the scaleelongated features shown in Fig. 3 for X(t). This result suggests that the wavelet power at that time and at those periods is associated with a singularitylike time domain feature rather than a periodicity.
To gain a deeper understanding of the rainfall time series, the 0th and 1D PHPs were computed for each time series. For clarity, we computed anomaly PHPs by subtracting the mean profile associated with noise from the individual profiles at each pointwise significance level. Thus, positive (negative) values mean that the number of patches (or holes) associated with the rainfall time series is greater (less) than that associated on average with noise. Because the lag1 autocorrelation coefficients associated with each time series are nearly identical, a single test nonrejection region was computed using 100 realizations of a rednoise process with lag1 autocorrelation coefficient equal to the mean lag1 autocorrelation coefficient of the six rainfall time series. The mean noise profile was subtracted from both the upper and lower bounds of the test nonrejection region after the application of the topological significance test at the 0.05 level.
The 0th dimensional anomaly PHPs for each rainfall time series is shown in Fig. 11a. Many of the profiles are seen to deviate substantially from the mean noise profiles around α_{pw}=0.5 and α_{pw}=0.2. At around α_{pw}=0.5, the largest deviations are associated with the Northwest, West Central, and allIndia time series. The profiles for the Northeast and Peninsula time series lay inside the confidence envelope so that those time series could be topologically indistinguishable from red noise. At around α_{pw}=0.15, the largest deviations are associated with the Northwest and Central Northeast profiles. The large deviations from the mean noise profile for the Northwest time series suggest that the statistically significant features identified by the cumulative areawise test (Fig. 9) are unlikely statistical artifacts. On the other hand, the wavelet power spectrum of the West Central time series contains only five small regions of cumulative areawise significance despite the large departures from the mean noise profiles at numerous pointwise significance levels. This finding raises the possibility of Type II errors for the cumulative areawise test or a Type I error for the topological significance test.
Additional insight into the time series is provided by the 1D anomaly PHPs shown in Fig. 11b. In this case, all the rainfall time series appear to be topologically distinguishable from red noise for pointwise significance levels ranging from 0.7 to 0.65. Like with the 0th dimensional profiles, the departures are greatest for the Northwest and West Central time series, providing strong evidence that these time series comprise features exceeding background noise. For the West Central time series, the lack of arcwise significance shown in Fig. 10 suggests that the topological significance cannot be attributed to periodicities with high confidence. For the Northwest time series, the cumulative areawise test results suggest that topological significance may be attributed to the scaleelongated feature around 1915 and the enhanced wavelet power around a period of 256 months. The PHPs for all the time series converge to 0 below the pointwise significance level 0.3, and the lack of 1D holes below 0.1 renders the isolation of topologically significant regions in H using the approach of Schulte et al. (2015) that determines where there is enhanced topological significance based on the location of holes difficult. However, for the Northwest time series, a few holes were identified at pointwise significance below 0.1 around the scaleelongated patches shown in Fig. 9c, supporting the results of the cumulative areawise test.
3.3 Wavelet coherence
To determine the timescales at which ENSO influences on Indian rainfall are strongest, wavelet coherence was computed between the Niño 3.4 index and the rainfall time series for each region. The pointwise, cumulative areawise, and cumulative arcwise tests were applied to the wavelet coherence spectra at the 5 % significance level, but only the results for the cumulative arcwise test are shown for brevity. The reason why the results for the cumulative arcwise test are emphasized is that we are interested in distinguishing nonrandom coherent oscillations at a scale from randomly stable covarying oscillations.
The results for the wavelet coherence analysis are shown in Fig. 12. Statistically significant wavelet coherence was identified for all six time series, but coherence between the Niño 3. 4 index and the time series of rainfall is seen to be most salient for the allIndia, Northwest, and West Central regions. The statistically significant coherence is mainly confined to the period band of 16 to 64 months, and time periods of enhanced coherence appear to follow time periods of weak coherence. The results shown in Fig. 12 suggest that the 16month to 64month mode of SST variability across the Niño 3.4 region modulates Indian rainfall, which is a wellknown idea (Torrence and Webster, 1999). It is also noted that the coherence between rainfall and the Niño 3.4 index varies spatially, consistent with recent work showing how ENSO influences are spatially nonuniform (Roy et al., 2017).
Wavelet analysis is a tool for extracting timelocalized and scaledependent features from time series. To assess the confidence with which features in wavelet spectra are distinguishable from a noise background, it is necessary to implement statistical significance tests. The traditional approach to statistical significance testing is to individually test whether a wavelet quantity associated with a point in the timescale plane exceeds a background noise spectrum. This pointwise approach has been applied in numerous papers since its first application to wavelet analysis by Torrence and Compo (1998).
Despite its wide use, the pointwise test suffers from two pitfalls, as noted by Maraun and Kurths (2004) and Maraun et al. (2007) and as shown in Fig. 2. The first pitfall is that the test typically produces many false positive results because of the simultaneous testing of multiple hypotheses. Secondly, spurious results occur in clusters, with the clusters reflecting the reproducing kernel of the analyzing wavelet. To remedy the pitfalls of the pointwise test, Maraun et al. (2004) developed an areawise test to reduce the number of spurious pointwise significance patches based on the area and geometry of the patches. While the method has been shown to be effective at reducing the number of spurious results, the test is computationally inefficient. To address the concern of computational inefficiency, Schulte et al. (2015) proposed a geometric test that reduces the number of spurious results by assigning a normalized area to each patch. However, both the areawise and geometric test suffer from a binary decision pitfall in which both an areawise (or geometric) significance level and a pointwise significance level must be selected. This dual selection inevitably leads to uncertainty regarding the statistical significance of a wavelet quantity.
Schulte (2016a) developed a cumulative areawise test to reduce the sensitivity to the chosen pointwise significance level. Adopting ideas from persistent homology, the test provides a way to assess the statistical significance of a wavelet quantity located at a point by examining how the area of a nested sequence of patches containing the point changes as the pointwise significance is changed. This test can be viewed as an ensemble technique, where the individual estimates of statistical significance are associated with the p values of the geometric tests performed at various pointwise significance levels. The ensemble mean is the mean of all such p values. Much like how the ensemble mean in weather forecasting filters out unpredictable aspects associated with the individual ensemble members, the cumulative areawise test filters out the unpredictable aspects associated with a time series in question; the degree to which this occurs depends on the chosen cumulative areawise significance level.
Strictly speaking, the geometric and cumulative areawise tests (and to a lesser extent the areawise test) do not assess the confidence with which periodicities are embedded in time series. To make such assessments, the temporal length of cross sections of pointwise significance patches needs to be computed and compared to a null distribution of cross section lengths. This procedure is termed the cumulative arcwise test and is like the cumulative areawise test. Two ideal cases showed that the procedure reduces the number of spurious results arising from the pointwise test while also detecting periodicities embedded in time series. The arcwise test is also capable of eliminating scaleelongated pointwise significance features induced from singularities in the time domain. Thus, when testing for the presence of periodicities, the arcwise test should be preferred to the cumulative areawise and geometric tests because those tests are unable to distinguish features arising from singularities from those associated with periodicities. An additional benefit of the cumulative arcwise test is that it operates in one dimension, rendering its implementation more rapid than that of the cumulative areawise test.
The application of the statistical tests to the Indian rainfall time series further highlighted the benefits and disadvantages of the various methods. The application of the pointwise test to the wavelet power spectra of the Indian rainfall resulted in many statistically significant results. Much of the pointwise significant wavelet power was deemed indistinguishable from background noise after the implementation of the cumulative areawise and arcwise tests. However, for some Indian subregions, statistically significant features were identified. For example, the Northwest and allIndia time series were found to contain features consistent with singularities.
The application of the cumulative arcwise test to wavelet coherence revealed statistically significant coherence between rainfall for many of the subregions and the Niño 3.4 index at a period of 16 to 64 months. The most pronounced coherence was found between the allIndia rainfall and Niño 3.4 index time series, while the weakest coherence was generally found for the Northeast region. These results highlight the spatial variability of the ENSO teleconnection. Periods of high coherence were found to be followed by periods of lower coherence, consistent with prior work showing how the Indian rainfallENSO teleconnection is nonstationary (Torrence and Webster, 1999).
Topological methods were also adopted to further assess whether the rainfall time series are consistent with a rednoise process. An examination of 0th dimensional PHPs showed that the allIndia, Northwest, and West Central rainfall time series are topologically distinguishable from red noise. The 1th dimensional PHPs revealed that all rainfall time series are distinguishable from red noise despite the lack of cumulative arcwise or areawise significance in the corresponding wavelet power spectra. The discrepancies among the tests suggest that a further investigation is required.
Although only two ideal cases have been in the paper, additional ideal cases are presented in the Supplement for reference. The ideal cases include time series with a single sinusoid and a time series with a single singularity without periodic features. An R software package has been written by the author to implement the various tests documented in this paper. The software package can be found at http://justinschulte.com/wavelets/advbiwavelet.html (last access: 29 April 2019).
Data for Indian rainfall can be accessed through https://www.tropmet.res.in/Data Archival51Page (Indian Institute of Tropical Meteorology, 2019). The monthly Niño 3.4 index is available at https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.data (NOAA/OAR/ESRL PSD, 2019).
A simplified sinusoid experiment was conducted to assess how well the various statistical tests detect a periodic signal embedded in noise. In the experiment, time series of the form
were generated. In Eq. (A1), T=8 is the period of the sinusoid, σ_{s} is the standard deviation of the sinusoid, N(t) is a realization of a rednoise process, σ_{n} is the standard deviation of N(t), and m is a number representing the signaltonoise ratio. In the experiment, 100 different realizations of X(t) were constructed by generating 100 realizations of N(t). The wavelet power spectrum of each realization of X(t) was computed. The computations were repeated for values of m ranging from 0.01 to 1.0. For each statistical test, the fraction of wavelet power coefficients statistically significant at a period of 8 was calculated for each realization. The mean fraction was then calculated for each value of m, resulting in a curve representing how much of the signal on average is detected for different signaltonoise ratios. Values close to 1.0 indicate that the test performed well at detecting the periodic signal. For the experiments, the lag1 autocorrelation coefficient was set to 0.1, 0.5, and 0.9, but only the results for 0.5 are displayed because the results are only weakly dependent on the lag1 autocorrelation coefficient. The pointwise, cumulative areawise, and cumulative arcwise tests were applied at the 5 % level, and the geometric and areawise tests were applied at the 10 % level to 5 % pointwise significance regions.
As shown in Table A1, the amount of the signal detected by any of the tests increases as the signaltonoise increases. Despite how the cumulative areawise test is more stringent than the pointwise test, the tests detected a comparable amount of the signal for low signaltonoise ratios; for signaltonoise ratios greater than 0.7, the test performed better. The cumulative arcwise test outperformed the cumulative areawise test for signaltonoise ratios greater than 0.5. Consistent with theory, both the areawise and geometric tests performed worse than the pointwise test. The performance of all tests was found to improve if the lag1 autocorrelation coefficients were increased to 0.9 and was found to worsen if the lag1 autocorrelation coefficient was set to 0.1.
The supplement related to this article is available online at: https://doi.org/10.5194/npg26912019supplement.
The author declares that there is no conflict of interest.
This research has been supported by the NASA Applied Sciences Project (grant no. NNX26AN38G).
This paper was edited by Norbert Marwan and reviewed by two anonymous referees.
Adarsh, S. and Reddy M. J.: Trend analysis of rainfall in four meteorological subdivisions of southern India using nonparametric methods and discrete wavelet analysis, Int. J. Climatol., 35, 1107–1124, 2014.
Addison, P. S.: Wavelet transforms and the ECG: a review, Physiol. Meas., 26, R155, https://doi.org/10.1088/09673334/26/5/R01, 2005.
Agarwal, A., Maheswaran, R., Marwan, N., Caesar, L., and Kurths, J.: Waveletbased multiscale similarity measure for complex networks, Eur. Phys. J. B, 91, 296, https://doi.org/10.1140/epjb/e2018904606, 2018.
Azad, S., Vignesh, T. S., and Narasimha, R.: Periodicities in Indian Monsoon Rainfall over spectrally homogeneous regions, Int. J. Climatol., 30, 2289–2298, 2010.
Edelsbrunner, H. and Harer, J.: Persistent homologya survey, Contemp. Math., 453, 257–282, 2008.
Elsanabary, M. H. and Gan, T. Y.: Wavelet analysis of seasonal rainfall variability of the upper Blue Nile Basin, its teleconnection to Global Sea surface temperature, and its forecasting by an artificial neural network, Mon. Weather Rev., 142, 1771–1791, 2014.
Fasullo, J.: Biennial characteristics of All India rainfall, J. Climate, 17, 2972–2982, 2004.
Ferri, M.: Persistent topology for natural data analysis – A survey, in: Towards Integrative Machine Learning and Knowledge Extraction, Springer, Berlin/Heidelberg, Germany, 117–133, 2017.
Gallegati, M.: A systematic waveletbased exploratory analysis of climatic variables, Climate Change, 148, 325–338, 2018.
Grinsted, A., Moore, J. C., and Jevrejeva, S.: Application of the cross wavelet transform and wavelet coherence to geophysical time series, Nonlin. Processes Geophys., 11, 561–566, https://doi.org/10.5194/npg115612004, 2004.
Hu, W. and Si, B. C.: Technical note: Multiple wavelet coherence for untangling scalespecific and localized multivariate relationships in geosciences, Hydrol. Earth Syst. Sci., 20, 3183–3191, https://doi.org/10.5194/hess2031832016, 2016.
Indian Institute of Tropical Meteorology: Meteorological Data Sets for Downloading, available at: https://www.tropmet.res.in/Data Archival51Page, last access: 15 March 2019.
Kumar, P. and FoufoulaGeorgiou, E.: Wavelet analysis for geophysical applications, Rev. Geophys., 35, 385–412, 1997.
Labat, D.: Recent advances in wavelet analyses: Part 1. A review of concepts, J. Hydrol., 314, 275–288, 2005.
Labat, D.: Cross wavelet analyses of annual continental freshwater discharge and selected climate indices, J. Hydrol., 385, 269–278, 2010.
Lane, S. N.: Assessment of rainfall–runoff models based upon wavelet analysis, Hydrol. Process., 21, 586–607, 2007.
Lau, K.M. and Weng, H.Y.: Climate signal detection using wavelet transform: How to make a time series sing, B. Am. Meteorol. Soc., 76, 2391–2402, 1995.
Liu, Y., Liang, X. S., and Weisberg, R. H.: Rectification of the bias in the wavelet power spectrum, J. Atmos. Ocean. Tech., 24, 2093–2102, 2007.
Liu, Y., Brown, J., Demargne, J., and Seo, D.: A waveletbased approach to assessing timing errors in hydrological predictions, J. Hydrol., 397, 210–224, 2011.
Maheswaran, R. and Khosa, R.: A waveletbased second order nonlinear model for forecasting monthly rainfall, Water Resour. Manag., 28, 5411–5431, 2014.
Maraun, D. and Kurths, J.: Cross wavelet analysis: significance testing and pitfalls, Nonlin. Processes Geophys., 11, 505–514, https://doi.org/10.5194/npg115052004, 2004.
Maraun, D., Kurths, J., and Holschneider, M.: Nonstationary Gaussian processes in wavelet domain: definitions, estimation and significance testing, Phys. Rev. E., 75, 016707, https://doi.org/10.1103/PhysRevE.75.016707, 2007.
Meyers, S. D., Kelly, B. G., and O'Brien, J. J.: An introduction to wavelet analysis in oceanography and meteorology: With application to the dispersion of Yanai waves, Mon. Weather Rev., 121, 2858–2866, 1993.
Narasimha, R. and Bhattacharyya, S.: A wavelet crossspectral analysis of solar–ENSO–rainfall connections in the Indian monsoons, Appl. Comput. Harmon. A., 28, 285–295, 2010.
Nayagam, L. R., Janardanan, R., and Ram Mohan, H. S.: Variability and teleconnectivity of northeast monsoon rainfall over India, Global Planet. Change, 69, 225–231, 2009.
Ng, E. K. W. and Chan, J. C. L.: Geophysical applications of partial wavelet coherence and multiple wavelet coherence, J. Atmos. Ocean. Tech., 29, 1845–1853, 2012.
NOAA/OAR/ESRL PSD: Climate Timeseries, available at: https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/,last access: 15 March 2019.
Paluš, M.: Linked by Dynamics: WaveletBased Mutual Information Rate as a Connectivity Measure and ScaleSpecificNetworks, in: Advances in Nonlinear Geosciences, edited by: Tsonis, A. A., 427–463, Springer International Publishing, Cham., 2018.
Parthasarathy B., Munot A. A., and Kothawale D. R.: All India monthly and seasonal rainfall series, 1871–1993, Theor. Appl. Climatol., 49, 217–224, 1994.
Parthasarathy, B., Munot, A. A., and Kothawale, D. R.: Monthly and seasonal rainfall series for allIndia homogeneous regions and meteorological subdivisions: 1871–1994, Research Report No. RR065, Indian Institute of Tropical Meteorology, Pune, 113 pp., 1995.
Qaiser, T., Sirinukunwattana, K., Nakane, K., Tsang, Y. W., Epstein, D., and Rajpoot, N.: Persistent homology for fast tumor segmentation in whole slide histology images, Procedia Comput. Sci., 90, 119–124, 2016.
Ramana, R. V., Krishna, B., Kumar, S. R., and Pandey, N. G.: Monthly rainfall prediction using Wavelet Neural Network Analysis, Water Resour. Manag., 27, 3697–3711, 2013.
Roy, I., Tedeschi, R. G., and Collins, M.: ENSO teleconnections to the Indian summer monsoon in observations and models, Int. J. Climatol., 37, 1794–1813, 2017.
Sahay, R. R. and Srivastava, A.: Predicting monsoon floods in rivers embedding wavelet transform, genetic algorithm and neural network, Water Resour. Manag., 28, 301–317, 2014.
Sang, Y. F.: A review on the applications of wavelet transform in hydrology time series analysis, Atmos. Res., 122, 8–15, 2012.
Schaefli, B., Maraun, D., and Holschneider, M.: What drives high flow events in the Swiss Alps? Recent developments in wavelet spectral analysis and their application to hydrology, Adv. Water Resour., 30, 2511–2525, 2007.
Schulte, J. A.: Cumulative areawise testing in wavelet analysis and its application to geophysical time series, Nonlin. Processes Geophys., 23, 45–57, https://doi.org/10.5194/npg23452016, 2016a.
Schulte, J. A.: Wavelet analysis for nonstationary, nonlinear time series, Nonlin. Processes Geophys., 23, 257–267, https://doi.org/10.5194/npg232572016, 2016b.
Schulte, J. A. and Georgas, N.: Theory and Practice of Phaseaware Ensemble Forecasting, Q. J. Roy. Meteor. Soc., 144, 1415–1428, 2018.
Schulte, J. A., Duffy, C., and Najjar, R. G.: Geometric and topological approaches to significance testing in wavelet analysis, Nonlin. Processes Geophys., 22, 139–156, https://doi.org/10.5194/npg221392015, 2015.
Schulte, J. A., Najjar, R. G., and Lee, S.: Salinity and Streamflow Variability in the MidAtlantic Region of the United States and its Relationship with Largescale Atmospheric Circulation Patterns, J. Hydrol., 550, 65–79, 2017.
Terray, P., Delecluse, P., Labattu, S., and Terray, L.: Sea surface temperature associations with the late Indian summer monsoon, Clim. Dynam., 21, 593–618, 2003.
Torrence, C. and Compo, G. P.: A practical guide to wavelet analysis, B. Am. Meteorol. Soc., 79, 61–78, 1998.
Torrence, C. and Webster, P. J.: Interdecadal changes in the ENSO monsoon system, J. Climate, 12, 2679–2690, 1999.
Yadava, M. G. and Ramesh, R.: Significant longerterm periodicities in the proxy record of the Indian monsoon rainfall, New Astron., 12, 544–555, 2007.