Non-Gaussian Interaction Information: Estimation, Optimization and Diagnostic Application of Triadic Wave Resonance

Non-Gaussian multivariate probability distributions, derived from climate and geofluid statistics, allow for nonlinear correlations between linearly uncorrelated components, due to joint Shannon negentropies. Triadic statistical dependence under pair-wise (total or partial) independence is thus possible. Synergy or interaction information among triads is estimated. We formulate an optimization method of triads in the space of orthogonal rotations of normalized principal components, relying on the maximization of third-order cross-cumulants. Its application to a minimal one-dimensional, periodic, advective model leads to enhanced triads that occur between oscillating components of circular or locally confined wave trains satisfying the triadic wave resonance condition.


Introduction
Climate and geofluid dynamics often display highly complex and nonlinear interactions between system variables, expressed on a physical or spectral basis. The existence of interactions means that continuous or discrete state vector components statistically constrain the values and decrease the freedom of other components by some physical process with exchanges of extensive variables between system partitions (e.g., mass, energy, momentum), by impulse propagation (e.g., waves, solitons) or by other more abstract processes. Statistical constraints are synonymous with linear and nonlinear correlations and Bayesian-conditional probabilities among variables, which can occur either at short or at long distances, i.e., teleconnections (Wallace and Gutzler, 1981; Van den Dool and Saha, 2000), where short/long is defined according to a given metric defined on the state vector. Examples of that in the spectral space are the kinetic energy turbulent cascade (Batchelor, 1953) and the wave resonant interactions (Hammack, 1993) or, in the physical space, the nonlinear El Niño teleconnections (Monahan, 2001; Hsieh et al., 2006). Information theory (Shannon, 1948; Cover and Thomas, 1991) proposes a consistent quantitative statistical measure (in bits or nats) of those constraints by assessing the mutual information (MI) and the MI multivariate version, the multi-information (MII) or total correlation (Schneidman et al., 2003), that is shared among variables. The information flow is transferred across a certain channel, which is interpreted as an interaction process (Liang and Kleeman, 2007). Under that information theory framework, the absence of interaction means statistical independence among variables and a vanishing MII. For dynamical systems, time-extending interactions are measured by the concept of Granger causality (Granger, 1969), by computing MIIs in the delayed coordinate phase space. Beyond the referred papers, there are many applications of information theory in nonlinear geophysics, such as in predictability (DelSole, 2004; Abramov et al., 2005), in the stochastic theory of multiscale systems (Majda et al., 2005), in ensemble forecasting (Roulston and Smith, 2002) and in data assimilation (Xu, 2007; Zupanski et al., 2007).
Let us consider the system state vector as an N-dimensional random vector (RVec) X ≡ (X_1, ..., X_N)′ of N continuous- (or discrete-) valued scalar variables (in italic), where prime denotes transpose. The components can also be interpreted as nodes of a complex network (Donges et al., 2009) where multiple interactions lie. It is often useful to substitute X by vectors of lower dimension that (a) explain much of the total variance or total Shannon entropy and (b) have no trivial redundancies between components (e.g., by being linearly uncorrelated), thus making the source separation of the signal. That is possible for continuous-valued RVecs by use of principal component analysis (PCA) (Hannachi et al., 2007), independent component analysis (ICA) (Hyvärinen and Oja, 2000), independent subspace analysis (ISA) (Theis, 2005, 2006; Theis and Kawanabe, 2006; Pires, 2015) and other blind source separation (BSS) methods. However, many of those techniques perform sub-optimally because the issuing variables or factors (e.g., the PCs from a PCA) still have some statistical dependence (non-null MII). This occurs for the vector of PCs derived from a RVec with a non-Gaussian joint probability density function (PDF). Although PCs are linearly uncorrelated, they can exhibit nonlinear correlations (Pearson correlations between functions not written as linear affine combinations of components). Non-Gaussianity (NGty) is evident from climatic observations and model runs through the presence of asymmetries, multimodalities, clusters and kurtoses of both marginal and multiple-projected PDFs (Hannachi and O'Neill, 2001; Sura et al., 2005; Franzke et al., 2007; Pires and Perdigão, 2007). Non-Gaussian (NG) sources include the nonlinear drifting as well as the random non-Gaussian and non-stationary forcings of the dynamical equations and the deterministic chaos.
Published by Copernicus Publications on behalf of the European Geosciences Union & the American Geophysical Union.
The paper's motivation is the fact that the non-Gaussian world (of NG PDFs) allows for much more complex interactions, including multiple (triadic, quartic, etc.) ones, that are not explained by linear correlations or second-order statistical moments alone, since uncorrelatedness plus joint Gaussianity implies global independence.
In the paper, we will focus on sets of three random variables (triads). In fact, three uncorrelated non-Gaussian variables (Y_1, Y_2, Y_3) (a non-Gaussian triad) may produce non-null MII values due to nonlinear correlations between functions mixing the three variables. An even more constrained situation is possible, supposing three variables that, while being pair-wise independent (zero dyadic MIs), are not independent as a trivariate whole. In this case, the variables form a perfect triad, due to their interaction synergy leading to an emergent cooperative phenomenon (e.g., a triangle of lovers who meet together only in pairs and in an equitable way, ensuring no preference of any couple). That synergy is measured by the interaction information (IT), which is defined for a triad as I_t(Y_1, Y_2, Y_3) ≡ I_i(Y_1, Y_2, Y_3) − I(Y_1, Y_2) − I(Y_1, Y_3) − I(Y_2, Y_3) (McGill, 1954; Tsujishita, 1995). IT is the MII remaining term that one obtains after discarding all the information coming from proper subsets of variables (here, the dyads).
IT can even be negative in the case of statistical redundancy (e.g., when joint interactions come into play, as when the aforementioned love triangle moves from pair-wise meetings to a threesome). The IT concept is generalized to any number of variables (Jakulin and Bratko, 2004) and applied in many domains (e.g., economy, sociology, criminology, management, medicine, neurology, sampling design), as one tries to find, from an available pool of candidates, playing factors that maximize a given criterion of useful interactivity (e.g., efficiency) (Timme et al., 2013) or that are responsible for an identified situation (e.g., genes explaining a disease). Both MII and IT can be split into Gaussian and non-Gaussian terms (Pires and Perdigão, 2012, 2013) coming, respectively, from the best Gaussian joint PDF fit and from joint non-Gaussianity (NGty). The NG terms dominate when the variables used are uncorrelated. Now, we present conditions under which IT is maximized. For that, let us consider the partition of each triad variable (Y_1, Y_2, Y_3) into N_c equiprobable categories or symbols. Then, IT is maximized when the bulk of the total probability occurs in N_c^2 out of the N_c^3 disjoint events, which are described by a Latin-square (LS) look-up table (e.g., a Sudoku game for N_c = 9). An example to be put into evidence in practice holds when N_c = 2 and the categories (Y_1,c, Y_2,c, Y_3,c) satisfy a Boolean exclusive disjunction or a logic equivalence. The conditions underlying high IT values are demanding by construction, since dyadic MIs have to be minimized or even vanish, which can be quite rare or difficult to find in nature. That is the case in high-dimensional systems (many degrees of freedom), where multiple quasi-independent interactions (e.g., the brain's network of neurons) tend to produce Gaussian PDFs by invoking the central limit theorem, or when a variable's time averages are taken for describing the system's low-frequency variability (LFV). Therefore, NGty, nonlinear correlations and multiple interactions are somehow hidden in the high-dimensional space.
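The two-category perfect triad described above can be illustrated numerically. The sketch below (assuming the exclusive-disjunction, XOR, look-up table) checks that the dyadic MIs vanish while the interaction information reaches its maximum, log 2 nats:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two fair binary symbols and their exclusive disjunction (XOR):
# the classic two-category (N_c = 2) Latin-square "perfect triad".
y1 = rng.integers(0, 2, 100_000)
y2 = rng.integers(0, 2, 100_000)
y3 = y1 ^ y2  # deterministic given the pair, yet independent of each alone

def entropy(*cols):
    """Shannon entropy (nats) of the joint empirical distribution."""
    _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mi(a, b):
    return entropy(a) + entropy(b) - entropy(a, b)

# Multi-information minus the three dyadic MIs = interaction information I_3
mii = entropy(y1) + entropy(y2) + entropy(y3) - entropy(y1, y2, y3)
i3 = mii - (mi(y1, y2) + mi(y1, y3) + mi(y2, y3))

print(mi(y1, y3))  # ~0: pair-wise independent
print(i3)          # ~log 2: pure triadic synergy
```

Any of the three variables plays the role of the XOR output, reflecting the symmetry of the synergy.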
In order to obtain highly synergetic triads, we apply a method relying on the projection pursuit (PP) technique (Friedman and Tukey, 1975), aiming to maximize NGty or the negentropy (the Shannon entropy deficit with respect to the best Gaussian joint PDF fit). Therefore, first the variables are changed into a vector of uncorrelated variables of unit isotropic variances (e.g., the normalized PCs), thus preserving uncorrelatedness under orthogonal rotations. Then, the IT maximum is sought among three-dimensional (3-D) projections of orthogonally rotated standardized PCs taken from a subset chosen in accordance with a given criterion (e.g., those PCs that maximize IT or the PCs of leading variance). However, IT is a very difficult functional, depending on the full X probability density function (PDF), whose estimation is quite difficult due to the "curse of dimensionality" (CC) (Bellman, 1957). Therefore, in order to overcome the CC effect (Bocquet et al., 2010), we maximize, by gradient-descent methods (Gilbert and Lemaréchal, 1989), an IT proxy functional written in terms of the rotation matrix or, equivalently, in terms of the set of generalized Euler angles spanning the space of all orthogonal rotations (Raffenetti and Ruedenberg, 1969; Hoffman et al., 1972). The functional to be maximized consists of a truncated version of the IT Edgeworth expansion, relying on cross-cumulants of order 3 and higher (Comon, 1994). The functional vanishes under joint Gaussian conditions. For the sake of geometric interpretation, it is written as the square of a third-order joint cumulant E(Y_1 Y_2 Y_3) (E stands for "expectation") between the unit-variance, uncorrelated, centered components of the triad. Moreover, it is proportional to a triadic correlation, denoted as cor_3(Y_1, Y_2, Y_3), consistently satisfying the Schwarz inequality. That cumulant is written as a third-order polynomial form of the rotation matrix coefficients and as a linear form of third-order joint cumulants of the PCs.
For the application of the IT concept and of its optimization in the rotation space, we have estimated the triadic IT applied to PCs of the output of the 40-variable Lorenz (1995) model, hereafter called Lorenz-95. The 40 variables are in fact longitudinally equidistant nodes on a latitude circle, along which a one-dimensional flow is set to take place. This is thus a minimal one-dimensional (1-D) model of fluid motion with nonlinear advection, linear dissipation and constant forcing, integrated along a latitude circle with circular periodic conditions. Circular symmetry of the statistics leads to twice-degenerated sinusoidal empirical orthogonal functions (EOFs), each one with an associated wave number. Therefore, the PCs are discrete Fourier transforms (DFTs). We show that non-null values of the moment E(Y_1 Y_2 Y_3) between normalized PCs and of the estimated IT, I_t(Y_1, Y_2, Y_3), are obtained when the integer zonal wave numbers k_1, k_2 and k_3 satisfy the wave resonance condition (WRC) k_1 ± k_2 ± k_3 = 0 (Bartello, 1995; Hammack, 1993). It consists of the buildup of a third wave from the quasi-linear interaction of two weak-amplitude waves satisfying the dispersion relationship. Optimized triads following the aforementioned method are also computed. They correspond to locally confined wave trains satisfying the WRC for the number of local oscillations. Therefore, the triad optimization can be characterized as a generalized WR diagnostic under the studied model's framework.
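The role of the WRC in producing a non-null third cross-moment can be seen already for pure zonal harmonics. A minimal sketch (with hypothetical standing-wave phases, not the Lorenz-95 output itself): the spatial average of a product of three unit-variance harmonics is non-null only when k_1 ± k_2 ± k_3 = 0.

```python
import numpy as np

# Longitudes around the latitude circle (full periods, uniform sampling)
x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)

def harmonic(k):
    """Unit-variance zonal harmonic of integer wave number k."""
    return np.sqrt(2) * np.cos(k * x)

def third_moment(k1, k2, k3):
    """Spatial average of the triple product, i.e., E(Y1 Y2 Y3)."""
    return np.mean(harmonic(k1) * harmonic(k2) * harmonic(k3))

print(third_moment(2, 3, 5))  # resonant: 2 + 3 - 5 = 0 -> non-null
print(third_moment(2, 3, 6))  # non-resonant -> vanishes
```

The resonant value here is sqrt(2)/2, coming from the product-to-sum expansion of the cosines; any non-resonant combination averages out exactly.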
After this introduction, the paper presents the concepts of multi-information and interaction information in Sect. 2, followed by the optimization implementation of triads (Sect. 3). The application to the Lorenz-95 model appears in Sect. 4, ending with the conclusions (Sect. 5). Some proofs and MII estimators are included in Appendices A and B.

General context
Without loss of generality, let X ≡ (X_1, ..., X_N)′ be a zero-average N-dimensional physical RVec, composed of components X_i, i = 1, ..., N (in italic), and let C_X,X and C_1,X,X be, respectively, the covariance and correlation matrices of X. The PCA of X comes from the factorization C_X,X = W Λ W′, where W is the orthogonal matrix of normalized EOFs in columns, satisfying W^(−1) = W′, and Λ ≡ Diag(λ_1, ..., λ_N) is the diagonal matrix sorted by decreasing PC variances, i.e., λ_1 ≥ ... ≥ λ_N. The vector of PCs and the standardized (unit-variance) PCs are, respectively, given by X_PC ≡ W′X and X_u ≡ Λ^(−1/2) X_PC. The PCs are uncorrelated, though they can be statistically dependent; i.e., they interact if they have non-vanishing nonlinear correlations.
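The factorization and standardization above can be sketched in a few lines of NumPy (toy data; the eigendecomposition of the covariance matrix yields W and Λ, and the standardized PCs have an identity covariance):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy zero-average data: 5000 samples of a linearly mixed 4-D Gaussian
X = rng.standard_normal((5000, 4)) @ rng.standard_normal((4, 4))
X -= X.mean(axis=0)

# PCA factorization C_XX = W Lambda W'
C = np.cov(X, rowvar=False)
lam, W = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]     # sort by decreasing PC variance
lam, W = lam[order], W[:, order]

X_pc = X @ W                      # principal components X_PC = W'X (row form)
X_u = X_pc / np.sqrt(lam)         # standardized (unit-variance) PCs

# The standardized PCs are uncorrelated with isotropic unit variance
print(np.allclose(np.cov(X_u, rowvar=False), np.eye(4)))
```

Any further orthogonal rotation of X_u preserves this identity covariance, which is what makes the rotation space a natural search domain for the triads.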
The goal of the paper is to find small subsets of D ≪ N rotated uncorrelated components such that (a) they explain a relevant fraction of the total variance, (b) they exhibit high nonlinear multivariate correlations with a simple geometric and physical interpretation in terms of the dynamics governing X, and (c) the correlations emerge from synergies of the D components, hence not being explained by proper subsets. Without loss of generality, we restrict rotations to a set of normalized PCs chosen according to a statistical or physical criterion (e.g., a set of variance-explaining leading PCs), by writing X_u as a concatenation, a union set of components or a projection (depending on the context), in the form X_u = X_u,signal || X_u,noise, where X_u,signal is the part to be rotated (the signal), of dimension N_rot ≤ N, corresponding to a certain set S of PC indices, and X_u,noise is the remaining projection (the generalized noise), possibly the null set. We can consistently write X_u,signal = ∪_{i∈S} {X_u,i}, where the union symbol has the meaning of concatenation.
Therefore, the rotated vector is Y = Y_rot || X_u,noise with Y_rot ≡ R X_u,signal, where R is an N_rot-order orthogonal matrix, i.e., R′ = R^(−1). The components of Y_rot ≡ (Y_1, ..., Y_N_rot)′ are uncorrelated, given by the inner products Y_i = v_i′ X_u,signal, where v_i ≡ (R_i1, R_i2, ..., R_iN_rot)′, i = 1, ..., N_rot, are orthonormalized vectors satisfying v_i′ v_j = δ_ij, i, j = 1, ..., N_rot, where δ_ij is the Kronecker delta. Each component Y_k, k = 1, ..., N_rot, is a linear combination of loadings multiplying the physical component perturbations, in the form Y_k = Σ_{j=1}^{N} W_{Y_j,k} X_j. The loading vectors (W_{Y_1,k}, ..., W_{Y_N,k})′ are in general not orthogonal in the physical space unless the rotation is composed of partial rotations over different subsets of PCs of degenerated (equal) variance in each subset. We will focus in particular on triads (D = 3) of components (Y_1, Y_2, Y_3) of maximum interactivity. They are optimized among the restrictions of the state vector to a certain signal projection and among the set of orthogonal rotations R.
C. A. L. Pires and R. A. P. Perdigão: Non-Gaussian interaction information

Basic information-theory definitions
The assessment of statistical interactivity due to linear and nonlinear correlations relies on concepts of information theory (Shannon, 1948) and on measures of non-Gaussianity. Given their relevance for the paper, we briefly present some definitions and notations, valid for any N-dimensional RVec A ≡ (A_1, ..., A_N)′, in particular for X or Y defined above.
- Definition 1: the "Gaussianized" vector A_g ≡ (A_1g, ..., A_Ng)′ is the N-dimensional RVec composed of the marginal Gaussian anamorphoses (Wackernagel, 1995) of the components, which are changed into standard Gaussian distributed variables of zero mean and unit variance.
- Definition 2: the Shannon differential entropy (SDE) (Shannon, 1948) of A is H(A) ≡ −E[log ρ_A(A)], where ρ_A is the joint PDF of A. The entropy of A under statistical independence of the components is H_i(A) ≡ Σ_{k=1}^{N} H(A_k), i.e., the sum of the marginal SDEs. The SDE of the multivariate Gaussian PDF with the same average and covariance matrix as A is written as H_g(A) = H_gi(A) − F_g(A), where H_gi(A) is its value for independent components, h_g ≡ 1/2 log(2πe) is the Shannon entropy of the standard Gaussian with PDF φ(X) ≡ (2π)^(−1/2) exp(−1/2 X²), and F_g(A) ≡ −1/2 log(det C_1,A,A) ≥ 0 depends on the correlation matrix C_1,A,A, with F_g(A) being zero for a scalar or for uncorrelated components.
- Definition 3: the negentropy of A is the non-negative quantity J(A) ≡ H_g(A) − H(A) ≥ 0, i.e., the deficit of SDE with respect to the Gaussian PDF with the same average and covariance matrix. The negentropy for independent components is J_i(A) ≡ H_gi(A) − H_i(A) ≥ 0, i.e., the sum of the marginal negentropies. J is invariant under any invertible linear affine transformation of A. For A_g we get J_i(A_g) = 0 despite having J(A_g) ≥ 0.
- Definition 4: the multi-information (MII) (Schneidman et al., 2003) or total correlation of A is the non-negative quantity I_i(A) ≡ H_i(A) − H(A) ≥ 0, measuring the full statistical dependence among the components of A. For N = 2, we recover the mutual information (MI) I(A_1, A_2). The subscript i in I_i is omitted when all components are explicitly written as MII arguments.
MII is invariant under scalar one-to-one changes in the components, in particular for A_g, from which we obtain I_i(A) = I_i(A_g). This allows for decomposing the MII into two consistent positive terms: I_i(A) = I_ig(A) + I_ing(A) (Pires and Perdigão, 2007, 2012, 2013; Hlinka et al., 2014), respectively the Gaussian and the non-Gaussian MIIs, where I_ig(A) = I_ig(A_g) ≡ F_g(A_g) ≥ 0 is uniquely dependent on "Gaussian" correlations, or correlations between "Gaussianized" components, and I_ing(A) = I_ing(A_g) = J(A_g) ≥ 0 is the part of the MII due to the joint negentropy of the PDF with Gaussian marginals. Each term I_ig or I_ing is invariant under monotonic (though not, in general, non-monotonic) changes in the marginals. Thanks to the minimum mutual information principle (Globerson et al., 2009), I_ig is the minimum MII under imposed Gaussian marginals and a correlation matrix C_1,A_g,A_g between Gaussianized components.
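The Gaussian anamorphosis of Definition 1 and the Gaussian MII term I_ig = F_g(A_g) = −1/2 log det C_1,Ag,Ag can be sketched empirically. The example below (a hypothetical correlated non-Gaussian pair; the rank-based anamorphosis is a standard empirical device, not the paper's exact estimator) shows that monotone-invariant dependence survives the marginal Gaussianization:

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(2)

def gaussianize(a):
    """Empirical Gaussian anamorphosis: map ranks to standard normal quantiles."""
    u = rankdata(a) / (len(a) + 1.0)
    return norm.ppf(u)

# Correlated pair with strongly non-Gaussian marginals
n = 20_000
z = rng.standard_normal(n)
a1 = np.exp(z)                                 # lognormal marginal
a2 = (z + 0.5 * rng.standard_normal(n)) ** 3   # cubed noisy copy

ag = np.column_stack([gaussianize(a1), gaussianize(a2)])
C = np.corrcoef(ag, rowvar=False)
i_ig = -0.5 * np.log(np.linalg.det(C))  # Gaussian part of the MII (nats)
print(i_ig)  # positive: dependence is preserved by the anamorphosis
```

Because the marginal transforms are monotone, I_ig here equals the Gaussian MII of the original pair, as stated by the invariance property above.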

Multi-information of a projected and rotated vector
In what follows, we present some results about the multi-information and non-Gaussianity crossing components of the sought projection Y_proj, introduced before, and on how it depends on the rotated vector Y_rot. Theorem 1: the MII of Y_proj expands as I_i(Y_proj) = J(Y_rot) − [J_i(Y_rot) + I_i(Y_comp) + I(Y_proj, Y_comp)] (3), where Y_comp collects the non-projected rotated components. The proof comes in Appendix A. Therefore, from Eq. (3), the MII of Y_proj varies in the opposite sense of (a) the marginal negentropies of the rotated components (the projected ones and their orthogonal complements), (b) the MII of the non-projected variables comprising Y_comp and (c) the mutual information between the projected and non-projected parts. Since the MII is invariant under one-to-one changes in the intervening variables, in particular rotations, the latter is understood as the MI between the spanned subspaces span(Y_proj) and span(Y_comp). In particular, if Y_proj = Y_rot, i.e., a full projection (D = N), the terms within the square brackets of Eq. (3) reduce to J_i(Y_rot), under which the MII of Y_proj is minimized when the sum of marginal negentropies is maximum. That is the purpose of independent component analysis (ICA) (Hyvärinen and Oja, 2000; Hastie et al., 2001).
The goal here is precisely the opposite of ICA's: to maximize I_i(Y_proj) or, more precisely, the part of it dealing with synergies among components. This is discussed in the next section.
The following Theorem 2 shows the dominance of the MII I_i(Y_proj) by its non-Gaussian contribution. Since the components of Y_rot are uncorrelated, one expects their Gaussian absolute correlations c_gi,j ≡ |cor(Y_ig, Y_jg)| = O(δ) ≪ 1, j ≠ i ∈ {1, ..., D}, to be a small residual coming from the static Gaussian anamorphoses. Therefore, under those conditions one has Theorem 2: I_i(Y_proj) expands as I_i(Y_proj) = J(Y_proj,g) + O(δ²). The scaling shows that the non-Gaussian MII is dominant in general when components are uncorrelated, being expressed as the joint negentropy of Y_proj,g, which in this case also coincides with its compactness (Monahan and DelSole, 2009). That measures the level of "featureness" of a data concentration around a lower-dimensional manifold, as distinct from a unit-sphere D-dimensional ball corresponding to an isotropic Gaussian, at which J(Y_proj,g) = 0.

Interaction information in general
The components of Y_proj interact in a certain sense (e.g., physically); therefore, they are statistically linked to each other, with the strength of their inter-relationships, or the closeness to certain geometric or deterministic cross-constraints, being measured by information, namely by the MII.
Those interactions can emerge from proper subsets of components, like local interactions within a complex network, or may consist of global or synergetic emerging properties entailed by all members (components) as a whole, like a collective cooperative phenomenon. That decomposition is accounted for by the concept of interaction information (IT) (McGill, 1954; Tsujishita, 1995; Jakulin and Bratko, 2004), which we present here for the generic N-dimensional RVec A introduced in Sect. 2.2. The IT of A, hereby denoted by I_t(A) (where the subscript t distinguishes it from I_i), is the part of I_i(A) not accounted for by any proper subset of components of A, i.e., by any subset B ⊂ A. In particular, for A = (A_1, A_2)′, IT coincides with the MI; i.e., I_t(A_1, A_2) = I(A_1, A_2) ≥ 0. The IT is built according to the recurrence property I_t(A) = I_t(B|C) − I_t(B) (5), where A is B increased by a scalar component C and I_t(B|C) is the conditional IT, using the conditional PDF ρ_B|C in its definition. Jakulin and Bratko (2004) show that I_t is given by adequate linear combinations of Shannon entropies and MIIs of the A subsets as I_t(A) = −Σ_{B⊆A} (−1)^(|A|−|B|) H(B) (6). In the sums of Eq. (6), the subsets B of A are arranged according to their cardinality |B|. Positive values of IT for dim(A) = |A| ≥ 3 mean synergy across components, while negative IT values indicate redundancy (Timme et al., 2013). The Gaussian and non-Gaussian terms from the decomposition of IT, denoted, respectively, as I_tg and I_tng, come from its MII decomposition in Eq. (6), using Definition 4. Moreover, the MII is the full sum of ITs, I_i(A) = Σ_{B⊆A, |B|≥2} I_t(B) (7), which shows that the sum of synergies and redundancies is globally positively constrained. An important corollary comes from Eq. (6). Corollary 1: if A is composed of uncorrelated components, from which H = H_gi − J, then the IT is a function of negentropies as I_t(A) = Σ_{B⊆A} (−1)^(|A|−|B|) J(B) (8). Eqs. (6)-(8) hold in particular for A = Y_proj, expressing how the synergies depend on MIIs and negentropies of subsets.
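For a triad, the alternating-entropy combination given by Jakulin and Bratko (2004) can be checked numerically against the recurrence definition. A small sketch with a hypothetical random 2x2x2 pmf (any discrete joint distribution works):

```python
import numpy as np

rng = np.random.default_rng(3)

# Arbitrary strictly positive joint pmf over three binary variables
p = rng.random((2, 2, 2))
p /= p.sum()

def H(axes_kept):
    """Shannon entropy (nats) of the marginal over the kept axes."""
    drop = tuple(i for i in range(3) if i not in axes_kept)
    q = p.sum(axis=drop)
    return -np.sum(q * np.log(q))

# Alternating-entropy form: -H(123) + [H(12)+H(13)+H(23)] - [H(1)+H(2)+H(3)]
it_alt = (-H((0, 1, 2))
          + H((0, 1)) + H((0, 2)) + H((1, 2))
          - H((0,)) - H((1,)) - H((2,)))

# Recurrence form: I(A1, A2 | A3) - I(A1, A2)
i12_c3 = H((0, 2)) + H((1, 2)) - H((2,)) - H((0, 1, 2))
i12 = H((0,)) + H((1,)) - H((0, 1))
it_rec = i12_c3 - i12

print(it_alt, it_rec)  # the two expressions coincide
```

The sign of the common value classifies the triad: positive for synergy, negative for redundancy.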

Interaction information between triads and its estimation
We now focus on the IT of triads, i.e., of A = (A_1, A_2, A_3)′. Considering Eqs. (5) and (6), it is given by I_3 ≡ I_t(A_1, A_2, A_3) = I_i(A_1, A_2, A_3) − I_2(A) = I(A_i, A_j | A_k) − I(A_i, A_j) (9), where I_2(A) ≡ I(A_1, A_2) + I(A_1, A_3) + I(A_2, A_3) is the notation for the sum of all dyadic MIs and (i, j, k) is a generic permutation of (1, 2, 3). For uncorrelated components, I_2 is dominated by the non-Gaussian dyadic MIs. Synergetic triads occur when components are nonlinearly related while keeping weak or vanishing pair-wise dependence.
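The plug-in form of Eq. (9) can be illustrated on continuous samples with a crude equiprobable-binning estimator (a simple stand-in for the kernel-based estimator of Appendix B, for illustration only). The triad below, Y_3 = Y_1 Y_2 with independent Gaussian factors, is pair-wise uncorrelated but not pair-wise independent, so both I_2 and the synergetic part I_3 are non-null:

```python
import numpy as np

rng = np.random.default_rng(4)

n, nc = 200_000, 4
y1 = rng.standard_normal(n)
y2 = rng.standard_normal(n)
y3 = y1 * y2  # pair-wise uncorrelated, yet pair-wise dependent

def coarse(a, nc):
    """Map each value to one of nc equiprobable categories."""
    edges = np.quantile(a, np.linspace(0, 1, nc + 1)[1:-1])
    return np.searchsorted(edges, a)

def entropy(*cols):
    _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
    pr = counts / counts.sum()
    return -np.sum(pr * np.log(pr))

c1, c2, c3 = (coarse(a, nc) for a in (y1, y2, y3))
mii = entropy(c1) + entropy(c2) + entropy(c3) - entropy(c1, c2, c3)
i_2 = ((entropy(c1) + entropy(c2) - entropy(c1, c2))
       + (entropy(c1) + entropy(c3) - entropy(c1, c3))
       + (entropy(c2) + entropy(c3) - entropy(c2, c3)))
i_3 = mii - i_2
print(i_2, i_3)  # non-null dyadic MIs and a positive synergetic remainder
```

The positive I_3 reflects the sign constraint sgn(Y_3) = sgn(Y_1) sgn(Y_2) hidden in the product, an information not carried by any single pair alone.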
The IT I_3 also develops in terms of the Kirkwood superposition approximation (Pearl, 1988), in the form of a PDF built from the bivariate and univariate PDFs, ρ_K(A) ≡ ρ_{A_1,A_2} ρ_{A_1,A_3} ρ_{A_2,A_3} / (ρ_{A_1} ρ_{A_2} ρ_{A_3}). Under those conditions, the triadic IT is a Kullback-Leibler divergence (Cover and Thomas, 1991) between two PDFs, which is necessarily non-negative; i.e., I_3 ≥ 0. It holds if at least one pair of variables is independent (e.g., if A_1 and A_2 are independent), corresponding to the less restrictive, non-perfect triads, hereafter called asymmetric triads. The asymmetry of the dependence relationships introduces some directional causality susceptible to physical interpretation. It yields successively less constrained measures of interactivity, with the equalities holding when two or one (out of the three) dyadic MIs are null, respectively, for some permutations (i′, j′, k′) and (i, j, k) of (1, 2, 3). The four possibilities of triad interactivity are sketched by the non-oriented graphs displayed in Fig. 1a-d.
The MIIs and entropies are subject to several inequalities (Makarychev et al., 2002). That leads to a useful lower bound of I_t. For that, let us write A_k = f(A_i, A_j) + w, where f is a nonlinear one-to-one function of A_i for each A_j that is supposedly correlated with A_k, and w is a noise. By playing with conditional MIs, we can bound I_t from below in terms of I(f, A_k) (Eqs. 11a and 11b). If the noise w is independent of both A_i and A_j, then the second term of the rhs of Eq. (11b) vanishes and I(f, A_k) can be approximated by its Gaussian part; i.e., I(f, A_k) ≈ −1/2 log[1 − cor(f, A_k)²] (12). Perfect triads can be obtained by nonlinear mixing of globally independent variables. For instance, taking independent standard Gaussian RVs A_1, A_2, W ∼ N(0, 1) and A_3 ≡ |W| sgn(A_1 A_2) (13) leads to a perfect triad with bivariate Gaussian isotropic PDFs and a non-Gaussian joint PDF. The signals of the RVs satisfy a Boolean-like relationship, sgn(A_i) = sgn(A_j) sgn(A_k), and the correlations between the product of any two and the third variable are cor(A_j A_k, A_i) = 8/(2π)^(3/2) ≈ 0.51, where (i, j, k) is any permutation of (1, 2, 3). That triadic symbolic link leads to the twisting character of the PDF isosurface ρ_{A_1,A_2,A_3} = 0.01 plotted in Fig. 2.
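A quick Monte Carlo check of this perfect-triad construction (assuming the mixing A_3 = |W| sgn(A_1 A_2), which is consistent with the sign relationship and the quoted correlation 8/(2π)^(3/2)):

```python
import numpy as np

rng = np.random.default_rng(5)

# Independent standard Gaussians mixed into a perfect triad
n = 2_000_000
a1, a2, w = rng.standard_normal((3, n))
a3 = np.abs(w) * np.sign(a1 * a2)  # marginally N(0,1), synergetically linked

def cor(u, v):
    return np.corrcoef(u, v)[0, 1]

print(cor(a1, a3), cor(a2, a3))  # ~0: pair-wise uncorrelated
print(cor(a1 * a2, a3))          # ~0.51: product of two correlates with the third
```

Each pair is not only uncorrelated but bivariate Gaussian and independent; the dependence lives entirely in the three-way sign constraint.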
In a perfect triad like (A_1, A_2, A_3)′ satisfying Eq. (13), the unmixing operator recovering the original independent variables is nonlinear, which could only be achieved by using nonlinear ICA (Almeida, 2003, 2005). This means that linear ICA, using rotated variables, is unable to unfold perfect triads into independent scalars, because the 2-D manifold where the PDF concentrates has an intrinsic curvature that is preserved by the isometric rotation transformation.
For uncorrelated variables, i.e., with cor(A_i, A_j) = 0 for all i ≠ j, a corollary follows: jointly Gaussian perfect triads are impossible, since uncorrelatedness plus joint Gaussianity implies global independence. Consequently, synergetic perfect triads are necessarily jointly non-Gaussian. Using Eq. (8), the IT between uncorrelated variables is written in terms of joint and marginal negentropies as I_3 = J(A_1, A_2, A_3) − J(A_1, A_2) − J(A_1, A_3) − J(A_2, A_3) + J(A_1) + J(A_2) + J(A_3) (15), which is totally due to the non-Gaussianity of the variables. For uncorrelated and marginally non-Gaussian variables, Theorem 3 applies and I_3 ≈ I_3ng. In practice, in Sect. 4, the triadic and dyadic MIIs necessary to compute IT are numerically estimated (see Walters-Williams and Li, 2009, for a panoply of MII estimators). For that, we first estimate the joint 2-D or 3-D PDFs by a kernel-based estimator and then the MIIs and IT by an integral Gauss quadrature formula with previous Gaussian anamorphoses of the marginals (see details in Appendix B, along with the MI estimator's bias depending on the sample size).
For weak deviations from joint Gaussianity, the negentropies in Eq. (15) can be expanded in terms of self- and cross-cumulants (Comon, 1994), using the Edgeworth expansion of the joint PDF (Van Hulle, 2005). In particular, the second-, third- and fourth-order cumulants between centered variables (used here) are, respectively, κ_ij ≡ E(Y_i Y_j), κ_ijk ≡ E(Y_i Y_j Y_k) and κ_ijkl ≡ E(Y_i Y_j Y_k Y_l) − κ_ij κ_kl − κ_ik κ_jl − κ_il κ_jk, being invariant under any permutation of the indices (i, j, k, l) ∈ {1, 2, 3}. After some tedious algebra, we get a positive fourth-order approximation of I_3, depending on squares of third- and fourth-order cross-cumulants (Eq. 17), where Y_1, Y_2 and Y_3 are supposed to behave as a sum of N_dof-Ed iid variables, a number which grows as the variables get closer to Gaussianity, by invoking the central limit theorem. The first two terms of Eq. (17) depend on trivariate cumulants, while the third and fourth terms involve bivariate cumulants, thus showing that triadic interactivity can also be due to combinations of nonlinear dyadic correlations. Equation (17) depends on the mixing cumulant κ_123 = E(Y_1 Y_2 Y_3). Thanks to the linear pair-wise uncorrelation and the variable normalization, that cumulant is proportional to any of the correlations cor(Y_i Y_j, Y_k). In order to define a quantity that is independent of the particular order of those products, we introduce here a consistent measure of trivariate correlation between centered, normalized and pair-wise uncorrelated variables Y_1, Y_2 and Y_3, satisfying the Schwarz inequality. The measure is given by the cubic root of the product of those three correlations, cor_3(Y_1, Y_2, Y_3) ≡ [cor(Y_1 Y_2, Y_3) cor(Y_1 Y_3, Y_2) cor(Y_2 Y_3, Y_1)]^(1/3) (18), and will be tested in practice. Returning to Eq. (12), we will also test the approximation I_3 ≈ −1/2 log(1 − cor_3²) (19). The goodness of that approximation is measured by the difference between |cor_3| and the equivalent triadic correlation, obtained by inverting I_3 from Eq. (19): cor_3eq ≡ [1 − exp(−2 I_3)]^(1/2) (20).
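The triadic correlation cor_3 and its Gaussian-equivalent inversion can be computed in a few lines (reusing, as an assumption, the perfect-triad mixing Y_3 = |W| sgn(Y_1 Y_2) from the earlier example):

```python
import numpy as np

rng = np.random.default_rng(6)

n = 1_000_000
y1, y2, w = rng.standard_normal((3, n))
y3 = np.abs(w) * np.sign(y1 * y2)  # centered, normalized, pair-wise uncorrelated

def cor(u, v):
    return np.corrcoef(u, v)[0, 1]

def cor3(y1, y2, y3):
    """Cubic root of the product of the three product-correlations, Eq. (18)."""
    c = cor(y1 * y2, y3) * cor(y1 * y3, y2) * cor(y2 * y3, y1)
    return np.cbrt(c)

c3 = cor3(y1, y2, y3)
i3_gauss = -0.5 * np.log(1.0 - c3 ** 2)          # Gaussian-equivalent IT proxy
c3_eq = np.sqrt(1.0 - np.exp(-2.0 * i3_gauss))   # inverts back to |cor_3|
print(c3, c3_eq)
```

For this symmetric triad the three product-correlations coincide, so cor_3 equals each of them and the inversion recovers |cor_3| exactly.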

Discrete triadic interactions
Maximum triadic interactivity comes essentially from certain categorical (discrete) deterministic relationships, without the need to consider continuous PDFs. Those relationships can be further enhanced in a coarse-grained description of the joint PDF. For that, we consider a partition of the support set of each marginal variable into N_c ≥ 2 equiprobable subsets, indexed by N_c different symbols (e.g., digits, letters). The subsets are not necessarily connected (e.g., unions of disconnected intervals). The corresponding coarse-graining transformation assigns Y_i,c = s_j when Y_i falls in the j-th subset, where s_j, j = 1, ..., N_c, are the symbols (e.g., numbers or letters). Therefore, the full discrete PDF is characterized by the joint probabilities of the symbol triplets (Y_1,c, Y_2,c, Y_3,c), subject to a global normalization constraint. The triadic interaction information in the discrete form is I_3(Y_1,c, Y_2,c, Y_3,c) = I_i(Y_1,c, Y_2,c, Y_3,c) − I_2(Y_1,c, Y_2,c, Y_3,c). Thanks to the data processing inequality (Cover and Thomas, 1991), the MI always decreases in the coarse-graining process; i.e., I(Y_i,c, Y_j,c) ≤ I(Y_i, Y_j). However, the same inequality does not necessarily hold for the I_3 of non-perfect triads.
The conditions of a perfect triad, i.e., I_2(Y_1,c, Y_2,c, Y_3,c) = 0, require the constraining condition of pair-wise independence, formulated through the 3N_c² conditions Pr(Y_i,c = s_a, Y_j,c = s_b) = 1/N_c² for every pair i ≠ j and all symbols s_a, s_b. Let us analyze it for the simplest case of perfect triads, in which N_c = 2 and B_2 = {1, 2}. After some algebra, we get all the 8 = 2³ joint probabilities, uniquely dependent on a single free positive parameter p ∈ [0, 1/4]. Any of the 8 joint probabilities equals p or (1/4 − p), respectively, when (one or three) or (zero or two) of the three variables are set to 2 ∈ B_2 = {1, 2}. The corresponding IT is I_3 = 3 log(2) + 4[p log p + (1/4 − p) log(1/4 − p)]. IT is minimized at I_3 = 0 when all the 8 probabilities are equal, with p = 1/8 (independent variables), and maximized at I_3 = log(2) for p = 0 or p = 1/4, for which only 4 = N_c² events have non-vanishing probabilities. There, any third variable, for instance Y_3,c, is totally determined as a function of the remaining pair (Y_1,c, Y_2,c). The look-up tables of Y_3,c for p = 0 and p = 1/4 are given in Table 1, in which all the N_c = 2 symbols are used (without repetition) in any row and any column, as a consequence of the requirements of independence of row indices (Y_i,c), column indices (Y_j,c) and table-cell values (Y_k,c), with i, j, k all different.
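The one-parameter family above is easy to verify numerically. The sketch below (under the probability assignment just described) evaluates I_3(p) directly from the entropies; since the family is pair-wise independent for every p, I_2 = 0 and I_3 equals the multi-information:

```python
import numpy as np

def pmf(p):
    """Joint pmf over {1,2}^3: p for patterns with an odd number of 2s,
    1/4 - p otherwise (index 1 stands for symbol 2)."""
    q = np.empty((2, 2, 2))
    for i in range(2):
        for j in range(2):
            for k in range(2):
                q[i, j, k] = p if (i + j + k) % 2 == 1 else 0.25 - p
    return q

def i3(p):
    q = pmf(p)
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -np.sum(np.where(q > 0, q * np.log(q), 0.0))
    # marginals are uniform and pair-wise independent, so I_2 = 0 and
    # I_3 = I_i = sum of marginal entropies minus the joint entropy
    return 3 * np.log(2) - h

print(i3(1 / 8))  # independent case -> 0
print(i3(0.0))    # Latin-square case -> log 2
```

The closed form I_3 = 3 log 2 + 4[p log p + (1/4 − p) log(1/4 − p)] matches this entropy computation over the whole interval p ∈ [0, 1/4].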
These non-repetition conditions are precisely those of a Latin square (Dénes and Keedwell, 1974) of N_c symbols (e.g., a Sudoku game, in which N_c = 9). The most synergetic perfect triads, i.e., those maximizing IT, are the Latin-square categorical relationships between N_c iid symbols, with the IT I_3 = log(N_c) (Bailey, 1996). This is a quite useful property, for example in the efficiency of sampling and experiment design (e.g., the design of schedules of sports leagues derives from a Latin square).
Still considering the above cases of Latin squares of N_c = 2 symbols and making the correspondence of B_2 elements as 1 → True, 2 → False, the above tables are isomorphic to the Boolean logic tables of, respectively, the exclusive disjunction and the logic equivalence (Table 1).
The emerging property common to both tables is that, for ergodic processes and during 3/4 of the time (0.75 of probability), only one out of the three variables (Y_1,c, Y_2,c, Y_3,c) is equal to a given symbol S (e.g., above or below the median), while during 1/4 of the time (0.25 of probability), the three together exhibit that symbol S.
An interesting result is that the squares of trivariate cumulants (e.g., (κ_123)², (κ_1123)²) appearing in Eq. (17) grow as the joint trivariate PDF gets closer to certain LS forms. For example, let us consider the two-symbol LS described by the Boolean equivalence between equally probable classes, delimited either by the medians µ_i of Y_i, i = 1, 2, 3, or by the medians η_i of Y_i². That LS holds for two different category attributions. A suggestive example of a perfect triad is a "love triangle" whose intervening parties meet together only in pairs, such that each one meets someone else half of the time and is alone the other half of the time. Asymmetric triads are also easily imagined from the above example, playing with the different levels of allowed collusions between members. Let us associate the variable values + (−) with the (Y_1,c, Y_2,c, Y_3,c) components when one meets someone (is alone). These constraints lead to (Y_1,c, Y_2,c, Y_3,c) being pair-wise independent, fulfilling the conditions of a perfect triad with I_3(Y_1,c, Y_2,c, Y_3,c) = log(2). Once an external control drives them, or they decide, to form a whole group (threesome), there will be redundancy and a negative IT.
Nonlin. Processes Geophys., 22, 87-108, 2015 www.nonlin-processes-geophys.net/22/87/2015/
The aforementioned situation is somehow similar to another triadic interaction, though not a perfect one, stating that only two out of the three categorical variables are set to equal values, say occurring in phase. Considering in this case that all six possible values of (Y_1,c, Y_2,c, Y_3,c), namely (+,+,−), (+,−,+), (−,+,+), (−,−,+), (−,+,−) and (+,−,−), have equal probabilities 1/6, we see that the variables are no longer pair-wise independent, giving non-vanishing dyadic MIs.

Optimization of the interaction information

Proxies of the interaction information and its maximization
The aim of this section is to find relevant triads, in the form Y_proj = (Y_1, Y_2, Y_3), in the space of orthogonally rotated components of the N_rot-dimensional space of spherized PCs of the signal part of the original physical field.
The interaction information I_3(Y_1, Y_2, Y_3) is therefore a functional depending both on the rotation matrix R and on the joint PDF of X_u,signal, which can be quite difficult to estimate from finite samples because of the dimension N_rot of X_u,signal, i.e., the curse of dimensionality (Bellman, 1957). Therefore, we follow a projection pursuit (PP) strategy (Friedman and Tukey, 1975; Huber, 1985) by considering a projection index, i.e., a functional of the Y_proj components emulating I_3(Y_1, Y_2, Y_3) or, equivalently, a function F_opIT(R) written explicitly in terms of the variables characterizing orthogonal rotations, the generalized Euler angles (Raffenetti and Ruedenberg, 1969; Hoffman et al., 1972). This technique is also used in the dyadic source separation of climatic data (Pires, 2015). One possibility for F_opIT(R) is a truncated form of the Edgeworth expansion of IT in Eq. (17). However, for the sake of geometric simplicity, we take that proxy to be proportional to the square of some nonlinear correlation between two zero-average functions, F_1(Y_proj) and F_2(Y_proj), whose dependencies jointly include the three variables. In summary, F_opIT(R) is positive, proportional to cor(F_1, F_2)² with E(F_1) = E(F_2) = 0, and F_opIT(R) = 0, ∀R, when the joint PDF of X_u,signal is Gaussian. Under non-Gaussian conditions, the goal is thus to find max_R F_opIT(R), i.e., the maximum over the space of rotations, leading to the dominant synergetic triad in the subspace span(X_u,signal). In that space, if |cor(F_1, F_2)| is high enough, the data concentrate along a principal manifold (Hastie and Stuetzle, 1989) of dimension two, i.e., along a surface described by the regression equation F_1 = cor(F_1, F_2)(σ_F1/σ_F2)F_2 + w, where w is some independent noise and σ denotes the standard deviation.
We will consider the proxy relying on third-order cumulants, taking F_opIT(R) proportional to the squared third-order cross-cumulant [E(Y_1 Y_2 Y_3)]², i.e., to the squared triadic correlation cor_3(Y_1, Y_2, Y_3)².
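A hedged sketch of this proxy on synthetic data (the multiplicative construction of Y_3 and all names are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative synthetic triad: Y3 carries a weak multiplicative signal
# Y1*Y2, so the only non-null low-order statistic linking the three
# variables is the third cross-cumulant E(Y1 Y2 Y3).
y1 = rng.standard_normal(n)
y2 = rng.standard_normal(n)
y3 = 0.5 * y1 * y2 + rng.standard_normal(n)
y3 /= y3.std()                       # spherize the third component

def cor3(a, b, c):
    """Triadic correlation: correlation between one standardized variable
    and the (centred) product of the other two."""
    prod = b * c
    return np.corrcoef(a, prod - prod.mean())[0, 1]

c3 = cor3(y3, y1, y2)                # clearly non-null
r12 = np.corrcoef(y1, y2)[0, 1]      # pairwise correlations stay near zero
r13 = np.corrcoef(y1, y3)[0, 1]
```

For standardized uncorrelated variables, cor_3 equals E(Y_1 Y_2 Y_3) up to the normalization by the product's standard deviation, so maximizing F_opIT amounts to maximizing |cor_3|.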

Maximization of the interaction information over the space of rotations
In order to get max_R F_opIT(R), R needs to be written in terms of a control vector spanning all possible rotations of an N_rot-dimensional metric space. This is done by using the vector θ of generalized Euler angles, of dimension N_ang = (1/2)N_rot(N_rot − 1) (e.g., N_ang = 3 for N_rot = 3). Each θ_k ∈ [0, 2π[ rad is the rotation angle on the plane spanned by the pair of orthogonal components [i(k), j(k)], where k is a counting index determining that pair univocally.
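The correspondence θ → R(θ) can be sketched as an ordered product of plane (Givens) rotations (function and variable names are illustrative):

```python
import numpy as np
from itertools import combinations

def rotation_from_angles(theta, n_rot):
    """Build an n_rot-dimensional orthogonal matrix from the
    N_ang = n_rot*(n_rot-1)/2 generalized Euler angles, as an ordered
    product of plane (Givens) rotations: angle theta_k acts on the plane
    spanned by the component pair [i(k), j(k)]."""
    pairs = list(combinations(range(n_rot), 2))   # fixes pair [i(k), j(k)] per k
    assert len(theta) == len(pairs)
    R = np.eye(n_rot)
    for t, (i, j) in zip(theta, pairs):
        G = np.eye(n_rot)
        G[i, i] = G[j, j] = np.cos(t)
        G[i, j], G[j, i] = -np.sin(t), np.sin(t)
        R = G @ R
    return R

R = rotation_from_angles([0.3, 1.1, -0.7], 3)     # N_ang = 3 for N_rot = 3
```

By construction, R is orthogonal with unit determinant, so the spherization of the rotated components is preserved.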
Then, we find local maxima of F_opIT[R(θ)] by applying the quasi-Newton descent-gradient technique to −F_opIT[R(θ)], implemented by the M1QN3 Fortran subroutine (Gilbert and Lemaréchal, 1989). The gradient is coded with respect to the angle components. The rotation optimization in terms of Euler angles (Hoffman et al., 1972) is an alternative to that relying on the direct components of Y_proj constrained by orthogonality relationships through Lagrange multipliers (Jennrich, 2001; Trendafilov, 2006). The function F_opIT is not necessarily globally convex, hence admitting multiple local maxima like the contrast function (Comon, 1994) maximized in ICA. Therefore, a reasonable number N_trial of randomly chosen first-guess angle vectors is used in the maximization. In order to ensure that the absolute maximum is reached, N_trial must increase with N_rot, since the number of local maxima also increases. The derivative of F_opIT[R(θ)] with respect to θ_k is obtained by the chain rule applied to θ_k → Y_proj → (F_1, F_2) → F_opIT (Eq. 28), where (i, i′, i″) in Eq. (28b) is a permutation of (1, 2, 3) and S_i,l,(k) (line i, column l) are the entries of the derivative matrix of R with respect to θ_k. The gradient depends on pre-computed third-order moments of the un-rotated variables, hence saving time in the maximization. The algorithm can be generalized to an arbitrary control function.
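Since M1QN3 is an external Fortran library, the sketch below substitutes a crude finite-difference gradient ascent with backtracking, together with N_trial random first guesses, applied to synthetic spherized data hiding a synergetic triad (the whole construction is an illustrative assumption):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Synthetic spherized data with a synergetic triad hidden by a random rotation.
n, d = 20_000, 3
z = rng.standard_normal((n, d))
z[:, 2] = 0.6 * z[:, 0] * z[:, 1] + z[:, 2]
z[:, 2] /= z[:, 2].std()
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
X = z @ Q.T

pairs = list(combinations(range(d), 2))

def rot(theta):
    """Rotation matrix from generalized Euler angles (Givens product)."""
    R = np.eye(d)
    for t, (i, j) in zip(theta, pairs):
        G = np.eye(d)
        G[i, i] = G[j, j] = np.cos(t)
        G[i, j], G[j, i] = -np.sin(t), np.sin(t)
        R = G @ R
    return R

def FopIT(theta):
    """Proxy: squared third cross-cumulant of the rotated triad."""
    Y = X @ rot(theta).T
    return np.mean(Y[:, 0] * Y[:, 1] * Y[:, 2]) ** 2

def ascend(theta, iters=200, h=1e-5):
    """Crude gradient ascent with backtracking (stand-in for M1QN3)."""
    lr, f = 1.0, FopIT(theta)
    eye = np.eye(len(theta))
    for _ in range(iters):
        g = np.array([(FopIT(theta + h * e) - f) / h for e in eye])
        cand = theta + lr * g
        fc = FopIT(cand)
        if fc > f:
            theta, f, lr = cand, fc, lr * 1.2
        else:
            lr *= 0.5
    return f

# N_trial random first guesses guard against local maxima
best = max(ascend(rng.uniform(0.0, 2 * np.pi, len(pairs))) for _ in range(10))
cor3_found = np.sqrt(best)
```

With enough restarts, the recovered |cor_3| approaches the value planted in the hidden frame.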

Model physics
The previous information-theoretic analysis for the statistical diagnosis of triadic interactions will now be applied to a minimal one-dimensional (1-D) fluid motion model, retaining the nonlinear advective part of the Navier-Stokes equations (NSEs) and subjected to constant external forcing and linear dissipation. Under that framework, we directly relate triadic correlations to the physically meaningful conditions of triadic wave resonance (Bartello, 1995), thus indicating the potential relevance of these diagnostics when applied to more complex models or to observational data sets.
For that purpose, we use the 40-variable Lorenz-95 model (Lorenz, 1995), written in terms of N = 40 discretized, equally spaced values X_i(t), i = 1, ..., N, of the fluid velocity field u(x, t) around a latitude circle with circular periodic conditions, where x ∈ [0, 2π[ is longitude and t is time. The model equations are

dX_i/dt = (X_{i+1} − X_{i−2})X_{i−1} − X_i + F,   (30)

where F is the constant forcing and indices are taken modulo N. The coefficients of Eq. (30) (e.g., one unit for the dissipative term) are adjusted in such a way that one time unit roughly represents 5 days of large-scale atmospheric circulation (Lorenz, 1995). Here, we take F = 8, also used in predictability (Lorenz and Emanuel, 1998) and data assimilation studies (Evenson and Fario, 1997; Van Leeuwen, 2010), leading to a chaotic attractor with an average perturbation-doubling period of 0.42 time units (2.1 days). The Lorenz-95 model has a single unstable fixed state X_i = F, ∀i, with the corresponding perturbations of different wavelengths exchanging energy among themselves and propagating eastward (towards higher indices). Eq. (30) is invariant under the transformations i → i + l due to the circular symmetry; hence, the statistics over the attractor's PDF reflect it through the invariance of the marginal PDFs ρ_Xi with respect to i. Similarly, the 2-D and 3-D projected PDFs ρ_Xi,Xl and ρ_Xi,Xl,Xn, with i ≤ l ≤ n, are arbitrary functions of the types f_1(l − i) and f_2(l − i, n − l), respectively. Moreover, the quadratic terms are conservative, leading to the kinetic energy equation

d/dt [(1/2) Σ_i X_i²] = −Σ_i X_i² + F Σ_i X_i.   (31)

Taking infinite time averages of Eqs. (30) and (31), i.e., averages over the attractor's ergodic PDF, and denoting the average of X_i by µ_X, its variance by σ_X² and its shift correlations by c_X,l ≡ cor(X_i, X_{i+l}), we get, thanks to the circular symmetry,

σ_X²(c_X,2 − c_X,1) = µ_X − F,   (32a)
σ_X² + µ_X² = F µ_X.   (32b)

All the theory of Sects. 2 and 3 applies, with the state vector X ≡ (X_0, ..., X_{N−1})′ substituted by the perturbed vector X − X̄ = X − 1µ_X, where 1 is a vector of ones.
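A minimal sketch of Eq. (30) integrated with the fourth-order Runge-Kutta scheme, verifying that the advective (quadratic) terms conserve kinetic energy (the spin-up length and random seed are arbitrary illustrative choices):

```python
import numpy as np

N, F = 40, 8.0

def l95_tendency(x):
    """dX_i/dt = (X_{i+1} - X_{i-2}) X_{i-1} - X_i + F, circular indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = l95_tendency(x)
    k2 = l95_tendency(x + 0.5 * dt * k1)
    k3 = l95_tendency(x + 0.5 * dt * k2)
    k4 = l95_tendency(x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(0)
x = F + 0.01 * rng.standard_normal(N)   # perturbed unstable fixed point X_i = F
dt = 0.01
for _ in range(10_000):                 # 100 time units of attractor relaxation
    x = rk4_step(x, dt)

# The advective terms are conservative: they inject no kinetic energy.
adv = (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1)
energy_flux = float(np.sum(x * adv))    # ~0 up to round-off
```

The exact cancellation of Σ_i X_i (X_{i+1} − X_{i−2})X_{i−1} follows from reindexing the two triple sums, which the numerical residual confirms.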

Model Fourier transform
Some conclusions can be drawn from a Fourier analysis of Eq. (30). The solutions of the model are N-shift periodic; hence, a discrete Fourier transform (DFT) (http://en.wikipedia.org/wiki/Discrete_Fourier_transform) applies, expanding X_i in terms of discrete complex exponentials,

X_i = (1/N) Σ_k T_X,k Φ_k,i,  T_X,k = Σ_i X_i Φ*_k,i,   (33a, b)

where i = √−1 in the exponentials, the star denotes the complex conjugate, k is the integer wave number and Φ_k,i ≡ exp[i(2π/N)ki] are orthogonal sinusoidal functions, collected in the matrix Φ of column vectors and satisfying the normalization (1/N) Σ_i Φ_k,i Φ*_k′,i = δ_k,k′. Representing the state and DFT vectors by X ≡ (X_0, ..., X_{N−1})′ and T_X ≡ (T_{−N/2+1}, ..., T_{N/2})′, respectively, we collect Eq. (33a, b) in the condensed representation X = (1/N)Φ T_X, where T_X = Φ*X, with the orthogonality condition condensed as Φ*Φ = ΦΦ* = N I_N, where I_N is the N-dimensional identity matrix and the star means the adjoint, i.e., the transpose of the complex conjugate. Now, we present fundamental results relying on single- and cross-DFT statistics, which follow from the circular invariance of the state vector statistics. The instantaneous DFT of the l-shifted field is T_{X_{i+l},k} = T_{X_i,k} exp[i(2π/N)lk], ∀l ∈ Z. However, its time average (hereafter denoted by ⟨·⟩) must be independent of l, leading to ⟨T_{X_{i+l},k}⟩ = ⟨T_{X_i,k}⟩ = ⟨T_{X_i,k}⟩ exp[i(2π/N)lk]; hence, the wavy DFT averages vanish, i.e., ⟨T_{X_i,k}⟩ = 0 for k ≠ 0. The sole non-null mean DFT is that corresponding to the average: ⟨T_{X_i,0}⟩ = N µ_X.
Similarly, taking the product of two DFTs, the moment ⟨T_{X_{i+l},k} T_{X_{i+l},k′}⟩ must be independent of l due to the circular symmetry, leading to ⟨T_{X_i,k} T_{X_i,k′}⟩ = ⟨T_{X_i,k} T_{X_i,k′}⟩ exp[i(2π/N)l(k + k′)]; hence, all pairs of DFTs are uncorrelated, except those in which one is the complex conjugate of the other (k′ = −k), giving the average Fourier power P_k ≡ ⟨T_{X_i,k} T_{X_i,−k}⟩ = ⟨|T_{X_i,k}|²⟩. The uncorrelatedness of the DFTs has a relevant consequence regarding the principal components (PCs) of the state vector X(t). The standardized PCs, merged in the vector X_u(t), are uncorrelated with each other and have unit variance. The standardized DFTs are collected in the N-dimensional zero-average vector T_X,u ≡ P_X^{−1/2}(T_X − ⟨T_X⟩), where P_X ≡ diag(P_{N/2−1}, ..., P_1, var(T_0), P_1, ..., P_{N/2}) is the covariance matrix of T_X.
The standardized DFTs are uncorrelated and have unit variance too, leading to the covariance matrix between complex variables ⟨T_X,u T*_X,u⟩ = I_N. Therefore, since both X_u(t) and T_X,u are statistically spherized, X_u(t) can be expressed as an orthogonal rotation of T_X,u, i.e., X_u = R T_X,u, where R is an N-dimensional complex unitary matrix, i.e., RR* = R*R = I_N. Now, using the notation of Sect. 2.1, the vector of PCs is X_PC = Λ^{1/2} X_u = W′(X − ⟨X⟩) (Eq. 34), where, as before, W is the matrix of column EOF vectors and Λ is the diagonal matrix of PC variances. Eq. (34) holds at any time, i.e., for any value of X − ⟨X⟩; hence, the matrices multiplying it in Eq. (34b) and (c) must be equal. Since the EOFs are spatially orthogonal, one has W′W = I_N. If the Fourier spectrum is non-degenerate, the EOFs corresponding to a certain wave number k = 1, ..., N/2 − 1 have degeneracy 2, the EOFs being sinusoidal functions in quadrature, with values at the ith point proportional to cos[(2π/N)ki + φ_k] and sin[(2π/N)ki + φ_k], where φ_k is a given phase. The PCs then come simply as the cosine and sine transforms. For degenerate spectra, the EOFs mix different wave numbers.
The product of three DFTs must also be constrained by the circular symmetry; i.e., the moment ⟨T_{X_{i+l},k} T_{X_{i+l},k′} T_{X_{i+l},k″}⟩ must be independent of l, leading to ⟨T_{X_i,k} T_{X_i,k′} T_{X_i,k″}⟩ = ⟨T_{X_i,k} T_{X_i,k′} T_{X_i,k″}⟩ exp[i(2π/N)l(k + k′ + k″)]. It means that, for k, k′, k″ ≠ 0, the triadic correlations given by Eq. (18), and consequently the interaction information between the standardized DFTs T_{X_i,k}, T_{X_i,k′}, T_{X_i,k″}, can only be non-null if the wave-number sum satisfies k + k′ + k″ = 0, which is equivalent to the resonance condition for wave numbers (Bartello, 1995). The PCs corresponding to wave numbers under triadic resonance must also exhibit non-null triadic correlations. Similar relationships can be obtained for quartic interactions (k + k′ + k″ + k‴ = 0), to be identified with the quartic interaction information appearing in the expansion of Eqs. (7) and (8).
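The selection rule can be illustrated directly: averaging the triple product of DFTs over all circular shifts of an arbitrary field (mimicking the shift-invariant statistics) annihilates every triad except those with k + k′ + k″ ≡ 0 (mod N). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
y = rng.standard_normal(N)    # one arbitrary field realization

# DFTs of all N circular shifts; averaging over shifts emulates the
# circular symmetry of the attractor statistics.
T = np.array([np.fft.fft(np.roll(y, l)) for l in range(N)])

def triple(k1, k2, k3):
    """Shift-averaged triple product of DFT coefficients."""
    return np.mean(T[:, k1 % N] * T[:, k2 % N] * T[:, k3 % N])

resonant = abs(triple(3, 5, -8))       # 3 + 5 - 8 = 0: survives averaging
non_resonant = abs(triple(3, 5, 7))    # 3 + 5 + 7 = 15 != 0 mod N: cancels
```

Each shift multiplies the triple product by exp[−i(2π/N)l(k1 + k2 + k3)], whose mean over l vanishes exactly unless the wave numbers resonate.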
There is a vast literature on resonant interaction theory (RIT) in different contexts (Ziman, 1960; Ball, 1964; Hammack, 1993). Let us briefly describe triadic wave resonance. Linearizing the evolution Eq. (30) around the fixed point X_i = F and admitting small-amplitude propagating wave perturbations of the form C exp{i[(2π/N)ki − ωt]}, one obtains a dispersion relationship ω = ω(k). Then, by superposing two waves with the (wave number, frequency) pairs (k_1, ω_1) and (k_2, ω_2), their weakly nonlinear interaction leads to the growth of waves (k_3, ω_3) satisfying the resonance conditions k_1 ± k_2 ± k_3 = 0 and ω_1 ± ω_2 ± ω_3 = 0. Here, a non-null triadic correlation and synergetic interaction have the meaning of a triadic resonance, in which one wave comes as the physical result of the other two. However, the triads are not perfect, because the DFTs are uncorrelated but not independent, due to non-null quartic moments. The Fourier power can be fully explained by triadic wave interactions. In fact, consider the evolution equation of the kth DFT, obtained by taking the DFT of Eq. (30) (Eq. 37a). Multiplying Eq. (37a) by 2T_{X_i,−k} and taking the real part (Re), one gets the tendency of the Fourier power (Eq. 38). Finally, the time average of Eq. (38) leads to the budget of the DFT powers (Eq. 39), in which the energy of the waves (k ≠ 0) is totally due to triadic interactions.

Principal component analysis
The aforementioned theoretical results are now tested in practice, especially in what concerns the values of the triadic correlation and interaction information. First, we integrated Eq. (30) for a long period of 7200 time units (∼ 98 years), after an attractor relaxation of 100 time units, using the fourth-order Runge-Kutta method with a time step Δt = 0.01 (∼ 1.2 h). The sampling average and variance are µ_X = 2.34 and σ_X² = 3.63² = 13.18, respectively, while the one- and two-shift correlations are c_X,1 = 0.063 and c_X,2 = −0.361, with Eq. (32a, b) verified quite accurately.
In order to compress the data and reduce redundancy, we used running averages of length 0.2 time units (1 day), leading to a sample of 36 000 (daily) realizations. For that period, the lag correlation between running averages is reduced to cor[X_i(t), X_i(t + 0.2)] = 0.6, producing a quite weak variance reduction, since the running-average variance, 12.14, remains quite close to σ_X²; hence, Eq. (32a, b) still holds to within ∼ 10 % relative error.
The equation governing the running averages is identical to Eq. (30) after adding running time covariances (equivalent Reynolds stresses) that are much smaller than the average terms, since the decorrelation time (∼ 2 time units) is much larger than the running-average length. Therefore, the conclusions of Sect. 4.2 remain plausible.
A PCA of this data set is performed, yielding the EOFs W_k(x) and explained variances λ_k, k = 1, ..., N. As demonstrated for non-degenerate Fourier spectra, the theoretical EOFs must be sinusoidal functions of wave numbers k = 0, ..., N/2 = 20.
The graphs of the data-derived EOFs are presented in Fig. 3 (wave numbers 1-8) and Fig. 4 (wave numbers 9-20). They are quite close to sinusoidal functions and appear in pairs corresponding to wave numbers k = 1, ..., N/2 = 20, identified by visual inspection through counting the number of oscillations around the latitude circle. There is some wave mixing in the EOF graphs due to the quasi-degeneracy of the variance spectrum, finite sampling and the spectral convolution produced by the running averaging.
The PC variances λ_k for each pair of degenerate EOFs are shown in Fig. 5 for increasing values of the wave number k. The dominant mode is k = 8, quite near the central wave number 10. The variances then decrease both for increasing k > 8 up to k = 20 and for decreasing k < 8 down to k = 1. From Fig. 5, we see that pairs of smaller variances tend to be degenerate; i.e., those associated with k and 21 − k are quite similar. In fact, from Fig. 4, EOFs with a high wave number k tend to be modulated by a smaller wave number, close to 21 − k (e.g., k = 19 and k = 20). The differences between degenerate variances are quite small and due to finite-sampling effects.
The autocorrelation function (ACF) of the PCs X_PC,i, i = 1, ..., N is oscillatory, with a typical period, and tends to vanish for growing time lags. In Fig. 5, we present the decorrelation time (in time units), given by the lag beyond which the absolute value of the autocorrelation drops below 0.05. It is higher for the most variant PCs and generally decreases with the PC variances, as also occurs with the ACF periods. Since the largest decorrelation time is ∼ 7.5 units, the number of temporal degrees of freedom is N_dof ∼ 7200/7.5 = 960.

Figure 6. Scatterplots of the dominant wave number P_RS(p, q, r) and the wave resonance number δ_RS(p, q, r) as a function of the triadic correlation C_RS(p, q, r) for each triad of PCs.

Triadic correlation and wave resonance
In order to assess the possibility of triadic resonance, we compute the triadic correlations from Eq. (18) between the PCs associated with a triad of wave numbers p, q, r. However, the wave phase must be the appropriate one for resonance; hence, we take the maximum over the six possible triadic correlations obtained by combining the pairs of degenerate PCs associated with each wave number. Let us denote by w_i the wave number associated with the ith PC (quite clear in our case), where i grows for decreasing variance, and by X_u,i the standardized ith PC. That maximum is defined for a triad (p, q, r) of wave numbers as C_RS(p, q, r) ≡ max_{w_i=p, w_j=q, w_k=r} |cor_3(X_u,i, X_u,j, X_u,k)|. Moreover, in order to study the dependence of the correlations on the wave numbers, we present the wave number of the PC of maximum variance in the triad, P_RS(p, q, r) = arg max_{w_i=p, w_j=q, w_k=r}(λ_i, λ_j, λ_k), and the wave resonance number δ_RS(p, q, r), given by the minimum of |±p ± q ± r| over the sign combinations. The wave resonance condition holds for δ_RS(p, q, r) = 0. The scatterplot of P_RS(p, q, r) and δ_RS(p, q, r) against C_RS(p, q, r) for all combinations p, q, r is shown in Fig. 6. The 95 % confidence threshold of significant correlation is 1.96/√N_dof = 0.063. From the values of P_RS, we conclude that the most variant PCs, i.e., those with wave numbers P_RS ∈ {7, 8, 9, 10}, intervene in the triads of largest C_RS values, which verify the resonance condition δ_RS = 0, consistent with the results of Sect. 4.2. A few statistically significant values of C_RS occur for δ_RS ≠ 0, which could be due to wave resonance coming from non-dominant wave numbers contaminating the EOFs.
The estimated triadic IT (see the estimator in Appendix B) and the triadic absolute correlation |cor_3| in the validation period (the second half of the total period, with 18 000 samples and N_dof ∼ 480) are shown in Fig. 7. The thresholds of 95 %-significant correlations and ITs are ∼ 0.09 and ∼ 0.008, respectively (see Figs. B1 and B2). The three largest values of C_RS(p, q, r) are 0.23 for p = q = 10, r = 20; 0.22 for p = q = 9, r = 18; and 0.21 for p = 8, q = 9, r = 17, all satisfying the wave resonance condition (WRC). Note that, when p = q, one uses the PCs in quadrature for the same wave number.
For |cor_3| > 0.12, IT is generally a growing function of |cor_3|, matching quite well the equivalent correlation |cor_3| ∼ cor_3eq given by Eq. (20) and showing that the interaction information comes from the MI between one PC and the product of the remaining two.

Optimized triads
In this section, we apply the algorithm of Sect. 3.2 in order to find orthogonal rotations of standardized PCs (within a given set S introduced in Sect. 2.1) that maximize |cor_3| and IT. As expected, much larger values can be obtained than with the non-rotated PCs. For that purpose, we consider a sequence of nine nested sets S_1 ⊂ S_2 ⊂ ... ⊂ S_9 of rotated PCs, each one corresponding to a set S_wn−i, i = 1, ..., 9, of wave numbers chosen from the triads (p, q, r) providing the highest C_RS values, since the approximation |cor_3| ∼ cor_3eq verifies quite well (as in Fig. 7 for high correlations).
As previously discussed, the triads are not perfect, owing to the lack of statistical dyadic independence between the PCs and between their rotations. Therefore, in order to evaluate how closely the triads (Y_1, Y_2, Y_3) of rotated normalized PCs approximate asymmetric triads, we estimate the values I_3−as2 and I_3−as1 (Eq. 10a and b) in the validation period (see Fig. 8) by measuring the maximal values of the types I(Y_i, Y_j |Y_k) and I[(Y_i, Y_j), Y_k], with (i, j, k) being a permutation of (1, 2, 3). The non-Gaussian dyadic MIs come from high-order cumulants that cannot be made to vanish by rotations alone. The total MII is also evaluated, likewise growing with the concomitant triadic correlation.
As explained in Sect. 2.6, a high value of |cor_3(Y_1, Y_2, Y_3)| means a high value of the absolute correlation between the sign of any product, sgn(Y_i Y_j), and the sign of the remaining variable, sgn(Y_k). This means that the binary random variables Y_1c = sgn(Y_1), Y_2c = sgn(Y_2) and Y_3c = sgn(Y_3) also tend to exhibit high interaction information, with the bulk of the probability disposed in the entries of a two-category Latin square.
In order to show that, we estimate the discrete version of the interaction information, I_3c ≡ I_3(Y_1,c, Y_2,c, Y_3,c) (Eq. 22), in the validation period for the different sets (see Fig. 8). All the estimated information measures grow consistently with the size of the optimization space. Next, we study the physical meaning of the optimized triads. For a generic PC set S (within the selected ones S_k, k = 1, 2, 3, 5, 7, 9), we plot in Fig. 9 the loadings given by Eq. (1), where W_i,j is the jth value multiplying the perturbed value of the original field at the ith longitude.
For the sets S_1 and S_2, the loadings cover the whole latitude circle, whereas for S_3, S_5, S_7 and S_9 they are quite restricted to local segments. In all cases, the loading fields oscillate approximately sinusoidally within a relevant interval (circular or local), with the numbers of oscillations p_k, k = 1, 2, 3 satisfying the resonance condition. For example, for the sets S_5, S_7 and S_9, we obtain triadic resonant local wave trains. Despite the pair-wise uncorrelatedness, the three variables as a whole are clearly not independent. In order to highlight that, we show in Fig. 11 the normalized histogram (in logarithmic scale) of the product of the mutually uncorrelated variables, Y_1 Y_2 Y_3. The PDF of that product is highly asymmetric and leptokurtic (with higher kurtosis than a Gaussian), presenting negative values of the average and skewness, −0.44 and −2.66 respectively, and a kurtosis excess of 10.16. That skewness means the occurrence of quite enhanced conditions of local resonance. The triadic correlation has the consistent value cor_3[Y_1(t), Y_2(t), Y_3(t)] ∼ −0.44.
That negative triadic correlation is compatible with the fact that the signs of (Y_1, Y_2, Y_3) lie most of the time in one of the four octants (+,+,−), (+,−,+), (−,+,+) and (−,−,−), each occurring with probability ∼ 17 %, for a total of 68 %. Those states alternate randomly throughout the time series, as is clear in Fig. 10. About ∼ 51 % of the time, one of the three variables lies in the negative phase while the other two are positively phased. When one of those octants occurs, the Boolean exclusive disjunction sgn(Y_1) ⊕ sgn(Y_2) = sgn(Y_3) holds, corresponding to a Latin square of N_c = 2 categories as described in Sect. 2.6. The other four octants are more infrequent, each occurring with a probability of ∼ 8 %. The most probable octants lie near the local maxima of the joint PDF ρ_{Y_1,Y_2,Y_3} of the optimized triad, as deduced from the PDF iso-surface shown in Fig. 12. The twisted PDF iso-surface presents similarities to that of a typical perfect non-Gaussian triad, as in Fig. 2. Finally, Theorem 1 (Eq. 3), about the invariance of the total negentropy over the rotation space, states in this case that, while the optimized triad in wave resonance (corresponding to Y_proj) maximizes non-Gaussianity, the remaining PC components not included in the triad tend to be closer to Gaussians.
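The octant bookkeeping above can be reproduced qualitatively on a synthetic triad with negative triadic correlation (the construction and the coefficient 0.6 are illustrative assumptions, tuned only to mimic a cor_3 of about −0.5):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
y1 = rng.standard_normal(n)
y2 = rng.standard_normal(n)
y3 = -0.6 * y1 * y2 + rng.standard_normal(n)   # negative triadic correlation

# Binarize: Y_ic = sgn(Y_i), coded as bits (1 for +, 0 for -)
s = (np.stack([y1, y2, y3]) > 0).astype(int)
p = np.bincount(4 * s[0] + 2 * s[1] + s[2], minlength=8) / n
p = p.reshape(2, 2, 2)                          # joint PMF over the 8 sign octants

def H(q):
    """Shannon entropy (nats) of a discrete PMF array."""
    q = q[q > 0]
    return -np.sum(q * np.log(q))

# Discrete interaction information (synergy-positive convention)
I3c = (H(p.sum(axis=2)) + H(p.sum(axis=1)) + H(p.sum(axis=0))
       - H(p.sum(axis=(1, 2))) - H(p.sum(axis=(0, 2))) - H(p.sum(axis=(0, 1)))
       - H(p))

# Probability mass on the four "Latin-square" octants
# (+,+,-), (+,-,+), (-,+,+), (-,-,-):
latin = p[1, 1, 0] + p[1, 0, 1] + p[0, 1, 1] + p[0, 0, 0]
```

The four Latin-square octants then concentrate roughly two-thirds of the probability, and I_3c is clearly positive (synergy), in qualitative agreement with the optimized triads.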

Discussion and conclusions
Multivariate non-Gaussian PDFs of climate and geofluid data allow for the possibility of nonlinear correlations between two, three or even more mutually linearly uncorrelated variables. Those non-trivial correlations, or interactions lato sensu, make variables statistically dependent, leading to positive mutual information and multi-information values shared by two, three or more variables. The triadic MII, among three variables, can be totally explained by pairs (redundant triad) or, in the opposite situation, by tripartite synergies (perfect triad), i.e., by an emerging property across the three pair-wise independent variables (e.g., a triangle of lovers who meet only in pairs and in an equitable way). In that situation, the MII reduces to the interaction information (IT), which in general measures the synergy between variables after excluding the information coming from proper subsets of those variables. Perfect triads are impossible in the Gaussian world, since pair-wise non-correlation is then sufficient for global independence. An example of a perfect triad between binary random variables (Y_1, Y_2, Y_3) occurs when any of the variables is the result of a Boolean exclusive disjunction and/or a logical equivalence of the two others. In both cases, the look-up table of results is a Latin square (in which all categories are used, without repetition in any row or column), under which IT is maximal.
For the continuous uncorrelated variables used in the paper, IT expands as a function of cross-cumulants of order 3 and higher (all vanishing for Gaussian PDFs) that mix the three variables together. The occurrence of perfect triads is quite extreme and difficult to observe in nature because of the constraints of total pair-wise independence, especially in systems of high dimensionality, where time or interaction averaging tends to render variables more Gaussian by the central limit theorem. Consequently, nonlinear correlations, non-Gaussianity and triadic interactions are somewhat hidden in a multi-dimensional system; hence, they can only be put into evidence by suitable variable transformations and projections.
Therefore, in order to optimize triads and maximize their IT, we developed and tested a methodology for finding highly synergetic non-Gaussian triads (Y_1, Y_2, Y_3) in the space of orthogonal rotations of uncorrelated spherized variables (e.g., normalized PCs). Uncorrelatedness and total negentropy are kept invariant over that rotation space (Theorem 1). The direct estimation of IT for arbitrary rotated variables issued from a high-dimensional space is quite difficult due to the "curse of dimensionality". Therefore, instead of directly maximizing IT, we maximize a proxy relying on cross-cumulants mixing the three variables: here, the square of the cumulant E(Y_1 Y_2 Y_3). This has the advantage of being proportional to the square of a geometrically intuitive triadic correlation, cor_3(Y_1, Y_2, Y_3), between uncorrelated variables and of satisfying the Schwarz inequality. The maximization of the control function is performed by gradient-descent-based optimization in the space of orthogonal rotations, which is totally spanned by the generalized Euler angles of rotation, used as control variables.
In the application, we estimated and optimized triads for the output of the Lorenz-95 minimal fluid motion model. It is a one-dimensional model with nonlinear advection, linear dissipation and constant forcing, integrated along a latitude circle. The circular symmetry of the statistics leads to a set of twice-degenerate empirical orthogonal functions that are sinusoids, each with an associated wave number; the PCs are therefore discrete Fourier transforms (DFTs). Non-null, statistically significant values of E(Y_1 Y_2 Y_3) and of the estimated IT are obtained when the integer zonal wave numbers k_1, k_2 and k_3 satisfy the wave resonance condition (WRC) k_1 ± k_2 ± k_3 = 0, thus providing a physical meaning to the non-Gaussian statistical triads in the studied case. For this model, high IT values are quite closely approximated by an exclusive function of cor_3(Y_1, Y_2, Y_3).
Enhanced triads, issued from sets of rotated PCs satisfying the WRC, are thus obtained by the optimization method. The triad components are obtained as inner products of the field with loading patterns of sinusoidal type, locally confined to segments of the latitude circle and also satisfying the WRC between the corresponding numbers of oscillations (the local wave numbers). Consequently, triadic interaction corresponds here to a generalized criterion and diagnostic of triadic wave-train resonance, obtained entirely outside the classical approach of resonant interaction theory (RIT) (Hammack, 1993), which issued from the theory of linear and quasi-linear waves in fluids. The statistically based triad optimization method is therefore able to extract, from a fully chaotic regime, spells of triadic wave resonance behavior and the modes in which that behavior is most intense on average. This aspect is particularly relevant for the use of triadic components, or functions of them (e.g., the triadic product Y_1 Y_2 Y_3) and their temporal memory, in schemes to potentially improve predictability. This is because resonance can be seen as a source of predictability of a certain oscillatory behavior, coming either from an external forcing with an appropriate resonance frequency or from the interaction between internal field waves. Moreover, the triads' components can be used to better describe the non-Gaussian distortions of the PDF of a system's attractor with respect to the fitted multivariate Gaussian ellipsoid, with the leading EOFs as axes, where the bulk of the probability lies. Therefore, nonlinear data explanatory variables (e.g., climatic and large-scale indices) given as functions of the triadic components, like Y_1 + c_1 Y_2 Y_3, where c_1 is an appropriate constant, can potentially be useful in forecasting and downscaling schemes, which deserves to be tested in further studies.
Finally, the triad optimization method is quite general and applies to any multivariate random vector. In particular, for climatic fields over the globe, triads generalize the classical concept of teleconnections, relying on far-distant correlations, into the new concept of non-Gaussian triadic teleconnections, in which three far-distant uncorrelated variables take part; this is reserved for work in preparation.
C. A. L. Pires and R. A. P. Perdigão: Non-Gaussian interaction information

The PDF is estimated by the Gaussian kernel density estimator

ρ_Ug(U_g) = N_dof^{−1} Σ_{l=1}^{N_dof} K(U_g, U_g,l), with
K(U_g, U_g,l) = [(2π)^{D/2} h^D det(S_g)^{1/2}]^{−1} exp[−(U_g − U_g,l)′ S_g^{−1} (U_g − U_g,l)/(2h²)],   (B3)

satisfying the normalization ∫_{R^D} K(U_g, U_g,l) dU_g = 1, where U_g,l = (U_g,1l, U_g,2l, ..., U_g,Dl)′, l = 1, ..., N_dof are the N_dof realizations of the Gaussianized vectors expanded in terms of the D components, S_g is the D × D estimated covariance matrix of U_g and h = {4/[N_dof(D + 2)]}^{1/(D+4)} is the optimal Gaussian bandwidth. The MII is a D-dimensional multiple integral, evaluated here through the Gauss quadrature formula with the Hermite weight function. For a generic integrable function F(U_g) ≡ F(U_g,1, ..., U_g,D), this is written as

∫_{R^D} F(U_g) exp(−||U_g||²) dU_g ≈ Σ_{i_1=1}^{Q} ... Σ_{i_D=1}^{Q} W_{i_1} ... W_{i_D} F(Q_{i_1}, ..., Q_{i_D}),   (B4)
where Q is the number of quadrature points, Q_1, ..., Q_Q ∈ R, and W_1, ..., W_Q are the weights. The formula is exact when F(U_g) is a multivariate polynomial of degree less than 2Q. For the MII estimation, F(U_g) = ρ_Ug(U_g) ln[ρ_Ug(U_g)], with the PDF previously estimated, is used in the quadrature formula after factoring out the Hermite weight. Thus, the estimator I_Quad (Eq. B5) involves a D-dimensional sum over the quadrature nodes Q ≡ (Q_{i_1}, ..., Q_{i_D})′, indexed by the D indices i_1, ..., i_D. For this kind of integrand discretization, we have taken Q = 20.
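A sketch of this estimator in the null case (D = 2, independent Gaussian components, so the true MII vanishes; the sample size and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n_dof, D, Q = 2_000, 2, 20
U = rng.standard_normal((n_dof, D))   # null case: independent Gaussianized components

# Gaussian-kernel KDE as in Eq. (B3), bandwidth h = {4/[N_dof(D+2)]}^{1/(D+4)}
h = (4.0 / (n_dof * (D + 2))) ** (1.0 / (D + 4))
S = np.cov(U.T)
S_inv = np.linalg.inv(S)
norm = 1.0 / ((2 * np.pi) ** (D / 2) * h ** D * np.sqrt(np.linalg.det(S)))

def rho(u):
    """KDE value at point u (average of Gaussian kernels over the sample)."""
    d = U - u
    quad = np.einsum("ni,ij,nj->n", d, S_inv, d)
    return norm * np.mean(np.exp(-quad / (2.0 * h * h)))

# Gauss-Hermite product rule (weight e^{-||u||^2}) for the joint entropy
q_nodes, w = np.polynomial.hermite.hermgauss(Q)
H_joint = 0.0
for qi, wi in zip(q_nodes, w):
    for qj, wj in zip(q_nodes, w):
        r = rho(np.array([qi, qj]))
        if r > 0.0:
            # factor exp(qi^2 + qj^2) removes the Hermite weight from rho*ln(rho)
            H_joint -= wi * wj * np.exp(qi ** 2 + qj ** 2) * r * np.log(r)

# MII = sum of marginal entropies minus joint entropy; the Gaussianized
# marginals are ~N(0,1), each with entropy (1/2) ln(2*pi*e).
MII = D * 0.5 * np.log(2 * np.pi * np.e) - H_joint
```

The small residual MII reflects the kernel-smoothing and quadrature biases quantified next.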
In the paper, the dimension is D = 2 or 3. This estimator has shown fairly good convergence speed for PDFs that are close to multivariate Gaussians or that do not have sharp gradients or discontinuities. Furthermore, the estimator performs better for small values of compactness (i.e., of data concentration along a manifold), since that leads to smoother PDFs. The biases of I_Quad are evaluated here for the case where the standardized rotated vector U (of uncorrelated components) follows a multivariate isotropic standard Gaussian N(0, I_D), leading to a vanishing MII. The biases should not differ much when the true PDF is close to a Gaussian. For that purpose, we used a Monte Carlo methodology, generating an ensemble of 1000 samples of N_dof iid realizations of U. Then, statistics (the 95 % quantile Q_95 and the average, or bias, B) of I_Quad at dimensions D = 2 and 3 were computed from the ensemble. Those statistics were obtained for the multiplicative sequence N_dof = 25, 50, 100, 200, 400, 800, 1600, 3200, 6400 and 12 800, collected altogether in the log-log graphic of Fig. B1. The 95 % quantile gives the 5 % significance-level one-sided threshold for rejection of the null hypothesis H_0 that the standardized rotated vector U is composed of totally independent components; only values above it are considered statistically significant. There is a good log-log linear fit, both for B and for Q_95, of the type C/N_dof^m, where C and m are positive constants. The bias B of the interaction information (IT) I_3(U_3), U_3 ∈ R³, is also computed, along with the quantile Q_95, appearing in Fig. B2. It is interesting to note that, for small (large) N_dof, the bias of IT is negative (positive), overestimating redundancy (synergy).
Finally, let us write Y_rot ≡ Y_proj || Y_comp, where Y_proj ≡ P_D Y_rot = (Y_1, …, Y_D) is the D-dimensional projection (D-tuple) onto the leading D components and Y_comp ≡ (Y_D+1, …, Y_Nrot) is the corresponding orthogonal complement.

Figure 1. Schemes of triad types and appropriate interactivity measures: (a) perfect, I_3(A_i, A_j, A_k); (b) asymmetric, I(A_j, A_k | A_i); (c) asymmetric, I[(A_j, A_k), A_i]; and (d) redundant, I(A_j, A_k, A_i). The triangle edges represent dyadic dependencies and the central star the IT.

Figure 2. Iso-surface 0.01 of the PDF (13) of a perfect triad with 1-D and 2-D Gaussian projections.
Both the IT and the MII come essentially from the non-Gaussian term, which is justified by Theorem 3: assuming that the absolute Gaussian correlations of (Y_1, Y_2, Y_3) are scaled as |cor(Y_ig, Y_jg)| ≡ c_g,ij = O(δ) ≪ 1, we get

with each R_k rotating a different set of d(k), k = 1, …, N, degenerated DFTs, i.e., DFTs with the same variance p(k), taken from P_X, producing the same number d(k) of degenerated, equally variant PCs with the same variance λ(k) = (1/N) p(k). Let us arrange the matrix of complex exponential vectors as a sum Θ = Θ_1 + … + Θ_N, each term corresponding to the concatenation of the d(k) rotated complex exponential vectors. Since R_k R_k'^* = I_N δ_k,k' and Θ_k^* Θ_k' = N I_N δ_k,k', one can arrange the EOF matrix as W = W_1 + … + W_N with

(Nonlin. Processes Geophys., 22, 87–108, 2015; www.nonlin-processes-geophys.net/22/87/2015/)
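The degeneracy argument can be checked numerically. The sketch below is our own construction, not the paper's model: it builds a circulant covariance with an assumed spectrum p(k) = 1/(1 + |k|), so that the Fourier modes with wave numbers ±k share the variance p(k), and verifies that any rotation within such a degenerate eigen-pair yields an equally variant PC.

```python
import numpy as np

N = 32
k = np.fft.fftfreq(N) * N                      # signed integer wave numbers
p = 1.0 / (1.0 + np.abs(k))                    # assumed spectrum, p(k) = p(-k)
c = np.real(np.fft.ifft(p))                    # first row of the circulant covariance
C = np.array([np.roll(c, i) for i in range(N)])

lam, V = np.linalg.eigh(C)
# the two eigenvectors with eigenvalue p(1) = 0.5 span a degenerate plane
idx = np.argsort(np.abs(lam - p[1]))[:2]
theta = 0.7                                    # arbitrary rotation angle inside the plane
v = np.cos(theta) * V[:, idx[0]] + np.sin(theta) * V[:, idx[1]]
print(np.allclose(C @ v, p[1] * v))            # rotated vector is still an eigenvector
```

Any orthogonal mixing of the degenerate pair therefore leaves the covariance diagonal, which is why the rotations R_k act freely within each set of d(k) equally variant PCs.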

Figure 5. PC variances and decorrelation times as a function of the corresponding wave numbers.

Figure 7. Triadic correlation (open circles) and equivalent triadic correlation (solid line) for each triad of PCs.
the higher C_RS values. Let us denote DS_wn-i ≡ S_wn-i \ S_wn-(i-1), i = 2, …, 9. Then, after analyzing triadic correlations among PCs, we obtain S_wn-1 = {10, 20}, DS_wn-2 = {9, 18}, DS_wn-3 = {8, 17}, DS_wn-4 = {7}, DS_wn-5 = {11}, DS_wn-6 = {16}, DS_wn-7 = {12}, DS_wn-8 = {19} and DS_wn-9 = {15}. The first and last sets have 4 and 24 PCs, with rotation spaces spanned by 6 and 276 Euler angles, respectively. In order to find the maximum of the control function F_opIT(R) (Eq. 25), proportional to |cor_3|^2, we use a set of N_trial = 40 first guesses of the Euler angles. The number of local maxima of F_opIT generally increases with the dimension of the space of spanning PCs S_wn-i. Figure 8 shows the maximum value of |cor_3|, optimized over rotations, for each set S_wn-i. It is computed in the validation period by using the optimized rotation matrix R (see Sect. 2.1) calibrated in the training period (the first half of the total period). Let us denote that maximum by |cor_3|_max,val,i. The rotation coefficients are computed in the calibration period and then applied in the validation period. As expected, |cor_3|_max,val,i grows with the dimension of the rotation space, reaching a value of ~0.44. The values are also very close to those computed in the training period (not shown), showing that the rotation coefficients are robust and not subject to overfitting with respect to the number of calibrated Euler angles. Let (Y_1, Y_2, Y_3) be the optimally rotated normalized PCs for each set. Figure 8 also shows the IT values I_3(Y_1, Y_2, Y_3), which are largely due to the non-Gaussian part, since the Gaussian part is ~10^-5. IT grows with the dimension of the rotation space, where I_3 ~ −(1/2) log[1 − (cor_3)^2].
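The multi-start maximization of the triadic correlation over rotations can be sketched as follows. This is a toy example of ours, not the paper's scheme: the synthetic triad, the 'zyz' Euler-angle sequence and the default BFGS optimizer are all our assumptions; only the structure (rotate standardized components, maximize |cor_3|^2, keep the best of N_trial = 40 first guesses) follows the text.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(1)
# synthetic standardized "PCs" hiding a triad: c = sign(a*b)*|g| is
# pairwise uncorrelated with a and b, yet E[a*b*c] > 0
n = 4000
a, b, g = rng.standard_normal((3, n))
c = np.sign(a * b) * np.abs(g)
U = np.vstack([a, b, c])

def neg_cor3_sq(angles, U):
    """Minus the squared triadic correlation after rotating by Euler angles."""
    Y = Rotation.from_euler('zyz', angles).as_matrix() @ U
    return -np.mean(Y[0] * Y[1] * Y[2]) ** 2

# multi-start optimization: keep the best of 40 random first guesses
best = min(
    (minimize(neg_cor3_sq, rng.uniform(0, 2 * np.pi, 3), args=(U,))
     for _ in range(40)),
    key=lambda res: res.fun,
)
cor3_max = np.sqrt(-best.fun)
```

Because the objective has many local maxima in angle space, the multi-start strategy mirrors the N_trial first guesses used in the text.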

Figure 10. Time series of the triad components during an interval of 20 time units, for the triad optimized with set 9 of PCs.

Figure 11. Normalized histogram (in logarithmic scale) of the product of the optimized triad components, for the triad optimized with set 9 of PCs.

Figure 12. Iso-surface 0.001 of the PDF of the triad components for the triad optimized with set 9 of PCs. Marginals are Gaussianized.

Table 1. Lookup tables of three Boolean variables maximizing the interaction information. The left and right (2 × 2) logic tables correspond, respectively, to the exclusive disjunction (XOR) and equivalence (XNOR) Boolean relationships.