Nonlinear Processes in Geophysics Weighted complex networks applied to seismicity : the Itoiz dam ( Northern Spain )

Abstract. On 18 September 2004, an earthquake of magnitude mbLg = 4.6 was recorded near the Itoiz dam (Northern Spain). It occurred after the first impoundment of the reservoir and has been catalogued by some authors as induced seismicity. We analyzed the seismicity in the region as weighted complex networks and tried to differentiate this event from others that occurred nearby. We calculated the main topological features of the networks formed by the seismic clusters and compared them. We compared the results with a series of simulations, and showed that the clusters were better modelled with the Epidemic-Type Aftershock Sequence (ETAS) model than with random models. We found that the properties of the different clusters are grouped according to the magnitude of the main shocks and the number of events in each cluster, and that no distinct feature could be obtained for the 18 September 2004 series. We found that the nodes with the highest strength are the most important in the networks' traffic, and are associated with the events with the highest magnitude within the clusters.


Introduction
Earthquakes are one of the most interesting natural phenomena that can be described as complex systems.A key ingredient of a complex system is the non-linear interaction between its constituents, which under special circumstances can give rise to coherent, emergent, complex behaviour patterns with a very rich structure.A way to study these structures is to describe the system as a complex network, and analyze the topological characteristics it forms.
Correspondence to: A. Jiménez (ajlloret@ual.es)In the past few years, much attention has been focused on the study of complex networks, such as the World Wide Web (WWW), the Internet, scientific collaboration networks, worldwide airport networks, etc.Researchers have mainly focused on unweighted networks, that is, networks in which every pair of nodes in the network are either connected or not connected, with weights of 1 or 0, respectively.However, many technological, biological and social systems are best described by weighted networks, whose properties and dynamics depend not only on their structures but also on the weight of the connections between the nodes (Chen and Chen, 2007).Even for purely unweighted graphs, edge weights naturally emerge as dynamical properties, when transport, random walks or other processes take place on the network (Tadic et al., 2007;Barrat et al., 2008).
In seismology, several models based on complex networks have been proposed (Abe andSuzuki, 2004a, b, 2006;Baiesi andPaczuski, 2004, 2005;Davidsen, 2008;Jiménez et al., 2008Jiménez et al., , 2009)).However, these models are unweighted complex networks.In our research we have used a weighted complex network, which describes the seismicity in a more realistic way than an unweighted network, and provides more characteristics from which to gain a more complete picture of the seismicity.
The network model we propose is as follows: each earthquake represents a node, and the link weight between nodes is a distance based on the ETAS model (Ogata, 1988(Ogata, , 1998(Ogata, , 1999)).In this research, we analyzed a small region centred on the Itoiz dam (Northern Spain), which has been extensively studied since the 18 September 2004 earthquake that occurred nearby.For example, Ruiz et al. (2006) analyzed the seismic series corresponding to the 18 September 2004 earthquake; Luzón et al. (2009Luzón et al. ( , 2010) ) and Durá-Gómez and Talwani (2010) studied the effect of the pore pressure due to the impoundment of the reservoir; Santoyo et al. (2010) calculated the stress produced by the reservoir near Itoiz.They Published by Copernicus Publications on behalf of the European Geosciences Union and the American Geophysical Union.
showed that the aftershock sequence was indeed produced by the stress transfer caused by the 18 September 2004 earthquake.Rivas-Medina et al. (2011) calculated the seismic hazard near the Itoiz dam.In a previous work (Jiménez et al., 2009), we studied the seven main seismic clusters near the Itoiz dam by modelling them as unweighted complex networks.We obtained seven clusters with small-world behaviour of the complex networks they formed.We did not find any difference between the 18 September 2004 cluster and the others, except for a higher fractal dimension of the epicentre distribution.

Methods
In this work, we improve on the study in Jiménez et al. (2009), because we obtain the values for the parameters of the ETAS model in the region.These are real values which allow us to use an appropriate distance between earthquakes to decluster the catalogue.
We then analyzed each cluster as a weighted complex network, with the inverse of the ETAS distance as the weight between two events.This distance depends on the magnitude, spatial separation and time between them.We characterized the networks by their average path length and the clustering coefficient.Afterwards, we classified the clusters according to these two features by means of a k-means algorithm (Teknomo, 2007).We also calculated the distribution of weights, strength and distances, and the betweenness (Gleich, 2007).We then compared the results with the average values produced by several simulations, both with a random model (Garlaschelli, 2009) and the ETAS model (Ogata, 1998(Ogata, , 1999;;Helmstetter and Sornette, 2002).

ETAS model
The ETAS model is the most popular contemporary model for studying aftershock sequences.It is a stochastic point process model, which has a number of parameters that are usually fitted from a training period before the model can be applied for forecasting purposes.In the framework of this model, there is no difference between aftershocks, main shocks or foreshocks (Helmstetter and Sornette, 2002).The ETAS model is a point process representing the activity of earthquakes of magnitude m 0 and larger in a region over a given period of time.The model includes background activity of constant occurrence rate µ in time (i.e.stationary Poisson process) and also includes aftershocks as described below.Each earthquake, including aftershocks of another earthquake, is followed by its aftershock activity, although only aftershocks of magnitude m 0 and larger are included in the data.The aftershock activity is represented by a nonstationary Poisson process according to the modified Omori formula in such a way that the occurrence rate (ν i (t)) of aftershocks at time t following the i-th earthquake (t i , m 0 ) is given by (Ogata, 1998): for t > t i , where the parameters K, α, c, p are constants common to all i, and m 0 is the minimum magnitude that produces an aftershock sequence.The rate of occurrence of the whole earthquake series at time t, called the conditional intensity function based on the history of occurrence H t ={(t i , m i ); t i < t}, then becomes: where µ is the background seismicity.As extensions of the ETAS model we confined ourselves to space-time response functions g (t,x,y;m) such that the superposed conditional intensity reads (Ogata, 1998): where A is the area of the study region.Note that m in Eqs.
(2-3) enters only as a parameter, not a variable.We used the response function in such a way that spatial dependency was separable from dependency on magnitude and time.We did this in order to simulate the seismicity as in Helmstetter and Sornette (2002), where the terms relating to time, space and magnitude are independent.For this purpose, we made the following hypothesis (Ogata, 1998): where x and y are the spatial coordinates of the event in 2-D.
In order to choose the best model we followed the maximum likelihood procedure described in detail by Ogata (1998).
We used a Genetic algorithm (GA) to maximize the likelihood of the model.We improved the method proposed by Ogata (1998Ogata ( , 1999) ) to obtain the ETAS parameters, because the GA is a more efficient way of calculating the optimum.GA are methods of global optimization, which have proved to be effective when the models are described by a few parameters, the problem is nonlocal (the global optimum is needed, but there are many local optima) and nonlinear, and there is no a priori knowledge of the behaviour of the function.In geophysics, and particularly in seismology, many problems often have such features.The GA used in this paper was implemented in Jiménez et al. (2005).The definitive search strategy chosen was selection by rank wheel, crossover based on fitness, and replacement by rank wheel.This strategy is moderately elitist, because the best individuals are easily selected, but there is low selection pressure.The simple GA was improved with the reinitialization of the Nonlin.Processes Geophys., 18, 477-487, 2011 www.nonlin-processes-geophys.net/18/477/2011/ population when the convergence stays blocked (Jiménez et al., 2005).Thanks to the large number of models considered, in addition to the fittest model, a mean model and its accuracy were evaluated by means of a statistical approach based on the estimation of the Marginal Posterior Probability Density (MPPD) (Dal Moro et al., 2007).The averaged model was calculated as: where θ j is the model parameter j , θ ij is the value of the parameter in the evaluated model i, and ρ(i) is the function: Where the fitness is the log-likelihood calculated for each model, following Ogata (1998).The standard deviation was calculated as: (7)

Declustering algorithm
The purpose of declustering the catalogue is to find the main clusters in it.Window-based or link-based methods are traditionally used for declustering a seismic catalogue or identifying earthquake clusters (Baiesi and Paczuski, 2005).The main problem is to identify which earthquakes are correlated, because there is no single operational way of distinguishing between aftershocks and main shocks (Zhuang et al., 2004).
Several methods based on the ETAS model have been proposed for declustering the catalogues (Zhuang et al., 2004;Jiménez et al., 2009;Console et al., 2010).We followed the method proposed in Jiménez et al. (2009), where the distance between earthquakes is given by the inverse of the probability of two earthquakes being correlated.In the present work, it is given by the function g , Eq. ( 4), with the parameters obtained previously for the region.We began by calculating these distances between all pairs of earthquakes and then used them to create a hierarchical cluster tree where two events are linked if their correlation is higher than a certain threshold.In order to group individual earthquakes to clusters, we used the single linkage method.The single linkage method is a fundamental agglomerative hierarchical clustering algorithm.By truncating the resulting cluster tree at a suitable threshold value for the distance being studied, a complex network was obtained in which each earthquake represents a single node.The threshold was chosen in such a way that the number of clusters was in agreement with the value obtained for the background seismicity µ.The estimation of the background seismicity following the ETAS model gives us a natural way of choosing the threshold necessary to decluster the catalogue.In a previous work (Jiménez et al., 2009) this step was a little arbitrary, since we chose the threshold that gave us more than one cluster.When the number of points is large, this way of finding the main clusters is less time-consuming than, for example, the k-means algorithm (MacQueen, 1967).It is also unique, unlike the k-means algorithm, which depends on the initial clusters' centres.

Characterization of weighted complex networks
Here we describe the most important parameters that describe a complex network.In an unweighted network, an important feature is the degree distribution, k i : where A is the adjacency matrix and N is the number of nodes.For a weighted network, it is important to know the distribution of the weights, w i,j , which replace the Boolean numbers A i,j , and the strengths, s i , that replace the degree of the node.The higher the weight and strength, the more closely related are the two nodes.We use g (Eq.4) as the weight between the nodes.Two of the most important quantities required to characterize a complex network are the average path length and the clustering coefficient.
The average (or characteristic) path length L is the mean length of the shortest paths (expressed in terms of the number of edges) connecting any two nodes on the graph.The shortest path between a pair (i, j ) of nodes in a network is considered as their geodesic distance G ij , with a mean geodesic distance L of: where N is the number of nodes.
In order to calculate the average path length, we used the distance given by 1/g , and put it into the Dijkstra algorithm (Dijkstra, 1959).We used the implementation given in Kay (2001).
The other important quantity, the clustering coefficient, was calculated as in Onnela et al. (2005): where ŵji = w ij /max i,j (w ij ), and k i equals N if all the nodes are connected, as is the case here.With this definition, C has the same value if the weights become binary, it is defined between 0 and 1, uses a global normalization, takes into account weights of all edges in the triangle and is invariant to weight permutation for one triangle (Saramäki et al., 2007).
To further characterize the complex network, we also calculated the betweenness distribution.This was first proposed by Freeman (1977), and represents the total number of optimal paths between any pairs of nodes passing through one node (Park et al., 2004).More succinctly (Freeman, 1977): where σ st is the number of shortest paths from s to t, and σ st (v) is the number of shortest paths from s to t that pass through a vertex v.We used the code implemented by Gleich (2007) in order to calculate it.

Network model: random weighted networks and ETAS simulation
In order to classify the seismic networks derived in this work according to their main topological features, they can be compared with the corresponding values of characteristic network theoretic measures obtained for comparable model networks, i.e. model networks with the same number of nodes as the original seismic networks.For this purpose, we will use particularly weighted random networks and simulations of the ETAS model.Since the particular network pattern of both types of models depends on random realizations of links, the properties of the resulting networks may differ between individual realizations.In order to consider this in our analysis, 100 realizations were studied in each case to obtain estimates for the expectation values and 95 % confidence levels (the latter were approximated by ±1.96 times the standard deviation of the respective values of the network-theoretic measures obtained for the individual realizations).
The Erds-Rényi (ER) random graph (Erdős and Rényi, 1959, 1960, 1961) is the prototype of all unweighted network models: in a graph with N vertices, an unweighted edge is drawn independently between any pair of vertices with equal probability P .The ER model provides a fundamental reference for the properties of real networks.The increasing interest in complex networks originates precisely because of the striking difference between the observed properties of real networks and the behaviour of the ER model (Garlaschelli, 2009).So, we needed a random reference model to enable us to compare its properties with those of the networks we obtained.Garlaschelli (2009) proposed a procedure to construct a weighted random model as follows: the number of nodes is fixed.For each pair of vertices, we start a series of Bernoulli trials with success probability P , which depends on the number of links and the sum of edge weights in the real network.Each success implies that the weight is increased between the same two vertices in one unit.As soon as a failure occurs for the first time, the sequence of trials stops and a new pair of vertices is selected.The process is repeated until all pairs have been considered.We needed a more realistic simulation of the complex networks.In order to achieve this, we simulated a network with the same number of nodes by using the ETAS model.The algorithm for the simulation is described in Ogata (1998Ogata ( , 1999) ) and Helmstetter and Sornette (2002).We assumed a decoupling between magnitude, space and time, because of its simplicity.Starting with the main event in each cluster, events were simulated sequentially.First, we calculated the conditional seismic rate λ(t) defined by: The time of the following event was then determined according to the nonstationary Poisson process of conditional intensity λ(t), and its magnitude was chosen according to the Gutenberg-Richter (1956) distribution with parameter b.To determine the position in space of this new event, we first chose its mother randomly among all preceding events with a probability proportional to their rate of aftershocks evaluated at the time of the new event.Once we had chosen the mother, we generated the distance r between the new earthquake and its mother according to the spatial part of the distribution given in Eq. ( 4).The location of the new event was determined assuming an isotropic distribution of aftershocks.

Tectonical setting and data
The study area (Fig. 1) is located in a region with a complex thin-skinned structure belonging to the South Pyrenean Zone (SPZ), and takes the form of a square of 2 • ×2 • with its centre in the 111 m high Itoiz dam (42.8 • N, 1.35 • W).The SPZ forms the external part of the Pyrenean belt and is a Tertiary unit overriding to the south of the Ebro foreland basin.This unit is north-bounded by the Palaeozoic Axial Zone (PAZ), which outcrops here as a series of individual massifs, known as the Palaeozoic Basque Massifs (PBM).The North Pyrenean Fault (NPF) is located between the PAZ and the North Pyrenean Zone -a Mesozoic unit overthrusting the Aquitaine foreland basin to the north.This fault is a major tectonic suture running east-west along the Pyrenean range that is interpreted as the superficial expression of the Iberian and European plate boundaries.Palaeomagnetic data, seismic profiles and seismicity studies suggest that the structural boundary associated with the NPF is prolonged westward through the Basque-Cantabrian basin along the Leiza Fault.Another relevant structure in the area is the Pamplona Fault (PF) that runs NNE-SSW from the Ebro basin to the PBM.It has been interpreted as a deep transverse structure separating two different structural zones, the SPZ to the east, where the most important structures mainly trend south, and the Basque-Cantabrian basin to the west, with a thicker lower Cretaceous sequence, where most structures trend northward.This structure acted as an extensional transfer fault during the Mesozoic extensional period that led to the opening of the Gulf of Biscay and the separation of the Iberian Peninsula as a subplate.The PF was subsequently involved in the Tertiary compression responsible for the Pyrenean belt uplift.The epicentral region is a Mesozoic and Tertiary cover area, located in the NE of the Pamplona basin, and composed of anticlines and synclines with the axes trending to the east, but truncated in some places by E-W to ESE-WNW fault systems (Ruiz et al., 2006).
The present day seismicity of the western Pyrenees, as reported from permanent networks, is rather moderate and mainly concentrated on the French side.In the past few decades it has been characterized by events of magnitudes up to 5.5.This activity follows an E-W oriented strip 150 km long and 5 to 15 km wide, starting at the western edge of the NPF, at a longitude of 0.1 • W, and continuing through the PBM along the Leiza Fault.Historical seismicity also appears concentrated in the border region, along the PAZ and the PBM.Six destructive events with intensity greater than VII have been reported in this area over the last 200 yr approximately.Instrumental catalogues of the westernmost Pyrenean edge show a sparse and moderate to low magnitude activity, mainly concentrated westward of Pamplona, where events are related to the central segment of the Pamplona Fault and to the Aralar thrust.An E-W seismicity belt is also found related to the Roncesvalles thrust that separates the Palaeozoic Aldudes massif from the Pamplona basin.The catalogues also report a few events related to the E-W thrust systems delineating the contact between the Pamplona and Ebro basins (Ruiz et al., 2006).
The Itoiz dam (42.80 • N, 1.36 • W) is located in Navarre, Northern Spain, 2 km north of the village of Aoiz and 25 km east of Pamplona, Construction was completed in 2003.In January 2004 its impoundment began and 8 months later, on 18 September a clustered seismic series occurred.The epicentre was located between the city of Pamplona and the Itoiz reservoir.The main shock (mbLg = 4.6, 42.85 • N, 1.45 • W) and the largest aftershock (the 30 September 2004 event) were widely felt in this region causing widespread unease amongst the general public.The Itoiz reservoir has a maximum capacity of 418 hm 3 and, once filled, covers a maximum surface area of 510 km 2 .It was constructed for irrigation and electricity generation purposes.In September 2004, the reservoir was only 15 % full, after beginning impounding water in January 2004 (Ruiz et al., 2006).
The data used were recorded by the Instituto Geográfico Nacional (IGN, 2010), and cover earthquakes from 1999 to 2008.In total, 2350 earthquakes with a magnitude greater than 1 were observed.The location of the data does not include the depth of the earthquakes.The Gutenberg-Richter distribution ( 1956) has b = 1.14, with error bound equal to 0.03.This value differs slightly from that used in Jiménez et al. (2009).The difference is due to the fact that in the present work we calculated it by fitting of the cumulative distribution, and not by using the probability density function as we did in Jiménez et al. (2009).We did this because we wanted to simulate the networks as far as possible according to the method proposed by Ogata (1998).

Results
Following the procedure detailed in Sect.2, we obtained the main clusters in the region.Afterwards, we translated the clusters into the language of complex networks: each node represents an event, and the weight between each pair of nodes is given by the inverse of the distance between the two earthquakes.This distance is based upon the ETAS model, so that it depends on the magnitude of the first earthquake, the time interval and the spatial separation between events.A larger distance represents a lower probability of two earthquakes being connected, so the weight of the link is lower.Ruiz et al. (2006) also calculated some of the parameters of the ETAS model related to time and magnitude for the 18 September 2004 series, but for that cluster only, not for the region as a whole.Here, we calculated the ETAS parameters for the region of interest.The intervals of the search were the following: p = [0.8,1.2], α = [0.5,1.5],c = [10 −3 ,10 −2 ], d = [0.1,20],q = [0,2], K = [10 −9 ,10 −4 ], and µ = [10 −6 ,10 −4 ].The resolution for each parameter was (in bits, so 2 to the power of the given resolution), 6, 6, 6, 7, 4, 15, 6, respectively.The intervals were chosen on the basis of the values obtained by Ogata (1998) for a variety of regions.The result of the maximization of the log-likelihood of the model in Eq. ( 4) Ogata (1998)   The Akaike Information Criterion (AIC) (Akaike, 1974) of the best fit is 1374, which is a low value if we compare it with the ones obtained in Ogata (1998).The average values coincide almost completely with the best fit, and are the same for the resolution given in Table 1.The error is given by the calculation of the Marginal Posterior Probability Density (MPPD) error bound (Dal Moro et al., 2007).The best fit was found in the second generation, and was stable until the 100th generation.In Fig. 2 we can see the difference between the real data and the model.We can see that the ETAS model underestimates the number of events.We are very confident that the GA reaches the best fit, so perhaps even the ETAS model does not fit our data set sufficiently well.We now roughly estimated the errors in the simulations, following Wang et al. (2010).First we generated the spontaneous (background) events that are uniformly distributed in the space-time window.The total number of spontaneous events is a Poisson random variable with a mean of µ T A, and the magnitudes are generated from the truncated Gutenberg-Richter magnitude distribution.This is the first generation.We then proceeded to generate the offspring events, by calculating the aftershock sequence for each event in the first generation, with a number of events equal to a Poisson random variable with mean Ke α(m−m 0 ) T A. The events were generated according to the algorithm proposed by Ogata (1998).This is the second generation.We then continued with the next generation until we reached a generation with no offspring or we reached the maximum number of events in the real catalogue.We then calculated the errors of the simulations as in Note that these errors are calculated for the simulations, and they are not related to the actual error of the fits.They are shown in Table 1.We could only perform 51 instances of the catalogue, with a maximum number of generations equal to 10 for the GA, because of computation limits.We observed that the errors are high, and this is reflected in the errors found in Table 4 when we use the ETAS simulation to compare it with the real networks.
With the ETAS model parameters obtained previously, we proceeded to decluster the catalogue.We wanted to find the threshold in the distances that gives us a number of groups in agreement with the quantity t µT A, with T being the time interval covered by the catalogue, A the area of the region and µ the background seismicity in the ETAS model.With our data set, the quantity µT A is 447.We use a simple link algorithm with the threshold 1/g = 2 • 10 6 , that gives us 440 clusters in our catalogue.
In order to ensure that we had good statistics for the analysis of the weighted complex networks formed, we only chose the clusters with more than 20 events, which are described in Table 2.The clusters are named after the earthquake with the highest magnitude.In Fig. 1 we show the main shocks of the clusters analyzed.The clusters corresponding to the 18 September 2004 and the 24 November 2007 earthquakes coincide with two of the three clusters with more than 50 events described in Jiménez et al. (2009).The other one, corresponding to the 2 April 2007, m = 3.1 earthquake does not appear in the present work, probably because of its lower magnitude.We can also see that the cluster corresponding to the 21 February 2002, m = 3.8 earthquake has fewer events.
In our previous research (Jiménez et al., 2009) this cluster contained 34 events, instead of the 22 it has here.As can be seen from the results, only the three major earthquakes are listed in both works.The other clusters found in Jiménez et al. (2009) seem to be less important in the new approach.Ruiz et al. (2006) found different families within the 18 September 2004 series, based upon the different characteristics of the waves in the recorded seismogram.We did not find these families, mainly because we did not use the waveforms and instead used the spatio-temporal distribution of the earthquakes in the catalogue.Ruiz et al. (2006) used a spatio-temporal window in order to decide if an event belongs to the aftershock sequence or not.In contrast, we used a distance based on the ETAS model obtained for the whole region, which depends on the magnitude, the time interval, and the spatial separation between events.
In order to test the influence of the threshold magnitude for the calculation of the clusters, we applied the method to the same catalogue but taking into account only the events with a magnitude of more than 2. With that condition, we have a catalogue with 455 earthquakes.So, we began by calculating the ETAS parameters that best fit the data and obtained the values given in Table 3.The AIC is 907.We had to perform the fitting part again because the values of the ETAS parameters vary when different threshold magnitudes are applied (Ogata, 1998).So, once we had the new values, we applied the link algorithm with the new distance.Now the quantity µT A was 167.With a threshold distance equal to 1.5•10 7 , we obtained 166 clusters.We found two main clusters with more than 20 events: the 11 December 2002 and the 18 September 2004.These correspond to the two larger clusters found by using a magnitude threshold of 1.The new clusters have less events that the number of earthquakes with magnitude higher than 2 in the original clusters.We observed that 24 % (18 September 2004) and 33 % (11 December 2002) of the events in the original clusters were not found in the new clusters.This is due to the fact that there are some earthquakes linked to others by events with magnitudes of between 1 and 2. If we compare the events in the cluster with this new magnitude threshold and the events with a magnitude higher than 2 in the clusters with a magnitude threshold higher than 1, we can see that 99 % of them are the same for the 18 September 2004 earthquake, and 76 % for the 11 December 2002 earthquake.This shows that the main influence in the catalogue is due to the 18 September 2004 event, and the method fits this main event better.
We now studied the properties of the weighted complex networks we obtained, in particular, the distribution of the distances, weights and strengths of the nodes.The strength of a node is the sum of all the weights of its connections with others.It represents the importance of the node in the whole network.
The distribution of ETAS distances in the clusters seems to follow a power law distribution, with exponents that range from −1.6 to 2.6.We used the programs implemented by Clauset et al. (2009) in order to calculate these exponents and other values that characterize the goodness of the power law hypothesis.The results show that the hypothesis of a power law distance is very plausible.However, the number of points that follow the power law is very small, and the total number of points is not enough to allow us to obtain good statistics.Finite-size bias may be present.The same situation is found for the weight distribution (the exponents range from −1.4 to 1.9, but not enough points are available) and for the strength distribution.We found that the highest strength corresponds to the main shocks in all cases.Other works that use unweighted complex networks find power laws for the degree distributions (Baiesi andPaczuski, 2004, 2005;Abe and Suzuki, 2004b), a quantity equivalent to strength in these networks.In Jiménez et al. (2009) we did not find power laws for the same data set.However, in that research we did not use the method proposed by Clauset et al. (2009) for testing the power law hypothesis.
Another important characteristic of the nodes is betweenness, i.e. the total number of optimal paths between all pairs of nodes passing through one node (Park et al., 2004), in which an optimal path is the shortest geodetic distance between two nodes.The highest betweenness does not always correspond to the highest magnitude.If we take into account the 10 November 2002 earthquake, the highest betweenness www.nonlin-processes-geophys.net/18/477/2011/ Nonlin.Processes Geophys., 18, 477-487, 2011 We also tested the power law.The exponents range from −2.1 to −2.6 for the betweenness distribution.Only a few clusters could be analyzed, due to the low number of points involved.In any case, the small amount of data available means that we cannot conclude that the clusters follow a power law.
We further analyzed which links are most important, the ones with a lower weight or the ones with a higher weight.For this purpose we used the concept of metaweight (Furuya and Yakubo, 2008).We calculated the clustering coefficient following Barrat et al. (2004), as suggested in Furuya and Yakubo (2008), for the networks with weights equal to the q-th power of the weight.The behaviour with respect to q gives us the importance of weak or strong edges.This analysis can only be provided for weighted complex networks, and not for unweighted ones.We obtained a clustering coefficient (Barrat et al., 2004) very close to 0 for q > 0 and near 1 for q < 0, which means that the networks are mainly connected by weak edges (Furuya and Yakubo, 2008), and sparsely connected by strong edges.Note that this clustering coefficient is different from that used in Eq. ( 10).This is because we wanted to follow the same procedure as Furuya and Yakubo (2008).The q power enhances the weak links when q < 0, and the strong links when q > 0. So, when we obtained a clustering coefficient that was higher for q < 0 than for q > 0, it meant that the nodes had low weights between them, and only a few strong connections.This may be related to the high values for the betweenness scaling exponents found.Wang et al. (2008) showed that high values of this exponent (around 2.3) can be interpreted as a high concentration of traffic on the most important links.
We also calculated the values of the average path lengths and clustering coefficients, shown in Table 4.We provide the values for the random graph (Garlaschelli, 2009) and the ETAS model (Ogata, 1998(Ogata, , 1999;;Helmstetter and Sornette, 2002), to interpret them.In unweighted networks, it is very useful to compare the real networks to random models, so that, by comparing their main characteristics, namely, the clustering coefficient and the average path length, important Nonlin.Processes Geophys., 18,[477][478][479][480][481][482][483][484][485][486][487]2011 www.nonlin-processes-geophys.net/18/477/2011/ conclusions can be reached.In our case, we can see that the values of the average path lengths are lower for the real networks.For the random graphs, the clustering coefficients are lower.We can therefore conclude that there is a small world effect (Watts and Strogatz, 1998) in the clusters because of the values for the path lengths and clustering coefficients in the real networks.We can also see that the simulations with the ETAS model give values for the clustering coefficient close to that of the real networks, but the average path lengths are larger.This means that any vertex on the graph can be reached from any other vertex in only a small number of steps, fewer steps than with the models.In Table 5 we show the result of simulating random networks merely by shuffling the weights of the real networks.If we take these values into account, we can see that, in general, the average path lengths are lower for the random model, and the clustering coefficients are lower too.Since the average path lengths of the real networks are higher, no small world behaviour is found for them.In any case, the ETAS model is better than the random model (whichever we use) for simulating the seismicity, as might be expected.
To classify the data in Table 4, we grouped them into different classes.The groups were obtained using a k-means algorithm (Teknomo, 2007), with the logarithm of the clustering coefficient and the logarithm of the average path length as variables, and taking into account that the mean standard deviation of the distances within each group was minimized.We normalized the variables by the sum of the values.We used the logarithm because we observed that small numbers can be very clustered, since there are different magnitude scales in both average path length and clustering coefficient.The best choice for three groups is depicted in Fig. 3, where the clusters are ordered by their magnitude and number of events.The standard deviation is 0.0106.With two groups, we have the lowest standard deviation (0.0192) for the18 September 2004 and the 11 December 2002 in the same group, and another group with the remaining clusters.These two clusters are the ones with the highest main-shock magnitude and the most aftershocks.This analysis indicates that the topological properties of the clusters depend mainly on the magnitude of the main event, and not on the mechanism or mechanisms that originate them.

Conclusions
We proposed an analysis of the seismic clusters in a region based on the characteristics of weighted complex networks.In order to obtain the main clusters in a region, we used a distance based on the ETAS model (Ogata, 1998).We fitted the data to this model and performed a simple linkage clustering algorithm to obtain the main clusters.We analyzed the errors of both the fit and the simulated catalogues.
In our approach, we used the ETAS model, although this is not the only seismicity-based model.Another option is the BASS model (Turcotte et al., 2007, Holliday et al., 2008), for example.The main difference between the ETAS and BASS models is that the latter includes Båth's law (Båth, 1965) and is completely self-similar.However, it is normally used as a forward model, and for simulating one cluster only.Ogata (1998) provides several fitting functions that are based on real data, and this is why we used the ETAS model.In future research, other distances based on the BASS model could be used by implementing a similar procedure for fitting the data to a proposed BASS model.
We then analyzed and compared the clusters found to some models (one based on the ETAS model and two random models).We concluded that the ETAS model better approaches the seismicity near the Itoiz dam than the two random models we tried.The different values for the topological features analyzed seem to depend on the magnitude of the clusters and the number of events in each series, which are related.An event with higher magnitude can produce more aftershocks.We analyzed clusters both before and after the impoundment, and they were classified by their magnitude rather than the time interval within which they occurred.So, we cannot conclude that the 18 September 2004 earthquake behaves abnormally compared to others.However, only two global topological features have been analyzed in this classification.In order to discern the possible difference of this particular event we must have more clusters to analyze.A larger area must therefore be studied, with the difficulty inherent in analyzing a more complex region.Another option would be to study a larger time interval.In that case, the minimum magnitude would have to be increased to have a complete catalogue, and as a result smaller clusters would be found.
We found that the nodes with the higher strength are the most important in the traffic of the networks.The main shocks are associated with the events with the highest strength.However, the betweenness does not always coincide with the main events.We analyzed the weight, strength, and betweenness distributions, but we could not conclude that they followed a power law, because of the few data points involved.
We compared the real networks obtained with two random models in order to study the small-world phenomenon, in the same sense that it is studied for unweighted complex networks.When we used a theoretical random model (Garlaschelli, 2009), we found that there was small world behaviour.However, when all we did was shuffle the weights, results showed that the average path length was higher in the real networks, so no small world was found.No single answer was found for this question.
For future research, more properties based on the graph theory should be calculated in order to characterize the seismic networks better.We can also use this method in other places where no induced seismicity exists, and try to understand if we can differentiate the seismicity affected by artificial processes from that produced naturally.

Fig. 2 .
Fig. 2. In solid line, the real seismicity, and in dashed line, the ETAS model.

Fig. 3 .
Fig. 3. Classification of the clusters.We see three clusters, one formed by the 27-Oct-2007, other formed by the 18-Sep-2004 and 11-Dec-2002 events, and the remaining.We used the logarithm of the clustering coefficient divided by the sum of the logarithms of the clustering coefficient, and of the logarithm of the average path length divided by the sum of the logarithms of the averaged path lengths.

Table 2 .
Main clusters in the catalogue.D 2 is the correlation dimension.

Table 3 .
Average values and error of the ETAS fit for magnitude higher than 2.

Table 4 .
Values of the network characteristics corresponding to the main clusters in the catalogue.L is the average path length, and C is the clustering coefficient.For the random model and the ETAS model we also provide the error bounds, below the mean value.

Table 5 .
Values of the average path length and clustering coefficient for the random networks calculated by shuffling the weights.