The use of artificial neural networks to analyze and predict alongshore sediment transport

An artificial neural network (ANN) was developed to predict the depth-integrated alongshore suspended sediment transport rate using 4 input variables (water depth, wave height and period, and alongshore velocity). The ANN was trained and validated using a dataset obtained on the intertidal beach of Egmond aan Zee, the Netherlands. Root-mean-square deviation between observations and predictions was calculated to show that, for this specific dataset, the ANN ($\varepsilon_{rms}$ = 0.43) outperforms the commonly used Bailard (1981) formula ($\varepsilon_{rms}$ = 1.63), even when this formula is calibrated ($\varepsilon_{rms}$ = 0.66). Because of correlations between input variables, the predictive quality of the ANN can be improved further by considering only 3 out of the 4 available input variables ($\varepsilon_{rms}$ = 0.39). Finally, we use the partial derivatives method to “open and lighten” the generated ANNs with the purpose of showing that, although specific to the dataset in question, they are not “black-box” type models and can be used to analyze the physical processes associated with alongshore sediment transport. In this case, the alongshore component of the velocity, by itself or in combination with other input variables, has the largest explanatory power. Moreover, the behaviour of the ANN indicates that predictions can be unphysical and therefore unreliable when the input lies outside the parameter space over which the ANN has been developed. Our approach of combining the strong predictive power of ANNs with “lightening” the black box and testing its sensitivity demonstrates that the use of an ANN approach can result in the development of generalized models of suspended sediment transport.
Correspondence to: B. van Maanen (b.vanmaanen@niwa.co.nz)


Introduction
Alongshore sediment transport can have large-scale and long-term effects on coastal evolution. It therefore plays a key role in nearshore studies and is of interest to scientists, managers, and engineers. Understanding and predicting sediment transport in the surfzone has proven to be extremely difficult because of the energetic environment and the complexity of nearshore systems and sediment transport itself. Both observational and theoretical approaches have been used to study sediment transport. From an observational point of view, obtaining accurate measurements of suspended sediment concentrations remains a challenge, primarily because the measurements are sensitive to air bubbles (Puleo et al., 2006) and mixtures of sediments (Green and Boon, 1993), and because the vertical position of the sensors with respect to the seabed is uncertain. Semi-empirical (or semi-theoretical) models (e.g. Bailard, 1981) that account for the effect of waves and currents have also been developed, but their application to natural conditions has shown only limited success (e.g. van Maanen et al., 2009). In practice, all of the theoretical approaches need a specific field calibration to tune the many parameters present in the models, so that, despite decades of research, making reliable predictions of sediment transport remains a difficult task.
A commonly adopted alongshore transport equation has been developed by Bailard (1981), who suggested that the work done in transporting the sediment is a fixed portion of the total energy dissipated by the flow. Depth-integrated alongshore suspended sediment transport (kg/m/s) is given by:
where the angle brackets indicate time-averaging over many waves, $U(t)$ represents the instantaneous velocity vector, $v$ is the time-averaged alongshore velocity, $v^*$ is the alongshore orbital velocity, and where $c_f$ is the bed drag coefficient, $\varepsilon_s$ is an efficiency factor, and $w_s$ is the settling speed of the characteristic grain size. Similar predictors have been proposed by other authors (see Bayram et al., 2001, for a thorough review) but their success is limited, especially during storms, and a specific calibration is often required (Bayram et al., 2001).
Published by Copernicus Publications on behalf of the European Geosciences Union and the American Geophysical Union.
A different approach is provided by data-driven models. The simplest example of a data-driven model is a linear regression, where a single input variable (e.g. wave height) is used to provide an estimate of the predicted variable (e.g. sediment transport rate). Many different (and more complicated) data-driven algorithms have been developed, and Artificial Neural Networks (ANNs) are an excellent example of such algorithms. ANNs have been applied to several fields of science (see for example Gardner and Dorling, 1998; Dayhoff and DeLeo, 2001) and several applications also exist in the field of ocean and coastal engineering. For example, ANNs have been applied to develop forecasts of hydrodynamics at different scales ranging from nearshore waves (Browne et al., 2007) to tides (Tsai and Lee, 1999) and storms (Sztobryn, 2003). Furthermore, for the case of unidirectional flow, ANNs have also been successfully used to predict sediment concentrations in laboratory (Lin and Namin, 2005) and field (Nagy et al., 2002) studies. ANNs are also beginning to be applied to the study of sandbar dynamics (Kingston et al., 2000) and beach profile evolution (Tsai et al., 2000). These studies have all treated ANNs as a black box, focusing primarily on predictive capability with little emphasis on increasing understanding of the driving physical processes. In a few cases, ANNs have also been successfully used to explore the role of nonlinearities of a system, including time-lag and scale effects (Pape et al., 2007). Finally, ANNs have also been applied in the field of geophysics and oceanography (Krasnopolsky, 2007) to develop hybrid models that combine ANNs and partial differential equations based on first principles (e.g. mass and momentum conservation). Overall, it appears that ANNs are becoming more and more common tools in geophysical and oceanographic studies, but clearly they are not yet used to their full potential.
In this contribution we use field observations to train ANNs and show that, for the present dataset, ANNs can provide better predictions of alongshore suspended sediment transport rate than the commonly used Bailard (1981) formula. We also "open and lighten" the black box to show that ANNs can be used to analyze the physical processes associated with suspended sediment transport. This approach is valuable because the usefulness of ANNs beyond their predictive power has often been the subject of discussion (McCann, 1992; Gardner and Dorling, 1998) and also because it demonstrates that using an ANN approach can result in the development of generalized models of suspended sediment transport.

Field measurements
A field experiment was conducted at Egmond aan Zee (the Netherlands), a sandy (median size equal to 0.3 mm) beach characterized during the field experiment by one intertidal and two subtidal sandbars. Four tripods were deployed shoreward of the intertidal sandbar and each tripod included an electromagnetic flow velocity meter (EMF), a pressure sensor, and three optical backscatter (OBS) sensors. While tripods were submerged, timeseries were recorded continuously and subsequently split into 15 min bursts. Sampling frequency for all instruments was 2 Hz. A detailed description of the field experiment, including collection and analysis techniques of hydrodynamic and suspended sediment data, has been presented previously (van Maanen et al., 2009).
Data from the pressure sensors were used to obtain spectral wave height $H_{m0}$, peak period $T_p$, and water depth $h$. Timeseries collected using the EMF were used to derive the burst- and depth-averaged alongshore velocity $V$ (for details see van Maanen et al., 2009). With respect to the suspended sediment concentrations, OBSs were calibrated using sand collected at the field site and data were used to construct the vertical profiles of suspended sediment concentration (for details see van Maanen et al., 2009). Finally, the depth-integrated suspended load was derived by integrating over the water depth, for each burst, the product of the velocity and suspended sediment concentration profiles. The data used throughout this study are shown in Fig. 1.

Artificial Neural Network background and architecture
Over the last few decades, development and continuous improvement of ANNs have resulted in a powerful predictive tool. A major advancement was achieved by Werbos (1974), who expanded the applicability of ANNs to nonlinear systems, and this development formed the basis of many ANNs used today. An ANN consists of input, hidden, and output nodes arranged in layers (Fig. 2). The input layer is usually "non-neural" in the sense that it only serves to feed the input data to the network. Each input is connected to a number of neurons, which altogether constitute the hidden layer. Here, information from input variables is condensed after performing operations of the type
$$h_j = f\left(\sum_{i=1}^{n} w_{ij}\, x_i + a_j\right) \qquad (3)$$
where $x_i$ is the $i$-th input variable, $h_j$ represents the response of the $j$-th neuron in the hidden layer, $f$ is the activation function (a sigmoid has been used throughout this study, similar for example to Rumelhart et al., 1986), $w_{ij}$ is the connection weight between $x_i$ and $h_j$, $a_j$ is the bias for the $j$-th hidden neuron, and there are $n$ input variables. A further combination of hidden nodes, which is achieved by means of a new activation function (again a sigmoid) and new connection weights and biases, results in the output layer (in this study the output layer corresponds to one single value, the depth-integrated suspended sediment load).
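The forward pass described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function and argument names (`forward`, `w_hidden`, `a_hidden`, `w_out`, `b_out`) are hypothetical, and the weights in the usage note are arbitrary.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float],
            w_hidden: Sequence[Sequence[float]],
            a_hidden: Sequence[float],
            w_out: Sequence[float],
            b_out: float) -> float:
    """One-hidden-layer feed-forward network with sigmoid activations.

    w_hidden[j][i] is the weight between input i and hidden node j,
    a_hidden[j] is the bias of hidden node j, w_out[j] is the weight between
    hidden node j and the single output node, and b_out is the output bias.
    """
    # Hidden layer: weighted sum of the inputs plus bias, passed through f.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + a)
              for row, a in zip(w_hidden, a_hidden)]
    # Output layer: the same operation applied to the hidden responses.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

For instance, `forward([0.2, 0.5, 0.1, 0.7], w_hidden, a_hidden, w_out, b_out)` with four rows in `w_hidden` would correspond to a network with 4 inputs and 4 hidden nodes; the result always lies between 0 and 1, which is why the target variable has to be rescaled before training.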
The biases and connection weights of the ANN are evaluated through an optimization process that starts by splitting the dataset into two parts: the training dataset and the validation dataset. Training data are used by the ANN to learn how the system behaves, a process which ultimately results in the specification of biases and weights. Validation data are used to assess the performance of the ANN in making predictions. We used 66% (corresponding to 1522 observations) of the dataset to train the ANN, and the remaining part (784 observations) for validation. For this study, we used the most common type of feed-forward ANN, which consists of one hidden layer, with training performed using back-propagation, which (Kolen and Pollack, 1990) can be extremely sensitive to the initial values assigned to biases and weights. In trying to determine the biases and weights resulting in the global minimum of the difference between observations and ANN predictions, local minima may be encountered (whose presence depends on the initial values assigned to biases and weights) which halt the optimization process. There is no clear solution to this problem, which is why most authors prefer to train ANNs using different random seeds to generate initial weights and then analyze the best ANN (Faraway and Chatfield, 1998). For this study we adopted this approach and generated 10 000 ANNs with different initial random seeds. Results presented in this contribution refer to the ANN that displayed the lowest error, herein defined as the root-mean-square deviation (Bayram et al., 2001; van Maanen et al., 2009):
where the subscripts $P$ and $F$ refer, respectively, to the values predicted by the ANN (the same parameter is used to evaluate the goodness of fit of the Bailard, 1981, formula) and the values measured in the field. Also, to avoid overfitting of the training dataset, we have used a typical early-stopping technique such that if the performance of the training parameters (weights and biases) on the validation dataset does not improve, the optimization process is stopped and no new weights and biases are generated.
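The combination of random restarts and early stopping can be sketched as follows. This is an illustration under several assumptions that are not taken from the study: the learning rate, the `patience` criterion, the initial-weight range, the synthetic `toy_data`, and the use of a plain root-mean-square misfit on the scaled target as the selection criterion are all hypothetical choices, and only a handful of restarts are run instead of 10 000.

```python
import copy
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_in, n_hid, rng):
    """Draw initial weights and biases uniformly from [-1, 1] (an assumed range)."""
    def u():
        return rng.uniform(-1.0, 1.0)
    return {"W": [[u() for _ in range(n_in)] for _ in range(n_hid)],  # input -> hidden
            "a": [u() for _ in range(n_hid)],                          # hidden biases
            "v": [u() for _ in range(n_hid)],                          # hidden -> output
            "b": u()}                                                  # output bias


def predict(net, x):
    """Return the network output and the hidden-node responses for one sample."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + a)
         for row, a in zip(net["W"], net["a"])]
    return sigmoid(sum(v * hj for v, hj in zip(net["v"], h)) + net["b"]), h


def rms(net, data):
    """Plain root-mean-square misfit on the (already scaled) target."""
    return math.sqrt(sum((predict(net, x)[0] - y) ** 2 for x, y in data) / len(data))


def train(net, train_set, val_set, lr=0.5, max_epochs=300, patience=20):
    """Back-propagation (stochastic gradient descent) with early stopping."""
    best_err, best_net, stale = rms(net, val_set), copy.deepcopy(net), 0
    for _ in range(max_epochs):
        for x, y in train_set:
            out, h = predict(net, x)
            delta_o = (out - y) * out * (1.0 - out)  # error signal at the output node
            for j, hj in enumerate(h):
                delta_h = delta_o * net["v"][j] * hj * (1.0 - hj)  # back-propagated signal
                net["v"][j] -= lr * delta_o * hj
                net["a"][j] -= lr * delta_h
                for i, xi in enumerate(x):
                    net["W"][j][i] -= lr * delta_h * xi
            net["b"] -= lr * delta_o
        err = rms(net, val_set)
        if err < best_err:
            best_err, best_net, stale = err, copy.deepcopy(net), 0
        else:
            stale += 1
            if stale >= patience:  # validation error has stopped improving
                break
    return best_net, best_err


def best_of_restarts(train_set, val_set, n_in, n_hid, n_restarts):
    """Train one network per random seed and keep the lowest validation error."""
    runs = [train(init_net(n_in, n_hid, random.Random(seed)), train_set, val_set)
            for seed in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


def toy_data(n, seed=0):
    """Synthetic samples whose target already lies inside (0, 1)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, sigmoid(2.0 * x[0] - x[1])) for x in xs]
```

A call such as `best_of_restarts(data[:40], data[40:], n_in=2, n_hid=3, n_restarts=5)` with `data = toy_data(60)` returns the retained network together with its validation error. Because each restart begins from different initial weights, the spread of the returned errors gives a rough indication of how sensitive the optimization is to initial conditions.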
In this study, we have considered ANNs with the simplest structure: a number of hidden nodes ranging from 2 to 8 and only one hidden layer. Faraway and Chatfield (1998) showed that increasing the number of nodes can sometimes cause ANN performance to decay (overtraining). The likelihood of overtraining is related to the ratio between the number of free parameters in the model (biases and weights) and the number of training samples. As each dataset has different characteristics, no clear guideline exists on how many samples are needed to avoid overtraining. The general rule (valid also for multiple linear regression) is that the number of training samples (1522 in the present study) should be at least 10 times the number of free parameters (Burnham and Anderson, 2002), so that the present training dataset should not be prone to overtraining issues.
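The rule of thumb can be checked with a short count of free parameters. The sketch below assumes a fully connected network with one bias per hidden node and one bias for the output node; the function name `n_free_parameters` is hypothetical.

```python
def n_free_parameters(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights plus biases of a fully connected one-hidden-layer network."""
    hidden_layer = n_hidden * (n_inputs + 1)   # input weights + one bias per hidden node
    output_layer = n_outputs * (n_hidden + 1)  # hidden weights + one bias per output node
    return hidden_layer + output_layer


N_TRAIN = 1522  # number of training observations quoted in the text

for n_hidden in range(2, 9):
    p = n_free_parameters(4, n_hidden)
    print(f"{n_hidden} hidden nodes: {p} parameters, "
          f"{N_TRAIN / p:.1f} training samples per parameter")
```

Under this counting, the largest configuration considered (4 inputs, 8 hidden nodes) has 49 free parameters, so the 1522 training samples give a ratio of about 31, above the factor of 10 quoted above.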
The sigmoid activation function, defined as
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (5)$$
requires transforming the dependent variable (the prediction target or output) in both the training and validation datasets into a value inside the range [0, 1] (the sigmoid function contracts any input inside this range). Variable transformation has been achieved by
$$y_k^* = \frac{y_k - \min(y)}{\max(y) - \min(y)} \qquad (6)$$
where $y_k^*$ is the transform of $y_k$ which, in turn, is the $k$-th observation of the dependent variable $y$, and $\min(y)$ and $\max(y)$ are the minimum and maximum values of $y$, respectively. When evaluating the error associated with the ANN predictions, the dependent variable is transformed back into original values. Input variables vary over different ranges and need to be standardized to facilitate post-processing of the ANN and analysis of variable importance. Following Dimopoulos et al. (1999), this is achieved by
$$x_k^* = \frac{x_k - x_{mean}}{\sigma_x} \qquad (7)$$
where $x_k^*$ is the standardized value of independent variable $x_k$, which, in turn, is the $k$-th observation of independent variable $x$, and $x_{mean}$ and $\sigma_x$ are the mean and standard deviation of $x$, respectively. Subsequently, the standardized values of the input variables are also normalized according to Eq. (6).
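The two transformations, and the back-transformation applied before the error is evaluated, can be sketched as below. The helper names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) rather than the population value is an assumption, since the text does not say which one was used.

```python
import statistics
from typing import List, Sequence, Tuple


def minmax_scale(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Map values linearly onto [0, 1]; also return the bounds needed to invert."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def minmax_invert(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Transform scaled values back into original units."""
    return [s * (hi - lo) + lo for s in scaled]


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (sample) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / std for v in values]


def prepare_input(values: Sequence[float]) -> List[float]:
    """Standardize an input variable, then normalize it onto [0, 1]."""
    scaled, _, _ = minmax_scale(standardize(values))
    return scaled
```

The bounds returned by `minmax_scale` for the training target would be kept and passed to `minmax_invert` whenever ANN predictions are compared with measurements in original units.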

Opening and lightening the "black box"
Despite the presence of studies showing how ANNs can be used to increase understanding of physical processes (e.g. Pape et al., 2007), ANNs are often considered to be "black boxes" with little, if any, capacity to provide insight on the dataset from which they have been constructed. However, for more than a decade techniques have been suggested that allow detailed analysis of connection weights and estimation of the role of each input variable (Vaughn, 1996; Benítez, 1997; Dimopoulos et al., 1999; Olden, 2000; Olden and Jackson, 2002). Recently, some of the techniques available have also been reviewed (Gevrey et al., 2003; Olden et al., 2004) and the partial derivatives (PaD) method has been shown to have the best explanatory power. The PaD approach was originally proposed by Dimopoulos et al. (1995) and recently extended by Gevrey et al. (2006). Assuming the use of a sigmoid activation function for all connections between nodes (as in the present study) and considering a network constituted by $n$ input variables, one single hidden layer with $m$ nodes and one output, the sensitivity of the ANN output to the input variable $x_i$ is evaluated through the sum of the squared partial derivatives SSD (Dimopoulos et al., 1995):
$$SSD_i = \sum_{k=1}^{N} \left(d_{ki}\right)^2 \qquad (8)$$
where the index $i$ refers to the input variable and the index $k$ refers to the $N$ available observations of the testing dataset. Assuming $m$ hidden nodes, the derivative of the output for observation $k$ with respect to input variable $i$ is evaluated as:
$$d_{ki} = S_k \sum_{j=1}^{m} w_{jo}\, I_{jk} \left(1 - I_{jk}\right) w_{ij} \qquad (9)$$
where $w_{ij}$ is the weight connecting the $i$-th input node and the $j$-th hidden node, $w_{jo}$ is the weight connecting the output and the $j$-th hidden node, $S_k$ is the derivative of the output node with respect to its input, and $I_{jk}$ is the response of the $j$-th hidden node for the $k$-th input (for more details see Dimopoulos et al., 1995). Once $SSD_i$ has been calculated for each input variable, one can compare values and establish which variable is relatively the most important. The larger the value of $SSD_i$, the more influence input variable $x_i$ has on the output. Using a similar approach, the importance of pair-wise combinations of input variables has also been evaluated (Gevrey et al., 2006):
$$d_{k i_1 i_2} = s_k \left[\sum_{j=1}^{m} w_{i_1 j}\, w_{jo}\, I_{jk}\left(1 - I_{jk}\right)\right] \left[\sum_{j=1}^{m} w_{i_2 j}\, w_{jo}\, I_{jk}\left(1 - I_{jk}\right)\right] + S_k \sum_{j=1}^{m} w_{i_1 j}\, w_{i_2 j}\, w_{jo}\, I_{jk}\left(1 - I_{jk}\right)\left(1 - 2 I_{jk}\right) \qquad (10)$$
where all symbols have been previously indicated apart from $s_k$, which is the second derivative of the output node with respect to its input. As for the case of individual variables, the relative contribution of pairs of variables to the ANN explanatory power can be evaluated as:
$$SSD_{i_1 i_2} = \sum_{k=1}^{N} \left(d_{k i_1 i_2}\right)^2 \qquad (11)$$
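The PaD quantities follow from applying the chain rule to a one-hidden-layer sigmoid network, and a sketch of that calculation is given below. It is an illustration rather than the code used in the study: the function names and the array layout (`W[j][i]` for the input-to-hidden weights, `v[j]` for the hidden-to-output weights) are hypothetical, and the percentages are simply each squared-derivative sum divided by the total.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_and_output(x, W, a, v, b) -> Tuple[List[float], float]:
    """Hidden responses and network output for one sample x.

    W[j][i] is the weight between input i and hidden node j, a[j] the hidden
    biases, v[j] the weight between hidden node j and the output, b the output bias.
    """
    hid = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + aj) for row, aj in zip(W, a)]
    out = sigmoid(sum(vj * hj for vj, hj in zip(v, hid)) + b)
    return hid, out


def pad_first(x, W, a, v, b) -> List[float]:
    """First partial derivative of the output with respect to each input."""
    hid, out = hidden_and_output(x, W, a, v, b)
    S = out * (1.0 - out)  # derivative of the output sigmoid w.r.t. its input
    return [S * sum(v[j] * hid[j] * (1.0 - hid[j]) * W[j][i] for j in range(len(hid)))
            for i in range(len(x))]


def pad_pair(x, i1: int, i2: int, W, a, v, b) -> float:
    """Mixed second derivative of the output with respect to inputs i1 and i2."""
    hid, out = hidden_and_output(x, W, a, v, b)
    S = out * (1.0 - out)
    s = S * (1.0 - 2.0 * out)  # second derivative of the output sigmoid
    m = range(len(hid))
    g1 = sum(v[j] * hid[j] * (1.0 - hid[j]) * W[j][i1] for j in m)
    g2 = sum(v[j] * hid[j] * (1.0 - hid[j]) * W[j][i2] for j in m)
    cross = sum(v[j] * W[j][i1] * W[j][i2] * hid[j] * (1.0 - hid[j])
                * (1.0 - 2.0 * hid[j]) for j in m)
    return s * g1 * g2 + S * cross


def ssd_percent(samples: Sequence[Sequence[float]], W, a, v, b) -> List[float]:
    """Sum of squared first derivatives per input, expressed as percentages."""
    ssd = [0.0] * len(samples[0])
    for x in samples:
        for i, d in enumerate(pad_first(x, W, a, v, b)):
            ssd[i] += d * d
    total = sum(ssd)
    return [100.0 * value / total for value in ssd]
```

Calling `ssd_percent(test_inputs, W, a, v, b)` on the scaled test inputs of a trained network yields one percentage per input variable; comparing the analytic derivatives with finite differences of the network output is a convenient check that the weights have been indexed consistently.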

Results
For the present study we decided to compare the predictive capability of ANNs and the Bailard model since van Maanen et al. (2009) found that the Bailard model outperformed another commonly used alongshore transport model (van Rijn, 1984) when the entire dataset was evaluated. Also, the Bailard formula allows for an easy calibration procedure. We initially evaluated the performance of the Bailard model on the testing dataset (Fig. 3), which was not satisfactory given the large scatter of the data and the overall underprediction of depth-integrated suspended sediment transport ($\varepsilon_{rms}$ = 1.63). The Bailard formula (Eqs. 1 and 2) involves two coefficients, the drag coefficient and the efficiency factor, whose values are difficult to establish unequivocally. In this study we used 0.003 and 0.02, respectively, following van Maanen et al. (2009). We then decided, consistent with the ANN approach, to calibrate the Bailard formula on the training dataset and then apply the calibration coefficient to the testing dataset. Best agreement between measurements and predictions was obtained after multiplying the uncalibrated Bailard predictions by a factor of 35.47. This factor is extremely large, especially because it can only be attributed to the drag coefficient or the efficiency factor. Measurement errors and the assumptions made during the computation of the measured sediment transport could also have contributed to the large difference between observations and predictions (see van Maanen et al., 2009, for more details). Nevertheless, the calibrated predictions are shown in Fig. 4 and the associated root-mean-square deviation, $\varepsilon_{rms}$, using the testing dataset is equal to 0.66.
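A single multiplicative calibration factor of this kind can be estimated in closed form once a misfit measure is chosen. The sketch below assumes, purely for illustration, that the factor minimizes the squared misfit between the base-10 logarithms of observed and predicted transport magnitudes (all strictly positive); the criterion actually used to obtain the factor of 35.47 is not specified above, and the function name `calibration_factor` is hypothetical. Under that assumption the optimum is the geometric mean of the observed-to-predicted ratios.

```python
import math
from typing import Sequence


def calibration_factor(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Factor c minimizing sum((log10(c * p) - log10(o))**2).

    Setting the derivative with respect to log10(c) to zero gives
    log10(c) = mean(log10(o / p)), i.e. the geometric mean of the ratios.
    Both sequences must contain strictly positive values.
    """
    log_ratios = [math.log10(o / p) for o, p in zip(observed, predicted)]
    return 10.0 ** (sum(log_ratios) / len(log_ratios))
```

In keeping with the procedure described above, the factor would be computed from the training dataset only and then applied unchanged to the predictions for the testing dataset.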
Figure 5 shows typical ANN predictions when using the 4 available input variables (water depth, wave height and period, alongshore velocity) and 4 nodes in the hidden layer. There is a clear improvement ($\varepsilon_{rms}$ has decreased to 0.43) compared to the calibrated Bailard predictions. Despite the improvement, the ANN struggles to predict the highest and lowest measured values. This problem is likely to arise from different effects. The low values of depth-integrated suspended sediment flux are so low that measurements might be close to the limits resulting from intrinsic instrument accuracy. In fact, although the performance of the Bailard model also decreases for low values of measured suspended sediment transport, the real problem might simply be the relatively small number of measurements available below $10^{-3}$ kg/m/s. Had the dataset included many more of these low measured values (with an instrument-accuracy problem), the ANN could have learnt about the instrument-accuracy problem and produced good predictions (in the sense that they are close to the measured values). Overall, it is worth noticing that the low asymptotic limit for the ANN ($10^{-3}$) is at least one order of magnitude higher than the lowest sediment flux measured and predicted according to Bailard (see Fig. 4). The highest values are not particularly well predicted by the ANN and we suspect this effect is again related to predicting the tails of the distribution of the available measurements. Only a small number of large values of suspended sediment flux are present in the overall dataset, with no more than 53 values out of 2306 measurements exceeding 1 kg/m/s. This affects the training of the ANN, and the $\varepsilon_{rms}$ evaluated over these high values of measured sediment transport amounts to 0.51. This reflects a problem of ANNs (and data-driven models in general), which are difficult to train for extreme conditions, even though accurate prediction of extreme values is of specific interest to coastal engineers and scientists. In general, similar predictive results have been obtained when changing the number of nodes in the hidden layer (changing the number of hidden nodes from 2 to 6 corresponded to changes in $\varepsilon_{rms}$ from 0.49 to 0.51, the minimum value being 0.43 for 4 nodes).
Table 1. Percentage of the contribution of single variables for the 3 best-performing models presented in Fig. 6.
ANN results tend to be sensitive to correlations in the input variables. Some of the four variables are certainly characterized by some level of correlation. Correlation effects are likely to become particularly evident as a result of specific conditions encountered in the field (e.g. for saturated wave-breaking conditions, wave height and water depth become strongly correlated). To test the sensitivity to the choice of available input variables, we have built (following the same methodology described for the case with 4 input variables) ANNs characterized by only 3 inputs and 3 hidden nodes (Fig. 6). When one variable among $H_{m0}$, $h$, and $T_p$ is dropped from the ANN, there is an improvement in the overall prediction skill of the ANN. Although results do not allow us to distinguish which input variable (among $H_{m0}$, $h$, and $T_p$) is the most relevant, it is evident (see Fig. 6d) that the alongshore component of the velocity plays a major role in the prediction of suspended sediment fluxes. Removing the alongshore component of the velocity from the input variables causes a strong decay in the prediction power of the ANN. The lowest error (defined using Eq. 4) was obtained using $V$, $h$ and $T_p$ as input variables. For the 3 best-performing models presented in Fig. 6 we have analyzed the importance of individual variables and of their interactions using the PaD approach (see previous section). Analysis of the contribution of single variables (Table 1) shows that, for all models considered, the alongshore component of the velocity is the input variable with the largest explanatory power. For each of the models presented in Table 1 we have run 10 000 additional ANNs with different initial weights. We have then analyzed the weights of the ANNs with a predictive skill similar to that of the best-performing ANN (difference from the $\varepsilon_{rms}$ shown in Fig. 6 below 10%) and, apart from negligible differences in the contribution of each variable, results confirm the findings reported in Table 1. Analysis of the contribution of combinations of variables (Table 2) is less straightforward but still provides evidence that, for all models, the mechanism(s) leading to improved predictions of depth-integrated sediment fluxes are related to the presence of an alongshore current and its interaction with the other variables. Probably because of cross-correlation between some of the input variables, other ANNs with similar predictive skill can result in contributions that differ from those presented in Table 2. For example, with respect to Model 2, the explanatory power can shift between the $T_p$-$V$ and $H_{m0}$-$V$ combinations without any significant effect on the prediction skill.
Table 2. Percentage of the contribution of combinations of variables for the 3 best-performing models presented in Fig. 6.

Discussion and conclusions
A typical criticism of an ANN predictor (or any other data-driven predictor) is that its validity is limited and intrinsically linked to the distribution of the input variables in the training dataset. To analyze this sensitivity and the "universality" of the ANN, we reconstructed the predictor using the biases and weights of the best-performing ANN (Model 1 in Table 1, see also Fig. 6c) and then examined the response to changes in the input conditions. Figure 7 shows that the response of the reconstructed ANN is extremely nonlinear and that extending the predictions far beyond the values considered in the training dataset can lead to unphysical results. For example, looking at Fig. 7a, where $V$ is kept constant at 0.17 m/s (the observed mean value), it is easy to notice that an increase in depth-integrated suspended sediment transport occurs for increasing values of $T_p$ up to 6 s. The increase is smaller for larger depths. While these aspects of the predictor are physically sound, larger increases in $T_p$ lead to a sharp decrease in sediment transport. This behaviour is clearly unphysical and, as shown by the white dots in Fig. 7a, is driven by the extremely limited number of observations available for these combinations of $V$, $h$ and $T_p$. It is also worth noticing that for these combinations of $V$, $h$ and $T_p$ only small values of sediment transport are observed (see colour-bar of Fig. 7a and compare to the other subplots). If the fixed value of $V$ is set to 0.7 m/s (Fig. 7b), the response of the ANN to changes in $h$ and $T_p$ is physically sound, as no reduction in sediment transport is predicted for large values of $T_p$. Figures 7c and 7d show a physically correct response of the ANN when, respectively, $T_p$ and $h$ are kept constant at their mean values (similar results are obtained for larger or smaller values of the fixed input variable). An increase in $h$ corresponds to a small decrease in sediment transport (Fig. 7c) and the opposite occurs for an increase in $T_p$ (Fig. 7d). The large gradient in sediment transport with respect to $V$ indicates again the dominance of the alongshore velocity on ANN outcome.
A key requirement for any data-driven model, be it as simple as a linear regression or as complicated as an artificial neural network, is that it is capable of providing predictions that can explain the observed variability and that are physically meaningful. Because of the large number of free parameters, ANNs can create highly nonlinear functions which relate independent and dependent variables. This capability explains why ANNs can outperform both theoretical approaches (e.g. the Bailard model) and simple data-driven predictors (e.g. linear or multiple regressions). However, because of the complex structure of an ANN, it is difficult to disentangle the interactions between the input variables. As a result, despite their undeniable predictive power, ANNs have failed to provide insight with respect to the physical processes driving the predictions. In this contribution we use an already established technique to open up and lighten the ANN black box. This allows dissecting the interactions leading to predictions of sediment transport that are a large improvement (Fig. 6) compared to a physically based predictor (Figs. 3 and 4). The use of the PaD technique allows ranking the role of each individual variable (Table 1) and also of combinations of variables (Table 2). Physically, our results are not unexpected, in the sense that, because of cross-correlation effects, one variable among $H_{m0}$, $T_p$ and $h$ can be dropped without losing predictive skill. Also, as shown by the analysis of the relative importance of variables, the dominant input variable is the alongshore component of the velocity, with a minor role played by resuspension mechanisms related to wave processes. The small contribution to the explanatory power of wave-related input variables is likely to be related to the conditions encountered in the field during the collection of this dataset. A limitation of the present study is that it utilizes data collected at one beach site. For example, the ANN predictor does not include a grain size dependency, which is clearly not physically correct. Moreover, the dataset used here encompasses only a small range of wave conditions. Adding datasets of measured sediment transport that cover a wide range of wave, sediment, and beach conditions during the development of the ANN, together with extensive validation, could ultimately result in a more universal predictor.
Overall, the results presented in this paper suggest that using an ANN approach can result in the development of a powerful predictor and show that ANNs can be analyzed. However, users of ANNs should always bear in mind that when the input variables, or their combination, are different from the parameter space over which the ANN has been developed, predictions can be unphysical and therefore meaningless. On the other hand, the same ANN data-driven approach used in this study could be extended and applied to other datasets, increasing the parameter space over which predictions are valid and therefore the generality of the predictor.
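A simple guard against extrapolation is to compare each new input with the range spanned by the training data before trusting a prediction. The sketch below is one possible check, not part of the study: the function names, the dictionary layout, and the sample values in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranges = Dict[str, Tuple[float, float]]


def training_ranges(samples: Sequence[Dict[str, float]]) -> Ranges:
    """Minimum and maximum of every input variable over the training samples."""
    names = samples[0].keys()
    return {name: (min(s[name] for s in samples), max(s[name] for s in samples))
            for name in names}


def outside_training_range(query: Dict[str, float], ranges: Ranges) -> List[str]:
    """Names of the input variables whose value falls outside the training range."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= query[name] <= hi]
```

For example, with training samples `[{"V": 0.1, "h": 0.5, "Tp": 3.0}, {"V": 0.9, "h": 1.8, "Tp": 7.0}]`, the query `{"V": 0.2, "h": 1.0, "Tp": 9.0}` is flagged for `"Tp"`. A per-variable check of this kind cannot detect sparsely sampled combinations of variables that are individually in range, such as the low-velocity, long-period region discussed above, so an empty result does not by itself guarantee a physically sound prediction.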

Fig. 1. Tripod observations: (A) Depth-integrated suspended sediment transport (a log-scale is employed), (B) water depth, (C) significant wave height, (D) peak wave period, (E) alongshore velocity. The vertical dashed lines separate data obtained from different tripods (tripods were measuring simultaneously, but each dataset is characterized by a different length because of the different time that each tripod was submerged). The vertical solid lines at observation 1522 mark the division between the "training" (first 66% of the data) and the "testing" (last 34% of the data) datasets.

Fig. 3. Comparison of measured and predicted (using the Bailard formula) values of depth-integrated suspended sediment transport. The solid line indicates equality and the dashed lines indicate a factor 5 difference between the predicted and observed values.

Fig. 4. Comparison of measured and predicted (using a calibrated version of the Bailard formula) values of depth-integrated suspended sediment transport. The solid line indicates equality and the dashed lines indicate a factor 5 difference between the predicted and observed values.

Fig. 5. Comparison of measured and predicted values of depth-integrated suspended sediment transport. Predictions have been made using an ANN with 4 inputs ($H_{m0}$, $h$, $T_p$, $V$) and 4 nodes in 1 hidden layer. The solid line indicates equality and the dashed lines indicate a factor 5 difference between the predicted and observed values.

Fig. 6. Comparison of measured and predicted values of depth-integrated suspended sediment transport. Predictions have been made using an ANN with 3 inputs and 3 nodes in 1 hidden layer. Inputs are: (A) $h$, $H_{m0}$, $V$; (B) $T_p$, $H_{m0}$, $V$; (C) $T_p$, $h$, $V$; (D) $h$, $H_{m0}$, $T_p$. The solid line indicates equality and the dashed lines indicate a factor 5 difference between the predicted and observed values.

Fig. 7. Sensitivity of the best-performing ANN (Fig. 6c; input variables are $V$, $h$ and $T_p$) to changes in the input variables. Colour-bar represents depth-integrated suspended sediment transport (kg/m/s). Notice the top subplots have a different colour-scale. In (A) and (B), $V$ is equal to 0.17 m/s (the mean value in the observations) and 0.7 m/s, respectively. In (C) $T_p$ is equal to 4.3 s (mean value) and in (D) $h$ is 0.94 m (mean value). White dots represent observations whose value of the fixed variable is within ±1 standard deviation from the fixed value.
Fig. 2. An idealized feed-forward ANN characterized by n input nodes (the predictors or independent variables), m hidden nodes and one output node (the prediction or dependent variable).