Boosting performance in machine learning of geophysical flows via scale separation

Recent advances in statistical and machine learning have opened the possibility of forecasting the behavior of chaotic systems using recurrent neural networks. In this article we investigate the applicability of such a framework to geophysical flows, which are known to involve multiple scales in length, time and energy, and to feature intermittency. We show that both multiscale dynamics and intermittency introduce severe limitations on the applicability of recurrent neural networks, for short-term forecasts as well as for the reconstruction of the underlying attractor. We suggest that possible strategies to overcome such limitations should be based on separating the smooth large-scale dynamics from the intermittent/small-scale features. We test these ideas on global sea-level pressure data for the past 40 years, a proxy of the atmospheric circulation dynamics. Better short- and long-term forecasts of sea-level pressure data can be obtained with an optimal choice of spatial coarse graining and time filtering.


Introduction
The advent of high-performance computing has paved the way for advanced analyses of high-dimensional datasets (Jordan and Mitchell, 2015; LeCun et al., 2015). Those successes have naturally raised the question of whether it is possible to learn the behavior of a dynamical system without resolving, or even without knowing, the underlying evolution equations. Such an interest is motivated on one side by the fact that many complex systems still lack a universally accepted state equation, e.g. brain dynamics (Bassett and Sporns, 2017) or macro-economic and financial systems (Quinlan et al., 2019), and, on the other, by the need to reduce the complexity of the dynamical evolution for systems whose underlying equations are known, e.g. geophysical and turbulent flows. Evolution equations are difficult to solve for large systems such as geophysical flows, so that approximations and parameterizations are needed for meteorological and climatological applications.

Here we aim at understanding the sensitivity of recurrent neural networks to multiscale dynamics and intermittency in a deeper way, while assessing the possibility of reducing its impact on prediction through simple noise reduction methods. The remainder of this article is organised as follows: first, we give an overview of the ESN method and provide the description of the systems used. Then, we show the results for the perturbed Lorenz 1963 equations, for the Pomeau-Manneville intermittent map, and for the Lorenz 1996 equations. We discuss the improvement in short-term prediction and the long-term attractor reconstruction obtained with the moving average filter. We conclude by testing these ideas on atmospheric circulation data.

Methods
"Reservoir computing" is a variant of recurrent neural networks (RNN) in which the input signal is connected to a fixed and random dynamical system called the reservoir (Hinaut, 2013). The principle of reservoir computing consists in first projecting the input signal onto a high-dimensional space in order to obtain a non-linear representation of the signal, and then performing a new projection (linear regression or ridge regression) between the high-dimensional space and the output units. In our study ESN are implemented as follows; the code is given in the appendix and shows the parameters used for the computations. Let u(t) be the K-dimensional observable consisting of t = 1, 2, ..., T time iterations, originating from a dynamical system, and let r(t) be the N-dimensional reservoir state. Then:

r(t + dt) = tanh(W r(t) + W_in u(t)),    (1)

where W is the adjacency matrix of the reservoir: its dimensions are N × N, and N is the number of neurons of the reservoir.
In ESN, the neuron layers of classic deep neural networks are replaced by a single layer consisting of a sparsely connected random network, with coefficients uniformly distributed in [−0.5; 0.5]. W_in, with dimensions N × K, is the weight matrix of the connections between the input layer and the reservoir, and its coefficients are randomly sampled, as for W. The output of the network at time step t + dt is

W_out r(t + dt) = v(t + dt),    (2)

where v(t + dt) is the ESN prediction and W_out, with dimensions K × N, is the weight matrix of the connections between the reservoir neurons and the output layer. We estimate W_out via a ridge regression (Hastie et al., 2015):

W_out = U R^T (R R^T + λ I)^{-1},    (3)

where R is the N × T matrix collecting the reservoir states r(t), U is the K × T matrix collecting the training targets u(t), I is the identity matrix, and λ = 10^{-8}. In the prediction phase we have a recurrent relationship: the prediction v(t) is fed back as input, i.e.

r(t + dt) = tanh(W r(t) + W_in v(t)).    (4)
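To make the scheme above concrete, here is a minimal NumPy sketch of an ESN with this structure. It is not the authors' implementation (their code is given in their appendix); the function names, the reservoir density and the spectral-radius rescaling (a standard echo-state ingredient not spelled out in the text) are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_esn(K, N, density=0.02, spectral_radius=0.9):
    """Random sparse reservoir W (N x N) and input matrix W_in (N x K),
    coefficients uniform in [-0.5, 0.5] as in the text."""
    W = rng.uniform(-0.5, 0.5, size=(N, N))
    W *= rng.random((N, N)) < density          # sparsify
    # rescale to a spectral radius < 1 (echo-state property; our choice)
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, size=(N, K))
    return W, W_in

def train_esn(u, W, W_in, lam=1e-8):
    """Drive the reservoir with u (T x K), then ridge-regress W_out,
    i.e. W_out = U^T R (R^T R + lam I)^(-1) with R stored as T x N."""
    T, K = u.shape
    N = W.shape[0]
    R = np.zeros((T, N))
    r = np.zeros(N)
    for t in range(T - 1):
        r = np.tanh(W @ r + W_in @ u[t])
        R[t + 1] = r
    W_out = u.T @ R @ np.linalg.inv(R.T @ R + lam * np.eye(N))
    return W_out, r

def predict_esn(r, W, W_in, W_out, steps):
    """Closed-loop prediction: the output v is fed back as input."""
    out = []
    for _ in range(steps):
        v = W_out @ r
        out.append(v)
        r = np.tanh(W @ r + W_in @ v)
    return np.array(out)
```

In the training loop the reservoir is driven by the data (open loop); only in `predict_esn` does the recurrence close on the prediction, as in the prediction phase described above.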

ESN performance indicators
In this paper, we use three different indicators of performance of the ESN.

Rejection rate φ

As a first diagnostic of the performance of ESN, we aim at assessing whether the marginal distribution of the forecast values for a given dynamical system is significantly different from the invariant distribution of the system itself. To this purpose, we conduct a χ² test (Cochran, 1952), designed as follows. Let U be a system observable with support R_U and probability density function f_U(u), and let u(t) be a sample trajectory from U. Let now f̂_U(u) be an approximation of f_U(u), namely the histogram of u over i = 1, ..., M bins. Note that, if u spans the entire phase space, f̂_U(u) is the numerical approximation of the Sinai-Ruelle-Bowen measure of the dynamical system (Eckmann and Ruelle, 1985; Young, 2002). Let now V be the variable generated by the ESN forecasting, with support R_V = R_U, v(t) the forecast sample, g_V(v) its probability density function and ĝ_V(v) the histogram of the forecast sample. We test the null hypothesis that the marginal distribution of the forecast sample is the same as the invariant distribution of the system, against the alternative hypothesis that the two distributions are significantly different:

H_0: g_V = f_U    versus    H_1: g_V ≠ f_U.

Under H_0, the differences between the histogram entries f̂_U(i) and ĝ_V(i) are due to random errors, and are then independent and identically distributed Gaussian random variables. The test statistic Σ is the χ² distance between the two histograms (also denoted T below):

Σ = ∑_{i=1}^{M} [ĝ_V(i) − f̂_U(i)]² / f̂_U(i).    (5)

Statistical theory shows that, given H_0 true, the test statistic is distributed as a chi-squared random variable with M degrees of freedom, χ²(M). Then, to test the null hypothesis at the level α, the observed value of the test statistic Σ is compared to the critical value corresponding to the 1 − α quantile of the chi-squared distribution, Σ_c = χ²_{1−α}(M): if Σ > Σ_c, the null hypothesis must be rejected in favour of the specified alternative.
In our setup, we encounter two limitations in using the standard χ² test. First, problems may arise when f̂_U(u) = 0 in some bins, i.e. if the sample distribution does not span the entire support of the invariant distribution of the system. We observe this in a relatively small number of cases; since aggregating the bins would introduce unwanted complications, we decided to discard the pathological cases, controlling the effect empirically as described below. Moreover, even producing relatively large samples, we are not able to actually observe the invariant distribution of the considered system, which would require much longer simulations. As a consequence, we would observe excessive rejection rates when testing samples generated under H_0.

We decided to control these two effects by using a Monte Carlo approach. To this purpose, we generate 10000 samples using the system equations under the null hypothesis, and we compute the test statistic for each one according to Eq. (5). Then, we use the (1 − α) quantile of the empirical distribution of Σ, instead of the theoretical χ²(M), to determine the critical threshold Σ_c. As a last remark, we notice that we are making inference in a repeated-tests setting, as the performance of the ESN is tested 10000 times. Performing a high number of independent tests at a chosen level α increases the observed rejection rate: in fact, even if the samples are drawn under H_0, extreme events become more likely, resulting in an increased probability of erroneously rejecting the null hypothesis. To avoid this problem, we apply the Bonferroni correction (Bonferroni, 1936), testing each one of the m = 10000 available samples at the level α̃ = α/m, with α = 0.05. Averaging the test results over several sample pairs u(t), v(t), we obtain a rejection rate 0 ≤ φ ≤ 1 that we use to measure the adherence of the ESN trajectories v(t) to trajectories obtained via the equations. If φ = 0, almost all the ESN trajectories shadow original trajectories; if φ = 1, none of the ESN trajectories resemble those of the system of equations.
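The Monte Carlo version of the test can be sketched as follows. This is our illustrative reading of the procedure, not the authors' code: the function names, the bin count and the sample sizes in the example are our choices.

```python
import numpy as np

def chi2_stat(sample, ref_sample, bins=20):
    """Chi-squared distance (Eq. 5) between the histogram of `sample`
    and the rescaled histogram of a long reference trajectory."""
    f, edges = np.histogram(ref_sample, bins=bins)
    g, _ = np.histogram(sample, bins=edges)
    mask = f > 0                                # discard pathological empty bins
    f, g = f[mask].astype(float), g[mask].astype(float)
    f *= len(sample) / f.sum()                  # rescale reference counts
    return np.sum((g - f) ** 2 / f)

def rejection_rate(forecast_samples, null_sampler, ref_sample,
                   bins=20, n_mc=1000, alpha=0.05):
    """phi: fraction of forecast samples whose chi2 distance exceeds an
    empirical (Monte Carlo) critical value, Bonferroni-corrected to alpha/m."""
    null_stats = np.array([chi2_stat(null_sampler(), ref_sample, bins)
                           for _ in range(n_mc)])
    m = len(forecast_samples)
    crit = np.quantile(null_stats, 1.0 - alpha / m)
    return np.mean([chi2_stat(v, ref_sample, bins) > crit
                    for v in forecast_samples])
```

Samples drawn from the same distribution as the reference should yield φ near 0, while samples from a clearly different distribution should yield φ near 1.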

Predictability Horizon
As a measure of the predictability horizon of the ESN forecast compared to the equations, we use the root mean square error (RMSE):

RMSE(t) = sqrt( (1/K) ∑_{k=1}^{K} [v_k(t) − u_k(t)]² ),    (6)

and we define the predictability horizon τ_s as the first time at which the RMSE exceeds a certain threshold s. We link s to the average separation of observations in the observable U, and we fix s accordingly. We have tested the sensitivity of the results against the exact definition of s.
We interpret τ s as a natural measure of the Lyapunov time ϑ, namely the time it takes for an ensemble of trajectories of a dynamical system to diverge (Faranda et al., 2012;Panichi and Turchetti, 2018).
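A minimal implementation of these two quantities can read as follows; the function names are ours, and the convention of returning the full sample length when the threshold is never exceeded is an assumption.

```python
import numpy as np

def rmse(v, u):
    """RMSE between forecast v and truth u at each time (axis 0 = time),
    averaging over the K observable components."""
    diff2 = (np.asarray(v) - np.asarray(u)) ** 2
    return np.sqrt(diff2.reshape(len(diff2), -1).mean(axis=1))

def predictability_horizon(v, u, s):
    """First time index at which RMSE(t) exceeds the threshold s."""
    err = rmse(v, u)
    above = np.nonzero(err > s)[0]
    return int(above[0]) if above.size else len(err)
```

For a forecast whose error grows linearly in time, τ_s is simply the first step at which the growing error crosses s.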

Initial Forecast Error
The initial error is given by η = RMSE(t = 1), for the first time step after the initial condition at t = 0. We expect η to decrease as the training time increases: the longer the training phase, the smaller the initial error will be.

Moving average filter
Equipped with these indicators, we analyze two sets of simulations performed with and without smoothing, the latter implemented using a moving average filter. The moving average operation is the integral of u(t) between t and t − w, where w is the window size of the moving average. The simple moving average filter can be seen as a nonparametric time series smoother (see e.g. Brockwell and Davis (2016), chapter 1.5). It can be applied to smooth out (relatively) high frequencies in a time series, both to de-noise the observations of a process and to estimate trend-cycle components, if present. Moving averaging consists, in practice, of replacing the observation u(t) by a value u^(f)(t), obtained by averaging the previous w observations. If the time dimension is discrete (as in the Pomeau-Manneville system) it is defined as:

u^(f)(t) = (1/w) ∑_{k=0}^{w−1} u(t − k),    (7)

while for continuous time systems (like the Lorenz 1963 system), the sum is formally replaced by an integral:

u^(f)(t) = (1/w) ∫_{t−w}^{t} u(s) ds.    (8)

We can define the residuals as:

δu(t) = u(t) − u^(f)(t).    (9)

In practice, the computation always refers to the discrete time case, as continuous time systems are also sampled at finite time steps. Since Echo State Networks are known to be sensitive to noise (see e.g. Shi and Han (2007)), we exploit the simple moving average filter to smooth out high-frequency noise and assess the results for different smoothing windows w. We find that the choice of the moving averaging window w must respect two conditions: it should be large enough to smooth the noise but smaller than the characteristic time τ of the large-scale fluctuations of the system. For chaotic systems, τ can be derived knowing the rate of exponential divergence of the trajectories, a quantity linked to the Lyapunov exponents (Wolf et al., 1985); τ is known as the Lyapunov time.
We also remark that we can express explicitly the original variable u(t) as a function of the filtered variable u^(f)(t) as:

u(t) = w u^(f)(t) − ∑_{k=1}^{w−1} u(t − k).    (10)

We will test this formula for stochastically perturbed systems to evaluate the error introduced by the use of the residuals δu.

Testing ESN on filtered dynamics
Here we describe the algorithm used to test ESN performance on filtered dynamics:

1. Simulate the reference trajectory u(t) using the equations of the dynamical system, where u(t) has been standardized by subtracting the mean and dividing by its standard deviation.

2. Apply the moving average filter to obtain u^(f)(t).

3. Train the ESN on the filtered trajectory u^(f)(t) for t ≤ T_train.

4. Produce the ESN forecast v^(f)(t) for t > T_train.

5. Extract from the training period the residuals δu(t) (Eq. 9).

6. Obtain the forecast of the original variable as v(t) = v^(f)(t) + δu(t), where δu(t) is sampled from the residuals of the training period.

7. Compare v(t) and u(t > T_train) using the metrics φ, τ_s and η.

As an alternative to step 6, one can also use Eq. (10) and obtain:

v(t) = w v^(f)(t) − ∑_{k=1}^{w−1} v(t − k),    (11)

which does not require the use of the residuals δu(t).
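The inversion of the causal moving average, Eq. (10), and its forecast analogue, Eq. (11), can be checked numerically with a one-line function; the function name is ours.

```python
import numpy as np

def invert_moving_average(uf_t, u_past, w):
    """Eq. (10): u(t) = w * u_f(t) - sum_{k=1}^{w-1} u(t-k),
    where `u_past` holds the raw series up to time t-1."""
    return w * uf_t - np.sum(u_past[-(w - 1):])
```

Applied recursively to the forecast v^(f)(t), the same identity yields Eq. (11), with previously reconstructed values of v playing the role of the past raw observations.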

Lorenz 1963 equations
The Lorenz (Lorenz, 1963) system is a simplified model of Rayleigh-Bénard convection, derived by E.N. Lorenz. It is an autonomous continuous dynamical system with three variables u ∈ {x, y, z} parametrizing respectively the convective motion, the horizontal temperature gradient and the vertical temperature gradient. It writes:

dx/dt = σ(y − x) + εξ_x(t),
dy/dt = x(ρ − z) − y + εξ_y(t),    (12)
dz/dt = xy − bz + εξ_z(t),

where σ, ρ and b are three parameters, σ mimicking the Prandtl number and ρ the reduced Rayleigh number. The Lorenz model is usually defined using Eq. (12), with σ = 10, ρ = 28 and b = 8/3. A deterministic trajectory of the system is shown in Figure 1a). It has been obtained by integrating the Lorenz equations numerically with an Euler scheme (dt = 0.001). The system is perturbed via additive noise of intensity ε: ξ_x(t), ξ_y(t) and ξ_z(t) are random variables all drawn from a Gaussian distribution. The initial conditions are randomly selected within a long trajectory of 5·10^6 iterations.

First, we study the dependence of the ESN on the training length in the deterministic system (ε = 0, Figure 1b), finding an improvement of a factor ∼10 when the moving average procedure is applied. We now study in a more systematic way the dependence of the ESN performance on the noise intensity ε and the network size N, for three different averaging windows w = 0, w = 0.01, w = 0.05. We produce, for each combination, 100 ESN forecasts. Figure 3 shows φ (a), log(τ_{s=1}) (b) and log(η) (c) computed setting u ≡ x, the first variable of the Lorenz 1963 system (results do not qualitatively depend on the chosen variable). In each panel, from left to right, the moving average window increases; upper sub-panels are obtained using the exact expression in Eq. (11) and lower sub-panels using the residuals in Eq. (9). For increasing noise intensity and for small reservoir sizes, the performances without moving average (left sub-panels) rapidly get worse. The moving average smoothing with w = 0.01 (central sub-panels) improves the performance for log(τ_{s=1}) (b) and log(η) (c), except when the noise is too large (ε = 1).
When the moving average window is too large (right sub-panels), the performances in terms of φ decrease. This failure can be attributed to the fact that the residuals δu (Eq. 9) are of the same order of magnitude as the ESN predicted fields for large w. Indeed, if we use the formula provided in Eq. (11) as an alternative to step 6, we can evaluate the error introduced by the residuals. The results shown in Figure 3 suggest that the residuals can be used without problems when the noise is small compared with the dynamics. When ε is close to one, the residuals overlay the deterministic dynamics and the ESN forecasts are poor. In this case, the exact formulation in Eq. (11) performs much better.
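The stochastically perturbed Euler integration used above can be sketched as follows. The √dt scaling of the noise (Euler-Maruyama) is our assumption, as this excerpt does not state how the noise enters the discretization; the function name is ours.

```python
import numpy as np

def lorenz63_noisy(T, dt=1e-3, sigma=10.0, rho=28.0, b=8.0 / 3.0,
                   eps=0.0, x0=(1.0, 1.0, 1.0), seed=0):
    """Euler integration of the Lorenz 1963 system (Eq. 12) with additive
    Gaussian noise of intensity eps (eps = 0 gives the deterministic system)."""
    rng = np.random.default_rng(seed)
    u = np.empty((T, 3))
    u[0] = x0
    for t in range(T - 1):
        x, y, z = u[t]
        xi = rng.standard_normal(3)
        u[t + 1, 0] = x + dt * sigma * (y - x) + eps * np.sqrt(dt) * xi[0]
        u[t + 1, 1] = y + dt * (x * (rho - z) - y) + eps * np.sqrt(dt) * xi[1]
        u[t + 1, 2] = z + dt * (x * y - b * z) + eps * np.sqrt(dt) * xi[2]
    return u
```

Running with eps=0 reproduces the deterministic attractor of Figure 1a); increasing eps progressively blurs the small-scale structure that the moving average is then meant to remove.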

Pomeau-Manneville intermittent map
Several dynamical systems, including the Earth's climate, display intermittency, i.e., the time series of a variable issued by the system can experience sudden chaotic fluctuations, as well as a predictable behavior where the observables have small fluctuations. In atmospheric dynamics, such a behavior is observed in the switching between zonal and meridional phases of the mid-latitude dynamics if a time series of the wind speed at one location is observed: when a cyclonic structure passes through the area, the wind has high values and large fluctuations; when an anticyclonic structure is present, the wind is low and fluctuations are smaller (Weeks et al., 1997; Faranda et al., 2016). It is then of practical interest to study the performance of ESN on Pomeau-Manneville predictions, as this map is a first prototypical example of the intermittent behavior found in climate data.

https://doi.org/10.5194/npg-2020-39. Preprint. Discussion started: 17 September 2020. © Author(s) 2020. CC BY 4.0 License.

In particular, the Pomeau-Manneville (Manneville, 1980) map is probably the simplest example of intermittent behavior, produced by a 1D (here u = x) discrete deterministic map given by:

x(t + 1) = [x(t) + x(t)^{1+a}] mod 1,

where 0 < a < 1 is a parameter. We use a = 0.91 in this study and a trajectory consisting of 5×10^5 iterations. The system is perturbed via additive noise ξ(t) drawn from a Gaussian distribution. It is well known that Pomeau-Manneville systems exhibit sub-exponential separation of nearby trajectories, so that the Lyapunov exponent is λ = 0. However, one can define a Lyapunov exponent for the non-ergodic phase of the dynamics and extract a characteristic time scale (Korabel and Barkai, 2009). From this latter reference, we can derive a value λ ≈ 0.2 for a = 0.91, implying w < τ ≈ 5. We find that the best match between ESN and equations in terms of the φ indicator is obtained for w = 3.
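A sketch of the (optionally noisy) Pomeau-Manneville iteration, assuming the standard form of the map given above; the function name and the way the noise is folded back into [0, 1) are our choices.

```python
import numpy as np

def pomeau_manneville(n, a=0.91, x0=0.1, eps=0.0, seed=0):
    """Iterate x(t+1) = (x(t) + x(t)**(1+a)) mod 1, optionally with
    additive Gaussian noise of intensity eps."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        step = x[t] + x[t] ** (1.0 + a) + eps * rng.standard_normal()
        x[t + 1] = step % 1.0
    return x
```

Near x = 0 the increment x^{1+a} is tiny, which produces the long laminar (quasi-predictable) phases; once x approaches 1, the modulo reinjection triggers the chaotic bursts.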

Results for the Pomeau-Manneville map are shown in Figure 4. We first observe that the ESN forecast of the dynamics of the Pomeau-Manneville map is much more challenging than for the Lorenz system, as a consequence of the intermittent behavior of this system. For the simulations performed with w = 0, the ESN cannot simulate an intermittent behavior, for all noise intensities and reservoir sizes. This is reflected in the behavior of the indicators. In the deterministic limit, the ESN fails to reproduce the invariant density in 80% of the cases (φ ≈ 0.8). For intermediate noise intensities φ > 0.9 (Figure 4a). The predictability horizon log(τ_{s=0.5}) for the short-term forecast is small (Figure 4d) and the initial error large (Figure 4g). The moving average procedure with w = 3 partially improves the performances (Figure 4b).

Lorenz 1996 equations

Before running the ESN algorithm on actual climate data, we test our idea on a more sophisticated, and yet still idealized, model of atmospheric dynamics, namely the Lorenz 1996 equations (Lorenz, 1996). This model explicitly separates two scales and therefore provides a good test for our ESN algorithm. The Lorenz 1996 system consists of a lattice of large-scale resolved variables X, coupled to small-scale variables Y whose dynamics can be intermittent, so that u ∈ {X, Y}. The model is defined via two equations:

dX_i/dt = X_{i−1}(X_{i+1} − X_{i−2}) − X_i + F − (hc/b) ∑_{j=1}^{J} Y_{j,i},
dY_{j,i}/dt = cb Y_{j+1,i}(Y_{j−1,i} − Y_{j+2,i}) − c Y_{j,i} + (hc/b) X_i,

where i = 1, ..., I and j = 1, 2, ..., J index respectively the large-scale X and small-scale Y variables. Large-scale variables are meant to represent the meanders of the jet stream driving the weather at mid-latitudes. The first term on the right-hand side represents advection, the second diffusion, while F mimics an external forcing. The system is controlled via the parameters b and c (the time scale of the fast variables compared to the slow ones) and via h (the coupling between large and small scales).
From now on, we fix I = 30, J = 5 and F = b = 10, as these parameters are typically used to explore the behavior of the system (Frank et al., 2014). We integrate the equations with an Euler scheme (dt = 10^{-3}) from the initial conditions Y_{j,i} = X_i = F, where only one mode is perturbed as X_{i=1} = F + ε and Y_{j,i=1} = F + ε/2, with ε = 10^{-3}. We discard about 2·10^3 iterations to reach a stationary state on the attractor, and we retain 5·10^4 iterations. When c and h vary, different interactions between large and small scales can be achieved. A few examples of simulations of the first modes X_1 and Y_1 are given in Figure 6.

In the Lorenz 1996 model we can explore what happens to the ESN performances if we turn intermittency and/or the small-to-large-scale coupling on and off, without introducing any additional noise term. Moreover, we can learn the Lorenz 1996 dynamics either on the X variables only, or on both the X and Y variables. The purpose of this analysis is to assess whether the ESN are capable of learning the dynamics of the large-scale variables X alone, and how this capability is influenced by the coupling and the intermittency of the small-scale variables Y. Using the same simulations presented in Figure 6, we train the ESN on the first 2.5·10^4 iterations, and then perform, changing the initial conditions, 100 different ESN predictions for 2.5·10^4 more iterations. We apply our performance indicators not to the entire I-dimensional variable (X_1, ..., X_I), as the χ² test becomes intractable in high dimensions, but rather to the spatial average of the large-scale variables X. The behavior of each variable X_i is similar, so the average is representative of the collective behavior. The rate of failure φ is very high (not shown) because, even when the dynamics is well captured by the ESN, the variables are not scaled and centered as those of the original system.
For the following analysis, we therefore replace φ with the χ² distance T (Eq. (5)). The use of T allows for better highlighting the differences in the ESN performance with respect to the chosen parameters. The same considerations also apply to the analysis of the sea-level pressure data reported in the next paragraph. Results of the ESN simulations for the Lorenz 1996 system are reported in Figure 7. In Figure 7a,c,e) ESN predictions are obtained by varying c at fixed h = 1, while in Figure 7b,d,f) by varying h at fixed c = 10. The continuous lines refer to results obtained feeding the ESN with only the X variables, dotted lines with both X and Y. For the χ² distance T (Figure 7a,b), performances show a large dependence on both the intermittency c and the coupling h. First of all, we remark that learning both the X and Y variables leads to higher distances T, except for the non-intermittent case, c = 1. For c > 1, the dynamics learnt on both X and Y never settles on a stationary state resembling that of the Lorenz 1996 model. When c > 1 and only the dynamics of the X variables is learnt, the dependence on N when h is varied is non-monotonic, and better performances are achieved for 800 < N < 1200. For this range, the dynamics settles on stationary states whose spatio-temporal evolution resembles that of the Lorenz 1996 model, although the variability of time and spatial scales is different from the target. An example is provided in Figure 8, for N = 800.
Let us now analyse the two indicators of short-term forecasts. Figure 7c,d) displays the predictability horizon τ_s with s = 1.
The best performances are achieved for the non-intermittent case c = 1 when learning both X and Y. When only X is learnt, we again get better performances in terms of τ_s for rather small network sizes. The performances for c > 1 are better when only the X variables are learnt. The good performance of ESN in learning only the large-scale variables X is even more surprising when looking at the initial error η (Figure 7), which is one order of magnitude smaller when X, Y are learnt. Despite this advantage in the initial conditions, the ESN performances on (X, Y) are better only when the dynamics of Y is non-intermittent. We find clear indications that large intermittency (c = 25) and strong small-to-large-scale coupling (h = 1) worsen the ESN performances, supporting the claims made for the Lorenz 1963 and the Pomeau-Manneville systems.

The NCEP sea-level pressure data

We now test the effectiveness of the moving average procedure in learning the behavior of multiscale and intermittent systems on climate data issued by reanalysis projects. We use data from the National Centers for Environmental Prediction (NCEP) version 2 (Saha et al., 2014) with a horizontal resolution of 2.5°. We adopt the global 6-hourly sea-level pressure (SLP) field from 1979 to 31/08/2019 as the meteorological proxy for the atmospheric circulation. It traces cyclones (resp. anticyclones) with minima (resp. maxima) of the SLP field. The major modes of variability affecting mid-latitude weather are often defined in terms of the Empirical Orthogonal Functions (EOF) of SLP, and a wealth of other atmospheric features (Hurrell, 1995; Moore et al., 2013), ranging from teleconnection patterns to storm-track activity to atmospheric blocking, can be diagnosed from the SLP field.

In addition to the time moving average filter, we also investigate the effect of spatially coarse-graining the SLP fields by a factor c and performing the learning on the reduced fields. We use the nearest-neighbour approximation, which consists in taking from the original dataset the value closest to each point of the coarse grid. Compared with methods based on averaging or on dimension reduction techniques such as EOFs, the nearest-neighbour approach has the advantage of not removing the extremes (except if the extreme does not lie on one of the retained gridpoints) and of preserving cyclonic and anticyclonic structures. For c = 2 we obtain a horizontal resolution of 5° and for c = 4 a resolution of 10°. For c = 4 the information on the SLP field close to the poles is lost. However, in the remainder of the geographical domain, the coarse-grained fields still capture the positions of cyclonic and anticyclonic structures. Indeed, as shown in Faranda et al. (2017), this coarse-grained field still preserves the dynamical properties of the original one. There is therefore a certain amount of redundant information in the original 2.5° horizontal resolution SLP fields.
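On a regular grid whose coarse points coincide with fine gridpoints, nearest-neighbour coarse-graining by a factor c amounts to subsampling every c-th value. The sketch below (our function name) also illustrates the caveat mentioned above: an extreme survives only if it sits on a retained gridpoint.

```python
import numpy as np

def coarse_grain_nn(field, c):
    """Nearest-neighbour coarse-graining by an integer factor c:
    keep every c-th gridpoint of a (nlat, nlon) field."""
    return field[::c, ::c]
```

This is deliberately not an average: retained values are actual observations of the field, so cyclonic/anticyclonic minima and maxima that fall on kept gridpoints pass through unmodified.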
The dependence of the quality of the prediction for the sea-level pressure NCEPv2 data on the coarse-graining factor c and on the moving average window size w is shown in Figure 9. We show the results obtained using the residuals (Eq. 9). The best overall performance is obtained for w = 12 h and c = 4: this is indeed the case where, with the smallest network, we get almost the minimal χ² distance T, the highest predictability (32 h) and one of the lowest initial errors. We also remark that, for c = 0 (panels (c) and (i)), the fit always diverges for small network sizes.
In order to assess the performances of the two ESNs, with and without moving average, in a more quantitative way, we present the space-time distributions in Figure 10a). The distribution obtained for the moving average w = 12 h has more realistic tails and matches that of the target data better than the run with w = 0 h. Figure 10b-d) shows the wavelet spectrograms (or scalograms) (Hudgins et al., 1993). The scalogram is the absolute value of the continuous wavelet transform of a signal, plotted as a function of time and frequency. The target-data spectrogram (b) presents a rich structure at different frequencies and some interannual variability. The wavelet spectrogram of the non-filtered ESN run w = 0 h (c) shows no short-time variability and too large interseasonal and interannual variability. The spectrogram of the target data is better matched by the run with w = 12 h (d), which shows larger variability on time scales of days to weeks.

Discussion
We have analysed the performance of ESN in reproducing both the short- and long-term dynamics of observables of geophysical flows. The motivation for this study came from the evidence that a straightforward application of ESN to high-dimensional geophysical data (such as the 6-hourly global gridded sea-level pressure data) does not yield the same quality of results obtained by Pathak et al. (2018) for the Lorenz 1963 and the Kuramoto-Sivashinsky models. Here we have investigated the causes of this behavior and identified two main bottlenecks: (i) intermittency and (ii) the presence of multiple dynamical scales, which both appear in geophysical data. In order to illustrate this effect, we have first analysed two low-dimensional systems, namely the Lorenz (1963) equations and the Pomeau-Manneville (Manneville, 1980) map.

Our results may appear rather counter-intuitive, as the weather and climate modelling communities are moving towards extending simulations of physical processes to small scales. As an example, we cite the use of highly-resolved convection-permitting simulations (Fosser et al., 2015) as well as the use of stochastic (and therefore non-smooth) parameterizations in weather models (Weisheimer et al., 2014). We have, however, a few heuristic arguments on why the coarse-graining and filtering operations should improve the ESN performances. First of all, the moving-average operation helps both by smoothing the signal and by providing the ESN with wider temporal information. In some sense, this is reminiscent of the embedding procedure (Cao, 1997), where the signal behavior is reconstructed by providing information not only on the previous time step, but on earlier times, depending on the complexity. The filtering procedure can also be motivated by the fact that the active degrees of freedom for the sea-level pressure data are limited. This has been confirmed by Faranda et al. (2017) via coarse-graining these data and showing that the active degrees of freedom are independent of the resolution, in the same range explored in this study.
In other words, including small scales in the learning of sea-level pressure data does not provide additional information on the dynamics and pushes towards over-fitting and saturating the ESN with redundant information. The latter consideration also poses some caveats on the generality of our results: we believe that this procedure is not beneficial whenever a clear separation of scales is not achievable, e.g. in non-confined 3-D turbulence. Moreover, in this study, three sources of stochasticity were present: (i) in the random matrices of the reservoir, (ii) in the perturbed initial conditions and (iii) in the ESN simulations when using moving average filtered data with sampled δu components. The first one is inherent to the model definition. The perturbations of the starting conditions allow characterizing the sensitivity of our ESN approach to the initial conditions. The stochasticity induced by the additive noise δu provides a distributional forecast at each time t. Although this latter noise can be useful to simulate multiple trajectories and evaluate their long-term behaviour, in practice, i.e., in the case where an ESN would be used operationally to generate forecasts, one might not want to employ a stochastic formulation with additive noise, but rather the explicit and deterministic formulation in Eq. (11). This exemplifies the interest of our ESN approach for a possible distinction between forecasts and long-term simulations, and therefore makes it flexible to adapt to the case of interest.

In future work, it will be interesting to use other learning architectures and other methods of separating large- from small-scale components (Wold et al., 1987; Froyland et al., 2014; Kwasniok, 1996). More broadly, our results provide a more formal framework for applications of machine learning techniques to geophysical data. Deep-learning approaches have proven useful in performing learning at different time and spatial scales whenever each layer is specialized in learning some specific features.

Hamid, Cedric Nguounge Langue and Giulia Carella performed the analysis. All the authors contributed to writing and discussing the results of the paper.