Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970–1999) and two scenarios (RCP 2.6 and 8.5; 2070–2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

The forecasting of meteorological phenomena is a complex task. The
mathematical, statistical, and dynamic methods developed in recent decades
help address the problem, but there is still a need to investigate new
techniques to improve the results. One of these techniques is downscaling,
which refines the coarse spatial scale of a model's output.
Downscaling techniques can be divided into two broad categories: dynamic and
statistical. Dynamic techniques focus on numerical models with more detailed
resolution, while statistical (or empirical) techniques use transfer
functions between scales. Currently, numerical weather prediction (NWP)
models can forecast various meteorological variables with acceptable accuracy.

Specifically, rainfall is of great interest, both for its climatic and
meteorological relevance and for its direct effect on agricultural output, hydropower
generation, and other important economic factors. However, it is one of the
most difficult variables to forecast, because of its inherent spatial and
temporal variability.

In this context, the aim of this study is to perform a statistical downscaling to estimate rainfall over South America (SA), based on some of the models used in the Fifth Assessment Report of the IPCC (Intergovernmental Panel on Climate Change), by applying artificial neural networks and multiple linear regression using principal components.

List of models from the CMIP5 data set used in this study.

We used monthly precipitation simulations for the austral summer (December–January–February)
and winter (June–July–August) generated by 10 models
(Table

We focus on South America because it is one of the planet's regions that
will be most affected by the climate change projected for the end of the 21st
century.

Illustration of the study areas of the defined regions.

An ANN is a system inspired by the operation of biological neurons and
designed to learn the behavior of a given system. An ANN is built by
presenting a stimulus to the neuronal model, calculating the output, and
adjusting the weights until the desired output is achieved. An input is
submitted to the ANN along with a desired target, that is, a defined response
for the output (in which case the training is regarded as supervised). An
error signal is built from the difference between the desired response and
the output of the system. This error information is fed back to the system,
which adjusts its parameters in a systematic way; in other words, the
error backpropagation algorithm is used to train the network. According to

Structure of the artificial neural network.

Absolute error as a function of the number of iterations; AMZ (green), NEB (red), and LPB (black). Continuous lines represent the summer period for each region, and the dashed lines represent winter.

In the first phase, the functional signal based on the inputs propagates through the network until it generates an output, with the synaptic weights held fixed. In the second phase, the output is compared with a target, producing an error signal. The error signal propagates back from the output to the input, and the weights are adjusted so as to minimize the error. The process is repeated until the performance is acceptable. As such, the performance of the ANN is strongly dependent on the data source.
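The two phases can be illustrated with a minimal sketch. It is not the network used in this study: the single hidden layer of four tanh units, the learning rate, the number of epochs and the synthetic data are all assumptions chosen for illustration, and the function name `train_mlp` is hypothetical.

```python
import math
import random


def train_mlp(inputs, targets, n_hidden=4, rate=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer network with error backpropagation.

    Returns a predict function and the mean absolute error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(inputs[0])
    # Hidden weights: one row per hidden unit, last entry is the bias.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    # Output weights: one per hidden unit, last entry is the bias.
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        # Phase 1: propagate the signal with the weights held fixed.
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in w_hid]
        output = sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1]
        return hidden, output

    history = []
    for _ in range(epochs):
        abs_error = 0.0
        for x, target in zip(inputs, targets):
            hidden, output = forward(x)
            # Phase 2: compare with the target and send the error backwards.
            error = output - target
            abs_error += abs(error)
            # Error reaching each hidden unit (derivative of tanh is 1 - tanh^2).
            delta_hid = [error * w_out[j] * (1.0 - hidden[j] ** 2)
                         for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= rate * error * hidden[j]
                for i in range(n_in):
                    w_hid[j][i] -= rate * delta_hid[j] * x[i]
                w_hid[j][-1] -= rate * delta_hid[j]
            w_out[-1] -= rate * error
        history.append(abs_error / len(inputs))

    return (lambda x: forward(x)[1]), history


# Synthetic nonlinear toy data (not precipitation).
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
y = [math.sin(2 * a) + b * b for a, b in X]
predict, history = train_mlp(X, y)
print(f"MAE first epoch: {history[0]:.3f}, last epoch: {history[-1]:.3f}")
```

The `history` list plays the role of an absolute-error curve as a function of the number of iterations.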

Residuals

The first part of the data is used for training, the second is used for cross-validation,
and the third part is used for testing. The architecture of the
ANN used in the present study can be found in Fig.

The ANN used here is trained with 11 series as input (the outputs of the 10 models plus the observation data), and the network with the best performance is selected. We therefore expect the ANN to provide more reliable values (as judged by the error analysis of the simulated values) than the climate models alone.
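The division of the data into training, cross-validation and test parts can be sketched as follows. The proportions (60/20/20), the chronological ordering of the three parts and the series length are assumptions made for illustration; the function name `chronological_split` is hypothetical.

```python
def chronological_split(series, train_frac=0.6, valid_frac=0.2):
    """Split a time-ordered sequence into training, cross-validation and test parts."""
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(series)
    train_end = round(n * train_frac)
    valid_end = train_end + round(n * valid_frac)
    return series[:train_end], series[train_end:valid_end], series[valid_end:]


# 90 time steps, represented here by their indices (illustrative length).
months = list(range(90))
train, valid, test = chronological_split(months)
print(len(train), len(valid), len(test))
```

Keeping the three parts in time order avoids using later months to tune a network that is then tested on earlier ones; a random split would be an equally valid assumption if the samples were treated as independent.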

Proportion and cumulative proportion of variance for the indicated regions. Left column for summer, and right column for winter.

MLR is a statistical technique that consists of
finding a linear relationship between a dependent (observed) variable and
more than one independent variable (outputs of the general circulation models (GCMs)). A multiple
regression model can be represented by the following equation:
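In standard notation (the symbols are assumed here and may differ from those of the original equation), the model for time step t with k predictors can be written as:

```latex
y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \varepsilon_t
```

where y_t is the observed precipitation, x_{j,t} is the output of the j-th GCM, \beta_0, ..., \beta_k are the regression coefficients, and \varepsilon_t is the residual.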

In spite of its obvious success in many applications, MLR suffers from
multicollinearity when the predictors are climatic variables, which are
often strongly correlated with one another. As a result, the parameter
estimates and their errors can be incorrectly interpreted.
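Regressing on principal components is one way around multicollinearity, because the component scores are mutually uncorrelated. The sketch below illustrates the idea under stated assumptions: the synthetic predictors, the choice of two components and the power-iteration eigen-solver are illustrative and are not taken from this study, and all function names are hypothetical.

```python
import math
import random


def covariance_matrix(rows):
    """Return the column means and sample covariance matrix of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return means, cov


def leading_components(cov, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    p = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        vec = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            nxt = [sum(work[a][b] * vec[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(v * v for v in nxt))
            vec = [v / norm for v in nxt]
        value = sum(vec[a] * work[a][b] * vec[b]
                    for a in range(p) for b in range(p))
        pairs.append((value, vec))
        # Deflation: subtract the component just found before the next search.
        for a in range(p):
            for b in range(p):
                work[a][b] -= value * vec[a] * vec[b]
    return pairs


def fit_pc_regression(rows, target, k):
    """Regress `target` on the first k principal components of `rows`."""
    means, cov = covariance_matrix(rows)
    pairs = leading_components(cov, k)
    total_var = sum(cov[j][j] for j in range(len(cov)))
    y_mean = sum(target) / len(target)
    coefs = []
    for _, vec in pairs:
        scores = [sum((r[j] - means[j]) * vec[j] for j in range(len(vec)))
                  for r in rows]
        # Scores are uncorrelated, so each slope can be fitted on its own.
        coefs.append(sum(s * (t - y_mean) for s, t in zip(scores, target))
                     / sum(s * s for s in scores))
    explained = [value / total_var for value, _ in pairs]

    def predict(row):
        centered = [row[j] - means[j] for j in range(len(means))]
        return y_mean + sum(c * sum(x * v for x, v in zip(centered, vec))
                            for c, (_, vec) in zip(coefs, pairs))

    return predict, explained


# Four strongly correlated synthetic "model outputs" sharing one signal.
rng = random.Random(42)
signal = [rng.gauss(0, 1) for _ in range(120)]
X = [[s + rng.gauss(0, 0.3) for _ in range(4)] for s in signal]
y = [5.0 + 2.0 * s + rng.gauss(0, 0.2) for s in signal]
predict, explained = fit_pc_regression(X, y, k=2)
print("proportion of variance:", [round(e, 3) for e in explained])
```

The `explained` list corresponds to the proportion of variance carried by each retained component; its running sum gives the cumulative proportion.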

MLR is commonly used in many research areas and is widely accepted by the scientific community, whereas ANNs are still gaining ground, especially in climate studies. Our intention is to show the advantages of using ANNs in weather and climate applications. The main advantage is their inherent nonlinearity, which allows a network to represent functions that a linear method (such as MLR) cannot. In addition, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in that decision.

After applying the ANNs to the precipitation simulations for the period
1970–1999, we obtained a final error after a number of iterations, which ranged
from 1 to 600 (Fig.

With respect to winter, the networks remained unstable for a longer time
before finding the minimum error. The NEB region stands out, as it
required the largest number of iterations, around 600. This is possibly
related to the greater variability of rainfall in this season (Fig.

According to

To validate the MLR, the following assumptions need to be met: (i) the
residuals must be randomly distributed around a mean of zero;
(ii) the residuals should have a normal distribution; and (iii) the variance
of the residuals must be homogeneous (homoscedasticity).
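The three assumptions can be screened informally as sketched below. These are descriptive diagnostics rather than formal hypothesis tests, and they are not the procedure used in this study; the function name `residual_diagnostics`, the synthetic data and the choice of statistics are assumptions made for illustration.

```python
import random
import statistics


def residual_diagnostics(observed, fitted):
    """Informal checks on regression residuals (not formal hypothesis tests)."""
    resid = [o - f for o, f in zip(observed, fitted)]
    mean = statistics.fmean(resid)
    sd = statistics.pstdev(resid)
    # Skewness and excess kurtosis are both close to 0 for a normal sample.
    skew = sum(((r - mean) / sd) ** 3 for r in resid) / len(resid)
    kurt = sum(((r - mean) / sd) ** 4 for r in resid) / len(resid) - 3.0
    # Compare the residual spread for low and high fitted values;
    # a ratio far from 1 hints at non-constant variance.
    ordered = [r for _, r in sorted(zip(fitted, resid))]
    half = len(ordered) // 2
    ratio = (statistics.pvariance(ordered[half:])
             / statistics.pvariance(ordered[:half]))
    return {"mean": mean, "skewness": skew,
            "excess_kurtosis": kurt, "variance_ratio": ratio}


# Synthetic fitted values with well-behaved Gaussian errors.
rng = random.Random(7)
fitted = [rng.uniform(50, 300) for _ in range(200)]
observed = [f + rng.gauss(0, 10) for f in fitted]
report = residual_diagnostics(observed, fitted)
print({name: round(value, 3) for name, value in report.items()})
```

A mean near zero addresses (i), skewness and excess kurtosis near zero address (ii), and a variance ratio near one addresses (iii). A formal analysis would replace these with dedicated normality and heteroscedasticity tests.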

The same as in Fig. 4 but for winter.

Figures

Tables

Change in monthly precipitation in terms of an increase or decrease by the end of this century (2071–2100) in the scenarios RCP 8.5 and 2.6, in relation to the reference period 1971–1999 (observation), in millimeters per month and percentage.
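The two measures of change named in this caption can be computed as in the sketch below. The precipitation values are hypothetical and chosen only to illustrate the arithmetic; the function name `precipitation_change` is likewise an assumption.

```python
def precipitation_change(reference_mm, future_mm):
    """Return the change in mm per month and as a percentage of the reference."""
    delta = future_mm - reference_mm
    percent = 100.0 * delta / reference_mm
    return delta, percent


# Hypothetical monthly means in mm per month, for illustration only.
delta, percent = precipitation_change(reference_mm=200.0, future_mm=230.0)
print(f"{delta:+.1f} mm/month ({percent:+.1f} %)")
```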

Table

In both scenarios, and employing both ANNs and MLR, an increase of
precipitation in the summer and a decrease in the winter can be observed.
These results corroborate the findings of

In the NEB region (Table

This paper investigated the applicability of artificial neural networks and multiple linear regression analysis by principal components as temporal downscaling methods for the generation of monthly precipitation over South America (for current years and future scenarios). Both the ANN and MLR methods provided a good fit to the observed data. This indicates that ANNs are a viable alternative for modeling precipitation time series and a potentially competitive tool when compared with the statistical model.

The future scenarios used (RCP 2.6, lower climate forcing, and RCP 8.5, higher climate forcing) indicate an increase in precipitation in summer and a reduction in precipitation during winter according to both methods.

In general, the results showed that ANNs produced more accurate results than MLR by PCs, which can be attributed to the fact that ANNs can represent nonlinear relationships that a linear method cannot. In addition, one of the advantages of ANNs is their capacity for temporal processing and thus their ability to incorporate not only concurrent but also several earlier predictor values as inputs without any additional effort.

We are grateful to CAPES and PPGCC/UFRN for financial support.

Edited by: V. Perez-Munuzuri
Reviewed by: two anonymous referees