
Explaining the high skill of reservoir computing methods in El Niño prediction
Francesco Guardamagna
Claudia Wieners
Henk A. Dijkstra
Accurate prediction of the extreme phases of the El Niño–Southern Oscillation (ENSO) is important to mitigate the socioeconomic impacts of this phenomenon. It has long been thought that prediction skill was limited to a 6-month lead time. However, machine learning methods have been shown to have skill at lead times of up to 21 months. In this paper, we aim to explain, for one class of such methods, i.e. reservoir computers (RCs), the origin of this high skill. Using a conditional nonlinear optimal perturbation (CNOP) approach, we compare the initial error propagation in a deterministic Zebiak–Cane (ZC) ENSO model and that in an RC trained on synthetic observations derived from a stochastic ZC model. Optimal initial perturbations at long lead times in the RC involve both sea surface temperature and thermocline anomalies, which leads to decreased error propagation compared to the ZC model, where the optimal initial perturbations are dominated by thermocline anomalies. This reduced error propagation allows the RC to provide a higher skill at long lead times than the deterministic ZC model.
The El Niño–Southern Oscillation (ENSO) phenomenon, driven by ocean–atmosphere interactions in the tropical Pacific, is one of the biggest sources of interannual climate variability (Neelin et al., 1998). The full ENSO cycle shows an irregular period of 2–7 years. During its warm (El Niño) and cold (La Niña) phases, ENSO strongly affects the climate all over the globe through well-known teleconnections (McPhaden et al., 2006), increasing the incidence of extreme weather events such as global droughts (Yin et al., 2022) and tropical cyclones (Wang et al., 2010). ENSO can therefore have a substantial impact on the worldwide economy (Liu et al., 2023a), and accurate and reliable forecasts are necessary to mitigate its socioeconomic consequences.
For this reason, ENSO modelling and forecasting have been a central topic of extensive research, which, thanks to the contribution of the Tropical Ocean–Global Atmosphere programme, led to the development of a complete hierarchy of models. This hierarchy includes conceptual models (Jin, 1997; Suarez and Schopf, 1988; Takahashi et al., 2019; Timmermann et al., 2003), intermediate complexity models (Zebiak and Cane, 1987; Battisti and Hirst, 1989), and global climate models (Planton et al., 2021). Many of these classical dynamical models can reasonably forecast ENSO for up to a lead time of 6 months, with a correlation between predictions and observations larger than 0.5 (Barnston et al., 2012), but their skill rapidly decreases for longer lead times.
In recent years, the application of machine learning (ML) techniques for predicting ENSO has significantly advanced (Bracco et al., 2024). Ham et al. (2019) showed that convolutional neural networks (CNNs) trained with CMIP5 and reanalysis data could obtain reasonable skill at lead times of up to about 17 months. Hu et al. (2021) advanced the CNN approach by integrating dropout and transfer learning with a residual CNN, obtaining a good performance for a lead time of up to 21 months. Long short-term memory (LSTM) networks, able to exploit the temporal dynamics present in the training data, have also been successfully applied to ENSO forecasting (Xiaoqun et al., 2020). More recent studies have combined LSTMs with other methods such as graph neural networks (Jonnalagadda and Hashemi, 2023), CNNs (Mahesh et al., 2019), and autoencoders (Jonnalagadda and Hashemi, 2023) to create hybrid models that boost performance, as they are able to capture both the spatial and the temporal dynamics present in the data. Reservoir computer (RC) methods, a special class of recurrent neural networks (RNNs), have shown strong performance in predicting ENSO (Hassanibesheli et al., 2022). The RC offers a good balance between performance and model simplicity, which enhances explainability and facilitates analysis of model predictions. Moreover, like other RNN-based models, the RC offers the possibility of generating a self-evolving system that does not rely on external inputs (Guardamagna et al., 2024). This characteristic is crucial to understanding the internal dynamics of the RC and the evolution of errors over time during forecasting.
All these new tools provide more accurate forecasts than classical dynamical models, especially for longer lead times, and seem to be able to circumvent the spring predictability barrier (SPB). The SPB (Webster and Yang, 1992; Lau and Yang, 1996) has been identified and documented across all of ENSO's dynamical model hierarchy, from conceptual models (Jin and Liu, 2021a, b; Jin et al., 2021) to comprehensive general circulation models (GCMs; Duan and Wei, 2013). In particular, in the intermediate-complexity Zebiak–Cane (ZC) model (Zebiak and Cane, 1987), the SPB has been rigorously studied and quantified using the conditional nonlinear optimal perturbation (CNOP) framework (Mu et al., 2007). This tool has been applied to investigate the sensitivity of the ZC model to uncertainties in both initial conditions (Duan et al., 2013) and model parameters (Yu et al., 2014). Thus, the ZC model is an excellent test bed to analyse why ML algorithms can have skill beyond the SPB, providing good predictions even when initialized during boreal spring.
In this paper, we aim to explain the good performance of RC methods in ENSO prediction. Specifically, we compare the evolution of optimal initial perturbations, determined using the CNOP approach, between the RC (trained with synthetic observations from the stochastic ZC model) and the deterministic ZC model. In Sect. 2, we shortly describe the ZC model and the CNOP technique, focusing on the changes introduced to adapt them to our analysis; in addition, the RC approach is briefly presented. In Sect. 3, we first assess the performance of the RC and then present results of the CNOP analysis for both the RC approach and the ZC model. A summary and discussion of the results follow in Sect. 4.
2.1 Zebiak–Cane (ZC) model
The ZC model is an intermediate-complexity ENSO model that describes the evolution of anomalies with respect to a prescribed seasonal mean climatological state across the tropical Pacific. The state vector of this model consists of two-dimensional fields of sea surface temperature, thermocline depth, oceanic and atmospheric velocities, and atmospheric geopotential. For a complete description of the model's components and equations, we refer the reader to Zebiak and Cane (1987). We use both the original deterministic ZC model and a stochastic ZC model following the approach described in Roulston and Neelin (2000). In this stochastic version, only noise in the zonal wind-stress field is applied as follows. First, a linear regression (LR) model relating sea surface temperature (SST) anomalies and surface zonal wind-stress anomalies was constructed empirically from observations using the ORAS5 dataset (Copernicus Climate Change Service, 2021) over the period between 1961 and 1991 with a time step of 10 d (corresponding to the ZC model time step). Next, the variability explained by this linear model was subtracted from the total zonal wind-stress field to obtain the residual zonal wind-stress anomalies. The first empirical orthogonal function (EOF) of this residual (Fig. A1 in Appendix A) shows a strong component over the eastern Pacific. In Feng and Dijkstra (2017), the first two EOFs were included, where the second EOF captures the westerly wind bursts, but to keep the spatial noise structure simple, we only included the first EOF. Finally, the principal component (PC) related to the first EOF was fitted to a first-order autoregressive model:
PC1(t + Δt) = a PC1(t) + b ϵt, (1)

where ϵt is a white noise term following a Gaussian distribution with zero mean and unit variance, while a and b are the fitted parameters. This fitted first-order autoregressive model was used during integration to generate a different (random) zonal wind-stress anomaly pattern at each time step.
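As a minimal sketch of this noise model (Python with NumPy; the placeholder PC series and the fitting approach are illustrative assumptions, not the paper's code), Eq. (1) can be fitted by least squares and then iterated to generate new stochastic amplitudes, each of which multiplies the first EOF pattern at one time step:

```python
import numpy as np

rng = np.random.default_rng(0)
pc1 = rng.standard_normal(1200)                       # placeholder PC series (10 d steps)

# Least-squares fit of pc1[t + 1] = a * pc1[t] + b * eps_t, with eps_t ~ N(0, 1)
a = np.dot(pc1[:-1], pc1[1:]) / np.dot(pc1[:-1], pc1[:-1])
b = (pc1[1:] - a * pc1[:-1]).std()

# Generate stochastic amplitudes; each value scales the first EOF pattern to
# give the random zonal wind-stress anomaly added at that model time step.
n_steps = 36 * 10                                     # e.g. 10 years of 10 d steps
amp = np.zeros(n_steps)
for t in range(1, n_steps):
    amp[t] = a * amp[t - 1] + b * rng.standard_normal()
```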
There is still a debate on whether the Pacific climate state is in a subcritical or supercritical regime (Kessler, 2002; Guardamagna et al., 2024). This distinction hinges on whether ENSO variability is a damped oscillation excited by stochastic forcing (subcritical) or occurs as a sustained oscillation or limit cycle (supercritical). In the supercritical case, ENSO behaviour is strongly influenced by nonlinearities, which arise from three main sources in the ZC model: heat advection, wind-stress anomalies, and subsurface water temperature variations (Duan et al., 2013). Given this ongoing debate, we study both regimes here, which can be easily distinguished in the ZC model by varying a single parameter. Following Tziperman et al. (1994), we use a parameter rd in the drag coefficient CD = rd CD0, where CD0 is the standard value in the ZC model. Given the zonal and meridional wind velocities (u, v), the ZC model computes the wind stress (τx, τy) acting on the ocean surface according to the following bulk formula:

(τx, τy) = ρair rd CD0 √(u² + v²) (u, v), (2)
where ρair is the air density and rd=1 is the original model configuration (Zebiak and Cane, 1987). With increasing rd, the ZC model generates a larger wind-stress response to sea surface temperature anomalies, intensifying the coupling strength between ocean and atmosphere.
In the deterministic version of the ZC model, an initial anomaly on the seasonal background state rapidly decays for rd=0.79. In contrast, for rd=0.8, ENSO variability occurs as a periodic solution with a ∼4-year period (Fig. A2). Hence, the Hopf bifurcation bounding the two regimes is located between rd=0.79 and rd=0.8; here, we choose rd=0.77 as a value in the subcritical regime and rd=0.9 in the supercritical regime. When noise is introduced, the ZC model's ENSO is phase-locked in the winter season (Fig. A3) for both rd=0.77 and rd=0.9. The SPB is identified with the initial month corresponding to the fastest decrease in autocorrelation in eastern Pacific SST anomalies (Jin and Liu, 2021a). According to this definition, the ZC model shows a clear SPB in May for both rd=0.77 and rd=0.9 (Fig. A4). All these aspects make the ZC model a good test bed for understanding why the RC can circumvent the SPB in both the subcritical and supercritical regime.
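To make the persistence-based SPB diagnostic concrete, the sketch below (Python with NumPy) computes the lag autocorrelation of monthly eastern Pacific SST anomalies as a function of calendar start month and selects the month with the fastest decrease; the placeholder NINO3 series and the 3-month drop window are illustrative assumptions, not the exact procedure of Jin and Liu (2021a).

```python
import numpy as np

rng = np.random.default_rng(1)
nino3 = rng.standard_normal(12 * 700)                 # placeholder monthly anomalies

max_lag = 12
corr = np.zeros((12, max_lag + 1))
for month in range(12):
    starts = np.arange(month, len(nino3) - max_lag, 12)
    for lag in range(max_lag + 1):
        corr[month, lag] = np.corrcoef(nino3[starts], nino3[starts + lag])[0, 1]

# Fastest autocorrelation decrease over a 3-month window (window choice is
# illustrative); months are numbered 1 (January) to 12 (December).
drop = corr[:, 0] - corr[:, 3]
spb_month = int(np.argmax(drop)) + 1
print("Start month with the fastest persistence loss:", spb_month)
```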
2.2 Reservoir computer
Although the procedure to generate an RC has been well described elsewhere (Pathak et al., 2018), we briefly summarize the approach here, while also introducing our notation. Given an input signal u(n) of dimension Nu, with n = 1, …, Nt, where Nt is the total number of time steps, and a given output signal ytarget(n) of dimension Ny, the RC has to learn how to estimate an output signal y(n) as similar as possible to ytarget(n). To do that during the training procedure, an error measure E(y, ytarget) is minimized, for which we choose a common measure for regression problems: the mean squared error (MSE) defined by

E(y, ytarget) = (1/Nt) Σ_{n=1}^{Nt} ||y(n) − ytarget(n)||². (3)
Before the training procedure, the input data u(n) are nonlinearly expanded into a higher-dimensional, so-called reservoir space, generating in this way a new signal x(n) of dimension Nx. This new representation of the data also contains temporal information and is based on the following update equations:

x̃(n) = tanh(Win u(n) + W x(n − 1)), (4a)
x(n) = (1 − α) x(n − 1) + α x̃(n), (4b)
where the hyperbolic tangent (tanh) is applied component-wise. Including a nonlinear activation function such as tanh in the update equations enables the RC to estimate nonlinear relationships among the input variables, in contrast to less sophisticated models such as the linear regressor, which can only capture linear relationships. This gives the RC an advantage in scenarios where nonlinearities play a significant role. The two matrices W (of size Nx × Nx) and Win (of size Nx × Nu) are generated randomly according to chosen hyperparameters. The non-zero elements of W and Win are sampled from a uniform distribution over the range [−a, a]. The sparse matrix W derives from a random network with a prescribed mean degree, while Win is a dense matrix. The quantity α in Eq. (4b) is the leaking rate. The output layer is defined as y(n) = Wout x(n), where Wout is an Ny × Nx matrix, and during the training procedure, only the weights of Wout are estimated by minimizing E(y, ytarget) through a linear regression procedure. We use a ridge regression to avoid overfitting, leading to the loss function ℒ:

ℒ = Σ_{n=1}^{Nt} ||Wout x(n) − ytarget(n)||² + β ||Wout||², (5)

where β is the ridge regularization parameter.
The hyperparameters are given by the dimension of the reservoir (Nx), the spectral radius of the matrix W (ρ), the sparsity of W's connections, the input scaling a, and the leaking rate α. Given an input sequence u(n) = ytarget(n), the RC is trained by determining Wout from the sequence of reservoir states x(n) using the loss function (Eq. 5).
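As an illustration of Eqs. (3)–(5), the following sketch (Python with NumPy; all dimensions, hyperparameter values, and the random training data are placeholders, not the configurations reported in Table C1) builds a reservoir, drives it with an input sequence, and solves the ridge regression for Wout:

```python
import numpy as np

rng = np.random.default_rng(42)
Nu, Ny, Nx = 5, 4, 400                 # input, output, and reservoir dimensions
rho, a_in, alpha, beta = 0.9, 0.5, 0.3, 1e-6   # spectral radius, input scaling,
                                               # leaking rate, ridge parameter

# Sparse recurrent matrix W rescaled to spectral radius rho; dense input matrix Win
W = rng.uniform(-1.0, 1.0, (Nx, Nx)) * (rng.random((Nx, Nx)) < 0.02)
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
Win = rng.uniform(-a_in, a_in, (Nx, Nu))

def update(x, u):
    """Leaky-integrator reservoir update, Eqs. (4a)-(4b)."""
    x_tilde = np.tanh(Win @ u + W @ x)
    return (1.0 - alpha) * x + alpha * x_tilde

# Drive the reservoir with the (placeholder) training inputs and collect states
U = rng.standard_normal((2000, Nu))          # u(n): training inputs
Ytarget = rng.standard_normal((2000, Ny))    # ytarget(n): training targets
X = np.zeros((len(U), Nx))
x = np.zeros(Nx)
for n, u in enumerate(U):
    x = update(x, u)
    X[n] = x

# Ridge-regression readout minimizing the loss of Eq. (5)
Wout = np.linalg.solve(X.T @ X + beta * np.eye(Nx), X.T @ Ytarget).T
```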
After training, the RC can be transformed into an autonomously evolving dynamical system to be used for prediction (Pathak et al., 2018). To this end, feedback connections between the output at time step n and the input at the subsequent time step n + 1 are introduced. In this way, a model is generated that autonomously evolves in time according to

u(n + 1) = y(n) = Wout x(n),
x(n + 1) = (1 − α) x(n) + α tanh(Win u(n + 1) + W x(n)), (6)
where x(n) and x(n+1) are the reservoir states at time step n and n+1, while y(n) is the output at time step n, and u(n+1) is the input at the subsequent time step n+1. This property of the RC allows us to make predictions similar to classical dynamical systems. Consequently, we can study how an initial perturbation evolves in the RC.
In the results below, the input vector u consists of the following feature variables: the NINO3 index, the thermocline depth anomalies hW and hE averaged over the regions 5° N–5° S, 120–180° E, and 5° N–5° S, 180–290° E, respectively, and the zonal surface wind-speed anomalies τC averaged over the area 5° N–5° S, 145–190° E. Instead of directly using zonal surface wind-stress anomalies, zonal surface wind-speed anomalies are used as a proxy. The two variables are inherently correlated through the bulk formula (Eq. 2) and therefore convey similar information. However, a key distinction arises from how noise is introduced in the ZC model, specifically in the form of random zonal wind-stress anomalies. This leads to random local fluctuations in the zonal wind-stress signal, which are inherently difficult for the RC to predict and reproduce. In contrast, the surface wind-speed anomaly signal is smoother and more predictable, making it easier for the RC to learn and generalize efficiently.
In addition to the previous variables, a sine signal with a 12-month period was included to represent the seasonal cycle such that Nu=5. Although a combination of sine and cosine signals is required to uniquely identify each month of the year, we found that including both made little difference in performance. Therefore, to minimize the number of input variables and reduce the complexity of the learned function, we decided to use only the sine signal. The output vector consists of the same variables as in the input except for the sine signal; hence, Ny=4. In self-evolving mode, the sine signal encoding the seasonal cycle is provided as an external input rather than generated directly by the RC.
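A minimal sketch of this self-evolving mode is given below (reusing W, Win, Wout, and update() from the training sketch above; the feature ordering and the warm-up data are illustrative assumptions): the predicted output is fed back as the next input, while the seasonal sine signal is supplied externally.

```python
import numpy as np

def self_evolve(warmup_inputs, sine_future, W, Win, Wout, update):
    """Closed-loop prediction following Eq. (6).

    warmup_inputs: (n_warmup, Nu) array of [NINO3, hW, hE, tauC, sine] vectors
    used to initialize the reservoir state (teacher forcing).
    sine_future: (n_lead,) external sine values for the forecast window.
    """
    x = np.zeros(W.shape[0])
    for u in warmup_inputs:                 # warm-up driven by data
        x = update(x, u)
    preds = []
    for s in sine_future:                   # autonomous evolution
        y = Wout @ x                        # predicted [NINO3, hW, hE, tauC]
        u = np.concatenate([y, [s]])        # feedback plus external sine signal
        x = update(x, u)
        preds.append(y)
    return np.array(preds)
```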
2.3 CNOP computation
Our implementation of the CNOP methodology follows the one described by Duan et al. (2013). Let M_{t0→te} be the propagator of a nonlinear model from initial time t0 to a chosen end time te. We denote by v0 the initial perturbation superimposed on the model's background state V0 at time t0. For a selected norm ||·||, an initial perturbation v0δ is defined as a CNOP if and only if

J(v0δ) = max_{C(v0) ≤ δ} J(v0), (7a)
J(v0) = ||M_{t0→te}(V0 + v0) − M_{t0→te}(V0)||, (7b)
where C(v0) ≤ δ is the constraint condition and M_{t0→t}(V0) represents the model state at time t when the integration starts from the background state V0 at time t0. In Duan et al. (2013), an initial perturbation is applied to all the grid points over the tropical area, and the constraint condition on the initial perturbation amplitude C(v0) is defined as

C(v0) = √( Σ_{i,j} [ (T0,ij / wT)² + (h0,ij / wh)² ] ) ≤ δ, (8)
where T0,ij and h0,ij are the initial sea surface temperature anomalies (SSTA) and thermocline depth anomalies, respectively, at grid point (i, j). The weights wT = 2 °C and wh = 50 m represent the characteristic scales of SST and thermocline depth anomalies, respectively.
As mentioned in Sect. 2.2, the RC is trained using a limited feature vector. To ensure a fair comparison of CNOPs between the RC and the ZC model, the tropical area of the ZC model is divided into boxes, and uniform perturbations are applied over those boxes. Specifically, we apply a uniform SSTA perturbation over all the grid points in the NINO3 area (5° N–5° S, 210–270° E); a uniform thermocline depth perturbation to all the grid points in the area 5° N–5° S, 120–180° E; and a uniform thermocline depth perturbation to all the grid points in the area 5° N–5° S, 180–290° E. Denoting these uniform perturbations by T0, h0,W, and h0,E, respectively, the constraint condition can then be written as

C(v0) = √( (T0 / wT)² + (h0,W / wh)² + (h0,E / wh)² ) ≤ δ. (9)
For both the RC and the ZC model, the objective function J(v0) in Eq. (7b) has been defined as the root squared error (RSE) between the perturbed and background trajectories. Specifically, if we define the NINO3 index value at time t when the integration starts from the initial state V0 as NINO3(t, V0), the objective function J(v0) is defined as

J(v0) = √( Σ_{t=t1}^{tN} [ NINO3(t, V0 + v0) − NINO3(t, V0) ]² ), (10)
where tN=te. To solve the optimization problem associated with determining the CNOP, we use the gradient-free COBYLA optimization algorithm (Powell, 1994). Since the COBYLA algorithm starts its optimization process from a random initial guess, we always perform 10 different realizations starting from 10 different initial guesses and select the CNOP that shows the largest error propagation according to the value of J(v0); a detailed description of the COBYLA algorithm is reported in Appendix B.
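A minimal sketch of this CNOP search with SciPy's gradient-free COBYLA solver is given below; the wrapper nino3_trajectory() is a hypothetical placeholder standing in for an integration of the RC in self-evolving mode or of the deterministic ZC model, and the constraint uses the normalized perturbation amplitude of Eq. (9).

```python
import numpy as np
from scipy.optimize import minimize

delta = 0.05                      # constraint radius in normalized units

def nino3_trajectory(v0):
    """Placeholder: return the NINO3 series up to the lead time when the model
    is started from the background state plus v0 = [dNINO3, dhW, dhE]."""
    return np.cumsum(0.1 * np.sum(v0) + np.zeros(27))

background = nino3_trajectory(np.zeros(3))

def neg_rse(v0):
    # COBYLA minimizes, so return minus the root squared error of Eq. (10)
    return -np.sqrt(np.sum((nino3_trajectory(v0) - background) ** 2))

constraints = [{"type": "ineq", "fun": lambda v0: delta - np.linalg.norm(v0)}]

# Ten restarts from random initial guesses; keep the perturbation giving the
# largest error growth (smallest neg_rse).
rng = np.random.default_rng(0)
best = None
for _ in range(10):
    guess = rng.uniform(-delta, delta, 3)
    res = minimize(neg_rse, guess, method="COBYLA", constraints=constraints,
                   options={"rhobeg": delta / 2, "maxiter": 500})
    if best is None or res.fun < best.fun:
        best = res
cnop = best.x
```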
In the Results section, we first explain the training and validation of the RC (Sect. 3.1) and then demonstrate the forecasting skill of the RC (Sect. 3.2), while also demonstrating the importance of the zonal surface wind-speed anomalies as a training variable. Next, in Sect. 3.3, we present the results of the CNOP analysis for both the RC and the deterministic ZC model.
3.1 Training and validation of the RC
For both the subcritical (rd=0.77) and supercritical (rd=0.9) regimes, we first performed a simulation of 1000 years with the stochastic ZC model using a time step of 10 d. We refer to these data as synthetic observations. The NINO3 amplitudes of the supercritical case (Fig. 1b) are, as expected, about a factor of 2 larger than those of the subcritical case (Fig. 1a). As mentioned in Sect. 2.2, the 12-month-period sine signal and the feature vector components hW, hE, τC, and NINO3 (extracted from the synthetic observation time series) are used to train the RC. To investigate the effect of τC on the performance of the RC, we also trained a second RC using only hW, hE, NINO3, and the sine signal. Before training, NINO3 and both hW and hE have been normalized by wT=2 °C and wh=50 m, respectively. From the total 1000 years of synthetic observations, the first 300 years were discarded to avoid capturing any initial transient behaviour. The next 500 years were used for training and validation (300 years for training and 200 years for validation), and the last 200 years were used for testing, ensuring an independent evaluation of the RC model performance. The training of the RC is described in Sect. 2.2, where, given an input sequence u(n)=ytarget(n), Wout is determined from the sequence of reservoir states x(n) using the loss function (Eq. 5).
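As a concrete illustration of this preprocessing (Python with NumPy; the array layout, the 36-step model year, and leaving τC unscaled are assumptions made for the sketch, not statements about the paper's code):

```python
import numpy as np

steps_per_year = 36                           # 10 d steps, consistent with ntransient = 180 for 5 years
data = np.zeros((1000 * steps_per_year, 4))   # placeholder columns [NINO3, hW, hE, tauC]

wT, wh = 2.0, 50.0                            # characteristic scales (deg C, m)
scaled = data / np.array([wT, wh, wh, 1.0])   # tauC left unscaled here (assumption)

spinup   = scaled[:300 * steps_per_year]                      # discarded transient
train    = scaled[300 * steps_per_year:600 * steps_per_year]  # 300 years
validate = scaled[600 * steps_per_year:800 * steps_per_year]  # 200 years
test     = scaled[800 * steps_per_year:]                      # 200 years

# 12-month-period sine signal encoding the seasonal cycle, appended as an input
sine = np.sin(2 * np.pi * np.arange(len(scaled)) / steps_per_year)[:, None]
```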

Figure 1NINO3 index from the last 700 years of the stochastic ZC model simulations (synthetic observations) used to train, validate, and test the RC model: (a) rd=0.77 and (b) rd=0.9.
To determine the performance of the RC, we use the RC in self-evolving mode (Sect. 2.2) to make predictions using a time step of 10 d. When we let the RC self-evolve, the only external information we provide is the value of the sine signal representing the current month of the year. All the other variables (NINO3, hE, hW, and τC when the latter is included as a training variable) are directly produced by the output of the RC and are not provided as external information during prediction. To evaluate the RC's performance over the entire 200 years of validation trajectories for different lead times, we adopt a rolling approach. For each time step t(n) in the validation dataset, an RC trajectory with a specific lead time is generated. The final values of each trajectory, corresponding to the lead time of interest, are then concatenated to form a complete 200-year trajectory, say yfull. Before performing inference, we always determine the internal RC state using 5 years of data prior to the time step t(n). Discarding the initial reservoir states x(n) during a warm-up period is a common practice in reservoir computing. This step is necessary to mitigate the impact of initial transients caused by the arbitrary initialization of the reservoir state, which is typically set to x(0)=0 or initialized randomly. In our case, we set x(0)=0. This initialization creates an artificial starting state that is unlikely to recur once the reservoir dynamics stabilize. A warm-up period is, therefore, necessary to allow the RC to reach a stable dynamical regime. The length of this warm-up period depends on the RC's memory capacity and the specific learning task. Based on our experiments, we found that a 5-year warm-up period is sufficient to stabilize the reservoir dynamics and eliminate the effects of initial transients. As a result, the reservoir states corresponding to the 5 years of data preceding the time step t(n) are used to initialize the RC internal state and then discarded. In our notation, this means discarding the x(n) reservoir states for n < ntransient, where ntransient=180 given our 10 d time step.
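The rolling evaluation can be sketched as follows (reusing self_evolve() and the trained matrices from the sketches in Sect. 2.2; the indexing and the choice to keep only the NINO3 component are illustrative):

```python
import numpy as np

steps_per_year = 36
n_warmup = 5 * steps_per_year          # 180 steps, the discarded transient
lead = 18 * 3                          # 18 months at 3 steps per month

def rolling_forecast(inputs, sine, lead, warmup=n_warmup):
    """inputs: (Nt, Nu) array of feature vectors along the validation segment.
    Returns yfull, the concatenated lead-time forecasts of NINO3 (column 0)."""
    yfull = []
    for n in range(warmup, len(inputs) - lead):
        preds = self_evolve(inputs[n - warmup:n], sine[n:n + lead],
                            W, Win, Wout, update)
        yfull.append(preds[-1, 0])      # keep only the value at the lead time
    return np.array(yfull)
```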
To identify the best set of hyperparameters, a separate validation procedure was conducted for each regime (rd=0.77 and rd=0.9) and for each set of training variables (including and excluding the zonal surface wind-speed anomalies) using a Bayesian search. For each hyperparameter set, the RC model's 18-month lead time predictions were evaluated using the root mean square (RMS) error computed including all feature variables in yfull. The latter was done to ensure that the RC model could replicate the synthetic observations for all variables of interest rather than simply replicating the NINO3 index. Among all the different hyperparameters, the reservoir dimension Nx is one of the most significant for the RC's performance (Lukoševičius, 2012; Verstraeten et al., 2007). Increasing Nx expands the reservoir's state space, allowing for a richer and more complex high-dimensional representation of the input signal u(t) (see Sect. 2.2). Additionally, larger Nx values increase the reservoir's memory capacity.
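Such a Bayesian search can be sketched with a generic optimization library; the example below assumes Optuna purely for illustration (the specific tool is not stated here), and validation_rmse() is a hypothetical wrapper that trains an RC with the proposed hyperparameters and returns the 18-month-lead RMS error over all feature variables of yfull.

```python
import optuna

def validation_rmse(Nx, rho, alpha, input_scaling, sparsity):
    # Placeholder: build and train an RC with these hyperparameters, run the
    # rolling 18-month forecasts on the validation years, and return the RMS
    # error over all feature variables of yfull.
    return 1.0

def objective(trial):
    return validation_rmse(
        Nx=trial.suggest_int("Nx", 100, 800),
        rho=trial.suggest_float("rho", 0.1, 1.5),
        alpha=trial.suggest_float("alpha", 0.05, 1.0),
        input_scaling=trial.suggest_float("input_scaling", 0.1, 2.0),
        sparsity=trial.suggest_float("sparsity", 0.005, 0.1),
    )

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```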
During our experiments, the Bayesian search has consistently converged on large Nx values, with a notable difference between the supercritical and subcritical regimes. In the supercritical regime, the optimal reservoir dimension is approximately 400, regardless of whether the variable τC is included during training. In the subcritical regime, the optimal dimension is larger, with Nx=476 when τC is included and Nx=534 when it is excluded. Table C1 in Appendix C reports the optimal Nx values identified for each regime and input variable configuration as well as the optimal values for all other RC hyperparameters (see Sect. 2.2).
After validation, we evaluated the RC model's performance on the 200-year test set using these best hyperparameter sets, as described next.
3.2 RC performance
Figure 2 presents the mean and standard deviation of the anomaly correlation coefficient (ACC) between 50 different RC prediction trajectories and the target NINO3 index from the 200-year test dataset, computed at a monthly time step (so averaged over three model time steps). We evaluated the RC's ability to replicate the monthly NINO3 index rather than the 10 d time step index used for training since this is the common approach for assessing the performance of ENSO forecasting models. As the reservoir is generated by random W and Win values, each RC needs to be retrained first (using the 300-year dataset) as described in Sect. 2.2, and hence multiple RCs are used for evaluating the ACC. Again, a rolling approach (as for the validation dataset; see Sect. 3.1) was used for the test set, and hence the ACC is determined using the 200-year vector yfull.
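For reference, this skill metric can be computed as in the following sketch (Python with NumPy; the three-steps-per-month averaging follows the text, everything else is illustrative):

```python
import numpy as np

def to_monthly(series, steps_per_month=3):
    """Average the 10 d time step series to a monthly time step."""
    series = np.asarray(series)
    n = len(series) // steps_per_month * steps_per_month
    return series[:n].reshape(-1, steps_per_month).mean(axis=1)

def acc(forecast, target):
    """Anomaly correlation coefficient between monthly forecast and target."""
    f = to_monthly(forecast) - np.mean(to_monthly(forecast))
    t = to_monthly(target) - np.mean(to_monthly(target))
    return np.sum(f * t) / np.sqrt(np.sum(f ** 2) * np.sum(t ** 2))
```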

Figure 2Mean and standard deviation of the anomaly correlation coefficient (ACC) of 50 different RC model realizations and the 200-year synthetic observations for the NINO3 index computed at a monthly time step. Results are shown for the two regimes: (a) subcritical (rd=0.77) and (b) supercritical (rd=0.9), with zonal surface wind-speed anomalies (τC) either included or excluded during training. Results from the linear regressor are also included for comparison.
In the supercritical regime (Fig. 2b), the RC model performs better when zonal surface wind-speed anomalies τC are included as a training variable, though its performance is also acceptable even when τC is excluded. On the other hand, in the subcritical regime (Fig. 2a), the RC performance for longer lead times (9 to 18 months) improves when τC is excluded during training. In this regime, ENSO is primarily driven by atmospheric noise, introduced in the Zebiak–Cane model in the form of random zonal wind-stress bursts (see Sect. 2.1). When the model is initialized from ENSO neutral conditions, optimal atmospheric noise patterns can trigger transient growth of perturbations, provided the initial conditions are favourable. Conversely, if a perturbation is already developing, subsequent noise patterns can either reinforce or damp its evolution. The variable τC is therefore particularly useful for predicting the short-term variability of ENSO as it provides critical information about external forcing that influences early perturbation dynamics. Accordingly, in the subcritical regime, the RC achieves better performance at shorter lead times (3–6 months) when τC is included. At longer lead times (9–18 months), improved predictive performance requires the RC to rely more on the system's internal dynamics rather than the short-term influence of stochastic noise. Including τC during training can lead to overfitting, causing the model to focus excessively on short-term noise patterns instead of learning the internal system dynamics. In the supercritical regime, nonlinearities play an essential role (see Sect. 2.1). In the ZC model, these nonlinearities arise from three main sources: heat advection, wind-stress anomalies, and subsurface water temperature variations (Duan et al., 2013), all of which influence the evolution of ENSO. This characteristic is reflected in the RC model's performance, which exhibits improved predictive skill at both short and long lead times when τC is included during training, underscoring the importance of the nonlinear effects introduced by this variable in this regime.
Overall, the RC performs better in the supercritical regime, achieving an ACC of 0.8 at a 12-month lead time when zonal surface wind-speed anomalies are included during training. In contrast, in the subcritical regime, the RC model achieves an ACC of 0.75 at a 12-month lead time when τC is excluded during training.
To better appreciate the performance of the RC model, we also compared it with a simple linear regressor as a benchmark; results are also included in Fig. 2. The comparison between the LR and RC reveals that the performance improvement achieved by adopting the RC model is not drastic. However, the results still demonstrate a clear and consistent advantage in using the RC in both the subcritical and supercritical regimes and regardless of whether the variable τC is included during training. This improvement stems from the nonlinear activation function used in the RC (see Sect. 2.2), which enables it to capture nonlinear relationships between input variables, something the LR model, limited to linear approximations, cannot achieve. This advantage is particularly clear in the supercritical regime, where the RC model provides a more significant performance increase compared to the subcritical regime. This is expected, as nonlinearities play a more prominent role in the supercritical regime (see Sect. 2.1). Moreover, the relatively small performance gap between the LR and the RC can be attributed to the ZC model being a model of intermediate complexity in which ENSO is a weakly nonlinear phenomenon (e.g. all wave dynamics in the ocean and atmosphere are linear in the model). The ZC model's data exhibit simpler dynamics than real-world observations or simulations with more complex general circulation models (GCMs). In such cases, the performance advantage of the RC over the LR is expected to be more pronounced.
The ability of the RC model to mitigate the SPB is demonstrated in Fig. 3. This figure presents the normalized mean absolute error (MAE) between the median NINO3 of 50 different RC predictions and the corresponding target values from the synthetic observation test dataset (see Sect. 3.1) at various lead times and for both the RC initialized before the SPB in March, April, and May and after the SPB in September, October, and November. As a comparison benchmark, the normalized MAE for the linear regressor (LR) predictions is also included for the same initialization months. Additionally, to ensure a fair comparison between the subcritical and supercritical regimes, all RC and LR predictions and the corresponding target values have been normalized by the standard deviation of the 200-year synthetic observation test dataset (0.47 for the supercritical regime and 0.24 for the subcritical regime) before computing the MAE. In Fig. 3, we present results for the different input variable configurations for both the subcritical and supercritical cases. Specifically, the variable τC is excluded from the input variables in the subcritical regime but included in the input variables in the supercritical regime.

Figure 3Normalized mean absolute error (MAE) between the median of 50 different RC realizations' predictions and the 200-year synthetic observation test set for the NINO3 index computed at a monthly time step, and with the RC initialized both before (March, April, and May) and after (September, October, and November) the SPB: (a) rd=0.77, zonal surface wind-speed anomalies excluded during training. (b) rd=0.9, zonal surface wind-speed anomalies included during training. Results from the linear regressor (LR) are also included for comparison. Both predictions and target values have been normalized by the standard deviation of the 200-year synthetic observation test dataset (0.47 for the supercritical regime and 0.24 for the subcritical regime) before computing the MAE.
In both the subcritical and supercritical regimes, the RC model outperforms the LR also in terms of mean absolute error regardless of the initialization period. However, to a certain extent, it is still affected by the SPB, which occurs in May in the ZC model (as discussed in Sect. 2.1). On the other hand, the RC model demonstrates a clear ability to mitigate the effects of the SPB compared to the LR. This can be most clearly seen when comparing the pre-spring initialization performance of the two models at 3- and 6-month lead times. In the supercritical regime, with pre-SPB initialization, the RC model achieves a normalized MAE of 0.2 at a 3-month lead time and 0.35 at a 6-month lead time, while the LR shows a higher normalized MAE of 0.3 at a 3-month lead time and 0.5 at a 6-month lead time. In the subcritical regime, with pre-SPB initialization, the RC achieves a MAE of 0.34 at a 3-month lead time and 0.5 at a 6-month lead time, while the LR shows a MAE of 0.36 at a 3-month lead time and 0.54 at a 6-month lead time. In the supercritical regime, the RC shows a larger performance improvement compared to the subcritical regime, where the difference in performance with respect to the LR is less evident. Moreover, also in terms of normalized MAE, the RC performs better in the supercritical regime than in the subcritical one.
3.3 CNOP analysis
For both the RC model and the deterministic ZC model and for both rd=0.77 (subcritical regime) and rd=0.9 (supercritical regime), we computed the CNOPs for different lead times using the last 50 years of the 200-year synthetic observation test dataset as initial conditions (cf. Sect. 3.1). This choice has been made to balance computational efficiency and statistical significance. The CNOP computations using the COBYLA algorithm are highly computationally expensive, and 50 years of data is sufficient to obtain statistically significant results. By selecting the last 50 years of the 200-year test period, we also ensure complete statistical independence between the training and test data. For the RC model, perturbations were directly applied to NINO3 and mean thermocline depth anomalies (hE and hW). In contrast, for the deterministic ZC model, a uniform perturbation was applied over three different boxes in the Pacific for both SSTA and thermocline depth anomalies (as described in Sect. 2.3).
Before computing the CNOPs for the RC model, we identified and saved the best-performing RC realization out of 50 for each combination of lead time, rd value, and training variable set based on the forecasting skill for the 200-year synthetic observation test period (see Sect. 3.1). The top-performing realization (for each combination of lead time, rd value, and training variables) was then considered for the CNOP computation. This was done to avoid biases related to the random initialization of the RC. We computed the CNOPs for lead times of 3, 6, and 9 months (the optimization time considered during the CNOP computation), focusing on a single constraint value δ=0.05 and a specific forecast initialization season just before the SPB, encompassing March, April, and May. The value of δ corresponds to a maximum NINO3 perturbation of 0.1 °C (Sect. 2.3) or a maximum hE or hW perturbation of 2.5 m. For the deterministic ZC model, the same procedure to compute the CNOP was used (see Sect. 2.3). To quantify the divergence between two trajectories caused by the CNOPs, for both the RC and the ZC models, we computed the root squared error (RSE) distance between the perturbed and unperturbed NINO3 trajectories as defined in Eq. (10). This distance was used to estimate each model's sensitivity to initial condition perturbations, given a specific initial state. To make a fair comparison between the subcritical and supercritical regimes, all the RSE distances obtained have been normalized by the standard deviation of the 50 years of NINO3 synthetic observations considered for the CNOP computation (0.29 for the subcritical regime and 0.56 for the supercritical regime).
In the supercritical regime (Fig. 4b), the RC model is more susceptible to initial perturbations at shorter lead times. However, at a 6-month lead time, the RC model's sensitivity to initial perturbations becomes, on average, smaller when τC is excluded during training and similar to that of the deterministic ZC model when τC is included during training (see Table D2). At a 9-month lead time, the RC's sensitivity to initial perturbations is on average smaller for both τC included and excluded. At both 6- and 9-month lead times, the deterministic ZC model's sensitivity results show a much wider distribution than the RC regardless of whether τC is included or excluded as a training variable. In the subcritical regime (Fig. 4a), the RC model becomes more susceptible to perturbations than the deterministic ZC model when τC is included as a training variable. Conversely, when this variable is excluded, the RC model shows less sensitivity to perturbations than the deterministic ZC model. This difference is likely because including τC as a training variable causes the RC model to more easily learn the noise component of synthetic observations. Since ENSO variability in the subcritical regime is highly affected by noise, including these anomalies during training leads to a system with a larger error propagation.

Figure 4Distribution of the normalized RSE distances between the perturbed and unperturbed trajectories for different lead times when CNOPs are applied, taking as initial conditions the months of March, April, and May for (a) rd=0.77 and (b) rd=0.9. The boxes indicate the interquartile range (IQR), the range within which the central 50 % of data points are located. The whiskers extend to the minimum and maximum values within 1.5 times the IQR from the first and third quartile. The central line corresponds to the median. All the RSE distances have been normalized by the standard deviation of the NINO3 index extracted from the 50 years of synthetic observations considered for the CNOP computation (0.29 for the subcritical regime and 0.56 for the supercritical regime).
Previous studies (Mu et al., 2007) have quantified the SPB in the deterministic ZC model using the CNOP framework, revealing that the deterministic ZC model is particularly sensitive to initial perturbations when initialized just before the boreal spring season. Our results support this finding, showing that the deterministic ZC model exhibits a stronger sensitivity to initial condition perturbations when initialized close to the SPB than when it is initialized later in the year (see Fig. D1 in Appendix D). This also holds for summer initialization (not shown) in June, July, and August, where the models show results similar to spring initialization in March, April, and May, with the RC mitigating sensitivity to initial perturbations in a similar manner. This behaviour is due to the proximity of the summer season to the SPB. The CNOP cost function evaluates the distance between the entire perturbed and unperturbed trajectories (see Eq. 10), taking all months into account. When the deterministic ZC model integration is initialized just before the SPB, the number of months affected by the SPB is maximized, and at longer lead times (6 and 9 months), we observe a pronounced increase in the sensitivity to initial condition perturbations compared to when the model is initialized in the autumn and winter seasons. This effect is also found when comparing the sensitivities of autumn and winter initializations. Compared to an autumn-initialized trajectory, an integration initialized in winter has already crossed the SPB at a 9-month lead time, and the sensitivity to initial perturbations for an integration initialized in winter is, on average, larger than for the autumn initialization.
On the other hand, when the RC is initialized later than the SPB, it exhibits a sensitivity to initial perturbations similar to that found when it is initialized just close to the SPB (see Table D2). As the number of months affected by the SPB increases (at 6- and 9-month lead times, with a forecast initialized in spring), the RC effectively reduces both the average sensitivity to initial condition perturbations and the width of the sensitivity distribution compared to the ZC model, consequently decreasing the number of events strongly sensitive to initial condition perturbations. The only exception is the RC trained with the variable τC included in the subcritical regime, which consistently has a greater sensitivity to initial condition perturbations than the deterministic ZC model for the reasons already mentioned above. Moreover, the inclusion of this variable decreases the performance of the RC in the subcritical regime at longer lead times (see Sect. 3.2). These results demonstrate that the RC model effectively mitigates the sensitivity to initial condition perturbations at long lead times (6 and 9 months) when a forecast is initialized just before the SPB compared to the ZC model, for which the spring season corresponds to the strongest sensitivity to initial perturbations. This capability explains why the RC model can reduce the effects of the SPB, delivering skilful predictions at long lead times.
Figure 5 shows the estimated CNOPs for both the ZC and RC models when initialized just before the SPB in March, April, and May. The estimated CNOPs are presented for the NINO3 index and the sum of the thermocline perturbations in the eastern and western Pacific (hE+hW). The ZC model's sensitivity to initial NINO3 perturbations decreases as the forecasting lead time increases. In contrast, perturbations to the thermocline depth become increasingly crucial for optimal perturbation growth at longer lead times. This is true for both rd=0.77 and rd=0.9. On the other hand, the CNOPs of the RC have different behaviour for both the supercritical and subcritical regimes. The RC is sensitive to quite different initial perturbations, leading to less variability in the error propagation compared to the ZC model. This is supported by Fig. 6, which shows the distribution of the CNOPs in the (NINO3, hE+hW) plane for a 9-month lead time. For visualization purposes, the initial anomalies for NINO3 and (hE+hW) have been normalized dividing by 2 °C and 50 m, respectively (see Sect. 2.3). The CNOPs for the RC and the ZC models show a strongly different distribution. The RC model consistently exhibits greater sensitivity to NINO3 perturbations at both short (3-month) and long (6- and 9-month) lead times, while the ZC model shows increasing sensitivity to initial thermocline depth perturbations as the optimization time extends. In the ZC model, ENSO variability is strongly influenced by thermocline feedback (Zebiak and Cane, 1987). The RC reduces this sensitivity at longer lead times, helping to mitigate error propagation over time.

Figure 5Violin plots showing the distribution of the CNOPs obtained for both the NINO3 index and hE+hW (sum of the thermocline anomalies for both the western and eastern Pacific). (a, c) rd=0.77 (b, d) rd=0.9. In both cases, δ=0.05; the period considered corresponds to the last 50 years of the 200-year synthetic observation test dataset; and the months of March, April, and May are taken as initial conditions.

Figure 6Scatter plot of the CNOPs in the normalized NINO3 index, hE+hW anomaly plane. (a) rd=0.77. (b) rd=0.9. In both cases, δ=0.05; the period considered corresponds to the last 50 years of the 200-year synthetic observation test dataset; the months of March, April, and May are taken as initial conditions; and the lead time considered is 9 months. The NINO3 index and hE+hW anomalies are normalized by dividing by 2 °C and 50 m, respectively.
Another notable characteristic, visible in Fig. 5, is that the ZC model exhibits a highly symmetrical distribution for the hE+hW CNOPs at a 9-month lead time as well as for the NINO3 CNOPs at a 3-month lead time. In contrast, the RC model shows a more biased distribution for the estimated NINO3 CNOPs, particularly in the subcritical regime and at shorter lead times (3 months) in the supercritical regime. At longer lead times (9 months) in the supercritical regime, the RC model still shows a relatively symmetrical distribution for the estimated NINO3 CNOPs.
To better understand the origin of these differences, we classify the initial conditions analysed (before applying the CNOPs), spanning the months of March, April, and May, into three groups based on the initial eastern Pacific sea surface temperature anomalies (SSTAs). Specifically, in the supercritical (subcritical) regime, we define an initial state as positive if the corresponding initial NINO3 index is larger than 0.2 (0.1) °C, negative if the initial NINO3 index is smaller than −0.2 (−0.1) °C, and neutral if the initial NINO3 index lies in between. The initial states have been classified in a different way for the supercritical and subcritical regimes to account for the fact that in the subcritical regime, the amplitude of NINO3 is a factor of 2 smaller compared to the supercritical regime; see Sect. 3.1.
As shown in Fig. D2 in Appendix D, when the ZC model's SSTA initial condition in the eastern Pacific is characterized by a positive (negative) anomaly, the optimal initial perturbation for hE+hW for a 9-month optimization period is negative (positive) if the model is initialized before the event's peak. This limits the propagation of the anomaly, leading to a weaker El Niño (La Niña) event if the model is initialized prior to the event's peak. Conversely, if the model is initialized after the event's peak, the negative (positive) thermocline perturbation drives a faster and steeper return to neutral conditions.
When the ZC model is initialized from neutral conditions, the sign of the estimated CNOPs for hE+hW depends on the reference trajectory. If the system transitions into an El Niño event, the optimal hE+hW is negative, thus dampening the positive anomaly; if it transitions into a La Niña event, the optimal hE+hW is positive, suppressing the negative anomaly.
The same reasoning applies to the 3-month-lead-time NINO3 optimal perturbations: a negative (positive) NINO3 perturbation weakens the near-term growth of an initially positive (negative) eastern Pacific SSTA.
Regarding the RC model, as shown in Fig. D3, the estimated CNOPs for the NINO3 index at a 9-month lead time in the supercritical regime exhibit a fairly symmetrical distribution overall. However, significant differences emerge depending on whether the variable τC is included during training, as well as in comparison with the ZC model. When τC is included and the RC is initialized before the peak of an event, the optimal NINO3 perturbation is positive (negative) for initial conditions characterized by a positive (negative) eastern Pacific SSTA, thereby reinforcing the initial anomaly and leading to larger El Niño (La Niña) events. In contrast, when the RC is initialized after the peak of a positive (negative) event, the optimal NINO3 perturbation reverses sign, resulting in a faster and steeper return to neutral conditions.
Neutral initial conditions tend to prefer negative NINO3 perturbations, resulting in stronger La Niña and weaker El Niño events relative to the reference trajectories. When τC is not included during training, positive SSTA initial conditions still favour positive optimal NINO3 perturbations, yielding stronger El Niño events when the model is initialized before the peak. In contrast, for events characterized by negative initial eastern Pacific SSTAs, the optimal NINO3 perturbations are positive, thereby mitigating the development of the initial negative anomaly over time. Moreover, neutral initial conditions exhibit a stronger tendency toward positive optimal NINO3 perturbations in the absence of τC.
In the subcritical regime, the optimal NINO3 perturbation at a 9-month lead time exhibits a marked asymmetry. As illustrated in Fig. D3, including τC during training leads to consistently negative optimal perturbations across all three categories of initial conditions, whereas excluding τC results in positive optimal perturbations for every category. At shorter lead times (3 months), in both subcritical and supercritical regimes and regardless of whether τC is included during training, the RC model clearly prefers negative optimal NINO3 perturbations. These results highlight the influence of τC on the estimated CNOPs, particularly in the subcritical regime. Further evidence of τC's impact in both supercritical and subcritical regimes appears when examining another result shown in Fig. 5, where we find that excluding zonal surface wind-speed anomalies during training in the subcritical regime causes the RC model to exhibit greater sensitivity to thermocline depth perturbations at longer lead times compared to the supercritical regime and to the subcritical regime when τC is included during training. However, even in this scenario, the RC remains less sensitive to thermocline depth anomalies than the ZC model. In the subcritical regime, ENSO variability is primarily driven by atmospheric noise, introduced as stochastic wind-stress forcing. This noise affects the thermocline slope, activating mechanisms that lead to the development of perturbations. When the variable τC is included during training, the RC explicitly learns the relationship between wind-stress anomalies and thermocline adjustments, and the state of the surface winds is provided as an initial condition. Consequently, smaller thermocline perturbations can be amplified by wind-stress anomalies, resulting in larger deviations from the unperturbed trajectory. As a result, the optimal initial perturbations primarily target the NINO3 index, favouring negative NINO3 perturbations, while the optimal hE+hW perturbations are only slightly negative and closer to zero following the same trend as the NINO3 CNOPs. The impact of these optimal perturbations varies based on the initial conditions. When the RC is initialized with positive or slightly positive (neutral) eastern Pacific SSTA, negative NINO3 and hE+hW perturbations suppress or weaken the evolution of positive initial anomalies, resulting in a faster return to neutral conditions. Conversely, when the RC is initialized with negative or slightly negative (neutral) eastern Pacific SSTA, negative NINO3 perturbations amplify the evolution of negative anomalies, leading to either a stronger La Niña event or a longer persistence of La Niña conditions if the RC is initialized after the peak.
When τC is not included, the RC learns only the direct relationship between NINO3 and thermocline depth anomalies without explicit knowledge of how wind anomalies influence thermocline slope adjustments. As a result, in the absence of wind-forcing information, larger initial thermocline perturbations are required to induce significant error growth over time. Under these conditions, the optimal initial perturbations consist of both positive NINO3 and positive hE+hW perturbations.
The effect of these optimal perturbations again depends on the initial conditions. When the RC is initialized with positive or slightly positive eastern Pacific SSTA, the combination of positive NINO3 and hE+hW perturbations significantly enhances the evolution of positive anomalies, leading to a stronger El Niño event if the RC is initialized before the peak. If the RC is initialized after the peak, these perturbations extend the duration of El Niño conditions by delaying the decay of warm anomalies.
For negative or slightly negative (neutral) initial conditions, the positive optimal perturbations effectively suppress the further development of negative anomalies, once again accelerating the return to neutral conditions.
In the supercritical regime, thermocline depth perturbations are not purely activated by atmospheric noise but also emerge due to the internal instability of the system. The influence of stochastic atmospheric noise is weaker than in the subcritical case. This means that even if the model does not explicitly account for the effect of wind-stress forcing on the thermocline slope, strong initial thermocline depth anomalies are not needed to maximize error propagation. However, even in the supercritical regime, when τC is not included, the RC remains more sensitive to initial thermocline perturbations at longer lead times than when τC is included, but the difference is much less pronounced than in the subcritical regime. Also, the differences in the distribution of the optimal perturbations per initial condition category are less pronounced compared to the subcritical regime and are only evident for the negative and neutral initial conditions (not for the positive ones).
Relatively limited research has been carried out to understand the underlying reasons for the strong performance of ML prediction models in ENSO prediction, in particular their apparent ability to reduce error propagation and overcome the spring predictability barrier (SPB) as deduced from dynamical models. In previous studies, explainable AI techniques like layerwise relevance propagation (LRP) have been used to identify and estimate which patterns in the data are exploited by machine learning (ML) methods to make specific ENSO predictions (Ham et al., 2019; Rivera Tello et al., 2023) or to explore teleconnections of ENSO (Ito et al., 2021; Liu et al., 2023b). The LRP technique has also been extended to the echo state network (ESN) framework to investigate the importance of the leaking rate parameter α and the ESN's robustness to random input perturbations while performing an El Niño/La Niña binary classification task (Landt-Hayen et al., 2022). In a recent study (Qin et al., 2024), an approach similar to ours is followed using the CNOP framework to estimate optimal initial perturbations for a U-net deep learning model trained on both reanalysis data and simulations from various CMIP6 general circulation models (GCMs). They validated their results with the GFDL CM2p1 numerical model, showing that the deep learning models and numerical models exhibit a similar error evolution over time for the same initial conditions and superimposed perturbations. However, their automatic differentiation-based optimization algorithm only applies to deep learning architectures, preventing them from determining whether the deep learning and numerical models share the same optimal initial perturbations, i.e. those leading to the largest error propagation.
In our study, we address this limitation by employing a gradient-free optimization algorithm (COBYLA), enabling a fair comparison between a reservoir computing (RC) model and the Zebiak–Cane (ZC) numerical model. Our results indicate that the RC model effectively reduces error growth from optimal initial perturbations compared to the ZC model, offering a plausible explanation for its higher predictive skill. It is important to clarify that the main objective of our study is to demonstrate that the RC can mitigate error propagation resulting from initial condition perturbations more effectively than a classical dynamical numerical model. Such an analysis and comparison is impossible using real-world observations as we do not know the evolution operator of the real-world system and hence cannot determine the CNOP. Nevertheless, the CNOP framework can still be applied to an RC model trained on actual observations. This approach can be used to precisely assess the potential of the RC model, as well as other machine learning methodologies, to predict the real ENSO system and to estimate their corresponding predictability limits more accurately.
Furthermore, assessing whether an RC trained with real observations exhibits similar sensitivity to specific variables as an RC trained on synthetic data from the ZC or other dynamical numerical models can provide helpful information to modellers. In particular, identifying differences in the variables to which the skill of the RC model is most sensitive can help determine whether key physical processes are being captured realistically, offering guidance on refining ENSO representation. Applying the CNOP approach to machine learning models trained on real observations could offer further benefits. For instance, while our study uses an RC model with a highly reduced input state vector consisting of just four indices, employing a more complex architecture, such as a convolutional neural network (CNN) capable of analysing two-dimensional input fields, would allow the CNOP framework to identify the regions and variables to which the skill of the model is most sensitive. These insights could inform more precise and targeted data acquisition strategies. While such experiments with real observations are beyond the scope of the present study, they present a promising direction for future research.
In our study, we first demonstrate that the RC, when trained on data from the stochastic ZC model (acting as synthetic observations), exhibits good predictive skill up to an 18-month lead time and hence effectively overcomes the SPB problem in both the subcritical and supercritical regimes. In the supercritical regime, the RC model performs better when zonal surface wind-speed anomalies are included during training, while in the subcritical regime, the RC actually performs better for longer lead times (9 to 18 months) when the zonal surface wind-speed anomalies are excluded. While this result may depend on the implementation of the wind-stress noise (Feng and Dijkstra, 2017), which we restrict here mostly to the eastern Pacific (using only the first EOF of the residual wind-stress field), the reason is that the RC is overfitting the noise in the subcritical regime.
Previous studies have also noticed strong predictive performance when applying the RC to ENSO forecasting. For instance, Hassanibesheli et al. (2022) achieved high prediction skills (ACC>0.8) up to a lead time of 14 months when training the RC with the observed NINO3 and NINO3.4 indexes, decomposed into low-frequency and high-frequency components. Their performance is comparable to ours at long lead times, but our model performs better at shorter lead times (3–6 months). Additionally, like in our study, they found that their approach could mitigate the SPB problem. However, care must be taken when comparing our findings with their results due to substantial differences in the data used for training, the training variables considered, and the implementation of the forecasting framework.
After the RC performance analysis, we investigated the propagation of errors in initial conditions in boreal spring (just before the SPB) for both the RC and deterministic ZC models using the conditional nonlinear optimal perturbation (CNOP) approach (Duan et al., 2013). In the supercritical regime, the RC significantly reduces error propagation relative to the ZC model, in particular at longer lead times (6–9 months). In the subcritical regime, the RC is less susceptible to perturbations than the ZC model when surface wind-speed anomalies are excluded during training and more susceptible when they are included. A reduced sensitivity of AI models to small initial perturbations has also been found by Selz and Craig (2023). In that study, the inability of the AI frameworks to reproduce the butterfly effect of the atmosphere was considered a limitation as it prevents the generation of large ensembles due to inadequate error growth properties. However, as noted in Selz and Craig (2023), this limitation can be mitigated by training multiple models with different random seeds to generate a confidence interval. We argue that this reduced sensitivity to initial perturbations is not a disadvantage but is precisely what enables AI models to extend the predictability horizon of a system, allowing them to maintain higher predictive skill at longer lead times. The actual CNOPs have quite different patterns in the ZC and RC cases: the CNOP pattern of the ZC resembles the one obtained in earlier papers (Duan et al., 2013), with a dominant response in the thermocline field at longer lead times, whereas in the CNOP pattern of the RC, a strong sea surface temperature component is also present.
The thermocline anomalies are important for error propagation on the longer timescales, in particular in the ZC model, in which the ENSO variability is strongly affected by the thermocline feedback (Zebiak and Cane, 1987). Effectively, the RC model reduces the thermocline-anomaly component of the optimal perturbations and thereby reduces error propagation. Although we restricted ourselves to particular cases, as we used only one value of the constraint parameter δ and allowed only one EOF in the ZC wind-stress noise, we think that the modified dynamical behaviour of the RC (with respect to the ZC), which changes the spatio-temporal properties of the error propagation, is the key explanation for the superior skill of the RC at long lead times and for its ability to overcome the SPB. As a final remark, we note that this mechanism is proposed to explain the RC's superior performance within a relatively short-term prediction horizon, as opposed to the decadal timescales required to assess the long-term dynamics and statistics of ENSO. Developing an emulator that captures the long-term dynamics and statistics of a system is an entirely different task, necessitating distinct model architectures, hyperparameter configurations, and evaluation criteria compared to those adopted in our study, such as assessing how effectively a model captures the intrinsic nonlinearities of the system. In recent years, various machine learning models have demonstrated impressive predictive skill at lead times of up to 21 months without necessarily capturing all the underlying physical processes of ENSO. The results shown here for the reservoir computer can be extended to other machine learning models, potentially explaining their predictive skill at lead times of up to nearly 2 years, far beyond the SPB.

Figure A1. First EOF of the residual zonal wind-stress anomalies as determined from the ORAS5 dataset (Copernicus Climate Change Service, 2021).

Figure A3. Frequency of occurrence of La Niña and El Niño events for each calendar month for (a) rd=0.77 and (b) rd=0.9. For both rd values, a stochastic ZC model realization of 1000 years has been considered.
The Constrained Optimization BY Linear Approximations (COBYLA) algorithm (Powell, 1994) is a gradient-free optimization method designed to solve nonlinear constrained optimization problems. Given an objective function J(x) to be minimized, where x∈ℝn, and a set of nonlinear constraints Ci(x)≤0, the algorithm starts from an initial guess x0 and constructs an initial n-dimensional simplex represented by a set of n+1 vertices {V(0), V(1), …, V(n)}. Each vertex is defined as V(i) = x0 + ρbeg ei for i = 1, …, n, with V(0) = x0, where ei is the ith coordinate vector (a standard basis vector in ℝn) and ρbeg is a specified initial trust-region radius that determines the initial simplex size.
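As a minimal illustration of this construction (a sketch in generic NumPy code, not the actual COBYLA implementation), the initial simplex can be built as follows:

import numpy as np

def initial_simplex(x0, rho_beg):
    # Vertices of the initial simplex: x0 itself plus x0 shifted by rho_beg
    # along each coordinate direction e_i, as described above.
    n = x0.size
    return np.vstack([x0, x0 + rho_beg * np.eye(n)])

# Example: three optimization variables and an initial trust-region radius of 0.1
vertices = initial_simplex(np.zeros(3), 0.1)  # array of shape (4, 3)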
At each iteration, the algorithm constructs a linear approximation of both the objective function and the constraints using linear interpolation at the n+1 current simplex vertices V(i). It then identifies the worst-performing vertex and formulates and solves a linear optimization problem within a trust region of radius ρ around it to generate a new candidate vertex. If this new candidate improves the objective function, it replaces the worst-performing vertex, updating the simplex structure. Otherwise, the trust-region radius is reduced, and the linear optimization problem is solved again.
The algorithm continues iterating until one of the stopping criteria is met. It terminates when the trust-region radius ρ falls below a predefined threshold ρend, when the change in the objective function is smaller than a specified tolerance ε, or when the number of iterations reaches the maximum allowed value Nmax. COBYLA is particularly well suited for nonlinear optimization problems with a small number of variables, especially when computing derivatives is challenging or infeasible. These characteristics make COBYLA an ideal choice for our analysis as the ZC model's derivatives are impossible to compute and our optimization problem involves only three variables (see Sects. 2.3 and 3.3).
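For completeness, the constrained problem that COBYLA solves in our setting can be written schematically (the propagator Mτ and reference state Xref are generic symbols introduced here only for illustration; the precise objective, norm, and constraint are those defined in Sects. 2.3 and 3.3) as

\[
\mathbf{x}_{0,\delta}^{*} = \underset{\|\mathbf{x}_0\| \le \delta}{\arg\max}\; J(\mathbf{x}_0),
\qquad
J(\mathbf{x}_0) = \left\| M_{\tau}\left(\mathbf{X}_{\mathrm{ref}} + \mathbf{x}_0\right) - M_{\tau}\left(\mathbf{X}_{\mathrm{ref}}\right) \right\|,
\]

where x0 contains the initial perturbations of the NINO3 index and of the hE and hW thermocline anomalies, τ is the lead time, and δ is the constraint radius.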
To validate the estimated CNOPs obtained using COBYLA for both the RC and ZC models, we first confirm that the CNOPs lie on the boundary of the sphere defined by the constraints. Next, we evaluate error propagation by applying multiple randomly chosen initial perturbations sampled from this boundary to determine whether the CNOPs indeed correspond to the largest error growth. Figures B1 and B2 present these validation results. Specifically, Fig. B1 shows a scatter plot of all estimated CNOPs for both models in the three-dimensional space defined by the normalized NINO3, hE, and hW optimal initial perturbations. In contrast, Fig. B2 illustrates, for a representative case, the divergence between perturbed and unperturbed trajectories resulting from both the CNOP and 50 random initial perturbations sampled along the constraint boundary. For illustration, results are provided for five different years, with both the RC and ZC models always initialized in April.
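A minimal sketch of this boundary-sampling check is given below (illustrative Python only; the constraint radius delta, the three-dimensional perturbation space, and the abstract propagate callable standing in for the RC or ZC model are placeholders, not the code used in this study):

import numpy as np

rng = np.random.default_rng(seed=0)

def sample_boundary(delta, n_samples=50, dim=3):
    # Draw Gaussian vectors and rescale them so that every perturbation lies
    # exactly on the constraint boundary ||x0|| = delta.
    v = rng.normal(size=(n_samples, dim))
    return delta * v / np.linalg.norm(v, axis=1, keepdims=True)

def error_growth(x_ref, x0, lead_months, propagate):
    # RMS distance between the perturbed and unperturbed trajectories;
    # `propagate` stands for the RC or ZC model propagation up to the lead time.
    traj_ref = propagate(x_ref, lead_months)
    traj_pert = propagate(x_ref + x0, lead_months)
    return np.sqrt(np.mean((traj_pert - traj_ref) ** 2))

# The CNOP passes the check if its error growth exceeds the error growth of
# every randomly sampled boundary perturbation.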
For our implementation, we adopted the COBYLA solver from the SciPy Python library (Virtanen et al., 2020). Since COBYLA is inherently designed to solve minimization problems, while our objective is to maximize the distance between the reference and perturbed trajectories (see Sects. 2.3 and 3.3), we account for this by minimizing −J(x) instead of J(x).
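As a minimal sketch of this setup (the quadratic J below is only a runnable stand-in for the trajectory-distance objective of Sects. 2.3 and 3.3, and the value of delta is a placeholder), the call has roughly the following form:

import numpy as np
from scipy.optimize import minimize

delta = 1.0            # constraint radius (placeholder value)
x_start = np.zeros(3)  # initial guess for the (NINO3, hE, hW) perturbation

def J(x):
    # Runnable stand-in objective; in the study, J measures the distance
    # between the perturbed and unperturbed trajectories at the chosen lead time.
    return x[0] ** 2 + 0.5 * x[1] ** 2 + 0.1 * x[2] ** 2

result = minimize(
    lambda x: -J(x),   # COBYLA minimizes, so the objective is negated
    x_start,
    method="COBYLA",
    constraints=[{"type": "ineq",
                  "fun": lambda x: delta ** 2 - np.sum(x ** 2)}],  # ||x|| <= delta
    options={"rhobeg": 0.1, "maxiter": 1000},
)
cnop_estimate = result.x  # expected to lie on the constraint boundary

For this stand-in objective, the returned solution is pushed onto the constraint boundary, mirroring the behaviour expected of the estimated CNOPs (Fig. B1).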

Figure B1. Scatter plot of all computed CNOPs in the normalized NINO3 index vs. hE and hW anomaly plane for (a) the ZC model and (b) the RC model, with τC both included and excluded during training. The NINO3 index is normalized by 2 °C, while the hE and hW anomalies are normalized by 50 m.

Figure B2. Scatter plot of the RMS differences between perturbed and unperturbed trajectories for three model years, with both the RC and ZC models always initialized in April. Results are presented for the CNOP (i.e. the optimal initial perturbation) and for 50 random initial perturbations sampled from the boundary of the constraints.

Figure D1. Distribution of the normalized RSE distances between perturbed and unperturbed trajectories for different lead times with the application of CNOPs, using initial conditions from (a, b) December, January, and February and (c, d) September, October, and November. Panels (a) and (c) display results for rd=0.77, while panels (b) and (d) show results for rd=0.9. The boxes indicate the interquartile range (IQR), the range within which the central 50 % of data points lie. The whiskers extend to the minimum and maximum values within 1.5 times the IQR from the first and third quartiles. The central line corresponds to the median. The RSE distances are normalized by the standard deviation of the NINO3 index extracted from the 50 years of synthetic observations considered for CNOP computation (0.29 for the subcritical regime and 0.56 for the supercritical regime).
Table D1. Median of the normalized RSE distances between perturbed and unperturbed trajectories at various lead times, with CNOPs applied across different starting seasons. Only the Zebiak–Cane model is considered, with the top table representing the subcritical regime (rd=0.77) and the bottom table the supercritical regime (rd=0.9). All RSE distances are normalized by the standard deviation of the NINO3 index from the 50 years of synthetic observations considered for the CNOP computation (0.29 for the subcritical regime and 0.56 for the supercritical regime).

Bold: forecast crosses the SPB. Italic: forecast does not cross the SPB.
Table D2. Median (IQR) of the normalized RSE distances between perturbed and unperturbed trajectories at various lead times, with CNOPs applied across different starting seasons. Both the ZC and RC models (trained with and without τC) are considered in the subcritical (rd=0.77) and supercritical (rd=0.9) regimes. RSE distances are normalized by the standard deviation of the NINO3 index from the 50 years of synthetic observations considered for CNOP computation (0.29 for the subcritical regime and 0.56 for the supercritical regime). The interquartile range (IQR) is defined as the distance between the first and third quartiles.

Bold: forecast crosses the SPB. Italic: forecast does not cross the SPB.

Figure D2. Distribution of the ZC model's CNOPs for positive, negative, and neutral eastern Pacific SSTA initial conditions. Panels (a) and (b) (top row) show the CNOPs obtained for the NINO3 index at a 3-month lead time, comparing subcritical and supercritical regimes, while panels (c) and (d) (bottom row) show the CNOPs obtained for hE+hW, the sum of the thermocline anomalies in the eastern and western Pacific, at a lead time of 9 months. In the supercritical regime, positive, negative, and neutral eastern Pacific SSTA initial conditions are defined as NINO3≥0.2, NINO3≤−0.2, and −0.2<NINO3<0.2, respectively. In the subcritical regime, positive, negative, and neutral SSTA initial conditions are defined as NINO3≥0.1, NINO3≤−0.1, and −0.1<NINO3<0.1, respectively. In every case, the months of March, April, and May (spring season) from the last 50 years of the synthetic data used for testing (see Sect. 3.1) are taken as initial conditions, yielding a total of 150 initial conditions.

Figure D3. Distribution of the RC model's CNOPs for positive, negative, and neutral eastern Pacific SSTA initial conditions. Panels (a) and (b) (top row) display the CNOPs for the NINO3 index at a 9-month lead time under the subcritical regime, with and without τC included during training. Panels (c) and (d) (bottom row) show the CNOPs for the supercritical regime, also at a 9-month lead time, with τC either included or excluded. In the supercritical regime, positive, negative, and neutral initial conditions are defined as NINO3≥0.2, NINO3≤−0.2, and −0.2<NINO3<0.2, respectively. In the subcritical regime, positive, negative, and neutral SSTA initial conditions are defined as NINO3≥0.1, NINO3≤−0.1, and −0.1<NINO3<0.1, respectively. In every case, the months of March, April, and May (spring season) from the last 50 years of the synthetic data used for testing (see Sect. 3.1) are taken as initial conditions, yielding a total of 150 initial conditions.
All data and code used in this study are available at https://doi.org/10.5281/zenodo.15006826 (Guardamagna et al., 2025).
All authors contributed to the design of this study. FG carried out all the computations and produced a first draft of the paper. All authors contributed to the interpretation of the results and the final version of the paper.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
The work of Francesco Guardamagna, Claudia Wieners and Henk Dijkstra was supported by the Netherlands Organization for Scientific Research (NWO) under grant no. OCENW.M20.277.
This research has been supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (grant no. OCENW.M20.277).
This paper was edited by Wansuo Duan and reviewed by three anonymous referees.
Barnston, A. G., Tippett, M. K., L'Heureux, M. L., Li, S., and DeWitt, D. G.: Skill of Real-Time Seasonal ENSO Model Predictions during 2002–11: Is Our Capability Increasing?, B. Am. Meteorol. Soc., 93, 631–651, 2012.
Battisti, D. S. and Hirst, A. C.: Interannual Variability in a Tropical Atmosphere-Ocean Model: Influence of the Basic State, Ocean Geometry and Nonlinearity, J. Atmos. Sci., 46, 1687–1712, https://doi.org/10.1175/1520-0469(1989)046<1687:IVIATA>2.0.CO;2, 1989.
Bracco, A., Brajard, J., Dijkstra, H. A., Hassanzadeh, P., Lessig, C., and Monteleoni, C.: Machine Learning for the Physics of Climate, Nature Reviews Physics, 7, 6–20, https://doi.org/10.1038/s42254-024-00776-3, 2024.
Copernicus Climate Change Service: ORAS5 global ocean reanalysis monthly data from 1958 to present, 2021.
Duan, W. and Wei, C.: The “spring predictability barrier” for ENSO predictions and its possible mechanism: Results from a fully coupled model, Int. J. Climatol., 33, 1280–1292, https://doi.org/10.1002/joc.3513, 2013.
Duan, W., Yu, Y., Xu, H., and Zhao, P.: Behaviors of nonlinearities modulating the El Niño events induced by optimal precursory disturbances, Clim. Dynam., 40, 1399–1413, 2013.
Feng, Q. Y. and Dijkstra, H. A.: Climate network stability measures of El Niño variability, Chaos, 27, 035801–15, https://doi.org/10.1063/1.4971784, 2017.
Guardamagna, F., Wieners, C., Fang, X., and Dijkstra, H. A.: Detection of limit cycle signatures of El Niño in models and observations using reservoir computing, Journal of Physics: Complexity, 5, 015016, https://doi.org/10.1088/2632-072X/ad2699, 2024.
Guardamagna, F., Zebiak, S. E., and Cane, M. A.: Supporting Code and Data for the Study: “Explaining the High Skill of Reservoir Computing in El Niño Forecasting”, Zenodo [code, data set], https://doi.org/10.5281/zenodo.15006826, 2025.
Ham, Y.-G., Kim, J.-H., and Luo, J.-J.: Deep learning for multi-year ENSO forecasts, Nature, 573, 568–572, https://doi.org/10.1038/s41586-019-1559-7, 2019.
Hassanibesheli, F., Kurths, J., and Boers, N.: Long-term ENSO prediction with echo-state networks, Environmental Research: Climate, 1, 011002, https://doi.org/10.1088/2752-5295/ac7f4c, 2022.
Hu, J., Weng, B., Huang, T., Gao, J., Ye, F., and You, L.: Deep Residual Convolutional Neural Network Combining Dropout and Transfer Learning for ENSO Forecasting, Geophys. Res. Lett., 48, e2021GL093531, https://doi.org/10.1029/2021GL093531, 2021.
Ito, M., Karamperidou, C., Sadowski, P., Camargo, S., Lee, C.-Y., and Patricola, C.: Explainable Artificial Intelligence for Insights into the Relationship between ENSO and Tropical Cyclone Genesis, in: AGU Fall Meeting Abstracts, vol. 2021, GC42A–07, 13–17 December 2021, New Orleans, 2021AGUFMGC42A..07I, 2021.
Jin, F.-F.: An Equatorial Ocean Recharge Paradigm for ENSO. Part I: Conceptual Model, J. Atmos. Sci., 54, 811–829, https://doi.org/10.1175/1520-0469(1997)054<0811:AEORPF>2.0.CO;2, 1997.
Jin, Y. and Liu, Z.: A Theory of the Spring Persistence Barrier on ENSO. Part I: The Role of ENSO Period, J. Climate, 34, 2145–2155, https://doi.org/10.1175/JCLI-D-20-0540.1, 2021a.
Jin, Y. and Liu, Z.: A theory of Spring Persistence Barrier on ENSO. Part II: Persistence Barriers in SST and ocean heat content, J. Climate, 34, 5555–5564, https://doi.org/10.1175/JCLI-D-20-0820.1, 2021b.
Jin, Y., Liu, Z., and McPhaden, M. J.: A Theory of the Spring Persistence Barrier on ENSO. Part III: The Role of Tropical Pacific Ocean Heat Content, J. Climate, 34, 8567–8577, https://doi.org/10.1175/JCLI-D-21-0070.1, 2021.
Jonnalagadda, J. and Hashemi, M.: Long Lead ENSO Forecast Using an Adaptive Graph Convolutional Recurrent Neural Network, Engineering Proceedings, 39, 5, https://doi.org/10.3390/engproc2023039005, 2023.
Kessler, W. S.: Is ENSO a cycle or a series of events?, Geophys. Res. Lett., 29, 40-1–40-4, https://doi.org/10.1029/2002GL015924, 2002.
Landt-Hayen, M., Kröger, P., Claus, M., and Rath, W.: Layer-wise Relevance Propagation for Echo State Networks applied to Earth System Variability, arXiv [preprint], https://doi.org/10.48550/arXiv.2210.09958, 2022.
Lau, K.-M. and Yang, S.: The Asian monsoon and predictability of the tropical ocean–atmosphere system, Q. J. Roy. Meteor. Soc., 122, 945–957, https://doi.org/10.1002/qj.49712253208, 1996.
Liu, Y., Cai, W., Lin, X., Li, Z., and Zhang, Y.: Nonlinear El Niño impacts on the global economy under climate change, Nat. Commun., 14, 5887, 2023a.
Liu, Y., Duffy, K., Dy, J. G., and Ganguly, A. R.: Explainable deep learning for insights in El Niño and river flows, Nat. Commun., 14, 339, https://doi.org/10.1038/s41467-023-35968-5, 2023b.
Lukoševičius, M.: A Practical Guide to Applying Echo State Networks, Springer Berlin Heidelberg, Berlin, Heidelberg, 659–686, https://doi.org/10.1007/978-3-642-35289-8_36, 2012.
Mahesh, A., cody Evans, M., Jain, G., Castillo, M., Lima, A. R., Lunghino, B. D., Gupta, H., Gaitan, C. F., Hunt, J., Tavasoli, O., and Brown, P. T.: Forecasting El Niño with Convolutional and Recurrent Neural Networks, https://api.semanticscholar.org/CorpusID:209371718 (last access: 27 June 2025), 2019.
McPhaden, M. J., Zebiak, S. E., and Glantz, M. H.: ENSO as an Integrating Concept in Earth Science, Science, 314, 1740–1745, https://doi.org/10.1126/science.1132588, 2006.
Mu, M., Xu, H., and Duan, W.: A kind of initial errors related to “spring predictability barrier” for El Niño events in Zebiak-Cane model, Geophys. Res. Lett., 34, L03709, https://doi.org/10.1029/2006GL027412, 2007.
Neelin, J., Battisti, D. S., Hirst, A. C., Jin, F.-F., Wakata, Y., Yamagata, T., and Zebiak, S. E.: ENSO Theory, J. Geophys. Res., 103, 14261–14290, 1998.
Pathak, J., Wikner, A., Fussell, R., Chandra, S., Hunt, B. R., Girvan, M., and Ott, E.: Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model, Chaos, 28, 041101, https://doi.org/10.1063/1.5028373, 2018.
Planton, Y. Y., Guilyardi, E., Wittenberg, A. T., Lee, J., Gleckler, P. J., Bayr, T., McGregor, S., McPhaden, M. J., Power, S., Roehrig, R., Vialard, J., and Voldoire, A.: Evaluating Climate Models with the CLIVAR 2020 ENSO Metrics Package, B. Am. Meteorol. Soc., 102, E193–E217, https://doi.org/10.1175/BAMS-D-19-0337.1, 2021.
Powell, M. J. D.: A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation, Springer Netherlands, Dordrecht, 51–67, https://doi.org/10.1007/978-94-015-8330-5_4, 1994.
Qin, B., Yang, Z., Mu, M., Wei, Y., Cui, Y., Fang, X., Dai, G., and Yuan, S.: The first kind of predictability problem of El Niño predictions in a multivariate coupled data-driven model, Q. J. Roy. Meteor. Soc., 150, 5452–5471, https://doi.org/10.1002/qj.4882, 2024.
Rivera Tello, G. A., Takahashi, K., and Karamperidou, C.: Explained predictions of strong eastern Pacific El Niño events using deep learning, Sci. Rep.-UK, 13, 21150, https://doi.org/10.1038/s41598-023-45739-3, 2023.
Roulston, M. S. and Neelin, J. D.: The response of an ENSO Model to climate noise, weather noise and intraseasonal forcing, Geophys. Res. Lett., 27, 3723–3726, https://doi.org/10.1029/2000GL011941, 2000.
Selz, T. and Craig, G. C.: Can Artificial Intelligence-Based Weather Prediction Models Simulate the Butterfly Effect?, Geophys. Res. Lett., 50, e2023GL105747, https://doi.org/10.1029/2023GL105747, 2023.
Suarez, M. J. and Schopf, P. S.: A Delayed Action Oscillator for ENSO, J. Atmos. Sci., 45, 3283–3287, https://doi.org/10.1175/1520-0469(1988)045<3283:ADAOFE>2.0.CO;2, 1988.
Takahashi, K., Karamperidou, C., and Dewitte, B.: A theoretical model of strong and moderate El Niño regimes, Clim. Dynam., 52, 7477–7493, 2019.
Timmermann, A., Jin, F.-F., and Abshagen, J.: A Nonlinear Theory for El Niño Bursting, J. Atmos. Sci., 60, 152–165, https://doi.org/10.1175/1520-0469(2003)060<0152:ANTFEN>2.0.CO;2, 2003.
Tziperman, E., Stone, L., Cane, M. A., and Jarosh, H.: El Niño chaos: overlapping of resonances between the seasonal cycle and the Pacific ocean-atmosphere oscillator, Science, 264, 72–74, 1994.
Verstraeten, D., Schrauwen, B., D'Haene, M., and Stroobandt, D.: An experimental unification of reservoir computing methods, Neural Networks, 20, 391–403, https://doi.org/10.1016/j.neunet.2007.04.003, 2007.
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nat. Methods, 17, 261–272, https://doi.org/10.1038/s41592-019-0686-2, 2020.
Wang, B., Yang, Y., Ding, Q.-H., Murakami, H., and Huang, F.: Climate control of the global tropical storm days (1965–2008), Geophys. Res. Lett., 37, L07704, https://doi.org/10.1029/2010GL042487, 2010.
Webster, P. and Yang, S.: Monsoon and ENSO: Selectively Interactive Systems, Q. J. Roy. Meteor. Soc., 118, 877–926, https://doi.org/10.1002/qj.49711850705, 1992.
Xiaoqun, C., Yanan, G., Bainian, L., Kecheng, P., Guangjie, W., and Mei, G.: ENSO prediction based on Long Short-Term Memory (LSTM), IOP Conf. Ser.-Mat. Sci., 799, 012035, https://doi.org/10.1088/1757-899X/799/1/012035, 2020.
Yin, H., Wu, Z., Fowler, H. J., Blenkinsop, S., He, H., and Li, Y.: The Combined Impacts of ENSO and IOD on Global Seasonal Droughts, Atmosphere-Basel, 13, 1673, https://doi.org/10.3390/atmos13101673, 2022.
Yu, L., Mu, M., and Yu, Y.: Role of parameter errors in the spring predictability barrier for ENSO events in the Zebiak-Cane model, Adv. Atmos. Sci., 31, https://doi.org/10.1007/s00376-013-3058-3, 2014.
Zebiak, S. and Cane, M.: A model of El Nino-Southern Oscillation, Mon. Weather Rev., 115, 2262–2278, https://doi.org/10.1175/1520-0493(1987)115<2262:AMENO>2.0.CO;2, 1987.