Correlation does not necessarily imply causation, which is why causal methods have been developed to disentangle true causal links from spurious relationships. In our study, we use two causal methods, namely the Liang–Kleeman information flow (LKIF) and the Peter and Clark momentary conditional independence (PCMCI) algorithm, and we apply them to four artificial models of increasing complexity and one real-world case study based on climate indices in the Atlantic and Pacific regions. We show that both methods are superior to classical correlation analysis, especially in removing spurious links. LKIF and PCMCI each display strengths and weaknesses for the three simplest models, with LKIF performing better with a smaller number of variables and PCMCI performing better with a larger number of variables. Detecting causal links in the fourth model is more challenging, as the system is nonlinear and chaotic. For the real-world case study with climate indices, the two methods show both similarities and differences at the monthly timescale. One of the key differences is that LKIF identifies the Arctic Oscillation (AO) as the largest driver, while the El Niño–Southern Oscillation (ENSO) is the main influencing variable for PCMCI. More research is needed to confirm these links, in particular research that includes nonlinear causal methods.

One of the most commonly used methodologies to identify potential relationships between variables in climate research is correlation, with or without a lag (or time delay). For example,

However, such correlation (or linear regression) approaches, despite being useful for identifying potential relationships between variables, do not imply causation. A significant correlation simply means that there is a relationship, or synchronous behavior, between two variables without explicitly confirming a causal link between the two. Correlation suffers from five key limitations. First, a significant correlation between variables could appear by chance (so-called “random coincidence”). Second, correlation does not allow us to identify the direction of the potential causal link, so this approach presupposes a priori knowledge of the processes at play. The problem of directional dependence is often addressed using lagged correlation or regression, but this method is prone to overstating causal relationships when one variable has significant memory

Hence, causal methods prove to be very useful.

The Peter and Clark momentary conditional independence (PCMCI) method is a causal discovery method based on the Peter and Clark (PC) algorithm

The Liang–Kleeman information flow (LKIF;

Commonly, each study focuses on only one causal method. However, contradictory results might appear when using different causal methods, and it is thus important to compare them. Several studies have investigated differences between causal methods. One of the most comprehensive studies in this respect in the recent past is the intercomparison of

The main goal of this study is to provide a detailed comparison between two independent causal methods, namely, LKIF and PCMCI, which have been widely used in the context of the JPI-Climate/JPI-Oceans ROADMAP project (Role of ocean dynamics and Ocean-Atmosphere interactions in Driving cliMAte variations and future Projections of impact-relevant extreme events;

In order to apply the two causal methods described below (Sect.

We first consider a two-dimensional (2D) stochastic linear model (Eq. 12 in

We solve this system with the Euler–Maruyama method using a time step
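As a minimal sketch of the integration scheme named above, the following Python code applies the Euler–Maruyama method to a generic 2D linear system with additive noise. The drift coefficients, noise amplitudes, initial state, time step, and number of steps are hypothetical placeholders chosen for illustration; they are not the values used in this study.

```python
import math
import random


def euler_maruyama_2d(drift, sigma, x0, dt, n_steps, seed=0):
    """Integrate dX = drift * X dt + diag(sigma) dW for a 2D linear system."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    x1, x2 = x0
    path = [(x1, x2)]
    for _ in range(n_steps):
        # Wiener increments have zero mean and variance dt.
        dw1 = rng.gauss(0.0, 1.0) * sqrt_dt
        dw2 = rng.gauss(0.0, 1.0) * sqrt_dt
        # Both updates use the state from the previous step.
        next_x1 = x1 + (drift[0][0] * x1 + drift[0][1] * x2) * dt + sigma[0] * dw1
        next_x2 = x2 + (drift[1][0] * x1 + drift[1][1] * x2) * dt + sigma[1] * dw2
        x1, x2 = next_x1, next_x2
        path.append((x1, x2))
    return path


# Hypothetical coefficients: X2 drives X1, but X1 does not drive X2.
DRIFT = [[-1.0, 0.5], [0.0, -1.0]]
SIGMA = [0.1, 0.1]
path = euler_maruyama_2d(DRIFT, SIGMA, x0=(1.0, 2.0), dt=0.001, n_steps=10_000)
```

Setting both noise amplitudes to zero reduces the scheme to the explicit Euler method, which gives a quick way to check the drift term against the known exponential decay.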

Then, we investigate a six-dimensional (6D) stochastic linear vector autoregressive (VAR) model with only one lag (Eq. 21 in

We solve this system using 10
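A VAR model with a single lag can be simulated by direct iteration. The sketch below illustrates this for a hypothetical three-variable chain (X0 drives X1, which drives X2); the coefficient matrix, noise amplitude, and function name are assumptions made for illustration and do not reproduce the 6D model of this study.

```python
import random


def simulate_var1(coeffs, noise_std, n_steps, seed=0):
    """Iterate X_i(t+1) = sum_j coeffs[i][j] * X_j(t) + noise_std * e_i(t+1)."""
    rng = random.Random(seed)
    n_vars = len(coeffs)
    state = [0.0] * n_vars
    series = [state]
    for _ in range(n_steps):
        # Every variable is updated from the previous state only (one lag).
        state = [
            sum(coeffs[i][j] * state[j] for j in range(n_vars))
            + noise_std * rng.gauss(0.0, 1.0)
            for i in range(n_vars)
        ]
        series.append(state)
    return series


# Hypothetical coefficients: row i lists the influence of each X_j on X_i.
COEFFS = [
    [0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.4, 0.5],
]
series = simulate_var1(COEFFS, noise_std=1.0, n_steps=5000)
```

In this layout the nonzero off-diagonal entries of the coefficient matrix define the ground-truth causal links against which a detection method can be checked.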

The next model is a nine-dimensional (9D) stochastic nonlinear VAR system with a maximum of four lags (Eq. 17 in

We solve this system using 10

We also use the three-dimensional (3D)

We solve the

Finally, we use eight different regional climate indices affecting the Atlantic and Pacific regions, especially in the Northern Hemisphere, following an approach similar to

The four atmospheric indices are computed from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis:

The Pacific–North American (PNA) index is obtained by projecting the daily 500 hPa geopotential height anomalies over the Northern Hemisphere (0–90° N) onto the PNA loading pattern (second leading mode of rotated empirical orthogonal function (EOF) analysis of monthly mean 500 hPa height anomalies during the 1950–2000 period). A positive PNA features above-average heights in the vicinity of Hawaii and over the intermountain region of North America and below-average heights south of the Aleutian Islands and over the southeastern United States. A negative PNA reflects an opposite pattern of height anomalies over these regions.

The North Atlantic Oscillation (NAO) index is based on the difference in sea-level pressure between the subtropical high (Azores) and the subpolar low (Iceland). A positive NAO reflects above-normal pressure over the central North Atlantic, the eastern United States, and western Europe and below-normal pressure across high latitudes of the North Atlantic. A negative NAO features an opposite pattern of pressure anomalies over these regions.

The Arctic Oscillation (AO), or Northern Annular Mode (NAM), index is constructed by projecting the 1000 hPa geopotential height anomalies poleward of 20° N onto the leading EOF (using monthly mean 1000 hPa height anomalies from 1979 to 2000). When the AO is in its positive phase, strong westerlies act to confine colder air across polar regions. When the AO is negative, the westerly jet weakens and can become more meandering.

The Quasi-Biennial Oscillation (QBO) index is calculated from the zonal average of the 30 hPa zonal wind at the Equator. It is the most predictable mode of atmospheric variability that is not linked to changing seasons, with easterly and westerly winds alternating every 13 months.

Below are the four indices based on ocean conditions:

The Atlantic Multidecadal Oscillation (AMO) index is computed based on version 2 of the

The Pacific Decadal Oscillation (PDO) index is obtained by projecting the Pacific SST anomalies from version 5 of the NOAA Extended Reconstructed SST (ERSST) dataset onto the dominant EOF from 20 to 60° N. The PDO is positive when SST is anomalously cold in the interior North Pacific and warm along the eastern Pacific Ocean. The PDO is negative when the climate anomaly patterns are reversed.

The Tropical North Atlantic (TNA) index is computed based on SST anomalies from the Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST) and NOAA Optimal Interpolation (OI) datasets averaged in the Tropical North Atlantic (5.5–23.5° N; 57.5–15° W), based on

The Niño3.4 index is based on standardized SST anomalies (using ERSST v5) averaged over the eastern tropical Pacific (5° S–5° N; 170–120° W). The Niño3.4 index is in its warm phase when SST anomaly exceeds 0.5 °C, and it is in its cold phase when SST anomaly is below

In this section, we describe the two causal methods used in this study, namely, the Liang–Kleeman information flow (LKIF; Sect.

The LKIF method has been developed by

Under the assumption of a linear model with additive noise, the maximum likelihood estimate of the information flow reads as follows
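For intuition, the two-variable special case of such an estimator can be computed from sample covariances alone. The sketch below is an illustration, not the implementation used in this study: the function names, the discrete-time test system, and its coefficients are hypothetical, and the closed-form expression is the bivariate linear estimator as usually quoted in the information-flow literature, so it should be checked against the multivariate equation of this section before any real use.

```python
import random


def covariance(u, v):
    """Sample covariance between two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


def lkif_bivariate(target, source, dt=1.0):
    """Estimated rate of information transfer from `source` to `target`."""
    # Euler forward difference approximating d(target)/dt.
    d_target = [(b - a) / dt for a, b in zip(target[:-1], target[1:])]
    x1, x2 = target[:-1], source[:-1]
    c11, c22, c12 = covariance(x1, x1), covariance(x2, x2), covariance(x1, x2)
    c1d1, c2d1 = covariance(x1, d_target), covariance(x2, d_target)
    return (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / (c11 ** 2 * c22 - c11 * c12 ** 2)


def simulate_driven_pair(n_steps, seed=0):
    """Hypothetical discrete-time pair in which x2 drives x1 but not the reverse."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for _ in range(n_steps):
        # x1 is updated first, so x2[-1] is still the value at the previous step.
        x1.append(0.5 * x1[-1] + 0.5 * x2[-1] + rng.gauss(0.0, 1.0))
        x2.append(0.6 * x2[-1] + rng.gauss(0.0, 1.0))
    return x1, x2


x1, x2 = simulate_driven_pair(20_000)
t_2_to_1 = lkif_bivariate(x1, x2)  # expected to be clearly nonzero
t_1_to_2 = lkif_bivariate(x2, x1)  # expected to be close to zero
```

The asymmetry between the two estimates is what distinguishes this metric from the correlation coefficient, which is identical in both directions.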

To assess the importance of the different cause–effect relationships, we compute the relative rate of information transfer

In the following, we will only use the relative rate of information transfer

The PCMCI method is a causal discovery method based on the Peter and Clark (PC) algorithm

Note that the term “causal” rests upon a set of assumptions, which are described in

In the first step, or PC step, for each actor in the (example) set of actors

In the second step, or MCI step, the partial correlation between each possible pair of actors is calculated a second time by regressing once on the combined set of parents. If we assume that
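The partial correlation underlying this step can be illustrated by regressing two variables on a set of conditioning variables and correlating the residuals. The sketch below is a simplified stand-in, not the PCMCI implementation: it ignores time lags and significance testing, the function names are hypothetical, and the synthetic data (a common driver z of x and y) are chosen only to show how conditioning removes a spurious link.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient between two sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residuals(u, z):
    """Residuals of the least-squares regression of u on z (with intercept)."""
    n = len(u)
    mean_u, mean_z = sum(u) / n, sum(z) / n
    szz = sum((b - mean_z) ** 2 for b in z)
    beta = sum((a - mean_u) * (b - mean_z) for a, b in zip(u, z)) / szz
    return [(a - mean_u) - beta * (b - mean_z) for a, b in zip(u, z)]


def partial_correlation(x, y, conditions):
    """Correlation between x and y after removing the linear effect of `conditions`."""
    remaining = [list(c) for c in conditions]
    while remaining:
        z = remaining.pop(0)
        x, y = residuals(x, z), residuals(y, z)
        # Orthogonalize the remaining conditions against z as well, so that the
        # sequential regressions are equivalent to one multiple regression.
        remaining = [residuals(c, z) for c in remaining]
    return pearson(x, y)


rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]
y = [zi + rng.gauss(0.0, 1.0) for zi in z]
plain = pearson(x, y)
conditioned = partial_correlation(x, y, [z])
```

Here `plain` is sizeable because both variables share the driver z, while `conditioned` should be close to zero once z is accounted for.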

The strength of a causal link from variable

Before investigating results from the two causal methods, it is important to highlight the main differences between the two methods, which are summarized in Table

The metric used by LKIF is the rate of information transfer from variable

While for both methods the strength of the metric, in absolute value, indicates how strongly two variables are causally linked (i.e., the larger

Main differences between the two causal methods used in this study.

Since correct causal links are known for the first three artificial models (2D, 6D, and 9D models), we can check the performance of the two causal methods, as well as the correlation coefficient, in identifying the ground truth. The diagnostics presented here are not computed for the

To summarize the results from the confusion matrix, we also compute the
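The four entries of a confusion matrix can be tallied by comparing the set of detected links with the set of known links. The sketch below assumes that links are stored as (cause, effect) index pairs and that self-links are excluded by default; the function name and the three-variable example are hypothetical.

```python
def confusion_counts(true_links, detected_links, n_vars, include_self=False):
    """Count true/false positives and negatives over all directed pairs."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for cause in range(n_vars):
        for effect in range(n_vars):
            if cause == effect and not include_self:
                continue
            is_true = (cause, effect) in true_links
            is_detected = (cause, effect) in detected_links
            if is_true and is_detected:
                counts["TP"] += 1
            elif is_detected:
                counts["FP"] += 1
            elif is_true:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts


# Hypothetical example: the true chain is 0 -> 1 -> 2, but the method
# detects 0 -> 1 and a spurious direct link 0 -> 2.
true_links = {(0, 1), (1, 2)}
detected_links = {(0, 1), (0, 2)}
counts = confusion_counts(true_links, detected_links, n_vars=3)
```

Because pairs are directed, a link detected in the wrong direction counts as one false positive and one false negative.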

We provide results from the four artificial models and the real-world case study hereafter. Table

For the 2D model, the numerical value of the correlation between

LKIF can accurately retrieve the correct causal link, i.e., from

PCMCI only captures the self-influences of

This example shows that LKIF performs well for such a very simple 2D system, while PCMCI struggles with the original time step. In particular, the serial dependency in this particular model might overshadow the mutual dependency for a “typical” maximum lag considered by PCMCI, which has not been designed for such conditions.

Numerical results from the 2D model:

For the 6D model, the correlations are significant for all 30 pairs of variables (excluding autocorrelations), despite relatively small values for many of them (Fig.

Both LKIF (Fig.

This example shows the strength of causal methods, which can capture the correct causal influences, whereas correlation cannot provide such information and cannot identify confounding variables or the direction of causality.

Results from the 6D model:

For the 9D model, the correlation does a poor job at identifying correct causal influences (Fig.

Using LKIF without any lag shows that the method can detect all correct links, except

The use of time lags up to

Using PCMCI with lags up to

This example also demonstrates the power of causal methods compared to a correlation analysis when using an appropriate number of lags: all expected links are correctly identified. Although some wrong causal links are identified by both methods, the strength of the relationship remains small for these wrong influences.

Results from the 9D model without lags:

Results from the 9D model with lags:

The only large correlation (excluding autocorrelation) in this system is between

According to LKIF, a two-way causal link appears between

Then, we investigate whether there is a lag dependence on the results. For the correlation and LKIF, we repeat the computation by shifting the three variables one by one with a lag from 0 to 1 unit time (100 time steps) with 0.1 unit time increment (i.e., every 10 time steps). For example, we take
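The lag scan described above can be sketched for the correlation as follows, with lags from 0 to 100 time steps in increments of 10 time steps. The function names are hypothetical, and the synthetic series (white noise and a copy delayed by 30 time steps) are placeholders used only to show the mechanics of the shift; they are not output from the model.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient between two sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def lagged_correlation(x, y, lag):
    """Correlation between x(t) and y(t + lag), for lag >= 0 in time steps."""
    return pearson(x[: len(x) - lag], y[lag:])


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# y is a copy of x delayed by 30 time steps (padded with zeros at the start).
y = [0.0] * 30 + x[:-30]

lags = range(0, 101, 10)  # 0 to 100 time steps, every 10 time steps
correlations = {lag: lagged_correlation(x, y, lag) for lag in lags}
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
```

The same shifting can be applied before computing any other pairwise metric, provided both truncated series are kept at equal length.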

The correlation coefficient between

The LKIF rates of information transfer from

The PCMCI path coefficients between

If we replace

The correlation between

Results from the

Results from the

Results from the

The real-world case study with climate indices shows that 54 % of the pairs of variables (excluding autocorrelations) are related by significant correlations when considering no lag (Fig.

Results from the two causal methods present several similarities, including the AO influence on both PDO and TNA (Fig.

In terms of differences between the two causal methods, LKIF identifies additional causal influences of AO on PNA, NAO, and AMO, while PCMCI does not identify these causal links (Fig.

The use of 12 time lags (0 to 11 months) with both methods (bringing 8 variables
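One way to bring time lags into a method that works on simultaneous variables is to append shifted copies of each series as extra variables. The sketch below assumes this reading: with 8 indices and lags of 0 to 11 months it yields 8 × 12 = 96 aligned columns. The function name, column labels, and random placeholder data are hypothetical and do not correspond to the observed indices.

```python
import random


def lag_embed(series_by_name, max_lag):
    """Return aligned columns name(t-lag) for lag = 0..max_lag."""
    lengths = {len(values) for values in series_by_name.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same length")
    n = lengths.pop()
    columns = {}
    for name, values in series_by_name.items():
        for lag in range(max_lag + 1):
            # Row r corresponds to time t = max_lag + r, so this slice
            # holds values[t - lag] for every retained time t.
            columns[f"{name}(t-{lag})"] = values[max_lag - lag : n - lag]
    return columns


INDEX_NAMES = ["PNA", "NAO", "AO", "QBO", "AMO", "PDO", "TNA", "Nino3.4"]
rng = random.Random(0)
# Synthetic placeholder data: 120 monthly values per index, not observations.
data = {name: [rng.gauss(0.0, 1.0) for _ in range(120)] for name in INDEX_NAMES}
embedded = lag_embed(data, max_lag=11)
```

The first 11 months are lost in the alignment, which is a small cost for monthly records spanning several decades but grows with the maximum lag.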

NAO influences PDO with both methods but with very different lags depending on the method, i.e., 11 months with LKIF (Fig.

AO is by far the climate index that influences most variables with LKIF (Fig.

QBO does not have any influence on any other climate indices with any of the methods (Figs.

The AMO–TNA two-way causal influence already identified in Fig.

PDO has an influence on PNA with LKIF at lag

Finally, ENSO influences PDO at lags

Results from the real-world case study:

Results from the real-world case study with LKIF (rate of information transfer; absolute value) as a function of the lag:

Results from the real-world case study with PCMCI (path coefficient; absolute value) as a function of the lag:

Correlation is often used by the climate community to identify potential relationships between variables, but a statistically significant correlation does not necessarily imply causation. In our study, we used two causal methods, LKIF and PCMCI, to disentangle true causal links from spurious correlations, and we applied them to four artificial models and one real-world case study based on climate indices. Below we discuss our results compared to previous literature (Sect.

For the simplest (2D) model used here, we show that LKIF can accurately reproduce the correct causal link, with relatively high accuracy compared to the analytical solution, while PCMCI fails to reproduce this link when using the original time step (Sect.

The above results are not entirely comparable to findings from

The main novelties compared to

True-positive, true-negative, false-positive, and false-negative rates (in

Table

In our study, we extend previous analyses from

Due to the small methodological differences in our analysis compared to

ENSO has a relatively large influence on other climate indices, especially on PDO for both LKIF and PCMCI (Fig.

In this study, we compare two independent causal methods, namely, the Liang–Kleeman information flow (LKIF) and the Peter and Clark momentary conditional independence (PCMCI), with each other and with the Pearson correlation coefficient. We use five different datasets with an increasing level of complexity, including three stochastic models, one nonlinear deterministic model, and one real-world case study.

We show that both causal methods are superior to correlation, which suffers from five key limitations: random coincidence, no identification of the direction of causality, external drivers not distinguished from direct drivers, no identification of potential nonlinear influences, and application to bivariate cases only. For most models and the real-world case study, the number of significant correlations is much larger than the number of significant causal links, which is incorrect from a causal perspective for the first three models. By extension, we assume that correlation also suffers from this overestimation in the real-world case study, and causal methods allow us to improve results.

Comparing the two causal methods with each other, LKIF can accurately reproduce the correct causal link in the 2D model, while PCMCI cannot do so with the original time step and needs a larger sampling time step to provide correct causal links, although the detected influence remains small. For the 6D model, both methods capture the seven correct causal links. For the 9D model, PCMCI correctly reproduces all causal links, whereas LKIF without any time lag is not fully accurate. When used with time lags, LKIF identifies the correct causal links.

For the

Finally, the real-world case study with climate indices provides some similarities but also important differences between the two methods. In terms of similarities, AO influences both PDO and TNA, there is a two-way causal link between AMO and TNA, and ENSO influences PDO. In terms of differences, LKIF identifies additional influences of AO on PNA, NAO, and AMO, as well as two-way causal links between ENSO and PNA and between ENSO and TNA. When using 12 time lags, the number of influences detected by PCMCI becomes larger compared to LKIF, e.g., ENSO has a large influence on all other variables except NAO, while AO remains the largest influencer (at smaller lags) with LKIF. More detailed analysis of the physical processes would be needed to identify correct causal links between these climate indices.

In summary, this analysis shows that both causal methods should be preferred to correlation when it comes to identifying causal links. Additionally, as both LKIF and PCMCI display strengths and weaknesses when used with relatively simple models in which the correct causal links are known by construction, we do not recommend one method over the other but rather encourage the climate community to use several methods whenever possible. We highlight that both methods, as used here, assume linearity, so results need to be taken with caution for nonlinear problems, such as the

The climate indices were retrieved from the Physical Sciences Laboratory (PSL) of the National Oceanic and Atmospheric Administration (NOAA;

DD, GDC, RVD, CALP, AS and SV designed the study. DD generated the model datasets and retrieved the climate indices. DD applied the LKIF method and the Pearson correlation to the datasets, and GDC ran the PCMCI algorithm. DD led the writing of the manuscript, with contributions from all co-authors. DD created all figures, with the help of GDC. All authors participated in the data analysis and interpretation.

At least one of the (co-)authors is a member of the editorial board of

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

We thank X. San Liang for his feedback related to our analysis. We also thank the editor Stefano Pierini and two anonymous reviewers for their comments, which helped to improve our article.

David Docquier, Giorgia Di Capua, Reik Donner, Carlos Pires, Amélie Simon, and Stéphane Vannitsem were supported by ROADMAP (Role of ocean dynamics and Ocean-Atmosphere interactions in Driving cliMAte variations and future Projections of impact-relevant extreme events;

This paper was edited by Stefano Pierini and reviewed by two anonymous referees.