This paper investigates the potential of a Wasserstein generative adversarial network to produce realistic weather situations when trained from the climate of a general circulation model (GCM). To do so, a convolutional neural network architecture is proposed for the generator and trained on a synthetic climate database, computed using a simple three-dimensional climate model: PLASIM.
The generator transforms a “latent space”, defined by a 64-dimensional Gaussian distribution, into spatially defined anomalies on the same output grid as PLASIM. The analysis of the statistics in the leading empirical orthogonal functions shows that the generator is able to reproduce many aspects of the multivariate distribution of the synthetic climate. Moreover, generated states reproduce the leading geostrophic balance present in the atmosphere.
The ability to represent the climate state in a compact, dense and potentially nonlinear latent space opens new perspectives in the analysis and handling of the climate. This contribution discusses the exploration of the extremes close to a given state and how to connect two realistic weather situations with this approach.
The ability to generate realistic weather situations has numerous potential applications. Weather generators can be used to characterize the spatio-temporal complexity of phenomena in order, for example, to assess the socioeconomic impact of the weather
The last decade has seen new kinds of generative methods from the machine-learning field using artificial neural networks (ANNs). Among these, generative adversarial networks (GANs)
Data-driven approaches and numerical weather prediction are two domains that share important similarities.
While there is a growing interest in using deep-learning methods in weather impact or weather prediction
In this study, in order to evaluate the potential of GANs applied to the global atmosphere, a synthetic climate is computed using the PLASIM global circulation simulator
The article is organized as follows. The formalism of WGAN is first introduced in Sect.
The Earth system is considered to be the solution of an evolution equation
Obtaining a complete description of
For instance, in the present study, the true weather dynamics
Thus,
The main advantage of such a formulation is to have a function
Here the generator is a good candidate for learning the physical constraints that make a climate state realistic without the need to run a complete simulation. The construction of the generator is now detailed.
To characterize the climate, we first introduce a simple Gaussian distribution
The search is limited to a family of transformations
Even with this simplified framework, the search for an optimal
One of the major advantages of the Wasserstein distance is that it is real-valued for non-overlapping distributions. Indeed, the Kullback–Leibler (KL) divergence is infinite for disjoint distributions, and using it as a loss function leads to a vanishing gradient
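The contrast between the two measures can be illustrated on a toy case. The following is a minimal sketch, not the loss used for training: it assumes one-dimensional uniform samples, estimates the Wasserstein distance from sorted samples and the KL divergence from histograms on shared bins. The helper names `wasserstein_1d` and `kl_histogram` are hypothetical.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

def kl_histogram(x, y, bins):
    """KL(P || Q) estimated from histograms on shared bins.

    Returns inf as soon as P has mass in a bin where Q has none.
    """
    p, _ = np.histogram(x, bins=bins)
    q, _ = np.histogram(y, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
bins = np.linspace(0.0, 12.0, 49)
p_samples = rng.uniform(0.0, 1.0, size=2000)

# Move the second distribution further and further away from the first one.
for shift in (2.0, 5.0, 10.0):
    q_samples = rng.uniform(shift, shift + 1.0, size=2000)
    w = wasserstein_1d(p_samples, q_samples)
    kl = kl_histogram(p_samples, q_samples, bins)
    print(f"shift={shift:4.1f}  Wasserstein={w:6.3f}  KL={kl}")
```

In this toy setting the Wasserstein estimate grows with the shift and therefore still indicates in which direction to move the samples, whereas the KL estimate is infinite for every shift and carries no such information.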
Unfortunately, the formulation in Eq. (
However, there is no simple way to characterize the set of
Finally, if the weights of the network are constrained to a compact space
The following sections will aim to create a climate data generator from the WGAN method. The next section will describe the architecture of the network adapted to the complexity of the dataset used.
WGANs are known to be time-consuming to train, usually needing a high number of iterations due to the alternating aspect of the training algorithm between the critic and the generator. Our initial architecture used a simple convolutional network for both, with a high number of parameters, but it proved difficult to train it to fit multimodal distributions such as the green distributions in the left panels in Fig.
A network is composed of a stack of layers; when a specific succession of layers is used several times, we can refer to it as a
Critic architecture.
Residual identity block for the critic.
Residual convolutional block for the critic. If
Generator architecture.
Residual convolutional block for the generator. The upsampling layer can be removed if not necessary and is mentioned when used in Fig.
One should note that the PLASIM simulator is a spectral model run on a Gaussian grid that consequently enforces the periodic boundary condition. In order to impose the periodic boundary condition in the generated samples, it was necessary to create a
The critic network input has the shape of a sample from the dataset
Its output must be a real number because it is an approximation of the Wasserstein distance between the distribution of the batch of images from the dataset and the one from the generator that is being processed. The architecture ends with a dense layer of one neuron with linear activation. The core of the structure is taken from the residual network and can be seen in Fig.
The input of the generator network (see Fig. An upsampling layer is added to increase the image size for some convolutional blocks. Wrap and nearest padding layers are added in, respectively, the west–east and north–south directions. A batch normalization layer is present after convolutional layers.
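The padding scheme described above can be sketched outside of any deep-learning framework. The snippet below is only an illustration with NumPy, assuming a single field on the 64 by 128 grid; the function name `pad_periodic_lonlat` and the padding width are hypothetical, and the actual layers of the network may differ.

```python
import numpy as np

def pad_periodic_lonlat(field, pad):
    """Pad a (nlat, nlon) field before a convolution.

    Longitude (last axis) is periodic, so it is padded by wrapping around.
    Latitude (first axis) is not periodic, so the nearest row is repeated.
    """
    # West-east direction: columns leaving one side re-enter on the other.
    field = np.pad(field, ((0, 0), (pad, pad)), mode="wrap")
    # North-south direction: repeat the edge rows ("nearest" padding).
    field = np.pad(field, ((pad, pad), (0, 0)), mode="edge")
    return field

field = np.arange(64 * 128, dtype=float).reshape(64, 128)
padded = pad_periodic_lonlat(field, pad=1)
print(padded.shape)
```

With a padding width of 1, a 3 by 3 convolution applied to the padded field returns an output of the original size whose westernmost and easternmost columns see each other as neighbours.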
One could argue that the ReLU activation function is not differentiable at 0, but this is managed by taking the left derivative in the software implementation. The study does not claim that the network architectures used are optimal: the computational burden was too high to run a parameter sensitivity study. Guidelines from
Hyperparameters for training step.
Smoothed version of the Wasserstein distance computed during the training. The vertical axis is in log scale.
For the training phase, the neural network's hyperparameters are summarized in Table
Two-norm distance between a generated sample and all the dataset samples.
The training loss in Fig.
Variables used in the dataset.
There are no stopping criteria for the training, and it was stopped after 35 000 iterations in the interest of computational cost. It should be highlighted that the performance of generative networks and especially GANs is difficult to evaluate. In the deep-learning literature, the quality of the images generated is assessed using a reference image dataset such as ImageNet
Because our study does not apply to the ImageNet dataset, it is necessary to compute our own metrics. Section
The results will be analyzed in terms of visual aspect, capacity to generate atmospheric balances and statistics of the generated samples compared to the climate distribution. For the latter, the chosen metric is the Wasserstein distance. Because it is the same metric the generator has to minimize during the training step, it seems a good candidate to assess the training quality. One could argue that the network is overly trained on this metric; that is why we use other metrics such as mean and standard deviation differences and singular value decomposition to complete our analysis. Finally, because no trivial stopping criterion is available, it is interesting to see where the Wasserstein distance is large so as to diagnose limitations of the trained generator and suggest possible improvements.
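As an illustration of the simplest of these diagnostics, the mean and standard deviation differences can be computed pixel by pixel. This is a minimal sketch on synthetic arrays; the sign convention (generated minus dataset), the array shapes and the name `mean_std_errors` are assumptions made for the example.

```python
import numpy as np

def mean_std_errors(dataset, generated):
    """Per-pixel differences of mean and standard deviation.

    Both inputs have shape (n_samples, nlat, nlon); statistics are taken
    over the sample axis.
    """
    mean_err = generated.mean(axis=0) - dataset.mean(axis=0)
    std_err = generated.std(axis=0) - dataset.std(axis=0)
    return mean_err, std_err

rng = np.random.default_rng(0)
# Toy stand-ins for normalized dataset samples and generated samples.
dataset = rng.normal(0.0, 1.0, size=(2000, 8, 16))
generated = rng.normal(0.1, 1.2, size=(2000, 8, 16))

mean_err, std_err = mean_std_errors(dataset, generated)
print(mean_err.shape, mean_err.mean(), std_err.mean())
```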
To create synthetic data, a climate model known as PLASIM
A 100-year daily simulation was run at a T42 resolution (an approximate resolution of 2.8
Each database sample is an 82-channel (nfield) two-dimensional matrix of size 64 (nlat) by 128 (nlon) pixels. The channels represent seven physical three-dimensional variables: the temperature (ta), the eastward (ua) and northward (va) wind, relative humidity (hus), vertical velocity (wap), the relative vorticity (
Our study aims to have a generator able to reproduce the climate distribution present in the dataset made from the low-resolution GCM PLASIM. This section proposes a way to assess the quality of the distribution learned by the WGAN.
The first required property for a weather generator is a low computational cost compared to the GCM that produced the data. Here the simulation with the GCM PLASIM took 50 min for a 100-year simulation in parallel on 16 processors, whereas the generator took 3 min to generate 36 500 samples, equivalent to a 100-year simulation, on an NVIDIA Tesla V100.
Each generated sample is compared with dataset samples. Figures
Sample on three different pressure levels (1000, 500 and 100
Sample on three different pressure levels (1000, 500 and 100
Mean error over 30 years of the normalized dataset and the same number of normalized generated samples on three different pressure levels (1000, 500 and 100
Standard deviation error over 30 years of the normalized dataset and the same number of normalized generated samples on three different pressure levels (1000, 500 and 100
In order to quantitatively assess the generator quality, Figs.
Scalar product of SVD components derived from a dataset and generated data.
Spatial components corresponding to principal components of SVDs applied to the dataset and the generated samples.
In order to go further in the analysis of the generated climate states, a singular value decomposition (SVD) was performed over 30 years of the dataset (renormalized over the 30 years). Then the same number of generated samples was considered and projected onto the first five principal components of the SVD, which represent 75 % of the explained variance of the dataset. In Fig.
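The projection step described above can be sketched as follows. This is an illustration on synthetic data rather than on the PLASIM fields: the toy dataset is built from five dominant modes, and the function name `leading_components` as well as all array sizes are hypothetical.

```python
import numpy as np

def leading_components(data, n_components):
    """Leading right singular vectors and explained-variance fractions.

    data has shape (n_samples, n_features) and is centred internally.
    """
    centred = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
n_samples, n_features = 500, 40
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.5])

# Five orthonormal spatial modes (rows), used to build toy samples.
modes = np.linalg.qr(rng.normal(size=(n_features, 5)))[0].T
dataset = (rng.normal(size=(n_samples, 5)) * scales) @ modes
dataset += 0.1 * rng.normal(size=(n_samples, n_features))
generated = (rng.normal(size=(n_samples, 5)) * scales) @ modes

# Components are computed on the dataset only; both sets are projected on them.
components, explained = leading_components(dataset, 5)
data_pcs = (dataset - dataset.mean(axis=0)) @ components.T
gen_pcs = (generated - dataset.mean(axis=0)) @ components.T
print(explained.sum(), data_pcs.std(axis=0), gen_pcs.std(axis=0))
```

Comparing the distributions of `data_pcs` and `gen_pcs` component by component then gives a low-dimensional view of how well the generated samples follow the dominant modes of variability.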
Location from where the temperature distributions are plotted in Fig.
Temperature distribution at different locations for 5000 samples from dataset (green) and generated (blue).
Figure
Wasserstein distance between 5000 dataset samples and 5000 generated samples on each pixel and each channel.
Distributions with the higher
Wasserstein distance between 5000 dataset samples and 5000 generated samples on each pixel grouped by pressure height
It follows that a good way to see the general statistics learned by the generator is to plot the Wasserstein distance for every pixel and for every variable. This result can be visualized spatially in Fig.
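A per-pixel Wasserstein map of this kind can be computed in a vectorized way when both sets contain the same number of samples. The sketch below assumes the one-dimensional 1-Wasserstein distance estimated from sorted samples; the name `pixelwise_wasserstein` and the toy arrays are hypothetical.

```python
import numpy as np

def pixelwise_wasserstein(dataset, generated):
    """1-Wasserstein distance between the two sample sets at every pixel.

    Inputs have shape (n_samples, ...) with the same n_samples; the
    distance is computed independently for each trailing index.
    """
    a = np.sort(dataset, axis=0)
    b = np.sort(generated, axis=0)
    return np.mean(np.abs(a - b), axis=0)

rng = np.random.default_rng(0)
dataset = rng.normal(size=(5000, 4, 8))
generated = rng.normal(size=(5000, 4, 8))
generated[:, 2, 3] += 1.0  # one deliberately biased pixel

w_map = pixelwise_wasserstein(dataset, generated)
print(w_map.shape, w_map[2, 3], np.delete(w_map.ravel(), 2 * 8 + 3).max())
```

The biased pixel stands out in the resulting map, which is the kind of localized discrepancy such a diagnostic is meant to reveal.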
The previous subsection has shown the ability of the generator to produce weather situations and a climate similar to those of the simulated weather. However, geophysical fluids are characterized by multivariate fields that present known balance relations. Among these balances, the simplest ones are the geostrophic and thermal wind balances (see, e.g.,
Geostrophic and ageostrophic wind derived from geopotential at 500
Relative error in the norm between geostrophic wind and normal wind shown in Fig.
The geostrophic balance occurs at a low Rossby number when the rotation dominates the nonlinear advection term. Two forces are in competition: the Coriolis force,
This asymptotic balance Eq. (
Figure
A similar behavior can be observed in Fig.
We find that weather situations generated from samples in the latent space reproduce the geostrophic balance at an order of approximation that is similar to the one of the real dataset. This means that the generator is able to produce the realistic multivariate link between the wind and the geopotential. This property is essential in operational weather forecasting, e.g., in producing balanced fields in the ensemble Kalman filter.
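For reference, a geostrophic wind can be diagnosed from a geopotential field with a few lines of code. The sketch below uses the standard textbook relations u_g = -(1/f) ∂Φ/∂y and v_g = (1/f) ∂Φ/∂x with f = 2Ω sin φ on a regular latitude–longitude grid; it is not the diagnostic code of this study, the function name is hypothetical, the finite differences are not periodic in longitude, and the relation is not usable near the Equator where f vanishes.

```python
import numpy as np

OMEGA = 7.292e-5   # Earth's rotation rate (rad s-1)
RADIUS = 6.371e6   # Earth's mean radius (m)

def geostrophic_wind(phi, lat_deg, lon_deg):
    """Geostrophic wind (u_g, v_g) from a geopotential field (nlat, nlon)."""
    lat = np.deg2rad(lat_deg)
    lon = np.deg2rad(lon_deg)
    f = 2.0 * OMEGA * np.sin(lat)[:, None]      # Coriolis parameter
    dphi_dlat = np.gradient(phi, lat, axis=0)   # derivative w.r.t. latitude
    dphi_dlon = np.gradient(phi, lon, axis=1)   # derivative w.r.t. longitude
    u_g = -dphi_dlat / (f * RADIUS)
    v_g = dphi_dlon / (f * RADIUS * np.cos(lat)[:, None])
    return u_g, v_g

# Analytical check: phi = A sin^2(lat) gives a purely zonal flow
# u_g = U cos(lat) when A = -U * OMEGA * RADIUS.
lat_deg = np.linspace(20.0, 70.0, 51)
lon_deg = np.arange(128) * 2.8125
U = 10.0
A = -U * OMEGA * RADIUS
phi = A * np.sin(np.deg2rad(lat_deg))[:, None] ** 2 * np.ones((1, lon_deg.size))

u_g, v_g = geostrophic_wind(phi, lat_deg, lon_deg)
print(u_g[25, 0], U * np.cos(np.deg2rad(lat_deg[25])))
```

Comparing such a diagnosed wind with the wind channels of a sample gives the relative error used to judge how well the balance is respected.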
Temperature (
Thermal wind balance from the boreal winter situation shown in Fig.
Temperature (
The thermal wind balance arises by combining the geostrophic wind Eq. (
Figure
The same illustrations are shown in Fig.
This section has shown the ability of the generator to reproduce some important balances present in the atmosphere. In particular, the generator is able to produce mid-latitude cyclones whose velocity field is in accordance with the geostrophic balance. The authors emphasize that further analysis of the weather situations produced by the generator is necessary but beyond the scope of this study. For example, it would be interesting to assess whether other inter-variable balances are present, such as the
An exploratory study was carried out on the properties of the latent space and their consequences in the climate space with regard to problems of the climate domain. If the generator is perfectly trained, then each sample generated with it should represent a typical weather situation. It is hard to characterize the attractor of the climate directly. However, since the geometry of a Gaussian distribution in high dimension is known, it is easy to characterize the climate in the latent space.
For a normal law in the high dimension space
Considering these properties, one can introduce a two-dimensional pseudo-representation that preserves the isotropy of the distribution as well as the distribution of the distance to the origin: a random sample vector
Pseudo-spherical metaphorical representation of 10 000 samples of the normal distribution in
Figure
This suggests evaluating whether the extremes of the latent space correspond to those of the meteorological space.
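The geometry of the latent distribution discussed above can be checked numerically. The sketch below draws 10 000 samples from a 64-dimensional standard normal distribution and shows that their norms concentrate in a thin shell of radius close to √64 = 8. The two-dimensional points are built by keeping the norm and drawing a uniform random angle, which is only one assumed way of obtaining such a pseudo-representation.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 64
z = rng.standard_normal(size=(10_000, latent_dim))

# Distance of each latent sample to the origin.
norms = np.linalg.norm(z, axis=1)

# Assumed 2-D pseudo-representation: same radius, uniformly random angle,
# so that isotropy and the distribution of radii are both kept.
angles = rng.uniform(0.0, 2.0 * np.pi, size=norms.shape)
xy = np.column_stack((norms * np.cos(angles), norms * np.sin(angles)))

print(norms.mean(), norms.std(), np.sqrt(latent_dim))
```

Almost no sample lies near the origin or far outside the shell, which is why moving along the radial direction is a natural way to look for unusual states.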
Knowing what the extremes are in the latent space might help to determine what the extremes of the climate are, or at least what the extreme situations close to a given state are.
For any sample in the latent space, say point
Generations obtained by radial interpolation in the latent space. Panel
Figure
The most likely typical state
A link to an animation of such an interpolation is available on GitHub
Even if there are no dynamics in the latent space, which makes it impossible to construct a prediction from this space, we can consider how to interpolate two latent states. A naive answer is to compute the linear interpolation between two samples of the latent space
So as to preserve the likelihood of the interpolated weather situations, it is better to introduce a spherical interpolation. This kind of interpolation has also been used in image processing, where, e.g.,
Spherical interpolation snapshots. Respectively, panels
Snapshots of the linear interpolation in the latent space. Respectively, panels
Linear interpolation in the image space. Respectively, panels
This interpolation will connect point
The objective of this experiment is to be able to produce realistic intermediate states. This is visible in Fig.
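The difference between the two latent interpolations can be made concrete with a short sketch. It uses the usual spherical interpolation formula, which may differ in detail from the implementation used for the figures; the function names `lerp` and `slerp` are hypothetical.

```python
import numpy as np

def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return (1.0 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical interpolation along the great arc between z0 and z1."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z0, z1, t)  # nearly parallel vectors: fall back to lerp
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z0 = rng.standard_normal(64)
z1 = rng.standard_normal(64)

for t in np.linspace(0.0, 1.0, 5):
    print(f"t={t:.2f}  |lerp|={np.linalg.norm(lerp(z0, z1, t)):.2f}"
          f"  |slerp|={np.linalg.norm(slerp(z0, z1, t)):.2f}")
```

The linear path passes closer to the origin midway, that is, through a region where the latent Gaussian has very little mass, whereas the spherical path keeps a norm comparable to that of the end points.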
In this section, the goal is to study the difference between two climate states coming from close latent points. In this experiment, sample
Geopotential height: the first column reference corresponds to
Figure
Our study shows that it is possible to map the climate distribution output of a GCM to a much simpler low-dimensional distribution using a highly nonlinear neural-network-based generator. It also proposes ways to assess the quality of the generator by evaluating statistical quantities as well as with respect to physical balance properties.
In this article, a weather generator based on the WGAN method and able to produce realistic states of the atmosphere was created. Metrics such as SVD principal component comparison, the Wasserstein distance on pixel value distributions, and mean and standard deviation comparisons were used so that future methods can be compared against this one.
A comparison of the atmospheric balance was carried out between samples and averaged over 30 years of data, showing promising results. Coherence between variables as well as spatial coherence were also shown to be promising.
Interesting properties of such a generator were discussed with regard to possible applications in insurance, weather simulation and data assimilation. The generator is able to generate intermediate realistic climate states with coherent structures, interpolate between two defined states with other plausible states, and create realistic perturbations around a climate state, all at a low computational cost compared to a GCM.
A study was also done on the interpretability of the latent space and the connections between the extreme events in the data space and the latent space. It highlighted the radial direction as the direction of the intensity of climate events.
Our results highlight the ability of the method to handle the mapping of a high-dimensional distribution onto a multivariate Gaussian. We believe this is an important step that opens many opportunities for climate data exploration. Some extensions of this work as well as potential applications are highlighted in the following.
First, the WGAN could be conditioned by the season or by the day in the year. Such conditioning would give access to other quantitative methods to assess the quality of the weather generator. It would also be an important step towards application in the risk assessment area, for example.
Optimization can be done to find specific states in the latent space by defining an objective function such as the Euclidean distance in the climate space. Since the gradient of the network with respect to its inputs is accessible, direct minimization can be used to find climate states that fit observations in data assimilation problems. More advanced strategies, such as training a separate inference network
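The direct minimization mentioned above can be sketched on a toy problem. In the example below the trained generator is replaced by a hypothetical linear map G(z) = Wz + b so that the gradient can be written by hand; with the real network the gradient with respect to the latent input would be obtained by automatic differentiation, and convergence would not be guaranteed in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_obs = 64, 500

# Hypothetical stand-in for the generator: a fixed linear map.
W = rng.normal(size=(n_obs, latent_dim))
b = rng.normal(size=n_obs)

def generator(z):
    return W @ z + b

# Synthetic "observation" produced from a known latent vector.
z_true = rng.standard_normal(latent_dim)
y = generator(z_true)

def objective(z):
    """Squared Euclidean distance in the output (climate) space."""
    return float(np.sum((generator(z) - y) ** 2))

def gradient(z):
    """Gradient of the objective with respect to the latent vector."""
    return 2.0 * W.T @ (generator(z) - y)

# Plain gradient descent with a step derived from the largest singular value.
step = 1.0 / (2.0 * np.linalg.norm(W, ord=2) ** 2)
z_hat = np.zeros(latent_dim)
for _ in range(500):
    z_hat = z_hat - step * gradient(z_hat)

print(objective(np.zeros(latent_dim)), objective(z_hat))
```

In a data assimilation setting the objective would compare only the observed part of the state, possibly with an additional term keeping the latent vector in the high-probability shell of the Gaussian.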
A more sophisticated dataset could be used, such as a true climate reanalysis, to see the effect of the dataset complexity on the method's performance. The optimization of the network's architecture and a sensitivity study on the hyperparameters such as the dimension of the latent space, for example, would be useful. Moreover, it would be interesting to see whether it is possible to take advantage of the GAN trained in PLASIM to facilitate the training of a GAN on the reanalysis.
The structure of the latent space and its interpretability are also key to exploiting the specificities of the method. The ability to find similar climate states with extreme events is also something that other weather generators do not offer, and it could have many applications in risk assessment.
Defining additional metrics to assess the quality of the generator should be the main focus following this study, in order to identify improvements to the method and to facilitate participation from diverse research communities.
Finally, we could consider restarting the GCM from a generated state to assess how well balanced the generated fields are, which could have important implications in data assimilation methods.
The study is a first step towards deep-learning weather generation; while many challenges remain to be solved, it shows several potential applications of GANs to improve the effectiveness of current approaches.
The code and the weights of the trained neural network are available at the following GitHub repository in v0.1:
The dataset used is available on demand. The GitHub repository explains how to recreate it from a PLASIM simulation.
The authors contributed to the design of the neural network architecture and the experiments. CB implemented the neural network architecture, performed the PLASIM simulation and trained the WGAN. The analysis of the results was carried out by the authors.
The authors declare that they have no conflict of interest.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research paper was written during a thesis in partnership with Total. We would like to thank Philippe Berthet, Anahita Abadpour, Daniel Busby and Tatiana Chugunova for their support in the application of our method in different fields of the geosciences. We would like to thank Rabeb Selmi for her help and for sharing her expertise.
This paper was edited by Takemasa Miyoshi and reviewed by two anonymous referees.