Representation learning with unconditional denoising diffusion models for dynamical systems

Finn, Tobias Sebastian; Disson, Lucas; Farchi, Alban; Bocquet, Marc; Durand, Charlotte

doi:https://doi.org/10.5194/npg-31-409-2024

Articles | Volume 31, issue 3

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | Highlight paper | 19 Sep 2024

Final revised paper (published on 19 Sep 2024)
Preprint (discussion started on 20 Oct 2023)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2261', Sibo Cheng, 12 Mar 2024
This research paper presents a study on using denoising diffusion models for data-driven representation learning of dynamical systems. The research demonstrates the utility of such networks with the Lorenz 63 system, showing that the trained network can produce samples almost indistinguishable from those on the attractor, indicating the network has learned an internal representation of the system. This representation is then used for surrogate modeling and generating ensembles out of a deterministic run.
Overall, I found this paper very well written, and introducing diffusion models to dynamical systems in geoscience is a novel and clear contribution. Below are my comments to be addressed before I can recommend acceptance of this manuscript:
Comments:
1. If I understand correctly, the objective of this study is to explore the possibility of using diffusion models for high-dimensional systems in geoscience. The numerical experiments are carried out using the three-dimensional Lorenz model. To enhance the discussion, it would be beneficial if the authors could explain how well their approach generalizes to high-dimensional spatio-temporal systems (e.g. by adding CNN or transformer layers for feature extraction (encoding) and decoding).
2. As a consequence of the small dimension, the ‘latent space’ in your diffusion model (256) is much larger than the physical space (3). Therefore, you have little risk of losing any information when using the denoising network for surrogate modelling. The authors may consider adding to Fig. 7 a baseline of transfer learning from an untrained (randomly initialized) denoising NN. The authors have shown the results of an untrained NN in Tab. 3, but only with linear fine-tuning. What happens if you fine-tune an untrained denoising NN with a non-linear NN?

Minor questions:
In Fig. 7, it seems that the dense neural network with two layers trained from scratch outperforms your transfer learning from the diffusion model. Is that the case? In fact, the results in Tab. 3 also show that the models trained from scratch (dense *3 and resnet) perform similarly to the fine-tuning from your diffusion model. The authors may want to comment on this.

Page 3, ‘generative training is rarely used for pre-training and representation learning of high-dimensional systems’. There are some works that have tried to use diffusion models for contrastive models, e.g.,

- Yang, X. and Wang, X., 2023. Diffusion model as representation learner. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 18938-18949).
- Mittal, S., Abstreiter, K., Bauer, S., Schölkopf, B. and Mehrjou, A., 2023, July. Diffusion based representation learning. In International Conference on Machine Learning (pp. 24963-24982). PMLR.
The authors may want to include some of these references and discuss the differences/similarities compared to the method used in this paper. This paper is probably the first one to propose diffusion-based representation learning in dynamical systems(?)
3. Page 9, ‘show that this representation is entangled’: why is it important for the learned features to be entangled?
4. Page 11, check the sentence ‘As we will see later, the bigger the Because of the state-dependency, the resulting distribution is implicitly represented by the ensemble and could extend beyond a Gaussian assumption’
5. Page 13, it seems that you have used a lot of training samples (1.6*E7) for your diffusion model of the three-dimensional Lorenz system. I was wondering whether a standard surrogate model would require that many. That is to say, a standard surrogate model might outperform the diffusion-based one with less training data. I am curious to hear the authors’ thoughts.
6. Fig. 5(a) and 1(b): if I understand correctly, the x-axis is the pseudo-time instead of the real time in the dynamical system. If that is the case, it would be beneficial to add an x-axis label to avoid any confusion.
Citation: https://doi.org/10.5194/egusphere-2023-2261-RC1
- AC1: 'Reply on RC1', Tobias Finn, 24 May 2024
  
  Thank you very much for the constructive feedback and the comments on how to improve our manuscript. In the attached file, we discuss and describe our plan to address all your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2261-AC1
RC2:
'Comment on egusphere-2023-2261', Anonymous Referee #2, 03 Apr 2024

This is a very interesting and novel study on the use of denoising diffusion models for representation learning. The manuscript is well written and describes very nicely the context and how these approaches (rooted in image applications) can be adapted to geosciences. It illustrates two distinct relevant applications, surrogate modelling and ensemble generation, that are both extremely important in high-dimensional settings.
I think the manuscript can be accepted almost as it is, but I have a few minor comments I would encourage the Authors to look at.
1) While there is little room for doubt, I would strongly suggest that the Authors specify that their approach applies to ergodic chaotic dynamics for which an invariant distribution exists that describes the state distribution on the system's attractor. An obvious counterexample would be a stable system having an equilibrium point (or a limit cycle) as attractor.
2) When mentioning the Schrödinger Bridge (page 2), you may want to refer to Reich S. 2019 (doi:10.1017/S0962492919000011) as an exemplar study of the same analogy but in the area of data assimilation.
3) Line 27. "..dynamical systemS ..."
4) In the caption of Fig1b, use (left/right) to point the reader.
5) Line 44. I think you should always order references chronologically.
6) Lines 53--59. I understand and like the Authors' narrative and choice of references. Nevertheless, particularly for the readers of NPG, it would be appropriate to also mention the large body of work on the generation of ensemble members based on dynamical systems theory and data assimilation. A good recent reference is 10.1029/2021MS002828
7) I am a bit uncomfortable with the use of the term "latent". On the one hand, I agree with a comment from the other Reviewer. On the other hand, I also see in line 100 that you state z=x, which makes one deduce that the latent and actual states have the same dimension. Finally, while it is true that latent variables are defined by their indirect (often hidden) relation to the observable quantities, with no reference to their number (or space dimension), in many practical applications the latent space is assumed/defined/used as being of smaller dimension.
8) Line 115. I would add ".... prior distribution FOR THE DENOISING PROCESS."
9) Equation (8). Wouldn't it be better to (re)state clearly that we do not have access to x in practice?
10) Line 145. Is that because they do not depend on x?
11) Line 153. I think "Equation" must be written at the beginning of the sentence.
12) Line 176. Instead of "normally" I would suggest "most of the time".

Citation: https://doi.org/10.5194/egusphere-2023-2261-RC2
- AC2: 'Reply on RC2', Tobias Finn, 24 May 2024
  
  Thank you very much for the constructive feedback and the comments on how to improve our manuscript. In the attached file, we discuss and describe our plan to address all your comments.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2261-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Tobias Finn on behalf of the Authors (20 Jun 2024): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (12 Jul 2024) by Ioulia Tchiguirinskaia

AR by Tobias Finn on behalf of the Authors (16 Jul 2024): Manuscript

Short summary

We train neural networks as denoising diffusion models for state generation in the Lorenz 1963 system and demonstrate that they learn an internal representation of the system. We make use of this learned representation and the pre-trained model in two downstream tasks: surrogate modelling and ensemble generation. For both tasks, the diffusion model can outperform other more common approaches. Thus, we see a potential of representation learning with diffusion models for dynamical systems.
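
For readers unfamiliar with the setup, the following is a minimal sketch of how training data for such a model could be produced. It is not the authors' code. It assumes the standard Lorenz 1963 parameters (sigma = 10, rho = 28, beta = 8/3), a fourth-order Runge-Kutta integrator, and a generic variance-preserving noising step; the function names, the time step, the spin-up length, and the noise parametrization are all illustrative assumptions, and in practice the states would typically be normalized before noising.

```python
import numpy as np


def lorenz63_tendency(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 1963 equations (standard parameters)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz63_tendency(state)
    k2 = lorenz63_tendency(state + 0.5 * dt * k1)
    k3 = lorenz63_tendency(state + 0.5 * dt * k2)
    k4 = lorenz63_tendency(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def generate_trajectory(n_steps, dt=0.01, spinup=1000, seed=0):
    """Integrate from a random start, discarding a spin-up period.

    dt, spinup, and seed are assumed values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    state = rng.normal(size=3)
    for _ in range(spinup):
        state = rk4_step(state, dt)
    trajectory = np.empty((n_steps, 3))
    for i in range(n_steps):
        state = rk4_step(state, dt)
        trajectory[i] = state
    return trajectory


def add_noise(states, signal_level, rng):
    """Assumed variance-preserving noising: sqrt(a) * x + sqrt(1 - a) * eps.

    signal_level (a) lies in [0, 1]; a = 1 keeps the clean state and
    a = 0 returns pure Gaussian noise.
    """
    noise = rng.normal(size=states.shape)
    noised = np.sqrt(signal_level) * states + np.sqrt(1.0 - signal_level) * noise
    return noised, noise
```

A denoising network would then be trained to recover the noise (or the clean state) from the noised states at randomly drawn signal levels, for example from `add_noise(generate_trajectory(1000), 0.5, np.random.default_rng(1))`.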