We pursue a simplified stochastic representation of smaller scale convective activity conditioned on large-scale dynamics in the atmosphere. For identifying a Bayesian model describing the relation of different scales we use a probabilistic approach by

Complex dynamical processes involving scaling cascades are omnipresent in the natural sciences. Such processes feature different characteristic scales. The smallest and largest scales are far apart, and scale interactions span much of the range in between. Dynamics in the atmosphere take place across a large range of timescales and length scales, from microseconds to months and lengths from

A new perspective for weather and climate models came from stochastic parametrizations that represent the small-scale effects of convection on large-scale dynamics (see

The paper is structured as follows: in Sect.

Our aim is to study and understand a stochastic relation between two variables

We are interested in modeling the probabilistic relationship of two potentially random quantities,

Typically, the transition matrix

Since we only have a finite number of observations at hand, it is essential to be aware of the uncertainty of the statistical estimate (

In numerous situations the apparent complexity of our observations is an artifact of our measurement procedure, and there are low-dimensional features that govern the process at hand. Thus, even if we were able to find a full matrix

The following approach, proposed by

Introduction of intermediate latent states in DBMR for efficient and scalable estimation of

The task is now to determine the pair of column-stochastic matrices

The obtained models are also less subject to overfitting issues and are more advantageous in terms of the model quality measures

Let us emphasize that, in addition to the computational advantages that allow DBMR to work with large data sets, its conceptual strength is that it combines model estimation and model reduction in one step. The latent states often have a physical meaning – a property that we shall focus on in our application.
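The combination of estimation and reduction in one step can be illustrated with a minimal sketch. It assumes a count matrix of joint observations (output category by input category), hard 0/1 affiliations of input categories to latent states, and a simple alternating update. The function names `output_given_latent` and `reduce_model`, the initial assignment, and the toy counts are hypothetical and chosen for illustration only; this is not the implementation of the toolkit used in this paper.

```python
import numpy as np

def output_given_latent(counts, assign, n_latent):
    """Column k is the empirical output distribution of latent state k."""
    n_out = counts.shape[0]
    agg = np.stack(
        [counts[:, assign == k].sum(axis=1) for k in range(n_latent)], axis=1
    )
    col = agg.sum(axis=0, keepdims=True)
    # Latent states without any data fall back to a uniform distribution.
    return np.where(col > 0, agg / np.maximum(col, 1e-300), 1.0 / n_out)

def reduce_model(counts, init_assign, n_latent, n_iter=100):
    """Alternate between estimating the output distributions and
    reassigning each input category to its best latent state."""
    counts = np.asarray(counts, dtype=float)
    assign = np.asarray(init_assign)
    for _ in range(n_iter):
        lam = output_given_latent(counts, assign, n_latent)
        # Log-likelihood contribution of input j if assigned to state k.
        score = counts.T @ np.log(lam + 1e-12)
        new_assign = score.argmax(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    lam = output_given_latent(counts, assign, n_latent)
    n_in = counts.shape[1]
    gamma = np.zeros((n_latent, n_in))  # 0/1 affiliation matrix
    gamma[assign, np.arange(n_in)] = 1.0
    loglik = float(np.sum(counts * np.log(lam @ gamma + 1e-12)))
    return lam, gamma, loglik

# Toy data (hypothetical): 2 output categories, 4 input categories.
counts = np.array([[90, 80, 10, 20],
                   [10, 20, 90, 80]])
lam, gamma, loglik = reduce_model(counts, init_assign=[0, 0, 0, 1], n_latent=2)
```

In this toy case the four input categories collapse onto two latent states, and both factors are column-stochastic by construction. The result of such an alternating scheme depends on the initial assignment, so in practice several initializations would be compared by their log-likelihood.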
All computations in Sect. 4 have been conducted with the DBMR implementation; see

To apply DBMR, the input and output processes have to be quantized into categories. First, we discuss the choice of meteorological variables and scales in view of the categorical processes. As input we use a variable related to large-scale atmospheric flow, convective available potential energy (CAPE), a measure of the energy an air parcel would gain if lifted to a specific height in the atmosphere.
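As a rough illustration of such a quantization, the following sketch maps a continuous input variable to integer categories with fixed bin edges. The sample values and the bin edges are hypothetical placeholders and are not the CAPE categories used in this study.

```python
import numpy as np

def quantize(values, edges):
    """Map continuous values to categories 0..len(edges).

    A value x falls into category i if edges[i-1] <= x < edges[i].
    """
    return np.digitize(values, edges)

# Hypothetical CAPE-like samples and hypothetical bin edges.
cape = np.array([0.0, 35.0, 120.0, 480.0, 1500.0])
edges = np.array([50.0, 200.0, 1000.0])
categories = quantize(cape, edges)
```

Three edges yield four categories; the resulting integer sequence is the kind of categorical input process that a categorical model operates on.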

CAPE can be seen as a measure for atmospheric stability, first suggested by

For our studies, the COSMO-REA6 reanalysis data set is used (see

REA6 domain that covers Germany consisting of grid boxes 1 to 4; grid box 1 is applied on the large scale for DBMR and is of approximately 500 km

We used the term “domain” for the total region we considered in Fig.

Number of grid boxes

According to the meteorological data described in Sect.

We consider the range of values for CAPE (

We map vertical velocities

updraft for

no draft for

downdraft for
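The three-way mapping above can be sketched as follows, assuming a symmetric interval with a single positive limit and strict inequalities at the interval boundaries. The limit value, the sample velocities, and the function names `classify_drafts` and `count_drafts` are hypothetical; the actual interval limits are those given in the text.

```python
import numpy as np

def classify_drafts(w, limit):
    """Return +1 (updraft), 0 (no draft), or -1 (downdraft) per value."""
    w = np.asarray(w, dtype=float)
    return np.where(w > limit, 1, np.where(w < -limit, -1, 0))

def count_drafts(w_boxes, limit):
    """Count up- and downdrafts over the small-scale grid boxes (last axis)."""
    labels = classify_drafts(w_boxes, limit)
    n_up = (labels == 1).sum(axis=-1)
    n_down = (labels == -1).sum(axis=-1)
    return n_up, n_down

# Two hypothetical samples with four small-scale grid boxes each.
w = np.array([[0.03, -0.01, 0.0, -0.05],
              [0.002, 0.004, -0.001, 0.02]])
n_up, n_down = count_drafts(w, limit=0.01)
```

The pair of counts per sample is one possible encoding of the categorical output; the number of neutral grid boxes follows as the total minus the two counts.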

The 12 h mean data for day and night serve as a basis for determining the interval for vertical draft which was chosen symmetrically with interval limits

Histogram of spatial mean vertical velocities for day and night for a resolution of 125 km (64 grid boxes on a small scale) is presented. The red vertical lines show the interval for vertical velocity which was selected on the basis of the 75th percentile of the data set. The sample size of the 12 h mean data set for day and night sums up to

The model reduction is a consequence of using the affiliation matrix

Exact log-likelihood value as in Eq. (

In Fig.

We discuss probability distributions conditioned on the resulting latent states introduced in Sect.

Histograms show probabilities of numbers of updrafts (

The missing number of neutral grid points

Note that the number of possible output categories

Histograms show probabilities of numbers of updrafts (

In Fig.

The results for three latent states are considered in Appendix

In Sect.

To analyze the relation of large-scale dynamics in the atmosphere to smaller scale categorical processes, the COSMO-REA6 reanalysis data set was applied (see

In Fig.

Joint probability distribution of the number of grid points with positive and negative vertical velocity conditioned on the resulting latent states is shown in Figs.

It is important to identify stochastic models using categorical approaches, in contrast to the description of fluid mechanics by continuous partial differential equations. In this study, we apply a recent algorithmic framework called Direct Bayesian Model Reduction (DBMR), which provides a scalable, probability-preserving identification of reduced models directly from data (see

The step from the fluid continuum described by partial differential equations to a categorical stochastic description with DBMR provides a reduced model defined on a small set of latent variables. These are interpreted as reduced states for the large-scale atmospheric dynamics with respect to their probabilistic impact on vertical motion. For two latent states the input is separated into categories with high and low CAPE values, whereas for three latent states, the input is affiliated with categories of high, medium, and low CAPE values. The output categories for the vertical velocity describe the number of up- and downdrafts. As a result, we obtain conditional distributions for the numbers of up- and downdrafts conditioned on the latent states for day and night. In the application we found a probabilistic relation between CAPE and vertical up- and downdrafts.
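The conditional distributions described here can be estimated by relative frequencies once every sample carries a latent-state label and a pair of up- and downdraft counts. The following is a minimal sketch under that assumption; the function name `conditional_joint` and the toy data are hypothetical and do not reproduce the results of this study.

```python
import numpy as np

def conditional_joint(latent, n_up, n_down, n_latent, n_boxes):
    """Estimate P(n_up, n_down | latent state) by relative frequencies."""
    latent = np.asarray(latent)
    n_up = np.asarray(n_up)
    n_down = np.asarray(n_down)
    hist = np.zeros((n_latent, n_boxes + 1, n_boxes + 1))
    # Accumulate one count per sample, including repeated index triples.
    np.add.at(hist, (latent, n_up, n_down), 1.0)
    totals = hist.sum(axis=(1, 2), keepdims=True)
    # Latent states without samples keep an all-zero table.
    return np.divide(hist, totals, out=np.zeros_like(hist), where=totals > 0)

# Hypothetical samples: latent label and draft counts for 4 grid boxes.
latent = [0, 0, 1, 1, 1]
n_up = [3, 3, 0, 1, 0]
n_down = [0, 1, 2, 2, 2]
prob = conditional_joint(latent, n_up, n_down, n_latent=2, n_boxes=4)
```

Each slice `prob[k]` is a joint histogram over the numbers of up- and downdrafts for latent state `k` and sums to one whenever that state has at least one sample.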

For a resolution of 125 km we applied a

On the smaller scale, with a resolution of 15 km, we applied a

The model reduction of smaller scale convective activity is part of a development process for a model with a stochastic component for a conceptual description of convection embedded in a deterministic atmospheric flow model. Various energetic variables are applicable on the large scale. A potential driver to control small-scale models is the Dynamic State Index (DSI) in

MATLAB code for the Bayesian-Model-Reduction-Toolkit is available at

Research data are archived at Refubium – Freie Universität Berlin repository (

All authors designed the research, discussed the results, and wrote the manuscript. AM prepared the meteorological data. RP conducted the computations. AM, HR, and PN made major contributions to the meteorological discussion, and PK contributed to the methodological development.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We are thankful for the helpful suggestions provided by the anonymous reviewers.

This research has been supported by the Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 “Scaling Cascades in Complex Systems”, project number 235221301, project A01 “Coupling a multiscale stochastic precipitation model to large-scale atmospheric flow dynamics”. We acknowledge support from the Open Access Publication Initiative of Freie Universität Berlin.

This paper was edited by Jürgen Kurths and reviewed by two anonymous referees.