Direct Bayesian model reduction of smaller scale convective activity conditioned on large scale dynamics

We pursue a simplified stochastic representation of smaller scale convective activity conditioned on large scale dynamics in the atmosphere. To identify a Bayesian model describing the relation of the different scales we use a probabilistic approach (Gerber and Horenko, 2017) called Direct Bayesian Model Reduction (DBMR). The convective available potential energy (CAPE) is applied as large scale flow variable, combined with a subgrid smaller scale time series for the vertical velocity. We found a probabilistic relation of CAPE and vertical up- and downdraft for day and night. The categorization is based on the conservation of total probability. This strategy is part of a development process for parametrizations in models of atmospheric dynamics representing the effective influence of unresolved vertical motion on the large scale flows. The direct probabilistic approach provides a basis for further research of smaller scale convective activity conditioned on other possible large scale drivers.


Stochastic model
We are interested in modeling the probabilistic relationship of two potentially random quantities, X and Y. For us, it will only be relevant that Y is a random function of X; randomness of X itself is irrelevant. Since the observations typically arise as time series, we consider X and Y as processes X(t) and Y(t) with time t; however, t can denote any parameter ordering the realizations of the process. We will consider the case where X and Y can only attain a finite number of values, such that we call the processes discrete-state or categorical. Say, Y(t) takes one of the possible values from the m categories {y_1, y_2, ..., y_m} and X(t) from the n categories {x_1, x_2, ..., x_n}. The central quantity of interest describing the relationship of X and Y is the m × n matrix of conditional probabilities, also called transition matrix,

Λ_ij = P[Y(t) = y_i | X(t) = x_j],  i = 1, ..., m,  j = 1, ..., n.  (1)
Note that Λ is a column-stochastic matrix, i.e., its entries are nonnegative and each of its columns sums to one. In practical studies, when the Λ_ij are estimated from the available observations of X and Y, one needs to guarantee that the data is acceptably randomised (Holland, 1986). We will assume that

P[Y(t) = y_i | X(1), X(2), ...] = P[Y(t) = y_i | X(t)],  (2)

i.e., given the input X(t), the distribution of the output Y(t) is independent of the other inputs.

Bayesian approach
Typically, the transition matrix Λ is not directly available and can only be estimated from observed data. Let S be the number of observation pairs for the categorical processes X and Y, such that the following observational data is available:

XY = {X(1), X(2), ..., X(S), Y(1), Y(2), ..., Y(S)},  (3)

where X(t) ∈ {x_1, . . . , x_n} and Y(t) ∈ {y_1, . . . , y_m}, as above. Given XY, it is reasonable to search for the Λ for which the total probability of obtaining the particular sequences of observations (3) is maximized. By the independence assumption (2), the likelihood of a matrix Λ of conditional probabilities, i.e., the probability of observing the data if the conditional probabilities were given by Λ, is given by

P[XY | Λ] = ∏_{i=1}^{m} ∏_{j=1}^{n} Λ_ij^{N_ij},  (4)

where N_ij is the total number of instances in the data when (X(t), Y(t)) = (x_j, y_i). The optimum can be more easily computed if one considers the log-likelihood log(P[XY | Λ]) = Σ_{i=1}^{m} Σ_{j=1}^{n} N_ij log Λ_ij, with which we arrive at the maximum likelihood problem

Λ* = argmax_Λ Σ_{i=1}^{m} Σ_{j=1}^{n} N_ij log Λ_ij,  subject to Λ being column-stochastic.  (5)

The optimal solution of that constrained optimisation problem can be determined analytically (Gerber and Horenko, 2017), resulting in the empirical frequency estimator:

Λ*_ij = N_ij / Σ_{i'=1}^{m} N_{i'j}.  (6)

Since we merely have a finite amount of observations at hand, it is essential to be aware of the uncertainty of the statistical estimate (6). While we refer the reader interested in exact bounds to (Gerber and Horenko, 2017, Supplement, Eq. (14)), an intuition can be gained as follows. To estimate each Λ_ij to a sufficient (statistical) accuracy, the transition counts N_ij should be, on average, large. As there are nm parameters in Λ to estimate, this asks for the sample size S to be reasonably large compared to nm. In practice, this can be problematic if n and m are large. Thus, we will next discuss a modification of the above method that mitigates this problem.
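As an illustration, the empirical frequency estimator (6) can be computed in a few lines. The function name and the toy category sequences below are our own, not part of the original study.

```python
import numpy as np

def transition_matrix(X, Y, n, m):
    """Empirical frequency estimator (6): Lambda_ij = N_ij / sum_i' N_i'j.

    X, Y: integer category sequences with values in {0,...,n-1} and {0,...,m-1}.
    Returns the m x n column-stochastic matrix of conditional probabilities.
    """
    N = np.zeros((m, n))
    for x, y in zip(X, Y):
        N[y, x] += 1                       # transition counts N_ij
    col = N.sum(axis=0, keepdims=True)     # occurrences of each input category
    # columns with no observations are left as zero instead of dividing by 0
    return np.divide(N, col, out=np.zeros_like(N), where=col > 0)

# toy data with n = m = 2 categories
X = [0, 0, 1, 1, 1, 0]
Y = [0, 1, 1, 1, 0, 0]
Lam = transition_matrix(X, Y, n=2, m=2)
# each observed column of Lam sums to one
```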

Direct estimation of low-order models
In numerous situations the apparent complexity of our observations is an artefact of the measurement procedure, and there are low-dimensional features that govern the process at hand. Thus, even if we were able to estimate a full matrix Λ of conditional probabilities, the ultimate goal would be to reduce it to such low-dimensional features.
The following approach, proposed by Gerber and Horenko (2017), achieves both estimation and reduction in one step. We assume that the output depends on the input through a latent variable Z, which can merely take a small number K ≪ min{n, m} of different states {z_1, . . . , z_K}. In terms of probabilistic influences, we assume the structure X → Z → Y, that is,

λ_ik = P[Y(t) = y_i | Z(t) = z_k],  Γ_kj = P[Z(t) = z_k | X(t) = x_j],  (7)

where λ, Γ are matrices of conditional probabilities. We also assume conditional independence of Y on X given Z, that is, the input-output conditional probability matrix Λ satisfies

Λ = λΓ.  (8)

Note that we can interpret Γ_kj as an affiliation of input category x_j to the latent state z_k, see Fig. 1.
The task is now to determine the pair of column-stochastic matrices (λ, Γ) from the observation data XY, as given in (3).

Again, we wish to solve the problem with a maximum-likelihood approach, which would require solving (5) with Λ_ij replaced by (λΓ)_ij and the constraints replaced by requiring λ and Γ to be stochastic matrices. This, however, is a computationally hard optimization problem, which Gerber and Horenko (2017) circumvent by maximizing a concave lower bound of the log-likelihood instead:

(λ*, Γ*) = argmax_{λ,Γ} Σ_{i=1}^{m} Σ_{j=1}^{n} N_ij Σ_{k=1}^{K} Γ_kj log λ_ik.  (9)

While (9) produces suboptimal estimates, its advantage comes from the fact that it is concave in each of the variables λ and Γ separately, allowing for a very simple alternating maximization as optimization procedure (Gerber and Horenko, 2017, DBMR algorithm). The resulting algorithm is DBMR. Moreover, the method yields Γ*_kj ∈ {0, 1}, i.e., the original input categories are assigned to the reduced system's (latent) categories in a deterministic fashion (no "fuzziness" in the affiliations). Of course, the number K of latent states is not known in advance, and has to be chosen judiciously by compromising between "expressiveness" (the likelihood of the model, i.e., the optimal value in (9)) and "effort" (the number of total parameters to be estimated and their statistical error). This can be done by comparing multiple DBMR runs with different K.
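The alternating maximization of (9) can be sketched as follows. This is our own minimal reading of the iteration (a λ-step given the affiliations, then a Γ-step given λ), not the reference implementation of Gerber and Horenko; all names are illustrative.

```python
import numpy as np

def dbmr(N, K, iters=50, seed=0):
    """Sketch of a DBMR-style alternating maximization of the relaxed
    log-likelihood (assumption: this mirrors Gerber and Horenko (2017),
    it is not their reference code).

    N: m x n count matrix with N[i, j] = #{t : (X(t), Y(t)) = (x_j, y_i)}.
    Returns column-stochastic lam (m x K) and deterministic 0/1 Gam (K x n).
    """
    m, n = N.shape
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, K, size=n)        # random initial affiliations
    for _ in range(iters):
        # lam-step: pool the counts of all input categories in latent state k
        pooled = np.zeros((m, K))
        for j in range(n):
            pooled[:, assign[j]] += N[:, j]
        colsum = pooled.sum(axis=0, keepdims=True)
        lam = np.divide(pooled, colsum, out=np.full((m, K), 1.0 / m),
                        where=colsum > 0)      # empty states fall back to uniform
        # Gam-step: assign each input category to the latent state maximizing
        # its contribution sum_i N_ij log(lam_ik)
        score = N.T @ np.log(lam + 1e-300)     # n x K
        assign = score.argmax(axis=1)
    Gam = np.zeros((K, n))
    Gam[assign, np.arange(n)] = 1.0
    return lam, Gam

# toy counts with an obvious two-block structure
N = np.array([[10., 10., 0., 0.],
              [0., 0., 10., 10.]])
lam, Gam = dbmr(N, K=2)
```

The 0/1 structure of Γ in the output reflects the deterministic affiliations mentioned above.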
The obtained models are also less subject to overfitting issues and are more advantageous in terms of model quality measures (Gerber and Horenko, 2017; Gerber et al., 2018). This manifests in the variance of the estimated parameters λ*_ik, which show a K/n-times smaller uncertainty than the Λ_ij, cf. (Gerber and Horenko, 2017, Theorem and Eq. (7)). Again, intuitively, this advantage of DBMR over the full model (6) can be seen by noting that from the same amount of data DBMR only needs to estimate K(n + m) parameters, while the full model needs nm.
Let us emphasize that in addition to all the computational advantages of DBMR that allow it to work with large data sets, its conceptual strength is that it combines model estimation and model reduction in one step. The latent states often have a physical meaning, a property that we shall focus on in our application.
To apply DBMR, categorical processes for the in- and output have to be defined. First, we discuss the choice of meteorological variables and scales in view of the categorical processes. As input we use a variable related to the large scale atmospheric flow: Convective Available Potential Energy (CAPE), a measure for the energy an air parcel would gain if lifted to a specific height in the atmosphere.

Meteorological data
CAPE can be seen as a measure for atmospheric stability, first suggested by Weisman and Klemp (1982). It is defined by

CAPE = g ∫_{z_LFC}^{z_ET} (θ_e − θ) / θ dz,  (11)

where θ_e is the pseudopotential temperature of the ascending air parcel, θ is the potential temperature of the surrounding air, and z_LFC is the so-called Level of Free Convection (LFC). The LFC is the height at which the rising air parcel becomes significantly warmer than its environment; z_ET denotes the height where the rising air parcel has the same temperature as its environment (ET stands for equal temperature). Thus, regarding its definition (11), CAPE becomes large if the temperature difference between the rising air and the environmental air is large, see (Bott, 2016, p. 431 ff). As an integral, CAPE is a global variable that we consider as representative variable on the larger scale. To capture convective activity, characterized by strong up- and downdrafts, on the smaller scale, we regard the vertical velocity. Parcel theory (Dutton, 1976) predicts

v_max = sqrt(2 · CAPE),  (12)

where v_max is the maximum vertical motion in the dimension m/s expected from the release of CAPE in the dimension J/kg. The relation in Eq. (12) is a kinetic description of a potential which does not have to be released to vertical updraft.
Moncrieff and Miller (1976) were the first to use the term CAPE. The USAF Air Weather Service (which changed its name to the Air Force Weather Agency in 1997) simply called it positive area (AWS 1961). Fritsch and Chappell (1980) called it potential buoyant energy (PBE), while variations of this include +BE and net positive buoyant energy. Despite the abundance of names, it now appears that CAPE is the de facto standard terminology.
To analyze the relation of large and small scale parameters, the COSMO-REA6 reanalysis data set is used (Bollmeyer et al., 2015). This reanalysis is based on the non-hydrostatic numerical weather prediction model COSMO (COnsortium for Small-scale MOdelling). The number of observation pairs sums up to S = 1302 (2 × 31 × 21).
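The parcel-theory bound (12) is straightforward to evaluate; the helper below is purely illustrative and not from the original study.

```python
import math

def v_max(cape_j_per_kg: float) -> float:
    """Parcel-theory upper bound (12): v_max = sqrt(2 * CAPE).

    Input in J/kg, output in m/s.
    """
    return math.sqrt(2.0 * cape_j_per_kg)

# e.g. a CAPE of 800 J/kg bounds the updraft at sqrt(1600) = 40 m/s,
# a potential that does not have to be fully released in reality
w = v_max(800.0)
```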
We choose a REA6 subdomain that covers Germany. This subdomain is bounded by the coordinates shown in Fig. 2: the northwest corner is at (5.8°E, 54.7°N) and the southeast corner at (15.3°E, 45.2°N). As vertical layer, the 600 hPa surface is considered, because here the latent heat release takes place and the vertical velocity reaches its maximum, as shown in (Müller et al., 2020).
Figure 2. REA6 domain that covers Germany, consisting of subdomains 1 to 4; subdomain 1 is applied on the large scale for DBMR and covers approximately 500 km × 500 km. Image credit of the map: US Geological Survey (USGS).

Filtering CAPE and vertical velocity
The domain that covers Germany in Fig. 2 is divided into four 500 km × 500 km quadrants, where the spatial arithmetic mean over each quadrant is taken such that we obtain one CAPE value per quadrant. We separate and filter the data of CAPE and the vertical velocity in further subdomains in order to define the categorical in- and output. The corresponding sizes of the subdomains are summarized in Tab. 1. For the analysis with DBMR, the northwest quadrant 1 over Holland in Fig. 2 is used, so that there is no influence of the Alps on smaller scale convective activity.
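The quadrant averaging can be sketched as follows; this is an illustrative helper on a toy array, not the processing code actually used for COSMO-REA6.

```python
import numpy as np

def quadrant_means(field):
    """Arithmetic mean of a 2-D field over its four equal quadrants
    (NW, NE, SW, SE), as used to obtain one CAPE value per quadrant."""
    r, c = field.shape
    h, w = r // 2, c // 2
    return {"NW": field[:h, :w].mean(), "NE": field[:h, w:].mean(),
            "SW": field[h:, :w].mean(), "SE": field[h:, w:].mean()}

# toy 4 x 4 field standing in for a gridded CAPE snapshot
field = np.arange(16.0).reshape(4, 4)
q = quadrant_means(field)  # {'NW': 2.5, 'NE': 4.5, 'SW': 10.5, 'SE': 12.5}
```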

Categorical input and output
According to the meteorological data in Sect. 3.1, we set up applicable categories for in- and output. CAPE plays the role of the input variable X in Sect. 2, describing the potential for convection. We use the average over the 500 km × 500 km quadrants, considering CAPE as the large scale atmospheric driver. With energy units, CAPE has a non-negative range of values in J/kg.

Categorical input
We will consider two different ways to define the input categories. First, linearly spaced categories over the range of CAPE are applied. With this type of classification, extreme weather events tend to fall into separate categories, since the CAPE values are not Gaussian distributed. Alternatively, a categorization by evenly spaced quantiles is used: we consider the range of values of CAPE (X) and generate n categories that are (almost) equally populated, using the empirical 1/n-quantiles as category boundaries.

While the first categorization is easily interpreted on the CAPE axis, the second has the advantage of (almost) equally populated categories. The resulting n categories are denoted by the integers 1, ..., n; we choose n = 10. In Sect. 3.1 we set up the meteorological data with an observational sample size of S = 1302. That means we have about 130 data points in each CAPE category.
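The quantile categorization can be sketched as follows. The gamma-distributed toy series is only a skewed stand-in for a CAPE time series; all names are our own.

```python
import numpy as np

def quantile_categories(x, n=10):
    """Map a series to integer categories 1..n using the empirical
    1/n-quantiles as category boundaries (almost equally populated bins)."""
    edges = np.quantile(x, np.linspace(0, 1, n + 1)[1:-1])  # n-1 inner edges
    return np.digitize(x, edges) + 1                        # categories 1..n

rng = np.random.default_rng(1)
cape = rng.gamma(shape=1.0, scale=200.0, size=1302)  # skewed, CAPE-like toy data
cats = quantile_categories(cape, n=10)
# each category holds roughly 1302 / 10 = 130 samples
```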

Categorical output

We map the vertical velocity ω_i at grid box i to a variable Y_i ∈ {1, 2, 3} via

Y_i = 1 if ω_i < a_1 (downdraft),  Y_i = 2 if a_1 ≤ ω_i ≤ a_2 (neutral),  Y_i = 3 if ω_i > a_2 (updraft),

where (a_1, a_2) ∈ R^2 define a potentially asymmetric interval around zero vertical velocity which we consider as neutral, with a_1 < 0 and a_2 > 0. The choice of (a_1, a_2) depends on the scale of the box over which Y is averaged. In Sect. 4.1 the choice of the interval for our analysis is described. Once this discretization is made, the final output categories can be set up. Let Y_i(t) be the discretized vertical velocities at time t with 1 ≤ i ≤ m numbering the grid boxes on the corresponding scale, see Tab. ??. We define the following categorical process

Ŷ(t) = (#{Y_i(t) = 1}, #{Y_i(t) = 2}, #{Y_i(t) = 3}),  (13)

with #{Y_i = k} being the number of grid boxes with vertical velocity mapped onto k ∈ {1, 2, 3}. There are exactly (m + 1)(m + 2)/2 ways to decompose m into an (ordered) sum of 3 nonnegative numbers, thus the number of actually occurring categories satisfies n_Ŷ ≤ (m + 1)(m + 2)/2. In our probability-preserving algorithm, the categories occurring in the data are counted for the categorical observational input and output. The probability of a category is estimated by its occurrence frequency with respect to the total number of data points. We aim to infer down- and updraft behavior from Ŷ(t), i.e., the distribution of up- and downdrafts. Note that in this experiment we have no information on the (spatial) structure, as the category in (13) is a triple of numbers counting the boxes with downdraft, updraft and low vertical velocity.
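The mapping of ω_i to Y_i ∈ {1, 2, 3} and the count triple (13) can be sketched as follows; the threshold values are those of Sect. 4.1, while the toy velocity field and all names are invented for illustration.

```python
import numpy as np

def draft_counts(omega, a1=-0.0048, a2=0.0048):
    """Map gridbox vertical velocities to {1, 2, 3} (down, neutral, up) and
    return the count triple defining the categorical output (13).
    a1 < 0 < a2 bound the neutral interval around zero vertical velocity."""
    y = np.where(omega < a1, 1, np.where(omega > a2, 3, 2))
    return tuple(int((y == k).sum()) for k in (1, 2, 3))

# toy 4 x 4 grid (m = 16 boxes, as for the 125 km resolution), values in m/s
omega = np.array([-0.02, -0.01, 0.0, 0.001,
                  0.02, 0.03, -0.005, 0.004,
                  0.0, 0.01, 0.006, -0.006,
                  0.002, -0.001, 0.05, -0.03])
down, neutral, up = draft_counts(omega)
```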

Maximum likelihood estimation
The model reduction is a consequence of using the affiliation matrix Γ, which assigns the n large scale categories to K < n latent states. In the frame of DBMR we optimize a relaxed log-likelihood, cf. (9). We ran DBMR 100 times (with random initializations) for every fixed number K of latent states. For each K, the run with the maximum log-likelihood is presented. We also evaluate the exact log-likelihood, as in (5). As an attempt at model selection, the largest increase in log-likelihood is found when increasing K = 2 to K = 3; at K = 6, the maximum value is reached. Note that, as n = 10, choosing K = 10 presents no model reduction.
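The exact log-likelihood (5) of a fitted pair (λ, Γ) can be evaluated directly through Λ = λΓ; the sketch below uses invented toy counts and a trivial K = 1 model, with all names our own.

```python
import numpy as np

def exact_loglik(N, lam, Gam):
    """Exact log-likelihood (5) of a reduced model (lam, Gam):
    sum_ij N_ij log((lam @ Gam)_ij); cells with N_ij = 0 contribute nothing."""
    Lam = lam @ Gam
    mask = N > 0
    return float((N[mask] * np.log(Lam[mask])).sum())

# toy check with a K = 1 model, where Lam = lam @ Gam is rank one
N = np.array([[2., 1.], [1., 2.]])
lam = np.array([[0.5], [0.5]])     # m x K
Gam = np.array([[1., 1.]])         # K x n
ll = exact_loglik(N, lam, Gam)     # 6 * log(0.5)
```

Comparing this value across DBMR runs with different K is the model-selection criterion described above.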
4 Reduced Bayesian model for atmospheric dynamics

4.1 Results

Figure 3. Exact log-likelihood value as in (5) (blue) and relaxed log-likelihood value as in (9) (red) of the reduced Bayesian model estimated by DBMR with K latent states (also referred to as collective causality boxes).

Interval for vertical draft
All-day mean data serve as the basis for determining the interval for the vertical draft. The subclassification by the interval is symmetric, with a_1 = −0.0048 m/s and a_2 = 0.0048 m/s. These values are chosen from a histogram of mean vertical velocities. In Fig. 4, the histogram of mean vertical velocities for a resolution of 125 km is shown together with the interval.

Afterwards, the data are split up into day and night and fed to DBMR separately.

Affiliation to latent states
Fig. 5 shows the distribution of CAPE over the affiliated n categories (n = 10, divided by quantiles). One sees that the input categorization is similar in terms of values for day and night. The first latent state includes 5 (for day) and 4 (at night) CAPE categories with high values; it represents high CAPE values and is therefore referred to as "High". Five (for day) and 6 (at night) categories are affiliated with the second latent state, referred to as "Low".

Distributions conditioned on latent states
We discuss probability distributions conditioned on the resulting latent states introduced in Sect. 2.3 in two ways:

- Law[X | Z] gives the distribution of CAPE X within a latent state Z,
- Law[Ŷ | Z] gives the distribution of the categorical output Ŷ within a latent state Z.

For the former, a kernel density estimate (KDE) is shown; KDE is a non-parametric way to estimate the probability density function of a random variable. At night, more categories are assigned to the latent state "Low", and the first latent state has a larger mean and median than during daytime.
In Fig. 8, the affiliation without gaps is more pronounced at night than during the day. In Figs. A3 and A4 we use CAPE as input with a resolution of 500 km × 500 km on the large scale and a grid of 15 km × 15 km for the output; the output is thus on the convective scale. Here the affiliations are without gaps as well. The refined categorization on the convective scale leads to a difference between day and night in the distributions over the latent states. For day 4 and for night 5 mean categories are distributed according to the violin plot of Fig. A3; the corresponding conditional probabilities can be seen in Fig. A4, with the highest conditional probabilities occurring at night.

To analyze the relation of large scale dynamics in the atmosphere to smaller scale categorical processes, the COSMO-REA6 reanalysis data set was applied (Bollmeyer et al., 2015). We averaged CAPE over 500 km × 500 km quadrants. As shown in Fig. 7 at the bottom, 6 high and 4 low CAPE categories for the daily mean and 4 high and 6 low CAPE categories at night are affiliated.
As a result of the averaging, the categories are almost evenly distributed over the latent states. The convective activity of the atmosphere is stronger during the day than during the night; accordingly, the vertical draft is weaker at night than during the day.
Mean and median are around 100 J/kg for the latent state "High" and 25 J/kg for the latent state "Low", and they are similar for day and night. There is a difference in the variance: at night the distribution of the latent state "High" is sharper than during the day.

The generation of kinetic energy from a certain CAPE level into vertical draft on smaller scales can occur up to a few hours later.
A temporal shift between in- and output could therefore have an effect on the stochastic relation shown in Fig. 6. Here we consider 12-hour means; for data with a higher temporal resolution, one could realize a shift of 2-4 hours for the input. This is deferred to future studies.

It is important to identify stochastic models using categorical approaches, in contrast to the fluid mechanical description by continuous partial differential equations. In this study, a recent algorithmic framework called Direct Bayesian Model Reduction (DBMR) (Gerber and Horenko, 2017) is applied, which provides a scalable probability-preserving identification of reduced models directly from data. We assume that the output of a Bayesian model depends on the input through a latent variable, which can merely take a small number of different latent states. The stochastic method is tested in a meteorological application towards a model reduction to latent states of smaller scale convective activity conditioned on large scale atmospheric flow.
We combined the convective available potential energy (CAPE) as large scale flow variable with smaller scale subgrid time series for the vertical velocity. To this end, we mapped vertical velocities to updraft, no draft and downdraft depending on an interval around zero vertical velocity and counted the numbers of up- and downdrafts. Data sets of 12-hour means for day and night were computed using COSMO-REA6 reanalysis data over a domain that covers Germany for the summer months of the period considered. The categorical data analysis was carried out for day and night and discussed for different numbers of latent states. We chose 10 input categories and reduced these to two and three latent states.
The step from the fluid continuum described by partial differential equations to a categorical stochastic description with DBMR provides a reduced model defined on a set of a few latent variables. These are interpreted as reduced states of the large scale atmospheric dynamics with respect to their probabilistic impact on vertical motion. For 2 latent states the input is separated into categories with high and low CAPE values, whereas for 3 latent states we have an affiliation to categories with high, medium and low CAPE values. The output categories for the vertical velocity describe the numbers of up- and downdrafts.
As a result, we obtain conditional distributions for the numbers of up- and downdrafts conditioned on the latent states for day and night. In the application we found a probabilistic relation of CAPE and vertical up- and downdraft.

For a resolution of 125 km we applied a 4 × 4 grid and had 16 boxes with vertical velocities. During daytime, the chance for updraft is higher conditioned on the latent state with high CAPE values: the probability for numbers of up- or downdrafts higher than 10 adds up to 81%. The distribution for the latent state with low CAPE values has higher probabilities at high numbers of downdrafts; here the probability for numbers of downdrafts of 6 to 16 is 68%. At night, probability accumulates at small numbers of downdrafts for the latent state with high CAPE values. For low CAPE values, we observe that a low number of updrafts is likely: the probability for numbers of updrafts between 0 and 4 accumulates to 82%.
On the smaller scale with a resolution of 15 km we applied a 32 × 32 grid and had 1024 boxes with vertical velocities. We divided the output into 3 categories of low (0 to 341), medium (342 to 683) and high (684 to 1024) numbers of up- and downdrafts.
During daytime the probability for a medium number of up- and downdrafts is 34% for the latent state with high CAPE values; low and high numbers of up- and downdrafts have small probability. For low CAPE values the maximum in the distribution occurs for a medium number of downdrafts and a low number of updrafts, at 50%. At night, the probability accumulates at low to medium numbers of downdrafts for the latent state with high CAPE values; for low CAPE values, we observe that the chance of a low or medium number of updrafts is 82%. The distribution for the smaller scale resolution (15 km) is a stochastic aggregation of the distribution at the resolution of 125 km; therefore the distributions are qualitatively similar. When the number of latent states is further increased, the latent states can be clustered into groups of high, medium and low CAPE categories.

The model reduction of smaller scale convective activity is part of a development process towards a model with a stochastic component for a conceptual description of convection embedded in a deterministic atmospheric flow model. Various energetic variables are applicable on the large scale. A potential driver to control small scale models is the Dynamic State Index (DSI) (Müller et al., 2020; Müller and Névir, 2019), an "adiabaticity indicator". Other large scale variables driving the smaller scale stochastics are the available moisture or the vertical wind shear. The presented approach provides a basis for further research of smaller scale convective activity conditioned on other possible large scale drivers.
Author contributions. Annette Müller prepared the meteorological data. All authors then contributed to developing the work and preparing the manuscript.