A prototype stochastic parameterization of regime behaviour in the stably stratified atmospheric boundary layer

Recent research has demonstrated that hidden Markov model (HMM) analysis is an effective tool to classify atmospheric observations of the stably stratified nocturnal boundary layer (SBL) into weakly stable (wSBL) and very stable (vSBL) regimes. Here we consider the development of explicitly stochastic representations of SBL regime dynamics. First, we analyze whether HMM-based SBL regime statistics (the occurrence of regime transitions, subsequent transitions after the first, and very persistent nights) can be accurately represented by “freely running” stationary Markov chains (FSMCs). Our results show that despite the HMM-estimated regime statistics being relatively insensitive to the HMM transition probabilities, these statistics cannot all simultaneously be captured by a FSMC. Furthermore, by construction a FSMC cannot capture the observed non-Markov regime duration distributions. Using the HMM classification of data into wSBL and vSBL regimes, state-dependent transition probabilities conditioned on the bulk Richardson number (RiB) or the stratification are investigated. We find that conditioning on stratification produces more robust results than conditioning on RiB. A prototype explicitly stochastic parameterization is developed based on stratification-dependent transition probabilities, in which turbulence pulses (representing intermittent turbulence events) are added during vSBL conditions. Experiments using an idealized single-column model demonstrate that such an approach can simulate realistic-looking SBL regime dynamics.

understanding of turbulence in the atmospheric boundary layer, with turbulence quantities decreasing with height and nearsurface profiles which are well-described by Monin-Obukhov similarity theory (MOST; e.g. Sorbjan, 1986;Mahrt, 1998aMahrt, , b, 2014Pahlow et al., 2001;Grachev et al., 2005Grachev et al., , 2013. In the vSBL, on the other hand, turbulence profiles can decouple from the surface (Banta et al., 2007) and MOST breaks down (e.g. Derbyshire, 1999;Banta et al., 2007;Williams et al., 2013;Mahrt, 2011;Optis et al., 2015). Regime structures and transitions are poorly represented in weather and climate models, due both to 5 coarse resolution (vertical and horizontal) and to an imperfect understanding of the diverse physical processes governing the SBL, particularly with regard to the vSBL to wSBL transitions (e.g Holtslag et al., 2013;Mahrt, 2014). In this study we first analyze how well the statistics of SBL regime occupation and regime transitions can be described by a two-regime system, with the goal of using this information to develop a prototype explicitly stochastic parameterization of turbulence in the SBL for models of weather and climate. 10 As transitions between the two SBL regimes are a common feature of SBL dynamics around the globe (AM19b) a representation of the effect of these dynamics in weather and climate models is needed. The regime transitions, however, are associated with a range of different mechanisms. Over land, the wSBL to vSBL transition (which for simplicity we denote the collapse of turbulence even though turbulence does not cease entirely) is normally caused by radiative cooling at the surface increasing the inversion strength and suppressing vertical turbulent fluxes of momentum and heat. This process is relatively 15 well understood and has been examined using conceptual and idealized single column models (van de Wiel et al., 2007(van de Wiel et al., , 2017Holdsworth et al., 2016;Holdsworth and Monahan, 2019;Maroneze et al., 2019), or direct numerical simulations of stratified channel flows van Hooijdonk et al., 2017) and atmospheric boundary layers (e.g. Flores and Riley, 2011;Ansorge and Mellado, 2014). Radiative cooling leads to very shallow boundary layers which are typically not resolved well in large-scale circulation models. Another mechanism for the wSBL to vSBL transition of particular importance over water is the 20 advection of warm air aloft (AM19c), producing vSBL conditions which are not as shallow as those driven by radiative fluxes.
The vSBL to wSBL transition (which we denote the recovery of turbulence) is less well-understood. Mechanisms by which turbulence recovers include the build-up of shear resulting in instabilities, or an increase in cloud cover weakening the stratification through increasing the downwelling longwave radiation (AM19b). Another potential class of processes initiating these transitions is associated with intermittent turbulence events (e.g. Mahrt, 2014, and references within) which have been found 25 to dominate the turbulent transport in vSBL conditions (Nappo, 1991;Coulter and Doran, 2002;Doran, 2004;Basu et al., 2006;Acevedo et al., 2006;Williams et al., 2013). Intermittent turbulence arises from a range of different phenomena such as breaking gravity waves or solitary waves (Mauritsen and Svensson, 2007;Sun et al., 2012), density currents (Sun et al., 2002), microfronts (Mahrt, 2010), Kelvin-Helmholtz instabilities interacting with the turbulent mixing (Blumen et al., 2001;Newsom and Banta, 2003;Sun et al., 2012), or shear instabilities induced from internal wave propagation (Sun et al., 2004;Zilitinke-30 vich et al., 2008;Sun et al., 2015). It has also been suggested from direct numerical simulations that intermittency can arise as an intrinsic mode of the non-linear equations in the absence of external perturbations of the mean flow (Ansorge and Mellado, 2014). Regardless of which process causes the recovery of turbulence, all phenomena are subgrid-scale in state-of-the-art weather and climate models and are typically not included explicitly through process-based deterministic parameterizations.
Accurate simulations of these near-surface properties is particularly important for global and regional weather forecasts of vertical temperature structures which control the formation of fog and frost (Walters et al., 2007;Holtslag et al., 2013). More 10 accurate simulations of the SBL regime behaviour are also important for better representations of surface wind variability and wind extremes (He et al., 2010(He et al., , 2012Monahan et al., 2011); simulation and assessment of pollutant dispersal, air quality (Salmond and McKendry, 2005;Tomas et al., 2016), harvesting of wind energy (Storm and Basu, 2010;Zhou and Chow, 2012;Dörenkämper et al., 2015); and agricultural forecasts (Prabha et al., 2011;Holtslag et al., 2013). Global and regional weather and climate models often use an artificially enhanced surface exchange under stable condi-15 tions in order to improve simulations of the large-scale flow . This approach led to the introduction of long-tailed stability functions and minimum background TKE values that are not supported by observations. In such representations, turbulence is artificially sustained under very stable conditions and the two-regime characteristic of the SBL is suppressed, biasing near-surface winds and temperature profiles. Without such parameterizations the nocturnal boundary layers can experience a single turbulence collapse which persists for the entire night. Although the long-tailed stability functions 20 in relatively coarse-resolution models are designed to mimic the gridbox-mean of fluxes over many subgrid-scale wSBL and vSBL patches, with increasing horizontal and vertical resolution more accurate process-based parameterizations are necessary.
The occurrence of vSBL to wSBL transitions does not appear to depend deterministically on internal or external state variables (AM19a,b), indicating that parameterizations of the effects of these kinds of transitions in weather and climate models may be required to be explicitly stochastic (e.g. He et al., 2012;Mahrt, 2014). In particular, phenomena such as intermittent turbulence 25 events will likely rely on stochastic parameterizations as their structure and propagation are found to be only weakly-dependent on mean states (e.g. Rees and Mobbs, 1988;Lang et al., 2018). Stochastic subgrid-scale parameterizations of SBL processes have been proposed to account for the missing variability in the SBL and improve both climate mean states and forecast ensem-long-term statistics) transition probability matrices for a two-regime Markovian system, a natural approach to developing stochastic parameterizations of SBL regime dynamics is to investigate if these can be based on 'freely-running' stationary Markov chains (FSMC) using these transition matrices. The first goal of this study is to determine if climatological Markovian transition probability matrices, which are by construction independent of the state of the SBL, are adequate for simulations of the SBL regime dynamics. While the HMM analyses presented in AM19a assume stationary Markov regime dynamics, 5 statistical analyses of the estimated regime sequences show clear evidence of elevated probability of turbulence collapse close to sunset (AM19b). Furthermore, probability distributions of event durations demonstrate a localized maximum corresponding to a typical recovery time of one to two hours after transitions, during which a subsequent transition is unlikely. This structure is indicative non-Markov behaviour (AM19b). Because of these non-stationary and non-Markov behaviours, a FSMC will never exactly capture all aspects of SBL regime dynamics. However, it is a parsimonious model and might be sufficient to reproduce 10 most statistics of interest.
In order to investigate the potential of FSMC-based parameterizations, we first analyze how well they can characterize the HMM-based regime statistics. As part of this analysis we consider the sensitivity of the regime sequence estimated by the HMM to perturbations of the persistence probabilities, allowing for a quantification of what ranges of persistence probabilities accurately describe SBL regime statistics in HMM analyses. By comparing this sensitivity analysis with a sensitivity analysis 15 of regime statistics to varying persistence probabilities in a FSMC we quantify what ranges of persistence probabilities are consistent with both SBL regime statistics derived from an HMM analysis and SBL regime statistics simulated in a FSMC. As we demonstrate that FSMCs cannot adequately simulate all SBL regime statistics of interest, we then investigate the state dependence of observed SBL regime transition probabilities. Finally, we develop a pragmatic prototype of an explicitly stochastic parameterization using the derived state-dependent transition probabilities and present preliminary tests in an idealized single 20 column model (SCM; Holdsworth and Monahan, 2019). The study is organized as follows. First a short review of the observational data used in the HMM analysis is given in section 2, followed by a brief review of the HMM application to the SBL (section 3). Results of simulating statistics in FSMC are shown in section 4, followed by the description of the prototype state-dependent stochastic parameterization and test simulations in section 5. Conclusions follow in section 6.

25
Only a brief summary is presented here because the observational data used in this study have been discussed in detail in AM19a. Observational data sets from nine different research towers measuring standard Reynolds-averaged meteorological state variables with a time resolution of 30 minutes or finer are considered (Table 11). The observational levels of wind speeds and temperatures determine the reference state variable sets which are used in the HMM analyses to classify the data into SBL regimes (cf. AM19a, Table 11). Substantial differences among the nine experimental sites exist in terms of their 30 surface conditions, surrounding topography, and their meteorological setting. As a simple classification scheme, we distinguish between land-based, glacial-, and sea-based stations.
The land-based stations can be further clustered into different subsets. Both the Cabauw and Hamburg towers lie in flat, humid, grassland areas, although the Hamburg tower is affected by the large metropolitan area of Hamburg. The Karlsruhe tower is located in the Rhine valley, a rather hilly, forested area. The American sites, Boulder and Los Alamos, are located in relatively arid settings and are strongly affected by the surrounding topography of the Rocky Mountains.
The DomeC observatory, the single glacial-based station, is located in the interior of Antarctica and is influenced by com-5 pletely different surface conditions including high albedo and low roughness length.
The sea-based stations are the offshore research platforms Forschungsplattform in Nord-und Ostsee (FINO), located in the German North and Baltic Seas. These sites are characterized by relatively homogeneous local surroundings and a large surface heat capacity. At the FINO towers nights with statically unstable conditions (defined as nights with two or more unstable datapoints in a night) are excluded as under these conditions wind speed measurements are unreliable (Westerhellweg and 10 Neumann, 2012). Furthermore, at FINO-1 nights with primary wind directions between 280 and 340 degrees are excluded due to mast interference effects in the data. At the other stations such an exclusion is not necessary as three wind measurements with 120 degree separation are taken at each level.

Brief summary of the hidden Markov model
We now present a brief overview of the HMM analysis with application to the SBL (Monahan et al., 2015, AM19a). An 15 in-depth description of HMM analyses can be found in Rabiner (1989).
We use the HMM to systematically characterize regime behaviour in the SBL from observed data. The HMM assumes that underlying the observations is an unobserved, or hidden, discrete Markov chain (X = {x 1 , x 2 , . . . , x T }). The analysis estimates the regime-dependent parametric probability density distribution (pdf) of the observations (described by the parameter set λ), the transition probability matrix Q, and a most likely regime path of the Markov chain (known as the Viterbi Path, VP). We 20 associate the different states of the Markov chain with the SBL regimes (wSBL and vSBL). In our analysis we use observations of the three-dimensional vector consisting of the Reynolds-averaged vertically-averaged mean wind speed, wind speed shear, and stratification to define the HMM input vector Y. A detailed justification of this observational input dataset is presented in AM19a. The HMM estimation algorithm makes use of the following assumptions: 1. Markov assumption: the current regime value i t at x t depends exclusively on the previous regime of x t−1 , so: where the dynamics of the SBL are governed by Q (a 2 × 2 matrix corresponding to the wSBL and vSBL, respectively) such that it Q itit−1 = 1.
2. Independence assumption: conditioned on X, values of Y are independent and identically distributed variables resulting in a probability of the observational data sequence of where Λ = {λ i , π i , Q} i∈{0,1} is the full set of parameters of the HMM, for which {λ i } i∈{0,1} is the parameter set describing the regime-dependent pdfs of y t (taken to be Gaussian mixture models as in AM19a), and π i is the probability that x 0 is in regime i (wSBL or vSBL).
The goal of the HMM analysis is to estimate Λ from Y. Starting from the probability of the observational time series 5 conditioned on the parameters P (Y|Λ) and applying Bayes' theorem to obtain P (Λ|Y), the problem reduces to a maximumlikelihood estimation which can be iteratively solved to find local maxima via the expectation maximization algorithm (Dempster et al., 1979). Having estimated Λ, the most likely regime sequence (the VP) can be calculated. Regime occupation and transition statistics can then be obtained through analysis of the VP. The estimation of the parameters in the expectationmaximization scheme for our analysis is described in detail in Rabiner (1989). 10 One limitation of the HMM model considered is that it assumes stationary statistics, However, nonstationarities linked to the diurnal cycle and seasonal variability are present in the regime statistics of the SBL (cf. next section, AM19b). Generalizations exist which can account for nonstationarities, such as nonhomogeneous HMMs (Hughes et al., 1999;Fu et al., 2013) which condition transition probabilities on the state of external variables. Other clustering techniques such as the finite-element variational approaches also relax the stationarity assumptions (e.g. Franzke et al., 2009;Horenko, 2010;O'Kane et al., 2013). In 15 particular, the finite-element, bounded-variation, vector autoregressive factor method (FEM-BV-VARX) includes both autoregressive dynamics within each regime and a modulation of regime dynamics to external drivers. For instance, Vercauteren and Klein (2015) were able to use this model to identify different regimes of interaction between submesoscale motions and the turbulence. However, as our analyses find no clear relationship between external drivers (geostrophic wind and cloud cover) and transitions between regimes of Reynolds-averaged variables (AM19b), we consider stationary HMM analysis in this study 20 in order to investigate the simplest possible approach to a stochastic parameterization of turbulence under SBL conditions.
Using such a relatively simple parameterization allows us to determine where additional complexity is warranted and assess how well the dynamics are approximated by stationary Markovian systems.

SBL regime statistics based on 'freely-running' Markov chains
To be useful as the basis of new parameterizations of turbulent fluxes in the SBL, FSMCs should model SBL regime statistics 25 accurately. The statistics we focus on are the event durations and the probabilities of each of: the occurrence of very persistent nights, of the occurrence of at least one transition within a night, and of multiple transitions within a night. Our reference FSMC models use transition matrices Q ref obtained from HMM analyses in AM19a (

15
In VP ref the probability of at least one wSBL to vSBL or vSBL to wSBL transition occurring within a night shows no systematic dependence on the length of the night across the tower sites ( Figure 12). In contrast, the occurrence probability of at least one transition in a FSMC (Eqns. A3 and A4) increases with the length of the night, and is larger than the VP ref at all sites ( Figure 12, lower panels). The overestimation of turbulence recovery events by the FSMC is slightly larger than that of turbulence collapse events at land-based stations, while the opposite is true at sea-based stations. 20 The probabilities of the occurrence of a recovery event subsequent to a turbulence collapse in the FSMC (Eqns. A6 and A8, Figure 13) agree better with those of VP ref than do the probabilities of the overall occurrence of at least one wSBL to vSBL transition ( Figure 12). Both VP ref and FSMC occurrence probabilities increase with the length of the night, at about the same rate. At land-based stations the VP ref has fewer subsequent turbulence recovery events than expected from a FSMC, and at sea-based sites more are observed than predicted by a FSMC. Distributions of wSBL to vSBL transitions subsequent to 25 recovery events in a FSMC and the VP ref are generally similar with slightly better agreement in summer than winter ( Figure   13, right panels).
The occurrence of subsequent transition events is related to event durations in the vSBL and wSBL. For both types of events, the duration pdfs display clear maxima between one and two hours after preceding transitions, demonstrating that the occurrence of subsequent transitions most often occurs after some recovery period ( Figure 14). No two-regime FSMC can 30 account for these recovery periods because event duration pdfs in the FSMC always decay monotonically (equations A5 and A7). After the initial recovery period, however, event duration pdfs are generally close in the VP ref and FSMC (for a 12-hour night durations), resulting in a generally good agreement of mean event durations. The qualitative shape of the event duration pdfs is insensitive to season, although during winter the probabilities of longer events increase (longer nights allow more time for longer events to occur). Consistently, different nighttime durations in the FSMC change the pdf decay rate (steeper and shallower for respectively shorter and longer nights), however, the substantial differences to pdfs estimated from VP ref remain.
The results above demonstrate the existence of at least two aspects of the regime statistics which qualitatively cannot be accounted for by a two-regime FSMC: non-stationary and non-Markov behaviour. While many other regime statistics follow qualitatively the behaviour of a FSMC, quantitative differences between the statistics of VP ref and FSMCs using Q ref are 5 substantial. As values on the diagonal of Q ref are close to one (Table 11), the theoretical regime statistics calculated from the FSMC are highly sensitive to these values (cf. Eqns. A1-A8). Therefore, we now investigate the sensitivity of VP to perturbed Q to determine if biases in the SBL regime statistics of the FSMC can be reduced by modest changes of Q.

Sensitivity of the VP to perturbed persistence probabilities
We consider the sensitivity of the VPs to changes of the persistence probabilities (diagonal elements of with i t = i t−1 and i ∈ {wSBL,vSBL}) by perturbing Q from the reference value, holding it fixed, and repeating the HMM analysis. In order to assess if the perturbed VPs are consistent with VP ref we consider first the occupation consistency between the two (the fraction of time in which both VPs are in the same regime). As in AM19a, we then assess the consistency of the timing of transitions (simultaneity of transitions in the reference and perturbed VPs) as well as the representation of very persistent nights. These various metrics are then combined to obtain the total VP consistency. For this part of the analysis, we 15 focus on the Cabauw tower data as we have analyzed these data extensively in AM19a. The same qualitative results are found using all tower station data we have considered (not shown).
The estimated VP is robust to substantial changes in Q, with an occupation consistency of more than 90 % obtained for ranges of wSBL and vSBL persistence probabilities between 0.5 and about 0.9999 ( Figure 15). Agreement at the 99 % level is found for persistence probabilities between approximately 0.9 and 0.9999. Accurate representation of the timing of transitions 20 is found for both a broad range of low persistence probabilities and for a small range of persistence probabilities between 0.96 to 0.99. The fact that the accuracy of the transitions is above 99 % if both persistence probabilities are below 0.5 (regime transitions in a single step are more probable than remaining in the regime) is a consequence of the high frequency of modelled transitions improving the ability to capture individual transitions in VP ref (at the expense of modelling far too many transition events). Because regime transitions are relatively rare, the physically meaningful range of persistence probabilities corresponds 25 to relatively large values of both. The accuracy of the occurrence of very persistent wSBL nights in the perturbed VP is best for high P (wSBL → wSBL) and is weakly sensitive to P (vSBL → vSBL). This result is not surprising as the high wSBL persistence probability ensures that the majority of very persistent wSBL nights as estimated by VP ref are captured. This measure is unaffected by any underestimate of the occurrence of very persistent vSBL nights. Complementary results are found for the occurrence of very persistent vSBL nights. 30 Each of the five consistency measures discussed capture distinct aspects of agreement between the reference and perturbed VPs. We define total consistency relative to VP ref as each of the five described VP consistencies exceeding a specific threshold.
The sensitivity analysis of the estimated regime occupation sequence to changes in Q values reveals that reasonably accurate HMM regime statistics can be obtained over a relatively large range of persistence probabilities. We now return to FSMC calculations using the ranges of Q where the total VP consistency exceeds 95 % to assess if a common range of persistence   At sea-based stations the range of persistence probabilities that ensures good agreement between the VP ref and the perturbed VPs is substantially larger than for land-based stations (not shown). The total VP consistency exceeds 95 % for regime persistence probabilities ranging from approximately 0.92 to 0.99. Nonetheless, similar to land-based stations, no persistence 30 probability range can be identified which allows a FSMC to simulate all SBL regime statistics accurately. Again, to obtain only specific SBL regime statistics, ranges of persistence probabilities are required which generally exceed the values assuring good agreement between the reference and perturbed VPs.
The fact that a FSMC is not able to account for all regime statistics (with or without seasonally varying Q) motivates the consideration of other approaches to the parameterization of regime dynamics. In particular, the use of state-dependent transition probabilities is supported by the relatively well-understood control of the internal SBL dynamics on wSBL to vSBL transitions In the next section we present a prototype of such a parameterization. 5

Stochastic parameterization for SBL regime dynamics
In this section, we develop a prototype explicitly stochastic parameterization for numerical weather prediction and climate models and test it using an idealized SCM. We first consider the state dependence of transition probabilities from VP ref , to be used as the basis of a two-value (wSBL or vSBL) discrete SBL regime occupation variable (S). After having estimated functional forms for these conditional probabilities from fits to data, a parameterization of episodic enhancement of eddy diffusivity 10 by intermittent turbulence bursts is developed. Finally, the application of this parameterization in the SCM is presented. We emphasize the fact that the explicitly stochastic parameterization and the tests of it presented here are intended to be a proof of concept. A formal validation of model experiments against observational data, including systematic sensitivity analyses of the free parameters and an implementation in a more complex single column model, will be the subject of a future study.

State-dependent transition probabilities 15
The Richardson number (Ri) is often used in parameterizations of stratified boundary layer turbulence, and as such is a natural candidate on which to condition probabilities of transitions between states of S. For instance, we expect physically that P (wSBL → vSBL|Ri) should be small for small Ri, but should increase to virtual certainty for sufficiently large Ri.
Due to their coarse vertical sampling, the Reynolds-averaged observational tower data considered only allow for a characterization of finite differenced approximations to Ri, in particular the bulk Richardson number (Ri B ): where g is the acceleration due to gravity, Θ the mean background potential temperature, h the height of the upper measurement and s the lower measurement height near the surface, and Θ and W are respectively the potential temperature and wind speed (at heights indicated by subscripts).
To assess the relationship between Ri B and transition probabilities (and in particular the robustness of this relationship across 25 different locations) we first investigate composites of its evolution during regime transitions at the various tower sites described in section 2. These composites, centred on the time of transitions and extending 90 minutes before and after, characterize the average behaviour of Ri B across transitions. Such composites do not distinguish differences between individual events which may be important for a detailed physical understanding of a specific transition. Furthermore, composite changes may be less sharp than individual ones, due to variations in transition timing below the averaging scale of the data considered. 30 Across all land-and glacial-based stations Ri B measured between each observational height and the surface systematically increases (decreases) during turbulence collapse (recovery) events ( Figure 17, columns one and three). At sea-based sites changes in Ri B are not evident. The weak signal in Ri B at sea-based stations is likely related to the fact that the lowest observational levels are much farther from the surface than at other stations (30 m above mean sea level).
In order to further compare results across all tower sites we concentrate on Ri B between about 100 m and 10 m (Ri B (100, 10))  creases rapidly for weaker inversion strengths. To build a state-dependent parameterization for S conditioned on stratification, conditional transition probabilities discussed above are fit to functional forms. As the wSBL cannot be sustained for strong inversions nor a vSBL for weak inversions (Figure 18, second row), transition probabilities for such conditions are set to one.
The functional forms for the stratification-dependent transition probabilities are (4) 5 and The best-fit parameter and the RMSE for each station are listed in Table 14; the corresponding best-fit functions are shown in Figure 18 (second row). Those fits are similar enough to each other to motivate analysis of the mean behaviour across all data, yielding parameter sets are listed in Table 14 and corresponding functions are depicted in Figure 18 (third row, solid black 10 line). Using the median parameter set or a best-fit through all data does not change the parameterization substantially.

Stochastic forcing in the vSBL regime
As described in the Introduction, state-of-the-art planetary boundary layer turbulence parameterizations are able to produce radiatively driven turbulence collapse. In contrast, mechanisms to explicitly generate the turbulence recovery are too weak or lacking in standard parameterizations. He et al. (2012) showed that a stochastic process representing the effects of intermittent 15 turbulence events can be implemented as an extra source term in the prognostic TKE budget during vSBL conditions, such that these events episodically drive the vSBL into a turbulence active regime. In their approach, however, the generation of intermittent turbulence bursts did not depend on the state of the boundary layer. Here, we propose to introduce a new local variable, the two-value discrete SBL regime occupation variable S, tracking SBL regimes. At each time step the occurrence of a regime transition is determined randomly using the instantaneous state-dependent transition probabilities derived above. When 20 S is in the vSBL additional stochastic forcing is added as a representation of the effect of intermittent turbulence bursts. These enhancements occur with random sizes and at random times, and are similar to a compound Poisson process. This approach is designed to be able to reproduce the recovery time in the vSBL event duration distribution.
Many models, including the one we consider, represent turbulence fluxes using first order closure. Here, we represent additional stochastic forcing in the vSBL by increasing the diffusivities of heat and momentum: where K is the diffusivity for momentum and heat, K SM C the diffusivity as determined by the standard SCM parameterization (cf. Eqn. B7), and SF k represents the effects of the k-th intermittent turbulence pulse parameterized as follows.
1. At each timestep the probability of the occurrence of an intermittent turbulence pulse is given by λ SF dt, where λ SF is its occurrence rate and dt the model timestep. If a turbulence pulse is determined to occur at time t k , a random number 30 r is drawn from a uniform-distribution on [0, R] representing the maximum strength of the burst SF k , occurring at time t w k = t k + τ w .
2. The evolution of SF k is split into growth and decay phases. The relatively short growth phase is introduced to avoid numerical instabilities, while the decay phase represents the dissipation of the intermittent turbulence pulse. Each SF k is assumed to have a Gaussian profile in the vertical (which is intended to represent the localization of the enhanced mixing 5 in the region where the turbulence pulse occurs) given by 3. The strength s k (t) increases from t k until t w k according to a hyperbolic tangent function. Afterwards an exponential decay is prescribed with an eddy overturning timescale τ e : 10 4. We assume the centre of the turbulence pulse, h k (t), to be initiated aloft (cf. AM19c, Figure 4) and to move exponentially towards the surface during the decay phase: where h b and h e denote the heights of the centre of SF k (t, z) at the beginning and end of the turbulence pulse respectively, and τ h is the vertical migration timescale. 15 5. The width of SF k (t, z), σ k (t), is assumed to increase from the onset of the pulse until t w k according to a hyperbolic tangent function which is introduced to avoid numerical instabilities as for s k (t). The functional form ensures σ k (t) increases in a similar way as s k (t). During its decay σ k (t) widens exponentially (representing the interaction of the turbulent patch with its surrounding) with a typical broadening timescale τ σ :

20
where σ w and σ e are the widths of the turbulence pulse at respectively the time of its maximal strength and end of its lifecycle.
As indicated by Eqn. 6, the effects of multiple overlapping turbulence bursts are taken to be additive. Thus, we assume no interaction between successive turbulence bursts. Below we test the parameterization in an idealized SCM. The values for the parameters in the stochastic forcing parameterization as used in the SCM experiments are listed in Table 15. 25

SCM experiments with explicitly stochastic parameterization
The SCM we use to test the parameterization is described in van Hooijdonk et al. (2017) and Holdsworth and Monahan (2019).
The governing equations of the SCM are presented in detail in Appendix B. In this study we consider the upper boundary of the model, at which we impose the boundary condition that the flow is geostrophic with a speed of 6 m s −1 , to be fixed at 5000 m. The lower boundary of the model domain is determined by the momentum roughness length which is set at z 0 = 0.001 m 5 over a dry sand surface with density ρ s = 1600 kg m −3 , specific heat capacity c s = 800 J kg −1 K −1 and thermal conductivity λ s = 0.3 W m −1 K −1 . Furthermore, we assume clear sky conditions. The explicitly stochastic parameterization described above results in SBL transitions that are qualitatively in agreement with observations. An example realization is presented in Figure 110. In this realization radiative cooling leads initially to a steady increase in stratification and flow stability. Once the vSBL is established (around simulation hour 2) turbulence pulses 10 occur (none of which are individually sufficient to initiate a vSBL to wSBL transition). These turbulence pulses result in heat fluxes slightly larger than observed but of the right order of magnitude (eg. Doran, 2004). The occurrence of multiple smaller turbulence pulses between simulation hours 6-7.5 slowly erodes the stratification until it is sufficiently weakened that a vSBL to wSBL transition occurs. Consistent with observations the simulated vSBL to wSBL transition lags behind the occurrence of the last turbulence burst (AM19c). After the wSBL is established (about simulation hour 7.5) the stratification begins to 15 increase again and a subsequent turbulence collapse occurs approximately 1.5 hours later. This event duration is very close to the peak in the pdf of the wSBL event duration (cf. Figure 14) providing further evidence that these recovery periods in the wSBL are related to internal dynamics.
Structures of wind and temperature profiles during vSBL to wSBL transitions resemble those of observations (cf. AM19c).
Turbulence pulses lead to warming in the lowest 40 m of the boundary layer as turbulent sensible heat fluxes transport warm 20 air from layers between 50 to 150 m towards the surface (Figure 110, middle panels). Enhanced vertical momentum transport results in the near-surface winds first increasing, and then decreasing (as a result of enhanced momentum flux into the surface; Figure 110, bottom panels). The relative magnitudes of the initial wind speed increase and subsequent decrease are sensitive to the height and width of SF k (not shown). As the turbulence pulses decrease the stratification, boundary layer heights increase.
These results demonstrate that an explicitly stochastic model with state-dependent transition probabilities and a representation 25 of intermittent turbulence pulses in the vSBL can produce regime transitions that are in qualitative agreement with observations.

Discussion and Conclusions
Recent studies have demonstrated that hidden Markov model (HMM) analysis is an effective tool to classify the nocturnal boundary layer (SBL) into weakly stable (wSBL) and very stable (vSBL) conditions (Monahan et al., 2015, AM19a, AM19b, AM19c). One goal of this study is to investigate if a two-regime 'freely-running' stationary Markov chain (FSMC, obtained 30 from the HMM analysis) is able to simulate SBL regime statistics with sufficient accuracy to be the foundation of a stochastic parameterization of SBL regimes. We have assessed the performance of the FSMC (using the most likely transition probabilities from HMM anlyzes) relative to the observed regime statistics such as the distributions of event durations and the probabilities of occurrence of very persistent nights (nights without SBL regime transitions), of any regime transitions, and of multiple subsequent transitions.
The nonstationary occurrence probabilities of very persistent nights as estimated from the HMM analyses cannot be accounted for in a FSMC. The occurrence of regime transitions is slightly overestimated by the FSMC. Transitions subsequent to preceding ones and the mean event durations in each regime are relatively close to the statistics as estimated with the HMM 5 across all sites and seasons. The recovery time between regime transitions, however, is not explainable by any two-regime FSMC.
By fixing the persistence probability matrix and producing new perturbed HMM regime sequences we have quantified the range of persistence probabilities that are consistent with the most likely HMM regime sequence. At all sites considered, we find that the HMM regime sequence varies only slightly for reasonable variations of transition probabilities.  A prototype of an explicitly stochastic parameterization is developed based on the following foundations. The explicitly 25 stochastic parameterization uses a new local variable S tracking the SBL regime (wSBL or vSBL). At each time step, the occurrence of a transition is determined randomly using the instantaneous state-dependent transition probabilities. If S is determined to be in the vSBL, episodes of enhanced turbulent mixing are added.
Experiments in an idealized single column model (SCM) confirm that such an approach provides a reasonable representation of SBL regime dynamics. VSBL to wSBL transitions are related to the occurrence of turbulence bursts, lagging these slightly. 30 The simulated responses of temperatures and wind speeds due to the enhanced heat and momentum fluxes towards the surface are comparable to observations. For both transitions, simulated recovery times are consistent with the observed distributions.
Emphasizing the fact that the explicitly stochastic parameterization presented here is only intended to be a proof of concept, preliminary results suggest that the parameterization has the potential to simulate SBL regime dynamics in weather and climate models. The observational information on climatological regime statistics, and event duration distributions (cf. AM19b, 35 AM19c, and the present study) can be used to tune the parameterization to generate the appropriate SBL regime variability. Due to the fact that event duration distributions and stratification-dependent transition probabilities are similar between the midlatitude tower sites, we believe that the transition process at those stations can be parameterized independent of the local complexity of the surface conditions (such as surface type, topography etc.). Although at DomeC similar stratificationdependent transition probabilities can be obtained, the altitude range used to determine stratification is different than at the other 5 sites, suggesting that a generalized parameterization has to take additional local state variables into account. Furthermore, even though a systematic behaviour of transition probabilities conditioned on Ri B across the different tower sites is absent, Ri B is a coarse approximation to Ri. Analyses of other data sets (with higher spatial and temporal resolution) allowing for better approximations or an estimation of the gradient Ri or Ri-flux are needed to determine if a systematic behaviour is truly absent.
As our model analysis only considers fixed surface and upper boundary conditions, sensitivity analyses to these conditions in 10 the idealized SCM as well as to different resolutions both in time and space must be assessed in terms of different observational case studies. Due to the fact that the first-order closure requires us to consider the effects of intermittent turbulence events as an enhancement of diffusivities for momentum and heat, we have imposed a rather synthetic space-time structure of these enhancements. As intermittent turbulence events are associated with the local enhancement of turbulence kinetic energy (TKE), our parameterization of episodically occurring turbulence bursts can more naturally be implemented in a prognostic TKE 15 scheme as an additional TKE source term (e.g He et al., 2012He et al., , 2019. Such an approach would allow the model to determine the space time structure of turbulence pulses as well as the interaction of turbulence bursts. In the future, we will implement the parameterization in a more complex SCM (with and without a prognostic TKE scheme) to obtain a more comprehensive assessment of its use in numerical weather prediction and climate models.
Finally, the parameterization requires further information regarding horizontal dependence of regime statistics, as it is not 20 reasonable to expect an entire large-scale weather or climate model grid box to always be in one or the other state. This horizontal dependence will be the subject of a future study. Assessment of the dependence length scales relative to the grid box size will allow the determination of the effects of spatial averaging to the gridbox scale on the probability distribution of SBL quantities.

25
In this appendix, we present the calculations of quantities based on 'freely-running' stationary Markov chains. Note that we introduce in the following equations the notation of P (i t−1 → i t ) instead of Q itit−1 (cf. equation 2) indicating the regime transition probabilities between two timesteps. We replace the mathematical notation of i ∈ {0, 1} for the regime occupation with the terms wSBL and vSBL in order to increase the readability.

A1 Calculation of very persistent regimes
The occurrence probability of very persistent SBL nights in a stationary Markov chain is calculated using the persistence probabilities of the Markov chain (i.e. P (wSBL → wSBL) and P (vSBL → vSBL)) as follows P r(wSBL|n) = π wSBL P (wSBL → wSBL) n , (A1) where π wSBL and π vSBL are respectively the initial climatological distributions of being in the wSBL or vSBL and n equals the length of the night in hours multiplied by six (for a data resolution of 10 min)

A2 Calculation of at least one particular SBL transition occurrence
The probability of the occurrence of a particular SBL transition in a night of duration n can be expressed in terms of the probability of the absence of any transitions and the probability of a single complementary transition. In the case of the wSBL 10 to vSBL transition, the single complementary transitions are from the vSBL to the wSBL. Naturally, the reverse is true for vSBL to wSBL transitions. That way we account for all possibilities that definitely do not have a transition of the desired type.
The probability of the occurrence of turbulence collapse is: The probability that a turbulence recovery event occurs after a turbulence collapse in a night of duration n time steps is equal to the sum of the probabilities of all events that include the occurrence of SBL patterns starting at time t 1 in the wSBL, and afterwards showing the sequence wSBL → t× vSBL → . . . → vSBL → wSBL with no further subsequent recovery events, i.e.
the SBL remains in the wSBL or have a maximum of one more collapse. The last part of this calculation assures that no double counting of sequences with length t occur as the probability calculation of being in the wSBL at time t 1 does not include information of the preceding path. The probability of a certain subsequent recovery pattern of length t can then be calculated where π is the vector of climatological initial probabilities.
To calculate the overall probability that such a subsequent event occurs is then the summation over all possible t: Equivalently, the probabilities of subsequent turbulence collapses after recovery events are To calculate the overall probability that such a subsequent event occurs is then the summation over all possible t: The idealized SCM is a model of pressure-driven flow in the dry SBL assuming horizontal homogeneity, described in detail in Holdsworth and Monahan (2019). The model equations follow those of Blackadar (1979): where the three state variables U (z, t), V (z, t) and T (z, t) are the zonal velocity, meridional velocity, and potential temperature. 10 The surface temperature T s is determined by the sum of radiative, turbulent sensible heat, and surface heat fluxes as described in more detail below. The constant C HL = 2 K h −1 represents the atmospheric cooling due to net long-wave radiative flux divergence and is set as a fixed constant for simplicity.
The geostrophic wind components are defined by with the magnitude of the geostrophic wind speed given by S g = (U 2 g + V 2 g ) 0.5 . The vertical heat flux H = ρc p w T and shear stresses τ x = −ρU w and τ y = −ρV w (where w is the vertical velocity) are parameterized using first order closure τ x /ρ = K m ∂ z U , τ y /ρ = K m ∂ z V and H/ρc p = −K H ∂ z T , where K m and K h are the diffusivities for respectively momentum and heat. The diffusivities are taken to be the sum of molecular and turbulent where the molecular contribution ν = 1.5 × 10 −5 m 2 s −1 is the kinematic viscosity. The molecular Prandtl number is fixed at Pr = 0.72 and λ = ν/Pr is the molecular diffusivity. The mixing length is given by with κ the von Kármán constant, u * friction velocity, C = 26 (Van Driest, 1951), and λ 0 = 0.00027S g /f 0 (Blackadar, 1962).
The stability functions f m,h (Ri), depend on the Richardson number Ri = g TREF ∂zT (∂zU ) 2 which are related to the similarity where ζ = z/L is the stability parameter and L = − u 2 * κ g Ts H0 cpρ is the Obukhov length. In our simulations, we use the Businger-Dyer formulation given by φ m,h (ζ) = 1 + βζ where β = 1/Ri c = 5.2 (Businger, 1988). 10 At the upper boundary of the model we impose the boundary condition that the flow is geostrophic and a no-flux condition so H = 0 and τ = 0. The lower boundary of the model domain is determined by the roughness length (assumed to be the same for momentum and energy) with no-slip boundary conditions U (z 0 ) = 0 and T (z 0 , t) = T s (t).
The model implements the surface energy scheme of Blackadar (1976), known as the force-restore method. The surface, represented as an infinitesimally thin layer with temperature T s (t) at z = z 0 , is forced by the net radiation and sensible heat flux 15 and restored to the subsurface temperature through the subsurface energy fluxes. The damping depth of the diurnal forcing d = (2λ s /C s ω) 0.5 , where C s = ρ s c s is the volumetric heat capacity, is associated with a sinusoidal diurnal forcing. The temperature at this depth is set as the subsurface temperature T d = 281 K. In Eq. (B4), C 1 = 2/(0.95C s d) and C 2 = 1.18(2π/T d ). The first two terms in Eq. (B4) constitute the net long-wave radiation Q n , the third term is the sensible heat flux into the atmosphere due to turbulent transports H 0 , and the fourth term is the heat flux into the subsurface G. As our focus is on the stably stratified 20 boundary layer we do not include the effects of albedo or latent heat in the heat budget. We also neglect the effects of the vegetation canopy.
The downwelling longwave radiation is given by where Q c is the cloud fraction, Q a is the specific humidity and T a is the atmospheric temperature at a reference level z a just 25 above the Earth surface (Staley and Jurica, 1972;Deardorff, 1978). For simplicity, Q a is held constant at 0.003 kg kg −1 .
The equations are integrated in time using a fourth order Runge-Kutte method. The spatial discretization is obtained using finite differences on a logarithmic grid. This grid has 50 vertical levels with a much finer resolution in the boundary layer than aloft and is determined by z j = ∆z 0 We define t = 0 as the time when the shortwave radiation goes to zero acknowledging the fact that observations indicate that the onset of the SBL can occur before this time (van Hooijdonk et al., 2017;van de Wiel et al., 2017, AM19a). The initial conditions were set in accordance with the logarithmic equations that arise from MOST. The near-neutral profiles for temperature and wind used to initialize the model are given by where U ext = U g κ/ ln(h/z 0 ), V ext = V g κ/ ln(h/z 0 ) and θ ext = 0.01 K ( Monin and Obukhov, 1954). For simplicity, we set ∂p ∂y = 0 in all of our simulations, so U 0 is identically zero at the start of the simulation.
Competing interests. The authors declare that they have no conflict of interest.

10
Acknowledgements. We would like to thank a number of individuals and institutes for their willingness to share their tower data which were indispensable in carrying out this extensive comparison of SBL structures at different location sites. Our acknowledgements are presented in the order that the tower stations were presented in the paper but we are equally thankful to all.    Figure 13. As in Figure 11, but for the probabilities of the occurrence of turbulence recovery events subsequent to turbulence collapse (left panels) and turbulence collapse events subsequent to turbulence recovery events (right panels).     Θ4 −Θs, and Θ100 −Θ30 for respectively land-, glacial-, and sea-based tower sites; binned by 0.2 K increments) and best-fit curves. In order to reduce sampling variability in those panels, only data are considered for which the regime occupation probability in a bin exceeds 0.1 % of all data within that regime. Third row: Parameterization of the state-dependent transition probabilities conditioned on stratification using the mean and median parameter sets of the curve fits (respectively solid and dotted black lines). The best-fit estimated through all stratification data is displayed in red. Fourth row: Root mean square error (RMSE) between the conditional transition probabilities as estimated from HMM analyses and the parameterized conditional transition probabilities. All transition probabilities have been normalized to 10 minute intervals.   Figure 110. One realization of a twelve hour simulation of the evolution of the nocturnal boundary layer (with time zero being the time the net energy surface flux becomes negative) using the proposed parameterization. Times when S is in the vSBL are highlighted in grey. Top row from left to right: RiB (solid and dotted black lines respectively for the reference experiment without the stochastic parameterization and the experiment with the stochastic parameterization) and the strength of the stochastic forcing (blue line; left); the structure of the stochastic forcing function (middle), and the resulting kinematic heat fluxes (right). Middle row from left to right: the stratification between different levels (left), the temperature profiles (middle), and the difference in the temperature field to the reference experiment without the stochastic parameterization (right). Third row from left to right: Wind speeds at different heights (left), wind speed profiles (middle) and the difference in the wind field to the reference experiment without the stochastic parameterization (right). Table 11. Basic information about the different land-, glacial-, and sea-based tower sites (geographical coordinates, time resolution) and their individual reference HMM state variable inputs Y (wind speeds W h and static stabilities ∆Θ with their observational levels h) and reference transition probability matrices (Q ref ) of HMM analyses estimated from Y. Starting regimes for the transition probability matrices are denoted with a star. Transition probability matrices at Hamburg, Los Alamos, and DomeC are transformed to a 10 minute time resolution, so a direct comparison to other sites is possible. To retrieve original transition probability matrices at these sites the 1/10, 3/2, and 3 matrix powers (respectively) must be taken.

Tower site
Reference state variables Q ref References   Table 14. Parameter values for the state-dependent parametric probability transition functions conditioned on stratification (Θ100 − Θs, Θ4 − Θs, and Θ100 − Θ30 for respectively land-, glacial-, and sea-based sites) and the RMSE between parameterized values and those obtained from estimations of HMM analyses. The mean and median values of the parameters are stated below together with the best fit approximation through all data (averaged).