Probabilistic downscaling of precipitation data in a subtropical mountain area: a two-step approach

In this study, a two-step probabilistic downscaling approach is introduced and evaluated. As an example, the method is applied to precipitation observations in the subtropical mountain environment of the High Atlas in Morocco. The challenge is to deal with complex terrain, heavily skewed precipitation distributions and data that are sparse in both space and time. In the first step of the approach, a transfer function between distributions of large-scale predictors and of local observations is derived. The aim is to forecast cumulative distribution functions with parameters from known data. In order to interpolate between sites, the second step applies multiple linear regression to distribution parameters of observed data using local topographic information. By combining both steps, a prediction at every point of the investigation area is achieved. Both steps and their combination are assessed by cross-validation and by splitting the available dataset into a training and a validation subset. Judged by the estimated quantiles and probabilities of zero daily precipitation, this approach is found to be adequate for application even in areas with difficult topographic circumstances and low data availability.


Introduction
Downscaling of climate data is an important issue in order to obtain high-resolution data desired for most applications in meteorology and hydrology and to gain a better understanding of local climate variability (e.g. Maraun et al., 2010). Although climate data is often required with high resolution, most datasets are provided as gridded data (General Circulation Model (GCM) output or reanalysis data) or sometimes sparsely distributed observations at weather stations.
Correspondence to: R. Haas (rhaas@meteo.uni-koeln.de)
Currently available GCM datasets have a spatial resolution of typically 150-300 km (Meehl et al., 2007) and reanalysis datasets of the order of 100-200 km (Kistler et al., 2001; Uppala et al., 2005). Due to their low spatial resolution, such GCM and reanalysis datasets are missing necessary information on the actual regional characteristics, e.g. to reproduce effects of strong height gradients or complex topography like in mountainous areas (Wilby et al., 2004).
To obtain the desired resolution for impact studies, information from large scale has to be transferred to local scale. Further, it is desirable to extrapolate point climate observations to a regular grid. Several different downscaling techniques have been developed in recent decades (for reviews see e.g. Fowler et al., 2007; Maraun et al., 2010; Wilby et al., 1998; Wilby and Wigley, 1997). They may be roughly divided into dynamical and statistical approaches. Some studies also present a combination of both approaches (e.g. Fuentes and Heimann, 2000; Pinto et al., 2010). Dynamical downscaling uses in most cases Regional Climate Models (RCMs) with a higher spatial resolution nested into GCMs. These have typical resolutions of 10-50 km, sometimes down to 2-3 km, and focus on the region of interest. Statistical downscaling uses the fact that local climate is influenced by global climate and local surface characteristics. The relation between those two factors is described by statistical functions, which can be obtained e.g. by regression. Advantages over dynamical downscaling are low computational costs and straightforward adjustment of approaches to new regions or variables. In general, statistical downscaling techniques relate large-scale variables, called predictors, and local variables, called predictands. The different methods are usually categorized into the three main groups "weather classification", "regression models" and "weather generators", which can be used both separately and combined.
R. Haas and K. Born: Probabilistic downscaling of precipitation data
The application of weather classification is helpful to group days, especially if the chosen variable is discontinuous in space and time. Wilby (1994) obtains coherent results for mean rainfall characteristics in southern England with weather classification techniques. For our investigation area, a statistical dynamical approach considering circulation weather types is presented in Hübener and Kerschgens (2007).
Linear regression models perform well for continuous variables like temperature. Huth (1999) combines multiple linear regression (MLR) with principal component analysis (PCA) and utilises three kinds of model selection: stepwise regression, full regression and point-wise regression. The model using full regression performs best in reproducing daily mean temperatures in central Europe. Kidson and Thompson (1998) also apply a form of screening regression by choosing a maximum of five predictors that are taken into account for the regression equations. Several other studies investigate the performance of regression techniques with good results for temperature and show the need for improvement for precipitation (e.g. Kim et al., 1984; Wilks, 1989). A critical point for these regression methods is that, for climate change studies, future predictors have to be inserted into a regression model whose parameters were obtained from historical data, even though global-local scale transfer relations may not be stationary in future.
In general, weather generators are stochastic models that produce time series of local variables from statistical characteristics. They are mostly used in combination with other techniques. In a Hidden Markov Model (Bates et al., 1998), for example, a weather generator is combined with weather classification. A combination with regression techniques is useful to disaggregate monthly or daily information or if the time series of observations is not long enough. Bürger (1996) introduces so-called expanded downscaling. The approach is similar to linear statistical downscaling, but links the covariances of local climate and global circulation instead of their anomalies. It operates on the global covariance and outputs a local covariance, so that expanded downscaling can be used to generate local weather scenarios and is consequently a combination of a regression model and a weather generator. The study shows that expanded downscaling replicates the variability of temperature and precipitation closer to the observations than a simple linear regression model. The advantages are most apparent for precipitation sums.
An alternative to the above-mentioned pure regression approaches is probabilistic downscaling, where probability distributions instead of the time series themselves are considered. Bremnes (2003) outlines such a method for precipitation, where a common strategy is to divide the probability distribution into one function for the precipitation occurrence and one for the precipitation amounts. To relate large-scale predictors and local climate observations, Michelangeli et al. (2009) suggest a transformation between the cumulative distribution functions (CDFs) for recent and future climate conditions. The assumption of stationarity is a key point in statistical downscaling (Wilby et al., 2004; Maraun et al., 2010) and has to be made if climate model output is included. The above-mentioned problem of static transfer relations between global and local scale climate parameters is not solved by probabilistic downscaling, but it is reduced because the link between global and estimated local variables is not restricted to a (generalized) linear functional relationship. The influence of this assumption can be validated by replacing climate projections with reanalysis data, splitting the available data into two subsets and comparing the corresponding transfer functions.
The objective of this work is to develop an approach suitable for downscaling precipitation data in a subtropical mountain environment, in particular that of the High Atlas in Morocco. For this region, precipitation variability, teleconnections and the recurrence of dry and wet periods have been analysed on a larger spatial scale (Born et al., 2008, 2010). In the present study, the challenge and the aim are to make a statement on spatial distributions of precipitation characteristics in areas with difficult local conditions. The problems to be handled are skewed precipitation distributions, strong topographic gradients and a small amount of data. So far, only a few studies have dealt with the effects of topography on daily precipitation in Africa (e.g. Hewitson and Crane, 1996). But particularly in this area, knowledge of the available fresh water resources and their future development is an important issue for agriculture and policy makers (Speth and Fink, 2010).
A CDF transformation, similar to Michelangeli et al. (2009) and based on probability mapping, is used in a first step to work out a transfer function between large-scale reanalyses and observations for each given test site. For this purpose, a theoretical CDF, e.g. Weibull, is fitted to the empirical cumulative distributions of daily precipitation amounts. In this study, the data consist of observations from eleven weather stations and ERA-Interim reanalyses. In a second step, we combine this approach with MLR applied to the estimated parameters of the theoretical distribution. This two-step approach is then used to estimate precipitation distributions at every point of the investigation area, taking into account local topographic information and large-scale distribution parameters.
Details on the investigation area and data sources are given in Sect. 2. In Sect. 3 the linking of model data, observations and topographic data is explained. Validation and application results of the two-step approach are presented in Sect. 4. In Sect. 5 the probability of zero daily precipitation is investigated. A short summary and conclusions finish this paper.

Investigation area and data
Precipitation observations used in this study were collected within the GLOWA project "IMPETUS West Africa" (for a comprehensive overview of the project see Speth et al., 2010). The considered area is characterized by strong NW-SE gradients of both altitude and precipitation. The locations of the stations and the topography are shown in Table 1 and Fig. 1. The stations are located in the catchment of the river Drâa (Arabic: Oued Drâa) in south-eastern Morocco. The area is characterized as semi-arid, has a size of 28 428 km² and contains parts of the alpine High Atlas Mountains in the north and the arid Saharan borders at the foothills. The altitude of the test sites ranges from 445 m (Lac Iriki) to 3850 m (M'Goun).
The precipitation in the mountains feeds the reservoir El-Mansour-Eddahbi, which is the fresh water source of six river oases downstream, though only two are permanent throughout the year (Schulz and Judex, 2008, chapter 3). At the three highest test sites, the measured rain has to be supplemented by snow precipitation, which is derived from snow heights.
The records cover a period of about eight years (16 November 2000 to 1 November 2008), with failure rates within this period from 7% to 22%. The gaps were caused by shorter measuring periods at several test sites and by operational failures. The availability of daily precipitation values is illustrated in Fig. 2. The data matrix contains 1201 days for which the reports of all sites are complete. This number can be

.3).
A large number of observations is important, as the precipitation distribution is only valid for days with precipitation occurrence, and many days without precipitation cannot be used for the parameter estimation. Further, the period should be long enough to split it into a training and a validation subset.
As surface data, a digital elevation model (DEM) with a resolution of 1 km × 1 km is used. It is based on the Space Shuttle Radar Topography Mission (SRTM) in February 2000 (Farr et al., 2007). "Total precipitation" of ERA-Interim reanalyses is accumulated daily and represents large-scale data with a resolution of 0.5° × 0.5° (Berrisford et al., 2009). To allow for a comparison between these data and the observations, the reanalysis amounts are bilinearly interpolated to the test sites.
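The bilinear interpolation of a gridded daily total to a station location can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `bilinear`, the grid coordinates and the precipitation values are all hypothetical.

```python
import numpy as np

def bilinear(lon, lat, lons, lats, field):
    """Bilinearly interpolate field[i, j] (given at lats[i], lons[j]) to one point.

    Assumes lons and lats are 1-D, strictly ascending, and that the point
    lies inside the grid.
    """
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])   # weight in x, 0..1
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])   # weight in y, 0..1
    south = (1 - tx) * field[i, j] + tx * field[i, j + 1]
    north = (1 - tx) * field[i + 1, j] + tx * field[i + 1, j + 1]
    return (1 - ty) * south + ty * north

# Hypothetical 0.5-degree grid and daily totals in mm (made-up values).
lons = np.array([-7.0, -6.5, -6.0])
lats = np.array([30.0, 30.5, 31.0])
field = np.array([[0.0, 0.0, 1.0],
                  [0.0, 4.0, 2.0],
                  [3.0, 6.0, 0.0]])
site_value = bilinear(-6.7, 30.3, lons, lats, field)
```

In this made-up case three of the four surrounding grid points are dry, yet the interpolated amount is small but non-zero. This is one way in which interpolation can increase the frequency of small amounts at a site.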

The two-step approach
The approach presented in this section links two problems: (i) downscaling of large-scale reanalyses to the sites and (ii) interpolation or extrapolation of observational data to a high-resolution grid using an elevation model of the investigation area. To solve these problems we combine probability mapping and MLR. We use a probabilistic approach, which means that statistical precipitation characteristics are used as predictands instead of absolute rainfall amounts.

Estimation of the Weibull distribution parameters
As a preliminary step, the empirical precipitation distribution for values greater than 0 mm precipitation amount is fitted by a theoretical distribution. In this study, the cumulative Weibull distribution is chosen. The advantage is that the estimation of the parameters is simple and straightforward (see Zhang et al.). To estimate the parameters, the distribution can be transformed so that the converted values can be fitted by a linear regression (see Appendix A1). Figure 3 shows an example for the estimation of the parameters for observations and bilinearly interpolated reanalyses at test site Bou Skour. The linear form of the curves, at least for precipitation values above the lowest observation of 0.1 mm, indicates that the Weibull distribution allows for a sufficient fit of the data. This is verified by a Kolmogorov-Smirnov goodness-of-fit test, where the estimated Weibull distribution is only rejected for observations at Tichki at the 5% significance level and for reanalyses interpolated to Trab Labied at the 5% and 1% significance levels (see Table 2). The different slopes result from the fact that the gridded precipitation, which has to be interpreted as a grid-box average, shifts the rainfall distribution slightly towards a distribution with higher frequencies of small rainfall values. Additionally, the tendency to smaller amounts is increased by the bilinear interpolation. It may be concluded that the left tail of the reanalysis rainfall data distribution, which contains very small values (below the lowest observation limit), is not adequately represented by a Weibull function. But this is not problematic here for two reasons: on the one hand, we are mainly interested in "observable" rainfall amounts; on the other hand, the different slopes should be accounted for by the transfer function (see Sect. 3.2).
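The linearised parameter estimation can be illustrated with a short sketch. Appendix A1 is not reproduced here, so the following assumes the common two-parameter form F(x) = 1 − exp(−(x/β)^α), for which ln(−ln(1 − F)) = α ln x − α ln β is a straight line with slope m = α and intercept b = −α ln β. The plotting position i/(n+1), the function name and the synthetic sample are assumptions made for illustration.

```python
import numpy as np

def fit_weibull_linear(amounts):
    """Estimate Weibull shape (alpha) and scale (beta) from wet-day amounts
    by ordinary least squares on the linearised CDF."""
    x = np.sort(np.asarray(amounts, dtype=float))
    x = x[x > 0.0]                        # precipitation days only
    n = x.size
    # Plotting positions i/(n+1) keep the empirical CDF strictly inside (0, 1).
    F = np.arange(1, n + 1) / (n + 1.0)
    u = np.log(x)                         # abscissa of the Weibull plot
    v = np.log(-np.log(1.0 - F))          # ordinate of the Weibull plot
    m, b = np.polyfit(u, v, 1)            # straight line v = m * u + b
    alpha = m                             # slope is the shape parameter
    beta = np.exp(-b / m)                 # intercept b = -alpha * ln(beta)
    return alpha, beta, m, b

# Synthetic wet-day amounts with known parameters (alpha = 0.8, beta = 4 mm).
rng = np.random.default_rng(0)
sample = 4.0 * rng.weibull(0.8, size=2000)
alpha, beta, m, b = fit_weibull_linear(sample)
```

A clearly curved point cloud in the (u, v) plane would indicate that the Weibull form does not describe the sample well.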

Transfer function between model data and observations
Climate change studies on a local scale usually have to consider three known distributions: one for historical observations at a certain location and two for climate model data. In this study and for validation of the method, future projections are replaced by reanalysis data. ERA-Interim daily precipitation totals are bilinearly interpolated to the sites. In order to allow for the validation of the method, both series, observations and interpolated reanalyses, are split into a training and a validation subset. To estimate the distribution for the validation subset at the sites, a new precipitation dataset at the sites ($x_{SV}$, S = station data, V = validation subset) is estimated based on Michelangeli et al. (2009). An equal probability mapping (see Appendix A3) is applied to define a transfer function between the distributions of reanalyses and of observations. It takes into account the change of the large-scale distribution from the training to the validation subset. A common strategy in statistical downscaling is to assume that the relationships between predictors and predictands remain constant for periods outside the fitting period (Wilby et al., 2004). This assumption is adopted here so that the transfer function can be applied to the model dataset of the validation subset ($x_{RV}$, R = reanalysis data, V = validation subset). In the subsequent validation, attention has to be paid to the impacts of this strategy. The distribution parameters of the estimated values can be estimated as described in the subsection above. The advantage of this method is that MLR can be applied directly to the newly estimated distribution parameters.
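A minimal sketch of such a CDF transformation is given below. Appendix A3 is not reproduced here, so the sketch assumes the composition used by Michelangeli et al. (2009), F_SV(x) = F_ST(F_RT^{-1}(F_RV(x))), with T denoting the training subset, and it assumes Weibull CDFs of the form 1 − exp(−(x/β)^α). All function names and parameter values are hypothetical.

```python
import numpy as np

def weibull_cdf(x, alpha, beta):
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-(x / beta) ** alpha)

def weibull_ppf(p, alpha, beta):
    """Inverse of weibull_cdf (quantile function)."""
    p = np.asarray(p, dtype=float)
    return beta * (-np.log(1.0 - p)) ** (1.0 / alpha)

def cdf_transform(x, par_ST, par_RT, par_RV):
    """Estimate the station CDF of the validation subset at amounts x.

    par_ST: (alpha, beta) of station data, training subset
    par_RT: (alpha, beta) of reanalysis data, training subset
    par_RV: (alpha, beta) of reanalysis data, validation subset
    """
    p = weibull_cdf(x, *par_RV)          # probability in the validation reanalysis
    x_rt = weibull_ppf(p, *par_RT)       # amount with equal probability in training
    return weibull_cdf(x_rt, *par_ST)    # mapped through the station training CDF

# Hypothetical (alpha, beta) pairs; beta in mm.
x = np.linspace(0.1, 30.0, 50)
F_SV = cdf_transform(x, par_ST=(0.9, 5.0), par_RT=(0.7, 2.0), par_RV=(0.7, 2.5))
```

If the reanalysis distribution does not change between the two subsets, the estimate reduces to the station CDF of the training subset, which is the behaviour expected under the stationarity assumption.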

Multiple linear regression
After using the reanalysis data as large-scale predictor to estimate the precipitation distribution at the stations for the validation subset, the aim is to enhance the resolution of the data. For this purpose, a transfer function between dependent variable (predictand) and explanatory variables (predictors) is estimated by MLR (see Appendix A4). According to the probabilistic approach, the Weibull parameters are used as predictands instead of observed rainfall amounts (except for the data enlargement). As possible predictors, height, longitude, latitude and the gradients of height in east-west and north-south direction are investigated. To avoid overfitting of the model, a predictor analysis is carried out. We chose forward selection with the corrected coefficient of determination as criterion. In contrast to the uncorrected form, it takes into account the degrees of freedom to avoid an increase with every additional predictor of the model (see Appendix A5). For validation, MLR is first investigated separately by leave-one-out cross-validation regarding only Weibull parameters fitted to observations. Additionally, the goodness-of-fit test mentioned in Sect. 3.1 is repeated (see also Sect. 4.1).
So far, CDF transformation and MLR are only used separately. We combine both approaches in this study to model the precipitation distribution of a validation subset at any point of the investigation area. The combined approach is validated by means of the two subsets. It has to be remarked that the order of the two steps is arbitrary and has to be investigated.
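Forward selection with the corrected (adjusted) coefficient of determination can be sketched as follows. Appendix A5 is not reproduced here, so the standard definition 1 − (1 − R²)(n − 1)/(n − k − 1) is assumed, with n samples and k predictors. The predictor names follow the text, while the data are synthetic and the function names are illustrative.

```python
import numpy as np

def adjusted_r2(y, X):
    """Adjusted R^2 of a least-squares fit of y on the columns of X plus intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def forward_selection(y, X, names):
    """Add one predictor at a time while the adjusted R^2 keeps increasing."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    # Keep at least one residual degree of freedom (n - k - 1 >= 1).
    while remaining and len(selected) < y.size - 2:
        score, j = max((adjusted_r2(y, X[:, selected + [j]]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best

# Synthetic example: the predictand depends on height and the east-west gradient only.
rng = np.random.default_rng(1)
names = ["height", "longitude", "latitude", "grad_ew", "grad_ns"]
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=40)
chosen, score = forward_selection(y, X, names)
```

With only eleven stations, as in this study, at most nine predictors could enter before the degrees of freedom are exhausted, which is why a selection criterion that penalises additional predictors matters.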

Logistic regression
The occurrence of precipitation is modelled by logistic regression (Chandler and Wheater, 2002). For this purpose, the probability of zero daily precipitation ($p_0$) is calculated for observations and reanalyses interpolated to the test site locations. According to the transfer function between reanalyses and observations (Sect. 3.2), a ratio $t$ between $p_0$ of the observations and $p_0$ of the reanalysis data is calculated for the training subset. To avoid estimated values of $p_0$ lower than 0 or greater than 1, it is replaced by a so-called logit, $\ln\frac{p_0}{1-p_0}$. The probability of zero daily precipitation at the test sites in the validation subset can be estimated by the calculated ratio and the logit of the validation-subset reanalyses. For details see Appendix A6. The estimated logit is then used as predictand in the regression model (see Eq. A11).
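The effect of regressing on the logit instead of on the probability itself can be shown with a small sketch. The station heights and probabilities below are invented for illustration, a single predictor is used, and the transfer ratio described above is left out because its exact form is given in Appendix A6, which is not reproduced here.

```python
import numpy as np

def logit(p):
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def inv_logit(z):
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical site heights (km) and probabilities of zero daily precipitation.
height = np.array([0.5, 1.0, 1.5, 2.5, 3.5])
p0 = np.array([0.97, 0.95, 0.92, 0.85, 0.75])
A = np.column_stack([np.ones_like(height), height])

# Regression on the logit: predictions are mapped back into (0, 1).
c_logit, *_ = np.linalg.lstsq(A, logit(p0), rcond=None)
# Regression on the raw probability, for comparison only.
c_lin, *_ = np.linalg.lstsq(A, p0, rcond=None)

h_new = np.array([0.0, 2.0, 6.0])            # includes points outside the data range
A_new = np.column_stack([np.ones_like(h_new), h_new])
p0_hat = inv_logit(A_new @ c_logit)          # always a valid probability
p0_lin = A_new @ c_lin                       # may leave the interval [0, 1]
```

In this made-up example the direct linear fit returns a value slightly above 1 at zero height, whereas the back-transformed logit estimates remain valid probabilities at all three points.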

Estimation of precipitation distributions
For the following analyses, precipitation distributions $F(x)$ are estimated from precipitation days only. Thus, the complete CDF is $H(x) = p_0 + (1 - p_0)F(x)$, where $p_0$ is the probability of zero daily precipitation.
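The complete CDF and its quantiles can be evaluated as in the following sketch, which assumes a Weibull form 1 − exp(−(x/β)^α) for the wet-day distribution. The function names and parameter values are hypothetical.

```python
import numpy as np

def mixed_cdf(x, p0, alpha, beta):
    """Complete CDF: point mass p0 at zero plus the wet-day Weibull CDF."""
    x = np.asarray(x, dtype=float)
    F = 1.0 - np.exp(-(np.maximum(x, 0.0) / beta) ** alpha)
    return np.where(x < 0.0, 0.0, p0 + (1.0 - p0) * F)

def mixed_quantile(q, p0, alpha, beta):
    """Quantile of the complete CDF for probabilities 0 <= q < 1."""
    q = np.asarray(q, dtype=float)
    # Probabilities up to p0 are covered by dry days, so the quantile is 0 mm.
    q_wet = np.clip((q - p0) / (1.0 - p0), 0.0, None)
    return beta * (-np.log(1.0 - q_wet)) ** (1.0 / alpha)

# Hypothetical site: 90% dry days, wet-day parameters alpha = 0.8, beta = 4 mm.
p0, alpha, beta = 0.9, 0.8, 4.0
q = np.array([0.5, 0.9, 0.95, 0.99])
amounts = mixed_quantile(q, p0, alpha, beta)
```

At a site where most days are dry, the median and even the 90th percentile of all days are 0 mm, which is one reason to estimate the wet-day distribution and the probability of zero precipitation separately.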

Validation
For validation, the two parts of the approach are first examined separately. Only winter months (December to February) are considered in this study because most of the precipitation occurs during these months. This leads to a sample size of 593 days including days without precipitation. The approach may also be applied to data of the other months, but the errors increase with decreasing number of precipitation days.
www.nonlin-processes-geophys.net/18/223/2011/ Nonlin. Processes Geophys., 18, 223-234, 2011
To analyse the skill of the CDF transformation and especially the accuracy of the stationarity assumption, the available time period of winter values is split into a training and a validation subset. One possibility is to divide the time period into first and second subset, thus into two continuous subsets. Another possibility is to divide it according to even and odd entry numbers, which means that every second value is used for validation. The different behaviours of the CDFs resulting from the two possibilities are illustrated for station Bou Skour in Fig. 4. The difference between first and second subset is larger for reanalyses than for observations (left plot). As a result, if the first subset is given (blue solid), the second subset is underestimated (red dashed). The first subset (red solid) estimated from the second subset (blue dashed) is overestimated. In terms of the assumed static transfer functions, this means that the one determined from the first subset is steeper than the one from the second subset. The CDFs of reanalyses split by even and odd entry numbers are nearly identical (right plot). The CDFs of observations differ for small amounts and are similar for larger amounts. Therefore, the estimated even CDF (solid red) is nearly equal to the given odd CDF (blue dashed) and vice versa. Nevertheless, the estimates are better than for the first mentioned splitting due to the high conformance for large amounts. At the high-altitude test sites, both validation methods deliver good estimates owing to the larger number of values greater than 0 mm. The quantiles estimated by CDF transformation are plotted against the original ones in Fig. 5 (left). The above-mentioned over- and underestimations are obvious. The deviations from the optimal diagonal grow with higher amounts in the case of the two continuous subsets and are roughly constant for the two other subsets.
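The two splitting strategies can be written down in a few lines. The sketch below assumes 1-based entry numbers for the even/odd split and uses a deliberately simple synthetic series with a trend, so that the different behaviour of the two splits is easy to see. The function names are illustrative.

```python
import numpy as np

def split_halves(x):
    """Two continuous subsets: first half and second half of the series."""
    half = len(x) // 2
    return x[:half], x[half:]

def split_even_odd(x):
    """Interleaved subsets by 1-based entry number: (even entries, odd entries)."""
    return x[1::2], x[0::2]

# Synthetic series with a steady trend (a stand-in for non-stationary data).
series = np.linspace(0.0, 1.0, 100)
first, second = split_halves(series)
even, odd = split_even_odd(series)

gap_halves = abs(first.mean() - second.mean())
gap_even_odd = abs(even.mean() - odd.mean())
```

For a series with a trend, the two halves differ strongly in their mean while the interleaved subsets are almost identical. This is consistent with the more similar CDFs found for the even/odd split, and it also means that the even/odd split is a weaker test of the stationarity assumption.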
The estimated MLR model is tested separately by cross-validation. For this purpose, two subsets are built: one for slope m and one for axis intercept b, determined as explained in Sect. 3.1 from all available winter observations. Each includes ten of the eleven test sites. Afterwards, the best MLR model for m and the best MLR model for b are detected separately. The coefficients of the eleventh left out test site  2). As can be seen in Fig. 5 (right), the result is very encouraging for Bou Skour, as the quantiles lie nearly on the optimal diagonal. The skill at the other test sites is also satisfactory, but the deviations from the optimal diagonal are larger. These results show that MLR can be used without the first step and thus without model data, but in that case it would only be an interpolation method and limited to observations.
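Leave-one-out cross-validation of a regression model for one distribution parameter can be sketched as follows. For brevity, the sketch keeps the set of predictors fixed, whereas the procedure described above searches for the best model again for each set of ten sites. The data and function name are synthetic and illustrative.

```python
import numpy as np

def loo_predictions(y, X):
    """Predict each site from a regression fitted to all other sites."""
    n = y.size
    A = np.column_stack([np.ones(n), X])        # intercept plus predictors
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                # drop site i from the fit
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        pred[i] = A[i] @ coef                   # predict the left-out site
    return pred

# Synthetic example with eleven sites and two predictors (e.g. height, latitude).
rng = np.random.default_rng(2)
X = rng.normal(size=(11, 2))
m_true = 0.8 + 0.1 * X[:, 0] - 0.05 * X[:, 1]   # exactly linear "slope" parameter
m_pred = loo_predictions(m_true, X)
rmse = float(np.sqrt(np.mean((m_pred - m_true) ** 2)))
```

Because the synthetic parameter is exactly linear in the predictors, the left-out values are recovered almost perfectly. With real station data the size of this error indicates how well the regression transfers to locations without observations.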
For the combination of methods, the order of the application of CDF transformation and MLR has an influence. The skill differs from test site to test site and from subset to subset. However, no systematic pattern emerges, so neither order can be clearly identified as the better one. For the following applications to the entire investigation area, first CDF transformation and then MLR is used. The advantage of this order is its lower computational cost, because CDF transformation is not applied to every grid point of the DEM, but only to the eleven test sites.
The estimated quantiles for Bou Skour are shown in Fig. 6; the results of the other stations are summarized in Table 3. At some test sites, the deviation from the perfect diagonal is even smaller than for the CDF transformation alone. These results are compared to those of a simple nearest neighbours approach and those from MLR with reanalysis data only (not shown). In both cases observations are disregarded as training dataset, so that quantiles are underestimated and results are worse than those delivered by the combined approach.

Application
For the application of the approach to the entire investigation area, the time series is again split after even and odd entry numbers.Winter values of all test sites are taken into account for the estimation of the regression model.
The resulting quantiles (25th, 50th and 75th) are shown in Fig. 7. The estimated patterns show a good agreement with the original quantiles at the stations, marked in the circles. The quantiles estimated for both subsets are of a similar order of magnitude, but the plots for the two subsets have slightly different structures according to the selected predictors. Interestingly, the longitude has a large influence on the first subset. Within the shown area, a low gradient is visible from west to east for the low quantiles and from east to west for the high quantiles. This is caused by a west-east gradient of the parameter α and an east-west gradient of the parameter β. Thus, the gradient of the quantiles becomes stronger the farther the data is extrapolated away from the test sites. This weakness in extrapolating data is caused by the north-south arrangement of the test sites and the resulting lack of observations in east-west direction. For the second subset, the height and the gradients of height are more influential, so that the topographic texture emerges more clearly. Keeping in mind the splitting by even and odd entry numbers, the subsets should be well mixed and have similar statistical characteristics. Hence, the differences can be traced back to the small sample size and should disappear if the sample size is large enough. A larger spatial expansion of available observations can be expected to improve results, especially with respect to different weights of predictors in different subsets. In any case, this highlights the limits of the method with respect to extrapolating data.

Probability of zero daily precipitation
The probability of zero daily precipitation ($p_0$) is calculated for the observations and the reanalyses. Both values are considered at the sites. For the calculation of $p_{0,\mathrm{reanalyses}}$ two procedures are possible: (i) amounts are first interpolated to the test sites and then $p_0$ is calculated; or (ii) $p_0$ is calculated at grid points and then interpolated. To avoid negative estimated values of $p_0$ or values greater than 1, logistic regression is used. The relation between reanalysis and observational data is included by a factor calculated from the ratio between the probabilities of both datasets. For validation, the time series is split by even and odd entry numbers and cross-validation is applied.
In Fig. 8, the original probabilities are plotted against the estimations. The estimation works well for test sites with a low number of precipitation days, i.e. the low-altitude test sites. At the higher located sites, $p_0$ is strongly overestimated (Tichki) or underestimated (M'Goun and Tizin-Tounza). This is caused by the MLR and only slightly strengthened by first applying the transfer factor. Regarding the interpolation procedure, interpolating reanalysis data first and then calculating $p_0$ achieves the best results (compare Table 4). Therefore, this order is used for the following investigations.
The results for the entire investigation area are shown in Fig. 9. As can be seen, the agreement between original and estimated probabilities at the three highest sites is much better than the results with cross-validation because the regression model is trained with eleven instead of ten predictands. The lower situated stations also show good estimations for the first subset of the observations (even entry numbers, left). In the second subset (odd entry numbers, right), $p_0$ is slightly underestimated at the stations Bou Skour and Trab Labied. In this case, strong gradients of $p_0$ occur if data is extrapolated west of the shown area, similar to the quantiles. This could also result from the lack of observations in this region.

Summary and conclusions
In this study, a two-step probabilistic approach for downscaling of precipitation data has been proposed and evaluated in a subtropical mountain environment, namely the upper and middle Drâa catchment south of the High Atlas in Morocco. The method combines CDF transformation and MLR to relate point climate observations, reanalysis data and high-resolution surface data. The cumulative Weibull distribution is examined as theoretical distribution by a hypothesis test. The approach is validated by cross-validation and by splitting the time series into a training and a validation subset. The results show that the best choice is to split the time series according to even and odd entry numbers, rather than selecting two separate continuous subsets. The favoured combination order is first applying CDF transformation and then MLR, as computational costs are considerably lower in this order. The Q-Q plots document the good agreement between the quantiles of the calculated distributions and of the estimated distributions. This is valid for CDF transformation and MLR themselves and also for the combination of both steps. For the estimation of the probability of zero daily precipitation, reanalyses are interpolated to the site locations and then $p_0$ is calculated and related to the local surface data by MLR. The differences between calculated and estimated probabilities are low for test sites with a low number of precipitation days. However, the approach performs less well for the three highest test sites if the model is trained only with ten predictands. Comparing the performance of the approach for interpolation vs. extrapolation, the results for the former are very encouraging, in particular if longitude is not used as a predictor (Fig. 7). On the other hand, results for extrapolation are comparatively worse: for example, some unreliable values appear west of the test sites where no stations are available. Taking into consideration the small amount of available data, both temporal and spatial, we consider the approach to be appropriate for areas with complex topography.
In this study, we have chosen a linear regression approach as statistical downscaling technique, which is rather unusual for precipitation. Such linear regression approaches are more typical for normally distributed variables such as temperature (e.g. Huth, 1999). Nevertheless, we could demonstrate that such an approach may also be applicable to precipitation if a probabilistic view is used in combination with a CDF transformation. At this point, we had to make use of the assumption of static transfer functions, which is common in statistical downscaling techniques. Although this strategy has an influence on the results, it is justified by the gain of information. Without the probabilistic view and the associated use of distribution parameters instead of observed precipitation amounts in the linear regression model, estimated values may be unrealistic. Furthermore, MLR of observations used on its own is, strictly speaking, only a kind of interpolation. The advantage of the combination with CDF transformation is that large-scale information and the changes between two different samples (e.g. a historical and a future climate) can be included in a simple way. If this information was included as predictor in a regression model (e.g. Murphy, 1998), it would not be possible to take the local topography into account due to the different scales. However, the probabilistic approach delivers precipitation characteristics only in the form of distributions. To simulate daily values, a weather generator would have to be subsequently applied (e.g. Semenov and Barrow, 2002), potentially in combination with weather classification (e.g. Bates et al., 1998).
We have noticed some difficulties in transferring the information from the test sites to the whole area, in particular for regions where no observations are available. Thus, the results could be improved significantly when using a spatially denser dataset. In particular for the presented investigation area, an improvement is expected if additional test sites were available in east-west direction. Based on this assessment, good results for an application in areas with a large density of test sites (e.g. Europe) may be expected. Nevertheless, further work should focus on the improvement of the extrapolation ability.
The presented two-step probabilistic approach has the potential to be applied to other variables (e.g. temperature or wind), as in the probabilistic view only the parameters of the corresponding distributions are estimated. In this case, other or additional predictors have to be selected and investigated.
In the present study, the approach was tested and validated with reanalysis data. As a next step, possible changes in precipitation distributions under future climate conditions will be estimated. For this purpose, it is planned to consider RCM data as large-scale predictor. In particular, it is intended to use the approach to evaluate changes in long-term precipitation variability and to estimate changes in return periods for extreme events between present and future climate conditions. Such information is very important for impact studies, in particular for those dealing with the availability of fresh water resources in the presented subtropical mountain environment.
This estimate is then used as predictand in the regression model:

$$\ln\left(\frac{p_{0,S,V}}{1 - p_{0,S,V}}\right) = Xc + \varepsilon \qquad \text{(A19)}$$
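The logit form of this regression can be sketched in a few lines. The example below is a minimal illustration under strong assumptions: it uses a single hypothetical predictor (elevation) in place of the full predictor matrix X, made-up station values, and plain ordinary least squares; the names `fit_line` and `predict_p0` are not from the study.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(x, y):
    """Ordinary least squares for y = c0 + c1 * x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Hypothetical station data (illustrative values, NOT from the study).
elevation_km = [1.0, 1.5, 2.0, 2.5, 3.0]
p0_obs = [0.95, 0.93, 0.90, 0.86, 0.80]  # probability of a dry day

# Regress the logit of p0 on the predictor instead of p0 itself.
c0, c1 = fit_line(elevation_km, [logit(p) for p in p0_obs])


def predict_p0(elev_km):
    """Back-transform the linear prediction to a probability."""
    return inv_logit(c0 + c1 * elev_km)


print(round(predict_p0(2.2), 3))
```

The point of the transformation is that the back-transformed prediction always lies between 0 and 1, even when the regression is evaluated far outside the range of the stations, which a linear regression on the untransformed probability would not guarantee.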

Fig. 1. Upper and middle Drâa Catchment and positions of the eleven IMPETUS weather stations and topography from digital elevation model.

Fig. 2. Availability of daily precipitation values at the eleven test sites. Available dates are marked in green and missing dates are marked in red.

Fig. 3. Estimation of Weibull parameters by linear regression at the station Bou Skour. Black dots: observed values; grey dots: ERA-Interim Reanalysis data interpolated to test site location.

Fig. 4. Cumulative distributions of precipitation amount in winter (December-February) at Bou Skour. Left: the first subset (solid lines) contains the first half, the second subset (dashed lines) the second half of all precipitation days. Right: periods are split according to even (solid lines) and odd (dashed lines) entry numbers. Blue = observations, green = interpolated reanalysis data, red = estimations.

Fig. 5. Left: quantiles of validation-subsets, estimated by transformation of the cumulative distribution function. Blue dots (circles): validation-subset consists of the first (second) half of all winter values. Green dots (circles): validation-subset consists of all winter values with an even (odd) entry number. Right: quantiles estimated by multiple linear regression including all winter values.

Fig. 6. Combination of methods by first transforming the cumulative distribution function and then applying multiple linear regression. Blue dots (circles): validation-subset consists of the first (second) half of all winter values. Green dots (circles): validation-subset consists of all winter values with an even (odd) entry number.

Fig. 7. Transformation of distribution and multiple linear regression applied to observations and topographic data. From left to right: 25th, 50th and 75th percentiles. Top (bottom): basic distribution is estimated from all winter values with an even (odd) entry number. The points at the test sites are coloured according to the calculated quantiles.

Fig. 8. Estimated probabilities of zero daily precipitation ($p_0$). Dots (circles): validation-subset consists of all winter values with an even (odd) entry number. Blue symbols: reanalysis data are first interpolated to the test sites and then $p_0$ is calculated. Green symbols: $p_0$ is calculated at grid points and then interpolated.

Fig. 9. Estimated probabilities of zero daily precipitation ($p_0$). First (second) subset consists of all winter values with an even (odd) entry number. $p_0$ is interpolated to the grid of the digital elevation model by multiple linear regression. The points at the test sites are coloured according to the original probabilities.

Table 1. Detailed information on the eleven IMPETUS test sites.

Table 2. Kolmogorov-Smirnov goodness-of-fit test (see Appendix A2). Rejections are marked in bold (Weibull distribution parameters estimated from observations) and/or are underlined (Weibull distribution parameters estimated by MLR).

Table 3. Validation of combined approach with the quantile skill score (QSS, see Appendix A5).

Table 4. Validation of $p_0$ estimation: differences between calculated and estimated values of $p_0$ and root mean squared errors (RMSE). (a) Reanalysis data are first interpolated to the test sites and then $p_0$ is calculated. (b) $p_0$ is calculated at grid points and then interpolated.