Adaptive Smoothing of the Ensemble Mean of Climate Model Output for Improved Projections of Future Rainfall

Abstract. Ensemble simulations of future climate can be described as consisting of a forced climate change response and noise, where the noise arises from internal variability and errors in the different models. In the ensemble mean the noise is reduced, making it easier to identify the mean of the forced response. The noise in the ensemble mean can potentially be reduced further by spatial smoothing, and this potential has been explored by previous authors. Depending on the variable, the resolution and the size of the ensemble it has been reported that the benefit of spatial smoothing of the ensemble mean may be small, and that spatial smoothing may have the unwanted side-effect that it modifies genuine features in the forced response. However, the spatial smoothing methods that have been tested previously used the same degree of smoothing at all locations, which limits their effectiveness. We derive a novel adaptive smoothing methodology for the ensemble mean that utilizes ensemble information with respect to signal, uncertainty and spatial correlations in order to vary the degree of smoothing in space. The methodology corresponds to simple intuitive concepts, such as the idea that locations with higher signal to noise ratio should be smoothed less. We apply the method to EURO-CORDEX simulations of future annual mean rainfall, and by using cross-validation within the ensemble are able to demonstrate a three times greater increase in potential predictive accuracy than from the non-adaptive smoothing methods we compare with. The adaptive smoothing method also preserves sharp features in the ensemble mean to a greater extent than the non-adaptive methods. We conclude that adaptive smoothing may be a useful post-processing tool for improving the potential accuracy of climate projections. mean higher precision, i.e., higher potential accuracy.
The differences created by the smoothing are on small scales, up to around 200 km, and do not affect the continental-scale features of the climate response. The changes are quite localized, and only a small number of regions show large changes due to the adaptive smoothing. The predictions show greater potential accuracy in terms of PRMSE, and the smoothed fields distinguish genuine forced signals from variability more effectively than the unsmoothed ensemble mean, or the ensemble mean smoothed using non-adaptive methods. For other datasets, the performance of the smoothing methods we have tested is likely to depend on aspects of the behaviour of the ensemble (ensemble means, variances, length-scales and correlations) which would vary depending on the variable being considered, the model and the model resolution. In addition, the optimal amount of smoothing will depend on the size of the ensemble. Applying the adaptive smoothing methodology to different datasets than those we have considered would require testing using a cross-validation based methodology similar to the one that we have applied. One simplifying factor in doing so, relative to the study described here, is that our results show that it would not be necessary to test multiple versions of non-adaptive and adaptive smoothing, with different kernels, but just one of each.

can potentially be reduced with appropriate smoothing. Second, climate model ensemble means are affected by internal variability and errors in the individual climate model simulations in the ensemble, and spatial smoothing can potentially remove some of the impact of this variability and these errors from the ensemble mean.
MK and RY also discuss reasons why applying spatial smoothing to the ensemble mean may not be beneficial. Calculating the ensemble mean already reduces the effects of both errors and internal variability. Also, since it is difficult for spatial smoothing methods to distinguish between the forced climate response and the noise, the smoothing may inadvertently degrade the estimate of the forced climate response to some extent. This is an unwanted side-effect and may offset any potential benefit from the smoothing.
Whether or not smoothing is beneficial has been tested in practice by MK and RY. MK applied Gaussian and circular top-hat smoothers to CMIP3 (Meehl et al. 2007) temperature and rainfall, for both present and future climate. For present climate they compared smoothed model results with observations. They reported that one downside of smoothing is that it creates undesirable fuzziness around features in the ensemble mean. RY also applied Gaussian smoothers to present and future climate and compared with observations. For future climate, they tested the performance of their smoothing methods using cross-validation within the ensemble. They reported that optimal smoothing length-scales were much smaller for the ensemble mean than for individual models and found only small increases in potential accuracy when smoothing the ensemble mean. They discussed fundamental limits that apply to how well smoothing can improve potential accuracy and concluded that there is perhaps no great benefit to be obtained from spatial smoothing of the ensemble mean for large ensembles of independent models.
The results presented in MK and RY were based on spatial-smoothing methods in which the same amount of smoothing is used in all locations, which we will call non-adaptive methods, since nothing about the amount of smoothing adapts to the data. However, the ensemble itself contains information that we can potentially use to determine how to vary the type or amount of smoothing used from one location to another. In particular, there are four factors that can be derived from the ensemble that, intuitively, one would think might be useful as a way to determine what amount of smoothing to use. The first factor is the ensemble spread at the location at which the forced climate response is being estimated (the target location), which can be estimated using the ensemble members. If the ensemble spread at the target location is narrow then the signal there is already estimated relatively accurately by the ensemble mean, and less spatial smoothing might be needed than for other target locations where the spread is wider. The second factor is the ensemble spread at the locations surrounding the target location, that are being used as predictors (the predictor locations). If the ensemble spread at the predictor locations is narrow then the signal is estimated relatively accurately at those locations and hence putting more weight on those predictors might be beneficial. The third factor is the correlation between the variability in the ensemble at the target location and the predictor locations, which can also be estimated using the ensemble members. If the correlation between the target location and the predictor locations is high, that perhaps suggests that putting more weight on the predictor locations might be beneficial. The fourth factor is the size of the ensemble mean signal at the target location relative to the size of the ensemble mean signal at the predictor locations.
If there is a large difference in the size of the two signals then that suggests that less smoothing would be beneficial. One might imagine that these factors should be combined in some way, e.g., the size of the difference between the target and predictor locations should perhaps be converted into a signal to noise type of parameter using the ensemble spread. However, exactly how to combine all four factors is unclear.
https://doi.org/10.5194/npg-2022-7 Preprint. Discussion started: 22 February 2022. © Author(s) 2022. CC BY 4.0 License.
In this article we derive and test a novel adaptive smoothing methodology, where adaptive means that a different amount of smoothing is used at each location, depending on the ensemble data. It is based on a simple method that naturally derives and combines the four factors listed above and leads to an equation that specifies the amount of smoothing to be used. The statistical method is novel, and comes from an area of statistics known as Frequentist Model Averaging (FMA: see textbooks such as Burnham and Anderson (2002), Claeskens and Hjort (2008), Fletcher (2019)). The method can potentially break the trade-off inherent in non-adaptive smoothing, discussed in RY, in which increasing the smoothing length-scale may improve potential accuracy but will destroy details. It may be able to break this trade-off by smoothing less when there are details that the ensemble indicates should be preserved.
The methodology is based on an equation that is derived by minimising predictive root mean-squared error (PRMSE) in a bias-variance trade-off. Smoothed estimates are made by estimating the forced climate response as a linear combination of the ensemble means at the target location and the predictor locations. The weights in the linear combination are the adaptive feature of the method and are derived separately at each location from the statistics of the ensemble at that location and nearby locations. We apply the methodology with four different shapes of smoothing kernel. We test the methodology using high resolution EURO-CORDEX projections of annual mean rainfall over Europe, for two RCPs and three points in time. We use cross-validation within the ensemble, following RY, to evaluate the PRMSE performance of non-adaptive and adaptive smoothing methods based on each of the four smoothing kernels. We also evaluate the results visually, to assess the extent to which the different methods preserve sharp features in the ensemble mean.
In section 2 we describe the EURO-CORDEX data, and the cross-validation framework we will use to test the smoothing methodologies. In section 3 we describe the smoothing methodologies we will test. In section 4 we present results in terms of RMSE performance and spatial maps of smoothed fields. In section 5 we conclude.

EURO-CORDEX Data
The data we use for this study is annual mean rainfall data extracted from the EURO-CORDEX set of ensemble projections of future climate (Jacob et al. 2014). The EURO-CORDEX ensembles have been used for a large number of studies of European climate in the last decade (e.g., Colmet-Daage et al. (2018), Dalelane et al. (2018), Dyrrdal, Stordal and Lussana (2018), Foley and Kelman (2018), Hosseinzadehtalaei, Tabari and Willems (2018), Mascaro, Viola and Deidda (2018), Soares et al. (2017)). We use data from 10 models, each of which is a different combination of a global model and a regional model. The models are listed in Table 1. Further details on EURO-CORDEX and the models we use are given in the EURO-roughly an order of magnitude higher resolution than the climate model output considered in MK and RY. In order to focus on changes in climate, we consider differences between temporal means over future time periods and the baseline time period 1981-2010. We use six scenarios, based on combining results for two different RCPs (RCP4.5 and RCP8.5) and three different future time periods (2011-2040, 2041-2070 and 2071-2100).

Cross-validation Methodology
We will derive our smoothing methodology using frequentist statistical ideas. These frequentist arguments should not be taken as reflecting a particular philosophy, or set of beliefs, about climate models and how they should be interpreted, or, indeed, any preference for frequentist statistical methods over any other statistical methods. They are simply a convenient mathematical device for deriving a smoothing equation that may be a useful solution for this particular problem. Once the smoothing equation has been derived, the frequentist assumptions used to derive it can be forgotten. The equation must then be tested and should be judged purely according to how well it works in practice. It may, in fact, be possible to derive the same equation using different arguments.
We are ultimately interested in deriving means, variances and distributions that give projections for possible values of future climate. As part of the frequentist derivation, we will imagine an infinite ensemble of reasonable climate models, covering all sources of uncertainty, and we will write the mean of this ensemble as µ. We do not assume that this infinite ensemble is unbiased, relative to observations. The fact that this infinite ensemble is entirely fictional, and that the word reasonable is impossible to define, is of no concern: all that matters is the effectiveness of the equation we derive based on these abstract concepts.
We will assume that the ensemble members in the EURO-CORDEX ensemble we use are independent samples from the infinite ensemble of models. It is possible to address issues of dependency between models, rather than assume independence, and there has been a large amount of research into how dependencies between the models in an ensemble might be estimated and modelled using weighting (e.g., Abramowitz & Bishop (2015), Annan & Hargreaves (2016), Rauser et al. (2015), Sanderson et al. (2015b), Knutti et al. (2017)). However, understanding model dependencies is difficult, and our impression is that this research has so far been somewhat inconclusive. Since our aim is simply to test whether adaptive smoothing is useful, we do not apply weighting. If weighting were included, that would, if anything, reduce the effective sample size of our ensemble and make smoothing more effective.
The EURO-CORDEX ensemble is a multi-model ensemble. Relative to single model ensembles, multi-model ensembles are more complex statistical entities. They are, in some ways, more difficult to interpret, e.g., are more likely to need weighting, but may sample model uncertainty more thoroughly. Both single and multi-model ensembles can be used to create estimates of future climate. Given all the challenges involved in converting ensemble output to estimates of future climate, such estimates are perhaps best considered as subjective views of what future climate might be, conditional on all the subjective assumptions used in the post-processing of the models.
Our analysis is restricted to working with climate model results and involves no comparison with observations. This setup is sometimes known as a perfect model experiment. Within this setup our goal can be described as trying to estimate the mean of the infinite ensemble as well as possible. Some aspects of the larger goal, of estimating what real climate may be in the future, cannot be addressed in the absence of observations. For example, there is no way to estimate possible bias versus the real climate, or the extent to which our smoothing methods might reduce such biases. However, we make the assumption that developing a better understanding of climate model ensembles, in the perfect model framework, is nevertheless a step towards this larger goal.
We will evaluate the methods that we test using cross-validated predictive root mean squared error (PRMSE), following RY.
In the frequentist framework, this can be considered as a way of evaluating how well we can estimate the infinite ensemble mean. Alternatively, the infinite ensemble concept can be avoided, and cross-validation can be justified by simply assuming that if we have a method that can improve how well we can predict climate model output, in an out-of-sample way, it may be a better method for predicting the real climate.
For each smoothing methodology we will compute the PRMSE by using a standard leave-one-out cross-validation scheme.
In this scheme, we will drop each of the ten EURO-CORDEX models from the ensemble in turn and apply the smoothing to the remaining nine models. The prediction for µ from the smoothing will then be compared with the dropped model, on a grid-point by grid-point basis. We will calculate the squared error of the prediction at each grid-point, and then calculate the mean squared error (MSE) where the mean is calculated over all grid-points, and by dropping each of the ten models in turn.
We then subtract the variance over all grid-points and models from the MSE value in order to convert the MSE from a value that represents the MSE around an individual model to a value that represents the MSE around the unknown mean µ. The downside of using cross-validation on ensembles of high-resolution model output over large domains is that it is computationally expensive, which limits the number of numerical experiments that can be performed.
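The leave-one-out scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the study: the function name `loo_pmse`, the array layout `(n_models, ny, nx)`, and the choice to estimate the subtracted variance as the unbiased ensemble variance averaged over grid-points are all assumptions made for the example.

```python
import numpy as np

def loo_pmse(ensemble, smoother):
    """Leave-one-out cross-validated PMSE around the unknown mean.

    ensemble : array of shape (n_models, ny, nx); NaN marks missing points.
    smoother : function mapping a (n_models - 1, ny, nx) array to a
               (ny, nx) prediction of the mean (hypothetical interface).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    n_models = ensemble.shape[0]
    sq_errors = []
    for k in range(n_models):
        held_out = ensemble[k]                      # the dropped model
        training = np.delete(ensemble, k, axis=0)   # the remaining models
        prediction = smoother(training)
        sq_errors.append((prediction - held_out) ** 2)
    # Mean over all grid-points and all dropped models.
    mse = np.nanmean(np.stack(sq_errors))
    # Assumed form of the subtracted variance: ensemble variance at each
    # grid-point (ddof=1), averaged over grid-points.
    noise_var = np.nanmean(np.var(ensemble, axis=0, ddof=1))
    return mse - noise_var

def unsmoothed_mean(training):
    """Baseline predictor: the ensemble mean with no smoothing."""
    return training.mean(axis=0)
```

With these assumptions, a normalized score of the kind reported later could be formed as `100 * np.sqrt(loo_pmse(ens, method) / loo_pmse(ens, unsmoothed_mean))`, so that the unsmoothed ensemble mean scores 100.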

Smoothing Methodologies
We now describe the smoothing methodologies we will apply to the EURO-CORDEX data. There are eight methodologies in all: four non-adaptive methodologies and their four adaptive counterparts. Following the notation in MK and RY, we will say that we are trying to estimate $\mu$ at the target location $(i,j)$ using predictor locations $(k,l)$, taking into account the distance between the target and predictor locations, $d_{i,j}(k,l)$.

Non-adaptive Smoothing Methodologies
We first describe the four non-adaptive smoothing methodologies that we will test. They are all non-adaptive because, within each test, nothing about the methodology varies by location except the input rainfall data at the target and predictor locations. The four methods can all be described as creating a smoothed prediction for $\mu$, written as $\hat{\mu}_{i,j}$, at location $(i,j)$ using a weighted average over ensemble mean values given by:

$$\hat{\mu}_{i,j} = \frac{\sum_{(k,l) \in S_n} w_{i,j}(k,l,n,d_{i,j}(k,l))\, m_{k,l}}{\sum_{(k,l) \in S_n} w_{i,j}(k,l,n,d_{i,j}(k,l))} \qquad (1)$$

where $m_{k,l}$ is the ensemble mean at location $(k,l)$, $w_{i,j}(k,l,n,d_{i,j}(k,l))$ is the corresponding weight, and $S_n$ is the set of spatial points used to make the prediction, which consists of the target location and all the predictor locations. If one wishes to avoid the use of the mean of the infinite ensemble $\mu$ in the derivation then the methods can be described as predicting the excluded ensemble member in the cross-validation: the end result of the derivation would be the same. We consider different sizes for the set of spatial points used to make the prediction, as indicated by the subscript $n$. The four non-adaptive methods differ from each other in terms of the formulation of the weight, and the set of points $S_n$. Where some of the points in the set of predictor locations are missing, because of coastlines or because of the edge of the domain, we simply eliminate those points from the set. We now describe each of the four non-adaptive smoothing methods in detail.

Method 1: Square Top-hat
The first smoothing method we apply considers all the points in a square around the target location $(i,j)$, for squares of different sizes. The target location $(i,j)$ is at the centre of the square, and the squares have an odd number of points along each side. The different squares are labelled using $n = 1, 2, 3, \ldots$, the length of the side of the square in each case is given by $2n-1$, and the number of points in the square is given by $(2n-1)^2$. The square corresponding to $n = 1$ involves no smoothing at all and defines a baseline prediction which the smoothing methods can be compared against. We then calculate the average rainfall over all the points in the square, and that serves as the prediction $\hat{\mu}_{i,j}$. The weight function for this method is given by:

$$w_{i,j}(k,l,n,d_{i,j}(k,l)) = 1 \qquad (2)$$

One reason this method is relevant is that it relates to the commonly used aggregation method in which data is aggregated within squares.

Method 2: Circle Top-hat
The second smoothing method we apply is similar to the square top-hat, and uses the same weight function, but only considers predictor locations in a circle of diameter $2n-1$ points around the target location $(i,j)$. The difference between this method and the square top-hat method is that points in the corners of the squares are now eliminated because they lie outside the circle. This method was also considered by MK.

Method 3: Exponential
The third smoothing method we apply considers the same circular set of predictor locations as in the circle top-hat method, but now down-weights them as a function of distance from the point $(i,j)$ using an exponential function with a length-scale $\lambda$. We write the weights as:

$$w_{i,j}(k,l,n,d_{i,j}(k,l)) = \exp\left(-\frac{d_{i,j}(k,l)}{\lambda}\right) \qquad (3)$$

The length-scale is chosen so that the unnormalized weight at the edge of the circle, at a distance of $(2n-1)/2$ grid points from the target location, is 0.1, which gives $\lambda = -\frac{2n-1}{2\ln(0.1)}$.

Method 4: Gaussian
The fourth smoothing method we apply considers the same circular set of predictor locations as in the circle top-hat and exponential methods, but now down-weights them using a Gaussian function with length-scale $\lambda$. Very similar methods were used in MK and RY. We write the weights as:

$$w_{i,j}(k,l,n,d_{i,j}(k,l)) = \exp\left(-\frac{d_{i,j}(k,l)^2}{\lambda^2}\right) \qquad (4)$$

where the length-scale is again chosen so that the weight at the edge of the circle is 0.1, which in this case gives $\lambda^2 = -\frac{(1-2n)^2}{4\ln(0.1)}$.
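To make the four kernels concrete, the following sketch builds the unnormalized weights on a stencil of side $2n-1$ and applies the normalized weighted average to a two-dimensional field. It is an illustration under stated assumptions, not the authors' implementation: the function names `kernel_weights` and `smooth_field` are hypothetical, distances are measured in grid spacings, the edge of the circle is taken to be at a radius of $(2n-1)/2$ grid spacings, and missing points are marked with NaN and dropped from the average.

```python
import numpy as np

def kernel_weights(n, kind):
    """Unnormalized weights on a (2n-1) x (2n-1) stencil centred on the target."""
    offsets = np.arange(-(n - 1), n)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    dist = np.hypot(dx, dy)                  # distance in grid spacings
    if kind == "square":
        return np.ones_like(dist)
    radius = (2 * n - 1) / 2                 # assumed edge of the circle
    inside = (dist <= radius).astype(float)  # drops points outside the circle
    if kind == "circle":
        return inside
    if kind == "exponential":
        lam = -radius / np.log(0.1)          # weight would be 0.1 at the edge
        return inside * np.exp(-dist / lam)
    if kind == "gaussian":
        lam_sq = -radius**2 / np.log(0.1)    # weight would be 0.1 at the edge
        return inside * np.exp(-dist**2 / lam_sq)
    raise ValueError(f"unknown kernel: {kind}")

def smooth_field(field, weights):
    """Normalized weighted average of a 2-D field; NaN points are left out."""
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    pad = weights.shape[0] // 2
    padded = np.pad(field, pad, constant_values=np.nan)
    valid = ~np.isnan(padded)
    filled = np.where(valid, padded, 0.0)
    num = np.zeros((ny, nx))
    den = np.zeros((ny, nx))
    for a in range(weights.shape[0]):
        for b in range(weights.shape[1]):
            w = weights[a, b]
            if w == 0.0:
                continue
            num += w * filled[a:a + ny, b:b + nx]
            den += w * valid[a:a + ny, b:b + nx]
    with np.errstate(invalid="ignore", divide="ignore"):
        out = num / den
    # Targets that are themselves missing stay missing.
    return np.where(np.isnan(field), np.nan, out)
```

For example, `smooth_field(ensemble_mean, kernel_weights(3, "gaussian"))` would apply a non-adaptive Gaussian smoother on a 5 by 5 stencil, and `kernel_weights(1, kind)` reduces to the unsmoothed baseline for every kernel.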

Adaptive Smoothing Derivation
We now derive the adaptive smoothing methodology that we will use. We will again write our smoothed estimate for the unknown ensemble mean $\mu$ at the target location $(i,j)$ as $\hat{\mu}_{i,j}$. This estimate will be based on a linear combination of the ensemble mean at location $(i,j)$, written as $m_{i,j}$, and a predictor based on a weighted sum of ensemble means at the surrounding points (not including the target location $(i,j)$), which we write as $p$. The two are combined using a factor $\alpha$, and an expression for $\alpha$ will be derived below. The prediction $\hat{\mu}_{i,j}$ is therefore given by:

$$\hat{\mu}_{i,j} = m_{i,j} + \alpha\,\Delta = (1-\alpha)\,m_{i,j} + \alpha\,p \qquad (5)$$

where $\Delta = p - m_{i,j}$. The parameter $\alpha$ will be adaptive, i.e., will vary in space according to information in the ensemble, and this is where the adaptivity of the method comes in. The predictor $p$ will be non-adaptive and will be constructed from the surrounding predictor locations using smoothing kernels very similar to those used in the non-adaptive methods, in ways that are explained below.
We are only considering a single predictor $p$, with a single adaptive parameter $\alpha$. One could also imagine more complex adaptive smoothing methodologies that would consider multiple predictors with multiple adaptive parameters. One could also consider methodologies in which the predictor is adaptive, e.g., in which the length-scales in the smoothing kernels that determine the predictor are adaptive and vary as a function of information in the ensemble.
In order to determine an appropriate expression for $\alpha$ we can consider the statistical properties of the prediction $\hat{\mu}_{i,j}$. The following derivation makes a bias-variance trade-off, similar to that used to derive methods that have been shown to increase the accuracy of estimates of temperature trends by Jewson and Penzer (2006), and increase the potential accuracy of future rainfall projections by Jewson et al. (2021). We write the unknown ensemble mean at the target location as $\mu_m$, and for the predictor locations as $\mu_p$. The prediction error is then given as:

$$\hat{\mu}_{i,j} - \mu_m = (1-\alpha)\,m_{i,j} + \alpha\,p - \mu_m \qquad (6)$$

where $m_{i,j}$ is the ensemble mean at the target location and $p$ is the predictor. We now derive expressions for the bias and variance of this prediction error, as a function of $\alpha$. We will then combine the bias and the variance to give the predictive mean square error (PMSE). The bias in the prediction is given as:

$$\mathrm{bias} = E(\hat{\mu}_{i,j} - \mu_m) = (1-\alpha)\,\mu_m + \alpha\,\mu_p - \mu_m = \alpha\,\delta \qquad (7)$$

where $\delta = \mu_p - \mu_m$. The variance of the prediction error is given as:

$$\mathrm{error\ variance} = (1-\alpha)^2 v_m + \alpha^2 v_p + 2\alpha(1-\alpha)\,c \qquad (8)$$

where $c = \mathrm{cov}(m_{i,j}, p)$ is the covariance between the ensemble mean at the target location $(i,j)$ and the predictor $p$, and we have written $v_m = \mathrm{var}(m_{i,j})$ and $v_p = \mathrm{var}(p)$.
Combining equations (7) and (8), the PMSE, as a function of $\alpha$, is given by:

$$\mathrm{PMSE} = \mathrm{bias}^2 + \mathrm{error\ variance} = \alpha^2\delta^2 + (1-\alpha)^2 v_m + \alpha^2 v_p + 2\alpha(1-\alpha)\,c \qquad (9)$$

This equation is a version of Eq. (3) from RY but evaluated for the particular prediction model given by Eq. (5). It is a quadratic equation in $\alpha$, and a proof that it has a minimum is given in Appendix A.
In order to find the $\alpha$ value at the minimum PMSE, we differentiate Eq. (9) by $\alpha$, giving:

$$\frac{d\,\mathrm{PMSE}}{d\alpha} = 2\alpha\delta^2 - 2(1-\alpha)\,v_m + 2\alpha v_p + 2(1-2\alpha)\,c \qquad (10)$$

Setting this derivative equal to zero and solving for $\alpha$ gives:

$$\alpha = \frac{v_m - c}{\delta^2 + v_m + v_p - 2c} \qquad (11)$$

This gives the value of $\alpha$ that minimises the PMSE and the PRMSE. We can see that it contains all four factors that were discussed in the introduction as potentially relevant for determining the optimal degree of smoothing: the ensemble variance of the location being predicted, captured by the variance of the ensemble mean at that location ($v_m$); the ensemble variances of the surrounding points, captured by the variance of the predictor ($v_p$); the ensemble correlations between the location being predicted and the surrounding points, captured by the covariance between the ensemble mean at the location being predicted and the predictor ($c$); and the difference between the size of the signal at the location being predicted and the predictor ($\delta$). Dividing the top and bottom of equation (11) by $v_m$ gives a different form for this expression, in which the $\delta^2$ term becomes a kind of signal to noise ratio.
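As a quick numerical check of the minimisation, the sketch below evaluates the PMSE as bias squared plus error variance, with the weight $\alpha$ on the predictor, and compares the closed-form minimiser with a brute-force search over a grid of $\alpha$ values. The variable names and the numerical values of the signal difference, variances and covariance are arbitrary and chosen only for illustration.

```python
import numpy as np

def pmse(alpha, delta, v_m, v_p, c):
    """PMSE = bias^2 + error variance for weight alpha on the predictor."""
    bias_sq = alpha**2 * delta**2
    variance = (1 - alpha)**2 * v_m + alpha**2 * v_p + 2 * alpha * (1 - alpha) * c
    return bias_sq + variance

def optimal_alpha(delta, v_m, v_p, c):
    """Closed-form minimiser of the quadratic PMSE."""
    return (v_m - c) / (delta**2 + v_m + v_p - 2 * c)

# Illustrative (made-up) values: signal difference, two variances, covariance.
delta, v_m, v_p, c = 0.5, 1.0, 0.4, 0.2

a_star = optimal_alpha(delta, v_m, v_p, c)
grid = np.linspace(-1.0, 2.0, 3001)
a_grid = grid[np.argmin(pmse(grid, delta, v_m, v_p, c))]

print(f"closed form: {a_star:.3f}, grid search: {a_grid:.3f}")
print(f"PMSE at optimum: {pmse(a_star, delta, v_m, v_p, c):.3f}")
print(f"PMSE with no smoothing (alpha=0): {pmse(0.0, delta, v_m, v_p, c):.3f}")
print(f"PMSE with predictor only (alpha=1): {pmse(1.0, delta, v_m, v_p, c):.3f}")
```

For these values the optimum lies strictly between 0 and 1, so the combined prediction has a lower PMSE than either the local ensemble mean or the predictor used alone.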
If we could determine the exact value of $\alpha$, then, based on the derivation above, we would be guaranteed to be able to produce predictions that would have a lower PMSE than using either the local ensemble mean or the predictor alone.
However, in practice we can never know the exact value of $\alpha$, and the best we can do is to estimate it from the sample data that we have. The most obvious (although not necessarily the best) way to do this is by using sample estimates for the terms on the right-hand side of Eq. (11) above. Using sample estimates $\hat{v}_m$ for $v_m$, $\hat{v}_p$ for $v_p$, $\hat{c}$ for $c$ and $\Delta^2$ for $\delta^2$, where $\Delta$ is the difference between the predictor and the ensemble mean at the target location, gives an estimate $\hat{\alpha}$ which is:

$$\hat{\alpha} = \frac{\hat{v}_m - \hat{c}}{\Delta^2 + \hat{v}_m + \hat{v}_p - 2\hat{c}} \qquad (12)$$

Using this estimated value of $\alpha$ may or may not give improved estimates, and whether it does or not can only be resolved by testing. The sample estimates $\hat{v}_m$, $\hat{v}_p$ and $\hat{c}$ in this expression can be estimated in the usual ways: $\hat{v}_m = \hat{\sigma}_m^2 / N$, $\hat{v}_p = \hat{\sigma}_p^2 / N$ and $\hat{c} = \hat{\sigma}_{mp} / N$, where $\hat{\sigma}_m^2$ is the observed ensemble variance at the target point $(i,j)$, $\hat{\sigma}_p^2$ is the observed ensemble variance for the predictor, $\hat{\sigma}_{mp}$ is the observed covariance between the ensemble members at point $(i,j)$ and the ensemble members for the predictor, and $N$ is the number of members in the ensemble.
For our purposes $\hat{\alpha}$ only makes sense if it is in the range [0,1]. When $\hat{c}$ is non-zero, values from Eq. (12) could lie outside this range, and so we restrict $\hat{\alpha}$ to this range in our calculations by increasing values that are below 0 to 0 and reducing values that are above 1 to 1.
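The plug-in estimate and the restriction to [0,1] might be coded as follows for a single target location. This is a sketch under assumptions: the function name `estimate_alpha` is hypothetical, the inputs are one value per ensemble member at the target location and one predictor value per member, variances and covariances use the unbiased (N - 1) normalisation, and a degenerate zero denominator is mapped to no smoothing.

```python
import numpy as np

def estimate_alpha(target_members, predictor_members):
    """Plug-in estimate of the smoothing weight, clipped to [0, 1].

    target_members    : length-N values, one per model, at the target location.
    predictor_members : length-N values, one per model, of the predictor.
    """
    x = np.asarray(target_members, dtype=float)
    y = np.asarray(predictor_members, dtype=float)
    n_members = x.size
    # Variances and covariance of the ensemble means: the ensemble
    # variances and covariance divided by the number of members.
    v_m = np.var(x, ddof=1) / n_members
    v_p = np.var(y, ddof=1) / n_members
    c = np.cov(x, y)[0, 1] / n_members
    # Sample signal difference between predictor and target.
    diff = y.mean() - x.mean()
    denom = diff**2 + v_m + v_p - 2 * c
    if denom <= 0:
        return 0.0  # assumed fallback: no smoothing in the degenerate case
    return float(np.clip((v_m - c) / denom, 0.0, 1.0))
```

Two limiting cases illustrate the behaviour: if the predictor has no spread and the same mean as the target, the estimate is 1 and the prediction uses the predictor alone, while if the predictor mean differs from the target mean by much more than the ensemble spread, the estimate approaches 0 and little smoothing is applied.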

Adaptive Smoothing Methodologies
To define the four adaptive smoothing methodologies it just remains to define the predictor $p$ in each case. In all four cases, we define $p$ as a weighted average of ensemble means using the expression:

$$p = \frac{\sum_{(k,l) \in S'_n} w_{i,j}(k,l,n,d_{i,j}(k,l))\, m_{k,l}}{\sum_{(k,l) \in S'_n} w_{i,j}(k,l,n,d_{i,j}(k,l))} \qquad (13)$$

This is very similar to Eq. (1), which defines the non-adaptive predictions. The only difference is that the set of points $S'_n$ differs from $S_n$ in that it does not contain the target location $(i,j)$. The target location is not included in $S'_n$ since it is already included in the prediction given by Eq. (5). Apart from this difference, the weights for the four adaptive smoothing methodologies are calculated in exactly the same way as they are for the non-adaptive methodologies, creating adaptive square top-hat, adaptive circle top-hat, adaptive exponential and adaptive Gaussian methods. In this way, the adaptive methodologies combine the unsmoothed prediction based on just the ensemble mean at the target location $(i,j)$ with information from surrounding points, and these two sources of information are combined together in a way that varies from location to location, taking into account what we can learn from the ensemble about how best to combine the two. Whether the method works is a matter for testing.
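Putting the pieces together, a whole-field version of the adaptive method could look like the sketch below. It is intended only to show how the predictor, the estimated weight and the final combination fit together, and is not the authors' code: the function name `adaptive_smooth` is hypothetical, the kernel is passed in as a square array of unnormalized weights with odd side length, missing points are assumed to be marked with NaN, and the explicit double loop favours readability over speed.

```python
import numpy as np

def adaptive_smooth(ensemble, weights):
    """Adaptive smoothing of the ensemble mean.

    ensemble : array (N, ny, nx), one field per model; NaN marks missing points.
    weights  : square array of unnormalized kernel weights with odd side length.
    Returns (smoothed ensemble mean, field of estimated smoothing weights).
    """
    ens = np.asarray(ensemble, dtype=float)
    n_members, ny, nx = ens.shape
    half = weights.shape[0] // 2
    w = np.array(weights, dtype=float)
    w[half, half] = 0.0                    # the predictor excludes the target
    smoothed = ens.mean(axis=0)            # default: unsmoothed ensemble mean
    alpha = np.zeros((ny, nx))
    for i in range(ny):
        for j in range(nx):
            x = ens[:, i, j]
            i0, i1 = max(i - half, 0), min(i + half + 1, ny)
            j0, j1 = max(j - half, 0), min(j + half + 1, nx)
            patch = ens[:, i0:i1, j0:j1]
            wp = w[i0 - i + half:i1 - i + half, j0 - j + half:j1 - j + half]
            # Usable predictor points: non-zero weight and no missing data.
            ok = (wp > 0) & ~np.isnan(patch).any(axis=0)
            if np.isnan(x).any() or not ok.any():
                continue
            # Predictor value for each ensemble member.
            y = patch[:, ok] @ wp[ok] / wp[ok].sum()
            v_m = x.var(ddof=1) / n_members
            v_p = y.var(ddof=1) / n_members
            c = np.cov(x, y)[0, 1] / n_members
            diff = y.mean() - x.mean()
            denom = diff**2 + v_m + v_p - 2 * c
            a = np.clip((v_m - c) / denom, 0.0, 1.0) if denom > 0 else 0.0
            alpha[i, j] = a
            smoothed[i, j] = (1 - a) * x.mean() + a * y.mean()
    return smoothed, alpha
```

A call such as `adaptive_smooth(ens, np.ones((5, 5)))` corresponds to an adaptive square top-hat on a 5 by 5 stencil; passing a 1 by 1 kernel leaves no predictor points and returns the unsmoothed ensemble mean.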

Results
We now show results from the application of the smoothing methods described in Sect. 3 above to the EURO-CORDEX rainfall projection data described in Sect. 2 above. We first present PRMSE performance of the various methods, and then give examples of some smoothed fields.
Figure 1 shows the PRMSE performance of the four non-adaptive smoothing methods, versus the parameter $n$ that defines the size of the region over which the field is smoothed, for each of the six scenarios (two RCPs and three future time periods). We show the six scenarios separately, rather than aggregated together, to give an idea of the variability around the results when the methodology is applied to different datasets. The results are normalized so that for $n = 1$ (which corresponds to no smoothing) the PRMSE is 100. For all four methods for all six scenarios, as $n$ increases the errors reduce initially, reach a minimum value, and then increase. The minimum errors, marked by circles, occur at values of $n$ of 2, 3 or 4, corresponding to squares of dimension 3, 5 or 7 grid points. The results for the six different scenarios are similar: for the square top-hat method the minimum always occurs at $n = 2$, for the circle top-hat method it occurs at $n = 2$ or $n = 3$ and for the exponential and Gaussian methods it occurs at $n = 3$ or $n = 4$. The top-hat methods reach their minima for lower values of $n$ because the shape of the smoothing kernel puts a greater weight on data further from the target location.

In all cases exponential smoothing achieves the lowest minimum PRMSE, followed by Gaussian, followed by circle top-hat, followed by square top-hat. The minimum values achieved by the exponential smoothing are all between 96% and 98%, corresponding to estimates of the mean rainfall in the notional infinite ensemble that are between 4% and 2% more precise than the unsmoothed ensemble mean. We would therefore say that the reduction in error due to the smoothing is rather small.
The optimal values of $n$ correspond to very small optimal smoothing length-scales, relative to those derived by MK and RY: the optimal smoothing regions being used here would lie entirely within one grid box of the model output considered in those studies. We are therefore considering smoothing of different, and much smaller scale, aspects of model error and model internal variability.
Figure 2 shows the performance of all eight smoothing methods (four non-adaptive and four adaptive) on a larger horizontal scale up to $n = 40$. We limit $n$ to 40 because the large number of spatial points in our domain, the use of cross-validation and the number of methods compared make the calculations computationally intensive. Nevertheless, we are able to find optimal values of $n$ for all eight smoothing methods when we consider all scenarios together (see below). The PRMSE results for the adaptive methods have a completely different character to the results from the non-adaptive methods: they decrease monotonically up to $n = 18$ for all 24 cases (four methods and six scenarios). In 16 of the 24 cases they reach a minimum and start to increase again before $n = 40$, while in eight of the 24 cases they are still decreasing slightly at $n = 40$. They show lower errors than the non-adaptive methods at all values of the parameter $n$, and the minima are in the range from 88% to 94%, corresponding to estimates of the mean rainfall that are between 12% and 6% more precise than the unsmoothed ensemble mean. These increases in precision are about three times greater than the increases in precision achieved by the non-adaptive methods, and are being achieved in a very different way, by incorporating information from a much wider area.
Figure 3 shows the performance of all eight smoothing methods aggregated over the six scenarios. All methods now reach their minima within the range of values of $n$ that we are considering.
The best of the non-adaptive methods is exponential, which achieves a minimum of 97.1% at $n = 3$, corresponding to a 2.9% increase in precision. This is equivalent to the increase in precision that would be achieved by adding 0.5 extra independent climate models to the ensemble. The best of the adaptive methods is Gaussian, which achieves a minimum of 91.4% at $n = 37$, corresponding to an 8.6% increase in precision, which is 2.97 times the increase in precision due to the best of the non-adaptive methods. This is equivalent to the average increase in precision that would be achieved by adding 1.9 extra independent climate models to the ensemble, although adding extra climate models increases the precision everywhere, while adaptive smoothing increases the precision in a targeted fashion at certain locations, and so may be considered more effective than adding extra models.
Between the four adaptive methods the differences in the minima achieved are rather small, and can probably be considered immaterial. Once again, the top-hat methods achieve lower errors at lower values of the smoothing parameter because they put more weight on more distant data points.
We now consider the resulting smoothed fields. Figure 4 shows Europe-wide smoothed ensemble mean fields for the non-adaptive and adaptive Gaussian methods for the first of our six scenarios (RCP4.5, 2011-2040). Panel (a) shows the ensemble mean change in rainfall prior to smoothing, which shows increases in rainfall in most of the domain, and decreases in the south of the domain. Panel (b) shows the value of the adaptive smoothing parameter used by the adaptive Gaussian smoothing based on a smoothing parameter value of 37, which is the value that gives the lowest PRMSE overall. Panel (c) shows the results of non-adaptive Gaussian smoothing using the optimal smoothing parameter value of 3, and panel (d) shows the results of adaptive Gaussian smoothing using the value of 37. Panels (e) and (f) show the changes created by the non-adaptive and adaptive smoothing methods, respectively. All the plots except panel (b) use non-linear colour-scales in order to emphasize spatial variations in the values shown.

Europe-wide Smoothed Fields
It is somewhat difficult to see differences between the unsmoothed ensemble mean and the smoothed fields (i.e., how panels

Regional Smoothed Fields
To illustrate in more detail the effect of the different smoothing methods, we now show the same results as shown in Fig. 4 but for the Alps (Fig. 5), Norway and Sweden (Fig. 6) and Iceland (Fig. 7). These regions were chosen as regions where both smoothing methods have relatively larger impacts.

The Alps
For the Alps (Fig. 5) the unsmoothed ensemble mean shows increases in rainfall everywhere. The increases vary in space.
The adaptive smoothing parameter shows spatial structure that roughly corresponds to the spatial structures in the ensemble mean. For the smoothed ensemble means, careful inspection of panels (c) and (d) shows that the non-adaptive Gaussian smoothing leads to slightly greater changes in the features in the ensemble mean, while the adaptive method leads to smaller and barely noticeable changes. The difference fields show that the non-adaptive smoothing is having an impact at very small scales, while the adaptive smoothing is having an impact at larger scales, and only in certain regions.

Norway and Sweden, Iceland
The results for Norway and Sweden (Fig. 6) and Iceland (Fig. 7) show similar effects. The smoothing effect of the adaptive smoothing is such that small-scale features are preserved to a greater extent than by the non-adaptive smoothing. The adaptive smoothing parameter takes values in the whole range from 0 to 1. The difference fields show that the non-adaptive smoothing makes changes on very small scales, while the impact of the adaptive smoothing is on larger scales.

Discussion
Figures 5, 6 and 7 show that the value of the adaptive smoothing parameter varies between close to 0 and close to 1 over very short length-scales. When the adaptive smoothing parameter is close to 0, very little smoothing takes place: the reduction in PRMSE values shown in Fig. 2 and Fig. 3 must therefore be due to smoothing that is taking place in locations where the adaptive smoothing parameter is not close to zero. The improvements in PRMSE in the regions which are more strongly smoothed must on average be higher than the improvements shown in Fig. 2 and Fig. 3. That the adaptive smoothing methods have lower PRMSE values than the non-adaptive methods is presumably partly because the adaptive methods avoid detrimental smoothing, by using values of the adaptive smoothing parameter close to zero in many locations, and partly because they allow more smoothing in regions where smoothing is beneficial. In this way, by combining information from the ensemble means, variances and covariances at different locations, the adaptive smoothing may help distinguish between those signals in the ensemble mean which are genuine forced signals and those which are not.
We have also performed a visual comparison of the results from adaptive smoothing for different values of the smoothing parameter in a range from 10 to 40, and for the other three adaptive smoothing methods. The results of these comparisons (not shown) show that varying the smoothing parameter within this range has almost no impact on the results, and that there is almost no difference between the results from the different methods. This is perhaps to be expected, given that the PRMSE values shown in Fig. 2 and Fig. 3 are very flat near the minimum, and are very similar for the four adaptive smoothing methods.
The lack of differences between the impact of the different methods is a useful result, since it suggests that future studies would not need to compare different smoothing kernels, but could choose any one of the four, and would also not need to consider every possible value of the smoothing parameter, as we have done.
... probabilities would also be derived from the ensemble by fitting distributions. If the improved estimate for the ensemble mean is used in the distribution fitting process, then the probabilities will also be improved.
Figure 6: As Fig. 4, but for Norway and Sweden only.

Summary and Conclusions
One question that arises in the interpretation and usage of climate model ensembles is whether the ensemble mean should be smoothed in space in order to give more precise estimates of the mean of the forced climate response in the ensemble. One of the reasons this is of interest is that smoothing the ensemble is vastly cheaper in terms of resources than increasing the precision of the ensemble mean by building new climate models or improving existing climate models. Previous research using non-adaptive smoothing (i.e., smoothing that applies the same amount of smoothing at all locations, irrespective of the information contained in the ensemble) has found only a small benefit to smoothing, and has shown that smoothing tends to smooth away features in the ensemble mean that may be correct. We have tested four non-adaptive smoothing methods on EURO-CORDEX annual mean rainfall data, and our results for these methods tend to agree with these previous conclusions.
However, there is no reason why all locations have to be smoothed in the same way, and the ensemble contains a wealth of information that can help determine how best to smooth each location. For instance, one might imagine that signals in the ensemble mean that have a large amplitude and low variance (i.e., a high signal-to-noise ratio) should not be smoothed as much as signals that have a low amplitude and high variance (i.e., a low signal-to-noise ratio). There are also other relevant factors, including the correlations in space in the ensemble, and the gradient of the signal in the ensemble mean. We have constructed a novel methodology that incorporates this idea of taking information from the ensemble into account in the smoothing algorithm, based on the idea of minimising predictive root-mean-squared error (PRMSE) via a bias-variance trade-off. We have tested the methodology using cross-validation within the ensemble, and we find that it gives much better results than the standard non-adaptive smoothing methods that we compare with. The reduction in the PRMSE from the adaptive smoothing methodologies is roughly three times as large as the reduction from the non-adaptive methodologies, and visual comparison shows that features in the ensemble mean are preserved to a greater extent by the adaptive methodologies.
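The intuition that high signal-to-noise locations should be smoothed less can be illustrated with a deliberately simplified sketch. The weighting rule below is an assumption made for illustration and is not the formula derived in this paper: it blends the ensemble mean with a smoothed version of itself using a weight k = v / g, clipped to [0, 1], where v is the sampling variance of the ensemble mean and g is the squared gap between the mean and its smoothed version. This is the weight that minimises mean-squared error in a basic bias-variance trade-off when the variance of the smoothed field is neglected, and it ignores the spatial correlations that the full methodology uses. The function names and the moving-average smoother are hypothetical stand-ins.

```python
import numpy as np


def adaptive_blend(members, smooth):
    """Blend the ensemble mean with a smoothed version, location by location.

    members: array of shape (n_members, ...) holding one field per model.
    smooth:  callable that smooths a field (hypothetical; any kernel smoother).
    Returns the blended field and the weight k in [0, 1] on the smoothed field.
    """
    n = members.shape[0]
    mean = members.mean(axis=0)
    # Sampling variance of the ensemble mean at each location.
    var_of_mean = members.var(axis=0, ddof=1) / n
    smoothed = smooth(mean)
    # Squared gap between the mean and its smoothed version: a rough proxy
    # for the squared bias that smoothing would introduce, plus noise.
    gap_sq = (mean - smoothed) ** 2
    # Simplified bias-variance weight (an assumption, not the paper's formula):
    # noisy locations with a small gap get k near 1, strong signals k near 0.
    k = np.clip(var_of_mean / np.maximum(gap_sq, 1e-12), 0.0, 1.0)
    return (1.0 - k) * mean + k * smoothed, k


def moving_average(field, width=5):
    """Stand-in non-adaptive smoother: a simple 1-D moving average."""
    kernel = np.ones(width) / width
    return np.convolve(field, kernel, mode="same")


# Synthetic 1-D example: one sharp genuine feature in otherwise pure noise.
rng = np.random.default_rng(0)
x = np.arange(60)
signal = np.where(x == 30, 5.0, 0.0)
members = signal + rng.normal(0.0, 0.5, size=(10, x.size))

blended, k = adaptive_blend(members, moving_average)
print(f"weight on smoothed field at the sharp feature: {k[30]:.4f}")
```

In this synthetic example the weight at the sharp feature is close to zero, so the feature is left almost untouched, while locations where the ensemble mean is dominated by noise are free to receive much larger weights.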
The four adaptive smoothing methods we have tested, which have different smoothing kernels, perform similarly. We conclude that the choice of smoothing kernel is not particularly important.
For the data we have tested we conclude that using adaptive smoothing gives materially better estimates of the mean of the climate response in the ensemble than using either unsmoothed model output or non-adaptive smoothing. By better we mean higher precision, i.e., higher potential accuracy. The differences created by the smoothing are on small scales, up to around 200 km, and do not affect the continental-scale features of the climate response. The changes are quite localized, and a small number of regions show large changes due to the adaptive smoothing. The predictions show greater potential accuracy in terms of PRMSE, and the smoothed fields may help distinguish genuine forced signals from variability more effectively than the unsmoothed ensemble mean, or the ensemble mean smoothed using non-adaptive methods. We cannot, however, make any statements about whether the smoothing methods we have tested would work for other datasets, since the performance of the method is likely to depend on aspects of the behaviour of the ensemble (ensemble means, variances, length-scales and correlations) which would vary depending on the variable being considered, the model and the model resolution. In addition, the optimal amount of smoothing will depend on the size of the ensemble. Applying the adaptive smoothing methodology to datasets other than those we have considered would require testing using a cross-validation based methodology similar to the one that we have applied. One simplifying factor in doing so, relative to the study described here, is that our results show that it would not be necessary to test multiple versions of non-adaptive and adaptive smoothing, with different kernels, but just one of each.
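For readers wishing to run such a test on another dataset, the following is a minimal sketch of how a cross-validated PRMSE comparison might be set up. It assumes leave-one-out cross-validation, in which each ensemble member in turn is withheld and predicted from the others; the text above only specifies cross-validation within the ensemble, so the leave-one-out scheme, the function names, the moving-average smoother and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def loo_prmse(members, estimator):
    """Leave-one-out predictive RMSE of an estimator of the ensemble mean.

    members:   array of shape (n_members, ...), one field per model.
    estimator: callable mapping the retained members to a predicted field.
    """
    n = members.shape[0]
    sq_errors = []
    for i in range(n):
        # Withhold member i and predict it from the remaining members.
        retained = np.delete(members, i, axis=0)
        prediction = estimator(retained)
        sq_errors.append(np.mean((members[i] - prediction) ** 2))
    return float(np.sqrt(np.mean(sq_errors)))


def plain_mean(retained):
    """Unsmoothed ensemble mean of the retained members."""
    return retained.mean(axis=0)


def smoothed_mean(retained, width=5):
    """Ensemble mean followed by a stand-in 1-D moving-average smoother."""
    kernel = np.ones(width) / width
    return np.convolve(retained.mean(axis=0), kernel, mode="same")


# Synthetic ensemble: a smooth forced signal plus independent noise.
rng = np.random.default_rng(1)
truth = np.sin(np.linspace(0.0, np.pi, 400))
members = truth + rng.normal(0.0, 1.0, size=(8, 400))

baseline = loo_prmse(members, plain_mean)
relative = 100.0 * loo_prmse(members, smoothed_mean) / baseline
print(f"relative PRMSE of smoothed mean: {relative:.1f}%")
```

Expressing the result as a percentage of the PRMSE of the unsmoothed ensemble mean mirrors the way results are reported in Fig. 2 and Fig. 3. Any candidate smoother, adaptive or not, can be passed in as the estimator, provided it only uses the retained members.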
The size of the beneficial impact of any spatial smoothing method, in terms of increased precision, is limited by the fact that we are improving the estimate of an ensemble mean that is already reasonably precisely estimated, especially for large ensembles based on independent models (as discussed in Raisanen and Ylhaisi (2010)). However, smoothing the ensemble mean is still vastly cheaper in terms of resources than increasing the precision of the ensemble mean by building or improving climate models. It is also possible that smoothing might reduce bias versus reality, although we have not been able to evaluate that in this study.
https://doi.org/10.5194/npg-2022-7 Preprint. Discussion started: 22 February 2022 © Author(s) 2022. CC BY 4.0 License.