Comment on npg-2022-7

Model projections of future climate change are expected to be the least reliable on the smallest resolved (grid box) scales, where the effects of both model errors and internal variability are maximized. Earlier studies have explored the possibility of improving the projections by smoothing the climate model output in space but have concluded that this potential is small, for two reasons. First, the smoothing introduces a bias if the true climate change signal at the target location differs from that in its surroundings. Second, multi-model ensemble means are more difficult to improve by smoothing than the output from individual models, because averaging over multiple models implicitly acts as a spatial smoother.

In this manuscript, the authors show that the potential advantages of smoothing can be increased by an adaptive formulation, in which the original multi-model mean climate change (m) at the target grid box is replaced by a combination of m and p, where p is a weighted average of the model projections in a surrounding region and α is an adaptive parameter that controls the weighting. The value of α is derived theoretically and varies from grid box to grid box, whereas the size of the region over which p is calculated is the same everywhere. The novelty compared with earlier research is the introduction of α, which allows the magnitude of the smoothing to be varied based on the local properties of the model ensemble (i.e., the ensemble mean difference between m and p, the ensemble spreads of both m and p, and the covariance between m and p). The inter-model cross-validation conducted by the authors indicates that the adoption of a varying (0 to 1) α increases the optimal size of the domain over which p is calculated relative to the nonadaptive case, which always uses the same value of α (approaching 1, depending on the domain size). This, in turn, leads to a larger decrease in the cross-validated prediction error relative to the use of grid-box-scale climate changes with no smoothing.
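
As a rough illustration of the adaptive formulation summarized above, the sketch below assumes that the replacement is a linear blend, (1 - α)·m + α·p, so that α = 0 leaves the grid-box value unchanged and α = 1 replaces it by the regional average. This form is an assumption made for illustration only, as are the function name `blend` and all numbers. The theoretical derivation of α in the manuscript is not reproduced here.

```python
import numpy as np

def blend(m, p, alpha):
    """Assumed linear blend of the grid-box value m and the regional average p."""
    alpha = np.clip(alpha, 0.0, 1.0)  # alpha is stated to vary from 0 to 1
    return (1.0 - alpha) * m + alpha * p

# All values below are hypothetical and chosen only for illustration.
m = np.array([10.0, 25.0, 12.0])   # grid-box multi-model mean changes
p = np.array([12.0, 15.0, 12.5])   # regional weighted averages around each box
alpha = np.array([0.0, 0.5, 1.0])  # adaptive parameter, varying from box to box

blended = blend(m, p, alpha)
print(blended)
```

With these inputs, the first box keeps its original value, the third is fully replaced by its regional average, and the second lies halfway between the two.
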
The manuscript is very well written and the development of the theory of adaptive smoothing is clear and elegant. Consequently, my suggestions for improvement are generally small. However, there is one scientific issue that requires further elaboration. This is discussed in the next section of this review. The other, mostly very minor comments follow thereafter.

MAIN COMMENT: What leads to an improvement on the grid box scale might not do the same on a larger scale.
Inspection of the precipitation change maps in Figs. 4-7 reveals an interesting pattern. On one hand, the multi-model mean increase in precipitation (as shown in the a-panels) tends to be systematically larger over mountainous regions (e.g., the Alps, western Norway, and south-eastern Iceland) than elsewhere. On the other hand, the adaptive smoothing (f-panels) tends to systematically reduce the precipitation increase in the same areas. This is, of course, a direct consequence of the fact that the precipitation increase in the surrounding areas is smaller. Still, this seems undesirable because the larger increase over mountainous regions is physically plausible. Even if the relative (per cent) increase in precipitation were the same over the mountains and the surrounding flatlands, the larger baseline precipitation over the mountains would lead to a larger absolute increase.
Although the algorithm used to find the values of α and the horizontal scale over which the predictor p is calculated is likely optimal for minimizing the grid-scale mean square errors, these features suggest that this may not be the case when the interest is in larger-scale mean values (e.g., the average precipitation change over the Alps). For such larger-scale averages, the decrease in "random noise" likely becomes less important (because there is less noise to start with) relative to the biases that result from calculating p over a relatively wide area.
While it is easy to recognize this problem, it may not be as simple to solve it. Intuitively, the best solution might be a smoothing in which the size of the area over which p is calculated is adaptive and regionally variable (a possibility mentioned by the authors). Most likely, this area should be smaller where there are large regional variations in the multi-model mean climate change. However, as the formulation of such smoothing may be mathematically less straightforward, this falls beyond the scope of the present work.
An easier alternative worth checking might be to apply the smoothing to relative (per cent) rather than absolute precipitation changes, since the per cent changes are likely to show a less systematic difference between mountainous areas and their surroundings.
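
To make the concern concrete, the following toy sketch compares smoothing of absolute changes with smoothing of relative changes along a one-dimensional transect. Everything in it is hypothetical: the baseline values, the uniform 10 % increase, the three-point unweighted running mean (a simple stand-in for the regional average p, not the authors' weighting), and the helper name `running_mean`.

```python
import numpy as np

def running_mean(x, half_width=1):
    """Centred running mean; edge values are repeated at the boundaries."""
    padded = np.pad(x, half_width, mode="edge")
    width = 2 * half_width + 1
    return np.convolve(padded, np.ones(width) / width, mode="valid")

# Hypothetical baseline precipitation (mm/yr); index 2 represents a mountain.
baseline = np.array([800.0, 800.0, 2000.0, 800.0, 800.0])
relative_change = np.full(baseline.shape, 0.10)  # assumed uniform +10 %
absolute_change = baseline * relative_change     # 80 in lowlands, 200 on the mountain

# Option 1: smooth the absolute changes directly.
smoothed_absolute = running_mean(absolute_change)

# Option 2: smooth the relative changes, then convert back to absolute units.
smoothed_via_relative = running_mean(relative_change) * baseline

print("unsmoothed change at mountain:    ", absolute_change[2])
print("after smoothing absolute changes: ", smoothed_absolute[2])
print("after smoothing relative changes: ", smoothed_via_relative[2])
```

In this idealized case, smoothing the absolute changes reduces the increase at the mountain point from 200 to 120 mm/yr, whereas smoothing the relative changes leaves it at 200 mm/yr. Real projections will not have a spatially uniform relative change, so the actual benefit would need to be checked with the cross-validation used in the manuscript.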

MINOR COMMENTS
L126-131. Is this paragraph needed?
L298 and 312. What is the corresponding relative decrease in the PRMSE relative to the individual cross-verifying simulations (which was the statistic used by RY)? This is relevant under the statistically indistinguishable paradigm, which assumes that the real climate changes (including a contribution from internal variability) belong to the same statistical population as the model results, rather than being in the middle of this population.
L356-357. This is unsurprising, because the values of precipitation and therefore its change are also larger in mountainous areas.
L360. Please include the unit of precipitation change in the caption of Fig. 4.