References

NPG

Nonlinear Processes in Geophysics

Nonlin. Processes Geophys.

1607-7946

Copernicus Publications

Göttingen, Germany

10.5194/npg-15-661-2008

On reliability analysis of multi-categorical forecasts

Bröcker

Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Strasse 34, 01187 Dresden, Germany

6 August 2008

Vol. 15, No. 4, pp. 661–673

2008

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/3.0/

This article is available from https://npg.copernicus.org/articles/15/661/2008/npg-15-661-2008.html

The full text article is available as a PDF file from https://npg.copernicus.org/articles/15/661/2008/npg-15-661-2008.pdf

Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out. First, a uniform rank histogram is only a necessary condition for reliability. Second, if the forecast is assumed to be reliable, an indication is needed of how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability be analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing an indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a goodness-of-fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium-range forecasts for 2 m temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.
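The ideas summarized in the abstract can be illustrated with a minimal sketch. It is not the article's implementation. All function names are hypothetical, the cumulative binomial probability is one possible reading of "the probability of the observed frequency", and the relative entropy to the uniform distribution is used as an assumed stand-in for the goodness-of-fit statistic, whose exact form in the article may differ.

```python
import math
import random


def rank_of_observation(ensemble, observation):
    """Rank of the observation among the members: 0 .. len(ensemble)."""
    return sum(1 for member in ensemble if member < observation)


def rank_histogram(ensembles, observations):
    """Count how often the observation falls into each of the K+1 rank bins."""
    counts = [0] * (len(ensembles[0]) + 1)
    for ensemble, observation in zip(ensembles, observations):
        counts[rank_of_observation(ensemble, observation)] += 1
    return counts


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def count_probabilities(counts):
    """Cumulative binomial probability of each bin count, assuming every rank
    is equally likely (the reliability hypothesis). Values very close to 0
    or 1 flag bins that are unlikely under that hypothesis."""
    n = sum(counts)
    p = 1.0 / len(counts)
    return [binomial_cdf(c, n, p) for c in counts]


def relative_entropy_to_uniform(counts):
    """Assumed goodness-of-fit stand-in: Kullback-Leibler divergence between
    the observed rank frequencies and the uniform distribution."""
    n = sum(counts)
    k = len(counts)
    return sum((c / n) * math.log((c / n) * k) for c in counts if c > 0)


def stratify_by(values, n_strata):
    """Split forecast indices into n_strata groups of roughly equal size,
    ordered by a criterion value per forecast (hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    strata = [[] for _ in range(n_strata)]
    for position, index in enumerate(order):
        strata[position * n_strata // len(values)].append(index)
    return strata


if __name__ == "__main__":
    # Synthetic, reliable-by-construction example: members and observation
    # are drawn from the same distribution.
    random.seed(0)
    ensembles = [[random.gauss(0, 1) for _ in range(9)] for _ in range(200)]
    observations = [random.gauss(0, 1) for _ in range(200)]
    # Criterion used here: ensemble range (an assumption for illustration;
    # the article stratifies along the expected ranked probability score).
    spread = [max(e) - min(e) for e in ensembles]
    for stratum in stratify_by(spread, 2):
        counts = rank_histogram([ensembles[i] for i in stratum],
                                [observations[i] for i in stratum])
        print(counts, round(relative_entropy_to_uniform(counts), 4))
```

Computing the histogram separately for each stratum reflects the point that uniformity over all forecasts taken together does not by itself establish reliability.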

References

Anderson, J. L.: A method for producing and evaluating probabilistic forecasts from ensemble model integrations, J. Climate, 9, 1518–1530, 1996.

Breiman, L.: Probability, Addison-Wesley Publishing, 1973.

Bröcker, J.: Decomposition of Proper Scores, Tech. rep., Max-Planck-Institut für Physik komplexer Systeme, Dresden, arXiv:0806.0813 [physics.ao-ph], 2008.

Bröcker, J. and Smith, L. A.: Scoring Probabilistic Forecasts: The Importance of Being Proper, Weather and Forecasting, 22, 382–388, 2007a.

Bröcker, J. and Smith, L. A.: Increasing the Reliability of Reliability Diagrams, Weather and Forecasting, 22, 651–661, 2007b.

Devroye, L.: Non-Uniform Random Variate Generation, Springer Verlag, 1986.

Elmore, K. L.: Alternatives to the Chi-Square Test for Evaluating Rank Histograms from Ensemble Forecasts, Weather and Forecasting, 20, 789–795, 2005.

Epstein, E. S.: A scoring system for probability forecasts of ranked categories, J. Appl. Meteorol., 8, 985–987, 1969.

Gneiting, T. and Raftery, A.: Strictly Proper Scoring Rules, Prediction, and Estimation, J. Am. Statist. Assoc., 102, 359–378, 2007.

Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic Forecasts, Calibration, and Sharpness, Tech. rep., Department of Statistics, University of Washington, 2005.

Hamill, T. M.: Interpretation of rank histograms for verifying ensemble forecasts, Mon. Weather Rev., 129, 550–560, 2001.

Hamill, T. M. and Colucci, S. J.: Verification of Eta–RSM Short Range Ensemble Forecasts, Mon. Weather Rev., 125, 1312–1327, 1997.

Hamill, T. M. and Colucci, S. J.: Evaluation of Eta–RSM Ensemble Probabilistic Precipitation Forecasts, Mon. Weather Rev., 126, 711–724, 1998.

Hansen, J. and Smith, L.: Extending the Limits of Forecast Verification with the Minimum Spanning Tree, Mon. Weather Rev., 132, 1522–1528, 2004.

Mood, A. M., Graybill, F. A., and Boes, D. C.: Introduction to the Theory of Statistics, McGraw-Hill Series in Probability and Statistics, McGraw-Hill, 1974.

Murphy, A. H.: A note on the ranked probability score, J. Appl. Meteorol., 10, 155, 1971.

Murphy, A. H. and Winkler, R. L.: Reliability of Subjective Probability Forecasts of Precipitation and Temperature, Appl. Statist., 26, 41–47, 1977.

Murphy, A. H. and Winkler, R. L.: A General Framework for Forecast Verification, Mon. Weather Rev., 115, 1330–1338, 1987.

Talagrand, O., Vautard, R., and Strauss, B.: Evaluation of Probabilistic Prediction Systems, in: Workshop on Predictability, pp. 1–25, ECMWF, 1997.

Toth, Z., Talagrand, O., Candille, G., and Zhu, Y.: Probability and Ensemble Forecasts, in: Forecast Verification, edited by: Jolliffe, I. T. and Stephenson, D. B., chap. 7, pp. 137–163, John Wiley & Sons, Ltd., Chichester, 2003.

Toth, Z., Talagrand, O., and Zhu, Y.: The attributes of forecast systems: A framework for the evaluation and calibration of weather forecasts, in: Predictability of Weather and Climate, edited by: Palmer, T. N. and Hagedorn, R., pp. 584–595, Cambridge University Press, 2005.

Wilks, D. S.: Statistical Methods in the Atmospheric Sciences, vol. 59 of International Geophysics Series, Academic Press, second edn., 2006.