Articles | Volume 15, issue 4
Nonlin. Processes Geophys., 15, 661–673, 2008
https://doi.org/10.5194/npg-15-661-2008

  06 Aug 2008

On reliability analysis of multi-categorical forecasts

J. Bröcker
  • Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Strasse 34, 01187 Dresden, Germany

Abstract. Reliability analysis of probabilistic forecasts, in particular through the rank histogram or Talagrand diagram, is revisited. Two shortcomings are pointed out. Firstly, a uniform rank histogram is but a necessary condition for reliability. Secondly, if the forecast is assumed to be reliable, an indication is needed of how far a histogram is expected to deviate from uniformity merely due to randomness. Concerning the first shortcoming, it is suggested that forecasts be grouped or stratified along suitable criteria, and that reliability be analyzed individually for each forecast stratum. A reliable forecast should have uniform histograms for all individual forecast strata, not only for all forecasts as a whole. As to the second shortcoming, instead of the observed frequencies, the probability of the observed frequency is plotted, providing an indication of the likelihood of the result under the hypothesis that the forecast is reliable. Furthermore, a goodness-of-fit statistic is discussed which is essentially the reliability term of the Ignorance score. The discussed tools are applied to medium-range forecasts of 2 m temperature anomalies at several locations and lead times. The forecasts are stratified along the expected ranked probability score. Those forecasts which feature a high expected score turn out to be particularly unreliable.
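
To make the two ideas in the abstract concrete, the following is a minimal sketch in Python and not the paper's implementation. It builds rank histograms separately for each forecast stratum and, for every bin, reports the cumulative binomial probability of the observed count under the hypothesis of reliability (each rank equally likely). Several choices are assumptions made only for illustration: the synthetic Gaussian ensembles, stratifying by ensemble variance instead of the expected ranked probability score used in the paper, the use of the cumulative probability, and all function names.

```python
import math
import random


def rank_of_observation(ensemble, observation):
    """Number of ensemble members below the observation (0 .. len(ensemble))."""
    return sum(1 for member in ensemble if member < observation)


def rank_histogram(ensembles, observations):
    """Counts of observation ranks; an m-member ensemble gives m + 1 bins."""
    counts = [0] * (len(ensembles[0]) + 1)
    for ens, obs in zip(ensembles, observations):
        counts[rank_of_observation(ens, obs)] += 1
    return counts


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), clipped to 1 against round-off."""
    total = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
    return min(1.0, total)


def bin_probabilities(counts):
    """Cumulative binomial probability of each bin count if all ranks are equally likely."""
    n = sum(counts)
    p = 1.0 / len(counts)
    return [binomial_cdf(c, n, p) for c in counts]


def stratify(ensembles, observations, criterion, n_strata):
    """Split the forecasts into n_strata groups ordered by criterion(ensemble)."""
    order = sorted(range(len(ensembles)), key=lambda i: criterion(ensembles[i]))
    size = math.ceil(len(order) / n_strata)
    strata = []
    for s in range(n_strata):
        idx = order[s * size:(s + 1) * size]
        strata.append(([ensembles[i] for i in idx], [observations[i] for i in idx]))
    return strata


def ensemble_variance(ensemble):
    """Hypothetical stratification criterion (the paper uses the expected RPS)."""
    mean = sum(ensemble) / len(ensemble)
    return sum((x - mean) ** 2 for x in ensemble) / (len(ensemble) - 1)


# Synthetic, reliable-by-construction forecasts: members and observation
# are drawn from the same Gaussian, whose spread varies between forecasts.
random.seed(0)
ensembles, observations = [], []
for _ in range(600):
    centre, spread = random.gauss(0.0, 1.0), random.uniform(0.5, 2.0)
    ensembles.append([random.gauss(centre, spread) for _ in range(9)])
    observations.append(random.gauss(centre, spread))

for number, (ens, obs) in enumerate(stratify(ensembles, observations, ensemble_variance, 3)):
    counts = rank_histogram(ens, obs)
    probs = bin_probabilities(counts)
    print(f"stratum {number}: counts={counts}")
    print("  bin probabilities:", [round(q, 3) for q in probs])
```

Bin probabilities very close to 0 or 1 flag counts that would be unlikely for a reliable forecast, whereas values spread over the unit interval are consistent with reliability. The sketch ignores ties between observation and ensemble members, which continuous synthetic data do not produce but real, rounded data might.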