The tail probability,

Not all geophysical variables obey a Gaussian (normal) distribution. This is
true not only for the central part, but also for the extremal part (tail) of
a distribution. Instead of a Gaussian exponential behaviour, we often observe
a Pareto tail (power law) with a distribution function,

Theoretical explanations of the heavy tail behaviour rest on multiplicative
or non-linear interaction of variables in complex geophysical systems. Such
derivations exist, for example, for the variables rainfall

Many statistical distributions have heavy tails. A particularly useful class
of those are the stable distributions

The accurate statistical estimation of the heavy tail index on the basis of a
set of data,

Order selection constitutes a statistical trade-off problem

We assess as minor the fact that the Hill estimator is not translation
invariant (to shifts in

There exist other estimators of

Processes in complex geophysical systems may exhibit not only heavy tail
behaviour, but also persistence in the time domain. Let

We aim here for a heavy tail parameter estimation that is accurate, widely
applicable and robust (i.e. reliable even when some underlying assumptions
are not met). The selection

To repeat the ingredients of the statistical problem, let

Algorithm

Optimal order selection for the Hill estimator.

calculate

calculate

generate

generate

calculate

calculate

calculate the measure,
RMSE

select arg min
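The details of the steps above did not survive in this text, so the following is only a minimal sketch of one possible reading: for each candidate order, compute the Hill estimate, simulate surrogate series with that tail index, and keep the order with the smallest simulated RMSE. The function names (`hill_alpha`, `select_order`), the parameter `n_sim` and the i.i.d. Pareto surrogate are assumptions made for illustration. In particular, the surrogate is a simple swap-in and not the AR(1) process with stable distributed innovations discussed in this paper.

```python
import math
import random


def hill_alpha(sample, k):
    """Hill estimate of the tail index from the k largest values of a positive sample."""
    x = sorted(sample, reverse=True)          # x[0] >= x[1] >= ...
    if not 1 <= k < len(x):
        raise ValueError("order k must satisfy 1 <= k < n")
    threshold = x[k]                          # the (k+1)-th largest value
    log_excess = sum(math.log(x[i] / threshold) for i in range(k))
    return k / log_excess


def select_order(sample, candidates, n_sim=200, seed=0):
    """Return the order with the smallest simulated RMSE (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    best_k, best_rmse = None, float("inf")
    for k in candidates:
        alpha_k = hill_alpha(sample, k)       # estimate from the data at order k
        sq_err = 0.0
        for _ in range(n_sim):
            # ASSUMPTION: i.i.d. Pareto surrogate, not the AR(1)/stable model of the text
            sim = [rng.paretovariate(alpha_k) for _ in range(n)]
            sq_err += (hill_alpha(sim, k) - alpha_k) ** 2
        rmse = math.sqrt(sq_err / n_sim)
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse


# Usage on artificial data with a prescribed tail index of 1.5
data_rng = random.Random(42)
data = [data_rng.paretovariate(1.5) for _ in range(500)]
k_opt, rmse_opt = select_order(data, candidates=range(10, 200, 10), n_sim=100)
```

For an exact Pareto surrogate the simulated RMSE tends to shrink as the order grows, so this toy setting does not reproduce the bias-variance trade-off that makes order selection difficult for real data; it only shows the mechanics of a brute force search.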

Algorithm

In the context of heavy tail index estimation,
Algorithm

We compare the optimal order selector for the Hill estimator
(Algorithm

Monte Carlo experiment on order selection for the Hill estimator,

prescribe

draw spacing,

scale

set

generate

select order,

calculate

calculate
RMSE
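A Monte Carlo experiment of this kind can be sketched as follows, under stated assumptions: a tail index is prescribed, series of several data sizes are generated, an order is selected, and the RMSE of the Hill estimate against the prescribed value is recorded. The names `monte_carlo_rmse` and `n_sim`, the i.i.d. Pareto generator and the square-root order rule are placeholders chosen for illustration; they are not the design or the selectors compared in this paper, and the spacing and scaling steps are omitted.

```python
import math
import random


def hill_alpha(sample, k):
    """Hill estimate of the tail index from the k largest values of a positive sample."""
    x = sorted(sample, reverse=True)
    threshold = x[k]                          # the (k+1)-th largest value
    return k / sum(math.log(x[i] / threshold) for i in range(k))


def monte_carlo_rmse(alpha_true, sizes, n_sim=300, seed=1):
    """RMSE of the Hill estimate for each data size (hypothetical helper)."""
    rng = random.Random(seed)
    result = {}
    for n in sizes:
        # ASSUMPTION: placeholder order rule k = floor(sqrt(n)), not the optimal selector
        k = max(2, int(math.sqrt(n)))
        sq_err = 0.0
        for _ in range(n_sim):
            # ASSUMPTION: i.i.d. Pareto data with the prescribed tail index
            sim = [rng.paretovariate(alpha_true) for _ in range(n)]
            sq_err += (hill_alpha(sim, k) - alpha_true) ** 2
        result[n] = math.sqrt(sq_err / n_sim)
    return result


rmse_by_n = monte_carlo_rmse(alpha_true=1.5, sizes=[100, 400, 1600])
```

Replacing the placeholder order rule by another selector, while keeping the generated series fixed through the seed, would allow selectors to be compared on equal terms.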

The first competitor as order selector is based on the asymptotic normality
of

The second competitor
aims to improve the selector based on asymptotic normality
by estimating the AMSE via a computing-intensive bootstrap resampling procedure

There exists also the suggestion to look for a plateau of the sequence

RMSE of the estimated heavy tail parameter in dependence on
data size for the Hill estimator and various order selectors:
optimal (Algorithm

The results (Fig.

To adapt the preface to our book on climate time series analysis

The Monte Carlo experiment (Sect.

A note on the selection of

The application of heavy tail estimation to artificial data (Fig.

Application of heavy tail estimation with optimal order selection to
artificial data:

It is remarkable that the sequence

The resulting estimates with RMSE error bars
from

The good agreement between data and fit is also reflected by the good agreement
between data histograms and fitted densities
(Fig.

One caveat to consider is the fact that the prescribed density of the process
that generated the data (Fig.

The application of heavy tail estimation to observed data (Fig.

Application of heavy tail estimation with optimal order selection to
observed data:

Due to excessive computing costs associated with a brute force search for

The resulting estimates with RMSE bars from

For the hydrological interpretation of the statistical results, not only the
error bars (RMSE

In the case of persistence estimation of runoff series, an alternative to the
AR(1) model may be a long-memory model

In the case of heavy tail index estimation,
we think that the employed stable distribution model class
already captures the true distribution
(Fig.

However, the possibility of model mis-specification prevents us at this stage
of the analysis from concluding unambiguously that with

The tail probability,

The accurate estimation of

The new selector is claimed to utilize the data in an optimum way for
performing an estimation. The resulting error bars
(RMSE

The data-generating process (AR(1) with stable distributed innovations)
achieves “distributional robustness” because the full distribution does not
need to follow Eq. (

However, at this stage of method development, it is still useful to perform
more Monte Carlo simulation studies on heavy tail index estimation. These
simulations should include varied designs, in particular, prescribed shapes
other than a stable distribution. Furthermore, it is interesting to study
estimators other than Hill (on which this paper focuses). The computer
program associated with optimal index estimation (ht) also implements
the estimation routine following

The application to an observed, hydrological time series (Fig.

The wider impact of optimal heavy tail estimation may extend beyond
instrumental environmental measurements to
reconstructed variables from the areas of paleoclimatology

The code (Fortran 90 source, Windows executable and
auxiliary files) and a manual are available at

MAB wrote the initial software version
and carried out the Monte Carlo experiment (Fig.

The authors declare that they have no conflict of interest.

Please see the source code or the manual for the disclaimer.

We thank two anonymous referees for helpful reviews. We thank Mersku Alkio (RBZ Wirtschaft, Kiel, Germany) and Mark M. Meerschaert (Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA) for comments on the draft manuscript. We thank the BfG for supplying us with the Elbe runoff time series. This work is supported by the European Commission via Marie Curie Initial Training Network LINC (project number 289447) under the Seventh Framework Programme.

Edited by: Jinqiao Duan
Reviewed by: two anonymous referees