<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0"><?xmltex \hack{\hyphenpenalty= 8000}?><?xmltex \hack{\sloppy}?>
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">NPGD</journal-id>
<journal-title-group>
<journal-title>Nonlinear Processes in Geophysics Discussions</journal-title>
<abbrev-journal-title abbrev-type="publisher">NPGD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Nonlin. Processes Geophys. Discuss.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2198-5634</issn>
<publisher><publisher-name>Copernicus GmbH</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/npgd-2-1363-2015</article-id><title-group><article-title>Identifying non-normal and lognormal characteristics of temperature, mixing ratio, surface pressure, and wind for data assimilation systems</article-title>
      </title-group><?xmltex \runningtitle{Non-normal atmospheric variables}?><?xmltex \runningauthor{A.~J. Kliewer et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Kliewer</surname><given-names>A. J.</given-names></name>
          <email>anton.kliewer@colostate.edu</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Fletcher</surname><given-names>S. J.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Jones</surname><given-names>A. S.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Forsythe</surname><given-names>J. M.</given-names></name>
          
        </contrib>
        <aff id="aff1"><institution>Cooperative Institute for Research in the Atmosphere, Colorado State University,
1375 Campus Delivery, Fort Collins, CO 80523-1375, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">A. J. Kliewer (anton.kliewer@colostate.edu)</corresp></author-notes><pub-date><day>4</day><month>September</month><year>2015</year></pub-date>
      
      <volume>2</volume>
      <issue>5</issue>
      <fpage>1363</fpage><lpage>1405</lpage>
      <history>
        <date date-type="received"><day>10</day><month>July</month><year>2015</year></date>
           <date date-type="accepted"><day>11</day><month>August</month><year>2015</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://npg.copernicus.org/articles/.html">This article is available from https://npg.copernicus.org/articles/.html</self-uri>
<self-uri xlink:href="https://npg.copernicus.org/articles/.pdf">The full text article is available as a PDF file from https://npg.copernicus.org/articles/.pdf</self-uri>


      <abstract>
    <p>Data assimilation systems and retrieval systems that are based upon
a maximum likelihood estimation, many of which are in operational
use, rely on the assumption that all of the errors and variables
involved follow a normal distribution.  This work develops a series
of statistical tests to show that mixing ratio, temperature, wind
and surface pressure follow non-normal, or in fact lognormal,
distributions, thus impacting the design basis of many operational
data assimilation and retrieval systems.  For this study, one year of
Global Forecast System 00:00 UTC 6 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">h</mml:mi></mml:math></inline-formula> forecasts was analyzed
using statistical hypothesis tests. The motivation of this work is
to identify the need to resolve whether or not the assumption of
normality is valid and to give guidance for where and when a data
assimilation system or a retrieval system needs to adapt its cost
function to the mixed normal-lognormal distribution-based Bayesian
model. The statistical methods of detection are based upon the
Shapiro–Wilk, Jarque–Bera, and <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> tests, together with a new composite
indicator that uses all three measures.  Another method of detection
fits distributions to the temporal-based histograms of temperature,
mixing ratio, and wind.  The conclusion of this work is that there
are persistent areas, times, and vertical levels where the normal
assumption is not valid, and that the lognormal distribution-based
Bayesian model is observationally justified to minimize the error
for these conditions.  The results herein suggest that comprehensive
statistical climatologies may need to be developed to capture the
non-normal traits of the 6 h forecast.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>It has been documented several times that there are variables in the
atmosphere that come from non-normal distributions (Biondini, 1976;
López, 1977; Mielke et al., 1977; Toth and Szentimrey, 1990;
Sauvageot, 1994; Yang and Pierrehumbert, 1994; Miles et al., 2000;
O'Neill et al., 2000; Harmel et al., 2002; Stephens et al., 2002;
Foster and Bevis, 2003; Zhang et al., 2003; Cho et al., 2004; Sengupta
et al., 2004; Foster et al., 2006; Perron and Sura, 2013). Fletcher (2010)
shows that atmospheric variables may come from different
probability distributions depending on the season. If this is the case,
then the variables' inherent distributions could also be conditioned
on large-scale climatic dynamics.</p>
      <p>If a testing procedure can be established that determines the nature
of a variable, then an appropriate analysis scheme may be chosen to
suit the variable.  Many numerical weather prediction centers use
some form of variational data assimilation, or a Kalman filter, in
their analysis and forecasting schemes, and these methods depend on a normal
distribution assumption for the error description.  These centers
include the Met Office (Rawlins et al., 2000), the European Centre for
Medium-Range Weather Forecasts (Rabier et al., 2000),
Météo-France (Fischer et al., 2005), Meteorological Service
of Canada (Gauthier et al., 2007), the Naval Research Laboratory (NRL)
Atmospheric Variational Data Assimilation System-Accelerated
Representer (Rosmond and Xu, 2006), and the National Centers for
Environmental Prediction's Gridpoint Statistical Interpolation
(Kleist et al., 2009).  For more thorough reviews of variational data
assimilation see Fletcher (2010) and Fletcher and Jones (2014).</p>
      <p>In addition to operational data assimilation systems, the normal
distribution assumption for the modeling of errors is also made for
satellite retrieval systems, for example in the National Oceanic and
Atmospheric Administration's Microwave Integrated Retrieval System
(MiRS) (Boukabara et al., 2011), in which a logarithmic transform is
used to convert a lognormally distributed variable into a normally
distributed variable.  This transform approach is also used in the
Canadian Middle Atmosphere Model to make the state more nearly
normally distributed (Polavarapu et al., 2005).  While moment
statistics have been used to analyze atmospheric variables (Perron and
Sura, 2013), as of this writing the authors are unaware of any testing
procedure attempting to classify the statistical framework of mixing
ratio, temperature, wind and surface pressure.</p>
      <p>Previous studies have shown that variables
including precipitation (Biondini, 1976; Mielke et al., 1977;
Sauvageot, 1994; Cho et al., 2004), total precipitable water (Foster
and Bevis, 2003; Foster et al., 2006), extreme temperatures (Toth and
Szentimrey, 1990; Harmel et al., 2002), cloud and radar echo
populations (López, 1977), cloud droplet size (Miles et al., 2000),
liquid water path (Sengupta et al., 2004; Stephens et al., 2002),
aerosol optical depth (O'Neill et al., 2000), tropical water vapor
(Zhang et al., 2003), and relative humidity (Yang and Pierrehumbert,
1994) do not conform to a normal distribution.
A climatology of nine variables' distributional
characteristics is analyzed in Perron and Sura (2013).  Some of the
studies considered spatial data while others used time-series data.
These studies used a variety of techniques to quantify the nature of
these distributions including probability density-fitting via moment
calculations, <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> goodness-of-fit tests, moment statistics, the
Shapiro–Wilk test, and simply plotting histograms with prominent
non-normal distribution features.</p>
      <p>Across a variety of disciplines it is often convenient, and seemingly
innocuous, to treat measured variables as normally distributed in
nature.  This can misrepresent the inherent summary statistics due to
a loss of information (e.g., lack of higher statistical moment
information), and can be harmful within certain applications of the
data.  If a model, or algorithm, incorrectly assumes that a random
variable is normally distributed then the properties of this
distribution may skew its output.</p>
      <p>A variable's probability distribution dictates the probabilistic solution
that is found when using data assimilation techniques. In 3-D variational
assimilation the cost function is given by

              <disp-formula id="Ch1.E1" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi>J</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">B</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="bold-italic">h</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">R</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="bold-italic">h</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> is the background error covariance matrix, <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula> is
the observational error covariance matrix, <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> is the vector of
observations, and <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">h</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the non-linear observation operator.
The background component of Eq. (1) includes the difference between the
minimizing solution <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> and the background distribution
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. If both <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are assumed
to be independent and normally distributed, then the difference of these variables
is also a normally distributed random variable. If these variables are not
normally distributed, and in particular are lognormally distributed, then their
difference is not a normally distributed random variable (Fletcher, 2010);
however, the ratio of the variables is a lognormally distributed random
variable (Casella and Berger, 2002). It is an implicit model assumption that
the solution <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> and the background <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> come from the
same probability distribution. It should also be noted that if the “errors”
are assumed to be unbiased then
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mtext>true</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, which means that
the expected value of the minimizing solution <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula> and background
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are centered at the same point but have different spread
(variance) and therefore skewness. It has been previously observed that
mixing ratio errors are not normally distributed (Dee and da Silva, 2003) and
in fact are lognormally distributed (Daley and Barker, 2001).</p>
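      <p>As an illustration only, Eq. (1) can be evaluated numerically as in the following minimal Python sketch. The function name <monospace>cost_3dvar</monospace>, the two-variable state, the identity covariance matrices, and the identity observation operator are hypothetical choices made for this example; they are not taken from any operational system or from the cited works.</p>
<preformat><![CDATA[
```python
import numpy as np


def cost_3dvar(x, xb, B, y, R, h):
    """Evaluate the normal-based 3-D variational cost function of Eq. (1).

    x, xb : state and background vectors
    B, R  : background and observational error covariance matrices
    y     : observation vector
    h     : callable observation operator (may be non-linear)
    """
    dxb = x - xb        # additive background departure
    dy = y - h(x)       # additive observation departure
    # Solve linear systems rather than forming explicit inverses.
    jb = 0.5 * dxb @ np.linalg.solve(B, dxb)
    jo = 0.5 * dy @ np.linalg.solve(R, dy)
    return jb + jo


# Hypothetical two-variable example with direct observations.
xb = np.array([1.0, 2.0])
y = np.array([1.5, 1.5])
B = np.eye(2)
R = np.eye(2)

J_at_background = cost_3dvar(xb, xb, B, y, R, lambda state: state)
J_at_midpoint = cost_3dvar(0.5 * (xb + y), xb, B, y, R, lambda state: state)
```
]]></preformat>
      <p>Both terms in this sketch penalize additive departures, which is the form implied by the normal assumption. A lognormal formulation would instead work with ratios of the variables, as noted above, so the choice of distribution changes which state minimizes the cost function.</p>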
      <p>Biases could also be introduced in data assimilation and retrieval
systems that assume variables, and hence their errors, are normally
distributed when they actually follow a non-normal distribution in
nature.  A clear example of where this can be problematic is if
a computed value is physically impossible, such as relative humidity
taking a negative value.  This dubious value may be incorrectly
incorporated into the analyses, or reset to a lower bound near zero.
In either case this is certainly less desirable than solving for the
correct value using an appropriate scheme that incorporates its
correct underlying probability distribution.  Recently, mixed
normal-lognormal variational data assimilation methods have been
developed in 3-D (Fletcher and Zupanski, 2006a, b, 2007) and in 4-D
(Fletcher, 2010).  These initial full-field formulations were not
consistent with the current operational incremental
configurations. However, a derivation and testing of a mixed
multiplicative and additive incremental 3-D- and 4-D-VAR for a control
vector that contains both normal and lognormally distributed variables
is presented in Fletcher and Jones (2014).</p>
      <p>Evidence of how an assimilation scheme improves based on the distribution of
the observational errors is shown in Fletcher and Jones (2014). Using the
Lorenz '63 chaotic model the authors show that a lognormal-based cost
function performs better than the current normal formulation given lognormal
errors. Those conclusions result from testing observations of varying
accuracy, sparseness in time, and over different window lengths.</p>
      <p>Given that there is now a mathematical framework for assimilating mixed
normal-lognormally distributed variables/errors, techniques are needed that
can inform the user of a mixed system when to switch between a fully normal
distribution-based version and a mixed normal-lognormal-based version to
optimize the performance of the system and to make it consistent with the
“current” observed probabilistic behavior.</p>
      <p>Therefore, the motivation of this work is to design a set of tests that can
be performed offline between cycles or windows such that the configuration
for the approximation for the background error covariance matrix, cost
function, Jacobian and approximations to the Hessian, if used, can be ready
for the next minimization step.</p>
      <p>Given the motivation to detect a non-normal, specifically a lognormal signal,
we use 1 <inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> resolution data from the National Oceanic and Atmospheric
Administration (NOAA) Global Forecast System (GFS) 00:00 UTC 6 h forecast
between 1 January 2005 and 31 December 2005 defined on a <inline-formula><mml:math display="inline"><mml:mrow><mml:mn>181</mml:mn><mml:mo>×</mml:mo><mml:mn>360</mml:mn></mml:mrow></mml:math></inline-formula> grid.
The forecasts, which are the GFS outputs, at each grid point form the time
series. The data is analyzed over the entire year as well as on a
“seasonal” basis by considering 3 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">months</mml:mi></mml:math></inline-formula> at a time (January–March,
April–June, July–September, and October–December). The sample sizes are
consistent with the suggested size from Croarkin and Tobias (1999) of
<inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi>n</mml:mi><mml:mfrac><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">5</mml:mn></mml:mfrac></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of observations available.
The chosen variables include mixing ratio, temperature, surface pressure and
wind at levels 100, 200, 300, 500, 700, 850, and 1000 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>.</p>
      <p>Operational centers employ transformation techniques for
moisture (Bocquet et al., 2010); for example, the Navy Operational Global Atmospheric
Prediction System (NOGAPS) previously used the logarithm of specific humidity
(Eckermann et al., 2004), which is equivalent to the mixing ratio (Dee and da
Silva, 2003) analyzed in this study. In Fletcher and Zupanski (2007) it is
shown that a logarithmic transformation finds the median in multivariate
lognormal space, which is positively biased relative to the mode, or the most
likely state.</p>
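      <p>To make the direction of this bias concrete, consider the univariate case, which is a simplifying assumption made here (the cited result is multivariate). For a lognormal variable whose logarithm has mean μ and variance σ<sup>2</sup>, the standard expressions for the mode, median, and mean are</p>
      <disp-formula><tex-math><![CDATA[
\mathrm{mode} = e^{\mu-\sigma^{2}} \;<\; \mathrm{median} = e^{\mu} \;<\; \mathrm{mean} = e^{\mu+\sigma^{2}/2} .
]]></tex-math></disp-formula>
      <p>In this univariate setting the median exceeds the mode by a factor of exp(σ<sup>2</sup>), so the gap between the state recovered by a logarithmic transform and the most likely state grows with the variance of the logarithm.</p>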
      <p>In this work we propose using easily calculable statistics and hypothesis
testing to show that the variables described above exhibit strong evidence of
non-normal or, more specifically, lognormal behavior. The
hypothesis tests considered in this paper include the Jarque–Bera,
Shapiro–Wilk, and <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> goodness-of-fit. In addition, a composite
hypothesis test is proposed that includes all of the decisions made by the
aforementioned tests. Such tests are a requirement of advanced methods
(Fletcher and Zupanski, 2007; Fletcher, 2010; Song et al., 2012; Fletcher and
Jones, 2014) that are able to use multiple probability models.</p>
      <p>The remainder of this paper is organized as follows: Sect. 2
describes the formulation of the hypothesis tests as well as the test
statistics. In Sect. 3 results of these tests are presented. In Sect. 4
conclusions and a discussion of the results of Sect. 3 are presented.</p>
</sec>
<sec id="Ch1.S2">
  <title>Statistical methods</title>
      <p>In this section the statistical methods that are used to detect a non-normal
distribution signal are presented along with tests to see if the distribution
is a lognormal distribution. The random sample <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:mi>X</mml:mi></mml:mrow></mml:math></inline-formula> of
independent and identically distributed (iid) observations is taken from the
GFS data for each of the hypothesis tests, all of which use a significance
level of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.01</mml:mn></mml:mrow></mml:math></inline-formula>. This value of <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> indicates a 99 %
confidence-level in the results of the testing procedures.</p>
      <p>The samples' autocorrelation has been checked in order to verify the iid
assumption for the hypothesis tests. While there is some autocorrelation in
the samples, we attempt to minimize its effect by choosing such a small
<inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>. Histograms of the data are also presented in order to verify the
validity of the results of the hypothesis tests. The iid assumption
on any data set found in nature is difficult to assert and it is also noted
that many methods, including the National Meteorological Center (NMC) method
(Parrish and Derber, 1992), make no correction for autocorrelation.</p>
<sec id="Ch1.S2.SS1">
  <title>Hypotheses</title>
      <p>For the Shapiro–Wilk and the Jarque–Bera tests (Hain, 2010) the following
hypotheses are defined, with <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>,

                <disp-formula specific-use="align" content-type="numbered"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E2"><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>:</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:msqrt><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mfrac><mml:mi>exp⁡</mml:mi><mml:mfenced close=")" open="("><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo><mml:mspace width="2em" linebreak="nobreak"/><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo>&lt;</mml:mo><mml:mi>x</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>:</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>≠</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:msqrt><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mfrac><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo><mml:mspace width="2em" linebreak="nobreak"/><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo>&lt;</mml:mo><mml:mi>x</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
      <p>In all subsequent presentations of results a returned value of <inline-formula><mml:math display="inline"><mml:mn mathvariant="normal">0</mml:mn></mml:math></inline-formula> in
a hypothesis test indicates that the null hypothesis cannot be rejected at
the <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-level. A value of <inline-formula><mml:math display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula> indicates that the null hypothesis is
rejected in favor of the alternative hypothesis. The subtlety of this
framework is important: while the data may in fact originate
from a non-normal distribution, there may not be enough evidence in the data
to support the claim that it is not normally distributed, in which case the
result of the hypothesis test will be <inline-formula><mml:math display="inline"><mml:mn mathvariant="normal">0</mml:mn></mml:math></inline-formula>. It is assumed that the null
hypothesis is true prior to the test, thus putting the burden of proof on the
alternative hypothesis, with the choice of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.01</mml:mn></mml:mrow></mml:math></inline-formula> indicating that the
testing procedures are very conservative. While one of the aims of this paper
is to investigate the possibility of mixing ratio, temperature, wind and
surface pressure following a lognormal distribution, this conclusion is not
possible with the previous hypotheses. Therefore, a <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> goodness-of-fit
test is defined with the following hypotheses, with <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>,

                <disp-formula specific-use="align" content-type="numbered"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E3"><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>:</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>x</mml:mi><mml:msqrt><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mi>ln⁡</mml:mi><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="2em"/><mml:mi>x</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>:</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>≠</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>x</mml:mi><mml:msqrt><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac><mml:mi>exp⁡</mml:mi><mml:mfenced close=")" open="("><mml:mo>-</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mi>ln⁡</mml:mi><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="2em"/><mml:mi>x</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0 .</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
      <p>To combine both sets of hypotheses, a new “composite test” is
defined. If both the Shapiro–Wilk and the Jarque–Bera tests
reject <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> in favor of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> test fails to reject <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>,
the composite test returns a value of <inline-formula><mml:math display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula>; otherwise it returns <inline-formula><mml:math display="inline"><mml:mn mathvariant="normal">0</mml:mn></mml:math></inline-formula>. This is meant
to be a very strict test: a positive result requires evidence that the data do not come
from a normal distribution and no evidence against a lognormal distribution.</p>
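      <p>The decision rule of the composite test can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used to produce the results in this paper: the function name <monospace>composite_test</monospace> and the convention of summarizing each test by its p value are assumptions, and the default significance level of 0.01 is the value quoted in the results.</p>
      <preformat>
```python
def composite_test(p_sw, p_jb, p_chi2, alpha=0.01):
    """Return 1 for a positive composite result and 0 otherwise.

    p_sw, p_jb : p values of the Shapiro-Wilk and Jarque-Bera tests (H0: normal)
    p_chi2     : p value of the chi-squared test (H0: lognormal)
    alpha      : significance level; a test rejects H0 when its p value is below alpha
    """
    rejects_normal = alpha > p_sw and alpha > p_jb   # both normality tests reject H0
    keeps_lognormal = p_chi2 >= alpha                # lognormal H0 is not rejected
    return 1 if (rejects_normal and keeps_lognormal) else 0


# Both normality tests reject, lognormal hypothesis survives: positive result.
print(composite_test(0.001, 0.002, 0.50))
# Jarque-Bera does not reject normality: negative result.
print(composite_test(0.001, 0.500, 0.50))
```
</preformat>
      <p>Because a positive result requires all three conditions at once, the frequency of positive composite results can never exceed the frequency of any single condition.</p>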
      <p>Rather than only reporting the skewness and kurtosis of a particular time series,
as in Perron and Sura (2013), this information is used to make a decision
about the distribution. While the structure of a hypothesis test includes
a preconception about the data, multiple tests are combined to
test both directions of the normality assumption. This design is intended to
guard against falsely identifying data as lognormally distributed. The
authors are not aware of this technique having been previously applied.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <title>Shapiro–Wilk</title>
      <p>Let <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>n</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> be the order statistics of the random variable
<inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> the sample mean, where the order statistic of rank <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>
is the <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th smallest value in <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula>, denoted by <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Define the vector
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the
associated expected values of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, and let <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">V</mml:mi></mml:math></inline-formula> be the
covariance matrix of the order statistics. The expected value and covariance
matrix for a random variable <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> with probability density function <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are
given by

                <disp-formula specific-use="align" content-type="numbered"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E4"><mml:mtd/><mml:mtd/><mml:mtd><mml:mrow><mml:mi>E</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∫</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:mi>x</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="normal">d</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E5"><mml:mtd/><mml:mtd/><mml:mtd><mml:mrow><mml:mi>V</mml:mi><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mfenced open="[" close="]"><mml:mfenced open="(" close=")"><mml:mi mathvariant="bold">X</mml:mi><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mfenced close=")" open="("><mml:mi mathvariant="bold">X</mml:mi></mml:mfenced></mml:mfenced><mml:msup><mml:mfenced close=")" open="("><mml:mi mathvariant="bold">X</mml:mi><mml:mo>-</mml:mo><mml:mi>E</mml:mi><mml:mfenced close=")" open="("><mml:mi mathvariant="bold">X</mml:mi></mml:mfenced></mml:mfenced><mml:mi mathvariant="normal">T</mml:mi></mml:msup></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
      <p>Then the Shapiro–Wilk (SW) test statistic is given by

                <disp-formula id="Ch1.E6" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi mathvariant="normal">SW</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi>a</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where

                <disp-formula id="Ch1.E7" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">V</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">V</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mi mathvariant="bold">V</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi></mml:mfenced><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:msup></mml:mrow></mml:mfrac><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>A thorough mathematical explanation of this statistic is presented in Hain
(2010). Razali and Wah (2011) found that the Shapiro–Wilk test
has greater power than the Kolmogorov–Smirnov, Lilliefors, and
Anderson–Darling tests for both symmetric and non-symmetric distributions,
with the power of each test depending on sample size. The power of a test is the probability of not
committing a Type II error, which occurs when <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is false but is
incorrectly not rejected (Casella and Berger, 2002).</p>
</sec>
<sec id="Ch1.S2.SS3">
  <title>Jarque–Bera</title>
      <p>Clear differences between the normal and lognormal distributions include
skewness and kurtosis. Skewness measures the asymmetry of
a distribution. This statistic can be positive, negative, or zero and is the
third standardized moment of a random variable's probability distribution. Kurtosis, the
fourth standardized moment, measures how peaked the distribution is. Descriptions of these
statistics can be found in Casella and Berger (2002). The Jarque–Bera test
combines these statistics to assess the goodness of fit to a normal
distribution. If the distribution is normal, then asymptotically the
Jarque–Bera (JB) test statistic has a <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> distribution with two degrees
of freedom and is given by

                <disp-formula id="Ch1.E8" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi mathvariant="normal">JB</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mi>n</mml:mi><mml:mn mathvariant="normal">6</mml:mn></mml:mfrac><mml:mfenced open="(" close=")"><mml:msup><mml:mi>S</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">4</mml:mn></mml:mfrac><mml:msup><mml:mfenced close=")" open="("><mml:mi>K</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          with the sample skewness and sample kurtosis given by

                <disp-formula specific-use="align" content-type="numbered"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E9"><mml:mtd/><mml:mtd><mml:mi>S</mml:mi></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mfenced><mml:mfrac><mml:mn mathvariant="normal">3</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:msup></mml:mrow></mml:mfrac><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E10"><mml:mtd/><mml:mtd><mml:mi>K</mml:mi></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">4</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mfrac><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
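      <p>As a worked illustration, the statistic in Eq. (8) can be computed directly from Eqs. (9) and (10). The following Python sketch uses only the standard library; the function name <monospace>jarque_bera</monospace> is an assumption made for this example and is not the implementation used for the results in this paper. The p value relies on the fact that the survival function of a chi-squared distribution with two degrees of freedom is exp(-x/2).</p>
      <preformat>
```python
import math


def jarque_bera(x):
    """Return (S, K, JB, p) for a sequence of numbers x."""
    n = len(x)
    mean = sum(x) / n
    # Central moments with a 1/n normalization, matching Eqs. (9) and (10).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    s = m3 / m2 ** 1.5          # sample skewness S
    k = m4 / m2 ** 2            # sample kurtosis K (equal to 3 for a normal density)
    jb = n / 6.0 * (s ** 2 + 0.25 * (k - 3.0) ** 2)
    # Asymptotic p value: chi-squared survival function with two degrees of freedom.
    p = math.exp(-jb / 2.0)
    return s, k, jb, p


s, k, jb, p = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(s, k, jb, p)
```
</preformat>
      <p>The p value is asymptotic, so for short time series it should be read as an approximation only.</p>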
</sec>
<sec id="Ch1.S2.SS4">
  <title>Chi-squared</title>
      <p>With the null hypothesis of the chi-squared test being that the data come
from a lognormal distribution, the test statistic compares the expected, <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
and observed, <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>O</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, counts in <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> bins of data. The expected
frequency for each bin is given by

                <disp-formula id="Ch1.E11" content-type="numbered"><mml:math display="block"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>n</mml:mi><mml:mfenced close=")" open="("><mml:mi>F</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">u</mml:mi></mml:msub></mml:mfenced><mml:mo>-</mml:mo><mml:mi>F</mml:mi><mml:mfenced close=")" open="("><mml:msub><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">l</mml:mi></mml:msub></mml:mfenced></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> is the cumulative distribution function for the lognormal
distribution and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">u</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the upper and lower
limits for class <inline-formula><mml:math display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the sample size. The statistic, which is
compared against the <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> distribution, is given by

                <disp-formula id="Ch1.E12" content-type="numbered"><mml:math display="block"><mml:mrow><mml:msup><mml:mi mathvariant="italic">χ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mfrac><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:msub><mml:mi>O</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
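      <p>A minimal Python sketch of Eqs. (11) and (12), summed over the bins, is given below. It is illustrative only: the function name <monospace>lognormal_chi2</monospace>, the choice of bin limits, and the parameter names <monospace>mu</monospace> and <monospace>sigma</monospace> (the mean and standard deviation of ln X) are assumptions, and every bin is assumed to have a non-zero expected count.</p>
      <preformat>
```python
import math
from statistics import NormalDist


def lognormal_chi2(x, edges, mu, sigma):
    """Chi-squared statistic for the null hypothesis that x is lognormal(mu, sigma).

    edges holds the k + 1 increasing bin limits; using 0.0 and math.inf as the
    outer limits makes the bins cover the whole positive axis.
    """
    std_normal = NormalDist()

    def cdf(y):
        # Lognormal CDF: F(y) = Phi((ln y - mu) / sigma), with F(0) = 0 and F(inf) = 1.
        if y == 0.0:
            return 0.0
        if math.isinf(y):
            return 1.0
        return std_normal.cdf((math.log(y) - mu) / sigma)

    n = len(x)
    stat = 0.0
    for lower, upper in zip(edges[:-1], edges[1:]):
        expected = n * (cdf(upper) - cdf(lower))             # Eq. (11)
        observed = sum(1 for v in x if upper >= v > lower)   # count in (lower, upper]
        stat += (observed - expected) ** 2 / expected        # one term of Eq. (12)
    return stat


# Two bins split at the lognormal median exp(mu) = 1, so each expects half the sample.
print(lognormal_chi2([0.5, 2.0, 2.0, 2.0], [0.0, 1.0, math.inf], mu=0.0, sigma=1.0))
```
</preformat>
      <p>When the two lognormal parameters are estimated from the same data, the statistic is commonly compared against a chi-squared distribution with k - 3 degrees of freedom; that convention is an assumption here and is not stated in the text above.</p>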
      <p>Much more can be said about these hypothesis tests, but that is outside the
scope of this paper. Those details are omitted in favor of the
results of applying the tests to the GFS data.</p>
</sec>
<sec id="Ch1.S2.SS5">
  <title>Distribution fitting</title>
      <p>The normal and lognormal probability density functions are fitted to the data
using the maximum likelihood technique. For an independent and identically
distributed sample <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with probability density
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold">x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the likelihood function is
defined by

                <disp-formula id="Ch1.E13" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi>L</mml:mi><mml:mfenced open="(" close=")"><mml:mi mathvariant="italic">θ</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:mi>x</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mi>L</mml:mi><mml:mfenced close=")" open="("><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mfenced><mml:mo>=</mml:mo><mml:msubsup><mml:mo>∏</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>f</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">θ</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>For a given sample <inline-formula><mml:math display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>, the likelihood function is maximized as a function
of <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula>, and the maximizing value is taken as the parameter estimate. A thorough explanation of this procedure can be found in Casella
and Berger (2002).</p>
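      <p>For the normal and lognormal densities the maximum of the likelihood has a well-known closed form, so no numerical optimization is needed. The Python sketch below illustrates this under that assumption; the function names <monospace>fit_normal</monospace> and <monospace>fit_lognormal</monospace> are hypothetical and do not refer to the software used for the results in this paper.</p>
      <preformat>
```python
import math


def fit_normal(x):
    """Maximum likelihood estimates (mu, sigma) for a normal density."""
    n = len(x)
    mu = sum(x) / n
    # The maximum likelihood variance divides by n, not by n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return mu, sigma


def fit_lognormal(x):
    """Maximum likelihood estimates (mu, sigma) of ln X; every value must be positive."""
    return fit_normal([math.log(v) for v in x])


print(fit_normal([1.0, 2.0, 3.0]))
print(fit_lognormal([math.e, math.e ** 3]))
```
</preformat>
      <p>The lognormal fit is simply the normal fit applied to the logarithm of the data, which is why it is only available for strictly positive variables.</p>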
</sec>
</sec>
<sec id="Ch1.S3">
  <title>Results</title>
      <p>The time-series hypothesis tests for mixing ratio and
temperature produced numerous figures and data plots displaying the
non-normal and lognormal nature of the GFS data. An overview of these results
is presented along with a more detailed analysis of specific points of
interest. Instead of presenting the results of the Shapiro–Wilk,
Jarque–Bera, and chi-squared tests separately, only the results of the composite test,
which incorporates all three, are shown.</p>
      <p>For each point of the GFS data, a forecast from each day from
1 January 2005 through 31 December 2005 makes up the random variable <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> for
one year. This data is also broken down into four “seasons,” i.e.
1 January 2005 through 31 March 2005 (denoted as JFM in all figures),
1 April 2005 through 30 June 2005 (denoted AMJ), 1 July 2005 through
30 September 2005 (denoted JAS), and 1 October 2005 through 31 December 2005
(denoted OND).</p>
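      <p>The division into time domains can be sketched as follows. This Python example is an illustration under stated assumptions: the names <monospace>SEASONS</monospace>, <monospace>season_of</monospace>, and <monospace>split_by_season</monospace> are hypothetical, and one value is assumed for every calendar day of 2005, whereas the yearly sample size reported in the results is 363.</p>
      <preformat>
```python
from datetime import date, timedelta

# Season labels used in the figures and the calendar months they cover.
SEASONS = {"JFM": (1, 2, 3), "AMJ": (4, 5, 6), "JAS": (7, 8, 9), "OND": (10, 11, 12)}


def season_of(day):
    """Return the season label for a datetime.date."""
    for label, months in SEASONS.items():
        if day.month in months:
            return label
    raise ValueError("month outside 1..12")


def split_by_season(days, values):
    """Group the values of one grid point by the season of their forecast date."""
    groups = {label: [] for label in SEASONS}
    for day, value in zip(days, values):
        groups[season_of(day)].append(value)
    return groups


days_2005 = [date(2005, 1, 1) + timedelta(days=i) for i in range(365)]
groups = split_by_season(days_2005, range(365))
counts = {label: len(vals) for label, vals in groups.items()}
print(counts)
```
</preformat>
      <p>Each seasonal sample holds roughly a quarter of the yearly sample, which affects the power of every test applied to it.</p>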
<sec id="Ch1.S3.SS1">
  <title>Mixing ratio</title>
      <p>A tabulated view of all of the test results can be seen in Fig. 1.
Frequencies depict how often the Shapiro–Wilk and Jarque–Bera tests rejected
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, how often the chi-squared test failed to reject <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and how often these results
coincided at each point of the GFS. The composite test therefore cannot have
a larger value than any one of the individual tests. For example, for the entire
year of forecasts of mixing ratio at 300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>, almost 99 % of
points come from a non-normal distribution as concluded by the
Shapiro–Wilk and Jarque–Bera tests, and for almost 29 % of points the
chi-squared test fails to reject the hypothesis of a lognormal distribution.
The composite test therefore concludes that almost 29 % of points are
lognormally distributed. This chart demonstrates that non-normal behavior
occurs frequently, but that this behavior is not necessarily
lognormal, as determined by the chi-squared test. The choice of
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.01</mml:mn></mml:mrow></mml:math></inline-formula> dictates these results and can be adjusted depending on the
user's desired level of confidence.</p>
      <p>Figure 2 shows the results of the composite test at 300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>. In this
and all subsequent figures red areas indicate a positive result of the
composite test, i.e. the Shapiro–Wilk and Jarque–Bera rejected the
hypothesis that the data come from a normal distribution and the chi-squared
test failed to reject the hypothesis the data come from a lognormal
distribution. Blue areas in these figures indicate that at least one of these
conclusions is not met. With the composite test it
is easy to see where all of the tests agree that the data come from
a lognormal distribution (red) as opposed to a non-lognormal distribution (blue). It is
interesting to note how the pattern changes over the course of the year as well
as when the data is taken as a whole for 2005. Since the areas in red are not
randomly scattered, coherent physical processes are likely at work in sustaining
the statistical properties of the mixing ratio. Similarly, Fig. 3 shows the
results of the composite test at 500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> for each time domain. Note
that the first two time domains of 2005 have the largest coverage of
lognormally-distributed data.</p>
      <p>Figure 4 shows what a sample of the data looks like. This
data is at 300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> at a point in the North Atlantic off the Canadian
coast. For 2005 as a whole as well as for each season, the composite test returned
a positive result for this location. With the fitted probability
distributions it is clear that the lognormal distribution is a better fit for
the mode of the data and also captures its skewness. Conversely, the fitted
normal distribution misses the mode, attempts to smooth out the data, and
assigns substantial probability to values below zero, which are physically
impossible for mixing ratio. Cold, dry air intrusions into this region could
very well be driving this statistical behavior.</p>
      <p>Another location of interest which experiences significant continental air
masses (Trewartha and Horn, 1971) is in central North America where tornadoes
frequently develop. Figure 5 shows the data and probability fits at
300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>. Once again, 2005 as a whole and each season
pass the composite test. The lognormal distribution again fits the data closely,
whereas the normal distribution is by construction centered
on the mean of the data, which is not necessarily the location parameter
of choice for a skewed distribution. As a result, a symmetric curve is placed
at the mean, which misses the major characteristics of the data.</p>
      <p>Figure 6 shows the data and distribution fits for a point in the tropical
cyclone formation region in the North Atlantic at 500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>. In this
instance, the composite test passes for each season but not for the year even
though the histogram resembles a lognormal distribution. The reason for this
speaks to the conservative nature of the tests. With <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.01</mml:mn></mml:mrow></mml:math></inline-formula> and the
sample size of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>363</mml:mn></mml:mrow></mml:math></inline-formula>, there must be strong agreement among all of the
individual tests for the composite test to return a positive
result. Similar to Fig. 4, the lognormal fit clearly captures the
nature of the data better than the normal distribution.</p>
      <p>For a location near Japan at 850 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> the composite test correctly
concludes that the data does not follow a lognormal distribution as shown in
Fig. 7. Here the data is either somewhat symmetric or bi-modal. For the
January–February–March months the normal fit is somewhat better than the
lognormal. However, for the entire year the normal fit misses both modes
entirely and gives maximum probability to less frequently observed values.</p>
      <p>Results for many more vertical levels and locations could be shown
but are omitted due to limitations of space.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <title>Temperature</title>
      <p>Similar to Fig. 1 for mixing ratio, statistical test results are presented
for temperature in Fig. 10. The composite test shows
that the non-normal and lognormal signals are much less pronounced
for temperature than for mixing ratio. However, there are still numerous
occurrences as determined by the strict hypothesis tests. The
composite test results for 500 and 700 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> are shown in Figs. 8
and 9. In these images it is clear that the lower tropics are more likely to
have lognormally distributed temperature data.</p>
      <p>The results of the Shapiro–Wilk and Jarque–Bera tests show
occurrences where the temperature data is seen to come from a non-normal
distribution. There are 77 points out of 65 160 where the data for all of
2005 and each season is not normally distributed, i.e. the null hypothesis is
rejected for these tests on all time domains. All but one of these points are
in the Southern Hemisphere, with a majority of points falling between 500 and
1000 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> in the southern Indian Ocean. Results shown in Figs. 11
and 12 contain examples of this occurring in the Indian Ocean and near Japan,
respectively. Note how the data is either bi-modal, positively or
negatively skewed, or even resembles a uniform distribution. In addition,
there are numerous points, across all pressure levels, where the data for one
or more “seasons” is not normally distributed.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <title>Surface pressure</title>
      <p>While surface pressure is a positive definite random variable, the
chi-squared test indicated no instances of lognormal behavior. This is a result
of the data typically being right-skewed if the normal assumption is
rejected.</p>
      <p>While non-normal behavior is not as prevalent in surface pressure as in
mixing ratio, its frequency can be seen in Fig. 13. Here the composite test
indicates how often both the Jarque–Bera and Shapiro–Wilk tests reject the
null hypothesis; the chi-squared test is omitted. Spatial coverage of the
composite test is shown in Fig. 14.</p>
      <p>The number of seasons in which the normality
assumption is rejected by the composite test is shown in Fig. 15. Here, areas
over the ocean are seen to have non-normally distributed surface
pressure more often than areas over land.</p>
</sec>
<sec id="Ch1.S3.SS4">
  <title>Wind</title>
      <p>Since the GFS wind data is not a positive definite random variable,
the lognormal distribution is not a viable candidate to capture its
shape or spread. Therefore, for wind, the composite test reports
when both the Shapiro–Wilk and Jarque–Bera tests simultaneously
reject the null hypothesis that the data comes from a normal
distribution. Since a much more thorough review of the probability
distributions of wind has been conducted by Carta et al. (2009),
only a brief summary of the results is presented here; these results corroborate
the non-normal behavior of wind that has been previously observed.</p>
      <p>Figures 16 and 17 show how often each test rejected normality, as
well as where they overlap in the composite test, for the <inline-formula><mml:math display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind
components. For almost every time domain, the vertical level
with the smallest percentage of non-normal behavior (“most
normally distributed”) is 500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>. Also of interest are the
differences in the wind analyses at 1000 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>, which show that
the <inline-formula><mml:math display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> component is more likely to be non-normal. This can be seen spatially
in Fig. 18.</p>
      <p>A closer look at the skewed and bimodal behavior of <inline-formula><mml:math display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula>
is given in Fig. 19. For each time domain at 850 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>, the normal
assumption is rejected by the composite test. The normal distribution misses
the mode of the 0 h forecast, and the presence of values less than zero
prevents the fit of a lognormal distribution.</p>
      <p>Given these results for mixing ratio, temperature, surface pressure, and
wind, a real-time detection method may use a moving average that includes
the last <inline-formula><mml:math display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> forecasts, where the value <inline-formula><mml:math display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> could be user-defined
to achieve a certain power for the hypothesis tests. Another method may
use the data available in the current season to determine whether
the variable should be assumed to be normally or
lognormally distributed. As demonstrated in this paper, the hypothesis tests
are robust for a <inline-formula><mml:math display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> smaller than one year.</p>
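      <p>One possible form of such a detection method over the last <italic>t</italic> forecasts is sketched below in Python. It is a minimal illustration under stated assumptions, not an implementation from this study. The class name, the window length of 90, and the 5 % level are hypothetical, and a Jarque–Bera test applied to the data and to its logarithm stands in for the Shapiro–Wilk and Chi-squared components of the composite test. Failing to reject a hypothesis is treated here as grounds for assuming the corresponding distribution. The input stream is a deterministic set of lognormal quantiles in shuffled order, chosen only to make the example reproducible.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random
from collections import deque
from statistics import NormalDist


def jb_p_value(sample):
    """Asymptotic p value of the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    stat = n / 6.0 * (m3 ** 2 / m2 ** 3 + (m4 / m2 ** 2 - 3.0) ** 2 / 4.0)
    return math.exp(-stat / 2.0)


class WindowedDetector:
    """Re-test the last t values at one grid point after every new forecast."""

    def __init__(self, t, alpha=0.05):
        self.window = deque(maxlen=t)  # oldest value drops out automatically
        self.alpha = alpha

    def update(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return "insufficient data"
        data = list(self.window)
        if jb_p_value(data) >= self.alpha:
            return "normal"
        # A lognormal variable has a normally distributed logarithm.
        if min(data) > 0.0:
            logs = [math.log(v) for v in data]
            if jb_p_value(logs) >= self.alpha:
                return "lognormal"
        return "neither"


# Hypothetical input: exact lognormal quantiles, fed in shuffled order.
t = 90
quantiles = [NormalDist().inv_cdf((i + 0.5) / t) for i in range(t)]
stream = [math.exp(z) for z in quantiles]
random.Random(0).shuffle(stream)

detector = WindowedDetector(t)
decisions = [detector.update(value) for value in stream]
print(decisions[0], decisions[-1])
```
]]></preformat>
      <p>A larger window gives the tests more power but makes the decision slower to respond to a change of regime, so the choice of <italic>t</italic> in such a scheme is a trade-off left to the user.</p>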
      <p>In this section, different variables, vertical levels, time domains, and
locations have been presented that demonstrate non-normal or lognormal (or
neither) behavior. Given the prevalence of non-normally distributed random
variables, these results demonstrate the need to check the distribution of
the data before assuming normality.</p>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <title>Conclusions and discussion</title>
      <p>Since mixing ratio and temperature have been shown to be non-normally
distributed and in many cases appear to be lognormally distributed, 3-D- and
4-D-VAR data assimilation schemes that include lognormal cost functions for
both the observations and the a priori background may be required for more
accurate results. This would have implications for the forecast skill of a DA
system, or for a retrieval system, as the analysis state obtained by
minimizing the mixed-distribution cost function should be consistent
with the probabilistic behavior of the true state. The normal assumption,
while convenient and easily adaptable, may need to be considered more
carefully in light of these results.</p>
      <p>While it is true that a lognormal distribution with a small variance looks
very similar to a normal distribution, the detection methods used in this
paper attempt to operationally handle large amounts of data similar to the
resolution of an inner loop in incremental data assimilation schemes. It is
to this end that these statistical procedures have been demonstrated, in order
to understand the true nature of atmospheric variables.</p>
      <p>The time-series data clearly indicates that mixing ratio and temperature
follow a lognormal distribution in certain areas. These results show that
the normal distribution assumption is not a universally valid basis for
data assimilation and variational retrieval systems, and they suggest that
more research is needed to study the impact of assuming a normal
distribution on forecast skill, variational observational quality control,
and the gross error check (Lorenc and
Hammon, 1988).</p>
      <p>Therefore, this work suggests that statistical climatology tests need to be
developed on a seasonal, or possibly a monthly, basis, as the distributions
that are found for specific variables indicate which distribution's cost
function should be used in the assimilation schemes as a function of space
and time. Ideally, the statistical structure of the data would be determined
in real time, ensuring that the correct scheme is chosen.
In either case, the goal is an objective decision methodology that selects
an appropriate scheme based on the nature of the data. The
choice of the observational conditions under which to apply alternative Bayesian
models can now be made objectively through the procedure used and
demonstrated in this work.</p>
      <p>Future work can consider longer time series, more vertical levels, other
atmospheric variables such as column water vapor when a boundary layer cloud
is present as seen in Fletcher (2010), and other statistical methods
including the Akaike information criterion (Akaike, 1974). The possible
future benefit of the Akaike information criterion (AIC) is that it selects
the best distribution for a random variable based on information theory, which
could then give guidance on which other distributions need to be included in
the variational cost function. AIC balances the goodness of fit of
a distribution against the number of model parameters.</p>
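      <p>As an illustration of how AIC could rank candidate distributions, the following Python sketch compares maximum-likelihood normal and lognormal fits using AIC = 2<italic>k</italic> − 2 ln <italic>L</italic>, where <italic>k</italic> is the number of fitted parameters and <italic>L</italic> is the maximized likelihood. It is a sketch under stated assumptions rather than a method applied in this paper: the function names are hypothetical, only two candidate distributions are considered, and the sample is a deterministic set of lognormal quantiles rather than GFS data.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import NormalDist


def gaussian_log_likelihood(values):
    """Maximized normal log-likelihood, using the MLE mean and variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def aic_normal_vs_lognormal(sample):
    """AIC = 2k - 2 ln(L) for normal and lognormal fits (k = 2 for each)."""
    k = 2
    logs = [math.log(v) for v in sample]  # requires strictly positive data
    loglik = {
        "normal": gaussian_log_likelihood(sample),
        # The lognormal density is the normal density of ln(x) divided by x.
        "lognormal": gaussian_log_likelihood(logs) - sum(logs),
    }
    return {name: 2 * k - 2.0 * ll for name, ll in loglik.items()}


# Hypothetical positive-definite sample: exact lognormal quantiles.
n = 90
sample = [math.exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]

aic = aic_normal_vs_lognormal(sample)
best = min(aic, key=aic.get)  # the candidate with the lowest AIC is preferred
print(best)
```
]]></preformat>
      <p>Because both candidates here have two parameters, the penalty term is the same and the comparison reduces to the likelihoods; the penalty becomes decisive only when distributions with different numbers of parameters are added to the candidate set.</p>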
      <p>It has been shown in Fletcher and Jones (2014) that there is a negative
impact on the performance of a normal-distribution-only incremental 4-D-VAR
when lognormal forecasts are assimilated. However, when the same observations
were assimilated in a lognormal-based incremental 4-D-VAR, there was no
negative impact on the analysis error. Therefore, determining which
distribution the observations and their errors come from is important for
minimizing the impact of these errors on the analysis of a DA system and the
subsequent forecast. In this paper, methodologies have been developed and
tested with the 2005 GFS 00:00 UTC 6 h forecast, and it has been shown that
there are lognormal signals in the forecasts. This suggests a need
for statistical climatologies to be developed and for these climatologies to
be linked in near real time with the data assimilation and retrieval
systems.</p>
</sec>

      
      </body>
    <back><app-group>
    <?xmltex \hack{\gdef\theequation{A\arabic{equation}}}?>

<app id="App1.Ch1.Sx1"><label/>
  <title>Distribution of errors</title>
      <p>Let <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula> be the background error component of the 3-D cost function Eq. (1)
given in Sect. 1, i.e. let

              <disp-formula id="App1.Ch1.E1" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>Without loss of generality consider the univariate case. For a random
variable <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> with a cumulative distribution function <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>X</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, the moment generating
function is defined by <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="normal">e</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>X</mml:mi></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. The moment generating
function for a normal random variable with mean <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> and variance <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>
is given by

              <disp-formula id="App1.Ch1.E2" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi>M</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>exp⁡</mml:mi><mml:mfenced open="{" close="}"><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
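      <p>Since this function is the exponential of a quadratic in <inline-formula><mml:math display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, the product of two such functions has the same form, and so the sum <inline-formula><mml:math display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula> of two independently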
distributed normal random variables is also a normal random variable. That
is, if <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>Y</mml:mi><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, then
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:mi>Z</mml:mi><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The uniqueness theorem
states that if two random variables have the same moment generating
function, then they have the same probability distribution. Clearly,

              <disp-formula specific-use="align" content-type="numbered"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.E3"><mml:mtd/><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>Z</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="normal">e</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>Z</mml:mi></mml:mrow></mml:msup><mml:mo>]</mml:mo><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="normal">e</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msup><mml:mo>]</mml:mo><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="normal">e</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>X</mml:mi></mml:mrow></mml:msup><mml:mo>]</mml:mo><mml:mi>E</mml:mi><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="normal">e</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>Y</mml:mi></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>Y</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>exp⁡</mml:mi><mml:mfenced close="}" open="{"><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mn 
mathvariant="normal">2</mml:mn></mml:mfrac></mml:mfenced><mml:mi>exp⁡</mml:mi><mml:mfenced close="}" open="{"><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi>exp⁡</mml:mi><mml:mfenced open="{" close="}"><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mfrac><mml:mrow><mml:mfenced close=")" open="("><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mfenced><mml:msup><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mfenced open="(" close=")"><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>x</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi><mml:mn 
mathvariant="normal">2</mml:mn></mml:msubsup></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
      <p>Equation (A1) can be written as

              <disp-formula id="App1.Ch1.E4" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>x</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>If it is assumed that <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are independent, normally distributed random
variables, then it has been shown that the left hand side of Eq. (A4) is
normally distributed.</p>
      <p>Section 3 presents results showing that atmospheric random variables can
have a non-normal, and in particular a lognormal, distribution. This would imply
that the right hand side of Eq. (A4) would be the sum of a normal and
a lognormal distribution. An assumption such as this for the sought-after
state <inline-formula><mml:math display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> would be highly suspect. It is this mathematical formulation
that motivated the research into mixed normal-lognormal variational
data assimilation methods, as well as into the distributions of the assimilated
variables.</p>
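      <p>The argument of this appendix can be checked numerically with a small Monte Carlo sketch in Python. It is offered only as an illustration: the sample size, the random seed, and all distribution parameters are arbitrary assumptions, and a Jarque–Bera test with its asymptotic p value is used as the normality check. The first part draws two independent normal samples and compares the mean and variance of their sum with the values given by Eq. (A3). The second part adds a normal error to a lognormal background and tests the result for normality.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def mean_and_variance(sample):
    n = len(sample)
    mean = sum(sample) / n
    return mean, sum((v - mean) ** 2 for v in sample) / n


def jb_p_value(sample):
    """Asymptotic p value of the Jarque-Bera normality test."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    stat = n / 6.0 * (m3 ** 2 / m2 ** 3 + (m4 / m2 ** 2 - 3.0) ** 2 / 4.0)
    return math.exp(-stat / 2.0)


rng = random.Random(1)
n = 20000

# Part 1: X ~ N(1, 2^2) and Y ~ N(-3, 1.5^2), independent.
x = [rng.gauss(1.0, 2.0) for _ in range(n)]
y = [rng.gauss(-3.0, 1.5) for _ in range(n)]
z = [a + b for a, b in zip(x, y)]
mean_z, var_z = mean_and_variance(z)  # Eq. (A3) gives -2.0 and 6.25

# Part 2: normal error added to a lognormal background.
eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_b = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
state = [e + b for e, b in zip(eta, x_b)]
p_state = jb_p_value(state)

print(round(mean_z, 2), round(var_z, 2), p_state < 0.05)
```
]]></preformat>
      <p>With these settings the sum of the two normal samples has moments close to those predicted by Eq. (A3), while the sum of the normal and lognormal samples is strongly skewed and is rejected as normal.</p>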
</app>
  </app-group><ack><title>Acknowledgements</title><p>This work is primarily supported by the National Science Foundation
via grant AGS-1038790 at CIRA/Colorado State University. The GFS
data were obtained from the National Climatic Data Center at
<uri>http://nomads.ncdc.noaa.gov/data.php#hires_weather_datasets</uri>.</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Akaike, H.: A new look at the statistical model identification, IEEE T.
Automat. Contr., 19, 716–723,
doi:<ext-link xlink:href="http://dx.doi.org/10.1109/tac.1974.1100705">10.1109/tac.1974.1100705</ext-link>,
1974.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Biondini, R.: Cloud motion and rainfall statistics, J. Appl. Meteorol., 15,
205–224, <ext-link xlink:href="http://dx.doi.org/10.1175/1520-0450(1976)015&lt;0205:CMARS&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0450(1976)015&lt;0205:CMARS&gt;2.0.CO;2</ext-link>, 1976.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Bocquet, M., Pires, C., and Wu, L.: Beyond Gaussian statistical modeling in
geophysical data assimilation, Mon. Weather Rev., 138, 2997–3023,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/2010MWR3164.1">10.1175/2010MWR3164.1</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Boukabara, S. A., Garrett, K., Chen, W., Flavio, I. S., Grassotti, C.,
Kongoli, C., Chen, R., Liu, Q., Yan, B., Weng, F., Ferraro, R.,
Kleespies, T., and Meng, H.: MiRS: An all-weather 1-DVAR satellite data
assimilation and retrieval system, IEEE T. Geosci. Remote, 49, 3249–3272,
doi:<ext-link xlink:href="http://dx.doi.org/10.1109/tgrs.2011.2158438">10.1109/tgrs.2011.2158438</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Carta, J. A., Ramírez, P., and Velázquez, S.: A review of wind speed
probability distributions used in wind energy analysis: case studies in the
Canary Islands, Renew. Sust. Energ. Rev., 13, 933–955,
doi:<ext-link xlink:href="http://dx.doi.org/10.1016/j.rser.2008.05.005">10.1016/j.rser.2008.05.005</ext-link>,
2009.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>
Casella, G., and Berger, R.: Statistical Inference, Duxbury Press, Pacific
Grove, CA, 2002.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Cho, H. K., Bowman, K. P., and North, G. R.: A comparison of gamma and
lognormal distributions for characterizing satellite rain rates from the
Tropical Rainfall Measuring Mission, J. Appl. Meteorol., 43,
1586–1597,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/JAM2165.1">10.1175/JAM2165.1</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Croarkin, M. and Tobias, P.: NIST/SEMATECH engineering statistics Internet
handbook, available at: <uri>http://www.nist.gov/stat.handbook</uri>, last access:
26 August 2015, 1999.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Daley, R. and Barker, E.: NAVDAS: Formulation and diagnostics, Mon. Weather
Rev., 129, 869–883,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;0869:ANFAD&gt;2.0.CO;2">10.1175/1520-0493(2001)129&lt;0869:ANFAD&gt;2.0.CO;2</ext-link>,
2001.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Dee, D. and da Silva, A.: The choice of variable for atmospheric moisture
analysis, Mon. Weather Rev., 131, 155–171,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2003)131&lt;0155:TCOVFA&gt;2.0.CO;2">10.1175/1520-0493(2003)131&lt;0155:TCOVFA&gt;2.0.CO;2</ext-link>,
2003.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>
Eckermann, S. D., McCormack, J. P., Coy, L., Allen, D., Hogan, T., and
Kim, Y. J.: NOGAPS-Alpha: A prototype high-altitude global NWP model,
Preprint Volume, P2.6, Symposium on the 50th Anniversary of Operational
Numerical Weather Prediction, American Meteorological Society, University of
Maryland, College Park, MD, 14–17 June, 2004.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Fischer, C., Montmerle, T., Berre, L., Auger, L., and
Ştefănescu, S. E.: An overview of the variational assimilation in the
ALADIN/France numerical weather-prediction system, Q. J. Roy. Meteor. Soc.,
131, 3477–3492,
doi:<ext-link xlink:href="http://dx.doi.org/10.1256/qj.05.115">10.1256/qj.05.115</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Fletcher, S. J.: Mixed lognormal-Gaussian four-dimensional data
assimilation, Tellus A, 62, 266–287,
doi:<ext-link xlink:href="http://dx.doi.org/10.1111/j.1600-0870.2010.00439.x">10.1111/j.1600-0870.2010.00439.x</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Fletcher, S. J. and Jones, A. S.: Multiplicative and additive incremental
variational data assimilation for mixed lognormal and Gaussian errors, Mon.
Weather Rev., 142, 2521–2544,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/MWR-D-13-00136.1">10.1175/MWR-D-13-00136.1</ext-link>,
2014.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Fletcher, S. J. and Zupanski, M.: A data assimilation method for log-normally
distributed observational errors, Q. J. Roy. Meteor. Soc., 132, 2505–2519,
doi:<ext-link xlink:href="http://dx.doi.org/10.1256/qj.05.222">10.1256/qj.05.222</ext-link>, 2006a.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Fletcher, S. J. and Zupanski, M.: A hybrid normal and lognormal distribution
for data assimilation, Atmos. Sci. Lett., 7, 43–46,
doi:<ext-link xlink:href="http://dx.doi.org/10.1002/asl.128">10.1002/asl.128</ext-link>, 2006b.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Fletcher, S. J. and Zupanski, M.: Implications and impacts of transforming
lognormal variables into normal variables in VAR, Meteorol. Z., 16,
755–765,
doi:<ext-link xlink:href="http://dx.doi.org/10.1127/0941-2948/2007/0243">10.1127/0941-2948/2007/0243</ext-link>,
2007.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Foster, J. and Bevis, M.: Lognormal distribution of precipitable water in
Hawaii, Geochem. Geophy. Geosy., 4, 1–8,
doi:<ext-link xlink:href="http://dx.doi.org/10.1029/2002gc000478">10.1029/2002gc000478</ext-link>,
2003.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Foster, J., Bevis, M., and Raymond, W.: Precipitable water and the lognormal
distribution, J. Geophys. Res., 111, D15102,
doi:<ext-link xlink:href="http://dx.doi.org/10.1029/2005JD006731">10.1029/2005JD006731</ext-link>,
2006.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Gauthier, P., Tanguay, M., Laroche, S., Pellerin, S., and Morneau, J.:
Extension of a 3-D-Var to 4-D-Var: implementation of 4-D-Var at the
Meteorological Service of Canada, Mon. Weather Rev., 135, 2339–2354,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/MWR3394.1">10.1175/MWR3394.1</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>
Hain, J.: Comparison of common tests for normality, PhD thesis, Institut
für Mathematik und Informatik, Julius-Maximilians-University at
Würzburg, Germany, 102 pp., 2010.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Harmel, R. D., Richardson, C. W., Hanson, C. L., and Johnson, G. L.:
Evaluating the adequacy of simulating maximum and minimum daily air
temperature with the normal distribution, J. Appl. Meteorol., 41, 744–753,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0450(2002)041&lt;0744:ETAOSM&gt;2.0.CO;2">10.1175/1520-0450(2002)041&lt;0744:ETAOSM&gt;2.0.CO;2</ext-link>,
2002.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Kleist, D. T., Parrish, D. F., Derber, J. C., Treadon, R., Wu, W. S., and
Lord, S.: Introduction of the GSI into the NCEP Global Data Assimilation
System, Weather Forecast., 24, 1691–1705,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/2009WAF2222201.1">10.1175/2009WAF2222201.1</ext-link>,
2009.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>López, R. E.: The lognormal distribution and cumulus cloud populations,
Mon. Weather Rev., 105, 865–872,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(1977)105&lt;0865:TLDACC&gt;2.0.CO;2">10.1175/1520-0493(1977)105&lt;0865:TLDACC&gt;2.0.CO;2</ext-link>,
1977.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Lorenc, A. C. and Hammon, O.: Objective quality control of observations using Bayesian methods. Theory, and a practical implementation, Q. J. Roy. Meteor. Soc., 114, 515–543,
doi:<ext-link xlink:href="http://dx.doi.org/10.1002/qj.49711448012">10.1002/qj.49711448012</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Mielke Jr., P. W., Williams, S. J., and Wu, S. C.: Covariance analysis
techniques based on bivariate log-Normal distribution with weather
modification applications, J. Appl. Meteorol., 16, 183–187,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0450(1977)016&lt;0183:CATBOB&gt;2.0.CO;2">10.1175/1520-0450(1977)016&lt;0183:CATBOB&gt;2.0.CO;2</ext-link>,
1977.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Miles, N. L., Verlinde, J., and Clothiaux, E. E.: Cloud droplet size
distribution in low-level stratiform clouds, J. Atmos. Sci., 57, 295–311,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0469(2000)057&lt;0295:CDSDIL&gt;2.0.CO;2">10.1175/1520-0469(2000)057&lt;0295:CDSDIL&gt;2.0.CO;2</ext-link>,
2000.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>O'Neill, N., Ignatov, A., Holben, B., and Eck, T.: The lognormal
distribution as a reference for reporting aerosol optical depth statistics:
empirical tests using multi-year, multi-site AERONET sunphotometer data,
Geophys. Res. Lett., 27, 3333–3336,
doi:<ext-link xlink:href="http://dx.doi.org/10.1029/2000GL011581">10.1029/2000GL011581</ext-link>,
2000.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Parrish, D. F. and Derber, J. C.: The National Meteorological Center's spectral statistical-interpolation analysis system, Mon. Weather Rev., 120, 1747–1763,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(1992)120&lt;1747:TNMCSS&gt;2.0.CO;2">10.1175/1520-0493(1992)120&lt;1747:TNMCSS&gt;2.0.CO;2</ext-link>, 1992.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Perron, M. and Sura, P.: Climatology of non-Gaussian atmospheric statistics,
J. Climate, 26, 1063–1083,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/JCLI-D-11-00504.1">10.1175/JCLI-D-11-00504.1</ext-link>,
2013.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Polavarapu, S., Ren, S., Rochon, Y., Sankey, D., Ek, N., Koshyk, J., and
Tarasick, D.: Data assimilation with the Canadian Middle Atmosphere Model,
Atmos. Ocean, 43, 77–100,
doi:<ext-link xlink:href="http://dx.doi.org/10.3137/ao.430105">10.3137/ao.430105</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Rabier, F., Jarvinen, H., Klinker, E., Mahfouf, J. F., and Simmons, A.: The
ECMWF implementation of four dimensional variational assimilation. Part I:
Experimental results with simplified physics, Q. J. Roy. Meteor. Soc., 126A,
1143–1170,
doi:<ext-link xlink:href="http://dx.doi.org/10.1002/qj.49712656415">10.1002/qj.49712656415</ext-link>,
2000.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Rawlins, F., Ballard, S. P., Bovis, K. J., Clayton, A. M., Li, D.,
Inverarity, G. W., Lorenc, A. C., and Payne, T. J.: The Met Office global
four-dimensional variational data assimilation scheme, Q. J. Roy. Meteor.
Soc., 133, 347–362,
doi:<ext-link xlink:href="http://dx.doi.org/10.1002/qj.32">10.1002/qj.32</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>
Razali, N. M. and Wah, Y. B.: Power comparisons of Shapiro–Wilk,
Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests, J. Stat. Model.
Analytics, 2, 21–33, 2011.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>Rosmond, T. and Xu, L.: Development of NAVDAS-AR: non-linear formulation and
outer loop test, Tellus A, 58, 45–58,
doi:<ext-link xlink:href="http://dx.doi.org/10.1111/j.1600-0870.2006.00148.x">10.1111/j.1600-0870.2006.00148.x</ext-link>,
2006.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Sauvageot, H.: The probability density function of rain rate and the
estimation of rainfall by area integrals, J. Appl. Meteorol., 33, 1255–1262,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0450(1994)033&lt;1255:TPDFOR&gt;2.0.CO;2">10.1175/1520-0450(1994)033&lt;1255:TPDFOR&gt;2.0.CO;2</ext-link>,
1994.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Sengupta, M., Clothiaux, E. E., and Ackerman, T. P.: Climatology of warm
boundary layer clouds at the ARM SGP site and their comparison to models, J.
Climate, 17, 4760–4782,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/JCLI-3231.1">10.1175/JCLI-3231.1</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>Song, H., Edwards, C. A., Moore, A. M., and Fiechter, J.: Incremental
four-dimensional variational data assimilation of positive-definite oceanic
variables using a logarithm transformation, Ocean Model., 54, 1–17,
doi:<ext-link xlink:href="http://dx.doi.org/10.1016/j.ocemod.2012.06.001">10.1016/j.ocemod.2012.06.001</ext-link>,
2012.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Stephens, G. L., Vane, D. G., Boain, R. J., Mace, G. G., Sassen, K., Wang,
Z., Illingworth, A. J., O'Connor, E. J., Rossow, W. B., Durden, S. L.,
Miller, S. D., Austin, R. T., Benedetti, A., and Mitrescu, C.: The CLOUDSAT
mission and the A-train, B. Am. Meteorol. Soc., 83, 1771–1790,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/BAMS-83-12-1771">10.1175/BAMS-83-12-1771</ext-link>,
2002.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Toth, Z. and Szentimrey, T.: The binormal distribution: a distribution for
representing asymmetrical but normal-like weather elements, J. Climate, 3,
128–137,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0442(1990)0032.0.CO;2">10.1175/1520-0442(1990)0032.0.CO;2</ext-link>,
1990.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>
Trewartha, G. T. and Horn, L. H.: An introduction to climate, McGraw Hill
International, London, 1971.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Yang, H. and Pierrehumbert, R.: Production of dry air by isentropic mixing,
J. Atmos. Sci., 51, 3437–3454,
doi:<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0469(1994)051&lt;3437:PODABI&gt;2.0.CO;2">10.1175/1520-0469(1994)051&lt;3437:PODABI&gt;2.0.CO;2</ext-link>,
1994.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Zhang, C. D., Mapes, B. E., and Soden, B. J.: Bimodality in tropical water
vapour, Q. J. Roy. Meteor. Soc., 129, 2847–2866,
doi:<ext-link xlink:href="http://dx.doi.org/10.1256/qj.02.166">10.1256/qj.02.166</ext-link>, 2003.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

      <fig id="App2.Ch1.F1"><caption><p><bold>(a–e)</bold> Frequency of each test result on
every time domain and atmospheric level.  For the Jarque–Bera and
Shapiro–Wilk tests, the values represent the percentage of points
where the null hypothesis is rejected, concluding non-normal data.
For the Chi-squared test, the frequency is the percentage of points
where the null hypothesis is not rejected, indicating
insufficient evidence against the data being
lognormally distributed.  The composite test combines these results,
flagging points where the data are both non-normal and lognormally
distributed.  The tests identify large percentages of points where the
data are non-normal and seasonal points where the data are lognormally
distributed.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f01.pdf"/>

    </fig>

      <fig id="App2.Ch1.F2"><caption><p>Composite results for water vapor mixing ratio for
<bold>(a)</bold> 2005 and <bold>(b–e)</bold> each season at
300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>.</p></caption>
      <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f02.pdf"/>

    </fig>

      <fig id="App2.Ch1.F3"><caption><p>Similar to Fig. 2, composite results for mixing ratio for
<bold>(a)</bold> 2005 and <bold>(b–e)</bold> each season at
500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>.</p></caption>
      <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f03.pdf"/>

    </fig>

      <fig id="App2.Ch1.F4"><caption><p>Histograms along with normal and lognormal probability
distributions for <bold>(a)</bold> 2005 and <bold>(c–f)</bold> each season at
300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>.  Panel <bold>(b)</bold> indicates the location of these
data off the Canadian eastern coast.  This point is an example where
each season, along with the entire year of data, passes the composite
test, indicating lognormal behavior.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f04.pdf"/>

    </fig>

      <fig id="App2.Ch1.F5"><caption><p>Similar to Fig. 4 at 300 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> for a point in central
North America.  The composite test returns a positive result for
2005 and each season.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f05.pdf"/>

    </fig>

      <fig id="App2.Ch1.F6"><caption><p>Similar to Fig. 4 at 500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> for a point in the North
Atlantic.  This is an example where each season, but not the entire
year, passes the composite test, indicating lognormal behavior.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f06.pdf"/>

    </fig>

      <fig id="App2.Ch1.F7"><caption><p>Location near Japan at 850 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> where the composite
test fails for every time domain.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f07.pdf"/>

    </fig>

      <fig id="App2.Ch1.F8"><caption><p>Similar to Fig. 2, composite results for <bold>(a)</bold> 2005
and <bold>(b–e)</bold> each season at 500 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> for temperature.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f08.pdf"/>

    </fig>

      <fig id="App2.Ch1.F9"><caption><p>Similar to Fig. 2, composite results for <bold>(a)</bold> 2005
and <bold>(b–e)</bold> each season at 700 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula> for temperature.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f09.pdf"/>

    </fig>

      <fig id="App2.Ch1.F10"><caption><p>Frequency of each test result for temperature on every time
domain and atmospheric level similar to Fig. 1.  There are
a significant number of points where non-normal and
lognormally-distributed data appear, both annually and seasonally.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f10.pdf"/>

    </fig>

      <fig id="App2.Ch1.F11"><caption><p>Temperature data for a point near Taiwan where the
Shapiro–Wilk and Jarque–Bera tests conclude that the data are non-normally
distributed.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f11.pdf"/>

    </fig>

      <fig id="App2.Ch1.F12"><caption><p>Temperature data for a point in Australia where the
Shapiro–Wilk and Jarque–Bera tests conclude that the data are non-normally
distributed.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f12.pdf"/>

    </fig>

      <fig id="App2.Ch1.F13"><caption><p>Similar to Fig. <xref ref-type="fig" rid="App2.Ch1.F1"/>, the frequencies
represent how often the normality assumption was rejected for each time
domain for surface pressure.</p></caption>
      <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f13.pdf"/>

    </fig>

      <fig id="App2.Ch1.F14"><caption><p><bold>(a, b)</bold> Similar to Fig. 2, the red areas indicate the
normality assumption was rejected for surface pressure.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f14.pdf"/>

    </fig>

      <fig id="App2.Ch1.F15"><caption><p>Number of seasons (0–4) determined to be non-normal by
the composite test.</p></caption>
      <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f15.pdf"/>

    </fig>

      <fig id="App2.Ch1.F16"><caption><p><bold>(a–e)</bold> Similar to Fig. 1, the frequencies represent
how often the normality assumption was rejected for each vertical
level and time domain for the <inline-formula><mml:math display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> component of wind.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f16.pdf"/>

    </fig>

      <fig id="App2.Ch1.F17"><caption><p><bold>(a–e)</bold> Similar to Fig. 1, the frequencies represent
how often the normality assumption was rejected for each vertical
level and time domain for the <inline-formula><mml:math display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> component of wind.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f17.pdf"/>

    </fig>

      <fig id="App2.Ch1.F18"><caption><p><bold>(a, b)</bold> Similar to Fig. 2, the red areas indicate the
normality assumption was rejected for the <inline-formula><mml:math display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> components of
wind, respectively.</p></caption>
      <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f18.pdf"/>

    </fig>

      <fig id="App2.Ch1.F19"><caption><p>Similar to Fig. 4, histograms along with a normal probability
distribution for <bold>(a)</bold> 2005 and <bold>(c–f)</bold> each season at
850 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">hPa</mml:mi></mml:math></inline-formula>.  Panel <bold>(b)</bold> indicates the location of these
data in the Pacific Ocean near Hawaii.  In each time domain, the
composite test rejected the normality assumption for this location.</p></caption>
      <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/preprints/2/1363/2015/npgd-2-1363-2015-f19.pdf"/>

    </fig>

    </app></app-group></back>
    </article>
