Compacting the Description of a Time-Dependent Multivariable System and Its Time-Dependent Multivariable Driver by Reducing the System and Driver State Vectors to Aggregate Scalars: The Earth’s Solar-Wind-Driven Magnetosphere

Using the solar-wind-driven magnetosphere-ionosphere-thermosphere system, a methodology is developed to reduce a state-vector description of a time-dependent driven system to a composite scalar picture of the activity in the system. The technique uses canonical correlation analysis to reduce the time-dependent system and driver state vectors to time-dependent system and driver scalars, with the scalars describing the response in the system that is most closely related to the driver. This reduced description has several advantages: low noise, high prediction efficiency, linearity in the described system response to the driver, and compactness. The methodology also identifies independent modes of reaction of a system to its driver. The analysis of the magnetospheric system is demonstrated. Using autocorrelation analysis, Jensen-Shannon complexity analysis, and permutation-entropy analysis, the properties of the derived aggregate scalars are assessed. This state-vector-reduction technique may be useful for other multivariable systems driven by multiple inputs.

The magnetosphere's evolution in response to the time-varying solar wind is rich and diverse. The magnetospheric system is characterized by multiple subsystems that interact with each other (cf. Lyon, 2000; Otto, 2005; Siscoe, 2011; Eastwood et al., 2015; Borovsky and Valdivia, 2018): almost 6 orders of magnitude of spatial scales are involved in the global behavior of the magnetosphere, from ~1 km to ~6×10^5 km. This system is highly coupled and dynamic, with memory and with feedback loops. Multiple physical processes act to couple the various subsystems, and the strengths of the couplings evolve with time as the subsystems themselves evolve owing to the couplings. Even after half a century of measurements and analysis, the subsystems and the couplings between them are not fully understood (Stern, 1989, 1996; Denton et al., 2016). It has been argued that the system adjectives "adaptive", "nonlinear", "dissipative", and "complex" apply to the magnetospheric system (Borovsky and Valdivia, 2018). (See also the earlier systems analyses by Horton et al. (1999), Chapman et al. (2004), Valdivia et al. (2005, 2013), and Sharma (2010).) The magnetospheric system is well measured: there are hundreds of thousands of hours of simultaneous measurements of various aspects of the magnetospheric system and its solar-wind driver over the five decades of the "space age" (cf. Stern, 1989, 1996; King and Papitashvili, 2005).
The solar-wind-driven magnetospheric system very cleanly follows the D→S picture, in which the driver affects the system but the system does not affect the driver: the Earth's magnetosphere has no influence whatsoever on the properties of the solar wind that passes the Earth. Measurements of this magnetospheric system will be used in Sections 2 and 3 to explore the mathematical reduction of the state-vector D(t)→S(t) picture to the composite-scalar D^(i)(t)→S^(i)(t) picture. Table 1 lists the 9 time-dependent measurements of the magnetosphere in the system state vector S and the 8 time-dependent measurements of the solar wind in the driver state vector D. The individual variables in the system state vector and in the driver state vector are described in the Appendix.
This report is organized as follows. In Section 2 the CCA approach is applied to the magnetospheric system driven by the solar wind to derive, from the state vectors S(t) and D(t), the first three time-dependent sets of composite variables: S^(1)(t) and D^(1)(t), S^(2)(t) and D^(2)(t), and S^(3)(t) and D^(3)(t). In Section 3 the three sets of composite variables S^(i) and D^(i) for the magnetospheric system are explored, and the complexity-entropy properties of the aggregate variable S^(1)(t) are analyzed. In Section 4 the advantages of the reduced D^(i)→S^(i) scalar description are examined: these advantages include (a) a compact description of global system-wide reactions to variations in the driver, (b) increased predictability of the system from a knowledge of the driver, (c) linearity in the description of the system's response to the driver, and (d) lower noise in correlations between the system variables and the driver variables. The reduced scalar picture can also reveal independent modes of reaction of the system to the driver, providing insight into how the system reacts to complexities in the driver. The variables of the magnetospheric and solar-wind state vectors are described in the Appendix.

Nonlin. Processes Geophys. Discuss., https://doi.org/10.5194/npg-2019-2 Manuscript under review for journal Nonlin. Processes Geophys. Discussion started: 20 May 2019 © Author(s) 2019. CC BY 4.0 License.

Creation of Composite (Aggregate) Variables from the State Vectors
Using the Earth's magnetosphere-ionosphere-thermosphere system as driven by the solar wind, the reduction of a time-dependent state-vector picture D(t)→S(t) to the time-dependent composite-variable-pair picture D^(i)(t)→S^(i)(t) will be performed. The 9 measured variables chosen for the 9-dimensional magnetospheric system state vector S appear in the first column of Table 1, and the 8 measured variables chosen for the 8-dimensional solar-wind driver state vector D appear in the second column of Table 1; explanations of those measures are deferred to the Appendix.
One-hour averages of all magnetospheric and solar-wind variables are used for the years 1991-2007. No time lags are applied between the solar-wind measurements and the magnetospheric measurements: most expected time lags are about 1 hr (e.g. Clauer et al., 1981; Smith et al., 1999), which is the time resolution of the data set.
Canonical correlation analysis (CCA) is applied to the time-dependent state vectors S(t) and D(t). CCA finds correlation patterns between two multivariable data sets (Nimon et al., 2010; Hair et al., 2010). It yields pairs of composite (aggregate) variables (a) that are linear combinations of the variables of the two data sets and (b) that have maximal correlations with each other. The Nth pair of composite variables is called the "Nth canonical correlation". From the data sets S(t) and D(t), the first pair of composite variables (the first canonical variates) is S^(1)(t) and D^(1)(t): these two variables are projections of S and D given by S^(1)(t) = C_S1·S(t) and D^(1)(t) = C_D1·D(t), where C_S1 and C_D1 are time-independent coefficient (weight) vectors. S^(1) and D^(1) are the composite variables from S and D that have the highest Pearson linear correlation coefficient with each other. In this sense CCA creates the system function S^(1)(t) that is most reactive to the driver vector D(t) and the driver scalar function D^(1)(t) that describes that driving. CCA then yields further pairs of composite variables: S^(2) and D^(2) (the second canonical correlation), S^(3) and D^(3) (the third canonical correlation), etc. S^(2) and D^(2) are the projections of S and D that have the highest correlation with each other, provided that S^(2) and D^(2) are uncorrelated with S^(1) and D^(1). S^(3) and D^(3) are the projections of S and D that have the highest correlation with each other, provided that they are uncorrelated with S^(1), S^(2), D^(1), and D^(2). S^(1)(t), S^(2)(t), and S^(3)(t) represent three independent modes of reaction of the global system to the driver D(t), and the CCA process identifies these modes (and their respective drivers).
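A minimal NumPy sketch of the CCA step described above follows. It uses the textbook formulation (standardize each data set, whiten it, then take the singular value decomposition of the whitened cross-covariance matrix); this is an assumed, generic implementation rather than the authors' code, and the function name `cca` and the array layout (one row per hourly sample) are illustrative choices.

```python
import numpy as np


def _inverse_sqrt(cov):
    """Symmetric inverse square root of a positive-definite covariance matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(S, D, n_pairs=3):
    """CCA of system samples S (T x p) and driver samples D (T x q).

    Returns the canonical correlations, the weight vectors (rows C_S1, C_S2, ...
    and C_D1, C_D2, ...) acting on the standardized variables, and the
    canonical variates as columns.
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    # Standardize every variable over the whole record (the starred variables).
    Zs = (S - S.mean(axis=0)) / S.std(axis=0)
    Zd = (D - D.mean(axis=0)) / D.std(axis=0)
    T = Zs.shape[0]
    Css, Cdd, Csd = Zs.T @ Zs / T, Zd.T @ Zd / T, Zs.T @ Zd / T
    Ws, Wd = _inverse_sqrt(Css), _inverse_sqrt(Cdd)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations in decreasing order; singular vectors map back to weights.
    U, rho, Vt = np.linalg.svd(Ws @ Csd @ Wd)
    k = min(n_pairs, rho.size)
    C_S = (Ws @ U[:, :k]).T
    C_D = (Wd @ Vt[:k].T).T
    S_can = Zs @ C_S.T  # columns: S^(1)(t), S^(2)(t), ...
    D_can = Zd @ C_D.T  # columns: D^(1)(t), D^(2)(t), ...
    return rho[:k], C_S, C_D, S_can, D_can
```

Each canonical variate is scaled here to unit variance, so the weights need not match the normalization of the published coefficients, and the overall sign of a pair can be flipped without changing its correlation. With this construction, the residual of S^(1) after regression on D^(1) is uncorrelated with every individual driver column, which is the property stated in the text for S^(1)-S^(1)pred.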
CCA is a non-iterative matrix-equation solution that yields a unique solution (Johnson and Wichern, 2007). CCA operates on standardized variables (with the mean value subtracted and the values then divided by the standard deviation), denoted with an asterisk *. (For each variable the mean value and standard deviation are calculated for the entire data set.) CCA operates most efficiently on variables that are Gaussian distributed: hence the logarithms of some variables are used to yield more-Gaussian-like distributions. All standardized variables v* have a mean value of zero, a standard deviation of unity, and no units.
When CCA is applied to the 1991-2007 S(t) and D(t) data sets (see Table 1), the first canonical pair of time-dependent variables is

$$
S^{(1)} = 0.0260\,[\log_{10}(1+|AL|)]^{*} + 0.1151\,[\log_{10}(1+|AU|)]^{*} + 0.2160\,|PCI|^{*} + 0.1451\,Kp^{*} + 0.2881\,[\log_{10}(1+am)]^{*} + 0.0201\,(d|Dst|/dt)^{*} + 0.0492\,[\log_{10}(0.01+mP_e)]^{*} + 0.2531\,[\log_{10}(0.01+mP_i)]^{*} + 0.0854\,[\log_{10}(0.01+P_{ips})]^{*} \tag{1a}
$$
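The standardization and the weighted sum of expression (1a) can be sketched as below. The only numbers taken from the text are the nine coefficients of (1a); the dictionary keys (`"AL"`, `"dDst_dt"`, and so on) and the helper names are hypothetical. The published coefficients refer to variables standardized over the full 1991-2007 record, so standardizing a different interval would give a somewhat different scalar.

```python
import numpy as np

# Coefficients of expression (1a); each multiplies a standardized (starred) variable.
S1_WEIGHTS = {
    "log10(1+|AL|)": 0.0260,
    "log10(1+|AU|)": 0.1151,
    "|PCI|": 0.2160,
    "Kp": 0.1451,
    "log10(1+am)": 0.2881,
    "d|Dst|/dt": 0.0201,
    "log10(0.01+mPe)": 0.0492,
    "log10(0.01+mPi)": 0.2531,
    "log10(0.01+Pips)": 0.0854,
}


def standardize(v):
    """Return v* = (v - mean) / std, with mean and std taken over the whole record."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def transformed_variables(raw):
    """Apply the transformations of expression (1a) to hourly arrays.

    `raw` uses hypothetical keys "AL", "AU", "PCI", "Kp", "am", "dDst_dt"
    (already the time derivative of |Dst|), "mPe", "mPi", and "Pips".
    """
    f = {k: np.asarray(v, dtype=float) for k, v in raw.items()}
    return {
        "log10(1+|AL|)": np.log10(1.0 + np.abs(f["AL"])),
        "log10(1+|AU|)": np.log10(1.0 + np.abs(f["AU"])),
        "|PCI|": np.abs(f["PCI"]),
        "Kp": f["Kp"],
        "log10(1+am)": np.log10(1.0 + f["am"]),
        "d|Dst|/dt": f["dDst_dt"],
        "log10(0.01+mPe)": np.log10(0.01 + f["mPe"]),
        "log10(0.01+mPi)": np.log10(0.01 + f["mPi"]),
        "log10(0.01+Pips)": np.log10(0.01 + f["Pips"]),
    }


def global_activity_s1(raw):
    """S^(1)(t) as the weighted sum of the standardized transformed variables."""
    tv = transformed_variables(raw)
    return sum(w * standardize(tv[name]) for name, w in S1_WEIGHTS.items())
```

Because every coefficient in (1a) is positive, an hour in which all nine transformed variables sit one standard deviation above their means gives S^(1) equal to the sum of the weights, about 1.198.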

Properties of the Scalar Reduced Picture for the Magnetospheric System
The three sets of composite variables S^(1) and D^(1), S^(2) and D^(2), and S^(3) and D^(3) for the magnetospheric system are explored, and the advantages of the reduced D^(i)→S^(i) scalar description are investigated.

The Primary Mode of System Response as Represented by D^(1)→S^(1)
In Figure 1 the composite system variable S^(1) (as given by expression (1a)) is plotted for the years 1991-2007 as a function of the composite driver variable D^(1) (as given by expression (1b)). Each black point in Figure 1 represents 1 hour of data.
The Pearson linear correlation coefficient between S^(1) and D^(1) for the 1991-2007 data set is r_corr = 0.921. Accordingly, r_corr^2 = 84.8% of the variance of the system function S^(1)(t) is described by the driver function D^(1)(t), and 15.2% of the variance of S^(1) is unaccounted for by D^(1). The blue line in Figure 1 is a linear-regression fit to S^(1), and the red curve is a 50-point vertical running average of the black points. Note the approximate linearity of the system variable S^(1) as a function of the driver variable D^(1), indicated by the manner in which the running average tracks the linear-regression line.
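The quantities in this paragraph (r_corr, the unaccounted-for variance fraction, the regression line, and a running average through the scatter) can be computed as sketched below. Reading the "50-point vertical running average" as a moving mean of S^(1) over 50 consecutive points after sorting by D^(1) is an assumption, as are the function names.

```python
import numpy as np


def fit_summary(d1, s1):
    """Pearson r, least-squares line, and unexplained variance fraction of s1 given d1."""
    d1 = np.asarray(d1, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    r = np.corrcoef(d1, s1)[0, 1]
    slope, intercept = np.polyfit(d1, s1, 1)
    s1_pred = slope * d1 + intercept
    # Fraction of the variance of s1 left unaccounted for by the linear fit.
    unexplained = np.var(s1 - s1_pred) / np.var(s1)
    return r, slope, intercept, unexplained


def running_average(d1, s1, window=50):
    """Moving means of (d1, s1) over `window` consecutive points ordered by d1."""
    order = np.argsort(d1)
    kernel = np.ones(window) / window
    x = np.convolve(np.asarray(d1, dtype=float)[order], kernel, mode="valid")
    y = np.convolve(np.asarray(s1, dtype=float)[order], kernel, mode="valid")
    return x, y
```

For an in-sample least-squares fit the unexplained fraction equals 1 - r_corr^2, so r_corr = 0.921 gives 1 - 0.848 = 0.152, the 15.2% quoted above.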
Note that whereas the correlation coefficient between S^(1)(t) and D^(1)(t) is r_corr = 0.921, the maximum correlation coefficient between any single variable in the system state vector S(t) and any single variable in the driver state vector D(t) is only r_corr = 0.586 (between <sin^2(θ_clock/2)>_3 and log10(1 + |AL|)).
In the six panels of Figure 2 the coefficients of the six vectors C_S1, C_D1, C_S2, C_D2, C_S3, and C_D3 are plotted. (These are the coefficients in expressions (1)-(3).) Examining these six panels enables the reaction modes represented by S^(1), S^(2), and S^(3), as well as their drivers D^(1), D^(2), and D^(3), to be interpreted. Figure 2a indicates that all coefficients of S^(1) are positive: this indicates a mode of the magnetospheric system in which all measures of activity in the system vector S increase or decrease in unison, with S^(1) representing a "global activity index". Figure 2b indicates that all of the coefficients of D^(1) are positive. The variables in the driver state vector D (Table 1) and their signs were all chosen so that an increase in each variable would result in a generally accepted increase in magnetospheric activity. The individual variables on the right-hand side of expression (1b) have all been correlatively associated with the driving of magnetospheric activity (Berthelier, 1976; Borovsky and Funsten, 2003; Newell et al., 2007; Borovsky and Denton, 2014; Borovsky and Birn, 2014; Osmane et al., 2015). Because S^(1) is selected by the CCA process to have the highest correlation with solar-wind variability, S^(1) is focused on activity that reacts to the solar-wind driver.
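The single-variable comparison above (r_corr = 0.586 for the best individual pair versus 0.921 for the canonical pair) amounts to scanning the system-by-driver block of the correlation matrix. A minimal sketch, with an illustrative function name and one row per hourly sample:

```python
import numpy as np


def max_pairwise_correlation(S, D):
    """Largest |Pearson r| between any column of S and any column of D.

    Returns (r, i, j): the signed correlation and the system/driver column indices.
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    p = S.shape[1]
    # corrcoef stacks the columns of S and D; keep the system-by-driver block.
    R = np.corrcoef(S, D, rowvar=False)[:p, p:]
    i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
    return float(R[i, j]), int(i), int(j)
```

Any single-variable pair is a special choice of the CCA weight vectors (a single nonzero entry in each), so the first canonical correlation can never be smaller than this maximum; the gap between 0.921 and 0.586 measures what the weighting gains.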
Using the linear-regression curve in Figure 1 as a "prediction" of the value of S^(1) from a knowledge of the value of D^(1) yields the predicted value S^(1)pred(t) …
In Figure 3 the autocorrelation functions of S^(1)(t) (red curve), D^(1)(t) (blue curve), and S^(1)(t)-S^(1)pred(t) (green curve) are plotted. In Figure 3a it is seen that the autocorrelation functions of S^(1) and D^(1) are very similar, with 1/e autocorrelation times of 23.3 hr for S^(1) and 22.7 hr for D^(1). In Figure 3b the three autocorrelation functions are plotted for time shifts up to 40 days. Note the 27-day peak in the autocorrelation functions of D^(1)(t) and S^(1)(t): this is associated with the 27-day rotation period of the Sun as viewed from the Earth and with the persistence of features on the solar surface that give rise to solar wind with characteristic properties. This gives the driver D(t) a 27-day periodicity, which in turn drives the system S(t) with a 27-day periodicity.
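The 1/e autocorrelation times quoted above come from autocorrelation functions like those in Figure 3. A minimal sketch of that computation for an hourly series follows; the estimator (every lag normalized by the zero-lag sum) and the linear interpolation used to locate the 1/e crossing are assumptions, not necessarily the choices made for Figure 3.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (each lag normalized by the lag-0 sum)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom for k in range(max_lag + 1)])


def efold_time(acf, dt=1.0):
    """Time at which the autocorrelation first falls to 1/e, linearly interpolated."""
    target = np.exp(-1.0)
    below = np.nonzero(acf < target)[0]
    if below.size == 0:
        return float("nan")  # never drops below 1/e within the computed lags
    k = below[0]  # k >= 1 because acf[0] == 1
    frac = (acf[k - 1] - target) / (acf[k - 1] - acf[k])
    return float((k - 1 + frac) * dt)
```

For hourly data (dt = 1 hr), covering the 40-day range of Figure 3b needs max_lag = 40 × 24 = 960, and a 27-day recurrence would appear near lag 27 × 24 = 648.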
The quantity S^(1)-S^(1)pred is the portion of S^(1)(t) that is not accounted for by D^(1)(t), i.e., the unaccounted-for variance of S^(1)(t). S^(1)(t)-S^(1)pred(t) is completely uncorrelated with D^(1)(t). Further, S^(1)(t)-S^(1)pred(t) is completely uncorrelated with each of the 8 individual solar-wind variables on the right-hand side of expression (1b). Since S^(1) is so similar to D^(1), the standard analyses of the S^(1)(t) time series (e.g. determining the correlation dimension, examining the state space, or Fourier analyzing (Sharma et al., 2005a; Vassiliadis, 2006)) would largely be an analysis of the properties of the solar-wind time series D^(1)(t). Not so for S^(1)(t)-S^(1)pred(t), which is uncorrelated with D^(1). The autocorrelation function of S^(1)(t)-S^(1)pred(t) in …
Determining the origin of the unaccounted-for variance S^(1)-S^(1)pred is of great interest. Four suggestions of what contributes to S^(1)-S^(1)pred are made here. First, some fraction of S^(1)-S^(1)pred may be associated with noise in the various measurements of the magnetospheric system and of the solar wind. Shot noise (random noise in the values of the variables) would have an autocorrelation time of less than 1 hr, with the autocorrelation function of the shot noise going from 1 to 0 in one data-resolution time shift (cf. Sect. 2.4 of Borovsky et al. (1997)). Second, some fraction of S^(1)(t)-S^(1)pred(t) may be owed to errors in the measurement values in the state vectors S(t) and D(t). Errors in the values of the variables of D could be caused by the spatial structure of the solar wind and the measuring spacecraft upstream of the Earth not intercepting the exact solar-wind structures that hit and drive the Earth (cf. Weimer et al., 2003; Borovsky, 2018): this could affect all of the variables of D.
Extrapolating local measures to estimate global properties can also lead to errors: this might affect the hemispheric particle-precipitation variables mP_e and mP_i (Emery et al., 2008) in S and also the magnetospheric pressure values P_ips (Borovsky, 2017) in S. Variables reacting to more than one physical process (such as d|Dst|/dt and P_ips) could also appear to have errors in their values when those values are related to D^(1). Third, unaccounted-for time lags between solar-wind variables and magnetospheric variables may be weakening the correlations: most time lags are 1 hr or less, but measurements of magnetospheric particle populations can have lags of several hours (e.g. Denton and Borovsky, 2009; Borovsky, 2017). The fourth suggestion is that some fraction of S^(1)(t)-S^(1)pred(t) might be associated with system variations that are not directly associated with the solar-wind driver as measured by D. The autocorrelation time of S^(1)(t)-S^(1)pred(t) is approximately the 2-3 hr duration of a magnetospheric substorm (Borovsky et al., 1993; Weimer, 1994; Chu et al., 2015). Substorms are large transients in the reaction of the magnetospheric system to solar-wind driving. (Substorms have been described as self-organized criticality events in the driven magnetospheric system (Klimas et al., 2000).) The occurrence of a substorm is notoriously difficult to predict from solar-wind data (Freeman and Morley, 2004; Hsu and McPherron, 2009; Newell and Liou, 2011). The timing of substorm occurrence would be particularly difficult to infer from the 1-hr-resolution variables going into D because of the 3-hr smoothing used on the clock-angle term <sin^2(θ_clock/2)>_3 in expression (1b) for D^(1), the clock angle being critical for substorm occurrence (Newell and Liou, 2011). The occurrence of a substorm would produce signatures in many of the variables used in S^(1), typically an enhancement in the variable's amplitude lasting 2-3 hours (Weimer, 1994).
To investigate this substorm hypothesis for S^(1)(t)-S^(1)pred(t), the variables D^(1)(t), S^(1)(t), and S^(1)(t)-S^(1)pred(t) are superposed-epoch averaged in Figure 4 for a collection of 2155 substorm events; the collection is from Borovsky and Yakymenko (2017). The zero epoch in Figure 4 is the onset time of each of the 2155 substorms. Substorms are associated with intervals of driving of the magnetosphere (e.g. Caan et al., 1977; Morley and Freeman, 2007); this is indicated by the increase in the superposed average of D^(1) beginning prior to the onset time in Figure 4. However, substorms also represent a transient release of stored energy in the magnetosphere (Birn et al., 2006); this is indicated in Figure 4 …
Additionally, it would be valuable to differentiate S^(1) from other indices commonly used to characterize magnetospheric activity. To do so we use the methodology of Rosso et al. (2007), based on the combined use of permutation entropy (Bandt and Pompe, 2002) and Jensen-Shannon complexity mapping. This mapping is particularly useful for distinguishing deterministic from stochastic time series. Readers unfamiliar with these two information-theoretic measures can consult the reviews of Riedl et al. (2013) and Zanin et al. (2012) or the pedestrian methodology section in Osmane et al. (2019). In Figures 5 and 6 …
The bottom left panels in Figures 5 and 6 show the value of the permutation entropy for AL, am, D^(1), and S^(1) as a function of embedding delay. Similarly, the bottom right panels show the value of the Jensen-Shannon complexity for AL, am, D^(1), and S^(1) as a function of embedding delay. All four signals are highly stochastic, since the normalized permutation entropy is very close to 1. However, the Jensen-Shannon complexity for S^(1) is comparable in magnitude to that for am, and it is significantly larger than that for AL.
This is not a surprise, because am enters the construction of S^(1) with the largest coefficient in expression (1a), and the Jensen-Shannon complexity indicates that S^(1) preserves the correlated structures of am on timescales ranging from a few hours to a few days. The top left panel of Figures 5 and 6 shows the complexity-entropy plane, and the top right panel is a zoom of the right corner, where most of the data for AL, am, and S^(1) lie. In both figures the blue curves represent the maximum and minimum values of complexity for a fixed entropy value, and the dashed curve represents the complexity-entropy mapping of fractional Brownian motion (fBm) with Hurst exponent ranging between 0 and 1, i.e. a stochastic process that also contains correlated structures. The fBm curve is a boundary between deterministic (above) and stochastic (below) fluctuations. We note that AL is effectively stochastic, whereas am and S^(1) lie above the fBm boundary for embedding delays spanning a few tens of hours. The explanation for this behavior of am lies in its construction: each am value is repeated for three hours at a time. Hence, ordinal patterns of size d = 4 and embedding delays of a few hours will register the repetition as correlated structures. Since S^(1) is constructed in part from am, it also contains part of am's correlated structure.
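A minimal sketch of the two measures used in Figures 5 and 6 follows: the normalized permutation entropy H of the Bandt-Pompe ordinal-pattern distribution, and the Jensen-Shannon statistical complexity C = Q_J·H of Rosso et al. (2007), where Q_J is the Jensen-Shannon divergence from the uniform distribution normalized to lie in [0, 1]. This is not the code behind the figures; ranking tied values by time order (relevant for the 3-hour repetition in am) is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter

import numpy as np


def ordinal_distribution(x, d=4, tau=1):
    """Probabilities of the d! ordinal patterns of x at embedding delay tau.

    Ties are ranked by time order (stable sort). The entry ordering is
    arbitrary, which affects neither the entropy nor the divergence from uniform.
    """
    x = np.asarray(x, dtype=float)
    n = x.size - (d - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this d and tau")
    # Row t is the embedding vector (x[t], x[t+tau], ..., x[t+(d-1)tau]).
    emb = np.stack([x[i * tau: i * tau + n] for i in range(d)], axis=1)
    patterns = np.argsort(emb, axis=1, kind="stable")
    counts = Counter(map(tuple, patterns))
    probs = np.zeros(math.factorial(d))
    probs[: len(counts)] = np.fromiter(counts.values(), dtype=float) / n
    return probs


def _shannon(p):
    """Shannon entropy (natural log) of a probability vector, ignoring zeros."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def entropy_complexity(x, d=4, tau=1):
    """Normalized permutation entropy H and Jensen-Shannon complexity C = Q_J * H."""
    p = ordinal_distribution(x, d, tau)
    n_states = p.size
    s_max = math.log(n_states)
    h = _shannon(p) / s_max
    uniform = np.full(n_states, 1.0 / n_states)
    js = _shannon(0.5 * (p + uniform)) - 0.5 * _shannon(p) - 0.5 * s_max
    # Normalization constant so that Q_J = q0 * js lies in [0, 1].
    q0 = -2.0 / ((n_states + 1) / n_states * math.log(n_states + 1)
                 - 2.0 * math.log(2 * n_states) + math.log(n_states))
    return h, q0 * js * h
```

Sweeping `tau` over a range of hourly delays with d = 4, e.g. `[entropy_complexity(series, d=4, tau=k) for k in range(1, 49)]`, gives curves of the kind shown in the bottom panels; white noise sits near H = 1, C = 0, while a monotonic series gives H = C = 0.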

The Secondary Modes of Reaction Represented by S^(2) and S^(3)
In Figures 7a and 7b the composite system variables S^(2) and S^(3) are plotted as functions of their composite drivers D^(2) and D^(3), respectively. (These are given by expressions (2) and (3).) The correlation coefficient for the second pair is still quite high (r_corr = 0.775), but lower than that of the first pair (Figure 1). This correlation coefficient for the secondary mode is higher than the correlations obtained in most studies of solar-wind/magnetosphere coupling that use single measures of the magnetospheric system (e.g. Table 3 of Newell et al., 2007; Table 1 of Borovsky, 2013). D^(2) describes r_corr^2 = 60.0% of the variance of S^(2).
In Figure 7b the correlation coefficient for the third pair is low (r_corr = 0.456); D^(3) describes only r_corr^2 = 20.8% of the variance of S^(3). Canonical pairs beyond the third pair have even weaker correlations.
Figure 2c shows that mode S^(2) (Figure 7a) is dominated by opposite-signed coefficients for mP_i and mP_e, which are measures of the global ion precipitation and of the global electron precipitation into the atmosphere, respectively. In this S^(2) mode the intensities of ion and electron precipitation react oppositely. Figure 2d shows that D^(2) (the driver of S^(2)) is dominated by the solar-wind number density n_sw acting opposite to the clock-angle factor sin^2(θ_clock/2) of the solar-wind magnetic field: an increasing solar-wind density combined with a decreasing clock angle results in more ion precipitation and less electron precipitation. This ion-versus-electron precipitation mode is a newly uncovered mode of reaction of the M-I-T system to the solar wind.
Mode S^(3) (Figure 7b) is characterized by PCI and mP_i acting oppositely to Kp and am. PCI is a measure of high-latitude electrical currents in the magnetosphere and mP_i is a measure of high-latitude ion precipitation; Kp and am are measures of global magnetospheric convection. This S^(3) mode is very similar to a high-latitude-versus-convection mode uncovered by Borovsky (2014) and by Holappa et al. (2014). Figure 2f indicates that the driver D^(3) for this mode is the solar-wind velocity acting oppositely to the magnetic-field clock angle: an increasing wind velocity with a decreasing clock angle produces more convection and less high-latitude activity, whereas a slowing wind with an increasing clock angle produces less convection and more high-latitude activity.

Advantages of the Reduced (Aggregate-Variable) Representation of the System
The aggregate variable S^(1) acts as a global activity index for the magnetospheric system. S^(1) is new and unfamiliar, and experience using S^(1) is needed to gain an understanding of the full usefulness of this measure; S^(1) could be thought of as a next-generation magnetospheric index. In Earth systems science, global aggregate variables are familiar: the Global Warming Index (Hasselmann, 1997; Haustein et al., 2016), the global mean sea level (Vermeer and Rahmstorf, 2009), the mean global temperature (Hansen et al., 2006), the Palmer Drought Severity Index (Wells et al., 2004), and Sea Surface Temperature indices (Kaplan et al., 1998) are well known. Here the aggregate variable S^(1) is mathematically derived. The individual variables of S that go into the definition of S^(1) represent familiar and identifiable aspects of activity in the magnetospheric system. The composite variable S^(1) is a mix of these understood measurements, the mix reflecting some global properties of the system's reaction to the solar wind. Unfamiliar as it is, the composite-scalar D^(1)→S^(1) reduction of the state-vector D→S picture exhibits some outright advantages for the magnetospheric system. This is particularly true in comparison with the standard method of analysis of magnetospheric driving by the solar wind, which uses only a single measurement of magnetospheric activity and a single function of solar-wind variables. These advantages are discussed in the following paragraphs.
Linearity. The plotted points in Figure 1 indicate that there is a linear response of the composite system variable S^(1)(t) to the composite driver D^(1)(t). Single measures of the magnetosphere, by contrast, tend to have a nonlinear response to the solar wind (e.g. Voros, 1994; Valdivia et al., 1996; Sharma et al., 2005b; Borovsky, 2013; Stepanova and Valdivia, 2016), with the individual activity variables saturating (becoming anomalously weak) when solar-wind driving becomes strong (e.g. Fig. 3 of Reiff and Luhmann, 1986; Fig. 17 of Lavraud and Borovsky, 2008; Fig. 6 of Borovsky, 2013). Such a saturation is not seen for S^(1) driven by D^(1). Undoubtedly, the linearity of the result is in part owed to the maximizing of the linear correlation coefficient in the CCA process. The linearity of the S^(1)-versus-D^(1) relation has a great advantage: the same mathematical relationship between S^(1) and D^(1) (i.e. expression (4)) holds for weak driving of the system (small D^(1)) (e.g. Kerns and Gussenhoven, 1990) and for strong driving of the system (large D^(1)) (e.g. Sharma and Veeramani, 2011).
Low Noise. The high correlation between S^(1) and D^(1) (cf. Figure 1) indicates that there is a relatively low level of noise in the linear-regression fit to S^(1): the activity in the system as described by S^(1) responds directly to the solar-wind driving as described by D^(1). For example, the unaccounted-for variance of S^(1) is only 15.2%. Single measures of the magnetospheric system have much weaker Pearson linear correlation coefficients with solar-wind variables than do S^(1) and D^(1). Examples can be found in Table 3 of Newell et al. (2007) and Table 1 of Borovsky (2013): the maximum correlation coefficient in those tables is 0.860 (for the Dst index), but usually it is much lower. The lower noise is also confirmed by the Jensen-Shannon complexity analysis of S^(1): the points for S^(1) and D^(1) sit closer to the maximum-complexity curve than do those for AL and other indices. The lower noise (and higher r_corr) reduces "regression dilution bias" (Bock and Petersen, 1975; Hutcheon et al., 2010) when the system activity is fit by the driver strength. Regression dilution bias can lead to spurious interpretations of trends in the data when subsets of the data are compared, particularly when a subset with systematically weaker driving is compared with a subset with systematically stronger driving.
High Prediction Efficiency. In magnetospheric physics, predicting the reaction of the magnetospheric system to measured upstream solar-wind conditions is very important: this is the prediction of "space weather" (Singer et al., 2001). The high correlation between S^(1) and D^(1) means that there will be a high prediction efficiency when the value of S^(1) is predicted from a knowledge of the value of D^(1). Note that this is a high-efficiency prediction of S^(1)(t) made without using past values of S^(1), using only the present value of D^(1)(t).
By optimizing the Pearson linear correlation coefficient between S and D, S^(1) was designed to focus on aspects of the magnetospheric system that are responsive to the conditions of the solar wind. Internal dynamics of the system that are not dependent on the time-varying state of the driver are de-emphasized in S^(1).
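Regression dilution and prediction efficiency, both discussed in the two paragraphs above, can be illustrated with synthetic data as in the sketch below (an illustration under stated assumptions, not an analysis of the magnetospheric data). It relies on the classical result that independent noise of variance σ_n² in a predictor of variance σ_x² attenuates the least-squares slope by the factor σ_x²/(σ_x² + σ_n²), and on one common definition of prediction efficiency, 1 - ⟨(y - y_pred)²⟩/var(y); the function names are illustrative.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)


def prediction_efficiency(y, y_pred):
    """1 - mean squared prediction error / variance of the observations."""
    y = np.asarray(y, dtype=float)
    return 1.0 - np.mean((y - np.asarray(y_pred, dtype=float)) ** 2) / np.var(y)


def dilution_factor(var_signal, var_noise):
    """Expected slope attenuation when the predictor carries independent noise."""
    return var_signal / (var_signal + var_noise)
```

For an in-sample least-squares fit this prediction efficiency equals r_corr^2, so under this definition r_corr = 0.921 corresponds to a prediction efficiency of about 0.85 for S^(1) predicted from D^(1). Adding driver noise with the same variance as the driver itself would halve the fitted slope.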
Compactness of the Description. Reductionist analysis has concluded that the magnetosphere-ionosphere-thermosphere system is extremely complicated (e.g. Siscoe, 2011; Eastwood et al., 2015; Borovsky and Valdivia, 2018), and that, as driven by the solar wind, there are major outstanding issues as to how the system functions (e.g. Denton et al., 2016).
Having a single scalar variable S^(1)(t) that describes a universal global reaction of the system to its driver promises to yield insight into how the combined system operates.
Uncovering New Modes of Reaction. In the CCA analysis of the system and driver state vectors, two additional aggregate variables S^(2)(t) and S^(3)(t) were generated (expressions (2a) and (3a)). Analysis in Section 3.B showed these two variables to represent two modes of reaction of the system to the driver that are independent of (uncorrelated with) the global-activity mode represented by S^(1)(t). The mode represented by S^(3) is known (having been independently discovered with this CCA methodology by Borovsky (2014) and with a principal-component methodology by Holappa et al. (2014)), but the mode represented by S^(2) has until now been unknown. The CCA methodology used here also identifies the aggregate driver variable that drives each of the independent modes. In the future, expanding the system state vector to include a larger number of measurements of the diverse magnetospheric system should enable this state-vector-reduction methodology to uncover more unknown modes of reaction of the system to the driver. Once a reaction mode and its driver are uncovered, reductionist analysis can be applied to determine the physical reasons why the mode arises.
For a system measured by multiple time-dependent variables (collected into a time-dependent system state vector S(t)) and driven by multiple time-dependent factors (inputs) (collected into a time-dependent driver state vector D(t)), canonical correlation analysis (CCA) can be used to reduce the D(t)→S(t) state-vector picture to a D^(i)(t)→S^(i)(t) composite-scalar picture. The reduction will work even if the system influences the driver (i.e. D(t)↔S(t)). The advantageous properties of this reduction that were examined for the magnetospheric system should apply to systems in general.
Future developments of this methodology will focus on the introduction of time lags between the driver and the system, on the introduction of integro-differential correlations rather than algebraic correlations (e.g. Borovsky, 2017), and on the use of dynamic canonical correlation analysis (e.g. Dong and Qin, 2018a,b).

Vector
The time-dependent variables of the magnetospheric system state vector and the solar-wind driver state vector are listed in Table 1.
The magnetospheric variables measure various aspects of activity in the magnetosphere. The auroral upper index AU (Davis and Sugiura, 1966) measures electrical current in the high-latitude ionosphere: this variable is taken to be a measure of electrical currents in the dayside magnetosphere (Goertz et al., 1993). The auroral lower index AL (Davis and Sugiura, 1966) measures electrical current in the high-latitude nightside ionosphere: this variable is taken to be a measure of auroral activity in the nightside magnetosphere (Goertz et al., 1993). The polar cap index PCI is a measure of the strength of cross-polar-cap electrical current in the ionosphere (Troshichev et al., 1988). The planetary K index Kp is a measure of the strength of global convection in the magnetosphere (Thomsen, 2004). The range index am (Mayaud, 1980) is another measure of the strength of global magnetospheric convection. The disturbance storm-time index Dst measures plasma pressure in the inner magnetosphere (Dessler and Parker, 1959); Dst also reacts to the currents on the dayside boundary of the magnetosphere and to the cross-magnetotail currents in the nightside magnetosphere. The time derivative of the magnitude of the Dst index, d|Dst|/dt, is a compound measure of magnetospheric activity: when d|Dst|/dt is positive, hot plasma is being convected from the magnetotail into the dipolar portion of the magnetosphere, and when d|Dst|/dt is negative, convection has recently subsided. The variables mP_e and mP_i are estimates of the full-Earth power in magnetospheric electron precipitation into the atmosphere and in magnetospheric ion precipitation into the atmosphere (Emery et al., 2008, 2009), with these estimates coming from observations on only a few spacecraft in orbit around the Earth. The average of the ion-plasma-sheet particle pressure P_ips around the Earth (Borovsky, 2017)