Optimal Precursors Identification for North Atlantic Oscillation using CESM and CNOP Method

The North Atlantic Oscillation (NAO) is the most prominent atmospheric seesaw phenomenon in the North Hemisphere. It has a profound influence on the westerly wind strength and storm tracks in North Atlantic, which further affect the winter climate in Northern Hemisphere. Therefore, the identification for optimal precursor (OPR) of the NAO event is of research value and practical significance. In this paper, the conditional nonlinear optimal perturbation (CNOP) method, which has been widely used in research on the OPR of climatic event, is adopted to explore which kind of initial perturbation is most 5 likely to trigger the NAO anomaly pattern in the Community Earth System Model (CESM). Since the adjoint model of CESM has yet to be developed, this kind of problem cannot be solved using traditional strategies based on gradient information provided by the adjoint model. We utilize an adjoint-free algorithm to solve CNOP in such a high dimensional numerical model, and OPRs of the NAO can be successfully identified. The results reveal that OPRs obtained by CNOP can cause the basic state to develop into typical dipole mode, and the nonlinear process plays an important role in the last stage of the prediction period. 10 The algorithm adopted in this work can avoid falling into a local optimum and is accelerated with multiple parallel frameworks to enhance performance. The solution scheme can also be generalized to the OPR research of other climate events or other complex numerical models.

, the dipole precipitation pattern over northwest Europe and northwest Africa (Wassenburg et al., 2016) and the surface temperature variation (Pokorná and Huth, 2015), etc.
The NAO can be regarded as a nonlinear initial value problem (Woollings et al., 2008), and its optimal precursor (OPR) is a kind of initial perturbation that is most likely to develop into NAO events (Mu et al., 2014). Since the initial condition has a significant influence on the predictability of the decadal variability (Zhang et al., 2016), the OPR can help us to understand 5 the dynamical processes of the NAO state transition. The physical mechanism for triggering the NAO event can be discovered by investigating the developing process of the OPR. Moreover, the sensitive areas determined by the spatial structure of the OPRs are beneficial to the intensive observations, thus improving the forecast accuracy of the NAO state transition. Hence, the research on OPR is of widespread scientific research value to study the physical mechanism and enhance forecast skill for the NAO. Further, exploring and optimizing the method for solving the OPR also has essential meanings. Although the phase and 10 amplitude of NAO are affected by numerous factors (Bucha, 2014;Yu and Lin, 2016;Hansen et al., 2017), the characteristics of the NAO in the atmospherical process can be captured by the nonlinear models (Luo et al., 2007). As a new generation of ocean-atmosphere coupled model, the Community Earth System Model (CESM) has presented the outstanding performance of NAO simulation, and the low-frequency variability of the NAO has also been well demonstrated (Nandini, 2017;Wang et al., 2015). 15 The conditional nonlinear optimal perturbation (CNOP) is a mature method for solving OPRs. It describes the initial perturbation that causes the largest prediction error under a specific constraint condition at the prediction time. CNOP is appropriate for predictability studies of climate events with simulating nonlinear motions of oceans and atmospheres (Mu et al., 2003), and can be concluded as an optimization problem with constraints. CNOP approach was initially adopted to identify the OPRs of ENSO (Duan et al., 2004) and was gradually applied in research on the onset of blocking events (Mu and Jiang, 2011), 20 Kuroshio large meander (Zhang et al., 2017b), and Indian Ocean dipole events . Recently, Jiang et al. (Jiang et al., 2013) explore the OPRs that trigger the NAO events using CNOP, demonstrating that the amplitude induced by the self-interaction of perturbations in the onset of the N AO − is stronger than that in the onset of the N AO + . On this basis, Dai et al. (Dai et al., 2016) investigate the relationship between the OPR and optimally growing initial error (OGE) using CNOP.
It is indicated that the two types of OGEs and the OPRs corresponding to the two types of NAO events have similar structures, 25 and both of them can develop into dipole NAO anomaly patterns. These studies have proved that CNOP is a useful tool to investigate the onset of the NAO event. In their studies, the T21L3 quasigeostrophic global spectral model, which is a simple three-level model designed by Marshall and Molteni (Marshall and Molteni, 1993), is applied under ideal conditions. Due to the feature of the T21L3 model, they selected geopotential height as the characterized variable, and potential vorticity is the input variable. For solving CNOP, Jiang et al. (Jiang et al., 2013) and Dai et al. (Dai et al., 2016) all used spectral projected gradient 30 2 (SPG2) algorithm (Birgin et al., 2001). The SPG2 was designed to solve the minimum problem with restraints by determining the gradients of the cost function (Guo-Dong, 2009). Several approaches with the same type have also been adopted to calculate the CNOP in the early years (Duan and Mu, 2006;Wang et al., 2012;Bo et al., 2014), such as the sequential quadratic programming (SQP) algorithm (Büskens and Maurer, 2000) and the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-model needs to be called to obtain the gradients of the initial condition in the solving process. However, these adjoint-based methods would have a high probability to produce local CNOPs when the objective function has multiple extreme values and would fail with large initial disturbance or long prediction time due to the strong nonlinearity of the dynamical model. Crucially, these traditional adjoint-based algorithms are not feasible to solve CNOP in complicated 5 operational models that do not have an adjoint available (Wang, 2010). In recent years, swarm intelligence algorithms are gradually applied in the research of the CNOP (Zheng et al., 2012;Yang et al., 2017). These algorithms determine the search direction from the position and fitness values of particles instead of gradients so that they can be extended to models without the adjoint model. It is also indicated that the swarm intelligence method still achieves global CNOP and has a shorter run time in the situation of larger initial perturbations, longer prediction times, multiple extrema values (Zheng et al., 2017) and 10 discontinuous objective functions (Mu et al., 2015). Although these algorithms have shortened the runtime, it is still very time-consuming to search CNOP in the original dimensions. To enhance the performance, the researchers combined feature extraction strategies with intelligent algorithms, transforming the problems with high dimensions into the low-dimension space.
At present, the tentative application of intelligent algorithms based on feature extraction yielded concrete achievements. The principal component analysis based genetic algorithm (PCAGA) (Zhang et al., 2017a), the modified artificial bee colony 15 algorithm (MABC) (Ren et al., 2016), the dynamic search fireworks algorithm with linearly decreased dimension number strategy (ld-dynFWA)  and PCA-based flower pollination (PCAFP)  have been successfully applied in the researches of tropical cyclone adaptive observations, El Niño-Southern Oscillation, and double-gyre variation, respectively. The CNOPs obtained by these methods have similar patterns and larger fitness values in comparison to the adjoint method. It is indicated that the PCA-based intelligent algorithm is appropriate for solving CNOP in high dimensional numerical 20 models, especially the models without the adjoint model, like CESM.
The objective of this paper is to find the OPRs which can produce the NAO anomaly pattern and explore the effect of the nonlinear process. We study the topic using the CESM, which is an ocean-atmosphere coupled model without an adjoint model. Thus, traditional algorithms like SPG2 are inappropriate for this case. The particle swarm optimization (PSO) and genetic algorithm (GA) hybrid algorithm (PSO-GA) (Kumar and Vidyarthi, 2016;Agarwal and Srivastava, 2018), which is 25 combined with the principal component analysis (PCA) strategy (PGAPSO), is selected to identify OPRs of the NAO. NAOI is selected as the objective function (North Atlantic sector), while the perturbations are superimposed on the Arctic region. We adopt total dry energy over the Arctic atmosphere as the constraint condition. The variation of NAOI and SLP are traced during simulation time, and the OPRs derived from the PGAPSO are compared with other adjoint-free methods (BGM, random) in terms of pattern structure and fitness values for validation. It is found that the nonlinear process mainly plays a role in the last 30 days of the optimization time. For two cases (Case 1 and Case 2 in Section 4) in different initial conditions, OPRs obtained by the improved algorithm can steadily turn the basic state into anomaly mode with anomaly higher (lower) NAOI. The structures of OPRs are also explored. It shows that the wind over the Arctic Ocean as well as the temperature around Greenland have obvious characteristics and a noticeable impact for N AO + and N AO − . The temperature is discovered as an important factor to trigger NAO events. In addition, the solving system is accelerated with multiple parallel frameworks. After parallelized with MPI and CUDA, the speed-up ratio of the intelligent solution system reaches 40.2× compared with its serial version.
The structure of this paper is organized as follows: Section 2 describes the CESM, and Section 3 presents the CNOP method, the PGAPSO algorithm, and the parallelization technique. Experiments and results are displayed in Section 4. This paper ends with a conclusion and future work in Section 5.

CESM
The CESM (Kay et al., 2015) is a new generation of fully coupled climate model developed in 2010. It has been widely used to simulate the carbon cycle (Lehner et al., 2015), ocean currents (Large and Caron, 2015), soil moisture (Swenson and Lawrence, 2012), precipitation (Hagos et al., 2016) and other climate phenomena. As shown in Figure   We simulate the NAO cases in the winter (DJF), and the parameter nhtfrq is set to -24, which denotes the daily average. The   5 https://doi.org/10.5194/npg-2020-27 Preprint. Discussion started: 1 July 2020 c Author(s) 2020. CC BY 4.0 License.

CNOP
The CNOP is a natural extension of the linear singular vector into the nonlinear regime and is proposed to study predictability problems of weather and climate in numerical models (Mu et al., 2009). The CNOP can represent the initial perturbation that can trigger the largest uncertainty at the prediction time. The OPR of NAO is a kind of CNOP that makes the basic state evolve 5 into the NAO events. Suppose the nonlinear model can be briefly described as follows: where S denotes the state vector, and S 0 is the basic state at the initial time t 0 . F is the nonlinear partial differential operator of the model. The equation (1) has the discrete form: 10 where M t0→t represents the nonlinear propagator that "propagates" the initial state from time t 0 to the prediction time t. S t is the state vector at time t. If we superimpose an initial perturbation s 0 on the basic state, the development of the initial perturbation would be: The OPR refers to the initial perturbation that can make the objective function achieve the maximum (minimum) under the 15 constraint condition at prediction time. In this work, the NAOI difference between perturbation state and the reference state is selected as the objective function J: where s 0 is the initial perturbation and consists of physical variables listed in Table 1. According to the definition of the OPR, 20 the perturbation s * 0 (N AO + ) is the OPR of the N AO + and makes J achieve the maximum, whereas s * 0 (N AO − ) is the OPR of the N AO − and makes J achieve the minimum. denotes the constraint condition of the OPRs, ensuring the perturbation within a reasonable range. The constraint condition is consulted from the related works of the sensitive area identification for tropical cyclone (Zhang et al., 2017a) owing to the same variables, and is set to 10% of the summation of the kinetic energy of basic state in the vertical coordinate σ and verification areas D: where U 0 , V 0 , T 0 , and Π 0 are the initial conditions of zonal wind, meridional wind, temperature, and surface geopotential respectively. C p is the specific heat at the constant pressure which is set to 1005.7 J · kg −1 K −1 and T r is the reference 5 temperature with a value of 270K. R a denotes the ideal gas constant, and its value is set to 287.05 J · kg −1 K −1 . π r is the static reference pressure with a value of 1000 hP a.
In this experiment, we choose a blocking indicator proposed by Liu (Liu, 1994) to quantify the extent of the NAO events.
The NAOI is defined as the projection of the SLP field on the NAO anomaly pattern: where SLP d is obtained by subtracting the climatological mean from SLP output, and denotes the inner product operation of vectors. SLP N AO denotes the NAO anomaly pattern decomposed by the empirical orthogonal function (EOF) analysis, which is illustrated in Figure 3. The NAO spatial pattern is manifested as a typical meridional dipole mode, including the Iceland low pressure along with the North Atlantic subtropical high. In Figure 3, it is a pattern of the N AO + , presenting the mode with the negative anomalies in high latitude and the positive anomalies in low latitude.
15 Figure 3. The first mode of the EOF with SLP anomaly field concentrated in the North Atlantic region between 90 Combining the equation (4), (5) and (7), the objective function is described as follows: In summary, the objective function is the projection of the SLP field difference between the final state and the reference state on the NAO anomaly pattern, and the OPR of N AO + (N AO − ) can be obtained by solving the maximum (minimum) of the objective function J.  First, we obtain 900 original samples after running 10-year integration (only in winter) on a daily average using CESM.
Owing to their different physical units, we need to perform Z-Score normalization for each variable: where c refers to the number of variables. P v is the mean value of the variable v, and the σ v denotes the standard deviation.
After that, each piece of sample is reshaped into a vector with one dimension (1 × 976896), these samples constitute a 15 matrix with a size of 900 × 976896. Then each sample is subtracted from the mean values of these samples for centering, and the obtained sample is weighted according to the area of the grid: where P ij denotes the j th element of the i th sample data. n is the number of the samples, and r is the length for each sample.
lat(j) is the corresponding latitude of the j th element, and the area weight is calculated approximately via the cosine value of 20 the latitude. Then the eigenvalues (λ 1 , . . . , λ n ) and eigenvectors of the covariance matrix P P T are calculated to obtain principal components: where L is the eigenvector matrix, and Σ is a diagonal matrix whose entries in the main diagonal are the corresponding eigenvalues. The top m columns of the eigenvectors L sorting by their eigenvalues are selected as the principal components. The value of m is determined by the contribution rate. m is set to the minimum number of dimensions that meets the contribution rate of 90% to balance the efficiency and effectiveness. The reduced space with m dimensions is far smaller than the original one.
To obtain the extremum of the objective function, we adopt a hybrid algorithm improved from two efficient algorithms, PSO and GA. The PSO is a type of intelligent heuristic algorithm to solve the problem with NP property (Kennedy, 2011). The 5 position with the best fitness value is searched by tracing both individual optimal position and the optimal global position.
The algorithm flow can be briefly described as follows: (1) Initialize the speed (V ) and position (X) of each particle with random values. The random values obey the normal distribution and ensure most of the perturbations satisfy the constraints.
(2) For each particle i, position vector is in reduced space, so the position vector needs to be restored into original space via Then superpose the perturbation X i on the basic state. When 10 the model integration is finished, calculate the fitness value of each particle through the equation (8) and record its optimal position (X pb ) along with the optimal global position (X gb ).
(3) Update the position and speed of each particle. (4) Repeat steps as mentioned above until satisfying the terminal condition. The updating formula is as follows: where V k i is the speed of particle i for step k and V k+1 i is for particle i at step k + 1. c 1 is the self-awareness coefficient for the 15 historical self-optimal position, and c 2 is the social-awareness coefficient for the optimal global position of all particles. The empirical value of c 1 and c 2 are both set to 2. r 1 and r 2 are random factors in range of [0, 1]. X k pb refers to the best position of particle i up to the step k, and X k gb represents the best position of the entire swarm in k steps. ω k is the weight parameter and is calculated by: 20 where iter is the current number of steps, and iter max is set to 50.
PSO is the main body of the PGAPSO, and the GA further optimizes the search process. As a metaheuristic algorithm, the GA derives from natural selection (Goldberg and Holland, 1988). When the speed norm is smaller than the threshold, the particles are fed into GA for further search. The main operations of GA include selection, crossover, and mutation. Particles are selected according to their fitness value to breed a new generation. The selection probability for each individual is equal to 25 the ratio of its fitness value to the sum of the fitness value population: https://doi.org/10.5194/npg-2020-27 Preprint. Discussion started: 1 July 2020 c Author(s) 2020. CC BY 4.0 License.
After that, the selected parents generate new individuals via crossover: Then the new generation mutates in a single location with probability p m to increase the randomness. The fitness value of each new particle is compared against its parents, and the best position is recorded. If the new individual has a better fitness value than the global best position, the global best position (X k gb ) would be replaced. Then the speed and position of particles 5 are updated using equation (12), and the final global fitness value is obtained until the iter reaches iter max .

Parallelization
The computation of CNOP in CESM is quite time-consuming. With 48 CPU cores, 30 particles, and 50 iterations, it takes about 8.33 days to obtain the OPRs in the serial program. Multiple parallel techniques and frameworks are adopted to improve the performance in this work. Recently, the Graphics Processing Unit (GPU) has been widely used in accelerating numerical models. Since GPU is suitable for large-scale parallel computing, it can significantly improve the execution performance of numerical calculation in climate models. A parallel scheme for Community Climate System Model (CCSM) has been proposed to shorten the runtime of climate prediction by porting the radiation module onto GPUs (Coleman and Feldman, 2013 Forecasting (WRF) was highly optimized using NVIDIA Tesla K40 with 2880 cores (Huang et al., 2015). Compared to the CPU-based parallel version running on four nodes, the GPU-based scheme performed better. A novel asynchronous strategy has provided significant performance benefits in CESM (Korwar et al., 2013). The most time-consuming routines have been accelerated via OpenACC directives, and it achieves a speedup of 1.19×-1.53× for the entire model. Another attempt for accelerating CESM was to port CESM along with a rewritten vertical remapping scheme onto GPUs (Carpenter et al., 2013). 25 Previous work results have shown that the performance of the optimized subroutine can be improved by the GPU technique substantially. In this work, we port several time-consuming subroutines in CAM onto GPUs via the PGI CUDA Fortran platform. The subroutine runtime is analyzed using pref, which is a performance evaluation tool for Linux. As shown in Figure   4, subroutine radclwmx and radabs both execute longer compared with other subroutines, so we rewrite these two subroutines with the CUDA interface. Simultaneously, we adopt kernel directives and OpenACC directives to simplify specific operations

5
In each iteration of PGAPSO, the calculation of fitness value for each particle is relatively independent. According to its features, multi-process techniques are suitable for executing these tasks concurrently. Here we adopt MPI, which enables the parallelization of the program by supporting communication and broadcasting between nodes, as the parallel framework to accelerate the algorithm. The pseudocode describes the flow of the parallel PGAPSO: Assume that n particles are initialized, then n+1 processes are launched. These processes are divided into two groups: one is 10 a master process, the others are slave processes. We assign a process for each particle as the slave process so that CESM can be called in parallel, and the fitness value can be obtained simultaneously. Figure 5 illustrates the parallel architecture of PGAPSO for solving CNOP. At each iteration, the master process allocates calculation tasks to slave processes. For each slave process, perturbation under constraint condition is generated and is superimposed on the basic state of CESM. Then CESM, which is paralleled with MPI, OpenMP, and CUDA, is called to perform the integration. The fitness values are calculated via equation 15 (8) and are gathered by the master process. After that, the master process broadcasts the optimal global value to slave processes via MPI. If particles' speed norm is less than the threshold value, the crossover and mutation operations are conducted. Then the master process compares the current particles and updates the information of optimal particle at the end of each iteration.
With the help of MPI, the performance of PGAPSO can be significantly enhanced. gather results from each process 9: if norm of particle speed ≤ ξ then update particle speed via V k+1 i 16: end for 4 Experiments and Results

Experimental Environment
We conduct experiments on the Tianhe-2 supercomputer, which is located in the National Supercomputer Center in Guangzhou, China. Each node consists of 2 Intel Ivy Bridge Xeon processors connected by Intel QuickPath Interconnect. NVIDIA Tesla K80 GPUs on Tianhe-2 are used in our GPU-based scheme for CESM acceleration. Each Tesla K80 GPU has 4992 CUDA 5 cores, and its double-precision performance is up to 2.91TFLOPS. Data transmission between CPUs and GPUs depends on PCI-e 3.0 bus with 40 lanes.

Dimensions of solution space
The first step is to decompose the principal component from the original sample. From the equation (11), we know that the dimension of solution space equals to the number of eigenvectors selected according to the accumulative variance ratio.

Simulation Duration for OPR
Since the NAO has an approximated e-folding period of two weeks, we intend to determine the most appropriate simulation duration within 15 days. We select 5 days, 7 days, 10 days, and 15 days as the optimization times to observe the largest variation of the NAOI amplitude. Using the PGAPSO with 50 iterations, the distribution of the optimal fitness value at different simulation durations is illustrated in Fig. 7. From the box plot, we can see that the 5-day simulation has the narrowest range of   Then we need to verify whether the OPRs obtained by PGAPSO can trigger the NAO events. We acquire the SLP difference patterns by subtracting the SLP data of reference state from the output, and these patterns for different simulation durations are shown in Fig. 8. The patterns of 5-day and 7-day simulation are in a state of instability with multiple dispersed pressure centers. 5 We can find that a strong negative center appears in the mode of N AO − in 7-day simulation, and this is the reason for the relatively lower ∆N AOI − in Fig. 7. In the same way, the strong positive center in the pattern of 10-day simulation also causes the abnormal ∆N AOI − . The typical feature of NAO event is the dipole SLP mode located near Iceland and Azores. Although the approximated structure of NAO events appears in the 7-day and 10-day optimization, the pressure cores are irregular and discrete with the trend of migration. As for the 15-day integration, the dipole centers form and migrate across the Atlantic Ocean, which is particularly evident in N AO − . Therefore, we select 15 days as the simulation duration in our experiments.

Evolution of NAOI and SLP pattern
To visualize the effects of PGAPSO, we compare it with two adjoint-free methods in the cases of 15-day simulation. One of 5 them is the breeding of growing mode (BGM), which has been put into operation by the National Centers for Environmental Prediction (NCEP) (Toth and Kalnay, 1997). The breeding vector (BV) generated from BGM retains the dynamical structure of rapid growth and is proven compatible with the atmospheric model. The crucial point is that it can be solved without gradient information. The other procedure is the random method, which has the same search times with the PGAPSO. The NAOI amplitude flows for these methods are displayed in Fig. 9. CN OP P O and CN OP N E denote the two types of OPRs  center of the Atlantic Ocean, while the negative pressure field migrates to the Norwegian Sea, resulting in the overall structure tilts to the east. The pattern of random vector is almost occupied by positive pressure, which is mainly located in Irish islands.
Unlike the N AO + , the N AO − displays similar structures in these patterns, which have the typical NAO mode with a strong positive center located in Iceland and the negative center(s) around the Azores. The difference is that, the patterns of BV and random vector include several discrete negative cores. In Case 1, the OPRs solved by PGAPSO can trigger two types of NAO 5 events with typical dipole mode.
The situation in Case 2 is illustrated in Fig. 11. For N AO + , the OPR obtained by PGAPSO forms an SLP pattern with noticeable features of the N AO + event, while BV and random vector generate irregular SLP fields with scattered centers.
From Fig. 9, we find that the perturbation state has a larger probability of moving to the contrary phase of the reference. while CNOPs can still make the basic state evolve into the NAO event that has the same phase with the reference state. Greenland gradually expands and reaches its peak. In this process, the most obvious stage of change is from Day 10 to Day 20 https://doi.org /10.5194/npg-2020-27 Preprint. Discussion started: 1 July 2020 c Author(s) 2020. CC BY 4.0 License.  15, and the large difference of pressure and the dipole mode are formed at this stage. This also confirms the conclusion that the role of the nonlinear process is mainly at the end of the prediction phase.

CNOP pattern
In this section, we aim to explore the structure of the OPRs that trigger the NAO event. Fig. 13 shows the OPRs obtained using the PGAPSO method for Case 1 in the previous section, including zonal wind, meridional wind, and temperature in the nearsurface layer. The left subfigure is the OPR corresponding to the N AO + , and the positive temperature structure appears in the Arctic Ocean region, accompanied by the wind from the Arctic to the Pacific Ocean. A negative temperature field distributes 5 near Iceland, with the wind perturbation direction from the Arctic to the Atlantic. The wind direction of negative-phase OPR is opposite to that of positive-phase OPR in the Arctic Ocean and near Iceland, and the intensity of temperature is stronger than that of positive-phase OPR.
For Case 2, the OPR of the N AO + has a higher temperature intensity, while the OPR of the N AO − has smaller values in temperatures, along with the monotonous distribution characteristics. As can be seen from Fig. 9 and Fig. 10, under the 10 circumstance of the relative high reference NAOI in Case 1, the amplitude of the OPR for N AO + is tiny compared to the N AO − , and the initial state after perturbations superposition is more liable to develop into a N AO − state. The situation in Case 2 is similar to Case 1. The reference NAOI is in a negative phase, and the perturbation that can trigger a strong N AO + event has a larger intensity. This could mean that the simulation result is more inclined to migrate to the opposite phase as the reference state. It can also be seen from Fig. 13 and Fig. 14 that, for the two cases whose reference states are widely different, 15 the patterns and value ranges of the OPRs are also substantially different.
The above diagrams describe the spatial characteristics of multi-variable perturbations, and the NAO events can also be triggered by a single variable, such as temperature. Following the above procedure, temperature perturbations are limited under a constrained condition of T 2 ≤ 100 and are superimposed on the 25 th layer of the atmosphere (near the surface) in the  19 https://doi.org /10.5194/npg-2020-27 Preprint. Discussion started: 1 July 2020 c Author(s) 2020. CC BY 4.0 License.
Arctic region. Using the PGAPSO, the OPRs composed of temperature are obtained, as illustrated in Fig. 15. The patterns of OPR of N AO + and OPR of N AO − have almost converse structures in the North Atlantic sector. There exists a noticeable pressure difference between Greenland and Iceland, with several centers in the mid-to-high latitudes and small cores around the Arctic region. Besides, the positive anomaly in eastern Europe is also conducive to the formation of the dipole. It aligns with the hypotheses that atmospherical temperature gradients will result in the anomalous poleward atmospherical heat transport 5 and an increased probability of the NAO reaching its high-index state.

Performance Analysis
To manifest the performance improvement of parallel PGAPSO, Fig. 16 compares the runtime of parallel PGAPSO and serial PGAPSO in one iteration. Due to the limitation of computing resources, we conduct experiments on no more than 1080 CPU cores. From Fig. 16, we can see that when the number of CPU cores is more than 720, it will take longer to run the serial 10 algorithm. When the overmuch CPU cores are assigned to the serial program, the frequent communications would make the runtime increase. In contrast, the parallel version can make full use of resources, and its execution time keeps going down. The speedup ratio, which raises with the increasing CPU cores' number, is displayed in Table 2. With 1080 CPU cores, PGAPSO, based on the parallel scheme, achieves a speedup of 40.2× compared to its serial version.  The parallel PGAPSO with GPU technique is also compared against the parallel PPSO and the parallel PGAPSO without GPU in Fig. 17. From Fig. 17, along with the increase of CPU cores, all methods have a trend of decrease in time consumption.
When the CPU cores are increased to 1080, the runtimes of these three methods for one step are 334.12s, 270.18s, and 247.33s, respectively. Since the PGAPSO contains the operations like crossover and mutation, it may take slightly longer to execute compared with the PPSO. However, the accelerators in GPU relieve the performance bottleneck caused by massive matrix 5 computation in CESM. Although only two subroutines are optimized, the PGAPSO obtains promising speedup effects. It is also proved that the GPU has a potential capacity in accelerating numerical models.
Meanwhile, the convergence and optimal values of the PGAPSO are shown in Fig. 18. The speed norm is a measure indicator of the distance between a particle and the current optimal particle. Here we adopt mean speed norm to represent the convergence of the entire swarm. If the mean speed norm approaches zero, it shows that all particles move to the positions close to the 10 optimal particle, and it means the algorithm converges. In Fig. 18, there is not much difference between the PGAPSO and the PPSO at the beginning. The mean speed norm of the PGAPSO falls rapidly around Step 15 and converges in about Step 26.
Besides, the best fitness value of the PGAPSO is greater than the PPSO. The advantages of the hybrid algorithm display in two aspects: the crossover operation of the GA has a relatively larger probability of generating particles with better fitness value, and the mutation operation increases the randomness of the current particle to avoid plunging into local optimum. From Fig.   15 18, it is found that the PGAPSO improves the convergence speed and the solution quality compared with the PPSO.  Initial condition errors are critical factors that result in uncertainty when simulating and predicting the NAO, and the NAO simulation can be improved by reducing the errors in initial condition. As a type of initial condition error, the OPR would cause the largest prediction error and eventually evolve into climate events. Therefore, the research of OPRs would help to reveal the dynamic processes related to the NAO events and improve the prediction accuracy.

5
In this paper, we adopt a novel CNOP-based approach to study OPRs of the NAO using CESM. Since the CESM does not have a corresponding adjoint model, we cannot solve CNOP through the adjoint-based method mentioned in the previous works, such as SQP and SPG2. We present a hybrid intelligence algorithm named PGAPSO to solve CNOP. First, the optimization time are determined from experiments. Then the perturbations, which contain the variables of zonal wind, meridional wind, temperature, specific humidity, surface pressure, and surface geopotential, are superimposed on the basic state. We in-10 vestigate the OPRs in two cases with different initial state and similar NAOI, and both of the cases occur in winter (DJF). To validate the effectiveness of the PGAPSO, the trends of the NAOI amplitude and SLP anomaly patterns are compared to the BGM and random method. It is indicated that the OPRs obtained by PGAPSO can trigger the NAO events with the typical dipole pattern and have the largest |∆N AOI|. The SLP variation in the north hemisphere is traced during simulation time, and the features of teleconnection patterns are identified via the SLP difference mode. 15 We also analyze the OPRs' structures for both N AO + and N AO − in these two cases. The wind directions of N AO + and N AO − present the opposite mode around Arctic Ocean and Iceland, and the temperature perturbations over Greenland island would promote the occurrence of NAO events. It is also demonstrated that even slight errors (see N AO − in Case 2) may cause a large uncertainty in simulation.
Since temperature has a significant effect on the NAO variation, the experiments with OPRs containing only temperature are 20 conducted. Under a reasonable constraint, the temperature OPRs can still successfully trigger the NAO events for both N AO + and N AO − . The OPRs for these two types of NAO events have almost converse structures, especially over Greenland and European, confirming the view of previous researches.
Moreover, multiple parallel frameworks are applied in this work to improve efficiency. The parallelization mainly consists of two parts: parallelization of the algorithm with MPI and acceleration of CESM using CUDA. It significantly enhances the 25 performance and achieves a speed-up of 40.2×.
Our future work is to apply the PGAPSO algorithm to study other climatological phenomena with the CNOP method. In this work, the CESM is regarded as a black box program. It is convenient to transplant the solver framework to other numerical models. We will also apply our approach to models that have high dimensions or have no corresponding adjoint model.