In this paper, we propose a sampling algorithm based on state-of-the-art statistical machine learning techniques to obtain conditional nonlinear optimal perturbations (CNOPs), which differs from traditional (deterministic) optimization methods.

Generally, statistical machine learning techniques refer to the marriage of traditional optimization methods and statistical methods, that is, stochastic optimization methods, in which the iterative behavior is governed by a distribution rather than a single point, owing to the presence of noise. The sampling algorithm used in this paper numerically implements the stochastic gradient descent method, which takes a sample average to obtain an inexact gradient.

One of the critical issues for weather and climate predictability is the short-term behavior of a predictive model with imperfect initial data. To assess subsequent errors in forecasts, it is of vital importance to understand the model's sensitivity to errors in the initial data. Perhaps the simplest and most practical way to estimate the likely uncertainty in a forecast is to run it with initial data polluted by the most dangerous errors. The traditional approach is the normal mode method. Both the normal and non-normal mode approaches are based on the assumption of linearization, which means that the initial error must be so small that a tangent linear model can approximately quantify the error's growth. Moreover, complex nonlinear atmospheric and oceanic processes have not yet been well accounted for in this framework. To overcome this limitation, the conditional nonlinear optimal perturbation (CNOP) approach was proposed.

The primary goal in obtaining the CNOPs is to implement nonlinear programming efficiently and effectively, mainly via methods including the spectral projected
gradient (SPG) method
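
For illustration, the projection step that such constrained methods rely on can be sketched as follows. This is a minimal projected gradient ascent under a Euclidean ball constraint, not the SPG method itself; the function names, step size, and iteration count are our own illustrative choices.

```python
import numpy as np

def project_ball(x, delta):
    """Project x onto the ball {z : ||z|| <= delta} in the Euclidean norm."""
    nrm = np.linalg.norm(x)
    return x if nrm <= delta else (delta / nrm) * x

def projected_ascent(grad, x0, delta, step=0.1, n_iter=200):
    """Maximize an objective over the ball via projected gradient ascent.

    grad: callable returning a gradient (exact or sampled) at x.
    """
    x = project_ball(np.asarray(x0, dtype=float), delta)
    for _ in range(n_iter):
        x = project_ball(x + step * grad(x), delta)
    return x
```

For a linear objective J(x) = c·x over the unit ball, for example, the iterates converge to the boundary point c/||c||.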

To overcome the limitations of the adjoint-based method described above, we approach the problem from the perspective of stochastic
optimization methods, which, as the workhorse, have powered recent developments in statistical machine learning

In this section, we provide a brief description of the CNOP approach. It should be noted that the CNOP approach has been extended to investigate the
influences of other errors, e.g., parameter errors and boundary condition errors, on atmospheric and oceanic models

Let

With both the reference states at time

Throughout the paper, the norm

Both the objective function (Eq.

In this section, we first describe the basic idea of the sampling algorithm. Then, we compare it with the baseline algorithms in terms of numerical
implementation. Finally, we conclude this section with a rigorous Chernoff-type concentration inequality that characterizes the degree to which
the sample average probabilistically approximates the exact gradient. The detailed proof is postponed to Appendix

The key idea behind the sampling algorithm is the high-dimensional Stokes' theorem, which reduces the gradient over the unit ball
to the objective value on the unit sphere in terms of an expectation. Let
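
In a standard form from the zeroth-order optimization literature (our notation $f$, $\sigma$, $d$ is illustrative and may differ from the paper's symbols), this reduction reads:

```latex
% Smoothed objective: the average of f over the ball of radius sigma
f_\sigma(x) \;=\; \mathbb{E}_{v \sim \mathcal{U}(\mathbb{B}^d)}\big[f(x + \sigma v)\big],
\qquad
% Stokes'/divergence theorem reduces its gradient to a sphere average
\nabla f_\sigma(x) \;=\; \frac{d}{\sigma}\,
\mathbb{E}_{u \sim \mathcal{U}(\mathbb{S}^{d-1})}\big[f(x + \sigma u)\,u\big].
```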

In other words, the objective function

Before proceeding further, we denote the unit sphere as

The rigorous description and proof are shown in Appendix

In the numerical computation, we obtain the approximate gradient,
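
As a sketch of how such a sample-average gradient can be computed in practice (the smoothing radius, sample count, and baseline subtraction below are illustrative choices, not necessarily the paper's):

```python
import numpy as np

def sample_sphere(d, n, rng):
    """Draw n points uniformly on the unit sphere S^{d-1}."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def approx_gradient(J, x, sigma=1e-3, n_samples=1000, rng=None):
    """Sample-average estimate of the gradient of J at x.

    Uses the sphere-smoothing identity
        grad J_sigma(x) = (d / sigma) * E_u[ J(x + sigma u) u ],
    with u uniform on the unit sphere; subtracting the baseline J(x)
    leaves the expectation unchanged (E[u] = 0) but reduces variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = sample_sphere(d, n_samples, rng)
    vals = np.array([J(x + sigma * ui) - J(x) for ui in u])
    return (d / sigma) * (vals[:, None] * u).mean(axis=0)
```

For a quadratic test objective J(x) = ||x||^2 / 2, whose exact gradient is x, the estimate approaches x as the number of samples grows.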

Next, we provide a simple and intuitive analysis of the convergence in probability for the samples in practice. With the representation of

Combined with the error estimate of gradient (Eq.

Finally, we conclude the section by making the simple and intuitive analysis above rigorous with the following Chernoff-type bound in
probability. The rigorous proof is shown in Appendix
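
The flavor of such a concentration bound can be illustrated numerically. The sketch below compares an empirical tail probability of a sample average with the classical Hoeffding (Chernoff-type) bound for [0, 1]-bounded variables; this is an analogue of, not the paper's exact, inequality.

```python
import numpy as np

def empirical_tail(n, t, trials=20000, rng=None):
    """Empirical estimate of P(|mean of n Uniform[0,1] draws - 1/2| > t)."""
    rng = np.random.default_rng() if rng is None else rng
    means = rng.random((trials, n)).mean(axis=1)
    return np.mean(np.abs(means - 0.5) > t)

def hoeffding_bound(n, t):
    """Chernoff/Hoeffding bound 2 exp(-2 n t^2) for [0,1]-bounded i.i.d. variables."""
    return 2.0 * np.exp(-2.0 * n * t ** 2)
```

The empirical tail probability sits below the bound, and both decay exponentially as the number of samples n increases.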

In this section, we perform several experiments to compare the proposed sampling algorithm with the baseline algorithms for two numerical models, the
Burgers equation with small viscosity and the Lorenz-96 model. After the CNOP was first proposed in

Objective values of the CNOPs and their percentages relative to those computed by the definition method. Bold emphasizes the high efficiency of the sampling method.

Spatial distributions of CNOPs (unit:

We first consider a simple theoretical model, the Burgers equation with small viscosity under Dirichlet boundary conditions. It should be noted here that we
adopt the international units of meters and seconds. The reference state
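
As a concrete illustration of this numerical setting, here is a minimal explicit finite-difference sketch of the viscous Burgers equation with homogeneous Dirichlet conditions; the grid, time step, and scheme are our own choices, and the paper's discretization may differ.

```python
import numpy as np

def burgers_step(u, dx, dt, nu):
    """One explicit step of u_t + u u_x = nu u_xx, with u = 0 fixed at both ends."""
    un = u.copy()
    # Central differences in the interior; the boundary values stay fixed.
    ux = (un[2:] - un[:-2]) / (2 * dx)
    uxx = (un[2:] - 2 * un[1:-1] + un[:-2]) / dx ** 2
    u[1:-1] = un[1:-1] + dt * (-un[1:-1] * ux + nu * uxx)
    return u

def solve_burgers(u0, dx, dt, nu, n_steps):
    """March the explicit scheme forward n_steps times from the initial state u0."""
    u = u0.copy()
    for _ in range(n_steps):
        u = burgers_step(u, dx, dt, nu)
    return u
```

With a sine initial state, the viscosity dissipates energy, so the discrete norm square decreases in time, consistent with the nonlinear evolution diagnostics used later.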

The constraint parameter is set to be

The spatial distributions of the CNOPs computed by the baseline algorithms and the sampling method are shown in Fig.

The spatial patterns of the CNOPs computed by the two baseline algorithms, the definition method and the adjoint method, are nearly identical.

Compared with the spatial patterns of the CNOPs computed by the two baseline algorithms, the sampling method exhibits some fluctuating errors.

When the number of samples increases from

Comparison of computation times. Run on MATLAB R2022a with an Intel® Core™ i9-10900 CPU at 2.80 GHz.

Nonlinear evolution behavior of the CNOPs in terms of the norm square.

We have presented the spatial distributions of the CNOPs in Fig.

Next, we show the computation times to obtain the CNOPs by the baseline algorithms and the sampling method in Table

Finally, we describe the nonlinear evolution behavior of the CNOPs in terms of norm squares

Nonlinear evolution behavior of the CNOPs in terms of the difference and relative difference of the norm square.

The Burgers equation with small viscosity is a partial differential equation, i.e., an infinite-dimensional dynamical system; in the numerical
implementation, it corresponds to the high-dimensional case. Taking into account the performance on all test quantities, i.e., spatial
structures, objective values, computation times, and nonlinear error growth, we conclude that the adjoint method recovers almost all of the information
and saves much computation time simultaneously; the sampling method with

Next, we consider the Lorenz-96 model, one of the most classical idealized models, which is designed to study fundamental issues regarding the
predictability of the atmosphere and weather forecasting

With a cyclic permutation of the variables as

In this study, we use the classical fourth-order Runge–Kutta method to numerically solve the Lorenz-96 model (Eq.
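
The model tendency and one RK4 step can be sketched as follows. The dimension, the forcing F = 8, and the step size are common choices in the Lorenz-96 literature and are not necessarily the paper's settings.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Tendency of the Lorenz-96 model with cyclic indexing:
    dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One classical fourth-order Runge-Kutta step for the Lorenz-96 model."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Note that the uniform state x_i = F is an exact equilibrium of the tendency, while small perturbations of it grow chaotically under the strong nonlinearity.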

Spatial distributions of CNOPs.

Objective values of the CNOPs and their percentages relative to those computed by the definition method. Bold emphasizes the high efficiency of the sampling method.

Similarly, we show the computation times to obtain the CNOPs by the baseline algorithms and the sampling method in
Table

Comparison of computation times. Run on MATLAB R2022a with an Intel® Core™ i9-10900 CPU at 2.80 GHz.

Nonlinear evolution behavior of the CNOPs in terms of the norm square.

Nonlinear evolution behavior of the CNOPs in terms of the difference and relative difference of the norm square.

Finally, we demonstrate the nonlinear evolution behavior of the CNOPs in terms of norm squares

Although the dimension of the Lorenz-96 model is not very large, since it is composed of a finite number of ordinary differential equations, the model
possesses strongly nonlinear characteristics. Unlike for the Burgers equation with small viscosity, the adjoint method does not work well for the Lorenz-96
model: it spends more computation time and recovers a smaller percentage of the total information. The sampling method shows clear computational
advantages, saving far more computation time and recovering more information. However, the performance when reducing the number of samples from

In this paper, we introduce a sampling algorithm to compute the CNOPs based on state-of-the-art statistical machine learning techniques. The theoretical guidance comes from the high-dimensional Stokes' theorem and the law of large numbers. We derive a Chernoff-type concentration inequality to rigorously characterize the degree to which the sample average probabilistically approximates the exact gradient. We show the advantages of the sampling method by comparing its performance with that of the baseline algorithms, e.g., the definition method and the adjoint method. If an adjoint model exists, the computation time is reduced significantly at the expense of considerable storage space. However, an adjoint model is often unavailable for complex atmospheric and oceanic models in practice.

For the numerical tests, we choose two simple but representative models: the Burgers equation with small viscosity and the Lorenz-96 model. The Burgers equation with small viscosity is one of the simplest nonlinear partial differential equations, simplified from the Navier–Stokes equations, and it is high-dimensional after discretization. The Lorenz-96 model is a low-dimensional dynamical system with strong nonlinearity. For the Burgers equation with small viscosity, a partial differential equation, we find that the adjoint method performs very well and saves much computation time; by adjusting the number of samples, the sampling method can nearly match the adjoint method's computation time at the cost of a small loss of accuracy; and the computation time can be shortened further by reducing the number of samples while keeping nearly consistent performance. For the Lorenz-96 model, a low-dimensional and strongly nonlinear dynamical system, we find that the adjoint method underperforms, whereas the sampling method is clearly dominant, both in saving computation time and in characterizing the CNOPs in terms of the spatial pattern, the objective value, and the nonlinear growth. Still, unlike for the Burgers equation with small viscosity, reducing the number of samples brings no obvious gain for the Lorenz-96 model. Based on the comparison above, we conjecture that the sampling method will likely work very well for a practical atmospheric or oceanic model, which is a partial differential equation with strong nonlinearity; there, the high efficiency of the sampling method may be even more pronounced, and the computation time may be shortened considerably by reducing the number of samples.

Currently, the CNOP method has been widely applied to predictability studies in meteorology and oceanography. For the nonlinear multiscale interaction (NMI)
model

For

For the case of

Because the vector

Since the ratio of the surface area and the volume of the unit ball

If the objective function

Because

For any

For any

Since

Hence, we obtain the equivalent representation of the gradient

Finally, since

Considering any

Because

With Lemma A2, the random variable

The sub-Gaussian norm of a random variable
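
For reference, a standard definition of the sub-Gaussian norm (following the common convention in high-dimensional probability; the paper's normalization may differ) is:

```latex
\|X\|_{\psi_2} \;=\; \inf\Big\{\, t > 0 \;:\; \mathbb{E}\,\exp\!\big(X^2 / t^2\big) \le 2 \,\Big\}.
```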

Combined with Lemma A1 and Lemma A3, we can obtain the concentration inequality for the samples as

Based on the triangle inequality, we can proceed with the concentration inequality via the estimate of the difference between the expectation of the
objective value and the objective value itself (Eq.

The codes that support the findings of this study are available from the corresponding author, Bin Shi, upon reasonable request.

No data sets were used in this article.

BS constructed the basic idea of this paper, derived all formulas and the proofs, coded the sampling method in MATLAB to show all the figures, and wrote the paper. GS joined the discussions of this paper and provided some suggestions. All the authors contributed to the writing and reviewing of the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We are indebted to Mu Mu for seriously reading an earlier version of this paper and providing his suggestions about this theoretical study. Bin Shi would also like to thank Ya-xiang Yuan, Ping Zhang, and Yu-hong Dai for their encouragement to understand and analyze the nonlinear phenomena in nature from the perspective of optimization in the early stages of this project. This work was supported by grant no. 12241105 of NSFC and grant no. YSBR-034 of CAS.

This research has been supported by the National Natural Science Foundation of China (grant no. 12241105) and the Chinese Academy of Sciences (grant no. YSBR-034).

This paper was edited by Stefano Pierini and reviewed by Stéphane Vannitsem and two anonymous referees.