Ordering of trajectories reveals hierarchical finite-time coherent sets in Lagrangian particle data: detecting Agulhas rings in the South Atlantic Ocean

Wichmann, David; Kehl, Christian; Dijkstra, Henk A.; van Sebille, Erik

doi:https://doi.org/10.5194/npg-28-43-2021

Articles | Volume 28, issue 1

Research article

19 Jan 2021

Abstract

The detection of finite-time coherent particle sets in Lagrangian trajectory data, using data-clustering techniques, is an active research field at the moment. Yet, the clustering methods mostly employed so far have been based on graph partitioning, which assigns each trajectory to a cluster, i.e. there is no concept of noisy, incoherent trajectories. This is problematic for applications in the ocean, where many small, coherent eddies are present in a large, mostly noisy fluid flow. Here, for the first time in this context, we use the density-based clustering algorithm of OPTICS (ordering points to identify the clustering structure; Ankerst et al., 1999) to detect finite-time coherent particle sets in Lagrangian trajectory data. Different from partition-based clustering methods, derived clustering results contain a concept of noise, such that not every trajectory needs to be part of a cluster. OPTICS also has a major advantage compared to the previously used density-based spatial clustering of applications with noise (DBSCAN) method, as it can detect clusters of varying density. The resulting clusters have an intrinsically hierarchical structure, which allows one to detect coherent trajectory sets at different spatial scales at once. We apply OPTICS directly to Lagrangian trajectory data in the Bickley jet model flow and successfully detect the expected vortices and the jet. The resulting clustering separates the vortices and the jet from background noise, with an imprint of the hierarchical clustering structure of coherent, small-scale vortices in a coherent, large-scale background flow. We then apply our method to a set of virtual trajectories released in the eastern South Atlantic Ocean in an eddying ocean model and successfully detect Agulhas rings. We illustrate the difference between our approach and partition-based k-means clustering using a 2D embedding of the trajectories derived from classical multidimensional scaling. 
We also show how OPTICS can be applied to the spectral embedding of a trajectory-based network to overcome the problems of k-means spectral clustering in detecting Agulhas rings.

Received: 20 Jun 2020 – Discussion started: 29 Jun 2020 – Revised: 19 Oct 2020 – Accepted: 03 Nov 2020 – Published: 19 Jan 2021

1 Introduction

Understanding the transport of tracers in the ocean is an important topic in oceanography. Besides the large-scale transport features of the mean flow, mesoscale eddies and jets play an important role in tracer transport on smaller scales (van Sebille et al., 2020). Such eddies can capture large amounts of a tracer and, while being transported in a background flow, redistribute it in the ocean. Eddies have been shown to play an important role in the accumulation of plastic (Brach et al., 2018) and the transport of heat and salt (Dong et al., 2014). To quantify the effects of eddies on tracer transport in the ocean, it is necessary to develop methods that are able to detect and track them. Many methods exist to detect such finite-time coherent sets of fluid parcels based on different mathematical or heuristic principles (Hadjighasem et al., 2017). The term finite-time coherent set is based on the work of Froyland et al. (2010) and is, in our context, defined as a set of particles that, in a sense, stay particularly close to each other along their entire trajectories. Here, for the first time in this context, we make use of the density-based clustering algorithm OPTICS (ordering points to identify the clustering structure; Ankerst et al., 1999) to detect finite-time coherent sets in Lagrangian trajectory data.

The detection of coherent Lagrangian vortices using abstract embeddings of Lagrangian trajectories together with data-clustering techniques has received significant attention in the recent literature (Froyland and Padberg-Gehle, 2015; Hadjighasem et al., 2016; Padberg-Gehle and Schneide, 2017; Banisch and Koltai, 2017; Schneide et al., 2018; Froyland and Junge, 2018; Froyland et al., 2019). Using embedded trajectories for the detection of finite-time coherent sets is interesting as it allows one to use sparse trajectory data, and it can, in principle, be applied to ocean drifter trajectories, as demonstrated by Froyland and Padberg-Gehle (2015) and Banisch and Koltai (2017) for the detection of the five ocean basins. Yet, most of these methods cluster trajectory data with graph partitioning, which does not incorporate the difference between coherent, clustered trajectories and noisy trajectories that should not belong to any cluster. Graph partitioning has been shown to work in situations where the finite-time coherent sets are not too small compared to the fluid domain (Froyland and Padberg-Gehle, 2015; Hadjighasem et al., 2016; Padberg-Gehle and Schneide, 2017; Banisch and Koltai, 2017; Froyland and Junge, 2018). For applications to Lagrangian trajectory data sets on basin-scale ocean domains, where multiple small-scale coherent sets (eddies) coexist with noisy trajectories in the background, graph partitioning is, however, likely to fail. Similar observations were made by Froyland et al. (2019) for the partition-based clustering approaches based on transfer and dynamic Laplace operators (Froyland and Junge, 2018). Although some attempts have been made to accommodate such concepts in hard partitioning, e.g. by incorporating one additional cluster corresponding to noise (Hadjighasem et al., 2016), this approach is likely to fail for large ocean domains, as discussed by Froyland et al. (2019) and shown in Sect. 4 of this paper. Froyland et al. (2019) have developed a special form of trajectory embedding, based on sparse eigenbasis decomposition, given the eigenvectors of transfer operators and dynamic Laplacians. By superposing different sparse eigenvectors, they successfully separate coherent vortices from unclustered background noise.

Motivated by the results Froyland et al. (2019) obtained by developing a new form of trajectory embedding, we here explore the potential of another clustering algorithm to overcome the inherent problems of partition-based clustering. We use the density-based clustering method of OPTICS, developed by Ankerst et al. (1999), to detect finite-time coherent sets in large ocean domains, using a very simple choice of embedding (see Sect. 3.2.1). Density-based clustering aims to detect groups of data points that are close to each other, i.e. regions with high data density. Our data points correspond to entire trajectories, and groups of trajectories staying close to each other over a certain time interval correspond to such regions of high point density. Different from partition-based methods such as k-means or fuzzy-c-means, OPTICS does not require one to fix the number of clusters beforehand. Furthermore, density-based clustering has an intrinsic notion of a noisy data point – a point does not belong to any cluster (i.e. a finite-time coherent set) if it is not part of a dense region. A more detailed comparison of the method presented here to existing related methods can be found in Sect. 3.4.

Another desirable property of the OPTICS algorithm is its ability to capture coherence hierarchies. In the ocean, coherent sets of trajectories naturally come with a notion of such a hierarchy. For example, the surface flow in the North Atlantic Ocean can be seen as approximately coherent (Froyland et al., 2014), while mesoscale eddies and jets are also finite-time coherent sets of trajectories at smaller scales within the North Atlantic Ocean. Froyland et al. (2019) show how their leading eigenvectors resolve coherent sets at large scales, while small-scale results can be obtained with a sparse eigenbasis approximation of a set of eigenvectors. Similarly, clustering results obtained from OPTICS are typically hierarchical. The main result of OPTICS, i.e. the reachability plot, provides this hierarchical information in a simple 1D graph.

In Sect. 4, we first show how OPTICS detects finite-time coherent sets at different scales for the Bickley jet model flow (also discussed, e.g., by Hadjighasem et al., 2017) and successfully detects the six coherent vortices and the jet as the steepest valleys in the reachability plot. The general structure of the reachability plot also reveals the large-scale finite-time coherent sets, i.e. the northern and southern parts of the model flow, separated by the jet. We then apply our method to Lagrangian particle trajectories released in the eastern South Atlantic Ocean, where large rings detach from the Agulhas Current (e.g. Schouten et al., 2000). We detect several Agulhas rings and, on the larger scale, also separate the eastward- and westward-moving branches of the South Atlantic subtropical gyre. While the traditional approach to studying Agulhas rings is based on sea surface height analysis (see, e.g., Dencausse et al., 2010), several methods based on virtual Lagrangian trajectories have been applied to Agulhas ring detection before (Haller and Beron-Vera, 2013; Beron-Vera et al., 2013; Froyland et al., 2015; Hadjighasem et al., 2016; Tarshish et al., 2018). Our method is different from these approaches in that it is directly applicable to a trajectory data set, i.e. without much preprocessing of the data. As the OPTICS algorithm is readily available in the scikit-learn library in Python, the detection of finite-time coherent sets can be done without much effort and with only a few lines of code. A further difference is the mentioned intrinsic notion of coherence hierarchy, which allows for simultaneous analysis of trajectory data at different scales. While we mainly focus on the direct embedding of trajectories in an abstract, high-dimensional Euclidean space, we also show in Appendix C that OPTICS can be used to overcome the limits of k-means clustering in the context of spectral clustering of the trajectory-based network of Padberg-Gehle and Schneide (2017).

2 Trajectory data sets

2.1 Quasi-periodically perturbed Bickley jet

We apply our method to a model system that has been used frequently in studies to detect finite-time coherent sets (Hadjighasem et al., 2017; Padberg-Gehle and Schneide, 2017; Hadjighasem et al., 2016; Banisch and Koltai, 2017; Froyland and Junge, 2018). The velocity field of the quasi-periodically perturbed Bickley jet (Bickley, 1937; del Castillo-Negrete and Morrison, 1993) is defined by a stream function $ψ (x, y, t)$ , i.e. $\dot{x} = - \frac{\partial ψ}{\partial y}$ and $\dot{y} = \frac{\partial ψ}{\partial x}$ , with $ψ (x, y, t) = ψ_{0} (y) + ψ_{1} (x, y, t)$ consisting of a stationary eastward background flow as follows:

$$ψ_{0}(y) = -U L \tanh(y / L), \tag{1}$$

and a time-dependent perturbation, as follows:

$$ψ_{1}(x, y, t) = U L \operatorname{sech}^{2}(y / L) \, \operatorname{Re}\left[\sum_{n = 1}^{3} f_{n}(t) \exp(i k_{n} x)\right], \tag{2}$$

where Re(z) denotes the real part of the complex number z. We use the same parameter values as Hadjighasem et al. (2017), with U=62.66 m/s the characteristic velocity of the zonal background flow, and L=1770 km. The parameters in Eq. (2) are given by $k_{n} = 2 n / r_{0}$ and $f_{n} (t) = ϵ_{n} \exp (- i k_{n} c_{n} t)$ , with ϵ₁=0.075, ϵ₂=0.4, ϵ₃=0.3, c₁=0.1446U, c₂=0.205U and c₃=0.461U. The domain of interest is $Ω = [0, π r_{0}] \times [-3000\,\mathrm{km}, 3000\,\mathrm{km}]$ , where r₀=6371 km is the radius of the Earth, and the left and right edges of Ω are identified, i.e. the flow is periodic in the x direction with period πr₀. Similar to Banisch and Koltai (2017), we seed the domain with an initial number of 12 000 particles on a uniform 200×60 grid. For this choice, the initial particle spacing is slightly above 100 km in both directions. We compute the trajectories for 40 d with a time step of 1 s using the SciPy integrate package. We output the trajectories every day, i.e. we have T=41 data points in time for each trajectory.
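The velocity field follows from Eqs. (1) and (2) by differentiating the stream function. The following is a minimal sketch of this step, assuming SI units throughout. The function names, the use of `scipy.integrate.solve_ivp` with adaptive step size (instead of the fixed 1 s step mentioned above) and the tolerance are illustrative assumptions, not the setup used for the results in this paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameter values quoted in the text, converted to SI units.
U = 62.66                                  # m/s, characteristic zonal velocity
L = 1770e3                                 # m
R0 = 6371e3                                # m, radius of the Earth
EPS = np.array([0.075, 0.4, 0.3])          # perturbation amplitudes
C = np.array([0.1446, 0.205, 0.461]) * U   # phase speeds c_n in m/s
K = 2 * np.arange(1, 4) / R0               # wavenumbers k_n = 2n / r0


def stream_function(x, y, t):
    """psi = psi_0 + psi_1 from Eqs. (1) and (2)."""
    modes = EPS * np.exp(1j * K * (x - C * t))   # f_n(t) exp(i k_n x)
    sech2 = 1.0 / np.cosh(y / L) ** 2
    return -U * L * np.tanh(y / L) + U * L * sech2 * np.real(modes.sum())


def velocity(t, state):
    """(dx/dt, dy/dt) = (-dpsi/dy, dpsi/dx), differentiated analytically."""
    x, y = state
    modes = EPS * np.exp(1j * K * (x - C * t))
    sech2 = 1.0 / np.cosh(y / L) ** 2
    u = U * sech2 + 2 * U * sech2 * np.tanh(y / L) * np.real(modes.sum())
    v = U * L * sech2 * np.real((1j * K * modes).sum())
    return [u, v]


def integrate_trajectory(x0, y0, days):
    """Integrate one particle and return daily positions, shape (days + 1, 2)."""
    t_out = np.arange(days + 1) * 86400.0
    sol = solve_ivp(velocity, (0.0, t_out[-1]), [x0, y0],
                    t_eval=t_out, rtol=1e-8)
    traj = sol.y.T
    traj[:, 0] = np.mod(traj[:, 0], np.pi * R0)  # periodic in x, period pi * r0
    return traj
```

Calling `integrate_trajectory(x0, y0, days=40)` for every seed position would give T=41 daily positions per particle, matching the output frequency described above.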

2.2 Agulhas rings in the South Atlantic

To test the OPTICS algorithm with a more realistic ocean flow, we simulate surface particle trajectories in a strongly eddying ocean model. Surface velocities are derived from a Nucleus for European Modelling of the Ocean (NEMO) ORCA-N006 run (Madec, 2008), which has a horizontal resolution of $1 / 12^{\circ}$ and velocity output every 5 d. The model is forced by reanalysis and observed data of wind, heat and freshwater fluxes (Dussin et al., 2016), i.e. the currents do not only contain the geostrophic component, as is the case in altimetry-derived currents (Beron-Vera et al., 2013; Froyland et al., 2019). For the advection of virtual particles, we use version 1.11 of the open source Parcels framework (Lange and van Sebille, 2017, see http://oceanparcels.org/, last access: 2 January 2021). The 2D surface current velocity is interpolated in space and time with the C-grid interpolation scheme of Delandmeter and van Sebille (2019), using a fourth-order Runge–Kutta method with a time step of 10 min. We initially distribute particles uniformly in the ocean on the vertices of a $0.2^{\circ} \times 0.2^{\circ}$ grid in the domain (30° W, 20° E) × (40° S, 20° S), which corresponds to a total number of 23 821 particles. At 30° S, a spacing of 0.2° corresponds to roughly 20 km. The particles start on 5 January 2000 and are advected for 2 years. We output the trajectories with a time interval of 5 d. We only use the first 100 d as data to detect the finite-time coherent sets, i.e. we have T=21 data points for each trajectory, but also look at later times to see how long the rings need to disperse. We provide the used trajectory data for the Agulhas flow as a NumPy file on Zenodo (Wichmann, 2020b).
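The quoted grid spacing and number of time slices can be checked with a few lines of arithmetic. This is a rough sketch assuming a spherical Earth of radius 6371 km; the helper names are hypothetical and chosen for illustration only.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def grid_spacing_km(dlon_deg, dlat_deg, lat_deg):
    """Zonal and meridional spacing in km of a regular lon/lat grid at a given latitude."""
    dlat_km = np.deg2rad(dlat_deg) * EARTH_RADIUS_KM
    dlon_km = (np.deg2rad(dlon_deg) * EARTH_RADIUS_KM
               * np.cos(np.deg2rad(lat_deg)))
    return dlon_km, dlat_km


def n_snapshots(duration_days, output_interval_days):
    """Number of stored positions per trajectory, including the initial one."""
    return duration_days // output_interval_days + 1


dlon_km, dlat_km = grid_spacing_km(0.2, 0.2, -30.0)  # 0.2 degree grid at 30 S
T_agulhas = n_snapshots(100, 5)                      # first 100 d, output every 5 d
```

Under these assumptions the zonal spacing at 30° S is a little below 20 km and the meridional spacing a little above, consistent with the "roughly 20 km" stated above, and `T_agulhas` evaluates to 21.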

3 Methods

3.1 Detecting coherent structures in Lagrangian trajectory data

For N trajectories of dimension D and length T, the trajectory information can be stored in a data matrix $X \in \mathbb{R}^{N \times DT}$ , where each row results from a particle trajectory by concatenating the different spatial dimensions. The analysis of the trajectory data to detect the finite-time coherent sets of trajectories (Froyland and Padberg-Gehle, 2015; Banisch and Koltai, 2017; Hadjighasem et al., 2016; Padberg-Gehle and Schneide, 2017; Schneide et al., 2018; Froyland and Junge, 2018; Wichmann et al., 2020) can be split into the following two essential steps:

Step 1. Embedding of the trajectories in an abstract (metric) space, i.e. $X \to \bar{X} \in \mathbb{R}^{N \times M}$ , where M≤DT. If one uses a dimensionality reduction method, then M<DT.
Step 2. Clustering of the embedded data with a clustering algorithm.
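Step 1 can be sketched as follows, assuming the trajectories are stored in an array of shape (N, T, D). The function names are hypothetical. The dimensionality reduction shown is the textbook form of classical multidimensional scaling with Euclidean distances between the rows of X; it only illustrates the case M<DT, and the exact configuration used for the results in this paper may differ.

```python
import numpy as np


def trajectories_to_matrix(traj):
    """Direct embedding: stack the D coordinate series of each trajectory into one row.

    traj has shape (N, T, D); the result has shape (N, D * T), with each row
    ordered as [x(t_1..t_T), y(t_1..t_T), ...].
    """
    n, t, d = traj.shape
    return np.transpose(traj, (0, 2, 1)).reshape(n, d * t)


def classical_mds(X, M):
    """Classical MDS of the rows of X into M dimensions (textbook version)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ d2 @ J                            # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:M]                 # M largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
```

Note that this version builds dense N×N matrices, so memory use grows quadratically with the number of trajectories. The output of either function can be passed directly to the clustering algorithm in Step 2.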

The embedding is necessary to represent the trajectories as points in a metric space. Different options for embedding the trajectories exist, e.g. a direct embedding of the data points along the trajectories (Froyland and Padberg-Gehle, 2015) or embeddings based on the eigenvectors derived from networks that are defined by physically motivated trajectory similarities (Banisch and Koltai, 2017; Padberg-Gehle and Schneide, 2017; Froyland and Junge, 2018). Once an embedding of each trajectory as a point in a metric (typically Euclidean) space is established, one can apply a clustering algorithm. Roughly speaking, clustering algorithms try to identify groups of points that are close to each other as a cluster. Partition-based clustering methods divide the entire data set into a (typically fixed) number of K clusters, such that each data point belongs to a cluster. The most popular method in this category is the k-means algorithm, which tries to find a given number of K clusters such that the sum of the pairwise squared distances of points within a cluster is minimized. Other clustering algorithms contain a concept of noisy data, i.e. data points that do not belong to any cluster or belong to a cluster only with a certain probability. Examples of the former case are density-based spatial clustering of applications with noise (DBSCAN; Ester et al., 1996), as discussed by Schneide et al. (2018) in the fluid dynamics context, and the OPTICS (Ankerst et al., 1999) algorithm presented here. For the latter case, the most popular method is fuzzy-c-means clustering, as discussed by Froyland and Padberg-Gehle (2015) in the context of finite-time coherent sets.
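The difference between partition-based and density-based clustering can be illustrated on synthetic points with the scikit-learn implementations of k-means and OPTICS. The data set (two dense blobs in sparse uniform noise), the parameter values and the DBSCAN-style cluster extraction at a fixed `eps` are assumptions chosen for this toy example; they are not the settings used for the results in this paper.

```python
import numpy as np
from sklearn.cluster import KMeans, OPTICS

rng = np.random.default_rng(42)

# Toy embedding: two dense groups (stand-ins for coherent sets) plus sparse noise.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(100, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.05, size=(100, 2))
noise = rng.uniform(-10.0, 10.0, size=(50, 2))
points = np.vstack([blob_a, blob_b, noise])

# Partition-based: every point is assigned to one of K=2 clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# Density-based: points outside dense regions receive the noise label -1.
optics = OPTICS(min_samples=10, cluster_method="dbscan", eps=0.3).fit(points)
optics_labels = optics.labels_

# Reachability values in cluster order; dense groups appear as valleys.
reachability = optics.reachability_[optics.ordering_]
```

In this example k-means has no way to leave the scattered points unassigned, whereas OPTICS labels them as noise. Plotting `reachability` against its index gives the reachability plot from which clusters at different density levels can be read off.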

Figure 1 shows a few possible options for trajectory embedding and clustering that have partially been explored before (see the footnotes in the figure for the combinations used in related studies). For a given trajectory data set, one can, in principle, apply an arbitrary combination of embedding and clustering methods. Only a few of the different combinations have been explored so far, and many more options for embedding and clustering (like those shown in Fig. 1) exist. It is important to note that a good choice of embedding and clustering might well depend on the specific problem at hand, and there might be no combination that performs well for all possible situations.

Figure 1. Different steps for detecting coherent trajectories in Lagrangian data with trajectory clustering. The figure is non-exhaustive, and many more options for embedding and clustering exist. Footnotes: ¹ Froyland and Padberg-Gehle (2015). ² Hadjighasem et al. (2016), Padberg-Gehle and Schneide (2017) and Banisch and Koltai (2017) all define networks with spectral embedding and subsequent k-means clustering. Froyland et al. (2019) define spectral embeddings based on dynamic Laplacian and transfer operators. ³ Schneide et al. (2018).
