The detection of finite-time coherent particle sets in Lagrangian trajectory data, using data-clustering techniques, is an active research field at the moment. Yet, most of the clustering methods employed so far have been based on graph partitioning, which assigns each trajectory to a cluster, i.e. there is no concept of noisy, incoherent trajectories. This is problematic for applications in the ocean, where many small, coherent eddies are present in a large, mostly noisy fluid flow. Here, for the first time in this context, we use the density-based clustering algorithm of OPTICS

Understanding the transport of tracers in the ocean is an important topic in oceanography. In addition to the large-scale transport by the mean flow, mesoscale eddies and jets play an important role in tracer transport on smaller scales

The detection of coherent Lagrangian vortices using abstract embeddings of Lagrangian trajectories together with data-clustering techniques has received significant attention in the recent literature

Motivated by the results

Another desirable property of the OPTICS algorithm is its ability to capture coherence hierarchies. In the ocean, coherent sets of trajectories naturally come with a notion of such a hierarchy. For example, the surface flow in the North Atlantic Ocean can be seen as approximately coherent

In Sect.

We apply our method to a model system that has been used frequently in studies to detect finite-time coherent sets

To test the OPTICS algorithm with a more realistic ocean flow, we simulate surface particle trajectories in a strongly eddying ocean model. Surface velocities are derived from a Nucleus for European Modelling of the Ocean (NEMO) ORCA-N006 run

For

The embedding is necessary to represent the trajectories as points in a metric space. Different options for embedding the trajectories exist, e.g. a direct embedding of the data points along the trajectories

Figure

Different steps for detecting coherent trajectories in Lagrangian data with trajectory clustering. The figure is nonexhaustive, and many more options for embedding and clustering exist. Footnotes:

Most of the studies that use clustering techniques to detect finite-time coherent sets have focused on developing new forms of trajectory embeddings. For example,

A direct embedding of the trajectory data in a high-dimensional Euclidean space, i.e.

A reduction of the trajectory data to a 2D embedding space, using classical multidimensional scaling (MDS; see Sect.

A spectral embedding of the network proposed by

In the following sections, we explain in detail the embeddings of E1 and E2 and the OPTICS algorithm. We introduce the network embedding of E3 together with the corresponding results in Appendix

The direct embedding of each trajectory in
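The following is a minimal sketch of such a direct embedding, assuming each trajectory is stored as positions sampled at common times in an array of shape `(n_particles, n_times, n_dims)`. The array layout and the name `direct_embedding` are illustrative assumptions, and any rescaling of the coordinates that the method may apply is not reproduced here. Concatenating the positions turns each trajectory into a single vector, so the Euclidean distance between two vectors accumulates the squared separation of the two particles over all time steps.

```python
import numpy as np

def direct_embedding(traj):
    """Flatten trajectories of shape (n_particles, n_times, n_dims) into
    vectors of shape (n_particles, n_times * n_dims).

    Row i is (x_i(t_0), y_i(t_0), x_i(t_1), y_i(t_1), ...), so the Euclidean
    distance between two rows sums the squared position differences over
    all time steps before taking the square root.
    """
    traj = np.asarray(traj, dtype=float)
    n_particles = traj.shape[0]
    return traj.reshape(n_particles, -1)

# Hypothetical example: two particles observed at three times in 2D,
# always one unit apart in y.
traj = np.array([
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]],
])
X = direct_embedding(traj)
print(X.shape)                      # (2, 6)
print(np.linalg.norm(X[0] - X[1]))  # sqrt(3), about 1.732
```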

To develop an intuition for what the OPTICS algorithm does, and how it differs from

We compute
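As a sketch of how a 2D embedding of this kind can be obtained, the function below implements textbook classical (Torgerson) MDS: it double-centres the matrix of squared pairwise distances into a kernel matrix B and scales the leading eigenvectors of B by the square roots of their eigenvalues. The function name and interface are our own, and we assume the input rows are the direct embeddings of the trajectories. The returned eigenvalues form the spectrum of the kernel matrix; how quickly they decay indicates how much of the pairwise-distance structure a 2D embedding retains.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS of the rows of X.

    Returns the low-dimensional coordinates and all eigenvalues of the
    double-centred kernel matrix B = -1/2 * J D2 J, in descending order.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of embedded trajectories.
    sq_norms = np.sum(X**2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Centring matrix J = I - (1/n) 1 1^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale the leading eigenvectors by the square roots of their eigenvalues
    # (clipped at zero to guard against tiny negative round-off values).
    lam = np.clip(eigvals[:n_components], 0.0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(lam)
    return coords, eigvals

# Hypothetical check: points lying in a 2D plane inside a 6D space are
# recovered up to rotation, i.e. all pairwise distances are preserved.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 2))
X = np.hstack([P, np.zeros((20, 4))])
coords, eigvals = classical_mds(X, n_components=2)
```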

The detection of dense accumulations of points that are separated from each other by non-dense regions (noise) is the main goal of density-based clustering. We use the OPTICS algorithm by

For

The core distance is simply the minimum radius of a ball around

The ordering of the points is based on the reachability distance of a point
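Both distances can be written down directly. The sketch below assumes an infinite generating distance and follows the scikit-learn convention of counting a point as its own first neighbour when computing the core distance; the helper names are hypothetical. Here `reachability_distance(D, core, o, p)` is the reachability distance of a point `o` with respect to an already processed point `p`, i.e. the larger of the core distance of `p` and the distance between `p` and `o`.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def core_distances(D, min_pts):
    """Distance from each point to its min_pts-th nearest neighbour.

    The point itself counts as its own first neighbour (distance 0),
    so sorting each row puts the self-distance in column 0.
    """
    return np.sort(D, axis=1)[:, min_pts - 1]

def reachability_distance(D, core, o, p):
    """Reachability distance of point o with respect to point p."""
    return max(core[p], D[p, o])

# Hypothetical 1D example: three close points and one far away.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
D = pairwise_distances(X)
core = core_distances(D, min_pts=2)
print(core)                                      # [1. 1. 1. 8.]
print(reachability_distance(D, core, o=0, p=2))  # max(1.0, 2.0) = 2.0
print(reachability_distance(D, core, o=1, p=3))  # max(8.0, 9.0) = 9.0
```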

Note that the ordering of points is achieved by constantly updating the ordered seed list (see step 3). In this way, the algorithm iterates through groups of dense points, one after the other, and it only continues with other points once a dense region has been fully explored. Note also that the entire algorithm depends on the choice of the parameter
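To make the seed-list mechanism concrete, here is a minimal O(n^2) sketch of the OPTICS ordering. It assumes an infinite generating distance, so that after the first step every unprocessed point is in the seed list and the ordered seed list reduces to an array of current reachability values. It is an illustration rather than the scikit-learn implementation; `optics_order` is a hypothetical name, and ties are broken by the smallest index.

```python
import numpy as np

def optics_order(X, min_pts):
    """Minimal OPTICS ordering with an infinite generating distance.

    Returns (ordering, reachability): ordering[k] is the index of the k-th
    processed point, and reachability[i] is the reachability distance
    assigned to point i (inf for the first point).
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Core distance: distance to the min_pts-th neighbour, counting the point itself.
    core = np.sort(D, axis=1)[:, min_pts - 1]
    reach = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    ordering = []
    current = 0  # start from an arbitrary point
    for _ in range(n):
        processed[current] = True
        ordering.append(current)
        # Update the seed list: unprocessed points may become more
        # reachable from the point that was just processed.
        unproc = ~processed
        new_reach = np.maximum(core[current], D[current])
        reach[unproc] = np.minimum(reach[unproc], new_reach[unproc])
        if not unproc.any():
            break
        # Continue with the unprocessed point of smallest reachability.
        candidates = np.flatnonzero(unproc)
        current = candidates[np.argmin(reach[candidates])]
    return np.array(ordering), reach

# Hypothetical example: two dense groups on a line.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
ordering, reach = optics_order(X, min_pts=2)
print(ordering)  # [0 1 2 3 4 5]
```

In this example the first group is fully explored before the second one is entered, and the jump in reachability (about 4.8, at point 3) marks the transition between the two dense regions, which is what appears as a separation between valleys in the reachability plot.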

The main result of the OPTICS algorithm is a reachability plot. This plot is the graph defined by

DBSCAN clustering. Choose a cut-off parameter

The start of the cluster

The end of the cluster

The cluster contains at least

Every point inside the cluster is at least a factor of
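The DBSCAN-type extraction can be sketched in a few lines of NumPy. To our understanding, the logic below mirrors what scikit-learn's `cluster_optics_dbscan` does: walking along the OPTICS ordering, a point whose reachability exceeds the cut-off (here `eps`) either starts a new cluster, if its own core distance is at most the cut-off, or is labelled as noise. The function name `extract_dbscan` and the input values are hypothetical, and the ξ-based extraction is not sketched here.

```python
import numpy as np

def extract_dbscan(ordering, reachability, core_distances, eps):
    """DBSCAN-like labels from an OPTICS output for a cut-off eps.

    reachability and core_distances are indexed by point; ordering lists
    the point indices in OPTICS order. Label -1 marks noise.
    """
    reach_o = reachability[ordering]
    core_o = core_distances[ordering]
    far_reach = reach_o > eps   # not density-reachable from earlier points
    near_core = core_o <= eps   # dense enough to open a new cluster
    # A new cluster starts whenever a far point is itself a core point.
    labels_o = np.cumsum(far_reach & near_core) - 1
    # Far points that are not core points are noise.
    labels_o[far_reach & ~near_core] = -1
    labels = np.empty(len(ordering), dtype=int)
    labels[ordering] = labels_o
    return labels

# Hypothetical OPTICS output: two dense groups and one isolated point.
ordering = np.arange(7)
reachability = np.array([np.inf, 0.1, 0.1, 4.8, 0.1, 0.1, 14.8])
core_dists = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 14.8])
labels = extract_dbscan(ordering, reachability, core_dists, eps=0.5)
print(labels)  # [ 0  0  0  1  1  1 -1]
```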

We refer to

The OPTICS algorithm and functions for deriving both clustering results from an OPTICS output are available in the scikit-learn library in Python. Note that the implementation in the scikit-learn library allows for a minimum cluster size that is different from
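A hedged usage sketch with scikit-learn follows, on synthetic points standing in for embedded trajectories (two dense groups plus uniform noise). The data and all parameter values are arbitrary illustrative choices, not those used in this study. `OPTICS` computes the ordering, reachability and core distances once; `cluster_method="xi"` yields ξ-based labels, `min_cluster_size` sets a minimum cluster size separately from `min_samples`, and `cluster_optics_dbscan` derives DBSCAN-type labels from the same run without recomputing the ordering.

```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

rng = np.random.default_rng(42)
# Synthetic stand-in for embedded trajectories: two dense groups plus noise.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(3.0, 3.0), scale=0.1, size=(50, 2))
noise = rng.uniform(low=-2.0, high=5.0, size=(20, 2))
X = np.vstack([blob_a, blob_b, noise])

# max_eps defaults to np.inf, i.e. an infinite generating distance.
clust = OPTICS(min_samples=10, min_cluster_size=20,
               cluster_method="xi", xi=0.05).fit(X)
labels_xi = clust.labels_  # -1 marks noise

# Reachability plot data: reachability values in OPTICS order.
reach_plot = clust.reachability_[clust.ordering_]

# DBSCAN-type labels from the same OPTICS run, with cut-off eps=0.5.
labels_dbscan = cluster_optics_dbscan(
    reachability=clust.reachability_,
    core_distances=clust.core_distances_,
    ordering=clust.ordering_,
    eps=0.5,
)
```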

Intuitively, the two clustering methods can be understood as follows. DBSCAN detects those groups of points that have a certain minimum density defined by the minimum reachability distance

Our method is closely related to existing methods for detecting finite-time coherent sets with clustering techniques. Most notably,

As mentioned, OPTICS also contains an intrinsic notion of cluster hierarchy, i.e. coherent sets that are themselves part of coherent sets at larger scales.
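With ξ-based extraction, scikit-learn exposes such a hierarchy through the fitted attribute `cluster_hierarchy_`, where each row gives the inclusive start and end positions of a cluster in the OPTICS ordering, so that nested intervals correspond to sub-clusters. The sketch below uses synthetic data with two nearby sub-groups and one distant group (arbitrary illustrative values) and simply lists these intervals; it is not the analysis performed in this study.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
# Two nearby sub-groups that together form a larger group, plus a distant group.
sub_1 = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(40, 2))
sub_2 = rng.normal(loc=(0.6, 0.0), scale=0.05, size=(40, 2))
far = rng.normal(loc=(5.0, 5.0), scale=0.05, size=(40, 2))
X = np.vstack([sub_1, sub_2, far])

clust = OPTICS(min_samples=10, cluster_method="xi", xi=0.05).fit(X)
# Each row is [start, end] (inclusive) in the OPTICS ordering; a cluster
# whose interval lies inside another one's interval is a sub-cluster of it.
hierarchy = clust.cluster_hierarchy_
for start, end in hierarchy:
    members = clust.ordering_[start:end + 1]
    print(f"ordering positions {start}-{end}: {len(members)} points")
```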

As described in Sect.

A more recent and powerful technique for detecting finite-time coherent sets in sparse trajectory data was presented by

A downside of our method compared to other approaches is the rather ad hoc choice of embedding (see Eq.

We start with the direct embedding of the Bickley jet flow trajectories (see Sect.

Result of the OPTICS algorithm applied to the direct embedding of the trajectories.

To illustrate the difference between OPTICS and

The corresponding clustering results in real space are shown in Figs.

It should be noted here that the poor performance of

Result of DBSCAN clustering of the 2D embedding of the classical MDS method.

Result of

Finally, we also tested the performance of our algorithm with a random subset of 2000 particles, using data every 5 d instead of every day (see Fig.
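A sketch of such a subsampling, assuming trajectories stored as an array of shape `(n_particles, n_days, n_dims)` with daily output; the function name, the toy array and the fixed random seed are illustrative assumptions.

```python
import numpy as np

def subsample_trajectories(traj, n_particles, every, seed=0):
    """Randomly select n_particles trajectories (without replacement) and
    keep every `every`-th time step, starting from the first one."""
    traj = np.asarray(traj)
    rng = np.random.default_rng(seed)
    idx = rng.choice(traj.shape[0], size=n_particles, replace=False)
    return traj[idx, ::every, :]

# Hypothetical data set: 10 particles with 11 daily positions in 2D.
traj = np.arange(10 * 11 * 2, dtype=float).reshape(10, 11, 2)
sub = subsample_trajectories(traj, n_particles=4, every=5)
print(sub.shape)  # (4, 3, 2): days 0, 5 and 10 are kept
```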

We next apply OPTICS to the Agulhas trajectories. As described in Sect.

Result of the OPTICS algorithm applied to the direct embedding of the trajectories with different clustering methods. Grey particles correspond to noise.

Figure

Reachability values at the initial time that resulted from the OPTICS algorithm being applied to the direct embedding of the trajectories. The regions with lowest values clearly correspond to Agulhas rings. The colour bar is cut off at a reachability of

In order to illustrate again the difference between OPTICS and

Embedding of the Agulhas trajectories in the 2D space defined by the leading eigenvectors of the MDS kernel matrix

Result of OPTICS applied to the 2D embedding of 12 000 randomly selected particles with the classical MDS method (see Fig.

Result of the

It is interesting to note that the use of classical MDS in Fig.

Spectral embeddings derived from networks, together with partition-based clustering, have a similar problem to the one illustrated in Figs.

The abstract embedding of particle trajectories in a metric space with subsequent clustering is a promising field of research for the detection of finite-time coherent sets in oceanography. Yet, most of the existing methods have been based on graph partitioning, which has no concept of noisy, unclustered trajectories. This is a problem for applications in the ocean, where many eddies are transported in a noisy background flow on large domains. This study is motivated by the success of

We apply OPTICS to Lagrangian particle trajectories directly, in the spirit of

Our method can be extended more efficiently to data sets with more trajectories by choosing a finite generating distance for OPTICS
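In scikit-learn, the generating distance corresponds to the `max_eps` parameter of `OPTICS` (default `np.inf`). The sketch below, on arbitrary synthetic data, illustrates the trade-off: with a finite `max_eps`, neighbourhood searches are bounded, but points that cannot be reached within `max_eps` of the already processed points keep an infinite reachability, so DBSCAN-type cut-offs should not exceed `max_eps`. The data and parameter values are illustrative assumptions only.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(1)
# Two well-separated synthetic groups (about 4 units apart).
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(200, 2)),
    rng.normal(loc=(4.0, 0.0), scale=0.1, size=(200, 2)),
])

# Finite generating distance max_eps=1.0; DBSCAN-type cut-off eps=0.5 <= max_eps.
clust_finite = OPTICS(min_samples=10, max_eps=1.0,
                      cluster_method="dbscan", eps=0.5).fit(X)

# A group that is further than max_eps from every previously processed point
# is entered with an infinite reachability.
n_inf = int(np.isinf(clust_finite.reachability_).sum())
print(n_inf, sorted(set(clust_finite.labels_.tolist())))
```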

Spectrum of the classical MDS kernel matrix

Result of the OPTICS algorithm for a random subset of 2000 particles in the Bickley jet flow, with particle data every 5 d instead of every day. To account for the smaller number of particles, we set

Spectrum of the classical MDS kernel matrix

To demonstrate that OPTICS can also be applied to the spectral embedding of a particle-based network, we use the network proposed by

Here,
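As an illustration of a spectral embedding of this kind, the sketch below builds a proximity network and computes the leading eigenvectors (smallest eigenvalues) of the random walk Laplacian L = I - D^{-1}A, where D is the diagonal degree matrix. The linking rule (two particles are connected if they come within a distance `r` of each other at any observation time), the self-loops and the function names are assumptions made for this example and may differ from the network construction used here. The eigenvectors are obtained from the symmetric normalised Laplacian, which has the same eigenvalues.

```python
import numpy as np

def proximity_network(traj, r):
    """Adjacency matrix linking particles that come within distance r at any
    observation time. traj has shape (n_particles, n_times, n_dims).
    Self-loops are kept, so every particle has degree >= 1."""
    traj = np.asarray(traj, dtype=float)
    n = traj.shape[0]
    A = np.zeros((n, n), dtype=bool)
    for t in range(traj.shape[1]):
        pos = traj[:, t, :]
        d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        A |= d <= r
    return A.astype(float)

def random_walk_spectral_embedding(A, n_vectors):
    """Smallest eigenvalues and eigenvectors of L_rw = I - D^{-1} A.

    Uses L_sym = I - D^{-1/2} A D^{-1/2}: if L_sym v = lam v, then
    u = D^{-1/2} v satisfies L_rw u = lam u.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # ascending eigenvalues
    U = d_inv_sqrt[:, None] * eigvecs[:, :n_vectors]
    return eigvals[:n_vectors], U

# Hypothetical example: two groups of three particles drifting in x that
# never come close to each other.
times = np.arange(4, dtype=float)
base = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                 [10.0, 0.0], [10.5, 0.0], [11.0, 0.0]])
shift = np.stack([times, np.zeros_like(times)], axis=-1)  # shape (4, 2)
traj = base[:, None, :] + shift[None, :, :]               # shape (6, 4, 2)
A = proximity_network(traj, r=0.6)
eigvals, U = random_walk_spectral_embedding(A, n_vectors=3)
print(np.round(eigvals, 6))
```

For two disconnected groups, the eigenvalue 0 appears twice and the corresponding eigenvectors are constant on each group, which is what makes them useful as embedding coordinates.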

Figure

Spectrum of the random walk Laplacian (see Eq.

Result of

Applying OPTICS instead of

Result of OPTICS applied to the

All code is available at

DW performed the analysis, with support from CK, EvS and HAD. DW wrote the paper, and all authors jointly edited and revised it.

The authors declare that they have no conflict of interest.

David Wichmann, Christian Kehl and Erik van Sebille have been supported through funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant no. 715386). This work was partially carried out on the Dutch national e-infrastructure, with the support of SURF Cooperative (project no. 16371). We thank Andrew Coward for providing the ORCA-N006 simulation data.

This research has been supported by the European Research Council (TOPIOS (grant no. 715386)).

This paper was edited by Juan Restrepo and reviewed by two anonymous referees.