The basin-wide surface transport of tracers such as heat, nutrients and plastic in the North Atlantic Ocean is organized into large-scale flow structures such as the Western Boundary Current and the Subtropical and Subpolar gyres. Being able to identify these features from drifter data is important for studying tracer dispersal but also for detecting changes in the large-scale surface flow due to climate change. We propose a new and conceptually simple method to detect groups of trajectories with similar dynamical behaviour from drifter data using network theory and normalized cut spectral clustering. Our network is constructed from conditional bin-drifter probability distributions and naturally handles drifter trajectories with data gaps and different lifetimes. The eigenvalue problem of the respective Laplacian can be replaced by a singular value decomposition of a related sparse data matrix. The construction of this matrix scales with

The transport of tracers such as heat, nutrients or plastic in the ocean is an important field of research in oceanography

Here, we propose a new and conceptually simple network-based method to identify groups of trajectories that have similar dynamical behaviour. The network is constructed based on ideas from symbolic dynamics, which describes the coarse-grained trajectory of a particle given some partition (binning) of the state space. The itinerary of a particle, i.e. the sequence of bins it visits, resolves information far below the bin resolution if it is known for long times. Unlike previous network-based methods, we make full use of the duality between individual particles and their coarse-grained itineraries, which can lead to significant computational advantages. We simplify the itineraries to a minimum: neglecting the time dimension, we represent the trajectory data as a bipartite network connecting particles and bins, with links defined by conditional distributions over symbols. With an appropriate choice of similarity measure, our method allows us to formulate spectral relaxations of the NCut

We show with a model flow, the periodically driven double-gyre flow, that the method accurately finds almost-invariant regions and transport barriers. We also show that the method correctly classifies most of the trajectories in an incomplete data set. Our method can also be used to formulate the clustering problem in a low-dimensional setting by choosing a coarse partition, which can still resolve the leading-order flow structures to a high accuracy. We show this for the double-gyre flow with an effectively nine-dimensional formulation of the clustering problem, which still resolves the leading features of the flow up to details far below the coarse graining scale. Our method is designed for detecting quasi-stationary features from ocean drifter trajectories. We therefore emphasize that, owing to the strong simplifications of the particle itineraries, the method is not suitable for the detection of coherent vortices that are transported in a background flow, such as the “Bickley jet” (discussed e.g. in

Several trajectory-based methods have been applied to the ocean drifter data set on the global scale

We use daily drifter data derived from the 6-hourly interpolated data from the NOAA-AOML Global Drifter Program

Suppose we are given a set of

As a function of time, the matrices

The matrix

Two example trajectories on a binned fluid domain, with alphabet A,

Given a partition

We therefore define another continuous similarity measure

(Req. 1) Invariance to permutation:

(Req. 2) Sensitivity to missing data:

(Req. 3) Zero similarity for disjoint itineraries:

Requirement 1 essentially discards the time dimension. Requirement 2 takes into account that an itinerary with data gaps contains less information than a full itinerary and that a missing data point should be treated just as another symbol (“D”) that is not part of the example itinerary. Requirement 3 states that completely different itineraries have zero similarity. The easiest way to implement requirements 1–3 is by introducing a conditional symbol distribution

All requirements are fulfilled with the similarity measure

Identifying each letter in the symbolic alphabet with a number

This is the projection of the bipartite network
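Requirements 1 to 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `symbol_distribution` and `similarity`, the encoding of missing data points as `-1`, and the use of the inner product of two symbol distributions as similarity measure are assumptions made here for illustration.

```python
import numpy as np

def symbol_distribution(itineraries, n_bins):
    """Row i holds the fraction of time steps particle i spent in each bin.

    Assumption of this sketch: missing observations are encoded as -1.
    They count towards the itinerary length but towards no bin, so an
    itinerary with gaps carries less total weight than a complete one.
    """
    itineraries = np.asarray(itineraries)
    n_particles, n_steps = itineraries.shape
    C = np.zeros((n_particles, n_bins))
    for i, row in enumerate(itineraries):
        visited = row[row >= 0]  # drop missing data points
        C[i] = np.bincount(visited, minlength=n_bins) / n_steps
    return C

def similarity(C):
    """Particle-particle similarity as inner product of symbol distributions."""
    return C @ C.T

itineraries = [
    [0, 0, 1, 1],    # particle 0
    [1, 0, 1, 0],    # same symbols as particle 0 in a different order
    [0, -1, 1, -1],  # same bins as particle 0, but with data gaps
    [2, 2, 3, 3],    # visits only bins that particle 0 never visits
]
C = symbol_distribution(itineraries, n_bins=4)
K = similarity(C)
```

In this toy example, particles 0 and 1 obtain identical rows in `C` (requirement 1), the itinerary with gaps is less similar to particle 0 than a complete itinerary through the same bins (requirement 2), and the disjoint itinerary has zero similarity to all others (requirement 3).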

In this section we sketch the method of solving a relaxed version of the NCut according to

Assume we are given an undirected network defined on a discrete set

Here,

As shown in

Such a solution is only approximate, as constraints of the optimization problem are neglected, hence the term “spectral relaxation”; see also
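The difference between the discrete problem and its spectral relaxation can be made concrete with a small Python sketch. It assumes the standard definition of the normalized cut (the sum over all sets of the cut weight divided by the set volume) and a dense symmetric adjacency matrix. The function names and the thresholding of the eigenvector at zero are illustrative choices, not taken from the text.

```python
import numpy as np

def ncut(W, labels):
    """Normalized cut: sum over sets of cut(A, complement of A) / vol(A)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    degree = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        inside = labels == k
        cut = W[np.ix_(inside, ~inside)].sum()
        total += cut / degree[inside].sum()
    return total

def relaxed_bipartition(W):
    """Approximate two-set NCut from the second eigenvector of the
    symmetric normalized Laplacian (all degrees must be positive)."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigenvectors = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    relaxed = d_inv_sqrt * eigenvectors[:, 1]
    # The relaxed solution is real-valued; thresholding at zero is one
    # simple (assumed) way to map it back to a discrete partition.
    return (relaxed > 0).astype(int)

# Toy network: two triangles joined by one weak link.
W = np.zeros((6, 6))
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

labels = relaxed_bipartition(W)
value = ncut(W, labels)
```

Because the eigenvector takes real values rather than set labels, the discretization step is where the neglected constraints of the optimization problem re-enter.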

To find

Equation (

Compute the first

Compute

Embed the

Perform a standard Euclidean-space clustering algorithm (here
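The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' code: the particle-bin matrix is dense and small, every particle and every bin has non-zero total weight, the degree scaling follows the usual bipartite spectral co-clustering recipe, and the helper `kmeans` is a plain Lloyd iteration with a deterministic initialisation written for this example.

```python
import numpy as np

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cocluster(C, k):
    """Cluster particles (rows) and bins (columns) of C simultaneously."""
    C = np.asarray(C, dtype=float)
    d_p = C.sum(axis=1)  # particle degrees, assumed > 0
    d_b = C.sum(axis=0)  # bin degrees, assumed > 0
    A = C / np.sqrt(d_p)[:, None] / np.sqrt(d_b)[None, :]
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Embed particles and bins in the same k-dimensional space.
    Z = np.vstack([U[:, :k] / np.sqrt(d_p)[:, None],
                   Vt[:k].T / np.sqrt(d_b)[:, None]])
    labels = kmeans(Z, k)
    n_particles = C.shape[0]
    return labels[:n_particles], labels[n_particles:]

# Toy data: particles 0-2 mostly visit bins 0-1, particles 3-5 bins 2-3.
C = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.4, 0.5, 0.1, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.4, 0.5],
    [0.0, 0.0, 0.3, 0.7],
])
particle_labels, bin_labels = cocluster(C, k=2)
```

For large data sets, a sparse matrix format and a truncated sparse SVD routine would replace the dense `np.linalg.svd` call used here.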

We choose this algorithm for the double-gyre flow (cf. Sect.

Compute the network

Find the largest connected component of the network and restrict

Compute the eigenvector

Compute

Find a cutoff

Split the original network into two networks, with respective adjacency matrices

For each sub-network, repeat steps H3–H5.

Choose to split the sub-network that minimizes the generalized normalized cut in Eq. (

Repeat steps H3–H8 up to a certain number of sets
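A strongly simplified Python sketch of such a hierarchical bisection is given below. It makes several assumptions that are not taken from the text: the network is given as a dense, connected particle similarity matrix (so the restriction to the largest connected component is skipped), the cutoff is found by an exhaustive sweep along the sorted eigenvector, and the split to keep is the one with the lowest standard NCut of the resulting partition of the full network. All function names are hypothetical.

```python
import numpy as np

def ncut_value(W, labels):
    """Sum over sets of cut(A, complement of A) / vol(A)."""
    degree = W.sum(axis=1)
    value = 0.0
    for k in np.unique(labels):
        inside = labels == k
        value += W[np.ix_(inside, ~inside)].sum() / degree[inside].sum()
    return value

def bisect(W, members):
    """Split a sub-network at the cutoff that minimises its two-set NCut."""
    sub = W[np.ix_(members, members)]
    d = sub.sum(axis=1)  # assumed > 0 for every member
    L_sym = np.eye(len(members)) - sub / np.sqrt(np.outer(d, d))
    vec = np.linalg.eigh(L_sym)[1][:, 1] / np.sqrt(d)
    order = np.argsort(vec)
    best = None
    for cut in range(1, len(members)):  # sweep over all cutoffs
        side = np.zeros(len(members), dtype=int)
        side[order[cut:]] = 1
        value = ncut_value(sub, side)
        if best is None or value < best[0]:
            best = (value, side)
    return members[best[1] == 0], members[best[1] == 1]

def hierarchical_ncut(W, n_clusters):
    """Split one cluster per iteration until n_clusters sets exist."""
    W = np.asarray(W, dtype=float)
    labels = np.zeros(len(W), dtype=int)
    while labels.max() + 1 < n_clusters:
        best = None
        for k in range(labels.max() + 1):
            members = np.flatnonzero(labels == k)
            if len(members) < 2:
                continue
            _, right = bisect(W, members)
            trial = labels.copy()
            trial[right] = labels.max() + 1
            value = ncut_value(W, trial)
            if best is None or value < best[0]:
                best = (value, trial)
        if best is None:
            break  # nothing left to split
        labels = best[1]
    return labels

# Toy network: three triangles coupled by weak links of different strength.
W = np.zeros((9, 9))
for start in (0, 3, 6):
    W[start:start + 3, start:start + 3] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1
W[5, 6] = W[6, 5] = 0.3
labels = hierarchical_ncut(W, n_clusters=3)
```

In the toy network, the weaker link (weight 0.1) is cut first and the stronger link (weight 0.3) second, so the result is a nested sequence of partitions.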

At each iteration, only one of the clusters is split into two, and the

Note that there is no general rule to determine the number of clusters

Our method aims to detect groups of particles, with trajectories of different groups having only little overlap. In this sense, our method detects groups of particles with little mixing between each other, which is close to detecting almost-invariant sets according to

There are also major differences between our method and other existing methods that cluster on the particle level

The major drawback of our method is its dependence on a reference frame with respect to which the phase space partition and thus the symbolic itineraries are defined. This can be understood by imagining a time-independent flow viewed from a rotating reference frame. The rotation of the reference frame contributes to a particle's itinerary, and, by averaging over different points in time, non-zero similarities between trajectories can result solely from the rotation of the reference frame. For this reason, our method cannot be applied to strongly time-dependent systems such as the Bickley jet model flow where coherent vortices are transported in a periodic background flow. It is, however, still possible to detect transport boundaries in time-dependent flows such as the periodically driven double-gyre flow, as we show in Sect.

To test our method, we choose a model flow that has been used for the detection of coherent structures before

Figure

Clustering of

We emphasize that the clustering of

Clustering of

To test the robustness of our method to missing data, we randomly choose 500 out of the 20 000 particles and for the remaining data set randomly delete 80 % of the data points. This is similar to the approach of
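The construction of such an incomplete data set can be sketched in Python as follows. The sketch assumes that the 80 % of data points are deleted from the 500 retained trajectories, that deleted points are marked with `NaN`, and that positions are stored as two arrays of shape (particles, time steps). The number of time steps (20) and the function name `make_incomplete` are hypothetical.

```python
import numpy as np

def make_incomplete(x, y, n_keep=500, delete_fraction=0.8, seed=0):
    """Keep n_keep random trajectories and delete a fraction of their points."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(x.shape[0], size=n_keep, replace=False)
    x_sub, y_sub = x[keep].copy(), y[keep].copy()
    n_delete = int(round(delete_fraction * x_sub.size))
    deleted = np.zeros(x_sub.size, dtype=bool)
    deleted[rng.choice(x_sub.size, size=n_delete, replace=False)] = True
    deleted = deleted.reshape(x_sub.shape)
    x_sub[deleted] = np.nan  # a deleted point loses both coordinates
    y_sub[deleted] = np.nan
    return x_sub, y_sub

# Placeholder positions standing in for the 20 000 model trajectories.
rng = np.random.default_rng(1)
x_full = rng.random((20000, 20)) * 2.0  # hypothetical domain [0, 2] x [0, 1]
y_full = rng.random((20000, 20))
x_inc, y_inc = make_incomplete(x_full, y_full)
```

Some trajectories may retain only a few data points after the deletion, which is the situation the method is meant to handle.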

Clustering of

The results for the double-gyre flow illustrate the robustness of our method in identifying the most dominant structures with incomplete trajectory data. Having control over the bin size enables us to tune the network such that it stays connected and the major structures can be resolved. At the same time, small-scale features of the flow seem to be resolved, at least to some extent, independent of the bin size using the

We also tested the algorithm for shorter trajectories (cf. Fig.

We compute the matrix

Figure

The first major split separates the Subpolar Gyre and Nordic Seas from the subtropical and tropical North Atlantic. This essentially splits the Subpolar Gyre from the Subtropical Gyre, which together compose a double-gyre system with some similarity to the one in Sect.

Next, our algorithm separates the Subpolar Gyre from the Nordic Seas. A relatively clear cut is seen along the Iceland–Scotland ridge. The strength of the transport over the ridge is in fact an old topic in oceanography

We also tested our clustering algorithm without constraining the trajectory length of the data set (see Fig.

To test the sensitivity of the clustering result in Fig.

We introduce a new and conceptually simple method that enables the fast construction and clustering of particle-based networks to detect quasi-stationary regions with similar flow properties. Our method is based on ideas from symbolic dynamics, where a coarse but long particle itinerary can still resolve very detailed structures below the partition size. We implement a simple form of this idea and construct a bipartite graph that connects particles and bins, with links corresponding to the time-averaged conditional symbol distribution of each particle's trajectory. We use this bipartite graph to define a similarity graph on particle trajectories, to which we apply normalized cut spectral clustering. The bipartite foundation of our method enables us to use singular vectors of a related data matrix to construct a simultaneous

Our results show that although we reduce the amount of processed data to a minimum by considering distributions over particle itineraries only, our method is powerful in handling incomplete trajectory data and is computationally efficient to implement. The basic idea of our algorithm is rooted in dynamical systems theory and symbolic dynamics, where long and coarse particle itineraries slice the state space up to scales much below the partition size. The construction of the sparse data matrix used for the singular value decomposition (SVD) has computational complexity

Despite the performance and the low computational complexity of our method, the construction of the networks defined with itinerary distributions is to a certain extent ad hoc. The construction is mostly motivated by practical requirements, i.e. the need to define a similarity measure between particles that is not too exclusive, behaves sensibly with respect to missing data and decomposes into a block-diagonal structure for invariant flow regions in ideal cases. Because it completely discards the time dimension, our method depends on a fixed reference frame with respect to which the state space partition is defined. Therefore, it cannot detect moving Lagrangian vortices, such as those in the “Bickley jet” (discussed e.g. in

For the double-gyre flow, our method successfully identifies the known transport boundaries and other known flow features in relatively high detail. We demonstrate that our algorithm still performs relatively well when a large part of the trajectory data is deleted, making it suitable for real-world applications. We also show that an a priori low-dimensional definition of the clustering problem through a coarse binning can still detect the major flow features with an accuracy down to scales well below the bin resolution. We finally apply hierarchical clustering to the network constructed from drifter data in the North Atlantic, and successfully detect major flow regions such as the Western Boundary Current region, the Subpolar and Subtropical gyres and the Caribbean Sea, providing the first drifter-based clustering of the North Atlantic surface transport using network theory.

For an introduction to finding almost-invariant sets with the transfer operator, see

Note that this relies on the fact that

Clustering of

Clustering of

Share of incorrectly assigned particle labels for the double-gyre flow (simultaneous

Clustering of

Hierarchical clustering result of the double gyre using algorithm 2 of Sect.

Hierarchical clustering result of the double gyre using algorithm 2 of Sect.

Hierarchical clustering result of the double gyre using algorithm 2 of Sect.

Clustering of

Clustering of

All code, including the script to constrain the global drifter data to the North Atlantic, is available on GitHub:

DW performed the analysis with support from CK, EvS and HAD. DW wrote the manuscript and all the authors jointly edited and revised it.

The authors declare that they have no conflict of interest.

David Wichmann, Christian Kehl and Erik van Sebille are supported through funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant agreement no. 715386).

This research has been supported by the European Research Council (ERC) (grant no. 715386).

This paper was edited by Ana M. Mancho and reviewed by two anonymous referees.