<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">NPG</journal-id><journal-title-group>
    <journal-title>Nonlinear Processes in Geophysics</journal-title>
    <abbrev-journal-title abbrev-type="publisher">NPG</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Nonlin. Processes Geophys.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7946</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/npg-28-43-2021</article-id><title-group><article-title>Ordering of trajectories reveals hierarchical finite-time coherent sets in Lagrangian particle data: detecting Agulhas rings in the South Atlantic Ocean</article-title><alt-title>Ordering of trajectories reveals hierarchical finite-time coherent sets</alt-title>
      </title-group><?xmltex \runningtitle{Ordering of trajectories reveals hierarchical finite-time coherent sets}?><?xmltex \runningauthor{D. Wichmann et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Wichmann</surname><given-names>David</given-names></name>
          <email>d.wichmann@uu.nl</email>
        <ext-link>https://orcid.org/0000-0001-5530-8377</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Kehl</surname><given-names>Christian</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-4200-1450</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Dijkstra</surname><given-names>Henk A.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>van Sebille</surname><given-names>Erik</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-2041-0704</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Institute for Marine and Atmospheric Research Utrecht, Utrecht University, Utrecht, the Netherlands</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Centre for Complex Systems Studies, Utrecht University, Utrecht, the Netherlands</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">David Wichmann (d.wichmann@uu.nl)</corresp></author-notes><pub-date><day>19</day><month>January</month><year>2021</year></pub-date>
      
      <volume>28</volume>
      <issue>1</issue>
      <fpage>43</fpage><lpage>59</lpage>
      <history>
        <date date-type="received"><day>20</day><month>June</month><year>2020</year></date>
           <date date-type="rev-request"><day>29</day><month>June</month><year>2020</year></date>
           <date date-type="rev-recd"><day>19</day><month>October</month><year>2020</year></date>
           <date date-type="accepted"><day>3</day><month>November</month><year>2020</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 David Wichmann et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021.html">This article is available from https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021.html</self-uri><self-uri xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021.pdf">The full text article is available as a PDF file from https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e113">The detection of finite-time coherent particle sets in Lagrangian trajectory data, using data-clustering techniques, is an active research field at the moment. Yet, the clustering methods mostly employed so far have been based on graph partitioning, which assigns each trajectory to a cluster, i.e. there is no concept of noisy, incoherent trajectories. This is problematic for applications in the ocean, where many small, coherent eddies are present in a large, mostly noisy fluid flow. Here, for the first time in this context, we use the density-based clustering algorithm of OPTICS <xref ref-type="bibr" rid="bib1.bibx1" id="paren.1"><named-content content-type="pre">ordering points to identify the clustering structure;</named-content></xref> to detect finite-time coherent particle sets in Lagrangian trajectory data. Different from partition-based clustering methods, derived clustering results contain a concept of noise, such that not every trajectory needs to be part of a cluster. OPTICS also has a major advantage compared to the previously used density-based spatial clustering of applications with noise (DBSCAN) method, as it can detect clusters of varying density. The resulting clusters have an intrinsically hierarchical structure, which allows one to detect coherent trajectory sets at different spatial scales at once. We apply OPTICS directly to Lagrangian trajectory data in the Bickley jet model flow and successfully detect the expected vortices and the jet. The resulting clustering separates the vortices and the jet from background noise, with an imprint of the hierarchical clustering structure of coherent, small-scale vortices in a coherent, large-scale background flow. We then apply our method to a set of virtual trajectories released in the eastern South Atlantic Ocean in an eddying ocean model and successfully detect Agulhas rings. 
We illustrate the difference between our approach and partition-based <inline-formula><mml:math id="M1" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering using a 2D embedding of the trajectories derived from classical multidimensional scaling. We also show how OPTICS can be applied to the spectral embedding of a trajectory-based network to overcome the problems of <inline-formula><mml:math id="M2" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means spectral clustering in detecting Agulhas rings.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <?pagebreak page44?><p id="d1e144">Understanding the transport of tracers in the ocean is an important topic in oceanography. In addition to the large-scale transport by the mean flow, mesoscale eddies and jets play an important role in tracer transport at smaller scales <xref ref-type="bibr" rid="bib1.bibx31" id="paren.2"/>. Such eddies can capture large amounts of a tracer and, while transported in a background flow, redistribute it in the ocean. Eddies have been shown to play an important role in the accumulation of plastic <xref ref-type="bibr" rid="bib1.bibx5" id="paren.3"/> and the transport of heat and salt <xref ref-type="bibr" rid="bib1.bibx9" id="paren.4"/>. To quantify the effects of eddies on tracer transport in the ocean, it is necessary to develop methods that are able to detect and track them. Many methods exist to detect such finite-time coherent sets of fluid parcels based on different mathematical or heuristic principles <xref ref-type="bibr" rid="bib1.bibx20" id="paren.5"/>. The term finite-time coherent set is based on the work of <xref ref-type="bibr" rid="bib1.bibx15" id="text.6"/> and is, in our context, defined as a set of particles that, in a sense, stay specifically close to each other along their entire trajectories. Here, for the first time in this context, we make use of the density-based clustering algorithm OPTICS <xref ref-type="bibr" rid="bib1.bibx1" id="paren.7"><named-content content-type="pre">ordering points to identify the clustering structure;</named-content></xref> to detect finite-time coherent sets in Lagrangian trajectory data.</p>
      <p id="d1e168">The detection of coherent Lagrangian vortices using abstract embeddings of Lagrangian trajectories together with data-clustering techniques has received significant attention in the recent literature <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx19 bib1.bibx25 bib1.bibx2 bib1.bibx27 bib1.bibx13 bib1.bibx18" id="paren.8"/>. Using embedded trajectories for the detection of finite-time coherent sets is interesting as it allows one to use sparse trajectory data, and it can, in principle, be applied to ocean drifter trajectories, as demonstrated by <xref ref-type="bibr" rid="bib1.bibx14" id="text.9"/> and <xref ref-type="bibr" rid="bib1.bibx2" id="text.10"/> for the detection of the five ocean basins. Yet, most of these methods cluster trajectory data with graph partitioning, which does not incorporate the difference between coherent, clustered trajectories and noisy trajectories that should not belong to any cluster. Graph partitioning has been shown to work in situations where the finite-time coherent sets are not too small compared to the fluid domain <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx19 bib1.bibx25 bib1.bibx2 bib1.bibx13" id="paren.11"/>. For applications to Lagrangian trajectory data sets on basin-scale ocean domains, where multiple small-scale coherent sets (eddies) coexist with noisy trajectories in the background, graph partitioning is, however, likely to fail. Similar observations were made by <xref ref-type="bibr" rid="bib1.bibx18" id="text.12"/> for the partition-based clustering approaches based on transfer and dynamic Laplace operators <xref ref-type="bibr" rid="bib1.bibx13" id="paren.13"/>. Although some attempts have been made to accommodate such concepts in hard partitioning, e.g. 
by incorporating one additional cluster corresponding to noise <xref ref-type="bibr" rid="bib1.bibx19" id="paren.14"/>, this approach is likely to fail for large ocean domains, as discussed by <xref ref-type="bibr" rid="bib1.bibx18" id="text.15"/> and shown in Sect. <xref ref-type="sec" rid="Ch1.S4"/> of this paper. <xref ref-type="bibr" rid="bib1.bibx18" id="text.16"/> have developed a special form of trajectory embedding, based on sparse eigenbasis decomposition, given the eigenvectors of transfer operators and dynamic Laplacians. By superposing different sparse eigenvectors, they successfully separate coherent vortices from unclustered background noise.</p>
      <p id="d1e201">Motivated by the results <xref ref-type="bibr" rid="bib1.bibx18" id="text.17"/> obtained by developing a new form of trajectory embedding, we here explore the potential of another clustering algorithm to overcome the inherent problems of partition-based clustering. We use the density-based clustering method of OPTICS, developed by <xref ref-type="bibr" rid="bib1.bibx1" id="text.18"/>, to detect finite-time coherent sets in large ocean domains, using a very simple choice of embedding (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS1"/>). Density-based clustering aims to detect groups of data points that are close to each other, i.e. regions with high data density. Our data points correspond to entire trajectories, and groups of trajectories staying close to each other over a certain time interval correspond to such regions of high point density. Different from partition-based methods such as <inline-formula><mml:math id="M3" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means or fuzzy-<inline-formula><mml:math id="M4" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>-means, OPTICS does not require one to fix the number of clusters beforehand. Furthermore, density-based clustering has an intrinsic notion of a noisy data point – a point does not belong to any cluster (i.e. a finite-time coherent set) if it is not part of a dense region. A more detailed comparison of the method presented here to existing related methods can be found in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>.</p>
      <p id="d1e229">Another desirable property of the OPTICS algorithm is its ability to capture coherence hierarchies. In the ocean, coherent sets of trajectories naturally come with a notion of such a hierarchy. For example, the surface flow in the North Atlantic Ocean can be seen as approximately coherent <xref ref-type="bibr" rid="bib1.bibx16" id="paren.19"/>, while mesoscale eddies and jets are also finite-time coherent sets of trajectories at smaller scales within the North Atlantic Ocean. <xref ref-type="bibr" rid="bib1.bibx18" id="text.20"/> show how their leading eigenvectors resolve coherent sets at large scales, while small-scale results can be obtained with a sparse eigenbasis approximation of a set of eigenvectors. Similarly, clustering results obtained from OPTICS are typically hierarchical. The main result of OPTICS, i.e. the reachability plot, provides this hierarchical information in a simple 1D graph.</p>
      <p id="d1e239">In Sect. <xref ref-type="sec" rid="Ch1.S4"/>, we first show how OPTICS detects finite-time coherent sets at different scales for the Bickley jet model flow (also discussed, e.g., by <xref ref-type="bibr" rid="bib1.bibx20" id="altparen.21"/>) and successfully detects the six coherent vortices and the jet as the steepest valleys in the reachability plot. The general structure of the reachability plot also reveals the large-scale finite-time coherent sets, i.e. the northern and southern parts of the model flow, separated by the jet. We then apply our method to Lagrangian particle trajectories released in the eastern South Atlantic Ocean, where large rings detach from the Agulhas Current (e.g. <xref ref-type="bibr" rid="bib1.bibx28" id="altparen.22"/>). We detect several Agulhas rings and, on the larger scale, also separate the eastward- and westward-moving branches of the South Atlantic subtropical gyre. While the traditional approach to studying Agulhas rings is based on sea surface height analysis <xref ref-type="bibr" rid="bib1.bibx8" id="paren.23"><named-content content-type="pre">see, e.g.,</named-content></xref>, several methods based on virtual Lagrangian trajectories have been applied to Agulhas ring detection before <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx3 bib1.bibx17 bib1.bibx19 bib1.bibx30" id="paren.24"/>. Our method is different from these approaches in that it is directly applicable to a trajectory data set, i.e. without much preprocessing of the data. As the OPTICS algorithm is readily available in the scikit-learn library in Python, the detection of finite-time coherent sets can be done without much effort and with only a few lines of code. A further difference is the mentioned intrinsic notion of coherence hierarchy, which allows for simultaneous analysis of trajectory data at different scales. 
While we mainly focus on the direct embedding of trajectories in an abstract, high-dimensional Euclidean space, we also show in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/> that OPTICS can be used to overcome the limits of <inline-formula><mml:math id="M5" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering in the context of spectral clustering of the trajectory-based network of <xref ref-type="bibr" rid="bib1.bibx25" id="text.25"/>.</p>
</sec>
<?pagebreak page45?><sec id="Ch1.S2">
  <label>2</label><title>Trajectory data sets</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Quasi-periodically perturbed Bickley jet</title>
      <p id="d1e286">We apply our method to a model system that has been used frequently in studies to detect finite-time coherent sets <xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx25 bib1.bibx19 bib1.bibx2 bib1.bibx13" id="paren.26"/>. The velocity field of the quasi-periodically perturbed Bickley jet <xref ref-type="bibr" rid="bib1.bibx4 bib1.bibx6" id="paren.27"/> is defined by a stream function <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mi mathvariant="italic">ψ</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, i.e. <inline-formula><mml:math id="M7" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mo>∂</mml:mo><mml:mi mathvariant="italic">ψ</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">˙</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mo>∂</mml:mo><mml:mi mathvariant="italic">ψ</mml:mi></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:mi>x</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mi mathvariant="italic">ψ</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi 
mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> consisting of a stationary eastward background flow as follows:
            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M10" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>U</mml:mi><mml:mi>L</mml:mi><mml:mi>tanh⁡</mml:mi><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>/</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          and a time-dependent perturbation, as follows:
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M11" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{9.5}{9.5}\selectfont$\displaystyle}?><mml:msub><mml:mi mathvariant="italic">ψ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>U</mml:mi><mml:mi>L</mml:mi><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msup><mml:mi mathvariant="normal">sech</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>/</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi mathvariant="normal">Re</mml:mi><mml:mfenced open="[" close="]"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mn mathvariant="normal">3</mml:mn></mml:munderover><mml:msub><mml:mi>f</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mi>exp⁡</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:msub><mml:mi>k</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>,</mml:mo><?xmltex \hack{$\egroup}?></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mi mathvariant="normal">Re</mml:mi><mml:mo>(</mml:mo><mml:mi>z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the real part of the complex number <inline-formula><mml:math id="M13" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula>. We use the same parameter values as <xref ref-type="bibr" rid="bib1.bibx20" id="text.28"/>, with <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">62.66</mml:mn></mml:mrow></mml:math></inline-formula> m/s the characteristic velocity of the zonal background flow, and <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1770</mml:mn></mml:mrow></mml:math></inline-formula> km. The parameters in Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) are given by <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi>n</mml:mi><mml:mo>/</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mi>exp⁡</mml:mi><mml:mo>(</mml:mo><mml:mo>-</mml:mo><mml:mi>i</mml:mi><mml:msub><mml:mi>k</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:msub><mml:mi>c</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn 
mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.075</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.1446</mml:mn><mml:mi>U</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.205</mml:mn><mml:mi>U</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.461</mml:mn><mml:mi>U</mml:mi></mml:mrow></mml:math></inline-formula>. 
The domain of interest is <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi mathvariant="normal">Ω</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="italic">π</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>]</mml:mo><mml:mo>×</mml:mo><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3000</mml:mn><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mn mathvariant="normal">3000</mml:mn><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mi mathvariant="normal">km</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">6371</mml:mn></mml:mrow></mml:math></inline-formula> km is the radius of the Earth, and the left and right edges of <inline-formula><mml:math id="M27" display="inline"><mml:mi mathvariant="normal">Ω</mml:mi></mml:math></inline-formula> are identified, i.e. the flow is periodic in the <inline-formula><mml:math id="M28" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> direction with period <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi mathvariant="italic">π</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. Similar to <xref ref-type="bibr" rid="bib1.bibx2" id="text.29"/>, we seed the domain with an initial number of 12 000 particles on a uniform <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mn mathvariant="normal">200</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">60</mml:mn></mml:mrow></mml:math></inline-formula> grid. 
For this choice, the initial particle spacing is slightly above <inline-formula><mml:math id="M31" display="inline"><mml:mn mathvariant="normal">100</mml:mn></mml:math></inline-formula> km in both directions. We compute the trajectories for 40 d with a time step of 1 s using the SciPy integrate package. We output the trajectories every day, i.e. we have <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">41</mml:mn></mml:mrow></mml:math></inline-formula> data points in time for each trajectory.</p>
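      <p>As an illustration, the velocity components implied by Eqs. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) and (<xref ref-type="disp-formula" rid="Ch1.E2"/>) and the advection of a few particles can be sketched in Python as follows. This is a minimal sketch and not the code used for the results in this paper: the function names, the adaptive Runge-Kutta solver with its tolerances (instead of the fixed time step of 1 s mentioned above) and the six example particles are assumptions made for brevity.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameter values quoted in the text, converted to SI units (m, s).
U = 62.66                                  # characteristic velocity (m/s)
L = 1.770e6                                # length scale (m)
R0 = 6.371e6                               # radius of the Earth (m)
EPS = np.array([0.075, 0.4, 0.3])          # perturbation amplitudes
C = np.array([0.1446, 0.205, 0.461]) * U   # phase speeds (m/s)
K = 2.0 * np.arange(1, 4) / R0             # wavenumbers k_n = 2n / r_0 (1/m)


def velocity(t, x, y):
    """Return (u, v) = (-dpsi/dy, dpsi/dx) for psi = psi_0 + psi_1."""
    x = np.asarray(x, dtype=float)[..., None]   # trailing axis for the sum over n
    y = np.asarray(y, dtype=float)
    sech2 = 1.0 / np.cosh(y / L) ** 2
    phase = K * (x - C * t)   # Re[f_n(t) exp(i k_n x)] = eps_n cos(phase)
    cos_sum = np.sum(EPS * np.cos(phase), axis=-1)
    sin_sum = np.sum(EPS * K * np.sin(phase), axis=-1)
    u = U * sech2 * (1.0 + 2.0 * np.tanh(y / L) * cos_sum)
    v = -U * L * sech2 * sin_sum
    return u, v


def rhs(t, state):
    """Right-hand side for solve_ivp; state = [x_1..x_n, y_1..y_n]."""
    n = state.size // 2
    u, v = velocity(t, state[:n], state[n:])
    return np.concatenate([u, v])


def integrate(x0, y0, n_days=40):
    """Advect particles; return daily x and y, each of shape (n, n_days + 1)."""
    t_out = 86400.0 * np.arange(n_days + 1)
    sol = solve_ivp(rhs, (t_out[0], t_out[-1]), np.concatenate([x0, y0]),
                    t_eval=t_out, rtol=1e-6, atol=1e-3)
    n = x0.size
    x = np.mod(sol.y[:n], np.pi * R0)   # periodic in x with period pi * r_0
    return x, sol.y[n:]


# Hypothetical example: six particles instead of the 12 000 used in the text.
x0, y0 = np.meshgrid(np.linspace(0.0, np.pi * R0, 3, endpoint=False),
                     np.array([-1.0e6, 1.0e6]))
x, y = integrate(x0.ravel(), y0.ravel())
print(x.shape, y.shape)
```
      </preformat>
      <p>Each row of the two returned arrays holds the daily positions of one particle, so that 40 d of integration yields the 41 data points per trajectory mentioned above.</p>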
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Agulhas rings in the South Atlantic</title>
      <p id="d1e918">To test the OPTICS algorithm with a more realistic ocean flow, we simulate surface particle trajectories in a strongly eddying ocean model. Surface velocities are derived from a Nucleus for European Modelling of the Ocean (NEMO) ORCA-N006 run <xref ref-type="bibr" rid="bib1.bibx24" id="paren.30"/>, which has a horizontal resolution of <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and velocity output every 5 d. The model is forced by reanalysis and observed data of wind, heat and freshwater fluxes <xref ref-type="bibr" rid="bib1.bibx10" id="paren.31"/>, i.e. the currents contain more than just the geostrophic component, unlike altimetry-derived currents <xref ref-type="bibr" rid="bib1.bibx3 bib1.bibx18" id="paren.32"/>. For the advection of virtual particles, we use version 1.11 of the open source Parcels framework <xref ref-type="bibr" rid="bib1.bibx22" id="paren.33"><named-content content-type="post">see <uri>http://oceanparcels.org/</uri>, last access: 2 January 2021</named-content></xref>. The 2D surface current velocity is interpolated in space and time with the C-grid interpolation scheme of <xref ref-type="bibr" rid="bib1.bibx7" id="text.34"/>, and the particles are advected with a fourth-order Runge–Kutta method with a time step of 10 min. 
We initially distribute particles uniformly in the ocean on the vertices of a <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.2</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup><mml:mo>×</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> grid in the domain (30<inline-formula><mml:math id="M35" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W, 20<inline-formula><mml:math id="M36" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> E) <inline-formula><mml:math id="M37" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> (40<inline-formula><mml:math id="M38" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> S, 20 <inline-formula><mml:math id="M39" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> S), which corresponds to a total number of 23 821 particles. At <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mn mathvariant="normal">30</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> S, a spacing of <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.2</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> corresponds to roughly <inline-formula><mml:math id="M42" display="inline"><mml:mn mathvariant="normal">20</mml:mn></mml:math></inline-formula> km. The particles start on 5 January 2000 and are advected for 2 years. We output the trajectories with a time interval of 5 d. We only use the first 100 d as data to detect the finite-time coherent sets, i.e. 
we have <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">21</mml:mn></mml:mrow></mml:math></inline-formula> data points for each trajectory, but also look at later times to see how long the rings need to disperse. We provide the used trajectory data for the Agulhas flow as a NumPy file on Zenodo <xref ref-type="bibr" rid="bib1.bibx34" id="paren.35"/>.</p>
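      <p>The grid and output numbers quoted above can be checked with a few lines of Python. This sketch assumes that both end points of each coordinate range are included, which gives 25 351 grid vertices; the 23 821 particles reported above would then presumably correspond to the vertices that lie in the ocean. The variable names are illustrative only.</p>
      <preformat preformat-type="code">
```python
import numpy as np

R0_KM = 6371.0      # radius of the Earth (km), same value as r_0 in Sect. 2.1
SPACING_DEG = 0.2   # grid spacing (degrees)

# Grid vertices in (30 W, 20 E) x (40 S, 20 S), assuming end points are included.
n_lon = int(round((20.0 - (-30.0)) / SPACING_DEG)) + 1
n_lat = int(round((-20.0 - (-40.0)) / SPACING_DEG)) + 1
n_vertices = n_lon * n_lat

# Physical spacing at 30 S: meridional and zonal arc lengths (km).
dy_km = np.deg2rad(SPACING_DEG) * R0_KM
dx_km = dy_km * np.cos(np.deg2rad(30.0))

# Number of stored positions for 100 d of data with output every 5 d.
n_times = 100 // 5 + 1

print(n_lon, n_lat, n_vertices)
print(round(dx_km, 1), round(dy_km, 1), n_times)
```
      </preformat>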
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methods</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Detecting coherent structures in Lagrangian trajectory data</title>
      <p id="d1e1084">For <inline-formula><mml:math id="M44" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> trajectories of dimension <inline-formula><mml:math id="M45" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula> and length <inline-formula><mml:math id="M46" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula>, the trajectory information can be stored in a data matrix <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, where each row results from a particle trajectory by concatenating the different spatial dimensions. The analysis of the trajectory data to detect the finite-time coherent sets of trajectories <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx2 bib1.bibx19 bib1.bibx25 bib1.bibx27 bib1.bibx13 bib1.bibx37" id="paren.36"/> can be split into the following two essential steps:
<list list-type="custom"><list-item><label> </label>
      <p id="d1e1136"><italic>Step 1.</italic>  Embedding of the trajectories in an abstract (metric) space, i.e. <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>→</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>≤</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula>. If one uses a dimensionality reduction method, then <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>&lt;</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label> </label>
      <p id="d1e1197"><italic>Step 2.</italic>  Clustering of the embedded data with a clustering algorithm.</p></list-item></list></p>
      <p id="d1e1202">The embedding is necessary to represent the trajectories as points in a metric space. Different options for embedding the trajectories exist, e.g. a direct embedding of the data points along the trajectories <xref ref-type="bibr" rid="bib1.bibx14" id="paren.37"/> or embeddings based on the eigenvectors derived from networks that are defined by physically motivated trajectory similarities <xref ref-type="bibr" rid="bib1.bibx2 bib1.bibx25 bib1.bibx2 bib1.bibx13" id="paren.38"/>. Once an embedding of each trajectory as a point in a metric (typically Euclidean) space is established, one can apply a clustering algorithm. Roughly speaking, clustering algorithms try to identify groups of points that are close to each other as a cluster. Partition-based clustering methods divide the entire data set into a (typically fixed) number of <inline-formula><mml:math id="M51" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> clusters, such that each data point belongs to<?pagebreak page46?> a cluster. The most popular method in this category is the <inline-formula><mml:math id="M52" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means algorithm, which tries to find a given number of <inline-formula><mml:math id="M53" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> clusters such that the sum of the pairwise squared distances of points within a cluster is minimized. Other clustering algorithms contain a concept of noisy data, i.e. data points that do not belong to any cluster or belong to a cluster only with a certain probability. 
Examples of the former case are density-based spatial clustering of applications with noise <xref ref-type="bibr" rid="bib1.bibx11" id="paren.39"><named-content content-type="pre">DBSCAN;</named-content></xref>, as discussed by <xref ref-type="bibr" rid="bib1.bibx27" id="text.40"/> in the fluid dynamics context, and the OPTICS <xref ref-type="bibr" rid="bib1.bibx1" id="paren.41"/> algorithm presented here. For the latter case, the most popular method is fuzzy-<inline-formula><mml:math id="M54" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>-means clustering, as discussed by <xref ref-type="bibr" rid="bib1.bibx14" id="text.42"/> in the context of finite-time coherent sets.</p>
      <p id="d1e1254">Figure <xref ref-type="fig" rid="Ch1.F1"/> shows a few possible options for trajectory embedding and clustering that have partially been explored before (see the footnotes in the figure for the combinations used in related studies). For a given trajectory data set, one can, in principle, apply an arbitrary combination of embedding and clustering methods. Only a few of the different combinations have been explored so far, and many more options for embedding and clustering (like those shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/>) exist. It is important to note that a good choice of embedding and clustering might well depend on the specific problem at hand, and there might be no combination that performs well for all possible situations.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><label>Figure 1</label><caption><p id="d1e1264">Different steps for detecting coherent trajectories in Lagrangian data with trajectory clustering. The figure is nonexhaustive, and many more options for embedding and clustering exist. Footnotes: <inline-formula><mml:math id="M55" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">1</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx14" id="text.43"/>. <inline-formula><mml:math id="M56" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx19" id="text.44"/>, <xref ref-type="bibr" rid="bib1.bibx25" id="text.45"/> and <xref ref-type="bibr" rid="bib1.bibx2" id="text.46"/> all define networks with spectral embedding and subsequent <inline-formula><mml:math id="M57" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering. <xref ref-type="bibr" rid="bib1.bibx18" id="text.47"/> define spectral embeddings based on dynamic Laplacian and transfer operators. <inline-formula><mml:math id="M58" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx27" id="text.48"/>.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f01.png"/>

        </fig>

      <p id="d1e1326">Most of the studies that use clustering techniques to detect finite-time coherent sets have focused on developing new forms of trajectory embeddings. For example, <xref ref-type="bibr" rid="bib1.bibx19" id="text.49"/>, <xref ref-type="bibr" rid="bib1.bibx25" id="text.50"/>, <xref ref-type="bibr" rid="bib1.bibx2" id="text.51"/> and <xref ref-type="bibr" rid="bib1.bibx13" id="text.52"/> all use different forms of spectral embeddings together with <inline-formula><mml:math id="M59" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering. <xref ref-type="bibr" rid="bib1.bibx18" id="text.53"/> have developed a powerful form of embedding based on a sparse eigenbasis approximation. Here, we focus on the clustering step in Fig. <xref ref-type="fig" rid="Ch1.F1"/> and propose the OPTICS clustering algorithm in the fluid dynamics context. We test the algorithm for the following three different kinds of embeddings:
<list list-type="custom"><list-item><label>E1.</label>
      <p id="d1e1356">A direct embedding of the trajectory data in a high-dimensional Euclidean space, i.e. <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula> (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS1"/>).</p></list-item><list-item><label>E2.</label>
      <p id="d1e1376">A reduction in the trajectory data to a 2D embedding space, using classical multidimensional scaling (MDS; see Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS2"/>). This is mainly to visualize the difference from partition-based <inline-formula><mml:math id="M61" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering.</p></list-item><list-item><label>E3.</label>
      <p id="d1e1389">A spectral embedding of the network proposed by <xref ref-type="bibr" rid="bib1.bibx25" id="text.54"/>.</p></list-item></list></p>
      <p id="d1e1395">In the following sections, we explain in detail the embeddings of E1 and E2 and the OPTICS algorithm. We introduce the network embedding of E3 together with the corresponding results in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Trajectory embedding</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Direct embedding</title>
      <p id="d1e1415">The direct embedding of each trajectory in <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the most straightforward embedding as it requires no further preprocessing of the trajectory data. For simplicity, assume we are given a set of <inline-formula><mml:math id="M63" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> trajectories in a 3D space, i.e. <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
We then simply define the embedding of trajectory <inline-formula><mml:math id="M67" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> in the abstract <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula>-dimensional space as follows:
              <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M69" display="block"><mml:mtable class="split" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn 
mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
            and impose a Euclidean metric in <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> to measure distances between the different embedded trajectories. The resulting embedded data matrix <inline-formula><mml:math id="M71" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover></mml:math></inline-formula> is then simply obtained by stacking the embedding vectors as rows. This kind of embedding was also explored by <xref ref-type="bibr" rid="bib1.bibx14" id="text.55"/>, together with fuzzy-<inline-formula><mml:math id="M72" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>-means clustering. Intuitively, if two trajectories <inline-formula><mml:math id="M73" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M74" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> belong to the same finite-time coherent set, the corresponding particles follow very similar pathways, i.e. the Euclidean distance of the embedding vectors <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> is expected to be small. 
On the other hand, a particle <inline-formula><mml:math id="M76" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> that belongs to a coherent set is expected to have a larger distance to a particle <inline-formula><mml:math id="M77" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> that is not part of the set. In other words, groups of particles that form a finite-time coherent set are dense in the embedding space. This motivates the use of a density-based clustering algorithm to detect finite-time coherent sets.<?xmltex \hack{\\}?>To take into account the <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:mi mathvariant="italic">π</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> periodicity in the <inline-formula><mml:math id="M79" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> direction of the Bickley jet flow, we first put the individual 2D data points on the surface of a cylinder with radius <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> in <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and interpret the resulting trajectories in a 3D Euclidean space. 
The resulting data matrix is <inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">41</mml:mn></mml:mrow></mml:math></inline-formula>. For the Agulhas particles, we put the single data points on the Earth's surface in a 3D Euclidean embedding space by the standard coordinate transformation of spherical to Euclidean coordinates. 
The resulting data matrix is thus <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold">X</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">23</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">821</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">21</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
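      <p>As an illustration only, the direct embedding described above can be sketched in a few lines of Python with NumPy. The function names (<monospace>embed_on_cylinder</monospace>, <monospace>embed_on_sphere</monospace> and <monospace>direct_embedding</monospace>), the array layout (one row per trajectory and one column per time step) and the use of longitude and latitude in degrees are assumptions made for this sketch and are not taken from the code used in this study.</p>

<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def embed_on_cylinder(x, y, period):
    """Map planar points with periodic x onto a cylinder in R^3.

    x, y   : arrays of shape (N, T)
    period : period length in the x direction
    The cylinder radius is period / (2 * pi), so that one trip around
    the cylinder corresponds to exactly one period in x.
    """
    radius = period / (2.0 * np.pi)
    angle = 2.0 * np.pi * x / period
    return radius * np.cos(angle), radius * np.sin(angle), y


def embed_on_sphere(lon_deg, lat_deg, radius=1.0):
    """Standard spherical-to-Cartesian transformation (angles in degrees)."""
    lon = np.deg2rad(lon_deg)
    lat = np.deg2rad(lat_deg)
    return (radius * np.cos(lat) * np.cos(lon),
            radius * np.cos(lat) * np.sin(lon),
            radius * np.sin(lat))


def direct_embedding(x, y, z):
    """Concatenate the coordinate time series of each trajectory.

    Each input has shape (N, T); the result has shape (N, 3T), with
    row i holding (x_i(t_1..t_T), y_i(t_1..t_T), z_i(t_1..t_T)).
    """
    return np.hstack([x, y, z])
```
]]></preformat>

      <p>For a flow that is periodic in <inline-formula><mml:math display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> with period <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">π</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, calling <monospace>embed_on_cylinder(x, y, np.pi * r0)</monospace> in this sketch gives a cylinder of radius <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, and points that differ by one period are mapped to the same embedded point.</p>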
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Dimensionality reduction with classical multidimensional scaling</title>
      <p id="d1e2008">To develop an intuition for what the OPTICS algorithm does, and the differences to <inline-formula><mml:math id="M88" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, we wish to visualize the data structure in the plane. For this visualization, it is necessary to reduce the embedding dimension of each trajectory from <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mi>T</mml:mi></mml:mrow></mml:math></inline-formula> to two in such a way that the density structure, and hence the individual Euclidean distances between embedded trajectories <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> (see Eq. <xref ref-type="disp-formula" rid="Ch1.E3"/>), are preserved. We do so through a common method of nonlinear dimensionality reduction, called classical multidimensional scaling <xref ref-type="bibr" rid="bib1.bibx12" id="paren.56"><named-content content-type="pre">MDS; see, e.g., chap. 10.3 of</named-content></xref>. Classical MDS tries to find an embedding of the high-dimensional<?pagebreak page47?> data points in a low-dimensional space such that the pairwise distances are approximately preserved. Similar to a principal component analysis, classical MDS makes use of the eigenvectors corresponding to the largest eigenvalues of a kernel matrix, which is, in this case, defined by the following:
              <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M91" display="block"><mml:mrow><mml:mi mathvariant="bold">B</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mi mathvariant="bold">H</mml:mi><mml:msup><mml:mi mathvariant="bold">Δ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi mathvariant="bold">H</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
            where <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Δ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is a matrix containing all squared distances between the points, <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="normal">Δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:msup><mml:mo>|</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M94" display="inline"><mml:mi mathvariant="bold">H</mml:mi></mml:math></inline-formula> is the centring matrix with <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">δ</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the Kronecker delta. 
The matrix <inline-formula><mml:math id="M97" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) is called the centred inner product matrix. If <inline-formula><mml:math id="M98" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold">B</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula> is the matrix of inner products of the embedded data points, i.e. <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>B</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with Euclidean scalar product, then <inline-formula><mml:math id="M100" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> can be obtained by removing the mean of all rows and columns of <inline-formula><mml:math id="M101" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold">B</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx12" id="paren.57"><named-content content-type="pre">see chap. 10.3 of</named-content></xref>. An embedding of the data points using the eigenvectors corresponding to the leading nonnegative eigenvalues of <inline-formula><mml:math id="M102" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) ensures that one captures the main variance of the (squared) distance structure, similar to a principal component analysis.</p>
      <p id="d1e2301">We compute <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">Δ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> with the Euclidean embedding described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS1"/> and restrict ourselves to the first 2D to visualize the data structure in the plane, i.e. the embedding is defined by the following:
              <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M104" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
            where <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:mi mathvariant="bold">B</mml:mi><mml:msub><mml:mi>w</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>≥</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>≥</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for all <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. This choice of embedding ensures that the main variance of the data points is captured, and we therefore also expect to capture the main structure in terms of data density. For large particle sets, however, computing the spectrum of <inline-formula><mml:math id="M108" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) is computationally not feasible as the matrix <inline-formula><mml:math id="M109" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> is dense and computing the spectrum scales with <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mi>O</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi>N</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
We apply classical MDS to the 12 000 particles of the Bickley jet model flow and to a random selection of an equal number of particles for the Agulhas flow. In our context, the method is most useful for visualization purposes as it provides a good 2D approximation of the point distances, and hence also of the density structure of the embedded trajectories.</p>
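      <p>The construction in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) and (<xref ref-type="disp-formula" rid="Ch1.E5"/>) can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the results: the function name <monospace>classical_mds</monospace> is hypothetical, the matrices are held as dense arrays, and a full eigendecomposition is computed, so the sketch is only practical for moderate <inline-formula><mml:math display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula>.</p>

<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def classical_mds(U, n_components=2):
    """Classical MDS of the rows of U (shape (N, M)).

    Returns the eigenvectors of the centred inner product matrix B that
    belong to the n_components largest eigenvalues (as columns), together
    with those eigenvalues in descending order.
    """
    N = U.shape[0]
    # Matrix of squared Euclidean distances between all pairs of rows
    sq_norms = np.sum(U**2, axis=1)
    delta2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * U @ U.T
    np.maximum(delta2, 0.0, out=delta2)  # remove round-off negatives
    # Centring matrix H_ij = delta_ij - 1/N
    H = np.eye(N) - np.ones((N, N)) / N
    # Centred inner product matrix B = -1/2 H Delta^2 H
    B = -0.5 * H @ delta2 @ H
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, idx], eigvals[idx]
```
]]></preformat>

      <p>In this sketch, row <inline-formula><mml:math display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> of the returned eigenvector array holds the unscaled eigenvector entries of Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>). Multiplying each column by the square root of its eigenvalue gives the commonly used textbook scaling, for which the pairwise distances are reproduced exactly whenever the data lie in a two-dimensional subspace.</p>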
</sec>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Clustering with OPTICS</title>
      <p id="d1e2491">The detection of dense accumulations of points that are separated from each other by non-dense regions (noise) is the main goal of density-based clustering. We use the OPTICS algorithm by <xref ref-type="bibr" rid="bib1.bibx1" id="text.58"/> to detect these regions. The OPTICS algorithm can be seen as an extension of DBSCAN <xref ref-type="bibr" rid="bib1.bibx11" id="paren.59"/>. As we have no prior information on the density structure of the embedded nodes, we set the generating distance of OPTICS to infinity, and our presentation here is limited to this case. The general OPTICS algorithm with finite generating distance is computationally more efficient and slightly more complicated, and we refer to <xref ref-type="bibr" rid="bib1.bibx1" id="text.60"/> for more details.</p>
      <p id="d1e2503">For <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>∈</mml:mo><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:math></inline-formula>, the <inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> neighbourhood of a point <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>M</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> is defined as the <inline-formula><mml:math id="M114" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>-dimensional ball of radius <inline-formula><mml:math id="M115" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> around <inline-formula><mml:math id="M116" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula>. We define <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi mathvariant="italic">δ</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as the number of points that is in the <inline-formula><mml:math id="M118" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> neighbourhood of <inline-formula><mml:math id="M119" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula>, including <inline-formula><mml:math id="M120" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula> itself. OPTICS requires one parameter, i.e. 
an integer <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> (called MinPts by <xref ref-type="bibr" rid="bib1.bibx1" id="altparen.61"/>), to define the core distance of a point <inline-formula><mml:math id="M122" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula> as follows:
            <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M123" display="block"><mml:mrow><mml:mi>c</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mo movablelimits="false">min⁡</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>)</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>|</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msub><mml:mi>M</mml:mi><mml:mi mathvariant="italic">δ</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>)</mml:mo><mml:mo>≥</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo mathvariant="italic">}</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e2674">The core distance is simply the minimum radius of a ball around <inline-formula><mml:math id="M124" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula>, such that the ball contains <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> points. Note that the generating distance that we set to infinity is a maximum cut-off distance for the computation of the core distance in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>), beyond which the core distance is not defined. As<?pagebreak page48?> we do not have an intuition for a good value of such a cut-off, we remove it by setting it to infinity.</p>
      <p id="d1e2697">The ordering of the points is based on the reachability distance of a point <inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula> with regards to another point <inline-formula><mml:math id="M127" display="inline"><mml:mi mathvariant="bold-italic">q</mml:mi></mml:math></inline-formula> and is defined as follows:
            <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M128" display="block"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>|</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo movablelimits="false">max⁡</mml:mo><mml:mo>(</mml:mo><mml:mi>c</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula>, in our case, denotes the Euclidean distance between <inline-formula><mml:math id="M130" display="inline"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M131" display="inline"><mml:mi mathvariant="bold-italic">q</mml:mi></mml:math></inline-formula>. The ordering of points is then constructed with the following scheme:
<list list-type="custom"><list-item><label> </label>
      <p id="d1e2805"><italic>Step 1</italic>.  Pick a point <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. This is the first point in the order, and it is arbitrary.</p></list-item><list-item><label> </label>
      <p id="d1e2822"><italic>Step 2</italic>.  Compute the core distance <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> of <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label> </label>
      <p id="d1e2856"><italic>Step 3</italic>.  Define an ordered seed list containing all other points, i.e. <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>. For each point <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, define the reachability value <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as the reachability distance (Eq. <xref ref-type="disp-formula" rid="Ch1.E7"/>) with respect to <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, i.e. <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
Order the list in ascending order of the <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label> </label>
      <p id="d1e2989"><italic>Step 4</italic>.  Pick the first point on the ordered seed list as <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and compute the core distance <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. For all remaining points, i.e. <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>, update the reachability value <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>→</mml:mo><mml:mo>min⁡</mml:mo><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label> </label>
      <p id="d1e3111"><italic>Step 5</italic>.  Update the ordered seed list according to the new reachability.</p></list-item><list-item><label> </label>
      <p id="d1e3117"><italic>Step 6</italic>.  Repeat steps 4–5 to obtain <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. Continue until all points are processed.</p></list-item></list></p>
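      <p>Steps 1 to 6 can be sketched as a brute-force Python routine. This is an illustrative sketch, not the implementation used in this study: the function name <monospace>optics_ordering</monospace>, the dictionary used as seed list and the toy data are assumptions, the point itself is counted in its own ball when computing the core distance, and no attempt is made at efficiency.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def optics_ordering(points, s_min, start=0):
    """Return (order, reach) following steps 1-6 with an infinite cut-off.

    order[i] is the index of the i-th processed point and reach[i] is its
    reachability value when it was picked (infinity for the first point).
    """
    def core(i):
        # Core distance; assumption: the point counts in its own ball.
        return sorted(math.dist(points[i], q) for q in points)[s_min - 1]

    # Seed list: every point except the start, not yet reached by anything.
    seeds = {i: math.inf for i in range(len(points)) if i != start}
    order, reach = [start], [math.inf]
    current = start
    while seeds:
        c = core(current)
        for i in seeds:
            # Reachability distance of point i with respect to the point
            # just processed, kept only if it lowers the stored value.
            r_new = max(c, math.dist(points[i], points[current]))
            seeds[i] = min(seeds[i], r_new)
        # The seed with the smallest reachability value is processed next.
        current = min(seeds, key=seeds.get)
        order.append(current)
        reach.append(seeds.pop(current))
    return order, reach


# Hypothetical toy data: two groups of three points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
order, reach = optics_ordering(points, s_min=2)
print(order)  # [0, 1, 2, 3, 4, 5]
print(reach)  # [inf, 1.0, 1.0, 8.0, 1.0, 1.0]
```
]]></preformat>
      <p>The single large value in <monospace>reach</monospace> marks the jump from the first group to the second. The first group is fully explored before the ordering moves on.</p>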
      <p id="d1e3134">Note that the ordering of points is achieved by constantly updating the ordered seed list (see step 3). In this way, the algorithm iterates through groups of dense points, one after the other, and it only continues with other points once a dense region has been fully explored. Note also that the entire algorithm depends on the choice of the parameter <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>. The value of  <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> should be chosen roughly as a minimum value of the expected cluster size. In the examples presented in this paper, we take values for <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> that correspond to the estimated minimum size of the coherent sets.</p>
      <p id="d1e3170">The main result of the OPTICS algorithm is a reachability plot. This plot is the graph defined by <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> by definition. The reachability plot is a powerful presentation of the global and local distribution of a set of points at once. The valleys in this plot correspond to dense regions, which we relate to finite-time coherent sets. We show examples of reachability plots in Sect. <xref ref-type="sec" rid="Ch1.S4"/>. Given the reachability plot <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, we use the following two common ways to derive a clustering result:
<list list-type="order"><list-item>
      <p id="d1e3248">DBSCAN clustering. Choose a cut-off parameter <inline-formula><mml:math id="M154" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> and define all points <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula> as core points. All points that are not in the <inline-formula><mml:math id="M157" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> neighbourhood of a core point are defined as noise. This set of noisy data points is equivalent to all points <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> that are not core points and have a reachability value <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>&gt;</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula>. 
A cluster of size <inline-formula><mml:math id="M161" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> is then defined as a consecutive set (in the sense of the ordering) of non-noise points <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, with the adjacent points of <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> being noise. This is similar to the clustering result of a DBSCAN run with equal values for <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M166" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. 
All possible realizations of DBSCAN clusters, with the same value for <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>, can therefore be derived from the reachability values, core distances and the ordering determined by OPTICS. Up to boundary points, a DBSCAN clustering result can be obtained by drawing horizontal lines in the reachability plot (see Sect. <xref ref-type="sec" rid="Ch1.S4"/>).</p></list-item><list-item>
      <p id="d1e3466"><inline-formula><mml:math id="M168" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering. While the DBSCAN clustering method looks for deep valleys in the reachability plot, this method looks for valleys with steep boundaries. In short, the larger the parameter <inline-formula><mml:math id="M169" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> with <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, the steeper the boundary of a valley has to be for the valley to be classified as a cluster. In more detail, a <inline-formula><mml:math id="M171" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-cluster is defined as a consecutive set of points <inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> whose boundaries are steep, for a given <inline-formula><mml:math id="M173" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> with <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, in the following sense:
<list list-type="custom"><list-item><label>a.</label>
      <p id="d1e3578">The start of the cluster <inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is in a <inline-formula><mml:math id="M176" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward area. A <inline-formula><mml:math id="M177" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward area is a maximal set of consecutive points <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, where
(1) <inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> are <inline-formula><mml:math id="M182" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward points, i.e. <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>≤</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>)</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>≤</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>)</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, (2) <inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>≤</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for all
<inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:math></inline-formula>, and (3) not more than <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> consecutive points in the set are not <inline-formula><mml:math id="M188" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward points.</p></list-item><list-item><label>b.</label>
      <p id="d1e3876">The end of the cluster <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is in a <inline-formula><mml:math id="M190" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep upward area. Its definition is the reverse of that of the <inline-formula><mml:math id="M191" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward area, with a <inline-formula><mml:math id="M192" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep upward point being defined by <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>≤</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>)</mml:mo><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label>c.</label>
      <p id="d1e3967">The cluster contains at least <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> points, i.e. <inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label>d.</label>
      <p id="d1e3997">The reachability value of every point inside the cluster is at least a factor of <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> smaller than that of the boundary points <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. All points that do not belong to a cluster are classified as noise.</p></list-item></list></p></list-item></list></p>
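      <p>The horizontal-cut extraction of item 1 can be sketched in a few lines of Python. The function name <monospace>dbscan_from_optics</monospace> and the toy values are assumptions of this example. The sketch also assumes the usual extraction convention in which a point with a reachability value above the cut-off that is itself a core point opens a new cluster, so that two valleys following each other directly in the ordering are kept apart.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def dbscan_from_optics(reach, core, eps):
    """Label points, given in OPTICS order, by a horizontal cut at eps.

    reach[i] and core[i] are the reachability value and the core distance
    of the i-th point in the ordering. Noise is labelled -1.
    """
    labels = []
    cluster = -1
    for r, c in zip(reach, core):
        if r > eps:
            if c <= eps:
                cluster += 1            # core point that opens a new valley
                labels.append(cluster)
            else:
                labels.append(-1)       # neither reachable nor core: noise
        else:
            labels.append(cluster)      # continues the current valley
    return labels


# Toy values: two groups of three points followed by one isolated point.
reach = [math.inf, 1.0, 1.0, 8.0, 1.0, 1.0, 9.0]
core = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0]
print(dbscan_from_optics(reach, core, eps=8.5))  # [0, 0, 0, 0, 0, 0, -1]
print(dbscan_from_optics(reach, core, eps=2.0))  # [0, 0, 0, 1, 1, 1, -1]
```
]]></preformat>
      <p>Lowering the cut-off from 8.5 to 2.0 splits the single cluster into two, which illustrates how different horizontal lines in the same reachability plot give different levels of a cluster hierarchy.</p>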
      <p id="d1e4047">We refer to <xref ref-type="bibr" rid="bib1.bibx1" id="text.62"/> for a more detailed discussion of the <inline-formula><mml:math id="M199" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method, with illustrations for example data. Note that the full <inline-formula><mml:math id="M200" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method presented by <xref ref-type="bibr" rid="bib1.bibx1" id="text.63"/> contains some more details related to the choice of the start and end points which we did not mention here.</p>
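      <p>To make the notion of steep points concrete, the following Python sketch flags the <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-steep downward and upward points of a reachability plot, using the two inequalities given in items 2a and 2b. It covers only this first ingredient and does not assemble steep areas or clusters. The function name <monospace>steep_points</monospace> and the toy reachability values are assumptions of this example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def steep_points(reach, xi):
    """Return (down, up): positions of xi-steep downward and upward points.

    reach holds the reachability values in OPTICS order. Position l is
    steep downward if reach[l] <= (1 - xi) * reach[l - 1]; position j is
    steep upward if reach[j] <= (1 - xi) * reach[j + 1].
    """
    down = [l for l in range(1, len(reach))
            if reach[l] <= (1 - xi) * reach[l - 1]]
    up = [j for j in range(len(reach) - 1)
          if reach[j] <= (1 - xi) * reach[j + 1]]
    return down, up


# Toy reachability plot with two valleys, at positions 1-2 and 4-5.
reach = [math.inf, 1.0, 1.0, 8.0, 1.0, 1.0, 9.0]
print(steep_points(reach, xi=0.5))  # ([1, 4], [2, 5])
print(steep_points(reach, xi=0.9))  # ([1], [])
```
]]></preformat>
      <p>With the larger value of the parameter, the second valley no longer has a steep downward point and neither valley has a steep upward point, which is consistent with larger values requiring steeper boundaries.</p>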
      <?pagebreak page49?><p id="d1e4070">The OPTICS algorithm and functions for deriving both clustering results from an OPTICS output are available in the scikit-learn library in Python. Note that the implementation in the scikit-learn library allows for a minimum cluster size that is different from <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> for the <inline-formula><mml:math id="M202" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method (item 2c above). To keep the number of parameters small, we do not make use of this additional freedom. Unlike <inline-formula><mml:math id="M203" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, neither clustering method requires an a priori determination of the number of clusters. For the <inline-formula><mml:math id="M204" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method, a larger <inline-formula><mml:math id="M205" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> requires steeper boundaries to form a cluster, i.e. it will typically lead to a reduction in the number of resulting clusters. For DBSCAN clustering with very large <inline-formula><mml:math id="M206" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>, one will detect one large global cluster. Making <inline-formula><mml:math id="M207" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> smaller then leads to consecutive splits of this cluster, forming (up to noise) a cluster hierarchy. We demonstrate the properties of both clustering methods in Sect. <xref ref-type="sec" rid="Ch1.S4"/> for different situations. 
In the following applications, we use an estimation of the minimum number of particles per finite-time coherent set for the parameter <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
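      <p>A minimal sketch of how these scikit-learn functions fit together is given below. It assumes that NumPy and scikit-learn are installed. The synthetic two-blob data set, the random seed and the values chosen for <monospace>s_min</monospace>, <monospace>eps</monospace> and <monospace>xi</monospace> are illustrative assumptions and are not the settings used in this study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan, cluster_optics_xi

rng = np.random.default_rng(0)
# Synthetic stand-in for embedded trajectories: two dense blobs plus
# uniformly scattered background points (all values are illustrative).
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
blob_b = rng.normal(loc=(3.0, 3.0), scale=0.1, size=(50, 2))
background = rng.uniform(low=-2.0, high=5.0, size=(30, 2))
X = np.vstack([blob_a, blob_b, background])

s_min = 20
# max_eps=np.inf corresponds to an infinite generating distance.
optics = OPTICS(min_samples=s_min, max_eps=np.inf, metric="euclidean").fit(X)

# Reachability plot: reachability values arranged in OPTICS order.
reachability_plot = optics.reachability_[optics.ordering_]

# DBSCAN-type labels for a chosen cut-off, without rerunning OPTICS.
labels_dbscan = cluster_optics_dbscan(
    reachability=optics.reachability_,
    core_distances=optics.core_distances_,
    ordering=optics.ordering_,
    eps=0.5,
)

# xi-clustering from the same run; the minimum cluster size is left at
# its default, which ties it to min_samples.
labels_xi, _ = cluster_optics_xi(
    reachability=optics.reachability_,
    predecessor=optics.predecessor_,
    ordering=optics.ordering_,
    min_samples=s_min,
    xi=0.05,
)
```
]]></preformat>
      <p>Because both label arrays are derived from the stored reachability values, further values of the cut-off or of the steepness parameter can be tried at little extra cost once the ordering has been computed.</p>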
      <p id="d1e4140">Intuitively, the two clustering methods can be understood as follows. DBSCAN detects those groups of points that have a certain minimum density, defined by the cut-off parameter <inline-formula><mml:math id="M209" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. Clusters detected by DBSCAN are therefore defined by a global density criterion. This assumes no structural differences in the type of coherent sets in different regions of the fluid. In contrast, the <inline-formula><mml:math id="M210" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method detects clusters by finding strong changes in the density of the data points, and it is not based on absolute densities. This has the advantage that clusters of different absolute densities can be detected. Such a situation can arise if the distribution of particles is inhomogeneous over the fluid domain or if the spatial extent of the fluid domain is very large, such that the properties of finite-time coherent sets vary significantly. It is important to note that the main result of OPTICS is the reachability plot itself. The DBSCAN- and <inline-formula><mml:math id="M211" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering methods should be seen as useful tools for identifying the most important features of that plot.</p>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Comparison to related methods</title>
      <p id="d1e4172">Our method is closely related to existing methods for detecting finite-time coherent sets with clustering techniques. Most notably, <xref ref-type="bibr" rid="bib1.bibx14" id="text.64"/> also use a direct embedding of individual trajectories similar to Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>), together with fuzzy-<inline-formula><mml:math id="M212" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>-means clustering. <xref ref-type="bibr" rid="bib1.bibx19" id="text.65"/>, <xref ref-type="bibr" rid="bib1.bibx2" id="text.66"/>, <xref ref-type="bibr" rid="bib1.bibx25" id="text.67"/> and <xref ref-type="bibr" rid="bib1.bibx13" id="text.68"/> use spectral embeddings of graphs that are defined by some form of physical intuition, or by dynamical operators, together with <inline-formula><mml:math id="M213" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering. These studies show applications of their methods to example flows where the size of almost-coherent sets is not too small compared to the fluid domain. Examples of such flows are the Bickley jet flow, which we also study in Sect. <xref ref-type="sec" rid="Ch1.S4.SS1"/>, the five major ocean basins <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx2" id="paren.69"/>, and a few individual eddies in an ocean or atmospheric flow <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx25 bib1.bibx13" id="paren.70"/>. In such situations, noisy background trajectories can be detected as individual clusters by the partitioning method, as discussed by <xref ref-type="bibr" rid="bib1.bibx19" id="text.71"/>. For applications in large ocean domains, where the number of eddies is not known beforehand and where there are many more noisy trajectories than coherent trajectories, such an approach is likely to fail <xref ref-type="bibr" rid="bib1.bibx18" id="paren.72"><named-content content-type="pre">see also the discussion by</named-content></xref>. 
OPTICS does not require fixing the number of clusters beforehand, and it also contains an intrinsic concept of noisy trajectories that do not belong to any cluster, making OPTICS suitable for challenging flows in large domains.</p>
      <p id="d1e4224">As mentioned, OPTICS also contains an intrinsic notion of cluster hierarchy, i.e. coherent sets that are themselves part of coherent sets at larger scales. <xref ref-type="bibr" rid="bib1.bibx23" id="text.73"/> studied hierarchical coherent sets in the transfer operator framework of <xref ref-type="bibr" rid="bib1.bibx15" id="text.74"/>, in the spirit of the hierarchical clustering method proposed by <xref ref-type="bibr" rid="bib1.bibx29" id="text.75"/>. Their approach is also partition based, i.e. there is no concept of noisy trajectories. In addition, at each stage of the hierarchy, a fixed cut-off has to be chosen based on minimizing an objective function <xref ref-type="bibr" rid="bib1.bibx23" id="paren.76"/>. Different from that approach, the main result of OPTICS, the reachability plot, contains such hierarchical information in a smooth and intrinsic manner.</p>
      <p id="d1e4239">As described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>, clustering results of the DBSCAN algorithm <xref ref-type="bibr" rid="bib1.bibx11" id="paren.77"/> can be derived from the reachability plot of OPTICS. DBSCAN has been used in the context of coherent sets before by <xref ref-type="bibr" rid="bib1.bibx27" id="text.78"/>, although not to identify specific clusters but to distinguish noisy from clustered trajectories. The potential of density-based clustering for applications in the ocean, and its comparison to other existing clustering methods for flow examples such as the Bickley jet (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>), has not been explored so far. Different from OPTICS, DBSCAN detects clusters with a certain fixed minimum density, although clusters with varying densities might be present in a data set <xref ref-type="bibr" rid="bib1.bibx1" id="paren.79"/>. More specifically, the value for the cut-off parameter <inline-formula><mml:math id="M214" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>) has to be set beforehand. Choosing a good value for the density parameter in DBSCAN is challenging if there is no underlying physical intuition for the density structure. As described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>, OPTICS allows one to derive any DBSCAN clustering result, with the same value for the parameter <inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula>, after computing the reachability plot, i.e. after one can obtain the first insights into the clustering structure of the data set to make an appropriate choice for <inline-formula><mml:math id="M216" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. 
Furthermore, it also allows one to use the <inline-formula><mml:math id="M217" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method instead of DBSCAN (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>).</p>
      <?pagebreak page50?><p id="d1e4295">A more recent and powerful technique for detecting finite-time coherent sets in sparse trajectory data was presented by <xref ref-type="bibr" rid="bib1.bibx18" id="text.80"/>, based on dynamic Laplacian and transfer operators <xref ref-type="bibr" rid="bib1.bibx13" id="paren.81"/>. <xref ref-type="bibr" rid="bib1.bibx18" id="text.82"/> apply their method to a trajectory data set in the western boundary current region in the North Atlantic Ocean and successfully detect many eddies by superposing individual eigenvectors. The methods presented there are based on a form of spectral embedding derived from discretized dynamical operators. Based on this embedding, clustering results have also been derived with <inline-formula><mml:math id="M218" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means by <xref ref-type="bibr" rid="bib1.bibx13" id="text.83"/> and with individual thresholding by <xref ref-type="bibr" rid="bib1.bibx18" id="text.84"/>. <xref ref-type="bibr" rid="bib1.bibx18" id="text.85"/> also show how the low-order eigenvectors correspond to large-scale coherent features, while the individual eddies are derived by a sparse eigenbasis approximation of a number of eigenvectors. The latter approach is essentially a transformation of the embedding to represent the most reliable features, such that a superposition of the eigenvectors alone yields the information about the location and size of finite-time coherent sets (without a clustering step). This is essentially an optimized form of embedding, i.e. the second step in Fig. <xref ref-type="fig" rid="Ch1.F1"/>. Our aim here is to focus on the third step in Fig. <xref ref-type="fig" rid="Ch1.F1"/>, i.e. to demonstrate the potential of the density-based clustering algorithm OPTICS, together with a very simple embedding of Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>).</p>
      <p id="d1e4331">A downside of our method compared to other approaches is the rather ad hoc choice of embedding (see Eq. <xref ref-type="disp-formula" rid="Ch1.E3"/>). Different from many other methods, most notably the ones of <xref ref-type="bibr" rid="bib1.bibx2" id="text.86"/>, <xref ref-type="bibr" rid="bib1.bibx13" id="text.87"/> and <xref ref-type="bibr" rid="bib1.bibx18" id="text.88"/>, this type of embedding is not derived from a meaningful dynamical operator. It could be fruitful to explore a combination of these more meaningful embeddings together with OPTICS as a clustering algorithm in future research.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Bickley jet flow</title>
      <p id="d1e4361">We start with the direct embedding of the Bickley jet flow trajectories (see Sect. <xref ref-type="sec" rid="Ch1.S2"/>). The data matrix has the dimension <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">123</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. We apply the OPTICS algorithm to the resulting points, together with DBSCAN clustering, choosing <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> as a minimum size of the finite-time coherent sets. In the following, all axis units are in multiples of <inline-formula><mml:math id="M221" display="inline"><mml:mn mathvariant="normal">1000</mml:mn></mml:math></inline-formula> km. Figure <xref ref-type="fig" rid="Ch1.F2"/> shows the reachability plot, together with the DBSCAN clustering result of three different choices of <inline-formula><mml:math id="M222" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. The six vortices and the jet are clearly visible as the major valleys in the reachability plot. The hierarchical structure of the DBSCAN clustering with decreasing <inline-formula><mml:math id="M223" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is visible in the figures from top (large-scale coherence) to bottom (small-scale coherence). 
Note that for the DBSCAN clustering results, boundary points of the clusters can be above the horizontal line at <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula>. This is because of the definition of the DBSCAN clustering in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>.</p>
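      <p>The workflow described above, i.e. computing the reachability plot once and then reading off DBSCAN clusterings for several values of the cut-off, can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions and not the code used for the figures: the synthetic 2D points are a stand-in for the embedded trajectories, and the helper name <monospace>dbscan_labels</monospace>, the chosen cut-off values and the value of the minimum cluster size are purely illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Synthetic stand-in for embedded trajectories (NOT the Bickley jet data):
# two dense blobs embedded in a sparse, uniformly distributed background.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(200, 2))
blob_b = rng.normal(loc=(3.0, 3.0), scale=0.1, size=(200, 2))
background = rng.uniform(low=-2.0, high=5.0, size=(50, 2))
X = np.vstack([blob_a, blob_b, background])

# One OPTICS run with an infinite generating distance (max_eps).
s_min = 20  # illustrative minimum cluster size
optics = OPTICS(min_samples=s_min, max_eps=np.inf).fit(X)

# Reachability plot: reachability values arranged in the OPTICS ordering.
reachability_plot = optics.reachability_[optics.ordering_]


def dbscan_labels(eps):
    """DBSCAN-like labels for cut-off eps, derived from the OPTICS result."""
    return cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps,
    )


def count_clusters(labels):
    """Number of clusters, ignoring the noise label -1."""
    return len(set(labels.tolist()) - {-1})


# A small cut-off separates the two blobs from noise;
# a large cut-off merges everything into one large-scale cluster.
n_clusters_fine = count_clusters(dbscan_labels(0.5))
n_clusters_coarse = count_clusters(dbscan_labels(10.0))
```
]]></preformat>
      <p>Because the reachability values are computed only once, scanning over several cut-offs is cheap, which mirrors the top-to-bottom hierarchy of clusterings shown in the figure.</p>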

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><label>Figure 2</label><caption><p id="d1e4444">Result of the OPTICS algorithm applied to the direct embedding of the trajectories. <bold>(a)</bold>, <bold>(d)</bold> and <bold>(f)</bold> show the reachability plot, with different DBSCAN clustering results indicated by the black horizontal line. The corresponding clustering results for each choice of the DBSCAN parameter <inline-formula><mml:math id="M225" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> are shown to the right of the reachability plots for different times. Grey particles correspond to noise. Axis units in the centre and right columns are in <inline-formula><mml:math id="M226" display="inline"><mml:mn mathvariant="normal">1000</mml:mn></mml:math></inline-formula> km.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f02.png"/>

        </fig>

      <p id="d1e4476">To illustrate the difference between OPTICS and <inline-formula><mml:math id="M227" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, we use the embedded trajectories and apply classical MDS to obtain a 2D embedding. As described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS2"/>, this ensures that the embedding axes capture the major variance. The spectrum of <inline-formula><mml:math id="M228" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F11"/> in the Appendix. Its two clearly dominant eigenvalues indicate that the illustration of the data in the plane captures the major variance in the data points. Figure <xref ref-type="fig" rid="Ch1.F3"/>a shows the corresponding embedding of the trajectories in the 2D Euclidean space. The star-shaped distribution of data points reflects the strong symmetries of the underlying idealized Bickley jet flow. Such symmetry is not expected to be present for more realistic flows. Figure <xref ref-type="fig" rid="Ch1.F3"/>b and c show the cluster labels for OPTICS with DBSCAN clustering at <inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">6</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> km, and for a <inline-formula><mml:math id="M230" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering with <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> clusters, respectively. 
<inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> corresponds to the six vortices, the jet, and one noise cluster as suggested by <xref ref-type="bibr" rid="bib1.bibx19" id="text.89"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><label>Figure 3</label><caption><p id="d1e4557"><bold>(a)</bold> A 2D embedding of the classical MDS method (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS2"/>) of the trajectories. <bold>(b)</bold> Labels according to the DBSCAN result of Fig. <xref ref-type="fig" rid="Ch1.F4"/>. The six vortices and the jet are clearly visible as dense regions. Grey particles correspond to noise. <bold>(c)</bold> The <inline-formula><mml:math id="M233" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering result for <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula>; see Fig. <xref ref-type="fig" rid="Ch1.F5"/> for the spatial clustering result of <inline-formula><mml:math id="M235" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f03.png"/>

        </fig>

      <p id="d1e4607">The corresponding clustering results in real space are shown in Figs. <xref ref-type="fig" rid="Ch1.F4"/> and <xref ref-type="fig" rid="Ch1.F5"/> for OPTICS and <inline-formula><mml:math id="M236" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, respectively. The jet and the six vortices are clearly recognizable as dense accumulations of points in the 2D space of Fig. <xref ref-type="fig" rid="Ch1.F3"/>b (see Fig. <xref ref-type="fig" rid="Ch1.F4"/> for the corresponding colours). The clustering result with <inline-formula><mml:math id="M237" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means in Fig. <xref ref-type="fig" rid="Ch1.F5"/> shows that the clusters corresponding to the vortices are much less focused. In addition, each of the eight clusters in Fig. <xref ref-type="fig" rid="Ch1.F3"/>c contains some of the noisy points of Fig. <xref ref-type="fig" rid="Ch1.F3"/>b, which shows that using one additional cluster for noise does not work in this situation. It is interesting to note that capturing the noisy data points of Fig. <xref ref-type="fig" rid="Ch1.F3"/>b with an additional cluster in <inline-formula><mml:math id="M238" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means is geometrically impossible, because <inline-formula><mml:math id="M239" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clusters are convex regions around their centroids. A single such cluster that covers all noisy points necessarily includes the centre, i.e. the jet in Fig. <xref ref-type="fig" rid="Ch1.F3"/>b, so <inline-formula><mml:math id="M240" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means cannot separate the noise from the jet.</p>
      <p id="d1e4665">It should be noted here that the poor performance of <inline-formula><mml:math id="M241" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means in Figs. <xref ref-type="fig" rid="Ch1.F3"/>c and <xref ref-type="fig" rid="Ch1.F5"/> is not representative of other methods that use <inline-formula><mml:math id="M242" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means. For example, the method of <xref ref-type="bibr" rid="bib1.bibx2" id="text.90"/> captures the coherent structures in the Bickley jet rather well, including the jet in the middle. We emphasize again that we use classical MDS here mostly for visualization purposes, as the computation of the classical MDS embedding is expensive for large particle sets. In our case, a dense <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">000</mml:mn></mml:mrow></mml:math></inline-formula> symmetric matrix has to be diagonalized, which already takes a significant amount of computation time.</p>
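      <p>To make the cost of this step concrete, the following sketch implements classical MDS with NumPy. It assumes that the kernel matrix is the usual double-centred matrix of squared Euclidean distances; the function name <monospace>classical_mds</monospace> and the small random test data are illustrative and are not taken from the study. The dense eigendecomposition of the square matrix, whose size equals the number of particles, is the expensive part.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def classical_mds(X, n_components=2):
    """Classical MDS of the rows of X (one row per trajectory).

    Assumes the standard kernel: minus one half of the double-centred
    matrix of squared Euclidean distances between the rows.
    Returns the embedding and the full spectrum, sorted descending.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows.
    sq_norms = np.sum(X**2, axis=1)
    D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Double centring with J = I - (1/n) * ones.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Dense symmetric eigendecomposition (the costly step for large n).
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale the leading eigenvectors by the square roots of their eigenvalues.
    leading = np.clip(eigvals[:n_components], 0.0, None)
    Y = eigvecs[:, :n_components] * np.sqrt(leading)
    return Y, eigvals


# Illustrative check: 30 points that lie in a 2D plane inside a 10D space.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
Y_demo, spectrum = classical_mds(X_demo, n_components=2)

# Pairwise distances before and after the embedding.
d_X = np.linalg.norm(X_demo[:, None, :] - X_demo[None, :, :], axis=-1)
d_Y = np.linalg.norm(Y_demo[:, None, :] - Y_demo[None, :, :], axis=-1)
```
]]></preformat>
      <p>For data that are exactly two-dimensional, as in this check, only two eigenvalues are non-zero and the 2D embedding reproduces all pairwise distances. For real trajectory data, the decay of the spectrum indicates how much of the variance a 2D picture retains.</p>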

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><label>Figure 4</label><caption><p id="d1e4710">Result of DBSCAN clustering of the 2D embedding of the classical MDS method. <bold>(a)</bold> Reachability plot, with the black line representing the DBSCAN parameter <inline-formula><mml:math id="M244" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. <bold>(b–c)</bold> Corresponding clustering results at different times. Grey particles represent noise. Axis units are in <inline-formula><mml:math id="M245" display="inline"><mml:mn mathvariant="normal">1000</mml:mn></mml:math></inline-formula> km.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f04.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><label>Figure 5</label><caption><p id="d1e4741">Result of <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M247" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering of the 2D embedding from classical MDS (see Fig. <xref ref-type="fig" rid="Ch1.F4"/>). Axis units are in <inline-formula><mml:math id="M248" display="inline"><mml:mn mathvariant="normal">1000</mml:mn></mml:math></inline-formula> km.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f05.png"/>

        </fig>

      <p id="d1e4779">Finally, we also tested the performance of our algorithm with a random subset of 2000 particles, using data for every 5 d instead of every day (see Fig. <xref ref-type="fig" rid="App1.Ch1.S1.F12"/> in the Appendix). OPTICS still detects the six vortices and the jet, although the cluster boundaries are less clearly defined than in Fig. <xref ref-type="fig" rid="Ch1.F2"/>. <xref ref-type="bibr" rid="bib1.bibx13" id="text.91"/> detect the vortices and the jet by using the data of 3000 particles only at initial and final times (<inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> d). Our method is not able to detect the expected finite-time coherent sets by using only initial and final particle data. This is likely to be a result of the ad hoc direct embedding; see Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>) and the discussion at the end of Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Agulhas rings</title>
      <?pagebreak page51?><p id="d1e4826">We next apply OPTICS to the Agulhas trajectories. As described in Sect. <xref ref-type="sec" rid="Ch1.S2"/>, we have <inline-formula><mml:math id="M251" display="inline"><mml:mrow><mml:mi mathvariant="bold">X</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">63</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">23</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">821</mml:mn></mml:mrow></mml:math></inline-formula>. We choose <inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula> in the following, which corresponds initially to a square cell of <inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, i.e. a reasonable minimum size of an Agulhas ring. Figure <xref ref-type="fig" rid="Ch1.F6"/> shows the result of the direct embedding. The reachability plot in Fig. <xref ref-type="fig" rid="Ch1.F6"/>a is much more jagged than for the Bickley jet model flow (see Fig. <xref ref-type="fig" rid="Ch1.F2"/>a). The narrow, deep valleys and the wider valleys in the reachability plot indicate the presence of small- and large-scale coherence patterns, respectively. 
Figure <xref ref-type="fig" rid="Ch1.F6"/>a–c show the DBSCAN clustering result for a relatively large value of <inline-formula><mml:math id="M255" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. The main separation of fluid domains is between the red and the blue particles, with a few vortices at their boundary. These two water masses are the northern and southern parts of the subtropical gyre in the South Atlantic, with the red particles moving to the west and the blue particles moving to the east. The second and third rows of Fig. <xref ref-type="fig" rid="Ch1.F6"/> show other clustering results for the DBSCAN- and the <inline-formula><mml:math id="M256" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method, respectively. The valleys in Fig. <xref ref-type="fig" rid="Ch1.F6"/>g with steepest boundaries, as detected by the <inline-formula><mml:math id="M257" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method, mostly correspond to eddy-like structures separated by background noise. Note that not all clusters in the figure correspond to eddies. For example, the blue cluster in Fig. <xref ref-type="fig" rid="Ch1.F6"/>g–i stays approximately coherent over the considered time interval, although it is certainly not an Agulhas ring. An animation of the detected finite-time coherent sets for the full 2 years of trajectory data, based on the <inline-formula><mml:math id="M258" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method as in the last row of Fig. <xref ref-type="fig" rid="Ch1.F6"/>, can be found on Zenodo <xref ref-type="bibr" rid="bib1.bibx33" id="paren.92"/>, showing that many of the sets stay coherent for significantly longer times than the first 100 d.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><?xmltex \currentcnt{6}?><label>Figure 6</label><caption><p id="d1e4953">Result of the OPTICS algorithm applied to the direct embedding of the trajectories with different clustering methods. Grey particles correspond to noise.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f06.png"/>

        </fig>

      <?pagebreak page52?><p id="d1e4962">Figure <xref ref-type="fig" rid="Ch1.F6"/> shows that, for this situation, the <inline-formula><mml:math id="M259" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method detects more Agulhas rings than DBSCAN. While the clustering results shown in the figure all depend on the parameter values for <inline-formula><mml:math id="M260" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M261" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>, it is visible in the reachability plot of Fig. <xref ref-type="fig" rid="Ch1.F6"/>g that the definition of some eddies includes the entire boundary of the valleys, i.e. up to very high reachability values. At the same time, the detection of the large-scale clusters, as in Fig. <xref ref-type="fig" rid="Ch1.F6"/>a–c, is not possible with the <inline-formula><mml:math id="M262" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method. These findings are in fact expected; see the discussion of the two clustering methods at the end of Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>. DBSCAN is best for detecting global density structures, i.e. when the reachability values of all points are compared to the same cut-off <inline-formula><mml:math id="M263" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. Regions that are dense locally but not necessarily globally are better detected with the <inline-formula><mml:math id="M264" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method. Despite these differences between the two clustering methods, we again emphasize that the main result of OPTICS is the reachability plot itself. 
Figure <xref ref-type="fig" rid="Ch1.F7"/> shows a colour map of the reachability values at the initial time. We clearly see Agulhas rings as the dark regions corresponding to the lowest values of reachability. The regions of large reachability correspond to trajectories that are relatively noisy compared to all the other trajectories.</p>
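      <p>The different behaviour of the two extraction methods for clusters of different density can be illustrated with a small synthetic example. The sketch below assumes scikit-learn and uses two Gaussian point clouds of very different density as a stand-in for embedded trajectories; the parameter values and the helper name <monospace>dbscan_from_optics</monospace> are illustrative and do not correspond to the settings used for the Agulhas data.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.cluster import OPTICS, cluster_optics_dbscan

# Synthetic stand-in: one very dense and one loose point cloud, far apart.
rng = np.random.default_rng(0)
dense = rng.normal(loc=(0.0, 0.0), scale=0.05, size=(150, 2))
loose = rng.normal(loc=(10.0, 0.0), scale=1.0, size=(150, 2))
X = np.vstack([dense, loose])  # rows 0-149: dense, rows 150-299: loose

# A single OPTICS run; labels_ holds the xi-clustering result,
# which is based on the steepness of the valley boundaries.
optics = OPTICS(min_samples=15, cluster_method="xi", xi=0.05).fit(X)
xi_labels = optics.labels_


def dbscan_from_optics(eps):
    """DBSCAN-like labels from the same run, using one global cut-off eps."""
    return cluster_optics_dbscan(
        reachability=optics.reachability_,
        core_distances=optics.core_distances_,
        ordering=optics.ordering_,
        eps=eps,
    )


# A small global cut-off resolves the dense cloud but marks the whole
# loose cloud as noise (label -1); a larger cut-off recovers the loose cloud.
labels_small_eps = dbscan_from_optics(0.1)
labels_large_eps = dbscan_from_optics(1.0)
```
]]></preformat>
      <p>With a single global cut-off, a set that is dense only relative to its surroundings is either missed or obtained at the price of a coarser resolution elsewhere, which is the trade-off discussed above for the large-scale clusters and the individual rings.</p>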

      <?xmltex \floatpos{t}?><fig id="Ch1.F7"><?xmltex \currentcnt{7}?><label>Figure 7</label><caption><p id="d1e5021">Reachability values at the initial time that resulted from the OPTICS algorithm being applied to the direct embedding of the trajectories. The regions with lowest values clearly correspond to Agulhas rings. The colour bar is cut off at a reachability of <inline-formula><mml:math id="M265" display="inline"><mml:mn mathvariant="normal">1000</mml:mn></mml:math></inline-formula> km to show the relevant structure of the variations. </p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f07.png"/>

        </fig>

      <p id="d1e5037">In order to illustrate again the difference between OPTICS and <inline-formula><mml:math id="M266" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means for this example, we choose 12 000 random trajectories and again embed the trajectories in a 2D space with classical MDS (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS2.SSS2"/>). The reduction in the particle set is necessary to keep the eigendecomposition of the matrix <inline-formula><mml:math id="M267" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) tractable, and we therefore choose <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula>. The corresponding spectrum of <inline-formula><mml:math id="M269" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S2.F13"/> in the Appendix, showing that there are again two dominant eigenvalues, i.e. visualizing the data in the plane captures the main variance of the data. Figure <xref ref-type="fig" rid="Ch1.F8"/> shows the embedded trajectories together with OPTICS and DBSCAN clustering (Fig. <xref ref-type="fig" rid="Ch1.F8"/>b) and <inline-formula><mml:math id="M270" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means (Fig. <xref ref-type="fig" rid="Ch1.F8"/>c) for <inline-formula><mml:math id="M271" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula>. Figures <xref ref-type="fig" rid="Ch1.F9"/> and <xref ref-type="fig" rid="Ch1.F10"/> show the corresponding clustering results in the fluid domain. 
It is clear that <inline-formula><mml:math id="M272" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means does not detect a single vortex but instead splits the fluid domain into regions of approximately similar size. OPTICS detects multiple Agulhas rings by finding the deepest valleys in the reachability plot.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><?xmltex \currentcnt{8}?><label>Figure 8</label><caption><p id="d1e5122">Embedding of the Agulhas trajectories in the 2D space defined by the leading eigenvectors of the MDS kernel matrix <inline-formula><mml:math id="M273" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula>. <bold>(a)</bold> No labels. <bold>(b)</bold> Clustering labels of OPTICS and DBSCAN (see Fig. <xref ref-type="fig" rid="Ch1.F9"/> for the corresponding plot in the Agulhas region). Grey particles represent noise. <bold>(c)</bold> The <inline-formula><mml:math id="M274" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering labels with <inline-formula><mml:math id="M275" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> (see Fig. <xref ref-type="fig" rid="Ch1.F10"/> for the corresponding plot in the Agulhas region).</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f08.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F9" specific-use="star"><?xmltex \currentcnt{9}?><label>Figure 9</label><caption><p id="d1e5173">Result of OPTICS applied to the 2D embedding of 12 000 randomly selected particles with the classical MDS method (see Fig. <xref ref-type="fig" rid="Ch1.F8"/>b; <inline-formula><mml:math id="M276" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula>). The corresponding spectrum is shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S2.F13"/> in the Appendix, showing that there are two dominant eigenvalues. Grey particles are classified as noise.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f09.png"/>

        </fig>

      <?xmltex \floatpos{h!}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><label>Figure 10</label><caption><p id="d1e5204">Result of the <inline-formula><mml:math id="M277" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering with <inline-formula><mml:math id="M278" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula>, applied to the 2D embedding with classical MDS (see Fig. <xref ref-type="fig" rid="Ch1.F8"/>c).</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f10.png"/>

        </fig>

      <p id="d1e5234">It is interesting to note that the use of classical MDS in Fig. <xref ref-type="fig" rid="Ch1.F9"/> has led to the detection of many of the vortices of Fig. <xref ref-type="fig" rid="Ch1.F6"/>d–f with DBSCAN instead of the <inline-formula><mml:math id="M279" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula>-clustering method. The transformation to the reduced 2D space has hence led to a simplification of the reachability plot, which now represents the major variations in the distances of the embedded trajectories. At the same time, the large-scale structure of Fig. <xref ref-type="fig" rid="Ch1.F6"/>a is no longer visible in Fig. <xref ref-type="fig" rid="Ch1.F9"/>. This indicates that exploring more dimensionality-reduction techniques could be useful for future research, in particular those that are computationally more efficient than classical MDS.</p>
      <p id="d1e5252">Spectral embeddings derived from networks, together with partition-based clustering, have a similar problem to the one illustrated in Figs. <xref ref-type="fig" rid="Ch1.F8"/>c and <xref ref-type="fig" rid="Ch1.F10"/> <xref ref-type="bibr" rid="bib1.bibx18" id="paren.93"/>. Similar to the case discussed here, OPTICS can be used to overcome the problems of <inline-formula><mml:math id="M280" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means. We show this in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/> for the network proposed by <xref ref-type="bibr" rid="bib1.bibx25" id="text.94"/> for the Agulhas region, together with a brief introduction to the network and how to construct spectral embeddings. In summary, <inline-formula><mml:math id="M281" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means again fails to detect any of the vortices, while OPTICS detects many of the coherent vortices in the spectrally embedded network. Yet, other flow features are also present that result from the physical motivation of the network definition (see the results in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>).</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d1e5294">The abstract embedding of particle trajectories in a metric space with subsequent clustering is a promising field<?pagebreak page53?> of research for the detection of finite-time coherent sets in oceanography. Yet, most of the existing methods have been based on graph partitioning, which has no concept of noisy, unclustered trajectories. This is a problem for applications in the ocean, where many eddies are transported in a noisy background flow on large domains. This study is motivated by the success of <xref ref-type="bibr" rid="bib1.bibx18" id="text.95"/> in overcoming the problem of graph partitioning by a sophisticated form of trajectory embedding. Here, we show how the density-based clustering algorithm OPTICS <xref ref-type="bibr" rid="bib1.bibx1" id="paren.96"/> can be used instead of graph partitioning in order to detect small-scale eddies in large ocean domains. Different from partition-based clustering methods such as <inline-formula><mml:math id="M282" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means, OPTICS does not require one to fix the number of clusters beforehand. Clusters are detected by identifying dense accumulations of points, i.e. groups of trajectories that are close to each other in the embedding space. Coherent groups of particle trajectories can be identified as valleys in the reachability plot computed by the OPTICS algorithm. This plot also has a natural interpretation in terms of cluster hierarchies, i.e. finite-time coherent sets that are themselves part of a larger-scale finite-time coherent set. Such hierarchies are present in the surface ocean flow, where the subtropical basins are approximately coherent and, at the same time, contain other finite-time coherent structures such as eddies and jets.</p>
      <p id="d1e5310">We apply OPTICS to Lagrangian particle trajectories directly, in the spirit of <xref ref-type="bibr" rid="bib1.bibx14" id="text.97"/>. OPTICS successfully detects the expected coherent structures in the Bickley jet model flow, separating the six vortices and the jet from background noise. We also apply OPTICS to simulated trajectories in the eastern South Atlantic and successfully identify Agulhas rings separated by noise. We visualize the difference between OPTICS and <inline-formula><mml:math id="M283" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means with a 2D embedding of the trajectories, based on classical multidimensional scaling. We also show how OPTICS can be applied to the spectral embedding of the particle-based network proposed by <xref ref-type="bibr" rid="bib1.bibx25" id="text.98"/>, providing a necessary amendment to their method of detecting coherent vortices in a large ocean domain, i.e. when <inline-formula><mml:math id="M284" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means fails. Our method is simple to implement in Python, as OPTICS is available in the scikit-learn library. While<?pagebreak page54?> we here present the results of OPTICS with three different kinds of embeddings, it is likely that OPTICS also works for other trajectory embeddings, such as the spectral embeddings of <xref ref-type="bibr" rid="bib1.bibx2" id="text.99"/> or <xref ref-type="bibr" rid="bib1.bibx13" id="text.100"/>. Using such dynamically motivated embeddings instead of the ad hoc direct embedding presented here could be a promising direction for future research.</p>
      <p id="d1e5340">Extending our method to data sets with more trajectories can be made more efficient by choosing a finite generating distance for OPTICS <xref ref-type="bibr" rid="bib1.bibx1" id="paren.101"/>. While this is better from a computational point of view, it requires some knowledge or intuition about the spatial distribution of the embedded trajectories. A major challenge for the method proposed here is the embedding dimension. For long trajectories, it is necessary to reduce the dimensionality of the trajectories before applying OPTICS. A complication here is the desired property of an embedding to preserve both local and global distances in order to make full use of the hierarchical properties of OPTICS. This means, for example, that the popular method of a locally linear embedding <xref ref-type="bibr" rid="bib1.bibx26" id="paren.102"/> is not suitable, unless only the small-scale (densest) finite-time coherent sets are to be detected. Using classical multidimensional scaling (MDS), as we did here to visualize the clustering results, preserves local and global distances in principle, although our results indicate that the large-scale coherence structure in the Agulhas flow is less pronounced for the classical MDS embedding compared to the full embedding of trajectories. In any case, classical MDS is not an option for very large data sets, as it requires the diagonalization of a dense symmetric square matrix of size equal to the particle number. Spectral embeddings of derived networks, such as the ones of <xref ref-type="bibr" rid="bib1.bibx19" id="text.103"/>, <xref ref-type="bibr" rid="bib1.bibx25" id="text.104"/> and <xref ref-type="bibr" rid="bib1.bibx2" id="text.105"/>, are useful for achieving lower dimensional embeddings, but they come with the introduction of additional parameters for the network construction and heuristics to truncate the embedding dimension. 
Further research into other nonlinear dimensionality-reduction techniques that have not yet been explored in the context of finite-time coherent sets could lead to more efficient and robust methods.</p><?xmltex \hack{\clearpage}?>
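      <p>To make the computational cost of classical MDS concrete, the following NumPy sketch uses the standard double-centring construction of the kernel matrix and a full eigendecomposition. It is a generic textbook formulation under our own naming (<monospace>classical_mds</monospace> and <monospace>pairwise_distances</monospace> are hypothetical helper names), applied to synthetic points that lie on a two-dimensional plane in a 40-dimensional space, so that a 2D embedding can reproduce all pairwise distances.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def classical_mds(points, n_components=2):
    """Classical MDS of points given as rows; returns coordinates and spectrum."""
    n = len(points)
    gram = points @ points.T
    sq_norms = np.diag(gram)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    centring = np.eye(n) - np.full((n, n), 1.0 / n)
    kernel = -0.5 * centring @ sq_dist @ centring   # dense, symmetric, n x n
    eigvals, eigvecs = np.linalg.eigh(kernel)       # cubic cost in n
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
    scale = np.sqrt(np.clip(eigvals[:n_components], 0.0, None))
    return eigvecs[:, :n_components] * scale, eigvals


def pairwise_distances(points):
    """Dense matrix of Euclidean distances between all rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


rng = np.random.default_rng(2)
# 200 synthetic points in R^40 that lie on a two-dimensional plane.
plane = np.linalg.qr(rng.standard_normal((40, 2)))[0]   # orthonormal columns
high_dim = rng.standard_normal((200, 2)) @ plane.T       # (200, 40)

low_dim, spectrum = classical_mds(high_dim, n_components=2)
max_error = np.abs(
    pairwise_distances(low_dim) - pairwise_distances(high_dim)
).max()
print(low_dim.shape, max_error)
```
]]></preformat>
      <p>The kernel matrix has one row and one column per particle, which is why memory and run time become prohibitive for very large particle sets.</p>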
</sec>

      
      </body>
    <back><app-group>

<?pagebreak page55?><app id="App1.Ch1.S1">
  <?xmltex \currentcnt{A}?><label>Appendix A</label><title>Additional figures for the Bickley jet flow</title>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F11"><?xmltex \currentcnt{A1}?><label>Figure A1</label><caption><p id="d1e5372">Spectrum of the classical MDS kernel matrix <inline-formula><mml:math id="M285" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> for the Bickley jet flow. It is evident that there are two dominant eigenvalues. We choose the vectors corresponding to these first two eigenvalues as embedding vectors in Sect. <xref ref-type="sec" rid="Ch1.S4.SS1"/>.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f11.png"/>

      </fig>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.S1.F12"><?xmltex \currentcnt{A2}?><label>Figure A2</label><caption><p id="d1e5394">Result of the OPTICS algorithm for a random subset of 2000 particles in the Bickley jet flow, with particle data every 5 d instead of every day. To account for the smaller number of particles, we set <inline-formula><mml:math id="M286" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">15</mml:mn></mml:mrow></mml:math></inline-formula> for this case. The six vortices and the jet are still clearly visible.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f12.png"/>

      </fig>

<?xmltex \hack{\clearpage}?>
</app>

<?pagebreak page56?><app id="App1.Ch1.S2">
  <?xmltex \currentcnt{B}?><label>Appendix B</label><title>Additional figures for the Agulhas flow</title>

      <?xmltex \floatpos{h}?><fig id="App1.Ch1.S2.F13"><?xmltex \currentcnt{B1}?><label>Figure B1</label><caption><p id="d1e5432">Spectrum of the classical MDS kernel matrix <inline-formula><mml:math id="M287" display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula> for the Agulhas flow, where we first constrain the particle data to 12 000 randomly selected trajectories. There are again two dominant eigenvalues for which we choose the corresponding vectors for embedding in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f13.png"/>

      </fig>

</app>

<app id="App1.Ch1.S3">
  <?xmltex \currentcnt{C}?><label>Appendix C</label><title>Detecting Agulhas rings with a particle-based network</title>
      <p id="d1e5458">To demonstrate that OPTICS can also be applied to the spectral embedding of a particle-based network, we use the network proposed by <xref ref-type="bibr" rid="bib1.bibx25" id="text.106"/>. Given a set of particle trajectories <inline-formula><mml:math id="M288" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M289" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M290" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M291" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of particles and <inline-formula><mml:math id="M292" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> is the number of time steps, the network <inline-formula><mml:math id="M293" display="inline"><mml:mrow><mml:mi mathvariant="bold">A</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is defined as follows:
          <disp-formula id="App1.Ch1.S3.E8" content-type="numbered"><label>C1</label><mml:math id="M294" display="block"><mml:mrow><?xmltex \hack{\hbox\bgroup\fontsize{9.1}{9.1}\selectfont$\displaystyle}?><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfenced open="{" close=""><mml:mtable class="cases" rowspacing="0.2ex" columnspacing="1em" columnalign="left left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi mathvariant="normal">if</mml:mi><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mo>∃</mml:mo><mml:mi>t</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>&lt;</mml:mo><mml:mi>d</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi mathvariant="normal">otherwise</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><?xmltex 
\hack{$\egroup}?></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e5697">Here, <inline-formula><mml:math id="M295" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mo>.</mml:mo><mml:mo>|</mml:mo><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> denotes the Euclidean norm, and <inline-formula><mml:math id="M296" display="inline"><mml:mrow><mml:mi>d</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> is a fixed predetermined cut-off parameter. See <xref ref-type="bibr" rid="bib1.bibx25" id="text.107"/> for a discussion on the choice of <inline-formula><mml:math id="M297" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> (called <inline-formula><mml:math id="M298" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> in <xref ref-type="bibr" rid="bib1.bibx25" id="altparen.108"/>).
Similar to <xref ref-type="bibr" rid="bib1.bibx25" id="text.109"/>, we embed the nodes in a lower dimensional space <inline-formula><mml:math id="M299" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>K</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> by means of the eigenvectors of its random walk Laplacian (see, e.g., <xref ref-type="bibr" rid="bib1.bibx32" id="altparen.110"/>) as follows:
          <disp-formula id="App1.Ch1.S3.E9" content-type="numbered"><label>C2</label><mml:math id="M300" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">D</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mi mathvariant="bold">A</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M301" display="inline"><mml:mi mathvariant="bold">D</mml:mi></mml:math></inline-formula> is a diagonal matrix with <inline-formula><mml:math id="M302" display="inline"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. The embedding of node <inline-formula><mml:math id="M303" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> is defined by the following:
          <disp-formula id="App1.Ch1.S3.E10" content-type="numbered"><label>C3</label><mml:math id="M304" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>K</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        where <inline-formula><mml:math id="M305" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> are the right eigenvectors corresponding to the eigenvalues <inline-formula><mml:math id="M306" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of <inline-formula><mml:math id="M307" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">L</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The eigenvalues are assumed to be sorted in descending order, i.e. <inline-formula><mml:math id="M308" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>≥</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. The classical simultaneous <inline-formula><mml:math id="M309" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-way normalized cut proceeds by applying the <inline-formula><mml:math id="M310" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means algorithm to the embedding defined in Eq. 
(<xref ref-type="disp-formula" rid="App1.Ch1.S3.E10"/>) to detect <inline-formula><mml:math id="M311" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> clusters <xref ref-type="bibr" rid="bib1.bibx32" id="paren.111"/>, resulting in an approximate solution to the normalized cut problem <xref ref-type="bibr" rid="bib1.bibx29" id="paren.112"/>.</p>
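      <p>The construction in Eqs. (C1)–(C3) can be sketched in a few lines of NumPy. The sketch below is an illustration under stated assumptions, not the implementation used for the figures: the function names <monospace>proximity_network</monospace> and <monospace>spectral_embedding</monospace> are hypothetical, the trajectories are synthetic random walks in the plane, Eq. (C1) is taken literally (so every node is also linked to itself), and the eigenvectors are obtained from the symmetric matrix that shares its eigenvalues with the random walk Laplacian, which fixes one particular normalization of the eigenvectors.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def proximity_network(trajectories, d):
    """Adjacency matrix of Eq. (C1) for trajectories of shape (N, T, dim)."""
    n = trajectories.shape[0]
    adjacency = np.zeros((n, n))
    for t in range(trajectories.shape[1]):
        snapshot = trajectories[:, t, :]
        dist = np.linalg.norm(
            snapshot[:, None, :] - snapshot[None, :, :], axis=-1
        )
        adjacency[dist < d] = 1.0   # linked if closer than d at any time step
    return adjacency


def spectral_embedding(adjacency, n_components):
    """Leading non-trivial right eigenvectors of D^-1 A, Eqs. (C2)-(C3)."""
    degree = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # D^-1/2 A D^-1/2 is symmetric and has the same eigenvalues as D^-1 A.
    symmetric = inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(symmetric)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    right = inv_sqrt[:, None] * eigvecs                   # eigenvectors of D^-1 A
    # Row i holds y_i; the trivial eigenvector with eigenvalue 1 is skipped.
    return right[:, 1:n_components + 1], eigvals, right


rng = np.random.default_rng(3)
# Synthetic stand-in for particle data: 60 random walks with 10 time steps.
trajectories = np.cumsum(rng.standard_normal((60, 10, 2)), axis=1)

A = proximity_network(trajectories, d=1.0)
Y, eigenvalues, V = spectral_embedding(A, n_components=5)
L_r = A / A.sum(axis=1, keepdims=True)   # D^-1 A, used here only as a check
print(Y.shape, eigenvalues[0])
```
]]></preformat>
      <p>The rows of <monospace>Y</monospace> can then be passed to a clustering algorithm of choice. For large particle numbers, a sparse matrix and an iterative eigensolver would be preferable to the dense routines used in this sketch.</p>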
      <p id="d1e6021">Figure <xref ref-type="fig" rid="App1.Ch1.S3.F14"/> shows the spectrum of the resulting random walk Laplacian with <inline-formula><mml:math id="M312" display="inline"><mml:mrow><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula> km. No obvious spectral gap is visible that would suggest a truncation of the embedding space. Figure <xref ref-type="fig" rid="App1.Ch1.S3.F15"/> shows the clustering result if we apply a <inline-formula><mml:math id="M313" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means algorithm, as suggested by <xref ref-type="bibr" rid="bib1.bibx25" id="text.113"/>, to detect <inline-formula><mml:math id="M314" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> clusters. It is visible that the partition-based <inline-formula><mml:math id="M315" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering method does not detect any individual Agulhas rings but instead partitions the state space into regions of approximately equal size.</p>

      <?xmltex \floatpos{t}?><fig id="App1.Ch1.S3.F14"><?xmltex \currentcnt{C1}?><label>Figure C1</label><caption><p id="d1e6073">Spectrum of the random walk Laplacian (see Eq. <xref ref-type="disp-formula" rid="App1.Ch1.S3.E9"/>) of the network proposed by <xref ref-type="bibr" rid="bib1.bibx25" id="text.114"/> applied to the Agulhas trajectory data. No clear gap exists to suggest a truncation of the embedding.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f14.png"/>

      </fig>

      <?xmltex \floatpos{p}?><fig id="App1.Ch1.S3.F15" specific-use="star"><?xmltex \currentcnt{C2}?><label>Figure C2</label><caption><p id="d1e6089">Result of <inline-formula><mml:math id="M316" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means clustering applied to the 40 leading eigenvectors of the random walk Laplacian (see Eq. <xref ref-type="disp-formula" rid="App1.Ch1.S3.E9"/>), looking for 40 clusters. No individual vortices are detected.</p></caption>
        <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f15.png"/>

      </fig>

      <p id="d1e6107">Applying OPTICS with a subsequent <inline-formula><mml:math id="M318" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> clustering instead of <inline-formula><mml:math id="M317" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-means detects some of the Agulhas rings (see Fig. <xref ref-type="fig" rid="App1.Ch1.S3.F16"/>), where we choose <inline-formula><mml:math id="M319" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula> as in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2"/>. Note also that structures other than typical circular eddies are detected. While this depends on the clustering parameter <inline-formula><mml:math id="M320" display="inline"><mml:mi mathvariant="italic">ξ</mml:mi></mml:math></inline-formula> (or <inline-formula><mml:math id="M321" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> for DBSCAN), this is also a consequence of the physically motivated network defined by Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S3.E8"/>), where two particles are connected, with equal weight, if they are close to each other at least once during the time interval. This is different from the direct embedding, where we require particles to stay close to each other along the entire trajectory.</p>

      <?xmltex \floatpos{p}?><fig id="App1.Ch1.S3.F16" specific-use="star"><?xmltex \currentcnt{C3}?><label>Figure C3</label><caption><p id="d1e6162">Result of OPTICS applied to the <inline-formula><mml:math id="M322" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">40</mml:mn></mml:mrow></mml:math></inline-formula> spectral embedding of the network defined in Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S3.E8"/>), with <inline-formula><mml:math id="M323" display="inline"><mml:mrow><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula> km and <inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>. Grey particles are classified as noise.</p></caption>
        <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://npg.copernicus.org/articles/28/43/2021/npg-28-43-2021-f16.png"/>

      </fig>

<?xmltex \hack{\clearpage}?>
</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d1e6218">All code is available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.4426287" ext-link-type="DOI">10.5281/zenodo.4426287</ext-link> <xref ref-type="bibr" rid="bib1.bibx35" id="paren.115"/>, including the code to generate the Bickley jet trajectories. The data for the virtual particles in the South Atlantic are available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.3899942" ext-link-type="DOI">10.5281/zenodo.3899942</ext-link> <xref ref-type="bibr" rid="bib1.bibx34" id="paren.116"/>. Details on the Parcels simulation for the virtual trajectories in the ocean can be found at the GitHub repository of our previous paper, i.e. <ext-link xlink:href="https://doi.org/10.5281/zenodo.4426310" ext-link-type="DOI">10.5281/zenodo.4426310</ext-link> <xref ref-type="bibr" rid="bib1.bibx36" id="paren.117"/>. The NEMO N006 data are kindly provided by Andrew Coward at NOC Southampton, UK, and can be downloaded at <uri>http://opendap4gws.jasmin.ac.uk/thredds/nemo/root/catalog.html</uri> (last access: 10 March 2019).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e6246">DW performed the analysis, with support from CK, EvS and HAD. DW wrote the paper, and all authors jointly edited and revised it.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e6252">The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e6258">David Wichmann, Christian Kehl and Erik van Sebille have been supported through funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (grant no. 715386). This work was partially carried out on the Dutch national e-infrastructure, with the support of SURF Cooperative (project no. 16371). We thank Andrew Coward for providing the ORCA-N006 simulation data.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e6263">This research has been supported by the European Research Council (TOPIOS (grant no. 715386)).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e6269">This paper was edited by Juan Restrepo and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Ankerst et al.(1999)Ankerst, Breunig, Kriegel, and
Sander</label><?label ankerst1999optics?><mixed-citation>Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J.: OPTICS: Ordering
Points to Identify the Clustering Structure, ACM Sigmod Record, 28, 49–60,
<ext-link xlink:href="https://doi.org/10.1145/304181.304187" ext-link-type="DOI">10.1145/304181.304187</ext-link>, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Banisch and Koltai(2017)</label><?label Banisch2017?><mixed-citation>Banisch, R. and Koltai, P.: Understanding the geometry of transport: Diffusion
maps for Lagrangian trajectory data unravel coherent sets, Chaos: An
Interdisciplinary J. Nonlinear Sci., 27, 035804,
<ext-link xlink:href="https://doi.org/10.1063/1.4971788" ext-link-type="DOI">10.1063/1.4971788</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Beron-Vera et al.(2013)Beron-Vera, Wang, Olascoaga, Goni, and
Haller</label><?label Beron-Vera2013?><mixed-citation>Beron-Vera, F. J., Wang, Y., Olascoaga, M. J., Goni, G. J., and Haller, G.:
Objective Detection of Oceanic Eddies and the Agulhas Leakage, J.
Phys. Oceanogr., 43, 1426–1438,
<ext-link xlink:href="https://doi.org/10.1175/JPO-D-12-0171.1" ext-link-type="DOI">10.1175/JPO-D-12-0171.1</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bickley(1937)</label><?label bickley1937lxxiii?><mixed-citation>Bickley, W.: LXXIII. The plane jet, The London, Edinburgh, and Dublin
Philosophical Magazine and Journal of Science, 23, 727–731,
<ext-link xlink:href="https://doi.org/10.1080/14786443708561847" ext-link-type="DOI">10.1080/14786443708561847</ext-link>, 1937.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Brach et al.(2018)Brach, Deixonne, Bernard, Durand, Desjean, Perez,
van Sebille, and ter Halle</label><?label Brach2018?><mixed-citation>Brach, L., Deixonne, P., Bernard, M. F., Durand, E., Desjean, M. C., Perez, E.,
van Sebille, E., and ter Halle, A.: Anticyclonic eddies increase
accumulation of microplastic in the North Atlantic subtropical gyre, Marine
Pollution Bulletin, 126, 191–196,
<ext-link xlink:href="https://doi.org/10.1016/j.marpolbul.2017.10.077" ext-link-type="DOI">10.1016/j.marpolbul.2017.10.077</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>del Castillo-Negrete and Morrison(1993)</label><?label del1993chaotic?><mixed-citation>del Castillo-Negrete, D. and Morrison, P.: Chaotic transport by Rossby waves in
shear flow, Phys. Fluids A, 5, 948–965,
<ext-link xlink:href="https://doi.org/10.1063/1.858639" ext-link-type="DOI">10.1063/1.858639</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Delandmeter and van Sebille(2019)</label><?label Delandmeter2019?><mixed-citation>Delandmeter, P. and van Sebille, E.: The Parcels v2.0 Lagrangian framework: new field interpolation schemes, Geosci. Model Dev., 12, 3571–3584, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-3571-2019" ext-link-type="DOI">10.5194/gmd-12-3571-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Dencausse et al.(2010)Dencausse, Arhan, and Speich</label><?label Dencausse2010?><mixed-citation>Dencausse, G., Arhan, M., and Speich, S.: Routes of Agulhas rings in the
southeastern Cape Basin, Deep-Sea Res. Pt. I, 57, 1406–1421, <ext-link xlink:href="https://doi.org/10.1016/j.dsr.2010.07.008" ext-link-type="DOI">10.1016/j.dsr.2010.07.008</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Dong et al.(2014)Dong, McWilliams, Liu, and Chen</label><?label Dong2014?><mixed-citation>Dong, C., McWilliams, J. C., Liu, Y., and Chen, D.: Global heat and salt
transports by eddy movement, Nature Commun., 5, 3294,
<ext-link xlink:href="https://doi.org/10.1038/ncomms4294" ext-link-type="DOI">10.1038/ncomms4294</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Dussin et al.(2016)Dussin, Barnier, and Brodeau</label><?label Dussin2016?><mixed-citation>
Dussin, R., Barnier, B., and Brodeau, L.: The making of Drakkar forcing set
DFS5, Tech. rep., LGGE, Grenoble, France, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Ester et al.(1996)Ester, Kriegel, Sander, and Xu</label><?label ester1996density?><mixed-citation>
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X.: A Density-Based Algorithm
for Discovering Clusters in Large Spatial Databases with Noise, in:
Proceedings of the Second International Conference on Knowledge Discovery and
Data Mining, KDD'96, 226–231, AAAI Press, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Fouss et al.(2016)Fouss, Saerens, and Shimbo</label><?label Fouss2016?><mixed-citation>Fouss, F., Saerens, M., and Shimbo, M.: Algorithms and models for network data
and link analysis, Cambridge University Press, Cambridge, <ext-link xlink:href="https://doi.org/10.1017/CBO9781316418321" ext-link-type="DOI">10.1017/CBO9781316418321</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Froyland and Junge(2018)</label><?label Froyland2018?><mixed-citation>Froyland, G. and Junge, O.: Robust FEM-based extraction of finite-time
coherent sets using scattered, sparse, and incomplete trajectories, SIAM
Journal on Applied Dynamical Systems, 17, 1891–1924,
<ext-link xlink:href="https://doi.org/10.1137/17M1129738" ext-link-type="DOI">10.1137/17M1129738</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Froyland and Padberg-Gehle(2015)</label><?label Froyland2015?><mixed-citation>Froyland, G. and Padberg-Gehle, K.: A rough-and-ready cluster-based approach
for extracting finite-time coherent sets from sparse and incomplete
trajectory data, Chaos: An Interdisciplinary J. Nonlinear Sci.,
25, 087406, <ext-link xlink:href="https://doi.org/10.1063/1.4926372" ext-link-type="DOI">10.1063/1.4926372</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Froyland et al.(2010)Froyland, Santitissadeekorn, and
Monahan</label><?label Froyland2010?><mixed-citation>Froyland, G., Santitissadeekorn, N., and Monahan, A.: Transport in
time-dependent dynamical systems: Finite-time coherent sets, Chaos: An
Interdisciplinary J. Nonlinear Sci., 20, 043116,
<ext-link xlink:href="https://doi.org/10.1063/1.3502450" ext-link-type="DOI">10.1063/1.3502450</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Froyland et al.(2014)Froyland, Stuart, and van
Sebille</label><?label Froyland2014?><mixed-citation>Froyland, G., Stuart, R. M., and van Sebille, E.: How well-connected is the
surface of the global ocean?, Chaos: An Interdisciplinary J.
Nonlinear Sci., 24, 033126, <ext-link xlink:href="https://doi.org/10.1063/1.4892530" ext-link-type="DOI">10.1063/1.4892530</ext-link>,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Froyland et al.(2015)Froyland, Horenkamp, Rossi, and van
Sebille</label><?label Froyland2015ring?><mixed-citation>Froyland, G., Horenkamp, C., Rossi, V., and van Sebille, E.: Studying an
Agulhas ring's long-term pathway and decay with finite-time coherent sets,
Chaos: An Interdisciplinary J. Nonlinear Sci., 25, 083119,
<ext-link xlink:href="https://doi.org/10.1063/1.4927830" ext-link-type="DOI">10.1063/1.4927830</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Froyland et al.(2019)Froyland, Rock, and Sakellariou</label><?label Froyland2019?><mixed-citation>Froyland, G., Rock, C. P., and Sakellariou, K.: Sparse eigenbasis
approximation: Multiple feature extraction across spatiotemporal scales with
application to coherent set identification, Communications in Nonlinear
Science and Numerical Simulation, 77, 81–107,
<ext-link xlink:href="https://doi.org/10.1016/j.cnsns.2019.04.012" ext-link-type="DOI">10.1016/j.cnsns.2019.04.012</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Hadjighasem et al.(2016)Hadjighasem, Karrasch, Teramoto, and
Haller</label><?label Hadjighasem2016?><mixed-citation>Hadjighasem, A., Karrasch, D., Teramoto, H., and Haller, G.:
Spectral-clustering approach to Lagrangian vortex detection, Phys.
Rev. E, 93, 063107, <ext-link xlink:href="https://doi.org/10.1103/PhysRevE.93.063107" ext-link-type="DOI">10.1103/PhysRevE.93.063107</ext-link>,
2016.</mixed-citation></ref>
      <?pagebreak page59?><ref id="bib1.bibx20"><label>Hadjighasem et al.(2017)Hadjighasem, Farazmand, Blazevski, Froyland,
and Haller</label><?label Hadjighasem2017?><mixed-citation>Hadjighasem, A., Farazmand, M., Blazevski, D., Froyland, G., and Haller, G.: A
critical comparison of Lagrangian methods for coherent structure detection,
Chaos: An Interdisciplinary J. Nonlinear Sci., 27, 053104,
<ext-link xlink:href="https://doi.org/10.1063/1.4982720" ext-link-type="DOI">10.1063/1.4982720</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Haller and Beron-Vera(2013)</label><?label Haller2013?><mixed-citation>Haller, G. and Beron-Vera, F. J.: Coherent Lagrangian vortices: The black
holes of turbulence, J. Fluid Mech., 731,
<ext-link xlink:href="https://doi.org/10.1017/jfm.2013.391" ext-link-type="DOI">10.1017/jfm.2013.391</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Lange and van Sebille(2017)</label><?label Lange2017?><mixed-citation>Lange, M. and van Sebille, E.: Parcels v0.9: prototyping a Lagrangian ocean analysis framework for the petascale age, Geosci. Model Dev., 10, 4175–4186, <ext-link xlink:href="https://doi.org/10.5194/gmd-10-4175-2017" ext-link-type="DOI">10.5194/gmd-10-4175-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Ma and Bollt(2013)</label><?label Ma2013?><mixed-citation>Ma, T. and Bollt, E. M.: Relatively Coherent Sets as a Hierarchical Partition
Method, Int. J. Bifurcat. Chaos, 23, 1330026,
<ext-link xlink:href="https://doi.org/10.1142/S0218127413300267" ext-link-type="DOI">10.1142/S0218127413300267</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Madec(2008)</label><?label Madec2008?><mixed-citation>
Madec, G.: NEMO ocean engine, Note du Pôle de modélisation, No
27, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Padberg-Gehle and Schneide(2017)</label><?label Padberg-Gehle2017?><mixed-citation>Padberg-Gehle, K. and Schneide, C.: Network-based study of Lagrangian transport and mixing, Nonlin. Processes Geophys., 24, 661–671, <ext-link xlink:href="https://doi.org/10.5194/npg-24-661-2017" ext-link-type="DOI">10.5194/npg-24-661-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Roweis and Saul(2000)</label><?label roweis2000nonlinear?><mixed-citation>Roweis, S. T. and Saul, L. K.: Nonlinear Dimensionality Reduction by Locally
Linear Embedding, Science, 290, 2323–2326,
<ext-link xlink:href="https://doi.org/10.1126/science.290.5500.2323" ext-link-type="DOI">10.1126/science.290.5500.2323</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Schneide et al.(2018)Schneide, Pandey, Padberg-Gehle, and
Schumacher</label><?label Schneide2018?><mixed-citation>Schneide, C., Pandey, A., Padberg-Gehle, K., and Schumacher, J.: Probing
turbulent superstructures in Rayleigh-Bénard convection by Lagrangian
trajectory clusters, Phys. Rev. Fluids, 3, 113501,
<ext-link xlink:href="https://doi.org/10.1103/PhysRevFluids.3.113501" ext-link-type="DOI">10.1103/PhysRevFluids.3.113501</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Schouten et al.(2000)Schouten, de Ruijter, van Leeuwen, and
Lutjeharms</label><?label Schouten2000?><mixed-citation>Schouten, M. W., de Ruijter, W. P. M., van Leeuwen, P. J., and Lutjeharms, J.
R. E.: Translation, decay and splitting of Agulhas rings in the southeastern
Atlantic Ocean, J. Geophys. Res.-Oceans, 105,
21913–21925, <ext-link xlink:href="https://doi.org/10.1029/1999jc000046" ext-link-type="DOI">10.1029/1999jc000046</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Shi and Malik(2000)</label><?label Shi2000?><mixed-citation>Shi, J. and Malik, J.: Normalized cuts and image segmentation, IEEE
Transactions on Pattern Analysis and Machine Intelligence, 22, 888–905,
<ext-link xlink:href="https://doi.org/10.1109/34.868688" ext-link-type="DOI">10.1109/34.868688</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Tarshish et al.(2018)Tarshish, Abernathey, Zhang, Dufour, Frenger,
and Griffies</label><?label Tarshish2018?><mixed-citation>Tarshish, N., Abernathey, R., Zhang, C., Dufour, C. O., Frenger, I., and
Griffies, S. M.: Identifying Lagrangian coherent vortices in a mesoscale
ocean model, Ocean Model., 130, 15–28,
<ext-link xlink:href="https://doi.org/10.1016/j.ocemod.2018.07.001" ext-link-type="DOI">10.1016/j.ocemod.2018.07.001</ext-link>, 2018.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{{van Sebille} et~al.(2020){Van Sebille}, Aliani, Law, Maximenko,
Alsina, Bagaev, Bergmann, Chapron, Chubarenko, C{\'{o}}zar, Delandmeter,
Egger, Fox-Kemper, Garaba, Goddijn-Murphy, Hardesty, Hoffman, Isobe,
Jongedijk, Kaandorp, Khatmullina, Koelmans, Kukulka, Laufk{\"{o}}tter,
Lebreton, Lobelle, Maes, Martinez-Vicente, {Morales Maqueda}, Poulain-Zarcos,
Rodr{\'{i}}guez, Ryan, Shanks, Shim, Suaria, Thiel, {Van Den Bremer}, and
Wichmann}}?><label>van Sebille et al.(2020)Van Sebille, Aliani, Law, Maximenko,
Alsina, Bagaev, Bergmann, Chapron, Chubarenko, Cózar, Delandmeter,
Egger, Fox-Kemper, Garaba, Goddijn-Murphy, Hardesty, Hoffman, Isobe,
Jongedijk, Kaandorp, Khatmullina, Koelmans, Kukulka, Laufkötter,
Lebreton, Lobelle, Maes, Martinez-Vicente, Morales Maqueda, Poulain-Zarcos,
Rodríguez, Ryan, Shanks, Shim, Suaria, Thiel, Van Den Bremer, and
Wichmann</label><?label VanSebille2020?><mixed-citation>van Sebille, E., Aliani, S., Law, K. L., Maximenko, N., Alsina, J. M.,
Bagaev, A., Bergmann, M., Chapron, B., Chubarenko, I., Cózar, A.,
Delandmeter, P., Egger, M., Fox-Kemper, B., Garaba, S. P., Goddijn-Murphy,
L., Hardesty, B. D., Hoffman, M. J., Isobe, A., Jongedijk, C. E., Kaandorp,
M. L., Khatmullina, L., Koelmans, A. A., Kukulka, T., Laufkötter, C.,
Lebreton, L., Lobelle, D., Maes, C., Martinez-Vicente, V., Morales Maqueda,
M. A., Poulain-Zarcos, M., Rodríguez, E., Ryan, P. G., Shanks, A. L.,
Shim, W. J., Suaria, G., Thiel, M., Van Den Bremer, T. S., and Wichmann,
D.: The physical oceanography of the transport of floating marine debris,
Environ. Res. Lett., 15, 023003,
<ext-link xlink:href="https://doi.org/10.1088/1748-9326/ab6d7d" ext-link-type="DOI">10.1088/1748-9326/ab6d7d</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Von Luxburg(2007)</label><?label VonLuxburg2007?><mixed-citation>Von Luxburg, U.: A Tutorial on spectral clustering, Stat.
Comput., 17, 395–416, <ext-link xlink:href="https://doi.org/10.1007/s11222-007-9033-z" ext-link-type="DOI">10.1007/s11222-007-9033-z</ext-link>,
2007.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Wichmann(2020{\natexlab{a}})}}?><label>Wichmann(2020a)</label><?label agulhasanimation?><mixed-citation>Wichmann, D.: Animation of finite-time coherent sets in the Agulhas region, Zenodo,
<ext-link xlink:href="https://doi.org/10.5281/zenodo.4103741" ext-link-type="DOI">10.5281/zenodo.4103741</ext-link>, 2020a.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Wichmann(2020{\natexlab{b}})}}?><label>Wichmann(2020b)</label><?label agulhasdata?><mixed-citation>Wichmann, D.: Lagrangian particle dataset (2 years) for Agulhas region surface
flow, Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.3899942" ext-link-type="DOI">10.5281/zenodo.3899942</ext-link>, 2020b.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Wichmann(2021a)</label><?label Wichmann2021a?><mixed-citation>Wichmann, D.: OceanParcels/coherent_vortices_OPTICS: Release for publication of corresponding paper (Version v1.0), Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.4426287" ext-link-type="DOI">10.5281/zenodo.4426287</ext-link>, 2021a.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Wichmann(2021b)</label><?label Wichmann2021b?><mixed-citation>Wichmann, D.: OceanParcels/near_surface_microplastic: Release of code for near-surface microplastic simulations, Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.4426310" ext-link-type="DOI">10.5281/zenodo.4426310</ext-link>, 2021b.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Wichmann et al.(2020)Wichmann, Kehl, Dijkstra, and van
Sebille</label><?label wichmnpg?><mixed-citation>Wichmann, D., Kehl, C., Dijkstra, H. A., and van Sebille, E.: Detecting flow features in scarce trajectory data using networks derived from symbolic itineraries: an application to surface drifters in the North Atlantic, Nonlin. Processes Geophys., 27, 501–518, <ext-link xlink:href="https://doi.org/10.5194/npg-27-501-2020" ext-link-type="DOI">10.5194/npg-27-501-2020</ext-link>, 2020.</mixed-citation></ref>

  </ref-list></back>
</article>
