Articles | Volume 27, issue 3
https://doi.org/10.5194/npg-27-453-2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.
Applications of matrix factorization methods to climate data
CSIRO Oceans and Atmosphere, Hobart, Australia
Terence J. O'Kane
CSIRO Oceans and Atmosphere, Hobart, Australia
Related authors
Mark Collier, Dylan Harries, and Terence O'Kane
EGUsphere, https://doi.org/10.5194/egusphere-2025-3948, 2025
This preprint is open for discussion and under review for Nonlinear Processes in Geophysics (NPG).
Short summary
Here we apply Bayesian methods to reconstructed and simulated climate model data over past decades to determine the role of long timescale phase dependencies, and extratropical teleconnections, on the major drivers of tropical climate variability.
Serena Schroeter, Terence J. O'Kane, and Paul A. Sandery
The Cryosphere, 17, 701–717, https://doi.org/10.5194/tc-17-701-2023, 2023
Short summary
Antarctic sea ice has increased over much of the satellite record, but we show that the early, strongly opposing regional trends diminish and reverse over time, leading to overall negative trends in recent decades. The dominant pattern of atmospheric flow has changed from strongly east–west to more wave-like with enhanced north–south winds. Sea surface temperatures have also changed from circumpolar cooling to regional warming, suggesting recent record low sea ice will not rapidly recover.
Short summary
Different dimension reduction methods may produce profoundly different low-dimensional representations of multiscale systems. We perform a set of case studies to investigate these differences. When a clear scale separation is present, all methods yield similar bases; when it is not, some methods may produce representations that are poorly suited to describing features of interest, which highlights the importance of choosing the method carefully when designing analyses.