Articles | Volume 21, issue 6
Nonlin. Processes Geophys., 21, 1145–1157, 2014

Special issue: Physics-driven data mining in climate change and weather...

Research article 01 Dec 2014

Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

D. Das1,2, J. Dy3, J. Ross3, Z. Obradovic2, and A. R. Ganguly1
  • 1Sustainability and Data Sciences Lab, Northeastern University, Boston, MA, USA
  • 2Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA, USA
  • 3Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA

Abstract. Climate projections simulated by Global Climate Models (GCMs) are often used to assess the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their use for accurately assessing the effects of climate change on finer, regional-scale phenomena. Climate variables are therefore often downscaled from coarser to finer regional scales using statistical methods to obtain regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and regional or local features. The transfer function approach to SD involves learning a regression model that relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet process (DP). The model simultaneously clusters the data and discovers the relevant covariates within each cluster while automatically finding the number of clusters. Sparse linear models are parsimonious and hence more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
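
The abstract's claim that the number of clusters is found automatically rests on the Dirichlet process prior, which can be illustrated through its Chinese restaurant process representation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_crp` and the concentration parameter `alpha` are illustrative choices, and the sparse regression likelihood that the paper pairs with this prior is omitted entirely.

```python
import random
from collections import Counter


def sample_crp(n_points, alpha, seed=0):
    """Draw cluster labels for n_points from a Chinese restaurant process.

    Illustrative helper (hypothetical name). alpha > 0 is the concentration
    parameter: larger values make opening a new cluster more likely.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of points currently assigned to cluster k
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    # The number of clusters is not fixed in advance; it depends on the draw.
    for alpha in (0.5, 2.0, 10.0):
        labels = sample_crp(200, alpha, seed=1)
        print(alpha, len(set(labels)), Counter(labels).most_common(3))
```

In a full mixture of sparse regressions, each cluster would additionally carry its own sparse coefficient vector, and the assignment weights would be multiplied by how well each cluster's regression explains the observation. That step is left out here because the abstract does not specify it.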