Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

Das, D.; Dy, J.; Ross, J.; Obradovic, Z.; Ganguly, A. R.

doi:https://doi.org/10.5194/npg-21-1145-2014

Articles | Volume 21, issue 6

© Author(s) 2014. This work is distributed under
the Creative Commons Attribution 3.0 License.

Special issue:

Physics-driven data mining in climate change and weather...

Research article | 01 Dec 2014

Abstract. Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their use in accurately assessing the effects of climate change on finer regional-scale phenomena. Climate variables are therefore often downscaled from coarser to finer regional scales using statistical methods to obtain regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and the regional or local features. The transfer function approach to SD involves learning a regression model that relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet process (DP) for simultaneous clustering and discovery of covariates within the clusters, while automatically finding the number of clusters. Sparse linear models are parsimonious and hence more generalizable than non-sparse alternatives, and they lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.
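To make the modelling idea in the abstract concrete, the following is a minimal sketch of the kind of data such a model is meant to describe. It is not the authors' inference algorithm. It only simulates data in which cluster labels come from a Chinese restaurant process (one standard representation of a Dirichlet process prior) and each cluster has its own sparse linear regression. The function names, the parameter values (`alpha`, `n_active`, `noise_sd`) and the Gaussian choices for covariates, coefficients and noise are all assumptions made for illustration.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n points from a Chinese restaurant process.

    The number of clusters is not fixed in advance; it grows with the data.
    """
    labels, sizes = [], []
    for _ in range(n):
        # An existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(0)  # open a new cluster
        sizes[k] += 1
        labels.append(k)
    return labels, sizes


def sparse_coefficients(p, n_active, rng):
    """Hypothetical sparse coefficient vector: only n_active of p entries are non-zero."""
    beta = [0.0] * p
    for j in rng.sample(range(p), n_active):
        beta[j] = rng.gauss(0.0, 2.0)
    return beta


def simulate(n=200, p=10, n_active=2, alpha=1.0, noise_sd=0.1, seed=0):
    """Simulate predictors X and a predictand y from a mixture of sparse regressions."""
    rng = random.Random(seed)
    labels, sizes = crp_assignments(n, alpha, rng)
    # One sparse regression per cluster (assumed prior, for illustration only).
    betas = [sparse_coefficients(p, n_active, rng) for _ in sizes]
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [
        sum(x_j * b_j for x_j, b_j in zip(x, betas[k])) + rng.gauss(0.0, noise_sd)
        for x, k in zip(X, labels)
    ]
    return X, y, labels, betas


X, y, labels, betas = simulate()
print("clusters drawn:", len(betas))
```

In this sketch, a method of the kind the abstract describes would be given only `X` and `y` and would have to recover the cluster labels, the number of clusters, and which covariates are active within each cluster.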

Received: 27 Feb 2014 – Discussion started: 11 Apr 2014 – Revised: 21 Aug 2014 – Accepted: 23 Oct 2014 – Published: 01 Dec 2014