Flexible Bayesian Methods for High Dimensional Data

Saha, Enakshi

doi:10.6082/uchicago.3033

2021

Abstract

We study flexible Bayesian methods that apply to a wide range of learning problems involving complex, high-dimensional data structures and require minimal tuning. We consider parametric and semiparametric Bayesian models that are applicable to both static and dynamic data arising in areas such as economics, finance and marketing. Special emphasis is placed on deriving probabilistic guarantees for these models; these guarantees corroborate their strong empirical performance and may provide insight into interesting avenues for future research.

Chapter 1 describes the broader theme of our research. We focus on two important domains of Bayesian statistics: Bayesian ensemble learning and latent factor models. In the first part, we explore the theoretical properties and empirical adaptability of Bayesian trees and their additive ensembles, along with their multiple incarnations. In the second part, we propose a sparse factor analysis model for dynamic data that is suitable for discovering latent structures in multivariate time series arising from a wide range of real-life applications.

Bayesian additive regression trees (BART) is an ensemble learning technique that has been adapted to a wide range of high-dimensional learning tasks. In Chapter 2 we show that the BART posterior concentrates at a near-optimal rate when the underlying regression function is Hölder continuous. In Chapter 3 we show that this guarantee extends beyond the regression problem to response variables in the exponential family, thereby covering variants of BART used for other important applications such as classification and count regression. We also prove that these results hold not only for Hölder continuous functions but also when the regression function is a step function or a monotone function.
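
For readers new to BART, the model and the phrase "near-optimal posterior concentration rate" can be written down as follows. This is a standard textbook-style formulation, not a restatement of the exact theorems in Chapters 2 and 3; the notation ($m$, $T_j$, $M_j$, $K_n$, $\varepsilon_n$, $\alpha$, $p$) and the particular logarithmic factor shown are illustrative assumptions. The sum-of-trees model is

$$
Y_i = \sum_{j=1}^{m} g(x_i;\, T_j, M_j) + \epsilon_i, \qquad \epsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, n,
$$

where $T_j$ is a binary regression tree, $M_j$ is its set of leaf values, and $g(x; T_j, M_j)$ returns the leaf value of the cell of $T_j$ that contains $x$. The posterior concentrates around the true regression function $f_0$ at rate $\varepsilon_n$ if, for every sequence $K_n \to \infty$,

$$
\Pi\big(f : \|f - f_0\|_n > K_n \varepsilon_n \,\big|\, Y_1, \dots, Y_n\big) \to 0 \quad \text{in } P_{f_0}\text{-probability}, \qquad \|f\|_n^2 = \frac{1}{n} \sum_{i=1}^{n} f(x_i)^2 .
$$

When $f_0$ is $\alpha$-Hölder continuous on $[0,1]^p$ with $0 < \alpha \le 1$, the minimax rate is $n^{-\alpha/(2\alpha+p)}$, and "near-optimal" means attaining it up to a logarithmic factor, for example

$$
\varepsilon_n = n^{-\alpha/(2\alpha+p)} (\log n)^{1/2}.
$$

For exponential-family responses, the sum of trees enters through a link function; one common choice for binary classification is the probit link, $P(Y_i = 1 \mid x_i) = \Phi\big(\sum_{j=1}^{m} g(x_i; T_j, M_j)\big)$.
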

In Chapter 4 we demonstrate the scope of BART for discrete choice modeling, showing that BART achieves higher predictive accuracy than some popular discrete choice models on several benchmark datasets.

In Chapter 5 we propose a Bayesian sparse factor analysis model for high-dimensional dynamic data. We address several important challenges that often hinder the practical deployment of existing dynamic factor analysis tools. First, our model infers the number of latent factors from the data instead of fixing it at a user-defined value; moreover, both the number of latent factors and the factor loadings are allowed to vary over time. Second, we propose an EM implementation that requires minimal identification constraints and, in high-dimensional applications, is considerably faster than the MCMC sampler. To demonstrate the efficacy of our model, we study a large-scale US macroeconomic dataset, with a special focus on the 2008 financial crisis.

Finally, Chapter 6 concludes with a discussion of the possible implications of our work and some promising directions for future research.
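
The Chapter 5 summary contrasts an EM implementation with MCMC but does not spell out what EM looks like for a factor model. As a point of reference only, here is a minimal sketch of EM for classical, static Gaussian factor analysis with a user-fixed number of factors $k$. It is not the dissertation's sparse dynamic model: it neither infers the number of factors nor lets the loadings vary over time, and the function name `factor_analysis_em` and its parameters are hypothetical. Each iteration consists of closed-form E- and M-steps with no sampling, which illustrates why EM-type algorithms can be attractive for large problems.

```python
import numpy as np


def factor_analysis_em(X, k, n_iter=200, seed=0):
    """EM for classical, static Gaussian factor analysis (illustrative sketch).

    Assumed model: x_i = Lam @ z_i + e_i, z_i ~ N(0, I_k), e_i ~ N(0, diag(psi)).
    X is an (n, p) data matrix; k (the number of factors) is fixed by the user,
    unlike the dissertation's model, which infers it from the data.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                    # center each column
    n, p = Xc.shape
    S = Xc.T @ Xc / n                          # sample covariance (p x p)
    Lam = rng.normal(scale=0.1, size=(p, k))   # random initial loadings
    psi = np.diag(S).copy()                    # initial idiosyncratic variances
    loglik = []                                # loglik[t] is evaluated before update t
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(psi)     # implied covariance of x
        _, logdet = np.linalg.slogdet(Sigma)
        # Gaussian log-likelihood at the current parameters (for monitoring)
        loglik.append(-0.5 * n * (p * np.log(2 * np.pi) + logdet
                                  + np.trace(np.linalg.solve(Sigma, S))))
        # E-step: E[z | x] = beta @ x, with beta = Lam^T Sigma^{-1} (k x p)
        beta = np.linalg.solve(Sigma, Lam).T
        # Average over i of E[z z^T | x_i]
        Ezz = np.eye(k) - beta @ Lam + beta @ S @ beta.T
        # M-step: closed-form updates for the loadings and the diagonal noise
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-6)  # floor keeps psi > 0
    return Lam, psi, loglik
```

Usage: `Lam, psi, ll = factor_analysis_em(X, k=3)`. The recorded log-likelihood `ll` should be non-decreasing across iterations, a standard property of EM that makes a convenient sanity check. The loadings are identified only up to an orthogonal rotation; this rotational indeterminacy is the kind of identification issue that factor models typically resolve by imposing constraints.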