Abstract
Statistical inference plays a crucial role in realizing large-scale intelligent systems that can learn safely and efficiently. This thesis studies two challenging problems of modern statistical inference.
In the first part, we consider statistical inference for estimation in online settings. Model parameter estimation through optimization is a classical problem in statistics and machine learning. Algorithms based on stochastic approximation, particularly stochastic gradient descent (SGD) and its variants, have emerged as the workhorses for solving such problems in modern statistics and machine learning. Despite SGD’s tremendous success in practical applications, one cost of the algorithm is the uncertainty of its solutions. A central aim of this thesis is to understand the variability inherent in these solutions and to perform practical statistical inference. We discuss both the theoretical aspects of inference and methods for conducting inference in practice. On the theoretical side, we study the limiting distribution, extending the classical asymptotic normality results for averaged SGD to the general case of weighted averaged SGD. Beyond the asymptotic distribution, we also study the concentration properties of SGD solutions under heavy-tailed noise. On the methodological side, we introduce an approach that estimates the limiting covariance matrix of SGD estimates in an online fashion and yields confidence intervals as a byproduct. When only confidence intervals are of interest, we further introduce a more computationally efficient way to construct them directly, without estimating the covariance matrix, which also enables testing at high confidence levels.
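As a minimal illustration of the averaged-SGD inference described above, the sketch below estimates a Gaussian mean with SGD, maintains the Polyak–Ruppert running average online, and forms a plug-in 95% confidence interval from the asymptotic normality of the average. The step-size schedule and the plug-in variance estimator are illustrative choices for this toy problem, not the specific procedures developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = 2.0
n = 20000

theta = 0.0       # current SGD iterate
theta_bar = 0.0   # Polyak-Ruppert running average of the iterates
s1 = s2 = 0.0     # online accumulators for a plug-in variance estimate

for t in range(1, n + 1):
    x = mu_true + rng.standard_normal()
    gamma = 0.5 * t ** -0.75              # Robbins-Monro step size c * t^{-alpha}
    theta -= gamma * (theta - x)          # SGD step on the loss 0.5 * (theta - x)^2
    theta_bar += (theta - theta_bar) / t  # update the running average online
    s1 += x
    s2 += x * x

# plug-in estimate of the asymptotic variance sigma^2 for this toy model
var_hat = s2 / n - (s1 / n) ** 2
half = 1.96 * np.sqrt(var_hat / n)        # 95% CI half-width, theta_bar ~ N(mu, sigma^2/n)
lo, hi = theta_bar - half, theta_bar + half
print(f"estimate {theta_bar:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

For this quadratic loss the averaged iterate attains the optimal asymptotic variance, which is what makes the simple normal-approximation interval above valid; the thesis's online covariance estimator plays the role of `var_hat` in general problems where no closed-form plug-in is available.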
In the second part, we consider the problem of goodness-of-fit (GoF) testing for parametric models. This testing problem involves a composite null hypothesis because the values of the model parameters are unknown. In some special cases, co-sufficient sampling (CSS) can remove the influence of these unknown parameters via conditioning on a sufficient statistic—often, the maximum likelihood estimator (MLE) of the unknown parameters. The recent approximate co-sufficient sampling (aCSS) framework replaces sufficiency with an approximately sufficient statistic (namely, a noisy version of the MLE), recovering power in a range of settings where CSS leads to a powerless test. However, aCSS can only be applied in settings where the unconstrained MLE is well defined and well behaved, which implicitly assumes a low-dimensional regime. We extend aCSS to the setting of constrained and penalized maximum likelihood estimation, so that more complex estimation problems, including those in high-dimensional settings, can now be handled within the aCSS framework.
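To make CSS concrete in the simplest case, the sketch below tests fit to the N(θ, 1) family: conditional on the sufficient statistic (the sample mean), the data are Gaussian on the conditioning hyperplane with covariance I − 11ᵀ/n, so exchangeable copies are just the mean plus centered standard normals. The kurtosis statistic and all function names are hypothetical illustrations, and this is plain CSS in a case where it is exact, not the aCSS extension developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

def css_copies(x, n_copies):
    """Co-sufficient copies for the N(theta, 1) model: conditional on the
    sample mean m, X is Gaussian on {v : mean(v) = m} with covariance
    I - 11^T/n, i.e. m plus row-centered standard normals."""
    m = x.mean()
    z = rng.standard_normal((n_copies, x.size))
    return m + (z - z.mean(axis=1, keepdims=True))

def conditional_pvalue(x, stat, n_copies=999):
    """Monte Carlo p-value: rank stat(x) among the co-sufficient copies."""
    t_obs = stat(x)
    t_copies = np.array([stat(c) for c in css_copies(x, n_copies)])
    return (1 + np.sum(t_copies >= t_obs)) / (1 + n_copies)

# sample kurtosis: an illustrative statistic sensitive to heavy tails
kurt = lambda v: np.mean(((v - v.mean()) / v.std()) ** 4)

x_null = rng.standard_normal(100) + 3.0       # truly N(3, 1): null holds
x_alt = rng.standard_t(df=2, size=100) + 3.0  # heavy-tailed alternative
p_null = conditional_pvalue(x_null, kurt)
p_alt = conditional_pvalue(x_alt, kurt)
print(f"p-value under null: {p_null:.3f}, under alternative: {p_alt:.3f}")
```

Because the conditioning removes the unknown mean exactly, the p-value is valid regardless of θ; the aCSS framework targets models where such exact conditional sampling is intractable and only a noisy, approximately sufficient statistic is available.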