Abstract

Exchangeability and sparsity are fundamental concepts in statistical modeling. The former requires that a model or procedure satisfy a certain symmetry, probabilistic or operational in nature; the latter implies that the underlying features are mostly null, with a small fraction of exceptions, and can be difficult to reconcile with exchangeability in complex data structures.

Chapter 2 discusses joint work with Professor Peter McCullagh on modeling exchangeable random graphs. Compatible notions of exchangeability and sparsity are motivated by the Ewens process (see, e.g., Crane (2016) and Tavaré (2021)), which is exchangeable and sparse when viewed as a sequence of distributions on random permutations. The Permanental Graph Model (PGM) is proposed as a generalization of the random permutation to a directed random graph, characterized mathematically by its normalization constant and degree distribution. A negative result is established: no setting of the PGM parameters yields an exchangeable random graph process.

Chapters 3 and 4 discuss joint work with Professors Chao Gao and Peter McCullagh on sparse signal detection. Procedures are developed to detect sparse alternatives to a global null for matrix and sequence data with independent Gaussian errors. For a signal that is sparse in the sense of McCullagh and Polson (2018), an identification boundary determines the number of independent samples required for the signal to be identifiable in a sparse limiting sense. There is a close relationship with the detection-boundary literature, which studies the problem of discriminating between two product distributions in the large-sample limit.

Chapter 5 discusses joint work with Jake Soloff and Professor Will Fithian on methodology for controlling the boundary false discovery rate (bFDR). Theory for local false discovery rates is developed, providing a frequentist interpretation of Bayes-motivated procedures in a model where the effect sizes of individual studies are fixed and unknown. The local false discovery rate (lfdr) is traditionally defined as a posterior probability when effects are random, but it can be interpreted more generally as the expected proportion of nulls among hypotheses with similar test statistics. This concept motivates a new frequentist Type I error criterion, the bFDR, which describes the rate of false discoveries near the rejection threshold and is a local analogue of the FDR of Benjamini and Hochberg (1995). We discuss bFDR control under relaxed null assumptions and demonstrate the main ideas on a dataset of "nudges" from the behavioral psychology literature.
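As a point of reference, the false discovery quantities above admit standard formulations (these are textbook definitions, not the specific constructions of the thesis). In the usual two-groups model, a fraction pi_0 of hypotheses are null with test-statistic density f_0 and the remainder are non-null with density f_1, so the marginal density is f(z) = pi_0 f_0(z) + (1 - pi_0) f_1(z) and the local false discovery rate at an observed value z is

    lfdr(z) = Pr(null | Z = z) = pi_0 f_0(z) / f(z).

The FDR of Benjamini and Hochberg (1995) is E[V / max(R, 1)], the expected proportion of false rejections among all rejections; the bFDR of Chapter 5 localizes this quantity to rejections near the threshold.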
