Hypothesis testing for large-scale data: Enhancing reliability and efficiency

Tue January 26th 2021, 4:30pm
Yinqiu He, University of Michigan

Abstract: In scientific research that involves large-scale data, researchers often start with questions regarding the global properties of a large set of measurements. For instance, is a group of related genes in the same functional pathway jointly associated with a trait of interest? Such questions can be formulated as hypothesis testing problems that globally examine a large number of parameters in a high-dimensional joint distribution. Examples include hypothesis testing on mean vectors, covariance matrices, and regression coefficients. To extract informative scientific knowledge from abundant data, reliability and efficiency are among the major concerns in statistical inference.

In this talk, I will address particular reliability and efficiency issues arising from jointly testing a large number of parameters. First, I will discuss how reliable the popular likelihood ratio tests (LRTs) are in terms of type I error control for high-dimensional data. I will provide theoretical insights into the reliability of the LRTs in a variety of problems, based on phase transition results for the foundational Wilks’ theorem. Next, to improve the efficiency of existing testing procedures in high-dimensional settings, I will introduce a new adaptive testing framework that can maintain high statistical power against a wide range of alternative hypotheses. The proposed framework is based on a family of U-statistics that are constructed to capture the information in different directions in high-dimensional spaces. For a broad class of problems, we establish high-dimensional asymptotic theory for the U-statistics and develop adaptive testing procedures that are statistically powerful in a wide variety of scenarios.
