Generalization beyond observations: A distributional perspective
Generative AI has achieved remarkable success across many domains, but its potential for addressing statistical challenges remains underexplored. This talk focuses on generalization beyond the observed data distribution, covering problems such as extrapolation, distribution shifts, and causal inference, all of which require going beyond what has been directly observed. We propose tackling such problems from a distributional perspective: instead of fitting only low-dimensional summaries such as conditional means, we estimate the entire distribution of the observed data. While natural from an identifiability standpoint, this approach has been underutilized in estimation. We introduce engression, a distributional learning method that combines the flexibility of generative models with conceptual simplicity. Under various structural settings, we show how engression can be adapted to out-of-support covariate shifts, conditional distribution shifts, and causal effect estimation. By leveraging tools from generative modeling, we aim to demonstrate how estimating the full distribution yields better generalization.
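As a minimal illustration of the distributional idea (a toy sketch, not the engression implementation presented in the talk), the snippet below scores model draws with an energy-score-style loss: a model that samples from the full conditional distribution achieves a lower loss than one that collapses to the conditional mean. The toy data-generating model Y = X + eps, the draw count `m`, and all function names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_loss(y, draws):
    """Energy-score loss for one observation y given an array of model draws.

    The first term rewards draws that land close to y; the second rewards
    spread among the draws, so matching the whole conditional distribution
    (not just its mean) minimizes the loss in expectation.
    """
    closeness = np.mean(np.abs(draws - y))
    spread = 0.5 * np.mean(np.abs(draws[:, None] - draws[None, :]))
    return closeness - spread

# Toy data: Y = X + eps with standard normal noise (illustrative assumption)
n, m = 2000, 64
x = rng.uniform(-1.0, 1.0, size=n)
y = x + rng.normal(size=n)

# Distributional model: draws x + eps, matching the true conditional law
loss_dist = np.mean([energy_loss(y[i], x[i] + rng.normal(size=m))
                     for i in range(n)])
# Mean-only model: a point mass at the conditional mean E[Y | X = x] = x
loss_mean = np.mean([energy_loss(y[i], np.full(m, x[i]))
                     for i in range(n)])

print(f"distributional model: {loss_dist:.3f}")
print(f"mean-only model:      {loss_mean:.3f}")
assert loss_dist < loss_mean
```

The gap between the two losses is the point of the distributional perspective: a proper scoring rule for distributions strictly prefers the model that captures the conditional noise over the one that only fits the conditional mean.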
The talk is based on joint work with Nicolai Meinshausen, Alex Henzi, Michael Law, Peter Bühlmann, Anastasiia Holovchak, and Sorawit Saengkyongam.