Optimal transport in statistics and Pitman efficient multivariate distribution-free testing

Thu February 3rd 2022, 4:30pm
Nabarun Deb, Columbia

In recent years, the problem of optimal transport (see, e.g., Villani, 2003) has received significant attention in statistics and machine learning due to its powerful geometric properties. In this talk, we introduce the optimal transport problem and present concrete applications of this theory in statistics. In particular, we propose a general framework for distribution-free nonparametric testing in multiple dimensions, based on a notion of "multivariate ranks" defined using the theory of optimal transport. We demonstrate the applicability of this approach by constructing exactly distribution-free tests for two classical nonparametric problems: (i) testing for the equality of two multivariate distributions, and (ii) testing for mutual independence between two random vectors. We investigate the consistency and asymptotic distributions of these tests, under both the null and local contiguous alternatives. We further study their local power and asymptotic (Pitman) efficiency, and show that a subclass of these tests achieves attractive efficiency lower bounds that mimic the remarkable efficiency results of Hodges and Lehmann (1956) and Chernoff and Savage (1958). To the best of our knowledge, this is the first collection of multivariate exactly distribution-free tests that provably achieves such attractive efficiency lower bounds.
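
As a rough illustration of the rank construction described above, the following Python sketch assigns each pooled data point to a distinct point of a fixed reference grid by solving an empirical optimal transport (assignment) problem, then compares the two samples through their ranks. Several choices here are assumptions made for illustration and are not taken from the abstract: the reference grid is a set of i.i.d. uniform draws, the test statistic is an energy distance computed on the ranks, and the names `ot_ranks`, `energy_statistic`, and `rank_energy` are hypothetical. Because the ranks of an i.i.d. sample from a continuous distribution form a uniformly random permutation of the fixed grid, the null distribution can be simulated without reference to the data-generating distribution.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def ot_ranks(points, grid):
    """Map each data point to a distinct grid point, minimizing total squared distance."""
    cost = cdist(points, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    ranks = np.empty_like(grid, dtype=float)
    ranks[rows] = grid[cols]
    return ranks


def energy_statistic(a, b):
    """Sample energy distance between two point clouds (V-statistic form)."""
    return 2 * cdist(a, b).mean() - cdist(a, a).mean() - cdist(b, b).mean()


def rank_energy(x, y, grid):
    """Energy distance between the OT-based ranks of the two samples (illustrative choice)."""
    ranks = ot_ranks(np.vstack([x, y]), grid)
    return energy_statistic(ranks[: len(x)], ranks[len(x):])


rng = np.random.default_rng(0)
m, n, d = 40, 40, 2

# Fixed reference points, chosen independently of the data (assumption: uniform draws).
grid = rng.uniform(size=(m + n, d))

x = rng.normal(size=(m, d))
y = rng.normal(loc=1.0, size=(n, d))  # shifted alternative
stat = rank_energy(x, y, grid)

# Under the null, the pooled ranks are a uniformly random permutation of the grid,
# so the null distribution depends only on the grid and the sample sizes.
null = []
for _ in range(200):
    perm = rng.permutation(m + n)
    null.append(energy_statistic(grid[perm[:m]], grid[perm[m:]]))

p_value = (1 + sum(s >= stat for s in null)) / (1 + len(null))
print(f"statistic = {stat:.4f}, Monte Carlo p-value = {p_value:.4f}")
```

The simulated null values depend only on the grid and on `m` and `n`, so in this sketch they could be computed once and reused for any data set of the same size.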

Finally, we study the rates of convergence of estimated optimal transport maps, which are of pivotal importance in applications such as generative modeling and domain adaptation. We will show that the natural plug-in estimators for these maps achieve minimax optimal rates of convergence without any tuning parameters.
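
A minimal sketch of one possible plug-in estimator follows: the optimal transport map between the two empirical distributions, obtained by solving an assignment problem between equal-size samples. The setup is an assumption chosen so that the true map is known: the target is a translate of a Gaussian source, for which the optimal map under squared Euclidean cost is the translation itself. The function name `plugin_ot_map` is hypothetical, and the estimator analyzed in the talk may differ, for example in how it extends the map beyond the sample points.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def plugin_ot_map(source, target):
    """Empirical OT map between equal-size samples: source[i] -> target[sigma(i)]."""
    cost = cdist(source, target, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)  # no tuning parameters involved
    mapped = np.empty_like(target, dtype=float)
    mapped[rows] = target[cols]
    return mapped


rng = np.random.default_rng(1)
shift = np.array([2.0, -1.0])  # true map is x -> x + shift (assumed setup)

errors = {}
for n in (50, 200, 400):
    source = rng.normal(size=(n, 2))
    target = rng.normal(size=(n, 2)) + shift
    mapped = plugin_ot_map(source, target)
    # Mean squared error of the estimated map at the source sample points.
    errors[n] = float(np.mean(np.sum((mapped - (source + shift)) ** 2, axis=1)))

for n, err in errors.items():
    print(f"n = {n:4d}, in-sample squared error = {err:.4f}")
```

The printed errors give only an informal impression of how the estimate behaves as the sample size grows; they come from a single random draw per sample size and are not a substitute for the rates established in the talk.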