Blessings and curses of overparameterization: Insights from high-dimensional statistics
Overparameterized deep neural nets generalize well without significant effort on the algorithmic front: no explicit regularization or early stopping is needed, and off-the-shelf gradient-descent methods on standard losses suffice. At the same time, the same models and methods perform poorly on fairness criteria in the presence of under-represented classes or groups with sensitive attributes.
In the first part of the talk we unpack the empirically observed virtue that "overparameterized models generalize well." Understanding this phenomenon poses a new challenge to modern learning theory, as it contradicts classical statistical wisdom. We shed light on key questions: When do interpolating solutions generalize well, and why? How sensitive is performance to the choice of optimization algorithm?
In the second part, focusing on imbalanced datasets, we caution that "standard fairness-promoting algorithms are inefficient under overparameterization." Guided by formal optimization and statistical insights, we design alternative algorithms that provably improve the fairness-accuracy tradeoff. We also present experiments on benchmark datasets that are fully consistent with our theoretical insights and confirm the superior performance of our algorithms.
Our results embody three key insights of overparameterized learning: (i) A hierarchy of model abstractions revealing that linear models imitate, and thus give valuable intuition about, deep-learning practice. (ii) Gradient-descent methods converge to favorable interpolating solutions, characterized in terms of norms induced by the underlying geometry. (iii) Tools from high-dimensional statistics go a long way toward explaining newly discovered phenomena, such as double-descent curves, the proliferation of support vectors, and subtle tradeoffs between fairness and accuracy.
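A minimal sketch of insight (ii) in the simplest setting, not taken from the talk itself: on an overparameterized least-squares problem (more features than samples), plain gradient descent initialized at zero not only interpolates the training data but converges to the minimum ell-2 norm interpolating solution, i.e., the pseudoinverse solution. The dimensions, step size, and iteration count below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer samples than features: interpolation is possible
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                     # zero initialization keeps iterates in the row space of X
step = 1.0 / np.linalg.norm(X, ord=2) ** 2   # safe step size: 1 / Lipschitz constant of the gradient
for _ in range(20000):
    w -= step * X.T @ (X @ w - y)   # gradient of 0.5 * ||Xw - y||^2

w_min_norm = np.linalg.pinv(X) @ y  # minimum ell-2 norm interpolator

print(np.max(np.abs(X @ w - y)))        # tiny: training data is interpolated
print(np.linalg.norm(w - w_min_norm))   # tiny: GD selected the min-norm solution
```

The zero initialization is what makes the implicit bias visible here: every gradient step lies in the span of the rows of X, so the limit must be the interpolator of smallest Euclidean norm.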