Classification under misspecification: Halfspaces, generalized linear models, and connections to evolvability
Abstract: In this paper we revisit some classical problems in classification under complex noise models and misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate η. In recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first computationally efficient learner that achieves error η+ε for any ε>0. However, their algorithm outputs a complicated hypothesis that partitions space into poly(d,1/ε) regions. Here, we give a simpler algorithm and, in the process, resolve a number of outstanding open questions:
- We give the first proper learner for Massart halfspaces that achieves error η+ε. We also give improved bounds on the sample complexity achievable by polynomial-time algorithms.
- Building on the first result, we develop a black-box knowledge distillation procedure which converts an arbitrarily complex classifier into an equally good proper classifier.
- By leveraging a simple but overlooked connection to evolvability, we show that any statistical query (SQ) algorithm requires super-polynomially many queries to achieve error OPT+ε.
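To make the noise model concrete: under Massart noise with rate η, each label of the halfspace sign(w · x) is flipped independently with some point-dependent probability η(x) ≤ η chosen adversarially. The sketch below generates such data; the dimension, sample size, and the particular choice of η(x) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 10000, 0.1  # dimension, sample size, Massart bound (assumed values)

# Unknown target halfspace and standard Gaussian examples.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
X = rng.standard_normal((n, d))
clean = np.sign(X @ w)  # noiseless halfspace labels in {-1, +1}

# Massart (bounded) noise: each point x gets its own flip probability
# eta_x <= eta. The adversary may choose eta_x arbitrarily subject to the
# bound; the function below is just one illustrative choice.
eta_x = eta * np.abs(np.sin(X[:, 0]))
flip = rng.random(n) < eta_x
Y = np.where(flip, -clean, clean)
```

Note that because η(x) can vary with x, the noise is not independent of the example, which is what separates Massart noise from the easier random classification noise setting (where η(x) ≡ η).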
Moreover, we study classification under generalized linear models where E[Y | X] = σ(w · X) for any odd, monotone, and Lipschitz function σ. This family includes the previously mentioned halfspace models as a special case, but also includes models like logistic regression where the noise level increases near the decision boundary. We introduce a challenging new corruption model that generalizes Massart noise, and give a general algorithm for learning in this setting. Finally, we study our algorithm for learning halfspaces under Massart noise empirically and find that it exhibits some appealing fairness properties.
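As a concrete instance of the GLM family above with labels in {-1, +1}: taking σ(z) = tanh(z/2), which is odd, monotone, and 1-Lipschitz, recovers logistic regression, since E[Y | X] = 2·sigmoid(w · X) − 1 = tanh((w · X)/2). The sketch below (with assumed dimension and sample size) generates data from this model; note the label noise is largest where σ(w · X) ≈ 0, i.e. near the decision boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 10000  # assumed illustrative values
w = rng.standard_normal(d)
X = rng.standard_normal((n, d))

# Odd, monotone, 1-Lipschitz link function; tanh(z/2) corresponds to
# logistic regression with labels in {-1, +1}.
def sigma(z):
    return np.tanh(z / 2)

# E[Y | X] = sigma(w . X) translates to P[Y = +1 | X] = (1 + sigma(w . X)) / 2.
p_plus = (1 + sigma(X @ w)) / 2
Y = np.where(rng.random(n) < p_plus, 1, -1)
```

A halfspace with Massart noise is the special case where σ is a (non-Lipschitz) step-like link whose deviation from ±1 is bounded by 2η, which is why the GLM setting strictly generalizes it.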