Beyond NTK: A mean-field analysis of neural networks with polynomial width, samples, and time

Tue November 7th 2023, 4:30pm
Sloan 380C
Tengyu Ma, Stanford Computer Science

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it remains an open question whether gradient descent on neural networks, without unnatural modifications, can achieve better sample complexity than kernel methods. I will present a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Unlike prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size n = O(d^{3.1}), where d is the input dimension, the network trained with projected gradient flow converges in poly(d) time to a non-trivial error that is not achievable by kernel methods using n << d^4 samples, hence demonstrating a clear separation between unmodified gradient descent and NTK. As a corollary, we show that projected gradient descent with a positive learning rate and a polynomial number of iterations converges to low error with the same sample complexity.
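To make the setting concrete, here is a minimal, illustrative sketch of projected gradient descent on a two-layer ReLU network in mean-field (1/m) scaling, where each neuron is projected back onto the unit sphere after every gradient step. The width, sample size, toy target function, and step size below are hypothetical choices for illustration only, not the regime or the precise setup analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 10, 256, 2000            # input dim, width, sample size (toy values)
lr, steps = 0.02, 500              # step size and iteration count (toy values)

X = rng.standard_normal((n, d))
y = np.maximum(X[:, 0], 0.0)       # toy target: a single ReLU neuron

W = rng.standard_normal((m, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # initialize on the unit sphere
a = rng.choice([-1.0, 1.0], size=m)             # fixed second-layer signs


def predict(W):
    # Mean-field network: f(x) = (1/m) * sum_i a_i * relu(w_i . x)
    return np.maximum(X @ W.T, 0.0) @ a / m


loss0 = 0.5 * np.mean((predict(W) - y) ** 2)    # squared loss at initialization

for _ in range(steps):
    pre = X @ W.T                                # preactivations, shape (n, m)
    resid = np.maximum(pre, 0.0) @ a / m - y     # residuals, shape (n,)
    # Gradient of the squared loss w.r.t. W; (pre > 0) is the ReLU derivative.
    grad = ((pre > 0) * resid[:, None] * a).T @ X / (n * m)
    W = W - lr * m * grad                        # step (lr scaled by width m)
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # project back onto sphere

loss = 0.5 * np.mean((predict(W) - y) ** 2)      # squared loss after training
```

The projection step keeps every neuron on the unit sphere, so only neuron directions adapt during training; this is the sense in which the dynamics are "projected" gradient descent rather than unconstrained gradient descent.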

No prior knowledge of mean-field analysis is assumed. This is joint work with Arvind Mahankali, Jeff Z. HaoChen, Kefan Dong, and Margalit Glasgow.