The high-dimensional asymptotics of principal component regression

Date
Tue July 30th 2024, 4:30pm
Location
Sloan 380C
Speaker
Elad Romanov, Stanford Statistics

Principal component regression (PCR) is a classical two-step approach to linear regression: one first reduces the dimension of the data by projecting onto its leading principal components (PCs), and then performs ordinary least squares regression on the projected data. We study PCR in an asymptotic high-dimensional regression setting, where the number of data points is proportional to the dimension. We derive exact limiting formulas for the estimation and prediction risks, which depend in a complicated manner on the eigenvalues of the population covariance, the alignment between the population PCs and the true signal, and the number of selected PCs. A key challenge is that in this regime the sample covariance is an inconsistent estimator of its population counterpart, so the sample PCs may fail to fully capture potential latent low-dimensional structure in the data. We demonstrate this point through several case studies, including that of a spiked covariance model.
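
For readers unfamiliar with the procedure, the two steps of PCR described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the speaker's code: the function name `pcr_fit`, the simulated isotropic Gaussian design, the noise level, and the choice to skip centering (i.e. treating the data as mean zero) are all assumptions made for the example.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Hypothetical helper: PCR coefficient estimate using the top-k sample PCs.

    Assumes the columns of X are (approximately) mean zero, so no centering is done.
    """
    n, p = X.shape
    S = X.T @ X / n                        # sample covariance (uncentered)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]            # top-k sample PCs as columns, shape (p, k)
    Z = X @ V                              # step 1: project data onto the PCs
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)  # step 2: OLS in k dimensions
    return V @ gamma                       # map coefficients back to R^p


# Simulated data with n proportional to p (illustrative values only).
rng = np.random.default_rng(0)
n, p, k = 200, 100, 5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

beta_hat = pcr_fit(X, y, k)
estimation_error = np.sum((beta_hat - beta) ** 2)
```

By construction the estimate lies in the span of the selected sample PCs, and when k equals p (with n larger than p) the procedure reduces to ordinary least squares on the original data.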

This is joint work with Alden Green.