A new model selection criterion for high-dimensional PCA

Tue April 27th 2021, 4:30pm
Soumendu Sundar Mukherjee, Indian Statistical Institute

Abstract: We consider the problem of estimating the number of principal components in the high-dimensional asymptotic regime where $p$, the number of variables, grows at the same rate as $n$, the number of observations, i.e., $p/n \rightarrow c \in (0, \infty)$. Under the spiked covariance model of Johnstone (2001), the Akaike Information Criterion (AIC) is known to be strongly consistent (Bai et al., 2018), but only under a certain "gap condition": the dominant population eigenvalues must lie above a threshold depending on $c$ that is strictly larger than the BBP threshold $1 + \sqrt{c}$. Below the BBP threshold, a spiked covariance structure becomes indistinguishable from one with no spikes (Baik et al., 2005). We show how to modify the penalty term of AIC to yield a strongly consistent estimator under an arbitrarily small "gap", i.e., when the dominant population eigenvalues exceed the BBP threshold by an arbitrarily small amount $\delta > 0$. We also propose another intuitive alteration of the penalty which results in a weakly consistent estimator under exactly zero gap, i.e., whenever the dominant population eigenvalues lie above the BBP threshold. We empirically compare the proposed estimators with other existing estimators in the literature.

This is based on joint work with Abhinav Chakraborty (University of Pennsylvania) and Arijit Chakrabarti (Indian Statistical Institute).
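
For readers who want a feel for the BBP threshold mentioned in the abstract, the following is a minimal simulation sketch. It is not the AIC-based estimator from the talk. It only illustrates the standard random-matrix facts behind it, assuming Gaussian data with known zero mean and unit noise variance: sample eigenvalues of a null covariance concentrate below the bulk edge $(1 + \sqrt{c})^2$, while a population spike $\ell > 1 + \sqrt{c}$ produces a sample eigenvalue near $\ell \, (1 + c/(\ell - 1))$. All function names and parameter values below are illustrative choices, not taken from the talk.

```python
import numpy as np


def bbp_threshold(c):
    """Population-eigenvalue threshold 1 + sqrt(c) for the ratio c = p/n."""
    return 1.0 + np.sqrt(c)


def bulk_upper_edge(c):
    """Upper edge (1 + sqrt(c))^2 of the Marchenko-Pastur bulk (unit noise variance)."""
    return (1.0 + np.sqrt(c)) ** 2


def spiked_limit(ell, c):
    """Limit of the sample eigenvalue for a population spike ell > 1 + sqrt(c)."""
    return ell * (1.0 + c / (ell - 1.0))


def sample_eigenvalues(n, p, spikes, rng):
    """Eigenvalues (descending) of the sample covariance of n Gaussian
    observations whose population covariance is diagonal, with the given
    spikes followed by ones. Assumes the mean is known to be zero."""
    pop = np.ones(p)
    pop[: len(spikes)] = spikes
    # Scale each column by the square root of its population variance.
    X = rng.standard_normal((n, p)) * np.sqrt(pop)
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)[::-1]


# Illustrative values (assumptions): c = 0.5 with two spikes above the threshold.
n, p = 2000, 1000
c = p / n
spikes = [6.0, 4.0]
rng = np.random.default_rng(0)
eigs = sample_eigenvalues(n, p, spikes, rng)

print("BBP threshold:", bbp_threshold(c))
print("bulk upper edge:", bulk_upper_edge(c))
print("predicted spike locations:", [spiked_limit(s, c) for s in spikes])
print("top three sample eigenvalues:", eigs[:3])
```

In this sketch the two largest sample eigenvalues should separate clearly from the bulk, while the third sits near the bulk edge. Spikes only slightly above $1 + \sqrt{c}$ separate far less cleanly at finite $n$, which is the regime the modified penalties in the abstract are designed for.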

Zoom Recording [SUNet/SSO authentication required]