Asymptotic properties of high-dimensional random forests

Tue May 25th 2021, 4:30pm
Yingying Fan, USC Marshall School of Business

Abstract: As a flexible nonparametric learning tool, random forests has been widely applied to various real applications with appealing empirical performance, even in the presence of a high-dimensional feature space. Efforts to unveil the underlying mechanisms have led to some important recent theoretical results on the consistency of the random forests algorithm and its variants. However, to our knowledge, all existing work on random forests consistency in the high-dimensional setting has been done for various modified random forests models in which the splitting rules are independent of the response. In light of this, in this paper we derive the consistency rates for the original version of the random forests algorithm in a general high-dimensional nonparametric regression setting through a bias-variance decomposition analysis. Our new theoretical results show that random forests can indeed adapt to high dimensionality. In particular, we investigate in depth the conditions under which random forests controls the bias. Furthermore, our bias analysis characterizes explicitly how the random forests bias depends on the sample size, tree height, and column subsampling parameter. Some limitations of our current results are also discussed.

This is joint work with Chien-Ming Chi, Jinchi Lv, and Patrick Vossler.

Zoom Recording [SUNet/SSO authentication required]