Speaker: Lihua Lei, Stanford Statistics
Abstract: Recent progress in machine learning provides us with myriad powerful tools for prediction. When they are deployed for high-stakes decision-making, it is also crucial to have valid uncertainty quantification, which is challenging for complex predictive algorithms. The challenge is more pronounced in situations where the predicted targets are not fully observed in the data. This talk introduces conformal inference-based approaches to generate calibrated prediction intervals for two types of partially observed outcomes: (1) counterfactuals characterized by potential outcomes, observable only for those in a particular treatment arm, and (2) time-to-event outcomes, observable only for those whose event has occurred. When the missing data mechanism is known, as in randomized experiments, both approaches achieve the desired coverage in finite samples without any assumption on the distribution of the outcome conditional on the covariates or on the accuracy of the predictive algorithm. When the missing data mechanism is unknown, both approaches satisfy a doubly robust coverage guarantee. We demonstrate on both simulated and real datasets that our prediction intervals are calibrated and relatively tight.
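
Illustration (not part of the abstract): for readers new to the idea, the counterfactual case with a known missing data mechanism can be sketched with weighted split conformal prediction. This is a minimal sketch under stated assumptions, not the speaker's implementation: the simulated data, the linear model, the absolute-residual score, the 1/e(x) weighting toward the full population, and all function names (weighted_conformal_quantile, counterfactual_interval, propensity) are assumptions made for the example.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest score whose normalized cumulative weight reaches 1 - alpha.

    The test point carries mass at +infinity, so the result is np.inf when
    the calibration points alone cannot reach level 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / (weights.sum() + test_weight)
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return np.inf if idx >= len(scores) else scores[order][idx]

def counterfactual_interval(x_new, beta, X_cal, y_cal, propensity, alpha=0.1):
    """Interval for Y(1) at x_new, calibrated on held-out treated units."""
    scores = np.abs(y_cal - X_cal @ beta)          # absolute residuals
    weights = 1.0 / propensity(X_cal)              # assumed weight: 1 / e(x)
    w_new = 1.0 / propensity(x_new[None, :])[0]
    q = weighted_conformal_quantile(scores, weights, w_new, alpha)
    center = x_new @ beta
    return center - q, center + q

# Simulated randomized experiment (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])

def propensity(X):
    # Treatment probability, known by design; takes values in (0.3, 0.7).
    return 0.3 + 0.4 / (1.0 + np.exp(-X[:, 1]))

T = rng.random(n) < propensity(X)
Y1 = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)      # potential outcome Y(1)

# Y(1) is observed only in the treated arm; split it into train / calibration.
X_t, y_t = X[T], Y1[T]
half = len(y_t) // 2
beta, *_ = np.linalg.lstsq(X_t[:half], y_t[:half], rcond=None)

lo, hi = counterfactual_interval(
    np.array([1.0, 0.5]), beta, X_t[half:], y_t[half:], propensity, alpha=0.1
)
print(f"90% interval for Y(1) at x = 0.5: [{lo:.2f}, {hi:.2f}]")
```

With constant weights the quantile step reduces to ordinary split conformal prediction. The reweighting only corrects for the difference between the covariate distribution of the treated units and that of the target population, which is why a known treatment assignment mechanism is the key input here rather than an accurate outcome model.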