Root cause discovery

Date
Tue July 8th 2025, 4:00pm
Location
CoDa E160
Speaker
Jinzhou Li, Stanford Statistics

Although the statistical literature on causality has largely focused on forward causal problems concerning the effects of causes, reverse causal questions about identifying the causes of effects are equally important. In this talk, we discuss one such reverse causal question, known as root cause discovery, which aims to identify the root cause of an observed effect. This work is motivated by the problem of identifying the disease-causing gene (i.e., the root cause) in a patient affected by a monogenic disorder using the gene expression data of healthy individuals as reference. We consider a linear structural equation model where the causal ordering is unknown. We first show that simply comparing marginal squared z-scores cannot identify the root cause in general. We then prove, without additional assumptions, that the root cause is identifiable even when the causal ordering is not. Two key ingredients of this identifiability result are the use of permutations and Cholesky decomposition, which allow us to exploit an invariant property across different permutations to discover the root cause. Furthermore, we characterize permutations that yield the correct root cause and, based on this, propose a valid method for root cause discovery. We also adapt this approach to high-dimensional settings. Finally, we evaluate the performance of our methods through simulations and apply the high-dimensional method to identify disease-causing genes in the gene expression dataset that motivates this work.

This is based on joint work with Benjamin Chu, Ines Scheller, Julien Gagneur and Marloes Maathuis.
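
To illustrate two claims from the abstract, here is a minimal sketch. It is not the speakers' method. It assumes a hypothetical three-variable linear structural equation model with unit noise variances, a known causal ordering, population (not estimated) mean and covariance, and a noiseless patient whose only aberration is a shift `delta` on the root cause. The matrix `B`, the shift size, and all variable names are illustrative assumptions. The sketch shows that the largest marginal squared z-score can land on a downstream variable, while whitening with the Cholesky factor of the covariance, taken in the correct causal order, isolates the shifted variable. The talk addresses the harder case where the ordering is unknown, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical linear SEM X = B X + e in causal order X0 -> X1 -> X2,
# with an extra edge X0 -> X2 that cancels X0's influence on X2.
B = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0],   # X1 = 3*X0 + e1
    [-3.0, 1.0, 0.0],  # X2 = X1 - 3*X0 + e2
])
p = B.shape[0]
I = np.eye(p)

# Population covariance of healthy samples (unit noise variances assumed).
A = np.linalg.inv(I - B)
Sigma = A @ A.T

# Patient: shift of size delta on the root cause X1; noise set to zero for clarity.
root_cause = 1
delta = 5.0
shift = np.zeros(p)
shift[root_cause] = delta
x_patient = A @ shift  # observational mean is zero in this model

# Marginal squared z-scores: each variable standardized on its own.
marginal_z2 = x_patient**2 / np.diag(Sigma)

# Cholesky whitening in the (assumed known) causal order:
# Sigma = L L^T, and L^{-1} x recovers the standardized noise plus the shift.
L = np.linalg.cholesky(Sigma)
chol_scores = np.linalg.solve(L, x_patient) ** 2

print("marginal squared z-scores:", marginal_z2)
print("Cholesky-whitened squared scores:", chol_scores)
print("argmax marginal:", int(np.argmax(marginal_z2)))
print("argmax Cholesky:", int(np.argmax(chol_scores)))
```

In this constructed example the root cause `X1` has a large variance inherited from `X0`, so its marginal z-score is deflated, whereas the downstream variable `X2` has a small variance and receives the full shift. The whitened scores are nonzero only at the shifted variable because, under the correct ordering, the Cholesky factor coincides with the map from noise to observations.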