A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics
Genome-wide association studies (GWAS) have shown that almost any trait is affected by many variants of relatively small effect. On the one hand, this presents a challenge for inferring the effect of any single variant, because the signal-to-noise ratio is low for variants of small effect. This challenge is compounded when information is combined across many variants in polygenic scores (PGSs) for predicting trait values. On the other hand, the large number of contributing variants provides an opportunity to learn about the average behavior of variants, as encoded in the distribution of variant effect sizes. We present a flexible, unifying framework that combines information across variants to infer a distribution of effect sizes and uses this distribution to improve the estimation of the effects of individual variants. We also develop a variational inference (VI) scheme to perform efficient inference under this framework. We demonstrate the framework's utility by constructing PGSs that outperform an existing method. Our modeling framework extends easily to jointly inferring effect sizes across multiple cohorts, and we show that building PGSs using additional cohorts of differing ancestries improves predictive accuracy and portability. We also investigate the inferred distributions of effect sizes across many traits and find that effect sizes under these distributions range over multiple orders of magnitude.