Calibration methods for generative models with application to protein structure modeling
Generative models frequently suffer from miscalibration, in which the statistics of their generations deviate from desired values. Protein structure diffusion models, for example, produce highly realistic samples but often underestimate the probabilities of important modes. These deviations are not addressed by existing fine-tuning methods for image and text generation.
We frame calibration of generative models as a constrained optimization problem that we seek to solve by fine-tuning. Because the natural objective is intractable, we introduce two surrogate objectives for which we can compute low-variance gradient estimates amenable to stochastic optimization. The resulting procedures remove most of the calibration error across hundreds of simultaneous constraints and for models with up to nine billion parameters. Lastly, we describe an application that assimilates thermodynamic measurements to calibrate a generative model of protein structure Boltzmann ensembles.
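
To make the constrained-optimization framing concrete, one way such a problem might be written is sketched below. The notation is an assumption introduced here for illustration and is not taken from the text above: $p_\theta$ denotes the fine-tuned model, $p_{\theta_0}$ the pretrained model, $h_i$ a statistic of a sample $x$, $h_i^*$ its desired value, and $m$ the number of constraints. Using a KL divergence to keep the calibrated model close to the pretrained one is likewise an assumed choice. The penalized form in the last line is a generic relaxation shown only as an example of a surrogate; it is not claimed to be either of the two surrogate objectives mentioned above.

```latex
% Assumed notation: p_theta (fine-tuned model), p_{theta_0} (pretrained model),
% h_i (statistic), h_i^* (target value), m (number of constraints).

% Calibration as a constrained problem: stay close to the pretrained model
% while matching every target statistic.
\begin{aligned}
  \min_{\theta}\quad & D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta_0}\right) \\
  \text{subject to}\quad & \mathbb{E}_{x \sim p_\theta}\!\left[h_i(x)\right] = h_i^{*},
  \qquad i = 1, \dots, m .
\end{aligned}

% One natural measure of calibration error under this notation.
\mathrm{err}(\theta) \;=\;
  \left\lVert \mathbb{E}_{x \sim p_\theta}\!\left[h(x)\right] - h^{*} \right\rVert_2 ,
\qquad h = (h_1, \dots, h_m) .

% A generic quadratic-penalty relaxation with weight lambda > 0
% (illustrative only; not necessarily one of the two surrogates).
\mathcal{L}_{\lambda}(\theta) \;=\;
  D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta_0}\right)
  \;+\; \frac{\lambda}{2}\,
  \left\lVert \mathbb{E}_{x \sim p_\theta}\!\left[h(x)\right] - h^{*} \right\rVert_2^{2} .
```

Under this reading, the mode-underestimation example corresponds to choosing $h_i$ as the indicator of a structural mode and $h_i^*$ as the probability that mode should receive.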