Distance-based summaries and modeling of evolutionary trees
Ranked tree shapes are important mathematical objects used to model hierarchical data and evolutionary processes, with applications across many fields, including evolutionary biology and infectious disease transmission. While Bayesian methods allow exploration of the posterior distribution of trees, assessing uncertainty and summarizing tree distributions remain challenging for these structures. Similarly, one often seeks to summarize samples of trees obtained with different methods, or from different samples and environments, and to assess the stability and generalizability of these summaries. Here, we exploit recently proposed distance metrics on unlabeled ranked evolutionary trees and provide an efficient combinatorial optimization algorithm for estimating Fréchet means and variances. We show the applicability of our summary statistics for studying popular tree distributions and the evolution of viruses.
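For readers unfamiliar with the terms, the sample Fréchet mean of trees T_1, ..., T_n under a distance d is a tree minimizing the empirical Fréchet function F(T) = (1/n) * sum_i d(T, T_i)^2, and the sample Fréchet variance is the minimum value of F. The sketch below only illustrates these definitions. It is not the combinatorial optimization algorithm described above. It restricts the search to the sampled trees themselves (a "Fréchet medoid"), which gives an upper bound on the true sample Fréchet variance. It assumes each tree has already been encoded as an equal-length integer vector (for example, a flattened matrix encoding). The function names `frechet_medoid` and `l1`, and the toy vectors in the usage example, are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def frechet_medoid(
    trees: Sequence[T], dist: Callable[[T, T], float]
) -> Tuple[int, float]:
    """Return (index, value) of the sampled tree that minimizes the
    empirical Fréchet function F(T) = mean_i dist(T, T_i)**2.

    Hypothetical sketch: the search is restricted to the sample itself,
    so the returned value upper-bounds the true sample Fréchet variance.
    """
    if not trees:
        raise ValueError("need at least one tree")
    n = len(trees)
    best_idx, best_val = 0, float("inf")
    for i, candidate in enumerate(trees):
        # Evaluate the empirical Fréchet function at this candidate.
        val = sum(dist(candidate, t) ** 2 for t in trees) / n
        if val < best_val:  # ties keep the earliest index
            best_idx, best_val = i, val
    return best_idx, best_val


def l1(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two equal-length tree encodings (assumed)."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy, made-up encodings standing in for four sampled trees.
    samples = [(1, 1, 2), (1, 1, 1), (1, 2, 2), (1, 1, 2)]
    print(frechet_medoid(samples, l1))
```

Here the first encoding is at L1 distance 0, 1, 1 and 0 from the four samples, so its Fréchet function value is (0 + 1 + 1 + 0) / 4 = 0.5, the smallest in the sample. Any distance on encoded trees can be passed as `dist`, which keeps the summary independent of the particular metric chosen.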