Dan Daniel Erdmann-Pham

I am a statistician working on the rigorous, interpretable, and scalable analysis of data, with a specific focus on data arising in biology. Data underpins much of modern scientific discovery, which has motivated the development of a rich set of tools to aid its analysis. The field of machine learning in particular has supplied an inventory of quantitative methods, ranging from hypothesis testing to function approximation, that are available off the shelf. However, choosing the most suitable algorithm for a given data set, or indeed determining whether an algorithm delivering satisfactory performance exists at all, is often complicated by tacit theoretical assumptions not readily accessible to the user, or by a lack of clarity regarding method-specific capabilities and limitations. The broad theme of my work is to bridge such gaps by providing transparent data-analysis schemes for which provable optimality guarantees exist.
With the recent explosion of novel experimental protocols in biology specifically, establishing transparency frequently requires ideas beyond those traditionally associated with data-driven statistics. In my work, I show that borrowing from domains of mathematics not typically used in statistical applications can lead to tangible improvements over existing methodology, or to new frameworks for tasks that previously remained inaccessible.