A simple, general statistical method subsumes many in genomics
I will discuss a simple, unifying statistical formulation of many fundamental problems in genome science. This formulation enables us to develop a transparent statistical algorithm that operates directly on raw sequence observations. The approach, called NOMAD, solves myriad application-specific problems and avoids computationally intensive, multistep and heuristic methods that rely on reference genomes and alignment, mainstays of the field today. I will describe the statistical approach in NOMAD, compare it to classical methods, and illustrate some of its applications in biology. These include discovery of human genome regulation missed by current methods, and de novo prediction of viral strain adaptation, including in SARS-CoV-2. Finally, I will touch on the many open statistical directions that arise from problems that can be stated through NOMAD's formulation.
This is joint work with many people including Kaitlin Chuang, Tavor Baharav and Roozbeh Dehghannasiri.