
Undergraduate Summer Research in Statistics

Students will engage in interdisciplinary research using statistical methods for data mining, causal inference, and machine learning within the following disciplines: biological sciences, computational statistics, or statistical learning/optimization.

Summer 2020 Application is Closed!

Application Deadline: February 21, 2020 (Friday)

Participating Faculty for 2020:

Project: We will do some reading and research on multivariate data, looking at visualization and estimation problems. We will also write a research paper on a related problem, to which the student will contribute, for example by writing code.

  • Assistant Professor Mike Baiocchi, Epidemiology and Population Health and, by courtesy, Statistics

Project: We are interested in developing a framework for measuring how a behavioral intervention changes how people think. Probably the most natural way to do this is to ask people to respond to a situation, having them talk about it and think aloud about what they would do. Our group is developing new methods for taking in free text and doing rigorous causal inference (that is, figuring out how much something changed due to an intervention). A good candidate would know something about methods such as regression, but a great candidate will want to think very carefully about how behavior changes and how to measure that change. We work on educational interventions (e.g., reducing imposter syndrome) and violence prevention programs (both here at Stanford and in the slums of Nairobi, Kenya).

  • Assistant Professor Johan Ugander, Management Science & Engineering

Project: This project will analyze ranked choice voting through the lens of recent efforts to model contextual decision making. How do the candidates on a ballot potentially create a “context” within which citizens make a sequence of ranked choices? How do different ways to model context differ in their ability to capture behavior in empirical data? A good candidate will have taken coursework in machine learning and optimization, be proficient in data wrangling, and have an interest in behavioral models of discrete choice and/or ranking.

  • Assistant Professor Julia Palacios, Statistics and Biomedical Data Science

Project: We will explore different model-based optimization algorithms for summarizing posterior distributions. We will do some reading and research on estimation of distribution algorithms and apply them to problems in cancer and antibody repertoire evolution. We will write a research paper with members of the lab.

  • Professor Trevor Hastie, Mathematical Sciences, Statistics and Biomedical Data Science

Lead Researcher: Shinnosuke Nakayama
Project: Illegal, unreported and unregulated (IUU) fishing contributes 10-30% of seafood in the market, jeopardizing the livelihoods of 3 billion people who rely on fisheries while aggravating modern slavery problems. We have begun to understand fishing activities through the Automatic Identification System (AIS), which provides locations of fishing vessels at high frequencies. However, many fishing vessels are undetectable: they can “go dark” by turning off the AIS device, and small fishing boats are not required to carry the device. Toward painting a comprehensive picture of the IUU landscape, we aim to characterize the activities of fishing vessels that are off the radar using satellite imagery. The project involves analysis of port usage by small vessels and characterization of dark vessel behavior through image analysis in combination with AIS data. See project details.

  • Assistant Professor Julia Salzman, Biochemistry and Biomedical Data Science

Project: This project involves statistical analysis of millions of single-cell RNA sequencing profiles of human and mouse lemur cells. Goals include uncovering evolutionary divergence and conservation of tissue function and regulation across primates, and identifying disease-relevant pathways.

  • Professor Giulio De Leo, Biology and Senior Fellow at the Woods Institute for the Environment

First Project: Parameter estimation in non-linear fishery models
We are looking for a highly motivated student who will engage in parameter estimation for non-linear fishery models. Specifically, we have gathered landing catch data from the abalone fishery in Isla Natividad for 6 fishing zones over 17 years and developed a size-structured integral projection model (IPM), programmed in R. We now need to estimate unknown parameters such as the strength of density dependence and catchability. The student will develop and run scripts to implement a number of estimators using (i) classic maximum likelihood, (ii) a Bayesian approach (possibly with Stan), and (iii) particle filtering (pomp package).

Lead Researcher: Richard Grewelle, 4th year Ph.D. student in Biology
Second Project: A theoretical approach to estimating the number of genes in a polygenic trait
Many genetic traits are regulated by multiple genes. There is a continuum from single-gene traits to quantitative traits, where genes or non-gene elements contribute infinitesimally to the resulting phenotype (trait). It is of great interest to many biologists to determine, through experiment, the number of separate genetic elements contributing to a phenotype in an organism or population. Diseases are often regulated by multiple genes. One approach, developed decades ago, is used to determine this number statistically, but it requires great experimental effort. A PhD candidate, Richard Grewelle, has developed an alternative approach that requires less experimental effort. The statistical framework needs further development before it can be broadly applied. A prospective summer intern should have an interest in developing mathematical or statistical approaches and have an upper-level undergraduate to graduate-level understanding of statistics or mathematics. Some computation is required, but most of the effort will involve theory. Programming proficiency is a bonus but not necessary.

  • Professor Barbara Block, Charles and Elizabeth Prothro, Marine Sciences


Project: Our research team has multiple projects involving population-level and single-cell datasets (transcriptomic, epigenetic and proteomic) regarding stem and progenitor cell function in health and musculoskeletal diseases. The aims for the potential summer projects would be to (a) develop statistical approaches to discern specific population subsets from the bulk population data, in order to analyze how different cell populations change during disease pathogenesis, (b) optimize tools to stratify patients, and (c) correlate data from multiple tissues and develop predictive models. Please feel free to reach out to discuss in detail.

Project: Do sharks have friends? Using Social Network Graphs to Identify Patterns in Shark Aggregations
Recent advances in animal tagging and marine fish observation have resulted in new efforts to study the social behavior of sharks. New observations have shown that many species of sharks can form large aggregations at different times during their life history. For some species these aggregations are temporary, but for others they are more persistent. One species for which aggregations appear to persist is the sand tiger shark (Carcharias taurus), along the East Coast of the US. We have two datasets that contain information about potential aggregations and interactions between individual sharks, which can be explored more thoroughly to identify patterns in the networks of interactions. For a student with an interest in animal behavior and/or network graphs and analysis, there is much that could be done using simulations to identify non-random associations between sharks and to identify patterns in aggregations during their annual migration. The appropriate student would ideally be interested in social network analysis and computer simulations (although this is not necessary), and have some coding experience or a willingness to learn.

  • Or other Statistics-affiliated faculty who have agreed to supervise and mentor your work.

Funding is provided by VPUE and is offered to undergraduate students to support full-time research projects in Statistics. 

This program runs for 8 weeks starting in June 2020.

This research opportunity is for Stanford University undergraduate students only. Learn more about student eligibility.

Previous research topics include:

  • Causal inference & evaluating behavioral programs
  • Data Mining Analysis
  • Computational Statistics & Multivariate Analysis
  • Analysis of multivariate microarray data
  • HapMap
  • Kaggle NASA image analysis
  • Human Microbiome studies

Summer research program requirements:

  • Must not yet have had your undergraduate degree conferred (this includes coterm students).
  • Proficiency in R (knowledge of C++, Julia or Java also a plus)
  • Applicants should have taken at least two of the following courses: Stats 191, 202, 208, 216, 217, 229, 290 before summer quarter.
  • Must be able to commit to full-time research (40 hours per week)

Application Materials:

  • CV/Resume with work history and relevant experience
  • Unofficial Transcript
  • Application form

Students accepted into the program receive a lump sum of $6,500 and are responsible for finding their own housing during the summer. Preference is given to Mathematical and Computational Science majors; however, any Stanford undergraduate who meets our prerequisites may apply.

Contact if you have any questions.