Main content start

Estimating the size of a set using cascading exclusion

Date
Tue February 17th 2026, 4:00pm
Location
CoDa E160
Speaker
Persi Diaconis, Stanford Math and Statistics

Let S be a finite set and X_1, X_2,..., X_n an iid sample from S. To estimate the size |S| without further structure you can wait for repeats and use the birthday problem. This requires a sample of size the order of |S|^1/2. On the other hand, if S = {1,2,...,|S|} (German tank problem), one can use the sample maximum and get an efficient estimator with any growing sample size. I will present refinements that interpolate between these extremes. A general, non-asymptotic theory is developed. This includes estimating the volume of a compact convex set in R^n, the unseen species problem and a host of testing problems that follow from the question of whether this new observation is a typical pick from a large pre-specified population. Regression-style predictors are also treated. A general theorem gives nonparametric, finite error bounds in all cases.

This is joint work with Sourav Chatterjee and Susan Holmes.