Large alphabet inference

Tue April 9th 2024, 4:30pm
Sloan 380Y
Amichai Painsky, Tel Aviv University

Consider a finite sample from an unknown distribution over a large alphabet. Inference in the large-alphabet regime is a fundamental problem in statistics and related fields, and it poses several basic challenges. For example, how accurately can we infer the parameters of events that do not appear in the sample? What can we say about the most frequent events in the sample? About the entire underlying distribution? In this talk we introduce a novel inference scheme that tackles these problems. Our framework applies selective inference to construct confidence intervals (CIs) for the desired set of parameters. Interestingly, we show that the resulting CIs are dimension-free, in the sense that they do not grow with the alphabet size. Further, we show that our CIs are (almost) tight: they cannot be improved further without violating the prescribed coverage rate.