The effect of class-set size on classification accuracy: A random-class model
In many multi-class classification tasks, the set of potential classes is vast (think face detection or speaker identification). In such cases, researchers evaluate a classifier on only a subset of the population of classes to which it will later be applied. In this talk, I argue that it can be useful to model the observed set of classes as a random sample from that population. As the main results, I will present nonparametric and parametric characterizations of the effect of the number of classes on classification accuracy in a few-shot learning design; these involve a new type of multi-class ROC curve. In addition, I will present practical methods for estimation from relatively few observed classes, which allow us to predict the classification accuracy that would be obtained if more classes were added. I will also discuss use cases from neuroscience, including the evaluation of representations in brain-decoding tasks and subject privacy.
The talk is based on joint work with Charles Zheng and Yuli Slavutsky.