Variable importance, cohort Shapley and COMPAS
There are several enormous literatures on variable importance, with little interaction among them. This talk begins by presenting the problem as one of identifying causes of effects versus effects of causes, and then categorizes the choices one can (and must) make. It emphasizes the cohort Shapley method, which is based on cooperative game theory and has features that are important in some applications. First, it does not use any synthetic combinations of data values. Those can be unlikely, physically impossible, or even logically impossible, and they open the door to adversarial manipulations. Second, it is model-free: it works with model outputs only and requires no access to the black box function. For illustration we consider the effects of race and gender on algorithmic fairness in the COMPAS data. Cohort Shapley can also be used to measure which personal features contribute most to identifying a subject.
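To make the idea concrete, here is a minimal sketch of a cohort Shapley computation for one target subject. It is an illustration under stated assumptions, not the implementation from the joint work: similarity is taken to be exact equality of feature values, the function names cohort_mean and cohort_shapley are hypothetical, and the four-row data set is a toy example rather than the COMPAS data. The value of a feature subset is the average observed output over the cohort of subjects that match the target on those features, so only observed rows and their outputs are used.

```python
from itertools import combinations
from math import factorial

def cohort_mean(X, y, t, u):
    """Mean observed output over subjects matching subject t on every feature in u.

    Assumption: "similar" means exactly equal feature values.
    """
    cohort = [i for i in range(len(X)) if all(X[i][j] == X[t][j] for j in u)]
    # Subject t always matches itself, so the cohort is never empty.
    return sum(y[i] for i in cohort) / len(cohort)

def cohort_shapley(X, y, t):
    """Shapley value of each feature for target subject t, using cohort means."""
    d = len(X[0])
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            # Shapley weight for subsets of size r that exclude feature j.
            w = factorial(r) * factorial(d - r - 1) / factorial(d)
            for u in combinations(others, r):
                gain = cohort_mean(X, y, t, u + (j,)) - cohort_mean(X, y, t, u)
                phi[j] += w * gain
    return phi

# Toy data (hypothetical): two binary features and observed model outputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0.0, 1.0, 2.0, 3.0]

phi = cohort_shapley(X, y, t=3)
print(phi)  # [1.0, 0.5]
```

The two values sum to 1.5, which is the output for subject 3 minus the overall mean output. The function enumerates every feature subset, so this sketch is only practical for a small number of features.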
This talk is based on recent joint work with Ben Seiler, Masayoshi Mase and Naofumi Hama. The opinions expressed are my own, and not those of Stanford, the National Science Foundation, or Hitachi, Ltd.