Speaker: Somabha Mukherjee, Wharton School, University of Pennsylvania
Abstract: Dependent data arise in all avenues of science, technology and society, such as Facebook friendship networks, epidemic networks, election data and peer group effects. Analysis of dependent network data is crucial for understanding the behavior of edge and higher-order motif estimates in very large and inaccessible networks, for deriving the asymptotics of graph-based tests for equality of distributions, for the study of coincidences, and in many other seemingly diverse areas of statistics and probability. In this talk, I will focus on the Ising model, a useful framework introduced by statistical physicists, and later adopted by statisticians, for modeling dependent binary data.
In its original form, the Ising model can capture only pairwise interactions, whereas dependence in the real world is seldom purely pairwise. For example, in a peer group, the decision of an individual is affected not just by pairwise communications, but by interactions with larger community tuples. It is also known in physics that atoms on a crystal surface interact not just in pairs, but in triplets and higher-order tuples. These higher-order interactions can be captured by the so-called tensor Ising models, where the sufficient statistic (Hamiltonian) is a multilinear form of degree p. I will discuss estimation of the natural parameters in this model, explain why maximum-likelihood estimation fails in general Ising models, and briefly describe the asymptotics of the parameter estimates in a special case of the tensor Ising model, where every p-tuple of nodes interacts with equal strength. The asymptotics are highly non-standard, characterized by the presence of a critical curve in the interior of the parameter space on which the estimates have a limiting mixture distribution, and by a surprising superefficiency phenomenon occurring at the boundary point(s) of this critical curve.
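For concreteness, the tensor Ising model described above can be written down as follows. This is only a sketch in one common notation: the symbols \beta, h, J and Z_N, and the N^{1-p} scaling in the equal-strength case, are assumptions made for illustration and are not taken from the abstract.

```latex
% Spins x = (x_1, ..., x_N) in {-1, +1}^N; J is a p-th order interaction tensor.
% (\beta, h) are the natural parameters; Z_N(\beta, h) is the normalizing constant.
\[
  \mathbb{P}_{\beta,h}(X = x)
  = \frac{1}{Z_N(\beta,h)}
    \exp\Big\{ \beta\, H_N(x) + h \sum_{i=1}^{N} x_i \Big\},
  \qquad
  H_N(x) = \sum_{1 \le i_1,\dots,i_p \le N} J_{i_1 \cdots i_p}\, x_{i_1} \cdots x_{i_p}.
\]
% Equal-strength special case (assumed scaling J_{i_1 ... i_p} = N^{1-p}):
\[
  H_N(x) = N^{1-p} \Big( \sum_{i=1}^{N} x_i \Big)^{p} = N\, \bar{x}^{\,p},
  \qquad
  \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i .
\]
```

With p = 2 the Hamiltonian is a quadratic form and the model reduces to the classical pairwise Ising model. In the equal-strength case the distribution depends on x only through the sample mean \bar{x}.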
I will also consider a more realistic version of the Ising model, which generalizes vanilla logistic regression, and briefly discuss estimating the natural parameters of this model under sparsity assumptions on the parameters. Towards the end, I will touch on some other settings where dependent combinatorial data arise, including graph-based nonparametric tests for equality of distributions.