Composite likelihood for a very large-scale binary regression with crossed random effects
Sparsely sampled crossed random effects models arise in review data, where effects for reviewers are crossed with effects for items. Such settings have no balance, and the cost of the least squares algebra grows as N^(3/2) or worse. Generalized linear mixed models (GLMMs) pose the further difficulty of a very high-dimensional integral. For instance, we consider a likelihood with an integral over D~700,000 random effects, using only N~5,000,000 observations. The usual Laplace approximation evaluates the D-dimensional integral using just one integration point, and it is uncertain whether that is reliable. The MLE is infeasible in this problem and has only recently been shown to be consistent (Jiang 2013). For a probit model, we develop a composite likelihood approach based on computing D one-dimensional integrals. It is very scalable, and we prove its consistency, which might not hold for the Laplace-based method.
This is based on joint work with Ruggero Bellio, Swarnadip Ghosh and Cristiano Varin.
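
To make the phrase "one-dimensional integral" concrete, the following is a minimal sketch, not the estimator presented in the talk. It assumes a simplified probit model with a single Gaussian random intercept per reviewer and evaluates that reviewer's likelihood contribution by Gauss-Hermite quadrature. The names row_log_likelihood and probit_cdf, the linear predictors eta, and the choice of 30 quadrature nodes are all hypothetical and chosen for illustration only.

```python
import math
import numpy as np

# Vectorized complementary error function from the standard library.
_erfc = np.vectorize(math.erfc)


def probit_cdf(x):
    """Standard normal CDF, Phi(x) = 0.5 * erfc(-x / sqrt(2))."""
    return 0.5 * _erfc(-np.asarray(x, dtype=float) / math.sqrt(2.0))


def row_log_likelihood(y, eta, sigma, n_nodes=30):
    """Log of the integral over a ~ N(0, sigma^2) of
    prod_j Phi((2*y_j - 1) * (eta_j + a)).

    y     : binary responses (0/1) sharing one random intercept (assumed setup)
    eta   : fixed-effect linear predictors for those responses
    sigma : standard deviation of the random intercept
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)

    # Nodes and weights for the weight function exp(-z^2 / 2);
    # dividing by sqrt(2*pi) turns the rule into an expectation under N(0, 1).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)

    sign = 2.0 * y - 1.0  # +1 for y = 1, -1 for y = 0
    # Success/failure probabilities on the grid: shape (n_nodes, n_obs).
    p = probit_cdf(sign[None, :] * (eta[None, :] + sigma * z[:, None]))
    # Floor the probabilities to avoid log(0) in extreme tails.
    log_p = np.log(np.maximum(p, np.finfo(float).tiny))

    # Log-sum-exp over the quadrature nodes for numerical stability.
    log_terms = log_p.sum(axis=1) + np.log(w)
    m = log_terms.max()
    return float(m + np.log(np.exp(log_terms - m).sum()))


# Usage: with a single observation the integral has the closed form
# Phi(eta / sqrt(1 + sigma^2)), which gives a convenient sanity check.
approx = math.exp(row_log_likelihood([1], [0.3], sigma=1.2))
exact = float(probit_cdf(0.3 / math.sqrt(1.0 + 1.2 ** 2)))
print(approx, exact)
```

Each such integral involves only one random effect, so its cost depends on the number of observations in that row and the number of quadrature nodes, not on D. This is the sense in which D one-dimensional integrals are far cheaper than a single D-dimensional one.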