Large-scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes × environments or customers × products. Naive methods of handling such data will produce inferences that do not generalize. Regression models that properly account for crossed random effects can be very expensive to compute: the cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but only under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under sampling assumptions that are greatly relaxed, though still strict. Empirically, the backfitting algorithm costs O(N) under even weaker assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
This is based on joint work with Swarnadip Ghosh and Trevor Hastie of Stanford University.
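As a rough illustration of how backfitting can produce a generalized least squares (GLS) estimate, here is a minimal Python/NumPy sketch. It assumes the two-factor model y_ij = x_ij'β + a_i + b_j + e_ij with independent Gaussian effects and known variance components σ_A², σ_B², σ_e², so V = σ_A² Z_A Z_A' + σ_B² Z_B Z_B' + σ_e² I. The function names are hypothetical, and variance-component estimation and the convergence analysis are not reproduced; this is not presented as the exact algorithm of the work above. The sketch uses the Woodbury identity V⁻¹r = (r − Z_A â − Z_B b̂)/σ_e², where (â, b̂) is a ridge fit of r on the row and column indicators with penalties σ_e²/σ_A² and σ_e²/σ_B², and computes that ridge fit by alternating (backfitting) updates.

```python
import numpy as np


def backfit_ridge(r, rows, cols, n_rows, n_cols, lam_a, lam_b,
                  max_sweeps=1000, tol=1e-12):
    """Backfitting solve of the ridge problem
        min_{a,b} ||r - a[rows] - b[cols]||^2 + lam_a*||a||^2 + lam_b*||b||^2.
    Each sweep touches every observation a constant number of times: O(N)."""
    a = np.zeros(n_rows)
    b = np.zeros(n_cols)
    n_a = np.bincount(rows, minlength=n_rows)  # observations per row level (e.g. customer)
    n_b = np.bincount(cols, minlength=n_cols)  # observations per column level (e.g. product)
    for _ in range(max_sweeps):
        # Update row effects with column effects held fixed, then the reverse.
        a_new = np.bincount(rows, weights=r - b[cols], minlength=n_rows) / (n_a + lam_a)
        b_new = np.bincount(cols, weights=r - a_new[rows], minlength=n_cols) / (n_b + lam_b)
        delta = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if delta < tol:
            break
    return a, b


def apply_v_inverse(r, rows, cols, n_rows, n_cols, s2a, s2b, s2e):
    """V^{-1} r via Woodbury: (r - Za a_hat - Zb b_hat) / s2e, where (a_hat, b_hat)
    is the ridge fit above with lam_a = s2e/s2a and lam_b = s2e/s2b."""
    a, b = backfit_ridge(r, rows, cols, n_rows, n_cols, s2e / s2a, s2e / s2b)
    return (r - a[rows] - b[cols]) / s2e


def gls_backfit(X, y, rows, cols, s2a, s2b, s2e):
    """GLS beta = (X' V^{-1} X)^{-1} X' V^{-1} y, never forming the N x N matrix V."""
    n_rows, n_cols = rows.max() + 1, cols.max() + 1

    def vinv(v):
        return apply_v_inverse(v, rows, cols, n_rows, n_cols, s2a, s2b, s2e)

    vinv_x = np.column_stack([vinv(X[:, k]) for k in range(X.shape[1])])
    return np.linalg.solve(X.T @ vinv_x, X.T @ vinv(y))


def gls_dense(X, y, rows, cols, s2a, s2b, s2e):
    """Reference GLS that forms V explicitly (O(N^3)); only for small checks."""
    za = np.eye(rows.max() + 1)[rows]  # N x R indicator matrix
    zb = np.eye(cols.max() + 1)[cols]  # N x C indicator matrix
    v = s2a * za @ za.T + s2b * zb @ zb.T + s2e * np.eye(len(y))
    vinv_x = np.linalg.solve(v, X)
    return np.linalg.solve(X.T @ vinv_x, X.T @ np.linalg.solve(v, y))
```

Each sweep is linear in N, so whether the overall cost stays O(N) depends on how many sweeps convergence needs, which is what the sampling assumptions above are meant to control. On small simulated data, `gls_dense` gives a direct reference to check `gls_backfit` against.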