Mixture models are popular tools for inference with heterogeneous data in a wide range of fields. We consider the problem of prediction and inference in mixture models with streaming data. The Bayesian approach can provide a clean solution, as we will briefly review in the first part of the talk. However, the analytic computations involved can be demanding. Nowadays, the pressure for fast computations, especially with streaming data and online learning, brings renewed interest in faster, although possibly sub-optimal, solutions. Approximate algorithms may offer such solutions, but they often lose clear (Bayesian) statistical properties. Embedding the algorithm in a full probabilistic framework may illuminate these properties.
This is what we do here. We reconsider a recursive algorithm proposed by M. Newton and collaborators for sequential learning in nonparametric mixture models. The so-called Newton algorithm is simple and fast, but theoretically intriguing: although proposed as an approximation of a Bayesian solution, its quasi-Bayes properties remain an open question. By framing the algorithm in a probabilistic setting, we can shed light on the underlying statistical model, which we show to be, asymptotically, an exchangeable mixture model with a novel prior on densities. In this clean probabilistic framework, several applications and extensions become fairly natural, as we also illustrate in simulation studies.
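To fix ideas, Newton's recursive algorithm (often called predictive recursion) can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact setup: the discretization grid, the kernel, and the weight sequence `w_i = 1/(i+1)` are common default choices made here for illustration. Each new observation updates the current estimate of the mixing density by a convex combination of the old estimate and its posterior-like reweighting.

```python
import numpy as np

def predictive_recursion(data, theta_grid, kernel, p0=None, weights=None):
    """Sketch of Newton's predictive recursion for a mixing density.

    Each observation x_i updates the estimate p on a fixed grid via
        p_i(theta) = (1 - w_i) * p_{i-1}(theta)
                     + w_i * kernel(x_i, theta) * p_{i-1}(theta) / m_i,
    where m_i normalizes the second term to a probability.
    """
    n = len(data)
    if p0 is None:
        # uniform initial guess on the grid
        p = np.full(len(theta_grid), 1.0 / len(theta_grid))
    else:
        p = np.asarray(p0, dtype=float)
        p = p / p.sum()
    if weights is None:
        # a typical decreasing weight sequence (illustrative choice)
        weights = 1.0 / (np.arange(1, n + 1) + 1.0)
    for x, w in zip(data, weights):
        lik = kernel(x, theta_grid)           # k(x | theta) on the grid
        m = np.sum(lik * p)                   # normalizing constant
        p = (1.0 - w) * p + w * lik * p / m   # one-pass recursive update
    return p

# Illustrative use: a Gaussian location mixture with unit variance.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_grid = np.linspace(-5.0, 5.0, 201)
    gauss = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)
    data = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
    p_hat = predictive_recursion(data, theta_grid, gauss)
    print(p_hat.sum())  # stays a probability vector on the grid
```

Note that the update is a single fast pass over the data, which is what makes the algorithm attractive for streaming settings; by construction the estimate remains a probability vector on the grid at every step.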
*This is joint work with Sandra Fortini.