The recent explosion of web analytics tools has enabled us to collect an immense amount of partial preferences for large sets of items such as products from Amazon, movies from Netflix, or restaurants from Yelp, from a large and diverse population of users through transactions, clicks, etc. Modeling, learning, and ultimately predicting the preference behavior of users from pairwise comparisons has been extensively studied since the 1927 work of Thurstone. Yet, almost all models to date have been founded on a clustering-perspective in which users are grouped by their preference behavior.
We take a fundamentally different decomposition-perspective and propose a new class of generative models for pairwise comparisons in which user preference behavior can be decomposed into contributions from multiple shared latent "causes" (partial orders) that are prevalent in the population. We show how the estimation of shared latent partial orders in the new generative model can be formally reduced to the estimation of topics in a statistically equivalent topic modeling problem in which causes correspond to topics and item-pairs to words. We show that an inevitable consequence of having a relatively small number of shared latent causes in a world of large number of item-pairs is the presence of "novel" item-pairs for each latent cause. We then leverage recent advances in the topic modeling literature and develop an algorithm based on extreme-point identification of convex polytopes to learn the shared latent partial orders. Our algorithm is provably consistent and comes with polynomial sample and computational complexity guarantees. We demonstrate that our new model is empirically competitive with the current state-of-the-art approaches in predicting preferences on semi-synthetic and real world datasets.
This talk is based on joint work with Weicong Ding and Venkatesh Saligrama at Boston University.