Learning to rank in web search is one of the most natural applications of existing bandit algorithms. One challenge in this problem is that clicks on recommended items are typically not independent of each other. For instance, if the highest-ranked item is attractive, the user may not click on lower-ranked items because the highest-ranked item already satisfies them, not because the lower-ranked items are unattractive. The learning agent does not observe what would have happened if the lower-ranked items had been examined. In addition, the click on a recommended item often depends on both its attractiveness and its position: if the item is not clicked, it may be because it is unattractive or because its position is not examined. The learning agent does not observe which of these factors causes the lack of clicks. In this talk, we present several near-optimal bandit algorithms for learning to rank from partial feedback that address the aforementioned problems. To solve the latter problem, we propose the first bandit algorithm for finding the maximum entry of a stochastic rank-1 matrix whose regret scales favorably with all quantities of interest.
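The attractiveness-vs-examination confound can be illustrated with a small simulation of a position-based click model, where the click probability factors into an item term and a position term, i.e., a rank-1 matrix. All probabilities below are hypothetical and chosen only to make the ambiguity visible; this is a sketch of the problem setting, not the proposed algorithm:

```python
import random

random.seed(0)

# Hypothetical rank-1 click model:
# P(click | item i shown at position j) = attraction[i] * examination[j].
attraction = [0.8, 0.4, 0.2]    # per-item attraction probabilities
examination = [1.0, 0.5, 0.25]  # per-position examination probabilities

def click(item, position):
    """Simulate one click under the position-based model."""
    return random.random() < attraction[item] * examination[position]

# An unattractive item at a well-examined position and an attractive
# item at a poorly examined position can produce the same click rate
# (0.2 * 1.0 = 0.8 * 0.25 = 0.2), so clicks alone cannot tell the
# agent which factor caused the lack of clicks.
n = 100_000
rate_a = sum(click(2, 0) for _ in range(n)) / n  # unattractive item, top position
rate_b = sum(click(0, 2) for _ in range(n)) / n  # attractive item, low position
print(round(rate_a, 2), round(rate_b, 2))        # both close to 0.20
```

Disentangling the two factors requires exploring items across positions, which is what the rank-1 bandit algorithm mentioned above is designed to do.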
Branislav Kveton is a Senior Machine Learning Scientist at the Big Data Experience Lab (BEL) at Adobe Research in San Jose, CA. He was at Technicolor's Research Center from 2011 to 2014, and at Intel Research from 2006 to 2011. He graduated from the Intelligent Systems Program at the University of Pittsburgh in 2006, where he was advised by Milos Hauskrecht. Branislav is an expert on algorithms that learn incrementally and in real time from large amounts of data. He has published numerous papers at top-tier machine learning and AI conferences, and serves on the program committees of these conferences every year.