The speaker will talk about applying theory to practice, with a focus on two IBM case studies. In the first case study, the practitioner initiated the interaction. This interaction led to the following problem. Assume that there is a set of "voters" and a set of "candidates", where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate's scores. The problem is to find the top k candidates, while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science.
The interaction in the second case study was initiated by theoreticians, who wanted to lay the foundations for "data exchange", in which data is converted from one format to another. Although this problem may sound mundane, the issues that arise are fascinating, and this work made data exchange a new subfield, with special sessions in every major database conference.
This talk will be completely self-contained, and the speaker will derive morals from the case studies. The talk is aimed at both theoreticians and practitioners, to show them the mutual benefits of working together.
Stanford Data Science / Infoseminar is a weekly event held at Stanford that brings together people interested in big data, analytics, databases, and other interesting computer science topics.