I will discuss several new problems related to the general challenge of understanding what conclusions can be drawn from a dataset that is relatively small compared to the complexity or dimensionality of the underlying distribution from which it is drawn. In the first setting, we consider the problem of learning a population of Bernoulli (or multinomial) parameters. This is motivated by the "federated learning" setting, where we have data from a large number of heterogeneous individuals who each supply a very modest amount of data, and we ask to what extent the number of data sources can compensate for the lack of data from each source. Second, I will introduce the problem of data "amplification": given n independent draws from a distribution D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy. We also discuss connections between this setting and the challenge of interpreting the behavior of GANs and other ML/AI systems. Finally (if there is time), I will discuss memory/data tradeoffs for regression. The punchline is that any algorithm that uses a subquadratic amount of memory will require asymptotically more data than second-order methods to achieve comparable accuracy. This talk is based on four joint papers with various subsets of Weihao Kong, Brian Axelrod, Shivam Garg, Vatsal Sharan, Aaron Sidford, Sham Kakade, and Ramya Vinayak.
The Information Systems Laboratory Colloquium (ISLC) is typically held in Packard 101 every Thursday at 4:30 pm during the academic year. Coffee and refreshments are served at 4 pm in the second-floor kitchen of the Packard Building.
The Colloquium is organized by graduate students Joachim Neu, Tavor Baharav and Kabir Chandrasekher. To suggest speakers, please contact any of the students.
To receive email notifications of seminars, you can join the ISL mailing list.