Learning from missing and imperfect information

Summary
Speaker: Manolis Zampetakis (Yale)
Location: Pigott (01-260) 113
Date(s): Mar 11

Content

Positive-unlabeled (PU) learning concerns classification when only positive and unlabeled data are available, a scenario common in bioinformatics, medical studies, and fraud detection. Its significance lies in learning from datasets where negative samples are difficult or costly to obtain. In this talk, we generalize PU learning to positive and imperfect unlabeled (PIU) learning, which accounts for poor-quality unlabeled data due to biases and adversarial corruption. Such corruption can arise for many reasons, including reliance on public and crowd-sourced sources to collect the unlabeled data.
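To make the PU setting concrete, here is a minimal illustrative sketch (not the method from the talk): it builds a synthetic positive/unlabeled split and applies the classic Elkan–Noto correction, in which a classifier trained to separate labeled positives from the unlabeled pool is rescaled by an estimated labeling rate. The synthetic distributions and the labeling rate c below are assumptions for the example.

```python
# Illustrative PU-learning sketch using the Elkan-Noto correction.
# Assumptions for this example: 2D Gaussian class-conditionals and a
# constant labeling rate c (selected-completely-at-random positives).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: positives ~ N(+1, I), negatives ~ N(-1, I) in 2D.
n_pos, n_neg = 500, 500
pos = rng.normal(+1.0, 1.0, size=(n_pos, 2))
neg = rng.normal(-1.0, 1.0, size=(n_neg, 2))

# PU censoring: only a fraction c of positives receive a label; the
# remaining positives and all negatives form the unlabeled pool.
c_true = 0.4
labeled = rng.random(n_pos) < c_true
P = pos[labeled]                           # labeled positives
U = np.vstack([pos[~labeled], neg])        # unlabeled mixture

# Hold out half of the labeled positives to estimate c.
P_tr, P_ho = P[: len(P) // 2], P[len(P) // 2 :]

# Step 1: "nontraditional" classifier g(x) ~ P(labeled | x).
X = np.vstack([P_tr, U])
s = np.concatenate([np.ones(len(P_tr)), np.zeros(len(U))])
g = LogisticRegression().fit(X, s)

# Step 2: estimate c as the mean score of g on held-out positives.
c_hat = g.predict_proba(P_ho)[:, 1].mean()

# Step 3: corrected posterior P(y = 1 | x) ~= g(x) / c_hat.
posterior = np.clip(g.predict_proba(U)[:, 1] / c_hat, 0.0, 1.0)
print(f"estimated c = {c_hat:.2f} (true {c_true})")
print(f"mean corrected posterior on unlabeled pool = {posterior.mean():.2f}")
```

The correction works because, under the assumption that positives are labeled at random, P(labeled | x) = c * P(y = 1 | x); PIU learning drops the idealized assumptions this sketch relies on.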

This change in the formulation of PU learning leads to new theoretical implications. We show how it connects to fundamental problems, such as learning from smoothed distributions, detecting data truncation, and estimation under truncation, each of which is central to statistics and learning theory. If time permits, we will also explore how these ideas provide a new perspective on causal inference, enabling estimation in settings where standard assumptions such as overlap and unconfoundedness break down.
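As a concrete instance of estimation under truncation (again purely illustrative, not the algorithm from the talk), the sketch below recovers the mean and standard deviation of a Gaussian from samples observed only above a known threshold, by maximizing the truncated log-likelihood. The threshold and true parameters are assumptions for the example.

```python
# Illustrative sketch of estimation under truncation: maximum likelihood
# for a left-truncated Gaussian with a known truncation threshold tau.
# The parameter values below are assumptions for this example.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
mu_true, sigma_true, tau = 0.0, 1.0, 0.5   # tau: known threshold

# We observe only the samples that fall above tau.
raw = rng.normal(mu_true, sigma_true, size=20_000)
x = raw[raw > tau]

def neg_log_lik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)              # parameterize sigma > 0
    # Truncated density: N(x; mu, sigma) / P(X > tau).
    log_pdf = stats.norm.logpdf(x, mu, sigma)
    log_surv = stats.norm.logsf(tau, mu, sigma)
    return -(log_pdf - log_surv).sum()

res = optimize.minimize(neg_log_lik, x0=[x.mean(), np.log(x.std())])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"naive mean of truncated sample: {x.mean():.2f}")
print(f"MLE: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f} "
      f"(true: {mu_true}, {sigma_true})")
```

The naive sample mean is biased upward because low samples are never observed; normalizing the likelihood by the survival probability P(X > tau) removes that bias.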

This is based on joint work with Jane Lee and Anay Mehrotra.