Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which is used to train a classifier. On three relation extraction tasks, we find that users are able to train classifiers with comparable F1 scores from 5-100 times faster by providing explanations instead of just labels. Furthermore, given the inherent imperfection of labeling functions, we find that a simple rule-based semantic parser suffices.
The full paper can be found here: https://arxiv.org/abs/1805.03818.
The Stanford EE Computer Systems Colloquium (EE380) meets on Wednesdays 4:30-5:45 throughout the academic year. Talks are given before a live audience in Room 104 of the Shriram Building on the Stanford Campus. The live talks (and the videos hosted at Stanford and on YouTube) are open to the public.
Stanford students may enroll in EE380 to take the Colloquium as a one unit S/NC class. Enrolled students are required to keep and electronic notebook or journal and to write a short, pithy comment about each of the ten lectures and a short free form evaluation of the class in order to receive credit. Assignments are due at the end of the quarter, on the last day of examinations.
EE380 is a video class. Live attendance is encouraged but not required. We (the organizers) feel that watching the video is not a substitute for being present in the classroom. Questions are encouraged.
Many past EE380 talks are available on YouTube, see the EE380 Playlist.