Stanford EE

Refined Data Sampling and Re-weighting for Efficient and Fair Learning

Summary
Prof. Taesup Moon (Seoul National Univ)
Packard 202
Feb 9
Content

Abstract: In this talk, I will present two recent works that refine data sampling and re-weighting techniques to build efficient and fair machine learning methods. First, I will introduce GRIT (GRouped mIni-baTch) sampling, a strategy for pre-training vision-language models in a data- and compute-efficient manner. GRIT groups hard negative samples within each mini-batch, and it is shown to encourage models to learn fine-grained, generalizable representations, especially when combined with label smoothing and correction methods that address false negatives. Second, I will turn to FairDRO (Fair Distributionally Robust Optimization), a unified framework that integrates regularization-based and re-weighting-based methods for group fairness-aware learning. FairDRO computes weights for samples from underrepresented groups in a principled manner, with the aim of narrowing the performance gap between groups. I will also present theoretical results showing that FairDRO is equivalent to appropriate fairness regularization methods.
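To make the two ideas in the abstract more concrete, here are two minimal, hypothetical sketches in Python with NumPy. Neither is the GRIT or FairDRO implementation. `greedy_grouped_order` assumes precomputed embeddings and simply chains each sample to its most similar unvisited neighbor, so mini-batches cut from that order tend to contain samples that are hard negatives for one another. `group_dro_weights` swaps in the exponentiated-gradient group update from Group DRO (Sagawa et al.), which up-weights groups with higher loss; FairDRO's own re-weighting rule is not reproduced here. All function names are illustrative.

```python
import numpy as np


def greedy_grouped_order(embeddings):
    """Hypothetical grouping sketch (not GRIT itself): starting from sample 0,
    repeatedly jump to the most similar unvisited sample, so neighbors in the
    resulting order are similar and thus hard negatives for each other."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T  # pairwise cosine similarities
    n = len(emb)
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        scores = sim[order[-1]].copy()
        scores[visited] = -np.inf  # never pick an already visited sample
        nxt = int(np.argmax(scores))
        order.append(nxt)
        visited[nxt] = True
    return order


def make_batches(order, batch_size):
    """Cut the grouped order into consecutive mini-batches of sample indices."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def group_dro_weights(group_losses, weights, step_size=0.1):
    """Swapped-in technique: Group DRO exponentiated-gradient update.
    Groups with larger loss receive multiplicatively larger weight;
    the result is renormalized to sum to 1."""
    w = weights * np.exp(step_size * group_losses)
    return w / w.sum()
```

In a training loop, each group's average loss would be multiplied by its weight before summing; the default `step_size=0.1` is an arbitrary placeholder, not a value from the talk.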

If time permits, I will also give a brief overview of additional research findings from the M.IN.D Lab @ SNU on Adaptive and Trustworthy ML, including topics such as continual and debiased learning.

Bio: Taesup Moon is an Associate Professor in the Department of Electrical and Computer Engineering at Seoul National University (SNU), Korea. He received his BS in electrical engineering from SNU in 2002, and his MS and PhD in electrical engineering from Stanford University in 2004 and 2008, respectively. He previously worked at Yahoo! Labs (2008-2012), UC Berkeley (2012-2013), Samsung Advanced Institute of Technology (SAIT) (2013-2015), DGIST (2015-2017), and Sungkyunkwan University (2017-2021). His current research interests are developing adaptive and trustworthy machine intelligence algorithms, as well as various (big) data science applications.