Bayesian Optimization and other Bad Ideas for Hyperparameter Optimization [IT Forum]

Topic: 
Bayesian Optimization and other Bad Ideas for Hyperparameter Optimization
Friday, January 20, 2017 - 1:15pm
Venue: 
Packard 202
Speaker: 
Kevin Jamieson (UC Berkeley)
Abstract / Description: 

The performance of machine learning systems depends critically on tuning parameters that are difficult to set by standard optimization techniques. Such "hyperparameters"---including model architecture, regularization, and learning rates---are often tuned in an outer loop by black-box search methods that evaluate performance on a holdout set. We formulate hyperparameter tuning as a pure-exploration problem: deciding how many resources to allocate to each hyperparameter configuration. I will introduce our Hyperband algorithm for this framework, along with a theoretical analysis that demonstrates its ability to adapt to unknown convergence rates and to the dependence of the validation loss on the hyperparameters. I will close with several experimental validations of Hyperband, including experiments on training deep networks where Hyperband outperforms state-of-the-art Bayesian optimization methods by an order of magnitude.
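The resource-allocation idea in the abstract can be made concrete with a minimal sketch of successive halving, the subroutine at the heart of Hyperband: start many configurations on a small budget, then repeatedly keep the best fraction and grow their budget. The function names, defaults, and the `evaluate(config, budget)` interface below are illustrative assumptions, not the talk's actual implementation.

```python
import math

def successive_halving(get_config, evaluate, n=27, r=1, eta=3):
    """One bracket of successive halving (illustrative sketch).

    Draw n configurations, give each r units of resource, and in each
    round keep the best 1/eta fraction while multiplying the budget by
    eta, until one configuration remains.

    evaluate(config, budget) should return a validation loss
    (lower is better); get_config() returns a fresh configuration.
    """
    configs = [get_config() for _ in range(n)]
    rounds = int(round(math.log(n, eta)))
    for i in range(rounds + 1):
        budget = r * eta ** i
        losses = [evaluate(c, budget) for c in configs]
        # Rank by loss only (configurations themselves may not be comparable).
        ranked = sorted(zip(losses, configs), key=lambda pair: pair[0])
        keep = max(1, len(configs) // eta)
        configs = [c for _, c in ranked[:keep]]
    return configs[0]
```

Hyperband proper runs several such brackets with different trade-offs between the number of configurations `n` and the initial budget `r`, which is what lets it adapt when the convergence rate is unknown.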

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum that focuses on the mathematical aspects of information processing. While its primary emphasis is information theory, we also welcome speakers from signal processing, machine learning and statistical inference, and control and optimization, as well as industrial affiliates in these fields. The forum is typically held in Packard 202 every Friday at 1:00 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact either organizer.

Bio:

Kevin is a postdoc in the AMP Lab at UC Berkeley, working with Benjamin Recht. He is interested in the theory and practice of algorithms that sequentially collect data using an adaptive strategy, including active learning, multi-armed bandit problems, and stochastic optimization. His work spans theory, experiments, and open-source machine learning systems. Kevin received his Ph.D. from the ECE department at the University of Wisconsin-Madison under the advisement of Robert Nowak.