ISL Colloquium presents "Optimizing the Cost of Distributed Learning"

Thursday, October 1, 2020 - 4:30pm
Prof. Carlee Joe-Wong (CMU)
Abstract / Description: 

As machine learning models are trained on ever-larger and more complex datasets, it has become standard to distribute this training across multiple physical computing devices. Such an approach offers a number of potential benefits, including reduced training time and storage needs due to parallelization. Distributed stochastic gradient descent (SGD) is a common iterative framework for training machine learning models: in each iteration, local workers compute parameter updates on a local dataset. These are then sent to a central server, which aggregates the local updates and pushes global parameters back to local workers to begin a new iteration. Distributed SGD, however, can be expensive in practice: training a typical deep learning model might require several days and thousands of dollars on commercial cloud platforms. Cloud-based services that allow occasional worker failures (e.g., locating some workers on Amazon spot or Google preemptible instances) can reduce this cost, but may also reduce the training accuracy. We quantify the effect of worker failure and recovery rates on the model accuracy and wall-clock training time, and show both analytically and experimentally that these performance bounds can be used to optimize the SGD worker configurations. In particular, we can optimize the number of workers that utilize spot or preemptible instances. Compared to heuristic worker configuration strategies and standard on-demand instances, we dramatically reduce the cost of training a model, with modest increases in training time and the same level of accuracy. Finally, we discuss implications of our work for federated learning environments, which use a variant of distributed SGD. Two major challenges in federated learning are unpredictable worker failures and a heterogeneous (non-i.i.d.) distribution of data across the workers, and we show that our characterization of distributed SGD's performance under worker failures can be adapted to this setting.
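For readers unfamiliar with the setup, the iteration described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the speaker's method or experimental setup: the one-parameter linear model, the function names (`make_shards`, `local_gradient`, `distributed_sgd`), and the failure model (each worker is independently unavailable with probability `fail_prob` in every iteration) are all hypothetical choices made for illustration.

```python
import random


def make_shards(num_workers, samples_per_worker, true_w, rng):
    """Build one local dataset per worker for the toy model y = true_w * x + noise."""
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(samples_per_worker):
            x = rng.uniform(-1.0, 1.0)
            y = true_w * x + rng.gauss(0.0, 0.01)
            shard.append((x, y))
        shards.append(shard)
    return shards


def local_gradient(w, shard, batch_size, rng):
    """Worker step: mini-batch gradient of the squared error with respect to w."""
    batch = rng.sample(shard, batch_size)
    return sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size


def distributed_sgd(shards, iterations, lr, fail_prob, batch_size, seed=0):
    """Server loop: average the updates from whichever workers are up this round.

    Assumption (illustrative only): each worker is independently unavailable
    with probability fail_prob in each iteration, as a stand-in for preemption.
    Returns the final parameter and the number of rounds with no active worker.
    """
    rng = random.Random(seed)
    w = 0.0
    skipped = 0
    for _ in range(iterations):
        updates = [
            local_gradient(w, shard, batch_size, rng)
            for shard in shards
            if rng.random() >= fail_prob  # worker survived this round
        ]
        if not updates:
            skipped += 1  # nothing to aggregate; the round is wasted
            continue
        w -= lr * sum(updates) / len(updates)  # aggregate, then update globally
    return w, skipped


if __name__ == "__main__":
    data_rng = random.Random(42)
    shards = make_shards(num_workers=4, samples_per_worker=50, true_w=3.0, rng=data_rng)
    for p in (0.0, 0.5):
        w, skipped = distributed_sgd(shards, iterations=200, lr=0.5,
                                     fail_prob=p, batch_size=8)
        print(f"fail_prob={p}: w={w:.4f}, skipped rounds={skipped}")
```

In this toy setting, a higher `fail_prob` means fewer gradients are averaged per round and some rounds are lost entirely, which is the kind of accuracy and wall-clock trade-off the abstract refers to; the sketch does not model instance pricing.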


The ISL Colloquium meets weekly during the academic year. Seminars are held each Thursday at 4:30pm PT, unless indicated otherwise.

Until further notice, the ISL Colloquium convenes exclusively via Zoom (on Thursdays at 4:30pm PT) due to the ongoing pandemic. To avoid "Zoom-bombing", we ask attendees to input their email address here to receive the Zoom meeting details via email.


Carlee Joe-Wong is an Assistant Professor of Electrical and Computer Engineering at Carnegie Mellon University. She received her A.B., M.A., and Ph.D. degrees from Princeton University in 2011, 2013, and 2016, respectively. Dr. Joe-Wong's research is in optimizing networked systems, particularly in applying machine learning and pricing to data and computing networks. From 2013 to 2014, she was the Director of Advanced Research at DataMi, a startup she co-founded from her Ph.D. research on mobile data pricing. She has received several awards for her work, including the ARO Young Investigator Award in 2019, the NSF CAREER Award in 2018, and the INFORMS ISS Design Science Award in 2014.