
Information Systems Lab (ISL) Colloquium

ISL Colloquium presents Estimating the Information Flow in Deep Neural Networks

Topic: 
Estimating the Information Flow in Deep Neural Networks
Abstract / Description: 

This talk will discuss the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). We will explain this discrepancy between theory and experiments, and clarify what was actually measured by these past works.

To this end, an auxiliary (noisy) DNN framework will be introduced, in which I(X;T) is a meaningful quantity that depends on the network's parameters. We will show that this noisy framework is a good proxy for the original (deterministic) system, both in terms of performance and the learned representations. To accurately track I(X;T) over noisy DNNs, a differential entropy estimator tailored to exploit the DNN's layered structure will be developed, and theoretical guarantees on the associated minimax risk will be provided. Using this estimator along with a certain analogy to an information-theoretic communication problem, we will elucidate the geometric mechanism that drives compression of I(X;T) in noisy DNNs. Based on these findings, we will circle back to deterministic networks and explain what the past observations of compression were in fact showing. Future research directions inspired by this study, aiming to facilitate a comprehensive information-theoretic understanding of deep learning, will also be discussed.
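For a concrete feel for why injecting noise makes I(X;T) well defined, here is a minimal Python sketch (an illustration, not the estimator developed in the talk) for a single noisy layer T = f(X) + Z with isotropic Gaussian Z: the conditional entropy h(T|X) equals h(Z) in closed form, and h(T), the entropy of a Gaussian mixture centered at the clean layer outputs, is approximated by Monte Carlo. The tanh nonlinearity, the resubstitution estimate, and all names below are illustrative assumptions.

    import numpy as np

    def layer(x, W, b):
        """Deterministic hidden layer; tanh is an illustrative nonlinearity."""
        return np.tanh(x @ W + b)

    def mi_noisy_layer(X, W, b, sigma=0.1, n_mc=2000, seed=0):
        """Monte Carlo estimate of I(X; T) for T = layer(X) + N(0, sigma^2 I).

        With additive Gaussian noise, h(T | X) = h(Z) is known exactly, and
        h(T) is the entropy of a Gaussian mixture centered at the clean layer
        outputs, estimated here by sampling from the mixture."""
        rng = np.random.default_rng(seed)
        F = layer(X, W, b)                      # clean layer outputs, shape (n, d)
        n, d = F.shape
        h_cond = 0.5 * d * np.log(2 * np.pi * np.e * sigma**2)   # h(T|X) = h(Z)
        idx = rng.integers(n, size=n_mc)
        T = F[idx] + sigma * rng.standard_normal((n_mc, d))      # samples of T
        sq = ((T[:, None, :] - F[None, :, :]) ** 2).sum(-1)      # (n_mc, n)
        log_comp = -sq / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
        log_mix = np.logaddexp.reduce(log_comp, axis=1) - np.log(n)
        h_T = -log_mix.mean()                   # Monte Carlo estimate of h(T)
        return h_T - h_cond                     # I(X;T) in nats

    # toy usage: I(X;T) shrinks when the layer contracts its inputs
    X = np.random.default_rng(1).standard_normal((200, 4))
    W, b = np.random.default_rng(2).standard_normal((4, 3)), np.zeros(3)
    print(mi_noisy_layer(X, W, b), mi_noisy_layer(X, 0.1 * W, b))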

Date and Time: 
Friday, November 9, 2018 - 1:15pm
Venue: 
Packard 202

ISL Colloquium presents When do neural networks have bad local minima, and when not?

Topic: 
When do neural networks have bad local minima, and when not?
Abstract / Description: 

To explain the recent success of neural networks, researchers have conjectured that all local minima are global minima despite the non-convexity of the problem. Is this really true? Is this just hand-wavy intuition that holds roughly in special cases, or can it be made a rigorous result in a broad setting?

In this talk, instead of explaining "why neural nets are good", we try to understand "when neural nets are good, and when not", with "good" restricted to mean "every local minimum is a global minimum". We focus on the binary classification problem and discuss how the architecture and the data affect the loss landscape. On the positive side, we prove that no bad local minima exist under reasonable assumptions on the neuron types, the network structure, the loss function, and the dataset. On the negative side, we provide dozens of counterexamples that show the necessity of most of these assumptions.

Our approach can be viewed as a game of "local-min attack" and "defense". An attacker tries to construct examples in which bad local minima exist, and the defender modifies the setting to eliminate them. For instance, the attacker constructs bad local minima for a one-hidden-layer ReLU network with linearly separable data; the defender then proves that smooth versions of ReLU eliminate them. Finally, we present a strong defense, consisting of a special neuron and a special regularizer, that can eliminate bad local minima for a deep neural net in the realizable case.
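To make the flavor of the ReLU counterexamples concrete (a standard illustration, not necessarily the construction used in the talk), the following Python sketch exhibits a suboptimal stationary point of a one-hidden-layer ReLU network on linearly separable data: when every pre-activation is negative on every training point, all ReLUs are dead, every gradient vanishes, and the loss is stuck far above the global optimum. The numbers and names are illustrative.

    import numpy as np

    def loss_and_grads(W, b, v, X, y):
        """Logistic loss of the one-hidden-layer ReLU net f(x) = v . relu(W x + b)
        for labels y in {-1, +1}, together with the parameter gradients."""
        pre = X @ W.T + b                            # (n, h) pre-activations
        act = np.maximum(pre, 0.0)                   # ReLU
        f = act @ v                                  # (n,) scores
        loss = np.log1p(np.exp(-y * f)).mean()
        s = -y / (1.0 + np.exp(y * f)) / len(y)      # dloss/df
        g_v = act.T @ s
        g_pre = (s[:, None] * v[None, :]) * (pre > 0)
        return loss, g_pre.T @ X, g_pre.sum(0), g_v  # loss, dW, db, dv

    # linearly separable one-dimensional data
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([-1.0, -1.0, 1.0, 1.0])

    # "dead" parameters: every pre-activation is negative on every example, so
    # the network outputs 0 everywhere and every gradient is exactly zero;
    # yet the loss, log 2, is far from the near-zero global optimum
    W, b, v = np.array([[-1.0], [-1.0]]), np.array([-5.0, -5.0]), np.array([1.0, -1.0])
    print(loss_and_grads(W, b, v, X, y))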

Joint work with Shiyu Liang, Yixuan Li, Jason Lee and R. Srikant.

Date and Time: 
Thursday, November 8, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents Taming the Devil of Gradient-based Optimization Methods with the Angel of Differential Equations

Topic: 
Taming the Devil of Gradient-based Optimization Methods with the Angel of Differential Equations
Abstract / Description: 

In this talk, we use ordinary differential equations to model, analyze, and interpret gradient-based optimization methods. In the first part of the talk, we derive a second-order ODE that is the limit of Nesterov's accelerated gradient method for non-strongly convex objectives (NAG-C). The continuous-time ODE is shown to allow for a better understanding of NAG-C and, as a byproduct, we obtain a family of accelerated methods with similar convergence rates.

In the second part, we begin by recognizing that existing ODEs in the literature are inadequate for distinguishing between two fundamentally different methods, Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method. In response, we derive high-resolution ODEs as more accurate surrogates for the three aforementioned methods. These novel ODEs can be integrated into a general framework that allows for a fine-grained analysis of the discrete optimization algorithms by translating properties of the amenable ODEs into properties of their discrete counterparts. As the first application of this framework, we identify the effect of a term, referred to as the 'gradient correction', that is present in NAG-SC but not in the heavy-ball method, shedding light on why the former achieves acceleration while the latter does not. Moreover, in this high-resolution ODE framework, NAG-C is shown to minimize the squared gradient norm at an inverse cubic rate, which is the sharpest known rate for NAG-C itself. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the same accelerated convergence rates as NAG-C for smooth convex functions. This is based on joint work with Stephen Boyd, Emmanuel Candes, Simon Du, Michael Jordan, and Bin Shi.
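For reference, the low-resolution ODE from the first part of the talk has the following well-known form (written here in the standard convention, with step size s and time scaling t ≈ k√s):

    % NAG-C iterates for a smooth convex f with step size s:
    %   x_{k+1} = y_k - s\,\nabla f(y_k), \qquad
    %   y_k     = x_k + \tfrac{k-1}{k+2}\,(x_k - x_{k-1}).
    % Letting s -> 0 with continuous time t \approx k\sqrt{s} yields
    \[
      \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
      \qquad X(0) = x_0, \quad \dot{X}(0) = 0,
    \]
    % whose solutions satisfy f(X(t)) - f^\star = O(1/t^2), mirroring the
    % O(1/k^2) rate of the discrete method.  The high-resolution ODEs of the
    % second part retain O(\sqrt{s}) terms, which is what allows them to
    % separate NAG-SC (gradient-correction term) from the heavy-ball method.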

Date and Time: 
Thursday, November 1, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Battling Demons in Peer Review

Topic: 
Battling Demons in Peer Review
Abstract / Description: 

Peer review is the backbone of scholarly research. It is however faced with a number of challenges (or "demons") such as subjectivity, bias/miscalibration, noise, and strategic behavior. The growing number of submissions in many areas of research such as machine learning has significantly increased the scale of these demons. This talk will present some principled and practical approaches to battle these demons in peer review:

(1) Subjectivity: How to ensure that all papers are judged by the same yardstick?

(2) Bias/miscalibration: How to use ratings in the presence of arbitrary or adversarial miscalibration?

(3) Noise: How to assign reviewers to papers to simultaneously ensure fair and accurate evaluations in the presence of review noise?

(4) Strategic behavior: How to insulate peer review from strategic behavior of author-reviewers?

The work uses tools from statistics and learning theory, social choice theory, information theory, game theory and decision theory. (No prior knowledge on these topics will be assumed.)


The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Thursday at 4:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Martin Zhang, Farzan Farnia, and Zhengyuan Zhou. To suggest speakers, please contact any of the students.

Date and Time: 
Thursday, October 25, 2018 - 4:15pm
Venue: 
Packard 202

ISL Colloquium: New Breakthroughs in Scheduling Theory

Topic: 
New Breakthroughs in Scheduling Theory
Abstract / Description: 

Scheduling policies are at the heart of computer systems. The right scheduling policy can dramatically reduce response times, ensure fairness, provide class-based priority, etc., without requiring additional resources. While stochastic response time analysis of different scheduling policies has been the focus of thousands of theoretical papers, results are limited to analyzing a relatively small number of "simple" scheduling policies.

In this talk, we introduce the SOAP class of scheduling policies: Schedule Ordered by Age-based Priority. The SOAP policies include almost all scheduling policies in the literature, as well as an infinite number of variants that have never been analyzed, or perhaps even conceived. SOAP policies include the highly intractable SERPT policy (Shortest-Expected-Remaining-Processing-Time) as well as the famously complex Gittins Index policy. SOAP also includes all sorts of "combination" policies, like a policy that performs Shortest-Job-First on jobs with known size and Foreground-Background scheduling on jobs with unknown size, or a policy that is allowed to preempt jobs only when they hit certain ages. In this talk, we present a stochastic response time analysis of all SOAP policies via a single universal framework.
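The defining feature of a SOAP policy is easy to state in code: each job gets a rank that depends only on its class and its current age (attained service), and the server always runs the job of lowest rank. The following Python sketch is illustrative only; the rank functions and the way the two classes are mixed onto one scale are assumptions, not the paper's analysis. It encodes the "combination" policy mentioned above: SJF-style ranks for jobs with known size, Foreground-Background (rank equals age) for the rest.

    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        size: float          # actual service requirement
        known: bool          # is the size known to the scheduler?
        age: float = 0.0     # attained service so far

    def rank(job):
        """SOAP-style rank: a function of the job's class and age only.
        Known-size jobs: rank = size (Shortest-Job-First flavor).
        Unknown-size jobs: rank = age (Foreground-Background flavor).
        Lower rank means higher priority."""
        return job.size if job.known else job.age

    def run(jobs, dt=0.01):
        """Serve the lowest-rank job in each small time slice."""
        done, t = [], 0.0
        while jobs:
            job = min(jobs, key=rank)
            job.age += dt
            t += dt
            if job.age >= job.size - 1e-9:
                done.append((job.name, round(t, 2)))
                jobs.remove(job)
        return done

    jobs = [Job("short-known", 1.0, True),
            Job("long-known", 3.0, True),
            Job("unknown", 2.0, False)]
    print(run(jobs))    # completion order and times under the combined policy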

Joint work with: Ziv Scully and Alan Scheller-Wolf

Appeared in ACM SIGMETRICS/POMACS 2018. APS 2018 Best Student Paper Finalist.



Date and Time: 
Thursday, October 11, 2018 - 4:15pm
Venue: 
Packard 202

General Strong Polarization

Topic: 
General Strong Polarization
Abstract / Description: 

A martingale is a sequence of random variables whose expected future value, conditioned on the past, equals its current value. A [0,1]-bounded martingale is said to polarize if it converges in the limit to either 0 or 1 with probability 1. A martingale is said to polarize strongly if in t steps it is sub-exponentially close to its limit with all but exponentially small probability. In 2008, Arikan built a powerful class of error-correcting codes called Polar codes. The essence of his theory associates a martingale with every invertible square matrix over a field (and a channel) and shows that polarization of the martingale leads to a construction of codes that converge to Shannon capacity. In 2013, Guruswami and Xia, and independently Hassani et al., showed that strong polarization of the Arikan martingale leads to codes that converge to Shannon capacity at finite block lengths, specifically at lengths that are inverse polynomial in the gap to capacity, thereby resolving a major mathematical challenge associated with the attainment of Shannon capacity.

We show that a simple necessary condition for an invertible matrix to polarize over any non-trivial channel is also sufficient for strong polarization over all symmetric channels over all prime fields. Previously, the only matrix known to polarize strongly was the 2×2 Hadamard matrix. Besides its generality, our result also leads to arguably simpler proofs. The essence of our proof is a "local definition" of polarization, which only restricts the evolution of the martingale in a single step, and a general theorem showing that local polarization suffices for strong polarization.
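For intuition, the Arikan martingale is easy to simulate for the binary erasure channel, where one polarization step maps the erasure probability Z to Z^2 or to 2Z - Z^2 with probability 1/2 each; since E[Z_{n+1} | Z_n] = Z_n, this is a [0,1]-bounded martingale. The Python sketch below (a toy illustration, not part of the talk's proof) shows it polarizing.

    import random

    def polarize_bec(eps=0.5, steps=32, trials=20000, seed=0):
        """Simulate the Arikan martingale for a BEC with erasure probability eps.

        Each step applies Z -> Z*Z (the 'good' branch) or Z -> 2Z - Z*Z (the
        'bad' branch), chosen uniformly at random.  Returns the fraction of
        trajectories that end within 1e-3 of 0 or 1 after `steps` steps."""
        rng = random.Random(seed)
        near_boundary = 0
        for _ in range(trials):
            z = eps
            for _ in range(steps):
                z = z * z if rng.random() < 0.5 else 2 * z - z * z
            if z < 1e-3 or z > 1 - 1e-3:
                near_boundary += 1
        return near_boundary / trials

    print(polarize_bec())   # close to 1: the vast majority of trajectories polarize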

In this talk I will introduce polarization and polar codes and, time permitting, present a full proof of our main theorem. No prior background on polar codes will be assumed.

 

Based on joint work with Jaroslaw Blasiok, Venkatesan Guruswami, Preetum Nakkiran and Atri Rudra.

Date and Time: 
Friday, October 5, 2018 - 3:00pm to Saturday, October 6, 2018 - 2:55pm
Venue: 
Gates 463A

ISL Colloquium: A Differential View of Reliable Communications

Topic: 
A Differential View of Reliable Communications
Abstract / Description: 

This talk introduces a "differential" approach to information theory. In contrast to the more traditional "elemental" approach, in which we work to understand communication networks by studying the behavior of their elements in isolation, the differential approach works to understand the impact that individual components can have on the larger networks in which they are employed. Results achieved through this differential viewpoint highlight some startling facts about network communications, including both opportunities, where even very small changes to a communication network can have a big impact on network performance, and vulnerabilities, where small failures can cause great harm.

Date and Time: 
Thursday, May 31, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Finite Sample Guarantees for Control of an Unknown Linear Dynamical System

Topic: 
Finite Sample Guarantees for Control of an Unknown Linear Dynamical System
Abstract / Description: 

In principle, control of a physical system is accomplished by first deriving a faithful model of the underlying dynamics from first principles, and then solving an optimal control problem with the modeled dynamics. In practice, the system may be too complex to precisely characterize, and an appealing alternative is to instead collect trajectories of the system and fit a model of the dynamics from the data. How many samples are needed for this to work? How sub-optimal is the resulting controller?

In this talk, I will shed light on these questions when the underlying dynamical system is linear and the control objective is quadratic, a classic optimal control problem known as the Linear Quadratic Regulator. Despite the simplicity of linear dynamical systems, deriving finite-time guarantees for both system identification and controller performance is non-trivial. I will first talk about our results in the "one-shot" setting, where measurements are collected offline, a model is estimated from the data, and a controller is synthesized using the estimated model together with confidence bounds. Then, I will discuss our recent work on guarantees in the online regret setting, where the noise injected into the system for learning the dynamics must be traded off against state regulation.
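As a concrete anchor for the "one-shot" setting, here is a minimal Python sketch of its first step (names, dimensions, and parameters are illustrative, and no confidence bounds or controller synthesis are included): excite an assumed linear system x_{t+1} = A x_t + B u_t + w_t with random inputs, then estimate (A, B) by ordinary least squares over the recorded transitions.

    import numpy as np

    def collect_rollouts(A, B, n_rollouts=200, horizon=6, noise_std=0.1, seed=0):
        """Excite the true system x_{t+1} = A x_t + B u_t + w_t with i.i.d.
        Gaussian inputs and record all observed transitions."""
        rng = np.random.default_rng(seed)
        n, p = A.shape[0], B.shape[1]
        X, U, Xn = [], [], []
        for _ in range(n_rollouts):
            x = np.zeros(n)
            for _ in range(horizon):
                u = rng.standard_normal(p)
                x_next = A @ x + B @ u + noise_std * rng.standard_normal(n)
                X.append(x); U.append(u); Xn.append(x_next)
                x = x_next
        return np.array(X), np.array(U), np.array(Xn)

    def least_squares_sysid(X, U, Xn):
        """Fit [A_hat, B_hat] by regressing x_{t+1} on (x_t, u_t)."""
        Z = np.hstack([X, U])                        # (N, n + p) regressors
        Theta, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
        n = X.shape[1]
        return Theta[:n].T, Theta[n:].T              # A_hat, B_hat

    # toy usage on a two-state, one-input system
    A = np.array([[1.01, 0.1], [0.0, 0.99]])
    B = np.array([[0.0], [1.0]])
    A_hat, B_hat = least_squares_sysid(*collect_rollouts(A, B))
    print(np.round(A_hat, 2), np.round(B_hat, 2))    # close to the true (A, B)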

This talk is based on joint work with Sarah Dean, Horia Mania, Nikolai Matni, and Benjamin Recht.

Date and Time: 
Thursday, May 24, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium & IT Forum: Random initialization and implicit regularization in nonconvex statistical estimation

Topic: 
Random initialization and implicit regularization in nonconvex statistical estimation
Abstract / Description: 

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation and learning problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require suitable initialization and proper regularization (e.g., trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory is often either far from optimal or lacking altogether.

This talk is concerned with a striking phenomenon arising in two nonconvex problems (phase retrieval and matrix completion): even in the absence of careful initialization, proper saddle escaping, and/or explicit regularization, gradient descent converges to the optimal solution within a logarithmic number of iterations, thus achieving near-optimal statistical and computational guarantees at once. All of this is achieved by exploiting the statistical models when analyzing the optimization algorithms, via a leave-one-out approach that decouples the statistical dependence between the gradient descent iterates and the data. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal entrywise error control.
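To illustrate the phenomenon on one of the two problems, here is a minimal Python sketch of vanilla gradient descent with random initialization on a (real-valued) phase retrieval loss; there is no spectral initialization and no explicit regularization. The step size, problem sizes, and names are illustrative choices, not those analyzed in the talk.

    import numpy as np

    def phase_retrieval_gd(a, y, step=0.1, iters=500, seed=0):
        """Vanilla gradient descent with random initialization on
        f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2."""
        m, n = a.shape
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(n) / np.sqrt(n)       # random start
        for _ in range(iters):
            ax = a @ x
            grad = (a * ((ax ** 2 - y) * ax)[:, None]).mean(0)
            x = x - step * grad
        return x

    # toy instance: recover x_star (up to a global sign) from m = 10n measurements
    rng = np.random.default_rng(1)
    n, m = 20, 200
    x_star = rng.standard_normal(n); x_star /= np.linalg.norm(x_star)
    a = rng.standard_normal((m, n))
    y = (a @ x_star) ** 2
    x_hat = phase_retrieval_gd(a, y)
    print(min(np.linalg.norm(x_hat - x_star), np.linalg.norm(x_hat + x_star)))
    # small: gradient descent from a random start finds the planted signal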

Date and Time: 
Wednesday, May 23, 2018 - 4:15pm
Venue: 
Building 370

ISL Colloquium: Low-dimensional Structures and Deep Models for High-dimensional Data

Topic: 
Low-dimensional Structures and Deep Models for High-dimensional Data
Abstract / Description: 

In this talk, we will discuss a class of models and techniques that can effectively model and extract rich low-dimensional structures in high-dimensional data such as images and videos, despite nonlinear transformations, gross corruption, or severely compressed measurements. This work leverages recent advances in convex optimization from Compressive Sensing for recovering low-rank or sparse signals, which provide both strong theoretical guarantees and efficient, scalable algorithms for solving such high-dimensional combinatorial problems. We illustrate how these new mathematical models and tools could fundamentally change how we approach many challenging tasks in computer vision, image processing, and pattern recognition. We will also illustrate some emerging applications of these tools to other data types such as 3D range data, web documents, image tags, bioinformatics data, and audio/music analysis. Throughout the talk, we will discuss strong connections between algorithms from Compressive Sensing and other popular data-driven methods such as Deep Neural Networks, providing some new perspectives for understanding Deep Learning.
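One representative convex program from this line of work is Principal Component Pursuit (robust PCA), which splits an observed matrix into a low-rank part plus sparse gross corruption. The Python sketch below uses a standard augmented-Lagrangian iteration with the usual default parameters; it is a minimal illustration, and the parameter choices and fixed iteration count are simplified assumptions.

    import numpy as np

    def svt(X, tau):
        """Singular-value thresholding: the prox of the nuclear norm."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def shrink(X, tau):
        """Elementwise soft thresholding: the prox of the l1 norm."""
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def robust_pca(M, lam=None, mu=None, iters=200):
        """Principal Component Pursuit via an augmented-Lagrangian iteration:
            minimize ||L||_* + lam * ||S||_1   subject to   L + S = M."""
        m, n = M.shape
        lam = lam or 1.0 / np.sqrt(max(m, n))
        mu = mu or 0.25 * m * n / (np.abs(M).sum() + 1e-12)
        S, Y = np.zeros_like(M), np.zeros_like(M)
        for _ in range(iters):
            L = svt(M - S + Y / mu, 1.0 / mu)          # low-rank update
            S = shrink(M - L + Y / mu, lam / mu)       # sparse update
            Y = Y + mu * (M - L - S)                   # dual update
        return L, S

    # toy usage: a rank-2 matrix plus 5% gross corruption
    rng = np.random.default_rng(0)
    L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
    S0 = (rng.random((50, 50)) < 0.05) * 10 * rng.standard_normal((50, 50))
    L_hat, S_hat = robust_pca(L0 + S0)
    print(np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))   # small relative error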

This is joint work with John Wright of Columbia, Emmanuel Candes of Stanford, Zhouchen Lin of Peking University, Shenghua Gao of ShanghaiTech, and my former students Zhengdong Zhang of MIT, Xiao Liang of Tsinghua University, Arvind Ganesh, Zihan Zhou, Kerui Min of UIUC.

Date and Time: 
Thursday, May 3, 2018 - 4:15pm
Venue: 
Packard 101
