Information Systems Lab (ISL) Colloquium

ISL Colloquium: Battling Demons in Peer Review

Topic: 
Battling Demons in Peer Review
Abstract / Description: 

Peer review is the backbone of scholarly research. It is, however, faced with a number of challenges (or "demons"), such as subjectivity, bias/miscalibration, noise, and strategic behavior. The growing number of submissions in many areas of research, such as machine learning, has significantly increased the scale of these demons. This talk will present some principled and practical approaches to battling these demons in peer review:

(1) Subjectivity: How to ensure that all papers are judged by the same yardstick?

(2) Bias/miscalibration: How to use ratings in the presence of arbitrary or adversarial miscalibration?

(3) Noise: How to assign reviewers to papers to simultaneously ensure fair and accurate evaluations in the presence of review noise?

(4) Strategic behavior: How to insulate peer review from strategic behavior of author-reviewers?

The work uses tools from statistics and learning theory, social choice theory, information theory, game theory, and decision theory. (No prior knowledge of these topics will be assumed.)


The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Thursday at 4:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Martin Zhang, Farzan Farnia, and Zhengyuan Zhou. To suggest speakers, please contact any of the students.

Date and Time: 
Thursday, October 25, 2018 - 4:15pm
Venue: 
Packard 202

ISL Colloquium: New Breakthroughs in Scheduling Theory

Topic: 
New Breakthroughs in Scheduling Theory
Abstract / Description: 

Scheduling policies are at the heart of computer systems. The right scheduling policy can dramatically reduce response times, ensure fairness, provide class-based priority, etc., without requiring additional resources. While stochastic response time analysis of different scheduling policies has been the focus of thousands of theoretical papers, results are limited to analyzing a relatively small number of "simple" scheduling policies.

In this talk, we introduce the SOAP class of scheduling policies: Schedule Ordered by Age-based Priority. The SOAP class includes almost all scheduling policies in the literature, as well as an infinite number of variants that have never been analyzed, or perhaps never even conceived. SOAP policies include the highly intractable SERPT policy (Shortest Expected Remaining Processing Time), as well as the famously complex Gittins index policy. SOAP also includes all sorts of "combination" policies, such as a policy that performs Shortest-Job-First on jobs with known size and Foreground-Background scheduling on jobs with unknown size, or a policy that is allowed to preempt jobs only when they hit certain ages. We present a stochastic response-time analysis of all SOAP policies via a single universal framework.
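As a toy illustration of why the choice of policy matters (our own minimal sketch, not from the talk; the job-size distribution, batch-arrival pattern, and the SJF/FCFS comparison are illustrative assumptions), the code below compares mean response time under First-Come-First-Served and Shortest-Job-First, one of the simplest size-based priorities that SOAP covers, for a batch of jobs arriving together:

```python
import random

def fcfs_response_times(sizes):
    """First-Come-First-Served: run jobs to completion in arrival order."""
    t, resp = 0.0, []
    for s in sizes:
        t += s
        resp.append(t)  # all jobs arrive at time 0, so response = completion
    return resp

def sjf_response_times(sizes):
    """Shortest-Job-First: the simplest size-based SOAP-style priority."""
    t, resp = 0.0, []
    for s in sorted(sizes):
        t += s
        resp.append(t)
    return resp

random.seed(0)
sizes = [random.expovariate(1.0) for _ in range(1000)]
mean = lambda xs: sum(xs) / len(xs)
print(mean(fcfs_response_times(sizes)), mean(sjf_response_times(sizes)))
```

Both policies finish all work at the same time, but SJF gives a markedly smaller mean response time, which is the kind of effect a universal SOAP-style analysis quantifies in the stochastic setting.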

Joint work with: Ziv Scully and Alan Scheller-Wolf

Appeared in ACM SIGMETRICS/POMACS 2018. APS 2018 Best Student Paper Finalist.



Date and Time: 
Thursday, October 11, 2018 - 4:15pm
Venue: 
Packard 202

General Strong Polarization

Topic: 
General Strong Polarization
Abstract / Description: 

A martingale is a sequence of random variables whose expected future value, conditioned on the past, equals the present value. A [0,1]-bounded martingale is said to polarize if it converges in the limit to either 0 or 1 with probability 1. A martingale is said to polarize strongly if, in t steps, it is sub-exponentially close to its limit with all but exponentially small probability. In 2008, Arikan introduced a powerful class of error-correcting codes called polar codes. The essence of his theory is to associate a martingale with every invertible square matrix over a field (and a channel), and to show that polarization of the martingale leads to a construction of codes that converge to Shannon capacity. In 2013, Guruswami and Xia, and independently Hassani et al., showed that strong polarization of the Arikan martingale leads to codes that converge to Shannon capacity at finite block lengths, specifically at lengths that are inverse polynomial in the gap to capacity, thereby resolving a major mathematical challenge associated with the attainment of Shannon capacity.

We show that a simple necessary condition for an invertible matrix to polarize over any non-trivial channel is also sufficient for strong polarization over all symmetric channels over all prime fields. Previously, the only matrix known to polarize strongly was the 2×2 Hadamard matrix. In addition to its generality, our result also leads to arguably simpler proofs. The essence of our proof is a "local definition" of polarization, which only restricts the evolution of the martingale in a single step, together with a general theorem showing that local polarization suffices for strong polarization.
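The binary erasure channel (BEC) makes the Arikan martingale concrete. The following is our own hedged sketch (the 2×2 Hadamard kernel over the BEC is the standard textbook instance, not code from this work): an erasure probability z splits into z² and 2z − z², each with probability 1/2, so it is a [0,1]-bounded martingale, and simulated trajectories pile up near 0 and 1:

```python
import random

def arikan_bec_trajectory(z0=0.5, steps=40, seed=0):
    """One trajectory of the Arikan martingale for the binary erasure
    channel: under the 2x2 Hadamard kernel, erasure probability z maps to
    z**2 (the better synthetic channel) or 2*z - z**2 (the worse one),
    each with probability 1/2. Their average is z, so z_t is a martingale."""
    rng = random.Random(seed)
    z = z0
    for _ in range(steps):
        z = z * z if rng.random() < 0.5 else 2 * z - z * z
    return z

samples = [arikan_bec_trajectory(seed=s) for s in range(2000)]
frac = sum(1 for z in samples if z < 1e-6 or z > 1 - 1e-6) / len(samples)
print(frac)  # fraction of trajectories already very close to 0 or 1
```

The empirical mean of the samples stays near the starting value 0.5 (the martingale property), while the mass concentrates at the two endpoints (polarization).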

In this talk I will introduce polarization and polar codes and, time permitting, present a full
proof of our main theorem. No prior background on polar codes will be assumed.

 

Based on joint work with Jaroslaw Blasiok, Venkatesan Guruswami, Preetum Nakkiran and Atri Rudra.

Date and Time: 
Friday, October 5, 2018 - 3:00pm
Venue: 
Gates 463A

ISL Colloquium: A Differential View of Reliable Communications

Topic: 
A Differential View of Reliable Communications
Abstract / Description: 

This talk introduces a "differential" approach to information theory. In contrast to the more traditional "elemental" approach, in which we work to understand communication networks by studying the behavior of their elements in isolation, the differential approach works to understand the impact components can have on the larger networks in which they are employed. Results achieved through this differential viewpoint highlight some startling facts about network communications -- including both opportunities where even very small changes to a communication network can have a big impact on network performance and vulnerabilities where small failures can cause big harm.

Date and Time: 
Thursday, May 31, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Finite Sample Guarantees for Control of an Unknown Linear Dynamical System

Topic: 
Finite Sample Guarantees for Control of an Unknown Linear Dynamical System
Abstract / Description: 

In principle, control of a physical system is accomplished by first deriving a faithful model of the underlying dynamics from first principles, and then solving an optimal control problem with the modeled dynamics. In practice, the system may be too complex to precisely characterize, and an appealing alternative is to instead collect trajectories of the system and fit a model of the dynamics from the data. How many samples are needed for this to work? How sub-optimal is the resulting controller?

In this talk, I will shed light on these questions when the underlying dynamical system is linear and the control objective is quadratic, a classic optimal control problem known as the Linear Quadratic Regulator. Despite the simplicity of linear dynamical systems, deriving finite-time guarantees for both system identification and controller performance is non-trivial. I will first talk about our results in the "one-shot" setting, where measurements are collected offline, a model is estimated from the data, and a controller is synthesized using the estimated model with confidence bounds. Then, I will discuss our recent work on guarantees in the online regret setting, where noise injected into the system for learning the dynamics needs to trade-off with state regulation.
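As a rough illustration of the "one-shot" setting described above (our own minimal sketch; the system matrices, noise level, and rollout scheme are illustrative assumptions, not the paper's experiments), one can excite an unknown linear system with random inputs and fit (A, B) jointly by least squares:

```python
import numpy as np

def estimate_dynamics(A, B, n_rollouts=500, horizon=6, noise=0.01, seed=0):
    """One-shot system identification: excite x_{t+1} = A x_t + B u_t + w_t
    with i.i.d. Gaussian inputs, then fit (A, B) jointly by least squares
    on the regression x_{t+1} ~ [x_t; u_t]."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    Z, X = [], []  # regressors [x_t; u_t] and targets x_{t+1}
    for _ in range(n_rollouts):
        x = np.zeros(n)
        for _ in range(horizon):
            u = rng.normal(size=m)
            x_next = A @ x + B @ u + noise * rng.normal(size=n)
            Z.append(np.concatenate([x, u]))
            X.append(x_next)
            x = x_next
    Theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X), rcond=None)
    AB = Theta.T  # row i gives the fitted dynamics of state coordinate i
    return AB[:, :n], AB[:, n:]

A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
B_true = np.array([[0.0], [0.5]])
A_hat, B_hat = estimate_dynamics(A_true, B_true)
print(np.linalg.norm(A_hat - A_true), np.linalg.norm(B_hat - B_true))
```

The finite-sample question in the talk is precisely how the estimation errors printed here, and the resulting controller sub-optimality, scale with the amount of data.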

This talk is based on joint work with Sarah Dean, Horia Mania, Nikolai Matni, and Benjamin Recht.

Date and Time: 
Thursday, May 24, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium & IT Forum: Random initialization and implicit regularization in nonconvex statistical estimation

Topic: 
Random initialization and implicit regularization in nonconvex statistical estimation
Abstract / Description: 

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation/learning problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require suitable initialization and proper regularization (e.g., trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory is often either far from optimal or lacking theoretical guarantees altogether.

This talk is concerned with a striking phenomenon arising in two nonconvex problems (i.e. phase retrieval and matrix completion): even in the absence of careful initialization, proper saddle escaping, and/or explicit regularization, gradient descent converges to the optimal solution within a logarithmic number of iterations, thus achieving near-optimal statistical and computational guarantees at once. All of this is achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal entrywise error control.
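A minimal sketch of the phenomenon for (real-valued) phase retrieval, under our own illustrative parameters (dimension, sample size, step size, and initialization scale are assumptions, not the paper's setup): vanilla gradient descent from a small random initialization, with no spectral initialization and no explicit regularization, typically recovers the signal up to a global sign:

```python
import numpy as np

def phase_retrieval_gd(n=20, m=200, iters=1000, lr=0.1, seed=0):
    """Vanilla gradient descent from a small random initialization on the
    (real-valued) phase-retrieval loss
        f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2,
    with no spectral initialization and no explicit regularization."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(m, n))        # Gaussian sensing vectors a_i
    x_star = rng.normal(size=n)
    x_star /= np.linalg.norm(x_star)   # unit-norm ground truth
    y = (A @ x_star) ** 2              # magnitude-only measurements
    x = 0.1 * rng.normal(size=n) / np.sqrt(n)  # small random initialization
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m  # gradient of f
        x -= lr * grad
    # the signal is identifiable only up to a global sign
    return min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))

print(phase_retrieval_gd())
```

The printed recovery error is small despite the random start, which is the "striking phenomenon" the talk explains via the leave-one-out analysis.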

Date and Time: 
Wednesday, May 23, 2018 - 4:15pm
Venue: 
Building 370

ISL Colloquium: Low-dimensional Structures and Deep Models for High-dimensional Data

Topic: 
Low-dimensional Structures and Deep Models for High-dimensional Data
Abstract / Description: 

In this talk, we will discuss a class of models and techniques that can effectively model and extract rich low-dimensional structures in high-dimensional data such as images and videos, despite nonlinear transformation, gross corruption, or severely compressed measurements. This work leverages recent advances in convex optimization from Compressive Sensing for recovering low-rank or sparse signals, which provide both strong theoretical guarantees and efficient, scalable algorithms for solving such high-dimensional combinatorial problems. We illustrate how these new mathematical models and tools could bring disruptive changes to many challenging tasks in computer vision, image processing, and pattern recognition. We will also illustrate some emerging applications of these tools to other data types such as 3D range data, web documents, image tags, bioinformatics data, audio/music analysis, etc. Throughout the talk, we will discuss strong connections of algorithms from Compressive Sensing with other popular data-driven methods such as Deep Neural Networks, providing some new perspectives on understanding Deep Learning.
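As one concrete instance of the sparse-recovery machinery mentioned above (our own minimal sketch; the problem sizes and regularization weight are illustrative assumptions), iterative soft-thresholding (ISTA), a workhorse algorithm of Compressive Sensing, recovers a sparse vector from compressed linear measurements:

```python
import numpy as np

def ista(A, y, lam=0.05, iters=500):
    """Iterative soft-thresholding (ISTA) for the lasso problem
    min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - (A.T @ (A @ x - y)) / L  # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage
    return x

rng = np.random.default_rng(0)
n, d, k = 60, 200, 5                     # measurements, ambient dim, sparsity
A = rng.normal(size=(n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[rng.choice(d, k, replace=False)] = rng.normal(size=k)
y = A @ x_true                           # 60 measurements of a 200-dim signal
x_hat = ista(A, y)
print(np.linalg.norm(x_hat - x_true))
```

Even with far fewer measurements than dimensions, the sparse signal is recovered to small error, which is the kind of guarantee the convex-optimization theory makes precise.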

This is joint work with John Wright of Columbia, Emmanuel Candes of Stanford, Zhouchen Lin of Peking University, Shenghua Gao of ShanghaiTech, and my former students Zhengdong Zhang of MIT, Xiao Liang of Tsinghua University, Arvind Ganesh, Zihan Zhou, Kerui Min of UIUC.

Date and Time: 
Thursday, May 3, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Reinforcement Learning: Hidden Theory, and New Super-Fast Algorithms

Topic: 
Reinforcement Learning: Hidden Theory, and New Super-Fast Algorithms
Abstract / Description: 

Stochastic approximation (SA) algorithms are used to approximate solutions to fixed-point equations that involve expectations of functions with respect to possibly unknown distributions. The most famous examples today are the TD- and Q-learning algorithms. The first half of this lecture will provide an overview of stochastic approximation, with a focus on optimizing the rate of convergence. A new approach to optimizing this rate leads to the new Zap Q-learning algorithm. Analysis suggests that its transient behavior is a close match to a deterministic Newton-Raphson implementation, and numerical experiments confirm super-fast convergence.
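To ground the stochastic-approximation viewpoint, here is a minimal Robbins-Monro sketch (our own toy example, not the Zap algorithm): solve the fixed-point equation E[f(θ, W)] = 0 with f(θ, w) = w − θ, whose solution is θ* = E[W], using step sizes a_n = 1/n, which satisfy the classical conditions Σ a_n = ∞ and Σ a_n² < ∞:

```python
import random

def robbins_monro(sample, theta0=0.0, n_steps=100_000, seed=0):
    """Robbins-Monro stochastic approximation for E[f(theta, W)] = 0 with
    f(theta, w) = w - theta, i.e. theta* = E[W]. With step sizes a_n = 1/n
    the iteration reduces exactly to the running sample mean."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        w = sample(rng)
        theta += (w - theta) / n  # theta_{n+1} = theta_n + a_n f(theta_n, W_n)
    return theta

# W ~ Uniform(0, 2), so theta* = E[W] = 1
print(robbins_monro(lambda rng: rng.uniform(0.0, 2.0)))
```

TD- and Q-learning have the same structure with a more interesting f; the slow 1/√n-type convergence of such schemes is what the Zap gain matrix is designed to accelerate.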

Date and Time: 
Tuesday, May 1, 2018 - 4:30pm
Venue: 
Packard 101

ISL Seminar: Inventing Algorithms via Deep Learning

Topic: 
Inventing Algorithms via Deep Learning
Abstract / Description: 

Deep learning is a part of daily life, owing to its successes in computer vision and natural language processing. In these applications, the success of the model-free deep learning approach can be attributed to the lack of a (mathematical) generative model. In yet other applications, the data is generated by a simple model, the performance criterion is mathematically precise, and training/test samples are infinitely abundant, but the space of algorithmic choices is enormous (example: chess). Deep learning has recently shown strong promise in these problems too (example: AlphaZero). In this talk, we study two canonical problems of great scientific and engineering interest through the lens of deep learning.

The first is reliable communication over noisy media, where we successfully revisit classical open problems in information theory: we show that creatively trained and architected neural networks can beat the state of the art on the AWGN channel with noisy feedback, with a 100-fold improvement in bit error rate.

The second is optimization and classification problems on graphs, where the key algorithmic challenge is performance that scales to graphs of arbitrary size. Representing graphs as randomized nonlinear dynamical systems via recurrent neural networks, we show that creative adversarial training allows one to train on small graphs and test on much larger ones (100-1000x), with approximation ratios that rival the state of the art on a variety of optimization problems across the complexity-theoretic hardness spectrum.

Apart from its obvious practical value, this study of mathematically precise problems sheds light on the mysteries of deep learning methods: training example choices, architectural design decisions, and loss function/learning methodologies. Our (mostly) empirical research is conducted against the backdrop of a theoretical research program on understanding gated neural networks (e.g., attention networks, GRUs, LSTMs); we show the first provably (globally) consistent algorithms to recover the parameters of a classical gated neural network architecture: the mixture of experts (MoE).

Date and Time: 
Thursday, April 26, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium: Reinforcement Learning without Reinforcement

Topic: 
Reinforcement Learning without Reinforcement
Abstract / Description: 

Reinforcement Learning (RL) is concerned with solving sequential decision-making problems in the presence of uncertainty. RL is really about two problems together. The first is the 'Bellman problem': finding the optimal policy given the model, which may involve large state spaces. Various approximate dynamic programming and RL schemes have been developed, but they either come with no guarantees, are not universal, or are rather slow. In fact, most RL algorithms have become synonymous with stochastic approximation (SA) schemes that are known to be rather slow. The problem is even more difficult for MDPs with continuous state (and action) spaces. We present a class of non-SA algorithms for reinforcement learning in continuous-state-space MDP problems based on 'empirical' ideas, which are simple, effective, and yet universal, with probabilistic guarantees. The idea involves randomized kernel-based function fitting combined with 'empirical' updates. The key is the first known "probabilistic contraction analysis" method, which we developed for the analysis of fairly general stochastic iterative algorithms, wherein we show convergence to a probabilistic fixed point of a sequence of random operators via a stochastic dominance argument.

The second RL problem is the 'online learning (or Lai-Robbins) problem', which arises when the model itself is unknown. We propose a simple posterior-sampling-based regret-minimization reinforcement learning algorithm for MDPs. It achieves O(√T) regret, which is order-optimal. It not only optimally manages the "exploration versus exploitation" tradeoff but also obviates the need for expensive computation for exploration. The algorithm differs from classical adaptive control in its focus on non-asymptotic regret optimality as opposed to asymptotic stability. This seems to resolve a long-standing open problem in Reinforcement Learning.
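A hedged sketch of posterior sampling in its simplest setting (a Bernoulli multi-armed bandit rather than an MDP; the arm means and horizon are our own illustrative assumptions): maintain a Beta posterior per arm, sample once from each posterior, and play the arm with the largest sample:

```python
import random

def thompson_sampling(true_means, horizon=20_000, seed=0):
    """Posterior (Thompson) sampling on a Bernoulli bandit: keep a
    Beta(s+1, f+1) posterior for each arm's success probability, sample
    from every posterior, and pull the arm whose sample is largest."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail = [0] * k, [0] * k
    total = 0.0
    for _ in range(horizon):
        draws = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += reward
        fail[arm] += 1 - reward
        total += reward
    regret = horizon * max(true_means) - total  # realized regret vs best arm
    return regret

print(thompson_sampling([0.3, 0.5, 0.7]))
```

The randomness in the posterior draws supplies exploration for free, with no explicit bonus computation, which is the computational advantage the abstract alludes to; the realized regret stays far below that of any fixed suboptimal arm.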

Date and Time: 
Tuesday, April 24, 2018 - 4:00pm
Venue: 
Packard 101
