Information Systems Lab (ISL) Colloquium

ISL Colloquium presents "Stochastic Descent Algorithms: Minimax Optimality, Implicit Regularization, and Deep Networks"

Topic: 
Stochastic Descent Algorithms: Minimax Optimality, Implicit Regularization, and Deep Networks
Abstract / Description: 

Stochastic descent methods have had a long history in optimization, adaptive filtering, and online learning and have recently gained tremendous popularity as the workhorse for deep learning. So much so that it is now widely recognized that the success of deep networks is due not only to their special deep architecture, but also to the behavior of the stochastic descent methods used, which plays a key role in reaching "good" solutions that generalize well to unseen data. In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD)---originally developed for quadratic loss and linear models in the context of H-infinity control in the 1990s---and extend them to general stochastic mirror descent (SMD) algorithms for general loss functions and nonlinear models. These minimax properties can be used to explain the convergence and implicit regularization of the algorithms when the linear regression problem is over-parametrized (in what is now being called the "interpolating regime"). In the nonlinear setting, exemplified by training a deep neural network, we show that when the setup is "highly over-parametrized", stochastic descent methods enjoy similar convergence and implicit-regularization properties. This observation gives some insight into why deep networks exhibit such powerful generalization abilities. It is also a further example of what is increasingly referred to as the "blessing of dimensionality".
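
To make the implicit-regularization claim concrete in its simplest instance: for over-parametrized least squares, SGD started from the origin converges to the minimum-norm solution that interpolates the data. A minimal numerical sketch (my own illustration, not the speaker's code):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                      # over-parametrized: more unknowns than equations
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)                     # start at the origin
    eta = 0.01
    for epoch in range(2000):
        for i in rng.permutation(n):    # one (x_i, y_i) pair per step
            w -= eta * (X[i] @ w - y[i]) * X[i]

    w_min_norm = np.linalg.pinv(X) @ y  # minimum-l2-norm interpolating solution
    print(np.linalg.norm(X @ w - y))         # ~0: SGD interpolates the data
    print(np.linalg.norm(w - w_min_norm))    # ~0: and picks out the min-norm solution

Running SMD with a different mirror potential selects a different interpolating solution; this dependence on the algorithm rather than the model is the implicit regularization at issue.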

Date and Time: 
Thursday, January 17, 2019 - 4:15pm
Venue: 
Packard 101

IT Forum & ISL presents Functional interfaces to compression (or: Down With Streams!)

Topic: 
Functional interfaces to compression (or: Down With Streams!)
Abstract / Description: 

From a computer-science perspective, the world of compression can seem like an amazing country glimpsed through a narrow straw. The problem isn't the compression itself, but the typical interface: a stream of symbols (or audio samples, pixels, video frames, nucleotides, or Lidar points...) goes in, an opaque bitstream comes out, and on the other side, the bitstream is translated back into some approximation of the input. The coding and decoding modules maintain an internal state that evolves over time. In practice, these "streaming" interfaces with inaccessible mutable state have limited the kinds of applications that can be built.

In this talk, I'll discuss my group's experience with what can happen when we build applications around compression systems that expose a "functional" interface, one that makes state explicit and visible. We've found it's possible to achieve tighter couplings between coding and the rest of an application, improving performance and allowing compression algorithms to be used in settings where they were previously infeasible. In Lepton (NSDI 2017), we implemented a Huffman and a JPEG encoder in purely functional style, allowing the system to compress images in parallel across a distributed network filesystem with arbitrary block boundaries (e.g., in the middle of a Huffman symbol or JPEG block). This free-software system is in production at Dropbox and has compressed, by 23%, hundreds of petabytes of user files. ExCamera (NSDI 2017) uses a purely functional video codec to parallelize video encoding into thousands of tiny tasks, each handling a fraction of a second of video, much shorter than the interval between key frames, and executing with 4,000-way parallelism on AWS Lambda. Salsify (NSDI 2018) uses a purely functional video codec to explore execution paths of the encoder without committing to them, letting it match the capacity estimates from a transport protocol. This architecture outperforms "streaming"-oriented video applications -- Skype, FaceTime, Hangouts, WebRTC -- in delay and visual quality. I'll briefly discuss some of our ongoing work in trying to compress the communication between a pair of neural networks jointly trained to accomplish some goal, e.g., to support efficient evaluation when data and compute live in different places. In general, our findings suggest that while, in some contexts, improvements in codecs may have reached a point of diminishing returns, compression *systems* still have plenty of low-hanging fruit.
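
As a concrete caricature of the two interface styles (a toy delta codec of my own, not code from Lepton, ExCamera, or Salsify):

    # Streaming style: the encoder's state is hidden and mutable, so callers
    # cannot checkpoint it, fork it, or resume an encode mid-stream.
    class StreamingDeltaEncoder:
        def __init__(self):
            self._prev = 0
        def encode(self, sample: int) -> int:
            delta = sample - self._prev
            self._prev = sample
            return delta

    # Functional style: state is an explicit value and encode is a pure
    # function, so the same (state, sample) always yields the same output.
    def encode(prev: int, sample: int) -> tuple[int, int]:
        return sample, sample - prev   # (new state, encoded symbol)

    # Explicit state enables, e.g., parallel encoding with arbitrary chunk
    # boundaries: each worker only needs the state at its chunk's left edge.
    samples = [3, 5, 5, 9, 12, 12]
    state = 0
    for s in samples[:3]:              # worker A encodes the first chunk...
        state, out = encode(state, s)
    for s in samples[3:]:              # ...worker B resumes from A's state
        state, out = encode(state, s)

The same property is what lets a system like Salsify try several candidate encodings of a frame and commit only to the one that fits the transport's capacity estimate.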

Date and Time: 
Friday, January 11, 2019 - 1:15pm
Venue: 
Packard 202

John G. Linvill Distinguished Seminar on Electronic Systems Technology

Topic: 
Internet of Things and Internet of Energy for Connecting at Any Time and Any Place
Abstract / Description: 

In this presentation, I would like to discuss with you how to establish a sustainable and smart society through the internet of energy for connecting at any time and any place. I suspect that you have heard the phrase "Internet of Energy" less often. The meaning of this phrase is simple: with a ubiquitous energy transmission system, you do not need to worry about a shortage of electric power. One of the most important items for establishing a sustainable society is [...]


"Inaugural Linvill Distinguished Seminar on Electronic Systems Technology," EE News, July 2018

 

Date and Time: 
Monday, January 14, 2019 - 4:30pm
Venue: 
Hewlett 200

CANCELLED! ISL & IT Forum present "Bayesian Suffix Trees: Learning and Using Discrete Time Series"

Topic: 
CANCELLED! Bayesian Suffix Trees: Learning and Using Discrete Time Series
Abstract / Description: 

CANCELLED!  We apologize for any inconvenience.

One of the main obstacles in the development of effective algorithms for inference and learning from discrete time series data is the difficulty encountered in the identification of useful temporal structure. We will discuss a class of novel methodological tools for effective Bayesian inference and model selection for general discrete time series, which offer promising results on both small and big data. Our starting point is the development of a rich class of Bayesian hierarchical models for variable-memory Markov chains. The particular prior structure we adopt makes it possible to design effective, linear-time algorithms that can compute most of the important features of the resulting posterior and predictive distributions without resorting to MCMC. We have applied the resulting tools to numerous application-specific tasks, including on-line prediction, segmentation, classification, anomaly detection, entropy estimation, and causality testing, on data sets from areas including data compression, neuroscience, finance, genetics, and animal communication. Results on both simulated and real data will be presented.
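
For intuition about what "variable-memory" means (a toy predictor of my own that skips the talk's Bayesian prior over context trees entirely): the next-symbol distribution is conditioned on the longest useful suffix of the history, rather than on a fixed-order context.

    from collections import defaultdict

    def fit(seq, max_depth=3):
        # Count next-symbol occurrences after every suffix up to max_depth.
        counts = defaultdict(lambda: defaultdict(int))
        for i in range(len(seq)):
            for d in range(min(i, max_depth) + 1):
                counts[tuple(seq[i - d:i])][seq[i]] += 1
        return counts

    def predict(counts, history, alphabet, max_depth=3, min_count=3):
        # Back off from the longest matching suffix to shorter ones.
        for d in range(min(max_depth, len(history)), -1, -1):
            c = counts.get(tuple(history[len(history) - d:]))
            if c and sum(c.values()) >= min_count:
                total = sum(c.values())
                # Add-half (KT-style) smoothing
                return {a: (c.get(a, 0) + 0.5) / (total + 0.5 * len(alphabet))
                        for a in alphabet}
        return {a: 1.0 / len(alphabet) for a in alphabet}

    seq = list("abracadabraabracadabra")
    print(predict(fit(seq), list("abr"), alphabet=set(seq)))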

Date and Time: 
Wednesday, December 12, 2018 - 3:00pm
Venue: 
Packard 202

ISL Colloquium presents "Information-theoretic Privacy"

Topic: 
Information-theoretic Privacy: A holistic view via leakage measures, robust privacy guarantees, and adversarial models for mechanism design
Abstract / Description: 

Privacy is the problem of ensuring limited leakage of information about sensitive features while sharing information (utility) about non-private features to legitimate data users. Even as differential privacy has emerged as a strong desideratum for privacy, there is a need for varied yet rigorous approaches for applications with different requirements. This talk presents an information-theoretic approach and takes a holistic view focusing on leakage measures, design of privacy mechanisms, and verifiable implementations using generative adversarial models. Specifically, we introduce maximal alpha leakage as a new class of adversarially motivated tunable leakage measures that quantifies the maximal gain of an adversary in refining a tilted belief of any (potentially random) function of a dataset conditioned on a disclosed dataset. The choice of alpha determines the specific adversarial action ranging from refining a belief for alpha = 1 to guessing the best posterior for alpha = ∞, and for these extremal values this measure simplifies to mutual information (MI) and maximal leakage (MaxL), respectively. The problem of guaranteeing privacy can then be viewed as one of designing a randomizing mechanism that minimizes alpha leakage subject to utility constraints. We then present bounds on the robustness of privacy guarantees that can be made when designing mechanisms from a finite number of samples. Finally, we focus on a data-driven approach, generative adversarial privacy (GAP), to design privacy mechanisms using neural networks. GAP is modeled as a constrained minimax game between a privatizer (intent on publishing a utility-guaranteeing learning representation that limits leakage of the sensitive features) and an adversary (intent on learning the sensitive features). We demonstrate the performance of GAP on multi-dimensional Gaussian mixture models and the GENKI dataset. Time permitting, we will briefly discuss the learning-theoretic underpinnings of GAP as well as connections to the problem of algorithmic fairness.
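
As a numeric illustration of the two extremes (my own toy example, using the standard formulas: Shannon mutual information at the alpha = 1 end, and MaxL = log Σ_y max_x P(y|x) at the alpha = ∞ end):

    import numpy as np

    # Toy mechanism P(y|x) from a sensitive X to a disclosed Y.
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])        # rows: x, columns: y
    px = np.array([0.5, 0.5])
    py = px @ P

    mi = sum(px[x] * P[x, y] * np.log2(P[x, y] / py[y])
             for x in range(2) for y in range(2))
    maxl = np.log2(P.max(axis=0).sum())

    print(f"I(X;Y) = {mi:.3f} bits")    # ~0.40: the alpha -> 1 extreme
    print(f"MaxL   = {maxl:.3f} bits")  # ~0.77: the alpha -> infinity extreme

As expected, MaxL upper-bounds MI here, consistent with the measure becoming more pessimistic as alpha grows.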

This work is a result of multiple collaborations: (a) maximal alpha leakage with J. Liao (ASU), O. Kosut (ASU), and F. P. Calmon (Harvard); (b) robust mechanism design with M. Diaz (ASU), H. Wang (Harvard), and F. P. Calmon (Harvard); and (c) GAP with C. Huang (ASU), P. Kairouz (Google), X. Chen (Stanford), and R. Rajagopal (Stanford).

Date and Time: 
Wednesday, December 5, 2018 - 4:15pm
Venue: 
Packard 202

ISL Colloquium presents "Estimation After Parameter Selection"

Topic: 
Estimation After Parameter Selection
Abstract / Description: 

In many practical parameter estimation problems, such as medical experiments and cognitive radio communications, parameter selection is performed prior to estimation. The selection process has a major impact on subsequent estimation by introducing a selection bias and creating coupling between decoupled parameters. As a result, classical estimation theory may be inappropriate and inaccurate, and a new methodology is needed. In this study, the problem of estimating a preselected unknown deterministic parameter, chosen from a parameter set based on a predetermined data-based selection rule, Ψ, is considered. In this talk, I will present a general non-Bayesian estimation theory for estimation after parameter selection, including estimation methods, performance analysis, and adaptive sampling strategies. The new theory is based on the post-selection mean-square-error (PSMSE) criterion as a performance measure instead of the commonly used mean-square-error (MSE). We derive the corresponding Cramér-Rao-type bound on the PSMSE of any Ψ-unbiased estimator, where the Ψ-unbiasedness is in the Lehmann-unbiasedness sense. Then, the post-selection maximum-likelihood (PSML) estimator is presented and its Ψ-efficiency properties are demonstrated. Practical implementations of the PSML estimator are proposed as well. As time permits, I will discuss how similar ideas can be applied to estimation after model selection and to estimation in Good-Turing models.
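
A quick Monte Carlo of the selection bias itself (a "winner's curse" toy of my own, not the speaker's PSML estimator): select whichever of two equal unknown means looks larger in the sample, then naively report its sample mean.

    import numpy as np

    rng = np.random.default_rng(0)
    trials, n = 100_000, 10           # n observations per parameter
    # Two populations with identical true means (theta = 0); their sample
    # means are N(0, 1/n).
    means = rng.normal(0.0, 1.0 / np.sqrt(n), size=(trials, 2))

    sel = means.argmax(axis=1)        # selection rule Psi: pick the larger
    naive = means[np.arange(trials), sel]

    # Unbiased before selection, but conditioned on selection the naive
    # estimate averages about +0.18 (= sigma / sqrt(pi)), not 0.
    print(naive.mean())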

Date and Time: 
Monday, December 3, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents Transportation Systems Resilience: Capacity-Aware Control and Value of Information

Topic: 
Transportation Systems Resilience: Capacity-Aware Control and Value of Information
Abstract / Description: 

Resilience of a transportation system is its ability to operate under adverse events like incidents and storms. Availability of real-time traffic data provides new opportunities for predicting travelers' routing behavior and implementing network control operations during adverse events. In this talk, we will discuss two problems: controlling highway corridors in response to disruptions and modeling strategic route choices of travelers with heterogeneous access to incident information. Firstly, we present an approach to designing control strategies for highway corridors facing stochastic capacity disruptions such as random incidents and vehicle platoons/moving bottlenecks. We exploit the properties of traffic flow dynamics under recurrent incidents to derive verifiable conditions for stability of traffic queues, and also obtain guarantees on the system throughput. Secondly, we introduce a routing game in which travelers receive asymmetric and incomplete information about uncertain network state, and make route choices based on their private beliefs about the state and other travelers' behavior. We study the effects of information heterogeneity on travelers' equilibrium route choices and costs. Our analysis is useful for evaluating the value of receiving state information for travelers, which can be positive, zero, or negative in equilibrium. These results demonstrate the advantages of considering network state uncertainty in both strategic and operational aspects of system resilience.

Date and Time: 
Friday, November 30, 2018 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents Deconstructing the Blockchain to Approach Physical Limits

Topic: 
Deconstructing the Blockchain to Approach Physical Limits
Abstract / Description: 

The concept of a blockchain was invented by Satoshi Nakamoto to maintain a distributed ledger for an electronic payment system, Bitcoin. In addition to its security, important performance measures of a blockchain protocol are its transaction throughput, confirmation latency and confirmation reliability. These measures are limited by two underlying physical network attributes: communication capacity and speed-of-light propagation delay. Existing systems operate far away from these physical limits. In this work we introduce Prism, a new blockchain protocol, which can provably achieve 1) security against up to 50% adversarial hashing power; 2) optimal throughput up to the capacity C of the network; 3) confirmation latency for honest transactions proportional to the propagation delay D, with confirmation error probability exponentially small in the bandwidth-delay product CD; 4) eventual total ordering of all transactions. Our approach to the design of this protocol is based on deconstructing the blockchain into its basic functionalities and systematically scaling up these functionalities to approach their physical limits.

This is joint work with Vivek Bagaria, David Tse, Giulia Fanti and Pramod Viswanath. The full paper can be found here.

Date and Time: 
Thursday, November 29, 2018 - 3:00pm
Venue: 
Packard 101

ISL Colloquium presents Artwork Personalization via Online Learning

Topic: 
Artwork Personalization via Online Learning
Abstract / Description: 

For many years, the main goal of the Netflix recommendation engine has been to get the right titles in front of each member at the right time. Today, we use nonlinear, probabilistic and deep learning approaches to make better and better rankings of our movies and TV shows for each user. But the job of recommendation does not end there. The homepage should be able to convey to the member enough evidence of why this is a good title for her, especially for shows that the member has never heard of. One way to address this challenge is to personalize the way we portray the titles on our service. Our image personalization engine is driven by online learning and contextual bandits. Traditional bandit frameworks make strong assumptions which do not apply when predictions entail actions in the real world. We will discuss how learning algorithms can be augmented to better deal with causality, bias, and noncompliance.
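
For readers unfamiliar with the setup, a minimal contextual-bandit skeleton (an epsilon-greedy sketch of my own, far simpler than anything in production): arms are candidate artworks, the context is a member-feature vector, and the reward is whether the member plays the title.

    import numpy as np

    rng = np.random.default_rng(0)
    n_arms, d, eps = 5, 8, 0.1
    A = [np.eye(d) for _ in range(n_arms)]     # per-arm ridge statistics
    b = [np.zeros(d) for _ in range(n_arms)]

    def choose(context):
        if rng.random() < eps:                 # explore
            return int(rng.integers(n_arms))
        scores = [context @ np.linalg.solve(A[a], b[a]) for a in range(n_arms)]
        return int(np.argmax(scores))          # exploit current estimates

    def update(arm, context, reward):
        A[arm] += np.outer(context, context)
        b[arm] += reward * context

    true_w = rng.standard_normal((n_arms, d))  # toy click model, unknown to learner
    for t in range(10_000):
        ctx = rng.standard_normal(d)
        a = choose(ctx)
        update(a, ctx, float(ctx @ true_w[a] + rng.normal() > 0))

The causality, bias, and noncompliance issues in the talk arise precisely because, unlike in this toy loop, real rewards are logged under a policy that itself depends on past choices.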

Date and Time: 
Tuesday, November 13, 2018 - 4:15pm
Venue: 
Packard 202

ISL Colloquium presents Estimating the Information Flow in Deep Neural Networks

Topic: 
Estimating the Information Flow in Deep Neural Networks
Abstract / Description: 

This talk will discuss the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). We will explain this discrepancy between theory and experiments, and clarify what these past works actually measured.

To this end, an auxiliary (noisy) DNN framework will be introduced, in which I(X;T) is a meaningful quantity that depends on the network's parameters. We will show that this noisy framework is a good proxy for the original (deterministic) system both in terms of performance and the learned representations. To accurately track I(X;T) over noisy DNNs, a differential entropy estimator tailored to exploit the DNN's layered structure will be developed, and theoretical guarantees on the associated minimax risk will be provided. Using this estimator along with a certain analogy to an information-theoretic communication problem, we will elucidate the geometric mechanism that drives compression of I(X;T) in noisy DNNs. Based on these findings, we will circle back to deterministic networks and explain what the past observations of compression were in fact showing. Future research directions inspired by this study, aimed at facilitating a comprehensive information-theoretic understanding of deep learning, will also be discussed.
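
To see why I(X;T) becomes a meaningful, parameter-dependent quantity once noise is injected (a 1-D sketch of my own, not the talk's estimator): for T = f(X) + Z with Gaussian Z, I(X;T) = h(T) - h(Z), and h(T) is the entropy of a Gaussian mixture that Monte Carlo can approximate.

    import numpy as np

    rng = np.random.default_rng(0)
    beta = 0.1                            # noise standard deviation
    xs = rng.standard_normal(10_000)      # empirical input sample
    f = np.tanh                           # a one-neuron "layer"

    def mixture_logpdf(t):
        # log of (1/n) * sum_j N(t; f(x_j), beta^2)
        comps = -0.5 * ((t - f(xs)) / beta) ** 2
        return (np.logaddexp.reduce(comps) - np.log(len(xs))
                - 0.5 * np.log(2 * np.pi * beta ** 2))

    t_samp = f(rng.choice(xs, 2_000)) + beta * rng.standard_normal(2_000)
    h_T = -np.mean([mixture_logpdf(t) for t in t_samp])   # Monte Carlo h(T)
    h_Z = 0.5 * np.log(2 * np.pi * np.e * beta ** 2)      # exact Gaussian h(Z)
    i_xt_bits = (h_T - h_Z) / np.log(2)                   # nats -> bits
    print(f"I(X;T) = {i_xt_bits:.2f} bits")

Shrinking beta drives the estimate up while larger beta "compresses" it, which is the sense in which the noisy quantity tracks the geometry of the learned representation.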

Date and Time: 
Friday, November 9, 2018 - 1:15pm
Venue: 
Packard 202
