
Information Systems Lab (ISL) Colloquium

ISL Colloquium presents Two Talks

Topic: 
Experimentation and Decision-Making in Two-Sided Marketplaces: The Impact of Interference / Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States
Abstract / Description: 

Talk 1: Marketplace platforms use experiments (also known as "A/B tests") to make data-driven decisions. When platforms consider introducing a new feature, they often first run an experiment to test the feature on a subset of users and then use this data to decide whether to launch the feature platform-wide. However, it is well documented that estimates of the treatment effect arising from these experiments may be biased due to the presence of interference. In this talk, we survey a collection of recent results and insights we have developed on experimentation and decision-making in two-sided marketplaces. In particular, we study the bias that interference creates in both the treatment effect estimates and the standard error estimates, and show how both types of bias affect the platform's ability to make decisions. We show that for a large class of interventions ("positive interventions"), these biases cause the platform to launch too often. Through simulations calibrated to real-world data, we show that in many settings the treatment effect bias impacts decision-making more than the standard error bias. Based on joint work with Ramesh Johari, Inessa Liskovich, Gabriel Weintraub, and Geng Zhao.
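
To make the interference mechanism concrete, the toy simulation below randomizes a positive intervention over listings in a market with limited demand; because treated listings pull bookings away from control listings, the naive difference-in-means estimate comes out positive even though the global treatment effect is roughly zero. The market model, parameters, and intervention are illustrative assumptions, not the calibrated model from the talk.

```python
# Toy illustration of interference bias in a listing-side experiment.
# All parameters are made up; this is not the calibrated model from the talk.
import numpy as np

rng = np.random.default_rng(0)
n_listings, n_customers = 1000, 600

def run_market(treated):
    """Each customer books one available listing, chosen with probability
    proportional to attractiveness; a booked listing leaves the market
    (the shared supply is what creates interference across listings)."""
    attractiveness = 1.0 + 0.5 * treated          # the "positive intervention"
    available = np.ones(n_listings, dtype=bool)
    bookings = np.zeros(n_listings)
    for _ in range(n_customers):
        w = attractiveness * available
        choice = rng.choice(n_listings, p=w / w.sum())
        bookings[choice] = 1.0
        available[choice] = False
    return bookings

# Global treatment effect: booking rate with everyone treated vs. no one treated.
gte = run_market(np.ones(n_listings)).mean() - run_market(np.zeros(n_listings)).mean()

# Naive A/B estimate: randomize listings 50/50 and compare booking rates.
treated = rng.random(n_listings) < 0.5
bookings = run_market(treated.astype(float))
naive = bookings[treated].mean() - bookings[~treated].mean()

print(f"global treatment effect ~ {gte:.3f}, naive A/B estimate ~ {naive:.3f}")
```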


Talk 2: In this work, we design a simple reinforcement learning (RL) agent that implements an optimistic version of Q-learning and establish through regret analysis that this agent can operate with some level of competence in an arbitrarily complex environment. While we leverage concepts from the literature on provably efficient RL, we consider a general agent-environment interface and provide a novel agent design and analysis. This level of generality positions our results to inform the design of future agents for operation in complex real environments. We establish that, as time progresses, our agent performs competitively relative to policies that require longer times to evaluate. The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent. Notably, there is no dependence on the complexity of the environment. The ultimate per-period performance loss of the agent is bounded by a constant multiple of a measure of distortion introduced by the agent's state representation. This work is the first to establish that an algorithm approaches this asymptotic condition within a tractable time frame.
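
As a rough illustration of the algorithmic ingredients, the sketch below runs a tabular optimistic Q-learning agent (optimistic initialization plus a count-based bonus) whose agent state is simply the raw observation of a toy chain environment. The environment, discount, and constants are placeholder assumptions and not the agent design or analysis from the talk.

```python
# Illustrative tabular optimistic Q-learning with a count-based bonus.
# The chain environment, discount, and constants are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 10, 2, 0.9

def step(s, a):
    """Toy chain: action 1 usually moves right; reward only at the last state."""
    if a == 1 and rng.random() < 0.8:
        s_next = min(s + 1, n_states - 1)
    else:
        s_next = max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)

Q = np.full((n_states, n_actions), 1.0 / (1 - gamma))   # optimistic initialization
counts = np.zeros((n_states, n_actions))
bonus_scale = 0.5

s = 0
for _ in range(50_000):
    a = int(np.argmax(Q[s]))                  # act greedily w.r.t. optimistic Q
    s_next, r = step(s, a)
    counts[s, a] += 1
    lr = 1.0 / counts[s, a]                   # decaying per-(state, action) step size
    bonus = bonus_scale / np.sqrt(counts[s, a])
    Q[s, a] += lr * (r + bonus + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("greedy action per state:", np.argmax(Q, axis=1))
```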

Date and Time: 
Thursday, October 14, 2021 - 4:00pm
Venue: 
Packard 101

ISL Colloquium: KO codes: Inventing Non-linear Encoding and Decoding for Reliable Wireless Communication via Deep-learning

Topic: 
KO codes: Inventing Non-linear Encoding and Decoding for Reliable Wireless Communication via Deep-learning
Abstract / Description: 

Landmark codes underpin reliable physical-layer communication, e.g., Reed-Muller, BCH, convolutional, Turbo, LDPC, and Polar codes: each is a linear code and represents a mathematical breakthrough. The impact on humanity is huge: each of these codes has been used in global wireless communication standards (satellite, WiFi, cellular). Traditionally, the design of codes has been driven by human ingenuity, and hence progress has been sporadic. Can we automate and accelerate this process of discovering codes?

In this talk, I will present KO codes, a new family of computationally efficient, deep-learning-driven codes that outperform the state of the art in reliability on the standardized AWGN channel. KO codes beat state-of-the-art Reed-Muller and Polar codes, under low-complexity successive cancellation decoding, in the challenging short-to-medium block length regime on the AWGN channel. We show that the gains of KO codes are primarily due to the nonlinear mapping of information bits directly to transmitted real-valued symbols (bypassing modulation), while still admitting an efficient, high-performance decoder. The key technical innovation that renders this possible is the design of a novel family of neural architectures inspired by the computation tree of the Kronecker Operation (KO) central to Reed-Muller and Polar codes. These architectures pave the way for the discovery of a much richer class of hitherto unexplored nonlinear algebraic structures. More interestingly, despite having substantial encoding and decoding structure, KO codes exhibit a striking similarity to random Gaussian codes!
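
For readers unfamiliar with the Kronecker computation tree mentioned above, the sketch below shows the classic linear Plotkin-style recursion that underlies Reed-Muller and Polar encoding (frozen-bit selection omitted), and the spot where a KO-style encoder would substitute a learned real-valued combiner. The toy combiner is purely a placeholder, not the neural architecture from the paper.

```python
# Sketch of the Kronecker/Plotkin structure behind Reed-Muller and Polar
# encoding, and the spot where a KO-style encoder substitutes a learned map.
# Frozen-bit selection is omitted and the toy combiner below is a placeholder,
# not the neural architecture from the paper.
import numpy as np

def plotkin_encode(u):
    """Linear (u+v, v) recursion, i.e. multiplication by kron(G2, ..., G2)
    with G2 = [[1, 0], [1, 1]] over GF(2)."""
    if len(u) == 1:
        return u
    half = len(u) // 2
    left, right = plotkin_encode(u[:half]), plotkin_encode(u[half:])
    return np.concatenate([(left + right) % 2, right])

def ko_style_encode(u, combine):
    """Same computation tree, but the hand-designed XOR combiner is replaced
    by a real-valued map `combine(left, right)` (here a stand-in function;
    a real KO encoder would use a trained neural network)."""
    if len(u) == 1:
        return u.astype(float)
    half = len(u) // 2
    left = ko_style_encode(u[:half], combine)
    right = ko_style_encode(u[half:], combine)
    return np.concatenate([combine(left, right), right])

u = np.array([1, 0, 1, 1, 0, 1, 0, 0])
print("linear Plotkin codeword:  ", plotkin_encode(u))
print("nonlinear sketch codeword:", ko_style_encode(u, lambda l, r: np.tanh(l + r - 1.0)))
```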


To avoid Zoom-bombing, we ask attendees to sign in via the above URL to receive the Zoom meeting details by email.

Please join us for a coffee half hour starting at 3:30pm at the Grove outside Packard.

Date and Time: 
Thursday, October 7, 2021 - 4:00pm
Venue: 
Packard 101 + Zoom (registration required)

ISL Colloquium: Sequential Decision Making: How Much Adaptivity Is Needed Anyways?

Topic: 
Sequential Decision Making: How Much Adaptivity Is Needed Anyways?
Abstract / Description: 

Adaptive stochastic optimization under partial observability is one of the fundamental challenges in artificial intelligence and machine learning, with a wide range of applications including active learning, optimal experimental design, interactive recommendations, viral marketing, Wikipedia link prediction, and perception in robotics, to name a few. In such problems, one needs to adaptively make a sequence of decisions while taking into account the stochastic observations collected in previous rounds. For instance, in active learning, the goal is to learn a classifier by carefully requesting as few labels as possible from a set of unlabeled data points. Similarly, in experimental design, a practitioner may conduct a series of tests in order to reach a conclusion. Even though it is possible to fix all the selections ahead of time, before any observations take place (e.g., select all the data points at once or conduct all the medical tests simultaneously), a strategy known as a priori selection, it is more efficient to use a fully adaptive procedure that exploits the information obtained from past selections when making each new selection. In this talk, we introduce semi-adaptive policies, for a wide range of decision-making problems, that enjoy the power of fully sequential procedures while performing exponentially fewer adaptive rounds.
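
A toy illustration of the gap the talk addresses: the snippet below contrasts fully adaptive uncertainty sampling (one label per round) with a semi-adaptive schedule that spends the same label budget in only a few batched rounds. The data, the logistic-regression model, and the acquisition rule are assumptions made for this sketch and are unrelated to the specific policies constructed in the talk.

```python
# Toy contrast between fully adaptive selection (one label per round) and a
# semi-adaptive schedule that spends the same budget in a few batched rounds.
# Data, model, and acquisition rule are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X @ rng.normal(size=10) + 0.3 * rng.normal(size=2000) > 0).astype(int)

def uncertainty_sampling(budget, rounds):
    """Spend `budget` labels over `rounds` adaptive rounds, each round
    requesting labels for the currently most uncertain points."""
    labeled = list(rng.choice(len(X), size=10, replace=False))   # small seed set
    per_round = budget // rounds
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        margin = np.abs(clf.predict_proba(X)[:, 1] - 0.5)
        margin[labeled] = np.inf                    # never re-request a label
        labeled += list(np.argsort(margin)[:per_round])
    return LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled]).score(X, y)

print("fully adaptive, 100 rounds of 1 label  :", uncertainty_sampling(100, 100))
print("semi-adaptive,    5 rounds of 20 labels:", uncertainty_sampling(100, 5))
```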

 

Note: This talk will be held in person in Packard 101, and has been pushed back to 4:45pm to accommodate the new class schedule.

The talk will be streamed on Zoom for those who cannot attend (registration required).


This talk is hosted by the ISL Colloquium. To receive talk announcements, subscribe to the mailing list isl-colloq@lists.stanford.edu.


Please join us for a coffee half hour starting at 4:15pm at the Bytes outdoor tables outside of Packard.

Date and Time: 
Thursday, September 30, 2021 - 4:45pm
Venue: 
Packard 101

ISL Colloquium: Interpolation Phase Transition in Neural Networks: Memorization and Generalization

Topic: 
Interpolation Phase Transition in Neural Networks: Memorization and Generalization
Abstract / Description: 

A mystery of modern neural networks is their surprising generalization power in the overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones; despite this, they achieve small prediction error on unseen data.

To demystify the above phenomena, we focus on two-layer neural networks in the neural tangent (NT) regime. Under a simple data model where n inputs are d-dimensional isotropic vectors and there are N hidden neurons, we show that as soon as Nd >> n, the minimum eigenvalue of the empirical NT kernel is bounded away from zero, and therefore the network can exactly interpolate arbitrary labels.
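
The eigenvalue claim can be checked numerically on synthetic data: the sketch below forms the finite-width empirical NT kernel of a random two-layer ReLU network (gradients taken with respect to the first-layer weights only) and tracks its minimum eigenvalue as the width N grows. The dimensions, scaling, and Gaussian inputs are illustrative assumptions rather than the exact setting of the paper.

```python
# Numerical check of the eigenvalue claim: empirical NTK of a random two-layer
# ReLU network (gradients w.r.t. first-layer weights only) and its minimum
# eigenvalue as the width N grows. Dimensions and scaling are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d)) / np.sqrt(d)            # roughly isotropic inputs

def ntk_min_eigenvalue(N):
    W = rng.normal(size=(N, d))                     # first-layer weights
    a = rng.choice([-1.0, 1.0], size=N)             # fixed second-layer signs
    # Gradient feature of input x for neuron j: a_j * 1[w_j . x > 0] * x / sqrt(N).
    active = (X @ W.T > 0).astype(float)            # n x N
    feats = (active * a)[:, :, None] * X[:, None, :] / np.sqrt(N)   # n x N x d
    K = feats.reshape(n, N * d) @ feats.reshape(n, N * d).T         # empirical NTK
    return np.linalg.eigvalsh(K).min()

for N in [5, 20, 80, 320]:
    print(f"width N = {N:3d}  (N*d = {N * d:5d}):  lambda_min ~ {ntk_min_eigenvalue(N):.4f}")
```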

Next, we study the generalization error of NT ridge regression (including minimum-$\ell_2$-norm interpolation). We show that in the same overparametrization regime Nd >> n, in terms of generalization error, NT ridge regression is well approximated by kernel ridge regression (with the infinite-width kernel), which in turn is well approximated by polynomial ridge regression. A surprising phenomenon is a "self-induced" regularization due to the high-degree components of the activation function.

Link to the ArXiv paper: https://arxiv.org/abs/2007.12826


This talk is hosted by the ISL Colloquium. To receive talk announcements, subscribe to the mailing list isl-colloq@lists.stanford.edu.

Date and Time: 
Thursday, September 23, 2021 - 4:30pm
Venue: 
Packard 101

ISL Colloquium presents "The Hidden Convex Optimization Landscape of Deep Neural Networks"

Topic: 
The Hidden Convex Optimization Landscape of Deep Neural Networks
Abstract / Description: 

The popularity of Deep Neural Networks (DNNs) continues to grow as a result of their great empirical success in a large number of machine learning tasks. However, despite their prevalence in machine learning and the dramatic surge of interest, there are major gaps in our understanding of the fundamentals of neural net models. Understanding the mechanism behind their extraordinary generalization properties remains an open problem. A significant challenge arises from the non-convexity of the DNN training problem. In non-convex optimization, the choice of optimization method and its internal parameters, such as initialization, mini-batching, and step sizes, has a considerable effect on the quality of the learned model. This is in sharp contrast to convex optimization problems, where these optimization parameters have no effect and globally optimal solutions can be obtained in a very robust, efficient, transparent, and reproducible manner.

In this talk, we introduce exact convex optimization formulations of multilayer neural network training problems. We show that two- and three-layer neural networks with ReLU or polynomial activations can be globally trained via convex programs whose number of variables is polynomial in the number of training samples and the number of hidden neurons. Our results provide an equivalent characterization of neural networks as convex models in which a mixture of locally linear models is fitted to the data with sparsity-inducing convex regularization. Moreover, we show that certain standard two- and three-layer convolutional neural networks can be globally optimized in fully polynomial time. We discuss extensions to batch normalization and generative adversarial networks. Finally, we present numerical simulations verifying our claims and illustrating that standard local search heuristics such as stochastic gradient descent can be inefficient compared to the proposed convex program.
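
To give a feel for what such a convex program can look like, the condensed sketch below fits a two-layer ReLU model by subsampling ReLU activation patterns and solving a group-sparse convex problem under sign-pattern constraints with cvxpy. The pattern sampling, regularization weight, and problem sizes are illustrative simplifications and only loosely follow the published formulations.

```python
# Condensed sketch of a convex reformulation of two-layer ReLU training:
# subsample ReLU activation patterns D = diag(1[X u >= 0]) and fit a
# group-sparse convex model under sign-pattern constraints. Pattern sampling,
# regularization weight, and sizes are illustrative simplifications.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, beta = 40, 5, 1e-3
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

# Candidate activation patterns obtained from random hyperplanes.
patterns = np.unique((X @ rng.normal(size=(d, 50)) >= 0).astype(float), axis=1).T

V = [cp.Variable(d) for _ in patterns]
W = [cp.Variable(d) for _ in patterns]
fit = sum(cp.multiply(D, X @ (v - w)) for D, v, w in zip(patterns, V, W)) - y
constraints = []
for D, v, w in zip(patterns, V, W):
    S = np.diag(2 * D - 1)               # (2D - I): keep each pattern's signs valid
    constraints += [S @ X @ v >= 0, S @ X @ w >= 0]
reg = sum(cp.norm(v, 2) + cp.norm(w, 2) for v, w in zip(V, W))
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(fit) + beta * reg), constraints)
problem.solve()
print("optimal objective value:", problem.value)
```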

Date and Time: 
Thursday, June 3, 2021 - 4:30pm

ISL Colloquium: Multi-Scale Methods for Machine Learning

Topic: 
Multi-Scale Methods for Machine Learning
Abstract / Description: 

Many modern applications require us to very quickly find relevant results from an enormous output space of potential candidates, for example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can be in the millions to billions. Moreover, observational or training data is often limited for the so-called "long tail" of items in the output space. Given the inherent paucity of training data for most of the items in the output space, developing machine learning models that perform well for spaces of this size is a contemporary challenge. In this talk, I will present a multi-scale machine learning framework called Prediction for Enormous and Correlated Output Spaces (PECOS). PECOS proceeds by first building a hierarchy over the output space using unsupervised learning, and then learning a machine learning model that makes predictions at each level of the hierarchy. Finally, the multi-scale predictions are combined to obtain an overall predictive model. This leads to an inference method that scales logarithmically with the size of the output space. A key to obtaining high performance is leveraging sparsity and developing highly efficient sparse matrix routines.
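
The multi-scale idea can be illustrated with a toy two-level version: cluster the labels, train a "matcher" over the clusters and a "ranker" inside each cluster, and combine the two scores with a small beam at inference time. The models, clustering method, and sizes below are deliberate simplifications and are not the PECOS implementation.

```python
# Toy two-level version of the multi-scale idea: cluster the labels, train a
# matcher over clusters and a ranker inside each cluster, combine the scores
# with a small beam at inference. Models and sizes are simplifications, not
# the PECOS implementation.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_labels, d, n_clusters = 200, 20, 16
label_emb = rng.normal(size=(n_labels, d))            # one "prototype" per label
y = rng.integers(n_labels, size=5000)
X = label_emb[y] + 0.5 * rng.normal(size=(5000, d))   # noisy copies of prototypes

# Level 1: group labels into clusters using their prototypes (unsupervised).
cluster_of_label = KMeans(n_clusters, n_init=10, random_state=0).fit_predict(label_emb)

# Matcher predicts the cluster; each ranker scores only the labels in its cluster.
matcher = LogisticRegression(max_iter=1000).fit(X, cluster_of_label[y])
rankers = {c: LogisticRegression(max_iter=1000).fit(X[cluster_of_label[y] == c],
                                                    y[cluster_of_label[y] == c])
           for c in range(n_clusters)}

def predict(x, beam=3):
    """Score labels only inside the top-`beam` clusters (logarithmic-style inference)."""
    cluster_scores = matcher.predict_proba([x])[0]   # columns align with cluster ids here
    best_label, best_score = None, -np.inf
    for c in np.argsort(cluster_scores)[-beam:]:
        proba = rankers[c].predict_proba([x])[0]
        j = int(np.argmax(proba))
        if cluster_scores[c] * proba[j] > best_score:
            best_label, best_score = rankers[c].classes_[j], cluster_scores[c] * proba[j]
    return best_label

print("predicted label:", predict(X[0]), " true label:", y[0])
```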

Date and Time: 
Thursday, May 20, 2021 - 4:30pm

ISL Colloquium: Interior Point Methods for Nearly Linear Time Algorithms

Topic: 
Interior Point Methods for Nearly Linear Time Algorithms
Abstract / Description: 

Linear programming is a foundational continuous optimization problem, and its special cases, e.g., maximum-cardinality bipartite matching, are among the most fundamental problems in combinatorial optimization. In this talk, I will survey recent advances in solving these problems using interior point methods. I will discuss how new interior point methods can be leveraged to efficiently reduce these problems to data structure problems which can be solved efficiently with techniques from randomized numerical linear algebra and graph theory. This talk will highlight recent joint work which showed that linear programs with d variables and n constraints can be solved with high probability to high precision in time Õ(nd + poly(d)), and that a variety of graph problems, e.g., maximum flow and optimal transport, on n-node, m-edge graphs can be solved with high probability to high precision in time Õ(m + n^1.5). These results constitute the first polynomial-time methods for solving these problems to high precision that run in nearly linear time on sufficiently large and dense instances. No previous experience with interior point methods is required.
This talk will touch upon joint work with Jan van den Brand, Yin Tat Lee, Yang P. Liu, Danupon Nanongkai, Richard Peng, Thatchaphol Saranurak, Zhao Song, and Di Wang, including arXiv:2009.01802, arXiv:2002.02304, and arXiv:2101.05719.
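
For context on what an interior point method does, the compact sketch below runs a log-barrier (central-path) method on a tiny LP with Newton steps; the point is the structure rather than efficiency, and the step sizes and constants are simplistic placeholders. The fast algorithms discussed in the talk replace the dense linear solves with randomized numerical linear algebra and dynamic data structures.

```python
# Compact log-barrier interior point sketch for a small LP: min c^T x s.t. A x <= b.
# The point is the structure (Newton steps on a barrier objective along the
# central path), not efficiency; the fast methods in the talk replace the dense
# linear solves below with randomized linear algebra and dynamic data structures.
import numpy as np

def barrier_lp(A, b, c, x, t=1.0, mu=10.0, outer=8, inner=50):
    """x must be strictly feasible (A x < b); all constants are illustrative."""
    for _ in range(outer):
        for _ in range(inner):
            s = b - A @ x                             # slacks, kept strictly positive
            grad = t * c + A.T @ (1.0 / s)
            hess = A.T @ np.diag(1.0 / s**2) @ A
            dx = np.linalg.solve(hess, -grad)         # Newton direction
            if -grad @ dx < 1e-10:                    # tiny Newton decrement: centered
                break
            step = 1.0
            while np.any(b - A @ (x + step * dx) <= 0):
                step *= 0.5                           # backtrack to stay inside
            x = x + step * dx
        t *= mu                                       # tighten the barrier
    return x

# min -x1 - x2 over the unit box [0, 1]^2; the optimum is at (1, 1).
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
c = np.array([-1., -1.])
print(barrier_lp(A, b, c, x=np.array([0.5, 0.5])))
```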

Date and Time: 
Thursday, May 6, 2021 - 4:30pm

ISL Colloquium: Private Stochastic Convex Optimization

Topic: 
Private Stochastic Convex Optimization
Abstract / Description: 

I will summarize some recent works on differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. In the standard l2/l2 setting, we will see two approaches to getting optimal rates for this problem. We show that for a wide range of parameters, privacy causes no additional overhead in accuracy or run time. In the process, we will develop techniques for private stochastic optimization that work for other geometries. For the LASSO setting when optimizing over the l1 ball, we will see private algorithms that achieve optimal rates. Based on joint works with various subsets of Hilal Asi, Raef Bassily, Vitaly Feldman, Tomer Koren and Abhradeep Thakurta.
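
As a concrete baseline for what a DP algorithm in this setting looks like, the sketch below runs noisy SGD on a logistic loss with per-example gradient clipping and Gaussian noise. The clipping norm, noise multiplier, and step size are placeholders, and this is not one of the algorithms achieving the optimal rates discussed in the talk.

```python
# Minimal noisy-SGD sketch for private stochastic convex optimization:
# logistic loss with per-example gradient clipping plus Gaussian noise.
# The clipping norm, noise multiplier, and step size are placeholders, not the
# calibrated choices that achieve the optimal rates discussed in the talk.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ np.ones(d)))).astype(float)

clip, noise_mult, lr, epochs, batch = 1.0, 1.0, 0.5, 5, 100
theta = np.zeros(d)
for _ in range(epochs):
    for idx in np.array_split(rng.permutation(n), n // batch):
        p = 1 / (1 + np.exp(-X[idx] @ theta))           # per-example predictions
        grads = (p - y[idx])[:, None] * X[idx]          # per-example gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads / np.maximum(1.0, norms / clip)   # clip each gradient to norm <= clip
        noise = rng.normal(scale=noise_mult * clip / len(idx), size=d)
        theta -= lr * (grads.mean(axis=0) + noise)

acc = ((1 / (1 + np.exp(-X @ theta)) > 0.5) == y.astype(bool)).mean()
print("training accuracy of the noisy-SGD model:", round(acc, 3))
```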

Date and Time: 
Thursday, April 29, 2021 - 4:30pm
