
Information Systems Lab (ISL) Colloquium

ISL Colloquium and IT Forum present "Compression for biological data analysis"

Topic: 
Compression for biological data analysis
Abstract / Description: 

Compression has for decades served primarily the utilitarian purpose of enabling easier storage and transmission of data. Here however, I show how compression can be used to better understand biological processes and assist in data analysis.

First, I will demonstrate the relationship between lossy compression and understanding the perceptual characteristics of downstream agents. Quartz, my lossy compression program for next-generation sequencing quality scores, counterintuitively improves SNP calling despite discarding 95% of quality scores, showing that variant callers are oversensitive to sequencer noise. More recently, I developed HyperMinHash, a lossy floating-point compression of the popular MinHash Jaccard index sketch, which reduces the space complexity from log(n) to loglog(n) by exploiting the fact that MinHash cares far less about large hash values than about small ones.
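To make the MinHash idea concrete, here is a minimal Python sketch of Jaccard estimation from per-seed minimum hash values. The hash construction and set names are illustrative only; this is plain MinHash, not Quartz or HyperMinHash, though the comment notes where HyperMinHash's loglog(n) compression of each minimum would apply.

```python
import hashlib

def h(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed (illustrative)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(items, num_hashes=128):
    """Keep only the minimum hash value per seed.

    Small hash values carry most of the signal, which is the observation that
    HyperMinHash exploits to store each minimum in roughly loglog(n) bits.
    """
    return [min(h(x, seed) for x in items) for seed in range(num_hashes)]

def jaccard_estimate(sig_a, sig_b):
    """The fraction of seeds whose minima agree estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

if __name__ == "__main__":
    A = {f"kmer{i}" for i in range(1000)}
    B = {f"kmer{i}" for i in range(500, 1500)}   # true Jaccard index = 1/3
    print(jaccard_estimate(minhash_signature(A), minhash_signature(B)))
```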

In the second part of this talk, I show how we exploit the compressive structure of biological data to speed up similarity search. I prove that by organizing the database to facilitate clustered search, our time complexity scales with the metric entropy (the number of covering hyperspheres) when the fractal dimension of the dataset is low. This is the key insight behind our compressively accelerated versions of standard tools in genomics (CORA, 10-100x speedup for all-mapping of NGS reads), metagenomics (MICA, 3.5x speedup over Diamond), and chemical informatics (Ammolite, 150x speedup over SMSD).
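A minimal sketch of the cluster-then-search idea (the greedy covering and the function names are illustrative, not the CORA/MICA/Ammolite implementations): group the database into covering balls, then use the triangle inequality to skip every cluster whose center is too far from the query, so the work scales with the number of covering balls rather than the database size.

```python
import numpy as np

def build_clusters(db, radius):
    """Greedy covering: each cluster is a ball of the given radius around a center."""
    centers, members = [], []
    for x in db:
        for i, c in enumerate(centers):
            if np.linalg.norm(x - c) <= radius:
                members[i].append(x)
                break
        else:
            centers.append(x)
            members.append([x])
    return centers, members

def range_search(query, centers, members, radius, threshold):
    """If d(query, center) > threshold + radius, the whole ball can be pruned
    by the triangle inequality; only surviving clusters are scanned point by point."""
    hits = []
    for c, pts in zip(centers, members):
        if np.linalg.norm(query - c) > threshold + radius:
            continue                      # prune the entire covering ball
        hits.extend(p for p in pts if np.linalg.norm(query - p) <= threshold)
    return hits
```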

Date and Time: 
Friday, February 1, 2019 - 1:15pm
Venue: 
Packard 202

IT Forum presents "Adapting Maximum Likelihood Theory in Modern Applications"

Topic: 
Adapting Maximum Likelihood Theory in Modern Applications
Abstract / Description: 

Maximum likelihood estimation (MLE) is influential because it can be easily applied to generate optimal, statistically efficient procedures for broad classes of estimation problems. Nonetheless, the theory does not apply to modern settings --- such as problems with computational, communication, or privacy considerations --- where our estimators have resource constraints. In this talk, I will introduce a modern maximum likelihood theory that addresses these issues, focusing specifically on procedures that must be computationally efficient or privacy-preserving. To do so, I first derive analogues of Fisher information for these applications, which allow a precise characterization of the tradeoffs between statistical efficiency, privacy, and computation. To complete the development, I also describe a recipe that generates optimal statistical procedures (analogues of the MLE) in the new settings, showing how to achieve the new Fisher information lower bounds.
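For reference, the classical, unconstrained objects that the talk generalizes are the Fisher information and the efficiency bound it yields (standard definitions, not results from the talk):

```latex
I(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{\!2}\right],
\qquad
\operatorname{Var}_\theta\!\big(\hat\theta\big) \;\ge\; \frac{1}{n\, I(\theta)}
\quad\text{for unbiased } \hat\theta .
```

The MLE attains this bound asymptotically, \sqrt{n}(\hat\theta_{\mathrm{MLE}} - \theta) \to \mathcal{N}(0, I(\theta)^{-1}); the constrained analogues described in the talk replace I(\theta) with information measures that account for the privacy or computation budget.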

Date and Time: 
Friday, February 22, 2019 - 1:15pm
Venue: 
Packard 202

ISL Colloquium presents "Statistical Inference Under Local Information Constraints"

Topic: 
Statistical Inference Under Local Information Constraints
Abstract / Description: 

Independent samples from an unknown probability distribution p on a domain of size k are distributed across n players, with each player holding one sample. Each player can send a message to a central referee in a simultaneous message passing (SMP) model of communication; the referee's goal is to solve a pre-specified inference problem. The catch, however, is that each player cannot simply send their own sample to the referee; instead, the message they send must obey some (local) information constraint. For instance, each player may be limited to communicating only L bits, where L << log k; or they may seek to reveal as little information as possible, and preserve local differential privacy.
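As a toy illustration of such a local constraint (a one-bit randomized-response message; this simple scheme is for intuition only and is not one of the specific protocols analyzed in the talk), each player can reveal a single privatized bit about whether their sample equals an assigned symbol, and the referee debiases the aggregate:

```python
import math
import random

def one_bit_ldp_message(sample: int, assigned_symbol: int, epsilon: float) -> int:
    """Send one randomized-response bit about 1{sample == assigned_symbol}.

    The true bit is kept with probability e^eps / (e^eps + 1), which makes this
    single-bit message eps-locally differentially private.
    """
    true_bit = int(sample == assigned_symbol)
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < keep else 1 - true_bit

def debiased_frequency(bits, epsilon):
    """Invert the randomization to estimate P(sample == assigned_symbol)."""
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(bits) / len(bits) - (1.0 - keep)) / (2.0 * keep - 1.0)
```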

We propose a general formulation for inference problems in this distributed setting, and instantiate it to two fundamental inference questions, learning and uniformity testing. We study the role of randomness for those questions, and obtain striking separations between public- and private-coin protocols for the latter, while showing the two settings are equally powerful for the former. (Put differently, "sharing with neighbors does help a lot for the test, but not really for learning.")

Based on joint works with Jayadev Acharya (Cornell University), Cody Freitag (Cornell University), and Himanshu Tyagi (IISc Bangalore).

Date and Time: 
Friday, February 1, 2019 - 1:15pm
Venue: 
Packard 202

ISL Colloquium presents "Learning in Non-Stationary Environments: Near-Optimal Guarantees"

Topic: 
Learning in Non-Stationary Environments: Near-Optimal Guarantees
Abstract / Description: 

Motivated by scenarios in which heterogeneous autonomous agents interact, in this talk we present recent work on the development of learning algorithms with performance guarantees for both simultaneous and hierarchical decision-making. Adoption of new technologies is transforming application domains from intelligent infrastructure to e-commerce, allowing operators and intelligently augmented humans to make decisions rapidly as they engage with these systems. Algorithms and market mechanisms supporting these interactions occur on multiple time-scales, face resource constraints and system dynamics, and are exposed to exogenous uncertainties, information asymmetries, and behavioral aspects of human decision-making. Techniques for the synthesis and analysis of decision-making algorithms, for either inference or influence, that fundamentally depend on an assumption of environment stationarity often break down in this context. For instance, humans engaging with platform-based transportation services make decisions that depend on their immediate observations of the environment and past experience, both of which are functions of the decisions of other users, multi-timescale policies (e.g., dynamic pricing and fixed laws), and even environmental context that may be non-stationary (e.g., weather patterns or congestion). Implementation of algorithms designed assuming stationarity may lead to unintended or unforeseen consequences.

Stochastic models with high-probability guarantees that capture the dynamics and the decision-making behavior of autonomous agents are needed to support effective interventions such as physical control, economic incentives, or information shaping mechanisms. Two fundamental components are missing in the state of the art: (i) a toolkit for the analysis of interdependent learning processes and for adaptive inference in these settings, and (ii) certifiable algorithms for co-designing adaptive influence mechanisms that achieve measurable improvement in system-level performance while ensuring individual-level quality of service through designed-in high-probability guarantees. In this talk, we discuss our work towards addressing these gaps. In particular, we provide (asymptotic and non-asymptotic) convergence guarantees for simultaneous-play, multi-agent gradient-based learning (a class of algorithms that encompasses a broad set of multi-agent reinforcement learning algorithms) and performance guarantees (regret bounds) for hierarchical decision-making (incentive design) with bandit feedback in non-stationary, Markovian environments. Building on insights from these results, the talk concludes with a discussion of interesting future directions in the design of certifiable, robust algorithms for adaptive inference and influence.
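As a minimal illustration of the simultaneous-play, gradient-based learning setting the talk analyzes (the two-player game, costs, and step size below are invented for the example), every agent descends the gradient of its own cost in its own decision variable while all agents update at the same time:

```python
import numpy as np

def simultaneous_gradient_play(grad_costs, x0, step=0.05, iters=500):
    """Each agent i takes a gradient step on its own cost in its own block,
    evaluated at the current joint decision; all agents move simultaneously."""
    x = [np.array(xi, dtype=float) for xi in x0]
    for _ in range(iters):
        grads = [g(x) for g in grad_costs]      # evaluate before anyone moves
        x = [xi - step * gi for xi, gi in zip(x, grads)]
    return x

# Toy two-player example: f1(x1, x2) = (x1 - x2)^2, f2(x1, x2) = (x2 - 1)^2
grad1 = lambda x: 2.0 * (x[0] - x[1])
grad2 = lambda x: 2.0 * (x[1] - 1.0)
print(simultaneous_gradient_play([grad1, grad2], x0=[0.0, 5.0]))  # converges near (1, 1)
```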

Date and Time: 
Thursday, January 24, 2019 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents "Algorithms and Algorithmic Intractability in High Dimensional Linear Regression"

Topic: 
Algorithms and Algorithmic Intractability in High Dimensional Linear Regression
Abstract / Description: 

In this talk we will focus on the high-dimensional linear regression problem. The goal is to recover a hidden k-sparse binary vector \beta from n noisy linear observations Y = X\beta + W, where X is an n \times p matrix with iid N(0,1) entries and W is an n-dimensional vector with iid N(0,\sigma^2) entries. In the literature on this problem, an apparent asymptotic gap is observed between the optimal sample size for information-theoretic recovery, call it n*, and the sample size required for computationally efficient recovery, call it n_alg.
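A minimal sketch that instantiates exactly this observation model (the dimensions and seed are arbitrary illustration choices):

```python
import numpy as np

def sparse_regression_instance(n=200, p=1000, k=10, sigma=0.5, seed=None):
    """Draw Y = X @ beta + W with iid N(0,1) entries of X, a hidden k-sparse
    binary vector beta, and iid N(0, sigma^2) noise W, as in the abstract."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[rng.choice(p, size=k, replace=False)] = 1.0   # hidden k-sparse binary signal
    X = rng.standard_normal((n, p))
    W = sigma * rng.standard_normal(n)
    return X, X @ beta + W, beta

X, Y, beta = sparse_regression_instance(seed=0)
```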

We will discuss several new contributions to the study of this gap. We first identify tightly the information limit of the problem using a novel analysis of the performance of the Maximum Likelihood Estimator (MLE). Furthermore, we establish that the algorithmic barrier n_alg coincides with the phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of k-sparse binary vectors. The presence of such an Overlap Gap Property phase transition, a notion originating in spin glass theory, is known to provide evidence of algorithmic hardness. Finally, we show that in the extreme case where the noise level is zero, i.e. \sigma = 0, the computational-statistical gap closes: we propose an optimal polynomial-time algorithm based on the Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm.

This is joint work with David Gamarnik.

Date and Time: 
Friday, January 18, 2019 - 3:00pm
Venue: 
Gates 463A

ISL Colloquium presents "Stochastic Descent Algorithms: Minimax Optimality, Implicit Regularization, and Deep Networks"

Topic: 
Stochastic Descent Algorithms: Minimax Optimality, Implicit Regularization, and Deep Networks
Abstract / Description: 

Stochastic descent methods have had a long history in optimization, adaptive filtering, and online learning, and have recently gained tremendous popularity as the workhorse for deep learning. So much so that it is now widely recognized that the success of deep networks is due not only to their special deep architecture, but also to the behavior of the stochastic descent methods used, which plays a key role in reaching "good" solutions that generalize well to unseen data. In an attempt to shed some light on why this is the case, we revisit some minimax properties of stochastic gradient descent (SGD)---originally developed for quadratic loss and linear models in the context of H-infinity control in the 1990s---and extend them to general stochastic mirror descent (SMD) algorithms for general loss functions and nonlinear models. These minimax properties can be used to explain the convergence and implicit regularization of the algorithms when the linear regression problem is over-parametrized (in what is now being called the "interpolating regime"). In the nonlinear setting, exemplified by training a deep neural network, we show that when the setup is "highly over-parametrized", stochastic descent methods enjoy similar convergence and implicit-regularization properties. This observation gives some insight into why deep networks exhibit such powerful generalization abilities. It is also a further example of what is increasingly referred to as the "blessing of dimensionality".
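For concreteness, the standard stochastic mirror descent update with potential function \psi and instantaneous loss L_{i_t} (a textbook definition, not a result from the talk) is

```latex
\nabla\psi(w_{t+1}) \;=\; \nabla\psi(w_t) \;-\; \eta\,\nabla L_{i_t}(w_t),
```

which recovers ordinary SGD, w_{t+1} = w_t - \eta \nabla L_{i_t}(w_t), for the Euclidean potential \psi(w) = \tfrac{1}{2}\|w\|^2; other potentials (e.g., negative entropy, giving exponentiated-gradient updates) lead to different implicit biases in the interpolating regime.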

Date and Time: 
Thursday, January 17, 2019 - 4:15pm
Venue: 
Packard 101

IT Forum welcomes Suhas Diggavi (UCLA)

Topic: 
An information theoretic approach to nanopore sequencing
Abstract / Description: 

While DNA sequencing technology has undergone a major revolution, the read lengths afforded by most current-generation sequencing technologies remain small (in the range of hundreds of DNA base pairs). These short reads are then stitched together with algorithms that exploit the overlap between reads in order to reconstruct the DNA. Long repeated regions of the DNA greater than this short read length, which are common, are not resolvable with this technology, requiring sequencers capable of accurate long reads. Nanopore sequencing promises to address this problem by increasing read lengths by orders of magnitude (up to 10K-100K bases). However, nanopore sequencers built to date have higher error rates than the short-read technologies, which, if unresolved, could limit their applications. There are many algorithmic challenges in realizing this potential, due to the many transformations between the DNA nucleotides and the nanopore current output. Impairments in nanopore sequencers include inter-symbol interference (multiple bases affect each observation), random channel variations, insertions/deletions, and noise. In this talk we develop signal-level mathematical models for the nanopore channel, which allow us to develop information-theoretic bounds on its decoding capability. We will apply these to experimental nanopore sequencer data to develop a preliminary understanding of the trade-offs in their performance. We will also use insights from this modeling to develop novel nanopore alignment techniques, which we evaluate using real datasets.
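A toy version of such a signal-level channel model (all parameters and the k-mer-to-current map are invented for illustration, not fitted to any real sequencer): each observed current sample depends on a window of k bases (inter-symbol interference), the base sequence may suffer random duplications and deletions, and the current is read in Gaussian noise.

```python
import random

def toy_nanopore_channel(seq, k=4, dup_prob=0.05, del_prob=0.05, noise_sd=0.3):
    """Illustrative nanopore read model for a sequence over {A, C, G, T}:
    ISI over k-mers, random duplications/deletions, additive Gaussian noise."""
    # Random duplications and deletions (timing / segmentation errors).
    bases = []
    for b in seq:
        if random.random() < del_prob:
            continue
        bases.append(b)
        if random.random() < dup_prob:
            bases.append(b)
    # Each current sample is a noisy function of the current k-mer (ISI).
    level = {b: v for b, v in zip("ACGT", (0.0, 1.0, 2.0, 3.0))}
    currents = []
    for i in range(len(bases) - k + 1):
        kmer = bases[i:i + k]
        mean = sum(level[b] for b in kmer) / k     # invented k-mer-to-current map
        currents.append(random.gauss(mean, noise_sd))
    return currents

print(toy_nanopore_channel("ACGTACGTACGT"))
```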

This talk is joint work with Wei Mao, Dhaivat Joshi, Sreeram Kannan and Shunfu Mao.

Date and Time: 
Friday, May 3, 2019 - 1:15pm
Venue: 
Packard 202

IT Forum & ISL present "Functional interfaces to compression (or: Down With Streams!)"

Topic: 
Functional interfaces to compression (or: Down With Streams!)
Abstract / Description: 

From a computer-science perspective, the world of compression can seem like an amazing country glimpsed through a narrow straw. The problem isn't the compression itself, but the typical interface: a stream of symbols (or audio samples, pixels, video frames, nucleotides, or Lidar points...) goes in, an opaque bitstream comes out, and on the other side, the bitstream is translated back into some approximation of the input. The coding and decoding modules maintain an internal state that evolves over time. In practice, these "streaming" interfaces with inaccessible mutable state have limited the kinds of applications that can be built.

In this talk, I'll discuss my group's experience with what can happen when we build applications around compression systems that expose a "functional" interface, one that makes state explicit and visible. We've found it's possible to achieve tighter couplings between coding and the rest of an application, improving performance and allowing compression algorithms to be used in settings where they were previously infeasible. In Lepton (NSDI 2017), we implemented a Huffman and a JPEG encoder in purely functional style, allowing the system to compress images in parallel across a distributed network filesystem with arbitrary block boundaries (e.g., in the middle of a Huffman symbol or JPEG block). This free-software system is in production at Dropbox and has compressed hundreds of petabytes of user files by 23%. ExCamera (NSDI 2017) uses a purely functional video codec to parallelize video encoding into thousands of tiny tasks, each handling a fraction of a second of video, much shorter than the interval between key frames, and executing with 4,000-way parallelism on AWS Lambda. Salsify (NSDI 2018) uses a purely functional video codec to explore execution paths of the encoder without committing to them, letting it match the capacity estimates from a transport protocol. This architecture outperforms "streaming"-oriented video applications -- Skype, Facetime, Hangouts, WebRTC -- in delay and visual quality. I'll briefly discuss some of our ongoing work in trying to compress the communication between a pair of neural networks jointly trained to accomplish some goal, e.g. to support efficient evaluation when data and compute live in different places. In general, our findings suggest that while, in some contexts, improvements in codecs may have reached a point of diminishing returns, compression *systems* still have plenty of low-hanging fruit.
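The interface contrast can be sketched in a few lines (the codec below is a trivial run-length coder standing in for a real image or video codec; all names are illustrative, not from Lepton, ExCamera, or Salsify): a streaming encoder hides mutable state inside an object, while a functional encoder takes the state as an explicit value and returns the new state, so callers can save it, fork alternative continuations, or resume encoding elsewhere.

```python
from typing import List, NamedTuple, Optional, Tuple

class CodecState(NamedTuple):
    """Explicit, immutable encoder state: the run of identical symbols not yet emitted."""
    last_symbol: Optional[str] = None
    run_length: int = 0

def encode_step(state: CodecState, symbol: str) -> Tuple[CodecState, List[Tuple[str, int]]]:
    """Purely functional step: (old state, next input symbol) -> (new state, emitted tokens)."""
    if symbol == state.last_symbol:
        return CodecState(symbol, state.run_length + 1), []
    emitted = [(state.last_symbol, state.run_length)] if state.last_symbol is not None else []
    return CodecState(symbol, 1), emitted

# Because the state is an ordinary value, any saved state can be resumed or forked,
# which is exactly what a streaming interface with hidden mutable state forbids.
state, tokens = CodecState(), []
for s in "aaabbbbcc":
    state, out = encode_step(state, s)
    tokens.extend(out)
tokens.extend(encode_step(state, "\0")[1])   # flush the final pending run
print(tokens)                                # [('a', 3), ('b', 4), ('c', 2)]
```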

Date and Time: 
Friday, January 11, 2019 - 1:15pm
Venue: 
Packard 202

John G. Linvill Distinguished Seminar on Electronic Systems Technology

Topic: 
Internet of Things and Internet of Energy for Connecting at Any Time and Any Place
Abstract / Description: 

In this presentation, I would like to discuss with you how to establish a sustainable and smart society through the internet of energy for connecting at any time and any place. I suspect that you have heard the phrase "Internet of Energy" less often. The meaning of this phrase is simple. Because of a ubiquitous energy transmission system, you do not need to worry about a shortage of electric power. One of the most important items for establishing a sustainable society is [...]


"Inaugural Linvill Distinguished Seminar on Electronic Systems Technology," EE News, July 2018

 

Date and Time: 
Monday, January 14, 2019 - 4:30pm
Venue: 
Hewlett 200

CANCELLED! ISL & IT Forum present "Bayesian Suffix Trees: Learning and Using Discrete Time Series"

Topic: 
CANCELLED! Bayesian Suffix Trees: Learning and Using Discrete Time Series
Abstract / Description: 

CANCELLED!  We apologize for any inconvenience.

One of the main obstacles to the development of effective algorithms for inference and learning from discrete time series data is the difficulty of identifying useful temporal structure. We will discuss a class of novel methodological tools for effective Bayesian inference and model selection for general discrete time series, which offer promising results on both small and big data. Our starting point is the development of a rich class of Bayesian hierarchical models for variable-memory Markov chains. The particular prior structure we adopt makes it possible to design effective, linear-time algorithms that can compute most of the important features of the resulting posterior and predictive distributions without resorting to MCMC. We have applied the resulting tools to numerous application-specific tasks, including on-line prediction, segmentation, classification, anomaly detection, entropy estimation, and causality testing, on data sets from different areas of application, including data compression, neuroscience, finance, genetics, and animal communication. Results on both simulated and real data will be presented.
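The basic object behind these models, a variable-memory (context-tree) Markov chain, can be illustrated with a toy predictor (the contexts and probabilities below are made up, and none of the Bayesian machinery from the talk is reproduced): the next-symbol distribution depends on the longest suffix of the past that appears as a context in the tree, so the effective memory length varies with the data.

```python
def longest_matching_context(tree, past):
    """Return the longest suffix of `past` that is stored as a context in the tree."""
    for length in range(len(past), -1, -1):
        ctx = past[len(past) - length:]
        if ctx in tree:
            return ctx
    return ""

# Toy variable-memory chain over {0, 1}: memory length depends on the context.
tree = {
    "":   {"0": 0.5, "1": 0.5},    # fallback when nothing longer matches
    "1":  {"0": 0.8, "1": 0.2},    # after a 1, a 0 is likely...
    "01": {"0": 0.1, "1": 0.9},    # ...unless that 1 was preceded by a 0
}

past = "001"
ctx = longest_matching_context(tree, past)
print(ctx, tree[ctx])   # '01' {'0': 0.1, '1': 0.9}
```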

Date and Time: 
Wednesday, December 12, 2018 - 3:00pm
Venue: 
Packard 202
