Information Systems Lab (ISL) Colloquium

ISL Colloquium presents "A structural result for Personalized PageRank and its algorithmic consequences"

Topic: 
A structural result for Personalized PageRank and its algorithmic consequences
Abstract / Description: 

Many systems, including the Internet, social networks, and the power grid, can be represented as graphs. When analyzing graphs, it is often useful to compute scores describing the relative importance or distance between nodes. One example is Personalized PageRank (PPR), which assigns to each node v a vector whose i-th entry describes the importance of the i-th node from the perspective of v. PPR has proven useful in many applications, such as recommending whom users should follow on social networks (if this i-th entry is large, v may be interested in following the i-th user). Unfortunately, computing n such PPR vectors (where n is the number of nodes) is infeasible for many graphs of interest.

In this work, we argue that the situation is not so dire. Our main result shows that the dimensionality of the set of PPR vectors scales sublinearly in n with high probability, for a certain class of random graphs and for a notion of dimensionality similar to rank. Put differently, we argue that the effective dimension of this set is much less than n, despite the fact that the matrix containing these vectors has rank n. Furthermore, we show this dimensionality measure relates closely to the complexity of a PPR estimation scheme that was proposed (but not analyzed) by Jeh and Widom. This allows us to argue that accurately estimating all n PPR vectors amounts to computing a vanishing fraction of the n^2 vector elements (when the technical assumptions of our main result are satisfied). Finally, we demonstrate empirically that similar conclusions hold when considering real-world networks, despite the assumptions of our theory not holding.
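
As a toy illustration of the objects in question (this is plain power iteration, not the Jeh and Widom estimation scheme analyzed in the talk; the graph, teleport probability, and function names are illustrative), the PPR vector of a source node v can be sketched as:

```python
import numpy as np

def personalized_pagerank(A, v, alpha=0.15, iters=100):
    """Power iteration for the PPR vector of source node v.
    A: adjacency matrix (n x n), A[i, j] = 1 for an edge j -> i.
    alpha: teleport probability back to the source node."""
    n = A.shape[0]
    # Column-stochastic transition matrix (dangling nodes not handled).
    P = A / A.sum(axis=0, keepdims=True)
    e = np.zeros(n)
    e[v] = 1.0  # teleport distribution concentrated on v
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * P @ pi
    return pi

# Small 4-node directed ring: each node links to its successor.
A = np.roll(np.eye(4), 1, axis=0)
ppr = personalized_pagerank(A, v=0)  # entries sum to 1; largest at v
```

Computing one such vector is cheap; the obstacle the abstract refers to is doing this for all n source nodes, which costs roughly n times as much.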

This is joint work with Daniel Vial (University of Michigan) and will appear in ACM SIGMETRICS 2019.

Date and Time: 
Thursday, May 16, 2019 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents "Optimization Aspects of Temporal Abstraction in Reinforcement Learning"

Topic: 
Optimization Aspects of Temporal Abstraction in Reinforcement Learning
Abstract / Description: 

Temporal abstraction refers to the idea that complicated sequential decision making problems can sometimes be simplified by considering the "big picture" first. In this talk, I will give an overview of some of my work on learning such temporal abstractions end-to-end within the "option-critic" architecture (Bacon et al., 2017). I will then explain how other related hierarchical RL frameworks, such as Feudal RL by Dayan and Hinton (1993), can also be approached under the same option-critic architecture. However, we will see that this formulation leads to a so-called "bilevel" optimization problem. While this is a more difficult problem, the good news is that the literature on bilevel optimization is rich and many of its tools have yet to be rediscovered by our community. I will finally show how "iterative differentiation" techniques (Griewank and Walther, 2008) can be applied to our problem while providing a new interpretation of the "inverse RL" approach of Rust (1988).

Date and Time: 
Thursday, May 2, 2019 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents "Decentralized control over unreliable communication links"

Topic: 
Decentralized control over unreliable communication links
Abstract / Description: 

Decentralized control problems have been a topic of significant research interest due to their relevance to multi-agent systems and large-scale distributed systems. The design of optimal decentralized control strategies has been investigated under various models for inter-controller communication, such as graph-based communication models and communication with delays. A common feature of much of the prior work is that the underlying communication structure of the decentralized system is assumed to be fixed and unchanging. For example, several works assume a fixed communication graph among controllers whose edges describe perfect communication links between controllers. Similarly, when the communication graph incorporates delays, the delays are assumed to be fixed and known. This is a key limitation, since in many situations communication among controllers may suffer from imperfections such as random packet loss and random packet delays. These imperfections introduce a new layer of uncertainty in the information structure that is not present in the models considered in prior work. In this talk, we will describe a decentralized LQG control problem where some of the communication links suffer from random packet loss. We will first identify optimal decentralized control strategies for the finite-horizon version of our problem. We will then discuss the infinite-horizon problem and show that there are critical thresholds for packet loss probabilities, above which no strategy can achieve finite cost and below which optimal strategies can be explicitly identified.
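
The flavor of such a threshold can be seen in a much simpler, classical single-controller scalar example (not the decentralized model of the talk; the plant, controller, and all parameters here are illustrative): if the plant is x_{t+1} = a x_t + u_t and the control packet is lost with probability p, the closed loop is mean-square stable only when p a^2 < 1.

```python
import random

def avg_cost(a=2.0, p=0.1, horizon=2000, trials=200, seed=1):
    """Simulate x_{t+1} = a*x_t + u_t, where the deadbeat control
    u_t = -a*x_t is lost (u_t = 0) with probability p; return the
    empirical time-averaged quadratic state cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, cost = 1.0, 0.0
        for _ in range(horizon):
            u = 0.0 if rng.random() < p else -a * x
            cost += x * x
            x = a * x + u
        total += cost / horizon
    return total / trials

stable = avg_cost(a=2.0, p=0.1)    # p*a^2 = 0.4 < 1: cost stays small
unstable = avg_cost(a=2.0, p=0.5)  # p*a^2 = 2 > 1: expected cost diverges
```

Each dropped packet multiplies the state by a, so the second moment grows by a factor p*a^2 per step; the simulated average cost is markedly larger above the threshold.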

Date and Time: 
Thursday, April 18, 2019 - 4:15pm
Venue: 
Packard 101

#StanfordToo: A Conversation about Sexual Harassment in Our Academic Spaces

Topic: 
#StanfordToo: A Conversation about Sexual Harassment in Our Academic Spaces
Abstract / Description: 

Individuals of all genders are invited to be a part of:
#StanfordToo: A Conversation about Sexual Harassment in Our Academic Spaces, where we will feature real stories of harassment in academic STEM at Stanford in a conversation with Provost Drell, Dean Minor (SoM), and Dean Graham (SE3). We will have plenty of time for audience discussion on how we can take concrete action to dismantle this culture and actively work towards a more inclusive Stanford for everyone. While our emphasis is on STEM fields, we welcome and encourage participation from students, postdocs, staff, and faculty of all academic disciplines and backgrounds.

Date and Time: 
Friday, April 19, 2019 - 3:30pm
Venue: 
STLC 111

ISL Colloquium presents "Learn to Communicate - Communicate to Learn"

Topic: 
Learn to Communicate - Communicate to Learn
Abstract / Description: 

Machine learning and communications are intrinsically connected. The fundamental problem of communications, as stated by Shannon, "reproducing at one point either exactly or approximately a message selected at another point," can be considered as a classification problem. With this connection in mind, I will focus on the fundamental joint source-channel coding problem using modern machine learning techniques. I will introduce uncoded "analog" schemes for wireless image transmission, and show their surprising performance both through simulations and practical implementation. This result will be used to motivate unsupervised learning techniques for wireless image transmission, leading to a "deep joint source-channel encoder" architecture, which behaves similarly to analog transmission, and not only improves upon state-of-the-art digital schemes, but also achieves graceful degradation with channel quality, and performs exceptionally well over fading channels despite not utilizing explicit pilot signals or channel state estimation.

In the second part of the talk, I will focus on distributed machine learning, particularly targeting wireless edge networks, and show that ideas from coding and communication theories can help improve their performance. Finally, I will introduce the novel concept of "over-the-air stochastic gradient descent" for wireless edge learning, and show that it significantly improves the efficiency of machine learning across bandwidth and power limited wireless devices compared to the standard digital approach that separates computation and communication. This will close the circle, making another strong case for analog communication in future communication systems.

Date and Time: 
Thursday, March 14, 2019 - 4:15pm
Venue: 
Packard 101

ISL Colloquium presents "Sensing the World Wirelessly: Perception in the Age of IoT"

Topic: 
Sensing the World Wirelessly: Perception in the Age of IoT
Abstract / Description: 

The success of wireless and mobile systems has led to a digital infrastructure that is integrated into the fabric of the physical world at a scale unimaginable two decades ago. This has come to be known as the internet of things, or the IoT. Batteryless devices constitute the largest component of this infrastructure, as they are attached to clothing, food, drugs, and manufacturing parts. However, due to their batteryless nature, these devices were assumed to be intrinsically limited in bandwidth, range, and sensing capability. To overcome these limitations, existing approaches required designing new hardware that replaces the hundreds of billions of devices already deployed.

In this talk, I will describe how our research enables transforming batteryless devices into powerful sensors without modifying their hardware in any way, thus bringing direct benefits to the billions of devices deployed in today's world. Specifically, I will describe how we can extract a sensing bandwidth from batteryless devices that is 10,000x larger than their communication bandwidth, and how we can extend their operation range by over 10x. I will also describe how we have designed novel inference algorithms and learning models that build on these techniques to deliver a variety of sensing tasks including sub-centimeter positioning, deep-tissue communication, and non-contact food sensing.

The systems we have built have transformative implications on smart environments, healthcare, manufacturing, and food safety. They enable agile robots to operate in non-line-of-sight environments where vision systems typically fail. They have led to the first demonstration of communication with deep-tissue batteryless micro-implants in a large living animal (pig) from meter-scale distances. Most recently, we demonstrated the potential of using these techniques to sense contaminated food in closed containers. I will conclude by describing how rethinking the abstractions of computing will enable us to bring the next generation of micro-computers to exciting new domains ranging from the human body to the depth of the ocean.

Date and Time: 
Thursday, March 7, 2019 - 4:15pm
Venue: 
Packard 101

ISL Colloquium and IT-Forum presents "Coding over Sets for DNA Storage"

Topic: 
Coding over Sets for DNA Storage
Abstract / Description: 

DNA-based storage is a novel technology in which digital information is stored in synthetic DNA molecules. Recent advances in DNA sequencing methods and decreases in sequencing costs have paved the way for storage methods based on DNA. The natural stability of DNA molecules (genetic information from fossils is preserved over tens of thousands of years) motivates their use for long-term archival storage. Furthermore, because the information is stored at the molecular level, such storage systems have extremely high data densities. Recent experiments report data densities of 2 PB/gram, which corresponds to the capacity of a thousand conventional hard disk drives in one gram of DNA.

In this talk we present error-correcting codes for the storage of data in synthetic DNA. We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within this model are the loss of whole sequences and point errors inside the sequences, such as insertions, deletions, and substitutions. We derive Gilbert-Varshamov lower bounds and sphere-packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
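
The unordered-set model can be made concrete with a simple baseline construction (assumed here purely for illustration, and far from the optimal codes of the talk): prefixing each sequence with its index turns a lost sequence into an erasure at a known position, which a single XOR parity sequence can repair.

```python
def encode(chunks):
    """Index-prefix each data chunk and append one XOR parity chunk;
    the indices restore order, the parity repairs one lost chunk."""
    parity = [0] * len(chunks[0])
    for c in chunks:
        parity = [p ^ b for p, b in zip(parity, c)]
    coded = {(i, tuple(c)) for i, c in enumerate(chunks)}
    coded.add((len(chunks), tuple(parity)))
    return coded  # an unordered set of (index, sequence) pairs

def decode(received, m, length):
    """Recover the m data chunks from a received set that is
    missing at most one element (data or parity)."""
    got = dict(received)
    missing = [i for i in range(m + 1) if i not in got]
    if missing:
        (lost,) = missing  # exactly one loss is correctable
        acc = [0] * length
        for seq in got.values():
            acc = [a ^ b for a, b in zip(acc, seq)]
        got[lost] = tuple(acc)
    return [list(got[i]) for i in range(m)]

data = [[1, 2, 3], [4, 5, 6], [7, 0, 2]]
coded = encode(data)
coded.discard((1, (4, 5, 6)))  # lose one whole sequence
recovered = decode(coded, m=3, length=3)
```

This handles only a single sequence loss and no point errors; the talk's constructions address combinations of losses, insertions, deletions, and substitutions, and are compared against the derived bounds.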

Date and Time: 
Friday, March 8, 2019 - 1:15pm
Venue: 
Packard 202

ISL Colloquium presents "Reinforcement Learning and Optimal Control: An Overview"

Topic: 
Reinforcement Learning and Optimal Control: An Overview
Abstract / Description: 

We will provide an overview of a book in preparation, which will provide a synthesis of optimization/control and artificial intelligence methodologies as they relate to sequential decision problems. In particular, we consider large and challenging multistage problems, which can be solved in principle by dynamic programming, but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also by alternative names such as approximate dynamic programming, and neuro-dynamic programming. Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with a background in either field.

Date and Time: 
Monday, March 4, 2019 - 4:00pm
Venue: 
Packard 101

ISL Colloquium and IT-Forum presents "Compression for biological data analysis"

Topic: 
Compression for biological data analysis
Abstract / Description: 

Compression has for decades served primarily the utilitarian purpose of enabling easier storage and transmission of data. Here however, I show how compression can be used to better understand biological processes and assist in data analysis.

First, I will demonstrate the relationship between lossy compression and understanding the perceptual characteristics of downstream agents. Quartz, my lossy compression program for next-generation sequencing quality scores, counterintuitively improves SNP calling despite discarding 95% of quality scores, revealing the oversensitivity of variant callers to sequencer noise. More recently, I developed HyperMinHash, a lossy floating-point compression of the popular MinHash Jaccard index sketch, which reduces the space complexity from log(n) to loglog(n) by exploiting the fact that MinHash cares less about large hash values than about small ones.
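
A minimal sketch of the underlying MinHash idea (illustrative only; this is the classical sketch, not HyperMinHash, which further compresses each register):

```python
import random

def minhash_signature(items, num_hashes=128, seed=0):
    """For each of num_hashes (salted) hash functions, keep the
    minimum hash value over the set."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def jaccard_estimate(sig_a, sig_b):
    """The fraction of coordinates where two signatures agree is an
    unbiased estimate of the Jaccard index |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

a = set(range(0, 100))
b = set(range(50, 150))  # true Jaccard index: 50/150, about 0.33
est = jaccard_estimate(minhash_signature(a), minhash_signature(b))
```

Only the minima are compared, never the full hash values, which is why discarding the high-order information about large hash values (the insight behind HyperMinHash) costs so little accuracy.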

In the second part of this talk, I show how we exploit the compressive structure of biological data to speed up similarity search. I prove that by organizing the database to facilitate clustered search, our time complexity scales with metric entropy (the number of covering hyperspheres) when the fractal dimension of a dataset is low. This is the key insight behind our compressively accelerated versions of standard tools in genomics (CORA, 10-100x speedup for all-mapping of NGS reads), metagenomics (MICA, 3.5x speedup over Diamond), and chemical informatics (Ammolite, 150x speedup over SMSD).

Date and Time: 
Friday, February 1, 2019 - 1:15pm
Venue: 
Packard 202

IT Forum presents "Adapting Maximum Likelihood Theory in Modern Applications"

Topic: 
Adapting Maximum Likelihood Theory in Modern Applications
Abstract / Description: 

Maximum likelihood estimation (MLE) is influential because it can be easily applied to generate optimal, statistically efficient procedures for broad classes of estimation problems. Nonetheless, the theory does not apply to modern settings --- such as problems with computational, communication, or privacy considerations --- where our estimators have resource constraints. In this talk, I will introduce a modern maximum likelihood theory that addresses these issues, focusing specifically on procedures that must be computationally efficient or privacy-preserving. To do so, I first derive analogues of Fisher information for these applications, which allows a precise characterization of tradeoffs between statistical efficiency, privacy, and computation. To complete the development, I also describe a recipe that generates optimal statistical procedures (analogues of the MLE) in the new settings, showing how to achieve the new Fisher information lower bounds.

Date and Time: 
Friday, February 22, 2019 - 1:15pm
Venue: 
Packard 202
