We deconstruct Generative Adversarial Networks (GANs) into three fundamental problems to study: formulation, generalization, and optimization. We propose systematic principles for formulating the population objectives of GANs (when infinite samples are available), and reveal and further develop connections between GANs and robust statistics. We provide principled methods to achieve the population formulations of GANs from finite samples with small generalization error, and demonstrate the intricacy, in terms of statistical error, of moving from infinite to finite samples. We show through examples the importance of solving the inner maximization problem before the outer minimization problem, and demonstrate that embedding knowledge of the inner maximization's solution can make a locally unstable algorithm globally stable. Joint work with Banghua Zhu and David Tse.
In our current age of "Techlash" there is a daily stream of headlines demanding reforms to the major platforms on the Internet — including calls for antitrust action to structurally break up big tech companies or, at a minimum, levy billions of dollars in fines against them. In this talk, we'll take a step back to look at lessons from the 1956 Bell Labs Consent Decree as a startling and relevant case of how FTC antitrust action sought to protect and promote innovation. We'll survey Bell Labs' legacy as the birthplace of modern computing, starting with Shannon's Mathematical Theory of Communication and Shockley's transistor, and examine how the thousands of inventions licensed from Bell Labs after the Consent Decree shaped Silicon Valley. We ask: can a page of history revise our current understanding of innovation and competition and reshape the goals of the next generation of startups?
Individuals of all genders are invited to be a part of:
#StanfordToo: A Conversation about Sexual Harassment in Our Academic Spaces, where we will feature real stories of harassment in academic STEM at Stanford in a conversation with Provost Drell, Dean Minor (SoM), and Dean Graham (SE3). We will have plenty of time for audience discussion on how we can take concrete action to dismantle this culture and actively work towards a more inclusive Stanford for everyone. While our emphasis is on STEM fields, we welcome and encourage participation from students, postdocs, staff, and faculty of all academic disciplines and backgrounds.
The availability of massive public image datasets appears to have hardly been exploited in image compression. In this work, we present a novel framework for image compression based on human image generation and publicly available images as "side information." Our framework consists of one human who describes images using text instructions to another human, who is tasked with reconstructing the original image to the first human's satisfaction. These image reconstructions were then rated by human scorers on the Amazon Mechanical Turk platform and compared to reconstructions obtained by existing image compressors. While this setup lacks certain components typical of traditional compressors, the insights gained from these experiments offer a new perspective on designing the image compressors of the future.
The Santa Clara Valley chapters of the IEEE Information Theory and Signal Processing societies are co-sponsoring this event.
Universal learning is considered from an information-theoretic point of view, following the universal prediction approach pursued in the 90's by Feder and Merhav. Interestingly, the extension to learning is not straightforward. In previous works we considered online learning and supervised learning in a stochastic setting. Yet the most challenging case is batch learning, where prediction is done on a test sample once the entire training data is observed, in the individual setting where the features and labels, both training and test, are specific individual quantities. This work provides schemes that, for any individual data, compete with a "genie" (or reference) that knows the true test label. It suggests design criteria and derives the corresponding universal learning schemes. The main proposed scheme is termed Predictive Normalized Maximum Likelihood (pNML). As demonstrated, pNML learning and its variations provide robust, "stable" learning solutions that outperform the current leading approach based on Empirical Risk Minimization (ERM). Furthermore, the pNML construction provides a pointwise indication of learnability that measures the uncertainty in learning the specific test challenge with the given training examples; thus the learner knows when it does not know. The improved performance of the pNML, the induced learnability measure, and its utilization are demonstrated in several learning problems, including deep neural network models.
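To make the pNML idea concrete, here is a minimal sketch for the simplest possible hypothesis class, a single Bernoulli parameter (our illustrative simplification; the function name `pnml_bernoulli` is ours, not from the talk): for each candidate test label, refit the maximum-likelihood parameter on the training data augmented with that label, score the label under the refit model, and normalize. The logarithm of the normalizer plays the role of the pointwise learnability (regret) measure mentioned above.

```python
# Minimal pNML sketch for a Bernoulli hypothesis class (illustrative).
def pnml_bernoulli(train):
    """For each candidate test label y, refit the ML parameter on
    train + [y], score y under that fit, then normalize over y."""
    n, k = len(train), sum(train)
    scores = {}
    for y in (0, 1):
        theta = (k + y) / (n + 1)      # ML estimate including the candidate label
        scores[y] = theta if y == 1 else 1.0 - theta
    z = scores[0] + scores[1]          # normalizer; log(z) is the pointwise regret
    return {y: s / z for y, s in scores.items()}, z

probs, z = pnml_bernoulli([1, 1, 0, 1])
```

For this class the normalizer is always greater than one, so the regret is strictly positive: the fewer training samples, the larger the regret, quantifying how uncertain the learner should be about the test label.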
Joint work with Yaniv Fogel and Koby Bibas.
Positioning has been the Holy Grail of wireless sensing research, with a wide range of applications from tracking virtual reality devices to in-body implants. However, despite two decades of active research, a widely deployable system with high accuracy has always been elusive. Wireless signals reflected from objects in the environment interfere with and distort the signal from the intended target device, corrupting the position estimates. In order to fight this 'multipath' phenomenon, previous approaches built specialized wireless devices with huge antenna arrays or large bandwidths, making them impractical for ubiquitous deployment. In this talk, I will introduce a new technique called 'Synthetic Aperture Radio' that harnesses, rather than fights, the multipath that naturally occurs in the environment, and exploits the device motion that naturally occurs in these applications. By applying this technique, I have demonstrated the first real-time, centimeter-level accurate positioning system using standard, off-the-shelf WiFi radios. Building on the synthetic aperture radio technique, I have developed practical positioning systems for indoor navigation, tracking virtual reality accessories, and resource-constrained devices like endoscopic capsules. Looking forward, these techniques lay a foundation for utilizing ubiquitous wireless devices to develop important machine vision applications in domains like medical sensing, physical security, and autonomous vehicles.
DNA-based storage is a novel technology in which digital information is stored in synthetic DNA molecules. Recent advances in DNA sequencing methods and the decrease in sequencing costs have paved the way for storage methods based on DNA. The natural stability of DNA molecules (the genetic information from fossils is maintained over tens of thousands of years) motivates their use for long-term archival storage. Furthermore, because the information is stored at the molecular level, such storage systems have extremely high data densities. Recent experiments report data densities of 2 PB/gram, which corresponds to the capacity of a thousand conventional hard disk drives in one gram of DNA.
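As a sanity check, the "thousand hard drives per gram" figure works out if one assumes (our assumption, not stated in the source) a conventional drive of 2 TB:

```python
# Back-of-the-envelope check of the reported DNA density (decimal units).
PB = 1000 ** 5                  # bytes in a petabyte
TB = 1000 ** 4                  # bytes in a terabyte
density_per_gram = 2 * PB       # reported density of DNA storage
drive_capacity = 2 * TB         # assumed conventional hard disk drive
drives_per_gram = density_per_gram // drive_capacity
```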
In this talk we present error-correcting codes for the storage of data in synthetic DNA. We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within that model are the loss of whole sequences and point errors inside the sequences, such as insertions, deletions, and substitutions. We derive Gilbert-Varshamov lower bounds and sphere-packing upper bounds on the achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
Compression has for decades served primarily the utilitarian purpose of enabling easier storage and transmission of data. Here, however, I show how compression can be used to better understand biological processes and assist in data analysis.
First, I will demonstrate the relationship between lossy compression and understanding the perceptual characteristics of downstream agents. Quartz, my lossy compression program for next-generation sequencing quality scores, counterintuitively improves SNP calling despite discarding 95% of quality scores, revealing the oversensitivity of variant callers to sequencer noise. More recently, I developed HyperMinHash, a lossy floating-point compression of the popular MinHash Jaccard-index sketch that reduces the space complexity from log(n) to loglog(n) by exploiting the observation that MinHash cares less about large hash values than about small ones.
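For readers unfamiliar with the sketch being compressed, here is a minimal classic MinHash in Python (our generic illustration, not HyperMinHash itself, which additionally quantizes each register in a floating-point style to reach the loglog(n) space): the fraction of coordinates on which two signatures agree is an unbiased estimate of the Jaccard index of the underlying sets.

```python
# Classic k-permutation MinHash sketch (illustrative reconstruction).
import hashlib

def minhash(items, k=64):
    """Signature: for each of k seeded hash functions, keep the minimum
    hash value attained over the set."""
    sig = []
    for seed in range(k):
        sig.append(min(
            int(hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).hexdigest(), 16)
            for x in items))
    return sig

def jaccard_estimate(sig_a, sig_b):
    """Fraction of coordinates where the two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = set(range(0, 80))
b = set(range(40, 120))   # true Jaccard index: 40 / 120 = 1/3
est = jaccard_estimate(minhash(a), minhash(b))
```

Each register here is a full 64-bit minimum, i.e. log(n) bits; HyperMinHash's insight is that most of those high-order bits carry little information about the minimum and can be compressed away.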
In the second part of this talk, I show how we exploit the compressive structure of biological data to speed up similarity search. I prove that, by organizing the database to facilitate clustered search, our time complexity scales with the metric entropy (the number of covering hyperspheres) when the fractal dimension of the dataset is low. This is the key insight behind our compressively accelerated versions of standard tools in genomics (CORA, 10-100x speedup for all-mapping of NGS reads), metagenomics (MICA, 3.5x speedup over Diamond), and chemical informatics (Ammolite, 150x speedup over SMSD).
Modern cloud storage systems need to store vast amounts of data in a fault-tolerant manner, while also preserving data reliability and accessibility in the wake of frequent server failures. Traditional MDS (Maximum Distance Separable) codes provide the optimal trade-off between redundancy and the number of worst-case erasures tolerated. Minimum storage regenerating (MSR) codes are a special sub-class of MDS codes that provide mechanisms for exact regeneration of a single code-block by downloading the minimum amount of information from the remaining code-blocks. As a result, MSR codes are attractive for use in distributed storage systems to ensure node repairs with optimal repair bandwidth. However, all known constructions of MSR codes require large sub-packetization levels (a measure of the granularity to which a single vector codeword symbol needs to be subdivided for efficient repair). This restricts the applicability of MSR codes in practice.
This talk will present a lower bound showing that exponentially large sub-packetization is inherent for MSR codes. We will also propose a natural relaxation of MSR codes that allows one to circumvent this lower bound, and present a general approach to construct MDS codes that significantly reduces the required sub-packetization level while incurring slightly higher repair bandwidth than MSR codes.
The lower bound is joint work with Omar Alrabiah, and the constructions are joint work with Ankit Rawat, Itzhak Tamo, and Klim Efremenko.