IT-Forum

ISL & IT-Forum present "Approaching Capacity at Short Blocklengths"

Topic: 
Approaching Capacity at Short Blockengths
Abstract / Description: 

This talk explores a family of recent results directed to approaching capacity at short blocklengths on the order of 50-500 channel transmissions. Convolutional codes out-perform polar codes and LDPC codes to approach the random coding union bound with low complexity when used with an optimized CRCs and list decoding. This perspective rehabilitates "catastrophic" convolutional codes, which are more properly understood for finite blocklengths as clever expurgation rather than any sort of catastrophe. This approach also provides a low-complexity approach for maximum-likelihood decoding of high-rate BCH codes. The use of variable length coding, i.e. incremental redundancy controlled with simple ACK/NACK feedback, allows capacity to be closely approached by practical codes with fewer than 500 channel uses. This talk reviews the information-theoretic results of Polyanskiy with respect to ACK/NACK feedback, presents new results extending the classic approach of Horstein for full feedback, and shows how to optimize the number and length of incremental redundancy transmissions (and feedback transmissions) for a variable-length code with feedback (i.e. a type-II hybrid ARQ). The talk also shows how to avoid entirely the overhead of a CRC in a hybrid ARQ setting by directly computing the reliability of convolutional codeword decisions. Finally, attendees will learn about a novel communications architecture that allows the use of incremental redundancy even without feedback.

 

Date and Time: 
Friday, January 24, 2020 - 1:15pm
Venue: 
Packard 202

ISL Colloquium presents "Self-Programming Networks: Applications to Financial Trading Systems"

Topic: 
Self-Programming Networks: Applications to Financial Trading Systems
Abstract / Description: 

We describe Self-Programming Networks (SPNs), an ongoing research effort at Stanford for making cloud computing networks autonomous; that is, to enable the networks to sense and monitor themselves, and program and control themselves. We describe the goals and the architecture of SPNs and present two key outcomes: (i) Huygens, for scalable and accurate clock synchronization, and (ii) Simon, for fine-grained network telemetry using observations from the network’s edge. We also describe the relevance of this work to existing financial trading systems and demonstrate how, in future, it enables financial trading systems in the Cloud.


The Information Systems Laboratory Colloquium (ISLC) is typically held in Packard 101 every Thursday at 4:30 pm during the academic year. Coffee and refreshments are served at 4pm in the second floor kitchen of Packard Bldg.

The Colloquium is organized by graduate students Joachim Neu, Tavor Baharav and Kabir Chandrasekher. To suggest speakers, please contact any of the students.

Date and Time: 
Thursday, January 23, 2020 - 4:30pm
Venue: 
Packard 101

ISL & IT Forum present "Higher Criticism for discriminating frequency-tables and testing authorship"

Topic: 
Higher Criticism for discriminating frequency-tables and testing authorship
Abstract / Description: 

The Higher Criticism (HC) test is a useful tool for detecting the presence of a signal spread across a vast number of features, especially in the sparse setting when only few features are useful while the rest contain only noise. We adapt the HC test to the two-sample setting of detecting changes between two frequency tables. We apply this adaptation to authorship attribution challenges, where the goal is to identify the author of a document using other documents whose authorship is known. The method is simple yet performs well without handcrafting and tuning. Furthermore, as an inherent side effect, the HC calculation identifies a subset of discriminating words, which allow additional interpretation of the results. Our examples include authorship in the Federalist Papers and machine-generated texts.

We take two approaches to analyze the success of our method. First, we show that, in practice, the discriminating words identified by the test: have low variance across documents belonging to a corpus of homogeneous authorship. We conclude that in testing a new document against the corpus of an author, HC is mostly affected by words characteristic of that author and is relatively unaffected by topic structure. Finally, we analyze the power of the test in discriminating two multinomial distributions under sparse and weak perturbations model. We show that our test has maximal power in a wide range of the model parameters, even though these parameters are unknown to the user.

Date and Time: 
Friday, January 10, 2020 - 1:15pm
Venue: 
Packard 202

ISL Colloquium and IT-Forum present "A Notion of Entropy for Sparse Marked Graphs and its Applications in Graphical Data Compression"

Topic: 
A Notion of Entropy for Sparse Marked Graphs and its Applications in Graphical Data Compression
Abstract / Description: 

Many modern data sources arising from social networks, biological data, etc. are best viewed as indexed by combinatorial structures such as graphs, rather than time series. The local weak limit theory for sparse graphs, also known as the objective method, due to Benjamini, Schramm, Aldous, Steele, Lyons and others, provides a framework which enables one to make sense of a stationary process indexed by graphs. The theory of time series is the engine driving an enormous range of applications in areas such as control theory, communications, information theory and signal processing. It is to be expected that a theory of stationary stochastic processes indexed by combinatorial structures, in particular graphs, would eventually have a similarly wide-ranging impact.

Employing the above framework, we introduce a notion of entropy for probability distributions on rooted graphs. This is a generalization of the notion of entropy introduced by Bordenave and Caputo to graphs which carry marks on their vertices and edges. Such marks can represent information on real-world data. For instance, in a social network graph where each node represents an individual and edges represent friendships, a vertex mark represents the type of an individual, while edge marks represent shared data between friends. The above notion of entropy can be considered as a natural counterpart for the Shannon entropy rate in the world of graphical data. We illustrate this by introducing a universal compression scheme for marked graphical data. Furthermore, we introduce an algorithm that can perform such a compression with low complexity.

This talk is based on joint work with Venkat Anantharam.

Date and Time: 
Friday, December 6, 2019 - 1:15pm
Venue: 
Packard 202

ISL & IT Forum present "An information-theoretic solution to random access communication"

Topic: 
An information-theoretic solution to random access communication
Abstract / Description: 

The emergence of networks of many devices in the context of cyber-physical systems underscores the need for novel solutions for communication over random access channels. Most protocols currently in place rely on collision avoidance and are woefully ill-suited to handling high variation in the number and variety of communicators. We ask the question of whether it is possible, in a scenario where no one knows how many transmitters are active, for the receiver to almost always recover the messages sent by all active transmitters. Surprisingly, we find that not only is reliable decoding possible in this regime, but it is possible to attain both the capacity and the dispersion of the multiple access channel in operation. Our strategy relies on ''semi-rateless'' codes: decoding attempts are made at a sparse set of pre-determined decoding times, and the decoder is tasked with recovering from the channel output the messages of the active transmitters. Once the decoder determines that reliable decoding can be made, a positive acknowledgement (ACK) is sent to the encoders, indicating the end of that communication epoch and the start of a new one. The feedback ensures that the collection of active transmitters remains fixed during each epoch. These semi-rateless codes work for both channel and source coding; for source coding, our semi-rateless scheme achieves the optimal performance up to the first three order terms in the Gaussian approximation of the optimal rate. The analysis includes a refinement of the random coding argument ensuring the existence of a deterministic code that meets multiple constraints simultaneously, a random-coding union achievability bound for multiple access, and a sharp converse bound for multiple access source coding via a connection to composite hypothesis testing.

Based on joint works (ArXiv:1801.09018, ArXiv:1902.03366) with M. Effros, R. C. Yavas, S. Chen.

Date and Time: 
Friday, November 22, 2019 - 1:15pm
Venue: 
Packard 202

Nonparametric generative modeling via optimal transport and diffusions with provable guarantees

Topic: 
Nonparametric generative modeling via optimal transport and diffusions with provable guarantees
Abstract / Description: 

By building upon the recent theory that established the connection between implicit generative modeling (IGM) and optimal transport, in this study, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is close to the data distribution as much as possible and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. Our experimental results support our theory and show that our algorithm is able to successfully capture the structure of different types of data distributions.

The talk will be based on the following paper:
"Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions", ICML 2019

Date and Time: 
Friday, November 8, 2019 - 1:00pm to Saturday, November 9, 2019 - 12:55pm
Venue: 
Allen 101X

IT Forum & ISL Colloquium presents "The Case Against Reality"

Topic: 
The Case Against Reality
Abstract / Description: 

If I have a visual experience that I describe as a red tomato a meter away, then I am inclined to believe that there is, in fact, a red tomato a meter away, even if I close my eyes. I believe that my perceptions are, in the normal case, veridical—that they accurately depict aspects of the real world. But is my belief supported by our best science? In particular: Does evolution by natural selection favor veridical perceptions? Many scientists and philosophers claim that it does. But this claim, though plausible, has not been properly tested. In this talk I present a new theorem: Veridical perceptual systems are never more fit than non-veridical perceptual systems of equal complexity that are simply tuned to the relevant fitness functions. This entails that perception is not a window on reality; it is more like a windows interface on your laptop. Spacetime is your desktop, and physical objects, like apples and neurons, are simply icons on this desktop. I discuss this interface theory of perception and its implications for one of the most puzzling unsolved problems in science: the relationship between brain activity and conscious experiences.

Date and Time: 
Friday, November 8, 2019 - 1:15pm
Venue: 
Packard 202

ISL & IT Forum present " Dictionary Compression and its Applications"

Topic: 
Dictionary Compression and its Applications
Abstract / Description: 

Dictionary compression is a known technique, promising to solve the problem of compressing small inputs. However, it has been only available to implementers since relatively recently, as newer compression algorithms shipped dictionary builders alongside their main codec. Due to this recent timeframe, complexities around deploying this solution at larger scales only start to be appreciated. Understanding these difficulties, and finding ways to harness them, is key to target system's performance and reliability. Yet, the price is big, as dictionary compression not only improves compression, it also offers the potential to redesign systems around their capabilities.

We'll cover the benefits, trade-off and operational difficulties of dictionary compression, as well as their important second-order impacts for systems adopting it.

Date and Time: 
Friday, November 1, 2019 - 1:15pm
Venue: 
Packard 202

ISL Colloquium & IT Forum present "Generalized Resilience and Robust Statistics"

Topic: 
Generalized Resilience and Robust Statistics
Abstract / Description: 

Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience previously employed in the special case of mean estimation with outliers. We justify the generalized resilience property by showing that it holds under moment or hypercontractive conditions. Even in the total variation case, these subsume conditions in the literature for mean estimation, regression, and covariance estimation; the resulting analysis simplifies and sometimes improves these known results in both population limit and finite-sample rate. Our robust estimators are based on minimum distance (MD) functionals (Donoho and Liu, 1988), which project onto a set of distributions under a discrepancy related to the perturbation. We present two approaches for designing MD estimators with good finite- sample rates: weakening the discrepancy and expanding the set of distributions. We also present connections to Gao et al. (2019)'s recent analysis of generative adversarial networks for robust estimation.

Joint work with Banghua Zhu and Jacob Steinhardt

Date and Time: 
Friday, October 11, 2019 - 1:15pm
Venue: 
Packard 202

IT-Forum and ISL Colloquium SPECIAL SEMINAR: "Towards an Average-case Complexity of High-dimensional Statistics"

Topic: 
Towards an Average-case Complexity of High-dimensional Statistics
Abstract / Description: 

The prototypical high-dimensional statistical estimation problem
entails finding a structured signal in noise. These problems have
traditionally been studied in isolation, with researchers aiming to
develop statistically and computationally efficient algorithms, as well
as to try to understand the fundamental limits governing the interplay
between statistical and computational cost. In this talk I will
describe a line of work that yields average-case reductions directly
between a number of central high-dimensional statistics problems,
relating two problems by transforming one into the other. It turns out
that several problems described by robust formulations can be addressed
by one set of techniques, and we will focus on these in the talk. In
this direction, we obtain the following average-case lower bounds based
on the planted clique conjecture: a statistical-computational gap in
robust sparse mean estimation, a detection-recovery gap in community
detection, and a universality principle for computational-statistical
gaps in sparse mixture estimation. In addition to showing strong
computational lower bounds tight against what is achievable by
efficient algorithms, the methodology gives insight into the common
features shared by different high-dimensional statistics problems with
similar computational behavior. Joint work with Matthew Brennan.


Special joint seminar:  IT-Forum and The Information Systems Laboratory Colloquium (ISLC)

Date and Time: 
Friday, November 15, 2019 - 1:15pm
Venue: 
Packard 202

Pages

Subscribe to RSS - IT-Forum