# IT-Forum

## IT-Forum: Reverse hypercontractivity beats measure concentration for information theoretic converses

Topic:
Reverse hypercontractivity beats measure concentration for information theoretic converses
Abstract / Description:

Concentration of measure is a collection of tools and results from analysis and probability theory that have been used in many areas of pure and applied mathematics. Arguably, the first data science application of measure concentration (under the name ''blowing-up lemma'') is the proof of strong converses in multiuser information theory by Ahlswede, G\'acs and K\"orner in 1976. Since then, measure concentration has found applications in many other information theoretic problems, most notably the converse (impossibility) results in information theory. Motivated by this, information theorists (e.g. Marton) have also contributed to the mathematical foundations of measure concentration using their information-theoretic techniques.

Now, after all the past 40 years of such progress, we found that, amusingly, measure concentration is not the right hammer for many of these information theoretic applications. We introduce a new machinery based on functional inequalities and reverse hypercontractivity which yields strict improvements in terms of sharpness of the bounds, generality of the source/channel distributions, and simplicity of the proofs. Examples covered in the talk include: 1. optimal second-order converse to common randomness generation with rate-limited communication; 2. sharpening the relay channel converse bounds by and Wu and Ozgur with much simpler proofs.

The work benefited from collaborations with Thomas Courtade, Paul Cuff, Ayfer Ozgur, Ramon van Handel, and Sergio Verd\'u.

Date and Time:
Friday, August 24, 2018 - 1:15pm
Venue:
Packard 202

Topic:
Abstract / Description:

Quantization plays a critical role in digital signal processing systems. Quantizers are typically designed to obtain an accurate digital representation of the input signal, operating independently of the system task, and are commonly implemented using serial scalar analog-to-digital converters (ADCs). This talk is concerned with hardware-limited task-based quantization, where a system utilizing a serial scalar ADC is designed to provide a suitable representation in order to allow the recovery of a parameter vector underlying the input signal. We propose hardware-limited task-based quantization systems for a fixed and finite quantization resolution, and characterize their achievable distortion. Our results illustrate the benefits of properly taking into account the underlying task in the design of the quantization scheme.

Date and Time:
Wednesday, June 13, 2018 - 1:15pm
Venue:
Packard 202

## IT Forum: Phase transitions in generalized linear models

Topic:
Phase transitions in generalized linear models
Abstract / Description:

This is joint work with Jean Barbier, Florent Krzakala, Nicolas Macris and Lenka Zdeborova.

We consider generalized linear models (GLMs) where an unknown $n$-dimensional signal vector is observed through the application of a random matrix and a non-linear (possibly probabilistic) componentwise output function.

We study the models in the high-dimensional limit, where the observations consists of $m$ points, and $m/n \to \alpha > 0$ as $n \to \infty$. This situation is ubiquitous in applications ranging from supervised machine learning to signal processing.

We will analyze the model-case when the observation matrix has i.i.d. elements and the components of the ground-truth signal are taken independently from some known distribution.

We will compute the limit of the mutual information between the signal and the observations in the large system limit. This quantity is particularly interesting because it is related to the free energy (i.e. the logarithm of the partition function) of the posterior distribution of the signal given the observations. Therefore, the study of the asymptotic mutual information allows to deduce the limit of important quantities such as the minimum mean squared error for the estimation of the signal.

We will observe some phase transition phenomena. Depending on the noise level, the distribution of the signal and the non-linear function of the GLM we may encounter various scenarios where it may be impossible / hard (only with exponential-time algorithms) / easy (with polynomial-time algorithms) to recover the signal.

Date and Time:
Friday, May 18, 2018 - 1:15pm
Venue:
Packard 202

## ISL Colloquium & IT Forum: Random initialization and implicit regularization in nonconvex statistical estimation

Topic:
Random initialization and implicit regularization in nonconvex statistical estimation
Abstract / Description:

Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation / learning problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require suitable initialization and proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descent, however, prior theory is often either far from optimal or completely lacks theoretical guarantees.

This talk is concerned with a striking phenomenon arising in two nonconvex problems (i.e. phase retrieval and matrix completion): even in the absence of careful initialization, proper saddle escaping, and/or explicit regularization, gradient descent converges to the optimal solution within a logarithmic number of iterations, thus achieving near-optimal statistical and computational guarantees at once. All of this is achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent achieves near-optimal entrywise error control.

Date and Time:
Wednesday, May 23, 2018 - 4:15pm
Venue:
Building 370

## IT-Forum: Tight sample complexity bounds via dualizing LeCam's method

Topic:
Tight sample complexity bounds via dualizing LeCam's method
Abstract / Description:

In this talk we consider a general question of estimating linear functional of the distribution based on the noisy samples from it. We discover that the (two-point) LeCam lower bound is in fact achievable by optimizing bias-variance tradeoff of an empirical-mean type of estimator. We extend the method to certain symmetric functionals of high-dimensional parametric models.

Next, we apply this general framework to two problems: population recovery and predicting the number of unseen species. In population recovery, the goal is to estimate an unknown high-dimensional distribution (in $L_\infty$-distance) from noisy samples. In the case of \textit{erasure} noise, i.e. when each coordinate is erased with probability $\epsilon$, we discover a curious phase transition in sample complexity at $\epsilon=1/2$. In the second (classical) problem, we observe $n$ iid samples from an unknown distribution on a countable alphabet and the goal is to predict the number of new species that will be observed in the next (unseen) $tn$ samples. Again, we discover a phase transition at $t=1$. In both cases, the complete characterization of sample complexity relies on complex-analytic methods, such as Hadamard's three-lines theorem.

Joint work with Yihong Wu (Yale).

Date and Time:
Friday, May 4, 2018 - 1:15pm
Venue:
Packard 202

## IT Forum: From Gaussian Multiterminal Source Coding to Distributed Karhunen–Loève Transform

Topic:
From Gaussian Multiterminal Source Coding to Distributed Karhunen–Loève Transform
Abstract / Description:

Characterizing the rate-distortion region of Gaussian multiterminal source coding is a longstanding open problem in network information theory. In this talk, I will show how to obtain new conclusive results for this problem using nonlinear analysis and convex relaxation techniques. A byproduct of this line of research is an efficient algorithm for determining the optimal distributed Karhunen–Loève transform in the high-resolution regime, which partially settles a question posed by Gastpar, Dragotti, and Vetterli. I will also introduce a generalized version of the Gaussian multiterminal source coding problem where the source-encoder connections can be arbitrary. It will be demonstrated that probabilistic graphical models offer an ideal mathematical language for describing how the performance limit of a generalized Gaussian multiterminal source coding system depends on its topology, and more generally they can serve as the long-sought platform for systematically integrating the existing achievability schemes and converse arguments. The architectural implication of our work for low-latency lossy source coding will also be discussed.

This talk is based on joint work with Jia Wang, Farrokh Etezadi, and Ashish Khisti.

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.

Date and Time:
Friday, April 13, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum & ISL presents Robust sequential change-point detection

Topic:
Robust sequential change-point detection
Abstract / Description:

Sequential change-point detection is a fundamental problem in statistics and signal processing, with broad applications in security, network monitoring, imaging, and genetics. Given a sequence of data, the goal is to detect any change in the underlying distribution as quickly as possible from the streaming data. Various algorithms have been developed including the commonly used CUSUM procedure. However, there is a still a gap when applying change-point detection methods to real problems, notably, due to the lack of robustness. Classic approaches usually require exact specification of the pre and post change distributions forms, which may be quite restrictive and do not perform well with real data. On the other hand, Huber’s classic robust statistics built based on least favorable distributions are not directly applicable since they are computationally intractable in the multi-dimensional setting. In this seminar, I will present several of our recent works in developing computationally efficient and robust change-point detection algorithms with certain near optimality properties, by building a connection of statistical sequential analysis with (online) convex optimization.

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.

Date and Time:
Friday, March 16, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: Restricted Isometry Property of Random Projection for Low-Dimensional Subspaces

Topic:
Restricted Isometry Property of Random Projection for Low-Dimensional Subspaces
Abstract / Description:

Dimensionality reduction is in demand to reduce the complexity of solving large-scale problems with data lying in latent low-dimensional structures in machine learning and computer version. Motivated by such need, in this talk I will introduce the Restricted Isometry Property (RIP) of Gaussian random projections for low-dimensional subspaces in R^N, and prove that the projection Frobenius norm distance between any two subspaces spanned by the projected data in R^n for n smaller than N remain almost the same as the distance between the original subspaces with probability no less than 1 - e^O(-n).

Previously the well-known Johnson-Lindenstrauss (JL) Lemma and RIP for sparse vectors have been the foundation of sparse signal processing including Compressed Sensing. As an analogy to JL Lemma and RIP for sparse vectors, this work allows the use of random projections to reduce the ambient dimension with the theoretical guarantee that the distance between subspaces after compression is well preserved.

As a direct result of our theory, when solving the subspace clustering (SC) problem at a large scale, one may conduct SC algorithm on randomly compressed samples to alleviate the high computational burden and still have theoretical performance guarantee. Because the distance between subspaces almost remains unchanged after projection, the clustering error rate of any SC algorithm may keep as small as that conducting in the original space. Considering that our theory is independent of SC algorithms, this may benefit future studies on other subspace related topics.

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.

Date and Time:
Friday, February 23, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: BATS: Network Coding in Action

Topic:
BATS: Network Coding in Action
Abstract / Description:

Multi-hop wireless networks can be found in many application scenarios, including IoT, fog computing, satellite communication, underwater communication, etc. The main challenge in such networks is the accumulation of packet loss in the wireless links. With existing technologies, the throughput decreases exponentially fast with the number of hops.

In this talk, we introduce BATched Sparse code (BATS code) as a solution to this challenge. BATS code is a rateless implementation of network coding. The advantages of BATS codes include low encoding/decoding complexities, high throughput, low latency, and low storage requirement. This makes BATS codes ideal for implementation on IoT devices that have limited computing power and storage. At the end of the talk, we will show a video demonstration of BATS code over a Wi-Fi network with 10 IoT devices acting as relay nodes.

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.

Date and Time:
Friday, February 9, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: Deterministic Random Matrices

Topic:
Deterministic Random Matrices
Abstract / Description:

Random matrices have become a very active area of research in the recent years and have found enormous applications in modern mathematics, physics, engineering, biological modeling, and other fields. In this work, we focus on symmetric sign (+/-1) matrices (SSMs) that were originally utilized by Wigner to model the nuclei of heavy atoms in mid-50s. Assuming the entries of the upper triangular part to be independent +/-1 with equal probabilities, Wigner showed in his pioneering works that when the sizes of matrices grow, their empirical spectra converge to a non-random measure having a semicircular shape. Later, this fundamental result was improved and substantially extended to more general families of matrices and finer spectral properties. In many physical phenomena, however, the entries of matrices exhibit significant correlations. At the same time, almost all available analytical tools heavily rely on the independence condition making the study of matrices with structure (dependencies) very challenging. The few existing works in this direction consider very specific setups and are limited by particular techniques, lacking a unified framework and tight information-theoretic bounds that would quantify the exact amount of structure that matrices may possess without affecting the limiting semicircular form of their spectra.

From a different perspective, in many applications one needs to simulate random objects. Generation of large random matrices requires very powerful sources of randomness due to the independence condition, the experiments are impossible to reproduce, and atypical or non-random looking outcomes may appear with positive probability. Reliable deterministic construction of SSMs with random-looking spectra and low algorithmic and computational complexity is of particular interest due to the natural correspondence of SSMs and undirected graphs, since the latter are extensively used in combinatorial and CS applications e.g. for the purposes of derandomization. Unfortunately, most of the existing constructions of pseudo-random graphs focus on the extreme eigenvalues and do not provide guaranties on the whole spectrum. In this work, using binary Golomb sequences, we propose a simple completely deterministic construction of circulant SSMs with spectra converging to the semicircular law with the same rate as in the original Wigner ensemble. We show that this construction has close to lowest possible algorithmic complexity and is very explicit. Essentially, the algorithm requires at most 2log(n) bits implying that the real amount of randomness conveyed by the semicircular property is quite small.

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Jiantao Jiao and Yanjun Han. To suggest speakers, please contact any of the students.

Date and Time:
Friday, February 2, 2018 - 1:15pm
Venue:
Packard 202