
# IT-Forum

## IT Forum presents "Perceptual Engineering"

Topic:
Perceptual Engineering
Abstract / Description:

The distance between the real and the digital is clearest at the interface layer. The ways that our bodies interact with the physical world are rich and elaborate, while digital interactions are far more limited. Through an increased level of direct and intuitive interaction, my work aims to raise computing devices from external systems that require deliberate usage to those that are truly an extension of us, advancing both the state of research and human ability. My approach is to use the entire body for input and output, to allow for implicit and natural interactions. I call my concept "perceptual engineering," i.e., a method to alter the user's perception (or more specifically the input signals to their perception) and manipulate it in subtle ways. For example, my systems can modify a user's sense of space, place, balance, and orientation, or manipulate their visual attention, all without the user's explicit input, in order to assist or guide their interactive experience in an effortless way.

I build devices and immersive systems that explore the use of cognitive illusions to manage attention, physiological signals for interaction, deep learning for automatic VR generation, embodiment for remote collaborative learning, tangible interaction for augmenting play, haptics for enhancing immersion, and vestibular stimulation to mitigate motion sickness in VR. My "perceptual engineering" approach has been shown to: (1) support implicit and natural interactions with haptic feedback, (2) induce believable physical sensations of motion in VR, (3) provide a novel way to communicate with the user through proprioception and kinesthesia, and (4) serve as a platform to question the boundaries of our sense of agency and trust. For decades, interaction design has been driven to answer the question: how can new technologies allow users to interact with digital content in the most natural way? If we look at the evolution of computing over the last 50 years, interaction has gone from punch cards to mouse and keyboard to touch and voice. Similarly, devices have become smaller and closer to the user's body. With every transition, the things people can do have become more personal. The main question that drives my research is: what is the next logical step?

Date and Time:
Friday, December 7, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum presents Estimating the Information Flow in Deep Neural Networks

Topic:
Estimating the Information Flow in Deep Neural Networks
Abstract / Description:

This talk will discuss the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). We will explain this discrepancy between theory and experiments, and clarify what was actually measured by these past works.

To this end, an auxiliary (noisy) DNN framework will be introduced, in which I(X;T) is a meaningful quantity that depends on the network's parameters. We will show that this noisy framework is a good proxy for the original (deterministic) system both in terms of performance and the learned representations. To accurately track I(X;T) over noisy DNNs, a differential entropy estimator tailored to exploit the DNN's layered structure will be developed and theoretical guarantees on the associated minimax risk will be provided. Using this estimator along with a certain analogy to an information-theoretic communication problem, we will elucidate the geometric mechanism that drives compression of I(X;T) in noisy DNNs. Based on these findings, we will circle back to deterministic networks and explain what the past observations of compression were in fact showing. Future research directions inspired by this study aiming to facilitate a comprehensive information-theoretic understanding of deep learning will also be discussed.
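The role of the injected noise can be made concrete in a toy setting. The sketch below is an illustration of the noisy-framework idea, not the talk's estimator: for a single layer T = f(X) + N with Gaussian N and X uniform over a finite set, the density of T is an equal-weight Gaussian mixture and I(X;T) = h(T) - h(N) is finite and parameter-dependent.

```python
import math
import random

def mi_noisy_layer(fx, sigma, n_mc=20000, seed=0):
    """Monte Carlo estimate of I(X;T) for T = f(X) + N, with N ~ N(0, sigma^2)
    and X uniform over the finite set of layer outputs fx = [f(x_1), ...].
    Then p_T is an equal-weight Gaussian mixture and I(X;T) = h(T) - h(N)."""
    rng = random.Random(seed)
    k = len(fx)
    log_norm = -0.5 * math.log(2 * math.pi * sigma ** 2)

    def log_p_T(t):
        # Log-density of the Gaussian mixture, via log-sum-exp for stability.
        logs = [-(t - m) ** 2 / (2 * sigma ** 2) for m in fx]
        mx = max(logs)
        return mx + math.log(sum(math.exp(l - mx) for l in logs)) \
               + log_norm - math.log(k)

    # h(T) ~= -E[log p_T(T)], estimated from fresh samples of T.
    h_T = -sum(log_p_T(rng.choice(fx) + rng.gauss(0, sigma))
               for _ in range(n_mc)) / n_mc
    h_N = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
    return h_T - h_N  # in nats
```

With well-separated outputs such as fx = [-10, 10] and sigma = 1, the estimate approaches log 2 nats; as sigma shrinks toward a deterministic layer, the same quantity diverges for continuous X, which is exactly the degeneracy the noisy framework avoids.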

Date and Time:
Wednesday, October 31, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum presents Uncoupled isotonic regression and Wasserstein deconvolution

Topic:
Uncoupled isotonic regression and Wasserstein deconvolution
Abstract / Description:

Isotonic regression is a standard problem in shape-constrained estimation where the goal is to estimate an unknown nondecreasing regression function f from independent pairs (x_i,y_i) where 𝔼[y_i]=f(x_i), i=1,...,n. While this problem is well understood both statistically and computationally, much less is known about its uncoupled counterpart where one is given only the unordered sets {x_1,...,x_n} and {y_1,...,y_n}. In this work, we leverage tools from optimal transport theory to derive minimax rates under weak moment conditions on y_i and to give an efficient algorithm achieving optimal rates. Both upper and lower bounds employ moment-matching arguments that are also pertinent to learning mixtures of distributions and deconvolution.
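For reference, the classical coupled problem that the abstract contrasts with is solved exactly by the pool adjacent violators algorithm; a minimal sketch (illustrative, not the speaker's code):

```python
def isotonic_fit(y):
    """Least-squares nondecreasing fit to y via Pool Adjacent Violators.
    This solves the classical *coupled* problem, where each y_i is attached
    to its x_i; the uncoupled variant in the talk has no such correspondence."""
    blocks = []  # each block: [mean, weight]
    for v in y:
        blocks.append([float(v), 1])
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fit = []
    for m, w in blocks:
        fit.extend([m] * w)
    return fit
```

For example, isotonic_fit([1, 3, 2, 4]) pools the violating pair (3, 2) into their mean, returning [1.0, 2.5, 2.5, 4.0].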

Date and Time:
Friday, October 26, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: Arbitrarily Varying Broadcast and Relay Channels

Topic:
Arbitrarily Varying Broadcast and Relay Channels
Abstract / Description:

Two models of an arbitrarily varying channel (AVC) are studied; both are relevant to modern networks under jamming attacks by an adversary or a hacker. The arbitrarily varying broadcast channel is considered when state information is available at the transmitter in a causal manner. Inner and outer bounds are established on both the random code capacity region and the deterministic code capacity region with degraded message sets. The form of the bounds raises the question of whether the minimax theorem can be generalized to rate regions, i.e., whether the order of the intersection over state distributions and the union over Shannon strategies can be interchanged. A sufficient condition is given, under which this assertion holds and the random code capacity region is determined. As an example, the arbitrarily varying binary symmetric broadcast channel is examined, showing that there are cases where the condition holds, hence the capacity region is determined, and other cases where there is a gap between the bounds. The gap implies that the minimax theorem does not always hold for rate regions.

In the second part of the talk, a new model is introduced, namely, the arbitrarily varying relay channel. The results include the cutset bound, decode-forward bound and partial decode-forward bound on the random code capacity, which require modification of the usual methods for the AVC to fit the block Markov coding scheme. The random code capacity is further determined for special cases. Then, deterministic coding schemes are considered, and the deterministic code capacity is derived under certain conditions, for the degraded and reversely degraded relay channel, and the case of orthogonal sender components. The following question is addressed: If the encoder-decoder and encoder-relay marginals are both symmetrizable, does that necessarily imply zero capacity? We show and explain why the answer is no. The random code capacity is determined for the arbitrarily varying Gaussian relay channel with sender frequency division, and the deterministic code capacity is bounded using the techniques of Csiszár and Narayan's 1991 paper on the Gaussian AVC. It is observed that the gap vanishes as the input becomes less constrained. It is commonly believed that the primitive relay channel "captures most essential features and challenges of relaying, and thus serves as a good testbed for new relay coding techniques" (Kim, 2007). It is observed that in the arbitrarily varying case, this may no longer be true.

This work is part of a Ph.D. thesis under the supervision of Yossef Steinberg.

Date and Time:
Friday, October 12, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: Structured Cooperation for Channels with Feedback and beyond

Topic:
Structured Cooperation for Channels with Feedback and beyond
Abstract / Description:

The capacities of fundamental communication problems such as channels with feedback and two-way communication channels are characterized with multi-letter expressions. The challenge in simplifying these expressions is their exhaustive dependence on all information that is accumulated throughout the communication. In this talk, we aim to simplify such capacities by imposing a structure on the accumulated data via a new sequential quantization technique on a directed graph.

The first application of this method is to channels with memory and feedback. We will show upper and lower single-letter bounds on the capacity. The bounds are expressed with a structured auxiliary random variable (r.v.), a notion that suits problems of a sequential nature. For all cases where the capacity is known, the bounds are tight (with a small cardinality of the structured auxiliary r.v.). This reveals a simple capacity formula that captures the major role of structure in feedback problems. We will also present a simple and sequential coding scheme, which is based on the posterior matching principle, and achieves the lower bound (and the capacity in many cases).

As time permits, we will show that structure is beneficial for other communication scenarios such as two-way communication channels with common outputs and the energy harvesting model.

The talk is based on a joint work with Prof. Henry Pfister (Duke University), Prof. Haim Permuter (BGU) and Prof. Navin Kashyap (IISc).

Date and Time:
Monday, October 8, 2018 - 4:15pm
Venue:
Packard 202

## IT-Forum: Data-Driven Policy Learning: Generalization and Optimization

Topic:
Data-Driven Policy Learning: Generalization and Optimization
Abstract / Description:

The problem of learning good treatment assignment rules from observational data lies at the heart of many challenges in data-driven decision making. While there is a growing body of literature devoted to this problem, most existing results are focused on the binary-action case (i.e., where one action corresponds to assignment to control and the other to assignment to treatment). In this paper, we study the offline multi-action policy learning problem with observational data and, building on the theory of efficient semi-parametric inference, propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup and provides a substantial performance improvement over the existing learning algorithms. We additionally investigate the application aspects of policy learning by working with decision trees, and discuss two different approaches for solving the key step of the learning algorithm to exact optimality, one using a mixed integer program formulation and the other using a tree-search based algorithm.

This is joint work with Susan Athey and Stefan Wager.
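As background for the estimation step, the value of a candidate multi-action policy can be estimated from logged data by inverse propensity weighting. The sketch below is purely illustrative and assumes known logging propensities; the talk's algorithm instead builds on doubly robust scores from semi-parametric theory.

```python
def ipw_policy_value(data, policy, propensity):
    """Inverse-propensity-weighted value estimate of `policy`.

    data:       list of (context, action_taken, reward) tuples from logs
    policy:     maps a context to the action the candidate policy would take
    propensity: propensity(x, a) = probability the logging policy took
                action a in context x (assumed known here)
    """
    total = 0.0
    for x, a, r in data:
        # Only observations where the logged action agrees with the
        # candidate policy contribute, reweighted by the inverse propensity.
        if policy(x) == a:
            total += r / propensity(x, a)
    return total / len(data)
```

For instance, with a uniform logging policy over two actions and rewards of 1 whenever the action equals the context, a candidate policy that copies the context is estimated to have value 1.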

The Information Theory Forum (IT-Forum) at Stanford ISL is an interdisciplinary academic forum which focuses on mathematical aspects of information processing. With a primary emphasis on information theory, we also welcome researchers from signal processing, learning and statistical inference, control and optimization to deliver talks at our forum. We also warmly welcome industrial affiliates in the above fields. The forum is typically held in Packard 202 every Friday at 1:15 pm during the academic year.

The Information Theory Forum is organized by graduate students Yanjun Han and Yihui Quek. To suggest speakers, please contact any of the students.

Date and Time:
Friday, October 5, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum presents Representations, fairness, and privacy: information-theoretic tools for machine learning

Topic:
Representations, fairness, and privacy: information-theoretic tools for machine learning
Abstract / Description:

Information theory can shed light on the algorithm-independent limits of learning from data and serve as a design driver for new machine learning algorithms. In this talk, we discuss a set of flexible information-theoretic tools called the principal inertia components (PICs) that can be used to (i) understand fairness and discrimination in machine learning models, (ii) provide an estimation-theoretic view of privacy, and (iii) characterize data representations learned by complex learning models. The PICs enjoy a long history in both statistics and information theory, and provide a fine-grained decomposition of the dependence between two random variables. We illustrate these techniques in both synthetic and real-world datasets, and discuss future research directions.
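To make the decomposition concrete: for finite alphabets, the PICs are the squared nontrivial singular values of the normalized joint-distribution matrix. A small numpy sketch (an illustration of the definition, not the speaker's code):

```python
import numpy as np

def principal_inertia_components(P):
    """PICs of a joint pmf matrix P (rows index X, columns index Y).

    They are the squared nontrivial singular values of the normalized
    matrix Q = diag(pX)^{-1/2} P diag(pY)^{-1/2}."""
    P = np.asarray(P, dtype=float)
    px = P.sum(axis=1)  # marginal of X
    py = P.sum(axis=0)  # marginal of Y
    Q = P / np.sqrt(np.outer(px, py))
    s = np.linalg.svd(Q, compute_uv=False)
    # The largest singular value is always 1 (the trivial component); drop it.
    return s[1:] ** 2
```

The PICs vanish exactly when X and Y are independent, and the largest one equals 1 when Y determines X (it is the squared maximal correlation), which is what makes them a fine-grained dependence measure.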

Date and Time:
Friday, September 28, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum: Reverse hypercontractivity beats measure concentration for information theoretic converses

Topic:
Reverse hypercontractivity beats measure concentration for information theoretic converses
Abstract / Description:

Concentration of measure is a collection of tools and results from analysis and probability theory that have been used in many areas of pure and applied mathematics. Arguably, the first data science application of measure concentration (under the name "blowing-up lemma") is the proof of strong converses in multiuser information theory by Ahlswede, Gács and Körner in 1976. Since then, measure concentration has found applications in many other information theoretic problems, most notably the converse (impossibility) results in information theory. Motivated by this, information theorists (e.g. Marton) have also contributed to the mathematical foundations of measure concentration using their information-theoretic techniques.

After the past 40 years of such progress, we have found that, amusingly, measure concentration is not the right hammer for many of these information theoretic applications. We introduce new machinery based on functional inequalities and reverse hypercontractivity which yields strict improvements in terms of sharpness of the bounds, generality of the source/channel distributions, and simplicity of the proofs. Examples covered in the talk include: 1. an optimal second-order converse to common randomness generation with rate-limited communication; 2. sharpening the relay channel converse bounds of Wu and Ozgur with much simpler proofs.

The work benefited from collaborations with Thomas Courtade, Paul Cuff, Ayfer Ozgur, Ramon van Handel, and Sergio Verdú.

Date and Time:
Friday, August 24, 2018 - 1:15pm
Venue:
Packard 202

## IT-Forum presents Hardware-Limited Task-Based Quantization

Topic:
Hardware-Limited Task-Based Quantization
Abstract / Description:

Quantization plays a critical role in digital signal processing systems. Quantizers are typically designed to obtain an accurate digital representation of the input signal, operating independently of the system task, and are commonly implemented using serial scalar analog-to-digital converters (ADCs). This talk is concerned with hardware-limited task-based quantization, where a system utilizing a serial scalar ADC is designed to provide a suitable representation in order to allow the recovery of a parameter vector underlying the input signal. We propose hardware-limited task-based quantization systems for a fixed and finite quantization resolution, and characterize their achievable distortion. Our results illustrate the benefits of properly taking into account the underlying task in the design of the quantization scheme.
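The benefit of accounting for the task can be seen already with toy uniform scalar quantizers. The sketch below is an illustrative assumption, not the proposed system: it compares spending a fixed bit budget on the raw inputs versus on the task-relevant combination (here, the mean of two inputs).

```python
import random

def uniform_quantize(v, lo, hi, bits):
    # Mid-rise uniform scalar quantizer with 2**bits levels on [lo, hi].
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((v - lo) / step)))
    return lo + (idx + 0.5) * step

def compare_designs(n=5000, bits_total=8, seed=0):
    """Return (task-ignorant MSE, task-based MSE) for recovering
    theta = (x1 + x2) / 2 from quantized observations."""
    rng = random.Random(seed)
    mse_ignorant = mse_task = 0.0
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        theta = (x1 + x2) / 2  # the task: recover this parameter
        # Task-ignorant: split the budget across faithful input representations.
        q1 = uniform_quantize(x1, -1, 1, bits_total // 2)
        q2 = uniform_quantize(x2, -1, 1, bits_total // 2)
        mse_ignorant += ((q1 + q2) / 2 - theta) ** 2
        # Task-based: combine in "analog" first, then quantize only theta.
        qt = uniform_quantize(theta, -1, 1, bits_total)
        mse_task += (qt - theta) ** 2
    return mse_ignorant / n, mse_task / n
```

With an 8-bit budget, the task-based design attains a much smaller distortion, since every level of resolution is spent on the quantity that actually needs to be recovered.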

Date and Time:
Wednesday, June 13, 2018 - 1:15pm
Venue:
Packard 202

## IT Forum: Phase transitions in generalized linear models

Topic:
Phase transitions in generalized linear models
Abstract / Description:

This is joint work with Jean Barbier, Florent Krzakala, Nicolas Macris and Lenka Zdeborova.

We consider generalized linear models (GLMs) where an unknown $n$-dimensional signal vector is observed through the application of a random matrix and a non-linear (possibly probabilistic) componentwise output function.

We study the models in the high-dimensional limit, where the observations consist of $m$ points, and $m/n \to \alpha > 0$ as $n \to \infty$. This situation is ubiquitous in applications ranging from supervised machine learning to signal processing.

We will analyze the model case in which the observation matrix has i.i.d. elements and the components of the ground-truth signal are taken independently from some known distribution.

We will compute the limit of the mutual information between the signal and the observations in the large system limit. This quantity is particularly interesting because it is related to the free energy (i.e., the logarithm of the partition function) of the posterior distribution of the signal given the observations. Therefore, the study of the asymptotic mutual information allows us to deduce the limit of important quantities such as the minimum mean squared error for the estimation of the signal.

We will observe some phase transition phenomena. Depending on the noise level, the distribution of the signal, and the non-linear function of the GLM, we may encounter various scenarios where it may be impossible / hard (only with exponential-time algorithms) / easy (with polynomial-time algorithms) to recover the signal.
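To fix ideas, here is a sketch of sampling such a GLM instance in the stated high-dimensional scaling. The Rademacher signal prior and the phase-retrieval output function used in the example are illustrative choices, not the talk's specific model.

```python
import math
import random

def sample_glm(n, alpha, phi, seed=0):
    """Draw a GLM instance: m = alpha * n observations y_i = phi(<a_i, x>),
    with i.i.d. Gaussian sensing rows (variance 1/n, so <a_i, x> stays O(1))
    and i.i.d. Rademacher ground-truth signal entries."""
    rng = random.Random(seed)
    m = int(alpha * n)
    x = [rng.choice([-1.0, 1.0]) for _ in range(n)]
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
         for _ in range(m)]
    y = [phi(sum(a * xi for a, xi in zip(row, x))) for row in A]
    return A, y, x

# Example: noiseless phase retrieval, phi(z) = |z|, at ratio alpha = 2.
A, y, x = sample_glm(n=50, alpha=2.0, phi=abs)
```

Replacing phi (e.g., adding output noise, or using a sign or probit nonlinearity) and varying alpha is exactly how one explores the impossible / hard / easy regimes described above.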

Date and Time:
Friday, May 18, 2018 - 1:15pm
Venue:
Packard 202