
# Statistics and Probability Seminars

## Statistics Department Seminar presents "Data denoising and transfer learning in single cell transcriptomics"

Topic:
Data denoising and transfer learning in single cell transcriptomics
Abstract / Description:

Cells are the basic biological units of multicellular organisms. The development of single-cell RNA sequencing (scRNA-seq) technologies has enabled us to study the diversity of cell types in tissue and to elucidate the roles of individual cell types in disease. Yet, scRNA-seq data are noisy and sparse, with only a small proportion of the transcripts that are present in each cell represented in the final data matrix. We propose a transfer learning framework based on deep neural nets to borrow information across related single cell data sets for denoising and expression recovery. Our goal is to leverage the expanding resources of publicly available scRNA-seq data, for example, the Human Cell Atlas, which aims to be a comprehensive map of cell types in the human body. Our method is based on a Bayesian hierarchical model coupled to a deep autoencoder, the latter trained to extract transferable gene expression features across studies coming from different labs, generated by different technologies, and/or obtained from different species. Through this framework, we explore the limits of data sharing: How much can be learned across cell types, tissues, and species? How useful are data from other technologies and labs in improving the estimates from your own study? If time allows, I will also discuss the implications of such data denoising for downstream statistical inference.

Date and Time:
Tuesday, May 26, 2020 - 4:30pm
Venue:
Zoom Meeting ID 910 4626 3951

## Probability Seminar presents "Induced subgraphs with prescribed degrees mod q"

Topic:
Induced subgraphs with prescribed degrees mod q
Abstract / Description:

A classical result of Gallai asserts that the vertex-set of every graph can be partitioned into two sets such that each induces a graph with all degrees even. Scott studied the (harder) problem of determining for which graphs we can find a partition into arbitrarily many parts, each of which induces a graph with all odd degrees. In this talk we discuss various extensions of this problem to arbitrary residues mod $q\geq 3$. Among other results, we show that for every $q$, a typical graph $G(n,1/2)$ can be equi-partitioned (up to divisibility conditions) into $q+1$ sets, each of which spans a graph with a prescribed degree sequence.

A completely unrelated problem: Based on the same approach we obtained a non-trivial bound (but weaker than known results) on the singularity probability of a random symmetric Bernoulli matrix. The new argument avoids both decoupling and distance from random hyperplanes and it turns this problem into a simple and elegant exercise.

This is mostly based on joint work with Liam Hardiman (UCI) and Michael Krivelevich (Tel Aviv University).
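As a small illustration of Gallai's theorem cited in the abstract, the sketch below brute-forces a partition of the vertex set into two parts (one of which may be empty), each inducing a graph with all degrees even. It is not taken from the talk; the function names are hypothetical, and the exhaustive search is only practical for very small graphs.

```python
from itertools import combinations


def induced_degrees_even(vertices, edges):
    """Return True if every vertex has even degree in the subgraph induced by `vertices`."""
    vs = set(vertices)
    degree = {v: 0 for v in vs}
    for u, w in edges:
        if u in vs and w in vs:
            degree[u] += 1
            degree[w] += 1
    return all(d % 2 == 0 for d in degree.values())


def find_even_partition(n, edges):
    """Brute-force search over all 2^n splits of {0, ..., n-1}.

    Returns (part_a, part_b) with both induced subgraphs having all
    degrees even, or None if no such split exists.
    """
    for mask in range(1 << n):
        part_a = [v for v in range(n) if (mask >> v) & 1]
        part_b = [v for v in range(n) if not (mask >> v) & 1]
        if induced_degrees_even(part_a, edges) and induced_degrees_even(part_b, edges):
            return part_a, part_b
    return None


def partition_exists_for_all_graphs(n):
    """Check every labelled graph on n vertices (feasible only for small n)."""
    all_pairs = list(combinations(range(n), 2))
    for size in range(len(all_pairs) + 1):
        for edges in combinations(all_pairs, size):
            if find_even_partition(n, edges) is None:
                return False
    return True


if __name__ == "__main__":
    # A triangle with a pendant vertex attached to vertex 2.
    print(find_even_partition(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
    print(partition_exists_for_all_graphs(5))
```

The exhaustive check over all graphs on five vertices is consistent with the theorem, but of course proves nothing for larger graphs.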

Date and Time:
Monday, May 18, 2020 - 4:00pm
Venue:
Zoom ID: 917 2019 2125 (meeting locked 10 min. after start)

## Statistics Department Seminar presents "Advancing medical research with 3D shape analysis of bioimaging data"

Topic:
Advancing medical research with 3D shape analysis of bioimaging data
Abstract / Description:

Advances in bioimaging techniques have enabled us to access the 3D shapes of a variety of structures: organs, cells, proteins. Since biological shapes are related to physiological functions, medical research is poised to incorporate more shape statistics. This leads to the question: how can we build quantified descriptions of shape variability from biomedical images?

We first consider two biomedical analyses that require shape learning on small imaging datasets: (1) surgical planning for orthopedic surgery, and (2) research on pre-symptomatic biomarkers of Alzheimer's disease. We introduce elements of shape statistics to assess the accuracy of these studies. Then, we address a shape reconstruction challenge in pharmacological research: protein shape reconstruction using cryo-electron microscopy.

This talk shows how shape descriptors at different scales contribute to the development of precision medicine. The elements of geometric statistics required for this work are implemented in the open-source Python library Geomstats.

Date and Time:
Tuesday, May 19, 2020 - 4:30pm
Venue:
Zoom ID: 998 6129 8033 (meeting locked 10 min. after start)

## Statistics Department Seminar presents "Performative prediction"

Topic:
Performative prediction
Abstract / Description:

When predictions support decisions, they may influence the outcome they aim to predict. We call such predictions performative; the prediction influences the target. Performativity is a well-studied phenomenon in policy-making that has so far been neglected in supervised learning. When ignored, performativity surfaces as undesirable distribution shift, routinely addressed with retraining. We develop a risk minimization framework for performative prediction bringing together concepts from statistics, game theory, and causality. A conceptual novelty is an equilibrium notion we call performative stability. Performative stability implies that the predictions are calibrated not against past outcomes, but against the future outcomes that manifest from acting on the prediction. Our main results are necessary and sufficient conditions for the convergence of retraining to a performatively stable point of nearly minimal loss. In full generality, performative prediction strictly subsumes the setting known as strategic classification. We thus also give the first sufficient conditions for retraining to overcome strategic feedback effects.

This is joint work with Juan C. Perdomo, Tijana Zrnic, and Celestine Mendler-Dünner, and is available from arXiv.
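To make the idea of retraining converging to a stable point concrete, here is a toy sketch that is not taken from the talk. It assumes a hypothetical one-dimensional setting: deploying a prediction `theta` shifts the mean of the outcome to `mu + eps * theta`, and the loss is squared error, so each round of retraining simply sets `theta` to the mean of the distribution induced by the previous deployment. The names `mu`, `eps`, and both functions are illustrative assumptions.

```python
def retrain(theta, mu, eps):
    """Population risk minimizer for squared loss on the data induced by deploying `theta`.

    Assumed toy model: outcomes have mean mu + eps * theta, and the
    minimizer of expected squared error is that mean.
    """
    return mu + eps * theta


def repeated_retraining(theta0, steps, mu, eps):
    """Deploy, observe the shifted distribution, retrain, and repeat."""
    theta = theta0
    history = [theta]
    for _ in range(steps):
        theta = retrain(theta, mu, eps)
        history.append(theta)
    return history


if __name__ == "__main__":
    history = repeated_retraining(theta0=0.0, steps=60, mu=1.0, eps=0.5)
    # In this toy model the fixed point of retraining is mu / (1 - eps).
    print(history[-1], 1.0 / (1.0 - 0.5))
```

In this toy model retraining contracts toward the fixed point `mu / (1 - eps)` whenever `|eps| < 1`, which plays the role of a stable point: retraining on the data it induces returns the same prediction. When `|eps| > 1` the iterates diverge, a simple picture of why conditions on the strength of the feedback are needed.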

Date and Time:
Tuesday, May 12, 2020 - 4:30pm
Venue:
Zoom

## Statistics Department Seminar presents "Optimal procedures in private estimation"

Topic:
Optimal procedures in private estimation
Abstract / Description:

In this talk, I will review private procedures (typically called mechanisms) for releasing functions of a sample, focusing on differential privacy and related strong definitions of privacy. I will describe mechanisms that enjoy instance-optimal guarantees, meaning that, in a strong sense, they achieve the best possible behavior for the given problem instance. On the methodological side, I will highlight a few examples, including median estimation and statistical risk minimization. On the more theoretical side, I will describe techniques for giving such instance-optimal bounds, highlighting the desiderata I believe one must satisfy for an optimality result to truly mean optimal.

This is based on joint work with Hilal Asi and Feng Ruan.

Date and Time:
Tuesday, May 5, 2020 - 4:30pm
Venue:
Zoom

## Probability Seminar presents "Speeding up Markov chains with deterministic jumps"

Topic:
Speeding up Markov chains with deterministic jumps
Abstract / Description:

A striking example of the phenomenon: Consider simple random walk on the integers mod $n$, $X(k+1) = X(k) + \epsilon(k+1) \pmod{n}$, where $\epsilon$ takes values in $\{0, 1, -1\}$ with probability 1/3 each. This walk takes order $n^2$ steps to get random. Now make a slight modification: $X(k+1) = 2X(k) + \epsilon(k+1) \pmod{n}$. This has the same amount of randomness BUT, for almost all $n$, the walk gets random in order $\log(n)$ steps.

In joint work with Sourav Chatterjee we show this is quite a general phenomenon. For any doubly stochastic Markov chain on $n$ states and any permutation $f(x)$ on the state space, the walk that goes from $x$ to $f(x)$ to $y$, where $y$ is one step of the chain, mixes in order $\log(n)$ steps for almost all permutations $f$. Since it happens for most $f$, this raises the problem of finding specific $f$ for real problems. Some progress will be reported.
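The speed-up in the opening example can be checked numerically for a small modulus. The sketch below is an illustration rather than anything from the talk: it evolves the exact distribution of the walk started at 0 and counts steps until the total variation distance to uniform drops below a threshold. The choices `n = 101` and a threshold of 0.25 are arbitrary assumptions, and the function names are hypothetical.

```python
def step(dist, n, multiplier):
    """Push a distribution one step through X -> multiplier * X + eps (mod n),
    where eps is uniform on {-1, 0, 1}."""
    new = [0.0] * n
    for x, mass in enumerate(dist):
        if mass == 0.0:
            continue
        for eps in (-1, 0, 1):
            new[(multiplier * x + eps) % n] += mass / 3.0
    return new


def tv_to_uniform(dist):
    """Total variation distance between `dist` and the uniform distribution."""
    n = len(dist)
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)


def steps_to_mix(n, multiplier, threshold=0.25, max_steps=20000):
    """Number of steps until the walk started at 0 is within `threshold`
    of uniform in total variation, or None if max_steps is exceeded."""
    dist = [0.0] * n
    dist[0] = 1.0
    for k in range(1, max_steps + 1):
        dist = step(dist, n, multiplier)
        if tv_to_uniform(dist) < threshold:
            return k
    return None


if __name__ == "__main__":
    n = 101
    print("plain walk (multiplier 1):", steps_to_mix(n, 1))
    print("doubling walk (multiplier 2):", steps_to_mix(n, 2))
```

With `multiplier=1` this is the plain walk and with `multiplier=2` the doubling walk. A single modulus cannot show the order $n^2$ versus order $\log(n)$ scaling, but repeating the comparison over a range of odd `n` gives a feel for it.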

Date and Time:
Monday, April 27, 2020 - 4:00pm

## Probability Seminar presents "Distribution of descents (and other permutation statistics) in conjugacy classes of S_n"

Topic:
Distribution of descents (and other permutation statistics) in conjugacy classes of S_n
Abstract / Description:

The distribution of descents in $S_n$, the symmetric group, has been previously studied. This talk starts with a bijective proof (using tableaux) of the symmetry of the descents and major indices in matchings (fixed point free involutions) and uses a generating function approach to prove a central limit theorem for descents in matchings. This approach will be extended to prove central limit theorems for descents in all conjugacy classes of $S_n$, and to other permutation statistics, such as peaks and major indices.

This is joint work with Sangchul Lee (UCLA) and should be accessible to all graduate students.

Date and Time:
Monday, April 20, 2020 - 4:00pm
Venue:
Meeting ID 98027417672

## Stats Dept. presents "Complex trait genetics through the lens of regulatory networks"

Topic:
Complex trait genetics through the lens of regulatory networks
Abstract / Description:

Genome-wide association studies (GWAS) have cataloged many significant associations between genetic variants and complex traits. However, most of these findings have unclear biological significance, because they often have small effects and occur in non-coding regions. Integration of GWAS with gene regulatory networks addresses both issues by aggregating weak genetic signals within regulatory programs. Here we develop a Bayesian framework that integrates GWAS summary statistics with regulatory networks to infer enrichments and associations simultaneously. Our method improves upon existing approaches by explicitly modeling network topology to assess enrichments, and by automatically leveraging enrichments to identify associations. Applying this method to 18 human traits and 38 regulatory networks shows that genetic signals of complex traits are often enriched in networks specific to trait-relevant tissue or cell types. Prioritizing variants within enriched networks identifies known and new trait-associated genes revealing novel biological and therapeutic insights.

Date and Time:
Tuesday, April 28, 2020 - 4:30pm
Venue:
Zoom

## Stats Dept. presents "A general framework to analyze stochastic linear bandit"

Topic:
A general framework to analyze stochastic linear bandit
Abstract / Description:

Multi-armed bandit (MAB) experiments have recently received significant attention in data-centric enterprises, given their promise in reducing the opportunity cost of experimentation. In this talk we consider the well-known stochastic linear bandit problem, which includes (as special cases) standard MAB experiments and their personalized (contextual) counterpart. In this setting a decision-maker sequentially chooses among a set of given actions in $\mathbb{R}^d$, observes their noisy reward, and aims to maximize her cumulative expected reward over a horizon of length $T$. We introduce a general family of algorithms for the problem that achieve the best possible performance (i.e., are rate optimal), and show that some of the well-known algorithms in the literature such as optimism in the face of uncertainty linear bandit (OFUL) and Thompson sampling (TS) are special cases of our family of algorithms. Therefore, we obtain a unified proof of rate optimality for all of these algorithms. Our analysis also yields a number of new results and solves an open problem. For example, we show that TS can incur a linear worst-case regret, unless it uses inflated (by a factor of $\sqrt{d}$) posterior variances at each step. We also show that TS can incur a linear Bayesian regret if it does not use the correct prior or noise distribution.

This talk is based on joint work with Nima Hamidi, and a preliminary draft is available on arXiv.

Date and Time:
Tuesday, April 21, 2020 - 4:30pm
Venue:
Zoom

## Stats Dept. presents "Challenges in analyzing two-sided markets and its application on ridesourcing platforms"

Topic:
Challenges in analyzing two-sided markets and its application on ridesourcing platforms
Abstract / Description:

In this talk, we will introduce a general analytical framework for large-scale data obtained from two-sided markets, especially ridesourcing platforms like DiDi. This framework integrates classical methods including Experiment Design, Causal Inference and Reinforcement Learning, with modern machine learning methods, such as Graph Convolutional Models, Deep Learning, Transfer Learning and Generative Adversarial Networks. We aim to develop fast and efficient approaches to address five major challenges for ride-sharing platforms, ranging from demand-supply forecasting, demand-supply diagnosis, MDP-based policy optimization, and A/B testing to business operation simulation. Each challenge requires substantial methodological developments and inspires many researchers from both industry and academia to participate in this endeavor. Based on our preliminary results for the policy optimization challenge, in 2019 we received the INFORMS Daniel Wagner Prize for Excellence in Operations Research Practice. All the research accomplishments presented in this talk are joint work by a group of researchers at Didi Chuxing and our international collaborators.

Date and Time:
Tuesday, April 14, 2020 - 4:30pm
Venue:
Zoom