## Statistics Department Seminar welcomes Jonathan Taylor

Topic:

TBA

Abstract / Description:

TBA - please check Statistics Dept Seminar page for updates. https://statistics.stanford.edu/events/statistics-seminar

Date and Time:

Tuesday, November 19, 2019 - 4:30pm

Venue:

Sequoia Hall Room 115

Topic:

Reliable predictions? Equitable treatment? Some recent progress in predictive inference

Abstract / Description:

Recent progress in machine learning (ML) provides us with many potentially effective tools to learn from datasets of ever-increasing size and make useful predictions. How do we know that these tools can be trusted in critical and high-sensitivity systems? If a learning algorithm predicts the GPA of a prospective college applicant, what guarantees do we have concerning the accuracy of this prediction? How do we know that it is not biased against certain groups of applicants? This talk introduces statistical ideas to ensure that the learned models satisfy some crucial properties, especially reliability and fairness (in the sense that the models need to apply to individuals in an equitable manner). To achieve these important objectives, we shall not "open up the black box" and try to understand its underpinnings. Rather, we discuss broad methodologies (conformal inference, quantile regression, the Jackknife+) that can be wrapped around any black box to produce results that can be trusted and are equitable.
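As a concrete illustration of the kind of wrapper these methods provide, here is a minimal split-conformal sketch in Python; the synthetic data, the least-squares "black box", and the 90% level are illustrative choices, not details from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = x + N(0, 1) noise.
n = 2000
x = rng.uniform(0, 10, size=n)
y = x + rng.normal(0, 1, size=n)

# Split the data: fit a "black box" on one half, calibrate on the other.
fit, cal = np.arange(n // 2), np.arange(n // 2, n)

# The black box here is ordinary least squares; any predictor could be
# substituted without changing the coverage guarantee.
A = np.column_stack([np.ones(fit.size), x[fit]])
coef, *_ = np.linalg.lstsq(A, y[fit], rcond=None)
predict = lambda t: coef[0] + coef[1] * t

# Conformity scores: absolute residuals on the calibration half, with the
# finite-sample (n_cal + 1) quantile correction.
scores = np.sort(np.abs(y[cal] - predict(x[cal])))
alpha = 0.1
q = scores[int(np.ceil((1 - alpha) * (scores.size + 1))) - 1]

# The interval [f(x) - q, f(x) + q] covers the true y of a fresh exchangeable
# point with probability at least 1 - alpha, whatever the black box is.
x_new = 5.0
interval = (predict(x_new) - q, predict(x_new) + q)
```

The key point is that the guarantee comes from the calibration split and exchangeability, not from any property of the fitted model.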

Date and Time:

Tuesday, November 12, 2019 - 4:30pm

Venue:

Sequoia Hall Room 115

Topic:

Imputation and causal inference in genomics

Abstract / Description:

Genomic data can be complex, large, noisy, and sparse. Here I will discuss two problems we have worked on. The first problem deals with the highly sparse data from single-cell experiments of gene expression. These data contain a large number of zeros (> 80%); many of these zeros are missing values rather than no expression. Underlying these data are complex regulatory relationships among genes, as well as potentially many cell types with different gene expression profiles. We took a deep learning approach and designed imputation methods based on autoencoders. We generated synthetic data using real single-cell data to evaluate the performance, although the theoretical properties of autoencoders for imputation are yet to be understood.
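A minimal sketch of the masked-autoencoder idea on a toy low-rank count matrix with artificial dropout; the one-hidden-layer architecture and plain gradient descent below are illustrative, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "expression matrix": cells x genes, low-rank Poisson counts, then ~80%
# of entries zeroed out to mimic dropout.
n_cells, n_genes, k = 200, 50, 3
rates = np.exp(0.3 * rng.normal(size=(n_cells, k)) @ rng.normal(size=(k, n_genes)) + 1.0)
counts = rng.poisson(rates)
dropout = rng.random(counts.shape) < 0.8       # entries set to zero
observed = ~dropout
X = np.where(observed, np.log1p(counts), 0.0)  # log-transformed input

d = 8                                          # bottleneck width
W1 = rng.normal(0, 0.1, (n_genes, d))
W2 = rng.normal(0, 0.1, (d, n_genes))
lr = 0.1
for _ in range(500):
    H = np.tanh(X @ W1)                        # encoder
    Rhat = H @ W2                              # decoder / reconstruction
    G = (Rhat - X) * observed / observed.sum() # loss only on observed entries
    gW2 = H.T @ G
    gW1 = X.T @ ((G @ W2.T) * (1.0 - H ** 2))
    W1 -= lr * gW1
    W2 -= lr * gW2

# The reconstruction at dropped-out positions serves as the imputation.
imputed = np.where(observed, X, np.tanh(X @ W1) @ W2)
```

The design point is the masked loss: the autoencoder is never penalized on missing entries, so its reconstruction there is a model-based guess rather than a fit to artificial zeros.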

The second problem deals with causal inference: can we learn the biological mechanism directly from genomic data? For example, which genes regulate which other genes? And which genes are targeted by drugs? Genetic variation makes this inference possible (under certain assumptions), as it provides randomization among the individuals: this is known as the principle of Mendelian randomization in genetic epidemiology. We extended the interpretation of this principle to capture more causal relationships. We also developed an algorithm for learning causal graphs based on the PC algorithm, a classical algorithm in computer science for inferring directed acyclic graphs.
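The edge-deletion (skeleton) phase of the PC algorithm can be sketched as follows; the fixed-threshold partial-correlation test is a simplification of the Fisher z-test used in practice, and the chain example is illustrative:

```python
import numpy as np
from itertools import combinations

# Skeleton phase of the PC algorithm (sketch): start from the complete graph
# and delete edge (i, j) whenever X_i and X_j look conditionally independent
# given some subset S of i's neighbours, for conditioning sets of growing size.

def partial_corr(C, i, j, S):
    idx = [i, j] + list(S)
    P = np.linalg.inv(C[np.ix_(idx, idx)])     # precision of the submatrix
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

def pc_skeleton(C, threshold=0.05):
    p = C.shape[0]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    size = 0
    while any(len(adj[i]) - 1 >= size for i in range(p)):
        for i in range(p):
            for j in list(adj[i]):
                for S in combinations(adj[i] - {j}, size):
                    if abs(partial_corr(C, i, j, S)) < threshold:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return adj

# Chain X0 -> X1 -> X2: X0 and X2 are correlated but conditionally
# independent given X1, so the edge 0-2 should be deleted.
rng = np.random.default_rng(2)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
skel = pc_skeleton(np.corrcoef([x0, x1, x2]))
```

The output is the undirected skeleton; the full PC algorithm then orients edges using the conditioning sets that separated each deleted pair.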

Date and Time:

Tuesday, November 5, 2019 - 4:30pm

Venue:

Sequoia Hall Room 115

Topic:

TBA

Abstract / Description:

Abstract TBA - check stats website for updates, https://statistics.stanford.edu/events/probability-seminar

Date and Time:

Friday, December 6, 2019 - 4:00pm

Venue:

Sequoia Hall Room 200

Topic:

TBA

Abstract / Description:

Abstract TBA - check Stats website for updates, https://statistics.stanford.edu/events/probability-seminar

Date and Time:

Monday, November 18, 2019 - 4:00pm

Venue:

Sequoia Hall Room 200

Topic:

TBA

Abstract / Description:

Liouville quantum gravity (LQG) is in some sense the canonical model of a two-dimensional Riemannian manifold and is defined using the (formal) metric tensor e^{γh(z)}(dx² + dy²), where h is an instance of some form of the Gaussian free field and γ ∈ (0, 2) is a parameter. This expression does not make literal sense since h is a distribution and not a function, so cannot be exponentiated. Previously, the associated metric (distance function) was constructed only in the special case γ = √(8/3), in joint work with Sheffield. In this talk, we will show how to associate with LQG a canonical conformally covariant metric for all γ ∈ (0, 2). It is obtained as a limit of certain approximations which were recently shown to be tight by Ding, Dubédat, Dunlap and Falconet. This is based on joint work with Ewain Gwynne.
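For context, the approximations in question are (as this line of work is usually set up; details stated from general knowledge, not from the talk) Liouville first-passage percolation metrics, where path lengths are weighted by an exponential of a mollified field:

```latex
D_h^{\varepsilon}(z,w) \;=\; \inf_{P \colon z \to w} \int_0^1 e^{\xi\, h_\varepsilon^{*}(P(t))}\, \lvert P'(t) \rvert \, dt,
\qquad \xi = \frac{\gamma}{d_\gamma},
```

where the infimum runs over piecewise smooth paths from z to w, h*_ε is a suitable mollification of h at scale ε, and d_γ is the γ-LQG dimension exponent; the LQG metric arises as a limit of these distances after renormalization.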

Date and Time:

Monday, November 11, 2019 - 4:30pm

Venue:

Sequoia Hall Room 200

Topic:

Geometric law for numbers of returns until a hazard

Abstract / Description:

For a ψ-mixing sequence of identically distributed random variables X_0, X_1, X_2, … and pairs of shrinking disjoint sets V_N, W_N, N = 1, 2, …, we count the number N_N of returns to V_N by the sequence until its first arrival to W_N (the hazard time). Let μ be the distribution of X_0. It turns out that if μ(V_N), μ(W_N) → 0 as N → ∞ at the same speed, then N_N tends in distribution to a geometric random variable. A somewhat different setup deals with a ψ- or φ-mixing stationary process with a countable state space A, where for a fixed pair of sequences ξ, η ∈ A^ℕ we count the number N_{ξ,η}(n, m) of indices i for which (X_i, X_{i+1}, …, X_{i+n−1}) coincides with (ξ_0, ξ_1, …, ξ_{n−1}) until the first j for which (X_j, X_{j+1}, …, X_{j+m−1}) coincides with (η_0, η_1, …, η_{m−1}). It turns out that for almost all pairs ξ, η, if the ratio of probabilities of the cylinder sets [ξ_0, …, ξ_{n−1}] and [η_0, …, η_{m(n)−1}] converges as n, m(n) → ∞, then N_{ξ,η}(n, m(n)) tends in distribution to a geometric random variable. Motivations, connections, and several generalizations of these results will be discussed as well.
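A quick Monte Carlo illustration of the first setup, assuming the simplest ψ-mixing example (an i.i.d. uniform sequence) and equal-measure sets V_N, W_N:

```python
import random

# Illustration: an i.i.d. uniform sequence on [0, 1) (trivially ψ-mixing),
# with V_N = [0, p) and W_N = [p, 2p) disjoint sets of equal measure p = 1/N.
# We count returns to V_N before the first arrival to W_N (the hazard).
# Since each "special" visit lands in V_N or W_N with probability 1/2,
# the count should be approximately Geometric: P(count = k) = (1/2)^(k+1).

def count_returns(p, rng):
    hits = 0
    while True:
        x = rng.random()
        if x < p:            # return to V_N
            hits += 1
        elif x < 2 * p:      # arrival to W_N: stop
            return hits

N = 200
rng = random.Random(1)
samples = [count_returns(1 / N, rng) for _ in range(20000)]

# Empirical frequencies of counts 0..3; should be ~0.5, 0.25, 0.125, 0.0625.
freq = [samples.count(k) / len(samples) for k in range(4)]
```

Here the equal-measure choice makes the geometric parameter exactly 1/2; the theorem covers the general case where μ(V_N)/μ(W_N) merely converges.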

Date and Time:

Monday, November 11, 2019 - 3:15pm

Venue:

Sequoia Hall Room 200

Topic:

Combinatorial anti-concentration inequalities

Abstract / Description:

Consider a degree-d polynomial f(ξ_1, …, ξ_n) of independent Bernoulli random variables. What can be said about the concentration of f on any single value? This generalises the classical Littlewood–Offord problem, which asks the same question for linear polynomials. In this talk we discuss a few recent results in this area, focusing on combinatorial aspects.
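For the linear (classical Littlewood–Offord) case, the extremal behaviour can be checked by brute force for small n; the example below uses Erdős's bound C(n, ⌊n/2⌋)/2ⁿ, which is attained by equal coefficients:

```python
from itertools import product
from math import comb

# Brute-force check of the classical (linear) Littlewood–Offord bound:
# for f = a_1 ξ_1 + ... + a_n ξ_n with nonzero a_i and independent signs
# ξ_i in {-1, +1}, Erdős showed max_x P(f = x) <= C(n, n//2) / 2^n,
# with equality when all the a_i are equal.

def max_point_prob(coeffs):
    counts = {}
    for signs in product([-1, 1], repeat=len(coeffs)):
        s = sum(a * e for a, e in zip(coeffs, signs))
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values()) / 2 ** len(coeffs)

n = 10
bound = comb(n, n // 2) / 2 ** n                 # 252 / 1024
equal = max_point_prob([1] * n)                  # attains the bound
generic = max_point_prob(list(range(1, n + 1)))  # distinct coefficients
```

The degree-d generalisation discussed in the talk asks how much anti-concentration survives once f is a higher-degree polynomial of the signs.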

*This is joint work with Jacob Fox and Lisa Sauermann.*

Date and Time:

Monday, November 4, 2019 - 4:00pm

Venue:

Sequoia Hall Room 200

Topic:

Models and Algorithms for Understanding Neural and Behavioral Data

Abstract / Description:

The trend in neural recording capabilities is clear: we can record orders of magnitude more neurons now than we could only a few years ago, and technological advances do not seem to be slowing. Coupled with rich behavioral measurements, genetic sequencing, and connectomics, these datasets offer unprecedented opportunities to learn how neural circuits function. But they also pose serious modeling and algorithmic challenges. How do we develop probabilistic models for such heterogeneous data? How do we design models that are flexible enough to capture complex spatial and temporal patterns, yet interpretable enough to provide new insight? How do we construct algorithms to efficiently and reliably fit these models? I will present some of our recent work on recurrent switching linear dynamical systems and corresponding Bayesian inference algorithms that aim to overcome these challenges, and I will show how these methods can help us gain insight into complex neural and behavioral data.
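A generative sketch of a recurrent switching linear dynamical system, with made-up parameters (the models and Bayesian inference algorithms in the talk are of course richer):

```python
import numpy as np

rng = np.random.default_rng(3)

# Recurrent switching linear dynamical system (rSLDS), generative sketch:
# a discrete state z_t picks one of K linear dynamics for the continuous
# state x_t, and the transition probabilities of z_t depend on the previous
# continuous state x_{t-1} through a softmax (the "recurrent" part).
# All parameters below are made up for illustration.

K, D, T = 2, 2, 200
A = np.stack([
    np.array([[0.99, -0.10], [0.10, 0.99]]),   # state 0: slow rotation
    np.array([[0.90,  0.00], [0.00, 0.90]]),   # state 1: decay to origin
])
b = np.zeros((K, D))                            # per-state offsets
R = rng.normal(0, 1, (K, D))                    # recurrent weights
r = np.zeros(K)                                 # transition biases

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

x = np.zeros((T, D))
z = np.zeros(T, dtype=int)
x[0] = rng.normal(size=D)
for t in range(1, T):
    z[t] = rng.choice(K, p=softmax(R @ x[t - 1] + r))
    x[t] = A[z[t]] @ x[t - 1] + b[z[t]] + 0.05 * rng.normal(size=D)
```

The recurrence is what makes the discrete segmentation interpretable: switches happen in particular regions of the continuous state space rather than at Markov-chain random times.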

Date and Time:

Thursday, October 24, 2019 - 1:30pm

Venue:

Medical School Office Building Room x303

Topic:

"Robustness meets algorithms"

Abstract / Description:

In every corner of machine learning and statistics, there is a need for estimators that work not just in an idealized model but even when their assumptions are violated. Unfortunately, in high dimensions, being provably robust and being efficiently computable are often at odds with each other.

In this talk, we give the first efficient algorithm for estimating the parameters of a high-dimensional Gaussian which is able to tolerate a constant fraction of corruptions that is independent of the dimension. Prior to our work, all known estimators either needed time exponential in the dimension to compute, or could tolerate only an inverse polynomial fraction of corruptions. Not only does our algorithm bridge the gap between robustness and algorithms, it turns out to be highly practical in a variety of settings.
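A heuristic sketch in the spirit of such filtering algorithms: iteratively project onto the top eigenvector of the empirical covariance and discard the most extreme points. The thresholds and stopping rule below are illustrative and carry none of the paper's guarantees:

```python
import numpy as np

rng = np.random.default_rng(4)

# An eps-fraction of the sample is shifted far from the true mean (0 here).
# The naive mean is dragged by the outliers; filtering along the top
# covariance direction until the spectrum looks ~identity recovers a much
# better estimate.
d, n, eps = 10, 2000, 0.1
clean = rng.normal(0.0, 1.0, (n - int(eps * n), d))        # true mean is 0
outliers = rng.normal(0.0, 1.0, (int(eps * n), d)) + 8.0   # corrupted points
X = np.vstack([clean, outliers])

naive = X.mean(axis=0)

for _ in range(30):
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X.T))
    if vals[-1] < 1.5:                         # covariance looks ~identity
        break
    scores = np.abs((X - mu) @ vecs[:, -1])    # projection on top direction
    X = X[scores < np.quantile(scores, 0.95)]  # drop the 5% most extreme

robust = X.mean(axis=0)
```

The idea being illustrated is the spectral certificate: corruptions that shift the mean must also inflate the covariance in some direction, so a well-behaved spectrum certifies that the remaining sample's mean is trustworthy.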

Date and Time:

Tuesday, October 22, 2019 - 4:30pm

Venue:

McCullough Building Room 115