# Statistics and Probability Seminars

## Statistics Department Seminar presents "Computational methods for understanding genetics of complex human traits"

Topic:
Computational methods for understanding genetics of complex human traits
Abstract / Description:

While genome-wide association studies (GWAS) have been successful in mapping the genetics of a range of complex traits, it has been difficult to translate the associations into mechanistic understanding. In the first part of my talk, I will describe a recently developed method to identify risk factors of complex traits. While a trait may have a large number of risk variants at the DNA level, their effects are likely mediated by a smaller number of intermediate traits such as cellular phenotypes. Identifying causal risk factors is thus a promising approach to translate GWAS into actionable targets. Mendelian Randomization (MR) is a framework to address this problem, using genetic variants of an exposure trait as "natural randomization" to estimate its effect on an outcome. However, current MR methods make strong assumptions that are violated when SNPs act on the outcome through pathways other than the exposure, known as pleiotropic effects. We propose a method, CAUSE, to deal with pleiotropy by explicitly modeling hidden factors that would confound the relationship between exposure and outcome. We show in simulations and GWAS that CAUSE significantly reduces false discoveries while maintaining power.
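The basic MR idea mentioned above can be illustrated with a small simulation. This is a minimal sketch of the single-variant Wald ratio under idealized assumptions (no pleiotropy, one variant, Gaussian noise); it is not an implementation of CAUSE, and the allele frequency, effect sizes, and function names are all illustrative choices introduced here.

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def simulate(n=20000, true_effect=0.5, seed=0):
    """Simulate genotype G, exposure X and outcome Y with a hidden confounder U."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2) at an assumed allele frequency of 0.3.
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)
        ui = rng.gauss(0, 1)  # unobserved confounder of X and Y
        xi = 1.0 * gi + ui + rng.gauss(0, 1)
        yi = true_effect * xi + ui + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y

g, x, y = simulate()

# Naive regression slope of Y on X: biased upward by the confounder U.
naive = covariance(x, y) / covariance(x, x)

# Wald ratio: the variant is independent of U, so the ratio of its
# association with Y to its association with X recovers the causal effect.
wald = covariance(g, y) / covariance(g, x)

print(f"naive slope: {naive:.3f}, Wald ratio: {wald:.3f}, truth: 0.5")
```

If the variant also affected the outcome directly (a pleiotropic effect), the Wald ratio in this sketch would be biased as well, which is the failure mode the abstract describes.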

In the second part, I will talk about our method for using rare variants to study complex trait genetics. Compared with common variants, the focus of GWAS, rare variants are usually not in linkage disequilibrium, making it easier to detect causal variants and genes. However, the power to identify rare variants is low. I will describe our approach to addressing this challenge in the context of de novo mutations. Our method combines information on variants at the level of genes and leverages functional information about variants. This method enabled the discovery of a number of risk genes for autism.

Date and Time:
Tuesday, August 11, 2020 - 4:30pm
Venue:
Meeting ID 959 2194 3145 (+password)

## Statistics Department Seminar presents "Triumphs and challenges in the identification of the genetic determinants of coronary artery disease in the multi-ethnic Million Veteran Program"

Topic:
Triumphs and challenges in the identification of the genetic determinants of coronary artery disease in the multi-ethnic Million Veteran Program
Abstract / Description:

Coronary artery disease (CAD) remains the number one cause of morbidity and mortality worldwide despite the development of several effective primary and secondary preventative therapies over the last 50 years. Genome-wide association studies (GWAS) of CAD have identified ~180 autosomal susceptibility loci to date, largely among European populations but also among South and East Asian populations. Curiously, almost 13 years after the discovery of the first genome-wide significant locus in Europeans at the 9p21 locus, no locus has reached genome-wide significance among Black and Hispanic admixed populations. We will review the results of the largest GWAS for CAD performed to date, involving European, African American, and Hispanic American participants of the Million Veteran Program, demonstrating along the way the strengths and limitations of several established statistical algorithms in the calculation of ethnicity-specific heritability estimates, local ancestry estimates, and polygenic risk scores.

Date and Time:
Tuesday, July 28, 2020 - 4:30pm
Venue:
Zoom ID 941 3461 3493 (+password)

## Probability Seminar presents "Extreme eigenvalues of adjacency matrices of random $d$-regular graphs"

Topic:
Extreme eigenvalues of adjacency matrices of random $d$-regular graphs
Abstract / Description:

I will discuss some results on extreme eigenvalue distributions of adjacency matrices of random $d$-regular graphs, which are believed to be universal and to follow the Tracy-Widom distribution.

In the first part of the talk, I will present results on random $d$-regular graphs where $d$ grows with the size $N$ of the graph. We confirm that in the regime $N^{2/9} \ll d \ll N^{1/3}$ the extremal eigenvalues, after proper rescaling, are concentrated at scale $N^{-2/3}$ and their fluctuations are governed by Tracy-Widom statistics. Thus, in the same regime of $d$, about fifty-two percent of all $d$-regular graphs have second-largest eigenvalue strictly less than $2\sqrt{d-1}$. In the second part of the talk, I will focus on random $d$-regular graphs with fixed $d \geq 3$ and give a new proof of Alon's second eigenvalue conjecture that, with high probability, the second eigenvalue of a random $d$-regular graph is bounded by $2\sqrt{d-1}+o(1)$; we can show that the error term is polynomially small in the size of the graph.

These results are based on joint work with Roland Bauerschmidt, Antti Knowles and Horng-Tzer Yau.

Date and Time:
Monday, July 27, 2020 - 4:00pm
Venue:
Zoom ID 959 4815 5057 (+password)

## Statistics Department Seminar presents "Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling"

Topic:
Testing goodness-of-fit and conditional independence with approximate co-sufficient sampling
Abstract / Description:

Goodness-of-fit (GoF) testing is ubiquitous in statistics, with direct ties to model selection, confidence interval construction, conditional independence testing, and multiple testing. While testing the GoF of a simple null hypothesis provides an analyst great flexibility in the choice of test statistic while still ensuring validity, most GoF tests for composite null hypotheses are far more constrained, as the test statistic must have a tractable distribution over the entire null model space. A notable exception is co-sufficient sampling (CSS), which resamples the data conditional on a sufficient statistic for the null model. But CSS testing requires the null model to have a compact (in an information-theoretic sense) sufficient statistic, which only holds for a very limited class of models; even for a null model as simple as logistic regression, CSS testing is powerless. In this work, we leverage the concept of approximate sufficiency to generalize CSS testing to essentially any parametric model with an asymptotically efficient estimator; we call our extension "approximate CSS" (aCSS) testing. We quantify the finite-sample Type I error inflation of aCSS testing and show that it is vanishing under standard maximum likelihood asymptotics, for any choice of test statistic. We also apply our proposed procedure both theoretically and in simulation to a number of models of interest.

This work is joint with Lucas Janson.

Date and Time:
Tuesday, July 21, 2020 - 4:30pm
Venue:
Meeting ID 941 3461 3493 (+password)

## Probability Seminar presents "Gaussian regularization of pseudospectrum, eigenvalue gaps, and overlaps"

Topic:
Gaussian regularization of pseudospectrum, eigenvalue gaps, and overlaps
Abstract / Description:

Hermitian matrices are stable under small, additive perturbations, but this fact fails dramatically to generalize to the non-Hermitian case, as there are non-diagonalizable n × n matrices whose spectra move by O(ε^(1/n)) after an ε-perturbation. This issue is especially concerning for numerical linear algebra applications, where even the presence of routine machine noise can drastically alter the spectrum of a modestly sized matrix, and it is mitigated only by the fact that the non-diagonalizable matrices of any dimension have measure zero.

In this talk I'll quantify this fact: a small entry-wise Gaussian perturbation εG of any n × n matrix A has a basis of eigenvectors with condition number poly(n), and eigenvalue gaps 1/poly(n). The main technique exploits the relationship between pseudospectrum and eigenvector condition number, reducing the problem to the proof of certain tail bounds on small singular values of the matrices zI - A - εG, for generic complex z. Time permitting, I'll discuss extensions to a numerical linear algebra application, where this random regularization is used as a preconditioning step in an algorithm to rapidly approximate the eigenvectors and eigenvalues of any matrix.
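The instability of non-Hermitian spectra described in the first paragraph can be seen in a standard worked example. The notation is introduced here for illustration and is not taken from the talk: $J_n$ is the $n \times n$ nilpotent Jordan block (ones on the superdiagonal, zeros elsewhere) and $E_{n1}$ is the matrix with a single 1 in the bottom-left corner.

$$
\det\bigl(\lambda I - (J_n + \varepsilon E_{n1})\bigr) = \lambda^n - \varepsilon
\quad\Longrightarrow\quad
|\lambda| = \varepsilon^{1/n}.
$$

All eigenvalues of $J_n$ are 0, so a perturbation of size $\varepsilon$ moves each of them by $\varepsilon^{1/n}$. With $n = 16$ and $\varepsilon = 10^{-16}$, roughly the size of double-precision rounding error, the eigenvalues move to modulus $10^{-1}$.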

This is based on joint work with Jorge Garza Vargas, Archit Kulkarni, Satyaki Mukherjee, and Nikhil Srivastava, and found in these three papers:

Gaussian regularization of the pseudospectrum and Davies' conjecture
Overlaps, eigenvalue gaps, and pseudospectrum under real Ginibre and absolutely continuous perturbations
Pseudospectral shattering, the sign function, and diagonalization in nearly matrix multiplication time

Date and Time:
Monday, July 20, 2020 - 4:00pm
Venue:
Meeting ID 958 9191 8759 (+password)

## Statistics Department Seminar presents "Statistical frameworks for mapping 3D shape variation onto genotypic and phenotypic variation"

Topic:
Statistical frameworks for mapping 3D shape variation onto genotypic and phenotypic variation
Abstract / Description:

The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global patterns in morphological variation. Studies that focus on identifying differences between shapes have been limited to simple pairwise comparisons and rely on pre-specified landmarks (that are often known). In this talk, we present SINATRA: a statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our method takes in two classes of shapes and highlights the physical features that best describe the variation between them.

The SINATRA pipeline implements four key steps. First, SINATRA summarizes the geometry of 3D shapes (represented as triangular meshes) by a collection of vectors (or curves) that encode changes in their topology. Second, a nonlinear Gaussian process model, with the topological summaries as input, classifies the shapes. Third, an effect size analog and a corresponding association metric are computed for each topological feature used in the classification model. These quantities provide evidence that a given topological feature is associated with a particular class. Fourth, the pipeline iteratively maps the topological features back onto the original shapes (in rank order according to their association measures) via a reconstruction algorithm. This highlights the physical (spatial) locations that best explain the variation between the two groups.

We assess our approach with a rigorous simulation framework, which is itself a novel contribution to 3D image analysis. Lastly, as a case study, we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability to recover known morphometric variation across phylogenies.

Date and Time:
Tuesday, July 14, 2020 - 4:30pm
Venue:
Meeting ID 941 3461 3493 (+password)

## Probability Seminar presents "Limit theorems for descents of Mallows permutations"

Topic:
Limit theorems for descents of Mallows permutations
Abstract / Description:

The Mallows measure on the symmetric group gives a way to generate random permutations which are more likely to be sorted than not. There has been a lot of recent work to try and understand limiting properties of Mallows permutations. I'll discuss recent work on the joint distribution of descents, a statistic counting the number of "drops" in a permutation, and descents in its inverse, generalizing work of Chatterjee and Diaconis, and Vatutin. The proof is new even in the uniform case and uses Stein's method with a size-bias coupling as well as a regenerative representation of Mallows permutations.
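The objects in this abstract can be made concrete with a short sketch. It assumes the usual parametrization in which a permutation has probability proportional to `q ** inversions` (so `q < 1` favors sorted permutations) and uses a sequential-insertion sampler; the function names and the 0-based one-line notation are choices made here, not taken from the talk.

```python
import random

def descents(perm):
    """Number of positions i with perm[i] > perm[i + 1]."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)

def inverse(perm):
    """Inverse of a permutation of 0..n-1 given in one-line notation."""
    inv = [0] * len(perm)
    for position, value in enumerate(perm):
        inv[value] = position
    return inv

def sample_mallows(n, q, rng=random):
    """Sample a permutation of 0..n-1 with probability proportional to q ** inversions."""
    perm = []
    for value in range(n):
        # `value` is larger than everything placed so far, so inserting it
        # with j elements to its right creates exactly j new inversions.
        weights = [q ** j for j in range(value + 1)]
        j = rng.choices(range(value + 1), weights=weights)[0]
        perm.insert(len(perm) - j, value)
    return perm

sigma = sample_mallows(10, 0.5, random.Random(1))
print(sigma, descents(sigma), descents(inverse(sigma)))
```

The pair `(descents(sigma), descents(inverse(sigma)))` is the two-dimensional statistic whose joint limiting behavior the talk addresses.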

Date and Time:
Monday, July 13, 2020 - 4:00pm
Venue:
Meeting ID 916 4174 2729 (+password)

## Statistics Department Seminar presents "Group testing for efficient SARS-CoV-2 detection during the COVID-19 pandemic"

Topic:
Group testing for efficient SARS-CoV-2 detection during the COVID-19 pandemic
Abstract / Description:

Group testing involves testing individual items together as a combined group, rather than separately, to better understand each item. This process is used in numerous applications, including the screening of blood donations, the detection of sexually transmitted diseases, the discovery of chemical compounds for new pharmaceuticals, and the estimation of virus transmission rates from insects to plants. Applied in appropriate settings, group testing can greatly reduce testing costs and increase testing efficiency. This is why group testing has played an important role in increasing testing capacity during the COVID-19 pandemic. My presentation examines the statistical and non-statistical aspects of group testing for SARS-CoV-2, the virus that causes COVID-19. I will explain how group testing is being used now and how it could be implemented better to maximize its impact.
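The cost saving can be illustrated with the classical two-stage (Dorfman) pooling scheme: test each pool once, then retest every member of a positive pool individually. The sketch below is not the speaker's method and assumes a perfectly accurate test and independent infections; the function names are hypothetical.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected number of tests per individual under two-stage pooling."""
    if pool_size == 1:
        return 1.0  # no pooling: one test per person
    # A pool tests positive unless every member is negative.
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by the group, plus one retest per person
    # whenever the pool is positive.
    return 1.0 / pool_size + p_pool_positive

def best_pool_size(prevalence, max_pool_size=50):
    """Pool size in 1..max_pool_size minimizing expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )

for p in (0.01, 0.05, 0.20):
    k = best_pool_size(p)
    print(p, k, round(expected_tests_per_person(p, k), 3))
```

Under these assumptions the saving shrinks as prevalence rises, and at high enough prevalence individual testing (pool size 1) is cheapest, which is why the abstract stresses applying group testing in appropriate settings.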

Date and Time:
Tuesday, July 7, 2020 - 4:30pm
Venue:
Zoom

## Statistics Department Seminar presents "Novel clinical trial designs and statistical methods in the era of precision medicine"

Topic:
Novel clinical trial designs and statistical methods in the era of precision medicine
Abstract / Description:

We begin with FDA's Guidance for Industry on (a) Master Protocols for Efficient Clinical Trial Designs to Expedite Development of Oncology Drugs and Biologics in 2018; (b) Enrichment Strategies for Clinical Trials to Support Determination of Effectiveness of Human Drugs and Biological Products in March 2019; and (c) Adaptive Designs for Clinical Trials of Drugs and Biologics in November 2019. We then describe their biostatistical and biopharmaceutical underpinnings, focusing on recent advancements in adaptive group sequential trial designs and statistical methods for their analysis, and conclude with challenges and opportunities for statistical science in precision medicine and regulatory submission.

This is joint work with Tze Lai and Nikolas Weissmueller.

Date and Time:
Tuesday, June 30, 2020 - 4:30pm
Venue:
Zoom ID: 941 3461 3493 (+password to come)

## Statistics Department Seminar presents "Identifying condition-specific patterns in large-scale genomic data"

Topic:
Identifying condition-specific patterns in large-scale genomic data
Abstract / Description:

Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanisms that drive tissue specificity and cell differentiation. However, such analyses remain computationally challenging even when the number of conditions is moderate.

I will present CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition specificity present in genomic data by leveraging pairwise information. CLIMB provides a generic framework facilitating a host of downstream analyses, such as clustering genomic features that share similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. It improves upon existing methods by boosting statistical power to identify biologically meaningful signals while retaining interpretability and computational tractability. We illustrate CLIMB's value on a CTCF ChIP-seq dataset measured in 17 different cell populations and an RNA-seq dataset measured in three committed hematopoietic lineages. These analyses demonstrate that CLIMB captures biologically relevant clusters in the data and improves upon the commonly used pairwise comparisons and unsupervised clusterings typical of genomic analyses.

Date and Time:
Tuesday, June 23, 2020 - 4:30pm
Venue:
Zoom ID 941 3461 3493 (+password)