EE Student Information

The Department of Electrical Engineering supports Black Lives Matter.

EE Student Information, Spring Quarter through Academic Year 2020-2021: FAQs and Updated EE Course List.

Updates will be posted on this page, as well as emailed to the EE student mail list.

Please see Stanford University Health Alerts for course and travel updates.

As always, use your best judgment and consider your own and others' well-being at all times.

Graduate

Workshop in Biostatistics presents "Air Pollution, COVID19, and Race: Data Science Challenges and Opportunities"

Topic: 
Air Pollution, COVID19, and Race: Data Science Challenges and Opportunities
Abstract / Description: 

Biomedical Data Science Seminar

The coronavirus will likely kill thousands of Americans. But what if I told you about another serious threat to American national security? This emergency comes from climate change and air pollution. To help address this threat, we have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels, dividing the continental U.S. into 1-square-kilometer zones. We have paired this information with health data contained in Medicare claims records from the last 12 years, which include 97% of the population ages 65 or older. We also developed statistical methods for causal inference and computationally efficient algorithms for the analysis of over 550 million health records. The result? This data science platform is telling us that federal limits on the nation's most widespread air pollutants are not stringent enough. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year. It also highlights the critical new role of data science in public health and the associated methodological challenges: with enormous amounts of data, for example, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. We will discuss these and other challenges.
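
To make the pipeline concrete, here is a minimal sketch of a gridded exposure model of this general kind: a small neural network regressing a per-cell daily PM2.5 value on satellite- and monitor-derived features. The features, data, and architecture below are hypothetical placeholders, not the speakers' actual model.

```python
# Illustrative sketch only: regress a daily PM2.5 estimate for each
# 1 km x 1 km grid cell on (hypothetical) satellite and monitor features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells = 5000
# Placeholder per-cell features: e.g., satellite aerosol optical depth,
# distance to nearest ground monitor, that monitor's reading, weather.
X = rng.normal(size=(n_cells, 4))
y = 10 + X @ np.array([3.0, -1.0, 4.0, 0.5]) + rng.normal(scale=2.0, size=n_cells)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```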

Suggested Reading:

  • "Evaluating the impact of long-term exposure to fine particulate matter on mortality among the elderly," https://advances.sciencemag.org/content/6/29/eaba5692
  • "Exposure to air pollution and COVID-19 mortality in the United States: A nationwide cross-sectional study," https://www.medrxiv.org/content/10.1101/2020.04.05.20054502v2
  • "Inequalities in air pollution exposure are increasing in the United States," https://www.medrxiv.org/content/10.1101/2020.07.13.20152942v1
Date and Time: 
Thursday, October 22, 2020 - 2:30pm

Workshop in Biostatistics presents "Don’t Expl-AI-n Yourself: Exploring "Healthy" Models in Machine Learning for Health"

Topic: 
Don’t Expl-AI-n Yourself: Exploring "Healthy" Models in Machine Learning for Health
Abstract / Description: 

Despite the importance of human health, we do not fundamentally understand what it means to be healthy. Health is unlike many recent machine learning success stories - e.g., games or driving - because there are no agreed-upon, well-defined objectives. In this talk, Dr. Marzyeh Ghassemi will discuss the role of machine learning in health, argue that the demand for model interpretability is dangerous, and explain why models used in health settings must also be "healthy". She will focus on a progression of work that encompasses prediction, time series analysis, and representation learning.

Date and Time: 
Thursday, October 15, 2020 - 2:30pm

Workshop in Biostatistics presents "Statistical analysis of single cell CRISPR screens"

Topic: 
Statistical analysis of single cell CRISPR screens
Abstract / Description: 

Mapping gene-enhancer regulatory relationships is key to unraveling molecular disease mechanisms based on GWAS associations in non-coding regions. This problem is notoriously challenging: there is a many-to-many mapping between genes and enhancers, and enhancers can be located far from their target genes. Recently developed CRISPR regulatory screens (CRSs) based on single cell RNA-seq (scRNA-seq) are a promising high-throughput experimental approach to this problem. They operate by infecting a population of cells with thousands of CRISPR guide RNAs (gRNAs), each targeting an enhancer. Each cell receives a random combination of CRISPR gRNAs, which suppress the action of their corresponding enhancers. The gRNAs and whole transcriptome in each cell are then recovered through scRNA-seq. CRSs provide more direct evidence of regulation than existing methods based on epigenetic data or even chromatin conformation. However, the analysis of these screens presents significant statistical challenges, some inherited from scRNA-seq analysis (modeling single cell gene expression) and some unique to CRISPR perturbation screens (the confounding effect of sequencing depth). In this talk, I will first give some background on single cell CRISPR screen technology. I will then present the first genome-wide single cell CRS dataset (Gasperini et al. 2019) and discuss challenges that arose in its initial analysis. Finally, I will present a novel methodology for the analysis of this data based on the conditional randomization test. The key idea is to base inference on the randomness in the assortment of gRNAs among cells rather than on the randomness in single cell gene expression, since the former is easier to model than the latter.
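
To make the key idea concrete, here is a minimal simulated sketch of a conditional randomization test in this setting: expression is held fixed and only the gRNA assignment is redrawn from its (assumed known) distribution. The data, the Bernoulli assignment model, and the test statistic are all illustrative placeholders, not the methodology from the talk.

```python
# Toy conditional randomization test: resample the gRNA assignment,
# not the expression, since the former is assumed easier to model.
import numpy as np

rng = np.random.default_rng(1)
n_cells = 2000
p_grna = 0.1                        # assumed per-cell gRNA probability
grna = rng.random(n_cells) < p_grna
expr = rng.poisson(lam=np.where(grna, 4.0, 5.0))  # expression dips when the enhancer is suppressed

def stat(g, e):
    return e[g].mean() - e[~g].mean()   # difference in mean expression

obs = stat(grna, expr)
null = np.array([stat(rng.random(n_cells) < p_grna, expr) for _ in range(2000)])
# Two-sided resampling p-value from the redrawn assignments.
pval = (1 + np.sum(np.abs(null) >= abs(obs))) / (1 + len(null))
print("CRT p-value:", pval)
```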

Suggested Readings:
  • "Towards a comprehensive catalogue of validated and target-linked human enhancers" (Nature Reviews Genetics 2020).
  • "A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens" (Cell 2019).
  • "Panning for gold: 'model-X' knockoffs for high dimensional controlled variable selection" (Journal of the Royal Statistical Society, Series B 2018) and "Fast and Powerful Conditional Randomization Testing via Distillation" (arXiv 2020).
  • "Conditional resampling improves sensitivity and specificity of single cell CRISPR regulatory screens" (bioRxiv 2020).

Date and Time: 
Thursday, October 8, 2020 - 2:30pm

Workshop in Biostatistics presents "Prediction, Estimation, and Attribution"

Topic: 
Prediction, Estimation, and Attribution
Abstract / Description: 

The scientific needs and computational limitations of the Twentieth Century fashioned classical statistical methodology. Both the needs and limitations have changed in the Twenty-First, and so has the methodology. Large-scale prediction algorithms - neural nets, deep learning, boosting, support vector machines, random forests - have achieved star status in the popular press. They are recognizable as heirs to the regression tradition, but ones carried out at enormous scale and on titanic data sets. How do these algorithms compare with standard regression techniques such as Ordinary Least Squares or logistic regression? Several key discrepancies will be examined, centering on the differences between prediction and estimation or prediction and attribution (that is, significance testing). Most of the discussion is carried out through small numerical examples. The talk does not assume familiarity with prediction algorithms.
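
In the same spirit, here is a small simulated example contrasting the two goals: prediction via a random forest and attribution via OLS coefficients with t-statistics. The data and setup are invented for illustration and are not examples from the talk.

```python
# Toy contrast of prediction vs. attribution on the same simulated data.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=n)   # only two features matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
print("random forest test R^2:", rf.score(X_te, y_te))   # prediction

ols = sm.OLS(y_tr, sm.add_constant(X_tr)).fit()
print(ols.summary().tables[1])                            # attribution
```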

Date and Time: 
Thursday, October 1, 2020 - 2:30pm

Statistics Department Seminar presents "On the statistical foundations of adversarially robust learning"

Topic: 
On the statistical foundations of adversarially robust learning
Abstract / Description: 

Robustness has long been viewed as an important desired property of statistical methods. More recently, it has been recognized that complex prediction models such as deep neural nets can be highly vulnerable to adversarially chosen perturbations of their inputs at test time. This area, termed adversarial robustness, has garnered an extraordinary level of attention in the machine learning community over the last few years. However, little is known about the most basic statistical questions. In this talk, I will present answers to some of them. In particular, I will show how class imbalance has a crucial effect, and leads to unavoidable tradeoffs between robustness and accuracy, even in the limit of infinite data (i.e., for the Bayes error). I will also show other results, some of them involving novel applications of results from robust isoperimetry (Cianchi et al., 2011).

This is joint work with Hamed Hassani, David Hong, and Alex Robey.
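
As a toy illustration of how class imbalance interacts with adversarial perturbations, consider two unit-variance Gaussian classes and a threshold classifier, where an adversary may shift each input by up to eps. The numbers below are invented, and the calculation is a sketch, not the talk's analysis.

```python
# Worst-case error of a 1-D threshold classifier under l_inf perturbations.
import numpy as np
from scipy.stats import norm

pi1 = 0.2                      # imbalanced: class +1 is rare
pi0 = 1 - pi1
mu = 1.0                       # classes centered at -mu and +mu

def robust_error(t, eps):
    # Class-0 points within eps below t get pushed above it;
    # class-1 points within eps above t get pushed below it.
    return pi0 * norm.sf(t - eps, loc=-mu) + pi1 * norm.cdf(t + eps, loc=mu)

ts = np.linspace(-4, 4, 4001)
for eps in (0.0, 0.3, 0.6):    # eps = 0 recovers the Bayes classifier
    errs = robust_error(ts, eps)
    i = errs.argmin()
    print(f"eps={eps:.1f}  best threshold={ts[i]:+.2f}  error={errs[i]:.3f}")
```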

Date and Time: 
Tuesday, November 17, 2020 - 4:30pm

Statistics Department Seminar presents "Conditional calibration for false discovery rate control under dependence"

Topic: 
Conditional calibration for false discovery rate control under dependence
Abstract / Description: 

We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics, where the dependence is fully or partially known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our general framework, we propose a concrete algorithm, the dependence-adjusted Benjamini–Hochberg (dBH) procedure, which adaptively thresholds the q-value for each hypothesis. Under positive regression dependence the dBH procedure uniformly dominates the standard BH procedure, and in general it uniformly dominates the Benjamini–Yekutieli (BY) procedure (also known as BH with log correction). Simulations and real-data examples illustrate power gains over competing approaches to FDR control under dependence.

This is joint work with Lihua Lei.
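
The dBH procedure itself requires per-hypothesis calibration; for context, here is a minimal textbook implementation of the standard BH step-up rule that dBH is shown to dominate (a sketch, not code from the paper).

```python
# Standard Benjamini-Hochberg step-up procedure.
import numpy as np

def bh(pvals, alpha=0.1):
    """Return indices of hypotheses rejected by BH at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0])      # largest i with p_(i) <= alpha*i/m
    return order[: k + 1]

pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
print("rejected:", bh(pvals, alpha=0.1))
```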

Joint Colloquium with Berkeley at Stanford

Date and Time: 
Tuesday, November 10, 2020 - 4:30pm

Statistics Department Seminar presents "Some theoretical results on model-based reinforcement learning"

Topic: 
Some theoretical results on model-based reinforcement learning
Abstract / Description: 

We discuss some recent results on model-based methods for reinforcement learning (RL) in both online and offline problems.

For the online RL problem, we discuss several model-based RL methods that adaptively explore an unknown environment and learn to act with provable regret bounds. In particular, we focus on finite-horizon episodic RL where the unknown transition law belongs to a generic family of models. We propose a model-based "value-targeted regression" RL algorithm built on an optimism principle: in each episode, we construct the set of models that are "consistent" with the data collected so far, where consistency is measured by the total squared error the model incurs in predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem over the constructed set of models. We derive a regret bound for an arbitrary family of transition models using the notion of Eluder dimension proposed by Russo and Van Roy (2014).
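
To give a heavily simplified feel for the value-targeted construction, here is a toy sketch for a two-state, one-parameter MDP; the model class, confidence radius, and data below are invented for illustration and are not from the talk.

```python
# Toy value-targeted confidence set + optimistic model choice.
import numpy as np

rng = np.random.default_rng(3)
theta_true = 0.7          # P(stay in state 0); state 0 is the rewarding state
V = np.array([1.0, 0.0])  # a fixed value estimate used as the regression target

# Observed transitions from state 0 under the true model.
stayed = rng.random(200) < theta_true
targets = np.where(stayed, V[0], V[1])           # observed next-state values

grid = np.linspace(0, 1, 101)                    # candidate models theta
pred = grid[:, None] * V[0] + (1 - grid[:, None]) * V[1]   # E_theta[V(s')]
sq_err = ((targets[None, :] - pred) ** 2).sum(axis=1)

beta = sq_err.min() + 10.0                       # toy confidence-set radius
consistent = grid[sq_err <= beta]
# Optimism: among consistent models, pick the one promising the most value.
print(f"optimistic theta = {consistent.max():.2f} (true {theta_true})")
```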

Next we discuss batch-data (offline) reinforcement learning, where the goal is to predict the value of a new policy using data generated by some behavior policy (which may be unknown). We show that the fitted Q-iteration method with linear function approximation is equivalent to a model-based plugin estimator. We establish that this model-based estimator is minimax optimal and its statistical limit is determined by a form of restricted chi-square divergence between the two policies.
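
For reference, a compact sketch of fitted Q-iteration for policy evaluation with linear function approximation is below; the environment, features, and target policy are toy placeholders, and this is the generic method rather than the paper's analysis.

```python
# Fitted Q-evaluation with linear function approximation on a toy batch.
import numpy as np

rng = np.random.default_rng(4)
gamma, n, d = 0.9, 1000, 3

# Offline batch: features phi(s, a), rewards, and features of the next
# state under the *target* policy, phi' = phi(s', pi(s')). All simulated.
phi = rng.normal(size=(n, d))
phi_next = rng.normal(size=(n, d))
r = rng.normal(size=n)

w = np.zeros(d)
for _ in range(200):                     # each step is a least-squares fit
    y = r + gamma * phi_next @ w         # bootstrapped regression targets
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
print("linear Q weights:", w)
```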

Date and Time: 
Tuesday, November 3, 2020 - 4:30pm
Venue: 
Registration required

Statistics Department Seminar presents "Screenomics: A playground for the mining and modeling of both "big" and "small" longitudinal data"

Topic: 
Screenomics: A playground for the mining and modeling of both "big" and "small" longitudinal data
Abstract / Description: 

We recently developed and put forward a framework for capturing, visualizing, and analyzing the unique record of an individual's everyday digital experiences: screenomics. In our quest to derive knowledge from and understand screenomes – ordered sequences of hundreds of thousands of smartphone and laptop screenshots obtained every five seconds for between one day and six months – the data have become a playground for learning about the computational machinery used to process images and text, machine learning algorithms, human labeling of unknown taxonomies, qualitative inquiry, and the tension between N = 1 and N = many approaches. Using illustrative problems, I share how engagement with these new data is reshaping both how we do analyses and how we study the person-context transactions that drive human behavior.

Date and Time: 
Tuesday, October 27, 2020 - 4:30pm

Statistics Department Seminar presents "Two mathematical lessons of deep learning"

Topic: 
Two mathematical lessons of deep learning
Abstract / Description: 

Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection stand in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of the SGD-based local methods used to train modern models appears at odds with standard intuitions about optimization.

First, I will present the evidence, empirical and mathematical, that necessitates revisiting classical notions such as over-fitting. I will continue to discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation.
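
A small simulation in the spirit of this curve: minimum-norm least squares on random cosine features typically reproduces the double descent shape, with test risk spiking near the interpolation threshold (p close to n_train) and falling again beyond it. The setup below is invented for illustration, not an example from the talk.

```python
# Double-descent-style sweep over the number of random features p.
import numpy as np

rng = np.random.default_rng(5)
n_train, n_test = 100, 2000
x_tr = rng.uniform(-np.pi, np.pi, n_train)
x_te = rng.uniform(-np.pi, np.pi, n_test)
y_tr = np.sin(x_tr) + 0.3 * rng.normal(size=n_train)
y_te = np.sin(x_te)

freqs = rng.normal(scale=3.0, size=1000)   # random feature frequencies

def features(x, p):
    return np.cos(np.outer(x, freqs[:p]))  # p random cosine features

for p in (10, 50, 90, 100, 110, 200, 1000):
    # lstsq returns the minimum-norm solution when p >= n_train
    w, *_ = np.linalg.lstsq(features(x_tr, p), y_tr, rcond=None)
    mse = np.mean((features(x_te, p) @ w - y_te) ** 2)
    print(f"p={p:5d}  test MSE={mse:.3f}")
```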

Second, I will discuss why the landscapes of over-parameterized neural networks are essentially never convex, even locally. Yet, they satisfy the local Polyak–Lojasiewicz condition, which allows SGD-type methods to converge to a global minimum.
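
For reference, the (local) Polyak–Lojasiewicz condition mentioned here can be stated in its standard form as follows.

```latex
% Polyak-Lojasiewicz (PL) condition for a loss L with infimum L^*:
% there exists \mu > 0 such that, on the region considered,
\[
  \tfrac{1}{2}\,\bigl\|\nabla L(w)\bigr\|^{2} \;\ge\; \mu\,\bigl(L(w) - L^{*}\bigr).
\]
% Together with smoothness, this guarantees linear convergence of gradient
% descent (and, with more care, SGD) to a global minimum without convexity.
```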

A key piece of the puzzle remains: how do these lessons come together to form a complete mathematical picture of modern DL?

Date and Time: 
Tuesday, October 20, 2020 - 4:30pm

Statistics Department Seminar presents "Backfitting for large-scale crossed random effects regressions"

Topic: 
Backfitting for large-scale crossed random effects regressions
Abstract / Description: 

Large-scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes × environments or customers × products. Naive methods of handling such data will produce inferences that do not generalize. Regression models that properly account for crossed random effects can be very expensive to compute: the cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed, though still strict, sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.

This is based on joint work with Swarnadip Ghosh and Trevor Hastie of Stanford University.
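
A minimal sketch of the backfitting idea for a crossed two-factor layout is below; it strips away the GLS weighting and shrinkage that the actual algorithm targets, keeping only the O(N)-per-sweep alternating updates. All data are simulated placeholders.

```python
# Backfitting sketch for y = mu + a[row] + b[col] + noise.
import numpy as np

rng = np.random.default_rng(6)
n, R, C = 10000, 200, 300
row = rng.integers(R, size=n)
col = rng.integers(C, size=n)
a_true = rng.normal(size=R)
b_true = rng.normal(size=C)
y = 1.0 + a_true[row] + b_true[col] + rng.normal(scale=0.5, size=n)

mu, a, b = y.mean(), np.zeros(R), np.zeros(C)
for _ in range(50):                    # each sweep costs O(N)
    resid = y - mu - b[col]            # update row effects from residual means
    a = np.bincount(row, resid, minlength=R) / np.bincount(row, minlength=R)
    resid = y - mu - a[row]            # update column effects likewise
    b = np.bincount(col, resid, minlength=C) / np.bincount(col, minlength=C)
print("corr(a, a_true):", np.corrcoef(a, a_true)[0, 1].round(3))
```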

Date and Time: 
Tuesday, October 13, 2020 - 4:30pm
