Statistics and Probability Seminars

Workshop in Biostatistics presents "Algorithm-assisted decision making in child welfare"

Topic: 
Algorithm-assisted decision making in child welfare
Abstract / Description: 

Every year, more than 4 million referrals are made to child protection agencies across the US. Screening these calls is left to each jurisdiction, which follows its own local practices and policies, potentially leading to large variation in the way referrals are treated across the country. While access to linked administrative data is increasing, it is difficult for workers to make systematic use of historical information about all the children and adults on a single referral call. Jurisdictions around the country are thus increasingly turning to predictive modeling approaches to help distill this rich information. The end result is typically a single risk score reflecting the likelihood of a near-term adverse event. Yet the use of predictive analytics in child welfare remains highly contentious. There is concern that some communities, such as those in poverty or from particular racial and ethnic groups, will be disadvantaged by the reliance on government administrative data. In this talk, I will describe some of the work we have done, both in the lab and in the community, as part of developing, deploying, and evaluating a prediction tool currently in use in the Allegheny County Office of Children, Youth and Families.
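
To make the modeling step concrete, here is a purely illustrative sketch (not the deployed Allegheny County tool; all features, the outcome label, and the 1-20 binning are hypothetical) of how a risk score of this kind can be produced by fitting a standard classifier to historical records:

    # Illustrative only: a generic risk-score pipeline with
    # hypothetical features and outcome, NOT the deployed tool.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    X = rng.normal(size=(n, 4))   # stand-ins for administrative features
    y = (X @ np.array([0.8, 0.5, 0.0, -0.3]) + rng.normal(size=n)) > 1.0

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_tr, y_tr)

    # The "single risk score": predicted probability of a near-term
    # adverse event, coarsely binned into a 1-20 scale for screeners.
    risk = model.predict_proba(X_te)[:, 1]
    score = np.clip(np.ceil(20 * risk), 1, 20).astype(int)
    print(score[:10])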


Suggested Readings:
● Counterfactual risk assessment, evaluation, and fairness
● Toward algorithmic accountability in public services
● Decisions in the presence of erroneous algorithmic scores
● Excerpt from Virginia Eubanks' Automating Inequality

Date and Time: 
Thursday, November 12, 2020 - 2:30pm
Venue: 
Zoom ID 926 9609 8893 (+password)

Statistics Department Seminar presents "On the statistical foundations of adversarially robust learning"

Topic: 
On the statistical foundations of adversarially robust learning
Abstract / Description: 

Robustness has long been viewed as an important desired property of statistical methods. More recently, it has been recognized that complex prediction models such as deep neural nets can be highly vulnerable to adversarially chosen perturbations of their inputs at test time. This area, termed adversarial robustness, has garnered an extraordinary level of attention in the machine learning community over the last few years. However, little is known about even the most basic statistical questions. In this talk, I will present answers to some of them. In particular, I will show how class imbalance has a crucial effect and leads to unavoidable tradeoffs between robustness and accuracy, even in the limit of infinite data (i.e., for the Bayes error). I will also show other results, some involving novel applications of results from robust isoperimetry (Cianchi et al., 2011).
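
For orientation, two standard objects in this literature (general background, not results specific to this talk) are the clean risk and its adversarial counterpart, in which an adversary may shift each test input within an ε-ball:

    % Clean risk and adversarial (robust) risk of a classifier f:
    R(f) = \mathbb{P}\bigl( f(X) \neq Y \bigr),
    \qquad
    R_{\varepsilon}(f) = \mathbb{P}\bigl( \exists\, \delta,\ \|\delta\| \le \varepsilon :\ f(X + \delta) \neq Y \bigr).

A robustness-accuracy tradeoff means that, in general, no single classifier can minimize both risks at once.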

This is joint work with Hamed Hassani, David Hong, and Alex Robey.

Date and Time: 
Tuesday, November 17, 2020 - 4:30pm

Statistics Department Seminar presents "Conditional calibration for false discovery rate control under dependence"

Topic: 
Conditional calibration for false discovery rate control under dependence
Abstract / Description: 

We introduce a new class of methods for finite-sample false discovery rate (FDR) control in multiple testing problems with dependent test statistics, where the dependence is fully or partially known. Our approach separately calibrates a data-dependent p-value rejection threshold for each hypothesis, relaxing or tightening the threshold as appropriate to target exact FDR control. In addition to our general framework, we propose a concrete algorithm, the dependence-adjusted Benjamini–Hochberg (dBH) procedure, which adaptively thresholds the q-value for each hypothesis. Under positive regression dependence, the dBH procedure uniformly dominates the standard BH procedure, and in general it uniformly dominates the Benjamini–Yekutieli (BY) procedure (also known as BH with log correction). Simulations and real-data examples illustrate power gains over competing approaches to FDR control under dependence.
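
For reference, below is a minimal sketch of the baseline BH procedure that dBH dominates (valid under independence or positive regression dependence); the dBH calibration itself is more involved and is not reproduced here:

    import numpy as np

    def benjamini_hochberg(pvals, q=0.1):
        """Standard BH: reject the k smallest p-values, where k is the
        largest index with p_(k) <= k*q/m. (BY replaces q with
        q / (1 + 1/2 + ... + 1/m), the "log correction".)"""
        p = np.asarray(pvals)
        m = len(p)
        order = np.argsort(p)
        below = p[order] <= q * np.arange(1, m + 1) / m
        k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
        rejected = np.zeros(m, dtype=bool)
        rejected[order[:k]] = True
        return rejected

    # Example: 3 strong signals among 20 hypotheses.
    rng = np.random.default_rng(1)
    p = np.concatenate([rng.uniform(0, 0.001, 3), rng.uniform(0, 1, 17)])
    print(benjamini_hochberg(p, q=0.1))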

This is joint work with Lihua Lei.

Joint Colloquium with Berkeley at Stanford

Date and Time: 
Tuesday, November 10, 2020 - 4:30pm

Statistics Department Seminar presents "Some theoretical results on model-based reinforcement learning"

Topic: 
Some theoretical results on model-based reinforcement learning
Abstract / Description: 

We discuss some recent results on model-based methods for reinforcement learning (RL) in both online and offline problems.

For the online RL problem, we discuss several model-based RL methods that adaptively explore an unknown environment and learn to act with provable regret bounds. In particular, we focus on finite-horizon episodic RL where the unknown transition law belongs to a generic family of models. We propose a model-based "value-targeted regression" RL algorithm built on an optimism principle: in each episode, the algorithm constructs the set of models that are "consistent" with the data collected so far. The criterion for consistency is the total squared error the model incurs in predicting values, as determined by the last value estimate, along the observed transitions. The next value function is then chosen by solving the optimistic planning problem with the constructed set of models. We derive a bound on the regret, for an arbitrary family of transition models, using the notion of the so-called Eluder dimension proposed by Russo and Van Roy (2014).
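
Schematically (a paraphrase of the abstract's criterion, with notation introduced here just for illustration), the episode-k model set collects the transition laws whose value predictions fit the observed transitions nearly as well as the best-fitting one:

    % Value-targeted "consistency" set (schematic paraphrase):
    \mathcal{M}_{k} = \bigl\{ P : L_{k}(P) \le \min_{P'} L_{k}(P') + \beta_{k} \bigr\},
    \qquad
    L_{k}(P) = \sum_{(s, a, s') \in \mathcal{D}_{k}}
      \Bigl( \mathbb{E}_{\tilde{s} \sim P(\cdot \mid s, a)} \hat{V}(\tilde{s}) - \hat{V}(s') \Bigr)^{2},

where V-hat is the last value estimate and beta_k a confidence width; optimistic planning then maximizes value over the models in M_k.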


Next, we discuss batch-data (offline) reinforcement learning, where the goal is to predict the value of a new policy using data generated by some behavior policy (which may be unknown). We show that the fitted Q-iteration method with linear function approximation is equivalent to a model-based plugin estimator. We establish that this model-based estimator is minimax optimal and that its statistical limit is determined by a form of restricted chi-square divergence between the two policies.
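
As a hedged sketch of the offline method just described (all names and signatures here are illustrative, not from the talk), linear fitted Q-iteration repeatedly regresses Bellman targets on state-action features:

    import numpy as np

    def fitted_q_iteration(phi, transitions, n_actions, gamma=0.9,
                           iters=50):
        """Sketch of linear fitted Q-iteration (illustrative only).
        phi(s, a) -> feature vector; transitions: list of (s, a, r, s')."""
        d = len(phi(transitions[0][0], 0))
        w = np.zeros(d)
        for _ in range(iters):
            X, y = [], []
            for s, a, r, s_next in transitions:
                # Bellman target under the current linear Q estimate.
                target = r + gamma * max(phi(s_next, b) @ w
                                         for b in range(n_actions))
                X.append(phi(s, a))
                y.append(target)
            # Least-squares fit of Q(s,a) = phi(s,a)'w to the targets;
            # the talk's point is that this iteration coincides with a
            # model-based plugin estimator built from the same data.
            w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        return w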

Date and Time: 
Tuesday, November 3, 2020 - 4:30pm
Venue: 
Registration required

Statistics Department Seminar presents "Screenomics: A playground for the mining and modeling of both "big" and "small" longitudinal data"

Topic: 
Screenomics: A playground for the mining and modeling of both 'big' and 'small' longitudinal data
Abstract / Description: 

We recently developed and put forward a framework for capturing, visualizing, and analyzing the unique record of an individual's everyday digital experiences: screenomics. In our quest to derive knowledge from and understand screenomes – ordered sequences of hundreds of thousands of smartphone and laptop screenshots obtained every five seconds for between one day and six months – the data have become a playground for learning about the computational machinery used to process images and text, machine learning algorithms, human labeling of unknown taxonomies, qualitative inquiry, and the tension between N = 1 and N = many approaches. Using illustrative problems, I share how engagement with these new data is reshaping both how we do analyses and how we study the person-context transactions that drive human behavior.
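
As a rough illustration of the data-collection step, here is a minimal sketch of a screenome-style capture loop (assuming the Pillow and pytesseract packages; the actual Screenomics pipeline, with consent, encryption, upload, and human labeling, is far more elaborate):

    # Minimal sketch of a screenome-style capture loop; illustrative
    # only, not the production Screenomics system.
    import time
    from datetime import datetime

    import pytesseract          # OCR to pull on-screen text
    from PIL import ImageGrab   # screen capture

    def capture_screenome(duration_s=60, interval_s=5):
        frames = []
        for _ in range(duration_s // interval_s):
            img = ImageGrab.grab()                    # one screenshot
            text = pytesseract.image_to_string(img)   # extracted text
            frames.append((datetime.now().isoformat(), text))
            time.sleep(interval_s)
        return frames   # the ordered sequence is the raw "screenome"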

Date and Time: 
Tuesday, October 27, 2020 - 4:30pm

Statistics Department Seminar presents "Two mathematical lessons of deep learning"

Topic: 
Two mathematical lessons of deep learning
Abstract / Description: 

Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection are in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of SGD-based local methods used in training modern models appears to be at odds with standard intuitions on optimization.

First, I will present the evidence, empirical and mathematical, that necessitates revisiting classical notions such as over-fitting. I will then discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the point of interpolation.
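
A hedged numerical sketch of double descent (illustrative, not from the talk): fitting minimum-norm least squares on an increasing number of random ReLU features typically produces a test-error peak near the interpolation threshold p ≈ n, followed by a second descent.

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_test, d = 100, 1000, 5
    X, Xt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
    beta = rng.normal(size=d)
    y = X @ beta + 0.5 * rng.normal(size=n)
    yt = Xt @ beta

    for p in [10, 50, 90, 100, 110, 200, 1000]:   # number of features
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)  # ReLU feats
        w, *_ = np.linalg.lstsq(F, y, rcond=None)  # min-norm solution
        print(p, float(np.mean((Ft @ w - yt) ** 2)))   # test MSE vs. p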

Second, I will discuss why the landscapes of over-parameterized neural networks are essentially never convex, even locally. Yet, they satisfy the local Polyak–Lojasiewicz condition, which allows SGD-type methods to converge to a global minimum.
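
For reference, the PL condition has a standard form: a beta-smooth loss L with global minimum value L* satisfies the condition with parameter mu > 0 on a region if the first inequality below holds there, and gradient descent with step size 1/beta then converges linearly:

    % Polyak-Lojasiewicz (PL) condition and the resulting rate:
    \tfrac{1}{2}\,\|\nabla L(w)\|^{2} \;\ge\; \mu \bigl( L(w) - L^{*} \bigr),
    \qquad
    L(w_{t}) - L^{*} \;\le\; \bigl( 1 - \mu/\beta \bigr)^{t} \bigl( L(w_{0}) - L^{*} \bigr).

Note that no convexity is required, which is what makes the condition compatible with the non-convex landscapes described above.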

A key piece of the puzzle remains: how do these lessons come together to form a complete mathematical picture of modern deep learning?

Date and Time: 
Tuesday, October 20, 2020 - 4:30pm

Statistics Department Seminar presents "Backfitting for large-scale crossed random effects regressions"

Topic: 
Backfitting for large-scale crossed random effects regressions
Abstract / Description: 

Large-scale genomic and electronic commerce data sets often have a crossed random effects structure, arising from genotypes × environments or customers × products. Naive methods of handling such data will produce inferences that do not generalize. Regression models that properly account for crossed random effects can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (2020) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N) under greatly relaxed though still strict sampling assumptions. Empirically, the backfitting algorithm costs O(N) under further relaxed assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
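
A minimal sketch of what one backfitting sweep can look like for the crossed model y = mu + a_i + b_j + noise (illustrative only; the paper's algorithm derives its shrinkage from estimated variance components, whereas fixed ridge-style factors lam_a, lam_b stand in here):

    import numpy as np

    def backfit_crossed(rows, cols, y, n_a, n_b, lam_a=1.0, lam_b=1.0,
                        n_iter=100):
        """Sketch of backfitting for y = mu + a[rows] + b[cols] + noise
        with crossed random effects; each sweep costs O(N) via
        bincount. Illustrative, not the paper's exact algorithm."""
        mu = y.mean()
        a, b = np.zeros(n_a), np.zeros(n_b)
        for _ in range(n_iter):
            r = y - mu - b[cols]                      # partial residuals
            num = np.bincount(rows, weights=r, minlength=n_a)
            cnt = np.bincount(rows, minlength=n_a)
            a = num / (cnt + lam_a)                   # shrunken row means
            r = y - mu - a[rows]
            num = np.bincount(cols, weights=r, minlength=n_b)
            cnt = np.bincount(cols, minlength=n_b)
            b = num / (cnt + lam_b)                   # shrunken col means
        return mu, a, b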

This is based on joint work with Swarnadip Ghosh and Trevor Hastie of Stanford University.

Date and Time: 
Tuesday, October 13, 2020 - 4:30pm

Statistics Department Seminar presents "Berry–Esseen bounds for Chernoff-type nonstandard asymptotics in isotonic regression"

Topic: 
Berry–Esseen bounds for Chernoff-type nonstandard asymptotics in isotonic regression
Abstract / Description: 

A Chernoff-type distribution is a non-normal distribution defined by the slope at zero of the greatest convex minorant of a two-sided Brownian motion with a polynomial drift. While a Chernoff-type distribution appears as the distributional limit in many nonregular estimation problems, the accuracy of Chernoff-type approximations has been largely unknown. In this talk, I will discuss Berry–Esseen bounds for Chernoff-type limit distributions in the canonical nonregular statistical estimation problem of isotonic (or monotone) regression. The derived Berry–Esseen bounds match those of the oracle local average estimator with optimal bandwidth in each scenario of possibly different Chernoff-type asymptotics, up to multiplicative logarithmic factors. Our method of proof differs from standard techniques on Berry–Esseen bounds, and relies on new localization techniques in isotonic regression and an anti-concentration inequality for the supremum of a Brownian motion with a Lipschitz drift.
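
For orientation, here is a classical special case rather than the new results of the talk: under uniform design and i.i.d. errors with variance sigma^2, the isotonic least-squares estimator at an interior point x_0 with f_0'(x_0) > 0 satisfies the cube-root limit

    % Classical cube-root limit for the isotonic LSE:
    n^{1/3}\bigl( \hat{f}_{n}(x_0) - f_{0}(x_0) \bigr)
    \;\xrightarrow{d}\;
    \bigl( 4 \sigma^{2} f_{0}'(x_0) \bigr)^{1/3} \, \mathbb{C},
    \qquad
    \mathbb{C} = \operatorname*{arg\,max}_{t \in \mathbb{R}} \bigl\{ B(t) - t^{2} \bigr\},

where B is a two-sided standard Brownian motion; the slope-at-zero description in the abstract is an equivalent characterization of the same family of laws.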

This talk is based on joint work with Qiyang Han.

Date and Time: 
Tuesday, October 6, 2020 - 4:30pm

Probability Seminar presents "Non-stationary fluctuations for some non-integrable models"

Topic: 
Non-stationary fluctuations for some non-integrable models
Abstract / Description: 

The Kardar-Parisi-Zhang (KPZ) equation is a conjecturally universal model for the dynamics of fluctuating interfaces such as fire fronts and epidemic fronts. This universality was originally justified by Kardar, Parisi, and Zhang via non-rigorous renormalization group calculations. In this talk, we introduce some mathematically rigorous results and take a step toward this universality in the context of some non-integrable interacting particle systems outside their respective invariant measures.
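
For reference, the KPZ equation for the height h(t, x) of a fluctuating interface reads

    % Kardar-Parisi-Zhang equation; xi is space-time white noise:
    \partial_{t} h \;=\; \nu\, \partial_{x}^{2} h
      \;+\; \tfrac{\lambda}{2}\, (\partial_{x} h)^{2}
      \;+\; \sqrt{D}\, \xi ,

with smoothing coefficient nu, tilt dependence lambda, and noise strength D.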

Date and Time: 
Monday, November 16, 2020 - 4:00pm
