Given a large cohort of "similar" cases Z1, . . . , ZN, one can construct an efficient statistical inference procedure by 'learning from the experience of others' (also known as "borrowing strength" from the ensemble). But what if, instead, we observe a massive database in which each case is accompanied by extra contextual information (usually in the form of covariates), making the Zi's non-exchangeable? It is not obvious how to go about borrowing strength when each piece of information is fuzzy and heterogeneous. Moreover, if we include irrelevant cases, borrowing information can severely damage the quality of the inference! This raises some fundamental questions: when (not) to borrow? from whom (not) to borrow? how (not) to borrow? These questions lie at the heart of the "Problem of Relevance" in statistical inference – a puzzle that has remained largely unaddressed since its inception nearly half a century ago (Efron and Morris, 1971, 1972; Efron, 2019). The purpose of this talk is to present an attempt to develop basic principles and practical guidelines for tackling some of the unsettled issues surrounding the relevance problem. Through examples, we will demonstrate how our new statistical perspective answers previously unanswerable questions in a realistic and feasible way.
This is joint work with my former student Kaijun Wang, who is currently a postdoctoral research fellow at Fred Hutchinson Cancer Research Center.