Motivated by problems in single-cell flow cytometry, we conduct an axiomatic study of the problem of estimating the strength of a known causal relationship between a pair of continuous variables. We propose that an estimate of causal strength should be based on the conditional distribution of the effect given the cause (and not on the driving distribution of the cause), and study dependence measures on conditional distributions. Shannon capacity, appropriately regularized, emerges as a natural measure under these axioms. We examine the problem of calculating Shannon capacity from samples and design a simple, consistent and efficient fixed k-nearest neighbor estimator. The estimator strongly outperforms the state of the art in single-cell flow cytometry analytics in terms of sample complexity.
An important component of this design involves efficient estimation of mutual information between a pair of random variables from i.i.d. samples drawn from an unknown joint density. The most popular mutual information estimator is the one proposed by Kraskov, Stögbauer, and Grassberger (KSG) in 2004; it is nonparametric and based on the distance from each sample to its k-th nearest neighboring sample, where k is a fixed small integer. Despite its widespread use (it is part of scientific software packages), the theoretical properties of this estimator have been largely unexplored. We demonstrate that the KSG estimator is consistent and identify an upper bound on the rate of convergence of its bias as a function of the number of samples. We argue that the superior performance of the KSG estimator stems from a curious "correlation boosting" effect, and we build on this intuition to modify the KSG estimator in novel ways to construct a superior estimator.
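For readers unfamiliar with the KSG estimator, the following is a minimal sketch, not the speaker's implementation. It assumes the commonly cited first form of the estimator, I = psi(k) + psi(N) - mean(psi(n_x + 1) + psi(n_y + 1)), with max-norm distances in the joint space; the abstract itself does not state the formula. The function names, the brute-force O(N^2) neighbor search, and the default k = 3 are illustrative choices. The result is in nats.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg_mutual_information(xs, ys, k=3):
    """Sketch of a KSG-style mutual information estimate (in nats).

    xs, ys: equal-length sequences of scalar samples.
    k: fixed small number of nearest neighbors (assumed default).
    """
    n = len(xs)
    if n != len(ys) or n <= k:
        raise ValueError("need equal-length samples with more than k points")

    # Table of psi(1..n) so each lookup is O(1); index 0 is unused.
    psi = [0.0] * (n + 1)
    psi[1] = -EULER_GAMMA
    for m in range(2, n + 1):
        psi[m] = psi[m - 1] + 1.0 / (m - 1)

    total = 0.0
    for i in range(n):
        # Max-norm distance from sample i to every other sample in (x, y) space.
        dists = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor

        # Count points strictly closer than eps in each marginal space.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += psi[n_x + 1] + psi[n_y + 1]

    return psi[k] + psi[n] - total / n


if __name__ == "__main__":
    # Correlated Gaussian pair, for which the true value is -0.5 * ln(1 - rho^2).
    random.seed(0)
    rho = 0.9
    xs = [random.gauss(0.0, 1.0) for _ in range(400)]
    ys = [rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0) for x in xs]
    print("KSG estimate:", ksg_mutual_information(xs, ys, k=3))
    print("closed form :", -0.5 * math.log(1.0 - rho ** 2))
```

The demo compares the estimate against the closed-form mutual information of a bivariate Gaussian. A practical implementation would replace the brute-force search with a k-d tree or similar structure.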
Pramod Viswanath received the Ph.D. degree in EECS from the University of California at Berkeley in 2000. From 2000 to 2001, he was a member of the research staff at Flarion Technologies, NJ. Since 2001, he has been on the faculty of the University of Illinois at Urbana-Champaign in Electrical and Computer Engineering, where he is currently a professor. He is a coauthor, with David Tse, of the text Fundamentals of Wireless Communication, which has been used in over 60 institutions around the world. He is the inventor of opportunistic beamforming and co-designer of Flash-OFDM, designs that have evolved into fourth-generation wireless technologies. In the past, he has worked on information theory with applications to wireless communications, networking, and data and meta-data privacy. His current, evolving research interests include machine learning with applications to NLP.