Statistics Department Seminar “Identifying condition-specific patterns in large-scale genomic data”

Tuesday, June 23, 2020 - 4:30pm
Zoom ID 941 3461 3493 (+password)
Qunhua Li (Pennsylvania State University)
Abstract / Description: 

Joint analysis of genomic datasets obtained under multiple conditions is essential for understanding the biological mechanisms that drive tissue specificity and cell differentiation. However, such analysis remains computationally challenging even when the number of conditions is moderate.

I will present CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition specificity present in genomic data by leveraging pairwise information. CLIMB provides a generic framework that facilitates a host of downstream analyses, such as clustering genomic features that share similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. It improves upon existing methods by boosting statistical power to identify biologically meaningful signals while retaining interpretability and computational tractability. We illustrate CLIMB's value on a CTCF ChIP-seq dataset measured in 17 different cell populations and an RNA-seq dataset measured in three committed hematopoietic lineages. These analyses demonstrate that CLIMB captures biologically relevant clusters in the data and improves upon the pairwise comparisons and unsupervised clustering commonly used in genomic analyses.