ISL Colloquium and IT-Forum present "Compression for biological data analysis"

Friday, February 1, 2019 - 1:15pm
Packard 202
William Yu (Harvard Medical School)
Abstract / Description: 

Compression has for decades served primarily the utilitarian purpose of enabling easier storage and transmission of data. Here, however, I show how compression can be used to better understand biological processes and assist in data analysis.

First, I will demonstrate the relationship between lossy compression and understanding the perceptual characteristics of downstream agents. Quartz, my lossy compression program for next-generation sequencing quality scores, counterintuitively improves SNP calling despite discarding 95% of quality scores, which shows that variant callers are oversensitive to sequencer noise. More recently, I developed HyperMinHash, a lossy floating-point compression of the popular MinHash Jaccard index sketch, which reduces the space complexity from log(n) to loglog(n) by exploiting the fact that MinHash cares less about large hash values than about small ones.
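To make the MinHash idea above concrete, here is a minimal sketch in Python. It is not the HyperMinHash implementation. The function names (`hash64`, `minhash`, `jaccard_estimate`, `to_float_code`), the choice of BLAKE2b as the hash, and the 6-bit mantissa are all assumptions made for illustration. The first part estimates the Jaccard index from per-seed minimum hashes. The second part shows one way to store a minimum in a floating-point-like form: a leading-zero count plus a few bits after the first 1 bit, so small hash values keep their relative precision while large ones are coarsened.

```python
import hashlib

HASH_BITS = 64  # assumed hash width for this illustration


def hash64(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items, k=128):
    """Plain MinHash signature: the minimum hash under each of k seeds."""
    return [min(hash64(x, seed) for x in items) for seed in range(k)]


def jaccard_estimate(sig_a, sig_b):
    """Fraction of signature positions on which the two sketches agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def to_float_code(h: int, mantissa_bits: int = 6):
    """Encode a hash as (leading-zero count, bits following the first 1 bit).

    Illustrative only: this mimics a floating-point exponent/mantissa split.
    """
    if h == 0:
        return (HASH_BITS, 0)
    leading_zeros = HASH_BITS - h.bit_length()
    rest_len = h.bit_length() - 1           # bits after the leading 1
    rest = h & ((1 << rest_len) - 1)
    if rest_len >= mantissa_bits:
        mantissa = rest >> (rest_len - mantissa_bits)
    else:
        mantissa = rest << (mantissa_bits - rest_len)
    return (leading_zeros, mantissa)


set_a = {f"s{i}" for i in range(100)}
set_b = {f"s{i}" for i in range(50, 150)}   # true Jaccard index is 50/150
sig_a, sig_b = minhash(set_a), minhash(set_b)
print("full-width estimate:", jaccard_estimate(sig_a, sig_b))

short_a = [to_float_code(h) for h in sig_a]
short_b = [to_float_code(h) for h in sig_b]
print("shortened-code estimate:", jaccard_estimate(short_a, short_b))
```

Because distinct hashes can map to the same shortened code, the second estimate can only be equal to or higher than the first; a real implementation would need to correct for those accidental collisions.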

In the second part of this talk, I show how we exploit the compressive structure of biological data to speed up similarity search. I prove that, by organizing the database to facilitate clustered search, our time complexity scales with metric entropy (the number of covering hyperspheres) if the fractal dimension of the dataset is low. This is the key insight behind our compressively accelerated versions of standard tools in genomics (CORA, 10-100x speedup for all-mapping of NGS reads), metagenomics (MICA, 3.5x speedup over Diamond), and chemical informatics (Ammolite, 150x speedup over SMSD).
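The clustered search described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code behind CORA, MICA, or Ammolite: the greedy covering in `build_clusters`, the Euclidean metric, and all names are hypothetical. The database is covered by hyperspheres of a fixed radius; a query first compares against the sphere centers only, and the triangle inequality guarantees that any sphere whose center is farther than `query_radius + radius` from the query cannot contain a match, so its members are skipped.

```python
import math


def dist(a, b):
    """Euclidean distance; any true metric would work here."""
    return math.dist(a, b)


def build_clusters(points, radius):
    """Greedy cover: each point joins the first center within `radius`,
    otherwise it becomes a new center."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, radius, query, query_radius):
    """Return all points within `query_radius` of `query`.

    Coarse step: keep only clusters that could contain a hit.
    Fine step: scan the members of the surviving clusters.
    """
    hits = []
    for center, members in clusters:
        if dist(query, center) <= query_radius + radius:
            hits.extend(m for m in members if dist(query, m) <= query_radius)
    return hits


points = [(x * 0.1, y * 0.1) for x in range(20) for y in range(20)]
cluster_radius = 0.3
clusters = build_clusters(points, cluster_radius)
query, query_radius = (0.55, 0.55), 0.25
hits = range_search(clusters, cluster_radius, query, query_radius)
brute = [p for p in points if dist(query, p) <= query_radius]
print(len(clusters), "clusters for", len(points), "points;", len(hits), "hits")
```

The coarse step costs one distance computation per cluster, which is why the number of covering spheres, rather than the number of database entries, dominates the running time when few clusters survive the filter.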


Yun William Yu is a Research Fellow in the Department of Biomedical Informatics at Harvard Medical School, where he works on sketching and streaming algorithms for aggregate patient medical records. He received a BS in mathematics and a BA in chemistry from Indiana University, and completed an MRes in biomedical physical chemistry and an MPhil in mathematics at Imperial College London on a Marshall Scholarship. Supported by a Hertz Fellowship, he did his PhD in applied mathematics under Professor Bonnie Berger at the Massachusetts Institute of Technology.