Christopher Dwan, University of Minnesota
Track: End-User Applications
Date: Wednesday, February 05
Time: 11:30am - 12:15pm
Location: California Ballroom C
Over the past several years, Dwan has helped users run BLAST millions of times. No matter what’s used -- specialized hardware, the UNIX command line, or a Web interface -- the issues and misconceptions remain the same. Dwan demonstrates and analyzes "stupid BLAST tricks," a small sampling of the ways in which a user can create thoroughly confusing results using BLAST. Dwan does not report on bugs or implementation issues: the direct consequences of the algorithm itself are far more interesting, and far more confusing.
An example comes from comparative genomics: running BLAST "out of the box" against a variety of chromosome- and genome-sized targets quickly reveals that whether regions of high similarity are reported depends on the scale of the sequences being compared. This can be explained in part by the statistics of the BLAST heuristic: while a perfect 8-mer match is a rare thing among sequences a few thousand residues in length, several chance 8-mer matches are to be expected among entire eukaryotic chromosomes. Combined with the BLAST alignment generation algorithm (which anchors to these supposedly rare 8-mers), confusion can reign.
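The scale dependence can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the presentation; it assumes a uniformly random nucleotide target (four equally likely letters) and asks how often one fixed 8-mer would turn up in it by chance alone. The function name `expected_word_hits` and the example target lengths are hypothetical, chosen only for illustration.

```python
def expected_word_hits(target_length: int, word_size: int = 8,
                       alphabet_size: int = 4) -> float:
    """Expected chance occurrences of one fixed word in a random target.

    Assumption (not from the talk): every letter is independent and
    equally likely, so a given word matches at any one position with
    probability alphabet_size ** -word_size.
    """
    # Number of positions at which a word of this size can start.
    positions = max(target_length - word_size + 1, 0)
    return positions / alphabet_size ** word_size


if __name__ == "__main__":
    # Illustrative target sizes only; real sequences are not uniformly random.
    targets = [
        ("few-kb sequence", 3_000),
        ("chromosome-sized target", 100_000_000),
    ]
    for label, length in targets:
        print(f"{label:>24}: {expected_word_hits(length):12.3f} expected chance hits")
```

Under these assumptions the expected count is well below one for a target of a few thousand residues and well above a thousand for a target of a hundred million, which is the sense in which a seed word stops being rare once the target reaches chromosome scale. Real genomes contain repeats and biased composition, so actual counts will differ from this idealized estimate.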
What does Dwan hope attendees will take away from his presentation? “Everybody who does sequence-based bioinformatics, in any capacity at all, has used BLAST. This presentation shows some of the ways that I have seen it misused and abused over the past few years. I hope that most people will nod their heads at some point and say ‘I wondered why that didn't work out right ... now I know.’
“While BLAST is a great and useful tool, it was designed for a very specific task. Its algorithm encapsulates assumptions that may or may not be true for any other task. I hope that people will come away with a better feel for when they're getting bogus results based on a perfectly legal use of the tool. They may also get some hints on how to work around that limitation to get accurate results and advance their research.”
Dwan’s prediction for a bioinformatics breakthrough: “When someone figures out the genetic mechanics behind cellular differentiation in eukaryotes, it will be a great day. Today, we have a pretty firm grasp on how to find the genes and their protein products that make up most organisms. We still have very little insight (beyond observing that it happens, and wondering how) into how these genes are differentially regulated to produce the hundreds of different types of cells that make up a large organism.”