Christopher Lee, UCLA Bioinformatics
Track: Bioinformatics Overview
Date: Wednesday, February 05
Time: 11:30am - 12:15pm
Location: California Ballroom A & B
Parker argues for the significance of a fundamental shift in bioinformatics, from in-the-small to in-the-large thinking. Adopting a large-scale perspective is one way to address problems endemic to the world of the small, such as integrating many hard-to-integrate tools.
In their 1976 paper "Programming In the Large Versus Programming In the Small," which helped shape the field of software engineering, DeRemer and Kron distinguished two scales of system issues. In-the-small issues lie within the comprehension of one person and the jurisdiction of a single craft; they include algorithms, specific mathematical models, and individual expertise. In-the-large issues concern relationships and structures that span component boundaries and cross jurisdictions; they include connections between different kinds of data and disciplines, models of how tools interact and change, and the ability to answer complex queries across all of the data. The distinction between in-the-small and in-the-large matters in many disciplines, not only in software engineering.
Most research in bioinformatics today focuses on analysis; i.e., on reducing individual problems to manageable proportions with specialized tools. This in-the-small focus has served the field well so far. However, some in-the-large challenges appear critical:
- The enormous mass and complexity of bioinformatics data.
- The need for strong data integration.
- The predominance of complex but crucial queries that combine diverse types and quantities of data.
Parker posits that addressing these challenges is not merely a matter of degree (that is, "we need better tools"), but requires a qualitative change in perspective.
Using concrete examples, Parker shows that a "make"-like system can address large-scale bioinformatics challenges. Such a system can maintain dependencies among the results of analyses and lift one's level of thinking from hacking to biology, and it can also serve as an in-the-large query language. The discussion covers the usefulness of this in-the-large perspective in two case studies: GeneMine, an interactive data-mining tool that helps biologists analyze gene and protein structure and function; and a single nucleotide polymorphism (SNP) discovery system, which has produced about a third of all coding-region SNPs currently known in the human genome.
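The abstract does not describe the system's interface, so the following is only a minimal sketch of the general make idea applied to analysis results, assuming in-memory values and logical timestamps in place of files and modification times. Every name in it (`Pipeline`, `Rule`, `build`, `variant_positions`, and the targets `reads`, `alignment`, and `snps`) is hypothetical and is not taken from GeneMine or the SNP discovery system. Asking for a target resembles posing a query: the tracker works out which upstream analyses are stale and reruns only those.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    deps: List[str]                  # names of the results this target depends on
    action: Callable[..., object]    # computes the target from its dependencies' values


class Pipeline:
    """A toy make-like tracker for analysis results (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.values: Dict[str, object] = {}
        self.stamp: Dict[str, int] = {}  # logical time at which each value was produced
        self.clock = 0
        self.log: List[str] = []         # targets recomputed, in order

    def rule(self, target: str, deps: List[str], action: Callable[..., object]) -> None:
        self.rules[target] = Rule(list(deps), action)

    def set_input(self, name: str, value: object) -> None:
        # Raw data enters the system; its new stamp makes everything downstream stale.
        self.clock += 1
        self.values[name] = value
        self.stamp[name] = self.clock

    def build(self, target: str) -> object:
        if target not in self.rules:
            if target not in self.values:
                raise KeyError(f"no rule or input for {target!r}")
            return self.values[target]
        rule = self.rules[target]
        # Bring every dependency up to date first (recursive, like make).
        args = [self.build(d) for d in rule.deps]
        stale = target not in self.stamp or any(
            self.stamp[d] > self.stamp[target] for d in rule.deps
        )
        if stale:
            self.clock += 1
            self.values[target] = rule.action(*args)
            self.stamp[target] = self.clock
            self.log.append(target)
        return self.values[target]


def variant_positions(aln: List[str]) -> List[int]:
    # Columns where the aligned sequences disagree (a crude stand-in for SNP calling).
    return [i for i in range(len(aln[0])) if len({s[i] for s in aln}) > 1]


p = Pipeline()
p.set_input("reads", ["ACGT", "ACGA", "ACGT"])
p.rule("alignment", ["reads"], sorted)  # hypothetical stand-in for a real aligner
p.rule("snps", ["alignment"], variant_positions)

first = p.build("snps")
p.build("snps")  # nothing is stale, so nothing reruns
p.set_input("reads", ["ACGT", "ACGT", "TCGT"])
second = p.build("snps")
print(first, second, p.log)  # [3] [0] ['alignment', 'snps', 'alignment', 'snps']
```

Like make, this sketch decides staleness from timestamps alone, so it recomputes downstream results even when a rebuilt dependency happens to produce an identical value, and it performs no cycle detection. Storing a content hash per result would avoid the redundant recomputation.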