Aaron Mackey, University of Virginia
Track: Bioinformatics 2003 Tutorial
Date: Monday, February 03
Time: 1:30pm - 5:00pm
Location: California Ballroom C
With the explosion of genome data and protein, gene, and functional database resources, bioinformaticists need to organize and interact with large datasets from diverse sources. Many public databases include cross-references between their own data and related data in other databases. However, it is not yet possible, through any public interface, to access an integrated version of these cross-referenced datasets, nor to add one's own private data. The need to manage genome-scale data has prompted some sophisticated researchers to build their own local databases, integrating only those datasets specific to their research needs. Mackey believes "boutique" relational databases must become more ubiquitous in research labs and core support facilities.
Mackey scrutinizes the design and use of relational databases for this purpose, and introduces the entity-relationship model of data management that allows one to exploit the cross-referencing information of public databases for novel analyses. He builds successively more complicated database models, discussing the schema design decisions behind each structure and giving example usages. He also covers advanced concepts for modelling hierarchical and time-dependent data, presents database models spanning multiple domains of bioinformatics data with demonstrations of their use, and reviews a few publicly available database frameworks for bioinformatics applications.
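As a rough illustration of what exploiting cross-references in a small local database could look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from the tutorial: the table names (gene, external_db, xref), the database names, and the accessions are all hypothetical placeholders.

```python
import sqlite3

# In-memory database; all table names and rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE external_db (
    db_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
-- One row per cross-reference from a local gene to a record in an external database.
CREATE TABLE xref (
    gene_id   INTEGER NOT NULL REFERENCES gene(gene_id),
    db_id     INTEGER NOT NULL REFERENCES external_db(db_id),
    accession TEXT NOT NULL,
    PRIMARY KEY (gene_id, db_id, accession)
);
""")
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [(1, "geneA"), (2, "geneB")])
conn.executemany("INSERT INTO external_db VALUES (?, ?)",
                 [(1, "protein_db"), (2, "ontology_db")])
conn.executemany("INSERT INTO xref VALUES (?, ?, ?)",
                 [(1, 1, "P0001"), (1, 2, "T0001"), (2, 1, "P0002")])


def genes_in_both(conn, db_a, db_b):
    """Return symbols of genes cross-referenced in both named databases."""
    rows = conn.execute("""
        SELECT g.symbol
        FROM gene g
        JOIN xref xa        ON xa.gene_id = g.gene_id
        JOIN external_db da ON da.db_id = xa.db_id AND da.name = ?
        JOIN xref xb        ON xb.gene_id = g.gene_id
        JOIN external_db dz ON dz.db_id = xb.db_id AND dz.name = ?
        GROUP BY g.symbol
        ORDER BY g.symbol
    """, (db_a, db_b)).fetchall()
    return [r[0] for r in rows]


print(genes_in_both(conn, "protein_db", "ontology_db"))
```

Keeping cross-references in their own table, rather than as columns on the gene table, means a new external source can be added as rows instead of a schema change.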
Elaborates Mackey: “Biological data presents special challenges to a relational database schema designer who may be used to more traditional data scenarios. In bioinformatics, the data is often ‘dirty,’ can exist in a tree-like hierarchy (or more complicated graph structures), and changes often over time. The climax of the tutorial will be a discussion of sophisticated approaches to storing and efficiently manipulating these difficult datasets. Attendees should be comfortable about considering building and using their own custom relational databases, tailored to their specific research goals.”
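One common way to handle tree-like and time-dependent data in a relational schema is a self-referencing parent column combined with validity dates. The sketch below is an assumption-laden illustration and not necessarily an approach presented in the tutorial; the term and annotation tables, their columns, and all rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: a term tree (adjacency list) and dated gene annotations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES term(term_id)  -- NULL marks the root
);
CREATE TABLE annotation (
    gene       TEXT NOT NULL,
    term_id    INTEGER NOT NULL REFERENCES term(term_id),
    valid_from TEXT NOT NULL,  -- ISO dates compare correctly as strings
    valid_to   TEXT            -- NULL means still current
);
""")
conn.executemany("INSERT INTO term VALUES (?, ?, ?)", [
    (1, "cell", None), (2, "nucleus", 1), (3, "nucleolus", 2), (4, "membrane", 1),
])
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)", [
    ("geneA", 3, "2002-01-01", None),
    ("geneB", 4, "2001-06-01", "2002-06-01"),  # superseded annotation
    ("geneB", 2, "2002-06-01", None),
])


def genes_under(conn, root_name, as_of):
    """Genes annotated to root_name or any descendant term, as of a given date."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(term_id) AS (
            SELECT term_id FROM term WHERE name = ?
            UNION ALL
            SELECT t.term_id FROM term t JOIN subtree s ON t.parent_id = s.term_id
        )
        SELECT DISTINCT a.gene
        FROM annotation a
        JOIN subtree s ON s.term_id = a.term_id
        WHERE a.valid_from <= ?
          AND (a.valid_to IS NULL OR a.valid_to > ?)
        ORDER BY a.gene
    """, (root_name, as_of, as_of)).fetchall()
    return [r[0] for r in rows]


print(genes_under(conn, "nucleus", "2002-01-15"))
print(genes_under(conn, "nucleus", "2003-01-01"))
```

The recursive query walks the tree at query time, which is simple but can be slow on deep hierarchies; precomputing ancestor-descendant pairs is a frequent alternative when the tree changes rarely.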
What does Mackey feel is the most intriguing aspect of bioinformatics? “In a word, synergy; the integration of genomic, proteomic, and expression analyses in the context of controlled vocabularies, such as those embodied by the Gene Ontology project and related structured anatomic and subcellular locational ‘maps,’ will drive new areas of biological research as never before, bringing bioinformatics back to the wet-bench.”