Peter Covitz, NCI Center for Bioinformatics
Date: Wednesday, February 05
Time: 4:45pm - 5:30pm
Location: California Ballroom A & B
caCORE is a synthesis of open source technologies and bioinformatics workflows that supports data management, access, and semantic standardization for genomic and clinical research. Cancer Bioinformatics Infrastructure Objects (caBIO) provide an integrated UML model with a JavaBean-based implementation. The NCI Enterprise Vocabulary Services (EVS) provide controlled vocabularies, thesauri, and ontologies that link the varied terminology found in the object data. The Cancer Data Standards Repository (caDSR) provides a metadata management system for NCI research data standards. Together, these technologies offer a broad framework for managing, integrating, accessing, and analyzing a variety of biomedical information. The architecture offers Java, SOAP-XML, and HTTP-XML programming interfaces to cancer research data hosted at the NCI, and will soon support access to and presentation of local user data. caCORE is available at ncicb.nci.nih.gov/core.
Covitz discusses his presentation and the future of bioinformatics:
“I think the coolest thing about my presentation is that it shows just how far you can get when you combine modern enterprise software development approaches with open source philosophy. At the NCI Center for Bioinformatics, we host one of the few object-oriented Web service APIs to large bioinformatics data sets. We provide structured programmatic access to our huge cancer research data sets without having to replicate the data locally. Just call our server from your code, and we'll be there.
“I hope the presentation will encourage people to make their own data available through a Web services API as well. As a community, we really do need to start coming together on this issue. A thousand Web sites without any common type of programmatic access other than screen scraping is an untenable situation that must change.
“What fascinates me about the field is the inherent tension between local and global imperatives: Bioinformaticists must operate close to the sources of the data they work with, and to the scientists who want to ask biological questions of the data, in order to keep their activities relevant. And yet in order to get their work done, they must also integrate large volumes of data collected in far-away places. Further, they are asked to provide their results to the entire world, not just through publication and casual communication, but with live electronic systems that are expected to be ‘available’ all the time.
“Meeting these expectations is often beyond the means of single individuals, and thus team science is rapidly becoming an operating norm. My prediction (by no means original) is that team science is going to spread from the large genome project type efforts down into the smaller laboratory settings. However, true collaboration does not come naturally to many life scientists, who usually hone their skills on individual lab projects during graduate and post-doctoral training. A contentious cultural shift may be required.”