O'Reilly Bioinformatics Technology Conference.

Practical Innovation at BioCon 2003

Poster Session

GenBankParser: When all you want is to parse the flat files.

By John Kloss
Programmer, Database Administrator, Systems Administrator, Genome Sequencing Center, Washington University Medical School in St. Louis

GenBankParser is a simple parser for NCBI's GenBank Flat File Format. Parsing is performed by recursive descent, driven by the structure of the format rather than by particular fields. Fields and subfields are handled by separate, disjoint parsers, which allows GenBankParser to adjust quickly to new or changed field formats in subsequent GenBank releases. Access to the parsed information is provided through a user-defined callback function and an intuitive "point and ask for it" interface, in which accessor methods are named after the fields and subfields of the GenBank Flat File Format. The user selects field and subfield parsers at compile time, when the necessary accessor and mutator functions are generated. Unlike other GenBank flat file parsers, GenBankParser parses only the fields in which the user is interested. All other information is ignored, which provides a significant speed advantage when only a few fields in the flat file matter. In individual tests against the GenBank flat file parsing methods provided by the BioPerl distribution, GenBankParser was ten to fifteen times faster at generating common parsed output such as FASTA format for nucleic acid or protein sequences.
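
The idea of parsing only the fields of interest and handing each record to a callback can be sketched in a few lines of Python. This is an illustration only, not GenBankParser's actual code or interface: the names parse_records, parse_accession, parse_origin, and to_fasta are hypothetical, the sample record is made up, and field parsers are chosen at run time here rather than at compile time as the abstract describes.

```python
import io

def parse_accession(lines):
    # ACCESSION line: the keyword followed by the accession number.
    return lines[0].split()[1]

def parse_origin(lines):
    # Sequence lines look like "        1 gatcctccat atacaacggt".
    # Skip the ORIGIN keyword line, then drop position numbers and spaces.
    return "".join("".join(line.split()[1:]) for line in lines[1:])

def parse_records(handle, field_parsers, callback):
    """Call callback(record) once per record.

    Only fields whose keyword appears in field_parsers are collected
    and parsed; the lines of every other field are skipped.
    """
    record, keyword, block = {}, None, []
    for raw in handle:
        line = raw.rstrip("\n")
        starts_field = bool(line) and not line[0].isspace()
        if starts_field and keyword in field_parsers:
            # A new top-level line closes the previous selected field.
            record[keyword.lower()] = field_parsers[keyword](block)
        if line.startswith("//"):
            # End of record: deliver it and start over.
            callback(record)
            record, keyword, block = {}, None, []
        elif starts_field:
            keyword, block = line.split()[0], [line]
        elif keyword in field_parsers:
            # Continuation line of a selected field.
            block.append(line)

# Made-up record that loosely follows the flat file layout.
SAMPLE = """\
LOCUS       DEMO0001     30 bp    DNA
DEFINITION  Hypothetical example record.
ACCESSION   DEMO0001
ORIGIN
        1 gatcctccat atacaacggt
       21 ccgacatgag
//
"""

fasta = []

def to_fasta(record):
    # User-defined callback: format each parsed record as FASTA.
    fasta.append(f">{record['accession']}\n{record['origin']}")

selected = {"ACCESSION": parse_accession, "ORIGIN": parse_origin}
parse_records(io.StringIO(SAMPLE), selected, to_fasta)
print("\n".join(fasta))
```

Because DEFINITION has no registered parser in this sketch, its lines are never collected or parsed, which is the source of the speed advantage the abstract describes.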

© 2002, O'Reilly Media, Inc.