

Practical Tools For Innovation
O'Reilly Bioinformatics Technology Conference
January 28-31, 2002 -- Tucson, AZ


Parsing in Perl for Bioinformatics

Damian Conway, Thoughtstream

Track: Bioinformatics Tutorials
Date: Monday, January 28
Time: 8:30am - 5:00pm
Location: Canyon II

Parsing is the process of detecting and verifying the structure of incoming data and then processing that data to make it available to a program in a convenient form.

This full-day tutorial will introduce beginner and intermediate Perl programmers to the wide range of parsing mechanisms available in Perl and explain specific techniques for parsing data in a variety of commonly used formats. Most examples will be based on typical parsing problems encountered in bioinformatics.

The techniques covered include:

  • simple parsing with regexes
  • linear parsing with state machines
  • piece-wise parsing with extractors
  • structured parsing with grammars
  • processing comma-separated text
  • dealing with XML and other tagged formats
  • dealing with BLAST output and other heterogeneous structured formats
  • handling queries in synthetic and natural languages
  • extracting data structures from structured data
  • processing file inclusions
  • coping with incomplete, malformed, and ambiguous data
  • selecting and using appropriate parsing tools from the CPAN
  • integrating parsing and object-oriented programming
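
As a rough illustration of the first two techniques in the list (regexes plus a simple line-by-line state machine), the sketch below parses FASTA-formatted text. It is not taken from the tutorial materials: the parse_fasta function, the record field names (id, desc, seq), and the sample sequences are all hypothetical, and the accepted residue alphabet is an assumption made for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the tutorial): parse FASTA-formatted text
# into a list of records, each a hash with id, desc, and seq fields.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    my $current;    # record being built; undef until the first header

    for my $line (split /\n/, $text) {
        $line =~ s/\s+\z//;     # drop trailing whitespace, including \r
        next if $line eq '';    # ignore blank lines

        if ($line =~ /\A>(\S+)\s*(.*)\z/) {
            # Header line: start a new record
            $current = { id => $1, desc => $2, seq => '' };
            push @records, $current;
        }
        elsif (!defined $current) {
            # Verification: data may not appear before any header
            die "Sequence data before first header: '$line'\n";
        }
        elsif ($line =~ /\A[A-Za-z*-]+\z/) {
            # Sequence line: append residues to the current record
            # (assumed alphabet: letters, '*' and '-')
            $current->{seq} .= uc $line;
        }
        else {
            die "Malformed sequence line: '$line'\n";
        }
    }
    return @records;
}

# Made-up sample input for demonstration only
my $input = <<'END_FASTA';
>seq1 example protein fragment
MKTAYIAK
QRQISFVK
>seq2
acgtacgt
END_FASTA

for my $rec (parse_fasta($input)) {
    printf "%s (%d residues): %s\n",
        $rec->{id}, length $rec->{seq}, $rec->{seq};
}
```

The parser dies on the first malformed line, which is only one possible policy; a real tool might instead collect errors and continue, or hand the job to an established CPAN module.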

During the tutorial we will also consider the following uses for the techniques described:

  • data mining (parsing as a data recognition tool)
  • error detection and consistency checking (parsing as a data validation tool)
  • structured I/O (parsing as a data acquisition tool)
  • recognition and extraction (parsing as a data search tool)
  • hierarchical data processing (parsing as a data transformation tool)
  • task specific languages (parsing as a command specification tool)

© 2001, O'Reilly Media, Inc.