Parsing is the process of detecting and verifying the structure of incoming data, and then converting that data into a form that a program can use conveniently.
This full-day tutorial introduces beginner and intermediate Perl programmers to the wide range of parsing mechanisms available in Perl, and explains specific techniques for parsing data in a variety of commonly used formats. Most examples are based on typical parsing problems encountered in bioinformatics.
The techniques covered include:
- simple parsing with regexes
- linear parsing with state machines
- piecewise parsing with extractors
- structured parsing with grammars
- processing comma-separated text
- dealing with XML and other tagged formats
- dealing with BLAST output and other heterogeneous structured formats
- handling queries in synthetic and natural languages
- extracting data structures from structured data
- processing file inclusions
- coping with incomplete, malformed, and ambiguous data
- selecting and using appropriate parsing tools from CPAN
- integrating parsing and object-oriented programming
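
To give a flavor of the first two techniques listed above, here is a minimal sketch that combines a regex with a two-state machine to read FASTA-style sequence records, a common bioinformatics format. It is not taken from the tutorial materials: the function name parse_fasta and the record fields id, desc, and seq are illustrative assumptions, and the error handling is deliberately simple.

```perl
use strict;
use warnings;

# Parse FASTA-style text into a list of { id, desc, seq } records.
# (parse_fasta and the field names are hypothetical, chosen for illustration.)
# The "state" is whether a header line has been seen yet: $current is
# undef before the first record and points at the open record afterwards.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    my $current;

    for my $line (split /\n/, $text) {
        $line =~ s/\r\z//;           # tolerate Windows line endings
        next if $line =~ /^\s*$/;    # skip blank lines

        if ($line =~ /^>(\S*)\s*(.*)$/) {
            # Header line: open a new record
            $current = { id => $1, desc => $2, seq => '' };
            push @records, $current;
        }
        elsif (defined $current) {
            # Sequence line: drop embedded whitespace and append
            (my $chunk = $line) =~ s/\s+//g;
            $current->{seq} .= $chunk;
        }
        else {
            # Sequence data with no header is treated as malformed input
            die "Malformed input: sequence data before first header\n";
        }
    }
    return \@records;
}

# Usage: print each record's identifier and sequence length
my $records = parse_fasta(">seq1 sample record\nACGT\nTTGA\n>seq2\nGGCC\n");
printf "%s\t%d\n", $_->{id}, length $_->{seq} for @$records;
```

A hand-rolled loop like this is adequate for simple line-oriented formats. For nested or ambiguous input, a grammar-based parser or an existing CPAN module is usually the better choice.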
During the tutorial we will also consider the following uses for the techniques described:
- data mining (parsing as a data recognition tool)
- error detection and consistency checking (parsing as a data validation tool)
- structured I/O (parsing as a data acquisition tool)
- recognition and extraction (parsing as a data search tool)
- hierarchical data processing (parsing as a data transformation tool)
- task-specific languages (parsing as a command specification tool)