Advanced Techniques for Parsing
Mark-Jason Dominus, Chief Programmer, Plover Systems Co.
Date: Monday, July 23
Time: 1:30pm - 5:00pm
Location: Portland 252
Parsing is the task of analyzing unstructured inputs, such as character strings, and transforming them into structured data, such as databases or hierarchies. Nearly every program has to parse input. Perl provides some built-in operators for parsing, but they go only so far. And although CPAN contains several excellent parsing modules, they are all fundamentally limited.
In this class, we will see how to build a parsing system that is unlimited in extent. The basic idea is to construct modular tools that can assemble simple parsers into more complex ones. When complex parsers are built from simple components, parsing code is powerful, flexible, and maintainable. Parsers are written directly in Perl, not in a separate language.
Some tools are generic, and are useful in building nearly any parser, but we'll also see how to build special-purpose parser-constructing tools as we need them to solve parsing problems that are specific to an application. Topics will include:
- Building lexers and emulating Perl's '
- Very simple parsers
- Writing functions to combine simple parsers into more complex ones
- Recursive descent
- Case study: parsing regular expressions
- Case study: parsing outlines and trees
- Parsers that diagnose and recover from erroneous input
- Operator overloading
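
The idea of assembling simple parsers into more complex ones can be sketched in a few dozen lines of Perl. This is only an illustrative sketch, not the code presented in the class: the function names (lex, token, sequence, alternate, transform, evaluate) and the convention that a parser is a closure taking a token list and a position are assumptions made for this example. The grammar handles only integers, '+', '*', and parentheses.

```perl
use strict;
use warnings;

# Lexer (hypothetical helper): turn a string into a list of [TYPE, VALUE] tokens.
sub lex {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /\G\s*(?:(\d+)|([-+*()]))/gc) {
        push @tokens, defined $1 ? ['NUM', $1] : ['OP', $2];
    }
    $input =~ /\G\s*\z/gc
        or die "Unexpected input at offset " . (pos($input) // 0) . "\n";
    return \@tokens;
}

# Assumed convention: a parser is a closure called as $parser->($tokens, $pos).
# It returns ($value, $new_pos) on success and the empty list on failure.

# Very simple parser: match one token of the given type (and value, if given).
sub token {
    my ($type, $want) = @_;
    return sub {
        my ($tokens, $pos) = @_;
        my $tok = $tokens->[$pos] or return;
        return unless $tok->[0] eq $type;
        return if defined $want && $tok->[1] ne $want;
        return ($tok->[1], $pos + 1);
    };
}

# Combinator: run the parsers in order; the value is an array ref of their values.
sub sequence {
    my @parsers = @_;
    return sub {
        my ($tokens, $pos) = @_;
        my @values;
        for my $p (@parsers) {
            my ($value, $next) = $p->($tokens, $pos) or return;
            push @values, $value;
            $pos = $next;
        }
        return (\@values, $pos);
    };
}

# Combinator: try each parser at the same position; the first success wins.
sub alternate {
    my @parsers = @_;
    return sub {
        my ($tokens, $pos) = @_;
        for my $p (@parsers) {
            my @result = $p->($tokens, $pos);
            return @result if @result;
        }
        return;
    };
}

# Combinator: post-process the value produced by a parser.
sub transform {
    my ($parser, $fn) = @_;
    return sub {
        my ($value, $next) = $parser->(@_) or return;
        return ($fn->($value), $next);
    };
}

# Recursive descent: the rules refer to each other through delayed wrappers.
#   expr   := term '+' expr | term
#   term   := factor '*' term | factor
#   factor := NUM | '(' expr ')'
my ($expr, $term, $factor);
my $Expr   = sub { $expr->(@_) };
my $Term   = sub { $term->(@_) };
my $Factor = sub { $factor->(@_) };

$factor = alternate(
    token('NUM'),
    transform(sequence(token('OP', '('), $Expr, token('OP', ')')),
              sub { $_[0][1] }),
);
$term = alternate(
    transform(sequence($Factor, token('OP', '*'), $Term),
              sub { $_[0][0] * $_[0][2] }),
    $Factor,
);
$expr = alternate(
    transform(sequence($Term, token('OP', '+'), $Expr),
              sub { $_[0][0] + $_[0][2] }),
    $Term,
);

# Parse and evaluate a whole string; reject leftover tokens.
sub evaluate {
    my ($input) = @_;
    my $tokens = lex($input);
    my ($value, $pos) = $expr->($tokens, 0) or die "Parse error\n";
    die "Trailing input\n" unless $pos == @$tokens;
    return $value;
}

print evaluate("2 + 3 * (4 + 1)"), "\n";    # prints 17
```

Because each grammar rule is an ordinary Perl value, a new rule is just another call to sequence or alternate. A special-purpose combinator for one application is written the same way as the generic ones.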