Mark-Jason Dominus, Plover Systems Co.
Track: Perl
Date: Tuesday, July 08
Time: 8:45am - 12:15pm
Location: Salon F
It can be hard to predict what a regex will do. Almost everyone has
written a regex that failed to match something they wanted it to, or
that matched something they thought it wouldn't. That will never
happen to you again after you take this tutorial.
We will explore the algorithm that Perl uses internally to perform
regex matching. Understanding this algorithm will allow us to predict
whether a regex will match, which of several matches Perl will find,
and which regexes will be faster than others. Along the way, we'll
pause to discuss practical applications that illustrate features
of the algorithm. We'll examine the essential but usually
misunderstood concept of 'greed', and we'll learn why commonly used
regex symbols like '.', '$', and '\1' might not mean what you thought
they did.
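As a small taste of these surprises, here is a minimal sketch of how greed, '.', '$', and '\1' behave in Perl. The sample strings and variable names are made up for illustration and are not taken from the tutorial materials.

```perl
use strict;
use warnings;

# Greed: a greedy quantifier grabs as much as it can, then backs off
# only as far as needed for the rest of the regex to match.
my $text = 'a "first" and a "second" string';
my ($greedy) = $text =~ /"(.*)"/;     # captures: first" and a "second
my ($lazy)   = $text =~ /"(.*?)"/;    # captures: first

# '.' does not match a newline unless the /s modifier is in effect.
my $two_lines = "one\ntwo";
my $dot_plain = $two_lines =~ /one.two/  ? 1 : 0;   # 0
my $dot_s     = $two_lines =~ /one.two/s ? 1 : 0;   # 1

# '$' can match just before a trailing newline, not only at the very
# end of the string; '\z' matches only at the very end.
my $dollar  = "abc\n" =~ /abc$/  ? 1 : 0;   # 1
my $slash_z = "abc\n" =~ /abc\z/ ? 1 : 0;   # 0

# '\1' matches the text that group 1 actually captured,
# not "anything the group's pattern could match".
my $same_char = "aa" =~ /^(\w)\1$/ ? 1 : 0;   # 1
my $diff_char = "ab" =~ /^(\w)\1$/ ? 1 : 0;   # 0

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "dot: $dot_plain, dot with /s: $dot_s\n";
print "\$: $dollar, \\z: $slash_z\n";
print "backref same: $same_char, different: $diff_char\n";
```

Each result follows from the same left-to-right, backtracking search: the engine tries the longest option first for a greedy quantifier and gives characters back only when the rest of the pattern fails.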
In the second section, we'll apply our knowledge of the internals,
examining several common disasters, a few practical parsing
applications, and some new features that would have been hard to
understand before. We'll see an example of every regex metacharacter
and modifier. We'll finish with a discussion of some of the new
optimizations that were added in Perl 5.6, and why you should avoid
the '/i' modifier.
Part I: Inside the Regex Engine. Regular Expressions are Programs;
Backtracking; Quantifiers; Greed; Assertions; Backreferences.
Part II: Disasters and Optimizations. Where machines come from;
Disaster examples; Regex modifiers; Tokenizing; New optimizations;
Matching strings with balanced parentheses.
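To give a flavor of the tokenizing topic, here is a minimal sketch of one common Perl idiom: scanning a string with '\G' and the '/gc' modifiers. The tokenize helper, the token names, and the toy grammar are hypothetical, chosen only for illustration, and may differ from what the tutorial presents.

```perl
use strict;
use warnings;

# Hypothetical helper: split a simple arithmetic expression into
# [TYPE => text] tokens. \G anchors each match at the current pos(),
# and /c keeps pos() in place when a match attempt fails, so the
# next alternative is tried at the same offset.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    while (pos($input) < length $input) {
        if    ($input =~ /\G\s+/gc)             { next }   # skip whitespace
        elsif ($input =~ /\G(\d+)/gc)           { push @tokens, [NUM   => $1] }
        elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, [IDENT => $1] }
        elsif ($input =~ /\G([-+*\/()=])/gc)    { push @tokens, [OP    => $1] }
        else {
            die sprintf "Unexpected character at offset %d\n", pos($input);
        }
    }
    return @tokens;
}

my @tokens  = tokenize('x = 3 * (y + 42)');
my $summary = join ' ', map { "$_->[0]:$_->[1]" } @tokens;
print "$summary\n";
# prints: IDENT:x OP:= NUM:3 OP:* OP:( IDENT:y OP:+ NUM:42 OP:)
```

Because every branch is anchored with '\G', the engine never skips ahead looking for a match, so unrecognized input is reported at the exact offset where it occurs.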