

Innovate--Collaborate--Discover
O'Reilly Open Source Convention
Sheraton San Diego Hotel, San Diego, CA
July 23-27, 2001
Tutorial
Data Munging
Damian Conway, Thoughtstream
Track: Perl Conference 5
Date: Monday, July 23
Time: 8:45am - 5:15pm
Location: Grande Ballroom C
Target audience:
Novice Perl programmers who are familiar with simple I/O and
variables, and who want a deeper insight into the techniques
of Perl's "core business": extraction, manipulation, and
reporting of data.
What attendees will learn:
This tutorial will show you how to use a range of standard Perl
features and numerous CPAN modules to read in, decipher, process,
and reformat ASCII text data.
You will learn:
- how regular expressions work, and how to make them
work better for you,
- how to balance nested brackets and match delimiters,
- how to recognize and process common text formats like
CSV and HTML,
- how to preprocess archived and encoded text formats like
(g)zip, tar, uuencoding, and MIME, as well as binary formats,
- how to handle ambiguity and errors when processing
text,
- how to convert your processed data back into readable
text, in either fixed or floating formats,
- how to extract, process, and generate simple natural
language data.
Tutorial outline:
- Getting at the data in the first place
  - Un(g)zipping, untarring, uudecoding, demiming
    - Compress::Zlib
    - Archive::Tar
    - Convert::UU
    - MIME tools
  - Handling file inclusions
- Regular expressions
  - How they work
  - How they're used (m//, s///, split, grep)
  - How to build them (easily)
  - Common regexps and Regexp::Common
- Some useful modules for decoding particular formats
  - Text::CSV_XS for comma separated values
  - Text::Balanced for delimiters, brackets, and tags
  - HTML::TreeBuilder for HTML
  - unpack and vec for binary formats
- Simple transformations
  - Text::Tabs
  - Text::Autoformat
- Fuzzy processing
  - String::Approx and String::EditDistance
  - Text::Soundex and Text::Metaphone
- Natural language
  - Lingua::EN::Inflect
  - Lingua::EN::Infinitive
  - Lingua::EN::Squeeze
- Reporting
  - printf and sprintf
  - Text::Autoformat::form()
  - Interpolation

© 2001, O'Reilly Media, Inc.