O'Reilly Open Source Convention

All Your Texts Are Belong To Us - Hacking Literature With Perl
Maciej Ceglowski, Web Applications Developer, National Institute for Technology and Liberal Education

Track: Perl
Date: Thursday, July 29
Time: 5:20pm - 6:05pm
Location: Salon H


In 1996, Don Foster correctly identified Joe Klein as the author of the bestselling political novel "Primary Colors," bringing instant notoriety to himself and to the sub-branch of statistical natural language processing he called 'literary forensics.' Since then, Internet search engines, open source databases, and enormous digitization efforts like Project Gutenberg have made it easier than ever to unleash computers on text, with fascinating results.

This talk will show how to apply algorithms from fields as diverse as graph theory, signal processing, and information retrieval to literary texts, unlocking their secrets without forcing the programmer to do any actual reading. Whether tracing thematic connections, figuring out whether an author really wrote a given passage, or creating Cliff's Notes-like summaries of long novels, computers can fake a surprising degree of literary acumen. Many of the half-forgotten natural language processing techniques from the 1970s turn out to be perfectly suited to literary analysis, and quite simple to implement.
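
As a rough illustration of the authorship question mentioned above, here is a minimal sketch of one common approach: compare how often a passage uses common function words against samples from known authors, and pick the closest match by cosine similarity. It is not the toolkit presented in the talk, and it is written in Python rather than Perl. The word list and the names `FUNCTION_WORDS`, `profile`, `cosine`, and `likeliest_author` are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words (an assumption for
# this sketch); a serious comparison would use a much longer list.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "was", "but", "with"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def likeliest_author(passage, samples):
    """Return the key in `samples` whose text is most similar to the passage.

    `samples` maps an author name to a string of that author's known writing.
    """
    target = profile(passage)
    return max(samples, key=lambda name: cosine(target, profile(samples[name])))
```

Calling `likeliest_author(passage, {"Author A": text_a, "Author B": text_b})` returns the name whose sample scores highest. With samples this crude the result is only suggestive; longer samples and a larger word list would be needed before drawing any conclusion.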

Come see our open source literary toolkit in action, learn about clever ways to play with natural language, and help bring us closer to the goal of replacing the graduate student in literature with a small Perl script.

© 2004, O'Reilly Media, Inc.