OSCON 2005 Session
Smoking Out Corporate Malfeasance with Open Source Software
Bob Mason, Lead Programmer, Tobacco Documents Library, UCSF Library
Track: Linux
Date: Wednesday, August 3rd, 2005
Time: 1:45pm - 2:30pm
Location: Portland 252
The Master Settlement Agreement of 1998 opened the tobacco industry's internal records and research to the public. The UCSF Library was tasked with organizing 42 million pages of low-quality TIFF images into a searchable database, so that the many interested communities (public health, law, political science, medical science, corporate history, and the public at large) would have the best possible access to these documents.
With only modest resources for buying or developing software, UCSF turned to about a dozen open source products and libraries to build a database, a search engine, and an Optical Character Recognition (OCR) "farm" (or grid) that make this huge corpus of difficult material searchable and accessible. The OCR grid uses the spare cycles of PCs on campus, in labs and classrooms, to convert images into searchable text and PDFs. At its peak, the grid processed a million pages a day.
The resulting site offers a rare and often astonishingly revealing look at corporate power in action, and it shows how socially important and technologically interesting systems can be built with open source tools.