Open Source Data Warehouses
Roger Magoulas, Director Market Research, O'Reilly Media, Inc.
Date: Thursday, August 4th, 2005
Time: 5:20pm - 6:05pm
Commodity hardware, faster and cheaper disks, and open source software have made building a data warehouse more a matter of resources and design than of cost for many organizations. A robust analysis infrastructure can be built from open source components without compromising performance or functionality.
This talk covers a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.
Why open source? Aside from the cost savings, open source lets you leverage what your staff already knows--tools like Perl, SQL and Apache--rather than having to procure and staff for the proprietary tools that dominate the commercial space. Topics include:
Data Warehouse Architecture:
Consolidated Data Store (CDS)
Condition, correlate and transform data
Multi-topic data marts
Multi-channel data access
Open Source Components
Open Source Database Choices
Data Movement: SQL/Perl/DBI
- fast, flexible
Data Access and Presentation: Ajax/Java/Perl/Ruby/SQL
- Template Toolkit for ad hoc SQL
- Perl reporting; Spreadsheet::WriteExcel
- Visualization techniques
- Graphing Tools
Analysis: R (CRAN)
Organizing data for analysis
Natural language processing for creating category hierarchies
Aggregate tables and aggregate navigation
Using statistical techniques from other disciplines
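To make the dimensional modeling and aggregate navigation topics above concrete, here is a minimal sketch. It is not taken from the talk: it uses Python's built-in sqlite3 module as a stand-in for the Perl/DBI and open source database stack the abstract names, and every table, column, and function name (dim_date, fact_sales, agg_sales_month, units_by) is hypothetical, as is the sample data.

```python
import sqlite3


def build_warehouse(conn):
    """Create a tiny star schema: one fact table keyed to one date dimension."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            day      TEXT NOT NULL,
            month    TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
            product  TEXT NOT NULL,
            units    INTEGER NOT NULL
        );
    """)
    # Illustrative rows only; not real data.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
        (1, "2005-07-30", "2005-07"),
        (2, "2005-07-31", "2005-07"),
        (3, "2005-08-01", "2005-08"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (1, "book", 3), (2, "book", 5), (3, "book", 2), (3, "pdf", 7),
    ])


def build_monthly_aggregate(conn):
    """Precompute a month-level aggregate table from the detailed facts."""
    conn.execute("""
        CREATE TABLE agg_sales_month AS
        SELECT d.month AS month, f.product AS product, SUM(f.units) AS units
        FROM fact_sales f JOIN dim_date d USING (date_key)
        GROUP BY d.month, f.product
    """)


def units_by(conn, grain):
    """Simple aggregate navigation: answer from the coarsest table that fits."""
    if grain == "month":
        # The monthly aggregate already holds this grain; skip the fact table.
        sql = ("SELECT month, SUM(units) FROM agg_sales_month "
               "GROUP BY month ORDER BY month")
    elif grain == "day":
        # Daily detail exists only in the fact table.
        sql = ("SELECT d.day, SUM(f.units) "
               "FROM fact_sales f JOIN dim_date d USING (date_key) "
               "GROUP BY d.day ORDER BY d.day")
    else:
        raise ValueError(f"unsupported grain: {grain}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
build_monthly_aggregate(conn)
monthly = units_by(conn, "month")
daily = units_by(conn, "day")
print(monthly)
print(daily)
```

The point of the sketch is the routing in units_by: callers ask for a grain, and the query layer decides whether a precomputed aggregate can serve it. A production aggregate navigator would more likely drive that choice from metadata about the available aggregate tables than from hard-coded branches.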
The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.