Roger Magoulas, Director Market Research, O'Reilly Media, Inc.
Track: MySQL
Date: Wednesday, July 28
Time: 1:45pm - 3:20pm
Location: Salon I
For many organizations, commodity hardware, faster disks, and open source software have turned building a data warehouse into a question of resources and design rather than cost. A robust analysis infrastructure can now be built on an open source platform without compromising performance or functionality.
This talk will cover a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.
Why open source? Besides the cost savings, open source lets you build on what your staff already knows (tools like Perl, SQL, and Apache) instead of buying, and hiring for, the proprietary tools that dominate the commercial market.
Data Warehouse Architecture:
- Consolidated Data Store (CDS)
- Process to condition, correlate and transform data
- Multi-topic data marts
  - dimensional models
- Multi-channel data access
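Here is a minimal sketch of the "condition, correlate and transform" step. It is written in Python, although the talk itself uses Perl. It assumes two hypothetical source feeds, signups and orders, that can be correlated on a normalized email address. The feed names, field names, and function names are illustrative only.

```python
def condition(record):
    """Clean one source record: trim whitespace and normalize case."""
    return {
        "email": record["email"].strip().lower(),
        # Collapse runs of internal whitespace, then title-case the name.
        "name": " ".join(record["name"].split()).title(),
    }


def correlate(orders, signups):
    """Attach the conditioned signup name to each order, matching on email.

    Orders with no matching signup keep name=None so they are not lost.
    """
    by_email = {}
    for raw in signups:
        rec = condition(raw)
        by_email[rec["email"]] = rec
    combined = []
    for raw in orders:
        email = raw["email"].strip().lower()
        customer = by_email.get(email)
        combined.append({
            "email": email,
            "name": customer["name"] if customer else None,
            "amount": raw["amount"],
        })
    return combined
```

For example, a signup for `" Ann@Example.com "` named `"ann   lee"` becomes `ann@example.com` / `Ann Lee` and is matched to an order placed under `ann@example.com`.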
Open Source Components
Database: MySQL
- fast, effective
Data Movement: Perl/DBI/SQL
- flexible data access
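Perl's DBI provides one connect/execute/fetch interface across databases. Python's DB-API 2.0 (PEP 249) fills the same role, so the sketch below uses it. Two in-memory sqlite3 databases stand in for a source system and the consolidated data store. The table names and batch size are hypothetical, and the sketch assumes the insert statement uses the driver's placeholder style (`?` for sqlite3; MySQL drivers typically use `%s`).

```python
import sqlite3


def move_rows(src, dst, select_sql, insert_sql, batch_size=500):
    """Copy rows from a source connection to a target in fixed-size batches.

    Works with any DB-API 2.0 connections, provided insert_sql uses the
    target driver's placeholder style.
    """
    src_cur = src.cursor()
    dst_cur = dst.cursor()
    src_cur.execute(select_sql)
    moved = 0
    while True:
        batch = src_cur.fetchmany(batch_size)
        if not batch:
            break
        dst_cur.executemany(insert_sql, batch)
        moved += len(batch)
    dst.commit()
    return moved


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    cds = sqlite3.connect(":memory:")
    cds.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL)")
    n = move_rows(src, cds, "SELECT id, amount FROM orders",
                  "INSERT INTO stg_orders VALUES (?, ?)")
    print(n)  # prints 2
```

Fetching in batches keeps memory use flat on large extracts, and the single commit at the end keeps each load all-or-nothing.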
Data Access: Perl/Apache/SQL
- Template Toolkit for ad hoc SQL
- Perl hashes for crosstabs/pivots
- Perl for reports
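The hash-based crosstab technique accumulates totals in a nested hash keyed by row and column values. Below is a minimal sketch of the same idea in Python, with a dict of dicts standing in for the Perl hash. The (row, column, value) input shape and the function names are assumptions for illustration.

```python
from collections import defaultdict


def crosstab(rows):
    """Pivot (row_key, col_key, value) tuples into a nested dict.

    Mirrors the Perl idiom $t{$row}{$col} += $value.
    """
    table = defaultdict(lambda: defaultdict(float))
    cols = set()
    for row_key, col_key, value in rows:
        table[row_key][col_key] += value
        cols.add(col_key)
    return table, sorted(cols)


def render(table, cols):
    """Format the pivot as tab-separated text; missing cells show 0."""
    lines = ["\t".join([""] + [str(c) for c in cols])]
    for row_key in sorted(table):
        # .get() does not trigger the defaultdict factory, so no empty cells are created
        cells = [f"{table[row_key].get(c, 0):g}" for c in cols]
        lines.append("\t".join([str(row_key)] + cells))
    return "\n".join(lines)
```

Feeding it `("Books", "Q1", 12.0)`, `("Books", "Q1", 1.0)`, `("Books", "Q2", 8.0)`, and `("Ebooks", "Q1", 3.0)` yields a two-row, two-column grid with Books/Q1 = 13 and Ebooks/Q2 = 0.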
Dimensional Model
- organizes data for queries and navigation from detail to summary
- normalized fact table for quantitative data
- denormalized dimensions with descriptive data
- conformed dimensions shared by multiple fact tables
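Below is a minimal star-schema sketch of the dimensional model described above. A narrow fact table holds only dimension keys and numeric measures. Each denormalized dimension carries its descriptive attributes, including the whole date hierarchy, in a single row. An SQL-92 join query then rolls detail up to a summary. sqlite3 stands in for MySQL here, and all table names, columns, and rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    title TEXT, category TEXT, format TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date (date_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    units INTEGER, revenue REAL);
"""


def build(conn):
    """Create the hypothetical schema and load a few sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)", [
        (20040105, "2004-01-05", "2004-01", "2004-Q1", 2004),
        (20040412, "2004-04-12", "2004-04", "2004-Q2", 2004)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
        (1, "Title A", "Programming", "Print"),
        (2, "Title B", "Databases", "Print"),
        (3, "Title B", "Databases", "Ebook")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (20040105, 1, 10, 400.0), (20040105, 2, 5, 250.0),
        (20040412, 2, 3, 150.0), (20040412, 3, 4, 120.0)])


def quarterly_by_category(conn):
    """Navigate from detail to summary with explicit SQL-92 JOIN ... ON syntax."""
    return conn.execute("""
        SELECT d.quarter, p.category, SUM(f.units), SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_date d ON f.date_key = d.date_key
        JOIN dim_product p ON f.product_key = p.product_key
        GROUP BY d.quarter, p.category
        ORDER BY d.quarter, p.category""").fetchall()
```

Swapping `d.quarter` for `d.month`, or `p.category` for `p.format`, changes the summary level without touching the fact table. That flexibility is the point of keeping descriptive attributes in the dimensions.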
Performance Considerations
- configuration
- indexing
- SQL-92 joins
- aggregate tables and aggregate navigation
The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.