O'Reilly European Open Source Convention - October 17-20, 2005 - Amsterdam, The Netherlands


Open Source Data Warehouses
Roger Magoulas, Director Market Research, O'Reilly Media, Inc.

Track: Databases
Date: Thursday, 20 October 2005
Time: 10:45 - 11:30
Location: Foyer Room

Commodity hardware, faster and cheaper disks, and open source software now make building a data warehouse more of a resource and design issue than a cost consideration for many organizations. A robust analysis infrastructure can be built from open source components without compromising performance or functionality.

This talk covers a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.

Why open source? Aside from the cost savings, open source lets you leverage what your staff already knows (tools like Perl, SQL, and Apache) rather than having to procure and staff for the proprietary tools that dominate the commercial space.

Topics include:

Data Warehouse Architecture:

  • Consolidated Data Store (CDS)
  • Condition, correlate and transform data
  • Multi-topic data marts
  • Dimensional models
  • Multi-channel data access
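
The flow from consolidated data store to data mart can be sketched roughly as follows. This is only an illustration, assuming Python's standard-library sqlite3 module in place of the SQL/Perl/DBI stack the talk names; the table name mart_sales, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def condition_row(raw):
    """Condition one raw row: trim text, normalise case, cast types."""
    order_id, country, amount = raw
    return (int(order_id), country.strip().upper(), round(float(amount), 2))


def load_mart(conn, raw_rows):
    """Load conditioned rows into a (hypothetical) data mart table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mart_sales "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO mart_sales VALUES (?, ?, ?)",
        [condition_row(r) for r in raw_rows],
    )
    conn.commit()


# Hypothetical raw rows as they might arrive in the consolidated data store.
raw_rows = [("1", " nl ", "19.50"), ("2", "NL", "5"), ("3", "de", "7.5")]

conn = sqlite3.connect(":memory:")
load_mart(conn, raw_rows)

# Once conditioned, inconsistent spellings of the same country roll up together.
totals = dict(
    conn.execute("SELECT country, SUM(amount) FROM mart_sales GROUP BY country")
)
```

Keeping the conditioning step as a plain function makes it easy to test separately from the load.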

Open Source Components:

  • Open Source Database Choices
  • Data Movement: SQL/Perl/DBI
    - fast, flexible
  • Data Access and Presentation: Ajax/Java/Perl/Ruby/SQL
    - Template Toolkit for ad hoc SQL
    - Perl reporting; Spreadsheet::WriteExcel
    - Visualization techniques
    - Graphing tools
  • Analysis: R (CRAN)

Dimensional Model:

  • Organizing data for analysis
  • Natural language processing for creating category hierarchies
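
As a minimal sketch of what organizing data for analysis can look like in a dimensional model, the example below builds a small star schema: one fact table keyed to two dimension tables. It assumes Python's standard-library sqlite3 module; the tables fact_sales, dim_product, and dim_date and all sample values are hypothetical, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes, including hierarchy levels.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, day TEXT, month TEXT
);
-- The fact table holds measures plus one key per dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    units       INTEGER,
    revenue     REAL
);
INSERT INTO dim_product VALUES
    (1, 'Perl book', 'Books'),
    (2, 'SQL book', 'Books'),
    (3, 'Conference pass', 'Events');
INSERT INTO dim_date VALUES
    (20051017, '2005-10-17', '2005-10'),
    (20051020, '2005-10-20', '2005-10');
INSERT INTO fact_sales VALUES
    (1, 20051017, 2, 60.0),
    (2, 20051020, 1, 40.0),
    (3, 20051020, 1, 500.0);
""")


def revenue_by_category(conn):
    """Roll revenue up to the category level of the product dimension."""
    sql = """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        INNER JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(sql).fetchall()
```

Calling revenue_by_category(conn) groups the facts by a dimension attribute; swapping in dim_date and its month column would give a rollup by time instead.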

Performance Considerations:

  • Configuration
  • Indexing
  • SQL-92 joins
  • Data partitioning
  • Aggregate tables and aggregate navigation
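
Aggregate tables and aggregate navigation can be illustrated with a small sketch: a pre-computed monthly summary sits beside the detailed fact table, and a helper routes each query to the smallest table that can answer it. This assumes Python's standard-library sqlite3 module; the table names, the CANDIDATES list, and the pick_table and revenue_by helpers are hypothetical and only show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (day TEXT, month TEXT, product TEXT, revenue REAL);
INSERT INTO fact_sales VALUES
    ('2005-09-30', '2005-09', 'book', 10.0),
    ('2005-10-17', '2005-10', 'book', 20.0),
    ('2005-10-20', '2005-10', 'pass', 30.0);
CREATE INDEX idx_fact_sales_month ON fact_sales (month);
-- Pre-computed aggregate at the monthly grain.
CREATE TABLE agg_sales_month AS
    SELECT month, SUM(revenue) AS revenue FROM fact_sales GROUP BY month;
""")

# Grouping columns each table can answer for, smallest table first.
CANDIDATES = [
    ("agg_sales_month", {"month"}),
    ("fact_sales", {"day", "month", "product"}),
]


def pick_table(group_by):
    """Return the first (smallest) table that has every requested column."""
    if not group_by:
        raise ValueError("at least one grouping column is required")
    for table, columns in CANDIDATES:
        if set(group_by) <= columns:
            return table
    raise ValueError(f"no table covers {group_by}")


def revenue_by(conn, group_by):
    """Sum revenue by the given columns, using an aggregate when possible."""
    table = pick_table(group_by)  # also rejects unknown column names
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(revenue) FROM {table} GROUP BY {cols} ORDER BY {cols}"
    return table, conn.execute(sql).fetchall()
```

A query grouped by month is served from agg_sales_month, while one grouped by product falls back to fact_sales. The caller never names a table, which is the point of aggregate navigation.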

Analysis Examples:

  • Using statistical techniques from other disciplines

The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.


    © 2005, O'Reilly Media, Inc.