O'Reilly Open Source Convention - August 1-5, 2005 - Portland, Oregon

Session

Open Source Data Warehouses
Roger Magoulas, Director Market Research, O'Reilly Media, Inc.

Track: Databases
Date: Thursday, August 4th, 2005
Time: 5:20pm - 6:05pm
Location: E148

Commodity hardware, faster and cheaper disks, and open source software have made building a data warehouse more a question of resources and design than of cost for many organizations. A robust analysis infrastructure can be built from open source components without compromising performance or functionality.

This talk covers a proven analysis architecture, the open source tool options for each architecture component, the basics of dimensional modeling, and a few tricks of the trade.

Why open source? Aside from the cost savings, open source lets you build on what your staff already knows (tools like Perl, SQL, and Apache) rather than having to procure and staff for the proprietary tools that dominate the commercial space.

Topics include:

Data Warehouse Architecture:

  • Consolidated Data Store (CDS)
  • Condition, correlate and transform data
  • Multi-topic data marts
  • Dimensional models
  • Multi-channel data access

Open Source Components:

  • Open Source Database Choices
  • Data Movement: SQL/Perl/DBI
    - fast, flexible
  • Data Access and Presentation: Ajax/Java/Perl/Ruby/SQL
    - Template Toolkit for ad hoc SQL
    - Perl reporting; Spreadsheet::WriteExcel
    - Visualization techniques
    - Graphing Tools
  • Analysis: R (CRAN)
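
To make the "Data Movement: SQL/Perl/DBI" item concrete, here is a minimal sketch of a driver-based load step. The talk names Perl and DBI; this example substitutes Python's standard `sqlite3` module purely as a stand-in, and the table names (`raw_orders`, `clean_orders`), the columns, and the cleaning rules are hypothetical, not taken from the presentation.

```python
import sqlite3


def move_orders(conn):
    """Copy rows from a raw staging table into a cleaned table.

    Conditioning rules here are illustrative assumptions: trim and
    lowercase the customer name, drop rows with a missing amount.
    """
    rows = conn.execute(
        "SELECT order_id, customer, amount FROM raw_orders"
    ).fetchall()
    cleaned = [
        (order_id, customer.strip().lower(), round(float(amount), 2))
        for order_id, customer, amount in rows
        if amount not in (None, "")
    ]
    # One transaction for the whole batch; bind parameters, not string pasting.
    with conn:
        conn.executemany(
            "INSERT INTO clean_orders (order_id, customer, amount) "
            "VALUES (?, ?, ?)",
            cleaned,
        )
    return len(cleaned)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount TEXT);
CREATE TABLE clean_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO raw_orders VALUES (1, '  Alice ', '10.50'), (2, 'BOB', ''), (3, 'Carol', '7');
""")
moved = move_orders(conn)
```

The same shape (select, condition in the scripting language, bulk insert inside one transaction) should carry over to a DBI script, though the exact calls differ.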

Dimensional Model:

  • Organizing data for analysis
  • Natural language processing for creating category hierarchies
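
As an illustration of organizing data for analysis, the sketch below builds a small star schema: one fact table surrounded by dimension tables, queried with explicit `JOIN ... ON` syntax. It uses Python's standard `sqlite3` module as a convenient stand-in database; every table name, column, and row is a made-up example rather than anything from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes used to slice the data.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, title TEXT, category TEXT);

-- The fact table holds measures plus one key per dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_date VALUES (20050801, 2005, 8), (20050901, 2005, 9);
INSERT INTO dim_product VALUES (1, 'Book A', 'Databases'), (2, 'Book B', 'Perl');
INSERT INTO fact_sales VALUES
    (20050801, 1, 3, 90.0),
    (20050801, 2, 1, 25.0),
    (20050901, 1, 2, 60.0);
""")

# Roll revenue up to (category, month) by joining facts to their dimensions.
revenue_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```

A category hierarchy, however it is derived, would typically live as extra columns on a dimension table such as `dim_product` here, so that reports can group at any level without touching the fact table.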

Performance Considerations:

  • Configuration
  • Indexing
  • SQL-92 joins
  • Data partitioning
  • Aggregate tables and aggregate navigation
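
One way to picture aggregate tables and aggregate navigation: precompute a summary table at a coarser grain, then route each query to the smallest table that can still answer it. The sketch below is an assumption-laden illustration using Python's standard `sqlite3` module; the table names, the `TABLES` registry, and the `pick_table` and `revenue_by` helpers are hypothetical and not from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (day TEXT, month TEXT, product TEXT, revenue REAL);
INSERT INTO fact_sales VALUES
    ('2005-08-01', '2005-08', 'A', 10.0),
    ('2005-08-02', '2005-08', 'A', 15.0),
    ('2005-08-02', '2005-08', 'B', 5.0),
    ('2005-09-01', '2005-09', 'A', 20.0);

-- Aggregate table: the same measure, pre-summed at month/product grain.
CREATE TABLE agg_sales_month AS
    SELECT month, product, SUM(revenue) AS revenue
    FROM fact_sales
    GROUP BY month, product;
CREATE INDEX idx_agg_month ON agg_sales_month (month);
""")

# Candidate tables, smallest (coarsest grain) first, with the columns each offers.
TABLES = [
    ("agg_sales_month", {"month", "product"}),
    ("fact_sales", {"day", "month", "product"}),
]


def pick_table(columns):
    """Return the first (smallest) table that has every requested column."""
    needed = set(columns)
    if not needed:
        raise ValueError("at least one grouping column is required")
    for name, available in TABLES:
        if needed <= available:
            return name
    raise ValueError(f"no table provides columns {sorted(needed)}")


def revenue_by(columns):
    """Sum revenue grouped by the given columns, using the cheapest table."""
    table = pick_table(columns)  # also validates the column names
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, SUM(revenue) FROM {table} "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return table, conn.execute(sql).fetchall()


monthly = revenue_by(["month"])  # answered from agg_sales_month
daily = revenue_by(["day"])      # falls back to fact_sales
```

Because `pick_table` only accepts column names listed in the registry, interpolating them into the SQL string is safe in this sketch; values supplied by users should still go through bind parameters.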

Analysis Examples:

  • Using statistical techniques from other disciplines

The presentation should provide you with the basic architecture, toolkit, design principles, and strategy for building an effective open source data warehouse.



