Session

Big Data and the Open Warehouse
Roger Magoulas, Director, Market Research, O'Reilly Media, Inc.

Track: Insight
Date: Tuesday, 19 September 2006
Time: 11:55 - 12:35
Location: Salon Versailles

Multicore commodity hardware, faster and cheaper disks, and open source software now make building a data warehouse more of a resource and design issue than a cost decision for many organizations. A robust analysis infrastructure can be built from open source components with no performance or functional compromises.

This talk will cover a hardware/software/design architecture, open source tool options, the basics of dimensional modeling, options for handling big data, and a few tricks of the trade.

Why open source? Aside from cost savings, open source tools let you leverage what your staff already knows -- tools such as SQL, Python, Perl, and Apache -- rather than having to hire staff for the proprietary tools that dominate the commercial space.

Topics covered include:
Data Warehouse Architecture

  • Hardware considerations and configuration
  • Consolidated Data Store (CDS)
  • Extract / Transform / Load options and techniques
  • Multitopic data marts
  • Dimensional models and shared dimensions
  • Data access options

Open Source Components

  • Database Choices
  • Data Movement: SQL/Python
  • Data Access and Presentation: Ajax/Java/PHP/Ruby/SQL

Performance Considerations

  • Configuration
  • Indexing
  • Data partitioning
  • Parallelization
  • Aggregate tables and aggregate navigation

Analysis Techniques

  • Natural Language Processing
  • Topic Maps

Organizational Considerations

  • Sponsors
  • User types

The presentation should provide you with the basic architecture, tool options, design principles, analysis techniques, and organizational strategy to start thinking about building your own business intelligence infrastructure.
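
As a rough illustration of two topics from the outline, dimensional models with shared dimensions and aggregate tables, the following is a minimal sketch using Python and SQL through the standard-library sqlite3 module. It is not taken from the talk: the table names (dim_date, dim_product, fact_sales, agg_sales_month_topic), the columns, and the sample rows are all hypothetical, and SQLite merely stands in for whichever open source database a real warehouse would use.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table keyed to two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            day      TEXT,
            month    TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            title       TEXT,
            topic       TEXT
        );
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            units       INTEGER
        );
        """
    )


def load_sample_rows(conn):
    """Insert made-up rows purely for illustration."""
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, "2006-09-18", "2006-09"),
         (2, "2006-09-19", "2006-09"),
         (3, "2006-10-01", "2006-10")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Book A", "Python"), (2, "Book B", "SQL")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 10), (2, 1, 5), (2, 2, 7), (3, 2, 4)],
    )


def build_monthly_aggregate(conn):
    """Precompute units per month and topic so reports can skip the fact table."""
    conn.execute(
        """
        CREATE TABLE agg_sales_month_topic AS
        SELECT d.month AS month, p.topic AS topic, SUM(f.units) AS units
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.month, p.topic
        """
    )


def monthly_units(conn):
    """Read the aggregate table, ordered by month and then topic."""
    query = (
        "SELECT month, topic, units FROM agg_sales_month_topic "
        "ORDER BY month, topic"
    )
    return conn.execute(query).fetchall()
```

A caller would open a connection with sqlite3.connect(":memory:"), run the three setup functions in order, and then read the result with monthly_units. In a real warehouse the aggregate would be refreshed as part of the load process, and aggregate navigation would decide per query whether the summary table can answer it.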