Developing Berkeley DB Java Edition: Internals and War Stories

Charles Lamb, Architect, Oracle

Track: Databases
Date: Wednesday, July 26
Time: 2:35pm - 3:20pm
Location: Portland 252

Berkeley DB Java Edition (JE) is an open-source, pure Java, embedded, transactional datastore. It is designed to be deployed in multithreaded, high-concurrency environments as a transactional engine handling hundreds of gigabytes of data. Its persistence capabilities are used in applications developed by the Internet Archive (Heritrix), General Dynamics (CoMotion), and TIBCO (BusinessEvents).

The talk is aimed at Java engineers interested in real-world database internals programming and debugging. It focuses on three interesting aspects of the JE architecture and on some of the issues encountered during development.

Log-based storage system: Log-based file systems are widespread, but log-based storage systems are less common. While JE's log-based storage system is relatively straightforward, "cleaning" an append-only storage system (reclaiming the space held by obsolete records) is not necessarily easy. We discuss the ongoing evolution of the JE storage system cleaner.
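
To make the cleaning problem concrete, here is a minimal sketch of an append-only store with a cleaner. It is not JE's code or its algorithm: the class name LogStoreSketch, the in-memory lists standing in for log files, and the "pick the file with the fewest live records" policy are all assumptions made for illustration. Every write appends to the tail of the log, so an overwritten record becomes obsolete in place; the cleaner copies the still-live records out of a mostly obsolete file to the tail and then drops the file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy append-only log store with a cleaner. Illustrative only; not JE code. */
public class LogStoreSketch {
    /** One appended record (hypothetical layout). */
    static final class LogEntry {
        final String key, value;
        LogEntry(String key, String value) { this.key = key; this.value = value; }
    }

    /** Where the current version of a key lives: file number and slot in that file. */
    static final class Location {
        final int file, slot;
        Location(int file, int slot) { this.file = file; this.slot = slot; }
    }

    private final int entriesPerFile;
    private final TreeMap<Integer, List<LogEntry>> files = new TreeMap<>();
    private final Map<String, Location> index = new HashMap<>();
    private int currentFile = 0;

    public LogStoreSketch(int entriesPerFile) {
        this.entriesPerFile = entriesPerFile;
        files.put(currentFile, new ArrayList<>());
    }

    /** Appends a new version at the tail; any older version becomes obsolete. */
    public void put(String key, String value) {
        List<LogEntry> tail = files.get(currentFile);
        if (tail.size() >= entriesPerFile) {      // tail file is full: start a new one
            currentFile++;
            tail = new ArrayList<>();
            files.put(currentFile, tail);
        }
        tail.add(new LogEntry(key, value));
        index.put(key, new Location(currentFile, tail.size() - 1));
    }

    public String get(String key) {
        Location loc = index.get(key);
        return loc == null ? null : files.get(loc.file).get(loc.slot).value;
    }

    public int fileCount() { return files.size(); }

    /** A record is live only if the index still points at this exact file and slot. */
    private boolean isLive(int file, int slot, LogEntry e) {
        Location loc = index.get(e.key);
        return loc != null && loc.file == file && loc.slot == slot;
    }

    /** Cleans the non-tail file with the fewest live records; returns its number, or -1. */
    public int clean() {
        int victim = -1;
        int fewestLive = Integer.MAX_VALUE;
        for (Map.Entry<Integer, List<LogEntry>> f : files.entrySet()) {
            int fileNo = f.getKey();
            if (fileNo == currentFile) continue;  // never clean the file being written
            List<LogEntry> entries = f.getValue();
            int live = 0;
            for (int i = 0; i < entries.size(); i++) {
                if (isLive(fileNo, i, entries.get(i))) live++;
            }
            if (live < fewestLive) { fewestLive = live; victim = fileNo; }
        }
        if (victim < 0) return -1;
        List<LogEntry> old = files.get(victim);
        for (int i = 0; i < old.size(); i++) {
            LogEntry e = old.get(i);
            if (isLive(victim, i, e)) put(e.key, e.value);  // migrate live data to the tail
        }
        files.remove(victim);                     // the whole file is now obsolete
        return victim;
    }

    public static void main(String[] args) {
        LogStoreSketch store = new LogStoreSketch(2);
        store.put("a", "1"); store.put("b", "2"); // fills file 0
        store.put("a", "3"); store.put("c", "4"); // fills file 1; "a" in file 0 is now obsolete
        store.put("d", "5");                      // starts file 2
        System.out.println("cleaned file " + store.clean() + ", b=" + store.get("b"));
    }
}
```

Even this toy version hints at where the difficulty lies: the cleaner competes with ordinary writes for the tail of the log, and deciding which file to clean requires knowing how much of each file is still live. A real system also has to do this on disk, concurrently with application threads, and in a way that survives a crash.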

High-concurrency B+Tree: The JE API is based on a B+Tree with related search and cursor operations. To improve concurrency we use a variety of novel locking techniques internally. Debugging these concurrency mechanisms has sometimes led to interesting problems. We look at some of the testing methods used and at how we debug the resulting problems, both in the lab and in the field.

Performance Profiling: Over the past few years we've had the opportunity to stress-test JE in a variety of high-concurrency applications and scenarios. Along the way we've sometimes had to improve performance on a number of benchmarks. We look at some of those tests and at how we improved the results.