Keiko Nakata - A simple and efficient implementation for small databases

The paper presents a technique for implementing small databases that frequently occur in operating systems and distributed systems design.

The problem

We want to manage data that is structured and must persist across system restart.

The proposed solution is suitable for databases with the following characteristics:

relatively small – up to about 10 megabytes
moderate rate of updates – a burst rate of up to 10 transactions per second, a long term rate of up to 1000 transactions per day
single-shot transaction (not a transaction composed of multiple client actions)

The techniques based on

a main memory database with checkpoints and a re-do log
high-level language data structure
RPCs
strongly typed access to the backing store

Design

The database is represented as an ordinary data structure in virtual memory. Its counterpart on disk has two components:

a checkpoint of some previous (consistent) state of the entire database, and
a log recording each subsequent update to the database.

A read access (to the database) consists purely of a lookup in the virtual memory.

An update is made in three steps, consisting of two virtual memory operations and one disk write:

The virtual memory data is read to verify any preconditions of the update, such as consistency invariants or access controls.
All the parameters of the update are gathered together and recorded as an entry in the disk log. This is the commit point.
The update is applied to the virtual memory data.

The implementation must use an appropriate locking strategy to serialize updates and to prevent interference between updates and concurrent enquiries.

From time to time, the implementation

records on disk a checkpoint containing the entire contents of the virtual memory data structure, then
resets the log to be empty, and
removes any previous checkpoint from the disk.

The implementation can use the primitives of its host file system to ensure that these actions are performed atomically.

Example

Consider a simple name server, which offers a general purpose name-to-value mapping. It may use three locking modes to mediate concurrent access, shared, update and exclusive:

&	shared	update	exclusive
shared	o	o	x
update	o	x	x
exclusive	x	x	x

An enquiry is performed with a shared lock.
An update first acquires an update lock. After it has verified its preconditions, it assembles its log record and commits it to disk. Then the update lock is converted to an exclusive lock and modifies the virtual memory structures.
An update lock is held while writing a checkpoint.

References

A simple and efficient implementation of a small databases (SOSP ’87)