DirectoryStorage

General Overview of DirectoryStorage Operation

In this explanation, HOME refers to the home directory of the storage, as specified in the constructor.

Transactions

DirectoryStorage implements the high-level ZODB transaction semantics using low-level filesystem operations, many of which are not atomic.

Transactional behaviour is implemented by OS-specific classes derived from BaseFilesystem, such as PosixFilesystem. These classes use several real filesystem directories to create the appearance of a single large virtual directory in which a group of files can be replaced atomically.

When a transaction starts, the filesystem class creates a new subdirectory of HOME/journal/, which is used to hold whole copies of the files written in that transaction. The name of this transaction directory is derived from the transaction id, for example HOME/journal/working_034468c19bc9d9d5_temp.
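The naming rule above can be sketched as follows. This assumes the transaction id is ZODB's 8-byte tid, hex-encoded to 16 lowercase characters as in the example name; transaction_dir_path and begin_transaction are hypothetical helpers, not part of BaseFilesystem.

```python
import os

def transaction_dir_path(home: str, tid: bytes) -> str:
    """Hypothetical helper: path of an in-progress transaction directory.

    Assumes tid is an 8-byte transaction id, hex-encoded to 16 lowercase
    characters as in the example name in the text.
    """
    if len(tid) != 8:
        raise ValueError("expected an 8-byte transaction id")
    return os.path.join(home, "journal", "working_%s_temp" % tid.hex())

def begin_transaction(home: str, tid: bytes) -> str:
    """Create the (still empty) transaction directory and return its path."""
    path = transaction_dir_path(home, tid)
    os.makedirs(path)  # also creates HOME/journal if it is missing
    return path
```

On POSIX, transaction_dir_path("/var/zodb", bytes.fromhex("034468c19bc9d9d5")) gives "/var/zodb/journal/working_034468c19bc9d9d5_temp".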

During a transaction, files are written into the transaction directory. Each file in the transaction directory has the same name as the file being written to the storage; however, the transaction directory itself is flat and has no subdirectories.

At transaction commit, the filesystem class syncs every file written during that transaction, renames the transaction directory from HOME/journal/working_034468c19bc9d9d5_temp to HOME/journal/working_034468c19bc9d9d5_done, and syncs the journal directory. At this point all changes are durable. If there is a fatal error, the recovery process knows from the _done suffix that this transaction must be rolled forwards, not backwards.
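A minimal sketch of the commit sequence on a POSIX system, assuming the transaction directory already exists. fsync_path, write_in_transaction and commit_transaction are hypothetical names, not the PosixFilesystem API. Syncing a directory by opening it with os.open and calling os.fsync works on Linux but not on Windows. The sketch also syncs the transaction directory itself so that its new entries are durable; that extra step is an assumption, since the text mentions only the files and the journal directory.

```python
import os

def fsync_path(path: str) -> None:
    """fsync a file or, on POSIX systems, a directory."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def write_in_transaction(txn_dir: str, name: str, data: bytes) -> None:
    """The transaction directory is flat, so the storage file name is used as-is."""
    with open(os.path.join(txn_dir, name), "wb") as f:
        f.write(data)

def commit_transaction(txn_dir: str) -> str:
    """Make a ..._temp transaction directory durable and mark it ..._done."""
    journal = os.path.dirname(txn_dir)
    base = os.path.basename(txn_dir)
    if not base.endswith("_temp"):
        raise ValueError("not an in-progress transaction directory: %r" % txn_dir)
    # 1. Sync every file written during the transaction.
    for name in os.listdir(txn_dir):
        fsync_path(os.path.join(txn_dir, name))
    # Assumption: also sync the transaction directory so its entries are durable.
    fsync_path(txn_dir)
    # 2. Rename ..._temp to ..._done; this rename is the commit point.
    done_dir = os.path.join(journal, base[: -len("_temp")] + "_done")
    os.rename(txn_dir, done_dir)
    # 3. Sync the journal directory so the rename itself is durable.
    fsync_path(journal)
    return done_dir
```

The rename is what makes the transaction atomic: before it, recovery sees a _temp directory; after it, a _done directory.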

At transaction abort, the transaction directory is emptied and removed.
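Because a transaction directory has no subdirectories, aborting it needs only a flat delete. A minimal sketch, with abort_transaction as a hypothetical name:

```python
import os

def abort_transaction(txn_dir: str) -> None:
    """Empty and remove an uncommitted (..._temp) transaction directory."""
    # The directory is flat, so deleting its files and then the directory suffices.
    for name in os.listdir(txn_dir):
        os.remove(os.path.join(txn_dir, name))
    os.rmdir(txn_dir)
```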

After transaction commit, the files are asynchronously flushed from the transaction directory into the database directory. The transaction directory can be removed once every file has been moved.
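The flush step can be sketched as below. subdir_for is a hypothetical stand-in for the format's mapping from file name to subdirectory (defined in doc/formats.txt), and the sketch deliberately leaves out syncing and error recovery, so it shows only the order of operations.

```python
import os
from typing import Callable

def flush_transaction(done_dir: str, db_dir: str,
                      subdir_for: Callable[[str], str]) -> None:
    """Move every file of a committed transaction into the database directory.

    subdir_for is a hypothetical stand-in for the real name-to-subdirectory
    mapping. Syncing and error recovery are omitted from this sketch.
    """
    for name in sorted(os.listdir(done_dir)):
        target_dir = os.path.join(db_dir, subdir_for(name))
        os.makedirs(target_dir, exist_ok=True)
        # os.replace overwrites any older copy; on POSIX the swap is atomic
        # as long as source and target are on the same filesystem.
        os.replace(os.path.join(done_dir, name), os.path.join(target_dir, name))
    # Every file has been moved, so the transaction directory can go.
    os.rmdir(done_dir)
```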

Doing this asynchronously means that the most current version of a file may temporarily exist only in the journal directory. If such a file has to be read, it is opened from the journal rather than from the usual database directory.
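A sketch of that read path, under stated assumptions: committed but unflushed transactions sit in the journal as working_<16-hex-digit tid>_done, tids increase over time so a reverse sort of those names puts the newest transaction first, and subdir_for is a hypothetical stand-in for the format's subdirectory mapping. Scanning the journal on every read is simply the easiest way to show the lookup order; it ignores concurrent flushing and is not a claim about how DirectoryStorage indexes the journal.

```python
import os
from typing import BinaryIO, Callable

def open_for_read(journal_dir: str, db_dir: str, name: str,
                  subdir_for: Callable[[str], str]) -> BinaryIO:
    """Open the newest committed copy of a storage file.

    Assumes fixed-width, increasing hex tids, so reverse name order is
    newest-first. Uncommitted ..._temp directories are never consulted.
    """
    done = sorted((d for d in os.listdir(journal_dir) if d.endswith("_done")),
                  reverse=True)
    for d in done:
        candidate = os.path.join(journal_dir, d, name)
        if os.path.exists(candidate):
            return open(candidate, "rb")
    # Not pending in the journal: fall back to the usual database directory.
    return open(os.path.join(db_dir, subdir_for(name), name), "rb")
```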

Note that DirectoryStorage allows many transactions to build up in the journal directory, so that they can be flushed in batches. Batching of flushes is a significant optimisation because it allows many IO operations to be combined or eliminated, both by the storage and by the operating system.
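One way batching can eliminate IO is that a file rewritten by several transactions in the same batch only needs its newest copy moved into the database directory. The sketch below illustrates that idea only; it is not DirectoryStorage's flushing code, and coalesce_batch and its data shapes are hypothetical.

```python
from typing import Dict, List, Tuple

def coalesce_batch(batch: List[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each file name to the transaction holding its newest copy.

    batch lists (transaction_dir, file_names) pairs in commit order. Only
    the newest copy of each file needs to reach the database directory;
    older copies can be deleted along with their transaction directories.
    """
    newest: Dict[str, str] = {}
    for txn_dir, names in batch:
        for name in names:
            newest[name] = txn_dir  # a later transaction overrides an earlier one
    return newest
```

For example, coalesce_batch([("t1", ["a", "b"]), ("t2", ["a"]), ("t3", ["a", "c"])]) returns {"a": "t3", "b": "t1", "c": "t3"}: three moves instead of five.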

To prevent journal overload, only a small number of batches of transactions may remain unflushed. Writes block rather than let this limit be exceeded.
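The blocking rule can be modelled with a condition variable. This is a minimal sketch assuming one counter of unflushed batches shared by writers and the flusher; FlushThrottle, add_batch and batch_flushed are hypothetical names, and the real limit and the point at which a writer blocks may differ.

```python
import threading
from typing import Optional

class FlushThrottle:
    """Hypothetical: block writers while too many batches await flushing."""

    def __init__(self, max_pending: int) -> None:
        self._max_pending = max_pending
        self._pending = 0
        self._cond = threading.Condition()

    @property
    def pending(self) -> int:
        with self._cond:
            return self._pending

    def add_batch(self, timeout: Optional[float] = None) -> None:
        """Called before another batch joins the journal; blocks while it is full."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._pending < self._max_pending,
                                       timeout):
                raise TimeoutError("journal full: too many unflushed batches")
            self._pending += 1

    def batch_flushed(self) -> None:
        """Called by the flusher after a batch reaches the database directory."""
        with self._cond:
            if self._pending == 0:
                raise RuntimeError("no batch is pending")
            self._pending -= 1
            self._cond.notify_all()  # wake any writer waiting for room
```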

Format

The main database directory is HOME/A. Files are not stored directly in that directory to prevent it growing too large. They are stored in a subdirectory whose name is derived from the filename, as defined by the format. (See doc/formats.txt)
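To illustrate why a derived subdirectory keeps HOME/A small, here is a fan-out that is invented for this example and is not the real layout, which doc/formats.txt defines. It takes the first 16-hex-digit id in the file name and uses its last four digits as two directory levels, so each level holds at most 256 entries. Because the oid is the first id in a name such as oYYYYY.XXXXX, all revisions of one object would land in the same directory under this scheme.

```python
import os
import re

_HEX_ID = re.compile(r"[0-9a-f]{16}")

def hypothetical_subdir(name: str) -> str:
    """Illustrative fan-out only; the real mapping is defined in doc/formats.txt."""
    m = _HEX_ID.search(name)
    if m is None:
        raise ValueError("no 16-digit hex id in %r" % name)
    ident = m.group(0)
    # Two levels of 256 possible entries each, taken from the id's last digits.
    return os.path.join(ident[-4:-2], ident[-2:])

def database_path(home: str, name: str) -> str:
    """Where a storage file would live under HOME/A with this fan-out."""
    return os.path.join(home, "A", hypothetical_subdir(name), name)
```

Under this made-up scheme, database_path("/var/zodb", "o0000000000000001.c") is "/var/zodb/A/00/01/o0000000000000001.c" on POSIX.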

Full

The Full storage stores three types of file. Files named tXXXXX, where XXXXX is the 16-character hex-encoded transaction id, contain details about each transaction including a list of modified oids.

Files named oYYYYY.c, where YYYYY is the 16-character hex-encoded oid, are 8-byte files containing the current serial number of the oid.

Files named oYYYYY.XXXXX, where YYYYY is the oid and XXXXX is the serial number, contain data about that revision of the object. (Note that serial numbers are chosen to be identical to transaction ids.)
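The three Full file names, and the lookup from an oid to its current revision, can be sketched as below. Assumptions: ids are 8 bytes hex-encoded in lowercase, the .c file holds the raw 8-byte serial (8 bytes rather than 16 hex characters suggests raw bytes), and the files are read from one flat directory, ignoring the subdirectory layout. All function names are hypothetical.

```python
import os

def transaction_file(tid: bytes) -> str:
    return "t" + tid.hex()

def current_serial_file(oid: bytes) -> str:
    return "o" + oid.hex() + ".c"

def revision_file(oid: bytes, serial: bytes) -> str:
    return "o%s.%s" % (oid.hex(), serial.hex())

def current_revision_file(directory: str, oid: bytes) -> str:
    """Follow oYYYYY.c to the name of the file holding the current revision."""
    with open(os.path.join(directory, current_serial_file(oid)), "rb") as f:
        serial = f.read()
    if len(serial) != 8:
        raise ValueError("corrupt current-serial file for oid %s" % oid.hex())
    return revision_file(oid, serial)
```

Since serial numbers equal transaction ids, the serial read from oYYYYY.c also names the tXXXXX file of the transaction that wrote the current revision.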

A small number of other files are used to store information such as the last used oid, last used serial number, and last pack time.

Minimal

This storage uses only one type of file. Files are named o.YYYYY.d, where YYYYY is the 16-character hex-encoded oid, and contain all information about the current revision of this object. No historical revisions are stored.
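The difference in what Full and Minimal keep on disk can be shown with an in-memory model, where a dict stands in for the database directory. This is a hypothetical sketch: it assumes lowercase hex ids, ignores subdirectories, and leaves out Full's tXXXXX transaction files.

```python
from typing import Dict

def minimal_file(oid: bytes) -> str:
    return "o.%s.d" % oid.hex()

def store_minimal(files: Dict[str, bytes], oid: bytes, record: bytes) -> None:
    """Minimal: the single per-object file is replaced, so history is lost."""
    files[minimal_file(oid)] = record

def store_full(files: Dict[str, bytes], oid: bytes, serial: bytes,
               record: bytes) -> None:
    """Full, for contrast: a new revision file plus an updated .c file."""
    files["o%s.%s" % (oid.hex(), serial.hex())] = record
    files["o%s.c" % oid.hex()] = serial
```

After three revisions of one object, the Full model holds four files (three revisions and the .c file), while the Minimal model holds one file containing only the latest revision.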