DirectoryStorage

DirectoryStorage FAQ

1. Why did you write a new storage?

I wanted to use a scalable storage that valued stability, manageability, and simplicity of maintenance over raw performance.

3. How widely used is DirectoryStorage?

Certainly many tens of users, possibly hundreds.

That is far fewer users than FileStorage has. If your storage needs are undemanding, then you should go with the majority and stick to FileStorage.

4. How stable is DirectoryStorage?

DirectoryStorage's design focus on simplicity should mean that it has fewer critical problems than other storages. There have been no reported incidents of data loss caused by DirectoryStorage bugs.

5. How scalable is DirectoryStorage?

Most users find the limiting factor to be packing speed. DirectoryStorage needs to be packed to reclaim space from old revisions of objects, and objects that have been deleted.

For large storages, packing is reported to take less than one hour per gigabyte. You can still use the storage while it is being packed, although read and write performance will be reduced a little. It is not possible to replicate into or out of a storage that is being packed.

6. What is the best filesystem for DirectoryStorage?

The developers of this storage have always used reiserfs on Linux.

Very little testing has been performed on other filesystems. DirectoryStorage makes heavy use of the filesystem, so this choice is critically important to performance and stability. We would appreciate any feedback on using DirectoryStorage with other filesystems.

The filesystem characteristic that matters most for DirectoryStorage performance is how efficiently it handles small files. DirectoryStorage uses a lot of files that are exactly 8 bytes long.
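Because small-file efficiency matters so much, it can be worth measuring it on a candidate filesystem before committing to it. The following is a minimal sketch, not part of DirectoryStorage: it writes a batch of 8-byte files and compares their apparent size with the space actually allocated, as reported by `st_blocks` (counted in 512-byte units on Linux). The function name and probe filenames are hypothetical, it is POSIX-only, and it only measures data blocks, not per-inode overhead.

```python
import os
import tempfile


def small_file_overhead(directory, count=1000, size=8):
    """Write `count` files of `size` bytes into directory and return
    (apparent_bytes, allocated_bytes) for them."""
    payload = b"\0" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, "probe-%d" % i)
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    # Flush to disk first, so filesystems with delayed allocation
    # report the blocks they really use (os.sync is Unix-only).
    if hasattr(os, "sync"):
        os.sync()
    apparent = allocated = 0
    for path in paths:
        st = os.stat(path)
        apparent += st.st_size
        allocated += st.st_blocks * 512  # st_blocks is in 512-byte units
    return apparent, allocated


if __name__ == "__main__":
    # Pass dir="/path/on/candidate/filesystem" to TemporaryDirectory
    # to probe the filesystem that will hold your storage.
    with tempfile.TemporaryDirectory() as d:
        apparent, allocated = small_file_overhead(d)
        print("apparent %d bytes, allocated %d bytes" % (apparent, allocated))
```

A filesystem that packs small files efficiently should show an allocated figure close to the apparent one; a purely block-oriented filesystem may allocate a whole block (often 4 KB) for each 8-byte file.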

Early prototypes showed that NTFS can support DirectoryStorage with high performance. Those benchmarks have not been repeated since win32 support was picked up in version 1.1.15.

7. What are the best reiserfs mount options?

Use noatime, but not notail. Some people find this surprising: in most other applications notail gives a performance increase.
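To confirm which options a mounted filesystem is actually using, you can read /proc/mounts on Linux. This is a minimal sketch with a hypothetical function name; it takes the file's text as a string so it can be tried on sample input, and it does not decode the octal escapes that /proc/mounts uses for spaces in paths.

```python
def check_reiserfs_options(mounts_text, mount_point):
    """Return a list of warnings about the filesystem mounted at mount_point,
    given text in /proc/mounts format ("device mountpoint fstype options ...")."""
    match = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            match = fields  # keep the last match: a later mount hides earlier ones
    if match is None:
        raise ValueError("nothing mounted at %s" % mount_point)
    fstype, options = match[2], set(match[3].split(","))
    warnings = []
    if fstype != "reiserfs":
        warnings.append("filesystem is %s, not reiserfs" % fstype)
    if "noatime" not in options:
        warnings.append("noatime is not set")
    if "notail" in options:
        warnings.append("notail is set")
    return warnings
```

For example, `check_reiserfs_options(open("/proc/mounts").read(), "/var/zope")` checks a hypothetical mount point; substitute the one that holds your storage. An empty list means the recommended options are in effect.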

8. How can I copy data from a FileStorage into a DirectoryStorage?

Use a copyTransactionsFrom script. As of version 1.1.12 an example is included in the distribution. For earlier versions, use this one:

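from ZODB.FileStorage import FileStorage
from DirectoryStorage.Full import Full
from DirectoryStorage.Filesystem import Filesystem

# Open the existing FileStorage read-only, so the copy cannot modify it.
src = FileStorage('data/Data.fs', read_only=1)

# Open the newly created, still empty DirectoryStorage.
fs = Filesystem('/this/is/the/path/to/my/storage')
dst = Full(fs)

# Replay every transaction from the FileStorage into the DirectoryStorage.
dst.copyTransactionsFrom(src)

src.close()
dst.close()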

Edit the paths to point to your Data.fs and DirectoryStorage directory.

You must run the script immediately after creating the DirectoryStorage. It will not work if you have already started Zope with this new storage.

You should take care that nothing is using the FileStorage while it is being read.

10. Does this mean I can edit Zope content by editing ordinary files?

Unfortunately no. The files it creates contain pickles. Their content is meaningful to ZODB, but they do not contain application-level content. Of course it is perfectly possible for you to edit them; the effect is the same as editing a FileStorage Data.fs file.

There are many other products that provide good ways of editing Zope content using a normal editor.

11. Does this mean I can put my site into CVS?

You can, but it may not do you any good; see question 10. If you are used to putting a Data.fs file in CVS, I recommend that you tar the DirectoryStorage directory and put that in CVS.

12. Why do I get an exception Invalid argument in sync_directory?

You must be using a filesystem that does not support fsync on directory inodes, such as NFS, smbfs, or PVFS2. As of version 1.1.13 you can use the option [posix]/dirsync in the configuration file to turn off fsync of directory inodes, which should get you up and running.
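The operation at issue is the standard POSIX idiom for making a new or renamed file durable: fsync the file's data, then fsync the directory that holds its name. The sketch below is not DirectoryStorage's `sync_directory`; the function `durable_write` and its `dirsync` flag are hypothetical, named to mirror the configuration option. On a filesystem that rejects directory fsync, the `os.fsync` call on the directory is the step that fails, typically with EINVAL ("Invalid argument").

```python
import os
import tempfile


def durable_write(directory, name, data, dirsync=True):
    """Atomically create or replace directory/name with data (POSIX only)."""
    final = os.path.join(directory, name)
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make the file contents durable
    os.replace(tmp, final)  # atomic rename into place
    if dirsync:
        # Make the rename itself durable by fsyncing the directory inode.
        # Filesystems such as NFS or smbfs may raise OSError (EINVAL) here.
        fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    # With dirsync=False the data is on disk, but a crash may lose the new name.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        durable_write(d, "example", b"12345678")
        with open(os.path.join(d, "example"), "rb") as f:
            print(f.read())
```

This is why turning off directory fsync weakens durability: the file contents may survive a crash while the directory entry that names them does not.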

This option will affect the ACID characteristics of the storage. It can allow a committed transaction to be lost if the computer crashes soon after the commit, and more serious effects may be possible too. You may prefer to use a different filesystem if robustness is a priority.

13. Why do I get a "DirectoryStorage Left snapshot mode" log entry after startup?

It is perfectly normal. LocalFilesystem starts up in snapshot mode as part of its recovery process. This allows it to flush any outstanding journal entries asynchronously and, if the storage was in snapshot mode when it shut down, to recombine any data written since entering that snapshot. This log entry appears when that work is complete. In most cases there is very little work to do, and the message appears immediately.

14. Why do I get log messages "Flushing 41 transactions (File limit reached)"?

It is perfectly normal. DirectoryStorage uses a journal directory. All writes go into the journal directory first, and are "flushed" into the main storage directory some time after a transaction has committed. Multiple transactions are flushed in a batch; this is explained in doc/operation. This log message indicates that a batch flush has started because there are now enough files in the journal to make it worthwhile.
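The journal-then-flush behaviour can be modelled in a few lines. This is a toy, in-memory sketch, not DirectoryStorage's implementation: the class name, the `file_limit` threshold and the data structures are all hypothetical, and only the log text mimics the message quoted in this question. It shows the two properties that matter: a commit is complete once its files are in the journal, and reads must consult the journal before the main directory.

```python
class ToyJournal:
    """Hypothetical in-memory model of journal-then-flush batching."""

    def __init__(self, file_limit):
        self.file_limit = file_limit
        self.pending = []  # committed but not yet flushed: (tid, {name: data})
        self.main = {}     # stands in for the main storage directory
        self.log = []

    def commit(self, tid, files):
        # The transaction counts as committed once it is in the journal.
        self.pending.append((tid, dict(files)))
        if sum(len(batch) for _, batch in self.pending) >= self.file_limit:
            self.flush("File limit reached")

    def read(self, name):
        # Newest journal entries win over the main directory.
        for _, batch in reversed(self.pending):
            if name in batch:
                return batch[name]
        return self.main[name]

    def flush(self, reason):
        self.log.append("Flushing %d transactions (%s)" % (len(self.pending), reason))
        for _, batch in self.pending:  # oldest first, so later writes win
            self.main.update(batch)
        self.pending = []
```

Batching means the cost of moving files into the main directory is paid once per batch rather than once per transaction, which is why the flush waits until enough files have accumulated.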

15. How does the performance of DirectoryStorage compare to other storages?

Relative to FileStorage, all figures approximate:

  • Reads are a factor of 1.5 slower.
  • Intermittent writes are a factor of 1.5 slower.
  • Packing is at least 8 times slower in version 1.0.

This was measured with a 67 MB database, on reiserfs on Linux.

The quoted write performance is accurate for typical usage scenarios where there is not high write pressure. Under high write pressure the journal queue becomes a bottleneck, and performance degrades to 3 times slower than FileStorage.

Having said all of that, you may find that storage performance makes a negligible contribution to your overall system performance.

16. How can I improve write performance?

DirectoryStorage is tuned for installations where there are more reads than writes. This tuning is appropriate for most ZODB/Zope installations, and the default settings are appropriate. However, there are several configuration options available if you need better write performance, either temporarily (maybe for importing data in bulk from some other source) or permanently (due to a characteristic of your application).

  1. In the config/settings file, change sync: 1 to sync: 0. This eliminates the overhead of checking that all changes in one transaction are on disk before starting the next, which improves throughput. Note that this means your storage is likely to end up corrupt if your operating system crashes while the storage is running (or soon after it shuts down!).
  2. Change check_dangling_references: 1 to 0. This disables extra checks for possible ZODB or application bugs in write transactions, and eliminates an I/O overhead. Note that you may want to leave this option turned off in production if you are using stable versions of ZODB and application code, and storage write performance is critical to your application performance.
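If you are unsure how much sync: 1 costs on your hardware, a rough comparison of forced and unforced writes can help you decide. This sketch is only a crude proxy and is not how DirectoryStorage itself measures anything: it forces each 8-byte file to disk individually, whereas the storage waits once per transaction, and the function name is hypothetical.

```python
import os
import tempfile
import time


def time_small_writes(directory, count=200, fsync_each=True):
    """Write `count` 8-byte files into directory; return elapsed seconds.
    With fsync_each=True every file is forced to disk before the next write."""
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, "w-%d" % i), "wb") as f:
            f.write(b"\0" * 8)
            if fsync_each:
                f.flush()
                os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass dir=... to TemporaryDirectory to test the filesystem that holds
    # your storage; /tmp may be a RAM-backed filesystem.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        print("with fsync:    %.3fs" % time_small_writes(a, fsync_each=True))
        print("without fsync: %.3fs" % time_small_writes(b, fsync_each=False))
```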

17. Why is packing so slow?

DirectoryStorage uses a mark and sweep algorithm. It traverses the database marking every file it needs to keep, then traverses the directory structure unlinking unmarked files. Each file needs one bit of storage for its mark flag.

In the current implementation this mark flag is stored inside the file permission mask. This leads to very fast reads (it has to read the inode anyway, so the mark bit is read 'for free') but slow writes (there are not many inodes per block, so it incurs excess IO overhead).
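The idea of keeping the mark in the permission bits can be illustrated with a small, self-contained sketch. Everything here is hypothetical and simplified: the choice of the owner-execute bit as the mark, the flat directory of object files, and the `references` callback that lists the files a given file points to. It also unlinks garbage directly, whereas DirectoryStorage renames it (see question 20). Checking the mark needs only `os.stat`, which reads the inode, while setting it needs `os.chmod`, which writes the inode; this is the read/write asymmetry described above. POSIX only.

```python
import os
import stat
import tempfile

MARK = stat.S_IXUSR  # hypothetical: the owner-execute bit doubles as the mark flag


def is_marked(path):
    # Reading the mark costs only a stat(), i.e. an inode read.
    return bool(os.stat(path).st_mode & MARK)


def set_mark(path, on):
    # Changing the mark costs a chmod(), i.e. an inode write.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, (mode | MARK) if on else (mode & ~MARK))


def mark_and_sweep(directory, roots, references):
    """Keep every file reachable from roots; remove the rest.
    Assumes no file is marked when the pack starts."""
    # Mark phase: traverse the object graph from the roots.
    stack = list(roots)
    while stack:
        name = stack.pop()
        path = os.path.join(directory, name)
        if is_marked(path):
            continue  # already visited (this also handles reference cycles)
        set_mark(path, True)
        stack.extend(references(name))
    # Sweep phase: traverse the directory, removing unmarked files.
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if is_marked(path):
            set_mark(path, False)  # reset for the next pack
        else:
            os.unlink(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    graph = {"root": ["a"], "a": ["root"], "b": []}  # "b" is unreachable
    with tempfile.TemporaryDirectory() as d:
        for name in graph:
            open(os.path.join(d, name), "wb").close()
        print(mark_and_sweep(d, ["root"], lambda n: graph[n]))  # ['b']
```

Note that the mark state lives entirely on disk, so memory use does not grow with the size of the storage; the price is one inode write per live file in each phase.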

Other storages use a similar algorithm but store state in memory. This limits their scalability.

A number of alternative packing approaches are under consideration. The key to performance on any operation on a large body of data is to perform that operation incrementally. Any operation that needs to scan all of the data is bound to scale linearly, or worse. Both DirectoryStorage's and FileStorage's packing implementations currently do exactly that.

18. Is there a danger of running out of file descriptors?

No. DirectoryStorage only opens one file at a time (per connection) and closes it again as soon as possible.

19. Is there a danger of running out of inodes?

DirectoryStorage uses a lot of tiny files, so this is certainly a risk on filesystems that have a fixed number of inodes. Note that reiserfs does not have this problem because it can create new inodes on demand.
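On POSIX systems you can check inode headroom with `os.statvfs`. A minimal sketch (the function name is hypothetical): it returns the fraction of inodes in use, or `None` when the filesystem reports no fixed inode total, as filesystems that allocate inodes dynamically may do.

```python
import os


def inode_usage(path):
    """Fraction of inodes in use on the filesystem holding path, or None
    if the filesystem does not report a fixed inode total."""
    st = os.statvfs(path)
    if st.f_files == 0:
        return None
    return (st.f_files - st.f_ffree) / st.f_files


if __name__ == "__main__":
    usage = inode_usage(".")  # replace "." with your storage directory
    if usage is None:
        print("no fixed inode limit reported")
    else:
        print("%.1f%% of inodes in use" % (100 * usage))
```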

20. I have just packed my storage, old revisions have disappeared, but free disk space has not increased. Why?

You will find lots of files named *-number-deleted in your storage directory. This is a disaster recovery mechanism in case of bugs in the packing code. Packing renames the files so that they are invisible to the rest of DirectoryStorage; they are eventually unlinked (and the free space reclaimed) by a subsequent pack. If a bug in the packing code should incorrectly remove a file, you can undo the pack by renaming these files back.

The number in the filename is a timestamp, and by default these deleted files are kept for 10 days. This can be changed using the delay_delete parameter in the configuration file.
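Before relying on these files for recovery, or to see which of them the next pack is likely to unlink, it can help to list them. This sketch is read-only and deletes nothing. It assumes the number is a Unix timestamp in seconds and that the original name is everything before -number-deleted; this FAQ only says the number is a timestamp, so check a few real filenames before trusting it. The function name is hypothetical.

```python
import os
import re
import tempfile
import time

# Assumption: pack renames "<original>" to "<original>-<unix seconds>-deleted".
DELETED = re.compile(r"^(?P<original>.+)-(?P<stamp>\d+)-deleted$")


def find_deleted(top, older_than_days=10, now=None):
    """Yield (path, original_name, age_days) for pack-renamed files under top
    that are at least older_than_days old. Read-only: nothing is removed."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            m = DELETED.match(name)
            if m is None:
                continue
            age_days = (now - int(m.group("stamp"))) / 86400.0
            if age_days >= older_than_days:
                yield os.path.join(dirpath, name), m.group("original"), age_days


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "A-1000-deleted"), "wb").close()
        for path, original, age in find_deleted(d, now=1000 + 11 * 86400):
            print(original, age)  # A 11.0
```

Passing `older_than_days=0` lists every renamed file, and the `original_name` column shows the name each file would be renamed back to when undoing a pack.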

The design rule at work here is that a complicated process (mark/sweep packing) should not be in charge of something that can do permanent damage (unlinking).

21. I know about Full.py, but what's the story with Minimal.py?

Minimal.py is a variant that does not support undo, incremental backups, replication, versions, or packing.

It is ideal if you are using a ZODB as a short-term throwaway cache, particularly after turning off md5 checks and fsync in the config file. It is not recommended for any use where long-term data durability is a requirement.

I am not aware of anyone using Minimal in production. It is only 50 lines of code, so we don't expect any problems. However, if any problems are discovered, they are unlikely to get fixed without a volunteer.

22. Can I pack from the command line?

If you are using Zope, the best way is to use a tool such as wget to fetch Zope's manage_pack URL.

If you can't do that, shut down all other processes using the storage and run a script like this:

import time
from DirectoryStorage.Full import Full
from DirectoryStorage.Filesystem import Filesystem
from ZODB.referencesf import referencesf as ZODB_referencesf

fs = Filesystem('/this/is/the/path/to/my/storage')
storage = Full(fs, synchronous=1)
# Pack, keeping 1 week of history. The pack time is a Unix timestamp:
# now minus 7 days' worth of seconds.
storage.pack(time.time() - 60*60*24*7, ZODB_referencesf)
# Close the storage cleanly once the pack has finished.
storage.close()

Unlike most other tools, you don't need to explicitly enter snapshot mode before running this; the storage will manage that for you.

23. Can I force the storage into snapshot mode from the command line?

Using version 1.1.10 or later? Just use:

python snapshot.py --storage /this/is/the/path/to/my/storage

If you are using version 1.1.9 or earlier, you can use the same technique, but the snapshot.py command line is a little longer. That version of snapshot.py also needs Zope to be running. If for any reason you are not running Zope, or cannot run it, try this script:

from DirectoryStorage.Full import Full
from DirectoryStorage.Filesystem import Filesystem
fs = Filesystem('/this/is/the/path/to/my/storage')
storage = Full(fs,synchronous=1)
storage.close()