Features/Qcow2DataIntegrity

From QEMU
Revision as of 11:00, 27 April 2010 by Stefanha

Summary

The qcow2 image format driver has known data integrity issues which need to be addressed.

Owner

Description

There are several areas where data integrity could be improved for qcow2:

Error code paths

Some error code paths do not back out metadata changes and leave the image in an inconsistent state. These are plain bugs.

This could be fixed by fleshing out the error handling code and testing it with Kevin Wolf's blkdebug error-injection driver.
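A minimal sketch of the kind of back-out path meant here, written against a hypothetical in-memory model rather than the real qcow2 driver: `struct toy_image`, `toy_alloc_cluster()` and the `fail_l2_write` flag (a stand-in for blkdebug-style error injection) are all invented for illustration. The allocation first claims a cluster in the refcount table and then tries to write the L2 entry; if that second step fails, the refcount change is undone so the image is left exactly as it was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of a qcow2-like image, not QEMU code. */
#define TOY_CLUSTERS 8
#define TOY_L2_ENTRIES 4

struct toy_image {
    uint16_t refcount[TOY_CLUSTERS];  /* refcount per host cluster */
    int64_t  l2[TOY_L2_ENTRIES];      /* guest cluster -> host cluster, -1 = unallocated */
    bool     fail_l2_write;           /* stand-in for blkdebug-style error injection */
};

static void toy_init(struct toy_image *img)
{
    memset(img, 0, sizeof(*img));
    img->refcount[0] = 1;             /* cluster 0 holds the header */
    for (int i = 0; i < TOY_L2_ENTRIES; i++)
        img->l2[i] = -1;
}

static int toy_write_l2_entry(struct toy_image *img, int idx, int64_t host)
{
    if (img->fail_l2_write)
        return -1;                    /* simulated I/O error */
    img->l2[idx] = host;
    return 0;
}

/* Allocate a host cluster for guest cluster idx. On failure, every
 * metadata change made so far is backed out before returning. */
static int toy_alloc_cluster(struct toy_image *img, int idx)
{
    assert(idx >= 0 && idx < TOY_L2_ENTRIES);

    int host = -1;
    for (int c = 0; c < TOY_CLUSTERS; c++) {
        if (img->refcount[c] == 0) {
            host = c;
            break;
        }
    }
    if (host < 0)
        return -1;                    /* image full: nothing changed yet */

    img->refcount[host]++;            /* step 1: claim the cluster */

    if (toy_write_l2_entry(img, idx, host) < 0) {
        img->refcount[host]--;        /* back out step 1 */
        return -1;
    }
    return host;
}
```

Setting `fail_l2_write` before the call and then checking that both the refcount and the L2 entry are unchanged is roughly the kind of check an error-injection test would make against a real image.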

Metadata update ordering, Part 1

The order in which metadata updates are made is unsafe: there are time windows during which the QCOW2 image is in an inconsistent state. For example, an old L2 table cluster is freed before a new one is allocated and put in place. An interruption during such a window can result in data loss.

This could be fixed by ensuring that metadata updates never transition through an inconsistent QCOW2 image state. (The worst-case scenario would then be leaked clusters, but never data loss.)
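The ordering rule can be checked mechanically on a toy model. The sketch below uses hypothetical names throughout and is not QEMU code: it replaces the single L2 table referenced by an L1 entry in two different orders, and lets a simulated crash stop the sequence after any number of metadata updates. The safe order (allocate the new table, write it, switch the L1 pointer, free the old table) leaves the image consistent at every crash point, at worst with one leaked cluster. The unsafe order frees the old table first, so a crash can leave L1 pointing at a cluster whose refcount is zero and which a later allocation could overwrite.

```c
#include <assert.h>
#include <stdbool.h>

#define NCLUSTERS 8

/* Hypothetical toy image: one L1 entry pointing at one L2 table cluster. */
struct swap_image {
    int refcount[NCLUSTERS];
    int contents[NCLUSTERS];   /* stand-in for each cluster's data */
    int l1;                    /* host cluster holding the active L2 table */
};

static void swap_init(struct swap_image *img)
{
    for (int c = 0; c < NCLUSTERS; c++) {
        img->refcount[c] = 0;
        img->contents[c] = 0;
    }
    img->refcount[0] = 1;      /* image header */
    img->refcount[1] = 1;      /* current L2 table */
    img->contents[1] = 42;     /* its (fake) guest mappings */
    img->l1 = 1;
}

/* Apply one metadata update, or "crash" once `budget` updates are done. */
#define STEP(stmt) do { if (budget-- == 0) return; stmt; } while (0)

static void replace_l2_safe(struct swap_image *img, int new_c, int budget)
{
    int old_c = img->l1;
    assert(new_c != old_c && img->refcount[new_c] == 0);
    STEP(img->refcount[new_c] = 1);                    /* 1. allocate new table */
    STEP(img->contents[new_c] = img->contents[old_c]); /* 2. write new table    */
    STEP(img->l1 = new_c);                             /* 3. switch L1 pointer  */
    STEP(img->refcount[old_c] = 0);                    /* 4. free old table     */
}

static void replace_l2_unsafe(struct swap_image *img, int new_c, int budget)
{
    int old_c = img->l1;
    assert(new_c != old_c && img->refcount[new_c] == 0);
    STEP(img->refcount[old_c] = 0);                    /* old table freed first */
    STEP(img->refcount[new_c] = 1);
    STEP(img->contents[new_c] = img->contents[old_c]);
    STEP(img->l1 = new_c);
}

/* Consistent: the active L2 table is still allocated and holds its data. */
static bool swap_consistent(const struct swap_image *img)
{
    return img->refcount[img->l1] > 0 && img->contents[img->l1] == 42;
}

/* Leaked clusters: allocated, but neither the header nor the active table. */
static int swap_leaks(const struct swap_image *img)
{
    int leaks = 0;
    for (int c = 1; c < NCLUSTERS; c++)
        if (img->refcount[c] > 0 && c != img->l1)
            leaks++;
    return leaks;
}
```

Running `replace_l2_safe()` with every budget from 0 to 4 and checking `swap_consistent()` after each run enumerates all crash points of the sequence; the same loop over `replace_l2_unsafe()` finds the inconsistent windows.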

Metadata update ordering, Part 2

Data integrity is not guaranteed with cache=writeback or cache=none. (Although cache=none bypasses the host page cache, the physical disk may have a volatile write cache that also needs to be flushed.) QCOW2 metadata updates do not use barriers and may therefore reach the disk out of order. If disk I/O is interrupted (e.g. by a host power failure), the result may be an inconsistent QCOW2 image.

This could be fixed by calling fdatasync() in the right places in the QCOW2 code. However, fdatasync() is likely to be a significant performance cost here and would effectively degrade the image to cache=writethrough. What is actually needed is not a flush but a barrier: an ordering guarantee between dependent metadata writes.
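For reference, this is roughly what calling fdatasync() in the right places looks like for a single cluster allocation: the refcount update has to reach stable storage before the L2 entry that points at the new cluster, and without a barrier primitive the only ordering tool available is a full flush. The offsets `REFCOUNT_OFF` and `L2_ENTRY_OFF` and the function names are hypothetical; a real image derives its offsets from the header and tables and stores metadata big-endian, not in host byte order as done here for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk locations of one refcount and one L2 entry. */
#define REFCOUNT_OFF 0
#define L2_ENTRY_OFF 512

/* Write exactly len bytes at off, retrying short writes. */
static int write_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;         /* EINTR retry omitted for brevity */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Allocate a cluster with its two metadata writes in a safe order.
 * Values are written in host byte order purely for brevity. */
static int ordered_alloc(int fd, uint16_t refcount, uint64_t l2_entry)
{
    if (write_all(fd, &refcount, sizeof(refcount), REFCOUNT_OFF) < 0)
        return -1;
    /* Flush used as a barrier: the refcount must be stable before the
     * L2 entry that references the cluster can reach the disk. */
    if (fdatasync(fd) < 0) {
        perror("fdatasync");
        return -1;
    }
    if (write_all(fd, &l2_entry, sizeof(l2_entry), L2_ENTRY_OFF) < 0)
        return -1;
    /* Durability of the L2 write itself can wait for a guest flush. */
    return 0;
}
```

Each pair of dependent metadata writes pays for a full fdatasync(), which is exactly the cost described above; a barrier would only need to keep the two writes in order, not wait for the first one to become durable.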

Status

Needs further discussion with Kevin Wolf, Christoph Hellwig, and others.

Links