Features/Qcow2DataIntegrity: Difference between revisions

From QEMU
(Created page with '== Summary == The qcow2 image format driver has known data integrity issues which need to be addressed. == Owner == * Name: Stefan Hajnoczi * Email: stefanha…')
 
Revision as of 10:58, 27 April 2010

Summary

The qcow2 image format driver has known data integrity issues which need to be addressed.

Owner

Name: Stefan Hajnoczi

Description

There are several areas where data integrity could be improved for qcow2:

Error code paths

Some error code paths do not back out metadata changes and leave the image in an inconsistent state. These are plain bugs.

This could be fixed by fleshing out the error handling code and testing the error paths with Kevin Wolf's blkdebug.
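As a rough illustration of what "backing out metadata changes" means, here is a minimal sketch on an in-memory toy model. It is not QEMU code: `struct toy_image`, `toy_alloc_cluster()`, `toy_write_l2_entry()`, `toy_map_new_cluster()` and the `fail_l2_write` flag are all hypothetical names invented for this example. The flag only plays the role that an injected I/O error would play in a real test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLUSTERS 8

/* Hypothetical in-memory stand-in for qcow2 metadata (not the real layout). */
struct toy_image {
    uint16_t refcount[TOY_CLUSTERS]; /* 0 means the cluster is free */
    int64_t l2_entry;                /* cluster the guest block maps to, -1 if unmapped */
    bool fail_l2_write;              /* test hook: simulate an I/O error */
};

/* Take the first free cluster; returns its index or -ENOSPC. */
static int64_t toy_alloc_cluster(struct toy_image *img)
{
    for (int64_t i = 0; i < TOY_CLUSTERS; i++) {
        if (img->refcount[i] == 0) {
            img->refcount[i] = 1;
            return i;
        }
    }
    return -ENOSPC;
}

/* Point the mapping at the given cluster; fails if the hook is set. */
static int toy_write_l2_entry(struct toy_image *img, int64_t cluster)
{
    if (img->fail_l2_write) {
        return -EIO;
    }
    img->l2_entry = cluster;
    return 0;
}

/* Allocate a cluster and map it; on failure, undo the allocation. */
int toy_map_new_cluster(struct toy_image *img)
{
    int64_t cluster;
    int ret;

    assert(img != NULL);

    cluster = toy_alloc_cluster(img);
    if (cluster < 0) {
        return (int)cluster;
    }

    ret = toy_write_l2_entry(img, cluster);
    if (ret < 0) {
        goto fail;
    }
    return 0;

fail:
    /* Back out the refcount change so the image state is as before. */
    img->refcount[cluster] = 0;
    return ret;
}
```

Without the `fail:` path, a failed mapping write would leave a cluster with a refcount of 1 that nothing references. Setting `fail_l2_write` and checking that the refcounts are unchanged afterwards is the kind of check that fault injection makes possible.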

Metadata update ordering, Part 1

The order in which metadata updates are made is unsafe: there may be time windows during which the QCOW2 image is in an inconsistent state. For example, an old L2 cluster is freed before a new one is allocated and put in place. An interruption during such a window can result in data loss.

This could be fixed by ensuring that metadata updates never transition through an inconsistent QCOW2 image state. (The worst case scenario would be leaked clusters but never data loss.)
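The safe ordering for the L2 example can be sketched as follows: allocate the new cluster, switch the reference to it, and only then free the old one. This is a toy in-memory model, not QEMU code. `struct toy_tables`, `toy_replace_l2()`, `toy_is_consistent()`, `toy_count_leaked()` and the `stop_after` parameter (which simulates an interruption after a given step) are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CLUSTERS 8

/* Hypothetical toy model: one L1 entry pointing at one L2 cluster. */
struct toy_tables {
    uint16_t refcount[TOY_CLUSTERS]; /* 0 means the cluster is free */
    int l1_entry;                    /* index of the cluster holding the live L2 table */
};

/* The image is consistent when L1 references an allocated cluster. */
bool toy_is_consistent(const struct toy_tables *t)
{
    return t->l1_entry >= 0 && t->l1_entry < TOY_CLUSTERS &&
           t->refcount[t->l1_entry] > 0;
}

/* Count allocated clusters that nothing references. */
int toy_count_leaked(const struct toy_tables *t)
{
    int leaked = 0;
    for (int i = 0; i < TOY_CLUSTERS; i++) {
        if (t->refcount[i] > 0 && i != t->l1_entry) {
            leaked++;
        }
    }
    return leaked;
}

/*
 * Replace the L2 cluster in a crash-safe order.
 * stop_after = 1 or 2 simulates an interruption after that step;
 * stop_after = 0 runs all three steps.
 */
int toy_replace_l2(struct toy_tables *t, int stop_after)
{
    int old_idx, new_idx = -1;

    assert(t != NULL);
    old_idx = t->l1_entry;

    /* Step 1: allocate the new cluster first. */
    for (int i = 0; i < TOY_CLUSTERS; i++) {
        if (t->refcount[i] == 0) {
            t->refcount[i] = 1;
            new_idx = i;
            break;
        }
    }
    if (new_idx < 0) {
        return -1; /* no free cluster; nothing was changed */
    }
    if (stop_after == 1) {
        return 0;
    }

    /* Step 2: switch the reference to the new cluster. */
    t->l1_entry = new_idx;
    if (stop_after == 2) {
        return 0;
    }

    /* Step 3: only now release the old cluster. */
    t->refcount[old_idx] = 0;
    return 0;
}
```

Stopping after step 1 or step 2 leaves one unreferenced cluster with a non-zero refcount (a leak), but `toy_is_consistent()` holds at every point. With the unsafe order (free first, allocate later), an interruption after the free would leave L1 pointing at a free cluster.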

Metadata update ordering, Part 2

There is no data integrity guarantee when cache=writeback or cache=none is used. (Although cache=none bypasses the host page cache, the physical disk may have a volatile write cache that needs to be flushed.) QCOW2 metadata updates do not use barriers and may therefore reach the disk out of order. If disk I/O is interrupted (e.g. by a host power failure), the result may be an inconsistent QCOW2 image.

This could be fixed by calling fdatasync() at the right points in the QCOW2 code. However, fdatasync() is likely to carry a large performance cost here and effectively degrades the image to cache=writethrough behavior. What is actually needed is a barrier that orders the writes, not a flush.
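To make the fdatasync() approach concrete, here is a minimal sketch using plain POSIX calls on a regular file, assuming a Linux-like system. It is not QEMU code: `toy_ordered_update()`, `toy_demo()`, `write_full()` and the offsets are hypothetical, and the entry is written in host byte order for simplicity. The point is only the ordering: the referenced cluster is made durable before the entry that points to it is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at the given offset; 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Write a cluster, flush, then write the table entry that references it,
 * and flush again. The first flush guarantees the entry can never reach
 * the disk before the cluster it points to.
 */
int toy_ordered_update(int fd, const void *cluster, size_t cluster_len,
                       off_t cluster_off, uint64_t entry, off_t entry_off)
{
    assert(cluster != NULL);

    if (write_full(fd, cluster, cluster_len, cluster_off) < 0) {
        return -1;
    }
    if (fdatasync(fd) < 0) { /* ordering point between the two writes */
        return -1;
    }
    if (write_full(fd, &entry, sizeof(entry), entry_off) < 0) {
        return -1;
    }
    if (fdatasync(fd) < 0) { /* make the new entry durable */
        return -1;
    }
    return 0;
}

/* Usage example: update a scratch file, read the entry back, clean up. */
int toy_demo(const char *path)
{
    unsigned char cluster[512];
    uint64_t entry = 4096, readback = 0;
    int ok;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0) {
        return -1;
    }
    memset(cluster, 0xab, sizeof(cluster));
    ok = toy_ordered_update(fd, cluster, sizeof(cluster), 4096, entry, 0) == 0 &&
         pread(fd, &readback, sizeof(readback), 0) == (ssize_t)sizeof(readback) &&
         readback == entry;
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Each fdatasync() call waits until the data has reached stable storage, which is stronger than what the ordering requires. A barrier would only need to guarantee that the first write is not reordered after the second, without stalling the caller, which is why the flush-based version is expected to be costly.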

Status

Needs further discussion with Kevin Wolf, Christoph Hellwig, and others.