The qcow2 image format driver has known data integrity issues which need to be addressed.
There are several areas where data integrity could be improved for qcow2:
Some error code paths do not back out metadata changes and leave the image in an inconsistent state. These are plain bugs.
This could be fixed by fleshing out the error handling code and testing the failure paths with Kevin Wolf's blkdebug.
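As a minimal sketch of what "backing out metadata changes" means, the toy model below performs a two-step update and undoes the first step if the second fails. Everything here is hypothetical: struct toy_meta, toy_map_cluster() and the fail_at_step field are invented for illustration and are neither qcow2 driver code nor the blkdebug interface. The fail_at_step field only stands in for the kind of injected I/O error that a fault injection tool would provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CLUSTERS 8

/* Hypothetical, heavily simplified metadata: a refcount table and one L2 entry. */
struct toy_meta {
    uint16_t refcount[NUM_CLUSTERS]; /* 0 means the cluster is free */
    uint32_t l2_entry;               /* mapped cluster index, 0 = unmapped */
    int fail_at_step;                /* injected failure: step number, 0 = never */
};

/* Stand-in for a metadata write that may fail with an injected error. */
static int toy_step(const struct toy_meta *m, int step)
{
    return m->fail_at_step == step ? -EIO : 0;
}

/* Map a cluster: take a reference, then point the L2 entry at it. */
static int toy_map_cluster(struct toy_meta *m, uint32_t cluster)
{
    int ret;

    assert(cluster > 0 && cluster < NUM_CLUSTERS);

    ret = toy_step(m, 1);            /* refcount update fails */
    if (ret < 0) {
        return ret;                  /* nothing has been changed yet */
    }
    m->refcount[cluster]++;

    ret = toy_step(m, 2);            /* L2 entry update fails */
    if (ret < 0) {
        m->refcount[cluster]--;      /* back out the reference taken above */
        return ret;
    }
    m->l2_entry = cluster;
    return 0;
}
```

Running the update once per possible failure step and checking that the metadata is unchanged after each failure is the style of test the text has in mind.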
The order in which metadata updates are made is unsafe. There may be time windows during which the QCOW2 image is in an inconsistent state. For example, an old L2 cluster is freed before a new one is allocated and put in place. An interruption during these time windows will result in data loss.
This could be fixed by ensuring that metadata updates never transition through an inconsistent QCOW2 image state. (The worst case scenario would be leaked clusters but never data loss.)
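The ordering rule can be illustrated with a toy in-memory model, assuming the safe sequence is: allocate the new L2 cluster, switch the L1 entry to it, and only then free the old cluster. The structure and function names (toy_image, toy_replace_l2() and so on) are hypothetical and do not correspond to the real qcow2 driver. The stop_after parameter simulates an interruption after a given number of steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CLUSTERS 8

/* Hypothetical, heavily simplified image: one L1 entry plus a refcount table. */
struct toy_image {
    uint32_t l1_entry;               /* cluster index of the current L2 table */
    uint16_t refcount[NUM_CLUSTERS]; /* 0 means the cluster is free */
};

static void toy_init(struct toy_image *img)
{
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        img->refcount[i] = 0;
    }
    img->refcount[0] = 1;            /* cluster 0: header */
    img->refcount[1] = 1;            /* cluster 1: initial L2 table */
    img->l1_entry = 1;
}

static int toy_alloc_cluster(struct toy_image *img)
{
    for (int i = 1; i < NUM_CLUSTERS; i++) {
        if (img->refcount[i] == 0) {
            img->refcount[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Replace the L2 cluster. stop_after < 3 simulates an interruption. */
static int toy_replace_l2(struct toy_image *img, int stop_after)
{
    uint32_t old_cluster = img->l1_entry;
    int new_cluster;

    assert(old_cluster < NUM_CLUSTERS);
    if (stop_after < 1) {
        return 0;
    }
    new_cluster = toy_alloc_cluster(img);    /* step 1: allocate new first */
    if (new_cluster < 0) {
        return -1;
    }
    if (stop_after < 2) {
        return 0;                            /* new cluster is leaked */
    }
    img->l1_entry = (uint32_t)new_cluster;   /* step 2: switch the pointer */
    if (stop_after < 3) {
        return 0;                            /* old cluster is leaked */
    }
    img->refcount[old_cluster]--;            /* step 3: free old last */
    return 0;
}

/* Consistent means the L1 entry points at an allocated cluster. */
static bool toy_is_consistent(const struct toy_image *img)
{
    return img->l1_entry < NUM_CLUSTERS && img->refcount[img->l1_entry] > 0;
}

/* Count allocated clusters that nothing references any more. */
static int toy_leaked_clusters(const struct toy_image *img)
{
    int leaked = 0;

    for (uint32_t i = 1; i < NUM_CLUSTERS; i++) {
        if (img->refcount[i] > 0 && i != img->l1_entry) {
            leaked++;
        }
    }
    return leaked;
}
```

In this model an interruption at any point leaves the L1 entry pointing at an allocated cluster. The only damage is a leaked cluster, which matches the worst case described above. Freeing the old cluster first would instead leave the L1 entry pointing at a free cluster for a time.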
There is no data integrity guarantee when cache=writeback or cache=none is used. (Although cache=none bypasses the page cache, the physical disk may have a volatile write cache that needs to be flushed.) QCOW2 metadata updates do not use barriers and may therefore be written to disk out of order. If disk I/O is interrupted (e.g. host power failure), the result may be an inconsistent QCOW2 image.
This could be fixed by calling fdatasync() in the right places in the QCOW2 code. However, using fdatasync() is likely to be a big performance issue, since it basically degrades to cache=writethrough. What is actually wanted is a barrier, not a flush.
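The following is a minimal POSIX sketch of using fdatasync() to enforce ordering between two dependent writes: the new table is written and flushed before the pointer that makes it live is written. It is not the qcow2 driver code. The helper names (pwrite_full_sync(), publish_table(), demo_publish()) and the file layout (PTR_OFFSET, TABLE_OFFSET) are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define PTR_OFFSET   0     /* hypothetical: 8-byte pointer to the live table */
#define TABLE_OFFSET 4096  /* hypothetical: location of the new table */

/* Write the whole buffer, then flush it. Returns 0 or a negative errno. */
static int pwrite_full_sync(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue;            /* interrupted, retry the write */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;             /* no progress, give up */
        }
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    /* The flush is what orders this write before any later one. */
    return fdatasync(fd) < 0 ? -errno : 0;
}

/* Write the table first; only once it is stable, write the pointer to it. */
static int publish_table(int fd, const void *table, size_t len,
                         uint64_t table_offset)
{
    int ret = pwrite_full_sync(fd, table, len, (off_t)table_offset);

    if (ret < 0) {
        return ret;                  /* pointer untouched, old table still live */
    }
    return pwrite_full_sync(fd, &table_offset, sizeof(table_offset),
                            PTR_OFFSET);
}

/* Demo on a temporary file: publish a table, then read the pointer back. */
static int demo_publish(void)
{
    char path[] = "/tmp/qcow2-sketch-XXXXXX";
    const char table[16] = "new L2 table";
    uint64_t ptr = 0;
    int fd = mkstemp(path);
    int ret;

    if (fd < 0) {
        return -errno;
    }
    unlink(path);
    ret = publish_table(fd, table, sizeof(table), TABLE_OFFSET);
    if (ret == 0 &&
        pread(fd, &ptr, sizeof(ptr), PTR_OFFSET) != (ssize_t)sizeof(ptr)) {
        ret = -EIO;
    }
    if (ret == 0 && ptr != TABLE_OFFSET) {
        ret = -EIO;
    }
    close(fd);
    return ret;
}
```

Each call to pwrite_full_sync() waits for the data to reach stable storage. That wait is the performance cost described above: the code only needs the first write to be ordered before the second, but it pays for a full flush to get that ordering.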
Kevin Wolf has addressed instances of the error code path and metadata update ordering issues. He has introduced the bdrv_write_sync() and bdrv_pwrite_sync() functions to ensure that metadata update ordering is preserved.
There may still be error code path and metadata update ordering issues left.
Performance could be improved using a userspace I/O barrier instead of syncs.