Documentation/Migration with shared storage

Migration in QEMU is designed assuming cache-coherent shared storage and raw format block devices. There are some cases where live migration will also work with more weakly coherent shared storage. This wiki page attempts to outline those scenarios. It also attempts to iterate through the reasons why various image formats do not support migration even with shared storage.

NFS

Background

NFS only offers close-to-open cache coherence. This means that the only guarantee provided by the protocol is that if you close a file on client A and then open the same file on another client B, client B will see client A's changes.
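To make that concrete, here is a minimal sketch of the guarantee, assuming a shared NFS mount at /mnt/nfs (the path and the writer/reader split are illustrative, not part of any real tool):

  /* coherence.c: illustrate NFS close-to-open coherence.
   * Run "./coherence writer" on client A, then "./coherence reader"
   * on client B. Only after the writer's close() and the reader's
   * open() does the protocol promise the reader sees the new data. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      const char *path = "/mnt/nfs/shared.dat";  /* assumed mount point */
      char buf[16] = {0};

      if (argc > 1 && strcmp(argv[1], "writer") == 0) {
          int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
          write(fd, "hello", 5);
          close(fd);   /* close-to-open: visibility is promised from here */
      } else {
          int fd = open(path, O_RDONLY);  /* open() revalidates the cache */
          read(fd, buf, sizeof(buf) - 1);
          printf("read: '%s'\n", buf);
          close(fd);
      }
      return 0;
  }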

The way migration works in QEMU, the source stops the guest after it sends all of the required data but does not immediately free any resources. This makes migration more reliable since it avoids the Two Generals Problem, allowing a reliable third node to make the final decision about whether migration was successful.

As soon as the destination receives all of the data, it immediately starts the guest. This means that the reliable third node is not in the critical path of migration downtime but can still recover from a failed migration.

Since the source never knows that the destination is okay, the only way to support NFS robustly would be to close all files on the source before sending the last chunk of migration data. This would mean that if any failure occurred after this point, the VM would be lost.

In Practice

A Linux NFS server that exports with 'sync' offers stronger coherence than NFS guarantees. This is an implementation detail, not a guarantee, as far as I know. If a client sends a read request, any data whose stable write has been acknowledged to any other client will be returned without the need to close and reopen the file.
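For reference, a 'sync' export looks like the following line in /etc/exports on the server (the export path and client network are assumptions for illustration):

  /srv/vm-images 192.168.0.0/24(rw,sync,no_subtree_check)

With 'sync', the server commits data to stable storage before acknowledging a write, which is what produces the stronger-than-required coherence described above.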

A file opened with O_DIRECT by the Linux NFS client code will always issue a protocol read operation for a userspace read() call. This means that if you issue stable writes (fsync) on the source and then use O_DIRECT to read on the destination, you can safely access the same file without reopening it.
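A minimal sketch of the destination-side read path under these rules, assuming the image lives at /mnt/nfs/disk.img and that 4096-byte alignment satisfies O_DIRECT on this system (real code should query the required alignment):

  /* direct_read.c: bypass the NFS client cache with O_DIRECT.
   * Each read() on an O_DIRECT descriptor becomes an NFS READ on the
   * wire, so it observes anything the source made stable with fsync(). */
  #define _GNU_SOURCE          /* O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      const size_t align = 4096;   /* assumed; O_DIRECT requires aligned
                                      buffers, offsets, and lengths */
      void *buf = NULL;
      int fd = open("/mnt/nfs/disk.img", O_RDONLY | O_DIRECT);

      if (fd < 0 || posix_memalign(&buf, align, align) != 0) {
          perror("setup");
          return 1;
      }
      ssize_t n = pread(fd, buf, align, 0);  /* first block of the image */
      printf("read %zd bytes\n", n);
      free(buf);
      close(fd);
      return 0;
  }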

Conclusion

Migration with QEMU is safe, in practice, when using Linux as both the NFS server and the NFS client, provided that the source and destination both use cache=none for the disks and a raw file.
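As a worked example, the destination side of such a migration might be started like this (the path, memory size, and port are illustrative; the important parts are format=raw and cache=none):

  qemu-system-x86_64 -m 1024 \
      -drive file=/mnt/nfs/guest.img,format=raw,cache=none,if=virtio \
      -incoming tcp:0:4444

The source would then be told to hand over with a monitor command along the lines of 'migrate tcp:destination:4444'.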

iSCSI/Direct Attached Storage

iSCSI has a similar cache coherency guarantee to direct attached storage (via Fibre Channel). Any read request will return data that has been acknowledged as written by another client.

Since QEMU issues read() requests in userspace, Linux normally satisfies them from the page cache. The Linux page cache is not coherent across multiple nodes, so the only way to access the storage coherently is to bypass the Linux page cache via cache=none.

Conclusion

iSCSI, FC, or other forms of direct attached storage are only safe to use with live migration if you use cache=none and a raw image.

Clustered File Systems

Clustered file systems such as GPFS, Ceph, GlusterFS, or GFS2 are safe to use with live migration regardless of the caching option used, as long as raw images are used.

Image Formats

Image formats are not safe to use with live migration. The reason is that QEMU caches data for image formats and does not have a mechanism to flush those caches. The following sections describe the issues with the various formats.

QCOW2

QCOW2 caches two forms of data: cluster metadata (L1/L2 tables, refcount table, etc.) and mutable header information (file size, snapshot entries, etc.).

This data needs to be discarded after the last piece of incoming migration data is received but before the guest starts.
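A minimal sketch of that ordering, using hypothetical structures and names rather than QEMU's actual qcow2 code:

  #include <stdlib.h>

  /* Hypothetical stand-in for qcow2's cached state; not QEMU code. */
  struct qcow2_state {
      void *l1_table;         /* cached cluster metadata */
      void *l2_cache;
      void *refcount_cache;
      long  file_size;        /* cached mutable header information */
  };

  /* Would run on the destination after the last chunk of migration
   * data arrives, but before the guest is allowed to execute. */
  static void invalidate_format_caches(struct qcow2_state *s)
  {
      /* Forget anything cached before the source's final writes hit
       * shared storage; it must be re-read from the image file. */
      free(s->l1_table);        s->l1_table = NULL;
      free(s->l2_cache);        s->l2_cache = NULL;
      free(s->refcount_cache);  s->refcount_cache = NULL;
      s->file_size = -1;        /* refresh from the file before next use */
  }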

QED

QED caches similar data to QCOW2. In addition, the QED header has a dirty flag that must be handled specially in the case of live migration.

Raw Files

Technically, the file size of a raw file is mutable metadata that QEMU caches. This is only applicable when using online image resizing. If you avoid online image resizing during live migration, raw files are completely safe provided the storage used meets the above requirements.
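Online resizing here means changing the size of the image while the guest runs, for example via the block_resize monitor command (the device name below is hypothetical; the size is in bytes). The safe approach is simply not to issue such a command while a migration is in flight:

  { "execute": "block_resize",
    "arguments": { "device": "drive-virtio-disk0", "size": 21474836480 } }

If the source resized the image after the destination had already cached the old size, the destination would keep using the stale value; avoiding resizes during migration sidesteps the problem entirely.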