Documentation/Migration with shared storage

From QEMU
Revision as of 10:28, 12 October 2016 by Paolo Bonzini (moved Migration/Storage to Documentation/Migration with shared storage)

QEMU migration is designed on the assumption that shared storage is cache coherent. In some cases, migration also works with shared storage that offers weaker coherence guarantees. This wiki page outlines those scenarios.

NFS

Background

NFS offers only close-to-open cache coherence. The only guarantee the protocol provides is that if you close a file on client A and then open it on client B, client B will see client A's changes.
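To make close-to-open semantics concrete, here is a toy Python model. It is not real NFS code; the `Server` and `Client` classes are hypothetical. Each client keeps its own copy of a file, fetches it from the server only on `open()`, and sends its changes back only on `close()`.

```python
class Server:
    """Authoritative copy of every file, as held by the NFS server."""

    def __init__(self):
        self.files = {}


class Client:
    """Toy NFS client with close-to-open semantics.

    open() fetches the file from the server, read() and write() touch only
    the local cache, and close() flushes dirty data back to the server.
    """

    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> locally cached contents
        self.dirty = set()  # names with writes not yet sent to the server

    def open(self, name):
        self.cache[name] = self.server.files.get(name, b"")

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.files[name] = self.cache[name]
            self.dirty.discard(name)
        del self.cache[name]


server = Server()
a, b = Client(server), Client(server)
a.open("disk.img")
b.open("disk.img")
a.write("disk.img", b"new data")
a.close("disk.img")          # A's changes reach the server on close
print(b.read("disk.img"))    # b'': B still sees its cached copy
b.close("disk.img")
b.open("disk.img")           # reopening picks up A's changes
print(b.read("disk.img"))    # b'new data'
```

Real Linux NFS clients also revalidate cached attributes after timeouts, so a stale reader may eventually see new data; the model leaves that out because the protocol does not guarantee it.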

In QEMU, the source stops the guest after it has sent all of the required data, but it does not immediately free any resources. This makes migration more reliable: it sidesteps the Two Generals Problem by letting a reliable third node make the final decision about whether migration succeeded.

As soon as the destination has received all of the data, it starts the guest. The reliable third node is therefore not on the critical path for migration downtime, but it can still recover from a failed migration.

Because the source never learns whether the destination is okay, the only way to support NFS robustly would be to close all files on the source before sending the last chunk of migration data. If any failure occurred after that point, however, the VM would be lost.
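The hand-off described above can be sketched as a small state model. This is a simplification under stated assumptions, not QEMU code: `Source`, `Destination`, `migrate` and the `close_files_early` flag are hypothetical names, and the `migrate` function plays the role of the reliable third node that decides the outcome. The flag models the NFS-robust variant in which the source closes its files before the last chunk.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Source:
    def __init__(self, close_files_early=False):
        self.guest_running = True
        self.files_open = True
        self.close_files_early = close_files_early

    def send_last_chunk(self):
        if self.close_files_early:
            # NFS-robust variant: give up the files before the last chunk.
            self.files_open = False
        # The guest stops, but resources are kept in case it must resume here.
        self.guest_running = False

    def resolve(self, outcome):
        if outcome is Outcome.SUCCESS:
            self.files_open = False      # now it is safe to free resources
        elif self.files_open:
            self.guest_running = True    # failed migration: resume on the source
        # Otherwise the files are already closed and the VM cannot resume: lost.


class Destination:
    def __init__(self):
        self.guest_running = False

    def receive_all(self):
        # Start at once; the third node is not on the downtime critical path.
        self.guest_running = True


def migrate(data_arrives, close_files_early=False):
    """Acts as the third node: observes both sides, then tells the source."""
    src, dst = Source(close_files_early), Destination()
    src.send_last_chunk()
    if data_arrives:
        dst.receive_all()
    outcome = Outcome.SUCCESS if dst.guest_running else Outcome.FAILURE
    src.resolve(outcome)
    return src, dst
```

With the default behavior, a failed transfer (`migrate(False)`) leaves the guest running again on the source; with `close_files_early=True` the same failure leaves the guest running nowhere, which is the trade-off the text describes.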

In Practice

A Linux NFS server that exports with 'sync' offers stronger coherence than the NFS protocol guarantees. As far as I know, this is an implementation detail rather than a guarantee. When a client sends a read request, the server returns any data that another client has written with an acknowledged stable write, without requiring the reader to close and reopen the file.

With the Linux NFS client, a userspace read() on a file opened with O_DIRECT always issues a protocol read operation. Therefore, if the source issues stable writes (fsync) and the destination reads with O_DIRECT, both can safely access the same file without reopening it.
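The fsync-then-O_DIRECT pattern can be sketched in Python on Linux as below. The helper names are hypothetical, and the 4096-byte alignment is an assumption that satisfies most filesystems (O_DIRECT generally needs aligned buffers, offsets and sizes). An anonymous `mmap` is used as the read buffer because it is page-aligned.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment for buffers, offsets and sizes


def stable_write(fd, offset, data):
    """Source side: write, then fsync so the data is stable on the server."""
    os.pwrite(fd, data, offset)
    os.fsync(fd)


def open_direct(path):
    """Destination side: open with O_DIRECT so reads bypass the client cache."""
    return os.open(path, os.O_RDONLY | os.O_DIRECT)


def direct_read(fd, offset, length):
    """Read `length` bytes at `offset` into a page-aligned buffer."""
    buf = mmap.mmap(-1, length)  # anonymous mmap memory is page-aligned
    try:
        n = os.preadv(fd, [buf], offset)
        return buf[:n]
    finally:
        buf.close()
```

In a real deployment `stable_write` would run on the source host and `open_direct`/`direct_read` on the destination host, both against the same NFS-mounted path. In QEMU, cache=none is the setting that makes disk I/O use O_DIRECT.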

Conclusion

In practice, migration with QEMU is safe when Linux is used as both the NFS server and the NFS client, provided that both the source and the destination use cache=none for their disks.
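As an illustration of the cache=none requirement, the sketch below builds matching QEMU command lines for the source and destination. The image path, the `qemu-system-x86_64` binary and port 4444 are placeholder assumptions, and the helper functions are hypothetical. Literal commas in the path are doubled because QEMU's option parser uses ',' as a separator.

```python
def qemu_drive_args(image_path, image_format):
    """Return a -drive option that bypasses the host page cache."""
    # A literal comma inside an option value is written as ',,'.
    escaped = image_path.replace(",", ",,")
    return ["-drive", f"file={escaped},format={image_format},cache=none"]


def source_args(image_path, image_format):
    return ["qemu-system-x86_64"] + qemu_drive_args(image_path, image_format)


def destination_args(image_path, image_format, port=4444):
    # Same disk configuration, plus -incoming so QEMU waits for migration data.
    return source_args(image_path, image_format) + ["-incoming", f"tcp:0:{port}"]


print(" ".join(destination_args("/mnt/nfs/guest.qcow2", "qcow2")))
```

The important point is that both sides use the same disk options; a source running with a caching mode other than none would not make its writes stable on the server before the destination reads them.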

iSCSI/Direct Attached Storage

iSCSI offers a cache coherency guarantee similar to that of direct attached storage (such as Fibre Channel): any read request returns data that another client has written and had acknowledged.

Because QEMU issues read() calls from userspace, Linux normally serves them through its page cache. The Linux page cache is not coherent across nodes, so the only way to access the storage coherently is to bypass the page cache, either with cache=none or with QEMU's built-in iSCSI block driver (using iscsi:// URIs).
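The two ways of bypassing the page cache can be sketched as drive options. The helpers, host address, IQN and device name are hypothetical placeholders; the URL follows QEMU's documented iscsi://[user[%password]@]host[:port]/target-iqn/lun form (credentials omitted here), and 3260 is the standard iSCSI port.

```python
def iscsi_url(host, target_iqn, lun, port=3260):
    """URL for QEMU's built-in iSCSI driver (credentials omitted)."""
    return f"iscsi://{host}:{port}/{target_iqn}/{lun}"


def iscsi_drive_arg(host, target_iqn, lun, port=3260):
    # QEMU talks to the target itself, so the host page cache is never used.
    return ["-drive", f"file={iscsi_url(host, target_iqn, lun, port)},format=raw"]


def host_block_drive_arg(device):
    # LUN attached through the host kernel (iSCSI initiator or FC HBA):
    # the page cache must be bypassed explicitly with cache=none.
    return ["-drive", f"file={device},format=raw,cache=none"]


print(iscsi_drive_arg("192.0.2.10", "iqn.2001-04.com.example:storage", 0))
print(host_block_drive_arg("/dev/sdb"))
```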

Conclusion

iSCSI, FC, or other forms of direct attached storage are only safe to use with live migration if you use cache=none.

Clustered File Systems

Clustered file systems such as GPFS, Ceph, GlusterFS, or GFS2 are safe to use with live migration regardless of the caching option used.

Image Formats

An image format is safe to use with live migration if QEMU either does not cache its data or implements a mechanism to flush those caches. The following sections describe the issues with the various formats.

QCOW2

QCOW2 caches two kinds of data: cluster metadata (L1/L2 tables, the refcount table, etc.) and mutable header information (file size, snapshot entries, etc.). This data is discarded after the last piece of incoming migration data is received but before the guest starts, so QCOW2 images are safe to use with migration.
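Why the discard step matters can be shown with a toy model of a metadata cache. This is not QEMU's QCOW2 code: `CachedImage` and `on_incoming_migration_complete` are hypothetical, and a shared dict stands in for the image file on shared storage. The destination opens the image, and may read metadata, while the source guest is still running and allocating clusters.

```python
class CachedImage:
    """Toy image format that caches metadata read from shared storage.

    `storage` is a dict shared by both hosts and stands in for the image
    file; its "meta" entry stands in for the L1/L2 and refcount tables.
    """

    def __init__(self, storage):
        self.storage = storage
        self._meta = None  # in-memory metadata cache, filled lazily

    def metadata(self):
        if self._meta is None:
            self._meta = dict(self.storage["meta"])  # copy, as if read from disk
        return self._meta

    def invalidate_cache(self):
        # Forget everything cached; the next access re-reads shared storage.
        self._meta = None


def on_incoming_migration_complete(image):
    """Runs after the last migration data arrives, before the guest starts."""
    image.invalidate_cache()


shared = {"meta": {"cluster0": 0x10000}}
dst = CachedImage(shared)
dst.metadata()                          # destination caches metadata early
shared["meta"]["cluster1"] = 0x20000    # source guest allocates a cluster
print("cluster1" in dst.metadata())     # False: stale cache
on_incoming_migration_complete(dst)
print("cluster1" in dst.metadata())     # True: re-read after invalidation
```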

QED

QED caches similar data to QCOW2. In addition, the QED header has a dirty flag that is normally checked when the image is opened, so that inconsistent metadata can be repaired. This check is undesirable during live migration, because the dirty flag may be set while the source host is still modifying the image file. The check is therefore postponed until migration completes, so QED images are safe to use with migration.
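The deferred check can be sketched as follows. This is a toy model, not QEMU's QED driver: `QedLikeImage` and its methods are hypothetical, and a shared dict stands in for the on-disk header. On a normal open the dirty flag is checked immediately; on an incoming migration the check waits until the source has stopped.

```python
class QedLikeImage:
    """Toy model of a header dirty flag whose check can be deferred."""

    def __init__(self, header, incoming=False):
        self.header = header   # shared with the source host
        self.checked = False
        if not incoming:
            # Normal open: check and repair right away.
            self.check_and_repair()
        # Incoming migration: the source may still be writing and may
        # legitimately have the flag set, so the check waits.

    def check_and_repair(self):
        if self.header["dirty"]:
            # A real implementation would scan and fix metadata here.
            self.header["dirty"] = False
        self.checked = True

    def migration_complete(self):
        # The source guest has stopped, so checking is now safe.
        self.check_and_repair()


header = {"dirty": True}   # source is in the middle of an allocation
img = QedLikeImage(header, incoming=True)
print(header["dirty"], img.checked)   # True False
img.migration_complete()
print(header["dirty"], img.checked)   # False True
```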

Raw Files

Technically, the size of a raw file is mutable metadata that QEMU caches, but this only matters when the image is resized online. If you avoid online image resizing during live migration, raw files are completely safe, provided the storage meets the requirements above.
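The rules on this page can be summarized as a small checker. This is a sketch of the page's conclusions, not a QEMU or libvirt API: the function names and the storage labels ("linux-nfs" for Linux as both NFS server and client, "qemu-iscsi" for iscsi:// URIs, and so on) are hypothetical. Anything the page does not cover is conservatively treated as unsafe.

```python
CLUSTERED_FS = {"gpfs", "ceph", "glusterfs", "gfs2"}
DIRECT_ATTACHED = {"iscsi", "fc", "das"}  # accessed through the host kernel


def storage_safe(storage, cache):
    if storage in CLUSTERED_FS:
        return True                 # safe with any cache mode
    if storage == "qemu-iscsi":
        return True                 # iscsi:// URIs bypass the host page cache
    if storage == "linux-nfs" or storage in DIRECT_ATTACHED:
        return cache == "none"
    return False                    # not covered by this page: assume unsafe


def format_safe(image_format, online_resize):
    if image_format in ("qcow2", "qed"):
        return True                 # caches dropped or checks deferred at switchover
    if image_format == "raw":
        return not online_resize    # the cached file size could go stale
    return False                    # not covered by this page: assume unsafe


def migration_safe(storage, cache, image_format, online_resize=False):
    return storage_safe(storage, cache) and format_safe(image_format, online_resize)
```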