Documentation/Migration with shared storage: Difference between revisions
Revision as of 11:32, 20 February 2013
Migration in QEMU is designed assuming cache coherent shared storage. There are some cases where migration will also work with more weakly coherent shared storage. This wiki page attempts to outline those scenarios.
NFS
Background
NFS only offers close-to-open cache coherence. This means that the only guarantee provided by the protocol is that if you close a file in a client A and then open the file in another client B, client B will see client A's changes.
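The guarantee can be illustrated with a toy model. This is only a sketch of the minimum that close-to-open coherence promises: the class and method names are invented for illustration, and real NFS clients often revalidate their caches sooner than this model does.

```python
class NfsServer:
    """Holds the authoritative contents of each file (toy model)."""

    def __init__(self):
        self.files = {}


class NfsClient:
    """Toy client: fetches data at open() and flushes writes at close()."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # contents fetched when the file was opened
        self.dirty = {}  # local writes not yet sent to the server

    def open(self, name):
        self.cache[name] = self.server.files.get(name, b"")

    def write(self, name, data):
        self.dirty[name] = data

    def read(self, name):
        # A client sees its own writes; otherwise it serves the cached copy.
        return self.dirty.get(name, self.cache[name])

    def close(self, name):
        if name in self.dirty:
            self.server.files[name] = self.dirty.pop(name)
        self.cache.pop(name, None)


server = NfsServer()
client_a, client_b = NfsClient(server), NfsClient(server)

client_a.open("disk.img")
client_b.open("disk.img")
client_a.write("disk.img", b"new data")
client_a.close("disk.img")

stale = client_b.read("disk.img")  # B has not reopened: no guarantee it sees A's write

client_b.close("disk.img")
client_b.open("disk.img")
fresh = client_b.read("disk.img")  # after A's close and B's open, the write is visible
```

In this model `stale` is still the empty contents that client B cached at open time, while `fresh` holds client A's data.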
In QEMU's migration design, the source stops the guest after it sends all of the required data but does not immediately free any resources. This makes migration more reliable: it avoids the Two Generals Problem by allowing a reliable third node to make the final decision about whether migration was successful.
As soon as the destination receives all of the data, it immediately starts the guest. This means that the reliable third node is not in the critical path of migration downtime but can still recover a failed migration.
Since the source never knows that the destination is okay, the only way to support NFS robustly would be to close all files on the source before sending the last chunk of migration data. This would mean that if any failure occurred after this point, the VM would be lost.
In Practice
A Linux NFS server that exports with 'sync' offers stronger coherency than the NFS protocol guarantees. As far as I know, this is an implementation detail rather than a guarantee. If a client sends a read request, the server returns any data that another client has written with a stable write that has been acknowledged, without the need to close and reopen the file.
With the Linux NFS client, a userspace read() call on a file opened with O_DIRECT will always issue a protocol read operation. This means that if you issue stable writes (fsync) on the source and then use O_DIRECT to read on the destination, you can safely access the same file without reopening.
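The two access patterns described above can be sketched as follows. This is a Linux-only illustration, not QEMU's implementation: the function names and the 4096-byte alignment are assumptions, and the real alignment requirement depends on the filesystem and device.

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed alignment for O_DIRECT; the real value is filesystem-dependent


def write_stable(path, offset, data):
    """Source side: write data and fsync it before reporting success."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.pwrite(fd, view, offset)
            view = view[written:]
            offset += written
        os.fsync(fd)  # data is on stable storage once this returns
    finally:
        os.close(fd)


def read_direct(path, offset, length):
    """Destination side: read with O_DIRECT so the page cache is bypassed."""
    if length <= 0 or offset % BLOCK_SIZE or length % BLOCK_SIZE:
        raise ValueError("offset and length must be positive multiples of BLOCK_SIZE")
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT reads require.
        with mmap.mmap(-1, length) as buf:
            n = os.preadv(fd, [buf], offset)
            return buf[:n]
    finally:
        os.close(fd)
```

Not every filesystem accepts O_DIRECT; where it is unsupported, the open() call in `read_direct` fails with an OSError.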
Conclusion
Migration with QEMU is safe, in practice, when using Linux as an NFS server and client when both the source and destination are using cache=none for the disks.
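As a rough sketch, the snippet below builds matching `-drive` options with cache=none for the source and the destination. The image path, the `qemu-system-x86_64` binary name, and the `tcp:0:4444` listen address are placeholders chosen for illustration, and a real invocation needs many more options (memory, machine type, and so on).

```python
def drive_option(image_path, image_format="raw"):
    """Return a -drive value that opens the image with cache=none."""
    # QEMU option strings use commas as separators; a literal comma is doubled.
    escaped = image_path.replace(",", ",,")
    return f"file={escaped},format={image_format},cache=none"


def source_args(image_path):
    """Argument list for the source guest (placeholder binary name)."""
    return ["qemu-system-x86_64", "-drive", drive_option(image_path)]


def destination_args(image_path, listen="tcp:0:4444"):
    """Same disk configuration, plus -incoming to wait for the migration stream."""
    return source_args(image_path) + ["-incoming", listen]
```

Both sides must use cache=none: the source so that its writes reach the server, and the destination so that its reads are not served from a stale page cache.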
iSCSI/Direct Attached Storage
iSCSI has a similar cache coherency guarantee to direct attached storage (via fibre channel). Any read request will return data that has been acknowledged as written by another client.
Since QEMU issues read() requests in userspace, Linux normally uses the page cache. The Linux page cache is not coherent across multiple nodes so the only way to safely access storage coherently is to bypass the Linux page cache via cache=none or the QEMU iscsi block driver (using iscsi:// URIs).
Conclusion
iSCSI, FC, or other forms of direct attached storage are only safe to use with live migration if you use cache=none or, for iSCSI, the QEMU iscsi block driver.
Clustered File Systems
Clustered file systems such as GPFS, Ceph, GlusterFS, or GFS2 are safe to use with live migration regardless of the caching option used.
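The storage conclusions so far can be condensed into a small helper. This is a hypothetical summary function, not part of QEMU; the storage labels and parameter names are invented for illustration.

```python
def storage_safe_for_migration(storage, cache_mode, iscsi_driver=False):
    """Summarize the storage conclusions on this page.

    storage: "nfs" (Linux server and client), "block" (iSCSI, FC, or
    other direct attached storage), or "clustered" (GPFS, Ceph, etc.).
    """
    if storage == "clustered":
        return True  # safe regardless of the caching option
    if storage == "nfs":
        return cache_mode == "none"  # safe in practice, not by protocol guarantee
    if storage == "block":
        # Either bypass the page cache, or use QEMU's iscsi:// driver for iSCSI.
        return cache_mode == "none" or iscsi_driver
    raise ValueError(f"unknown storage type: {storage!r}")
```

For example, `storage_safe_for_migration("nfs", "writeback")` returns False, while any cache mode is accepted for `"clustered"`.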
Image Formats
Image formats are safe to use with live migration if QEMU either does not cache data for the format or implements a mechanism to flush those caches. The following sections describe the issues with the various formats.
QCOW2
QCOW2 caches two forms of data: cluster metadata (L1/L2 data, refcount table, etc.) and mutable header information (file size, snapshot entries, etc.). This data is discarded after the last piece of incoming migration data is received but before the guest starts, hence QCOW2 images are safe to use with migration.
QED
QED caches similar data to QCOW2. In addition, the QED header has a dirty flag that is normally checked when an image is opened, in order to fix inconsistent metadata. This is undesirable during live migration since the dirty bit may be set if the source host is modifying the image file. The check is postponed until migration completes, hence QED images are safe to use with migration.
Raw Files
Technically, the file size of a raw file is mutable metadata that QEMU caches. This is only applicable when using online image resizing. If you avoid online image resizing during live migration, raw files are completely safe provided the storage used meets the above requirements.