Features/LiveBlockMigration
Revision as of 15:12, 21 June 2011
Future versions of qemu are expected to support these features (some are already implemented):
Live features
Live block copy
Ability to copy one or more virtual disks from the source backing file/block device to a new target that is accessible by the host. The copy is supposed to run transparently while the VM is running.
Status: code exists (by Marcelo) today in qemu but needs refactoring due to a race condition at the end of the copy operation. We agreed that a re-implementation of the copy operation should take place that makes sure the image is completely mirrored until management decides what copy to keep.
Live snapshots and live snapshot merge
Live snapshot is already incorporated (by Jes) in qemu (it still needs qemu-agent work to freeze the guest FS).
Live snapshot merge is required in order to reduce the overhead caused by the additional snapshots (sometimes over a raw device). It is currently not implemented for a live, running guest.
Possibility: enhance live copy or image streaming to be used for live snapshot merge. It is almost the same mechanism.
Copy on read (image streaming)
Ability to start guest execution while the parent image resides remotely, with each block access replicated to a local copy (image format snapshot).
It would be nice to have a general mechanism that can be used for all image formats. What about the protocol to access these blocks over the net? We can reuse existing ones (nbd/iscsi).
Such functionality can be hooked together with live block migration instead of the 'post copy' method.
Live block migration (pre/post)
Beyond live block copy we'll sometimes need to move both the storage and the guest. There are two main approaches here:
- Pre copy: first live copy the image and only then live migrate the VM. It is simple, but if the purpose of the whole live block migration was to balance the cpu load, it won't be practical to use, since copying an image of 100GB will take too long.
- Post copy: first live migrate the VM, then live copy its blocks. It's a better approach for HA/load balancing, but it might make management complex (we need to keep the source VM alive; what happens on failures?). Using copy on read might simplify it: post copy = live snapshot + copy on read.
In addition there are two cases for the storage access:
1. The source block device is shared and can be easily accessed by the destination qemu-kvm process. That's the easy case; no special protocol is needed for copying the block devices.
2. There is no shared storage at all. This means we should implement a block access protocol over the live migration fd :(
We need to choose whether to implement a new one, or re-use NBD or iSCSI (target & initiator).
Using external dirty block bitmap
FVD has an option to use an external dirty block bitmap file in addition to the regular mapping/data files.
We can consider using it for live block migration and live merge too. It could also allow 3rd party tools to calculate diffs between the snapshots. There is a big downside though: it will make management complicated, and there is the risk of the image and its bitmap file getting out of sync. It's a much better choice to have the qemu-img tool be the single interface to the dirty block bitmap data.
Summary
- We need Marcelo's new (to come) block copy implementation
- It should work in parallel to migration and hotplug
- General copy on read is desirable
- Live snapshot merge to be implemented using block copy
- Need to utilize a remote block access protocol (iscsi/nbd/other). Which one is the best?
- Keep qemu-img the single interface for dirty block mappings.
- Live block migration pre copy == live copy + block access protocol + live migration
- Live block migration post copy == live migration + block access protocol/copy on read.
Common implementation for live block copy and image streaming
Requirements
- Live block copy:
  - Open both source and destination read-write
  - Mirror writes to source and destination
- Image streaming:
  - Source must be read-only
- Be able to do a "partial" copy, i.e. share a backing file and don't copy data that can be read from this backing file
- Use the COW provided by image formats, so that a crash doesn't force you to restart the whole copy
  - Need a raw-cow driver that uses an in-memory COW bitmap for a raw image, so that we can use raw output images
  - Anything that can do COW is good enough, so the implementation should be independent of the format
- libvirt has already implemented block_stream QMP commands, so let's keep them unchanged if possible
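The raw-cow idea above can be illustrated with a minimal sketch (plain Python; all names are hypothetical, this is not QEMU code): an in-memory bitmap records which clusters of a raw destination already hold data, so reads fall through to the backing source until a cluster has been written.

```python
# Minimal sketch of a raw-cow driver: an in-memory COW bitmap over a raw
# destination image. All names are hypothetical, not QEMU APIs.

CLUSTER_SIZE = 4  # tiny clusters so the example stays readable

class RawCow:
    def __init__(self, backing: bytes, size: int):
        self.backing = backing          # read-only source ("backing file")
        self.data = bytearray(size)     # raw destination image
        nclusters = (size + CLUSTER_SIZE - 1) // CLUSTER_SIZE
        self.allocated = [False] * nclusters  # in-memory COW bitmap

    def is_allocated(self, cluster: int) -> bool:
        return self.allocated[cluster]

    def write_cluster(self, cluster: int, buf: bytes):
        off = cluster * CLUSTER_SIZE
        self.data[off:off + len(buf)] = buf
        self.allocated[cluster] = True  # cluster now lives in the destination

    def read_cluster(self, cluster: int) -> bytes:
        off = cluster * CLUSTER_SIZE
        if self.allocated[cluster]:
            return bytes(self.data[off:off + CLUSTER_SIZE])
        return self.backing[off:off + CLUSTER_SIZE]  # fall through to source
```

Because the bitmap lives only in memory, a crash forgets which clusters were already copied; format-level COW (e.g. qcow2 allocation metadata) persists that information on disk, which is why any COW-capable format is good enough for restartable copies.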
Building blocks
1. Copy on read
   - Improves the time it takes to complete the copy
   - Allows a simpler implementation of the copy
2. Background task doing the copy (by reading the whole image sequentially)
3. Mirroring of writes to source and destination
   - The source is a backing file of the destination, i.e. we need to support writable backing files
4. Wrappers for live block copy and image streaming that enable the above features
5. Switching the backing file after the copy has completed (a separate monitor command, like in block copy?). This ends the mirroring if enabled.
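Two of the building blocks, copy on read and write mirroring, can be sketched as follows (hypothetical Python, not QEMU code; images are modelled as plain dicts mapping cluster number to bytes):

```python
# Hypothetical sketch of two building blocks: copy on read and mirrored
# writes. An "image" here is a dict mapping cluster number -> bytes.

def read_with_copy_on_read(source: dict, dest: dict, cluster: int) -> bytes:
    """Read a cluster; populate the destination as a side effect."""
    if cluster in dest:                 # already copied: purely local read
        return dest[cluster]
    data = source[cluster]              # falls through to the source
    dest[cluster] = data                # copy on read: remember it locally
    return data

def mirrored_write(source: dict, dest: dict, cluster: int, data: bytes):
    """Mirror a guest write to both images, so source and destination
    read the same at any time (the live block copy case)."""
    source[cluster] = data
    dest[cluster] = data
```

Copy on read makes the background task cheaper (anything the guest touches is already local), while mirroring is what distinguishes live block copy from image streaming, where the source stays read-only.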
Example (live block copy)
This section tries to explain by example what happens when you perform a live block copy. To make things interesting, we have a backing file (base) that is shared between source and destination, and two overlays (sn1 and sn2) that are only present on the source and must be copied.
So initially the backing file chain looks like this:
base <- sn1 <- sn2
We start by creating the copy as a new image on top of sn2:
base <- sn1 <- sn2 <- copy
This gets all reads right automatically. For writes we use a mirror mechanism that redirects all writes to both sn2 and copy, so that sn2 and copy read the same at any time (and this mirror mechanism is really the only difference of live block copy from image streaming).
We start a background task that loops over all clusters. For each cluster, there are the following possible cases:
1. The cluster is already allocated in copy. Nothing to do.
2. Use something like bdrv_is_allocated() to follow the backing file chain. If the cluster is read from base (which is shared), nothing to do.
   - For image streaming we're already on the destination. If we don't have shared storage and the protocol doesn't provide this bdrv_is_allocated() variant, we can still do something like qemu-img rebase and just compare the read data with the data in base in order to decide if we need to copy.
3. Otherwise, copy the cluster into copy.
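The background task with the three cases above might look like this sketch (hypothetical Python; each image is a dict mapping cluster number to bytes, and bdrv_is_allocated() is modelled as a walk down the chain):

```python
# Sketch of the background copy task over the chain base <- sn1 <- sn2,
# copying into 'copy'. Illustrative only; these are not QEMU APIs.

def background_copy(chain: list, copy: dict, nclusters: int):
    """chain: the images below 'copy', deepest first, e.g. [base, sn1, sn2].
    chain[0] (base) is shared with the destination, so clusters served
    from it need not be copied."""
    base = chain[0]
    for cluster in range(nclusters):
        if cluster in copy:
            continue                    # case 1: already allocated in copy
        # Walk the backing chain top-down, like bdrv_is_allocated()
        for image in reversed(chain):
            if cluster in image:
                if image is base:
                    break               # case 2: read from shared base, skip
                copy[cluster] = image[cluster]  # case 3: copy the cluster
                break
```

After the loop finishes, every cluster reachable only through sn1 or sn2 lives in copy, so the backing file can be switched to base as described below.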
When the copy has completed, the backing file of copy is switched to base (in a qemu-img rebase -u way).