Features/Snapshots
Live Snapshots
This document describes the current design of live snapshots for QEMU. It is a work in progress, and details may change as the work proceeds.
Overall concept
The idea is to be able to issue a command to QEMU via the monitor or QMP, which causes QEMU to create a new snapshot image with the original image as the backing file, mounted read-only. This will allow the original image file to be backed up.
Rolling back to a previous version requires booting from the previous backing file, at which point the snapshot file becomes invalid. Unfortunately, there is no way to detect that a backing file has been booted, so administrators must take care not to rely on snapshot files being valid after a roll-back.
The snapshot image must be in a format that supports backing files, i.e. QCOW2 or QED; the original image, however, can be of any supported format. For example, it is possible to make a QCOW2 snapshot of a RAW image, or a QED snapshot of a QED image.
Guest Agent
Certain operations in the snapshot process can be improved through support from within the guest. These features will be implemented in the Guest Agent. Please check the Guest Agent page for design and implementation details.
The two main guest agent features of interest to live snapshots are:
- File system freeze (fsfreeze/fsthaw): This puts the guest file systems into a consistent state, avoiding the need for fsck next time they are mounted.
- Guest application notification: This allows guest applications to register to be notified prior to a snapshot, so that they can flush their data to disk. This is a future feature!
As of this writing (July 25, 2011), communication with the QEMU guest agent is performed via a virtio serial channel. Commands are sent over the channel encoded as QMP commands, and replies are encoded as QMP replies. There are future plans to implement a passthrough mechanism for agent commands issued via QMP, allowing these commands to be accessible via the QMP monitor instead of an external agent socket on the host.
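As a rough illustration of the encoding described above, the following Python sketch builds an agent command as a single JSON line and sends it over a host-side UNIX socket. The helper names, the socket path argument, and the assumption that replies arrive newline-terminated are illustrative only and are not taken from this page.

```python
import json
import socket


def encode_agent_command(name, arguments=None):
    """Encode an agent command as one QMP-style JSON line (bytes)."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def send_agent_command(socket_path, name, arguments=None, timeout=10.0):
    """Send one command to the agent channel and return the decoded reply.

    socket_path is hypothetical: it stands for whatever host-side UNIX
    socket the virtio serial channel was bound to when the guest started.
    Assumes the reply is a single newline-terminated JSON object.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(encode_agent_command(name, arguments))
        buffer = b""
        while not buffer.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buffer += chunk
    if not buffer:
        raise ConnectionError("agent channel closed without a reply")
    return json.loads(buffer.decode("utf-8"))
```

A caller might invoke `send_agent_command(path, "guest-fsfreeze-freeze")`, where `path` is the socket chosen when the guest was launched.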
Note that guest agent collaboration is also needed for snapshots using other methods, such as snapshots performed on btrfs, LVM, enterprise storage, etc.
Snapshot command flow
The snapshot command flow is as follows. Commands are demonstrated using monitor commands for QEMU and agent commands are marked (agent). See the Guest Agent: Example Usage page for details on the specific command implementation for the guest agent commands.
- Run the guest, if not currently running:
(qemu) cont
- RECOMMENDED: Call guest agent requesting it to freeze all file systems and flush all I/O requests. Note that this runs on the guest, and as such the guest must currently be running:
(agent) guest-fsfreeze-freeze
- Initiate synchronous snapshot of device <blockX> to new device <snapshot-file>:
(qemu) snapshot_blkdev <blockX> <snapshot-file> <format>
Note: The above will write the COW headers to the snapshot device and pivot the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU which generates the COW headers in the new snapshot file. During snapshot creation, QEMU momentarily halts the guest: pending I/Os are flushed to disk, the COW headers are created in the snapshot file/device, and QEMU replaces the file backing device <blockX> with the new snapshot file. When the command returns, the guest resumes running, unless the admin tool had explicitly stopped the guest with the optional stop command.
This command is repeated for each device that is to be snapshotted.
- Call guest agent requesting it to thaw/unfreeze all file systems within the guest (if guest-fsfreeze-freeze was issued above):
(agent) guest-fsfreeze-thaw
At this point, the snapshot for the device is complete, and QEMU has pivoted the guest to the new snapshot file for execution.
To visualize this sequence, below are call sequences showing the order and direction of these commands going to both QEMU and the guest agent:
Minimum set of commands:
     Guest          Manager                     QEMU
    -------         --------                   -------
       |               |                          |
       |               |                          |
       | <<- freeze ---o                          |
       |               |                          |
       |               o--- snapshot_blkdev --->> |
       |               |                          |
       | <<- thaw -----o                          |
       |               |                          |
       |               |                          |
       |               |                          |
       =               =                          =
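The minimum command sequence can be sketched from the manager's point of view as follows. This is only an outline under stated assumptions: `agent` and `monitor` are hypothetical callables that deliver one command string to the guest agent and to the QEMU monitor respectively and raise an exception on failure; how they talk to QEMU is left open.

```python
def live_snapshot(agent, monitor, devices, fmt="qcow2"):
    """Freeze the guest, snapshot each device, then always thaw.

    agent, monitor: hypothetical callables taking a command string.
    devices: dict mapping block device name -> new snapshot file.
    """
    agent("guest-fsfreeze-freeze")
    try:
        for device, snapshot_file in devices.items():
            # One snapshot_blkdev call per device, as in the flow above.
            monitor(f"snapshot_blkdev {device} {snapshot_file} {fmt}")
    finally:
        # Thaw even if a snapshot failed, so the guest is not left frozen.
        agent("guest-fsfreeze-thaw")
```

Placing the thaw in a `finally` clause is a design choice of this sketch, not something the page prescribes; it reflects that a frozen guest file system should not outlive a failed snapshot attempt.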
HMP command
The HMP (monitor) command is designed to be flexible enough to handle both internal and external snapshots, as well as snapshots to various different snapshot file formats.
snapshot_blkdev device snapshot-file [format]:
Parameter | Description |
---|---|
device | block device to snapshot |
snapshot-file | target snapshot file (new image filename) |
format | format of snapshot image, valid formats are QCOW2 & QED. If not specified, the image will default to QCOW2. |
QMP command
The QMP command matches the behaviour of the human monitor command, except it is named slightly differently to match the fact that the command is synchronous.
blockdev-snapshot-sync device snapshot-file [format]
Parameter (JSON String) | Description |
---|---|
device | block device to snapshot |
snapshot-file | target snapshot file (new image filename) |
format | format of snapshot image, valid formats are QCOW2 & QED. If not specified, the image will default to QCOW2. |
Here is an example of a QMP snapshot command, in JSON format:
{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "virtio0", "snapshot-file": "/some/place/my-image", "format": "qcow2" } }
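For tools that generate this JSON programmatically, a minimal Python sketch might look like the following. The function names are hypothetical, and the lowercase format check is an assumption based on the example above. The `qmp_capabilities` negotiation step is part of the QMP protocol itself: a client must send it after reading the server greeting and before any other command.

```python
import json

# Formats this page lists as valid for the snapshot image.
VALID_FORMATS = ("qcow2", "qed")


def build_snapshot_command(device, snapshot_file, fmt=None):
    """Build the blockdev-snapshot-sync command as a Python dict."""
    if fmt is not None and fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported snapshot format: {fmt!r}")
    arguments = {"device": device, "snapshot-file": snapshot_file}
    if fmt is not None:
        # Omitting the format leaves the default choice to QEMU.
        arguments["format"] = fmt
    return {"execute": "blockdev-snapshot-sync", "arguments": arguments}


def qmp_session_lines(command):
    """Yield the JSON lines a client writes after reading the greeting."""
    yield json.dumps({"execute": "qmp_capabilities"})
    yield json.dumps(command)
```

For example, `build_snapshot_command("virtio0", "/some/place/my-image", "qcow2")` produces the same structure as the JSON shown above.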
Future features
Internal snapshots to images which support internal snapshots (QCOW2 & QED) are not expected to be supported initially.
There have been requests and suggestions for a number of alternative and enhanced interfaces for accessing live snapshots as follows:
Atomic Snapshots of Multiple Devices
There has been some concern with the current snapshot_blkdev command; namely, it performs snapshots one device at a time, even if a guest has multiple devices. This can be troublesome when a snapshot fails. While QEMU will revert to the original backing store should a snapshot fail, the guest could still be left with some disks already snapshotted and others not, so that its disks are inconsistent with one another.
A proposed new group of commands, Snapshot Sets, will allow multiple devices to be queued for a snapshot, with the snapshot for all devices happening during a single command. This will allow a snapshot to be taken of an entire set; if any one device fails, the entire set is reverted to the original backing store.
The proposed commands are:
- snapshot_set_create(id)
- snapshot_set_destroy(id)
- snapshot_set_add(id, device, snapshot-file, format)
- snapshot_set_execute(id)
For more details on this proposed API, please see: Atomic Snapshots of Multiple Devices
Example Snapshot Sets Command Sequence
Below is an example command sequence in which an arbitrary number of devices are added to a snapshot set and a snapshot is taken of the entire set, with the set forgotten at the end of the snapshot:
     Guest          Manager                                            QEMU
    -------         --------                                          -------
       |               |                                                 |
       |               |                                                 |
       |               o--- snapshot_set_create(1234) --------------->>  |
       |               |                                                 |
       |               |                                                 |
       |               o--- snapshot_set_add(1234, "virtio0",            |
       |               |           "/some/place/my-image0",              |
       |               |           "qcow2" ) ----------->>               |
       |               |                                                 |
       |               |                                                 |
       |               o--- snapshot_set_add(1234, "virtio0",            |
       |               |           "/some/place/my-image0",              |
       |               |           "qcow2" ) ----------->>               |
       |               |                                                 |
       |               |                                                 |
       .               .                                                 .
       .               .                                                 .
       .               .                                                 .
       |               o--- snapshot_set_add(1234, "virtioX",            |
       |               |           "/some/place/my-imageX",              |
       |               |           "qcow2" ) ----------->>               |
       |               |                                                 |
       |               |                                                 |
       *<--- freeze ---o                                                 |
       |               |                                                 |
       |               o--- snapshot_set_execute(id) ----------------->> |
       |               |                                                 |
       *<--- thaw -----o                                                 |
       |               |                                                 |
       |               |                                                 |
       |               |                                                 |
       =               =                                                 =
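The all-or-nothing behaviour intended for Snapshot Sets can be modelled with a small Python toy. This is not QEMU code and not the proposed API itself: the class, its attributes, and the `take_snapshot` hook are hypothetical and exist only to show the revert-on-failure semantics. Forgetting the set after a failed execute is an assumption of this sketch.

```python
class SnapshotSetModel:
    """Toy model of the proposed snapshot set semantics (illustrative only)."""

    def __init__(self, active_images):
        # Maps device name -> image file currently backing that device.
        self.active = dict(active_images)
        self.sets = {}

    def create(self, set_id):
        self.sets[set_id] = []

    def destroy(self, set_id):
        del self.sets[set_id]

    def add(self, set_id, device, snapshot_file, fmt="qcow2"):
        self.sets[set_id].append((device, snapshot_file, fmt))

    def execute(self, set_id, take_snapshot):
        """Snapshot every queued device, or revert all of them on failure.

        take_snapshot(device, snapshot_file, fmt) is a hypothetical hook
        that raises an exception when a single snapshot fails.
        """
        before = dict(self.active)
        try:
            for device, snapshot_file, fmt in self.sets[set_id]:
                take_snapshot(device, snapshot_file, fmt)
                self.active[device] = snapshot_file
        except Exception:
            # Any failure puts every device back on its original image.
            self.active = before
            raise
        finally:
            # The set is forgotten once execute has run.
            self.sets.pop(set_id, None)
```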
Internal snapshots
By making the snapshot-file argument of the monitor and QMP command optional, its omission could be used as a request to make the snapshot internally instead of to an external file. However, without live block migration of an internal snapshot, there is no way to make a backup of an internal snapshot while still leaving the VM running, so this feature is not planned at present. For now, the snapshot-file argument is required, and only external snapshots are implemented.
fd passed as target for snapshot file/device
To get around problems with SELinux, in particular in conjunction with images based on NFS, there is a wish to be able to pass an already open file descriptor using the getfd interface.
However, this poses a number of problems. When creating the COW headers for the new image file, QEMU needs to know the file name of the disk image the header points to. On Linux this can be obtained through /proc/self/fd/<X>, but this is not available on all other operating systems.
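The /proc/self/fd/<X> lookup mentioned above can be illustrated with a few lines of Python. The function name is hypothetical, and the sketch simply returns None on systems where the lookup is unavailable, which is exactly the portability problem being described.

```python
import os


def filename_for_fd(fd):
    """Return the path an open descriptor refers to, or None if unknown.

    Relies on the Linux-specific /proc/self/fd/<X> symlinks. For
    descriptors that are not regular files the link target is not a
    usable path (for instance a pipe or socket description).
    """
    try:
        return os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        # No /proc, or no such descriptor: the name cannot be recovered.
        return None
```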
An alternative solution would be to extend the getfd interface to take an optional file name. However, this would be a hack and would open the door to errors, as it would allow a broken/hostile guest/QEMU process to create an image which points to the wrong place, and this would not be discovered until the image was actually booted.
Allowing the controlling application to create the COW headers in the new image is not an acceptable solution. It is race prone as the image is not following the backing file which is still in use, and would also cause problems for COW formats where the new COW headers include state as of when they are created.
Separating into multiple commands
There are suggestions for splitting the snapshot process into multiple monitor/QMP commands to allow for asynchronous command processing. The process would be split as follows, using human monitor style commands as an example:
(agent) guest-agent-fsfreeze
Call guest agent requesting it to freeze all file systems and flush all I/O requests.
(qemu) freeze-io <blockX>
Instruct QEMU to freeze all I/O processing for block device <blockX>
(qemu) getfd <fd> snapshotfd
Provide file descriptor <fd> and assign it the logical name snapshotfd
(qemu) snapshot-blkdev-async <blockX> fd:snapshotfd <format>
Initiate asynchronous snapshot of device <blockX> to the recently provided file descriptor snapshotfd. This will write the COW headers to the snapshot device and pivot the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU which will generate the COW headers in the new snapshot file; creating these externally will not be allowed!
On completion, a notification will be returned to the caller; hence this will require QAPI to be in place for proper async QMP command support.
(qemu) thaw-io <blockX>
Un-freeze I/O processing for device <blockX>
(agent) guest-agent-fsthaw
Call guest agent requesting it to thaw/unfreeze all file systems within the guest.
(qemu) snapshot-blkdev-status <blockX>
Query the current snapshot status of <blockX>. In addition some form of notification of completion will be required.
Note that the caller can loop the sequence of commands freeze-io, getfd, snapshot-blkdev-async, and thaw-io to snapshot multiple block devices in one guest.
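The looped sequence can be written out as a plan of commands, using the command names exactly as proposed above. This is a sketch of the ordering only: none of these commands is claimed to exist in QEMU, the function name is hypothetical, and issuing the status queries after the final thaw is an assumption of the sketch.

```python
def async_snapshot_plan(devices, fmt="qcow2"):
    """Return ordered (target, command) pairs for the proposed split flow.

    devices: list of (block device name, file descriptor) pairs.
    """
    plan = [("agent", "guest-agent-fsfreeze")]
    for device, fd in devices:
        # The per-device loop: freeze-io, getfd, async snapshot, thaw-io.
        plan += [
            ("qemu", f"freeze-io {device}"),
            ("qemu", f"getfd {fd} snapshotfd"),
            ("qemu", f"snapshot-blkdev-async {device} fd:snapshotfd {fmt}"),
            ("qemu", f"thaw-io {device}"),
        ]
    plan.append(("agent", "guest-agent-fsthaw"))
    # Poll each device afterwards to learn whether its snapshot completed.
    plan += [("qemu", f"snapshot-blkdev-status {device}") for device, _ in devices]
    return plan
```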
Live merge
See http://wiki.qemu.org/Features/LiveBlockMigration