Features/Snapshots

Live Snapshots

This document describes the current design of live snapshots for QEMU. It is a work in progress and details may change.

Overall concept

The idea is to issue a command to QEMU, via the human monitor or QMP, that causes QEMU to create a new snapshot image with the original image as its backing file. From that point on the original image is only accessed read-only, which allows the original image file to be backed up.

Rolling back to a previous version requires booting from the previous backing file, at which point the snapshot file becomes invalid. Unfortunately, there is no way to detect that a backing file has been booted, so administrators must take care not to rely on snapshot files being valid after a roll-back.

The snapshot image must be in a format that supports backing files, i.e. QCOW2 or QED; the original image can be of any supported format. For example, it is possible to make a QCOW2 snapshot of a RAW image, or a QED snapshot of a QED image.

Guest Agent

Certain operations in the snapshot process can be improved through support from within the guest. These features will be implemented in the Guest Agent. Please check the Guest Agent page for design and implementation details.

The two main guest agent features of interest to live snapshots are:

  1. File system freeze (fsfreeze/fsthaw): This puts the guest file systems into a consistent state, avoiding the need for fsck the next time they are mounted.
  2. Guest application notification: This allows guest applications to register to be notified prior to a snapshot, so that they can flush their data to disk. This is a future feature.

As of this writing (July 25, 2011), communication with the QEMU guest agent is performed via a virtio serial channel. Commands are sent over the channel encoded as QMP commands, and replies are encoded as QMP replies. There are future plans to implement a passthrough mechanism for agent commands issued via QMP, allowing these commands to be accessible via the QMP monitor instead of an external agent socket on the host.

Note that guest agent collaboration is also needed for snapshots using other methods, such as snapshots performed on btrfs, LVM, enterprise storage, etc.
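
As a rough illustration of the encoding described above, the sketch below builds the QMP-style request an administration tool might write to the agent channel, and decodes a QMP-style reply. The command names are the ones used in this document; the helper names and the newline framing are assumptions made for illustration, and the Guest Agent page remains the authority on the actual wire format.

```python
import json

# Command names as used in this document; the real agent may name them differently.
FREEZE = "guest-agent-fsfreeze"
THAW = "guest-agent-fsthaw"


def encode_agent_command(name, arguments=None):
    """Encode an agent command as one line of QMP-style JSON (hypothetical helper)."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    # Newline framing is an assumption, not something this document specifies.
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_agent_reply(raw):
    """Decode a QMP-style reply; return its payload or raise on an error reply."""
    reply = json.loads(raw.decode("utf-8"))
    if "error" in reply:
        raise RuntimeError(f"agent error: {reply['error']}")
    return reply.get("return")
```

A tool would write encode_agent_command(FREEZE) to the host side of the virtio serial channel and pass the bytes it reads back to decode_agent_reply.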

Snapshot command flow

The snapshot command flow is as follows. Commands are demonstrated using monitor commands for QEMU and agent commands are marked (agent). See the guest agent page for details on the specific command implementation for the guest agent commands.

(qemu) cont

Run the guest

(agent) guest-agent-fsfreeze

Call guest agent requesting it to freeze all file systems and flush all I/O requests (optional)

(qemu) stop

Pause the guest. Optional: only required if there is no freeze support, or if the admin tool wants to take a system dump of the guest at the same time, to match the disk snapshot.

(qemu) snapshot_blkdev <blockX> <snapshot-file> <format>

Initiate a synchronous snapshot of device <blockX> to the new device <snapshot-file>. This writes the COW headers to the snapshot device and pivots the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU that generates the COW headers in the new snapshot file.

During snapshot creation the guest will momentarily be halted by QEMU. Pending I/Os will be flushed to disk, the COW headers will be created in the snapshot file/device, and QEMU will replace the file backing device <blockX> with the new snapshot file. On completion of the command, the guest will resume running as the command returns, unless the admin tool explicitly issued the optional stop command as described above.

This command is repeated for each device that is to be snapshotted.

(qemu) cont

Un-pause the guest (optional - as per stop command above).

(agent) guest-agent-fsthaw

Call guest agent requesting it to thaw/unfreeze all file systems within the guest.
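
The ordering of the steps above can be sketched as a small planning function. This is only an illustration of the sequence, not management-tool code: the function name, the (target, command) tuple layout, and the rule that the guest is paused whenever the agent is not used are assumptions derived from the description above.

```python
def snapshot_flow(devices, fmt="qcow2", use_agent=True, pause_guest=False):
    """Return the ordered (target, command) steps for snapshotting `devices`.

    `devices` maps a block device name to the path of its new snapshot file.
    The initial `cont` that starts the guest is assumed to have happened already.
    """
    # Assumption: without freeze support the guest has to be paused instead.
    need_stop = pause_guest or not use_agent
    steps = []
    if use_agent:
        steps.append(("agent", "guest-agent-fsfreeze"))
    if need_stop:
        steps.append(("qemu", "stop"))
    # One snapshot_blkdev call per device, as the flow above describes.
    for device, snapshot_file in devices.items():
        steps.append(("qemu", f"snapshot_blkdev {device} {snapshot_file} {fmt}"))
    if need_stop:
        steps.append(("qemu", "cont"))
    if use_agent:
        steps.append(("agent", "guest-agent-fsthaw"))
    return steps
```

For example, snapshot_flow({"virtio0": "/some/place/my-image"}) yields a freeze, one snapshot_blkdev command, and a thaw, in that order.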

Monitor command

The monitor command is designed to be flexible enough to handle both internal and external snapshots, as well as snapshots to various different snapshot file formats.

snapshot_blkdev device snapshot-file [format]:

device block device to snapshot
snapshot-file target snapshot file
format format of snapshot image, valid formats are QCOW2 & QED. If not specified, the image will default to QCOW2.
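
A minimal sketch of how a tool might assemble this monitor command line, assuming it wants to reject unsupported formats before talking to QEMU. The helper name and the client-side validation are illustrative assumptions; QEMU performs its own checks.

```python
# Formats listed above as valid for the snapshot image.
VALID_FORMATS = ("qcow2", "qed")


def snapshot_blkdev_line(device, snapshot_file, fmt=None):
    """Build a snapshot_blkdev monitor command line (hypothetical helper)."""
    if fmt is None:
        # Omit the optional argument and let QEMU apply its QCOW2 default.
        return f"snapshot_blkdev {device} {snapshot_file}"
    fmt = fmt.lower()
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported snapshot format: {fmt}")
    return f"snapshot_blkdev {device} {snapshot_file} {fmt}"
```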

QMP command

The QMP command matches the behaviour of the human monitor command, except it is named slightly differently to match the fact that the command is synchronous.

blockdev-snapshot-sync device snapshot-file [format]

device device name to snapshot (json-string)
snapshot-file name of new image file (json-string)
format format of new image (json-string, optional)


Here is an example of a QMP snapshot command, in JSON format:

{ "execute": "blockdev-snapshot-sync", "arguments": { "device": "virtio0",
                                                      "snapshot-file":
                                                      "/some/place/my-image",
                                                      "format": "qcow2" } }

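The JSON above can also be generated and sent programmatically. The following is a minimal sketch, assuming QEMU exposes QMP on a UNIX socket whose path the caller knows; the function names and the socket path are hypothetical. It relies on the standard QMP handshake, in which the server sends a greeting and the client must issue qmp_capabilities before any other command.

```python
import json
import socket


def build_snapshot_request(device, snapshot_file, fmt=None):
    """Build the blockdev-snapshot-sync request as a Python dict."""
    arguments = {"device": device, "snapshot-file": snapshot_file}
    if fmt is not None:
        arguments["format"] = fmt  # optional, as described above
    return {"execute": "blockdev-snapshot-sync", "arguments": arguments}


def qmp_execute(socket_path, request):
    """Send one request over a QMP UNIX socket and return its result."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        chan = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(chan.readline())  # discard the QMP greeting
        for message in ({"execute": "qmp_capabilities"}, request):
            chan.write(json.dumps(message) + "\n")
            chan.flush()
            reply = json.loads(chan.readline())
            while "event" in reply:  # skip asynchronous events
                reply = json.loads(chan.readline())
            if "error" in reply:
                raise RuntimeError(reply["error"])
    return reply["return"]
```

A call such as qmp_execute("/path/to/qmp.sock", build_snapshot_request("virtio0", "/some/place/my-image", "qcow2")) would then issue the command shown above; the socket path here is a placeholder.
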
Future features

Internal snapshots to images which support internal snapshots (QCOW2 & QED) are not expected to be supported initially.

There have been requests and suggestions for a number of alternative and enhanced interfaces for accessing live snapshots as follows:

Atomic Snapshots of Multiple Devices

There has been some concern about the current snapshot_blkdev command: it performs snapshots one device at a time, even if a guest has multiple devices. This can be troublesome if a snapshot fails. While QEMU will revert to the original backing store should a snapshot fail, that can still leave a guest with multiple disks in an overall inconsistent state, with the failed device out of step with the guest's other devices.

A proposed new group of commands, Snapshot Sets, will allow multiple devices to be queued for a snapshot, with the snapshots for all devices taken during a single command. This allows an entire set to be snapshotted and, if any one device fails, the entire set to be reverted to the original backing store.

The proposed commands are:

  • snapshot_set_create(id)
  • snapshot_set_destroy(id)
  • snapshot_set_add(id, device, snapshot-file, format)
  • snapshot_set_execute(id)

For more details on this proposed API, please see: Atomic Snapshots of Multiple Devices


Example Snapshot Sets Command Sequence

Below we have an example command sequence of an arbitrary number of devices added to a snapshot set, and a snapshot performed of the entire set with the set forgotten at the end of the snapshot:

Guest         Manager                                             QEMU
-------       --------                                           -------
  |               |                                                 |
  |               |                                                 |
  |               o---  snapshot_set_create(1234) --------------->> |
  |               |                                                 |
  |               |                                                 |
  |               o---  snapshot_set_add(1234, "virtio0",           |
  |               |                        "/some/place/my-image0", |
  |               |                        "qcow2" )  ----------->> |
  |               |                                                 |
  |               |                                                 |
  |               o---  snapshot_set_add(1234, "virtio1",           |
  |               |                        "/some/place/my-image1", |
  |               |                        "qcow2" )  ----------->> |
  |               |                                                 |
  |               |                                                 |
  .               .                                                 .
  .               .                                                 .
  .               .                                                 .
  |               o---  snapshot_set_add(1234, "virtioX",           |
  |               |                        "/some/place/my-imageX", |
  |               |                        "qcow2" )  ----------->> |
  |               |                                                 |
  |               |                                                 |
  *<--- freeze ---o                                                 |
  |               |                                                 |
  |               o--- snapshot_set_execute(1234) --------------->> |
  |               |                                                 |
  *<--- thaw -----o                                                 |
  |               |                                                 |
  |               |                                                 |
  |               |                                                 |
  =               =                                                 =
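
The all-or-nothing behaviour intended for snapshot sets can be modelled with a small toy class. This is not QEMU code and does not reflect any existing implementation: the class, its method names, and the take_snapshot/revert callbacks are hypothetical stand-ins for the proposed snapshot_set_* commands.

```python
class SnapshotSet:
    """Toy model of the proposed snapshot set semantics (not QEMU code)."""

    def __init__(self, set_id):
        self.set_id = set_id
        self.members = []  # (device, snapshot_file, fmt) in insertion order

    def add(self, device, snapshot_file, fmt="qcow2"):
        """Queue a device, mirroring snapshot_set_add."""
        self.members.append((device, snapshot_file, fmt))

    def execute(self, take_snapshot, revert):
        """Snapshot every member; on any failure revert those already done."""
        done = []
        try:
            for device, snapshot_file, fmt in self.members:
                take_snapshot(device, snapshot_file, fmt)
                done.append(device)
        except Exception:
            # Undo in reverse order so the whole set ends up on its
            # original backing store again.
            for device in reversed(done):
                revert(device)
            return False
        return True
```

The callbacks keep the model independent of how a single-device snapshot is actually performed; in the proposal that work happens inside QEMU when snapshot_set_execute is issued.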

Internal snapshots

Making the snapshot-file argument of the monitor and QMP commands optional could be used as a request to make the snapshot internally instead of to an external file. However, without live block migration of an internal snapshot, there is no way to back up an internal snapshot while leaving the VM running, so this feature is not planned at present. For now, the snapshot-file argument is required, and only external snapshots are implemented.

fd passed as target for snapshot file/device

To get around problems with SELinux, in particular in conjunction with images stored on NFS, there is a wish to be able to pass an already open file descriptor using the getfd interface.

However, this poses a number of problems. When creating the COW headers for the new image file, the COW header needs to know the file name of the disk image it is pointing to. On Linux this can be obtained through /proc/self/fd/<X>, but this is not available on all other operating systems.
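
The Linux-specific lookup mentioned above can be sketched as follows. The function name is hypothetical; the code simply reads the /proc/self/fd symlink and returns None where that path is unavailable, which is the portability gap this paragraph describes.

```python
import os


def filename_for_fd(fd):
    """Return the path an open descriptor refers to, or None if unavailable."""
    link = f"/proc/self/fd/{fd}"
    try:
        # On Linux each entry is a symlink to the file the descriptor has open.
        return os.readlink(link)
    except OSError:
        # No /proc/self/fd on this system, or the descriptor is not open.
        return None
```

Note that the returned name is only what the kernel reports at that moment; it may no longer be meaningful if the file has since been renamed or unlinked.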

An alternative solution would be to extend the getfd interface to take an optional file name. However, this would be a hack and would invite errors, as it would allow a broken or hostile guest/QEMU process to create an image that points to the wrong place, which would not be discovered until the image was actually booted.

Allowing the controlling application to create the COW headers in the new image is not an acceptable solution. It is race-prone, because the new image would not be following the backing file, which is still in use, and it would also cause problems for COW formats whose headers include state as of the time they are created.

Separating into multiple commands

There are suggestions for splitting the snapshot process into multiple monitor/QMP commands to allow for asynchronous command processing. The process would be split as follows, using human monitor style commands as example:

(agent) guest-agent-fsfreeze

Call guest agent requesting it to freeze all file systems and flush all I/O requests.

(qemu) freeze-io <blockX>

Instruct QEMU to freeze all I/O processing for block device <blockX>

(qemu) getfd <fd> snapshotfd

Provide file descriptor <fd> and assign it the logical name snapshotfd

(qemu) snapshot-blkdev-async <blockX> fd:snapshotfd <format>

Initiate an asynchronous snapshot of device <blockX> to the recently provided file descriptor snapshotfd. This writes the COW headers to the snapshot device and pivots the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU that generates the COW headers in the new snapshot file; creating these externally will not be allowed.

On completion, a notification is returned to the caller; this requires QAPI to be in place for proper asynchronous QMP command support.

(qemu) thaw-io <blockX>

Un-freeze I/O processing for device <blockX>

(agent) guest-agent-fsthaw

Call guest agent requesting it to thaw/unfreeze all file systems within the guest.

(qemu) snapshot-blkdev-status <blockX>

Query the current snapshot status of <blockX>. In addition some form of notification of completion will be required.

Note that the caller can loop over the commands freeze-io, getfd, snapshot-blkdev-async, and thaw-io to snapshot multiple block devices in one guest.
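
The loop over several block devices could look roughly like the generator below. All of the commands it emits are the proposed ones from this section and do not exist in QEMU as written; the generator name and the per-device naming of the passed file descriptors are additional assumptions made for illustration.

```python
def async_snapshot_commands(devices, fmt="qcow2"):
    """Yield (target, command) pairs for the proposed split snapshot flow.

    `devices` maps a block device name to an already open file descriptor
    number for its new snapshot file.
    """
    yield ("agent", "guest-agent-fsfreeze")
    for device, fd in devices.items():
        # Assumption: one logical fd name per device to avoid name clashes.
        fd_name = f"snapshotfd-{device}"
        yield ("qemu", f"freeze-io {device}")
        yield ("qemu", f"getfd {fd} {fd_name}")
        yield ("qemu", f"snapshot-blkdev-async {device} fd:{fd_name} {fmt}")
        yield ("qemu", f"thaw-io {device}")
    yield ("agent", "guest-agent-fsthaw")
    # Completion still has to be confirmed for every device afterwards.
    for device in devices:
        yield ("qemu", f"snapshot-blkdev-status {device}")
```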

Live merge

See http://wiki.qemu.org/Features/LiveBlockMigration

Other proposed qemu features that solve similar or related problems

Snapshots2 and Livebackup

snapshots2
livebackup