Features/Snapshots
Live Snapshots
This document describes the current design of live snapshots for QEMU. It is a work in progress, and details may change as development continues.
Overall concept
The idea is to be able to issue a command to QEMU via the monitor or QMP, which causes QEMU to create a new snapshot image with the original image as its read-only backing file. Because the original image file is no longer written to, it can then be backed up.
Rolling back to a previous version requires booting from the previous backing file, at which point the snapshot file becomes invalid. Unfortunately there is no way to detect that a backing file has been booted, so administrators must take care not to rely on snapshot files being valid after a roll-back.
The snapshot image must be in a format which supports backing files, i.e. QCOW2 (and QED when the code is integrated); however, the original image can be of any supported format. For example, it is possible to make a QCOW2 snapshot of a RAW image, or a QED snapshot of a QED image.
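The format constraint above can be expressed as a small check that a management tool might run before requesting a snapshot. This is only a sketch: the names `SNAPSHOT_FORMATS` and `check_snapshot_format` are hypothetical, and QED is listed on the assumption that its support has been merged.

```python
# Formats that can reference a backing file, per the text above.
# QED is included only on the assumption that its code has been merged.
SNAPSHOT_FORMATS = {"qcow2", "qed"}


def check_snapshot_format(snapshot_format="qcow2"):
    """Return the normalized snapshot format, or raise ValueError.

    The format of the original image is deliberately not checked:
    any supported format may serve as the backing file.
    """
    normalized = snapshot_format.lower()
    if normalized not in SNAPSHOT_FORMATS:
        raise ValueError(
            f"{snapshot_format!r} cannot be used for a snapshot: "
            "the snapshot format must support backing files"
        )
    return normalized
```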
Guest Agent
Certain operations in the snapshot process can be optimized or improved through support from within the guest. These features will be implemented in the Guest Agent. Please check the Guest Agent page for design and implementation details.
The two main guest agent features of interest to live snapshots are:
- File system freeze (fsfreeze/fsthaw): This puts the guest file systems into a consistent state, avoiding the need for fsck next time they are mounted.
- Guest application notification: This allows guest applications to register and be notified prior to a snapshot, so that they can flush their data to disk. This is a future feature!
As of this writing (July 25, 2011), communication with the QEMU guest agent is performed via a virtio serial channel. Commands are sent over the channel encoded as QMP commands, and replies are encoded as QMP replies. There are future plans to implement a monitor proxy command, allowing these commands to be accessible via the human monitor as well.
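As a rough illustration of the encoding described above, the sketch below builds a QMP-style command and interprets a QMP-style reply. The helper names are hypothetical, the newline framing is an assumption, and the command name is the one used in this document; the name exposed by a given guest agent build may differ.

```python
import json


class AgentError(Exception):
    """Raised when a reply carries an "error" member."""


def encode_command(name, arguments=None):
    """Encode a command as one line of QMP-style JSON (framing assumed)."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_reply(raw):
    """Return the "return" member of a reply, or raise AgentError."""
    reply = json.loads(raw.decode("utf-8"))
    if "error" in reply:
        raise AgentError(reply["error"])
    return reply.get("return")
```

A caller would write the bytes from `encode_command("guest-agent-fsfreeze")` to the host side of the virtio serial channel and pass the line read back to `decode_reply`.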
High level design
Snapshot procedure:
(qemu) cont
Run the guest
(agent) guest-agent-fsfreeze
Call guest agent requesting it to freeze all file systems and flush all I/O requests (optional)
(qemu) stop
Pause guest. Optional - only required if the admin tool expects to take a system dump of the guest at the same time, so that it matches the disk snapshot.
(qemu) snapshot_blkdev <blockX> <snapshot-file> <format>
Initiate synchronous snapshot of device <blockX> to new device <snapshot-file>. This will write the COW headers to the snapshot device, and pivot the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU which will generate the COW headers in the new snapshot file.
During snapshot creation the guest will momentarily be paused by QEMU. Pending I/Os will be flushed to disk, the COW headers will be created in the snapshot file/device, and QEMU will replace the file backing device <blockX> with the new snapshot file. On completion of the command, the guest will be unpaused before the command returns.
This command is repeated for each device that is to be snapshotted.
(qemu) cont
Un-pause the guest (optional - as per stop command above).
(agent) guest-agent-fsthaw
Call guest agent requesting it to thaw/unfreeze all file systems within the guest.
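The ordering of the procedure above can be sketched as a function that produces the command sequence for a set of devices. This is an illustrative sketch only: `snapshot_plan` is a hypothetical helper, the guest is assumed to be running already, and the device name and path in the usage note are made up.

```python
def snapshot_plan(devices, fmt="qcow2", use_agent=True, pause_guest=False):
    """Return the ordered (channel, command) steps for a live snapshot.

    devices maps each block device name to its new snapshot file.
    Both the guest agent calls and the stop/cont pair are optional,
    as described in the procedure above.
    """
    steps = []
    if use_agent:
        steps.append(("agent", "guest-agent-fsfreeze"))
    if pause_guest:
        steps.append(("qemu", "stop"))
    # One synchronous snapshot command per device.
    for device, snapshot_file in devices.items():
        steps.append(("qemu", f"snapshot_blkdev {device} {snapshot_file} {fmt}"))
    if pause_guest:
        steps.append(("qemu", "cont"))
    if use_agent:
        steps.append(("agent", "guest-agent-fsthaw"))
    return steps
```

For example, `snapshot_plan({"virtio0": "/images/vm-snap.qcow2"})` yields the freeze, the snapshot command, and the thaw, in that order.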
Monitor command
The monitor command is designed to be flexible enough to handle both internal and external snapshots, as well as snapshots to various different snapshot file formats.
snapshot_blkdev device snapshot-file [format]:
device | block device to snapshot |
snapshot-file | target snapshot file |
format | format of snapshot image; valid formats are QCOW2 & QED (when merged upstream). If not specified, the image will default to QCOW2. |
QMP command
The QMP command matches the behaviour of the human monitor command, except it is named slightly differently to match the fact that the command is synchronous.
blockdev-snapshot-sync device snapshot-file [format]
device | device name to snapshot (json-string) |
snapshot-file | name of new image file (json-string) |
format | format of new image (json-string, optional) |
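A minimal sketch of how a client might build this command on the wire, using the standard QMP "execute"/"arguments" envelope. The helper name is hypothetical, and the device name and file path in the usage note are invented for illustration.

```python
import json


def blockdev_snapshot_sync(device, snapshot_file, fmt=None):
    """Return the JSON text of a blockdev-snapshot-sync QMP command."""
    arguments = {"device": device, "snapshot-file": snapshot_file}
    # "format" is optional; leaving it out lets QEMU apply its default.
    if fmt is not None:
        arguments["format"] = fmt
    return json.dumps(
        {"execute": "blockdev-snapshot-sync", "arguments": arguments}
    )
```

Calling `blockdev_snapshot_sync("virtio0", "/images/vm-snap.qcow2", "qcow2")` produces a message whose arguments carry the three fields listed above.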
Future features
Internal snapshots to images which support internal snapshots (QCOW2 & QED) are not expected to be supported initially.
There have been requests and suggestions for a number of alternative and enhanced interfaces for accessing live snapshots as follows:
internal snapshots
Making the snapshot-file argument of the monitor and QMP command optional could serve as a request to make the snapshot internally instead of to an external file. However, without live block migration of an internal snapshot, there is no way to make a backup of an internal snapshot while still leaving the VM running, so this feature is not planned at present. For now, the snapshot-file argument is required, and only external snapshots are implemented.
fd passed as target for snapshot file/device
To get around problems with selinux, in particular in conjunction with images based on NFS, there is a wish to be able to pass an already open file descriptor using the getfd interface.
However, this poses a number of problems. When creating the COW headers for the new image file, QEMU needs to know the file name of the disk image the header points to. On Linux this can be obtained through /proc/self/fd/<X>, but this is not available on all other operating systems.
An alternative solution would be to extend the getfd interface to take an optional file name. However this is ugly and error prone, as it would allow a broken/hostile controller to create an image which points to the wrong place, which would not be discovered until the image was actually booted from.
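The Linux-specific lookup mentioned above can be sketched as follows. The function name is hypothetical, and the sketch returns None on platforms without /proc/self/fd, which is exactly the portability gap described.

```python
import os
import sys


def filename_for_fd(fd):
    """Return the path behind an open descriptor, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None  # /proc/self/fd is Linux-specific
    try:
        # Each entry in /proc/self/fd is a symlink to the open file.
        return os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        return None  # descriptor is not open
```

Note that on Linux the link target of a file that has since been unlinked carries a " (deleted)" suffix, so the result is not always a usable path.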
Allowing the controlling application to create the COW headers in the new image is not an acceptable solution. It is race prone and would cause problems for COW formats where the new COW headers include state as of when they are created.
Separating into multiple commands
There are suggestions for splitting the snapshot process into multiple monitor/QMP commands. The process would be split as follows, using human monitor style commands as example:
(agent) guest-agent-fsfreeze
Call guest agent requesting it to freeze all file systems and flush all I/O requests.
(qemu) freeze-io <blockX>
Instruct QEMU to freeze all I/O processing for block device <blockX>
(qemu) getfd <fd> snapshotfd
Provide file descriptor <fd> and assign it the logical name snapshotfd
(qemu) snapshot-blkdev-async <blockX> fd:snapshotfd <format>
Initiate asynchronous snapshot of device <blockX> to recently provided file descriptor snapshotfd. This will write the COW headers to the snapshot device, and pivot the block device <blockX> to point to the new device, using the original file/device as its backing file. It is important to note that it is QEMU which will generate the COW headers in the new snapshot file; creating these externally will not be allowed!
On completion, a notification will be returned to the caller; hence this will require QAPI to be in place for proper async QMP command support.
(qemu) thaw-io <blockX>
Un-freeze I/O processing for device <blockX>
(agent) guest-agent-fsthaw
Call guest agent requesting it to thaw/unfreeze all file systems within the guest.
(qemu) snapshot-blkdev-status <blockX>
Query the current snapshot status of <blockX>. In addition some form of notification of completion will be required.
Note that the caller can loop over the commands freeze-io, getfd, snapshot-blkdev-async, and thaw-io to snapshot multiple block devices in one guest.
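The loop over multiple devices could look roughly like the sketch below. All commands here are the proposed ones from this section, not an existing interface, and `split_snapshot_plan` is a hypothetical helper. Using a separate fd name per device is an assumption made to avoid reusing a name while an asynchronous snapshot may still be in flight.

```python
def split_snapshot_plan(devices, fmt="qcow2"):
    """Return the ordered (channel, command) steps for the split proposal.

    devices maps each block device name to an already open file
    descriptor number for its new snapshot file.
    """
    steps = [("agent", "guest-agent-fsfreeze")]
    for device, fd in devices.items():
        fd_name = f"snapshotfd-{device}"  # hypothetical per-device name
        steps.append(("qemu", f"freeze-io {device}"))
        steps.append(("qemu", f"getfd {fd} {fd_name}"))
        steps.append(("qemu", f"snapshot-blkdev-async {device} fd:{fd_name} {fmt}"))
        steps.append(("qemu", f"thaw-io {device}"))
    steps.append(("agent", "guest-agent-fsthaw"))
    # Status is queried per device once the guest is running again.
    for device in devices:
        steps.append(("qemu", f"snapshot-blkdev-status {device}"))
    return steps
```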
Live merge
See http://wiki.qemu.org/Features/LiveBlockMigration