VM Snapshot enhancement
This feature enhances VM snapshot functionality: it makes it possible to take internal/external snapshots live, and lets QEMU work better with underlying components such as LVM, so that they take the snapshot instead of QEMU. QEMU will then have a complete picture of the snapshot methods:
--QEMU manages snapshots:
--QEMU does not manage snapshots:
|----block drain case
- Name: Wenchao Xia
- Email: email@example.com, firstname.lastname@example.org
This feature would provide APIs that can:
- 1 take a live snapshot of a block device as internal/external data, or drain I/O to cooperate with external components; a sync API is exported for all types.
- 2 save vmstate live as internal/external data; an async API is exported for external data, with a fixed size.
- 3 combine the above, and take a screendump at the correct time.
First, consider what is needed to take a snapshot of a common application, or of QEMU:
Pic 1, principle to make snapshot for an application
As the picture shows, the application generally needs to ensure that disk data is consistent at the upper level, and then make a clone/mirror of the disk data at that moment. So the disk job does not have to be done by QEMU: QEMU can just ensure the data's consistency and let other components complete the remaining work.
Snapshots are often taken frequently to make data recoverable to some point in time. For example, a management program may want the following for each VM, which enables data restore to multiple points in time while keeping space usage small and the VM online with good performance all the time. To make things clear, in this text I'll call a program that wants to build the following an "incremental backup application".
Pic 2, general goal on backup server
Note its key requirements: delta data and base data are separated, vmstate data is standalone, and a hidden one: the VM needs good performance.
As shown above, how deep the recovery should go determines whether to save vmstate. Who takes the action determines whether QEMU writes/reads the snapshot itself. How the action is taken determines the external/internal cases. There are basically three choices, as follows:
Pic 3, cooperation relationship in the big picture
Taking LVM2 as an example of a third-party tool, and with vmstate saving optional, the typical cases are as follows:
Common disadvantage now: the vmstate size is not predictable when saving live.
--Type one, QEMU manages snapshots:
Common advantages: fewer dependencies, QEMU manages everything, workable on most systems.
Common disadvantages: lower-level components have no chance to take the job.
- Case 1: external image snapshot data + external vmstate data
This is what QEMU 1.3 supports. Steps:
1 Save vmstate live to an external place.
2 At 95%, freeze the guest FS through the guest agent (GA).
3 Freeze QEMU block I/O by pausing the VM or queueing I/O.
4 Run blockdev-snapshot-sync on each block device to get a read-only image.
5 Restore QEMU block I/O by resuming the VM or flushing the queued I/O.
6 Restore the guest FS through the GA.
7 Copy out the read-only image.
8 Copy out the vmstate file, if vmstate needs to be stored in another place.
9 Merge the image files.
Advantage: read-only standalone base/delta image files are directly available, so an "incremental backup application" can copy them out directly to form a chain.
Disadvantage: external chains are slower than internal snapshots at deleting (merging) and reading.
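The freeze, snapshot and thaw portion of case 1 (steps 2 to 6) can be sketched as an ordered command list. This is only a minimal sketch under assumptions: the helper name, the device names, the image paths and the overlay naming scheme are hypothetical. The commands themselves (stop, cont and blockdev-snapshot-sync on the QMP monitor, guest-fsfreeze-freeze and guest-fsfreeze-thaw on the guest agent) are existing ones. The live vmstate save of step 1 is left out, since that is the part this feature proposes.

```python
import json


def external_snapshot_commands(devices, tag):
    """Return (channel, command) pairs for steps 2-6 of case 1.

    devices maps a QEMU block device name to its current image path
    (both hypothetical here). channel is "qga" for guest agent commands
    and "qmp" for QEMU monitor commands.
    """
    # Step 2: freeze the guest FS; step 3: pause the VM to stop block I/O.
    cmds = [
        ("qga", {"execute": "guest-fsfreeze-freeze"}),
        ("qmp", {"execute": "stop"}),
    ]
    # Step 4: one external snapshot per block device. The old image
    # becomes a read-only backing file of the new overlay.
    for dev, image in devices.items():
        cmds.append(("qmp", {
            "execute": "blockdev-snapshot-sync",
            "arguments": {
                "device": dev,
                "snapshot-file": f"{image}.{tag}.qcow2",  # assumed naming
                "format": "qcow2",
            },
        }))
    # Steps 5 and 6: undo the pause and the freeze in reverse order.
    cmds.append(("qmp", {"execute": "cont"}))
    cmds.append(("qga", {"execute": "guest-fsfreeze-thaw"}))
    return cmds


if __name__ == "__main__":
    for channel, cmd in external_snapshot_commands(
            {"virtio0": "/images/vm1-disk0"}, "snap1"):
        print(channel, json.dumps(cmd))
```

A management program would send each command on the matching channel and stop at the first error, still making sure that cont and guest-fsfreeze-thaw are issued.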
- Case 2: internal image snapshot data + internal vmstate data
This is what QEMU 1.3 supports, but not live. Steps:
1 Save vmstate live to the internal qcow2 file.
2 At 95%, freeze the guest FS through the guest agent (GA).
3 Freeze QEMU block I/O by pausing the VM or queueing I/O.
4 Run blockdev-snapshot-internal-sync on each block device.
5 Restore QEMU block I/O by resuming the VM or flushing the queued I/O.
6 Restore the guest FS through the GA.
7 Export the base/delta internal data through a QEMU API.
8 Export the vmstate internal data through a QEMU API.
9 Delete the internal snapshot.
Advantage: better performance for the running VM, and deleting is faster than in the external case.
Disadvantage: the data are not separated; a tool is needed to export them, which does not seem to be available in QEMU now.
- Case 3: internal image snapshot data + external vmstate data
Steps:
1 Save vmstate live to an external qcow2 file.
2 At 95%, freeze the guest FS through the guest agent (GA).
3 Freeze QEMU block I/O by pausing the VM or queueing I/O.
4 Run blockdev-snapshot-internal-sync on each block device.
5 Restore QEMU block I/O by resuming the VM or flushing the queued I/O.
6 Restore the guest FS through the GA.
7 Export the base/delta internal data through a QEMU API.
8 Copy out the vmstate file, if vmstate needs to be stored in another place.
9 Delete the internal snapshot.
Advantage: same as case 2, except that vmstate is already standalone.
Disadvantage: same as case 2; in addition, vmstate may not be paired with the block snapshots, but this is not a problem if the management stack handles it.
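One way a management stack could handle the pairing of an external vmstate file with the internal block snapshots is a small manifest written next to the vmstate. The following is only a sketch: the manifest layout and both function names are hypothetical and are not part of QEMU.

```python
def build_manifest(snapshot_name, devices, vmstate_path):
    """Record which internal snapshot on each device belongs to a vmstate file."""
    return {
        "snapshot": snapshot_name,
        "vmstate": vmstate_path,
        # Every device carries an internal snapshot with the same name.
        "disks": {dev: snapshot_name for dev in devices},
    }


def is_paired(manifest, internal_snapshots, existing_files):
    """Check that the vmstate file and all block snapshots still exist.

    internal_snapshots maps a device name to the set of internal snapshot
    names found in its image; existing_files is a set of file paths. How
    these are collected is up to the management stack.
    """
    if manifest["vmstate"] not in existing_files:
        return False
    return all(
        name in internal_snapshots.get(dev, set())
        for dev, name in manifest["disks"].items()
    )
```

Before restoring, the management stack would call is_paired() and refuse to load a vmstate whose block snapshots have been deleted.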
Small summary: case 3 seems the best, since it has the best performance, but QEMU needs to provide a way to convert between internal and external snapshots, something like: qemu_export_internal_delta(int *id, char *buf). I haven't confirmed whether it is theoretically possible in qcow2 to export base data live. If not, we may need to combine external/internal steps to form an incremental backup chain with better performance.
--Type two, QEMU does not manage snapshots:
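To illustrate what exporting an internal delta would mean, here is a toy model, not qcow2 code. It assumes each internal snapshot can be viewed as a table mapping a guest cluster index to the host offset holding its data, and that two snapshots share a cluster when both tables point at the same offset. The function name and the table representation are hypothetical.

```python
def export_delta(old_table, new_table):
    """Return what changed between an older and a newer internal snapshot.

    Both tables map guest cluster index -> host offset. The result is a
    pair (changed, discarded): changed holds the clusters that are new or
    were rewritten since the older snapshot, discarded lists the clusters
    that the newer snapshot no longer maps.
    """
    changed = {
        guest: offset
        for guest, offset in new_table.items()
        if old_table.get(guest) != offset
    }
    discarded = sorted(guest for guest in old_table if guest not in new_table)
    return changed, discarded


if __name__ == "__main__":
    base = {0: 0x10000, 1: 0x20000, 2: 0x30000}
    snap = {0: 0x10000, 1: 0x50000, 3: 0x60000}
    print(export_delta(base, snap))
```

Only the clusters in changed would need to be copied out to build a standalone delta file; unchanged clusters stay in the base.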
- Case 4: block I/O drain + external vmstate data
Steps:
1 Save vmstate live to an external place.
2 At 95%, freeze the guest FS through the guest agent (GA).
3 Drain block I/O, then pause the VM or queue I/O.
4 The third-party component creates the snapshot.
5 Restore QEMU block I/O by resuming the VM or flushing the queued I/O.
6 Restore the guest FS through the GA.
7 The third-party component gets the delta/base data.
8 Copy out the vmstate file, if vmstate needs to be stored in another place.
9 The third-party component merges its snapshot.
Advantage: faster; backing chain and block bitmap management can be offloaded from QEMU to a lower-level component, which also gives lower-level software/hardware a chance to accelerate it.
Disadvantage: extra components are needed.
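Steps 2 to 6 of case 4 could be driven as in the sketch below, with LVM2 as the third-party component. The four hooks (freeze_fs, drain_and_pause, resume, thaw_fs) and the run callable are hypothetical placeholders for whatever the management program uses to talk to the guest agent, to QEMU and to the host; the snapshot size and naming are assumptions too. Only the lvcreate options are taken from LVM2 itself.

```python
import contextlib


def lvm_snapshot_argv(origin, name, size="1G"):
    """Build the lvcreate command line for an LVM2 snapshot of origin."""
    return ["lvcreate", "--snapshot", "--size", size, "--name", name, origin]


@contextlib.contextmanager
def quiesced(freeze_fs, drain_and_pause, resume, thaw_fs):
    """Hold the guest FS frozen and block I/O drained while the body runs.

    Undo actions run in reverse order even if the body raises, and an
    action is only undone if it was actually performed.
    """
    freeze_fs()
    try:
        drain_and_pause()
        try:
            yield
        finally:
            resume()
    finally:
        thaw_fs()


def take_lvm_snapshots(origins, tag, run, freeze_fs, drain_and_pause,
                       resume, thaw_fs):
    """Snapshot every origin logical volume while the VM is quiesced."""
    with quiesced(freeze_fs, drain_and_pause, resume, thaw_fs):
        for origin in origins:
            # Assumed naming: <lv name>-<tag>, e.g. lv0-snap1.
            name = f"{origin.rsplit('/', 1)[-1]}-{tag}"
            run(lvm_snapshot_argv(origin, name))
```

In real use run could be subprocess.check_call, and the hooks would wrap the guest agent and monitor commands. Keeping them as parameters leaves QEMU responsible only for data consistency, which is the point of this case.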
As a summary: for the type where QEMU manages snapshots, I recommend case 1 for the first generation of incremental backup, and implementing case 3 to get better performance. For the type where QEMU does not manage snapshots, implement the missing part and let the dedicated software/hardware do the job. As an alternative, it is also possible to define an interface (a snapshot ioctl call for a host block device) and let third parties implement it.