Features/Livebackup: Difference between revisions
No edit summary |
No edit summary |
||
Line 27: | Line 27: | ||
* From now on, till the livebackup_client destroys the snapshot, each write from the VM is checked by the livebackup interposer. If the blocks written are already marked as dirty in the snapshot struct's dirty blocks bitmap, the original blocks are saved off in a COW file before the VM write is allowed to proceed. | * From now on, till the livebackup_client destroys the snapshot, each write from the VM is checked by the livebackup interposer. If the blocks written are already marked as dirty in the snapshot struct's dirty blocks bitmap, the original blocks are saved off in a COW file before the VM write is allowed to proceed. | ||
* The livebackup_client now iterates through all the dirty blocks in the snapshot, and transfers them over to the backup server. It can either reconstitute the virtual disk image at the time of the backup by writing the blocks to the virtual disk image file, or can save the blocks in a COW redo file of qcow, qcow2 or vmdk format. | * The livebackup_client now iterates through all the dirty blocks in the snapshot, and transfers them over to the backup server. It can either reconstitute the virtual disk image at the time of the backup by writing the blocks to the virtual disk image file, or can save the blocks in a COW redo file of qcow, qcow2 or vmdk format. | ||
* The important thing to note is that the time for which qemu needs to copy the original blocks and save them in a COW file is equal to the time that the livebackup_client program is connected to qemu and transferring blocks. This is hopefully something like 15 minutes out of 24 hours or so. | |||
Revision as of 06:36, 27 April 2011
Livebackup - A full backup solution for making full and incremental disk backups of a running VM
Livebackup provides the ability for an administrator or a management server to use a livebackup_client program to connect to the qemu process and copy the disk blocks that were modified since the last backup was taken.
Contact
Jagane Sundar (jagane at sundar dot org)
Overview
The goal of this project is to add the ability to do full and incremental disk backups of a running VM. These backups will be transferred over a TCP connection to a backup server, and the virtual disk images will be reconstituted there. This project does not transfer the memory contents of the running VM, or the device states of emulated devices, i.e. livebackup is not VM suspend.
Use Cases
Today IaaS cloud platforms such as EC2 provide you with the ability to have two types of virtual disks in VM instances
- ephemeral virtual disks that are lost if there is a hardware failure
- EBS storage volumes which are costly
I think that an efficient disk backup mechanism will enable a third type of virtual disk - one that is backed up, perhaps every hour or so. So a cloud operator using KVM virtual machines can offer three types of VMS:
- an ephemeral VM that is lost if a hardware failure happens
- a backed up VM that can be restored from the last hourly backup
- a fully highly available VM running off of shared storage.
High Level Design
- When qemu block driver is called to open a virtual disk file, it checks for the presence of a file with suffix .livebackupconf, for example when opening the file vdisk0.img, it would look for a file called vdisk0.img.livebackupconf.
- If the livebackupconf file exists, then this disk is part of the backup set, and the block driver for that virtual disk starts tracking blocks that are modified using an in-memory dirty blocks bitmap. This in-memory dirty blocks bitmap is saved to a file called vdisk0.img.dirty_blocks when the VM shuts down. Thus this dirty blocks bitmap is persisted across VM reboots. It is operated on in memory when the VM is running, but saved to disk when the VM shuts down, and read in again when the VM boots.
- qemu starts a livebackup thread, that listens on a TCP port for connections from livebackup_client
- When the operator wants to take an incremental backup of the running VM, he uses the program livebackup_client. This program opens a TCP connection to the qemu process' livebackup thread.
- First, the livebackup_client issues a snapshot command.
- qemu saves the dirty blocks bitmap of each virtual disk in a snapshot struct, and allocates new in-memory dirty blocks map for each virtual disk
- From now on, till the livebackup_client destroys the snapshot, each write from the VM is checked by the livebackup interposer. If the blocks written are already marked as dirty in the snapshot struct's dirty blocks bitmap, the original blocks are saved off in a COW file before the VM write is allowed to proceed.
- The livebackup_client now iterates through all the dirty blocks in the snapshot, and transfers them over to the backup server. It can either reconstitute the virtual disk image at the time of the backup by writing the blocks to the virtual disk image file, or can save the blocks in a COW redo file of qcow, qcow2 or vmdk format.
- The important thing to note is that the time for which qemu needs to copy the original blocks and save them in a COW file is equal to the time that the livebackup_client program is connected to qemu and transferring blocks. This is hopefully something like 15 minutes out of 24 hours or so.
Comparison to other proposed features
Snapshots and Snapshots2
snapshots
snapshots2
In both of these proposals, the original virtual disk is made read-only and the VM writes to a different COW file. After backup of the original virtual disk file is complete, the COW file is merged with the original vdisk file.
Instead, I create an Original-Blocks-COW-file to store the original blocks that are overwritten by the VM everytime the VM performs a write while the backup is in progress. Livebackup copies these underlying blocks from the original virtual disk file before the VM's write to the original virtual disk file is scheduled. The advantage of this is that there is no merge necessary at the end of the backup, we can simply delete the Original-Blocks-COW-file.
Here's Stefan Hajnoczi's feedback (Thanks, Stefan)
Here's what I understand:
1. User takes a snapshot of the disk, QEMU creates old-disk.img backed by the current-disk.img.
2. Guest issues a write A.
3. QEMU reads B from current-disk.img.
4. QEMU writes B to old-disk.img.
5. QEMU writes A to current-disk.img.
6. Guest receives write completion A.
The tricky thing is what happens if there is a failure after Step 5. If writes A and B were unstable writes (no fsync()) then no ordering is guaranteed and perhaps write A reached current-disk.img but write B did not reach old-disk.img. In this case we no longer have a consistent old-disk.img snapshot - we're left with an updated current-disk.img and old-disk.img does not have a copy of the old data.
The solution is to fsync() after Step 4 and before Step 5 but this will hurt performance. We now have an extra read, write, and fsync() on every write.
Other technologies that may be used to solve this problem
LVM snapshots
It is possible to create a new LVM partition for each virtual disk in the VM. When a VM needs to be backed up, each of these LVM partitions is snapshotted. At this point things get messy - I don't really know of a good way to identify the blocks that were modified since the last backup. Also, once these blocks are identified, we need a mechanism to transfer them over a TCP connection to the backup server. Perhaps a way to export the 'dirty blocks' map to userland and use a deamon to transfer the block. Or maybe a kernel thread capable of listening on TCP sockets and transferring the blocks over to the backup client (I don't know if this is possible).