Features/FaultTolerance
Summary
Kemari synchronizes VMs on different physical machines to achieve fault tolerance with QEMU.
Owner
- Name: Yoshi Tamura
- Email: tamura.yoshiaki@lab.ntt.co.jp
Description
The goal of Kemari is to provide a fault-tolerant platform for virtualization environments: when hardware fails, the virtual machine fails over from the compromised physical machine to a properly operating one, in a way that is completely transparent to the guest operating system.
Unlike hardware-based fault-tolerant servers and HA servers, Kemari abstracts the hardware through virtualization, so it runs on off-the-shelf hardware and requires no application modifications.
Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. An interesting implication of this is that during normal operation only the active node is actually executing code.
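The idea of copying only what changed can be illustrated with a toy model. This is a minimal Python sketch, not Kemari's code: `GuestMemory`, `collect_dirty`, `apply_delta`, and the 4096-byte page size are hypothetical names and values chosen for the illustration, and virtual device state is left out.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class GuestMemory:
    """Toy guest memory on the active node that remembers written pages."""

    def __init__(self, num_pages):
        self.pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]
        self.dirty = set()

    def write(self, page_no, data):
        # Every guest write marks the page dirty until the next sync.
        self.pages[page_no] = data
        self.dirty.add(page_no)

    def collect_dirty(self):
        # Return only the pages modified since the last sync, then reset.
        delta = {n: self.pages[n] for n in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def apply_delta(replica_pages, delta):
    """Passive node: overwrite only the pages contained in the delta."""
    for page_no, data in delta.items():
        replica_pages[page_no] = data
```

A page written several times between two syncs is transferred once, with its latest contents.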
Synchronization model
Kemari uses events that modify externally visible state as synchronization points, which minimizes the amount of traffic that goes over the wire while still keeping the FT pair consistent at all times. This means that all outgoing I/O needs to be trapped and sent to the fallback host before the primary is resumed, so that it can be replayed in the face of hardware failure. The basic assumption here is that outgoing I/O operations are idempotent, which is usually true for disk I/O and reliable network protocols.
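The ordering constraint described above (trap the outgoing I/O, synchronize, then release) can be sketched as follows. This is a simplified Python model under stated assumptions, not Kemari's implementation: `Primary`, `Secondary`, `emit_io`, and `failover` are hypothetical names, and the VM state is reduced to a dictionary that is copied in full at each sync.

```python
import copy


class Secondary:
    """Passive node: holds the last synchronized state and the pending event."""

    def __init__(self):
        self.state = {}
        self.pending_event = None
        self.replayed = []

    def receive_sync(self, state, event):
        # Store a snapshot together with the I/O event that triggered the sync.
        self.state = copy.deepcopy(state)
        self.pending_event = event

    def failover(self):
        # Replay the last event. The primary may already have emitted it,
        # which is why outgoing I/O is assumed to be idempotent.
        if self.pending_event is not None:
            self.replayed.append(self.pending_event)
            self.pending_event = None
        return self.state


class Primary:
    """Active node: synchronizes before any outgoing I/O is released."""

    def __init__(self, secondary):
        self.state = {}
        self.secondary = secondary
        self.emitted = []

    def compute(self, key, value):
        # Internal changes are not externally visible, so no sync happens.
        self.state[key] = value

    def emit_io(self, event):
        # Outgoing I/O is a synchronization point: sync first, then release.
        self.secondary.receive_sync(self.state, event)
        self.emitted.append(event)
```

In this model, state that the primary changes after its last outgoing I/O is lost on failover, which is acceptable because nothing outside the VM has observed it.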
Implementation
Kemari consists of two components, event-tap and ft-transaction, and leverages the existing live migration facility in QEMU.
- event-tap
  - Controls when to start VM sync
  - A thin layer that sits between device emulation and the net/block layer
  - The last event is replayed on the secondary upon failover
- ft-transaction
  - Consists of a sender and a receiver for VM sync
  - Sender encapsulates VM data coming from QEMUFile with transaction headers
  - Receiver decapsulates and buffers data to update the VM transactionally
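The sender/receiver split of ft-transaction can be illustrated with a small framing example. This is a hedged Python sketch, not the actual wire format: the 5-byte header layout, the `MSG_DATA`/`MSG_COMMIT` message types, and the names `encode_transaction` and `Receiver` are all assumptions made for the illustration.

```python
import struct

# Hypothetical header: 1-byte message type + 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")
MSG_DATA, MSG_COMMIT = 1, 2


def encode_transaction(chunks):
    """Sender side: wrap each chunk in a header and end with a commit marker."""
    out = bytearray()
    for chunk in chunks:
        out += HEADER.pack(MSG_DATA, len(chunk)) + chunk
    out += HEADER.pack(MSG_COMMIT, 0)
    return bytes(out)


class Receiver:
    """Receiver side: stage incoming chunks and apply them only on commit."""

    def __init__(self):
        self.applied = []   # data that has been applied to the secondary VM
        self._staged = []   # chunks of the transaction still in flight
        self._buf = bytearray()

    def feed(self, data):
        self._buf += data
        while len(self._buf) >= HEADER.size:
            msg_type, length = HEADER.unpack_from(self._buf)
            if len(self._buf) < HEADER.size + length:
                break  # wait for the rest of the payload
            payload = bytes(self._buf[HEADER.size:HEADER.size + length])
            del self._buf[:HEADER.size + length]
            if msg_type == MSG_DATA:
                self._staged.append(payload)
            elif msg_type == MSG_COMMIT:
                # All-or-nothing: staged data becomes visible in one step.
                self.applied.extend(self._staged)
                self._staged = []
            else:
                raise ValueError(f"unknown message type {msg_type}")

    def abort(self):
        # Connection lost mid-transaction: drop everything not yet committed.
        self._staged = []
        self._buf.clear()
```

Buffering until the commit marker means that a primary failure in the middle of a transfer leaves the secondary at the previous consistent state rather than a partially updated one.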
Failover
Failover must be triggered whenever a failure in the primary node is detected. Currently there is no automatic failover mechanism, but in the long term we plan to integrate Kemari with the major HA stacks (Pacemaker/Corosync, RHCS, etc.).
Ideally, we would like to leverage the hardware failure detection capabilities of recent x86 hardware to trigger failover.
Howto
- Get the code from the repository below, build it, and deploy it to the primary and secondary servers
- Start the secondary side with -incoming <protocol>:<address>:<port>,ft_mode
- Boot the guest, then start Kemari to synchronize the VM by running the following command in the QEMU monitor. It is the usual migrate command with the "-k" option added.
migrate -d -k tcp:192.168.0.20:4444
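The two sides must agree on protocol, address, and port. The small Python helper below only assembles the two strings shown in the steps above so that they cannot drift apart; the function names are hypothetical, and it does not launch QEMU or talk to the monitor.

```python
def incoming_option(protocol, address, port):
    """Option for the secondary side, as given in the steps above."""
    return f"-incoming {protocol}:{address}:{port},ft_mode"


def kemari_migrate_command(protocol, address, port):
    """Monitor command for the primary side: the usual migrate plus -k."""
    return f"migrate -d -k {protocol}:{address}:{port}"


# Example values taken from the command above.
target = ("tcp", "192.168.0.20", 4444)
secondary_option = incoming_option(*target)
primary_command = kemari_migrate_command(*target)
```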
TODO
Item | Status | Who |
---|---|---|
I/O event queuing | In progress | Yoshi |
Asynchronous VM transfer | In progress | Yoshi |
Storage replication (QEMU's block migration or others) | In progress | Yoshi |
Writing tests using KVM Test framework | | |
Integration with HA stack (Pacemaker/Corosync) | | |
Zero copy VM transfer w/ RDMA or others | | |
SMP performance enhancements | | |
If you have items you're interested in, feel free to add them to the list.
Status
We are currently working on a patchset that can be merged into mainline QEMU (hopefully 0.14) to provide a basic set of Kemari features.
git://kemari.git.sourceforge.net/gitroot/kemari/kemari