Features/FaultTolerance

Summary

Kemari synchronizes VMs on different physical machines to achieve fault tolerance with QEMU.

Owner

Name: Yoshi Tamura
Email: tamura.yoshiaki@lab.ntt.co.jp

Description

The goal of Kemari is to provide a fault tolerant platform for virtualization environments, so that in the event of a hardware failure the virtual machine fails over from compromised to properly operating hardware (a physical machine) in a way that is completely transparent to the guest operating system.

Unlike hardware-based fault tolerant servers and HA servers, Kemari abstracts the hardware through virtualization, so it can be used on off-the-shelf hardware and requires no application modifications.

Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. An interesting implication of this is that during normal operation only the active node is actually executing code.

Synchronization model

Kemari uses events that modify externally visible state as synchronization points, which minimizes the amount of traffic that goes over the wire while still keeping the FT pair consistent at all times. This means that all outgoing I/O needs to be trapped and sent to the fallback host before the primary is resumed, so that it can be replayed in the face of hardware failure. The basic assumption here is that outgoing I/O operations are idempotent, which is usually true for disk I/O and reliable network protocols.
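The model can be illustrated with a small simulation. This is only a sketch of the idea, not Kemari's code: the Primary and Secondary classes, their methods, and the page/packet representation are all hypothetical. Memory writes merely mark pages dirty, and state is copied to the secondary only when an externally visible event (here, sending a packet) is about to happen.

```python
class Secondary:
    """Passive node (hypothetical model): stores the last synchronized state."""

    def __init__(self):
        self.pages = {}
        self.sync_count = 0

    def apply_sync(self, dirty_pages):
        self.pages.update(dirty_pages)
        self.sync_count += 1


class Primary:
    """Active node (hypothetical model): tracks dirty pages between syncs."""

    def __init__(self, secondary):
        self.secondary = secondary
        self.pages = {}
        self.dirty = set()
        self.sent = []  # stand-in for the outside world

    def write_memory(self, page_no, value):
        # Internal state change only: nothing goes over the wire yet.
        self.pages[page_no] = value
        self.dirty.add(page_no)

    def send_packet(self, payload):
        # Externally visible event, so this is a synchronization point.
        dirty_pages = {n: self.pages[n] for n in self.dirty}
        self.secondary.apply_sync(dirty_pages)
        self.dirty.clear()
        # The packet is released only after the secondary is up to date.
        self.sent.append(payload)


secondary = Secondary()
primary = Primary(secondary)
for i in range(100):
    primary.write_memory(i % 4, i)   # 100 writes touching 4 pages
primary.send_packet(b"reply")        # one sync, carrying 4 pages
```

In this toy run, one hundred memory writes result in a single synchronization that transfers four pages, which is the traffic reduction the paragraph above describes.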

Implementation

Kemari consists of two components, event-tap and ft-transaction, and leverages the existing live migration facility in QEMU.

event-tap

Controls when to start VM sync
A thin layer that sits between device emulation and the net/block layers
The last event is replayed on the secondary upon failover
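The interposition described above might look roughly like the following sketch. The EventTap class, its on_event hook, and the dictionary used as a shared disk are assumptions made for illustration and do not correspond to actual QEMU functions. The example simulates a primary that fails after synchronizing but before completing a write, and a secondary that replays the last event.

```python
class EventTap:
    """Hypothetical thin layer between device emulation and a backend."""

    def __init__(self, backend):
        self.backend = backend   # callable that performs the real net/block I/O
        self.on_event = None     # hook that triggers VM sync (set on the primary)
        self.last_event = None   # last trapped event (kept on the secondary)

    def submit(self, request):
        # Device emulation calls this instead of the backend directly.
        if self.on_event is not None:
            self.on_event(request)   # sync first ...
        self.backend(request)        # ... then release the I/O

    def replay_last(self):
        # Called on the secondary upon failover.
        if self.last_event is not None:
            self.backend(self.last_event)
            self.last_event = None


disk = {}  # shared storage model: sector -> data


def write_sector(request):
    sector, data = request
    disk[sector] = data   # writing the same data twice is harmless (idempotent)


def dead_backend(request):
    raise RuntimeError("primary failed before completing the I/O")


primary_tap = EventTap(dead_backend)
secondary_tap = EventTap(write_sector)


def sync_to_secondary(request):
    secondary_tap.last_event = request


primary_tap.on_event = sync_to_secondary

try:
    primary_tap.submit((7, b"new"))
except RuntimeError:
    secondary_tap.replay_last()   # failover: the secondary finishes the write
```

Because the event reaches the secondary before the primary touches the backend, the write to sector 7 is not lost even though the primary never completed it.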

ft-transaction

Consists of sender/receiver for VM sync
Sender encapsulates VM state coming from QEMUFile with transaction headers
Receiver decapsulates and buffers data to update VM transactionally
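To make the sender/receiver split concrete, here is a minimal sketch of transactional framing. The header layout (a 1-byte record type and a 4-byte length), the BEGIN/DATA/COMMIT record types, and the function and class names are invented for this example and are not Kemari's wire format. The point it illustrates is that the receiver buffers incoming data and applies it only once the whole transaction has arrived.

```python
import struct

# Hypothetical header: 1-byte record type, 4-byte payload length (network order).
HEADER = struct.Struct("!BI")
BEGIN, DATA, COMMIT = 1, 2, 3


def encapsulate(chunks):
    """Sender side: wrap a sequence of state chunks in one transaction."""
    out = [HEADER.pack(BEGIN, 0)]
    for chunk in chunks:
        out.append(HEADER.pack(DATA, len(chunk)) + chunk)
    out.append(HEADER.pack(COMMIT, 0))
    return b"".join(out)


class Receiver:
    """Receiver side: buffer payloads and apply them only on COMMIT."""

    def __init__(self):
        self.committed = b""   # state visible to the secondary VM
        self._buffer = []

    def feed(self, stream):
        offset = 0
        while offset + HEADER.size <= len(stream):
            kind, length = HEADER.unpack_from(stream, offset)
            offset += HEADER.size
            payload = stream[offset:offset + length]
            if len(payload) < length:
                break  # truncated record; this sketch simply drops it
            offset += length
            if kind == BEGIN:
                self._buffer = []
            elif kind == DATA:
                self._buffer.append(payload)
            elif kind == COMMIT:
                self.committed = b"".join(self._buffer)
                self._buffer = []


rx = Receiver()
wire = encapsulate([b"ram", b"dev"])
rx.feed(wire)                              # complete transaction is applied

partial = encapsulate([b"new"])[:-HEADER.size]
rx.feed(partial)                           # COMMIT never arrives: nothing changes
```

If the primary dies mid-transfer, the secondary in this sketch still holds the last fully committed state rather than a half-updated one.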

Failover

Failover must be triggered whenever a failure in the primary node is detected. Currently there is no automatic failover mechanism, but in the long term we plan to integrate Kemari with the major HA stacks (Pacemaker/Corosync, RHCS, etc).

Ideally, we would like to leverage the hardware failure detection capabilities of newish x86 hardware to trigger failover.

Howto

Get the code from the repository below, then build it and deploy it to the primary and secondary servers. As with live migration, place the guest image on shared storage (a SAN).
Start the secondary side with -incoming kemari:<protocol>:<address>:<port>
Boot the guest, then start Kemari to synchronize the VM by running the following command in the QEMU monitor. It is the usual migrate command with "kemari:" added in front of the protocol.

 migrate -d kemari:tcp:192.168.0.20:4444
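For scripting the two sides, the option and the monitor command can be built from the same parameters. The helper names below are hypothetical; the string formats are the ones shown in the steps above.

```python
def kemari_uri(protocol, address, port):
    """Prefix the usual migration URI with "kemari:"."""
    return f"kemari:{protocol}:{address}:{port}"


def incoming_option(protocol, address, port):
    """Command-line option for the secondary side."""
    return f"-incoming {kemari_uri(protocol, address, port)}"


def migrate_command(protocol, address, port):
    """Monitor command to run on the primary side."""
    return f"migrate -d {kemari_uri(protocol, address, port)}"


print(incoming_option("tcp", "192.168.0.20", 4444))
print(migrate_command("tcp", "192.168.0.20", 4444))
```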

TODO

Item	Status	Who
I/O event queuing	In progress	Yoshi
Asynchronous VM transfer	In progress	Yoshi
Storage replication (QEMU's block migration or others)	In progress	Yoshi
Writing tests using KVM Test framework
Integration with HA stack (Pacemaker/Corosync)
Zero copy VM transfer w/ RDMA or others
SMP performance enhancements

If there are items you are interested in, feel free to add them to the list.

Status

We are currently working on a patchset that can be merged into mainline QEMU (hopefully 0.14) to provide a basic set of Kemari features.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari

Links