Features/Migration

From QEMU

Revision as of 15:59, 12 October 2023

Summary

Migration roadmap.

Owner

  • Name: Juan Quintela
  • Email: quintela@redhat.com

Detailed Summary

This page describes the changes planned for migration and who is supposed to do each of them. If you want to collaborate on any of the items, don't hesitate to contact me directly or to ask on the qemu mailing list.

Status

This is the roadmap; features are integrated upstream as they are completed.

ToDo list

Detect pages that are always dirty

Right now we send every page that is found dirty, but some pages are always dirty, so we can do better than that:

  • Create an array with 8 bits (or perhaps 4 bits) per page.
  • Use it as a counter of how many times the page has been found dirty.
  • If a page has been dirty more than, say, 80% of the time, don't send it during the iterative stage; leave it for the completion stage.

Implementing this, and especially measuring that it helps, is a good step towards prioritizing what to send and when to send each page.
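As a rough illustration of the counter idea, here is a minimal self-contained sketch in C. All names are hypothetical (this is not QEMU code): a saturating per-page counter is bumped on every bitmap sync round, and pages that were dirty in more than 80% of the rounds are deferred to the completion stage.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not QEMU code: one saturating 8-bit counter per
 * guest page.  Every sync round bumps the counter of each page found
 * dirty; pages dirty in more than ~80% of rounds are skipped during the
 * iterative stage and left for the completion stage. */

#define DIRTY_SKIP_PERCENT 80

typedef struct {
    uint8_t *dirty_count;   /* per-page counter, saturating at 255 */
    size_t npages;
    unsigned rounds;        /* how many sync rounds we have seen */
} DirtyHistory;

static void history_init(DirtyHistory *h, size_t npages)
{
    h->dirty_count = calloc(npages, 1);
    h->npages = npages;
    h->rounds = 0;
}

/* bitmap: one bit per page, as in the migration dirty bitmap */
static void history_sync(DirtyHistory *h, const unsigned char *bitmap)
{
    h->rounds++;
    for (size_t i = 0; i < h->npages; i++) {
        int dirty = bitmap[i / 8] & (1u << (i % 8));
        if (dirty && h->dirty_count[i] < 255) {
            h->dirty_count[i]++;
        }
    }
}

/* true if the page should be deferred to the completion stage */
static int history_should_skip(const DirtyHistory *h, size_t page)
{
    if (h->rounds < 4) {
        return 0;   /* not enough data yet to call it "always dirty" */
    }
    return h->dirty_count[page] * 100 > DIRTY_SKIP_PERCENT * h->rounds;
}
```

Measuring this against real workloads would tell us whether 8 bits (versus 4) and the 80% threshold are the right choices.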

Create a thread for migration destination

Right now the destination side runs in a coroutine, which makes some things more complicated. More importantly, once you go above 10 Gigabit/second, the coroutine becomes the bottleneck; with a thread we could migrate faster.

Advantages:

  • We can use synchronous operations.

Disadvantages:

Move xbzrle to multifd

When you are doing inter-data-center migration, anything that can help is welcome. In these cases xbzrle could probably help. So, why is this here?

  • Move it to multifd; we can implement it the same way as zlib or zstd.
  • We need to test it with a bigger cache. I guess that 25% to 50% of RAM is not out of the question; the current 64MB cache is a joke for current workloads.
  • We need to measure that it helps.
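For reference, the core xbzrle idea can be sketched as a toy encoder (this is a simplified illustration, not QEMU's actual wire format): compare the cached old copy of the page with the new one, and emit only the changed bytes, run-length encoding the unchanged runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy sketch of the xbzrle idea (not QEMU's real encoder or wire
 * format): walk the old and new copies of a page in lockstep and emit
 * (zero_run_len, literal_len, literal bytes) triples.  Unchanged runs
 * cost two bytes; only changed bytes are stored verbatim. */

static size_t xbzrle_sketch_encode(const uint8_t *old_buf,
                                   const uint8_t *new_buf,
                                   size_t len, uint8_t *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        /* run of unchanged bytes, capped at 255 per triple */
        uint8_t zrun = 0;
        while (i + zrun < len && zrun < 255 &&
               old_buf[i + zrun] == new_buf[i + zrun]) {
            zrun++;
        }
        i += zrun;

        /* following run of changed bytes, capped at 255 per triple */
        uint8_t nzrun = 0;
        while (i + nzrun < len && nzrun < 255 &&
               old_buf[i + nzrun] != new_buf[i + nzrun]) {
            nzrun++;
        }

        out[o++] = zrun;
        out[o++] = nzrun;
        for (uint8_t k = 0; k < nzrun; k++) {
            out[o++] = new_buf[i + k];
        }
        i += nzrun;
    }
    return o;   /* encoded size; only worth sending if smaller than len */
}
```

The win depends entirely on how often the cached copy is still close to the current page, which is why cache size and measurement matter so much here.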

Postcopy multifd

This project has several parts:

  • Make postcopy work with the current multifd; we need to flush the channels at the switchover.
  • Make the postcopy channel just another channel, created from the beginning. That makes error handling and setup much simpler, and we only start transferring data on it when postcopy starts. We should also use the multifd protocol for pages.
  • Switch the bitmap walking from the multifd send thread to the multifd recv thread. That makes synchronization with postcopy page requests on the destination much, much easier.

Live upgrade

qemu upgrade and kernel upgrade without copying guest pages. There is an in-tree implementation, but there are no tests in qtest or in avocado, and we need to measure how well it works. Oracle has another implementation.

Double layer bitmap

Use a bitmap over the migration bitmap, with one top-level bit for every 64/128 bits (we need to measure which is better). Bits in the lower layer are only valid if the corresponding top-level bit is 1. This is useful for multi-terabyte machine migrations, where the bitmap is too big and lots of bits are zero. It also helps when we have to merge the vfio/vhost bitmaps, which are normally quite empty.

Once we detect that it helps inside QEMU, we can change vhost/kvm interfaces to make use of it.
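A minimal sketch of the double-layer idea, assuming 64 lower bits per top-level bit (hypothetical names, not the QEMU implementation): a walker only descends into a lower-level word when its top-level bit is set, so mostly-clean bitmaps are skipped in large strides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-level dirty bitmap sketch: one upper bit summarizes
 * one 64-bit lower word, so a clean upper word lets us skip
 * 64 * 64 = 4096 pages at once. */

#define LOWER_BITS 64ULL

typedef struct {
    uint64_t *lower;    /* one bit per page */
    uint64_t *upper;    /* one bit per lower-level word */
    size_t nwords;      /* number of 64-bit words in lower */
} TwoLevelBitmap;

static void tlb_set_dirty(TwoLevelBitmap *b, size_t page)
{
    size_t word = page / LOWER_BITS;
    b->lower[word] |= 1ULL << (page % LOWER_BITS);
    b->upper[word / LOWER_BITS] |= 1ULL << (word % LOWER_BITS);
}

/* count dirty pages, skipping whole blocks whose upper word is clean */
static size_t tlb_count_dirty(const TwoLevelBitmap *b)
{
    size_t count = 0;
    size_t nupper = (b->nwords + LOWER_BITS - 1) / LOWER_BITS;

    for (size_t u = 0; u < nupper; u++) {
        if (!b->upper[u]) {
            continue;           /* 4096 pages skipped in one test */
        }
        for (size_t w = u * LOWER_BITS;
             w < (u + 1) * LOWER_BITS && w < b->nwords; w++) {
            count += __builtin_popcountll(b->lower[w]);
        }
    }
    return count;
}
```

The same walk shape would apply to merging vfio/vhost bitmaps: an OR of upper words tells us immediately which regions need merging at all. (`__builtin_popcountll` is a GCC/Clang builtin.)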

Use KTLS in QEMU live migration

We need to see if we can reuse nvme over fabrics.

use TLS for communication (Volunteer?)

Right now all migration communication is done over clear channels. If you need to encrypt the channel, you have to use an external program. The problem with this is the performance loss: we need to transfer all the data to another program, and from there to the network.

Improve migration bitmap handling (Volunteer?)

Split the bitmap use. We always use all the bitmaps (VGA, CODE and MIGRATION), independently of what we are doing. We could improve this:

  • VGA: only attach it to VGA framebuffers.
  • MIGRATION: only allocate/handle it during migration.
  • CODE: only needed with TCG; not needed at all for KVM.


KVM migration bitmap (Volunteer?)

  • We could use the native bitmap format, and change/improve the kernel to only set bits for dirty pages, not clear the clean ones.
  • We could change the kernel log code to set the bitmap for "used" pages when we start logging. This would allow us to not migrate zero pages at all; right now we have to "allocate" the pages to check that they are zero, and then send them as zero pages.
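For context, the zero-page check mentioned above amounts to scanning every byte of the page before sending; QEMU's real `buffer_is_zero()` is a heavily optimized version of this trivial sketch, but the cost of faulting the page in just to discover it is zero is exactly what the kernel-side bitmap idea would avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive sketch of a zero-page check.  The real QEMU helper
 * (buffer_is_zero) uses vectorized comparisons, but the point here is
 * the work itself: the page must be touched at all to prove it is
 * zero, which is what a smarter kernel dirty log could avoid. */
static bool page_is_zero(const uint8_t *page, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (page[i]) {
            return false;
        }
    }
    return true;
}
```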

Abstract QEMUFile use (Google Summer of Code Project)

Note this has a lot of overlap with the Visitor/BER patch series from Dave Gilbert that abstracts the format out of QEMUFile.

We can change QEMUFile use to something like:

struct MigrationChannel {
    void *opaque;
    uint32_t (*get_uint32)(struct MigrationChannel *);
    int (*put_uint32)(struct MigrationChannel *, uint32_t);
    /* the same for all the basic types */
};

And then change all occurrences of:

qemu_get_sbe32(f, &foo);

into

foo = MC->get_uint32(MC);

Where the migration channel has been initialized properly for QEMUFile. This would make it trivial to change the protocol format to anything else.
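A toy, self-contained version of that indirection (all names are illustrative, not the proposed QEMU API), with a trivial in-memory backend standing in for QEMUFile: callers go through the function pointers, so the backend can be swapped without touching them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the MigrationChannel indirection.  Callers
 * use the function pointers; the backend behind opaque can be
 * QEMUFile, a test buffer, or a different wire format. */
typedef struct MigrationChannel MigrationChannel;

struct MigrationChannel {
    void *opaque;
    uint32_t (*get_uint32)(MigrationChannel *mc);
    void (*put_uint32)(MigrationChannel *mc, uint32_t v);
    /* the same for all the basic types */
};

/* trivial in-memory backend, hypothetical */
typedef struct {
    uint8_t buf[64];
    size_t rpos, wpos;
} MemBuffer;

static void mem_put_uint32(MigrationChannel *mc, uint32_t v)
{
    MemBuffer *m = mc->opaque;
    /* big-endian on the wire, like the existing be helpers */
    for (int i = 3; i >= 0; i--) {
        m->buf[m->wpos++] = (v >> (i * 8)) & 0xff;
    }
}

static uint32_t mem_get_uint32(MigrationChannel *mc)
{
    MemBuffer *m = mc->opaque;
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | m->buf[m->rpos++];
    }
    return v;
}
```

A unit test can then feed a device's state through the memory backend and read it back, with no socket or file involved.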

Continuous VMState testing (GSOC project)

Note this overlaps with the fault tolerance/micro checkpoint work and the reverse-emulation/debugging schemes, both of which take regular snapshots.

Add a new flag that, during normal operation and at random intervals:

  • stops the VM
  • saves all device state to a buffer
  • resets all devices
  • loads all device state from that buffer

This way we could test that we can migrate at any moment, and if there is a problem, we know exactly which device state caused it.
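The round trip described above can be illustrated with a toy device (hypothetical helpers, not the VMState API): a field that the save/load path forgets would make the final comparison fail at exactly the point where state was lost.

```c
#include <assert.h>
#include <string.h>

/* Toy round-trip check: save, reset, load, compare.  Hypothetical
 * names; in QEMU the save/load would go through VMState and the
 * comparison through a state dump. */
typedef struct {
    int irq_level;
    unsigned regs[4];
} ToyDevice;

static void toy_save(const ToyDevice *d, unsigned char *buf)
{
    memcpy(buf, d, sizeof(*d));
}

static void toy_reset(ToyDevice *d)
{
    memset(d, 0, sizeof(*d));
}

static void toy_load(ToyDevice *d, const unsigned char *buf)
{
    memcpy(d, buf, sizeof(*d));
}

/* the stop/save/reset/load sequence the bullet list describes */
static int toy_roundtrip_ok(ToyDevice *d)
{
    ToyDevice before = *d;
    unsigned char buf[sizeof(ToyDevice)];

    toy_save(d, buf);
    toy_reset(d);
    toy_load(d, buf);
    return memcmp(&before, d, sizeof(*d)) == 0;
}
```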


Finish conversion to VMState.

Pending things are:

  • send generated fields
  • rebase the cpu ports to latest (needs the previous item)
  • virtio: very old patches exist (more than a year old); the problem is how to describe lists easily in VMState
  • slirp: some patches exist, with the same problem as before: how to handle lists easily. Slirp is basically a list of lists of lists.
  • misc devices: almost all of them don't work after migration anyway, so we can change them.

Protocol changes

  • Add size + checksum to sections. This is an incompatible change and needs further thought.
  • Make embedded sections real sections, with headers. This will allow us to version internal state.
  • Unit testing: in collaboration with qdev, allow devices to be tested alone with old/new migration versions/subsections.
  • Change to BER/ASN.1?

Improve testing

  • Add testing for all VMSTATE_FOO() macros and the interpreter
    • Patches posted, pending integration
  • How to be sure that we are compatible (or not) with previous versions
    • Amit is working on that with a tester that dumps the state description of all devices and checks what differs between two versions.

Define target machine in the monitor

This would allow us to send the configuration through the migration channel. This needs very big changes in qemu, but we are heading in that direction.

Fault Tolerance (David Gilbert)

Adapting QEMU to fault tolerance.

Documentation

Code

The code that is not yet merged is currently kept in several branches of this git repository: