Revision as of 18:29, 8 December 2020

Live Migration Road Map

Complete VmState transition
- CPU port posted upstream
- virtio is posted to vmstate; the virtio-serial lists are still missing.
- slirp is posted to vmstate; the top-level lists are still missing, and it needs some testing.
Visitors
Device State automatic code generation - for example, using annotations as Qt does.
Migration downtime calculation:
- The estimated migration downtime is calculated from the last bandwidth sample (how much we sent in the last iteration). The bandwidth can fluctuate between iterations, so it could be better to use an average bandwidth. Because QEMU is single-threaded, the actual downtime can be greater if the thread is busy. Moving migration into a separate thread can help in this case.
- We need a mechanism to detect when we exceed the maximal downtime and return to the iteration phase. This can be implemented using a timer.
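The two points above can be sketched as follows. This is only an illustration, not QEMU code: the class name, the window size, and the method names are all hypothetical, and a simple sliding-window mean stands in for whatever averaging would actually be chosen.

```python
from collections import deque


class DowntimeEstimator:
    """Hypothetical helper: estimates downtime from an averaged bandwidth."""

    def __init__(self, max_downtime_s, window=8):
        self.max_downtime_s = max_downtime_s
        # Keep only the most recent `window` per-iteration bandwidth samples.
        self.samples = deque(maxlen=window)

    def record_iteration(self, bytes_sent, elapsed_s):
        """Store the bandwidth (bytes/s) observed during one iteration."""
        if elapsed_s > 0:
            self.samples.append(bytes_sent / elapsed_s)

    def average_bandwidth(self):
        """Mean of the stored samples, instead of the last sample alone."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def expected_downtime(self, remaining_bytes):
        """Time needed to send the remaining dirty data at the average rate."""
        bandwidth = self.average_bandwidth()
        if bandwidth <= 0:
            return float("inf")
        return remaining_bytes / bandwidth

    def can_stop_guest(self, remaining_bytes):
        """True when the final phase is expected to fit in the downtime budget."""
        return self.expected_downtime(remaining_bytes) <= self.max_downtime_s

    def final_phase_overran(self, stopped_for_s):
        """Timer check: True when the guest has been stopped for too long,
        so migration should fall back to the iteration phase."""
        return stopped_for_s > self.max_downtime_s
```

A caller would record one sample per iteration, ask `can_stop_guest()` before entering the final phase, and poll `final_phase_overran()` from a timer while the guest is stopped.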
Migration speed calculation:
Default migration speed can be too low; this can prolong the migration and in some cases prevent it from ever completing (https://bugzilla.redhat.com/show_bug.cgi?id=695394).
In the current implementation, calculating the actual migration speed is very complex: we use a QemuFileBuffered for the outgoing migration, which sends the data in two cases: 100 milliseconds have passed since the previous packet, or the buffer is full (~3.2M).
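The two flush conditions described above, and one way a speed figure could be derived from the resulting flushes, can be sketched like this. It is a simplified model, not the QemuFileBuffered implementation: the function names are hypothetical, and reading "~3.2M" as 3,200,000 bytes is an assumption.

```python
FLUSH_INTERVAL_S = 0.100        # 100 milliseconds between packets (from the text)
BUFFER_CAPACITY = 3_200_000     # "~3.2M", assumed here to mean bytes


def should_flush(buffered_bytes, now_s, last_flush_s):
    """Data is sent when the buffer is full or the interval has elapsed."""
    buffer_full = buffered_bytes >= BUFFER_CAPACITY
    interval_elapsed = (now_s - last_flush_s) >= FLUSH_INTERVAL_S
    return buffer_full or interval_elapsed


def measured_speed(flushes):
    """Average bytes/s over a list of (timestamp_s, bytes_sent) flushes.

    The first flush only marks the start of the measured span, so its
    bytes are not counted.
    """
    if len(flushes) < 2:
        return 0.0
    span = flushes[-1][0] - flushes[0][0]
    if span <= 0:
        return 0.0
    total = sum(sent for _, sent in flushes[1:])
    return total / span
```

Because flushes are triggered by two independent conditions, the interval between them is not constant, which is part of why a per-flush rate is hard to interpret and a rate over a longer span is easier to reason about.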
Tests:
- autotest for migration (already exists); need to add tests with guest and host load.
- VmState unit test: save to file / load from file.
- VmState Sections/Subsections testing.
Bugs!

Sending cold pages aka Page Priority - also SAP (see http://www.linux-kvm.org/wiki/images/c/cb/2011-forum-kvm_hudzia.pdf)
Splitting Bitmap - Juan is working on it.
Migration protocol - The protocol should be separate from device state and data format. It should be a bi-directional protocol, unlike today.
Remove Buffered File - Too many copies of the data. It will not be needed once migration has its own separate thread(s).

@@ Line 32: / Line 32: @@
 ** Use the new util/userfaultfd.c across qemu repo (majorly, postcopy)
 ** disable dirty logging since not necessary for snapshots
+** page request delay measurements
+** faster fault handling (better responsiveness for the user)
 ** support shmem/hugetlbfs
 *** before that, we should disable for shmem/hugetlbfs - they need special care...
 **** try to reuse the postcopy vhost-user framework on negotiation and providing fault/wakeup handlers