Live Migration Road Map
Stability
- Complete the VMState transition
- CPU port posted upstream
- The virtio port to VMState has been posted; still missing are the virtio-serial lists.
- The slirp port to VMState has been posted; still missing are the top-level lists, and it needs some testing.
- Visitors
- Device state automatic code generation, for example using annotations as Qt does.
- Migration downtime calculation:
- The estimated migration downtime is currently calculated from the last bandwidth measurement (how much we sent in the last iteration). The bandwidth can fluctuate between iterations, so it could be better to use an average bandwidth; a sketch of this follows below. Because QEMU is single threaded, the actual downtime can be greater if the thread is busy. A separate migration thread would help in this case.
- We need a mechanism to detect when we exceed the maximum downtime and return to the iterative phase. This can be implemented using a timer.
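
A minimal sketch of both ideas above, in plain C: smoothing the bandwidth with an exponential moving average rather than trusting only the last iteration, plus a timer-style check that aborts the stop phase once the maximum downtime is exceeded. Every name here (update_bandwidth, avg_bw, downtime_exceeded, BW_ALPHA) is a hypothetical illustration, not QEMU code:

 /* Sketch: downtime estimation with a smoothed bandwidth instead of
  * only the last iteration's value. Hypothetical names throughout. */
 #include <stdio.h>
 #include <stdint.h>
 
 #define BW_ALPHA 0.25   /* weight of the newest bandwidth sample */
 
 static double avg_bw;   /* bytes per millisecond, smoothed */
 
 /* Feed one iteration's transfer into the moving average. */
 static void update_bandwidth(uint64_t iter_bytes, uint64_t iter_time_ms)
 {
     double sample = (double)iter_bytes / (double)iter_time_ms;
     if (avg_bw == 0.0) {
         avg_bw = sample;    /* first sample seeds the average */
     } else {
         avg_bw = BW_ALPHA * sample + (1.0 - BW_ALPHA) * avg_bw;
     }
 }
 
 /* Estimated downtime: remaining dirty bytes over smoothed bandwidth. */
 static double expected_downtime_ms(uint64_t remaining_bytes)
 {
     return remaining_bytes / avg_bw;
 }
 
 /* Timer-style guard for the stop phase: if the blackout runs past
  * max_downtime_ms, abort completion and resume iterating. */
 static int downtime_exceeded(uint64_t stop_start_ms, uint64_t now_ms,
                              uint64_t max_downtime_ms)
 {
     return now_ms - stop_start_ms > max_downtime_ms;
 }
 
 int main(void)
 {
     update_bandwidth(32 << 20, 100);   /* 32 MB in 100 ms */
     update_bandwidth(8 << 20, 100);    /* a slow iteration */
     printf("expected downtime for 16 MB left: %.1f ms\n",
            expected_downtime_ms(16 << 20));
     printf("exceeded: %d\n", downtime_exceeded(0, 45, 30));
     return 0;
 }

The point of the moving average is that one slow (or fast) iteration no longer swings the estimate, so the decision to enter the stop phase is based on sustained throughput.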
- Migration speed calculation:
- The default migration speed can be too low; this can prolong the migration and in some cases prevent it from ever completing (https://bugzilla.redhat.com/show_bug.cgi?id=695394).
- In the current implementation, calculating the actual migration speed is very complex: we use a QemuFileBuffered for outgoing migration, which sends data in two cases: when 100 milliseconds have passed since the previous packet, or when the buffer is full (~3.2 MB). The sketch below illustrates the two triggers.
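
A minimal sketch of the two flush triggers just described, assuming the ~3.2 MB limit and 100 ms interval from the text; the names (should_flush, BUFFER_LIMIT, FLUSH_INTERVAL) are hypothetical and this is not the actual QemuFileBuffered code. It shows why the observed speed is hard to reason about: the send rate depends on whichever trigger fires first, not on a single knob:

 /* Sketch: the outgoing buffer is flushed either because enough data
  * accumulated or because enough time passed since the last packet. */
 #include <stdbool.h>
 #include <stdint.h>
 #include <stdio.h>
 
 #define BUFFER_LIMIT   (3200u * 1024u)  /* ~3.2 MB */
 #define FLUSH_INTERVAL 100u             /* milliseconds */
 
 static bool should_flush(uint32_t buf_used, uint64_t now_ms,
                          uint64_t last_flush_ms)
 {
     return buf_used >= BUFFER_LIMIT ||
            now_ms - last_flush_ms >= FLUSH_INTERVAL;
 }
 
 int main(void)
 {
     /* Buffer only half full, but 120 ms since the last packet:
      * the time trigger fires and the data is sent anyway. */
     printf("%d\n", should_flush(BUFFER_LIMIT / 2, 220, 100));
     return 0;
 }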
- Tests:
- autotest for migration (already exists); need to add tests with guest and host load.
- VMState unit test: save to file / load from file.
- VMState sections/subsections testing.
- Bugs!
Performance
- XBRLE page delta compression - SAP patches: http://lists.gnu.org/archive/html/qemu-devel/2011-07/msg00474.html, http://www.linux-kvm.org/wiki/images/c/cb/2011-forum-kvm_hudzia.pdf. The patch needs to be split into smaller patches. For backward compatibility we need to check whether the other side supports the compression. (A toy encoder illustrating the idea follows at the end of this list.)
- Sending cold pages aka Page Priority - also SAP (see http://www.linux-kvm.org/wiki/images/c/cb/2011-forum-kvm_hudzia.pdf)
- Migration threads - Juan is working on it.
- Splitting Bitmap - Juan is working on it.
- Migration protocol - The protocol should be separate from device state and data format. It should be a bi-directional protocol, unlike today's.
- Post copy for guest with very large memory - http://www.linux-kvm.org/wiki/images/e/ed/2011-forum-yabusame-postcopy-migration.pdf
- RDMA
- Remove Buffered File - too many copies of the data. It will not be needed once migration has its own thread(s).
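
As referenced in the XBRLE item above, here is a toy encoder showing the general delta-compression idea: compare a dirty page against the previously sent copy and collapse the unchanged runs with run-length encoding, so only changed bytes travel. The record format assumed here (zero-run length, literal length, literal bytes) is an illustration of the technique, not the encoding from the SAP patches:

 /* Sketch: run-length encode the delta between the old and new copy
  * of a guest page. Identical runs cost two bytes each. */
 #include <stdint.h>
 #include <stdio.h>
 #include <string.h>
 
 #define PAGE_SIZE 4096
 
 /* Returns the encoded size, or -1 if the delta would not fit in 'out'. */
 static int xbrle_encode(const uint8_t *old_page, const uint8_t *new_page,
                         uint8_t *out, int out_len)
 {
     int i = 0, n = 0;
     while (i < PAGE_SIZE) {
         int zrun = 0, lit = 0;
         /* Count unchanged bytes (capped so the length fits in a byte). */
         while (i + zrun < PAGE_SIZE && zrun < 255 &&
                old_page[i + zrun] == new_page[i + zrun]) {
             zrun++;
         }
         i += zrun;
         /* Count changed bytes that must be sent literally. */
         while (i + lit < PAGE_SIZE && lit < 255 &&
                old_page[i + lit] != new_page[i + lit]) {
             lit++;
         }
         if (n + 2 + lit > out_len) {
             return -1;          /* incompressible: send the raw page */
         }
         out[n++] = (uint8_t)zrun;
         out[n++] = (uint8_t)lit;
         memcpy(out + n, new_page + i, lit);
         n += lit;
         i += lit;
     }
     return n;
 }
 
 int main(void)
 {
     static uint8_t oldp[PAGE_SIZE], newp[PAGE_SIZE], out[PAGE_SIZE];
     memcpy(newp, oldp, PAGE_SIZE);
     newp[100] = 0xab;           /* one changed byte in the page */
     printf("encoded %d bytes\n", xbrle_encode(oldp, newp, out, sizeof(out)));
     return 0;
 }

If the encoded delta approaches the page size, the page is effectively incompressible and sending it raw is cheaper; the sketch signals this by returning -1, which also covers the backward-compatibility fallback when the other side does not support the compression.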