Features/PostcopyRecovery
Latest revision as of 15:36, 12 April 2022
Introduction
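Postcopy recovery allows a postcopy migration to survive an interruption of its migration channel. Instead of failing, the migration is paused, and it can be resumed at any time once a migration channel is available again (either by fixing the broken channel or by providing a new one). This matters because, during postcopy, the guest already runs on the destination while part of its memory still lives on the source, so losing the channel would otherwise leave the VM unrecoverable.
The example below shows how to use the postcopy recovery feature with QMP commands.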
Example Commands
QMP handshakes
On the source side, the QMP handshake is the usual one:
{"execute": "qmp_capabilities"}
{"return": {}}
On the destination side, however, the following must be run instead (before postcopy starts), so that the QMP channel is initialized with out-of-band (OOB) support:
{"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
{"return": {}}
Out-of-band messages are required on the destination host for postcopy recovery. Otherwise the destination node may lock up, and its QMP channel can hang so that the recovery command is never processed.
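A minimal Python sketch of both handshakes, using only the standard library. It assumes the caller has already connected to the QMP socket and exposes it as line-oriented text streams (one JSON message per line); the helper name negotiate_qmp is hypothetical. On the destination it refuses to continue unless the server's greeting advertises "oob".

```python
import json


def negotiate_qmp(reader, writer, enable_oob=False):
    """Run the QMP capabilities handshake (hypothetical helper).

    reader/writer are text file-like objects, e.g. both obtained from
    sock.makefile("rw") on a connected QMP socket.
    """
    # The server speaks first, with a greeting such as
    # {"QMP": {"version": {...}, "capabilities": ["oob"]}}.
    greeting = json.loads(reader.readline())
    advertised = greeting["QMP"].get("capabilities", [])

    command = {"execute": "qmp_capabilities"}
    if enable_oob:
        # Destination side: OOB has to be requested explicitly.
        if "oob" not in advertised:
            raise RuntimeError("QMP server does not advertise 'oob'")
        command["arguments"] = {"enable": ["oob"]}

    writer.write(json.dumps(command) + "\n")
    writer.flush()

    reply = json.loads(reader.readline())
    if "return" not in reply:
        raise RuntimeError(f"capabilities negotiation failed: {reply}")
    return command
```

On the source this would be called as negotiate_qmp(f, f), and on the destination as negotiate_qmp(f, f, enable_oob=True).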
Enable postcopy
This needs to be run on both sides. It enables the postcopy-ram capability on the source VM and on the destination VM.
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "postcopy-ram", "state": true}]}}
{"return": {}}
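Since the same command has to reach both VMs, it can help to build it in one place and then confirm the result on each side. The Python sketch below uses hypothetical helper names; capability_enabled assumes the reply shape of "query-migrate-capabilities", a list of objects with "capability" and "state" members.

```python
def set_capabilities_cmd(**capabilities):
    """Build a migrate-set-capabilities command (hypothetical helper).

    Keyword names use underscores and are converted to QMP's dashed names,
    e.g. postcopy_ram=True -> {"capability": "postcopy-ram", "state": true}.
    """
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [
                # "state" is sent explicitly for every capability.
                {"capability": name.replace("_", "-"), "state": bool(state)}
                for name, state in capabilities.items()
            ]
        },
    }


def capability_enabled(reply, name):
    """Check a query-migrate-capabilities reply for an enabled capability."""
    return any(c["capability"] == name and c["state"] for c in reply["return"])
```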
Start precopy migration
This needs to be run on the source side only. It starts a normal precopy migration towards the destination, which must already be listening on the given URI.
{"execute": "migrate", "arguments": {"uri": "tcp:192.168.0.2:9000"}}
{"return": {}}
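While a migration runs, QEMU may also send asynchronous events (messages with an "event" member) on the same QMP channel, so a client waiting for a command's reply should skip them. A minimal sketch in Python, assuming line-oriented text streams as in the handshake; the helper name qmp_execute is hypothetical.

```python
import json


def qmp_execute(reader, writer, command, arguments=None):
    """Send one in-band QMP command and return its "return" value.

    Events arriving before the reply are skipped here; a real client
    would probably queue them instead of dropping them.
    """
    message = {"execute": command}
    if arguments is not None:
        message["arguments"] = arguments
    writer.write(json.dumps(message) + "\n")
    writer.flush()

    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" in reply:
            continue  # asynchronous notification, not our reply
        if "error" in reply:
            raise RuntimeError(reply["error"].get("desc", str(reply["error"])))
        return reply["return"]
```

For the step above this would be qmp_execute(f, f, "migrate", {"uri": "tcp:192.168.0.2:9000"}).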
Start postcopy migration
This needs to be run on the source side only. It switches the ongoing precopy migration to postcopy, so that the destination VM starts running before all of its pages have been migrated; the remaining pages are then fetched from the source.
{"execute": "migrate-start-postcopy"}
{"return": {}}
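The switch only makes sense once postcopy-ram is enabled and the precopy phase is running. The hypothetical Python guard below checks those two preconditions from the replies of "query-migrate-capabilities" and "query-migrate" before producing the command. QEMU performs its own validation; this is only a client-side sketch, and treating "active" as the only acceptable status is an assumption.

```python
def start_postcopy_cmd(capabilities_reply, migrate_reply):
    """Return migrate-start-postcopy if client-side preconditions hold."""
    postcopy_on = any(
        c["capability"] == "postcopy-ram" and c["state"]
        for c in capabilities_reply["return"]
    )
    if not postcopy_on:
        raise RuntimeError("postcopy-ram is not enabled on this side")
    # Assumption: "status" may be absent if no migration has run yet.
    status = migrate_reply["return"].get("status", "none")
    if status != "active":
        raise RuntimeError(f"precopy migration is not active (status={status!r})")
    return {"execute": "migrate-start-postcopy"}
```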
Break the migration channel
To emulate an interruption, we can either unplug the network cable carrying the migration channel, or use the "migrate-pause" command to simulate a network failure in software. In the latter case, the command only needs to be run on the source side:
{"execute": "migrate-pause"}
{"return": {}}
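To our understanding, QEMU only accepts "migrate-pause" while the migration is in postcopy (the "postcopy-active" state, and in newer releases also "postcopy-recover"). A hypothetical client-side guard in Python that mirrors this assumption using the "query-migrate" reply:

```python
# Assumption: states in which migrate-pause is accepted.
PAUSABLE_STATES = {"postcopy-active", "postcopy-recover"}


def migrate_pause_cmd(migrate_reply):
    """Return migrate-pause, refusing outside the assumed postcopy states."""
    status = migrate_reply["return"].get("status", "none")
    if status not in PAUSABLE_STATES:
        raise RuntimeError(f"cannot pause migration in state {status!r}")
    return {"execute": "migrate-pause"}
```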
Check migration status
This can be run on both sides. After the migration is interrupted, both the source and the destination VM should report the "postcopy-paused" state:
{"execute": "query-migrate"}
{"return": {"status": "postcopy-paused", "socket-address": [{"path": "...", "type": "..."}]}}
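The states reported by "query-migrate" in this walkthrough form a small cycle. The Python table below is a simplified model of that cycle, an assumption for illustration only: QEMU defines more states (setup, cancellation, failure, and possibly intermediate recovery states in newer releases). Because polling can miss the short-lived "postcopy-recover" state, a direct "postcopy-paused" to "postcopy-active" step is also allowed.

```python
# Simplified subset of migration states seen during postcopy recovery.
TRANSITIONS = {
    "active": {"postcopy-active", "completed"},
    "postcopy-active": {"postcopy-paused", "completed"},
    # "postcopy-active" is listed because polling may miss postcopy-recover.
    "postcopy-paused": {"postcopy-recover", "postcopy-active"},
    "postcopy-recover": {"postcopy-active", "postcopy-paused"},
    "completed": set(),
}


def follow(statuses):
    """Validate a sequence of observed statuses against TRANSITIONS.

    Repeated identical statuses (from repeated polling) are allowed.
    Returns how many times the migration entered postcopy-paused.
    """
    pauses = 0
    for prev, cur in zip(statuses, statuses[1:]):
        if cur == prev:
            continue
        if cur not in TRANSITIONS.get(prev, set()):
            raise ValueError(f"unexpected transition {prev!r} -> {cur!r}")
        if cur == "postcopy-paused":
            pauses += 1
    return pauses
```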
Resume the interrupted postcopy migration
This should only be run on the destination side. It sets up a migration channel again. We can either reuse the previous listening port, if the network has recovered, or simply provide a new migration channel on which the migration can continue. Here a new channel is used.
Note that "exec-oob" is used here instead of "execute", so that the command is sent as an out-of-band message and executed in the monitor's dedicated I/O thread.
{"exec-oob": "migrate-recover", "arguments": {"uri": "tcp:192.168.0.2:9001"}, "id": "recover-cmd"}
{"return": {}, "id": "recover-cmd"}
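The reply above echoes the command's "id". Since an out-of-band reply can arrive ahead of replies to in-band commands issued earlier, a client should match replies by "id" rather than by order. A hypothetical Python sketch that builds the command and picks its reply out of a list of already-decoded messages:

```python
import itertools

# Counter for generated ids (hypothetical naming scheme "recover-N").
_ids = itertools.count(1)


def recover_cmd(uri, cmd_id=None):
    """Build the destination-side migrate-recover as an out-of-band command."""
    if cmd_id is None:
        cmd_id = f"recover-{next(_ids)}"
    return {"exec-oob": "migrate-recover", "arguments": {"uri": uri}, "id": cmd_id}


def reply_for(messages, cmd_id):
    """Return the first reply carrying cmd_id, ignoring events and other replies."""
    for msg in messages:
        if msg.get("id") == cmd_id and ("return" in msg or "error" in msg):
            return msg
    return None
```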
Then, to resume the migration, run "migrate" again on the source side with the "resume" flag set. If the channel created in the previous step differs from the old one, use the new channel's URI:
{"execute": "migrate", "arguments": {"resume": true, "uri": "tcp:192.168.0.2:9001"}}
{"return": {}}
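The destination's "migrate-recover" URI and the source's resumed "migrate" URI must point at the same channel, so it can help to derive both from a single value. A hypothetical Python helper, assuming a plain TCP channel on which the destination listens on a new port for each recovery attempt:

```python
def recovery_commands(dest_host, port, cmd_id="recover-cmd"):
    """Return (destination_cmd, source_cmd) for one recovery attempt."""
    uri = f"tcp:{dest_host}:{port}"
    # Destination: listen on the new channel, sent out-of-band.
    dest_cmd = {"exec-oob": "migrate-recover", "arguments": {"uri": uri}, "id": cmd_id}
    # Source: connect to that same channel and resume the paused migration.
    src_cmd = {"execute": "migrate", "arguments": {"resume": True, "uri": uri}}
    return dest_cmd, src_cmd
```

The destination command is sent first, and the source command only after its reply has arrived, matching the order of the steps above.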
Wait until migration completes
Check on both sides (for example with "query-migrate") that the migration completes normally.
Note that a postcopy migration can be interrupted multiple times; each time, it can be resumed using the same steps described above, until the migration completes.
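Because interruptions can repeat, the waiting step can be written as a polling loop that runs one recovery round every time the migration is found paused. In the Python sketch below, query_status and recover are hypothetical callables supplied by the caller (for example, wrappers around "query-migrate" and the two recovery commands); recover is assumed to return only once the migration has left "postcopy-paused", so the loop does not trigger a second recovery for the same interruption.

```python
import time


def wait_for_completion(query_status, recover, poll_interval=1.0, max_recoveries=5):
    """Poll until the migration completes, recovering each time it pauses.

    query_status: callable returning the current status string (hypothetical).
    recover: callable performing one recovery round; assumed to block until
        the migration is no longer in postcopy-paused (hypothetical).
    Returns the number of recovery rounds performed.
    """
    recoveries = 0
    while True:
        status = query_status()
        if status == "completed":
            return recoveries
        if status == "failed":
            raise RuntimeError("migration failed")
        if status == "postcopy-paused":
            if recoveries >= max_recoveries:
                raise RuntimeError("too many interruptions, giving up")
            recover()
            recoveries += 1
        time.sleep(poll_interval)
```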