Features/PostcopyRecovery: Difference between revisions

From QEMU
(Postcopy Recovery Procedures)

Revision as of 07:00, 5 September 2019

Introduction

Postcopy recovery allows an interrupted postcopy migration to be resumed at any time once the migration channel is ready again, either by repairing the broken channel or by providing a new one.

Below is an example of how to use the postcopy recovery feature using QMP commands.

Example Commands

QMP handshakes

This needs to be run on both sides. It initializes the QMP channels.

 {"execute": "qmp_capabilities"}
 {"return": {}}
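A minimal Python sketch of the same handshake, assuming the QMP monitor is exposed on a UNIX socket. The helper names (`connect_qmp`, `qmp_handshake`) and the socket path passed to them are hypothetical; the only protocol facts relied on are that the server sends a greeting containing a "QMP" key on connect and that qmp_capabilities must be issued before any other command.

```python
import json
import socket


def connect_qmp(path):
    """Connect to a QMP UNIX socket and return a text stream.

    `path` is whatever was given to QEMU's -qmp option (an assumption
    of this sketch, not something the transcript above specifies).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock.makefile("rw")


def qmp_handshake(stream):
    """Read the server greeting, then negotiate capabilities.

    `stream` is any object with readline(), write() and flush().
    Returns the parsed greeting.
    """
    greeting = json.loads(stream.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected greeting: %r" % (greeting,))

    stream.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    stream.flush()

    reply = json.loads(stream.readline())
    if "return" not in reply:
        raise RuntimeError("handshake failed: %r" % (reply,))
    return greeting
```

The handshake would be repeated once per VM, for example `qmp_handshake(connect_qmp(src_path))` and `qmp_handshake(connect_qmp(dst_path))`, where both paths are placeholders.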

Enable postcopy

This needs to be run on both sides. It enables the postcopy-ram capability on both the source and the destination VM.

 {"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"state": true, "capability": "postcopy-ram"}]}}
 {"return": {}}

Start precopy migration

This needs to be run on the source side only. It starts an ordinary precopy migration.

 {"execute": "migrate", "arguments": {"uri": "unix:/tmp/migration-test-12K6jV/migsocket"}}
 {"return": {}}

Start postcopy migration

This needs to be run on the source side only. It switches the running precopy migration to postcopy, so that the destination VM can start running before all the pages have been migrated.

 {"execute": "migrate-start-postcopy"}
 {"return": {}}
 {"timestamp": {"seconds": 1559730429, "microseconds": 585575}, "event": "STOP"}
 {"timestamp": {"seconds": 1559730429, "microseconds": 613592}, "event": "RESUME"}
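Command replies and asynchronous events such as STOP and RESUME arrive on the same QMP stream, so a client has to tell them apart. The following Python sketch separates the two by looking for the "event" key; `split_messages` is a hypothetical helper name, and the input is assumed to be one JSON object per line, as in the transcript above.

```python
import json


def split_messages(lines):
    """Split raw QMP lines into (replies, events).

    A message carrying an "event" key is an asynchronous event;
    one carrying "return" or "error" is a command reply.
    """
    replies, events = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "event" in msg:
            events.append(msg)
        elif "return" in msg or "error" in msg:
            replies.append(msg)
    return replies, events


transcript = [
    '{"return": {}}',
    '{"timestamp": {"seconds": 1559730429, "microseconds": 585575}, "event": "STOP"}',
    '{"timestamp": {"seconds": 1559730429, "microseconds": 613592}, "event": "RESUME"}',
]
replies, events = split_messages(transcript)
print(len(replies), [e["event"] for e in events])
```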

Break the migration channel

You can unplug the cable that carries the migration channel to emulate an interrupted migration, or use the "migrate-pause" command to emulate the failure from the software layer. The command needs to be run on the source side only.

 {"execute": "migrate-pause"}
 {"return": {}}

Check migration status

This can be run on both sides. After the migration has been interrupted, both VMs should report the "postcopy-paused" state.

 {"execute": "query-migrate"}
 {"return": {"status": "postcopy-paused", "socket-address": [{"path": "/tmp/migration-test-12K6jV/migsocket", "type": "unix"}]}}
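A small Python sketch of this check, assuming the reply to query-migrate has already been read as a single JSON line. `is_postcopy_paused` is a hypothetical helper name; the reply used in the example is the one shown above.

```python
import json


def is_postcopy_paused(reply_line):
    """Return True if a query-migrate reply reports "postcopy-paused"."""
    reply = json.loads(reply_line)
    # An error reply has no "return" member, so fall back to an empty dict.
    return reply.get("return", {}).get("status") == "postcopy-paused"


sample = (
    '{"return": {"status": "postcopy-paused", "socket-address": '
    '[{"path": "/tmp/migration-test-12K6jV/migsocket", "type": "unix"}]}}'
)
print(is_postcopy_paused(sample))
```

Recovery should only be attempted once this check succeeds on both the source and the destination.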

Resume the interrupted postcopy migration

This should only be run on the destination side. It rebuilds the migration channel. You can either reuse the previous listening port, if the network has recovered, or provide a new migration channel on which the migration can continue. Here a new channel is used.

 {"execute": "migrate-recover", "arguments": {"uri": "unix:/tmp/migration-test-12K6jV/migsocket-recover"}, "id": "recover-cmd"}
 {"timestamp": {"seconds": 1559730429, "microseconds": 620072}, "event": "MIGRATION", "data": {"status": "setup"}}
 {"return": {}, "id": "recover-cmd"}

Then, to resume the migration, run the following on the source side. Use the new channel rather than the old one if they differ, and set the "resume" argument so that QEMU resumes the paused migration instead of trying to start a new one:

 {"execute": "migrate", "arguments": {"uri": "unix:/tmp/migration-test-12K6jV/migsocket-recover", "resume": true}}
 {"return": {}}
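The ordering of the two recovery commands can be sketched in Python as follows. `recovery_commands` is a hypothetical helper that only builds the command objects; it does not talk to QEMU. The sketch passes "resume": true to migrate, which is the argument QEMU's QMP schema defines for resuming a paused migration.

```python
import json


def recovery_commands(uri):
    """Return (side, command) pairs in the order they must be issued.

    The destination has to be listening on the recovery channel
    before the source tries to reconnect to it.
    """
    return [
        ("destination",
         {"execute": "migrate-recover", "arguments": {"uri": uri}}),
        ("source",
         {"execute": "migrate", "arguments": {"uri": uri, "resume": True}}),
    ]


for side, cmd in recovery_commands(
        "unix:/tmp/migration-test-12K6jV/migsocket-recover"):
    print(side, json.dumps(cmd))
```

Building both commands from one URI keeps the two sides from drifting apart when a new channel is chosen.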

Wait until migration completes

Check on both sides that the migration completes normally.

 {"execute": "query-migrate"}
 {"return": {"postcopy-blocktime": 89, "status": "completed", "postcopy-vcpu-blocktime": [90]}}

Note that a postcopy migration can be interrupted many times. Each time, the migration can be resumed with the same steps described above, until it completes.
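The repeat-until-done behaviour can be sketched as a polling loop in Python. Everything here is an assumption layered on top of the steps above: `query_status` stands for a caller-supplied function that runs query-migrate and returns the "status" string, `recover` stands for a caller-supplied function that performs the recovery steps, and `max_recoveries` is an arbitrary safety limit.

```python
import time


def wait_for_completion(query_status, recover,
                        poll_interval=1.0, max_recoveries=10):
    """Poll the migration until it completes, recovering when paused.

    Returns the number of recoveries that were needed.
    """
    recoveries = 0
    while True:
        status = query_status()
        if status == "completed":
            return recoveries
        if status == "failed":
            raise RuntimeError("migration failed")
        if status == "postcopy-paused":
            if recoveries >= max_recoveries:
                raise RuntimeError("too many interruptions, giving up")
            # Caller-supplied: migrate-recover on the destination,
            # then a resuming migrate on the source.
            recover()
            recoveries += 1
        # Any other status (for example "postcopy-active" or
        # "postcopy-recover") just means: keep waiting.
        time.sleep(poll_interval)
```

A real client would probably also want `recover` to wait until the source has left the paused state, so that a slow status update does not trigger a second recovery attempt.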