Features/Migration/Troubleshooting: Difference between revisions

From QEMU
Revision as of 19:15, 9 June 2016

You're probably looking at this page because migration has just failed; sorry about that, but hopefully this page will give you some idea of how to figure out why and, importantly, what to include in any bug report.


Basics

QEMU's migration is like lifting the state out of one machine and dumping it into another - the (emulated) hardware of the two machines has to match pretty much exactly for this to succeed.

Note that QEMU supports migrating forward between QEMU versions but in general not backwards, although some distros support this on their packaged QEMU versions.

Machine types

QEMU's machine type (the parameter to -M or --machine) defines the basic shape of the emulated machine; the closest analogy is the model of motherboard in a system. Migration requires you to have the same machine type on the source and destination machines. Architectures tend to have a variety of machine types (e.g. on x86 there are the 'pc' and 'q35' families) that correspond to different generations of system. In addition, some architectures version the machine types, e.g. pc-i440fx-2.5, pc-i440fx-2.6. Newer QEMUs normally keep (most of) the older machine types around so that you can migrate. So for example, a 2.6 release of QEMU should be able to accept a migration from a 2.5 release using the pc-i440fx-2.5 or pc-i440fx-2.4 machine types; note that this is not heavily tested!

Note that some machine types are aliases; on x86 the 'pc' and 'q35' machine types are aliases for whatever the latest version is in that version of QEMU, and thus migrating between two different QEMU versions both started with machine type 'pc' often won't work - use the full machine type.
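As a sketch, you can list the machine types a given binary supports (the aliases are shown resolving to their current versioned type) and then start both sides with the explicit versioned type; the binary name, machine type version and disk image below are illustrative:

```shell
# List the machine types this QEMU binary supports; aliases such as
# 'pc' are shown pointing at their current versioned type.
qemu-system-x86_64 -machine help

# Start BOTH the source and the destination with the same explicit,
# versioned machine type rather than the 'pc' alias.
qemu-system-x86_64 -M pc-i440fx-2.5 -m 2048 -drive file=guest.img,format=raw
```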

ROMs

The ROM images used on the two hosts should be approximately (within a page size) the same size; if the ROMs do not match in size the migration is normally refused; care should be taken when packaging or upgrading BIOS, net boot roms etc to ensure this constraint is met.
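A quick way to sanity-check this before migrating is to compare the sizes of the ROM images copied from the two hosts; a minimal sketch, assuming a 4096-byte page size (the function name and ROM locations are illustrative - distros typically install ROMs under e.g. /usr/share/qemu or /usr/share/seabios):

```shell
# rom_size_ok FILE1 FILE2 - succeed if the two ROM images are within one
# page (4096 bytes) of each other in size.
rom_size_ok() {
    local a b diff
    a=$(stat -c%s "$1") || return 2
    b=$(stat -c%s "$2") || return 2
    diff=$((a - b))
    # strip a leading '-' to take the absolute value of the difference
    [ "${diff#-}" -le 4096 ]
}

# Example: compare the BIOS image gathered from each host
# rom_size_ok bios-source.bin bios-destination.bin && echo "sizes OK"
```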

Devices

The devices on the source and destination VMs must be identical - although the host resources backing them can be different; for example you can't migrate between a VM with an IDE controller and another that replaced it with a SATA controller, but you can migrate between a VM with an IDE controller connected to a local file and another VM with an IDE controller backed by an iSCSI LUN.

Ordering and addressing of devices

When adding a device using -device on the qemu command line, it's normally added to the next available slot on the bus unless an address is specified. It's best to specify the address explicitly to avoid the source and destination ending up with different allocations; e.g. use -device pci-ohci,addr=5 -device usb-mouse,port=1
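Spelled out as a full (illustrative) invocation - the machine type, memory size and device addresses here are examples, not requirements:

```shell
# Pin the OHCI controller to PCI slot 5 and the mouse to USB port 1;
# use the SAME explicit addresses on both the source and destination
# so the guest sees identical device topology on each side.
qemu-system-x86_64 -M pc-i440fx-2.5 -m 2048 \
    -device pci-ohci,addr=5 \
    -device usb-mouse,port=1
```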

Hotplugged devices

Hotplugging can't normally be performed during a migration; however it's fine to hot plug/unplug a device before migration starts, as long as care is taken to ensure that the state of the destination VM is identical to the current state of the source VM prior to the start of migration. Particular care should be taken to specify the address/port of hotplugged devices, since the automatic allocation on the command line of the destination won't necessarily reflect the history of hot plug/unplug events on the source.


Host devices

Host PCI devices that are passed through to the guest normally block migration. There are various attempts to fix this for special cases of network cards, but none of them are complete yet.

Block storage

To do: cache=none, all the different ways to migrate block

Reporting bugs

If you report a migration bug please make sure that you include:

* The full QEMU versions you're using (including the full package version if you're using a distro's build)
* The full qemu command line on both the source and destination (feel free to remove identifying paths/passwords/IPs etc)
* The qemu log output from both the source and destination.
* A description of the networking between the two hosts (e.g. TCP over 10Gb Ethernet)
* Any migration parameters or capabilities you set or changed.
* Details of whether you hot plugged anything.
* Whether it's repeatable or occasional.
* How it fails: an error? a hang? etc - see below for additional details.

Finding logs

If you're using any system that uses libvirt, then libvirt normally captures the logs from the VM. With a system libvirt instance they're normally in /var/log/libvirt/qemu/guestname.log. If you're running on a desktop with a user libvirt session then try ~/.cache/libvirt/qemu/log (although that probably doesn't migrate). If you're using OpenStack it can be a little tricky to figure out which instance name corresponds to the VM you're migrating.

Types of failure

Migrations fail in lots of different ways; when reporting the bug make sure to indicate the type of failure and the additional details mentioned below.

Migrations that never finish

A migration that doesn't finish is not necessarily a bug - it might be that your VM is dirtying memory faster than the migration can push the data down the network connection. If the guest is still running happily on the source, 'info migrate' on the source shows the migration as 'active' and you still see a large amount of network bandwidth transferring data, then this is probably what's happening. You can try using postcopy migration, auto-converge or increasing your downtime limit to cope with big VMs rapidly changing memory.

If it doesn't finish but the source has stopped, or the source is still running but 'info migrate' isn't 'active', or even if it is 'active' but there's very little network bandwidth in use, then report a bug; remember to include the output of 'info migrate' (taken a couple of times a few seconds apart) gathered from the source.
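If the VM is managed by libvirt, one way to gather those snapshots is libvirt's HMP passthrough (the domain name 'myguest' is an example):

```shell
# Capture 'info migrate' twice, a few seconds apart, on the source host.
virsh qemu-monitor-command myguest --hmp 'info migrate'
sleep 5
virsh qemu-monitor-command myguest --hmp 'info migrate'
```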

Migrations that fail with a guest hang or crash

These are the worst case and are pretty hard to debug; if the only failure is in the guest then it's best to start by looking for any logs inside the guest after restarting it, e.g. anything that would indicate a particular device failure. Running a memory test in the guest during a migration (assuming your host is OK!) is a good way to check that the migration code isn't doing something really bad.
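As a sketch, one common in-guest memory tester is memtester, assuming it's installed in the guest; the size and iteration count below are illustrative:

```shell
# Inside the guest, exercise 512MB of memory for 3 passes while the
# migration runs on the host; a failure that only appears across a
# migrate (when the same test passes otherwise) is suspicious.
memtester 512M 3
```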

If reporting this, provide details of the guest you're running, and check the qemu logs on the source and destination for warnings.

Migrations that fail with an error