Features/Migration/Troubleshooting

You're probably looking at this page because migration has just failed; sorry about that, but hopefully this page will give you some idea of how to figure out why and importantly what to include in any bug report.


Basics

QEMU's migration is like lifting the state out of one machine and dumping it into another - the (emulated) hardware of the two machines has to match pretty much exactly for this to succeed.

Note that QEMU supports migrating forward between QEMU versions but in general not backwards, although some distros support this on their packaged QEMU versions.

Machine types

QEMU's machine type (the parameter to -M or --machine) is a definition of the basic shape of the emulated machine; the closest analogy is to the model of motherboard in a system. Migration requires you to have the same machine type on the source and destination machines. Architectures tend to have a variety of machine types (e.g. on x86 there is the 'pc' and the 'q35' family) that correspond to different generations of system. In addition some architectures version the machine types - e.g. pc-i440fx-2.5, pc-i440fx-2.6. Newer QEMUs normally keep (most of) the older machine types around so that you can migrate. So, for example, a 2.6 release of QEMU should be able to migrate from a 2.5 release using the pc-i440fx-2.5 or pc-i440fx-2.4 machine types; note that this is not heavily tested!

Note that some machine types are aliases; on x86 the 'pc' and 'q35' machine types are aliases for whatever the latest version is in that version of qemu, and thus migrating between two different qemu versions both started with machine type 'pc' often won't work - use the full versioned machine type.
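
A minimal illustration of pinning the machine type explicitly (the versioned name below is just an example; pick one that exists on both qemu versions):

  # List the machine types each qemu knows about, and pick a versioned one
  # that appears in both lists:
  qemu-system-x86_64 -machine help

  # Then start both the source and the destination with that explicit,
  # versioned machine type, e.g.:
  qemu-system-x86_64 -machine pc-i440fx-2.5 ...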

ROMs

The ROM images used on the two hosts should be approximately (within a page size) the same size; if the ROMs do not match in size the migration is normally refused. Care should be taken when packaging or upgrading BIOS, net boot ROMs etc to ensure this constraint is met; one way is for packaged versions to pad the ROM sizes so that there is room to grow in future versions. After migration, the contents of the source ROMs are still used on the destination, so a reboot of the VM on the destination will use the ROM image from the source. This can sometimes cause a problem where a new feature on a later QEMU upsets an older ROM image, or the opposite for a backwards migration.
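
A minimal sketch of the padding idea, assuming a packaging step that zero-pads a ROM image up to a fixed power-of-2 size (the file name and size here are examples only):

  # Pad the ROM image to 256KiB so future versions can grow without
  # changing the size seen by migration:
  truncate -s 256K my-netboot.rom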

Devices

The devices on the source and destination VMs must be identical - although any host devices they depend on can be different; for example you can't migrate between a VM with an IDE controller and another that replaced it with a SATA controller; but you can migrate between a VM with an IDE controller connected to a local file and another VM with an IDE controller backed by an iSCSI LUN.

Ordering and addressing of devices

When adding a device using -device on the qemu command line, it's normally added to the next available slot on the bus unless an address is specified. It's best to specify the address explicitly to avoid the source and destination ending up with different allocations; e.g. use -device pci-ohci,addr=5 -device usb-mouse,port=1

Hotplugged devices

Hotplugging can't normally be performed during a migration, however it's fine to hot plug/unplug a device before migration starts as long as care is taken to ensure that the state of the destination VM is identical to the current state of the source VM prior to the start of migration. Particular care should be taken to specify the address/port of devices with hotplugged devices since the automatic allocation on the command line of the destination won't necessarily reflect the history of hot plug/unplug events on the source.
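
A hedged sketch of what that care looks like in practice (the device, id and port values are examples, and a USB controller is assumed to be present): give the hotplugged device an explicit id and address on the source, and put the same device with the same id and address on the destination's command line.

  # On the source, hotplug with an explicit id and port (HMP monitor):
  (qemu) device_add usb-mouse,id=mouse1,port=2

  # On the destination, describe the same device on the command line so the
  # incoming VM already has it at the same address:
  qemu-system-x86_64 ... -device usb-mouse,id=mouse1,port=2 -incoming tcp:0:4444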

Host devices

Host PCI devices that are passed through to the guest normally block migration. There are various attempts to fix this for special cases of network cards, but none of them are complete yet.

Block storage

Block storage has to somehow be accessible from both the source and destination, either by shared storage (e.g. NFS) or by explicit migration of the block data between local storage systems. Confusion as to whether a particular block device needs shared or non-shared migration has been known to cause problems in the past - e.g. corruptions caused by a block device being copied back to itself explicitly but writing to the same NFS storage.

  • Shared storage: Other than making sure the shared storage is available from both machines, there is not much special needed for real shared storage such as NFS. Older QEMU required the use of cache=none to ensure consistent block state between the source and destination hosts, however this has been replaced by an explicit flush in the migration path, so that any cached data on the source should make it to the fileserver before the destination uses it. Care is needed with file permissions such that both source and destination can access the file; pre-assignment of SELinux labels so that both sides agree on the label to use is a common requirement with libvirt.
  • Non-shared storage: Explicitly migrating the block device across is possible, although it can be problematic for huge files. The live migration code in qemu does have support for an old block migration protocol that shared the migration stream, however this is deprecated; non-shared block migration is now normally accomplished using a separate NBD block server/client configuration in qemu and the use of block jobs to perform the copy before the main live migration process is started (see the sketch after this list).
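
A rough, hedged sketch of that NBD/block-job approach driven by hand from the HMP monitor (the drive id 'disk0', the host name and the ports are examples; in practice libvirt does all of this for you, e.g. with virsh migrate --copy-storage-all):

  # Destination: start qemu with the same command line plus '-incoming defer'
  # and an empty, pre-created copy of the disk image, then export the disk:
  (qemu-dst) nbd_server_start 0.0.0.0:10809
  (qemu-dst) nbd_server_add -w disk0

  # Source: mirror the whole disk into the destination's NBD export
  # (-n: reuse the existing target, -f: copy the full disk):
  (qemu-src) drive_mirror -n -f disk0 nbd://dst-host:10809/disk0

  # Once 'info block-jobs' shows the mirror has reached a ready/synced state,
  # start the normal RAM/device migration:
  (qemu-dst) migrate_incoming tcp:0:4444
  (qemu-src) migrate -d tcp:dst-host:4444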

Bugs are sometimes triggered at the boundary between the live migration and block layers; at the end of the source live migration the block layer must release the disc (flushing caches as it goes); on a migration failure the source must then not try to run until the block layer has been told to relock/acquire the disc.

Discs that have already failed prior to migration, or that fail during migration, present problems that sometimes cause migration failures; it's not always clear what the correct response is.

CPUs

The emulated CPU has to match on the source and destination; in general architecture specific code ensures that '-cpu something' matches or warns if the host is not capable of supporting it (but in general it doesn't fail). Some architectures have other dependencies (e.g. the interrupt controller having to match the emulated CPU, or the host). '-cpu host' varies wildly based on the host system, and the semantics vary a lot based on the architecture.

x86

On x86 incompatibilities have been seen due to at least:

  • BIOS versions - with different BIOSes advertising different CPU capabilities.
  • Microcode updates - occasionally a microcode update will remove a broken feature or advertise a new security workaround.
  • BIOS settings - disabling hyperthreading can change register availability for performance counters; other options can be enabled or disabled.

Migration between different vendors' x86 chips is generally not supported, even if the sets of flags can be matched.
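
One way to reduce exposure to host differences is to avoid '-cpu host' and pin an explicit model plus flags that both hosts can provide; a minimal sketch (the model and flag names are examples only):

  # See which CPU models and flags this qemu build knows about:
  qemu-system-x86_64 -cpu help

  # Use the same explicit model (and any extra feature flags) on both the
  # source and the destination, e.g.:
  qemu-system-x86_64 -cpu Nehalem,aes=on ...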

Reporting bugs

If you report a migration bug please make sure that you:

* Include the full QEMU versions you're using (including the full package version if you're using a distro's build)
* The full qemu command line on both the source and destination (feel free to remove identifying paths/passwords/IPs etc)
* The qemu log output from both the source and destination.
* Describe the networking between the two hosts (e.g. TCP over 10Gb ether)
* Any of the migration parameters or capabilities you set/changed.
* Details of whether you hot plugged anything
* Is it repeatable or occasional?
* How does it fail? An error? A hang? etc - see below for additional details.
* State if you're running in a nested environment, and if you are, give details of the top level hypervisor.

Finding logs

If you're using any system that uses libvirt, then libvirt normally captures the logs from the VM. On system libvirt they're normally in /var/log/libvirt/qemu/guestname.log. If you're running it on a desktop with a user libvirt session then try ~/.cache/libvirt/qemu/log (although that probably doesn't migrate). If you're using openstack it can be a little tricky to figure out which instance name corresponds to the VM you're migrating.
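
For example, on a typical system-libvirt host something like the following (the guest name is just an example) shows the tail of the log on each side:

  # Run on both the source and destination hosts:
  sudo tail -n 50 /var/log/libvirt/qemu/guestname.log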

Types of failure

Migrations fail in lots of different ways; when reporting the bug make sure to indicate the type of failure and the additional details mentioned below.

Migrations that never finish

A migration that doesn't finish is not necessarily a bug - it might be that your VM is changing memory too quickly for the migration to stuff the data down the network connection. If the VM is still running happily on the source, 'info migrate' on the source shows it as 'active' and you still see a large network bandwidth transferring data, then this is probably what's happening. You can try using postcopy migration, auto-converge or increasing your permitted downtime to cope with big VMs rapidly changing memory (see the examples below).
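
A few hedged examples of those knobs from the HMP monitor on the source (the values and URI are arbitrary examples; the available names can be checked with 'info migrate_capabilities' and 'info migrate_parameters'):

  # Allow a longer pause at the end of migration (milliseconds):
  (qemu) migrate_set_parameter downtime-limit 1000

  # Throttle the guest's CPUs if it keeps dirtying memory too fast:
  (qemu) migrate_set_capability auto-converge on

  # Or use postcopy: enable the capability before starting the migration
  # (it must also be enabled on the destination), then switch to postcopy
  # once the first pass of RAM has been sent:
  (qemu) migrate_set_capability postcopy-ram on
  (qemu) migrate -d tcp:dst-host:4444
  (qemu) migrate_start_postcopy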

If it doesn't finish but the source has stopped, or the source is still running but 'info migrate' isn't 'active', or even if it's active but there's very little network bandwidth, then report a bug; remember to include the output of 'info migrate' (taken a couple of times, a few seconds apart) gathered from the source.
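
If qemu is being driven by libvirt, one convenient way to grab that output (the guest name is an example) is:

  # Run a couple of times, a few seconds apart, on the source host:
  virsh qemu-monitor-command --hmp guestname 'info migrate'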

Migrations that fail with a guest hang or crash

These are the worst case and are pretty hard to debug; if the only failure is in the guest then it's best to start by seeing if you can find any logs inside the guest after restarting it, e.g. anything that would indicate a particular device failure etc. Running a memory test in the guest during a migrate (assuming your host is OK!) is a good way to check the migration code isn't doing something really bad. If reporting this, provide details of the guest you're running, and also check the qemu logs on the source and destination for warnings. Failures during reboot after migration, or of a migration during boot, can indicate incompatibilities between the version of the BIOS ROMs and the destination qemu. See ROMs above.

Migrations that fail with an error

When a migration fails, check the logs on both sides for errors; sometimes it's tricky to figure out which side caused the problem.

Names in qemu's migration errors

The names in qemu's migration errors correspond to internal object names; they fall into a few categories:

  • Simple names like 'vmmouse'
  • Fixed but structured names, e.g. '/rom@etc/acpi/tables/2'
  • Names with PCI, USB or SCSI bus IDs in, e.g. '0000:02.0/qxl.vram'
  • Garbage - if the name is really corrupt that indicates a bug somewhere - report it.

qemu: terminating on signal .. from pid ....

That's typically signal 15 and the pid typically corresponds to libvirtd; it's normal for libvirt to kill the source qemu at the end of the migration. In any other case it's worth checking for other qemu errors on either side; if there are none, then it's best to go and check libvirtd's own logs to see why it's upset.

qemu: Not a migration stream

The first few bytes of the stream are wrong; this suggests that the source migration failed very quickly or the destination is reading from the wrong socket.

State blocked by non-migratable device '.....'

Devices can block migration either because the code hasn't been written/tested for their migration or because a particular feature is hard to migrate. Examples include:

  • Older versions of qemu couldn't migrate AHCI/SATA
  • x86 cpus can't migrate with the 'invtsc' feature flag enabled.

error while loading state for instance 0x... of device '....'

This tells you exactly the device that's failed, if you're lucky there might be some errors preceding it telling you what went wrong. While most of these cases are bugs, other cases can include IO problems on the backing device on the destination or a missing subsection definition.

Unknown savevm section or instance '...'

In this case the source has sent migration data for a device that can't be found on the destination. There are two common causes of this:

  • A mismatch in the qemu command line/machine type causing the destination not to have the device at all.
  • A mismatch in the order of devices on a bus, e.g. in a case where I hadn't specified the port number for a usb-kbd and had the order different, I got an Unknown savevm section or instance '0000:00:04.0/1/usb-kbd' because it was actually /0/usb-kbd.

Unknown ramblock "..." cannot accept migration

Similar to the unknown savevm section above, in this case we're missing a block of RAM or ROM; again this is normally down to a command line mismatch.

Length mismatch: ....: .... in != .....

A block of RAM or ROM is a different size on the source and destination; while QEMU can cope in some specific cases, in general it can't (because it wouldn't have anywhere to put the excess data in the guest's address space). If this is a ROM the problem is normally down to the source and destination having different versions of the associated ROM installed; check the bios and ipxe packages that provide them. Packagers are advised to pad ROMs to nice convenient power-of-2 boundaries with plenty of space for growth to avoid this problem.

Other common causes are different settings for the size of VRAM on graphics emulation.

load of migration failed: Input/output error

This is normally seen on the destination and there are a few failure cases.

  • A network failure during the migration - the destination can't receive the data
  • Something kills/cancels the migration on the source - e.g. migrate_cancel on the source or the source is killed before migration is complete.
  • An actual IO error generated by one of the devices as it's loaded - e.g. networking/disc etc
  • A permissions problem involving the disc (e.g. SELinux on shared storage)
  • A disagreement with libvirt over when the ownership of a disc file should change.
  • A migration failure on the source. In this case check 'info migrate' on the source and it should say 'failed'. One way to start debugging this is to do a migrate to /dev/null to see if the problem can be isolated to the source; e.g. migrate "exec:cat > /dev/null" - if that still shows migrate failed in info migrate then the problem is purely on the source.

Missing section footer for ...

First check the command lines to ensure they're identical; if they are then this is a bug; please report it. It indicates that the section (or possibly a previous section) was saved by the source in a way that the destination didn't agree with; typically this is a problem with a .needed function behaving differently on the source or destination, or a difference in the number of elements of an array.

(x86) failed to set MSR 0xXXXX to 0xYYYY

In general this shouldn't happen as long as the choice of guest CPU type and flags is compatible with the host. To diagnose it, the actual MSR number and value need to be looked up in the CPU documentation. If you have a failing case then please report the output of /proc/cpuinfo on both hosts and the output of x86info -a from both hosts and the guest. Using -cpu host can trigger it if the two hosts aren't absolutely identical.

  • MSR 0x202 - IA32_MTRR_PHYSBASE1
If the destination has a smaller physical address width than the source, it can object to setting MSRs to physical addresses larger than it can handle, e.g. a migration from a large Xeon to an E3 series Xeon with a smaller physical address space. The 'phys-bits' and 'host-phys-bits-limit' cpu options can be used to match the sizes across multiple hosts (see the example after this list).
  • MSR 0x38f - IA32_PERF_GLOBAL_CTRL
If the destination has fewer performance counters than the source host, the destination will reject the bitmask of counters. Seen migrating from a host with hyperthreading disabled to a host with hyperthreading enabled with -cpu host.
  • MSR 0x4b564dxx - KVM MSRs
MSRs starting 0x4b564d are virtual MSRs handled by the KVM host; bugs related to these may be either kernel or QEMU bugs.
  • MSR 0x4b564d02 MSR_KVM_ASYNC_PF_EN - failed to set to 0x....5
Seen when migrating a guest with a new (~4.13) kernel from a host with a new kernel (~4.13) to a host with an old kernel (<4.13); this is a currently known bug: the guest enables a feature that's only supported in newer host kernels (KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT) and the older kernel gets upset when it's asked to enable it.
  • MSR 0xc001xxxx - AMD MSRs
MSRs starting with 0xc001xxxx are normally AMD-specific MSRs. It's worth checking that the VM is using the correct CPU model and whether the problem is related to a CPU model change or the use of any particular AMD feature.
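
A hedged illustration of the 'phys-bits' idea mentioned for MSR 0x202 above (the model name and the value 40 are examples; pick the smallest physical address width among the hosts you migrate between):

  # Use the same fixed physical address width on both hosts instead of
  # inheriting whatever the host CPU reports:
  qemu-system-x86_64 -cpu Nehalem,phys-bits=40 ...

  # Or, with '-cpu host', cap the host-derived value:
  qemu-system-x86_64 -cpu host,host-phys-bits=on,host-phys-bits-limit=40 ...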

get_pci_config_device: Bad config data: i=0xxx read: yy device: zz cmask: ss wmask: pp w1cmask:qq

The PCI 'config' data is compared between the source and destination devices; some bits are allowed to change (as controlled by the various masks) but when they don't match this error is given. 'i' is the index into the PCI config space. The most common cause is a missing entry in the capabilities list due to a feature being enabled on one side but not the other. On a Linux guest, check with 'lspci -vv'; the 'Capabilities: [XX]' values show each entry in the list. Note that the order of entries in this list must stay the same. If the command line to qemu matches on both sides and you trigger this error, it's a bug; please report it.
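
For example, inside the guest (the PCI address is an example; use the device named in the error):

  # Show the capabilities list of one device; the [XX] offsets and their
  # order are what has to match across the migration:
  lspci -vv -s 00:04.0 | grep -i capabilities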

virtio-net: saved image requires TUN_F_UFO support

A feature (UDP fragmentation offload) that was available on the source kernel isn't available on the destination. This is a guest-visible negotiated feature. UFO support was removed in the Linux 4.14 kernel so migration to 4.14 is currently a problem; a fix is under investigation (https://patchwork.ozlabs.org/patch/840094/).

virtio queue problems - VQ x size 0xxx < last_avail_idx 0xxxxx - used_idx 0xxx

There are multiple causes of queue inconsistencies within virtio devices; these include:

  • Driver or device bugs that leave the queue in an inconsistent state even before migration
  • Differences in feature enablement on the source or destination, leading to more or fewer queues being used (although that often triggers Bad config data errors)
  • Changes in virtio state after the CPU has been stopped on the source. Once the CPU is stopped on the source, no more changes should be made to the queues (e.g. no more packets received into a network device, no more accesses marked completed in storage), otherwise the contents of the guest RAM and the migrated virtio pointer state will be inconsistent.

Sometimes these bugs can be guest-dependent; for example, a newly enabled virtio feature, accidentally enabled in old machine types, might not cause a problem until a guest new enough to start using the feature is migrated.

RDMA ERROR: could not create rdma event channel

When attempting RDMA migration, check that qemu has all the permissions needed to access the RDMA devices. When using libvirt, /etc/libvirt/qemu.conf has a cgroup_device_acl list with a commented-out chunk that should be added for RDMA usage (a sketch follows below).
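
An illustrative sketch of that qemu.conf addition - keep your distro's default entries and follow the commented-out example in your own file, since the exact /dev/infiniband node names vary by host:

  # /etc/libvirt/qemu.conf (sketch; not a complete default list)
  cgroup_device_acl = [
      "/dev/null", "/dev/full", "/dev/zero",
      "/dev/random", "/dev/urandom",
      "/dev/ptmx", "/dev/kvm",
      # added so qemu can open the RDMA devices:
      "/dev/infiniband/rdma_cm",
      "/dev/infiniband/uverbs0",
      "/dev/infiniband/umad0"
  ]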

QEMU aborts/seg faults/crashes

Any segfault by either the source or destination qemu is a bug - please report it. Aborts are normally also bugs except in specific cases (e.g. corrupt image files); again, if the error isn't obvious, report it.

Debugging

OK, so it's failed and you actually want to debug it - thanks. Here are some suggested starting points:

  • Using 'info qtree' is a good way to see if the two VMs' configurations match.
  • There's lots of tracing in the migration and vmstate code; if in doubt, turn on the tracing in migration/migration.c, migration/vmstate.c and migration/savevm.c and see where things go. That tracing can be pretty useful if there's a disagreement about conditional fields in a vmstate structure; if one side decides it wants to send a byte but the other thinks it's not needed, things get very confused. I tend to use tracing to stderr most of the time, but tracing using systemtap works nicely on RHEL when you don't want to rebuild the binary. (See the sketch after this list.)
  • Errors are generally not detected if the source does NOT send a section - you just have a device on the destination that doesn't load its state.
  • Some migration bugs can be dependent on the state of the guest; e.g. which screen mode it's in, whether a CD image is in the drive, if it's using a particular device - try and isolate the case where it fails.
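
A hedged sketch of turning that tracing on from the command line (the patterns are examples; the exact trace event names live in the trace-events files next to those source files, and this assumes a qemu built with the default 'log' trace backend):

  # Send matching trace events to stderr on both the source and destination:
  qemu-system-x86_64 -trace 'migration_*' -trace 'vmstate_*' -trace 'savevm_*' ...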