Features/AllocationFailures

Revision as of 13:30, 22 October 2018 by Dgilbert

Introduction

In most situations memory allocations never fail in QEMU; we only find out later that there is not enough memory, when the kernel OOM-killer delivers a signal and kills the process. However, with sufficiently restrictive ulimit settings it is possible to get an allocation to fail. This page documents the behaviour we would like when an allocation fails.
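To see an allocation actually fail, the address-space limit can be lowered from inside a test program. The following is a minimal sketch, assuming a Linux host where RLIMIT_AS (the limit behind `ulimit -v`) is enforced; the helper names cap_address_space and allocation_fails are hypothetical and are not part of QEMU.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: cap this process's virtual address space, the same
 * limit that "ulimit -v" sets from a shell. Returns 0 on success, -1 on error.
 */
static int cap_address_space(rlim_t bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_AS, &lim) != 0) {
        return -1;
    }
    if (lim.rlim_max != RLIM_INFINITY && bytes > lim.rlim_max) {
        bytes = lim.rlim_max; /* the soft limit cannot exceed the hard limit */
    }
    lim.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &lim);
}

/* Hypothetical helper: returns 1 if allocating 'size' bytes fails, else 0. */
static int allocation_fails(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        return 1;
    }
    /* Touch the memory through a volatile pointer so the compiler
     * cannot optimise the malloc/free pair away. */
    *(volatile unsigned char *)p = 1;
    free(p);
    return 0;
}
```

With the limit capped at, say, 256 MiB, a later 1 GiB request is expected to return NULL instead of succeeding and being caught by the OOM-killer afterwards, while small requests should still succeed.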

Gracefully handling allocation failures

 Allocation failure during QEMU startup is the least problematic; the user hasn't lost anything at this point if QEMU exits.
 Allocation failure on a running QEMU is much more serious: if a VM is happily running, a crashing QEMU causes downtime for the user and/or potential data loss.
 Ideally all memory allocation paths would be checked and fail cleanly; however, it's acknowledged that QEMU's structure makes it difficult to check every allocation, so we provide pragmatic rules:
  1. Allocation failures should never cause bad pointers
    Allocations should be checked, and the choice is whether to attempt some
    recovery or to exit; the code should never end up writing to a NULL
    pointer, for example.
  2. When exiting due to an allocation error, use an error exit, not abort
    'abort' indicates an internal error, for which the host may capture a coredump and submit an error report.
    Running out of memory is generally not an internal error, so QEMU should just exit.
    That's tricky with some libraries (glib in particular), but where possible try
    to make them give an error exit.
  3. Small allocation checks can exit
    Small allocations are less likely to fail than large allocations, and checking every one is a pain,
    so pragmatically don't worry about them and allow exits on allocation failures.
      1. However, if you are allocating a list of hundreds of tiny allocations and the total size is large,
        then it is worth thinking about the total size.
      2. 'small' is arbitrary, and is arbitrarily chosen to be a typical 4k page size.
  4. Allocations during startup can exit
    If the VM hasn't started running then it's OK to exit due to an allocation failure.
    Take care with allocations that can happen during either startup or hotplug.
  5. Large allocations that fail after startup should not cause a failure
    All large allocations that can happen once the VM is running should be checked, and
    a failure should not cause the VM to fail.
  6. Failures triggered by a monitor command should return an error on the monitor
    A simple example of this is hotplugging more RAM or a device.
  7. Clean up
    When a large allocation fails, make sure to clean up so the VM isn't stuck
    with a half-allocated device etc.
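
The rules above can be sketched in plain C as follows. This is an illustration only, not QEMU code: DemoDevice, demo_device_hotplug, small_alloc_or_exit, try_large_alloc and failing_alloc are all hypothetical names, plain malloc stands in for whatever allocator the real code would use (in QEMU that would more likely be glib's g_try_malloc for the large case), and a plain string stands in for a proper error object. Small allocations exit with an error status on failure, the large allocation is checked and reported to the caller, and the failure path frees what was already allocated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device: one small allocation (name), one large one (ram). */
typedef struct DemoDevice {
    char *name;
    unsigned char *ram;
    size_t ram_size;
} DemoDevice;

/* Stand-in for an allocator that returns NULL on failure. It is a
 * function pointer only so that a failure can be simulated. */
static void *(*try_large_alloc)(size_t size) = malloc;

/* Always fails; used only to simulate running out of memory. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Small allocation: on failure, take an error exit rather than abort(). */
static void *small_alloc_or_exit(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}

static void demo_device_free(DemoDevice *dev)
{
    if (dev != NULL) {
        free(dev->ram);
        free(dev->name);
        free(dev);
    }
}

/* Hotplug on a running VM: a failed large allocation is reported through
 * *errp and NULL is returned; the process keeps running. */
static DemoDevice *demo_device_hotplug(const char *name, size_t ram_size,
                                       const char **errp)
{
    DemoDevice *dev;
    size_t name_len;

    /* A NULL argument is a programming error, so aborting is appropriate. */
    assert(name != NULL && errp != NULL);

    if (ram_size == 0) {
        *errp = "device RAM size must be non-zero";
        return NULL;
    }

    name_len = strlen(name) + 1; /* assumed to be short, i.e. "small" */
    dev = small_alloc_or_exit(sizeof(*dev));
    dev->name = small_alloc_or_exit(name_len);
    memcpy(dev->name, name, name_len);
    dev->ram_size = ram_size;

    dev->ram = try_large_alloc(ram_size);
    if (dev->ram == NULL) {
        /* Clean up so no half-allocated device is left behind. */
        demo_device_free(dev);
        *errp = "not enough memory for device RAM";
        return NULL;
    }

    *errp = NULL;
    return dev;
}
```

A caller such as a monitor command handler would pass the message in *errp back to the user and leave the VM running. Setting try_large_alloc to failing_alloc before the call is one way to exercise the failure path in a test.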