Features/AllocationFailures

From QEMU
Revision as of 13:35, 22 October 2018 by Dgilbert (talk | contribs) (→‎Gracefully handling allocation failures)

Introduction

In most situations memory allocations never fail in QEMU; instead, we find out later that there is not enough memory when the kernel OOM killer sends a signal that kills the process. However, with suitable ulimit configuration it is possible to make an allocation fail. This page documents the behaviour we would like when an allocation fails.
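
As a rough illustration of how a ulimit can make an allocation fail, the sketch below temporarily lowers the soft address-space limit and then attempts a large malloc. It assumes Linux, where `ulimit -v` corresponds to RLIMIT_AS; the function name malloc_fails_under_limit is hypothetical and is not part of QEMU.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not QEMU code.
 * Returns 1 if malloc(request) fails while the soft RLIMIT_AS is lowered to
 * `limit` bytes, 0 if it succeeds, and -1 if the limit could not be changed.
 */
int malloc_fails_under_limit(rlim_t limit, size_t request)
{
    struct rlimit old, lowered;
    void *volatile p; /* volatile so the compiler cannot elide the malloc */
    int failed;

    /* malloc(0) may legitimately return NULL, so a zero request is a bug. */
    assert(request > 0);

    if (getrlimit(RLIMIT_AS, &old) != 0) {
        return -1;
    }
    lowered = old;
    lowered.rlim_cur = limit; /* change the soft limit only */
    if (setrlimit(RLIMIT_AS, &lowered) != 0) {
        return -1;
    }

    p = malloc(request);
    failed = (p == NULL);
    free(p);

    /* Put the original soft limit back; the hard limit was never touched. */
    setrlimit(RLIMIT_AS, &old);
    return failed;
}
```

With a limit of 256 MiB and a request of 1 GiB the allocation is expected to fail at the malloc call itself, rather than being caught later by the OOM killer.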

Gracefully handling allocation failures

Allocation failure during QEMU startup is the least problematic; the user hasn't lost anything at this point if QEMU exits.

Allocation failure on a running QEMU is much more serious: if a VM is happily running, a QEMU crash causes downtime for the user and/or potential data loss.

Ideally all memory paths would be checked and fail cleanly; however, it's acknowledged that QEMU's structure makes it difficult to check all allocations, so we provide pragmatic rules.

  1. Allocation failures should never cause bad pointers
    Allocations should be checked, and the choice is whether to attempt some
    recovery or to exit; the code should never end up writing through a NULL
    pointer, for example.
  2. When exiting due to an allocation error, use an error exit, not abort
    'abort' indicates an internal error, for which the host may capture a core dump and submit an error report;
    running out of memory is generally not an internal error, so QEMU should just exit.
    That's tricky with some libraries (glib in particular), but where possible try
    to make them give an error exit.
  3. Small allocation checks can exit
    Small allocations are less likely to fail than large ones, and checking every one is a pain,
    so pragmatically don't worry about them and allow exits on allocation failure.
    However, if you are making hundreds of tiny allocations (a list, say) and their total size is large,
    then it is worth thinking about that total.
    'Small' is arbitrary; it is arbitrarily chosen to be a typical 4k page size.
  4. Allocations during startup can exit
    If the VM hasn't started running then it's OK to exit due to an allocation failure.
    Take care with allocations that can happen during either startup or hot plug,
    since the latter happens while the VM is running.
  5. Large allocations that fail after startup should not cause a failure
    All large allocations that can happen once the VM is running should be checked, and
    a failure should not cause the VM to fail.
  6. Failures triggered by a monitor command should return an error on the monitor
    A simple example of this is hotplugging more RAM or a device.
  7. Clean up
    When a large allocation fails, make sure to clean up so the VM isn't stuck
    with a half-allocated device, etc.
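
The rules above can be sketched in plain C as follows. This is only an illustration under stated assumptions: HypoDevice, small_alloc_or_exit, hypo_device_hotplug and hypo_device_unplug are hypothetical names, not QEMU APIs, and the error buffer stands in for whatever mechanism reports an error back to the monitor. Real QEMU code would more likely use glib's g_try_malloc() and QEMU's Error API; libc is used here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SMALL_ALLOC_MAX 4096 /* rule 3: "small" means up to a typical 4k page */

/* Hypothetical device: one small bookkeeping struct plus one large buffer. */
typedef struct HypoDevice {
    void *buffer;
    size_t buffer_size;
} HypoDevice;

/*
 * Rules 2 and 3: a small allocation may end the process on failure, but with
 * an error exit rather than abort(), since running out of memory is not an
 * internal error.
 */
void *small_alloc_or_exit(size_t size)
{
    void *p;

    /* A bad size is a programming bug, so an assertion (abort) is apt here. */
    assert(size > 0 && size <= SMALL_ALLOC_MAX);
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/*
 * Rules 1, 5, 6 and 7: a large allocation made on a running VM is checked,
 * failure is reported to the caller instead of killing the process, and the
 * partially built device is cleaned up. buffer_size must be non-zero.
 * Returns 0 on success, or -ENOMEM with a message written to errbuf.
 */
int hypo_device_hotplug(HypoDevice **out, size_t buffer_size,
                        char *errbuf, size_t errlen)
{
    HypoDevice *dev = small_alloc_or_exit(sizeof(*dev));

    dev->buffer = malloc(buffer_size);
    if (dev->buffer == NULL) {
        free(dev);   /* rule 7: no half-allocated device is left behind */
        *out = NULL; /* rule 1: the caller never sees a bad pointer */
        snprintf(errbuf, errlen,
                 "cannot allocate %zu bytes for device buffer", buffer_size);
        return -ENOMEM;
    }
    dev->buffer_size = buffer_size;
    *out = dev;
    return 0;
}

/* Release a device created by hypo_device_hotplug(); NULL is accepted. */
void hypo_device_unplug(HypoDevice *dev)
{
    if (dev != NULL) {
        free(dev->buffer);
        free(dev);
    }
}
```

A monitor command handler built on this would pass the error text back to the user when hypo_device_hotplug() returns a negative value, leaving the running VM untouched.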