Features/AllocationFailures: Difference between revisions

From QEMU
Revision as of 13:32, 22 October 2018

Introduction

In most situations memory allocations never fail in QEMU; we only find out later that there's not enough memory, when the kernel OOM-killer delivers a signal and kills the process. However, with suitable ulimit configuration it is possible to get an allocation to fail. This page documents the behaviour we would like when an allocation fails.
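As a rough illustration of how a ulimit can make an allocation fail, the sketch below lowers the process's address-space limit with `setrlimit(RLIMIT_AS, ...)`, which is what `ulimit -v` does from a shell, and then probes `malloc`. The helper names `limit_address_space` and `allocation_fails` are hypothetical and not part of QEMU, and the behaviour assumes a Linux host where `RLIMIT_AS` is enforced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: lower the soft address-space limit of this process,
 * similar in effect to running `ulimit -v` before starting it.
 * Returns 0 on success, -1 on failure. */
static int limit_address_space(rlim_t bytes)
{
    struct rlimit rl;

    assert(bytes > 0);
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        return -1;
    }
    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max) {
        bytes = rl.rlim_max;
    }
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Hypothetical probe: true if malloc() of the given size returns NULL. */
static bool allocation_fails(size_t bytes)
{
    void *p = malloc(bytes);

    if (p == NULL) {
        return true;
    }
    free(p);
    return false;
}
```

With the limit set to, say, 512 MiB, a request for 1 GiB should come back as NULL instead of succeeding and leaving the problem to the OOM-killer, while small requests continue to work.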

Gracefully handling allocation failures

Allocation failure during QEMU startup is the least problematic; the user hasn't lost anything at this point if QEMU exits.

Allocation failure on a running QEMU matters much more: if a VM is happily running, a crashing QEMU causes downtime for the user and/or potential data loss.

Ideally all memory paths would be checked and fail cleanly; however, it's acknowledged that QEMU's structure makes it difficult to check all allocations, so we provide pragmatic rules.

  1. Allocation failures should never cause bad pointers
    Every allocation should be checked; the only choice is whether to attempt
    recovery or to exit. Code should never end up writing through a NULL pointer,
    for example.
  2. When exiting due to an allocation error, use an error exit, not abort
    'abort' indicates an internal error, for which the host may capture a core dump and report an error. Running out of
    memory is generally not an internal error, so QEMU should just exit.
    That's tricky with some libraries (glib in particular), but where possible try
    to make them give an error exit.
  3. Small allocation checks can exit
    Small allocations are less likely to fail than large allocations, and checking every one is a pain,
    so pragmatically don't worry about them and allow exits on allocation failures.
    However, if you are making hundreds of tiny allocations (for a list, say) and the total size is large,
    then it is worth thinking about the total size.
    'Small' is arbitrary; here it is chosen to be a typical 4k page size.
  4. Allocations during startup can exit
    If the VM hasn't started running then it's OK to exit due to an allocation failure.
    Take care with allocations that can happen during either startup or hot plug.
  5. Large allocations that fail after startup should not cause a failure
    All large allocations that can happen once the VM is running should be checked, and
    a failure should not cause the VM to fail.
  6. Failures triggered by a monitor command should return an error on the monitor
    A simple example of this is hotplugging more RAM or a device.
  7. Clean up
    When a large allocation fails, make sure to clean up so the VM isn't stuck
    with a half-allocated device etc.
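The rules above can be sketched in plain C as follows. This is an illustration under stated assumptions, not QEMU code: `DemoDevice`, `small_alloc_or_exit`, `demo_device_create` and `demo_device_destroy` are hypothetical names, and plain `malloc` is used so the sketch stands alone (in a GLib-based code base, `g_try_malloc()` is the call that returns NULL on failure instead of aborting). Small allocations exit with an error status rather than calling abort; the large allocation is checked, reports an error to the caller, and cleans up after itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* The page's arbitrary threshold for a "small" allocation. */
#define SMALL_ALLOC_MAX 4096

/* Small allocations may exit on failure, with an error exit, not abort(). */
static void *small_alloc_or_exit(size_t size)
{
    void *p;

    /* Misuse of this helper is an internal error, so an assert is apt here. */
    assert(size <= SMALL_ALLOC_MAX);
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Hypothetical hot-pluggable device with one large buffer. */
typedef struct DemoDevice {
    char *name;
    void *buffer;
    size_t buffer_size;
} DemoDevice;

/* Returns 0 on success. On failure returns -1, sets *errmsg for the caller
 * (for example, a monitor command handler) and leaves nothing half-allocated. */
static int demo_device_create(DemoDevice **out, size_t buffer_size,
                              const char **errmsg)
{
    DemoDevice *dev = small_alloc_or_exit(sizeof(*dev));

    dev->name = small_alloc_or_exit(16);
    snprintf(dev->name, 16, "demo");

    dev->buffer = malloc(buffer_size); /* large allocation: must be checked */
    if (dev->buffer == NULL) {
        free(dev->name); /* clean up the partially built device */
        free(dev);
        *out = NULL;
        *errmsg = "cannot allocate device buffer";
        return -1;
    }
    dev->buffer_size = buffer_size;
    *out = dev;
    *errmsg = NULL;
    return 0;
}

static void demo_device_destroy(DemoDevice *dev)
{
    if (dev == NULL) {
        return;
    }
    free(dev->buffer);
    free(dev->name);
    free(dev);
}
```

A caller that handles a hot-plug request would check the return value and pass the message back to the user, leaving the running VM untouched when the buffer cannot be allocated.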