Features/AllocationFailures
Latest revision as of 13:35, 22 October 2018
Introduction
In most situations memory allocations never fail in QEMU; we only find out later that there is not enough memory, when the kernel OOM killer delivers a signal and kills the process. However, with a suitable ulimit configuration it is possible to make an allocation fail. This page documents the behaviour we would like when an allocation fails.
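As a rough illustration of how a resource limit can turn a late OOM kill into an immediate allocation failure, the sketch below lowers the process's address-space limit with the POSIX `setrlimit` call (the programmatic counterpart of `ulimit -v`) and then probes whether a request of a given size succeeds. The helper names `limit_address_space` and `can_allocate` are hypothetical and not part of QEMU, and the behaviour described assumes a Linux host that enforces `RLIMIT_AS`.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the soft address-space limit to `bytes`.
 * The request is clamped to the hard limit, since a soft limit above
 * the hard limit is rejected. Returns true on success.
 */
static bool limit_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        return false;
    }
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max) {
        bytes = rl.rlim_max;
    }
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl) == 0;
}

/* Hypothetical helper: report whether `bytes` can currently be allocated. */
static bool can_allocate(size_t bytes)
{
    /* volatile keeps the compiler from optimising the malloc/free pair away */
    void *volatile p = malloc(bytes);

    if (p == NULL) {
        return false;
    }
    free(p);
    return true;
}
```

Under those assumptions, calling `limit_address_space((rlim_t)512 << 20)` and then `can_allocate((size_t)1 << 30)` should report failure straight away, because the 1 GiB request cannot fit under a 512 MiB limit.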
Gracefully handling allocation failures
Allocation failure during QEMU startup is the least problematic; the user hasn't lost anything at this point if QEMU exits.
Allocation failure on a running QEMU is much more serious: if a VM is happily running, a crashing QEMU causes downtime for the user and possibly data loss.
Ideally all memory allocation paths would be checked and fail cleanly; however, it is acknowledged that QEMU's structure makes it difficult to check every allocation, so we provide the following pragmatic rules.
- Allocation failures should never cause bad pointers
  - Allocations should be checked, and the choice is then whether to attempt some recovery or to exit; the code should never end up writing to a NULL pointer, for example.
- When exiting due to an allocation error, use an error exit, not abort
  - 'abort' indicates an internal error, for which the host may capture a core dump and submit an error report. Running out of memory is generally not an internal error, so QEMU should just exit.
  - That's tricky with some libraries (glib in particular), but where possible try to make them give an error exit.
- Checks on small allocations can exit
  - Small allocations are less likely to fail than large ones, and checking every one of them is a pain, so pragmatically don't worry about them and allow exits on allocation failure.
    - However, if you are making a list of hundreds of tiny allocations, it is worth thinking about the total size.
    - 'small' is arbitrary; it is chosen here to be a typical 4k page size.
- Allocations during startup can exit
  - If the VM hasn't started running then it's OK to exit due to an allocation failure.
  - Take care with allocations that can happen either during startup or during hot plug, since in the hot plug case the VM is already running.
- Large allocations that fail after startup should not cause a failure
  - All large allocations that can happen once the VM is running should be checked, and a failure should not cause the VM to fail.
- Failures triggered by a monitor command should return an error on the monitor
  - A simple example of this is hotplugging more RAM or a device.
- Clean up
  - When a large allocation fails, make sure to clean up so the VM isn't stuck with a half-allocated device etc.
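The rules above can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not QEMU code: `DemoDevice`, `demo_try_alloc`, `demo_fail_after`, `alloc_or_exit`, `demo_device_init` and `demo_device_cleanup` are all hypothetical names, and libc `malloc` stands in for whatever allocator the real code uses (in glib terms, `g_try_malloc` returns NULL on failure, whereas `g_malloc` terminates the program). The small fault-injection counter is only there so that the failure path can be exercised without actually running out of memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical device with two large buffers; not a real QEMU type. */
typedef struct DemoDevice {
    void *ring;
    void *cache;
} DemoDevice;

/*
 * Hypothetical fault-injection hook: when set to N >= 0, N further
 * allocations succeed and the one after that fails. -1 disables it.
 */
static int demo_fail_after = -1;

/* Checked allocation: returns NULL on failure instead of exiting. */
static void *demo_try_alloc(size_t size)
{
    if (demo_fail_after == 0) {
        demo_fail_after = -1;
        return NULL;
    }
    if (demo_fail_after > 0) {
        demo_fail_after--;
    }
    return malloc(size);
}

/* Small or startup-time allocation: error exit on failure, never abort(). */
static void *alloc_or_exit(size_t size)
{
    void *p = demo_try_alloc(size);

    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}

/*
 * Large run-time allocation: on failure, free whatever was already
 * allocated, report the error through *errp and leave *dev untouched,
 * so the caller (for example a monitor command) can return an error
 * while the VM keeps running.
 */
static bool demo_device_init(DemoDevice *dev, size_t ring_size,
                             size_t cache_size, const char **errp)
{
    void *ring;
    void *cache;

    /* A NULL argument is a programming bug, so assert is appropriate here. */
    assert(dev != NULL && errp != NULL);

    ring = demo_try_alloc(ring_size);
    if (ring == NULL) {
        *errp = "cannot allocate ring buffer";
        return false;
    }
    cache = demo_try_alloc(cache_size);
    if (cache == NULL) {
        free(ring);                 /* no half-allocated device */
        *errp = "cannot allocate cache";
        return false;
    }
    dev->ring = ring;
    dev->cache = cache;
    return true;
}

/* Release both buffers and reset the pointers. */
static void demo_device_cleanup(DemoDevice *dev)
{
    free(dev->ring);
    free(dev->cache);
    dev->ring = NULL;
    dev->cache = NULL;
}
```

Setting `demo_fail_after = 1` before calling `demo_device_init` makes the second allocation fail. The call then returns false with a message in `*errp`, and the device's pointers are left unchanged.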