Features/Qcow3
Revision as of 17:10, 2 February 2011
What is QCOW3?
The QCOW image format includes a version number to allow introducing new features that change the format in an incompatible way, so that older qemu versions can't read the image any more.
When this version number was changed from QCOW1 to QCOW2, the format was radically changed. QCOW1 and QCOW2 have two completely different driver implementations and the two versions actually are exposed as two different image formats ("qcow" and "qcow2") to the user.
The proposal for QCOW3 is different: it includes a version number increase in order to introduce some incompatible features; however, it is strictly an extension of QCOW2 and keeps the fundamental structure unchanged, so that a single codebase suffices for working with both QCOW2 and QCOW3 images.
Internally, QEMU will have a single driver for both QCOW2 and QCOW3 images, so one option is to continue exposing the format to users as "qcow2", with an additional flag, set at image creation, that specifies which version number should be used (and therefore, which features should be available).
Requirements
The key requirements for QCOW3 are:
- Near-raw performance, competitive with QED.
- Fully QCOW2 backwards-compatible feature set.
- Extensibility to tackle current and future storage virtualization challenges.
Near-raw performance, competitive with QED
Performance analysis has shown that QCOW2 performance is significantly worse than using raw files. Different approaches, including QED's simplified metadata and fully asynchronous implementation, have proven that a modern image format can achieve near-raw performance. Metadata caching and batched updates can also improve performance but require image format changes to be effective in all cases.
Improving performance is the key motivation for a QCOW3 image format.
Fully QCOW2 backwards-compatible feature set
The QCOW2 format offers encryption, compression, and internal snapshotting features not supported by other formats. Unlike other formats, QCOW2 allows images to be stored directly on block devices instead of using a file system. These features must be preserved in order to provide backwards compatibility for existing deployments.
Furthermore, it should be easy to upgrade from QCOW2 to QCOW3 so that existing users can do so without lengthy downtime or storage administration overheads.
Extensibility to tackle current and future storage virtualization challenges
Several new features in the storage area, including discard support, external snapshots, and image streaming, are being developed and integrated. These and other future features must fit into the format gracefully. A feature bit mechanism can be used to provide forwards, backwards, and incompatible feature negotiation support.
QCOW2 allows introducing incompatible new features only by increasing the version number. This is what we'll do for QCOW3. With this change, a more flexible mechanism will be introduced that can be used for future changes.
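The negotiation described above can be sketched in C. This is a minimal illustration, not the final on-disk layout: the field names, the mask names, and the example feature bit are all assumptions made for this sketch. The idea is that three feature classes behave differently when a driver encounters an unknown bit: incompatible bits forbid opening the image, compatible bits are ignored, and autoclear bits are cleared before proceeding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical QCOW3 header fields: three 64-bit feature masks.
 * Names and layout are illustrative only. */
typedef struct Qcow3Header {
    uint32_t version;               /* 3 for QCOW3 images */
    uint64_t incompatible_features; /* unknown bit set => refuse to open */
    uint64_t compatible_features;   /* unknown bit set => ignore, open anyway */
    uint64_t autoclear_features;    /* unknown bit set => clear it, then open */
} Qcow3Header;

/* Example incompatible feature bit (hypothetical): delayed refcount updates
 * were in flight, so refcounts may be stale. */
#define QCOW3_INCOMPAT_DIRTY (1ULL << 0)

/* Feature bits this (hypothetical) driver build understands. */
static const uint64_t supported_incompatible = QCOW3_INCOMPAT_DIRTY;
static const uint64_t supported_autoclear    = 0;

/* Returns 0 if the image can be opened, -1 if an unknown incompatible
 * feature forbids it. Unknown autoclear bits are cleared so that a
 * writer that does not implement them cannot leave stale state behind. */
int qcow3_check_features(Qcow3Header *h)
{
    if (h->incompatible_features & ~supported_incompatible) {
        return -1; /* image uses a feature we do not implement */
    }
    /* Unknown compatible bits are simply ignored. */
    h->autoclear_features &= supported_autoclear;
    return 0;
}
```

An older driver that knows none of the bits would refuse any image with a nonzero incompatible mask, which is exactly the forwards-compatibility behavior the text calls for.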
Roadmap
- Introduce qcow2 test suite to exercise qcow2-specific routines
  - Calls directly into qcow2 routines to exercise internal features.
  - Will guard us against introducing regressions.
- Make QCOW2 asynchronous
  - Option: Callbacks
    - Well-understood and proven performance at the cost of some maintainability/complexity.
  - Option: Coroutines
    - Riskier new approach that requires only limited code changes.
    - Questions about performance not yet addressed.
- Introduce QCOW3 format and feature negotiation.
- Implement a safe delayed metadata update mode relying on consistency check.
  - Eventually consistent refcounts reduce the performance overhead of metadata updates.
  - Ability to rebuild the refcount table by scanning L1/L2 tables.
  - Option: Use an allocation offset header field (on block devices).
- Reduce consistency check times
  - Clear the dirty flag periodically and after guest flushes to reduce the dirty time window and the risk of having to run a consistency check after a crash.
- Metadata caching tweaks (cache size, replacement algorithm)
- Scalability and parallel requests
  - In theory, parallel requests may just work at this point, but their performance may not scale for multithreaded workloads.
- Copy-on-read for image streaming
Estimated development times are based on the assumption that both Stefan and Kevin can focus on QCOW3 related work. If this doesn't hold true, delays are to be expected.
Make QCOW2 asynchronous using coroutines
The current QCOW2 implementation performs synchronous metadata accesses. This can temporarily stop the guest from running, results in poor performance, and introduces timing jitter.
Coroutines can be used to make QCOW2 asynchronous without invasive code changes. An emphasis will need to be placed on profiling and optimizing to ensure that coroutines do not introduce a significant new overhead.
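The programming model can be sketched with a minimal ucontext-based coroutine. This is an illustration of the approach, not QEMU's actual coroutine implementation: the function names (`qemu_coroutine_yield`, `qcow2_co_read`) and the step counter are placeholders. The point is that the metadata path reads top to bottom like the current synchronous code, yet suspends at every point where it would otherwise block the guest.

```c
#include <assert.h>
#include <ucontext.h>

/* Minimal coroutine sketch: metadata code is written sequentially and
 * yields wherever it would block, instead of being rewritten as a chain
 * of completion callbacks. All names here are illustrative placeholders. */

static ucontext_t main_ctx, co_ctx;
static int steps_completed;

static void coroutine_yield(void)
{
    swapcontext(&co_ctx, &main_ctx); /* suspend coroutine, resume caller */
}

/* Reads like synchronous code: "read" the L2 table, then the data
 * cluster. Each yield stands for waiting on an in-flight disk request. */
static void qcow2_co_read(void)
{
    steps_completed = 1;  /* issued L2 table read */
    coroutine_yield();    /* would block: hand control back to main loop */
    steps_completed = 2;  /* issued data cluster read */
    coroutine_yield();
    steps_completed = 3;  /* request complete */
}

int run_demo(void)
{
    static char stack[64 * 1024];
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;       /* return here when coroutine ends */
    makecontext(&co_ctx, qcow2_co_read, 0);

    swapcontext(&main_ctx, &co_ctx);  /* enter: runs until first yield */
    /* the main loop would service other requests here ("I/O completes") */
    swapcontext(&main_ctx, &co_ctx);  /* resume after first "completion" */
    swapcontext(&main_ctx, &co_ctx);  /* resume; coroutine runs to end */
    return steps_completed;
}
```

Between the swapcontext calls the main loop is free to run other guests' requests, which is the property the callback approach buys at much higher code-churn cost.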
Estimated development time: 4 weeks
Introduce QCOW3 format and feature negotiation
The QCOW header version number must be bumped to 3 in order to support incompatible file format changes. At the same time a feature bit mechanism should be introduced to make future file format changes easier.
Estimated development time: 1 week
Implement a safe delayed metadata update mode relying on consistency check
For writeback cache modes, qcow2 already batches writes to L2 tables and refcount blocks, requiring fewer writes and fewer flushes than QED and therefore providing better potential performance.
For writethrough modes any batching would be incorrect, so we still need an improvement here. We'll add an optional QED-like mode for this:
Refcount updates are batched even in writethrough modes and we enforce no ordering between L2 tables and refcount blocks. This makes the overhead of writing out refcount blocks negligible because it only happens when a refcount block is full (once every 2 GB of allocations with 64 KB clusters).
This will improve performance but also means that refcount tables may not be accurate. Introduce a dirty bit and consistency check that rebuilds the refcount tables by scanning L2 tables on startup if the dirty bit was set.
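The rebuild pass can be sketched on a toy in-memory model. Everything here is simplified for illustration: the table sizes, the struct layout, and the fact that header and L1 clusters are not counted are assumptions of this sketch, not the qcow2 on-disk format. The essential logic is that when the dirty flag is set, the on-disk refcounts are discarded and recomputed purely from the L1/L2 mapping, which by construction knows every cluster in use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_CLUSTERS 8
#define L2_ENTRIES   4

/* Toy image model; layout is simplified for illustration. */
typedef struct {
    int dirty;                              /* set while delayed updates are in flight */
    uint64_t l1[2];                         /* cluster index of each L2 table, 0 = unallocated */
    uint64_t l2[NUM_CLUSTERS][L2_ENTRIES];  /* data cluster indices, 0 = unallocated */
    uint16_t refcount[NUM_CLUSTERS];        /* possibly stale if dirty */
} ToyImage;

/* On open: if the image was not cleanly closed, recompute refcounts by
 * scanning the L1/L2 tables instead of trusting the stale on-disk
 * refcount structures. Returns the number of in-use clusters. */
int rebuild_refcounts(ToyImage *img)
{
    if (img->dirty) {
        memset(img->refcount, 0, sizeof(img->refcount));
        for (unsigned i = 0; i < 2; i++) {
            uint64_t l2_cluster = img->l1[i];
            if (!l2_cluster) {
                continue;
            }
            img->refcount[l2_cluster]++;    /* the L2 table itself */
            for (unsigned j = 0; j < L2_ENTRIES; j++) {
                uint64_t data_cluster = img->l2[l2_cluster][j];
                if (data_cluster) {
                    img->refcount[data_cluster]++;
                }
            }
        }
        img->dirty = 0;                     /* image is consistent again */
    }
    int used = 0;
    for (unsigned c = 0; c < NUM_CLUSTERS; c++) {
        if (img->refcount[c]) {
            used++;
        }
    }
    return used;
}
```

Because the scan only runs when the dirty flag is set, a cleanly shut down image pays nothing, which is why the roadmap also wants the flag cleared periodically and after guest flushes.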
Estimated development time: 1 month