|
|
(21 intermediate revisions by 8 users not shown) |
| =Specification= | | = Overview = |
|
| |
|
| The file format looks like this:
| | QED was an attempt at creating a better performing image format by removing some features compared to qcow2. However, it turned out that the achieved performance improvements were mostly related to an improved implementation rather than the file format per se. These improvements, as well as a few format extensions, have been merged back into qcow2 since. The development of QED has been abandoned. |
|
| |
|
| +----------+----------+----------+-----+
| | '''It is not recommended to use QED for any new images.''' For existing images, converting to qcow2 should be considered as today qcow2 provides both more features and better performance, and has an actively maintained code base. |
| | cluster0 | cluster1 | cluster2 | ... |
| |
| +----------+----------+----------+-----+
| |
|
| |
|
| The first cluster begins with the '''header'''. The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file. A regular cluster may be a '''data cluster''', an '''L2 table''', or an '''L1 table'''. L1 and L2 tables are composed of one or more contiguous clusters.
| | QED supports backing files and sparse images. |
|
| |
|
| Normally the file size will be a multiple of the cluster size. If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written. Legitimate extra information should use space between the header and the first regular cluster.
| | = Status = |
|
| |
|
| All fields are little-endian.
| | * '''QED is deprecated and only supported for compatibility with existing images''' (similar to qcow1) |
| | * Base QED is in qemu.git since [http://git.qemu.org/qemu.git/commit/?id=75411d236d93d79d8052e0116c3eeebe23e2778b 2010-12-17] and will form part of QEMU 0.14. |
| | * No additional features are planned to get merged |
|
| |
|
| ==Header== | | = Features = |
| Header {
| |
| uint32_t magic; /* QED\0 */
| |
|
| |
| uint32_t cluster_size; /* in bytes */
| |
| uint32_t table_size; /* for L1 and L2 tables, in clusters */
| |
| uint32_t header_size; /* in clusters */
| |
|
| |
| uint64_t features; /* format feature bits */
| |
| uint64_t compat_features; /* compat feature bits */
| |
| uint64_t autoclear_features; /* self-resetting feature bits */
| |
| uint64_t l1_table_offset; /* in bytes */
| |
| uint64_t image_size; /* total logical image size, in bytes */
| |
|
| |
| /* if (features & QED_F_BACKING_FILE) */
| |
| uint32_t backing_filename_offset; /* in bytes from start of header */
| |
| uint32_t backing_filename_size; /* in bytes */
| |
| }
| |
|
| |
|
| Field descriptions:
| | * [[Features/QED/Specification|Open specification]] |
| * ''cluster_size'' must be a power of 2 in range [2^12, 2^26]. | | * Fully asynchronous I/O path |
| * ''table_size'' must be a power of 2 in range [1, 16]. | | * Strong data integrity due to simple design |
| * ''header_size'' is the number of clusters used by the header and any additional information stored before regular clusters. | | * Backing files |
| * ''features'', ''compat_features'', and ''autoclear_features'' are file format extension bitmaps. They work as follows: | | ** Backing files may be smaller than the QED image |
| ** An image with unknown ''features'' bits enabled must not be opened. File format changes that are not backwards-compatible must use ''features'' bits. | | * Sparse files |
| ** An image with unknown ''compat_features'' bits enabled can be opened safely. The unknown features are simply ignored and represent backwards-compatible changes to the file format. | | ** Retains sparseness over non-sparse channels (e.g. HTTP) |
| ** An image with unknown ''autoclear_features'' bits enabled can be opened safely after clearing the unknown bits. This allows for backwards-compatible changes to the file format which degrade gracefully and can be re-enabled again by a new program later. | | * Zero clusters |
| * ''l1_table_offset'' is the offset of the first byte of the L1 table in the image file and must be a multiple of ''cluster_size''.
| | * Periodic dirty flag flush |
| * ''image_size'' is the block device size seen by the guest and must be a multiple of 512 bytes. | |
| * ''backing_filename'' is a string in (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints. | |
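The field constraints above can be checked mechanically when an image is opened. The following is a minimal sketch, not the QEMU implementation: it assumes the header fields are packed in the listed order with no padding, and that ''autoclear_features'' sits between ''compat_features'' and ''l1_table_offset''. The names `HEADER_FMT`, `HEADER_FIELDS`, and `parse_header` are hypothetical.

```python
import struct

QED_MAGIC = b"QED\0"

# Assumed on-disk layout: little-endian, fields in header order, no padding.
# The position of autoclear_features is an assumption of this sketch.
HEADER_FMT = "<4sIIIQQQQQII"
HEADER_FIELDS = (
    "magic", "cluster_size", "table_size", "header_size",
    "features", "compat_features", "autoclear_features",
    "l1_table_offset", "image_size",
    "backing_filename_offset", "backing_filename_size",
)


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def parse_header(buf):
    """Unpack a header from the start of buf and check the field constraints."""
    hdr = dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FMT, buf, 0)))
    if hdr["magic"] != QED_MAGIC:
        raise ValueError("bad magic")
    cluster_size = hdr["cluster_size"]
    if not (is_power_of_two(cluster_size) and 2**12 <= cluster_size <= 2**26):
        raise ValueError("cluster_size must be a power of 2 in [2^12, 2^26]")
    table_size = hdr["table_size"]
    if not (is_power_of_two(table_size) and 1 <= table_size <= 16):
        raise ValueError("table_size must be a power of 2 in [1, 16]")
    if hdr["l1_table_offset"] % cluster_size != 0:
        raise ValueError("l1_table_offset must be a multiple of cluster_size")
    if hdr["image_size"] % 512 != 0:
        raise ValueError("image_size must be a multiple of 512 bytes")
    return hdr
```

The backing filename fields are unpacked but not interpreted here, since they only carry meaning when QED_F_BACKING_FILE is set.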
|
| |
|
| Feature bits:
| | = Uncompleted work = |
| * QED_F_BACKING_FILE = 0x01. The image uses a backing file. The backing filename string is given in the ''backing_filename_{offset,size}'' fields and may be an absolute path or relative to the image file.
| |
| * QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use.
| |
|
| |
|
| There are currently no defined ''compat_features'' or ''autoclear_features'' bits.
| | * [[Features/QED/OutstandingWork|Outstanding work]] |
| | |
| Fields predicated on a feature bit are only used when that feature is set. The fields always take up header space, regardless of whether or not the feature bit is set.
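The three bitmaps differ only in how an opener reacts to bits it does not recognize. The following sketch illustrates those rules under the assumption that the opener knows exactly the two feature bits listed above and no ''compat_features'' or ''autoclear_features'' bits. The function name `check_features` and the `KNOWN_*` constants are hypothetical.

```python
QED_F_BACKING_FILE = 0x01
QED_F_NEED_CHECK = 0x02

# Bits this hypothetical opener understands.
KNOWN_FEATURES = QED_F_BACKING_FILE | QED_F_NEED_CHECK
KNOWN_AUTOCLEAR = 0  # no autoclear bits are currently defined


def check_features(features, compat_features, autoclear_features):
    """Apply the open rules; return the autoclear_features value to keep."""
    unknown = features & ~KNOWN_FEATURES
    if unknown:
        # Unknown incompatible bits: the image must not be opened.
        raise ValueError(f"unsupported feature bits: {unknown:#x}")
    # Unknown compat_features bits are simply ignored.
    # Unknown autoclear_features bits are cleared before the image is used.
    return autoclear_features & KNOWN_AUTOCLEAR
```

For example, `check_features(QED_F_BACKING_FILE, 0xff, 0x4)` succeeds and returns 0, while `check_features(0x4, 0, 0)` raises because bit 0x4 is not a known ''features'' bit.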
| |
| | |
| ==Tables==
| |
| | |
| Tables provide the translation from logical offsets in the block device to cluster offsets in the file.
| |
| | |
| #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
| |
|
| |
| Table {
| |
| uint64_t offsets[TABLE_NOFFSETS];
| |
| }
| |
| | |
| The tables are organized as follows:
| |
| | |
| +----------+
| |
| | L1 table |
| |
| +----------+
| |
| ,------' | '------.
| |
| +----------+ | +----------+
| |
| | L2 table | ... | L2 table |
| |
| +----------+ +----------+
| |
| ,------' | '------.
| |
| +----------+ | +----------+
| |
| | Data | ... | Data |
| |
| +----------+ +----------+
| |
| | |
| A table is made up of one or more contiguous clusters. The table_size header field determines table size for an image file. For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.
| |
| | |
| The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:
| |
| header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
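As a worked example of the bound above, the sketch below computes TABLE_NOFFSETS and the largest logical image size for a given geometry. The function names are illustrative only.

```python
def table_noffsets(cluster_size, table_size):
    """Number of 8-byte offsets held by one table."""
    return table_size * cluster_size // 8


def max_image_size(cluster_size, table_size):
    """Largest logical image size addressable through one L1 table, in bytes."""
    n = table_noffsets(cluster_size, table_size)
    return n * n * cluster_size


# cluster_size = 64 KB, table_size = 4: each table holds 32768 offsets,
# so the image can be at most 2**46 bytes (64 TiB).
assert table_noffsets(64 * 1024, 4) == 32768
assert max_image_size(64 * 1024, 4) == 2**46
```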
| |
| | |
| The 12 least significant bits of an offset are reserved and are not part of the byte offset within the image file. The meaning of these 12 bits is as follows:
| |
| | |
| ===L2 table offsets===
| |
| * 0 - unallocated. The L2 table is not yet allocated. Reads within the cluster access the backing image. If there is no backing image, then zeroes are produced. Writes within the cluster cause a new cluster and L2 table to be allocated. The new cluster starts with data from the backing image (or zeroes if no backing image) and the data being written.
| |
| | |
| ===Data cluster offsets===
| |
| * 0 - unallocated. The cluster is not yet allocated. Reads within the cluster access the backing image. If there is no backing image, then zeroes are produced. Writes within the cluster cause a new cluster to be allocated. The new cluster starts with data from the backing image (or zeroes if no backing image) and the data being written.
| |
| * 1 - zero cluster. The contents of the cluster are all zero. Reads within the cluster produce zeroes. Writes within the cluster cause a new cluster to be allocated.
| |
| | |
| Logical offsets are translated into cluster offsets as follows:
| |
| | |
| table_bits table_bits cluster_bits
| |
| <--------> <--------> <--------------->
| |
| +----------+----------+-----------------+
| |
| | L1 index | L2 index | byte offset |
| |
| +----------+----------+-----------------+
| |
|
| |
| Structure of a logical offset
| |
| | |
| offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
| |
|
| |
| def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
| |
| l2_offset = l1_table[l1_index]
| |
| l2_table = load_table(l2_offset)
| |
| cluster_offset = l2_table[l2_index] & offset_mask
| |
| return cluster_offset + byte_offset
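The pseudocode above starts from an already split logical offset. A runnable sketch of the full translation, including the split, might look as follows. It assumes both tables are available as indexable sequences of integers and that every entry on the path is allocated; `split_logical_offset` and the `load_table` callback are hypothetical names.

```python
def split_logical_offset(pos, cluster_size, table_size):
    """Split a logical offset into (l1_index, l2_index, byte_offset)."""
    noffsets = table_size * cluster_size // 8   # TABLE_NOFFSETS
    cluster_bits = cluster_size.bit_length() - 1
    table_bits = noffsets.bit_length() - 1
    byte_offset = pos & (cluster_size - 1)
    l2_index = (pos >> cluster_bits) & (noffsets - 1)
    l1_index = pos >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset


def logical_to_cluster_offset(pos, l1_table, load_table, cluster_size, table_size):
    """Translate a logical offset to a byte offset in the image file.

    Unallocated (0) and zero-cluster (1) entries are not handled here.
    """
    l1_index, l2_index, byte_offset = split_logical_offset(
        pos, cluster_size, table_size)
    offset_mask = ~(cluster_size - 1)  # mask for the image file byte offset
    l2_table = load_table(l1_table[l1_index] & offset_mask)
    cluster_offset = l2_table[l2_index] & offset_mask
    return cluster_offset + byte_offset
```

With cluster_size = 64 KB and table_size = 4, the logical offset `(3 << 31) | (5 << 16) | 7` splits into L1 index 3, L2 index 5, and byte offset 7.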
| |
| | |
| =Operations=
| |
| | |
| ==Read==
| |
| # If L2 table is not present in L1, read from backing image.
| |
| # If data cluster is not present in L2, read from backing image or zero fill if no backing image.
| |
| # Otherwise read data from cluster.
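The read rules, combined with the reserved offset values 0 (unallocated) and 1 (zero cluster), can be expressed as a small decision function. This is a sketch only: it assumes in-memory tables of integer offsets, and `read_source` and the `load_table` callback are hypothetical names.

```python
UNALLOCATED = 0
ZERO_CLUSTER = 1


def read_source(l1_table, load_table, l1_index, l2_index, cluster_size, has_backing):
    """Decide where a read is served from.

    Returns ("backing", None), ("zeroes", None), or ("data", cluster_offset).
    """
    offset_mask = ~(cluster_size - 1)
    fallback = "backing" if has_backing else "zeroes"

    l2_offset = l1_table[l1_index]
    if l2_offset == UNALLOCATED:
        # Step 1: no L2 table, fall through to the backing image (or zeroes).
        return (fallback, None)

    entry = load_table(l2_offset & offset_mask)[l2_index]
    if entry == UNALLOCATED:
        # Step 2: no data cluster, fall through to the backing image (or zeroes).
        return (fallback, None)
    if entry == ZERO_CLUSTER:
        # Zero clusters read as zeroes without consulting the backing image.
        return ("zeroes", None)
    # Step 3: read from the allocated data cluster.
    return ("data", entry & offset_mask)
```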
| |
| | |
| ==Write==
| |
| # If L2 table is not present in L1, allocate new cluster and L2. Perform L2 and L1 link after writing data.
| |
| # If data cluster is not present in L2, allocate new cluster. Perform L2 link after writing data.
| |
| # Otherwise overwrite data cluster.
| |
| | |
| The L2 link '''should''' be made after the data is in place on storage. However, when no ordering is enforced the worst case scenario is an L2 link to an unwritten cluster.
| |
| | |
| The L1 link '''must''' be made after the L2 cluster is in place on storage. If the order is reversed then the L1 table may point to a bogus L2 table. (Is this a problem since clusters are allocated at the end of the file?)
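The ordering rules can be illustrated for the case where neither the L2 table nor the data cluster exists yet. This is a sketch under several assumptions: the new clusters have already been allocated at cluster-aligned offsets, the storage object exposes `write(offset, data)` and `flush()`, and a flush acts as the ordering barrier. `RecordingDevice` and `write_new_l2_and_cluster` are hypothetical names, not part of any QED implementation.

```python
import struct


class RecordingDevice:
    """Hypothetical storage stand-in that records the order of operations."""

    def __init__(self):
        self.log = []

    def write(self, offset, data):
        self.log.append(("write", offset, len(data)))

    def flush(self):
        self.log.append(("flush",))


def write_new_l2_and_cluster(dev, data, data_offset, l2_offset, l2_index,
                             l1_offset, l1_index, table_bytes):
    """Allocating write: data first, then the L2 link, then the L1 link."""
    # 1. Put the data in place on storage.
    dev.write(data_offset, data)
    dev.flush()  # the L2 link should come after the data is stable

    # 2. Write the new L2 table, already containing the link to the data.
    l2_table = bytearray(table_bytes)
    struct.pack_into("<Q", l2_table, l2_index * 8, data_offset)
    dev.write(l2_offset, bytes(l2_table))
    dev.flush()  # the L1 link must come after the L2 table is stable

    # 3. Link the new L2 table into the L1 table.
    dev.write(l1_offset + l1_index * 8, struct.pack("<Q", l2_offset))
```

Dropping the first flush weakens the guarantee to the "should" case described above (an L2 link to an unwritten cluster), while dropping the second risks an L1 entry that points to a bogus L2 table.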
| |
| | |
| ==Grow==
| |
| # If TABLE_NOFFSETS * TABLE_NOFFSETS * cluster_size < new_image_size, fail -EOVERFLOW. The L1 table is not big enough.
| |
| # Write new image_size header field.
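A grow operation then reduces to a bounds check followed by a header update. The sketch below uses the image-size bound given under Tables and operates on a plain dictionary of header fields. The function name `grow` and the use of Python's `OverflowError` in place of -EOVERFLOW are choices made for this example only.

```python
def grow(header, new_image_size):
    """Grow the logical image size in place, or fail if the L1 table is too small."""
    noffsets = header["table_size"] * header["cluster_size"] // 8  # TABLE_NOFFSETS
    max_size = noffsets * noffsets * header["cluster_size"]
    if max_size < new_image_size:
        raise OverflowError("the L1 table is not big enough")
    if new_image_size % 512 != 0:
        raise ValueError("image_size must be a multiple of 512 bytes")
    header["image_size"] = new_image_size
```

No table or data clusters need to be touched, since the newly exposed range is unallocated until it is written.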
| |
| | |
| =Data integrity=
| |
| ==Write==
| |
| Writes that complete before a flush must be stable when the flush completes.
| |
| | |
| If storage is interrupted (e.g. power outage) then writes in progress may be lost, stable, or partially completed. The storage must not be otherwise corrupted or inaccessible after it is restarted.
| |
| | |
| = Future Features =
| |
| * [[Features/QED/Streaming|Streaming]] | | * [[Features/QED/Streaming|Streaming]] |
| * [[Features/QED/OnlineDefrag|Online defragmentation]] | | * [[Features/QED/OnlineDefrag|Online defragmentation]] |
| * [[Features/QED/Trim|Trim]]
| |
| * [[Features/QED/ParallelSubmission|Parallel submission]] | | * [[Features/QED/ParallelSubmission|Parallel submission]] |
| * [[Features/QED/ScanAvoidance|Meta-data scan avoidance]] | | * [[Features/QED/ScanAvoidance|Meta-data scan avoidance]] |
| | |
| | [[Category:Obsolete feature pages]] |