Features/QED

From QEMU
Revision as of 13:03, 16 September 2010 by AnthonyLiguori (talk | contribs)

Specification

The file format looks like this:

+----------+----------+----------+-----+
| cluster0 | cluster1 | cluster2 | ... |
+----------+----------+----------+-----+

The first cluster contains a header. The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file. A regular cluster may be a data cluster, an L2, or an L1 table. L1 and L2 tables are composed of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size. Extra information at the end of the file may not be preserved if new clusters are allocated. Legitimate out-of-band data should use space between the header and the first regular cluster.

All fields are little-endian.

Header

Header {
    uint32_t magic;               /* QED\0 */

    uint32_t cluster_size;        /* in bytes */
    uint32_t table_size;          /* for L1 and L2 tables, in clusters */
    uint32_t header_size;         /* in clusters */

    uint64_t features;            /* format feature bits */
    uint64_t compat_features;     /* compat feature bits */
    uint64_t l1_table_offset;     /* in bytes */
    uint64_t image_size;          /* total logical image size, in bytes */

    /* if (features & QED_F_BACKING_FILE) */
    uint32_t backing_filename_offset; /* in bytes from start of header */
    uint32_t backing_filename_size;   /* in bytes */

    /* if (compat_features & QED_CF_BACKING_FORMAT) */
    uint32_t backing_fmt_offset;  /* in bytes from start of header */
    uint32_t backing_fmt_size;    /* in bytes */
}

Strings are given in (byte offset, byte size) form. They are not NUL-terminated.

The cluster_size and table_size fields must be powers of 2.

l1_table_offset and image_size must be multiples of cluster_size.

Note that backing_filename_offset and backing_fmt_offset do not have alignment constraints.

Tables

#define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
 
Table {
    uint64_t offsets[TABLE_NOFFSETS];
}

The tables are organized as follows:

                   +----------+
                   | L1 table |
                   +----------+
              ,------'  |  '------.
         +----------+   |    +----------+
         | L2 table |  ...   | L2 table |
         +----------+        +----------+
     ,------'  |  '------.
+----------+   |    +----------+
|   Data   |  ...   |   Data   |
+----------+        +----------+

A table is made up of one or more contiguous clusters. The table_size header field determines table size for an image file. For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:

header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size

Operations

Read

  1. If L2 table is not present in L1, read from backing image.
  2. If data cluster is not present in L2, read from backing image or zero fill if no backing image.
  3. Otherwise read data from cluster.

Write

  1. If L2 table is not present in L1, allocate new cluster and L2. Perform L2 and L1 link after writing data.
  2. If data cluster is not present in L2, allocate new cluster. Perform L1 link after writing data.
  3. Otherwise overwrite data cluster.

The L2 link should be made after the data is in place on storage. However, when no ordering is enforced the worst case scenario is an L2 link to an unwritten cluster.

The L1 link must be made after the L2 cluster is in place on storage. If the order is reversed then the L1 table may point to a bogus L2 table. (Is this a problem since clusters are allocated at the end of the file?)

Grow

  1. If table_size * TABLE_NOFFSETS < new_image_size, fail -EOVERFLOW. The L1 table is not big enough.
  2. Write new image_size header field.

Data integrity

Write

Writes that complete before a flush must be stable when the flush completes.

If storage is interrupted (e.g. power outage) then writes in progress may be lost, stable, or partially completed. The storage must not be otherwise corrupted or inaccessible after it is restarted.

Future Features