Features/OpBlockers

From QEMU

Background:


Block jobs, and other QAPI operations, may modify and impact the BlockDriverState graph in QEMU. In order to support multiple operations safely, we need a mechanism to block and gate operations,

We currently have op blockers, that are attached to each BDS. However, in practice we check this on the device level, rather than on the granularity of the individual BDS. Also, due to limitations of the current system, we only allow a single block job at a time.


Proposed New Design Features:


This design would supersede the current op blocker system.

Rather than have the op blockers attached directly to each BDS, Block Job access will be done through a separate Node Access Control system.

This new system will:

  • Allow / Disallow operations to BDSs, (generally initiated by QAPI actions; i.e. BDS node operations other than guest read/writes)
  • Not be limited to "block jobs" (e.g. block-commit, block-stream, etc..) - will also apply to other operations, such as blockdev-add, change-backing-file, etc.
  • Allow granularity in options, and provide the safety needed to support multiple block jobs and operations.
  • Reference each BDS by its node-name
  • Be independent of the bdrv_states graph (i.e. does not reside in the BDS structure)


Node Access Control: Jobs


Every QAPI/HMP initiated operation is considered a "job", regardless if it is implemented via coroutines (e.g. block jobs such as block-commit), or handled synchronously in a QAPI handler (e.g. change-backing-file, blockdev-add).

A job consists of:

  • Unique job ID
  • A list of all node-names affected by the operation
  • Action flags (see below) for each node in the above list


Node Access Control: Action Flags


Every operation must set action flag pairs, for every node affected by the operation.

Action flags are set as a Require/Allow pair. If the Require flag is set, then the operation requires the ability to take the specific action. If the Allow flag is set, then the operation will allow other operations to perform same action in parallel.

The default is to prohibit, not allow, parallel actions.

The proposed actions are:

1. Modify - Visible Data

  • This is modification that would be detected by an external read (for instance, a hypothetical qemu-io read from the specific node), inclusive of its backing files. A classic example would be writing data into another node, as part of block-commit.

2. Modify - Metadata

  • This is a write that is not necessarily visible to an external read, but still modifies the underlying image format behind the node. I.e., change-backing-file to an identical backing file. (n.b.: if changing the backing file to a non-identical backing file, the "Write - Visible Data" action would also be required).

3. Graph Reconfiguration

  • Removing, adding, or reordering nodes in the bdrv_state graph. (n.b.: Write - Visible Data may need to be set, if appropriate)

4. Read - Visible Data

  • Read data visible to an external read.

5. Read - Metadata

  • Read node metadata


Node Access Control: Tracking Jobs


Each new NAC job will have a unique ID. Associated with that ID is a list of each BDS node affected by the operation, alongside its corresponding Action Flags.

When a new NAC job is created, a separate list of node-name flags is updated to provide easy go/no go decisions.

An operation is prohibited if:

  • A "Require" Action Flag is set, and
  • The logical AND of the "Allow" Action Flag of all NAC-controlled operations for that node is 0.

- OR -

  • The operation does not set the "Allow" Action Flag, and
  • The logical OR of the corresponding "Require" Action Flag of all NAC-controlled operations for that node is 1.

When a NAC controlled job completes, the node-name flag list is updated, and the corresponding NAC job removed from the job list.


General Notes:


  • This is still a "honor" system, in that each handler / job is responsible for acquiring the NAC permission, and properly identifying all nodes affected correctly by their operation.

This should be done before any action is taken by the handler - that is, it should be clean to abort the operation if the NAC does not give consent.

  • It may be possible to later expand the NAC system, to provide handles for use by low-level operations in block.c.