Two different patch series posted on the mailing list sparked a discussion on block device operations (snapshotting, enabling mirroring, switching to a different image) that have to be performed atomically.

One is Jeff Cody's multiple-device snapshot series, documented here.

The other is Federico Simoncelli's device mirroring series.

This page documents the four approaches that were discussed.

Jeff Cody

{ 'type': 'SnapshotDev',
  'data': {'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
{ 'command': 'blkdev-group-snapshot-sync',
  'data': { 'devlist':  [ 'SnapshotDev' ] } }

A single command prepares snapshots for many devices and commits the transaction if all snapshots are successfully prepared.
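
The prepare-then-commit behaviour described above can be sketched in Python as follows. This is only a toy model under stated assumptions, not Jeff's implementation: `Device`, `group_snapshot` and the `can_prepare` flag are hypothetical names invented for illustration, and "preparing" a snapshot is reduced to a check that may fail.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Toy stand-in for a block device (hypothetical, not a QEMU type)."""
    name: str
    active_image: str
    can_prepare: bool = True  # assumption: models whether preparation succeeds


def group_snapshot(devices, devlist):
    """Switch every device in devlist to its snapshot file, or none of them.

    devlist mirrors the 'devlist' argument of the proposed command: a list of
    dicts with 'device' and 'snapshot-file' keys.
    """
    prepared = []
    # Phase 1: prepare every snapshot without touching the active images.
    for entry in devlist:
        dev = devices[entry["device"]]
        if not dev.can_prepare:
            # One failure abandons the whole group; nothing was switched yet.
            return False
        prepared.append((dev, entry["snapshot-file"]))
    # Phase 2: commit. Only reached when every preparation succeeded.
    for dev, snapshot_file in prepared:
        dev.active_image = snapshot_file
    return True
```

The point of the two loops is that a failure on any device leaves all devices on their original images, which is the atomicity guarantee the single command is meant to give.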

This proposal does not yet support reopening and mirroring. Introducing them later is possible (for example by modifying the SnapshotDev type), but this needs to be done before 1.1 in case backwards-incompatible API changes are needed.

Pros: Patches are almost ready for inclusion. Simple interface with only one command.
Cons: No support for reopening and mirroring. Improved error handling does not extend to the existing blockdev-snapshot-sync command.
How does error reporting work if you have multiple commands with the same device as target (e.g. reopen + mirror)?

Federico Simoncelli

Federico's patches are tailored to oVirt's use of live snapshots. oVirt wants to create snapshot files outside QEMU so that it has control over the paths that are used for backing files. To this end, a drive-reopen command is provided that can be used instead of blockdev-snapshot-sync if the snapshot file is created externally.

A second command, drive-migrate, activates mirroring on a given block device. Because of the same constraint on creating snapshots externally, oVirt in practice needs a combination of a drive-reopen command + activation of mirroring. And because the two operations have to be done atomically, drive-migrate also needs to specify a new source file.

{ 'command': 'drive-reopen',
  'data': { 'device': 'str', 'source': 'str', '*format': 'str' } }
{ 'command': 'drive-migrate',
  'data': { 'device': 'str', 'dest': 'str', '*dest-format': 'str',
            'new-source': 'str', '*source-format': 'str' } }

Pros: Patches are on the mailing list.
Cons: Complicated interface tailored only to the oVirt use case (Paolo tried to shoehorn more general-purpose cases into the same drive-migrate command, but with little success). Doesn't extend to mirroring multiple devices.
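
To make the shape of the combined command concrete, here is a minimal Python sketch that builds the QMP message for the proposed drive-migrate. It assumes the usual QMP wire format (an `execute` key plus an `arguments` object); the helper name `drive_migrate`, the device name and the file paths are hypothetical, and the command itself exists only in this proposal.

```python
import json


def drive_migrate(device, dest, new_source, dest_format=None, source_format=None):
    """Build the QMP message for the proposed drive-migrate command."""
    arguments = {"device": device, "dest": dest, "new-source": new_source}
    # Optional members (marked '*' in the schema) are sent only when given.
    if dest_format is not None:
        arguments["dest-format"] = dest_format
    if source_format is not None:
        arguments["source-format"] = source_format
    return {"execute": "drive-migrate", "arguments": arguments}


# Hypothetical device name and paths, for illustration only.
message = drive_migrate("virtio0", "/mnt/dst/mirror.qcow2", "/mnt/src/snap.qcow2",
                        dest_format="qcow2")
print(json.dumps(message, indent=2))
```

A single message carries both the new source (the reopen half) and the destination (the mirroring half), which is what makes the operation atomic and also what ties the interface so closely to one use case.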

Paolo Bonzini

Adding transactions lets oVirt express its desired combination of drive-reopen + drive-mirror as two commands wrapped in a transaction.

The existing blockdev-snapshot-sync command would need changes to support invocations in a transaction, based on Jeff's code.

{ 'command' : 'blockdev-start-transaction' }
{ 'command' : 'blockdev-commit-transaction' }
{ 'command' : 'blockdev-abort-transaction' }
{ 'command' : 'drive-reopen',
  'data': { 'device': 'str', 'source': 'str', '*format': 'str' } }
{ 'command' : 'drive-mirror',
  'data': { 'device': 'str', 'dest': 'str', '*format': 'str' } }

Only drive-reopen, drive-mirror and blockdev-snapshot-sync can be part of a transaction. Other QMP commands cannot, and will never be part of a transaction, even in future versions of QEMU.
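
As a rough illustration, the oVirt combination of drive-reopen + drive-mirror would be sent as the following sequence of QMP messages. This is a sketch under assumptions: it uses the usual QMP wire format (`execute` plus `arguments`), the helpers `qmp` and `ovirt_transaction` are hypothetical, and the transaction commands exist only in this proposal.

```python
import json


def qmp(command, **arguments):
    """Build one QMP message; 'arguments' is omitted when there are none."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return message


def ovirt_transaction(device, new_source, dest):
    """Return the messages that reopen and mirror one device atomically."""
    return [
        qmp("blockdev-start-transaction"),
        qmp("drive-reopen", device=device, source=new_source),
        qmp("drive-mirror", device=device, dest=dest),
        qmp("blockdev-commit-transaction"),
    ]


# Hypothetical device name and file names, for illustration only.
for message in ovirt_transaction("virtio0", "snap.qcow2", "mirror.qcow2"):
    print(json.dumps(message))
```

If any command inside the transaction reported an error, management would send blockdev-abort-transaction in place of the final commit.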

Pros: Uniform handling of all cases. oVirt use case falls out nicely. Improved error handling extends to the existing blockdev-snapshot-sync command.
Cons: Requires changes to Jeff's patches. Requires transaction infrastructure that does not exist (however, most of the algorithms are already found in Jeff's patches).

Anthony Liguori

Anthony proposed a pair of commands to freeze/unfreeze a block device. This can also provide atomicity and subsumes Jeff's group snapshots too, but management would have to provide its own error handling.

{ 'command' : 'blockdev-freeze',
  'data': { 'device': 'str' } }
{ 'command' : 'blockdev-unfreeze',
  'data': { 'device': 'str' } }
{ 'command' : 'drive-reopen',
  'data': { 'device': 'str', 'source': 'str', '*format': 'str' } }
{ 'command' : 'drive-mirror',
  'data': { 'device': 'str', 'dest': 'str', '*format': 'str' } }

Pros: Poses fewest problems in adding new kinds of operations. oVirt use case falls out nicely.
Cons: Management probably cannot provide the improvements in error handling provided by Jeff's patches. Possibly intrusive changes to the block layer required for freeze/unfreeze.
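
The following Python sketch shows what "management would have to provide its own error handling" could look like in practice. It is an illustration only: `send` is a caller-supplied callable assumed to issue one QMP command and raise on failure, `CommandError` and `reopen_and_mirror` are hypothetical names, and blockdev-freeze/blockdev-unfreeze exist only in this proposal.

```python
class CommandError(Exception):
    """Hypothetical error raised when a QMP command fails."""


def reopen_and_mirror(send, device, new_source, dest):
    """Run drive-reopen + drive-mirror while the device is frozen.

    send is a caller-supplied callable (an assumption of this sketch) that
    issues one QMP command and raises CommandError on failure.
    """
    send("blockdev-freeze", device=device)
    try:
        send("drive-reopen", device=device, source=new_source)
        send("drive-mirror", device=device, dest=dest)
    finally:
        # Always unfreeze, even on failure. Note what is NOT done here: if
        # drive-mirror fails, the earlier drive-reopen is not undone. Any
        # rollback is left to the management layer.
        send("blockdev-unfreeze", device=device)
```

Freezing gives atomicity with respect to guest I/O, but unlike a transaction it gives no automatic rollback, so a failure halfway through leaves the device in whatever state the completed commands produced.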