ToDo/Channel I/O Passthrough

This page lists areas in the implementation of vfio-ccw for channel I/O passthrough that still need work. It covers QEMU, the kernel part and the interface (most topics are expected to involve all three areas).

Missing architecture features

Support for channel programs without prefetch

The current implementation prefetches the whole channel program, translates it and submits it to the hardware. This approach does not work if the guest does not want prefetching (e.g. because it wants to dynamically rewrite channel programs).
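As a rough illustration of where such a restriction could be detected, the sketch below inspects the prefetch control bit of an ORB stored in a 12-byte buffer like the orb_area shown under "Current status". The helper name and the byte-level mask are assumptions made for this example (the mask assumes the prefetch control is bit 9 of ORB word 1); this is not taken from the vfio-ccw code.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORB_AREA_SIZE 12

/*
 * Assumption: the prefetch control (P) is bit 9 of ORB word 1, which
 * corresponds to mask 0x40 in byte 5 of the big-endian ORB image.
 */
#define ORB_BYTE5_PREFETCH 0x40

/*
 * Hypothetical helper: returns true if the guest allows the channel
 * program to be prefetched, i.e. the case the current approach can handle.
 */
static bool orb_allows_prefetch(const uint8_t orb_area[ORB_AREA_SIZE])
{
    return (orb_area[5] & ORB_BYTE5_PREFETCH) != 0;
}
```

A request for which this check fails is exactly the case that the prefetch-translate-submit approach cannot serve.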

I/O instructions not executed on the hardware

We started out by passing only START SUBCHANNEL to the hardware, relying on QEMU to emulate any other I/O instruction the guest uses. While this covers a large part of what is needed, more instructions need to be handled.

HALT SUBCHANNEL and CLEAR SUBCHANNEL

[looked at by cohuck]

Terminating a running channel program is especially useful during error recovery, but there are also devices that use e.g. a csch during their startup procedure (nothing we currently plan to support, though). While both instructions terminate a running channel program, there are some differences:

  • hsch is accepted while there is at most the start function specified (i.e., neither the halt nor the clear function).
  • csch is accepted in any case (unless the subchannel is not operational). It will clear any start or halt function, and it will clear any pending I/O interrupt.
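The two acceptance rules above can be sketched as a small model. All names and flag values below are made up for illustration (they are not the architected SCSW encodings), and the model deliberately ignores other conditions such as a status-pending subchannel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative function-control flags; placeholder values only. */
enum {
    FCTL_START = 1u << 2,
    FCTL_HALT  = 1u << 1,
    FCTL_CLEAR = 1u << 0,
};

/* Hypothetical, simplified view of a subchannel. */
struct subch_state {
    bool    operational;
    bool    irq_pending;
    uint8_t fctl;
};

/* hsch: accepted only if neither halt nor clear is already specified. */
static bool hsch_accepted(const struct subch_state *s)
{
    return s->operational && (s->fctl & (FCTL_HALT | FCTL_CLEAR)) == 0;
}

/* csch: accepted whenever the subchannel is operational. */
static bool csch_accepted(const struct subch_state *s)
{
    return s->operational;
}

/* csch drops any start/halt function and any pending I/O interrupt. */
static void csch_apply(struct subch_state *s)
{
    s->fctl = FCTL_CLEAR;
    s->irq_pending = false;
}
```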

There is an inherent race condition when issuing either of these instructions: the scsw after a stsch may indicate that a start/halt is still in progress, but the subchannel may have become status pending with final status immediately afterwards. This needs careful serialization so that the state machine does not get confused and the guest is always presented with a consistent status.

CANCEL SUBCHANNEL

This instruction is used to cancel a start operation that has been accepted by the subchannel but not yet started executing. We have the same race condition as with hsch/csch. A major difference is that xsch will not generate an interrupt, nor will the guest get an interrupt for the ssch it issued.

The easiest option would be to give the guest a cc 2 in any case. That covers both of the following situations:

  • guest did a ssch/hsch, but did not get a status yet
  • guest did nothing (subchannel idle)

STORE SUBCHANNEL

The guest only gets QEMU's view of the subchannel when it executes stsch. It may want the hardware's view instead. Either pass this through, or trigger QEMU to update its view.

MODIFY SUBCHANNEL

Enable/disable is handled by QEMU (we keep the real subchannel enabled during usage by the mdev framework). Things become complicated if we want to support channel monitoring (currently emulated for virtio-ccw devices in QEMU).

SET CHANNEL MONITOR

Not a per-subchannel command, currently emulated by QEMU. This becomes hairy when we want to deal with both passthrough and emulated devices.

TEST PENDING INTERRUPTION and TEST SUBCHANNEL

It's probably fine to leave them as-is (emulated), as I/O interrupts have to be managed by the host anyway. We may need to make sure the control blocks are updated by tsch correctly (and avoid interference with other instructions).

RESUME SUBCHANNEL

It is unclear how rsch can work with the current infrastructure.

Interface considerations

Current status

For ssch processing, we added an I/O region:

 struct ccw_io_region {
 #define ORB_AREA_SIZE 12
         __u8    orb_area[ORB_AREA_SIZE];
 #define SCSW_AREA_SIZE 12
         __u8    scsw_area[SCSW_AREA_SIZE];
 #define IRB_AREA_SIZE 96
         __u8    irb_area[IRB_AREA_SIZE];
         __u32   ret_code;
 } __packed;

However, this has some problems:

  • Semantics of the fields are unclear (is the scsw_area supposed to contain the copy of an scsw, or is it used to convey a command by specifying the start function in the fctl field?)
  • While this is probably extensible for halt/clear handling, other commands may not work so well.
  • We mix up sending a command from user space to the vfio module and handling a status.

This is unfortunately not really well documented, either (Documentation/s390/vfio-ccw.txt only states that "scsw_area should be filled with the SCSW of the Virtual Subchannel").
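To make the layout concerns concrete, the following sketch mirrors the region with standard C types and shows that request data (orb_area, scsw_area) and result data (irb_area, ret_code) share a single 124-byte buffer. The struct name and the helper are hypothetical and exist only for this example; the field names and sizes are copied from the definition above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User-space mirror of the region, using uint8_t/uint32_t for __u8/__u32. */
struct ccw_io_region_sketch {
    uint8_t  orb_area[ORB_AREA_SIZE];   /* request: ORB from the guest */
    uint8_t  scsw_area[SCSW_AREA_SIZE]; /* request or status? (unclear) */
    uint8_t  irb_area[IRB_AREA_SIZE];   /* result: IRB for the guest */
    uint32_t ret_code;                  /* result: return code */
} __attribute__((packed));

/*
 * Hypothetical helper: zero the region and fill in the request part
 * before handing it to the kernel.
 */
static void region_prepare(struct ccw_io_region_sketch *r,
                           const uint8_t orb[ORB_AREA_SIZE],
                           const uint8_t scsw[SCSW_AREA_SIZE])
{
    memset(r, 0, sizeof(*r));
    memcpy(r->orb_area, orb, ORB_AREA_SIZE);
    memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

Because the same buffer carries both directions, there is no field that says which command is being requested other than whatever is encoded in scsw_area.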

Add documentation

Whatever else we do, we need to document everything properly:

  • Documentation/s390/vfio-ccw.txt should be more detailed/precise
  • More comments regarding the interface in the code
  • Anything else?

More I/O regions

Additional I/O regions for kernel/user space communication seem to be the way to go. They need to be guarded via capabilities, making the interface easily extensible.

Regions that have been proposed:

  • Status area (containing scsw/pmcw/...)
  • Command area
  • CCW area
  • Measurements/statistics (for channel measurements etc.)
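One way user space could cope with optional regions is to look each one up by type and fall back when it is absent. The sketch below assumes the kernel advertises a list of typed regions; the enum values, the descriptor struct and find_region() are invented for this example and do not describe an existing vfio interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types, following the proposals listed above. */
enum region_type {
    REGION_IO,          /* the existing I/O region */
    REGION_STATUS,      /* scsw/pmcw/... */
    REGION_COMMAND,
    REGION_CCW,
    REGION_MEASUREMENT,
};

/* Hypothetical descriptor of one region advertised by the kernel. */
struct region_desc {
    enum region_type type;
    uint64_t         offset;
    uint64_t         size;
};

/*
 * Return the descriptor for the requested type, or NULL if the kernel
 * does not provide that region (i.e. the capability is missing).
 */
static const struct region_desc *find_region(const struct region_desc *regions,
                                             size_t count,
                                             enum region_type type)
{
    for (size_t i = 0; i < count; i++) {
        if (regions[i].type == type)
            return &regions[i];
    }
    return NULL;
}
```

With such a lookup, an older kernel that only offers the I/O region keeps working, and newer user space can enable features per region as they appear.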

Big picture items

These may affect more than one guest at a time.

Channel path handling

  • Who manages the paths (including path grouping), the guest or the host?
  • How do we reflect path changes to the guest?
  • Do we need special handling for DASD reserve/release?

Tracepoints

We need some more of these, at strategic points.

Further features

Support for migration

Migration for other types of vfio devices has been discussed already. Currently, all passthrough devices have to be unplugged before the guest can be migrated.

This should leverage a common framework for migration of vfio devices, to avoid code duplication etc. That framework still needs to be designed -- we need to make sure that it accommodates our use case as well.

Transport mode

We currently only support command mode (cf. the name 'ccw'). It might be feasible to support transport mode as well (handling tcws etc.), but it is unclear what benefit that would bring other than enabling special guests.

Things we won't support

This includes things that are either not really feasible, or would mean a lot of effort for little gain.

Non-I/O subchannels

No public documentation for CHSC, message, or EADM subchannels is available, and we don't know about possible pitfalls.

I/O subchannels in QDIO mode

No public documentation for QDIO is available, either.

Who is currently looking at what?

cohuck: Cornelia Huck <cohuck AT redhat dot com>

  • halt/clear handling