Features/Channel I/O Passthrough
Latest revision as of 10:49, 11 June 2018


Channel I/O is a high-performance input/output (I/O) architecture that is implemented (especially) on s390 (see the Wikipedia article at https://en.wikipedia.org/wiki/Channel_I/O).

Motivation

In the past, a guest virtualized via QEMU/KVM on s390 could only see paravirtualized virtio devices, attached via the "Virtio Over Channel I/O (virtio-ccw)" transport. This transport makes virtio devices discoverable via standard operating system algorithms for handling channel devices.

However, this is not enough. To make the majority of devices on s390, which use the standard Channel I/O based mechanism, usable in a QEMU virtual machine, a passthrough mechanism is needed. This includes devices that don't have a virtio counterpart (e.g. tape drives) or that have specific characteristics which guests want to exploit.

To pass a device through to a guest, QEMU uses the same interface as everybody else, namely vfio. vfio support for channel devices was therefore introduced, and the new vfio device is named "vfio-ccw".

Implementation

vfio-ccw is realized with an mdev (mediated device) implementation. In the kernel, it has two drivers for two types of devices:

  • The vfio_ccw driver for the physical subchannel device.
  • The vfio_mdev driver for the mediated vfio ccw device.

The QEMU part introduces a basic Channel I/O passthrough infrastructure based on vfio. It currently focuses on supporting dasd-eckd (cu_type/dev_type = 0x3990/0x3390) as the target device.

QEMU (since 2.10)

It supports the new QEMU parameters in the style of:

   -machine s390-ccw-virtio(,s390-squash-mcss=on|off) \
   -device vfio-ccw,sysfsdev=$MDEV_PATH

At that time, the default css 0xFE was restricted to virtual subchannel devices. So the new machine option s390-squash-mcss was added to squash e.g. passed-through channel devices from their real css (0-3, or 0 for hosts not activating MCSS-E) into the default css, as all virtio-ccw devices are in css 0xFE (and show up in the default css 0 for guests not activating MCSS-E).

QEMU (since 2.12)

When the decision to restrict css 0xFE was made, the hope was that non-virtual subchannel devices would come around once guests could exploit multiple channel subsystems. Using css 0xFE seemed like a good idea, but things worked out differently in the meantime (MCSS-E will not come in the foreseeable future), and the restriction now causes more problems than it avoids.

A discussion about unrestricted cssids resulted in the decision to lift the restriction that kept non-virtual devices out of css 0xFE, and thus to deprecate the s390-squash-mcss property (since 2.12). Users should not use this property anymore.

One pain point of this change is upgrading or downgrading QEMU for command line users: the old way and the new way of configuring vfio-ccw are mutually incompatible.

Libvirt is only going to support the new way, so libvirt users may run into the following problems when downgrading QEMU. If a domain contains virtual devices placed into a css other than 0xFE, the domain will refuse to start with an older QEMU. However, putting devices into a css other than 0xFE won't make much sense in the near future, as it requires guest support. Libvirt will also refuse to do vfio-ccw with an older QEMU. This is business as usual.

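The discussions can be found here:

  • Questions about the usability mess caused by differentiating addresses based on device types (see https://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg02495.html)
  • [PATCH 0/3] unrestrict cssids related patches (see https://lists.nongnu.org/archive/html/qemu-devel/2017-12/msg00031.html)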

Live guest migration (LGM) considerations

The adverse effect of getting rid of the css 0xFE restriction on migration should not be too severe. vfio-ccw devices are not live-migratable yet, and for virtual devices using the extra freedom would only make sense with the MCSS-E guest support in place.

The auto-generated bus ids are also affected. We hope to not encounter any auto-generated bus ids in production, as Libvirt is always explicit about the bus id. Since 8ed179c937 ("s390x/css: catch section mismatch on load", 2017-05-18), the worst that can happen when the same device ends up with a different bus id is a cleanly failed migration. Guests that are supposed to be migrated should therefore be configured with explicit devnos.
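To illustrate the recommendation to use explicit devnos, the following is a minimal sketch of a check that lists ccw devices on a QEMU command line that lack a devno= property. The function name devices_without_devno is hypothetical and not part of QEMU or libvirt. The match on device names ending in "-ccw" is only a heuristic based on the device names used on this page.

```shell
#!/bin/sh
# Sketch only: devices_without_devno is a made-up helper name.
# It prints every -device value that looks like a ccw device
# (name ends in "-ccw") but carries no explicit devno= property.
devices_without_devno() {
    prev=
    for arg in "$@"; do
        if [ "$prev" = "-device" ]; then
            case "$arg" in
                *devno=*) ;;                              # explicit bus id given
                *-ccw|*-ccw,*) printf '%s\n' "$arg" ;;    # would get an auto-generated id
            esac
        fi
        prev=$arg
    done
}
```

Passing the intended QEMU arguments to the function before starting a guest that is supposed to be migrated shows which devices would still rely on auto-generated bus ids.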

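The following tables show the LGM behavior with QEMU 2.10 and QEMU 2.12 (and, in theory, later versions) for reference. F means that the migration fails, S means that it succeeds, and S' means that it succeeds only under the condition described in the corresponding test case (T6).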

QEMU 2.10

   ------------+---------------+-------------
               | squashing off | squashing on
   ------------+---------------+-------------
       auto id |        F      |     F       
   ------------+---------------+-------------
   explicit id |        F      |     S       
   ------------+---------------+-------------
  • T1. squashing off + auto id

Fail due to css mismatch - there is no css 0 in the new vm.

 qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
 qemu-system-s390x: Failed to load s390_css:css
 qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
 qemu-system-s390x: load of migration failed: Invalid argument
                                                                                                      
  • T2. squashing off + explicit given id

Fail due to css mismatch - there is no css 0 in the new vm.

 qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
 qemu-system-s390x: Failed to load s390_css:css
 qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
 qemu-system-s390x: load of migration failed: Invalid argument
                                                                                                      
  • T3. squashing on + auto id

Fail due to busid mismatch.

 qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
 qemu-system-s390x: load of migration failed: Invalid argument
                                                                                                      
  • T4. squashing on + explicit given id
 Succeed.

QEMU 2.12

   ------------+---------------+-------------
               | squashing off | squashing on
   ------------+---------------+-------------
       auto id |        F      |     F       
   ------------+---------------+-------------
   explicit id |        S'     |     S       
   ------------+---------------+-------------
  • T5. squashing off + auto id

Fail due to busid mismatch.

 qemu-system-s390x: Unknown savevm section or instance '/fe.0.0003/virtio-rng' 0
 qemu-system-s390x: load of migration failed: Invalid argument
  • T6. squashing off + explicit given id

Setting vfio-ccw.devno=non-fe.x.xxxx (same as T2). Fail due to css mismatch - there is no css 0 in the new vm.

 qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
 qemu-system-s390x: Failed to load s390_css:css
 qemu-system-s390x: error while loading state for instance 0x0 of device 's390_css'
 qemu-system-s390x: load of migration failed: Invalid argument

Setting vfio-ccw.devno=fe.x.xxxx.

 Succeed.                                                                                             
  • T7. squashing on + auto id

Fail due to busid mismatch.

 qemu-system-s390x: Unknown savevm section or instance '/00.0.0003/virtio-rng' 0
 qemu-system-s390x: load of migration failed: Invalid argument
  • T8. squashing on + explicit given id
 Succeed.

Libvirt

The libvirt support is under development.

Setup

This example setup is done for a Linux guest running in an s390-ccw-virtio machine.

Host

  • Kernel Configuration
 CONFIG_S390_CCW_IOMMU=y
 CONFIG_VFIO=m
 CONFIG_VFIO_MDEV=m
 CONFIG_VFIO_MDEV_DEVICE=m
 CONFIG_VFIO_CCW=m
  • Modules Required
 modprobe vfio
 modprobe mdev
 modprobe vfio_mdev
 modprobe vfio_iommu_type1
 modprobe vfio_ccw
  • You need to have a DASD/ECKD device as the target device.

Find the subchannel (0."$ssid"."$schid") of your target DASD/ECKD device and bind the subchannel to the vfio_ccw driver.

 #find the DASD you can use with lsdasd on your host, e.g.:
 #devno="7e52"
 #schid="16ca"
 #ssid="0"
 #for device 0.0.7e52 on subchannel 0.0.16ca.
 #unbind the CCW device from its subchannel
 echo 0."$ssid"."$devno" > /sys/bus/ccw/devices/0."$ssid"."$devno"/driver/unbind
 #unbind the subchannel from the I/O subchannel driver
 echo 0."$ssid"."$schid" > /sys/bus/css/devices/0."$ssid"."$schid"/driver/unbind
 #bind the subchannel to the vfio_ccw driver
 echo 0."$ssid"."$schid" > /sys/bus/css/drivers/vfio_ccw/bind
  • Create a Mediated Device for the physical device
 #generate a uuid with uuidgen. e.g.:
 #uuid="6dfd3ec5-e8b3-4e18-a6fe-57bc9eceb920"
 echo "$uuid" > /sys/bus/css/devices/0."$ssid"."$schid"/mdev_supported_types/vfio_ccw-io/create
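The unbind, bind and create steps above can be collected into a small helper. The following is a minimal sketch rather than part of the original instructions: the function names css_busid and vfio_ccw_setup_cmds are hypothetical, and the helper only prints the commands (a dry run), so that they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Sketch only: the function names are made up for illustration.

# Compose a bus id of the form 0.<ssid>.<id>, as used in the sysfs paths above.
css_busid() {
    printf '0.%s.%s\n' "$1" "$2"
}

# Print (do not execute) the commands for the host-side steps.
# Arguments: ssid devno schid uuid
vfio_ccw_setup_cmds() {
    dev=$(css_busid "$1" "$2")
    sch=$(css_busid "$1" "$3")
    # unbind the CCW device from its subchannel
    echo "echo $dev > /sys/bus/ccw/devices/$dev/driver/unbind"
    # unbind the subchannel from the I/O subchannel driver
    echo "echo $sch > /sys/bus/css/devices/$sch/driver/unbind"
    # bind the subchannel to the vfio_ccw driver
    echo "echo $sch > /sys/bus/css/drivers/vfio_ccw/bind"
    # create the mediated device
    echo "echo $4 > /sys/bus/css/devices/$sch/mdev_supported_types/vfio_ccw-io/create"
}

# Example with the values used above:
vfio_ccw_setup_cmds 0 7e52 16ca 6dfd3ec5-e8b3-4e18-a6fe-57bc9eceb920
```

Printing the commands instead of running them is a deliberate choice here, since unbinding the wrong subchannel takes a device away from the host.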

Starting QEMU

  • Add a vfio-ccw device to your QEMU command line (the machine needs to be s390-ccw-virtio; since QEMU 2.12, the deprecated s390-squash-mcss=on option should no longer be given). Example:
 -M s390-ccw-virtio \
 -device vfio-ccw,devno=fe.0.1234,sysfsdev=/sys/bus/mdev/devices/$uuid \
 ... ...
  • Start QEMU. Your guest will be presented with a real Channel I/O device; in the example above, this is a DASD/ECKD device.
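Before starting QEMU, it can help to confirm that the mediated device actually exists in sysfs. The following is a minimal sketch under stated assumptions: the function name vfio_ccw_device_arg and the MDEV_BASE override are hypothetical and exist only for illustration, while the option string follows the example above.

```shell
#!/bin/sh
# Sketch only: vfio_ccw_device_arg and MDEV_BASE are made-up names.
# Print the -device argument for a mediated device, or fail if its
# sysfs directory does not exist.
# Arguments: devno uuid
vfio_ccw_device_arg() {
    base=${MDEV_BASE:-/sys/bus/mdev/devices}
    if [ ! -d "$base/$2" ]; then
        echo "mediated device $2 not found under $base" >&2
        return 1
    fi
    printf '%s\n' "-device vfio-ccw,devno=$1,sysfsdev=$base/$2"
}
```

For example, vfio_ccw_device_arg fe.0.1234 "$uuid" prints the -device line from the example above if the mediated device was created, and returns a non-zero status otherwise.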

Try the device on your Guest

  • The guest will see the DASD/ECKD as a channel-attached device. With the example above, the device will show up as 0.0.1234 (verify e.g. via the lscss tool).
  • You can online the device by calling chccwdev -e 0.0.1234 and use it as a block device.

Restrictions

  • Only target I/O subchannels.
  • Only basic commands (read/write) have been tested.
  • Some commands may need special handling in the future, for example, anything related to path grouping.

Please refer to ToDo/Channel_I/O_Passthrough as well.