Features/VirtioVhostUser

From QEMU

The virtio-vhost-user device lets guests act as vhost device backends so that virtual network switches and storage appliance VMs can provide virtio devices to other guests.

virtio-vhost-user is currently under development and is not yet ready for production.

virtio-vhost-user was inspired by vhost-pci by Wei Wang and Zhiyong Yang.

Code

Quickstart

Compiling QEMU & DPDK

$ git clone -b virtio-vhost-user https://github.com/stefanha/qemu
$ (cd qemu && ./configure --target-list=x86_64-softmmu && make)
$ git clone -b virtio-vhost-user https://github.com/stefanha/dpdk
$ (cd dpdk && make config T=x86_64-native-linuxapp-gcc && \
   make T=x86_64-native-linuxapp-gcc install && \
   make "RTE_SDK=$PWD" RTE_TARGET=x86_64-native-linuxapp-gcc -C examples/vhost_scsi)

Launching the DPDK guest

Create a new guest (dpdk.img) for DPDK testing and then launch it:

$ cd qemu/x86_64-softmmu
$ ./qemu-system-x86_64 -M accel=kvm -cpu host -smp 2 -m 4G \
      -drive if=virtio,file=dpdk.img,format=raw \
      -chardev socket,id=chardev0,path=vhost-user.sock,server,nowait \
      -device virtio-vhost-user-pci,chardev=chardev0 \
      -netdev user,id=netdev0 -device virtio-net-pci,netdev=netdev0

Make sure the guest kernel command-line includes intel_iommu=on.
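The setting can be checked and, if necessary, added from inside the DPDK guest; a sketch, assuming a GRUB-based distro (config paths vary by distribution):

```shell
# Check whether the running kernel was booted with intel_iommu=on
grep -q intel_iommu=on /proc/cmdline && echo "IOMMU enabled" || echo "IOMMU missing"

# If missing, add it to the default kernel command line and reboot
# (uncomment; GRUB config paths differ between distributions):
#sed -i 's/^GRUB_CMDLINE_LINUX="/&intel_iommu=on /' /etc/default/grub
#grub2-mkconfig -o /boot/grub2/grub.cfg    # or: update-grub on Debian/Ubuntu
#reboot
```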

Copy the following files from the DPDK directory into the guest:

  • x86_64-native-linuxapp-gcc/app/testpmd
  • examples/vhost_scsi/build/app/vhost-scsi
  • usertools/dpdk-devbind.py

Testing vhost-user-net with DPDK testpmd

Run DPDK's testpmd inside the DPDK guest to forward traffic between the vhost-user-net device and the virtio-net-pci device:

# lspci -n # adjust the addresses below if your PCI devices differ from mine
# export VVU_DEVICE="0000:00:04.0"
# export VNET_DEVICE="0000:00:05.0"
# nmcli d disconnect ens5 # we're going to use vfio-pci
# modprobe vfio enable_unsafe_noiommu_mode=1 # requires CONFIG_VFIO_NOIOMMU=y
# modprobe vfio-pci
# ./dpdk-devbind.py -b vfio-pci "$VVU_DEVICE"
# ./dpdk-devbind.py -b vfio-pci "$VNET_DEVICE"
# echo 1536 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./testpmd -l 0-1 --pci-whitelist "$VVU_DEVICE" \
            --vdev net_vhost0,iface="$VVU_DEVICE",virtio-transport=1 \
            --pci-whitelist "$VNET_DEVICE"

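The hugepage reservation above also assumes a hugetlbfs mount for DPDK to allocate from; most distros mount one automatically, but here is a minimal sketch in case yours does not:

```shell
# Mount hugetlbfs if it is not already available
mkdir -p /dev/hugepages
mountpoint -q /dev/hugepages || mount -t hugetlbfs nodev /dev/hugepages
```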
Now launch another guest with a vhost-user netdev:

$ qemu -M accel=kvm -cpu host -m 1G \
       -object memory-backend-file,id=mem0,mem-path=/var/tmp/foo,size=1G,share=on \
       -numa node,memdev=mem0 \
       -drive if=virtio,file=test.img,format=raw \
       -chardev socket,id=chardev0,path=vhost-user.sock \
       -netdev vhost-user,chardev=chardev0,id=netdev0 \
       -device virtio-net-pci,netdev=netdev0

When you exit testpmd you will see that packets have been forwarded.
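To watch the forwarding while it runs, testpmd can instead be started with -i for an interactive prompt and its counters inspected (session sketch; these are standard testpmd console commands):

```
testpmd> start                  # begin forwarding
testpmd> show port stats all    # RX/TX packet counters should be increasing
testpmd> stop                   # stops forwarding and prints statistics
```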

Testing vhost-user-scsi with DPDK vhost-scsi

Run DPDK's vhost-scsi to serve as a vhost-user-scsi device backend:

# lspci -n # adjust the address below if your PCI devices differ from mine
# export VVU_DEVICE="0000:00:04.0"
# modprobe vfio enable_unsafe_noiommu_mode=1 # requires CONFIG_VFIO_NOIOMMU=y
# modprobe vfio-pci
# ./dpdk-devbind.py -b vfio-pci "$VVU_DEVICE"
# echo 1536 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./vhost-scsi -l 0-1 --pci-whitelist "$VVU_DEVICE" -- --virtio-vhost-user-pci "$VVU_DEVICE"

Now launch a guest using the vhost-user-scsi device:

$ qemu -M accel=kvm -cpu host -m 1G \
       -object memory-backend-file,id=mem0,mem-path=/var/tmp/foo,size=1G,share=on \
       -numa node,memdev=mem0 \
       -drive if=virtio,file=test.img,format=raw \
       -chardev socket,id=chardev0,path=vhost-user.sock

Once the guest has booted, hotplug the SCSI adapter from the QEMU monitor (for example, pass -monitor stdio when launching QEMU to get a monitor prompt):

(qemu) device_add vhost-user-scsi-pci,disable-modern=on,chardev=chardev0

A new SCSI LUN will be detected by the guest.
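From inside the consumer guest, the new LUN can be confirmed with standard tools (sketch; the device name it appears under depends on your configuration):

```shell
# Look for the hotplugged vhost-user-scsi LUN
lsblk                       # the new disk should appear, e.g. as /dev/sdb
dmesg | grep -i scsi        # kernel messages for the newly attached device
```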

Use cases

Appliances for cloud environments

In cloud environments everything is a guest. It is not possible for users to run vhost-user processes on the host. This precludes high-performance vhost-user appliances from running in cloud environments.

virtio-vhost-user allows vhost-user appliances to be shipped as virtual machine images. They can provide I/O services directly to other guests instead of going through an extra layer of device emulation like a host network switch:

    Traditional Appliance VMs       virtio-vhost-user Appliance VMs
+-------------+   +-------------+  +-------------+   +-------------+
|     VM1     |   |     VM2     |  |     VM1     |   |     VM2     |
|  Appliance  |   |   Consumer  |  |  Appliance  |   |   Consumer  |
|      ^      |   |      ^      |  |      <------+---+------>      |
+------|------+---+------|------+  +-------------+---+-------------+
|      +-----------------+      |  |                               |
|             Host              |  |             Host              |
+-------------------------------+  +-------------------------------+

Exitless VM-to-VM communication

Once the vhost-user session has been established all vring activity can be performed by poll mode drivers in shared memory. This eliminates vmexits in the data path so that the highest possible VM-to-VM communication performance can be achieved.

Even when interrupts are necessary, virtio-vhost-user can use lightweight vmexits thanks to ioeventfd instead of exiting to host userspace. This ensures that VM-to-VM communication bypasses device emulation in QEMU.
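The exit behavior can be spot-checked on the host with perf's KVM statistics (sketch; requires root, and the pgrep pattern used to find the appliance VM's QEMU process is an assumption):

```shell
# Count VM exits for the appliance VM over a 10 second window
pid=$(pgrep -f virtio-vhost-user-pci | head -n 1)
perf kvm stat record -p "$pid" sleep 10
perf kvm stat report    # with poll mode drivers, exit counts should stay low
```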

How it works

Virtio devices were originally emulated inside the QEMU host userspace process. Later on, vhost allowed a subset of a virtio device, called the vhost device backend, to be implemented inside the host kernel. vhost-user then allowed vhost device backends to reside in host userspace processes instead.

virtio-vhost-user takes this one step further by moving the vhost device backend into a guest. It works by tunneling the vhost-user protocol over a new virtio device type called virtio-vhost-user.

The following diagram shows how two guests communicate:

+-------------+                     +-------------+
|     VM1     |                     |     VM2     |
|             |                     |             |
|    vhost    |    shared memory    |             |
|   device    | +-----------------> |             |
|   backend   |                     |             |
|             |                     | virtio-net  |
+-------------+                     +-------------+
|             |                     |             |
|  virtio-    |  vhost-user socket  |             |
| vhost-user  | <-----------------> | vhost-user  |
|    QEMU     |                     |    QEMU     |
+-------------+                     +-------------+

VM2 sees a regular virtio-net device. VM2's QEMU uses the existing vhost-user feature as if it were talking to a host userspace vhost-user backend.

VM1's QEMU tunnels the vhost-user protocol messages between the UNIX domain socket and the new virtio-vhost-user device so that guest software in VM1 can act as the vhost-user backend.

It is possible to reuse existing vhost-user backend software with virtio-vhost-user since the same vhost-user protocol messages are exchanged. A driver is required for the virtio-vhost-user PCI device, which carries the messages instead of the usual vhost-user UNIX domain socket. The driver can be implemented in a guest userspace process using Linux vfio-pci, but a guest kernel driver implementation would also be possible.

The vhost device backend vrings are accessed through shared memory and do not require vhost-user message exchanges in the data path. No vmexits are taken when poll mode drivers are used. Even when interrupts are used, QEMU is not involved in the data path because ioeventfd lightweight vmexits are taken.

All vhost device types work with virtio-vhost-user, including net, scsi, and blk.