Internships/ProjectIdeas/SVQOptimization

From QEMU

Shadow Virtqueue performance optimization

Summary: Implement multithreading and mmap Queue Notifier optimizations for Shadow Virtqueues

To perform a virtual machine live migration, QEMU needs to know when devices modify memory so that the memory contents can be migrated each time they have been modified. Otherwise the virtual machine would resume with outdated memory and probably crash.

This is especially difficult with passthrough hardware devices because QEMU does not intercept the memory writes. As a method to overcome this for virtio devices, QEMU can present an emulated virtqueue to the device, called a Shadow Virtqueue (SVQ), instead of allowing the device to access guest memory directly. SVQ will then forward the writes to the guest, being the effective writer of the guest memory and knowing when a portion of it needs to be migrated again.

As this effectively breaks passthrough and adds extra steps in the communication, this comes with a performance penalty in several forms: Context switches, more memory reads and writes increasing cache pressure, etc.

At this moment the SVQ code is not optimized. It cannot forward buffers in parallel using multiqueue and multithreading, and it does not use the mmap Queue Notify mechanism to notify the device for available buffers efficiently, so these notifications need to perform an extra host kernel context switch.

The SVQ code requires modifications for multithreading. There are examples of multithreaded devices already, like virtio-blk, which can be used as a template. Proposals for a command-line syntax to map virtqueues to threads have been sent to QEMU mailing list already and you can use them.

Regarding the mmap Queue Notify optimization, QEMU already has code to set up the Queue Notify address for a vDPA device in vhost_vdpa_host_notifier_init(). At the moment the SVQ code doesn't make use of this yet. The SVQ code needs to write into the mmap region to avoid the more expensive event_notifier_set() call in vhost_svq_kick().

Your goal is the complete the following tasks:

  • Measure SVQ performance compared to non-SVQ with standard profiling tools like netperf (TCP_STREAM & TCP_RR) or iperf equivalent + DPDK's testpmd with AF_PACKET.
  • Add multithreading to SVQ, extracting the code from the Big QEMU Lock (BQL)
  • Add mmap Queue Notify support to vhost_svq_kick().

Links:

Details:

  • Project size: 350 hours
  • Skill level: intermediate
  • Languages: C
  • Mentor: Eugenio Perez Martin <eperezma@redhat.com> (eperezma on IRC)
  • Suggested by: Eugenio Perez Martin <eperezma@redhat.com>