Internships/ProjectIdeas/NVMePerformance: Difference between revisions

From QEMU
(Created page with " === QEMU NVMe Performance Optimization === '''Summary:''' QEMU's NVMe implementation uses traditional trap-and-emulation method to emulate I/Os, thus the performance suffer...")
 
Line 1: Line 1:


=== QEMU NVMe Performance Optimization ===
=== NVMe Emulation Performance Optimization ===
'''Summary:'''
'''Summary:'''
QEMU's NVMe implementation uses traditional trap-and-emulation method to emulate I/Os, thus the performance suffers due to frequent VM-exits. NVMe Specification 1.3 defines a new feature to update doorbell registers using a Shadow Doorbell Buffer. This can be utilized to enhance performance of emulated controllers like QEMU NVMe. The goal of this summer of code is to add such support in QEMU and apply polling techniques to achieve comparable performance as virtio-blk dataplane. Specifically, this project includes the following parts: (1) add shadow doorbell buffer and ioeventfd support into QEMU NVMe emulation, which will reduce the number of VM-exits and make them less expensive (reducing VCPU latency); (2) add iothread support to QEMU NVMe emulation to reduce or eliminate VM-exits; (3) add a RAM disk back-end for debugging; (4) implement an interrupt coalescing scheme for efficient host-to-guest communication (at least one of (3) and (4)).
QEMU's NVMe emulation uses the traditional trap-and-emulate method to emulate I/O, so performance suffers from frequent VM-exits. Emulation can instead run in a dedicated thread, called an IOThread, that uses the "ioeventfd" mechanism to receive notifications of Submission Queue Tail Doorbell writes. Emulating NVMe in a separate thread allows the vcpu thread to continue execution and results in better performance.
 
The NVMe Specification 1.3 defines a new feature to update doorbell registers using a Shadow Doorbell Buffer. This can be utilized to enhance performance of emulated controllers by reducing the number of Submission Queue Tail Doorbell writes.
 
Finally, it is possible for the emulation code to watch for changes to the queue memory instead of waiting for doorbell writes. This technique is called polling and reduces notification latency at the expense of another thread consuming CPU to detect queue activity.
 
The goal of this project is to implement these optimizations so QEMU's NVMe emulation performance becomes comparable to virtio-blk performance.
 
Tasks include:
* Add Submission Queue Tail Doorbell register ioeventfd support (see existing patch linked below)
* Add Shadow Doorbell Buffer support to reduce doorbell writes
* Add Submission Queue polling
* Add IOThread support so emulation can run in a dedicated thread


'''Links:'''
'''Links:'''
Line 9: Line 21:
* https://github.com/ucare-uchicago/femu
* https://github.com/ucare-uchicago/femu
* https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
* https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
* https://patchwork.kernel.org/patch/10259575/
'''Details:'''
'''Details:'''
* Skill level: intermediate-advanced
* Skill level: intermediate-advanced
* Language: C
* Language: C
* Mentor: Paolo Bonzini <pbonzini@redhat.com>, bonzini on IRC
* Mentor: Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
* Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini
* Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini

Revision as of 17:11, 9 January 2020

NVMe Emulation Performance Optimization

Summary: QEMU's NVMe emulation uses the traditional trap-and-emulate method to emulate I/O, so performance suffers from frequent VM-exits. Emulation can instead run in a dedicated thread, called an IOThread, that uses the "ioeventfd" mechanism to receive notifications of Submission Queue Tail Doorbell writes. Emulating NVMe in a separate thread allows the vcpu thread to continue execution and results in better performance.
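As a rough illustration of the notification side, the sketch below uses a plain Linux eventfd, which is the kind of file descriptor an ioeventfd is built on. It is not QEMU code: the `demo_doorbell` structure and its helper functions are hypothetical names, and the "ring" function merely stands in for what happens when the guest writes the doorbell register. In a real setup the drain function would be called from the IOThread's event loop rather than from the same thread.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the notifier attached to one SQ tail doorbell. */
struct demo_doorbell {
    int fd; /* eventfd; with ioeventfd, KVM signals it on the guest's MMIO write */
};

/* Create a non-blocking eventfd with its counter at zero. */
static int demo_doorbell_init(struct demo_doorbell *db)
{
    db->fd = eventfd(0, EFD_NONBLOCK);
    return db->fd >= 0 ? 0 : -1;
}

/* Producer side: a doorbell write amounts to adding 1 to the counter. */
static int demo_doorbell_ring(struct demo_doorbell *db)
{
    uint64_t one = 1;

    assert(db->fd >= 0);
    return write(db->fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: return how many doorbell writes arrived since the last
 * drain and reset the counter, or 0 if nothing is pending (read() fails
 * with EAGAIN on a non-blocking eventfd whose counter is zero).
 */
static uint64_t demo_doorbell_drain(struct demo_doorbell *db)
{
    uint64_t count = 0;

    if (read(db->fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}

/* Release the file descriptor. */
static void demo_doorbell_cleanup(struct demo_doorbell *db)
{
    close(db->fd);
    db->fd = -1;
}
```

Because the eventfd counter accumulates, several doorbell writes that arrive before the consumer runs are delivered as a single wakeup, so the consumer can process the whole queue in one pass.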

The NVMe Specification 1.3 defines a new feature to update doorbell registers using a Shadow Doorbell Buffer. Emulated controllers can use it to improve performance by reducing the number of Submission Queue Tail Doorbell writes.

Finally, it is possible for the emulation code to watch for changes to the queue memory instead of waiting for doorbell writes. This technique is called polling and reduces notification latency at the expense of another thread consuming CPU to detect queue activity.

The goal of this project is to implement these optimizations so QEMU's NVMe emulation performance becomes comparable to virtio-blk performance.

Tasks include:

  • Add Submission Queue Tail Doorbell register ioeventfd support (see existing patch linked below)
  • Add Shadow Doorbell Buffer support to reduce doorbell writes
  • Add Submission Queue polling
  • Add IOThread support so emulation can run in a dedicated thread

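Links:

  • https://github.com/ucare-uchicago/femu
  • https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
  • https://patchwork.kernel.org/patch/10259575/
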
Details:

  • Skill level: intermediate-advanced
  • Language: C
  • Mentor: Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
  • Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini