Internships/ProjectIdeas/NVMePerformance: Difference between revisions
Latest revision as of 12:38, 25 February 2022
NVMe Emulation Performance Optimization
Summary: QEMU's NVMe emulation uses the traditional trap-and-emulate method to emulate I/Os, so performance suffers from frequent VM-exits. Version 1.3 of the NVMe specification defines a new feature for updating doorbell values through a Shadow Doorbell Buffer. This can be used to improve the performance of emulated controllers by reducing the number of Submission Queue Tail Doorbell writes.
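To illustrate how a Shadow Doorbell Buffer cuts down on trapping register writes, here is a minimal sketch in C. It is not QEMU or driver code: the `sq_doorbell` struct and the function names are hypothetical, and the model is simplified to 16-bit free-running indices. The wraparound-safe comparison follows the event-index check commonly used by guest drivers for this feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one submission queue's doorbell state. */
struct sq_doorbell {
    uint16_t shadow_tail;  /* shadow doorbell slot, written by the driver in guest memory */
    uint16_t event_idx;    /* EventIdx slot, written by the controller (the emulator) */
    uint16_t mmio_tail;    /* last value written to the real doorbell register */
    unsigned mmio_writes;  /* count of trapping register writes, for illustration */
};

/* True when moving the tail from old_idx to new_idx steps past event_idx.
 * Unsigned 16-bit arithmetic keeps the comparison correct across wraparound. */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Driver side: publish the new tail in memory and ring the real doorbell
 * only when the controller asked to be notified. */
static void driver_update_tail(struct sq_doorbell *db, uint16_t new_tail)
{
    assert(db != NULL);
    uint16_t old_tail = db->shadow_tail;

    db->shadow_tail = new_tail;  /* plain memory write, no VM-exit */
    /* A real driver needs a memory barrier here before reading event_idx. */
    if (need_event(db->event_idx, new_tail, old_tail)) {
        db->mmio_tail = new_tail;  /* this write would trap to the emulator */
        db->mmio_writes++;
    }
}

/* Emulator side: consume everything up to the shadow tail, then advance
 * event_idx so the next submission triggers a doorbell write again. */
static uint16_t emulator_process(struct sq_doorbell *db)
{
    assert(db != NULL);
    uint16_t tail = db->shadow_tail;

    db->event_idx = tail;
    return tail;
}
```

In this model, a burst of submissions that arrives while the emulator is still busy costs a single register write; the remaining tail updates are ordinary memory stores.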
Furthermore, it is possible to run emulation in a dedicated thread called an IOThread. Emulating NVMe in a separate thread allows the vCPU thread to continue execution and results in better performance.
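The idea of handing work to a dedicated thread can be sketched with plain POSIX threads. This is only an illustration of the pattern, not QEMU's IOThread API: the `io_thread` struct and the `io_thread_*` functions are hypothetical names, and a condition variable stands in for whatever notification mechanism the real implementation would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dedicated emulation thread. */
struct io_thread {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t kick;
    unsigned pending;    /* notifications not yet handled */
    unsigned processed;  /* notifications handled so far */
    bool stop;
};

/* Body of the dedicated thread: sleep until kicked, then handle a batch. */
static void *io_thread_run(void *opaque)
{
    struct io_thread *t = opaque;

    pthread_mutex_lock(&t->lock);
    for (;;) {
        while (t->pending == 0 && !t->stop) {
            pthread_cond_wait(&t->kick, &t->lock);
        }
        if (t->pending == 0) {
            break;  /* stop requested and nothing left to do */
        }
        unsigned batch = t->pending;
        t->pending = 0;
        pthread_mutex_unlock(&t->lock);
        /* Command processing would happen here, outside the lock. */
        pthread_mutex_lock(&t->lock);
        t->processed += batch;
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Returns 0 on success, like pthread_create(). */
static int io_thread_start(struct io_thread *t)
{
    assert(t != NULL);
    t->pending = 0;
    t->processed = 0;
    t->stop = false;
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->kick, NULL);
    return pthread_create(&t->thread, NULL, io_thread_run, t);
}

/* Called from the "vCPU" side: record the notification and return at once. */
static void io_thread_kick(struct io_thread *t)
{
    pthread_mutex_lock(&t->lock);
    t->pending++;
    pthread_cond_signal(&t->kick);
    pthread_mutex_unlock(&t->lock);
}

/* Drain remaining work, join the thread, and report how much was handled. */
static unsigned io_thread_stop(struct io_thread *t)
{
    pthread_mutex_lock(&t->lock);
    t->stop = true;
    pthread_cond_signal(&t->kick);
    pthread_mutex_unlock(&t->lock);
    pthread_join(t->thread, NULL);
    pthread_cond_destroy(&t->kick);
    pthread_mutex_destroy(&t->lock);
    return t->processed;
}
```

The caller of `io_thread_kick()` only takes a short lock and signals; the potentially slow command handling runs on the other thread, which is the property the paragraph above relies on.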
Finally, it is possible for the emulation code to watch for changes to the queue memory instead of waiting for doorbell writes. This technique is called polling and reduces notification latency at the expense of another thread consuming CPU to detect queue activity.
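A bounded polling loop over a queue tail held in memory might look like the following sketch. The `polled_sq` struct, `sq_poll()`, and the spin limit are assumptions made for illustration; a real implementation would read the tail from guest memory and would probably fall back to a notification mechanism once the spin budget is exhausted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical view of a submission queue whose tail lives in shared memory. */
struct polled_sq {
    _Atomic uint16_t tail;  /* written by the guest driver */
    uint16_t head;          /* next entry the emulator will consume */
    uint16_t size;          /* number of slots in the queue */
};

/* Spin up to max_spins times looking for a tail change.
 * Returns the number of new entries found, or 0 if the budget ran out. */
static uint16_t sq_poll(struct polled_sq *sq, unsigned max_spins)
{
    assert(sq->size > 0);
    for (unsigned i = 0; i < max_spins; i++) {
        uint16_t tail = atomic_load_explicit(&sq->tail, memory_order_acquire);

        if (tail != sq->head) {
            /* Distance from head to tail, accounting for ring wraparound. */
            uint16_t n = (uint16_t)((tail + sq->size - sq->head) % sq->size);

            sq->head = tail;
            return n;
        }
    }
    return 0;
}
```

The spin budget is the knob that trades CPU time for latency: a larger `max_spins` detects new entries sooner on average but burns more cycles when the queue is idle.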
The goal of this project is to implement these optimizations so that QEMU's NVMe emulation performance becomes comparable to virtio-blk performance.
Tasks include:
- Add Shadow Doorbell Buffer support to reduce doorbell writes
- Add Submission Queue Tail Doorbell register ioeventfd support when the Shadow Doorbell Buffer is enabled (see existing patch linked below)
- Add Submission Queue polling
- Add IOThread support so emulation can run in a dedicated thread
Links:
- https://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf
- http://ucare.cs.uchicago.edu/pdf/fast18-femu.pdf
- https://github.com/ucare-uchicago/femu
- https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
- https://patchwork.kernel.org/patch/10259575/
- https://lore.kernel.org/qemu-devel/1447825624-17011-1-git-send-email-mlin@kernel.org/T/#u
Details:
- Project size: 350 hours
- Difficulty: intermediate to advanced
- Required skills: C programming
- Desirable skills: knowledge of the NVMe PCI specification, knowledge of device driver or emulator development
- Mentor: Klaus Jensen <its@irrelevant.dk> (kjensen on IRC), Keith Busch <kbusch@kernel.org>
- Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC)