Features/MultiProcessQEMU
Revision as of 17:00, 13 March 2020
Multi-process QEMU is an effort to run emulated devices in separate processes to achieve better security. Separate processes can have tighter seccomp whitelists, namespaces, and SELinux policies so the attack surface is reduced compared to a monolithic QEMU process. In the event that an emulated device is compromised, it will be more difficult to do damage to the host from a more confined process.
This feature is being developed by Elena Ufimtseva <elena.ufimtseva@oracle.com>, Jag Raman <jag.raman@oracle.com>, and John G Johnson <john.g.johnson@oracle.com>.
Status
An initial patch series with proof-of-concept support for the LSI SCSI controller is under review on the QEMU mailing list.
Future work
The following ideas for the future direction of multi-process QEMU have been proposed.
VFIO-over-socket
Inventing a new device emulation protocol from scratch has many disadvantages. VFIO could be used as the protocol to avoid reinventing the wheel and to reuse code (existing VMMs already support kernel VFIO). An experimental branch called vfio-user can serve as a starting point.
The goal is to stick with the kernel VFIO structs and constants, keeping differences minimal. This will make it easy to add support for new features like live migration later, once they have been introduced in the VFIO community.
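As a rough illustration of reusing kernel VFIO structs on a socket, the sketch below packs a region description using the field layout of struct vfio_region_info from linux/vfio.h and prefixes it with a small header. The header layout, the command number, and the function names are assumptions invented for this example; they are not taken from the vfio-user branch.

```python
import struct

# Field layout of the kernel's struct vfio_region_info from <linux/vfio.h>:
# __u32 argsz, flags, index, cap_offset; __u64 size, offset (32 bytes).
REGION_INFO = struct.Struct("=IIIIQQ")
VFIO_REGION_INFO_FLAG_READ = 1 << 0
VFIO_REGION_INFO_FLAG_WRITE = 1 << 1

# Hypothetical socket framing, invented for this sketch:
# u16 message id, u16 command, u32 payload length.
HEADER = struct.Struct("=HHI")
CMD_GET_REGION_INFO = 1  # hypothetical command number


def pack_region_info(msg_id, index, size, offset, flags):
    """Frame one region description as header + kernel-layout payload."""
    payload = REGION_INFO.pack(REGION_INFO.size, flags, index, 0, size, offset)
    return HEADER.pack(msg_id, CMD_GET_REGION_INFO, len(payload)) + payload


def unpack_region_info(message):
    """Split a framed message back into its message id and region fields."""
    msg_id, command, length = HEADER.unpack_from(message, 0)
    if command != CMD_GET_REGION_INFO or length != REGION_INFO.size:
        raise ValueError("unexpected message")
    _argsz, flags, index, _cap_offset, size, offset = REGION_INFO.unpack_from(
        message, HEADER.size)
    return msg_id, {"index": index, "flags": flags, "size": size, "offset": offset}
```

Because the payload keeps the kernel layout, a VMM that already fills in this struct for kernel VFIO would only need the framing layer to talk to a device process.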
qemu-device launcher
QEMU's module system can be used to load features on demand at runtime. Device emulation code can be extended to support module compilation so Kconfig files can have 'm' (module) in addition to 'y' (built-in) and 'n' (not built). A qemu-device program will serve as the launcher for running a device. It will load the device as a QEMU module.
Security policies
Device emulation authors should have access to readily available security policies for classes of devices like network interfaces and storage controllers. It should not be necessary to write per-device security policies in most cases. This is critical for security because requiring a from-scratch policy for each device will probably result in insecure policies or no policies at all.
Seccomp should be used to whitelist only the system calls that are needed. Namespaces should be used to revoke access to networking, files, and PIDs if such access is not needed. SELinux should be used to define access to resources.
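One way to picture per-class policies is a small lookup table that a launcher consults before starting a device. The sketch below is only a data model: the class names, system call lists, namespace choices, and SELinux type names are illustrative assumptions and have not been audited against any real device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    syscalls: frozenset   # seccomp whitelist for the device process
    unshare: frozenset    # namespaces to unshare because access is not needed
    selinux_type: str     # hypothetical SELinux type name


# Illustrative baseline shared by every class (an assumption, not an audit).
BASE_SYSCALLS = frozenset(
    {"read", "write", "ppoll", "recvmsg", "sendmsg", "futex", "exit_group"})

CLASS_POLICIES = {
    # Storage controllers need file I/O but no networking.
    "storage": Policy(BASE_SYSCALLS | {"pread64", "pwrite64", "fdatasync"},
                      frozenset({"net", "pid"}), "qemu_storage_dev_t"),
    # Network interfaces need networking but no file system access.
    "network": Policy(BASE_SYSCALLS | {"readv", "writev"},
                      frozenset({"mnt", "pid"}), "qemu_net_dev_t"),
}


def policy_for(device_class):
    """Return the shared policy for a device class, refusing unknown classes."""
    try:
        return CLASS_POLICIES[device_class]
    except KeyError:
        raise ValueError(f"no policy for device class {device_class!r}") from None
```

Raising an error for an unknown class, rather than falling back to an unconfined default, keeps a device without a policy from starting at all.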
Direct I/O dispatch
Hardware register accesses are dispatched via the QEMU process in the proof-of-concept patch series. A more performant solution would allow kvm.ko to dispatch I/O directly to the corresponding device emulation process. This would eliminate the overhead of routing each access through the QEMU process. Achieving this requires a new API (beyond ioeventfd) that transfers load and store values instead of just signalling a doorbell.
See also: Proposal for MMIO/PIO dispatch file descriptors (ioregionfd), https://www.spinics.net/lists/kvm/msg208139.html
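To make the difference from a doorbell concrete, the sketch below sends whole register accesses (direction, size, offset, value) over a socket pair and lets a toy device answer them. The message layout and all names are assumptions made up for this example; this is not the ioregionfd ABI, and a single Python process stands in for both the dispatcher and the device process.

```python
import socket
import struct

# Hypothetical message layout (an assumption, not a kernel ABI):
# u8 is_write, u8 access size in bytes, 6 pad bytes, u64 offset, u64 data.
ACCESS = struct.Struct("<BB6xQQ")
REPLY = struct.Struct("<Q")


def encode_access(is_write, offset, size, data=0):
    """Build one load or store message for the device process."""
    return ACCESS.pack(int(is_write), size, offset, data)


def serve_one(sock, regs):
    """Device side: apply one access to the register file and reply."""
    is_write, size, offset, data = ACCESS.unpack(sock.recv(ACCESS.size))
    mask = (1 << (8 * size)) - 1
    if is_write:
        regs[offset] = data & mask
        value = 0  # stores are acknowledged with a zero payload
    else:
        value = regs.get(offset, 0) & mask
    sock.send(REPLY.pack(value))


def demo():
    """Store a value to register 0x10, then load it back."""
    dispatcher, device = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    regs = {}
    try:
        dispatcher.send(encode_access(True, 0x10, 4, 0x1122334455))
        serve_one(device, regs)
        dispatcher.recv(REPLY.size)  # consume the store acknowledgement
        dispatcher.send(encode_access(False, 0x10, 4))
        serve_one(device, regs)
        (value,) = REPLY.unpack(dispatcher.recv(REPLY.size))
        return value
    finally:
        dispatcher.close()
        device.close()
```

Unlike an eventfd, which only tells the device that something happened, each message here carries enough information for the device process to complete a load or store on its own.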
VM-to-VM device emulation
For additional isolation it is attractive to run device emulation code in a VM instead of a host userspace process. This is also a natural fit for compute clouds and other environments where it is not possible for users to run their own host userspace processes. This could be achieved along the lines of Features/VirtioVhostUser, or by using vRDMA.