Multi-process QEMU is an effort to run emulated devices in separate processes to achieve better security. Separate processes can have tighter seccomp whitelists, namespaces, and SELinux policies, so the attack surface is smaller than that of a monolithic QEMU process. If an emulated device is compromised, it will be more difficult to damage the host from a more confined process.
This feature is being developed by Elena Ufimtseva, Jag Raman, and John G Johnson.
An initial patch series with proof-of-concept support for the LSI SCSI controller is under review on the QEMU mailing list.
The current version posted for review is v8. The series discussion on the mailing list can be found here: https://patchwork.kernel.org/cover/11695313/
The repository with the current branch can be found here: https://github.com/oracle/qemu/tree/multi-process-qemu-v0.8
This version has the following limitations:
- no dirty logging support, since live migration is not included in this version;
- no PCIe support;
- no MSI interrupt support;
- no IOMMU support.
The following ideas for the future direction of multi-process QEMU have been proposed.
Inventing a new device emulation protocol from scratch has many disadvantages. VFIO could be used as the protocol to avoid reinventing the wheel and to reuse code (existing VMMs already support kernel VFIO). An experimental branch called vfio-user can serve as a starting point.
The goal is to stick with the kernel VFIO structs and constants and keep differences minimal. This will make it easy to add new features, such as live migration, once they have been introduced in the VFIO community.
QEMU's module system can be used to load features on demand at runtime. Device emulation code can be extended to support module compilation, so that Kconfig entries can be set to 'm' (module) in addition to 'y' (built-in) and 'n' (not built). A qemu-device program will serve as the launcher for running a device, loading the device as a QEMU module.
Device emulation authors should have access to readily available security policies for classes of devices such as network interfaces and storage controllers. It should not be necessary to write per-device security policies in most cases. This is critical for security, because requiring a from-scratch policy for each device would likely result in insecure policies or no policies at all.
Seccomp should be used to whitelist the system calls that are needed. Namespaces should be used to revoke access to networking, files, and PIDs where such access is not needed. SELinux should be used to define access to resources.
Direct I/O dispatch
In the proof-of-concept patch series, hardware register accesses are dispatched via the QEMU process. A more performant solution would allow kvm.ko to dispatch I/O directly to the corresponding device emulation process, eliminating the extra hop through the main QEMU process that out-of-process emulation currently adds. Achieving this requires a new API (beyond ioeventfd) that transfers load and store values instead of just signalling a doorbell.
VM-to-VM device emulation
For additional isolation, it is attractive to run device emulation code in a VM instead of a host userspace process. This is also a natural fit for compute clouds and other environments where users cannot run their own host userspace processes. This could be achieved along the lines of Features/VirtioVhostUser, or by using vRDMA.