Features/Containers


This feature is about running container images under QEMU.

Background

Containers have become a popular way to capture workloads and deploy them under Linux. Docker, CoreOS, Fedora Atomic, and other projects provide the ability to run container images. There is some overlap between containers and virtual machines.

The key difference between containers and virtual machines is that multiple containers share a single kernel, whereas virtual machines run isolated on emulated hardware. There are several use cases where virtual machines are preferable as an execution engine but the container workflow is still desirable. These include secure multi-tenant container hosting, where the shared kernel architecture of containers is considered a risk, and use cases where tenants need to load specific kernel modules.

The goal is to allow QEMU to act as an execution engine for containers. Although the container will run as a virtual machine, the workflow for building, distributing, and deploying images is the same as for containers. It should be possible to run standards-compliant OCI container images under QEMU.

Status

Marc Mari worked on reducing startup times in 2015. The goal was to boot guests in under 40 milliseconds, and this was achieved by replacing the PIO fw_cfg interface with MMIO fw_cfg and by building a minimal QEMU without most library dependencies. The core MMIO fw_cfg work was merged into QEMU and SeaBIOS.
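
As a rough illustration of how process startup overhead might be observed, the following Python sketch times repeated runs of a command. It is not the method used in the work described above, and it measures only process launch and exit (including dynamic linking), not guest boot time. The helper name time_command and the run count are assumptions made for this example.

```python
import subprocess
import time


def time_command(argv, runs=5):
    """Return wall-clock durations in milliseconds for `runs` executions of argv.

    Hypothetical helper for illustration: it times process launch and exit,
    which includes time spent in the dynamic linker, but not guest boot.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not skew the measurement.
        subprocess.run(
            argv,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

For example, time_command(["qemu-system-x86_64", "-version"]) would give a lower bound on the launch cost of that binary on a given host, assuming it is installed under that name. Taking the minimum of the samples reduces noise from a cold page cache.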

The following work remains:

  • linuxboot.bin MMIO fw_cfg support
  • QEMU shared library dynamic linker optimization. Marc experimented with block modules to delay loading of block-driver-related shared libraries (e.g. librados, libssh, libnfs). Further low-level linker optimizations may be possible to avoid spending hundreds of milliseconds in the runtime dynamic linker on startup.
  • NVDIMM device support is currently (March 2016) being upstreamed by Xiao Guangrong. This will enable guest page cache bypass using Direct Access for files (DAX).
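
To make the NVDIMM item more concrete, here is a minimal Python sketch that builds the QEMU arguments for exposing a file to the guest as an NVDIMM. The option names follow QEMU's NVDIMM documentation as it later landed upstream, so they may differ from the patches under review in March 2016. The function name nvdimm_args, the ids mem1 and nvdimm1, and the default sizes are placeholders chosen for this example.

```python
def nvdimm_args(image_path, image_size, ram="512M", max_ram="8G"):
    """Build QEMU arguments that expose image_path to the guest as an NVDIMM.

    Hypothetical helper for illustration. max_ram must be at least
    ram plus image_size, because the NVDIMM occupies hotpluggable
    guest address space.
    """
    return [
        # Enable NVDIMM support on the machine type.
        "-machine", "pc,nvdimm=on",
        # Reserve memory slots and address space for the NVDIMM.
        "-m", f"{ram},slots=2,maxmem={max_ram}",
        # Back the device with a host file, shared so writes reach the file.
        "-object",
        f"memory-backend-file,id=mem1,share=on,mem-path={image_path},size={image_size}",
        # Attach the backend to the guest as an NVDIMM device.
        "-device", "nvdimm,id=nvdimm1,memdev=mem1",
    ]
```

Inside a Linux guest, such a device typically appears as a pmem block device (for example /dev/pmem0), and a filesystem on it can be mounted with the dax option so that file access bypasses the guest page cache.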

Memory footprint has not been investigated yet. Ideally QEMU should only take as much memory as the guest requires (similar to automatic memory ballooning with overcommit).

Links