Internships/AIOQueueSizingAndBoundedMigrationLatency

From QEMU
Revision as of 14:02, 26 January 2020 by Stefanha

AIO Queue Sizing and Bounded Migration Latency

Summary: A two-part project: dynamically allocate AIO queues to prevent congestion and implement bounded drain time during live migration.

This project consists of two independent tasks that should each take up around half of the available time.

The first task is to solve a bottleneck that appears when multiple disks are emulated by a single AIO engine (a common case in QEMU). Each disk that the guest sees has a fixed-length queue for I/O requests. The AIO engine is a component in QEMU that performs I/O requests on behalf of the guest using a host operating system API (preadv(2)/pwritev(2), Linux AIO, or Linux io_uring). AIO engines currently have a fixed-length queue, regardless of how many disks are assigned to them. This means that guests with multiple disks can experience lower performance than single-disk guests. The goal is to extend QEMU's AIO engines with logic for resizing their queues depending on how many disks are sharing the AIO engine.
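One possible shape for the resizing logic is sketched below. It is a minimal illustration, not QEMU code: the `AioEngine` struct, the attach/detach functions, and the constants `PER_DISK_DEPTH`, `MIN_ENGINE_DEPTH`, and `MAX_ENGINE_DEPTH` are all hypothetical, and the policy (give each attached disk a fixed share of slots, clamped to a range) is an assumption chosen for clarity rather than the intended design.

```c
#include <assert.h>

/* Hypothetical tuning constants (assumptions, not QEMU values). */
enum {
    PER_DISK_DEPTH   = 64,   /* slots each attached disk should get */
    MIN_ENGINE_DEPTH = 128,  /* never shrink below this */
    MAX_ENGINE_DEPTH = 1024, /* never grow beyond this */
};

/* Hypothetical AIO engine state shared by several emulated disks. */
typedef struct {
    unsigned nr_disks;    /* disks currently attached to this engine */
    unsigned queue_depth; /* current engine queue length */
} AioEngine;

/* Recompute the queue length so that each attached disk gets roughly
 * PER_DISK_DEPTH slots, clamped to [MIN_ENGINE_DEPTH, MAX_ENGINE_DEPTH]. */
static void aio_engine_resize(AioEngine *e)
{
    unsigned long want = (unsigned long)e->nr_disks * PER_DISK_DEPTH;

    if (want < MIN_ENGINE_DEPTH) {
        want = MIN_ENGINE_DEPTH;
    } else if (want > MAX_ENGINE_DEPTH) {
        want = MAX_ENGINE_DEPTH;
    }
    /* A real engine would then have to re-create its host-side queue
     * (e.g. the Linux AIO context or io_uring ring, whose sizes are fixed
     * at creation), which is simplest when no requests are in flight. */
    e->queue_depth = (unsigned)want;
}

/* Called (hypothetically) when a disk starts using this engine. */
static void aio_engine_attach_disk(AioEngine *e)
{
    e->nr_disks++;
    aio_engine_resize(e);
}

/* Called (hypothetically) when a disk stops using this engine. */
static void aio_engine_detach_disk(AioEngine *e)
{
    assert(e->nr_disks > 0);
    e->nr_disks--;
    aio_engine_resize(e);
}
```

With these assumed constants, one to two disks share the 128-slot minimum, three disks get 192 slots, and the queue stops growing at 1024 slots once 16 or more disks are attached.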

The second task is to avoid delays (10+ milliseconds) during live migration of guests that are performing a lot of disk I/O. There is a point during the performance-critical phase of live migration where the guest is stopped and the bdrv_drained_begin()/bdrv_drained_end() functions are called to wait for all pending I/O requests to complete. If the guest has submitted many I/O requests and accessing the disk image takes a long time, this can produce noticeable migration "downtime" during which the guest is unresponsive. You will design a solution that reduces I/O activity as the performance-critical phase of live migration approaches. One simple solution is to submit only one request at a time so that waiting for completion takes a short amount of time. Smarter solutions are also possible, although they may not be necessary; this will require some experimentation.
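A minimal sketch of the "limit in-flight requests" idea follows, assuming a per-disk throttle that is consulted before a request is handed to the AIO engine. Everything here is hypothetical: `DrainThrottle`, `throttle_try_submit()`, `throttle_complete()`, `throttle_set_limit()`, and `drain_limit_for_budget()` are illustrative names, and the budget calculation relies on a pessimistic assumption that in-flight requests are serviced one after another. Setting the limit to 1 gives the simple one-request-at-a-time approach described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-disk throttle. Requests that are refused stay queued
 * in the device model and are retried after a completion. */
typedef struct {
    unsigned in_flight;     /* submitted but not yet completed */
    unsigned max_in_flight; /* current submission limit (>= 1) */
} DrainThrottle;

/* Pessimistic model (assumption): draining n in-flight requests takes
 * about n * est_latency_us, so pick the largest n that fits the downtime
 * budget, but never less than 1 so the guest still makes progress. */
static unsigned drain_limit_for_budget(unsigned budget_us,
                                       unsigned est_latency_us)
{
    assert(est_latency_us > 0);
    unsigned n = budget_us / est_latency_us;
    return n > 0 ? n : 1;
}

/* Lowering the limit does not cancel requests already in flight; it only
 * holds back new submissions until in_flight drops below the limit. */
static void throttle_set_limit(DrainThrottle *t, unsigned limit)
{
    assert(limit >= 1);
    t->max_in_flight = limit;
}

/* Returns true if the caller may submit one more request now. */
static bool throttle_try_submit(DrainThrottle *t)
{
    if (t->in_flight >= t->max_in_flight) {
        return false;
    }
    t->in_flight++;
    return true;
}

/* Called from the (hypothetical) request completion path. */
static void throttle_complete(DrainThrottle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}
```

Under this model, lowering the limit shortly before the guest is stopped means bdrv_drained_begin() only has to wait for the few requests still in flight. How and when the migration code would signal that the critical phase is approaching is left open here; choosing that trigger and a realistic latency estimate is part of the experimentation mentioned above.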

This project will expose you to the QEMU block layer and disk I/O emulation in general. You will learn how asynchronous I/O works and touch device emulation, migration, and block layer code in QEMU.

Links:


Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Stefan Hajnoczi <stefanha@gmail.com> ("stefanha" on IRC)