Google Summer of Code 2022

Introduction

QEMU has been accepted into Google Summer of Code 2022. This page contains our ideas list and information for applicants and mentors. Google Summer of Code is an open source internship program offering paid remote work.

Application Process

1. Discuss the project idea with the mentor(s)

Read the project ideas list and choose one you are interested in. Read the links in the project idea description and start thinking about how you would approach this. Ask yourself:

  • Do I have the necessary technical skills to complete this project?
  • Will I be able to work independently without the physical presence of my mentor?

If you answer no to either question, choose another project idea and/or organization that fits your skills.

Once you have identified a suitable project idea, email the mentor(s) your questions about the idea and explain your understanding of the project idea to them to verify that you are on the right track.

2. Fill out the application form

The application form asks for a problem description and outline of how you intend to implement a solution. You will need to do some background research (looking at source code, browsing relevant specifications, etc.) in order to decide how to tackle the project. The form asks for an initial project schedule, which you should create by breaking down the project into tasks and estimating how long they will take. The schedule can be adjusted during the summer, so don't worry about getting everything right ahead of time.

3. IRC interview including a coding exercise

You may be invited to an IRC interview. The interview consists of a 30-minute coding exercise, followed by technical discussion and a chance to ask questions you have about the project idea, QEMU, and GSoC. The coding exercise is designed to show fluency in the programming language for your project idea (QEMU projects are typically in C but could also be in Python or Rust).

Here is a C coding exercise we have used in previous years when interviewing applicants: 2014 coding exercise

Try it and see if you can complete it comfortably. We cannot answer questions about the previous coding exercise but hopefully it should be self-explanatory.

If you find the exercise challenging, think about applying to other organizations where you have a stronger technical background and will be more competitive compared with other candidates.

Key Dates

From the timeline

  • March 7 - Organizations and project ideas announced
  • April 4 to 19 18:00 UTC - Application period
  • May 20 - Accepted applicants announced
  • June 13 to September 12 - Coding period

Find Us

For general questions about QEMU in GSoC, please contact the following people:

Project Ideas

This is the listing of suggested project ideas. Students are free to suggest their own projects; see #How to propose a custom project idea below.

Add zoned device support to QEMU's virtio-blk emulation

Summary:

The goal of this project is to let guests (virtual machines) access zoned storage devices on the host (hypervisor) through a virtio-blk device. This involves extending QEMU's block layer and virtio-blk emulation code.

Zoned devices are a special type of block device (hard-disks or SSDs) that are split into regions called zones. Any sector from any zone can be read in any order (sequentially or randomly) but zones can only be written sequentially and do not accept random writes. The "Links" section below contains more information about zoned devices and how they fit into the software stack.
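To make the sequential-write rule concrete, here is a minimal sketch in C of a single zone tracked by a write pointer. The Zone struct and the zone_read_ok, zone_write, and zone_reset helpers are hypothetical names chosen for illustration; they are not QEMU or Linux APIs, and real zoned devices have zone states and conventional zones that this toy model leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write zone (illustration only). */
typedef struct {
    uint64_t start; /* first sector of the zone */
    uint64_t len;   /* zone length in sectors */
    uint64_t wp;    /* write pointer: the only sector where a write may begin */
} Zone;

/* Reads may target any sector range that lies inside the zone. */
bool zone_read_ok(const Zone *z, uint64_t sector, uint64_t nb)
{
    return sector >= z->start && nb <= z->len &&
           sector - z->start <= z->len - nb;
}

/*
 * Writes are accepted only when they start at the write pointer and fit
 * in the remaining space. On success the write pointer advances.
 */
bool zone_write(Zone *z, uint64_t sector, uint64_t nb)
{
    if (sector != z->wp) {
        return false; /* random write: rejected */
    }
    if (nb > z->start + z->len - z->wp) {
        return false; /* would run past the end of the zone */
    }
    z->wp += nb;
    return true;
}

/* Resetting a zone rewinds the write pointer so the zone can be rewritten. */
void zone_reset(Zone *z)
{
    z->wp = z->start;
}
```

In this model a write to any sector other than the write pointer fails, which is the behavior a guest must be prepared for when it talks to a zoned device.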

QEMU's block layer needs new APIs that call Linux ZBD ioctls when disk images are located on zoned devices. The virtio-blk emulation code then needs to be extended to handle zoned device commands by calling the new block layer APIs to perform zoned device I/O on behalf of the guest. The virtio-blk zoned device command VIRTIO specification is currently being drafted and you will implement it in QEMU.

This project will expose you to device emulation and zoned storage. You will gain experience in systems programming and especially how storage devices work in the context of Linux and QEMU.

The concrete goals are:

  • Add QEMU block layer APIs resembling Linux ZBD ioctls.
  • Extend QEMU virtio-blk emulation to implement zoned device commands using new QEMU block layer zoned storage APIs.
  • Add qemu-iotests test cases covering zoned block devices.

Stretch goals (if there is enough time):

  • Implement zoned storage emulation in QEMU's block/null.c driver so it's easy to run tests without root (needed for Linux null or scsi_debug drivers) or nested guests (needed for QEMU NVMe ZNS).
  • Implement SCSI ZBC support in QEMU's SCSI target to enable zoned devices in QEMU's emulated SCSI HBAs.
  • Implement NVMe ZNS using new QEMU block layer zoned storage APIs (currently it emulates fake zones but doesn't call actual Linux ZBD ioctls).

You do not need to have a physical zoned storage device for this project because there are several ways to simulate zoned devices in software (Linux null_blk, Linux scsi_debug, tcmu-runner, and QEMU NVMe ZNS emulation).

Links:

Details:

  • Project size: 350 hours
  • Difficulty: intermediate
  • Required skills: C programming
  • Mentors: Damien Le Moal <Damien.LeMoal@wdc.com>, Dmitry Fomichev <Dmitry.Fomichev@wdc.com>, Hannes Reinecke <hare@suse.de>, Stefan Hajnoczi <stefanha@redhat.com>

VIRTIO_F_IN_ORDER support for virtio devices

Summary: Implement VIRTIO_F_IN_ORDER in QEMU and Linux (vhost and virtio drivers)

The VIRTIO specification defines a feature bit (VIRTIO_F_IN_ORDER) that devices and drivers can negotiate when the device uses descriptors in the same order in which they were made available by the driver.

This feature can simplify device and driver implementations and increase performance. For example, when VIRTIO_F_IN_ORDER is negotiated, it may be easier to create a batch of buffers and reduce DMA transactions when the device uses a batch of buffers.
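As a rough illustration of the two ideas above, the following C sketch shows feature negotiation as a bitwise AND of the device's and driver's feature sets, and how a driver could infer the size of a completed batch from a single used entry. The bit number 35 is the value the VIRTIO specification assigns to VIRTIO_F_IN_ORDER; the function names are hypothetical, and the batch helper assumes a simplified queue where each buffer occupies one descriptor and ids are handed out sequentially modulo the queue size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number assigned to VIRTIO_F_IN_ORDER by the VIRTIO spec. */
#define VIRTIO_F_IN_ORDER 35

/* A feature is negotiated only if both sides offer it. */
uint64_t negotiate_features(uint64_t device_features, uint64_t driver_features)
{
    return device_features & driver_features;
}

bool in_order_negotiated(uint64_t negotiated)
{
    return (negotiated & (1ULL << VIRTIO_F_IN_ORDER)) != 0;
}

/*
 * Simplified assumption: one descriptor per buffer, ids issued in order
 * modulo queue_size. If the device reports only the id of the last buffer
 * in a batch, every buffer from next_expected up to last_used_id is done.
 */
uint16_t in_order_batch_len(uint16_t next_expected, uint16_t last_used_id,
                            uint16_t queue_size)
{
    return (uint16_t)((last_used_id + queue_size - next_expected)
                      % queue_size + 1);
}
```

For example, with a queue of 8 entries, a driver expecting id 6 next that sees a used entry for id 1 can reclaim four buffers (6, 7, 0, 1) from that single entry.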

Currently the devices and drivers available in Linux and QEMU do not support this feature. An implementation is available in DPDK for the virtio-net driver.

Goals:

  • Implement VIRTIO_F_IN_ORDER for a single device/driver in QEMU and Linux (virtio-net or virtio-serial are good starting points).
  • Generalize your approach to the common virtio core code for split and packed virtqueue layouts.
  • If time allows, support for the packed virtqueue layout can be added to Linux vhost, QEMU's libvhost-user, and/or QEMU's virtio qtest code.

Links:

Details:

  • Project size: 350 hours
  • Difficulty: intermediate
  • Required skills: C programming
  • Mentors: Stefano Garzarella <sgarzare@redhat.com>, Eugenio Perez Martin <eperezma@redhat.com>
    • IRC/Matrix nicks: sgarzare, eperezma
  • Suggested by: Jason Wang <jasowang@redhat.com>

Create encrypted storage using VM-based container runtimes

Summary: Extend crun to create encrypted storage by running a libkrun VM

The Linux cryptsetup(8) tool requires root privileges to encrypt storage with LUKS. However, privileged containers are generally discouraged for security reasons. A possible solution to avoid extra privileges is using VM-based container runtimes (e.g. crun with libkrun or kata-containers) and running the storage encryption tool inside the VM.

This internship focusses on a proof-of-concept for integrating and extending the crun container runtime with libkrun in order to create encrypted storage without root privileges. The initial step will focus on creating encrypted images to demonstrate the feasibility and the necessary changes in the software stack. If the timeframe allows it, an interesting follow-up to the first step is the encryption of persistent storage using block-based volumes.

This project will expose you to container runtimes and virtual machines. You must be willing to dig into different codebases such as crun (written in C), libkrun (written in Rust), and possibly podman or other Kubernetes/containers projects (written in Go).

Links:

Details:

  • Project size: 350 hours
  • Required skills: C programming
  • Desirable skills: ability to read Go and Rust code, knowledge of containers and virtualization
  • Mentor: Alice Frosi <afrosi@redhat.com>, Co-mentor: Sergio Lopez Pascual <slp@redhat.com>

Improve s390x (IBM Z) emulation with RISU

Summary: Adapt RISU to s390x and fix CPU emulation along the way.

RISU (Random Instruction Sequence generator for Userspace testing) is a tool for testing CPU instructions with randomly generated opcodes. RISU generates random CPU instruction sequences and runs them both on a reference machine and under QEMU. The results are compared between the reference machine and QEMU so that inconsistencies in QEMU's emulation can be detected and fixed.
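The comparison step can be pictured with a small C sketch. This is not RISU's actual data structure or code: CpuSnapshot and first_mismatch are hypothetical names, and the snapshot holds only the 16 general-purpose registers of s390x plus an instruction address, whereas a real tool would also need to cover other state such as floating-point registers and condition codes.

```c
#include <stdint.h>

#define NUM_GPRS 16 /* s390x has 16 general-purpose registers */

/* Hypothetical register snapshot taken after a random instruction sequence. */
typedef struct {
    uint64_t gpr[NUM_GPRS];
    uint64_t psw_addr; /* instruction address from the program status word */
} CpuSnapshot;

/*
 * Compare the reference machine's state with QEMU's state.
 * Returns -1 if they match, the index of the first differing GPR,
 * or NUM_GPRS if only the instruction address differs.
 */
int first_mismatch(const CpuSnapshot *ref, const CpuSnapshot *emu)
{
    for (int i = 0; i < NUM_GPRS; i++) {
        if (ref->gpr[i] != emu->gpr[i]) {
            return i;
        }
    }
    if (ref->psw_addr != emu->psw_addr) {
        return NUM_GPRS;
    }
    return -1;
}
```

Any result other than -1 points at an instruction sequence whose emulation differs from the reference machine and is worth investigating.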

The goal of this project is to adapt the RISU framework for the IBM Z CPU architecture (a.k.a. s390x), so that it could be used to test the s390x emulation of QEMU for correctness. This will certainly help to spot some instruction emulation deficiencies in QEMU which should be addressed during this internship, too.

Goals / tasks include:

  • Get familiar with the RISU framework (e.g. study the code, run it on other architectures like x86)
  • Get familiar with s390x instructions (e.g. study the "z/Architecture Principles of Operation" document)
  • Adapt the RISU framework for s390x
  • Get familiar with the TCG emulation framework of QEMU (see the target/s390x/ folder in the QEMU sources)
  • Fix at least one problem that has been discovered by running RISU on s390x and get the patch accepted in the QEMU project

Links:

Details:

  • Project size: 350 hours
  • Difficulty: intermediate
  • Required skills: C and Perl programming, good basic understanding of assembly (CPU instructions) but not necessarily s390x
  • Mentor: Thomas Huth <thuth@redhat.com> (th_huth on IRC)

Implement a snapshot fuzzing device

Summary: Add a new emulated device for rapid guest-initiated snapshot/restore functionality for fuzzing.

Fuzz testing runs a program with random inputs to find bugs that lead to crashes or other program failures. Fuzz testing is a popular technique for finding security bugs.

Many recent fuzzing projects rely on snapshot/restore functionality [1,2,3,4,5]. For example, tests and fuzzers that target large programs, such as OS kernels and browsers, benefit from full-VM snapshots where solutions such as manual state cleanup and fork servers are insufficient. Many of the existing solutions are based on QEMU; however, there is currently no upstream solution. Furthermore, hypervisors such as Xen have already incorporated support for snapshot fuzzing. In this project, you will implement a virtual device for snapshot fuzzing, following a spec agreed upon by the community. The device will implement standard fuzzing APIs that allow fuzzing with engines such as libFuzzer and AFL++. The simple APIs exposed by the device will allow fuzzer developers to build custom harnesses in the VM to request snapshots, request memory/device/register restores, request new inputs, and report coverage.

Project goals include:

  • Adding a new emulated device for snapshot fuzzing into QEMU.
  • Writing documentation and final editing of the hardware interface specification so fuzzer developers can learn how to take advantage of the device from inside a guest.

Links:

  1. https://arxiv.org/pdf/2111.03013.pdf
  2. https://blog.mozilla.org/attack-and-defense/2021/01/27/effectively-fuzzing-the-ipc-layer-in-firefox/
  3. https://www.usenix.org/system/files/sec20-song.pdf
  4. https://github.com/intel/kernel-fuzzer-for-xen-project
  5. https://github.com/quarkslab/rewind

Details:

  • Project size: 350 hours
  • Difficulty: intermediate
  • Required skills: C programming
  • Desirable skills: previous experience with fuzzing and/or device driver development
  • Topic/Skill Areas: Fuzzing, OS/Systems/Drivers
  • Mentor: Alexander Bulekov <alxndr@bu.edu> (a1xndr on IRC)

Coverage-guided disk image fuzzing

Summary: Implement a coverage-guided fuzzer for disk image file formats

Fuzz testing runs a program with random inputs to find bugs that lead to crashes or other program failures. Fuzz testing is a popular technique for finding security bugs.

QEMU has a qcow2 fuzzer (see tests/image-fuzzer). However, this fuzzer is not coverage-guided, is limited to qcow2 images, and does not run on OSS-Fuzz. Therefore the existing fuzzer does not provide a lot of code coverage and a modern coverage-guided fuzzer integrated into OSS-Fuzz is desirable.

Disk image files sometimes come from an untrusted source and this makes QEMU's disk image format code an attack surface. One example is the qemu-img utility that can convert between disk image formats and may be used to import untrusted disk images during virtual machine creation. As such, it is important to fuzz this code effectively.

Your task will be to create a coverage-guided fuzzer for image formats supported by QEMU. Beyond basic image-parsing code (qemu-img info), the fuzzer should be able to find bugs in image-conversion code (qemu-img convert). Combined with a corpus of disk image files, the coverage-guided fuzzer will be able to explore code paths without much built-in knowledge of the disk image file layout.
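For orientation, here is a minimal sketch of what a libFuzzer-style harness looks like in C. LLVMFuzzerTestOneInput is the standard libFuzzer entry point; everything else is an assumption made for illustration. In particular, toy_qcow2_version is a hypothetical stand-in that only checks the qcow2 magic bytes and version field, whereas a real harness for this project would hand the input to QEMU's block layer instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value; qcow2 header fields are big-endian. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical toy parser: returns the qcow2 version (2 or 3) if the
 * buffer starts with a plausible qcow2 header, or -1 otherwise.
 */
int toy_qcow2_version(const uint8_t *data, size_t size)
{
    if (size < 8) {
        return -1; /* too short to hold magic + version */
    }
    if (be32(data) != 0x514649fbu) { /* 'Q' 'F' 'I' 0xfb */
        return -1;
    }
    uint32_t version = be32(data + 4);
    return (version == 2 || version == 3) ? (int)version : -1;
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)toy_qcow2_version(data, size);
    return 0; /* crashes and sanitizer reports signal bugs, not the return value */
}
```

With Clang, a harness like this is typically built with -fsanitize=fuzzer (often combined with AddressSanitizer), which links in libFuzzer's own main() and uses coverage feedback to mutate a seed corpus of image files.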

Project goals include:

  • Implement a fuzzer capable of exploring qemu-img convert and block/qcow2-*.c code.
  • Retarget the fuzzer to VMDK (block/vmdk.c) and VHDX (block/vhdx*.c) image files.
  • Add the new fuzzer to OSS-Fuzz
  • Stretch goal: Support DMG (block/dmg.c), Parallels (block/parallels.c), VDI (block/vdi.c), and VPC (block/vpc.c)

Links:

Details:

  • Project size: 175 hours
  • Difficulty: intermediate
  • Required skills: C programming
  • Topic/Skill Areas: Fuzzing, libFuzzer/AFL
  • Mentor: Alexander Bulekov <alxndr@bu.edu> (a1xndr on IRC)

NVMe Emulation Performance Optimization

Summary: QEMU's NVMe emulation uses the traditional trap-and-emulate method to emulate I/Os, thus the performance suffers due to frequent VM-exits. Version 1.3 of the NVMe specification defines a new feature to update doorbell registers using a Shadow Doorbell Buffer. This can be utilized to enhance performance of emulated controllers by reducing the number of Submission Queue Tail Doorbell writes.

Furthermore, it is possible to run emulation in a dedicated thread called an IOThread. Emulating NVMe in a separate thread allows the vcpu thread to continue execution and results in better performance.

Finally, it is possible for the emulation code to watch for changes to the queue memory instead of waiting for doorbell writes. This technique is called polling and reduces notification latency at the expense of another thread consuming CPU to detect queue activity.

The goal of this project is to implement these optimizations so QEMU's NVMe emulation performance becomes comparable to virtio-blk performance.
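The Shadow Doorbell Buffer idea from the summary can be sketched in C as follows. The guest driver records each new Submission Queue tail in ordinary memory and performs the trapping register write only when the tail passes an event index published by the device. The SqDoorbell struct and function names are hypothetical, memory barriers are omitted, and the wraparound comparison is the common 16-bit event-index test rather than text taken from the NVMe specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state shared between guest driver and device. */
typedef struct {
    uint16_t shadow_tail;  /* written by the driver in plain memory */
    uint16_t event_idx;    /* written by the device: notify when tail passes this */
    unsigned mmio_writes;  /* counts doorbell register writes (VM-exits) */
} SqDoorbell;

/*
 * True when moving the tail from old_tail to new_tail steps over
 * event_idx, using 16-bit wraparound arithmetic.
 */
bool need_mmio(uint16_t event_idx, uint16_t new_tail, uint16_t old_tail)
{
    return (uint16_t)(new_tail - event_idx - 1) <
           (uint16_t)(new_tail - old_tail);
}

/* Driver side: always update the shadow, write the register only if needed. */
void ring_doorbell(SqDoorbell *db, uint16_t new_tail)
{
    uint16_t old_tail = db->shadow_tail;

    db->shadow_tail = new_tail;
    if (need_mmio(db->event_idx, new_tail, old_tail)) {
        db->mmio_writes++; /* stands in for the real MMIO doorbell write */
    }
}
```

If the device sets event_idx well ahead of the current tail while it is actively processing or polling the queue, a burst of submissions updates only shadow_tail and causes no VM-exits until the tail reaches that index.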

Tasks include:

  • Add Shadow Doorbell Buffer support to reduce doorbell writes
  • Add Submission Queue Tail Doorbell register ioeventfd support when the Shadow Doorbell Buffer is enabled (see existing patch linked below)
  • Add Submission Queue polling
  • Add IOThread support so emulation can run in a dedicated thread

Links:

Details:

  • Project size: 350 hours
  • Difficulty: intermediate to advanced
  • Required skills: C programming
  • Desirable skills: knowledge of the NVMe PCI specification, knowledge of device driver or emulator development
  • Mentor: Klaus Jensen <its@irrelevant.dk> (kjensen on IRC), Keith Busch <kbusch@kernel.org>
  • Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC)

Extend aarch64 support in rust-vmm/vmm-reference

Summary: Flesh out aarch64 (ARM) support in vmm-reference to make its features comparable to x86_64

The vmm-reference is a reference implementation of a Rust VMM based on rust-vmm crates. This is currently used for testing the integration of rust-vmm components, with plans of extending it such that it becomes a starting point for custom Rust VMMs.

The vmm-reference currently has support for x86_64 and proof-of-concept level support for aarch64. On aarch64, it just supports booting a dummy VM with no devices, while on x86_64 it has support for the virtio-net and virtio-blk devices. The purpose of this project is to extend the existing functionality getting it closer to what is already available on x86_64, and consume the readily available crates (for example vm-allocator) that would make the integration easier.

Goals:

  • Set up interrupt controller.
  • Add a real-time clock device.
  • Add a serial port.
  • Add flattened device tree (FDT) so guest has a machine description.

(See below for a full list of tasks)

Links:

Details:

  • Project size: 350 hours
  • Difficulty: intermediate to advanced
  • Required skills: Rust programming
  • Desired skills: Python programming for integration tests
  • Mentors: Andreea Florescu <fandree@amazon.com>, Sergey Glushchenko <gsserge@amazon.com> (rust-vmm Slack chat)

Automated maintenance and checking using clang-query, clang-tidy and libclang

Summary: Convert QEMU's code analysis tools to clang-query, clang-tidy, and libclang

Currently QEMU is using a handwritten Perl script (scripts/checkpatch.pl) taken from the Linux kernel to check that patches obey the QEMU coding standard. In addition, the Coccinelle semantic diff tool is used periodically to do maintenance tasks, such as replacing idioms that are less safe or harder-to-read with better equivalent code.

This project will look into converting these checks and scripts to use clang-based tools such as clang-query, clang-tidy and libclang. For example, the matching part of the exec_rw_const.cocci script:

@@
expression E1, E2, E3;
@@
(
- cpu_physical_memory_rw(E1, E2, E3, false)
+ cpu_physical_memory_read(E1, E2, E3)
|
- cpu_physical_memory_rw(E1, E2, E3, true)
+ cpu_physical_memory_write(E1, E2, E3)
)

could be rewritten to use the following query:

match callExpr(hasDeclaration(functionDecl(hasName("cpu_physical_memory_rw"))),
               hasArgument(3, integerLiteral().bind("write")))

and a diagnostic could then be implemented using clang-tidy.

The project will cover developing matchers for common "checkpatch.pl" checks and Coccinelle scripts, and integration in the build system and/or CI.

The project can be expanded to 350 hours by adding some of the following:

  • coding style checks (e.g. spacing) using clang tools
  • auto fixing of reported errors

Links:

Details:

  • Project size: 175 hours
  • Required skills: C and C++ programming
  • Optional skills: Python programming (if Python bindings for libclang are used)
  • Desirable skills: knowledge of basic parsing and compilation techniques and terminology
  • Mentor: Paolo Bonzini <pbonzini@redhat.com>

Implement -M nitro-enclave in QEMU

Summary: AWS EC2 provides the ability to create an isolated sibling VM context from within a VM. This project implements the machine model and input data format parsing needed to run these sibling VMs stand alone in QEMU.

Nitro Enclaves are the first widely adopted implementation of hypervisor-assisted compute isolation. Similar to technologies like SGX, it allows spawning a separate context that is inaccessible to the parent operating system. This is implemented by "giving up" resources of the parent VM (CPU cores, memory) to the hypervisor, which then spawns a second VMM to execute a completely separate virtual machine. That new VM only has a vsock communication channel to the parent and has a built-in lightweight TPM.

One big challenge with Nitro Enclaves is that, due to their roots in security, there are very few debugging and introspection capabilities. That makes OS bringup, debugging, and bootstrapping very difficult. Having a local development and test environment that looks like an Enclave, but is 100% controlled by the developer and introspectable, would make life a lot easier for everyone working on them. It may also pave the way to see Nitro Enclaves adopted in VM environments outside of EC2.

This project will consist of adding a new machine model to QEMU that mimics a Nitro Enclave environment, including the lightweight TPM and the vsock communication channel, and building firmware that loads the special "EIF" file format, which contains the kernel, initramfs, and metadata, from a -kernel image.

Tasks:

  • Implement a device model for the TPM device (link to spec or driver code below)
  • Implement a new machine model
  • Implement firmware for the new machine model that implements EIF parsing
  • Add tests for the TPM device
  • Add integration test for the machine model executing an actual EIF payload

Links:

Details:

  • Skill level: intermediate - advanced (some understanding of QEMU machine modeling would be good)
  • Language: C
  • Mentor: tbd, agraf will find a mentor
  • Suggested by: Alexander Graf (OFTC: agraf, Email: graf@amazon.com)

How to add a project idea

  1. Create a new wiki page under "Internships/ProjectIdeas/YourIdea" and follow #Project idea template.
  2. Add a link from this page like this: {{:Internships/ProjectIdeas/YourIdea}}

Example idea from a previous year: Internships/ProjectIdeas/I2CPassthrough

Project idea template

=== TITLE ===
 
 '''Summary:''' Short description of the project
 
 Detailed description of the project.
 
 '''Links:'''
 * Wiki links to relevant material
 * External links to mailing lists or web sites
 
 '''Details:'''
 * Skill level: beginner or intermediate or advanced
 * Language: C
 * Mentor: Email address and IRC nick
 * Suggested by: Person who suggested the idea

How to propose a custom project idea

Applicants are welcome to propose their own project ideas. The process is as follows:

  1. Email your project idea to qemu-devel@nongnu.org. CC Stefan Hajnoczi <stefanha@gmail.com> and regular QEMU contributors who you think might be interested in mentoring.
  2. If a mentor is willing to take on the project idea, work with them to fill out the "Project idea template" above and email Stefan Hajnoczi <stefanha@gmail.com>.
  3. Stefan will add the project idea to the wiki.

Note that other candidates can apply for newly added project ideas. This ensures that custom project ideas are fair and open.

How to get familiar with our software

See what people are developing and talking about on the mailing lists:

Grab the source code or browse it:

Build QEMU and run it: QEMU on Linux Hosts

Links

Information for mentors

Mentors are responsible for keeping in touch with their intern and assessing progress. GSoC has evaluations where both the mentor and intern assess each other.

The mentor typically gives advice, reviews the intern's code, and has regular communication with the intern to ensure progress is being made.

Being a mentor is a significant time commitment; plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the intern's experience.

The mentor chooses their intern by reviewing application forms and conducting IRC interviews with applicants. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right intern is critical so that both the mentor and the intern can have a successful experience.