Google Summer of Code 2020
Introduction
QEMU is applying to Google Summer of Code 2020. This page contains our ideas list and information for students and mentors. Google Summer of Code is an open source internship program for university students offering 12-week, full-time, paid remote work from May to August.
Applicants: You are welcome to think about project ideas and familiarize yourself with QEMU, but please don't invest too much time at this early stage. Google will announce participating organizations on February 20.
Application Process
1. Discuss the project idea with the mentor(s)
Read the project ideas list and choose one you are interested in. Read the links in the project idea description and start thinking about how you would approach this. Ask yourself:
- Do I have the necessary technical skills to complete this project in 12 weeks?
- Will I be able to work independently without the physical presence of my mentor?
If you answer no to either of these questions, choose another project idea and/or organization that fits your abilities better.
Once you have identified a suitable project idea, email the mentor(s) your questions about the idea and explain your understanding of the project idea to them to verify that you are on track.
2. Fill out the application form
The application form asks for a problem description and an outline of how you intend to implement a solution. You will need to do some background research (looking at source code, browsing relevant specifications, etc.) in order to form an idea of how to tackle the project. The form asks for an initial 12-week project schedule, which you should create by breaking down the project into tasks and estimating how long they will take. The schedule can be adjusted during the summer, so don't worry about getting everything right ahead of time.
3. IRC interview including a coding exercise
You may be invited to an IRC interview. The interview consists of a 30-minute coding exercise, followed by technical discussion and a chance to ask questions you have about the project idea, QEMU, and GSoC. The coding exercise is designed to show fluency in the programming language for your project idea (QEMU projects are typically in C but could also be in Python or Rust).
Here is a C coding exercise we have used in previous years when interviewing students: 2014 coding exercise
Try it and see if you are comfortable enough writing C. We cannot answer questions about the previous coding exercise but hopefully it should be self-explanatory.
If you find the exercise challenging, think about applying to other organizations where you have a stronger technical background and will be more competitive compared with other candidates.
Key Dates
From the timeline
- March 16 - 31, 2020 - Student Applications
- April 27, 2020 - Student Projects Announced
Find Us
- IRC (GSoC specific): #qemu-gsoc on irc.oftc.net
- IRC (development):
- QEMU: #qemu on irc.oftc.net
- KVM: #kvm on chat.freenode.net
- Mailing lists:
- QEMU: qemu-devel
- KVM: linux-kvm
For general questions about QEMU in GSoC, please contact the following people:
- Stefan Hajnoczi <stefanha@gmail.com> (stefanha on IRC)
Project Ideas
This is the list of suggested project ideas. Students are free to suggest their own projects; see "How to propose a custom project idea" below.
Device Emulation
NVMe Emulation Performance Optimization
Summary: QEMU's NVMe emulation uses the traditional trap-and-emulate method to emulate I/Os, so performance suffers due to frequent VM exits. Version 1.3 of the NVMe specification defines a new feature to update doorbell registers using a Shadow Doorbell Buffer. This can be used to improve the performance of emulated controllers by reducing the number of Submission Queue Tail Doorbell writes.
Furthermore, it is possible to run emulation in a dedicated thread called an IOThread. Emulating NVMe in a separate thread allows the vcpu thread to continue execution and results in better performance.
Finally, it is possible for the emulation code to watch for changes to the queue memory instead of waiting for doorbell writes. This technique is called polling and reduces notification latency at the expense of another thread consuming CPU to detect queue activity.
The goal of this project is to implement these optimizations so QEMU's NVMe emulation performance becomes comparable to virtio-blk performance.
Tasks include:
- Add Shadow Doorbell Buffer support to reduce doorbell writes
- Add Submission Queue Tail Doorbell register ioeventfd support when the Shadow Doorbell Buffer is enabled (see existing patch linked below)
- Add Submission Queue polling
- Add IOThread support so emulation can run in a dedicated thread
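The doorbell-reduction idea behind the first two tasks can be illustrated with a toy model. The sketch below is Python rather than QEMU C code, and every name in it (`need_event`, `ToyQueue`) is hypothetical. The comparison in `need_event` is modelled on the wrap-safe event-index check used by the Linux NVMe driver; treating indices as free-running 16-bit counters is a simplifying assumption.

```python
MASK = 0xFFFF  # indices are treated as 16-bit counters (simplifying assumption)


def need_event(event_idx, new_idx, old_idx):
    """Return True if the driver still has to write the MMIO doorbell.

    True when the tail moved past event_idx while going from old_idx to new_idx.
    """
    return ((new_idx - event_idx - 1) & MASK) < ((new_idx - old_idx) & MASK)


class ToyQueue:
    """Hypothetical submission queue with a shadow tail and an event index."""

    def __init__(self):
        self.shadow_tail = 0  # written by the driver in shared memory
        self.event_idx = 0    # written by the device in shared memory
        self.mmio_writes = 0  # doorbell writes that would each cause a VM exit

    def driver_submit(self, count=1):
        old = self.shadow_tail
        new = (old + count) & MASK
        self.shadow_tail = new  # cheap memory write, no trap
        if need_event(self.event_idx, new, old):
            self.mmio_writes += 1  # expensive trapped doorbell write

    def device_poll(self):
        # The device catches up to the shadow tail and asks to be notified
        # once the tail moves past this point.
        self.event_idx = self.shadow_tail


queue = ToyQueue()
for _ in range(8):
    queue.driver_submit()  # the device has not polled in between
print(queue.mmio_writes)
```

In this model only the first of the eight submissions needs a trapped doorbell write; the rest are visible to a polling device through the shadow tail alone.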
Links:
- https://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf
- http://ucare.cs.uchicago.edu/pdf/fast18-femu.pdf
- https://github.com/ucare-uchicago/femu
- https://vmsplice.net/~stefan/stefanha-kvm-forum-2017.pdf
- https://patchwork.kernel.org/patch/10259575/
- https://lore.kernel.org/qemu-devel/1447825624-17011-1-git-send-email-mlin@kernel.org/T/#u
Details:
- Project size: 350 hours
- Difficulty: intermediate to advanced
- Required skills: C programming
- Desirable skills: knowledge of the NVMe PCI specification, knowledge of device driver or emulator development
- Mentor: Klaus Jensen <its@irrelevant.dk> (kjensen on IRC), Keith Busch <kbusch@kernel.org>
- Suggested by: Huaicheng Li <huaicheng@cs.uchicago.edu>, Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC)
BusLogic SCSI adapter emulation
Summary: Port the BusLogic SCSI adapter from VirtualBox to QEMU
QEMU does not emulate the BusLogic BT-958 SCSI adapter. Virtual machines created by VirtualBox may only include the BusLogic driver and therefore be unable to boot under QEMU.
This project aims to support the BusLogic BT-958 adapter in QEMU. VirtualBox code may be used as a reference. No hardware documentation is available; however, the Linux driver may be used to recover the details of the adapter's behavior.
This project will expose you to device emulation and how SCSI Host Bus Adapters (HBAs) work. You will learn in detail how drivers perform disk I/O with the BusLogic BT-958 adapter. Previous experience with device driver development or device emulation will be helpful but is not necessary.
Links:
- Implementation in VirtualBox
- Prior (incomplete) implementation in QEMU
- Linux driver manual
- Linux driver implementation
Details:
- Skill level: advanced
- Language: C
- Mentor: Denis Dmitriev <Denis.Dmitriev@ispras.ru>, Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
- Suggested by: Pavel Dovgalyuk
TCG Plugin Cache Modelling
Summary: Implement a simple cache modelling plugin for QEMU's TCG plugins.
QEMU's TCG emulation has traditionally avoided complex modelling of the processor in favor of running fast. However, with the recent introduction of TCG plugins, we can put some simple cache modelling into a plugin that can be optionally loaded when we want to examine how a program works. With such a plugin we could identify areas of code in either a linux-user program or a whole system that may not be cache optimal. The aim would be to write a plugin that allows you to simply model different icache/dcache configurations rather than actually simulate the micro-architecture of a CPU.
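To give a feel for the kind of model meant here, the following is a minimal sketch of a set-associative cache with LRU replacement. It is written in Python for brevity and does not use the TCG plugin API; the class name, parameters, and address trace are all hypothetical. A real plugin would feed it the addresses observed at instruction and memory-access callbacks.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache model that only tracks hits and misses (no data, no timing)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access and return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return False


# Made-up address trace: 0, 512 and 1024 all map to the same set.
cache = SetAssociativeCache(size_bytes=1024, line_bytes=64, ways=2)
for addr in (0, 8, 512, 1024, 0):
    cache.access(addr)
print(cache.hits, cache.misses)
```

Changing `size_bytes`, `line_bytes`, or `ways` is enough to compare configurations, which is the level of detail the project description asks for.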
Links:
- See Features/TCGPlugins
- See also the docs
- Example integration of Dinero IV Cache Simulator with an out-of-tree plugin solution
Details:
- Skill level: intermediate with a good understanding of a processor's instruction and data caches
- Language: C, Python
- Mentor: Alex Bennée (alex.bennee@linaro.org)
- Suggested by: Alex Bennée
Block layer improvements
Anonymization of virtual disk images
Summary: Extend the qemu-img utility to drop all data from the virtual disk while preserving image metadata.
Virtual disk images like QCOW2, VHDX, or VMDK files may reach a bad state during their lifecycle and require debugging. This happens on the side of cloud or hosting providers, and these images contain end-user data rather than the provider's own data. European cloud providers now treat this data under the GDPR privacy regulation, so these image files cannot easily be sent to developers for investigation.
The idea of this project is to drop all end-user data from images, including data blocks, memory inside internal snapshots, etc. On the other hand, all metadata of the original image should be preserved bit for bit, including the so-called "in-use" bit and internal metadata state. This will allow problematic image files to be debugged without transmitting the privacy-sensitive data contents of the disk image files.
The task is to implement a "qemu-img anonymize" command for the QCOW2 file format and also add support for the VHDX and VMDK file formats if time permits. This new command will not only help meet GDPR regulations but also make support more convenient for users because anonymized disk image files compress much better.
This project will allow you to learn about how disk image file formats work. You will become familiar with the internals of the QCOW2 file format and how data is laid out on disk.
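As a first look at the metadata that must survive anonymization, the sketch below decodes the fixed fields at the start of a qcow2 header as laid out in the qcow2 specification. It is a Python illustration, not part of qemu-img; the function name is hypothetical and the sample header is synthetic, with made-up values.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian layout of the first 72 bytes of a qcow2 header.
HEADER_FORMAT = ">4sIQIIQIIQQIIQ"
HEADER_FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters", "nb_snapshots",
    "snapshots_offset",
)


def parse_qcow2_header(data):
    """Decode the fixed header fields from the start of an image."""
    length = struct.calcsize(HEADER_FORMAT)
    if len(data) < length:
        raise ValueError("data is too short to hold a qcow2 header")
    values = struct.unpack(HEADER_FORMAT, data[:length])
    header = dict(zip(HEADER_FIELDS, values))
    if header["magic"] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return header


# Synthetic header for illustration only; the values are made up.
sample = struct.pack(
    HEADER_FORMAT, QCOW2_MAGIC, 3, 0, 0, 16, 1 << 30, 0, 2,
    0x30000, 0x10000, 1, 0, 0,
)
header = parse_qcow2_header(sample)
print(header["cluster_bits"], hex(header["l1_table_offset"]))
```

The L1 table, refcount table, and snapshot table offsets found here point at the structures an anonymizer would have to keep intact while discarding the guest data clusters they reference.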
Links:
- qemu-img utility source code
- qcow2 image file format specification
- Python script to anonymize the old QED file format
- General Data Protection Regulation
Details:
- Skill level: intermediate
- Language: C
- Mentor: Denis V. Lunev <den@openvz.org>
- Suggested by: Denis V. Lunev <den@openvz.org>
Measure and Analyze QEMU Performance
TCG Continuous Benchmarking
Summary: The nature of this project lies more in exploration, analysis and presentation than in coding. The performance of a software product will be examined in great detail. The software under examination will be the QEMU emulator: across its modes, across its components, and across time.
QEMU may operate in so-called user mode, where an executable built for one processor (the target, in QEMU parlance) is executed by means of QEMU emulation on a system with another processor (the host), and in system mode, where a whole system of one kind (target) is emulated on a system of another kind (host). These two modes will be examined separately:
TASK
PART I: (user mode)
- select around a dozen test programs (resembling components of the SPEC benchmark, but all must be open source, preferably with a license compatible with QEMU's); those test programs should be distributed like this: 4-5 FPU CPU-intensive, 4-5 non-FPU CPU-intensive, 1-2 I/O-intensive;
- measure execution time and other performance data of all selected test programs across all targets on Intel and possibly other hosts, for the latest QEMU version:
- try to improve performance if there is an obvious bottleneck;
- develop tests that protect against future performance regressions;
- provide automated nightly tests that notify QEMU developers when performance changes.
- measure execution time of all selected test programs for selected targets for all QEMU versions in the last 5 years (there are approximately 15 such versions):
- confirm performance improvements and/or detect performance degradation.
- summarize all results in a comprehensive form, including graphics/data visualization.
PART II: (system mode)
- measure execution time and other performance data for boot/shutdown cycle for selected machines for the latest QEMU version:
- try to improve performance if there is an obvious bottleneck.
- summarize all results in a comprehensive form.
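The measurement steps in both parts come down to running a command repeatedly and recording how long it takes. The sketch below shows one way that could be automated in Python. The helper name is hypothetical, and the workload is a placeholder: the Python interpreter stands in for a QEMU invocation, since the actual command lines are not specified here.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Run argv several times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            argv,
            check=True,  # fail loudly if the workload itself fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder workload: replace with the command under test.
samples = time_command([sys.executable, "-c", "pass"], repeats=3)
print(f"median {statistics.median(samples):.4f}s over {len(samples)} runs")
```

Reporting the median of several runs rather than a single run makes comparisons across QEMU versions less sensitive to noise on the host.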
DELIVERABLES
1) Each target maintainer will be given a list of the top 25 functions by host CPU time spent, for each benchmark described in the previous section. Additional information and observations will also be provided if they are judged useful and relevant.
2) Each maintainer of a machine that has a successful boot/shutdown cycle will be given a list of the top 25 functions by host time spent during the boot/shutdown cycle. Additional information and observations may also be provided.
3) The community will be given all devised performance measurement methods in the form of easily reproducible step-by-step setup and execution procedures.
Deliverables should be distributed gradually over a period of around two months.
Details:
- Skill level: intermediate
- Languages:
- C (for code analysis, performance improvements)
- Python (for automation)
- potentially JavaScript (d3.js or similar library; for data visualization)
- Mentor: Aleksandar Markovic (aleksandar.markovic@rt-rk.com)
- Suggested by: Aleksandar Markovic
How to add a project idea
- Create a new wiki page under "Internships/ProjectIdeas/YourIdea" and follow the "Project idea template" below.
- Add a link from this page like this: {{:Internships/ProjectIdeas/YourIdea}}
Example idea from a previous year: Internships/ProjectIdeas/I2CPassthrough
Project idea template
=== TITLE ===
'''Summary:''' Short description of the project

Detailed description of the project.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C
* Mentor: Email address and IRC nick
* Suggested by: Person who suggested the idea
How to propose a custom project idea
Applicants are welcome to propose their own project ideas. The process is as follows:
- Email your project idea to qemu-devel@nongnu.org. CC Stefan Hajnoczi <stefanha@gmail.com> and regular QEMU contributors who you think might be interested in mentoring.
- If a mentor is willing to take on the project idea, work with them to fill out the "Project idea template" above and email Stefan Hajnoczi <stefanha@gmail.com>.
- Stefan will add the project idea to the wiki.
Note that other candidates can apply for newly added project ideas. This ensures that custom project ideas are fair and open.
How to get familiar with our software
See what people are developing and talking about on the mailing lists:
Grab the source code or browse it:
Build QEMU and run it: QEMU on Linux Hosts
Information for mentors
Mentors are responsible for keeping in touch with their student and assessing the student's progress. GSoC has a mid-term evaluation and a final evaluation where both the mentor and student assess each other.
The mentor typically gives advice, reviews the student's code, and has regular communication with the student to ensure progress is being made.
Being a mentor is a significant time commitment; plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the student's experience.
The mentor chooses their student by reviewing student application forms and conducting IRC interviews with candidates. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right student is critical so that both the mentor and the student can have a successful experience.