Google Summer of Code 2016
Revision as of 07:40, 14 March 2016
Introduction
QEMU is participating in Google Summer of Code 2016. This page contains our ideas list and information for students and mentors. Google Summer of Code is a program that pays students for 12-week full-time remote work on open source projects from May to August 2016!
Are you eligible for Outreachy? QEMU also participates in Outreachy, a program designed to increase participation of underrepresented groups in open source. We encourage students to apply both to GSoC and Outreachy if they are eligible. The project ideas list for Outreachy is available here.
Application Process
After contacting the mentor to discuss the project idea you should fill out the application form at [1]. The form asks for a problem description and outline of how you intend to implement a solution. You will need to do some background research (looking at source code, browsing relevant specifications, etc) in order to form an idea of how to tackle the project. The form asks for an initial 12-week project schedule which you should create by breaking down the project into tasks and estimating how long they will take. The schedule can be adjusted during the summer so don't worry about getting everything right ahead of time.
Candidates may be invited to an IRC interview with the mentor. The interview consists of a 30 minute C coding exercise, followed by technical discussion and a chance to ask questions you have about the project idea, QEMU, and GSoC. The coding exercise is designed to show fluency in C programming.
Here is a C coding exercise we have used in previous years when interviewing students: 2014 coding exercise
Try it and see if you are comfortable enough writing C. We cannot answer questions about the previous coding exercise but hopefully it should be self-explanatory.
If you find the exercise challenging, think about applying to other organizations where you have a stronger technical background and will be more competitive compared with other candidates.
Find Us
- IRC (GSoC specific): #qemu-gsoc on irc.oftc.net
- IRC (development):
- QEMU: #qemu on irc.oftc.net
- libvirt: #virt on irc.oftc.net
- KVM: #kvm on chat.freenode.net
- Mailing lists:
- QEMU: qemu-devel
- libvirt: libvir-list
- KVM: linux-kvm
Please contact the mentor for the project idea you are interested in. IRC is usually the quickest way to get an answer.
For general questions about QEMU in GSoC, please contact the following people:
- Stefan Hajnoczi (stefanha on IRC)
How to get familiar with our software
See what people are developing and talking about on the mailing lists:
Grab the source code or browse it:
Build QEMU and run it: QEMU on Linux Hosts
Important links
Project Ideas
This is the listing of suggested project ideas. Students are free to suggest their own projects; see "How to propose a custom project idea" below.
QEMU projects
AF_VSOCK packet capture in Linux and Wireshark
Summary: Develop an AF_VSOCK packet capture Linux device driver and Wireshark dissector
Wireshark and Linux's packet capture functionality support more than just Ethernet traffic dumping. USB device traffic and netlink software communication can also be captured.
The AF_VSOCK address family is currently not supported by Wireshark because there is no Linux kernel device driver for packet capture. AF_VSOCK is the socket address family that is used by the virtio-vsock host/guest communication device that is currently in development. The aim of this project is to first implement a Linux device driver for AF_VSOCK packet capture and then a Wireshark dissector. Minor changes to tcpdump may be necessary too.
This will allow tcpdump and Wireshark to dump host/guest communication with virtio-vsock (and maybe also VMware VMSockets). Traffic capture is an essential debugging tool for network communication and has not been available to programs using AF_VSOCK.
This project is challenging because you need to work on multiple codebases. You must have experience with device driver development and network programming.
Links:
- How AF_NETLINK does packet capture: nlmon.c
- Wireshark dissector docs (I'm not a Wireshark expert, some research may be necessary)
- virtio-vsock: Zero-configuration host/guest communication (pdf) presentation on virtio-vsock
Details:
- Skill level: advanced
- Language: C
- Mentor: Stefan Hajnoczi <stefanha@redhat.com> (stefanha on IRC)
qemu-img fuzzing using afl-fuzz
Summary: Apply the afl-fuzz fuzz testing tool to qemu-img and submit patches fixing bugs discovered with afl-fuzz.
The qemu-img tool is used to convert between disk image file formats and inspect image files. It supports multiple file formats including qcow2, vmdk, vhdx, and parallels. Since this tool is often used on untrusted inputs (e.g. in a cloud or hosting environment where end-users can upload disk image files), it must not allow arbitrary code execution or other classes of security bugs.
afl-fuzz instruments the program to record codepaths taken for each input test file. This allows afl-fuzz to mutate inputs and choose the ones that explore new codepaths. The amount of prior knowledge that afl-fuzz needs about the input grammar is limited since it learns how inputs affect the codepath. This makes it possible to fuzz various disk image file formats without painstakingly writing grammars for each file format.
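The feedback loop described above can be illustrated with a toy model. The following is only a sketch of coverage-guided mutation in general, not of how afl-fuzz is implemented: `toy_parser`, the made-up `MAGIC` bytes, and the single-byte `mutate` strategy are all assumptions chosen for illustration, and the "coverage" is simply a set of branch names the fake parser reports.

```python
import random

MAGIC = b"IMG"  # made-up magic bytes, not a real image format


def toy_parser(data):
    """Hypothetical target: return the set of branch IDs the input reaches."""
    branches = {"entry"}
    for i, expected in enumerate(MAGIC):
        if len(data) <= i or data[i] != expected:
            break
        branches.add("magic%d" % i)
    return branches


def mutate(data, rng):
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed, rounds, rng):
    """Keep only mutated inputs that reach a branch not seen before."""
    corpus = [seed]
    seen = set(toy_parser(seed))
    for _ in range(rounds):
        candidate = mutate(rng.choice(corpus), rng)
        new = toy_parser(candidate) - seen
        if new:
            corpus.append(candidate)
            seen |= new
    return corpus, seen


corpus, seen = fuzz(b"\x00\x00\x00\x00", rounds=20000, rng=random.Random(0))
print(len(corpus), sorted(seen))
```

Because an input that matches one more magic byte is kept and mutated further, the loop can work its way through the header check byte by byte without being told the format, which is the property that makes this approach attractive for image formats.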
In Outreach Program for Women 2014, a qcow2-specific fuzzing tool was developed in Python and several bugs were discovered. This project aims to tackle the other file formats (especially vmdk, vhdx, and parallels).
This project is suitable for candidates interested in software security, software testing, compilers, and disk image file formats.
Links:
- afl-fuzz
- VMDK file format
- VHDX file format (docx)
- Existing qcow2 fuzzer in qemu.git: source code
Details:
- Skill level: intermediate
- Language: C
- Mentor: Stefan Hajnoczi <stefanha@redhat.com> (stefanha on IRC)
- Suggested by: Stefan Hajnoczi
qemu-img new subcommand "dd"
Summary: Add "qemu-img dd" subcommand.
dd(1) is a convenient tool to work on binary files, while qemu-img(1) has the knowledge of many image formats (qcow2, vhdx, vdi, vmdk, etc.) and protocols (nfs, iscsi, gluster, ssh, etc.). If we put them together, we'll have the power of dd to work on various virtual images, or even pipe it through any host side utilities, such as grep(1), xxd(1) or xz(1). The idea is to implement a new subcommand in qemu-img, the tool provided by QEMU for manipulating virtual images.
Currently qemu-img has the following subcommands:
Command syntax:
  check [-q] [-f fmt] [--output=ofmt] [-r [leaks | all]] [-T src_cache] filename
  create [-q] [-f fmt] [-o options] filename [size]
  commit [-q] [-f fmt] [-t cache] filename
  compare [-f fmt] [-F fmt] [-T src_cache] [-p] [-q] [-s] filename1 filename2
  convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-T src_cache] [-O output_fmt] [-o options] [-s snapshot_id_or_name] [-l snapshot_param] [-S sparse_size] filename [filename2 [...]] output_filename
  info [-f fmt] [--output=ofmt] [--backing-chain] filename
  map [-f fmt] [--output=ofmt] filename
  snapshot [-q] [-l | -a snapshot | -c snapshot | -d snapshot] filename
  rebase [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] -b backing_file [-F backing_fmt] filename
  resize [-q] filename [+ | -]size
  amend [-q] [-f fmt] [-t cache] -o options filename
You will extend the subcommand set with the new "dd" command, in a syntax that is familiar to *nix "dd" users.
Note that we don't have to mirror the behavior of GNU coreutils' or BSD systems' dd(1), or try to support every operand found there. A subset of operands (and probably some qemu-img specific ones) as chosen by you will be implemented. It is also your responsibility to write documentation for the new command and options.
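To give a feel for what "a syntax that is familiar to dd users" involves, here is a rough sketch of parsing `key=value` operands. It is written in Python only for brevity; the real subcommand would be C code inside qemu-img. The chosen subset (`if`, `of`, `bs`, `count`, `skip`), the size suffixes, and the function names are assumptions for illustration, not a proposed design.

```python
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed binary multipliers


def parse_size(text):
    """Parse '512', '4k' or '1M' into a byte count."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def parse_dd_operands(args):
    """Turn ['if=a.qcow2', 'bs=4k', ...] into a dict, rejecting unknown operands."""
    # Hypothetical subset of operands; bs defaults to 512 bytes as in dd(1).
    opts = {"if": None, "of": None, "bs": 512, "count": None, "skip": 0}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or key not in opts:
            raise ValueError("unsupported operand: %r" % arg)
        opts[key] = value if key in ("if", "of") else parse_size(value)
    return opts


print(parse_dd_operands(["if=disk.qcow2", "of=out.raw", "bs=4k", "count=8"]))
```

Rejecting unknown operands explicitly, rather than ignoring them, keeps the supported subset visible to users who expect full dd(1) behavior.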
Links
Details:
- Skill level: beginner
- Language: C
- Mentor: Fam Zheng <famz@redhat.com>, fam on IRC
qtest-os: a mini operating system written in Python
Summary: Write a Python library to interact with QEMU's qtest, and then as much as possible of a "mini-OS" written in Python
QEMU uses "qtest" as a mechanism for tests to interact with devices. qtest unit tests are currently written in C, using the GTest framework from glib and glue libraries called "libqtest" and "libqos". libqtest implements the qtest socket protocol, while libqos provides utility functions to deal with e.g. guest memory allocation and PCI devices. However, the functionality of libqos is limited, and using a high-level language like Python will make it easier to prototype and build more complex functionality in libqos.
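As a starting point, a Python counterpart to libqtest could be as small as the sketch below. It assumes the line-based qtest text protocol in which a command such as `inb ADDR` is answered with a line starting with `OK`; the class name and methods are hypothetical, how the socket gets connected to QEMU is left out, and asynchronous messages such as interrupt notifications are not handled.

```python
import socket


class QTestClient:
    """Tiny client for the line-based qtest protocol (sketch, no error recovery)."""

    def __init__(self, sock):
        # `sock` is an already connected stream socket to QEMU's qtest channel.
        self.sock = sock
        self.reader = sock.makefile("r", encoding="ascii", newline="\n")

    def command(self, line):
        """Send one command and return the fields following 'OK'."""
        self.sock.sendall((line + "\n").encode("ascii"))
        fields = self.reader.readline().split()
        if not fields or fields[0] != "OK":
            raise RuntimeError("qtest command failed: %r" % line)
        return fields[1:]

    def outb(self, addr, value):
        """Write one byte to an I/O port."""
        self.command("outb 0x%x 0x%x" % (addr, value))

    def inb(self, addr):
        """Read one byte from an I/O port."""
        return int(self.command("inb 0x%x" % addr)[0], 0)
```

A test could then read a port with `QTestClient(sock).inb(0x70)`; higher-level helpers in the style of libqos (memory allocation, PCI access) would be layered on top of `command`.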
This project will investigate using qtest from Python, including:
- writing a Python library with the same functionality as libqtest
- converting some of the existing tests from C to Python
- using the existing Python bindings to ACPICA (the ACPI reference implementation) to write ACPI unit tests
- extending qtest with a driver model ("qtest-os").
The last bullet splits a unit test in three parts: a description of QEMU's supported machine types, a set of drivers, and the unit test code proper. For example, given:
- a SCSI unit test
- a description of the machine type X saying that X has a PCI bus
- a driver for X's PCI host bridge
- a driver for virtio-scsi
qtest would infer that the unit test can run by starting X with a virtio-scsi device.
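The inference in this example can be sketched as a lookup over machine and driver descriptions. Everything here is illustrative: the `MACHINES` and `DRIVERS` tables, the `needs`/`provides` fields, and the `plan` function are invented names, and the search only handles a single level of dependency, whereas a real driver model would need a proper graph search.

```python
# All names below are illustrative; none of them exist in QEMU today.
MACHINES = {"X": {"pci-host-x"}}  # machine type -> built-in devices
DRIVERS = {
    "pci-host-x": {"needs": None, "provides": "pci"},
    "virtio-scsi": {"needs": "pci", "provides": "scsi"},
}


def plan(test_needs, machine):
    """Return the extra devices to add to `machine` so that the interface
    `test_needs` becomes available, or None if no known driver can provide it."""
    available = {DRIVERS[d]["provides"] for d in MACHINES[machine] if d in DRIVERS}
    if test_needs in available:
        return []
    for name, drv in DRIVERS.items():
        if drv["provides"] == test_needs and drv["needs"] in available:
            return [name]
    return None


print(plan("scsi", "X"))
```

For the SCSI test on machine X, the lookup finds that virtio-scsi provides SCSI and needs PCI, which X's host bridge supplies, so the test can run with one extra device.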
Links:
Details:
- Skill level: medium
- Language: Python
- Mentor: Paolo Bonzini <pbonzini@redhat.com> (bonzini on IRC)
Postcopy migration: Recovery from a broken network connection
Summary: Improve the postcopy migration mode so it can cope with a network failure during the migration.
Postcopy migration is a scheme that is good at live migrating large VMs that rapidly change memory, but if the network connection fails during the postcopy phase you're left with an inconsistent VM. I had some ideas how to fix this by putting both VMs into a paused state and then hunting for the missing pages (see the Links).
Links: https://www.mail-archive.com/qemu-devel@nongnu.org/msg344360.html
Details:
- Skill level: medium/advanced
- Language: C
- Mentor: Dave Gilbert <dgilbert@redhat.com> (davidgiluk on IRC)
Multi-threaded TCG Projects
Summary: The MTTCG Project is an ongoing project to convert the TCG engine from its current single threaded approach to something that will take advantage of all cores on a modern processor. With this conversion, things that were true in the old world may not be true now, especially on non-x86 backends.
Runtime memory ordering refers to the guarantees about ordering of load and store operations the processor makes to the program. On x86 (which is strongly ordered) all loads and stores appear sequentially consistent. On other architectures there are often specialised instructions, for example ARMv8 has LDAR (Load Acquire) and STLR (Store Release), which indicate what assumptions can be made.
A naive solution would be to introduce new TCGOps to represent these barriers and emit explicit barriers on the backend when needed. A more complex solution could then merge barrier operations with the next load/store operation to generate more efficient code. It's expected that the pathological case would be supporting x86 guests on weak model backends, as every load and store will need a full memory barrier. This may make MTTCG for x86 on ${OTHER ARCH} pointless.
Further Reading:
- The kernel has a detailed guide to memory barriers
- These kvm-unit-tests [2] and [3] have been written to exercise barrier code
- Ulrich Drepper's paper What Every Programmer Should Know About Memory is a very detailed description of modern memory systems including ordering problems
Requirements: Working on this will require the student to develop a good understanding of microarchitectures and the ability to read architectural manuals to glean correct behaviour of operations. An understanding of compiler theory or previous knowledge of the TCG would also be beneficial to this work. Finally, as the MTTCG code is not itself upstream yet, familiarity with Git and the ability to frequently rebase work on a moving target would be useful.
Details:
- Skill level: advanced
- Language: C
- Mentor: Alex Bennee <alex.bennee@linaro.org> (stsquad on IRC)
Event loop profiling tool
Summary: Develop a top(1)-like tool to monitor event loop dispatching
A running QEMU process can have a number of different types of threads. An I/O thread (either the main thread, or a custom iothread for dataplane devices) is a thread that runs a poll-based event loop.
The event loop dispatches I/O events that come from user interface (e.g. monitor fd), guest OS (e.g. ioeventfd), or program's internal sources (e.g. bottom halves or timers). Their occupation of host CPU time is often very useful debug/diagnostic information. Ideally the profiling code in QEMU would be in a dedicated thread so it is still usable even when the event loops are stuck.
In this project you will develop a tool for QEMU that is like the top(1) utility for Linux, to monitor QEMU's event loops. As a prerequisite, you need to modify QEMU to expose necessary data that will be collected by the new tool to generate the profiling output.
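The display side of such a tool mostly comes down to aggregating per-source busy time over a sampling interval and sorting by share, as top(1) does for processes. The sketch below assumes QEMU could export samples as `(source, busy_ns)` pairs; that data format, the source names, and the `summarize` function are hypothetical, since designing what QEMU exposes is part of the project.

```python
from collections import defaultdict


def summarize(samples, interval_ns):
    """Aggregate (source, busy_ns) samples into (source, percent) rows,
    sorted by share of the interval, busiest first."""
    busy = defaultdict(int)
    for source, ns in samples:
        busy[source] += ns
    rows = [(name, 100.0 * ns / interval_ns) for name, ns in busy.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up samples for one 100 ms interval of a single event loop.
samples = [("timer", 2_000_000), ("ioeventfd", 30_000_000),
           ("monitor fd", 500_000), ("ioeventfd", 20_000_000)]
for name, percent in summarize(samples, interval_ns=100_000_000):
    print("%-12s %5.1f%%" % (name, percent))
```

A curses front end would redraw these rows once per interval, one table per event loop.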
You must be familiar with (n)curses library and multi-threaded programming. You can write the tool in either C or Python.
Links:
Details:
- Skill level: advanced
- Language: C, (optional) Python
- Mentor: Fam Zheng <famz@redhat.com>, fam on IRC
Qemu usb-mtp emulation
Summary: Make usb-mtp a reliable host/guest file sharing medium
USB Media Transfer Protocol is a ubiquitous way of transferring digital media, especially with portable devices such as smartphones, tablets etc. It is an interesting approach because unlike other methods, the device exposing the feature has full control on file operations thereby ensuring data integrity.
Qemu can emulate a USB MTP server which can enable guests to have an easy way of sharing files with the host. All modern operating systems come with MTP clients to make this a plug and play experience.
This project aims to add missing features to Qemu's usb-mtp emulation and make it robust and stable for everyday use.
One of the most important missing features is write support: guests currently have read-only access to the MTP share. It's envisioned that write support would be one of the main deliverables of this project. Adding it involves propagating changes to existing and new MTP Objects back to the server. Write support also involves quotas so that clients cannot indiscriminately write to the share and fill up the entire volume.
There are other potentially interesting features that can be added such as metadata support, PTP backwards compatibility (Mac OS ?) etc
As mentioned above, another important aspect of this project is to make the emulation stable. Currently, usb-mtp has been primarily tested with Linux guests. We should test other guest operating systems and fix issues as they come up.
Experience with USB, MTP, virtual file systems, and to some extent the Qemu device layer will be helpful but not required. That said, this project is a good way for interested students to delve deeper into these areas.
Links:
Details:
- Skill level: advanced
- Language: C
- Mentor: Bandan Das <bsd@redhat.com> (bsd on IRC)
- Mentor: Gerd Hoffmann <kraxel@redhat.com> (kraxel on IRC)
Qemu AMD IO MMU emulation
Summary: Interrupt remapping/Improvements to Qemu AMD IO MMU emulation
The I/O Memory Management Unit (IO MMU) is a system function that translates addresses used in DMA transactions, protects memory from disallowed access by I/O devices, and remaps peripheral interrupts.
A project to add AMD IO MMU emulation to Qemu has been ongoing for a while. The first aim of this project should be to get the current patches merged into Qemu (if they're not merged by then). In addition, we should implement interrupt remapping for the user-space irqchip mode in a similar way to what is currently done with Intel VT-d. These two should be the main aims of the project, but we should also:
- experiment with caching the root page table pointer
- polish the event logging abilities of the IO MMU: as much information as possible relating to an event should be encoded into the event log data, some events should be written to the hardware event reporting registers, some events are not reported, etc.
- implement interrupts related to r/w1c IO MMU control register bits, starting from modifying ACPI tables if necessary
- look into bugs in address translation whereby the IO MMU sometimes receives host physical addresses instead of guest physical addresses from the Qemu DMA engine
- implement Accessed and Dirty page bits
If time allows, IO MMU event counters could be added, starting with modifying ACPI tables to encode the right MMIO/IO MMU event counter capabilities, reserving an MMIO region with the right alignment, reporting the event counter configuration to the IO MMU through MMIO, and finally counting events.
Links:
Details:
- Skill level: intermediate to advanced.
- Language: C
- Mentor: Jan Kiszka <jan.kiszka@web.de>
- Mentor: Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
- Suggested by: David Kiarie
Jailhouse Projects
The Jailhouse Linux-based partitioning hypervisor (https://github.com/siemens/jailhouse) is also associated with the QEMU/KVM communities so we act as an umbrella organization.
Enhanced Config Generator
Summary: Goal of this project is to improve the Jailhouse config generator so that it can provide non-root cell configurations and validate them against a system setup.
Jailhouse is a Linux-based static partitioning hypervisor. It runs both on real hardware as well as inside QEMU/KVM, primarily for testing and evaluation purposes. Jailhouse is configured via two types of configuration files: system configurations that describe both the hypervisor as well as the so-called root cell which is the Linux that booted the system. Additional partitions, non-root cells, are described via cell configurations. While we already have a generator tool for the former, written in Python, creating non-root cells remains a manual, error-prone task so far.
In this project, the config generator shall be extended to derive non-root cell configs for a specific system. Inputs will be the corresponding system configuration and cell parameters the user has to provide such as target CPU set, memory size and a set of devices. The command could look like this:
jailhouse config cell SYSCONFIG CELLCONFIG { -c | --cpus } CPUSET { -m | --mem } MEMSIZE [-p | --pci PCIDEVICE]
Moreover, we need better analysis for configuration sets. The user should be able to perform basic consistency checks on a given set of system and cell configurations such as
- Do all configuration records conform to the format requirements (alignment rules, only valid flags, consistent internal references like from PCI devices to capabilities etc.)?
- Are resources assigned multiple times?
- Do all resources exist in the system (validate against what Linux reports)?
- ...
The command for this could look like the following:
jailhouse config validate [-r | --root ROOT] SYSCONFIG [CELLCONFIG] ...
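One of the checks listed above, "are resources assigned multiple times?", can be sketched for memory regions as follows. The input format (a dict mapping a cell name to `(start, size)` tuples) and the function name are assumptions for illustration and do not reflect the actual Jailhouse configuration format; a real validator would also need to account for any regions that are shared between cells on purpose.

```python
def overlapping_regions(cells):
    """Return (cell_a, cell_b, overlap_start, overlap_end) tuples for memory
    regions of different cells whose physical address ranges overlap."""
    flat = sorted(
        (start, start + size, cell)
        for cell, regions in cells.items()
        for start, size in regions
    )
    clashes = []
    for i, (start_a, end_a, cell_a) in enumerate(flat):
        for start_b, end_b, cell_b in flat[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start: nothing later can overlap region a
            if cell_a != cell_b:
                clashes.append((cell_a, cell_b, start_b, min(end_a, end_b)))
    return clashes


# Hypothetical cells: "rtos" overlaps the first region of "linux".
cells = {
    "linux": [(0x0000, 0x1000), (0x4000, 0x1000)],
    "rtos": [(0x0800, 0x1000)],
}
for cell_a, cell_b, start, end in overlapping_regions(cells):
    print("%s and %s both claim 0x%x-0x%x" % (cell_a, cell_b, start, end))
```

The same sort-and-scan pattern would apply to other range-like resources such as I/O ports, while CPUs and PCI devices could be checked with plain set intersections.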
The results of this project may eventually also help to improve the manageability of Jailhouse via libvirt (Jailhouse patches for libvirt).
Links:
Details:
- Skill level: intermediate
- Language: Python, C
- Mentor: jan.kiszka@web.de
- Suggested by: Jan Kiszka
Libvirt projects
This year libvirt is trying to get accepted as a separate organization. There's a wiki page similar to this on libvirt wiki: Libvirt GSoC page.
However, if an idea that involves work in both qemu and libvirt appears, it should be listed on both lists.
Project idea template
=== TITLE ===
'''Summary:''' Short description of the project

Detailed description of the project.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C
* Mentor: Email address and IRC nick
* Suggested by: Person who suggested the idea
How to propose a custom project idea
Applicants are welcome to propose their own project ideas. The process is as follows:
- Email your project idea to qemu-devel@nongnu.org. CC Stefan Hajnoczi <stefanha@gmail.com> and regular QEMU contributors who you think might be interested in mentoring.
- If a mentor is willing to take on the project idea, work with them to fill out the "Project idea template" above and email Stefan Hajnoczi <stefanha@gmail.com>.
- Stefan will add the project idea to the wiki.
Note that other candidates can apply for newly added project ideas. This ensures that custom project ideas are fair and open.
Information for mentors
Mentors are responsible for keeping in touch with their student and assessing the student's progress. GSoC has a mid-term evaluation and a final evaluation where both the mentor and student assess each other.
The mentor typically gives advice, reviews the student's code, and has regular communication with the student to ensure progress is being made.
Being a mentor is a significant time commitment; plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the student's experience.
The mentor chooses their student by reviewing student application forms and conducting IRC interviews with candidates. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right student is critical so that both the mentor and the student can have a successful experience.