Google Summer of Code 2019
Revision as of 18:17, 29 January 2019
Introduction
QEMU is applying to Google Summer of Code 2019. This page contains our ideas list and information for students and mentors. Google Summer of Code is an open source internship program for university students offering 12-week, full-time, paid remote work from May to August!
Applicants: You are welcome to think about project ideas and familiarize yourself with QEMU, but please don't invest too much time at this early stage. Google will announce participating organizations on February 26.
Application Process
After contacting the mentor to discuss the project idea you should fill out the application form at [1]. The form asks for a problem description and outline of how you intend to implement a solution. You will need to do some background research (looking at source code, browsing relevant specifications, etc) in order to form an idea of how to tackle the project. The form asks for an initial 12-week project schedule which you should create by breaking down the project into tasks and estimating how long they will take. The schedule can be adjusted during the summer so don't worry about getting everything right ahead of time.
Candidates may be invited to an IRC interview with the mentor. The interview consists of a 30-minute C coding exercise, followed by technical discussion and a chance to ask questions you have about the project idea, QEMU, and GSoC. The coding exercise is designed to show fluency in C programming.
Here is a C coding exercise we have used in previous years when interviewing students: 2014 coding exercise
Try it and see if you are comfortable enough writing C. We cannot answer questions about the previous coding exercise but hopefully it should be self-explanatory.
If you find the exercise challenging, think about applying to other organizations where you have a stronger technical background and will be more competitive compared with other candidates.
Find Us
- IRC (GSoC specific): #qemu-gsoc on irc.oftc.net
- IRC (development):
- QEMU: #qemu on irc.oftc.net
- KVM: #kvm on chat.freenode.net
- Mailing lists:
- QEMU: qemu-devel
- KVM: linux-kvm
Please contact the mentor for the project idea you are interested in. IRC is usually the quickest way to get an answer.
For general questions about QEMU in GSoC, please contact the following people:
- Stefan Hajnoczi (stefanha on IRC)
Project Ideas
This is the listing of suggested project ideas. Students are free to suggest their own projects, see #How to propose a custom project idea below.
I2C Passthrough
Summary: Implement I2C device passthrough on Linux hosts so that a single board computer like a Raspberry Pi can be used to develop applications for microcontrollers like the micro:bit.
QEMU emulates I2C devices in software but currently cannot pass through real I2C devices from the host to the guest. It would be useful to access real I2C devices from inside the guest, for example for developers writing and testing software under QEMU on their computer. This would be used like -device i2c-passthrough,device=/dev/i2c-N,hostaddr=0x48,address=0x49 (another possibility is to implement Linux I2C device support as a character device, like -chardev linux-i2c,address=0x48,device=/dev/i2c-N,id=i2c-chardev -device i2c-passthrough,chardev=i2c-chardev,address=0x49).
A very basic implementation of I2C passthrough can be very simple (150-200 lines of code perhaps) using the read/write interface to /dev/i2c-N. However, read/write is very limited and cannot drive all I2C devices because it does not support repeated start conditions. The current I2C API in QEMU is also very limited. Therefore there are many possible extensions:
- implementing I2C passthrough as an SMBusDevice instead of I2CSlave. SMBusDevice provides a higher-level interface than I2CSlave, with support for repeated start conditions and block transfers. (Question: how would the SMBusDevice API translate to /dev/i2c-N ioctls?)
- improving SMBusDevice support in QEMU: currently the SMBusDevice->I2CSlave adaptor in hw/i2c/smbus_slave.c only implements a subset of SMBus, in particular it can only read a single byte at a time from the device. Your task could be to improve the QEMU I2CBus API so that, at least for some I2C bus implementations, more functionality of SMBusDevice is accessible via I2CBus.
- Currently I2C is entirely synchronous. It would be nice to support asynchronous communication between the I2C master and the chardev, where the i2c-passthrough can hold down the I2C clock until linux-i2c has data ready. On the emulation side, this would require clock stretching support in i2c-passthrough.
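The limitation of plain read/write mentioned above can be illustrated with the Linux I2C_RDWR ioctl, which sends several messages in one transaction with a repeated start between them. The following is a minimal sketch, not a proposed QEMU implementation: the helper names build_reg_read and i2c_reg_read are hypothetical, and it assumes a device that takes a one-byte register index followed by a read.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/i2c.h>      /* struct i2c_msg, I2C_M_RD */
#include <linux/i2c-dev.h>  /* I2C_RDWR, struct i2c_rdwr_ioctl_data */

/*
 * Fill a two-message transaction: write one register index, then read
 * `len` bytes. Both messages go into a single I2C_RDWR call, so the
 * kernel issues a repeated start instead of a stop between them.
 * (Hypothetical helper, for illustration only.)
 */
void build_reg_read(struct i2c_msg msgs[2], uint16_t addr,
                    uint8_t *reg, uint8_t *buf, uint16_t len)
{
    msgs[0].addr = addr;
    msgs[0].flags = 0;          /* write */
    msgs[0].len = 1;
    msgs[0].buf = reg;

    msgs[1].addr = addr;
    msgs[1].flags = I2C_M_RD;   /* read */
    msgs[1].len = len;
    msgs[1].buf = buf;
}

/* Returns 0 on success, -1 on failure (errno is set by ioctl). */
int i2c_reg_read(int fd, uint16_t addr, uint8_t reg,
                 uint8_t *buf, uint16_t len)
{
    struct i2c_msg msgs[2];
    struct i2c_rdwr_ioctl_data xfer = { .msgs = msgs, .nmsgs = 2 };

    build_reg_read(msgs, addr, &reg, buf, len);
    return ioctl(fd, I2C_RDWR, &xfer) < 0 ? -1 : 0;
}
```

Here fd would be a file descriptor obtained by opening /dev/i2c-N, and addr the host-side device address (0x48 in the command lines above).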
If you are interested in device emulation especially, you could also implement an I2C adapter from scratch (e.g. the micro:bit TWI controller emulation on the nRF51 system-on-chip) if there's time.
This project will allow you to learn about the I2C bus and how to write device emulation code in QEMU. You will enjoy it if you like working with physical hardware.
Links:
- I2C wikipedia page
- Overview of Linux I2C programming interfaces
- Linux I2C userspace interface documentation
Details:
- Skill level: advanced (the project is not very difficult, but it is larger than usual)
- Language: C
- Mentor: Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
- Special requirements: you should already own a relatively powerful single board computer (for example a Raspberry Pi 3 or a Le Potato) and one or more I2C devices (for example real-time clocks, temperature sensors, Wii Nunchuks) and basic tools for connecting them such as a soldering iron and 0.1" jumper cables and header pins. It is important that you have these before the coding period begins; if you don't, take into account shipping and customs time, and/or talk to the mentors.
qgraph support for device tree discovery
Summary: Enable tests for the qgraph driver framework to discover devices using a device tree, instead of having to specify the devices on the machine manually.
qgraph, developed during GSoC 2018 (merging in progress), implements an introspectable description of QEMU's supported machine types and a set of drivers that tests can use to start a guest and interact with its devices. Currently, PCI is the only discoverable bus implemented in qgraph; any non-PCI device must be listed manually in the machine type. The purpose of this project is to add support to qgraph for reading a device tree and selecting tests based on the OpenFirmware device names found in the device tree. The device tree can be provided by the virtual machine, or it can be read from an external file.
If the task turns out to be too short to fill the whole internship, the remaining time can be filled by writing new tests for devices that are usually exposed as part of a device tree.
Links:
Details:
- Skill level: intermediate
- Language: C
- Mentor: Paolo Bonzini <pbonzini@redhat.com>, Laurent Vivier <lvivier@redhat.com>
- Suggested by: Paolo Bonzini <pbonzini@redhat.com> (IRC nick bonzini)
virtio-net oss-fuzz support
Status: Alexander Oleinik is working on this project for GSoC.
Summary: Integrate oss-fuzz into QEMU so that the virtio-net device can be fuzz tested.
oss-fuzz offers a fuzz testing service to open source projects. This means random inputs are continuously tested against the program in order to find crashes and other bugs. Fuzz testing complements hand-written test suites by exploring the input space of a program and therefore the code paths that may be taken.
The goal of this project is to experiment with integrating oss-fuzz into the virtio-net device that QEMU emulates for guest networking. virtio-net-pci is a PCI device connected to the virtual machine's PCI bus and has a configuration space that can be programmed by the guest. The device itself is specified by the VIRTIO specification which describes the functionality of the device. Bugs could potentially exist at both the PCI and VIRTIO levels, so it's important to fuzz both of them. Care should be taken to pick a design that could be generalized for all virtio devices, e.g. virtio-blk.
Fuzzing emulated devices involves accessing their hardware registers randomly to make the device respond. QEMU has a device testing interface called "qtest" that accepts read/write and other commands over a socket and is ideal for writing device-level tests. You may find that oss-fuzz works better integrated directly into the QEMU program instead of as a separate qtest program, so you can consider adding a new command-line option to QEMU for running in oss-fuzz mode.
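One way to picture the qtest-based approach is a small decoder that turns raw fuzzer bytes into qtest text commands. The sketch below is only an illustration under stated assumptions: the function name fuzz_to_qtest_cmd and the 9-byte input layout are made up for this example, and only the qtest commands readl ADDR and writel ADDR VALUE are taken from the qtest protocol.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode 4 bytes as a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical input layout: 1 selector byte, 4 offset bytes,
 * 4 value bytes. The offset is folded into a register window of
 * `size` bytes starting at `base` and aligned to 4 bytes.
 * Returns 0 on success, -1 if the input or output buffer is too small.
 */
int fuzz_to_qtest_cmd(const uint8_t *in, size_t len,
                      uint64_t base, uint32_t size,
                      char *out, size_t outsz)
{
    uint32_t off, val;
    int n;

    if (len < 9 || size < 4) {
        return -1;
    }
    off = (le32(in + 1) % (size - 3)) & ~3u;
    val = le32(in + 5);

    if (in[0] & 1) {
        n = snprintf(out, outsz, "writel 0x%" PRIx64 " 0x%" PRIx32 "\n",
                     base + off, val);
    } else {
        n = snprintf(out, outsz, "readl 0x%" PRIx64 "\n", base + off);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Keeping every access inside the device's register window means the fuzzer spends its time in device emulation code rather than on unmapped addresses. An in-process fuzzer would call the memory access functions directly instead of formatting text.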
This project involves learning about VIRTIO and PCI devices, as well as figuring out how to integrate oss-fuzz into QEMU so that it can effectively explore the code paths in virtio-net device emulation code. You will enjoy this project if you want to learn how device emulation works and are interested in fuzzers.
The project will primarily be in three phases. The first phase involves understanding the ecosystem: QEMU, qtest, oss-fuzz, LLVM, etc. The second phase involves using the qtest framework, or integrating equivalent functionality into QEMU, to fuzz the virtio-net device registers. The third phase will involve running our fuzzing framework, analyzing results and identifying bugs.
Links:
Details:
- Skill level: intermediate
- Language: C
- Mentor: Bandan Das <bsd@redhat.com> ("bsd" on #qemu IRC), Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on #qemu IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on #qemu IRC)
Memory hotplug test
Summary: Create memory hotplug testing infrastructure.
Several QEMU targets support memory hotplug, which allows extra memory to be (un)provisioned without requiring a virtual machine reboot or restart for the changes to take effect. However, QEMU lacks regression tests for the feature. The goal of the project is to refactor QEMU's legacy "-m" option parsing into machine object properties, which should make it possible to inspect memory settings using the QMP interface, and to implement test cases on top of that.
The project is to be split into the following parts:
- Refactor "-m mem,slots,maxmem" into MachineClass properties and replace related global variables, where they are still used, with properties. Make "set_memory_options()" use the new properties, leaving only CLI parsing there. The goal is to make "set_memory_options()" a thin wrapper that takes care of CLI parsing and makes the CLI option '-m' an alias to the new -machine mem/mem-slots/maxmem options.
- Create a set of basic tests for 'make check' to verify that -m/-machine CLI parsing works as expected and, based on that, a set of memory hot-add tests using QMP introspection to verify the expected behaviour.
- If time allows, create guest-host memory hotplug ABI tests for x86/sPAPR/s390x targets.
Links:
- $(QEMU sources)/docs/memory-hotplug.txt
Details:
- Skill level: intermediate
- Language: C
- Mentor: Igor Mammedov <imammedo@redhat.com> ("imammedo" on IRC), David Hildenbrand <david@redhat.com> ("dhildenb" on IRC)
Measure Tiny Code Generation Quality
Status: Vanderson M. do Rosario <vandersonmr2@gmail.com> (vanderson on #qemu IRC) is working on this project for 2019 GSoC.
Mentor: Alex Bennée <alex.bennee@linaro.org> (stsquad on #qemu IRC)
Project Github: vandersonmr/gsoc-qemu [2]
Summary: in most applications, the majority of the execution time is spent in a very small portion of the code. Regions of code that execute with high frequency are called hot, while all other regions are called cold. As a direct consequence, emulators also spend most of their execution time emulating these hot regions, so dynamic compilers and translators need to pay extra attention to them. Making sure that these hot regions are compiled/translated into high-quality code is fundamental to achieving high-performance emulation. Thus, one of the most important steps in tuning an emulator's performance is to identify the hot regions and to measure their translation quality.
TBStatistics (TBS)
Improving the code generation of the TCG backend is a hard task that involves reading through large amounts of text looking for anomalies in the generated code. It would be nice to have tools to more readily extract and parse code generation information. This would include options to dump:
- The hottest Translation Blocks (TBs) and their execution count (total and atomic).
- Translation counters:
- The number of times a TB has been translated, uncached and spanned.
- The amount of time spent doing such translations and optimizations.
- Code quality metrics:
- The number of TB guest, IR (TCG ops), and host instructions.
- The number of spills during register allocation.
For that reason, we collect all this information dynamically for every TB or for a specific set of TBs and store it in TBStatistics structures. Every TB can have one TBStatistics linked to it by a new field inserted in the TranslationBlock structure[3]. Moreover, TBStatistics are not flushed during tb_flush: they outlive their TBs and can be relinked to retranslated TBs by using their keys (phys_pc, pc, flags, cs_base) to match new TBs with their TBStats.
Not every statistic in a TBStatistics needs to be collected. The stats_enabled field selects which statistics are collected for a particular TBStatistics. The possible flags can be found in qemu-common.h and they are: TB_NOTHING, TB_EXEC_STATS (to collect execution stats), TB_JIT_STATS (to collect code and translation stats), TB_JIT_TIME (to collect time stats) and TB_PAUSED.
    struct TBStatistics {
        tb_page_addr_t phys_pc;
        target_ulong pc;
        uint32_t flags;
        /* cs_base isn't included in the hash but we do check for matches */
        target_ulong cs_base;
        uint32_t stats_enabled;
        /* Execution stats */
        struct {
            unsigned long normal;
            unsigned long atomic;
            /* filled only when dumping x% cover set */
            uint16_t coverage;
        } executions;
        struct {
            unsigned num_guest_inst;
            unsigned num_tcg_ops;
            unsigned num_tcg_ops_opt;
            unsigned spills;
            /* CONFIG_PROFILE */
            unsigned temps;
            unsigned deleted_ops;
            unsigned in_len;
            unsigned out_len;
            unsigned search_out_len;
        } code;
        struct {
            unsigned long total;
            unsigned long spanning;
        } translations;
        struct {
            int64_t restore;
            uint64_t restore_count;
            int64_t interm;
            int64_t code;
            int64_t opt;
            int64_t la;
        } time;
        /* HMP information - used for referring to previous search */
        int display_id;
        /* current TB linked to this TBStatistics */
        TranslationBlock *tb;
    };
Creating and Storing TBStats
When a TB is going to be generated by calling tb_gen_code, we check if tbstats collection is enabled and if the current TB is not uncached. If so, we call tb_get_stats which either creates a new TBStatistics for that specific pc, phys_pc and flag or returns an already existing one (stored in a qht hash table in TBContext). Then, we get the default flags set through HMP or -d tb_stats argument. This default value will be used to set what stats are going to be collected for that TB.
    /*
     * We want to fetch the stats structure before we start code
     * generation so we can count interesting things about this
     * generation.
     */
    if (tb_stats_collection_enabled() && !(tb->cflags & CF_NOCACHE)) {
        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
        uint32_t flag = get_default_tbstats_flag();
        if (qemu_log_in_addr_range(tb->pc)) {
            if (flag & TB_EXEC_STATS) {
                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;
            }
        }
        if (flag & TB_JIT_STATS) {
            tb->tb_stats->stats_enabled |= TB_JIT_STATS;
            atomic_inc(&tb->tb_stats->translations.total);
        }
        if (flag & TB_JIT_TIME) {
            tb->tb_stats->stats_enabled |= TB_JIT_TIME;
            ti = profile_getclock();
        }
    } else {
        tb->tb_stats = NULL;
    }
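The get-or-create behaviour of tb_get_stats can be pictured with a small stand-alone table. This is a simplified sketch, not the QEMU code: QEMU uses its qht hash table, whereas the Stats type, stats_get function and fixed-size chained table below are hypothetical stand-ins. Like the real structure, the hash uses phys_pc, pc and flags, while cs_base is only compared.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for TBStatistics. */
typedef struct Stats {
    uint64_t phys_pc, pc, cs_base;
    uint32_t flags;
    unsigned long translations;
    struct Stats *next;         /* bucket chain */
} Stats;

#define NBUCKETS 64
static Stats *buckets[NBUCKETS];

/* cs_base is deliberately left out of the hash and only compared. */
static unsigned bucket_of(uint64_t phys_pc, uint64_t pc, uint32_t flags)
{
    uint64_t h = (phys_pc * 0x9E3779B97F4A7C15ull) ^ pc ^ flags;
    return (unsigned)(h % NBUCKETS);
}

/* Return the existing entry for this key, or create a zeroed one. */
Stats *stats_get(uint64_t phys_pc, uint64_t pc,
                 uint64_t cs_base, uint32_t flags)
{
    unsigned b = bucket_of(phys_pc, pc, flags);
    Stats *s;

    for (s = buckets[b]; s; s = s->next) {
        if (s->phys_pc == phys_pc && s->pc == pc &&
            s->flags == flags && s->cs_base == cs_base) {
            return s;
        }
    }
    s = calloc(1, sizeof(*s));
    if (!s) {
        return NULL;
    }
    s->phys_pc = phys_pc;
    s->pc = pc;
    s->cs_base = cs_base;
    s->flags = flags;
    s->next = buckets[b];
    buckets[b] = s;
    return s;
}
```

Because entries are never removed when translated code is discarded, a retranslation of the same region finds its earlier entry again and keeps accumulating into the same counters.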
Collecting/Filling TBStatistics Information
To collect the stats for the TBStatistics, different parts of the QEMU source code were changed. We list here how the information for each field is collected.
TB_EXEC_STATS
Normal Execution Count
To collect the execution count of each TB, we instrument the beginning of each one, adding a call to a helper function called inc_exec_freq. This instrumentation is done in the gen_tb_exec_count function. The inc_exec_freq helper receives the address of the TBStatistics structure linked to the TB.
include/exec/gen-icount.h:
    static inline void gen_tb_exec_count(TranslationBlock *tb)
    {
        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
            TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
            gen_helper_inc_exec_freq(ptr);
            tcg_temp_free_ptr(ptr);
        }
    }
The helper function accesses the executions.normal field of the TBStatistics structure (whose address was passed as a parameter) and increments it atomically, counting one more execution of the TB. This is done at the start of every execution of a TB.
accel/tcg/tcg-runtime.c:
    void HELPER(inc_exec_freq)(void *ptr)
    {
        TBStatistics *stats = (TBStatistics *) ptr;
        g_assert(stats);
        atomic_inc(&stats->executions.normal);
    }
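Outside QEMU, the same counting pattern can be tried with C11 atomics. The sketch below is an assumption-laden stand-in: ExecCounters, count_normal_execution, count_atomic_execution and executions_total are hypothetical names, and atomic_fetch_add_explicit from <stdatomic.h> takes the place of QEMU's own atomic_inc macro.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the executions part of TBStatistics. */
typedef struct {
    atomic_ulong normal_count;
    atomic_ulong atomic_count;
} ExecCounters;

/*
 * Called at the start of every normal execution of a block. The
 * argument is an untyped pointer because generated code can only
 * pass plain values to a helper.
 */
void count_normal_execution(void *ptr)
{
    ExecCounters *c = ptr;

    /* Only the final count matters, so relaxed ordering is enough. */
    atomic_fetch_add_explicit(&c->normal_count, 1, memory_order_relaxed);
}

/* Called when the block is run in a single-step atomic context. */
void count_atomic_execution(ExecCounters *c)
{
    atomic_fetch_add_explicit(&c->atomic_count, 1, memory_order_relaxed);
}

/* Total executions, as used when ranking blocks by hotness. */
unsigned long executions_total(ExecCounters *c)
{
    return atomic_load(&c->normal_count) + atomic_load(&c->atomic_count);
}
```

An atomic increment is needed because several vCPU threads may execute the same translated block concurrently.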
Atomic Execution Count
TBs can also be executed atomically and we count how many times this is done by incrementing the executions.atomic field every time cpu_exec_step_atomic is called.
    void cpu_exec_step_atomic(CPUState *cpu)
    ....
        if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
            tb->tb_stats->executions.atomic++;
        }
    ....
Coverage
The coverage field of a TB measures the proportion of all executed guest instructions that this TB accounts for. We calculate the coverage of each TB before dumping any statistics by calling a function named calculate_last_search_coverages. This function counts the total number of guest instructions executed and then divides the number of guest instructions executed by each TB (its execution count times its guest instruction count) by that total. The result is stored in hundredths of a percent.
    static uint64_t calculate_last_search_coverages(void)
    {
        uint64_t total_exec_count = 0;
        GList *i;
        /* Compute total execution count for all tbs */
        for (i = last_search; i; i = i->next) {
            TBStatistics *tbs = (TBStatistics *) i->data;
            total_exec_count +=
                (tbs->executions.atomic + tbs->executions.normal)
                * tbs->code.num_guest_inst;
        }
        for (i = last_search; i; i = i->next) {
            TBStatistics *tbs = (TBStatistics *) i->data;
            uint64_t tb_total_execs =
                (tbs->executions.atomic + tbs->executions.normal)
                * tbs->code.num_guest_inst;
            tbs->executions.coverage =
                (10000 * tb_total_execs) / (total_exec_count + 1);
        }
        return total_exec_count;
    }
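As a worked example of the formula, the per-TB part can be isolated into a small function. This is a sketch for illustration: coverage_bp is a hypothetical name, and the sample inputs are taken from the "info coverset" dump shown later on this page, assuming its reported total is the same total_exec_count.

```c
#include <stdint.h>

/*
 * Coverage of one TB in hundredths of a percent (1128 means 11.28%).
 * The +1 in the divisor mirrors the function above and avoids a
 * division by zero when nothing has executed yet.
 */
uint16_t coverage_bp(uint64_t normal, uint64_t atomic,
                     uint64_t guest_insts, uint64_t total_exec_count)
{
    uint64_t tb_total = (normal + atomic) * guest_insts;

    return (uint16_t)((10000 * tb_total) / (total_exec_count + 1));
}
```

With 5202686 executions of a 3-instruction TB out of 138346727 executed guest instructions, the function yields 1128, which matches the 11.28% printed for that TB.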
TB_JIT_STATS
Spanning and Total Translations
We count the number of translations of a region by incrementing its TBStatistics translations.total every time tb_gen_code is called. TBStatistics survive flushes and can be recovered and linked to a new translation of a region using its phys_pc, pc, cs_base and flags.
We also count how many times a TB spans two pages by atomically incrementing the translations.spanning field when the span is detected.
accel/tcg/translate-all.c:
    TranslationBlock *tb_gen_code(CPUState *cpu,
                                  target_ulong pc, target_ulong cs_base,
                                  uint32_t flags, int cflags)
    ...
        if (tb_stats_collection_enabled()) {
            tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
            ...
            if (flag & TB_JIT_STATS) {
                tb->tb_stats->stats_enabled |= TB_JIT_STATS;
                atomic_inc(&tb->tb_stats->translations.total);
            }
            ...
        }
    ...
        if ((pc & TARGET_PAGE_MASK) != virt_page2) {
            phys_page2 = get_page_addr_code(env, virt_page2);
            if (tb_stats_enabled(tb, TB_JIT_STATS)) {
                atomic_inc(&tb->tb_stats->translations.spanning);
            }
        }
    ...
Guest Instructions
The guest instructions are already counted in the DisasContextBase (db->num_insns) in the translator_loop, so we simply add it to the code.num_guest_inst.
accel/tcg/translator.c:
    void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
                         CPUState *cpu, TranslationBlock *tb, int max_insns)
    ....
        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
            atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
        }
    ....
Host Code Size
QEMU does not have any specific way of counting the number of host instructions. Moreover, we realized that implementing such a function would require modifying a lot of target-specific code, which would be infeasible for this GSoC. So, instead, we record the size in bytes of the generated host code, which is already calculated in the tb_gen_code function.
accel/tcg/translate-all.c:
    TranslationBlock *tb_gen_code(CPUState *cpu,
                                  target_ulong pc, target_ulong cs_base,
                                  uint32_t flags, int cflags)
    ...
        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
            atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
        }
    ...
TCG Instructions
To count the number of TCG instructions, we only need to iterate over the ops in the TCGContext and then store the result. We do this twice, before and after applying the optimizations, filling code.num_tcg_ops and code.num_tcg_ops_opt respectively.
tcg/tcg.c:
    int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
    ....
        if (tb_stats_enabled(tb, TB_JIT_STATS)) {
            int n = 0;
            QTAILQ_FOREACH(op, &s->ops, link) {
                n++;
            }
            atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
        }
    ....
Spills
We increment the code.spills counter in the function temp_sync, after a store is generated to handle a register spill.
    static void temp_sync(...)
    ...
        case TEMP_VAL_REG:
            tcg_out_st(s, ts->type, ts->reg,
                       ts->mem_base->reg, ts->mem_offset);
            /* Count number of spills */
            if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
                atomic_inc(&s->current_tb->tb_stats->code.spills);
            }
            break;
    ...
TB_JIT_TIME and Old CONFIG_PROFILER Stats
The other fields in TBStatistics, including the ones related to time measurement, all come from an older, global profiling scheme in QEMU. That profiling was controlled by a CONFIG_PROFILER flag that had to be set at compile time. We decided that all statistics should now be concentrated in the TBStatistics structure, since there was no reason to maintain two profiling mechanisms. Moreover, instead of collecting these statistics globally we now collect them per TB, which gives more information while still making it possible to reconstruct the global profile by summing up all TB stats.
In each region of code where there was an #ifdef CONFIG_PROFILER followed by updates of global statistics, we changed the code to update the TB stats instead, as the following example shows:
    -#ifdef CONFIG_PROFILER
    -    atomic_set(&prof->restore_time,
    -               prof->restore_time + profile_getclock() - ti);
    -    atomic_set(&prof->restore_count, prof->restore_count + 1);
    -#endif
    +    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
    +        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
    +        atomic_inc(&tb->tb_stats->time.restore_count);
    +    }
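Reconstructing the old global numbers from per-TB data is a plain sum over all entries. The following is a minimal sketch of that idea, not code from the patch series: TimeStats copies the field names of the time member of TBStatistics, and sum_time_stats is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as the `time` member of TBStatistics. */
typedef struct {
    int64_t restore;
    uint64_t restore_count;
    int64_t interm;
    int64_t code;
    int64_t opt;
    int64_t la;
} TimeStats;

/* Sum the per-TB time stats into one global profile. */
TimeStats sum_time_stats(const TimeStats *tbs, size_t n)
{
    TimeStats total = { 0 };

    for (size_t i = 0; i < n; i++) {
        total.restore += tbs[i].restore;
        total.restore_count += tbs[i].restore_count;
        total.interm += tbs[i].interm;
        total.code += tbs[i].code;
        total.opt += tbs[i].opt;
        total.la += tbs[i].la;
    }
    return total;
}
```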
Controlling and Dumping TBStatistics
There are two ways of controlling the collection of statistics: before execution, by using the -d tb_stats command line argument, or at run time, by using the tb_stats HMP command in the QEMU monitor. In both cases, you can choose the default level of profiling for the TBs (exec_stats, jit_stats, jit_time, all).
Further, the HMP tb_stats command can also be used to pause, stop or restart the profiling during execution in system mode.
To dump a list of TB stats, the HMP command "info tb-list" can be used. "info tb-list" takes as parameters the number of TBs to be dumped and the metric they should be sorted by: hotness, hg (host/guest), or spills. Here is an example of the output for three TBs sorted by hotness (execution frequency):
info tb-list
    TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
        | exec:4828932/0 guest inst cov:16.38%
        | trans:1 ints: g:3 op:82 op_opt:34 spills:3
        | h/g (host bytes / guest insts): 90.666664
        | time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
        | targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
    TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
        | exec:4825842/0 guest inst cov:21.82%
        | trans:1 ints: g:4 op:80 op_opt:38 spills:2
        | h/g (host bytes / guest insts): 84.000000
        | time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
        | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
    TB id:3 | phys:0xec1c1 virt:0x00000000000ec1c1 flags:0x0000b0
        | exec:872032/0 guest inst cov:1.97%
        | trans:1 ints: g:2 op:56 op_opt:26 spills:1
        | h/g (host bytes / guest insts): 68.000000
        | time to gen at 2.4GHz => code:1692.08(ns) IR:473.75(ns)
        | targets: 0x00000000000ec1c5 (id:4), 0x00000000000ec1cb (id:13)
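The ordering and the h/g column of such a listing can be reproduced with a few lines of C. This is a sketch under assumptions: TBRow, sort_by_hotness and host_per_guest are hypothetical names, and the host byte counts used in the usage note are inferred from the printed ratios rather than read from the dump.

```c
#include <stdlib.h>

/* Hypothetical, reduced view of one row of the listing. */
typedef struct {
    int id;
    unsigned long execs;     /* normal + atomic executions */
    unsigned guest_insts;    /* "g:" in the dump */
    unsigned host_bytes;     /* size of the generated host code */
} TBRow;

/* Comparator for qsort: most executed block first. */
static int by_hotness_desc(const void *a, const void *b)
{
    const TBRow *x = a, *y = b;

    return (y->execs > x->execs) - (y->execs < x->execs);
}

void sort_by_hotness(TBRow *rows, size_t n)
{
    qsort(rows, n, sizeof(*rows), by_hotness_desc);
}

/* Host bytes emitted per guest instruction (the "h/g" metric). */
float host_per_guest(const TBRow *r)
{
    return r->guest_insts ? (float)r->host_bytes / r->guest_insts : 0.0f;
}
```

For the first TB above, 3 guest instructions and a ratio of 90.666664 imply about 272 bytes of host code; for the second, 4 instructions at 84.0 imply 336 bytes.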
To examine one of the listed TBs in more detail, the info tb id command can be used, where the id comes from the TBs listed by "info tb-list". The following example shows the result of dumping the guest instructions of the TB with id 2.
info tb
    TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
        | exec:6956495/0 guest inst cov:21.82%
        | trans:2 ints: g:2 op:40 op_opt:19 spills:1
        | h/g (host bytes / guest insts): 84.000000
        | time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
        | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
    ----------------
    IN:
    0x00034d0d:  89 de             movl     %ebx, %esi
    0x00034d0f:  26 8b 0e          movl     %es:(%esi), %ecx
    0x00034d12:  26 f6 46 08 80    testb    $0x80, %es:8(%esi)
    0x00034d17:  75 3b             jne      0x34d54
    ------------------------------
In this example the guest instructions of the TB were dumped, but the level of code to dump can be selected with a second parameter to "info tb": out_asm dumps the host code of the TB, in_asm dumps the guest code, and op and op_opt dump the QEMU TCG IR for that TB.
info cfg
info tb only shows information about one TB, but info cfg id can be used to create a .dot file representing the Control Flow Graph (CFG) of the neighborhood of a TB. Each node in the graph shows the guest code of a TB and its execution frequency, and the color of each node represents its relative hotness in the neighborhood. The image below shows an example of such a CFG.
info coverset
Finally, there is an option to dump not the "n" hottest blocks but as many of the hottest blocks as are needed to reach m% of the total execution count (info coverset m). This is a useful metric for understanding the execution footprint of a program. Applications with denser execution need a smaller number of blocks to reach m% of the execution, while sparse ones need a larger number. Dynamic compilers normally perform better on applications that need only a small number of blocks to reach the 90% coverset.
    TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
        | exec:5202686/0 guest inst cov:11.28%
        | trans:1 ints: g:3 op:82 op_opt:34 spills:3
        | h/g (host bytes / guest insts): 90.666664
        | time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
        | targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)
    TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
        | exec:5199468/0 guest inst cov:15.03%
        | trans:1 ints: g:4 op:80 op_opt:38 spills:2
        | h/g (host bytes / guest insts): 84.000000
        | time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
        | targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)
    ------------------------------
    2 TBs to reach 25% of guest inst exec coverage
    Total of guest insts exec: 138346727
    ------------------------------
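The coverset selection can be sketched as "sort by hotness, then accumulate until the target share is reached". The code below is an illustration under assumptions, not the code behind the HMP command: Block and coverset are hypothetical names, and the ordering by execution count is inferred from the dump above, where the first TB has more executions but a lower coverage than the second.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced view of one block. */
typedef struct {
    uint64_t execs;         /* normal + atomic executions */
    uint64_t guest_insts;   /* guest instructions in the block */
} Block;

/* Comparator for qsort: most executed block first. */
static int by_execs_desc(const void *a, const void *b)
{
    const Block *x = a, *y = b;

    return (y->execs > x->execs) - (y->execs < x->execs);
}

/*
 * Sort blocks by execution count and return how many of the hottest
 * ones are needed to cover `percent`% of all executed guest
 * instructions. The array is reordered in place.
 */
size_t coverset(Block *blocks, size_t n, unsigned percent)
{
    uint64_t total = 0, covered = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total += blocks[i].execs * blocks[i].guest_insts;
    }
    qsort(blocks, n, sizeof(*blocks), by_execs_desc);

    for (i = 0; i < n; i++) {
        covered += blocks[i].execs * blocks[i].guest_insts;
        if (covered * 100 >= (uint64_t)percent * total) {
            return i + 1;
        }
    }
    return n;
}
```

A small return value for a high percentage indicates a dense execution footprint in the sense described above.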
Overheads
To understand the overhead of collecting these statistics, we ran some benchmarks with qemu-x86-64-linux-user in a baseline configuration and four variations:
- baseline: without any modification.
- tb_stats creation: with the creation of tb_stats enabled.
- jit stats: with the collection of JIT code and translation stats enabled.
- exec stats: with the collection of exec count enabled.
- all stats: with both exec count and JIT code/translation enabled.
All values below are the median of 10 executions and they are all slowdowns in relation to the baseline value.
The average slowdowns were: 0.7% for tb_stats creation, 1.7% for jit stats, 227% for exec stats and 231% for all.
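For reference, the aggregation described above (median of the runs, then slowdown against the baseline) can be sketched as follows. The function names are hypothetical, and the sketch assumes that a slowdown is defined as the relative increase in run time over the baseline, which the text does not state explicitly.

```c
#include <stdlib.h>

/* Comparator for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Median of n > 0 samples; sorts the array in place. */
double median(double *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/*
 * Assumed definition: relative increase over the baseline, in
 * percent, so 50.0 means the run took 1.5 times as long.
 */
double slowdown_percent(double baseline, double measured)
{
    return (measured / baseline - 1.0) * 100.0;
}
```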
TODO: collect and compare memory usage.
Linux Perf Integration
Another addition to QEMU that helps investigate its performance is the option to dump a jitdump file using the command line argument -perf. This jitdump file can be used by Linux Perf to enhance its report of JITed code. With this enhancement we can see the relative share of execution time of each TB together with its guest PC. Moreover, we can explore the host code of the TB and observe which host instructions took the most time during execution. Both examples can be seen in the following pictures:
Using perf with QEMU is as simple as running the following three commands:
    perf record -k 1 qemu-x86_64 -perf ./a.out
    perf inject -j -i perf.data -o perf.data.jitted
    perf report -i perf.data.jitted
Future Work
Future work would include:
- elide or beautify common blocks like softmmu access macros (which are always the same)
- develop a safe way of collecting the gen host instructions count.
- QEMU currently only works on translating simple basic blocks with one or two exit paths. This work could be a precursor to supporting Internships/ProjectIdeas/Multi-exit Hot Blocks in the future.
AVX
Status: Jan Bobek is working on this project for GSoC.
Summary: Support for AVX within TCG
QEMU's TCG just-in-time compiler translates target CPU instructions into host CPU instructions so that programs written for other CPU architectures can be run on any host. Modern CPUs feature vector processing instruction sets, sometimes called Single Instruction Multiple Data (SIMD) instructions, for performing the same operation on multiple elements of data in just one instruction. Intel's SSE and AVX instruction set extensions were introduced for x86 CPUs for this purpose.
The target/i386 front end has support for TCG emulation of SSE4.1, but does not have support for later vector extensions such as AVX. Your task is to implement and test AVX instructions that are currently missing in QEMU.
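To give a feel for what emulating such an instruction means, the sketch below computes the effect of a 256-bit packed single-precision add (the AVX instruction VADDPS on YMM registers) one lane at a time. It is an illustration only: Vec256 and emu_vaddps256 are hypothetical names, not QEMU helpers, and exception flags and rounding-mode handling are ignored.

```c
#include <stddef.h>

/* A 256-bit vector register viewed as 8 single-precision lanes. */
typedef struct {
    float lane[8];
} Vec256;

/*
 * Lane-by-lane emulation of a packed add: the same operation is
 * applied independently to each of the 8 elements.
 */
void emu_vaddps256(Vec256 *dst, const Vec256 *a, const Vec256 *b)
{
    for (size_t i = 0; i < 8; i++) {
        dst->lane[i] = a->lane[i] + b->lane[i];
    }
}
```

A real implementation can also map the operation onto host vector instructions instead of looping over lanes in C.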
Links:
- http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg06250.html
- https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
Details:
- Skill level: intermediate to advanced
- Language: C
- Mentor: Richard Henderson <richard.henderson@linaro.org> (rth on #qemu IRC)
- Suggested by: Nick Renieris
API documentation generation
Status: Gabriel Barreto is working on this project for GSoC.
Summary: Generation of API documentation from doc comments
QEMU currently has many functions documented using the GTK-Doc syntax, but there is no mechanism to actually generate API documentation from these doc comments. We need build rules that generate API documentation from C and Python source code.
Tasks:
- Picking a documentation generation tool and syntax (unclear if we should stay with GTK-Doc)
- Fixing or converting existing doc comments to the chosen syntax
- Writing build rules to generate the documentation
- Extra tasks, if time allows:
- Improving clarity or formatting of existing doc comments
- Converting existing ad-hoc comments in the code to doc comment syntax
- Add doc comments to existing APIs that are undocumented
Links:
Details:
- Skill level: beginner
- Language: C, Python, GNU Make
- Mentor: Eduardo Habkost <ehabkost@redhat.com> ("ehabkost" on IRC)
- Suggested by: Eduardo Habkost
Guest ABI automated testing
Summary: Automated test of Guest ABI and compatibility
QEMU tries to provide a stable guest ABI on versioned machine-types. Despite investing lots of effort keeping compatibility, we have no automated testing to detect common mistakes that break guest ABI. It should be possible to write automated test cases that will compare a virtual machine to a previously stored dump of guest ABI information.
A guest ABI dump may include, for example:
- Data returned by CPUID instruction (or equivalent)
- Physical memory and I/O port maps
- Device addresses (PCI, USB, etc.)
- Device IDs and other guest-visible device fields
- Value of QOM properties that affect device behavior
This might require improving existing QMP commands or adding new ones to provide the information that the automated test cases validate. Some test cases may use a custom kernel image for collecting guest-visible data, or extend the qtest protocol.
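The comparison step could be sketched as follows, assuming the dump is stored as nested dictionaries (for example loaded from JSON). The dump layout, the key names, and the function `diff_abi` are assumptions made for illustration; no such format exists in QEMU today.

```python
def diff_abi(expected, actual, path=""):
    """Yield readable differences between two nested guest ABI dumps."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            sub = f"{path}/{key}"
            if key not in actual:
                yield f"{sub}: missing (expected {expected[key]!r})"
            elif key not in expected:
                yield f"{sub}: unexpected new field {actual[key]!r}"
            else:
                yield from diff_abi(expected[key], actual[key], sub)
    elif expected != actual:
        yield f"{path}: expected {expected!r}, got {actual!r}"


if __name__ == "__main__":
    # Made-up values; a real dump would come from QMP, qtest, or a guest kernel.
    stored = {"cpuid": {"leaf1_ecx": 1}, "pci": {"00:02.0": {"device_id": 2}}}
    current = {"cpuid": {"leaf1_ecx": 1}, "pci": {"00:02.0": {"device_id": 3}}}
    for problem in diff_abi(stored, current):
        print(problem)
```

A test case would fail whenever the generator yields anything for a versioned machine-type, since any difference is a potential guest ABI break.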
Links:
- Ancient test code for CPUID compatibility
- incomplete proof of concept for validating CPUID data
- Hack that uses GDB to extract machine-type information from QEMU
Details:
- Skill level: intermediate
- Language: C, Python
- Mentor: Eduardo Habkost <ehabkost@redhat.com> ("ehabkost" on IRC)
- Suggested by: Eduardo Habkost
Nested SVM test improvements
Summary: Implement tests for AMD SVM nested virtualization
KVM supports both AMD SVM and Intel VMX technologies for hardware-assisted virtualization on x86. While similar in concept, these technologies differ significantly. A nested virtualization test must be aware of these differences, because the test is in fact a small hypervisor running on top of KVM. Currently, two frameworks test nested virtualization: kvm-unit-tests and kvm selftests. SVM support in kvm-unit-tests lags behind VMX, and the kvm selftest framework doesn't support it at all. The project will have the following parts:
- Implementing SVM support in the kvm selftest framework and writing a generic 'nesting' test that uses vmx or svm depending on the host's hardware.
- Improving SVM support in kvm-unit-tests making it match (where possible) VMX and adding SVM-specific features.
The applicant must have access to modern physical AMD hardware (Opteron 62xx/63xx, Epyc) to be able to accomplish the task.
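Before starting, an applicant can check which flavor of hardware virtualization a Linux host exposes. The sketch below looks for the `svm` and `vmx` flags in `/proc/cpuinfo`; the function names are made up for illustration. The selftests themselves are written in C and would query the CPU directly, so this only mirrors the "depending on the host's hardware" decision.

```python
def nested_virt_flavor(cpuinfo_text):
    """Return "svm", "vmx", or None based on the x86 "flags" line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "svm" in flags:
                return "svm"
            if "vmx" in flags:
                return "vmx"
    return None


def detect_host(path="/proc/cpuinfo"):
    """Read the host's cpuinfo; returns None if the file is unavailable."""
    try:
        with open(path) as f:
            return nested_virt_flavor(f.read())
    except OSError:
        return None
```

Calling `detect_host()` on a suitable AMD machine should return `"svm"`; a result of `None` means the flag is absent or hidden, for example inside a guest without nested virtualization enabled.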
Links:
- AMD developer manual: https://developer.amd.com/resources/developer-guides-manuals/
- kvm-unit-tests: https://www.linux-kvm.org/page/KVM-unit-tests
- kvm selftest framework: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/tree/tools/testing/selftests/kvm
Details:
- Skill level: intermediate or advanced
- Language: C
- Mentor: Vitaly Kuznetsov <vkuznets@redhat.com>, Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC)
io_uring AIO engine
Status: Aarushi Mehta is working on the project for Outreachy. Project status is here.
Summary: Add io_uring support to QEMU for high-performance disk I/O on Linux
The io_uring interface supersedes the Linux AIO API for asynchronous I/O. The core functionality of asynchronous I/O APIs is I/O request submission (reads/writes/flushes) and completion processing at a later point in time. Unlike a blocking read(2)/write(2) syscall, this allows the application threads to perform other activity while one or more I/O requests are in flight.
QEMU currently supports two asynchronous I/O engines: aio=threads (a thread-pool that invokes preadv(2)/pwritev(2)/fdatasync(2)) and aio=native (Linux AIO). This project will add io_uring support, which should achieve better performance than Linux AIO. This is because io_uring offers several optimizations:
- Memory buffers can be registered (pinned) ahead of time to avoid pinning on each request
- File descriptors can be held long-term to avoid the need to acquire them on each request
- A single system call can both submit and complete I/O requests or polling mode can be used to avoid system calls altogether
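The submit-now, complete-later model described above can be illustrated with the thread-pool approach that aio=threads takes. This is a toy sketch rather than QEMU code: `os.pread` stands in for preadv(2), and the helper names `submit_reads` and `demo` are made up. It assumes a Unix-like host.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def submit_reads(pool, fd, requests):
    """Queue (offset, length) reads on worker threads without waiting."""
    return [pool.submit(os.pread, fd, length, offset)
            for offset, length in requests]


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789abcdef")
        f.flush()
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = submit_reads(pool, f.fileno(), [(0, 4), (8, 4)])
            # The submitting thread is free to do other work here
            # while the blocking reads run on the workers.
            return [future.result() for future in futures]


if __name__ == "__main__":
    print(demo())
```

Every request in this model still costs at least one system call plus a thread hand-off, which is the overhead the io_uring optimizations listed above are meant to reduce.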
This project consists of the following tasks:
- Understanding the io_uring userspace ABI
- Extending block/file-posix.c to use io_uring
- Benchmarking io_uring against Linux AIO and aio=threads
- Adding polling mode support for completions (similar to existing Linux AIO polling code in QEMU)
- Stretch goal: Adding polling mode support for submissions
- Stretch goal: Adding a fast path when QEMU block layer features are not in use
- Stretch goal: Use IORING_OP_POLL_ADD to unify QEMU's polling and blocking event loop code paths
Links:
Details:
- Skill level: intermediate
- Language: C
- Mentor: Julia Suvorova <jusual@mail.ru> ("jusual" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
vhost-user-blk device backend
Summary: Implement a vhost-user-blk device backend inside QEMU so guests can efficiently access shared disk images.
QEMU can connect virtio-blk disks to external processes that act as vhost-user-blk device backends. This makes it possible for QEMU guests to access disks managed by SPDK or other software-defined storage appliances.
QEMU itself does not offer a vhost-user-blk device backend although the QEMU block layer has a number of features that make QEMU desirable as a software-defined storage appliance in its own right. For example, multiple VMs could safely access a shared qcow2 disk image file with one of the QEMUs acting as the vhost-user-blk device backend. Today this can be worked around using QEMU's NBD support, but its performance will always be lower since it is a network protocol.
The goal is to add a vhost-user-blk device backend to QEMU so that disks can be exported to other processes. The following steps are necessary:
- Understand libvhost-user, QEMU's library for implementing vhost-user device backends
- Add a QEMU monitor command for instantiating vhost-user-blk device backends given a blockdev and a UNIX domain socket
- Add a QEMU monitor command for shutting down and removing vhost-user-blk device backends
- Implement a vhost-user-blk device backend using libvhost-user (see the vhost-user-blk.c stand-alone example below)
- Extend QEMU's vhost-user tests to take advantage of your vhost-user-blk device backend
Links:
Details:
- Skill level: intermediate
- Language: C
- Mentor: Kevin Wolf <kwolf@redhat.com> ("kwolf" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
Configuration checker for Jailhouse Hypervisor
Summary: Validate Jailhouse configurations against a set of consistency rules
The Jailhouse hypervisor uses low-level configuration files in order to describe the partitioning of a system. These files are currently not checked for consistency and are rather easy to get wrong. Not all errors that the system designer may make can be identified, but at least a good part of them can.
The goal of this task is to enhance the existing Python-based tooling around Jailhouse to take a set of configurations (system+root-cell configuration and all simultaneously active non-root cell configs), run a predefined list of checks against them, and report any findings. This could look like this:
# jailhouse config check ROOT.cell NON-ROOT-A.cell NON-ROOT-B.cell ...
Error: MSI-X region of PCI device 00:01.2 directly mapped into NON-ROOT-A
The input to the checker shall be binary config files, for which Jailhouse already has a parsing module that translates them into Python objects.
Rules that should at least be validated are:
- memory region overlaps
- invalid pass-through of critical resources (MSI-X, irq controllers, PCI config ports etc.)
- inconsistencies between root and non-root configs (e.g. duplicate assignments of resources, missing root-cell access to loadable memory regions of non-root cells etc.)
- fully zero-initialized entries in configuration array (indicates missing elements)
- invalid PCI capability or shared-memory region indices
The rules will be further detailed as the project starts. A bonus task can be the definition of additional rules, based on the analysis of the configuration format and its semantics.
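As a rough illustration, the memory overlap and zero-initialized entry rules could be sketched like this. The `MemRegion` class is a hypothetical stand-in for the objects that Jailhouse's parsing module produces, and its field names are assumptions. Which overlaps actually count as errors (for instance between root and non-root cells) depends on the detailed rules.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemRegion:
    # Hypothetical model; the real parser objects may look different.
    cell: str
    phys_start: int
    size: int

    @property
    def end(self):
        return self.phys_start + self.size


def find_overlaps(regions):
    """Return every pair of regions whose physical ranges intersect."""
    return [(a, b) for a, b in combinations(regions, 2)
            if a.phys_start < b.end and b.phys_start < a.end]


def find_zeroed(regions):
    """Return entries that look like forgotten array elements."""
    return [r for r in regions if r.phys_start == 0 and r.size == 0]


if __name__ == "__main__":
    regions = [
        MemRegion("NON-ROOT-A", 0x1000, 0x1000),
        MemRegion("NON-ROOT-B", 0x1800, 0x1000),  # overlaps the first
        MemRegion("NON-ROOT-B", 0, 0),            # fully zeroed entry
    ]
    for a, b in find_overlaps(regions):
        print(f"Error: {a.cell} region at {a.phys_start:#x} "
              f"overlaps {b.cell} region at {b.phys_start:#x}")
    for r in find_zeroed(regions):
        print(f"Warning: zero-initialized memory region in {r.cell}")
```

The pairwise comparison is quadratic in the number of regions, which should be acceptable for the small arrays found in cell configurations; sorting by start address would be the obvious refinement if it is not.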
Links:
- Jailhouse project, including setup in QEMU/KVM: https://github.com/siemens/jailhouse
- Jailhouse tutorial: https://events.linuxfoundation.org/sites/events/files/slides/ELCE2016-Jailhouse-Tutorial.pdf
Details:
- Skill level: intermediate
- Language: Python, C
- Mentor: Jan Kiszka <jan.kiszka@web.de>
AMD Interrupt Remapping Support for Jailhouse hypervisor
Summary: Supplement existing AMD IOMMU code with interrupt remapping support.
Jailhouse currently supports the IOMMU on AMD-based x86 systems; however, it does so for memory transfers only. For the isolation to be complete, the analogous translation should be applied to interrupt messages as well. This technique is commonly known as interrupt remapping and it is provided by the AMD IOMMU.
The goal of this project is to implement the missing bits in AMD IOMMU interrupt remapping support to bring it on par with the Intel interrupt remapping support that Jailhouse already has. This involves writing code for the hypervisor core and arch-specific parts as well as adding relevant options to the config file generator.
In order to succeed with this project, you must have a sufficiently recent (2015 or newer) AMD computer (PC or laptop) which you can use as a testbed. You should also check that:
- The APU is Kaveri-based or newer
- There is an IOMMU option in the UEFI setup tool and it's enabled
- A recent Linux distribution reports IOMMU support in dmesg
- You can pass through a PCI device such as a network or sound card to a guest running in virt-manager
You should also check that the mainboard has a serial port header as this helps debugging a lot.
Links:
- Jailhouse project, including setup in QEMU/KVM: https://github.com/siemens/jailhouse
- Jailhouse tutorial: https://events.linuxfoundation.org/sites/events/files/slides/ELCE2016-Jailhouse-Tutorial.pdf
- A strawman implementation on top of the old Jailhouse tree (can be used as a starting point): https://github.com/vsinitsyn/jailhouse/tree/amd-vi-ir
- AMD IOMMU specification: https://support.amd.com/TechDocs/48882_IOMMU.pdf
Details:
- Skill level: Advanced
- Language: C
- Mentor: Jan Kiszka <jan.kiszka@web.de>, Valentine Sinitsyn <valentine.sinitsyn@gmail.com>
Project idea template
=== TITLE ===

'''Summary:''' Short description of the project

Detailed description of the project.

'''Links:'''
* Wiki links to relevant material
* External links to mailing lists or web sites

'''Details:'''
* Skill level: beginner or intermediate or advanced
* Language: C
* Mentor: Email address and IRC nick
* Suggested by: Person who suggested the idea
How to propose a custom project idea
Applicants are welcome to propose their own project ideas. The process is as follows:
- Email your project idea to qemu-devel@nongnu.org. CC Stefan Hajnoczi <stefanha@gmail.com> and regular QEMU contributors who you think might be interested in mentoring.
- If a mentor is willing to take on the project idea, work with them to fill out the "Project idea template" above and email Stefan Hajnoczi <stefanha@gmail.com>.
- Stefan will add the project idea to the wiki.
Note that other candidates can apply for newly added project ideas. This ensures that custom project ideas are fair and open.
How to get familiar with our software
See what people are developing and talking about on the mailing lists:
Grab the source code or browse it:
Build QEMU and run it: QEMU on Linux Hosts
Important links
Information for mentors
Mentors are responsible for keeping in touch with their student and assessing the student's progress. GSoC has a mid-term evaluation and a final evaluation where both the mentor and student assess each other.
The mentor typically gives advice, reviews the student's code, and has regular communication with the student to ensure progress is being made.
Being a mentor is a significant time commitment; plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the student's experience.
The mentor chooses their student by reviewing student application forms and conducting IRC interviews with candidates. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right student is critical so that both the mentor and the student can have a successful experience.