Google Summer of Code 2019

From QEMU
Revision as of 18:31, 29 January 2019 by Marcel (talk | contribs)

Introduction

QEMU is applying to Google Summer of Code 2019. This page contains our ideas list and information for students and mentors. Google Summer of Code is an open source internship program for university students offering 12-week, full-time, paid remote work from May to August!

Applicants: You are welcome to think about project ideas and familiarize yourself with QEMU, but please don't invest too much time at this early stage. Google will announce participating organizations on February 26.

Application Process

After contacting the mentor to discuss the project idea you should fill out the application form at [1]. The form asks for a problem description and outline of how you intend to implement a solution. You will need to do some background research (looking at source code, browsing relevant specifications, etc) in order to form an idea of how to tackle the project. The form asks for an initial 12-week project schedule which you should create by breaking down the project into tasks and estimating how long they will take. The schedule can be adjusted during the summer so don't worry about getting everything right ahead of time.

Candidates may be invited to an IRC interview with the mentor. The interview consists of a 30 minute C coding exercise, followed by technical discussion and a chance to ask questions you have about the project idea, QEMU, and GSoC. The coding exercise is designed to show fluency in C programming.

Here is a C coding exercise we have used in previous years when interviewing students: 2014 coding exercise

Try it and see if you are comfortable enough writing C. We cannot answer questions about the previous coding exercise but hopefully it should be self-explanatory.

If you find the exercise challenging, think about applying to other organizations where you have a stronger technical background and will be more competitive compared with other candidates.

Find Us

  • IRC (GSoC specific): #qemu-gsoc on irc.oftc.net
  • IRC (development):
    • QEMU: #qemu on irc.oftc.net
    • KVM: #kvm on chat.freenode.net

Please contact the mentor for the project idea you are interested in. IRC is usually the quickest way to get an answer.

For general questions about QEMU in GSoC, please contact the following people:

Project Ideas

This is the listing of suggested project ideas. Students are free to suggest their own projects, see #How to propose a custom project idea below.

I2C Passthrough

Summary: Implement I2C device passthrough on Linux hosts so that a single board computer like a Raspberry Pi can be used to develop applications for microcontrollers like the micro:bit.

QEMU emulates I2C devices in software but currently cannot pass through real I2C devices from the host to the guest. It would be useful to access real I2C devices from inside the guest, for example for developers writing and testing software under QEMU on their computer. This would be used like -device i2c-passthrough,device=/dev/i2c-N,hostaddr=0x48,address=0x49 (another possibility is to implement Linux I2C device support as a character device, like -chardev linux-i2c,address=0x48,device=/dev/i2c-N,id=i2c-chardev -device i2c-passthrough,chardev=i2c-chardev,address=0x49.

A very basic implementation of I2C passthrough can be very simple (150-200 lines of code perhaps) using the read/write interface to /dev/i2c-N. However, read/write is very limited and cannot drive all I2C devices because it does not support repeated start conditions. The current I2C API in QEMU is also very limited. Therefore there are many possible extensions:

  • implementing I2C passthrough as an SMBusDevice instead of I2CSlave. SMBusDevice provides a higher-level interface than I2CSlave, with support for repeated start conditions and block transfers. (Question: how would the SMBusDevice API translate to /dev/i2c-N ioctls)?
  • improving SMBusDevice support in QEMU: currently the SMBusDevice->I2CSlave adaptor in hw/i2c/smbus_slave.c only implements a subset of SMBus, in particular it can only read a single byte at a time from the device. Your task could be to improve the QEMU I2CBus API so that, at least for some I2C bus implementations, more functionality of SMBusDevice is accessible via I2CBus.
  • Currently I2C is entirely synchronous. It would be nice to support asynchronous communication between the I2C master and the chardev, where the i2c-passthrough can hold down the I2C clock until linux-i2c has data ready. On the emulation side, this would require clock stretching support in i2c-passthrough.

If you are interested in device emulation especially, you could also implement an I2C adapter from scratch (e.g. the micro:bit TWI controller emulation on the nRF51 system-on-chip) if there's time.

This project will allow you to learn about the I2C bus and how to write device emulation code in QEMU. You will enjoy it if you like working with physical hardware.

Links:

Details:

  • Skill level: advanced (the project is not very difficult, but it is larger than usual)
  • Language: C
  • Mentor: Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)
  • Special requirements: you should already own already a relatively powerful single board computer (for example a Raspberry Pi 3 or a Le Potato) and one or more I2C devices (for example real-time clocks, temperature sensors, Wii Nunchuks) and basic tools for connecting them such as a soldering iron and 0.1" jumper cables and header pins. It is important that you have these before the coding period begins; if you don't, take into account shipping and customs time, and/or talk to the mentors.

qgraph support for device tree discovery

Summary: Enable tests for the qgraph driver framework to discover devices using a device tree, instead of having to specify the devices on the machine manually.

qgraph, developed during GSoC 2018 (merging in progress), implements an introspectable description of QEMU's supported machine types and a set of drivers that tests can use to start a guest and interact with its devices. Currently, PCI is the only discoverable bus implemented in qgraph; any non-PCI device must be listed manually in the machine type. The purpose of this project is to add support to qgraph for reading a device tree and selecting tests based on the OpenFirmware device names found in the device tree. The device tree can be provided by the virtual machine, or it can be read from an external file.

If the task turns out to be too short to fill the whole internship, the remaining time can be filled by writing new tests for devices that are usually exposed as part of a device tree.

Links:

  • KVM Forum 2018 presentation on qtest: video
  • last version of the qgraph framework: patchew

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Paolo Bonzini <pbonzini@redhat.com>, Laurent Vivier <lvivier@redhat.com>
  • Suggested by: Paolo Bonzini <pbonzini@redhat.com> (IRC nick bonzini)

virtio-net oss-fuzz support

Status: Alexander Oleinik is working on this project for GSoC.

Summary: Integrate oss-fuzz into QEMU so that the virtio-blk device can be fuzz tested.

oss-fuzz offers a fuzz testing service to open source projects. This means random inputs are continuously tested against the program in order to find crashes and other bugs. Fuzz testing complements hand-written test suites by exploring the input space of a program and therefore the code paths that may be taken.

The goal of this project is to experiment with integrating oss-fuzz into the virtio-net device that Qemu emulates for guest networking. virtio-net-pci is a PCI device connected to the virtual machine's PCI bus and has a configuration space that can be programmed by the guest. The device itself is specified by the VIRTIO specification which describes the functionality of the device. Bugs could potentially exist at both the PCI and VIRTIO levels, so it's important to fuzz both of them. Care should be taken to pick a design that could be generalized for all virtio devices, eg. virtio-blk.

Fuzzing emulated devices involves accessing their hardware registers randomly to make the device respond. QEMU has a device testing interface called "qtest" that accepts read/write and other commands over a socket and is ideal for writing device-level tests. You may find that oss-fuzz works better integrated directly into the QEMU program instead of as a separate qtest program, so you can consider adding a new command-line option to QEMU for running in oss-fuzz mode.

This project involves learning about VIRTIO and PCI devices, as well as figuring out how to integrate oss-fuzz into QEMU so that it can effective explore the code paths in virtio-net device emulation code. You will enjoy this project if you want to learn how device emulation works and are interested in fuzzers.

The project will primarily be in three phases. The first phase involves understanding the ecosytsem - Qemu, qtest, oss-fuzz and llvm etc. The second phase involves utilizing the qtest framework or utilizing the functionality to integrate in Qemu to fuzz the virtio-net device registers. THe third phase will involve running our fuzzing framework, analyzing results and identifying bugs.

Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Bandan Das <bsd@redhat.com> ("bsd" on #qemu IRC), Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on #qemu IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on #qemu IRC)

Memory hotplug test

Summary: Create memory hotplug testing infrastructure.

Several QEMU targets support memory hotplug feature, which allows to (un)provision extra memory without requiring virtual machine reboot or restart for changes to take the effect. QEMU however lacks regression tests for the feature. Goal of the project is refactor QEMU's "-m" option legacy parsing into machine object properties, which should provide ability to inspect memory settings using QMP interface and implement test cases on top of that.

Project is to be split in to following parts:

  • Refactor "-m mem,slots,maxmem" into MachineClass properties and replace related global variables where they are still used with properties. Make "set_memory_options()" use new properties leaving only CLI parsing there. Goal is to make "set_memory_options()" function a thin wrapper that takes care of CLI parsing and makes CLI option '-m' an alias to new -machine mem/mem-slots/maxmem options.
  • Create a set of basic tests for 'make check' to verify that -m/-machine CLI parsing works as expected and based on that a set memory hot-add tests using QMP introspection to verify the expected behaviour.
  • If time allows create guest-host memory hotplug ABI tests for x86/sPAPR/s390x targets.

Links:

  • $(QEMU sources)/docs/memory-hotplug.txt

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Igor Mammedov <imammedo@redhat.com> ("imammedo" on IRC), David Hildenbrand <david@redhat.com> ("dhildenb" on IRC)

Measure Tiny Code Generation Quality

Status: Vanderson M. do Rosario <vandersonmr2@gmail.com> (vanderson on #qemu IRC) is working on this project for 2019 GSoC.

Mentor: Alex Bennée <alex.bennee@linaro.org> (stsquad on #qemu IRC)

Project Github: vandersonmr/gsoc-qemu [2]

Summary: in most applications, the majority of the execution time is spent in a very small portion of code. Regions of a code which have high-frequency execution are called hot while all other regions are called cold. As a direct consequence, emulators also spent most of their execution time emulating these hot regions and, so, dynamic compilers and translators need to pay extra attention to them. To guarantee that these hot regions are compiled/translated generating high-quality code is fundamental to achieve a final high-performance emulation. Thus, one of the most important steps in tuning an emulator performance is to identify which are the hot regions and to measure their translation quality.

TBStatistics (TBS)

Improving the code generation of the TCG backend is a hard task that involves reading through large amounts of text looking for anomalies in the generated code. It would be nice to have tools to more readily extract and parse code generation information. This would include options to dump:

  • The hottest Translations Blocks (TB) and their execution count (total and atomic).
  • Translation counters:
    • The number of times a TB has been translated, uncached and spanned.
    • The amount of time spent doing such translations and optimizations.
  • Code quality metrics:
    • The number of TB guest, IR (TCG ops), and host instructions.
    • The Number of spills during the register allocation.

For that reason, we collect all this information dynamically for every TB or for a specific set of TBs and store it on TBStatistics structures. Every TB can have one TBStatistics linked to it by a new field inserted in the TranslationBlock structure[3]. Moreover, TBStatistics are not flushed during tb_flush and they survive longer being able to be relinked to retranslated TBs using their keys (phys_pc, pc, flags, cs_base) to matches new TBs and their TBStats.

Not all stats in a TBStatistics need to be being collected. The stats_enabled flag can be used to set which stats are going to be collected/are enabled to a particular TBStatistics. The possible flags can be found in quemu-commons.h and they are: TB_NOTHING, TB_EXEC_STATS (to collect code and translations stats), TB_JIT_STATS (to collect executions stats), TB_JIT_TIME (to collect time stats) and TB_PAUSED.

struct TBStatistics {
    tb_page_addr_t phys_pc;
    target_ulong pc;
    uint32_t     flags;
    /* cs_base isn't included in the hash but we do check for matches */
    target_ulong cs_base;

    uint32_t stats_enabled;

    /* Execution stats */
    struct {
        unsigned long normal;
        unsigned long atomic;
        /* filled only when dumping x% cover set */
        uint16_t coverage;
    } executions;

    struct {
        unsigned num_guest_inst;
        unsigned num_tcg_ops;
        unsigned num_tcg_ops_opt;
        unsigned spills; 

        /* CONFIG_PROFILE */
        unsigned temps;
        unsigned deleted_ops;
        unsigned in_len;
        unsigned out_len;
        unsigned search_out_len;
    } code; 

    struct {
        unsigned long total;
        unsigned long spanning;
    } translations; 

    struct {
        int64_t restore;
        uint64_t restore_count;
        int64_t interm;
        int64_t code;
        int64_t opt;
        int64_t la;
    } time; 

    /* HMP information - used for referring to previous search */
    int display_id; 

    /* current TB linked to this TBStatistics */
    TranslationBlock *tb;
};


Creating and Storing TBStats

When a TB is going to be generated by calling tb_gen_code, we check if tbstats collection is enabled and if the current TB is not uncached. If so, we call tb_get_stats which either creates a new TBStatistics for that specific pc, phys_pc and flag or returns an already existing one (stored in a qht hash table in TBContext). Then, we get the default flags set through HMP or -d tb_stats argument. This default value will be used to set what stats are going to be collected for that TB.

    /*                                                                                                                                                              
     * We want to fetch the stats structure before we start code                                                                                                    
     * generation so we can count interesting things about this                                                                                                     
     * generation.                                                                                                                                                  
     */                                                                                                                                                             
    if (tb_stats_collection_enabled() && !(tb->cflags & CF_NOCACHE)) {                                                                                              
        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);                                                                                               
        uint32_t flag = get_default_tbstats_flag();                                                                                                                 
                                                                                                                                                                    
        if (qemu_log_in_addr_range(tb->pc)) {                                                                                                                       
            if (flag & TB_EXEC_STATS) {                                                                                                                             
                tb->tb_stats->stats_enabled |= TB_EXEC_STATS;                                                                                                       
            }                                                                                                                                                       
        }                                                                                                                                                           
                                                                                                                                                                    
        if (flag & TB_JIT_STATS) {                                                                                                                                  
            tb->tb_stats->stats_enabled |= TB_JIT_STATS;                                                                                                            
            atomic_inc(&tb->tb_stats->translations.total);                                                                                                          
        }                                                                                                                                                           
                                                                                                                                                                    
        if (flag & TB_JIT_TIME) {                                                                                                                                   
            tb->tb_stats->stats_enabled |= TB_JIT_TIME;                                                                                                             
            ti = profile_getclock();                                                                                                                                
        }                                                                                                                                                           
    } else {                                                                                                                                                        
        tb->tb_stats = NULL;                                                                                                                                        
    }

Collecting/Filling TBStatistics Information

To collect the stats for the TBStatistics, different parts of the QEMU source code were changed. We list here how we collected each information for each field.

TB_EXEC_STATS

Normal Execution Count

To collect the execution count of each TB, we instrument the begin of each one of them, adding a call to a helper function called exec_freq. This instrumentation is done in the gen_tb_exec_count function. The exec_freq helper receives the address of the TBStatistic structure linked to the TB.

include/exec/gen-icount.h:

static inline void gen_tb_exec_count(TranslationBlock *tb)
{
    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
        TCGv_ptr ptr = tcg_const_ptr(tb->tb_stats);
        gen_helper_inc_exec_freq(ptr);
        tcg_temp_free_ptr(ptr);
    }
}

The helper function access the field executions.total of the TBStatistic structure (which address was passed as a parameter) and increment it atomically counting one more execution of the TB. This is done on every execution start of a TB.

accel/tcg/tcg-runtime.c:

void HELPER(inc_exec_freq)(void *ptr)
{
    TBStatistics *stats = (TBStatistics *) ptr;
    g_assert(stats);
    atomic_inc(&stats->executions.normal);
}

Atomic Execution Count

TBs can also be executed atomically and we count how many times this is done by incrementing the executions.atomic field every time cpu_exec_step_atomic is called.

void cpu_exec_step_atomic(CPUState *cpu) 
....
    if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
        tb->tb_stats->executions.atomic++;
    }
....

Coverage

The field coverage of a TB measures the relative proportion of executed guest instructions that this TB represents when compared to the whole execution. To do so, we calculate these coverages of each TB before dumping any statistics by calling a function named calculate_last_search_coverages. This function basically counts the total number of guest instructions executed and then divide the number of guest instructions of each TB by the total getting the coverage percentage for that TB.

static uint64_t calculate_last_search_coverages(void)
{
    uint64_t total_exec_count = 0;
    GList *i;

    /* Compute total execution count for all tbs */
    for (i = last_search; i; i = i->next) {
        TBStatistics *tbs = (TBStatistics *) i->data;
        total_exec_count +=
           (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
    }

    for (i = last_search; i; i = i->next) {
        TBStatistics *tbs = (TBStatistics *) i->data;
        uint64_t tb_total_execs =
            (tbs->executions.atomic + tbs->executions.normal) * tbs->code.num_guest_inst;
        tbs->executions.coverage = (10000 * tb_total_execs) / (total_exec_count + 1);
    }

    return total_exec_count;
}

TB_JIT_STATS

Spanning and Total Translations

We count the number of translations of a region by incrementing its TBStatistics translations.total for every time that tb_gen_code is called. TBStatistics survive even after flushes and can be recovered and linked to a new translation of a region using its phys_pc, pc, cs_base and flags.

We also count how many times a TB spans by atomically incrementing the translations.spanning field after the span.

accel/tcg/translate-all.c:

TranslationBlock *tb_gen_code(CPUState *cpu,
                            target_ulong pc, target_ulong cs_base,
                            uint32_t flags, int cflags)
   ...
    if (tb_stats_collection_enabled()) {
        tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags, tb);
   ...
       if (flag & TB_JIT_STATS) {
           tb->tb_stats->stats_enabled |= TB_JIT_STATS;
           atomic_inc(&tb->tb_stats->translations.total);
       }
   ...
   }
   
   ...
   
   if ((pc & TARGET_PAGE_MASK) != virt_page2) {
        phys_page2 = get_page_addr_code(env, virt_page2);
       if (tb_stats_enabled(tb, TB_JIT_STATS)) {
           atomic_inc(&tb->tb_stats->translations.spanning);
       }
   }
   ...

Guest Instructions

The guest instructions are already counted in the DisasContextBase (db->num_insns) in the translator_loop, so we simply add it to the code.num_guest_inst.

accel/tcg/translator.c:

void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
                     CPUState *cpu, TranslationBlock *tb, int max_insns)
   ....
   if (tb_stats_enabled(tb, TB_JIT_STATS)) {
       atomic_add(&db->tb->tb_stats->code.num_guest_inst, db->num_insns);
   }
   ....

Host Code Size

QEMU does not have any specific way of couting the number of host instructions. Moreover, we realize that implementing such function would need to modify many target-specific code and it would be unfeasible for this GSoC. So, instead, we count the size in bytes of the generated host code that is already calculated in the tb_gen_code function.

accel/tcg/translate-all.c:

TranslationBlock *tb_gen_code(CPUState *cpu,
                             target_ulong pc, target_ulong cs_base,
                             uint32_t flags, int cflags)
   ...
   if (tb_stats_enabled(tb, TB_JIT_STATS)) {
       atomic_add(&tb->tb_stats->code.out_len, gen_code_size);
   }
   ...


TCG Instructions

To count the number of TCG instructions we need only to iterate over the ops in the TCGContext and them store the result in the code.num_tcg_inst. We do this twice, before and after aplying the optimizations, filling num_tcg_ops and num_tcg_ops_opt.

tcg/tcg.c:

int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
  ....
   if (tb_stats_enabled(tb, TB_JIT_STATS)) {
       int n = 0;
       QTAILQ_FOREACH(op, &s->ops, link) {
           n++;
       }
       atomic_add(&tb->tb_stats->code.num_tcg_ops, n);
   }
  ....

Spills

We increment the code.spills counter after a load is generated to deal if a register spill in the function temp_sync.

    static void temp_sync(...)
        ...
        case TEMP_VAL_REG:
            tcg_out_st(s, ts->type, ts->reg,
                       ts->mem_base->reg, ts->mem_offset);
            
           /* Count number of spills */
           if (tb_stats_enabled(s->current_tb, TB_JIT_STATS)) {
               atomic_inc(&s->current_tb->tb_stats->code.spills);
           }
           break;
         ...

TB_JIT_TIME and Old CONFIG_PROFILER Stats

Other fields in TBStatistics including the ones related to the measurement of time all came from an old global scheme of profiling of QEMU. This profiling was controlled by a CONFIG_PROFILER flag that should be set during compilation. We decided that all statistics should now concentrate in the TBStatistics structure and there was no reason to maintain two mechanisms of profiling. Moreover, instead of collecting these statistics globally we now collect them per TB, having more information and still being able to reconstruct the global profiling by summing up all TB stats.

In each region of code where there was a #ifdef CONFIG_PROFILER follow by updates of global statistics, we changed to an update of the TB stats as shows the following example:

-#ifdef CONFIG_PROFILER
-    atomic_set(&prof->restore_time,
-                prof->restore_time + profile_getclock() - ti);
-    atomic_set(&prof->restore_count, prof->restore_count + 1);
-#endif
+    if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+        atomic_add(&tb->tb_stats->time.restore, profile_getclock() - ti);
+        atomic_inc(&tb->tb_stats->time.restore_count);
+    }

Controling and Dumping TBStatistics

There are two ways of controlling the collection of statistics: before execution by using the -d tb_stats command argument or by using the tb_stats HMP command in the QEMU monitor. In both, you can choose which will be the default level of profiling collection of the tbs (exec_stats, jit_stats, jit_time, all).

Further, the HMP tb_stats command can be also used to pause, stop or restart the profiling during the execution in system mode.

To dump a list TBs stats, the HMP command "info tb-list" can be used. "info tb-list" receive as parameter the number of tbs to be dumped and which metric it should be sorted by: hotness, hg (host/guest), or spills. Follows an example of the output of three TBs stats sorted by hotness (execution frequency):

info tb-list
TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:4828932/0 guest inst cov:16.38%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
	| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:4825842/0 guest inst cov:21.82%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1) 

TB id:3 | phys:0xec1c1 virt:0x00000000000ec1c1 flags:0x0000b0
	| exec:872032/0 guest inst cov:1.97%
	| trans:1 ints: g:2 op:56 op_opt:26 spills:1
	| h/g (host bytes / guest insts): 68.000000
	| time to gen at 2.4GHz => code:1692.08(ns) IR:473.75(ns)
	| targets: 0x00000000000ec1c5 (id:4), 0x00000000000ec1cb (id:13)


If necessary to iteratively examine one of this listed tbs the info tb id command can be used, where the id comes from the TBs listed by "info tb-list". The following example shows the result of dumping the guest instructions of tb with id 1.

info tb

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:6956495/0  guest inst cov:21.82%
	| trans:2 ints: g:2 op:40 op_opt:19 spills:1
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
	| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)

----------------
IN: 
0x00034d0d:  89 de                    movl     %ebx, %esi
0x00034d0f:  26 8b 0e                 movl     %es:(%esi), %ecx
0x00034d12:  26 f6 46 08 80           testb    $0x80, %es:8(%esi)
0x00034d17:  75 3b                    jne      0x34d54

------------------------------

In the example the guest instruction of the TB was dumped, but level of code being dumped can be select by a second parameter to "info tb" that can be: out_asm for dumping the host code of the TB, in_asm for dumping the guest code and op and op_opt for dumping the QEMU TCG IR for that TB.


info cfg

info tb only shows information of one TB, but info cfg id can be used to create a .dot representing the Control Flow Graph (CFG) of the neighborhood of a TB. Each node in the graph has the guest code of the TB and its execution frequency with the colors of the nodes representing its relative hotness in the neighborhood. The image below shows an example of such CFG.

Info-cfg.png

info coverset

Finally, there is an option to dump not the "n" hottest blocks but all necessary hot blocks to achieve m% of the total execution counting (info coverset m). This is a useful metric to understand the execution footprint of a program. More execution dense applications will have a smaller number of blocks to achieve m% of the execution while sparse ones will have a larger number. Dynamic compilers normally have better performance with applications with a small number of blocks needed to achieve the 90% coverset.

TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
	| exec:5202686/0 guest inst cov:11.28%
	| trans:1 ints: g:3 op:82 op_opt:34 spills:3
	| h/g (host bytes / guest insts): 90.666664
	| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
	| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)

TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
	| exec:5199468/0 guest inst cov:15.03%
	| trans:1 ints: g:4 op:80 op_opt:38 spills:2
	| h/g (host bytes / guest insts): 84.000000
	| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
	| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)

------------------------------
2 TBs to reach 25% of guest inst exec coverage
Total of guest insts exec: 138346727

------------------------------

Overheads

To understand the overhead of collecting these statistics we executed some benchmarks using four configuration variations in the qemu-x86-64-linux-user:

  • baseline: without any modification.
  • tb_stats creation: with the creation of tb_stats enabled.
  • jit stats: with the collection of JIT code and translation stats enabled.
  • exec stats: with the collection of exec count enabled.
  • all stats: with both exec count and JIT code/translation enabled.

All values below are the median of 10 executions and they are all slowdowns in relation to the baseline value.

Tbstats-overheads.png

The average slowdowns were: 0.7% for tb_stats creation, 1.7% for jit stats, 227% for exec stats and 231% for all.

TODO: collect and compare memory usage.

Linux Perf Integration

Another tool added to QEMU to help investigate its performance was adding the option to dump a jitdump file using the command line argument -perf. This jitdump file can be used by Linux Perf to enhance the report of JITed code. Using such enhancement we can have the execution time relative percentage of a TB together with its guest PC. Moreover, we can explore the host code of the TB and observe which host instructions took more time in the execution. Both examples can seem in the following pictures:

Perf-tbs-pcs.png
Perf-disas.png

To use perf with QEMU is as simple as using the following three commands:

perf record -k 1 qemu-x86_64 -perf ./a.out
perf inject -j -i perf.data -o perf.data.jitted
perf report -i perf.data.jitted

Future Work

Future work would include:

  • elide or beautify common blocks like softmmu access macros (which are always the same)
  • develop a safe way of collecting the gen host instructions count.

AVX

Status: Jan Bobek is working on this project for GSoC.

Summary: Support for AVX within TCG

QEMU's TCG just-in-time compiler translates target CPU instructions into host CPU instructions so that programs written for other CPU architectures can be run on any host. Modern CPUs features vector processing instruction sets, sometimes called Single Instruction Multiple Data (SIMD) instructions, for performing the same operation on multiple elements of data in just one instruction. Intel's SSE and AVX instruction set extensions were introduced for x86 CPUs for this purpose.

The target/i386 front end has support for TCG emulation of SSE4.1, but does not have support for later vector extensions such as AVX. Your task is to implement and test AVX instructions that are currently missing in QEMU.

Links:

Details:

  • Skill level: intermediate to advanced
  • Language: C
  • Mentor: Richard Henderson <richard.henderson@linaro.org> (rth on #qemu IRC)
  • Suggested by: Nick Renieris

API documentation generation

Status: Gabriel Barreto is working on this project for GSoC.

Summary: Generation of API documentation from doc comments

QEMU currently has many functions documented using the GTK-Doc syntax, but there is no mechanism to actually generate API documentation from these doc comments. We need build rules that generate API documentation from C and Python source code.

Tasks:

  • Picking a documentation generation tool and syntax (unclear if we should stay with GTK-Doc)
  • Fixing or converting existing doc comments to the chosen syntax
  • Writing build rules to generate the documentation
  • Extra tasks, if time allows:
    • Improving clarity or formatting of existing doc comments
    • Converting existing ad-hoc comments in the code to doc comment syntax
    • Add doc comments to existing APIs that are undocumented

Links:

Details:

  • Skill level: beginner
  • Language: C, Python, GNU Make
  • Mentor: Eduardo Habkost <ehabkost@redhat.com> ("ehabkost" on IRC)
  • Suggested by: Eduardo Habkost

Guest ABI automated testing

Summary: Automated test of Guest ABI and compatibility

QEMU tries to provide a stable guest ABI on versioned machine-types. Despite investing lots of effort keeping compatibility, we have no automated testing to detect common mistakes that break guest ABI. It should be possible to write automated test cases that will compare a virtual machine to a previously stored dump of guest ABI information.

A guest ABI dump may include, for example:

  • Data returned by CPUID instruction (or equivalent)
  • Physical memory and I/O port maps
  • Device addresses (PCI, USB, etc.)
  • Device IDs and other guest-visible device fields
  • Value of QOM properties that affect device behavior

This might require improving or adding new QMP commands to provide information to be validated by the automated test cases. Some test cases may use a custom kernel image for collecting guest-visible data, or extending the qtest protocol.


Links:

Details:

  • Skill level: intermediate
  • Language: C, Python
  • Mentor: Eduardo Habkost <ehabkost@redhat.com> ("ehabkost" on IRC)
  • Suggested by: Eduardo Habkost

Nested SVM test improvements

Summary: Implement tests for AMD SVM nested virtualization

KVM supports both AMD SVM and Intel VMX technologies for hardware-assisted virtualization on x86 and while similar in concept, these technologies have significant differences between them. When we want to test nested virtualization we need to make the test familiar with these differences (as the test, in fact, is a small hypervisor running on top of KVM). Currently, we have two frameworks which test nested virtualization: kvm-unit-tests and kvm selftests. SVM support in kvm-unit-tests lags behind VMX and kvm selftest framework doesn't support it at all. The project will have the following parts:

  • Implementing SVM support in kvm selftest framework, writing a generic 'nesting' test utilizing vmx/svm depending on the host's hardware.
  • Improving SVM support in kvm-unit-tests making it match (where possible) VMX and adding SVM-specific features.

The applicant must have access to modern physical AMD hardware (Opteron 62xx/63xx, Epyc) to be able to accomplish the task.

Links:

Details:

  • Skill level: intermediate or advanced
  • Language: C
  • Mentor: Vitaly Kuznetsov <vkuznets@redhat.com>, Paolo Bonzini <pbonzini@redhat.com> ("bonzini" on IRC)

io_uring AIO engine

Status: Aarushi Mehta is working on the project for Outreachy. Project status is here.

Summary: Add io_uring support to QEMU for high-performance disk I/O on Linux

The io_uring interface supersedes the Linux AIO API for asynchronous I/O. The core functionality of asynchronous I/O APIs is I/O request submission (reads/writes/flushes) and completion processing at a later point in time. Unlike a blocking read(2)/write(2) syscall, this allows the application threads to perform other activity while one or more I/O requests are in flight.

QEMU currently supports two asynchronous I/O engines: aio=threads (a thread-pool that invokes preadv(2)/pwritev(2)/fdatasync(2)) and aio=native (Linux AIO). This project will add io_uring support, which should achieve better performance than Linux AIO. This is because io_uring offers several optimizations:

  • Memory buffers can be registered (pinned) ahead of time to avoid pinning on each request
  • File descriptors can be held long-term to avoid the need to acquire them on each request
  • A single system call can both submit and complete I/O requests or polling mode can be used to avoid system calls altogether

This project consists of the following tasks:

  • Understanding the io_uring userspace ABI
  • Extending block/file-posix.c to use io_uring
  • Benchmarking io_uring against Linux AIO and aio=threads
  • Adding polling mode support for completions (similar to existing Linux AIO polling code in QEMU)
  • Stretch goal: Adding polling mode support for submissions
  • Stretch goal: Adding a fast path when QEMU block layer features are not in use
  • Stretch goal: Use IORING_OP_POLL_ADD so unify QEMU's polling and blocking event loop code paths

Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Julia Suvorova <jusual@mail.ru> ("jusual" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)

vhost-user-blk device backend

Summary: Implement a vhost-user-blk device backend inside QEMU so guests can efficiently access shared disk images.

QEMU can connect virtio-blk disks to external processes that act as vhost-user-blk device backends. This makes it possible for QEMU guests to access disks managed by SPDK or other software-defined storage appliances.

QEMU itself does not offer a vhost-user-blk device backend although the QEMU block layer has a number of features that make QEMU desirable as a software-defined storage appliance in its own right. For example, multiple VMs could safely access a shared qcow2 disk image file with one of the QEMUs acting as the vhost-user-blk device backend. Today this is can be worked around using QEMU's NBD support, but its performance will always be lower since it is a network protocol.

The goal is to add a vhost-user-blk device backend to QEMU so that disks can be exported to other processes. The following steps are necessary:

  • Understand libvhost-user, QEMU's library for implementing vhost-user device backends
  • Add a QEMU monitor command for instantiating vhost-user-blk device backends given a blockdev and a UNIX domain socket
  • Add a QEMU monitor command for shutting down and removing vhost-user-blk device backends
  • Implement a vhost-user-blk device backend using libvhost-user (see the vhost-user-blk.c stand-alone example below)
  • Extend QEMU's vhost-user tests to take advantage of your vhost-user-blk device backend

Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Kevin Wolf <kwolf@redhat.com> ("kwolf" on IRC), Stefan Hajnoczi <stefanha@redhat.com> ("stefanha" on IRC)

Configuration checker for Jailhouse Hypervisor

Summary: Validate Jailhouse configurations against a set of consistency rules

The Jailhouse hypervisor uses low-level configuration files in order to describe the partitioning of a system. These files are currently not checked for consistency and are rather easy to get wrong. Not all errors that the system designer may make can be identified, but at least a good part of them can.

The goal of this task is to enhance the existing Python-based tooling around Jailhouse to take a set of configurations (system+root-cell configuration and all at simultaneously active non-root cell configs), run a predefined list of checks against them and report any findings. This could look like that:

  # jailhouse config check ROOT.cell NON-ROOT-A.cell NON-ROOT-B.cell ...
  Error: MSI-X region of PCI device 00:01.2 directly mapped into NON-ROOT-A

The input to the checker shall be binary config files for which Jailhouse already has a parsing module that translate them into Python objects.

Rules that should at least be validated are:

  • memory region overlaps
  • invalid pass-through of critical resources (MSI-X, irq controllers, PCI config ports etc.)
  • inconsistencies between root and non-root configs (e.g duplicate assignments of resources, missing root-cell access to loadable memory regions of non-root cells etc.)
  • fully zero-initialized entries in configuration array (indicates missing elements)
  • invalid PCI capability or shared-memory region indices

The rules will be further detailed as the project starts. A bonus task can be the definition of additional rules, based on the analysis of the configuration format and its semantics.

Links:

Details:

  • Skill level: intermediate
  • Language: Python, C
  • Mentor: Jan Kiszka <jan.kiszka@web.de>

AMD Interrupt Remapping Support for Jailhouse hypervisor

Summary: Supplement existing AMD IOMMU code with interrupt remapping support.

Jailhouse currently supports IOMMU on AMD-based x86 systems, however it does so for memory transfers only. In order for the isolation to be complete, the analogous translation should be applied to interrupt messages as well. This technique is commonly known as interrupt remapping and it is provided by AMD IOMMU.

The goal of this project is to implement the missing bits in AMD IOMMU interrupt remapping support to bring it on par with Intel interrupt remapping support Jailhouse already has. This involves writing code for the hypervisor core and arch-specific parts as well as adding relevant options to the config file generator.

In order to succeed with this project, you must have a sufficiently recent (2015 or newer) AMD computer (PC or laptop) which you can use as a testbed. You should also check that:

  • The APU is Kaveri-based or newer
  • There is an IOMMU option in the UEFI setup tool and it's enabled
  • A recent Linux distribution reports IOMMU support in dmesg
  • And you can pass-through a PCI device such as a network or sound card to a guest running in the virt-manager.

You should also check that the mainboard has a serial port header as this helps debugging a lot.

Links:

Details:

  • Skill level: Advanced
  • Language: C
  • Mentor: Jan Kiszka <jan.kiszka@web.de>, Valentine Sinitsyn <valentine.sinitsyn@gmail.com>

PVRDMA Live Migration

Status: Sukrit Bhatnagar is working on this project for GSoC.

Summary: Add live migration support for the PVRDMA device.

The PVRDMA device allows Virtual Machines to use RoCE (RDMA over Converged Ethernet) without assigning a physical device (or a virtual function in SR-IOV case). The big wins are we don't need to pin the VM memory in RAM and the device can support Live Migration. This project addresses the latter point.

While the PVRDMA device can be used in a hybrid environment where the nodes can be a bare-metal machine or a VM, this project aims to enable Live Migration only if all the nodes are VMs.

The above assumption allows a relatively easy approach by creating a QEMU protocol for broadcasting/receiving notifications during Live Migration. Since RoCE uses Ethernet as the data link layer and QEMU already supports Live Migration for emulated Ethernet devices, the project will concentrate on passing the device state from source to destination and on the protocol mentioned above.


Links:

Details:

  • Skill level: intermediate
  • Language: C
  • Mentor: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Yuval Shaia <yuval.shaia@oracle.com>

Project idea template

=== TITLE ===
 
 '''Summary:''' Short description of the project
 
 Detailed description of the project.
 
 '''Links:'''
 * Wiki links to relevant material
 * External links to mailing lists or web sites
 
 '''Details:'''
 * Skill level: beginner or intermediate or advanced
 * Language: C
 * Mentor: Email address and IRC nick
 * Suggested by: Person who suggested the idea

How to propose a custom project idea

Applicants are welcome to propose their own project ideas. The process is as follows:

  1. Email your project idea to qemu-devel@nongnu.org. CC Stefan Hajnoczi <stefanha@gmail.com> and regular QEMU contributors who you think might be interested in mentoring.
  2. If a mentor is willing to take on the project idea, work with them to fill out the "Project idea template" above and email Stefan Hajnoczi <stefanha@gmail.com>.
  3. Stefan will add the project idea to the wiki.

Note that other candidates can apply for newly added project ideas. This ensures that custom project ideas are fair and open.

How to get familiar with our software

See what people are developing and talking about on the mailing lists:

Grab the source code or browse it:

Build QEMU and run it: QEMU on Linux Hosts

Important links

Information for mentors

Mentors are responsible for keeping in touch with their student and assessing the student's progress. GSoC has a mid-term evaluation and a final evaluation where both the mentor and student assess each other.

The mentor typically gives advice, reviews the student's code, and has regular communication with the student to ensure progress is being made.

Being a mentor is a significant time commitment, plan for 5 hours per week. Make sure you can make this commitment because backing out during the summer will affect the student's experience.

The mentor chooses their student by reviewing student application forms and conducting IRC interviews with candidates. Depending on the number of candidates, this can be time-consuming in itself. Choosing the right student is critical so that both the mentor and the student can have a successful experience.