Features/tcg-multithread

From QEMU

Revision as of 11:34, 27 July 2016

MultiThreaded support in the TCG

This is a work in progress. The most tested combination is an ARMv7 guest running on an x86 backend; however, the general patches run for all architectures, depending on what the test case is doing. For full support, each front end (guest) and back end (TCG host) needs to be converted to have solutions for:

  • Atomic Instructions
  • Memory Coherence (honouring barriers)

The intention is to support all combinations where they make sense. See the bottom of the page for links, recent discussions and code.

Overview

QEMU can currently emulate a number of CPUs in parallel, but it does so in a single thread. Given that many modern hosts are multi-core, and many targets equally use multiple cores, a significant performance advantage can be realised by using multiple host threads to simulate multiple target cores.
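A minimal sketch of the basic idea, assuming POSIX threads: instead of one host thread round-robining over every vCPU, each emulated core gets its own host thread running its own execution loop. `VCPU`, `vcpu_thread_fn` and `run_vcpus` are hypothetical names for illustration, not QEMU's actual structures, and the "execution" is just a counter standing in for running translated code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-vCPU state; real emulator state is far larger. */
typedef struct {
    int index;               /* which emulated core this is */
    unsigned long insns_run; /* stand-in for guest work done */
    unsigned long budget;    /* how many "instructions" to run */
} VCPU;

/* Each host thread runs the execution loop for exactly one vCPU. */
static void *vcpu_thread_fn(void *arg)
{
    VCPU *cpu = arg;
    while (cpu->insns_run < cpu->budget) {
        /* Placeholder for "find the next TB and execute it". */
        cpu->insns_run++;
    }
    return NULL;
}

/* Start one host thread per vCPU and wait for all of them.
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_vcpus(VCPU *cpus, pthread_t *threads, size_t n)
{
    size_t started = 0;
    int rc = 0;
    for (; started < n; started++) {
        if (pthread_create(&threads[started], NULL, vcpu_thread_fn,
                           &cpus[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    return rc;
}
```

Here the vCPUs share no state, so no locking is needed; the hard problems listed on this page appear as soon as they share guest memory, TBs and TLBs.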

There was a talk at KVM Forum 2015 (video, slides) which acts as a useful primer. General thread safety for system-emulation TCG builds on the work already done for linux-user emulation. Indeed, some of this work has already been merged and is making a difference to the linux-user code. The main focus is on whole-system emulation.

The latest design document was posted to the list in June 2016. The current work in progress can be found in Alex's Git tree.

Already Merged Work

  • Atomic patching of TranslationBlocks
  • Refactoring of the main cpu_exec loop
  • QHT-based lookups of the next TB
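The point of atomically patching TranslationBlocks is that one vCPU can chain or unchain a jump while another vCPU is executing the same code, and the executing vCPU must see either the old target or the new one, never a torn value. A minimal sketch with C11 atomics, assuming the jump target is a single pointer-sized slot; `TB`, `tb_set_jmp_target` and `tb_next` are hypothetical names, and real TCG patches branches in generated host code rather than a C pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily simplified translation block. */
typedef struct TB {
    unsigned long guest_pc;          /* guest address this block translates */
    _Atomic(struct TB *) jmp_target; /* chained successor, or NULL */
} TB;

/* Writer side: publish a new successor with a single atomic store, so a
 * concurrent reader sees either the old pointer or the new one. Release
 * ordering makes the successor's contents visible before the pointer. */
static void tb_set_jmp_target(TB *tb, TB *next)
{
    atomic_store_explicit(&tb->jmp_target, next, memory_order_release);
}

/* Reader side (the executing vCPU): follow the chain if one is present. */
static TB *tb_next(TB *tb)
{
    return atomic_load_explicit(&tb->jmp_target, memory_order_acquire);
}
```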

Ready to Merge

  • Lockless hot path in cpu_exec (built on QHT; in Paolo's tree for post-2.7)
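A lockless hot path means that readers look up the next TB without taking a lock, while the rarer writers still serialise among themselves. One common way to do this is a sequence counter (seqlock): readers simply retry if a writer was active. The sketch below is a single-bucket illustration under that assumption, with hypothetical names (`Bucket`, `bucket_write`, `bucket_lookup`); it is not QHT's actual code, and it assumes writers are already serialised by some external lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One hash bucket holding a single key/value pair (key 0 means empty). */
typedef struct {
    atomic_uint seq;      /* even: stable, odd: write in progress */
    atomic_uintptr_t key;
    atomic_uintptr_t val;
} Bucket;

/* Writer: the caller must hold a lock that serialises writers. */
static void bucket_write(Bucket *b, uintptr_t key, uintptr_t val)
{
    unsigned s = atomic_load_explicit(&b->seq, memory_order_relaxed);
    atomic_store_explicit(&b->seq, s + 1, memory_order_relaxed); /* busy */
    atomic_thread_fence(memory_order_release); /* seq update before data */
    atomic_store_explicit(&b->key, key, memory_order_relaxed);
    atomic_store_explicit(&b->val, val, memory_order_relaxed);
    atomic_store_explicit(&b->seq, s + 2, memory_order_release); /* stable */
}

/* Reader: never blocks; retries if it raced with a writer. */
static bool bucket_lookup(Bucket *b, uintptr_t key, uintptr_t *val_out)
{
    for (;;) {
        unsigned s0 = atomic_load_explicit(&b->seq, memory_order_acquire);
        if (s0 & 1) {
            continue; /* writer active, try again */
        }
        uintptr_t k = atomic_load_explicit(&b->key, memory_order_relaxed);
        uintptr_t v = atomic_load_explicit(&b->val, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* data before re-check */
        unsigned s1 = atomic_load_explicit(&b->seq, memory_order_relaxed);
        if (s0 == s1) { /* no writer intervened: the snapshot is consistent */
            if (k == key) {
                *val_out = v;
                return true;
            }
            return false;
        }
    }
}
```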

Plan and problems to solve

There are 3 main groups of problems and the additional work of enabling the various front and back ends.

General Thread Safety

These are covered by the current "Base enabling patches for MTTCG" (v3, WIP Branch). This is an architecture-independent patch series which allows you to run multi-threaded test programs as long as they don't make any assumptions about:

  • Atomicity
  • Memory consistency
  • Cache flush behaviour (v4 should fix cputlb)

In practice this means dedicated test programs; see Alex's kvm-unit-tests.
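As a hypothetical illustration of a test that does depend on atomicity (and so is not yet safe under the base patches), two threads increment a shared counter with C11 `atomic_fetch_add_explicit`. On a real host the result is always exactly 2 * ITERATIONS; under emulation it only is if the guest's atomic read-modify-write is translated into something that is atomic on the host. The names are illustrative and the program is not taken from kvm-unit-tests.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 100000

static atomic_long shared_counter;

/* Each worker performs ITERATIONS atomic increments. */
static void *increment_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Run two workers and return the final count (-1 if a thread fails to
 * start). With a truly atomic increment this is always 2 * ITERATIONS. */
static long run_counter_test(void)
{
    pthread_t t1, t2;
    atomic_store(&shared_counter, 0);
    if (pthread_create(&t1, NULL, increment_worker, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&t2, NULL, increment_worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&shared_counter);
}
```

Replacing the atomic add with a plain `counter++` would be a data race, and lost updates would then be expected once the threads really run in parallel.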

Memory consistency

This is a current 2016 GSoC project.

Host and guest might implement different memory consistency models. While supporting a weakly ordered guest on a strongly ordered backend isn't a problem, supporting a strongly ordered guest on a weakly ordered backend is going to be hard.
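One way to reason about this asymmetry is to describe each memory model by which orderings it preserves (load-load, load-store, store-load, store-store). The translator then has to add explicit barriers for exactly those orderings the guest relies on but the host does not provide. A minimal sketch of that bookkeeping, with hypothetical flag names and deliberately simplified models (x86 treated as preserving everything except store-load ordering, a weak host such as ARM treated as preserving none).

```c
/* Hypothetical ordering flags: each bit means "the model keeps this pair
 * of accesses in program order". */
enum {
    ORD_LD_LD = 1 << 0, /* load followed by load */
    ORD_LD_ST = 1 << 1, /* load followed by store */
    ORD_ST_LD = 1 << 2, /* store followed by load */
    ORD_ST_ST = 1 << 3, /* store followed by store */
    ORD_ALL   = ORD_LD_LD | ORD_LD_ST | ORD_ST_LD | ORD_ST_ST,
};

/* Simplified models for illustration only. */
#define MODEL_X86_LIKE (ORD_ALL & ~ORD_ST_LD) /* only store->load may reorder */
#define MODEL_WEAK     0                      /* any pair may reorder */

/* Orderings the guest expects that the host will not provide by itself;
 * a translator would have to emit barriers for these. */
static unsigned barriers_needed(unsigned guest_model, unsigned host_model)
{
    return guest_model & ~host_model;
}
```

A weak guest on an x86-like host needs no extra barriers, while an x86-like guest on a weak host needs barriers for three of the four orderings, which matches the asymmetry described above.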

  • Watch out for subtle differences; e.g. x86 is mostly strongly ordered, but a CPU's load can be reordered ahead of that same CPU's earlier store to a different address.
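The x86 case mentioned above is the classic "store buffering" litmus test: each thread stores to one variable and then loads the other. On x86 both loads may return 0, because each CPU's load can complete before its own earlier store becomes visible to the other CPU. A sketch in C11 atomics, assuming pthreads: with `memory_order_seq_cst` the outcome `r1 == 0 && r2 == 0` is forbidden, so a guest that relies on a full barrier at this point can be checked this way. All names are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2; /* read by the main thread only after pthread_join */

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Run the litmus test `iterations` times and count how often both loads
 * saw 0 (-1 if a thread fails to start). With seq_cst this must be 0. */
static long count_store_buffering(long iterations)
{
    long forbidden = 0;
    for (long i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        if (pthread_create(&a, NULL, thread_a, NULL) != 0) {
            return -1;
        }
        if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0) {
            forbidden++;
        }
    }
    return forbidden;
}
```

With `memory_order_relaxed` instead, the (0, 0) outcome becomes legal, although creating fresh threads each iteration makes it rare in practice.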

Instruction atomicity

There are a number of approaches being discussed on the list at the moment.
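To illustrate the kind of problem these approaches address: a guest load-linked/store-conditional pair (for example ARM's LDREX/STREX) has no direct equivalent on an x86 host. One possible approach is to emulate it with a host compare-and-swap: remember the value seen by the exclusive load, and let the exclusive store succeed only if memory still holds that value. The sketch below assumes that approach and uses hypothetical names; note that it cannot detect an A-B-A sequence (the value changed and then changed back), which real LL/SC hardware would reject.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive monitor state. */
typedef struct {
    _Atomic uint32_t *excl_addr; /* address of the last exclusive load */
    uint32_t excl_val;           /* value that load returned */
} ExclMonitor;

/* Guest load-exclusive: read the value and remember address and value. */
static uint32_t emu_load_exclusive(ExclMonitor *m, _Atomic uint32_t *addr)
{
    uint32_t v = atomic_load_explicit(addr, memory_order_acquire);
    m->excl_addr = addr;
    m->excl_val = v;
    return v;
}

/* Guest store-exclusive: succeed only if this is the monitored address and
 * memory still holds the remembered value. Returns 0 on success and 1 on
 * failure, mirroring the ARM STREX status convention. */
static int emu_store_exclusive(ExclMonitor *m, _Atomic uint32_t *addr,
                               uint32_t newval)
{
    if (m->excl_addr != addr) {
        return 1;
    }
    uint32_t expected = m->excl_val;
    m->excl_addr = NULL; /* the monitor is cleared whatever the outcome */
    bool ok = atomic_compare_exchange_strong_explicit(
        addr, &expected, newval, memory_order_acq_rel, memory_order_acquire);
    return ok ? 0 : 1;
}
```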

How to get involved

Right now, a small dedicated team is looking at this issue:

  • Fred Konrad (Core MTTCG patch set)
  • Alvise Rigo (LL/SC work)
  • Alex Bennée (Review, testing)
  • Mark Burton
  • Pavel Dovgalyuk

Mailing List

If you would like to be involved, please use the mailing list: mttcg@listserver.greensocs.com

You can subscribe here:

       http://listserver.greensocs.com/wws/info/mttcg

If you send to this mailing list, please make sure to copy qemu-devel as well.

There is a fortnightly phone conference, with summary notes posted to the mailing lists (archives).

Current Code

Remember these trees are WORK-IN-PROGRESS and could be broken at any particular point. Branches may be rebased without notice.

MTTCG Work:

LL/SC Work

MTTCG Test Cases:

These are tests specifically designed to exercise the code, based on kvm-unit-tests:

Other Work

This is the most important section initially, and we welcome any and all comments and other work. If you know of any patch sets that may be of value, PLEASE let us know via the qemu-devel mailing list.

Proof of concept implementations

Below are all the proof of concept implementations we have found thus far. It is highly likely that some of these patch sets can help us to reach an up-streamable solution. At the very least these provide some evidence that there is a performance improvement to be had.

Follow up work

There are some additional things that will need to be looked at for user-mode emulation.

Signal Handling

There are two types of signal we need to handle: synchronous (e.g. SIGBUS, SIGSEGV) and asynchronous (e.g. SIGSTOP, SIGINT, SIGUSR1). While any signal can be sent asynchronously, most of the common synchronous ones occur when there is an error in the translated code. As such, the code for rectifying machine state in that case is fairly well tested. For asynchronous signals there are a plethora of edge cases to deal with, especially around the handling of signals with respect to system calls. If they arrive during translated code their behaviour is fairly easy to handle; however, when they arrive in QEMU's own code, care has to be taken that syscalls respond correctly to EINTR.
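The usual pattern for the EINTR case is to retry a blocking system call that a signal handler interrupted before any data was transferred. A minimal POSIX sketch for `read(2)`; `read_retry` is a hypothetical helper name, and whether a given syscall should be restarted, rather than returning EINTR to the guest, depends on the guest's signal setup (e.g. `SA_RESTART`), which this sketch does not model.

```c
#include <errno.h>
#include <unistd.h>

/* Call read(2), retrying while it fails with EINTR (interrupted by a
 * signal before any data was read). Other errors are returned unchanged. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}
```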