Features/tcg-multithread
MultiThreaded support in the TCG
This is a work in progress. The most tested combination is an ARMv7 guest running on an x86 back end; the general patches, however, run on all architectures, depending on what the test case is doing. For full support, each Front End (guest) and Back End (TCG host) needs to be converted to provide solutions for:
- Atomic Instructions
- Memory Coherence (honouring barriers)
The intention is to support all combinations where they make sense. See the bottom of the page for links, recent discussions and code.
Overview
QEMU can currently emulate a number of CPUs in parallel, but it does so in a single thread. Given that many modern hosts are multi-core, and many targets equally use multiple cores, a significant performance advantage can be realised by using multiple host threads to simulate multiple target cores.
There was a talk at KVM Forum 2015 (video slides) which acts as a useful primer. The general thread safety for system-emulation TCG builds on the work already done for linux-user emulation. Indeed some of the work has already been merged and is making a difference to the linux-user code. The main focus is working on whole system emulation.
The last design document was posted to the list in June 2016. The current work in progress can be found in Alex's Git tree.
Already Merged Work
- Atomic patching of TranslationBlocks
- Re-factoring of main cpu_exec loop
- QHT based lookups of next TB
- Initial memory consistency support (GSoC 2016)
- Lockless hot-path in cpu_exec (builds on QHT)
- cpu-exec: Safe work in quiescent state (gives thread safe tb_flush)
Ready to Merge
- cmpxchg-based atomics
Plan and problems to solve
There are 3 main groups of problems and the additional work of enabling the various front and back ends.
General Thread Safety
These are covered by the current "Base enabling patches for MTTCG" (v3, WIP Branch). This is an architecture independent patch series which allows you to run multi-threaded test programs as long as they don't make any assumptions about:
- Atomicity
- Memory consistency
- Cache flush behaviour (v4 should fix cputlb)
This basically means dedicated test programs; see Alex's kvm-unit-tests.
Memory consistency
The host and guest might implement different memory consistency models. Supporting a weak ordering model on a strongly ordered back-end isn't a problem, because the host already enforces more ordering than the guest requires. Supporting strong ordering on a weakly ordered back-end is going to be hard.
- Remaining Case: strong on weak, e.g. emulating the x86 memory model on ARM systems
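To make the barrier problem concrete, here is a minimal sketch of how a guest barrier could be lowered to a host fence, written with C11 atomics. Everything named here (`GuestBarrier`, `host_fence_for`, `guest_producer`, `guest_consumer`) is hypothetical and for illustration only; it is not QEMU's TCG barrier encoding or API. The acquire/release mapping for load-load and store-store barriers is an assumption that mirrors common practice rather than something taken from the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest barrier kinds; not QEMU's actual barrier encoding. */
typedef enum {
    GUEST_BAR_LD_LD,  /* earlier loads ordered before later loads   */
    GUEST_BAR_ST_ST,  /* earlier stores ordered before later stores */
    GUEST_BAR_FULL    /* all earlier accesses before all later ones */
} GuestBarrier;

/* Lower a guest barrier to a host fence that is at least as strong. */
static void host_fence_for(GuestBarrier kind)
{
    switch (kind) {
    case GUEST_BAR_LD_LD:
        atomic_thread_fence(memory_order_acquire);
        break;
    case GUEST_BAR_ST_ST:
        atomic_thread_fence(memory_order_release);
        break;
    case GUEST_BAR_FULL:
    default:
        atomic_thread_fence(memory_order_seq_cst);
        break;
    }
}

/* Two words of "guest memory" used for a message-passing pattern. */
static _Atomic uint32_t guest_data;
static _Atomic uint32_t guest_flag;

/* Guest code: store data, store-store barrier, then publish the flag. */
static void guest_producer(uint32_t value)
{
    atomic_store_explicit(&guest_data, value, memory_order_relaxed);
    host_fence_for(GUEST_BAR_ST_ST);
    atomic_store_explicit(&guest_flag, 1, memory_order_relaxed);
}

/* Guest code: read the flag, load-load barrier, then read the data.
 * Returns 1 and fills *out if the flag was set, otherwise returns 0. */
static int guest_consumer(uint32_t *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&guest_flag, memory_order_relaxed) == 0) {
        return 0;
    }
    host_fence_for(GUEST_BAR_LD_LD);
    *out = atomic_load_explicit(&guest_data, memory_order_relaxed);
    return 1;
}
```

If the fences are dropped on a weakly ordered host, the consumer may see the flag set and still read stale data. The strong-on-weak case is harder because a guest such as x86 expects most of this ordering on ordinary loads and stores, without any explicit barrier instruction to translate.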
Instruction atomicity
There are a number of approaches being discussed on the list at the moment:
cmpxchg-based emulation of atomics
This work by Emilio Cota and Richard Henderson adds a number of atomic primitives which can be used in TCG code to emulate atomic instructions and paired load-link/store-conditional instructions.
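As a rough illustration of the general idea, a load-link/store-conditional pair can be approximated with a host compare-and-swap: the load records the address and the value it saw, and the store succeeds only if memory still holds that value. The sketch below uses C11 atomics. The names `VCPUState`, `emu_load_exclusive` and `emu_store_exclusive` are hypothetical and are not taken from the patch series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU exclusive monitor state. */
typedef struct {
    _Atomic uint32_t *excl_addr;  /* address of last load-exclusive, or NULL */
    uint32_t excl_val;            /* value observed by that load             */
} VCPUState;

/* Emulate a guest load-link: remember where we loaded from and what we saw. */
static uint32_t emu_load_exclusive(VCPUState *cpu, _Atomic uint32_t *addr)
{
    assert(cpu != NULL && addr != NULL);
    cpu->excl_addr = addr;
    cpu->excl_val = atomic_load(addr);
    return cpu->excl_val;
}

/* Emulate a guest store-conditional with a host compare-and-swap.
 * Returns 0 on success and 1 on failure (the ARM STREX convention). */
static int emu_store_exclusive(VCPUState *cpu, _Atomic uint32_t *addr,
                               uint32_t newval)
{
    bool ok = false;

    assert(cpu != NULL && addr != NULL);
    if (cpu->excl_addr == addr) {
        uint32_t expected = cpu->excl_val;
        /* Succeeds only if memory still holds the value we loaded. */
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    cpu->excl_addr = NULL;  /* the monitor is cleared either way */
    return ok ? 0 : 1;
}
```

A known limitation of this approach is the ABA case: if another vCPU changes the value and then restores it between the load and the store, the compare-and-swap still succeeds, whereas a real exclusive monitor would have failed the store.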
Slow path for atomic instruction emulation
This work by Alvise Rigo tweaks the SoftMMU emulation to trigger a slow path in contended cases.
Front-end and Back-end conversions
Each front end will need to be converted to use MTTCG-aware atomics and to instrument its barrier instructions.
Each back end will need to support the generation of new TCGOps required to support the front ends.
How to get involved
Right now, there is a small dedicated team looking at this issue. Those are:
- Alex Bennée (Review, testing, base enabling tree)
- Fred Konrad (Original core MTTCG patch set)
- Alvise Rigo (LL/SC work)
- Emilio Cota (QHT, cmpxchg atomics)
- Mark Burton
- Pavel Dovgalyuk
Mailing List
If you would like to be involved, please use the mailing list: mttcg@listserver.greensocs.com
You can subscribe here:
http://listserver.greensocs.com/wws/info/mttcg
If you send to this mailing list, please make sure to copy qemu-devel as well.
There is a phone conference once a fortnight, with summary notes posted to the mailing lists (archives).
Current Code
Remember these trees are WORK-IN-PROGRESS and could be broken at any particular point. Branches may be re-based without notice.
MTTCG Work:
- Latest Tree: https://github.com/stsquad/qemu (branch:mttcg/enable-mttcg-for-armv7-v1)
- Fred's Tree: http://git.greensocs.com/fkonrad/mttcg.git (branch:multi_tcg_v8)
LL/SC Work
- Alvise's Tree: https://git.virtualopensystems.com/dev/qemu-mt.git (branch:slowpath-for-atomic-v8-no-mttcg)
MTTCG Test Cases:
These are tests specifically designed to exercise the code, based on kvm-unit-tests:
Other Work
This is the most important section initially, and we welcome any and all comments and other work. If you know of any patch sets that may be of value, PLEASE let us know via the qemu-devel mailing list.
Proof of concept implementations
Below are all the proof of concept implementations we have found thus far. It is highly likely that some of these patch sets can help us to reach an up-streamable solution. At the very least these provide some evidence that there is a performance improvement to be had.