Features/tcg-multithread
Latest revision as of 10:14, 4 August 2021
This is the feature that allows the Tiny Code Generator (TCG) to run one host thread per guest thread or per guest vCPU (in system emulation mode). It was first introduced in QEMU 2.9 for Alpha and ARM. Work to enable full multi-threading support in additional system emulations is ongoing.
Overview
QEMU's system emulation mode could always emulate multiple vCPUs, but it scheduled them in a single thread and executed each one in turn in a round-robin fashion. Switching to one host thread per vCPU required a number of changes to the core code as well as explicit support in each guest architecture. The design decisions are documented in docs/devel/multi-thread-tcg.txt.
There was a talk at KVM Forum 2015 (video: https://www.youtube.com/watch?v=KnSW0WjWHZI, slides: http://www.linux-kvm.org/images/c/cf/02x02-Alex_Benee-Towards_Multithreaded_TCG.pdf) which is a little out of date but acts as a useful primer on the challenges involved.
Controlling MTTCG
Once MTTCG is supported for a guest architecture there should be no need to enable it explicitly. System emulation enables it by default when the following conditions are met:
- The guest architecture has defined TARGET_SUPPORTS_MTTCG
- The host architecture's TCG_TARGET_DEFAULT_MO supports TCG_GUEST_DEFAULT_MO (that is, the host provides every memory ordering the guest expects)
When this is not the case you can force MTTCG by specifying:
$QEMU $OPTS --accel tcg,thread=multi
although you are likely to get strange behaviour. If you suspect that guest emulation is incorrect you can revert to single-threaded mode and re-run your test:
$QEMU $OPTS --accel tcg,thread=single
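The memory-model condition for enabling MTTCG by default amounts to a subset test. The following is a minimal sketch, assuming each memory-model macro is a bitmask with one bit per ordering the architecture guarantees. The TOY_ names, bit values, and function names are hypothetical stand-ins for illustration, not QEMU's definitions.

```c
#include <stdbool.h>

/* Hypothetical ordering bits: "earlier access kind _ later access kind". */
enum {
    TOY_MO_LD_LD = 0x01,  /* loads stay ordered before later loads   */
    TOY_MO_ST_LD = 0x02,  /* stores stay ordered before later loads  */
    TOY_MO_LD_ST = 0x04,  /* loads stay ordered before later stores  */
    TOY_MO_ST_ST = 0x08,  /* stores stay ordered before later stores */
    TOY_MO_ALL   = 0x0F
};

/* The host "supports" the guest if it guarantees every ordering the
 * guest relies on, i.e. the guest's bits are a subset of the host's. */
static bool toy_mo_compatible(unsigned guest_mo, unsigned host_mo)
{
    return (guest_mo & ~host_mo) == 0;
}

/* Both conditions from the list above must hold for the default to be
 * multi-threaded; otherwise the user has to force thread=multi. */
static bool toy_mttcg_by_default(bool target_supports_mttcg,
                                 unsigned guest_mo, unsigned host_mo)
{
    return target_supports_mttcg && toy_mo_compatible(guest_mo, host_mo);
}
```

Under these assumptions a strongly ordered guest on a weakly ordered host fails the subset test, which is the strong-on-weak case that still needs work.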
Incompatibilities
MTTCG is not compatible with -icount; enabling icount forces a single-threaded run.
Developer Details
Porting a guest architecture
Before MTTCG can be enabled for a guest the following changes must be made.
- Correctly translate atomic/exclusive instructions (see the tcg_gen_atomic_* family of functions)
- Ensure the translation step correctly handles barrier instructions (tcg_gen_mb)
- Define TCG_GUEST_DEFAULT_MO
- Audit instructions that modify system state
  - generally this means taking the BQL (e.g. HELPER(set_cp_reg))
- Audit MMU management functions
  - cputlb provides an API for various tlb_flush_FOO operations
  - updates to the guest's page tables need to be atomic (e.g. dirty bits)
- Audit power/reset sequences
  - see for example target/arm/arm-powerctl.c
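To illustrate why page table updates such as dirty bits need to be atomic: another vCPU thread may rewrite the same entry between the emulated page walk and the write-back. A minimal sketch using C11 atomics follows, assuming a hypothetical 64-bit entry layout. The TOY_ bit positions and the function name are invented for illustration and do not match any particular guest architecture or QEMU helper.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page table entry bits. */
#define TOY_PTE_VALID (UINT64_C(1) << 0)
#define TOY_PTE_DIRTY (UINT64_C(1) << 6)

/*
 * Set the dirty bit only if the entry still holds the value seen during
 * the page walk. A plain read-modify-write could silently overwrite a
 * concurrent update made by another vCPU thread.
 *
 * Returns true if the dirty bit was written, false if the entry changed
 * underneath us and the caller should redo the walk.
 */
static bool toy_pte_set_dirty(_Atomic uint64_t *pte, uint64_t walked)
{
    uint64_t expected = walked;

    return atomic_compare_exchange_strong(pte, &expected,
                                          walked | TOY_PTE_DIRTY);
}
```

When the compare-and-swap fails, the entry is left exactly as the other thread wrote it, so the only cost of a race is a repeated walk.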
The work queue API async_[safe_]run_on_cpu provides a mechanism for one vCPU to queue work on another.
Once this work is done, your final patch can update configure to enable TARGET_SUPPORTS_MTTCG.
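The idea behind the work queue API mentioned above, where one vCPU queues work that the target vCPU later runs in its own thread, can be modelled with a small toy. This is only a sketch under stated assumptions and is not QEMU's implementation: every ToyCPU/toy_ name is hypothetical, the queue is a lock-free list built on C11 atomics, and the "safe" variant (which additionally waits until no other vCPU is executing) is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyCPU ToyCPU;
typedef void (*toy_work_fn)(ToyCPU *cpu, void *data);

typedef struct ToyWork {
    struct ToyWork *next;
    toy_work_fn fn;
    void *data;
} ToyWork;

struct ToyCPU {
    int index;
    _Atomic(ToyWork *) pending;  /* newest item first */
};

/* May be called from any thread: push an item onto the target's list.
 * Returns 0 on success, -1 if allocation fails. */
static int toy_async_run_on_cpu(ToyCPU *target, toy_work_fn fn, void *data)
{
    ToyWork *w = malloc(sizeof(*w));

    if (!w) {
        return -1;
    }
    w->fn = fn;
    w->data = data;
    w->next = atomic_load(&target->pending);
    /* On failure w->next is refreshed with the current head; retry. */
    while (!atomic_compare_exchange_weak(&target->pending, &w->next, w)) {
    }
    return 0;
}

/* Called only by the thread that owns the vCPU, between chunks of guest
 * code. Runs the queued items in submission order and returns how many. */
static int toy_process_queued_work(ToyCPU *self)
{
    ToyWork *w = atomic_exchange(&self->pending, NULL);
    ToyWork *fifo = NULL;
    int ran = 0;

    while (w) {                  /* reverse the list: oldest item first */
        ToyWork *next = w->next;
        w->next = fifo;
        fifo = w;
        w = next;
    }
    while (fifo) {
        ToyWork *next = fifo->next;
        fifo->fn(self, fifo->data);
        free(fifo);
        fifo = next;
        ran++;
    }
    return ran;
}

/* Example work item: record which vCPU executed it. */
static void toy_note_cpu(ToyCPU *cpu, void *data)
{
    *(int *)data = cpu->index;
}
```

The point of the pattern is that state private to a vCPU is only ever touched by its own thread; other threads merely post requests and the owner applies them at a point where it is not executing guest code.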
Testing
Ideally you'll want a comprehensive set of tests to exercise the corner cases of system emulation behaviour. See Alex's kvm-unit-tests (https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v7) for an example of how the ARM architecture is exercised.
Further Work
- Enabling strong-on-weak memory consistency (e.g. emulate x86 on an ARM host)
People
Now that MTTCG is merged it is supported by the TCG maintainers. However, the following people were involved:
- Fred Konrad (Original core MTTCG patch set)
- Alex Bennée (ARM testing, base enabling tree)
- Alvise Rigo (LL/SC work)
- Emilio Cota (QHT, cmpxchg atomics)
Other Reading
- Emilio's slides for his CGO17 paper - http://www.cs.columbia.edu/~cota/pubs/cota_cgo17-slides.pdf
- Cross-ISA Machine Emulation for Multicores - http://www.cs.columbia.edu/~cota/pubs/cota_cgo17.pdf DOI:10.1109/CGO.2017.7863741
- Cross-ISA Machine Instrumentation using Fast and Scalable Dynamic Binary Translation - https://dl.acm.org/doi/pdf/10.1145/3313808.3313811