Features/CPUModels: Difference between revisions

= OUTDATED PAGE =

'''THIS PAGE IS OUTDATED. It included information about the plans for CPU model interfaces a few years ago.'''

Some history can be read at: https://habkost.net/posts/2017/03/qemu-cpu-model-probing-story.html

= Summary =

'''Presentation about the CPU model work on DevConf 2014:''' [[File:Cpu-models-and-libvirt-devconf-2014.pdf]]

This page was about the feature of "externally-configurable" CPU models, but its scope was gradually changed to discussion about the design of the CPU code, the CPU model system. The old "cpudef" config section was deprecated, so the original description doesn't apply anymore.

= Owner =

* '''Name:''' [[User:Ehabkost|Eduardo Habkost]]
* '''Email:''' ehabkost@redhat.com

= Roadmap =
 
* Allow changing of Hypervisor CPUIDs (Don Slutz)
 
== Already done ==
 
=== QEMU 2.9 ===
 
* CPU model probing was implemented through the <tt>query-cpu-model-*</tt> QMP commands
 
=== After QEMU 1.5 ===
 
* x86 CPU properties (Igor Mammedov). '''DONE'''
* machine-friendly error reporting of -cpu enforce/check. '''OBSOLETED'''
** Obsoleted by <tt>query-cpu-model-expansion</tt> QMP commands and <tt>filtered-features</tt> property
* x86 CPU model subclasses. '''DONE'''
 
=== QEMU 1.5 ===
 
* CPU feature words refactor
* (equivalent to) machine-friendly reporting of -cpu enforce/check
** Actually, the new mechanism is based on the "filtered-features" X86CPU property
* Probing for CPU features that are supported by the host and can be enabled
** Using "-cpu host" and the "feature-words" property
* Probing for the features that are actually enabled on each CPU model
** Using "feature-words" and "filtered-features" property
 
=== QEMU 1.4 ===
 
* Make CPU a subclass of DeviceState (included)
* APIC-ID-related topology fixes (ehabkost) (RFC submitted)
* Fixes for -cpu enforce flag
 
=== Before QEMU 1.4 ===
 
* Drop "-cpu ?dump" (Peter Maydell)
* Move CPU models to C code (ehabkost)
* Eliminate cpudef config section support (ehabkost)
* "unduplicate feature names" series (ehabkost)
* -cpu host use GET_SUPPORTED_CPUID (ehabkost)
* add feature flag name list for CPUID 7
 
= Interfaces/requirements for libvirt =
 
== Ensuring predictable set of guest features ==
 
Requirement: libvirt needs to ensure all features required on the command-line are present and exposed to the guest.
 
Current problem: libvirt doesn't use the "enforce" flag so it can't guarantee that a given feature will be actually exposed to the guest.
 
Old solution: use the "enforce" flag on the "-cpu" option.
: Limitation: no proper machine-friendly interface to report which features are missing.
:: Workaround: See "querying for host capabilities" below.
 
New solution in 1.5: check if "filtered-features" property on CPU object is all zeroes.
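
The "all zeroes" check can be sketched as follows. This is a minimal illustration, assuming the property value arrives as a list of feature-word entries that each carry an integer "features" bitmask; the sample values are made up and <code>all_features_exposed</code> is a hypothetical helper, not part of QEMU or libvirt.

```python
def all_features_exposed(filtered_features):
    """Return True when no feature word reports filtered-out bits."""
    return all(word.get("features", 0) == 0 for word in filtered_features)


# Hypothetical "filtered-features" values, for illustration only.
clean = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0}]
filtered = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00800000}]

print(all_features_exposed(clean))
print(all_features_exposed(filtered))
```

Any non-zero word means at least one requested feature was dropped, which is the condition "enforce" would have treated as fatal.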
 
See Also: [[#Disabling_features_that_were_always_disabled_on_KVM|Disabling features that were always disabled on KVM]]
 
== Listing CPU models ==
 
Requirement: libvirt needs to know which CPU models are available to be used with the "-cpu" option.
 
Current solution: libvirt uses QMP <tt>query-cpu-definitions</tt> command.
: Limitation: needs a live QEMU process for the query.
: Limitation: it can only list CPU model names and nothing else. See "Getting information about CPU models" section.
 
Proposed solution (TODO): use QMP <tt>qom-list-types</tt> command.
: Dependency: X86CPU subclasses.
: Limitation: needs a live QEMU process for the query.
: Example: <code>{ "execute": "qom-list-types", "arguments": { "implements": "cpu", "abstract": false } }</code>
: Caveat: the CPU class name for <code>-cpu ''model''</code> will be in the format <code>''model''-''arch''-cpu</code> or <code>''model''-kvm-''arch''-cpu</code>.
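
A client would need to map the returned class names back to <code>-cpu</code> model names. The sketch below builds the command shown in the example and strips the two suffix forms from the caveat; <code>model_from_class_name</code> is a hypothetical helper and the class names used are illustrative.

```python
import json


def list_cpu_types_command():
    """Build the qom-list-types request from the example above."""
    return {"execute": "qom-list-types",
            "arguments": {"implements": "cpu", "abstract": False}}


def model_from_class_name(class_name, arch):
    """Strip the '-kvm-ARCH-cpu' or '-ARCH-cpu' suffix; None if neither matches."""
    # Check the longer KVM-specific suffix first so it is not mistaken
    # for part of the model name.
    for suffix in (f"-kvm-{arch}-cpu", f"-{arch}-cpu"):
        if class_name.endswith(suffix):
            return class_name[: -len(suffix)]
    return None


print(json.dumps(list_cpu_types_command()))
print(model_from_class_name("Nehalem-x86_64-cpu", "x86_64"))
```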
 
 
Requirements: CPU class/model list should not depend on any other command-line option (e.g. ''not'' depend on machine-type)
 
Unanswered question: we may have separate subclasses for KVM and TCG CPU models.
 
=== Future plans ===
 
It would be interesting to get rid of the requirement to start a live QEMU process just to list CPU models.
 
== Getting information about CPU models ==
 
Requirement: libvirt uses the predefined CPU models from QEMU, but it needs to be able to query for CPU model details, to find out how it can create a VM that matches what was requested by the user.
 
Current problem: libvirt has a copy of the CPU model definitions in its cpu_map.xml file, and the copy can get out of sync when CPU models in QEMU change. libvirt also assumes that the set of features on each model is always the same on all machine-types, which is not true.
: Benefits of changing: cpu_map.xml and QEMU won't need to match exactly, anymore. The definitions exposed by libvirt could be completely different from the definitions in QEMU, as long as libvirt probes for CPU model information and uses the right flags in the command-line to make QEMU expose what libvirt users expect.
 
Challenge: the resulting CPU features depend on many factors:
:* Chosen CPU model name (of course)
:* machine-type
:* Host CPU vendor (unless explicit "vendor" option is used)
:* <strike>accel=kvm option (CPU models are different in TCG and KVM models)</strike> (we are going to make TCG and KVM behave the same)
:* <strike>Host CPU capabilities</strike> (not valid anymore, as long as "enforce" is used)
:* <strike>Host kernel capabilities</strike> (not valid anymore, as long as "enforce" is used)
:* <strike>kernel-irqchip option</strike> (not valid anymore, as long as "enforce" is used)
 
:Proposed Solution (TODO): start a paused VM with no devices, but with the right machine-type and right CPU model. Use QMP QOM commands to query for CPU flags (especially the properties starting with the "f-" prefix).
:: Dependency: X86CPU feature properties ("f-*" properties).
:: Limitation: requires a live QEMU process with the right machine-type/CPU-model to be started, to make the query.
:: Limitation: requires starting a new QEMU process for each machine-type/CPU-model pair that is going to be queried.
:Alternative solution: "feature-words" property
 
Problem: <code>qemu -machine ''machine'' -cpu ''model''</code> will create CPU objects where the CPU features are ''already'' filtered based on host capabilities.
:* Using "enforce" wouldn't solve it, because then QEMU would abort, and QMP would be unavailable.
:* Using "check" wouldn't solve it either, because the features are always filtered out when the CPU is created.
:: Solution: "filtered-features" property
 
Requirement: the resulting CPU features for a given host-CPU-vendor + machine-type + CPU-model combination '''must not''' ever change, on any future QEMU version.
: This should allow libvirt to safely cache CPU model data, even if the QEMU binary changes.
 
Requirement: libvirt needs to know if a specific CPU model can be used in the current host.
: See "Ensuring predictable set of guest features" above
: See "Querying host capabilities" below
 
Solution in 1.5: "feature-words" and "filtered-features" X86CPU properties
:Note: libvirt must combine both properties to find out the full CPU model definition. "feature-words" will always be filtered out based on host capabilities
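
Combining the two properties amounts to OR-ing the bitmasks of matching feature words. A minimal sketch, assuming each entry identifies its word by "cpuid-input-eax", an optional "cpuid-input-ecx" and "cpuid-register", and carries an integer "features" mask; the function name and the sample data are hypothetical.

```python
def full_model_definition(feature_words, filtered_features):
    """OR together enabled and filtered-out bits for each CPUID feature word."""
    def key(word):
        return (word["cpuid-input-eax"],
                word.get("cpuid-input-ecx"),
                word["cpuid-register"])

    combined = {}
    for word in list(feature_words) + list(filtered_features):
        combined[key(word)] = combined.get(key(word), 0) | word["features"]
    return combined


# Illustrative values: one bit enabled, one bit filtered out by the host.
enabled = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00000001}]
dropped = [{"cpuid-input-eax": 1, "cpuid-register": "ECX", "features": 0x00800000}]
print(full_model_definition(enabled, dropped))
```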
 
== Querying host capabilities ==
 
Requirement: libvirt needs to know which features can really be enabled, before it tries to start a VM, and before it tries to start a live-migration process.
 
The set of available capabilities depends on:
* Host CPU (hardware) capabilities;
* Kernel capabilities (reported by GET_SUPPORTED_CPUID);
* QEMU capabilities;
* Specific configuration options (e.g. in-kernel IRQ chip is required for some features).
 
Current problem: libvirt uses the CPUID instruction directly and assumes that the presence of a feature in the host CPU means it can be enabled and exposed to the guest. This breaks when virtualization of a feature requires:
* Additional hardware support (e.g. INVPCID);
* Additional host kernel code (this applies to _all_ CPU features, that need to be reported as supported by GET_SUPPORTED_CPUID);
* Additional QEMU-side code;
* Specific configuration options
** kernel-irqchip (affects tsc-deadline and x2apic availability)
** machine-type
** NOTE: any other option that affects CPU feature availability, MUST:
*** have defaults depending on machine-type, so libvirt versions that don't know about the new option will still work because they already check machine-type
*** be documented as affecting availability of CPU features, so once libvirt starts setting the option explicitly, it will take it into account when probing for host capabilities
 
 
Challenge: QEMU doesn't have a generic capability-querying interface, and host capability querying depends on KVM to be initialized.
: Workaround: start a paused VM using the "host" CPU model, which has every single CPU feature supported by the host enabled by default, and query for the information about the CPU through QMP, using the QOM commands.
 
Solution available in 1.5: start a paused VM with no devices with the "host" CPU model and check the "feature-words" property of the X86CPU object
: Expectation: "filtered-features" should be always all-zeroes when using "-cpu host". If it is not, it is a QEMU bug
:: Problem: libvirt shouldn't be running QEMU multiple times on initialization, for every QEMU binary. libvirt runs QEMU once, already, but when running it, it doesn't know if KVM (and the "host" CPU model) is going to be available, and it is run using "-machine none".
::: Proposed solution: we should make classes for each CPU model, so that libvirt could start QEMU using "-machine none" and create a new "host-x86-cpu" object via QMP.
:::: Requirement: "device_add host-x86-cpu" should work even if using "-machine none"
:::: Requirement: "device_add host-x86-cpu" should make the "feature-words" property (and the future "f-*" properties) be filled correctly.
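
The 1.5 probing flow can be sketched as a paused QEMU started with the "host" model plus two <code>qom-get</code> requests. The command line and the QOM path below are assumptions chosen for illustration; the actual path of the CPU object depends on the QEMU version and machine.

```python
# Assumed invocation: paused (-S), no default devices, QMP on stdio.
PROBE_ARGV = ["qemu-system-x86_64", "-S", "-nodefaults", "-display", "none",
              "-machine", "accel=kvm", "-cpu", "host", "-qmp", "stdio"]


def qom_get(path, prop):
    """Build a qom-get request for one property of one QOM object."""
    return {"execute": "qom-get", "arguments": {"path": path, "property": prop}}


def host_probe_commands(cpu_path):
    """Requests needed to read what the host can enable and what was dropped."""
    return [qom_get(cpu_path, "feature-words"),
            qom_get(cpu_path, "filtered-features")]


# Hypothetical QOM path of the first CPU object.
for command in host_probe_commands("/machine/unattached/device[0]"):
    print(command)
```

With "-cpu host", the second reply is expected to be all zeroes, as noted above.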
 
Proposed solution (TODO): start a paused VM with no devices but with "host" CPU model and use QMP QOM commands to query for "f-*" feature properties
: Dependency: X86CPU feature properties
 
 
== Getting level/xlevel/xlevel2 set properly ==
 
Fact: libvirt sometimes adds features based on host capabilities, and this often generates "-cpu ExistingModel,+feature,+feature2,+feature3" command-line options.
: Problem: sometimes using "+feature" won't work if other fields need to be set for the feature to work.
:: Proposed solution: "level" and "xlevel" should be increased automatically if a feature requires it to be set to a higher value, unless it has an override value set on the command-line.
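
The proposed rule can be sketched as "take the maximum unless the user overrode the value". The function and its arguments are hypothetical; the only hardware fact assumed is that features reported in CPUID leaf 7 are only visible when "level" is at least 7.

```python
def effective_level(model_level, required_levels, override=None):
    """Raise the CPUID level to satisfy enabled features, unless overridden."""
    if override is not None:
        # An explicit level=... on the command line always wins.
        return override
    return max([model_level, *required_levels])


# A model defined with level=4 gets a leaf-7 feature added.
print(effective_level(4, [7]))
print(effective_level(4, [7], override=4))
```

The same rule would apply to "xlevel" with the extended leaf numbers.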
 
 
== Disabling features that were always disabled on KVM ==
 
Challenge: existing configurations may be already broken (people may be using a CPU model, getting some features filtered out silently, and not want their existing configuration to break).
 
: Example: the "monitor" feature was never supported by KVM, but it is included in many CPU models.
:: Proposed solution: If libvirt wants to keep existing VMs using (e.g.) "core2duo" working without breaking guest ABI, it will need to use "-cpu core2duo,-monitor".
::: Note: Ignoring "monitor" when checking the "filtered-features" property won't be enough, because newer kernels may really support the "monitor" flag, and in those cases, I assume we want to keep it disabled to maintain guest ABI.
 
: Example 2: the "rdtscp" flag
:: Fact: on AMD hosts, exposing rdtscp was never supported by KVM
:: Fact: TCG supports rdtscp, so the AMD CPU models do have rdtscp enabled in QEMU
::: Assumption: we don't want CPU model definitions to look different in KVM and TCG mode, to keep the rules of the QEMU<->libvirt interfaces simpler
:: Fact: currently libvirt runs CPU models having rdtscp without the "enforce" flag, and rdtscp is silently disabled
::: Consequence: libvirt SHOULD use something like "-cpu Opteron_G5,-rdtscp", especially when it starts using (or emulating) enforce mode
::: This will require a solution on the libvirt side. QEMU will just provide the mechanisms to report CPU model information and check what the host and QEMU support, but the decision to disable rdtscp to be able to run Opteron_G[2345] needs to be taken by libvirt.
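
On the libvirt side, this boils down to generating a <code>-cpu</code> argument with explicit "-feature" entries. A minimal sketch, with <code>cpu_option</code> as a hypothetical helper:

```python
def cpu_option(model, enable=(), disable=()):
    """Build the value of -cpu from a model name and explicit feature tweaks."""
    parts = [model]
    parts += [f"+{name}" for name in enable]
    parts += [f"-{name}" for name in disable]
    return ",".join(parts)


print(cpu_option("core2duo", disable=["monitor"]))
print(cpu_option("Opteron_G5", disable=["rdtscp"]))
```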
 
= Solved challenges =
 
== Allowing CPU models to be updated ==
 
We need a mechanism to allow the existing CPU models on Qemu to be updated, without making guest-visible changes for existing Virtual Machines, when migrating to a new version.
 
=== Examples ===

Examples where CPU model updates are necessary and have to be deployed to users:

* The Nehalem CPU model currently has the wrong "level" value, making CPU topology information unavailable.
* The CPUID PMU leaf was added on Qemu 1.1, but it is not supposed to be visible to guests running using -M pc-1.0
* New features are implemented by KVM and we may want to add them to existing models (e.g. SandyBridge may need to have tsc-deadline added)

=== Requirements ===

* A different CPU will be visible to the guest depending on the machine-type chosen.
** That means that "-M pc-1.0 -cpu Nehalem" will be different from "-M pc-1.1 -cpu Nehalem"
** Rationale:
*** The meaning of "-M pc-1.0 -cpu Nehalem" can't be changed or it will change existing guests
*** The meaning of "-M pc-1.1 -cpu Nehalem" needs to be different from the pc-1.0 one, otherwise we would be stuck with a broken "Nehalem" model forever
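
The requirement can be pictured as a base model definition plus per-machine-type compatibility overrides. This is only a sketch of the idea; the "level" values below are placeholders, not the actual values used by any QEMU version.

```python
# Placeholder values for illustration only.
BASE_MODELS = {"Nehalem": {"level": 4}}
COMPAT = {"pc-1.0": {"Nehalem": {"level": 2}}}


def resolve_model(machine, model):
    """Apply machine-type compatibility overrides on top of the current model."""
    props = dict(BASE_MODELS[model])
    props.update(COMPAT.get(machine, {}).get(model, {}))
    return props


print(resolve_model("pc-1.0", "Nehalem"))
print(resolve_model("pc-1.1", "Nehalem"))
```

Old machine-types keep seeing the old definition, while new machine-types pick up the fix.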


=== Status/solution ===

* CPU model definitions were moved to C code, so we can easily add compatibility code to them if necessary
* CPUs are now DeviceState objects
* CPU models will become separate classes, so per-CPU-model compatibility properties can be used on machine-type definitions

== <code>-cpu host</code> and feature probing ==

See http://article.gmane.org/gmane.comp.emulators.kvm.devel/90035


== <code>-cpu host</code> vs <code>-cpu best</code> ==

Currently we have <code>-cpu host</code>, but the naming and semantics are unclear.

We have 3 possible modes of "try to get the best CPU model":

# '''all-you-can-enable''': Enable every single bit that can be enabled, including the ones ''not present on the host'' but that can be emulated.
# '''match-host-CPU''': Enable all bits ''that are present in the host CPU'' that can be enabled.
# '''best-predefined-model''': Use the best CPU model available from the pre-defined CPU model list.

=== Status ===

* <code>-cpu host</code> will be the "all-you-can-enable" mode, that will enable every bit from GET_SUPPORTED_CPUID on the VCPU
* We're not going to have a mode for ''match-host-CPU'', probably
* A "best-predefined-model" mode can be implemented by libvirt.
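
The difference between the three modes can be sketched with sets of feature names. Here "supported" stands for everything that can be enabled (including emulated bits) and "host_cpu" for the bits the host CPU itself reports; ranking predefined models by feature count is an assumption made for the example, and all names and data are hypothetical.

```python
def all_you_can_enable(supported):
    """Every bit that can be enabled, emulated or not."""
    return set(supported)


def match_host_cpu(supported, host_cpu):
    """Only the bits that are both present on the host CPU and can be enabled."""
    return set(supported) & set(host_cpu)


def best_predefined_model(supported, models):
    """Name of the usable predefined model with the most features, or None."""
    usable = {name: feats for name, feats in models.items()
              if set(feats) <= set(supported)}
    if not usable:
        return None
    return max(usable, key=lambda name: len(usable[name]))


supported = {"sse2", "popcnt", "x2apic"}   # x2apic assumed to be emulated
host_cpu = {"sse2", "popcnt", "monitor"}   # monitor assumed not virtualizable
models = {"small": {"sse2"}, "big": {"sse2", "popcnt"}, "toobig": {"sse2", "monitor"}}

print(sorted(match_host_cpu(supported, host_cpu)))
print(best_predefined_model(supported, models))
```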


== Moving CPU model definitions to C code ==

The old "cpudef" config section was deprecated because there are expectations that QEMU is going to provide the CPU model list, and will keep migration compatibility using machine-types. Machine-type compatibility code is inside QEMU C code, so making external config files depend on and/or be affected by internal QEMU C code would be confusing and fragile. Now both CPU model definitions and per-machine-type CPU-model compatibility code are inside the QEMU C code.

== check/enforce flags ==

The pseudo CPUID flag 'check' when appearing in the command line
feature flag list will warn when feature flags (either implicit
in a cpu model or explicit on the command line) would have
otherwise been quietly unavailable to a guest:

    # qemu-system-x86_64 ... -cpu Nehalem,check
    warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]
    warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]

A similar 'enforce' pseudo flag exists which in addition
to the above causes qemu to error exit if requested flags are
unavailable.

Latest revision as of 13:50, 30 October 2017

OUTDATED PAGE

THIS PAGE IS OUTDATED. It included information about the plans for CPU model interfaces a few years ago.

Some history can be read at: https://habkost.net/posts/2017/03/qemu-cpu-model-probing-story.html

Summary

Presentation about the CPU model work on DevConf 2014: File:Cpu-models-and-libvirt-devconf-2014.pdf

This page was about the feature of "externally-configurable" CPU models, but its scope was gradually changed to discussion about the design of the CPU code, the CPU model system. The old "cpudef" config section was deprecated, so the original description doesn't apply anymore.

Owner

Roadmap

  • Allow changing of Hypervisor CPUIDs (Don Slutz)

Already done

QEMU 2.9

  • CPU model probing was implemented through the query-cpu-model-* QMP commands

After QEMU 1.5

  • x86 CPU properties (Igor Mammedov). DONE
  • machine-friendly error reporting of -cpu enforce/check. OBSOLETED
    • Obsoleted by query-cpu-model-expansion QMP commands and filtered-features property
  • x86 CPU model subclasses. DONE

QEMU 1.5

  • CPU feature words refactor
  • (equivalent to) machine-friendly reporting of -cpu enforce/check
    • Actually, the new mechanism is based on the "filtered-features" X86CPU property
  • Probing for CPU features supported by the host and can be enabled
    • Using "-cpu host" and the "feature-words" property
  • Probing for the features that are actually enabled on each CPU model
    • Using "feature-words" and "filtered-features" property

QEMU 1.4

  • Make CPU a subclass of DeviceState (included)
  • APIC-ID-related topology fixes (ehabkost) (RFC submitted)
  • Fixes for -cpu enforce flag

Before QEMU 1.4

  • Drop "-cpu ?dump" (Peter Maydell)
  • Move CPU models to C code (ehabkost)
  • Eliminate cpudef config section support (ehabkost)
  • "unduplicate feature names" series (ehabkost)
  • -cpu host use GET_SUPPORTED_CPUID (ehabkost)
  • add feature flag name list for CPUID 7

Interfaces/requirements for libvirt

Ensuring predictable set of guest features

Requirement: libvirt needs to ensure all features required on the command-line are present and exposed to the guest.

Current problem: libvirt doesn't use the "enforce" flag so it can't guarantee that a given feature will be actually exposed to the guest.

Old solution: use the "enforce" flag on the "-cpu" option.

Limitation: no proper machine-friendly interface to report which features are missing.
Workaround: See "querying for host capabilities" below.

New solution in 1.5: check if "filtered-features" property on CPU object is all zeroes.

See Also: Disabling features that were always disabled on KVM

Listing CPU models

Requirement: libvirt needs to know which CPU models are available to be used with the "-cpu" option.

Current solution: libvirt uses QMP query-cpu-definitions command.

Limitation: needs a live QEMU process for the query.
Limitation: it can only list CPU model names and nothing else. See "Getting information about CPU models" section.

Proposed solution (TODO): use QMP qom-list-types command.

Dependency: X86CPU subclasses.
Limitation: needs a live QEMU process for the query.
Example: { "execute": "qom-list-types", "arguments": { "implements": "cpu", "abstract": false } }
Caveat: the CPU class name for -cpu model will in the format model-arch-cpu or model-kvm-arch-cpu.


Requirements: CPU class/model list should not depend on any other command-line option (e.g. not depend on machine-type)

Unanswered question: we may have separated subclasses for KVM and TCG CPU models.

Future plans

Would be interesting to get rid of the requirement for a live QEMU process to be started, just to list CPU models?

Getting information about CPU models

Requirement: libvirt uses the predefined CPU models from QEMU, but it needs to be able to query for CPU model details, to find out how it can create a VM that matches what was requested by the user.

Current problem: libvirt has a copy of the CPU model definitions on its cpu_map.xml file, and the copy can be out of sync in case CPU models in QEMU change. libvirt also assumes that the set of features on each model is always the same on all machine-types, which is not true.

Benefits of changing: cpu_map.xml and QEMU won't need to match exactly, anymore. The definitions exposed by libvirt could be completely different from the definitions in QEMU, as long as libvirt probes for CPU model information and uses the right flags in the command-line to make QEMU expose what libvirt users expect.

Challenge: the resulting CPU features depend on many factors:

  • Chosen CPU model name (of course)
  • machine-type
  • Host CPU vendor (unless explicit "vendor" option is used)
  • accel=kvm option (CPU models are different in TCG and KVM models) (we are going to make TCG and KVM behave the same)
  • Host CPU capabilities (not valid anymore, as long as "enforce" is used)
  • Host kernel capabilities (not valid anymore, as long as "enforce" is used)
  • kernel-irqchip option (not valid anymore, as long as "enforce" is used)
Proposed Solution (TODO): start a paused VM with no devices, but with the right machine-type and right CPU model. Use QMP QOM commands to query for CPU flags (especially the properties starting with the "f-" prefix).
Dependency: X86CPU feature properties ("f-*" properties).
Limitation: requires a live QEMU process with the right machine-type/CPU-model to be started, to make the query.
Limitation: requires starting a new QEMU process for each machine-type/CPU-model pair that is going to be queried.
Alternative solution: "feature-words" property

Problem: qemu -machine machine -cpu model will create CPU objects where the CPU features are already filtered based on host capabilities.

  • Using "enforce" wouldn't solve it, because then QEMU would abort, and QMP would be unavailable.
  • Using "check" wouldn't solve it either, because the features are always filtered out when the CPU is created.
Solution: "filtered-features" property

Requirement: the resulting CPU features for a given host-CPU-vendor + machine-type + CPU-model combination must not ever change, on any future QEMU version.

This should allow libvirt to safely cache CPU model data, even if the QEMU binary changes.

Requirement: libvirt needs to know if a specific CPU model can be used in the current host.

See "Ensuring predictable set of guest features" above
See "Querying host capabilities" below

Solution in 1.5: "feature-words" and "filtered-features" X86CPU properties

Note: libvirt must combine both properties to find out the full CPU model definition. "feature-words" will always be filtered out based on host capabilities

Querying host capabilities

Requirement: libvirt needs to know which feature can really be enabled, before it tries to start a VM, and before it tries to start a live-migration process.

The set of available capabilities depend on:

  • Host CPU (hardware) capabilities;
  • Kernel capabilities (reported by GET_SUPPORTED_CPUID);
  • QEMU capabilities;
  • Specific configuration options (e.g. in-kernel IRQ chip is required for some features).

Current problem: libvirt uses the CPUID intruction directly and assumes that the presence of a feature in the host CPU means it can be enabled and exposed to the guest. This breaks when virtualization of a feature requires:

  • Additional hardware support (e.g. INVPCID);
  • Additional host kernel code (this applies to _all_ CPU features, that need to be reported as supported by GET_SUPPORTED_CPUID);
  • Additional QEMU-side code;
  • Specific configuration options
    • kernel-irqchip (affects tsc-deadline and x2apic availability)
    • machine-type
    • NOTE: any other option that affects CPU feature availability, MUST:
      • have defaults depending on machine-type, so libvirt versions that don't know about the new option will still work because they already check machine-type
      • be documented as affecting availability of CPU features, so once libvirt starts setting the option explicitly, it will take it into account when probing for host capabilities


Challenge: QEMU doesn't have a generic capability-querying interface, and host capability querying depends on KVM to be initialized.

Workaround: start a paused VM using the "host" CPU model, that has every single CPU feature supported by the host enabled by default, and query for the information about the CPU though QMP, using the QOM commands.

Solution available in 15: start a paused VM with no devices with the "host" CPU model and check the "feature-words" property of the X86CPU object

Expectation: "filtered-features" should be always all-zeroes when using "-cpu host". If it is not, it is a QEMU bug
Problem: libvirt shouldn't be running QEMU multiple times on initialization, for every QEMU binary. libvirt runs QEMU once, already, but when running it, it doesn't know if KVM (and the "host" CPU model) is going to be available, and it is run using "-machine none".
Proposed solution: we should make classes for each CPU model, libvirt could start using "-machine none" and create a new "host-x86-cpu" object via QMP.
Requirement: "device_add host-x86-cpu" should work even if using "-machine none"
Requirement: "device_add host-x86-cpu" should make the "feature-words" property (and the future "f-*" properties) be filled correctly.

Proposed solution (TODO): start a paused VM with no devices but with "host" CPU model and use QMP QOM commands to query for "f-*" feature properties

Dependency: X86CPU feature properties


Getting level/xlevel/xlevel2 set properly

Fact: libvirt sometimes adds features based on host capabilities, and this often generates "-cpu ExistingModel,+feature,+feature2,+feature3" command-line options.

Problem: sometimes using "+feature" won't work if other fields need to be set for the feature to work.
Proposed solution: "level" and "xlevel" should be increased automatically if a feature requires it to be set to a higher value, unless it has an override value set on the command-line.


Disabling features that were always disabled on KVM

Challenge: existing configurations may be already broken (people may be using a CPU model, getting some features filtered out silently, and not want their existing configuration to break).

Example: the "monitor" feature was never supported by KVM, but it is included in many CPU models.
Proposed solution: If libvirt wants to keep existing VMs using (e.g.) "core2duo" working and not break guest ABI, it will need to use "-cpu core2duo,-monitor", to keep guest ABI.
Note: Ignoring "monitor" when checking the "filtered-features" property won't be enough, because newer kernels may really support the "monitor" flag, and on those cases, I assume we want to keep it disabled to maintain guest ABI.
Example 2: the "rdtscp" flag
Fact: on AMD hosts, exposing rdtscp was never supported by KVM
Fact: TCG supports rdtscp, so the AMD CPU models do have rdtscp enabled in QEMU
Assumption: we don't want CPU model definitions to look different in KVM and TCG mode, to keep the rules of the QEMU<->libvirt interfaces simpler
Fact: currently libvirt runs CPU models having rdtscp without the "enforce" flag, and rdtscp is silently disabled
Consequence: libvirt SHOULD use something like "-cpu Opteron_G5,-rdtscp", especially when it starts using (or emulating) enforce mode
This will require a solution on the libvirt side. QEMU will just provide the mechanisms to report CPU model information and to check what the host and QEMU support, but the decision to disable rdtscp in order to run Opteron_G[2345] needs to be taken by libvirt.
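
A minimal sketch of what such a management-side decision could look like is shown below. The ABI_DISABLED table and both function names are hypothetical; the only things taken from the text above are the two resulting "-cpu" strings.

```python
def cpu_option(model, enable=(), disable=()):
    """Build the value of a -cpu option from a model name and feature lists."""
    parts = [model]
    parts += ["+" + feature for feature in enable]
    parts += ["-" + feature for feature in disable]
    return ",".join(parts)


# Hypothetical management-side table: features that were always filtered out
# under KVM and must stay disabled to preserve the guest ABI.
ABI_DISABLED = {
    "core2duo": ["monitor"],
    "Opteron_G5": ["rdtscp"],
}


def stable_cpu_option(model):
    """Return the -cpu value that keeps the historical guest-visible CPU."""
    return cpu_option(model, disable=ABI_DISABLED.get(model, []))


if __name__ == "__main__":
    print(stable_cpu_option("core2duo"))
    print(stable_cpu_option("Opteron_G5"))
```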

Solved challenges

Allowing CPU models to be updated

We need a mechanism to allow the existing CPU models in QEMU to be updated without making guest-visible changes to existing virtual machines when migrating to a new version.

Examples

Examples where CPU model updates are necessary and have to be deployed to users:

  • The Nehalem CPU model currently has the wrong "level" value, making CPU topology information unavailable.
  • The CPUID PMU leaf was added in QEMU 1.1, but it is not supposed to be visible to guests running with -M pc-1.0
  • New features are implemented by KVM and we may want to add them to existing models (e.g. SandyBridge may need to have tsc-deadline added)

Requirements

  • A different CPU will be visible to the guest depending on the machine-type chosen.
    • That means that "-M pc-1.0 -cpu Nehalem" will be different from "-M pc-1.1 -cpu Nehalem"
    • Rationale:
      • The meaning of "-M pc-1.0 -cpu Nehalem" can't be changed or it will change existing guests
      • The meaning of "-M pc-1.1 -cpu Nehalem" needs to be different from the pc-1.0 one, otherwise we would be stuck with a broken "Nehalem" model forever

Status/solution

  • CPU model definitions were moved to C code, so we can easily add compatibility code to them if necessary
  • CPUs are now DeviceState objects
  • CPU models will become separate classes, so per-CPU-model compatibility properties can be used on machine-type definitions
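
The idea of per-machine-type compatibility properties can be sketched as a lookup that starts from the current model definition and applies the overrides registered for an older machine type. This is an illustration only: the table layout, the resolve function and the concrete "level" values are assumptions chosen for the example, not the values QEMU uses.

```python
# Hypothetical current definition of the CPU model (values are illustrative).
CPU_MODELS = {
    "Nehalem": {"level": 4},
}

# Hypothetical compat table: each older machine type lists the properties
# that must keep their old values so existing guests see the same CPU.
COMPAT_PROPS = {
    "pc-1.0": {"Nehalem": {"level": 2}},
    "pc-1.1": {},
}


def resolve(machine, model):
    """Return the guest-visible properties of `model` on `machine`."""
    props = dict(CPU_MODELS[model])
    props.update(COMPAT_PROPS.get(machine, {}).get(model, {}))
    return props


if __name__ == "__main__":
    print(resolve("pc-1.0", "Nehalem"))  # old behavior preserved
    print(resolve("pc-1.1", "Nehalem"))  # fixed definition
```

Keeping the overrides next to the machine type, rather than inside the model, means a fix to a model only needs one new entry for each older machine type.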

-cpu host and feature probing

See http://article.gmane.org/gmane.comp.emulators.kvm.devel/90035

-cpu host vs -cpu best

Currently we have -cpu host, but the naming and semantics are unclear.

We have 3 possible modes of "try to get the best CPU model":

  1. all-you-can-enable: Enable every single bit that can be enabled, including the ones not present on the host but that can be emulated.
  2. match-host-CPU: Enable all bits that are present in the host CPU that can be enabled.
  3. best-predefined-model: Use the best CPU model available from the pre-defined CPU model list.

Status

  • -cpu host will be the "all-you-can-enable" mode, which enables on the VCPU every bit reported by GET_SUPPORTED_CPUID
  • We will probably not have a mode for match-host-CPU
  • A "best-predefined-model" mode can be implemented by libvirt.
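
The three modes can be compared with plain set operations. In the sketch below, "supported" stands for the features that can be enabled (what GET_SUPPORTED_CPUID would report) and "host" for the features of the physical CPU. The function names, the toy feature sets and the rule that "best" means the last runnable entry of an ordered model list are all assumptions made for the example.

```python
def all_you_can_enable(supported):
    """Mode 1: every feature that can be enabled, emulated ones included."""
    return set(supported)


def match_host_cpu(host, supported):
    """Mode 2: features present on the host CPU that can also be enabled."""
    return set(host) & set(supported)


def best_predefined_model(models, supported):
    """Mode 3: the last model (oldest to newest) that is fully runnable.

    `models` is a list of (name, features) pairs; returns None if no model
    has all of its features supported.
    """
    usable = [name for name, features in models if set(features) <= set(supported)]
    return usable[-1] if usable else None


if __name__ == "__main__":
    host = {"sse2", "sse4.2", "popcnt"}
    supported = host | {"x2apic"}  # x2apic assumed emulated, not on the host
    models = [
        ("ModelA", {"sse2"}),
        ("ModelB", {"sse2", "sse4.2", "popcnt"}),
        ("ModelC", {"sse2", "sse4.2", "popcnt", "avx"}),
    ]
    print(sorted(all_you_can_enable(supported)))
    print(sorted(match_host_cpu(host, supported)))
    print(best_predefined_model(models, supported))
```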

Moving CPU model definitions to C code

The old "cpudef" config section was deprecated because there are expectations that QEMU is going to provide the CPU model list, and will keep migration compatibility using machine-types. Machine-type compatibility code is inside QEMU C code, so making external config files depend on and/or be affected by internal QEMU C code would be confusing and fragile. Now both CPU model definitions and per-machine-type CPU-model compatibility code are inside the QEMU C code.

check/enforce flags

When the pseudo CPUID flag 'check' appears in the command-line feature flag list, QEMU warns about feature flags (either implicit in a CPU model or explicit on the command line) that would otherwise have been silently unavailable to the guest:

   # qemu-system-x86_64 ... -cpu Nehalem,check
   warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]
   warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]

A similar 'enforce' pseudo flag exists which, in addition to the above, causes QEMU to exit with an error if requested flags are unavailable.
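
The difference between the default behavior, 'check' and 'enforce' can be sketched as below. This is an illustration of the semantics only, not QEMU's implementation: the function, the exception class and the simplified warning text are invented for the example.

```python
import sys


class MissingFeatures(Exception):
    """Raised in 'enforce' mode when the host lacks requested flags."""


def filter_features(requested, host, mode=None):
    """Return the features the guest actually gets.

    mode=None      : missing flags are dropped silently
    mode='check'   : missing flags are dropped, with a warning for each
    mode='enforce' : warnings are printed, then an error is raised
    """
    missing = sorted(set(requested) - set(host))
    if mode in ("check", "enforce"):
        for flag in missing:
            print(f"warning: host lacks requested flag '{flag}'", file=sys.stderr)
    if mode == "enforce" and missing:
        raise MissingFeatures(", ".join(missing))
    return set(requested) & set(host)


if __name__ == "__main__":
    requested = {"sse2", "sse4.2", "popcnt"}
    host = {"sse2"}
    print(sorted(filter_features(requested, host, mode="check")))
```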