Revision as of 18:45, 20 March 2012

Summary

This set of features provides a framework that allows CPU model definitions to be configured, in contrast to the existing scheme where they are hard-coded within qemu. The original motivation was to support contemporary processor architectures directly and intuitively, rather than resorting to "-cpu qemu64" augmented with a series of model-specific feature flags.

Other considerations were to provide model names that reflect current processors, to identify meaningful functional groups within the architecture spectrum in order to facilitate guest migration, and to allow more accurate and enforceable CPU feature specification by the user.

Owner

Name: Eduardo Habkost
Email: ehabkost@redhat.com

Detailed Summary

This functionality deprecates the prior hard-wired definitions in favor of a configuration file approach for new models. Existing hard-wired models remain for now but are likely to be moved to the configuration file representation. In the meantime, a hard-wired model may be overridden by an identically named model definition in the configuration file.

Proposed new model definitions are provided here for current AMD and Intel processors. Each model consists of a name used to select it on the command line [-cpu <name>], and a model_id which by convention corresponds to a least common denominator commercial instance of the processor class. The following describes how the added CPU model functionality is visible to the command line user.

A table of names/model_ids of all registered CPU definitions may be queried via "-cpu ?model":

       :
   x86       Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)          
   x86       Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)          
   x86       Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)           
   x86          Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)       
   x86           Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)    
   x86           Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)
       :

Also added is "-cpu ?dump" which exhaustively outputs all config data for all defined models, and "-cpu ?cpuid" which enumerates all qemu recognized CPUID feature flags.

When the pseudo CPUID flag 'check' appears in the command-line feature flag list, qemu warns about feature flags (either implicit in a CPU model or explicit on the command line) that would otherwise have been silently unavailable to the guest:

   # qemu-system-x86_64 ... -cpu Nehalem,check
   warning: host cpuid 0000_0001 lacks requested flag 'sse4.2|sse4_2' [0x00100000]
   warning: host cpuid 0000_0001 lacks requested flag 'popcnt' [0x00800000]

A similar 'enforce' pseudo flag emits the same warnings and, in addition, causes qemu to exit with an error if any requested flag is unavailable.
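The behaviour of 'check' and 'enforce' can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not qemu's implementation: ECX_FLAGS is a small hand-picked subset of the CPUID function 0000_0001 ECX bits, missing_flags and apply_policy are hypothetical helper names, and the guest-visible mask is assumed to be the requested bits that the host also has.

```python
# Illustrative subset of CPUID function 0000_0001 ECX feature bits.
# Each bit lists its accepted spellings; the table is NOT exhaustive.
ECX_FLAGS = {
    0: ("pni", "sse3"),
    13: ("cx16",),
    20: ("sse4.2", "sse4_2"),
    23: ("popcnt",),
}


def missing_flags(requested, host):
    """Return (name, mask) for each known requested bit the host lacks."""
    missing = requested & ~host
    result = []
    for bit, names in sorted(ECX_FLAGS.items()):
        mask = 1 << bit
        if missing & mask:
            result.append(("|".join(names), mask))
    return result


def apply_policy(requested, host, check=False, enforce=False):
    """Hypothetical helper: warn for 'check', also fail for 'enforce'.

    Returns the feature mask the guest would see (assumption: the
    requested bits that the host can actually provide).
    """
    missing = missing_flags(requested, host)
    if check or enforce:
        for name, mask in missing:
            print(f"warning: host cpuid 0000_0001 lacks requested flag "
                  f"'{name}' [0x{mask:08x}]")
    if enforce and missing:
        raise RuntimeError("requested CPU flags unavailable on host")
    return requested & host


# A model asking for sse3, cx16, sse4.2 and popcnt on a host with only
# sse3 and cx16 produces the two warnings shown above.
requested = (1 << 0) | (1 << 13) | (1 << 20) | (1 << 23)
host = (1 << 0) | (1 << 13)
guest_mask = apply_policy(requested, host, check=True)
```

With check=True the call only warns and returns the reduced mask; with enforce=True the same call raises instead of letting the guest start with fewer features than requested.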

Configuration data for a cpu model resides in the target config file which by default will be installed as:

   /usr/local/etc/qemu/target-<arch>.conf

The format of this file should be self-explanatory given the definitions for the above six models; it essentially mimics the structure of the existing static x86_def_t x86_defs. The CPU model groupings and definitions provided by the default configuration file are believed to be accurate and applicable to the majority of use cases, but because they live in a configuration file they may be modified to support alternate schemes.

Encoding of CPUID flag names now allows aliases for both the configuration file and the command line which reconciles some Intel/AMD/Linux/Qemu naming differences. An exhaustive dump of CPUID flag names may be obtained via "-cpu ?cpuid".

Configuration File Format

Each CPU definition accepts the following attributes, best illustrated by an example:

   [cpudef]
       name = "Opteron_G3"
       level = "5"
       vendor = "AuthenticAMD"
       family = "15"
       model = "6"
       stepping = "1"
       feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu mtrr clflush mca pse36"
       feature_ecx = "sse3 cx16 monitor popcnt"
       extfeature_edx = "fxsr mmx pat cmov pge apic cx8 mce pae msr tsc pse de fpu lm syscall nx rdtscp"
       extfeature_ecx = "svm sse4a abm misalignsse lahf_lm"
       xlevel = "0x80000008"
       model_id = "AMD Opteron 23xx (Gen 3 Class Opteron)"

Where:

[cpudef] -- flags a definition block
name -- tag used to identify a model on the command line
level -- largest standard function supported
vendor -- 12 byte vendor ID
family -- family code
model -- model code
stepping -- production revision
feature_edx -- CPUID function 0000_0001 returned register EDX content (CPUID feature flags)
feature_ecx -- CPUID function 0000_0001 returned register ECX content (CPUID feature flags)
extfeature_edx -- CPUID function 8000_0001 returned register EDX content (CPUID feature flags)
extfeature_ecx -- CPUID function 8000_0001 returned register ECX content (CPUID feature flags)
xlevel -- largest extended function supported
model_id -- model identification string
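To make the format concrete, the following Python sketch reads [cpudef] blocks into dictionaries. It is a minimal illustration and not qemu's own parser: parse_cpudefs is a hypothetical helper, the sample is abbreviated from the Opteron_G3 definition above, and treating lines that start with "#" as comments is an assumption.

```python
SAMPLE = '''
[cpudef]
    name = "Opteron_G3"
    level = "5"
    vendor = "AuthenticAMD"
    family = "15"
    model = "6"
    stepping = "1"
    feature_ecx = "sse3 cx16 monitor popcnt"
    xlevel = "0x80000008"
    model_id = "AMD Opteron 23xx (Gen 3 Class Opteron)"
'''


def parse_cpudefs(text):
    """Parse '[cpudef]' blocks into a list of dicts, one per block."""
    models = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comments
        if line == "[cpudef]":
            current = {}
            models.append(current)
            continue
        if current is None or "=" not in line:
            continue  # ignore anything outside a cpudef block
        key, _, value = line.partition("=")
        # Values are quoted strings; drop the surrounding quotes.
        current[key.strip()] = value.strip().strip('"')
    return models


models = parse_cpudefs(SAMPLE)
opteron = models[0]
# Numeric fields are stored as strings; base 0 accepts decimal and 0x hex.
xlevel = int(opteron["xlevel"], 0)
# Feature lists are whitespace-separated flag names.
ecx_flags = opteron["feature_ecx"].split()
```

Every value is kept as a string until it is needed, which mirrors the quoted form used in the file and leaves the choice of numeric base to the consumer.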

Status

This functionality is available in qemu version 0.13.

At the time this documentation was written, a change to the configuration file syntax had been proposed that would have a minor impact on the current structure of the CPU model configuration file.

Current Issues and proposed changes

Allowing CPU models to be updated

We need a mechanism that allows the existing CPU models in Qemu to be updated without making guest-visible changes to existing virtual machines when they migrate to a new version.

Examples

Examples where CPU model updates are necessary and have to be deployed to users:

The Nehalem CPU model currently has the wrong "level" value, making CPU topology information unavailable.
The CPUID PMU leaf was added in Qemu 1.1, but it is not supposed to be visible to guests run with -M pc-1.0.
New features are implemented by KVM, and we may want to add them to existing models (e.g. SandyBridge may need to have tsc-deadline added).

Requirements

A different CPU will be visible to the guest depending on the machine-type chosen.
- That means that "-M pc-1.0 -cpu Nehalem" will be different from "-M pc-1.1 -cpu Nehalem"
- Rationale:
  - The meaning of "-M pc-1.0 -cpu Nehalem" can't be changed or it will change existing guests
  - The meaning of "-M pc-1.1 -cpu Nehalem" needs to be different from the pc-1.0 one, otherwise we would be stuck with a broken "Nehalem" model forever

Current design proposal

Default CPU models will be shipped in /usr/share instead of /etc, so they can be updated on upgrades.
Default CPU models will be versioned (e.g. Nehalem-1.0, Nehalem-1.1).
Per-machine-type aliases will be set, so "Nehalem" will be an alias for Nehalem-1.0 on pc-1.0, and an alias for Nehalem-1.1 on pc-1.1.
CPU models from the configuration file won't be loaded if -nodefconfig is used, so libvirt will have to use something like: "-nodefconfig -readconfig /usr/share/qemu/cpudefs-x86_64.conf"

 * Mailing list reference: http://marc.info/?l=qemu-devel&m=133165046801195
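The per-machine-type alias idea can be pictured as a lookup keyed on both the machine type and the unversioned name. The Python sketch below is hypothetical: CPU_ALIASES and resolve_cpu_model are illustrative names, and the proposal does not specify how qemu would store or resolve the aliases.

```python
# Hypothetical alias table: (machine type, alias) -> versioned model name.
CPU_ALIASES = {
    ("pc-1.0", "Nehalem"): "Nehalem-1.0",
    ("pc-1.1", "Nehalem"): "Nehalem-1.1",
}


def resolve_cpu_model(machine, cpu):
    """Map an unversioned name to the version pinned for this machine type.

    Names without an alias entry (including already versioned names)
    are returned unchanged.
    """
    return CPU_ALIASES.get((machine, cpu), cpu)


old_guest = resolve_cpu_model("pc-1.0", "Nehalem")
new_guest = resolve_cpu_model("pc-1.1", "Nehalem")
```

Because the machine type is part of the key, a guest started with "-M pc-1.0 -cpu Nehalem" keeps seeing the same CPU after an upgrade, while new machine types pick up the corrected model.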

Probing of CPU model details

libvirt does not write CPU definitions from scratch, so it will reuse the CPU models from /usr/share. It does, however, need to probe for the details of those CPU models.

Requirements

A detailed probing system, similar to "-cpu ?dump", but in a more extensible and machine-friendly format.

Current design proposal

-query-capabilities RFC series from Anthony
- Message-Id: <1332169763-30665-9-git-send-email-aliguori@us.ibm.com>
To be defined: Command to list CPU models, and the features they include/exclude
To be defined: Command to list the resulting CPUID bits of CPU models
- Do we need a version that works before the VM is created? Some bits depend on other Virtual Machine parameters (e.g. SMP configuration), so it won't list every single bit
- Do we need a version that works after the VM is created, so it lists every single bit? (maybe just the "query-cpus" QMP command is already enough for that)
To be defined: Command to list CPU model aliases (per-machine-type)

Improve cpudef format and semantics

Currently the cpudef sections expose too many low-level details. For example:

Instead of offering a simple "enable this feature" interface, the format requires the user to know which CPUID leaf/register exposes the feature (feature_edx, feature_ecx, etc.).
There's no mechanism to enable/disable specific CPUID leaves (e.g. the PMU leaf is not configurable).
The "level" field is too low-level. We need either:
- An "auto" mode that simply sets "level" depending on the set of required CPUID leaves; and/or
- A validation mode, where "level" can't be set too low, in case a required/requested CPUID leaf needs it
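Both options for "level" reduce to comparing it against the highest required standard leaf. The Python sketch below illustrates this under assumptions: auto_level and validate_level are hypothetical names, the set of required leaves is supplied by the caller, and the PMU leaf is taken to be CPUID function 0000_000A.

```python
PMU_LEAF = 0x0A  # assumed number of the CPUID PMU leaf


def auto_level(required_leaves, minimum=1):
    """'auto' mode: the lowest level that exposes every required leaf."""
    return max([minimum, *required_leaves])


def validate_level(level, required_leaves):
    """Validation mode: reject a level that hides a required leaf."""
    hidden = sorted(leaf for leaf in required_leaves if leaf > level)
    if hidden:
        names = ", ".join(f"0x{leaf:x}" for leaf in hidden)
        raise ValueError(f"level {level} is too low for leaf(s) {names}")
    return level


# A model that needs leaf 4 and the PMU leaf gets level 0xA in auto mode.
level = auto_level([0x04, PMU_LEAF])
```

The two modes are complementary: auto mode chooses a value when the user gives none, and validation mode catches a user-supplied value that would silently hide a requested leaf.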

Asymmetry between [cpudef] sections and -cpu options

Currently there are two ways to change the CPU configuration: cpudef sections in the config file and -cpu options. They use different syntaxes and differ in expressive power. We should use the same system for both, probably with an inheritance system that allows a "[cpu]" section to inherit settings from a "[cpudef]" and extend it in some way (adding and removing features, overriding specific fields, etc.).
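One way to picture such an inheritance system is a derived definition that copies a base one, then adds features, removes features, and overrides fields. The Python sketch below is purely hypothetical: derive_cpu, the flat "features" set, and the example values are illustrative and do not describe an existing or agreed qemu interface.

```python
def derive_cpu(base, add=(), remove=(), **overrides):
    """Return a new definition inheriting from 'base' (left unmodified)."""
    derived = dict(base)
    features = set(base.get("features", ()))
    features |= set(add)      # features enabled on top of the base model
    features -= set(remove)   # features disabled relative to the base model
    derived["features"] = features
    derived.update(overrides)  # field overrides, e.g. level or model_id
    return derived


# Hypothetical base definition with a flat feature set.
base = {
    "name": "Opteron_G3",
    "level": 5,
    "features": {"sse3", "cx16", "monitor", "popcnt"},
}

custom = derive_cpu(base, add={"svm"}, remove={"monitor"}, name="MyOpteron")
```

The same operation would cover both cases described above: a "-cpu" option becomes an anonymous derivation of a named model, and a config-file section becomes a named one.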
