= Documentation/9psetup =<br />
<br />
With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on the host machine is made directly accessible to a guest OS as a pass-through file system, using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest. If desired, such a share can even be accessed by several guests simultaneously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page focuses on user aspects like setting up 9pfs, configuration and performance tweaks. For the developer documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system on top of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36-rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
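<br />
You can verify on a running guest that these options are enabled, assuming your distribution ships the kernel configuration (paths may differ):<br />
<br />
grep -E 'CONFIG_NET_9P|CONFIG_9P_FS|CONFIG_VIRTIO_PCI' /boot/config-$(uname -r)<br />
<br />
or, if CONFIG_IKCONFIG_PROC is enabled in the guest kernel:<br />
<br />
zcat /proc/config.gz | grep -E 'CONFIG_NET_9P|CONFIG_9P_FS'<br />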
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step reports ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For Debian based systems install the packages ''libattr1'' & ''libattr1-dev'', and for RPM based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Set up the guest OS image and ensure the KVM modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest, add the following options to the QEMU command line to enable 9P sharing:<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>ID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following shortcut for the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
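<br />
For example, a minimal concrete invocation might look like this (the host path and mount tag are just example values to adapt):<br />
<br />
-fsdev local,id=fsdev0,path=/srv/export,security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,mount_tag=hostshare<br />
<br />
or, equivalently, using the shorthand form:<br />
<br />
-virtfs local,path=/srv/export,mount_tag=hostshare,security_model=mapped-xattr<br />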
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host (<b>recommended option</b>). <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade, not considered safe and has very poor performance. The "proxy" driver has not seen any development in years and will likely be removed in a future version of QEMU. <b>We recommend NOT using the "proxy" driver</b>. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. If the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies the identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. The security model is mandatory only for the "local" fsdriver; other fsdrivers (like "proxy") don't take a security model as parameter. The recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root and therefore using <b>"passthrough" security model is strongly discouraged, especially when running untrusted guests!</b><br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as extended attributes on the host. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported with this security model do not interoperate with other Unix tools on the host.<br />
# none: Same as "passthrough" except the server won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough-like security model usable for people who run KVM as non-root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket file for communicating with virtfs-proxy-helper.<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create a socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared with a 9p export, i.e. to avoid file ID collisions. Supported behaviours are either:<br />
# warn: This is the default behaviour, in which virtfs 9p expects only one device to be shared with the same export; if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by QEMU on host side.<br />
# remap: In order to avoid file ID collisions on guest you should either create a separate virtfs export for each device to be shared with guests (recommended way), or you might use "remap" instead, which allows you to share multiple devices with only one export by remapping the original inode numbers from host to guest in a way that prevents such collisions. Remapping inodes in such use cases is required because the original device IDs from host are never passed and exposed on guest. Instead all files of an export shared with virtfs always share the same device id on guest. So two files with identical inode numbers but from actually different devices on host would otherwise cause a file ID collision and hence potential misbehaviours on guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on guest. Note though that "forbid" currently does not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
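<br />
Putting several of these options together, a read-only export with inode remapping and explicit default modes could look like this (path, ID and mount tag are example values):<br />
<br />
-fsdev local,id=shared0,path=/srv/export,security_model=mapped-xattr,readonly,multidevs=remap,fmode=0644,dmode=0755 -device virtio-9p-pci,fsdev=shared0,mount_tag=export0<br />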
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to set up 9p sharing for guests:<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory contains the host path that is to be exported. The target directory should be filled with the mount tag for the device, which, despite its name, does not actually have to be a directory path - any string of 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
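<br />
For instance, with the placeholders above filled in (host path and mount tag are example values), the element could look like this:<br />
<br />
<filesystem type='mount' accessmode='mapped'><br />
<source dir='/srv/export'/><br />
<target dir='hostshare'/><br />
</filesystem><br />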
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified in Qemu commandline.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u.<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. By default it is 8 KiB (raised to 128 KiB with Linux kernel v5.15); see [[#msize|Performance Considerations (msize)]] below.<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does single attach and performs all operations as one user <br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
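<br />
Putting these options together, a typical mount command could look like this (mount tag, mount point and msize value are example choices; see the [[#msize|msize discussion]] below):<br />
<br />
mount -t 9p -o trans=virtio,version=9p2000.L,msize=524288 hostshare /mnt/host<br />
<br />
A corresponding /etc/fstab entry might look like this:<br />
<br />
hostshare  /mnt/host  9p  trans=virtio,version=9p2000.L,msize=524288,_netdev  0  0<br />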
<br />
<span id="security"></span><br />
== Security Considerations ==<br />
<br />
* Recommended is <b>'security_model=mapped'</b>. <b>Do not</b> use 'security_model=passthrough' as it requires QEMU to be run as root.<br />
* Recommended FSDRIVER is <b>'local'</b>. <b>Do not</b> use the 'proxy' driver, as it's in bad shape, hasn't seen any development in years, and will be removed in a future version of QEMU.<br />
* Keep in mind that an ordinary guest user can create arbitrarily many and arbitrarily large files and might therefore fill the entire shared partition. So if you are sharing a tree with an untrusted guest then you should (see the example below this list):<br />
** mount a <b>separate partition/data set</b> on host just for the shared tree<br />
** and/or <b>deploy quotas</b> to limit the <b>amount of data</b> AND the <b>amount of inodes</b> the guest is allowed to create<br />
* If you are sharing more than one file system, or if in doubt, use <b>'multidevs=remap'</b>. It adds some extra cycles for safely remapping inodes from host to guest to avoid potential file ID collisions on guest side, which could otherwise lead to nasty misbehaviours on guest side that are often not obvious to hunt down. This option also safely allows mounting more filesystems into the shared tree while the guest is still running.<br />
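<br />
A simple way to put a hard limit on both data and inodes is to back the shared tree by a dedicated filesystem, e.g. a fixed-size image file mounted via loopback (size and paths are example values):<br />
<br />
dd if=/dev/zero of=/var/lib/9p-share.img bs=1M count=4096<br />
mkfs.ext4 /var/lib/9p-share.img<br />
mount -o loop /var/lib/9p-share.img /srv/export<br />
<br />
The guest can then at worst fill this 4 GiB image, but not the host's root filesystem.<br />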
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for option "msize" on client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on client side. If you do not specify a value for "msize" with a Linux 9P client, the client falls back to its default value, which prior to Linux kernel v5.15 was only 8 kiB and resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on host side (i.e. a property invisible to the client), and then you still might want to trade off between performance gain and additional RAM costs, i.e. with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) will shrink continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case yourself. As a starting point, you might pick something between 10 MiB and 100 MiB (or more) for spindle based SATA storage, whereas for PCIe based flash storage you might pick several hundred MiB or more. Then create some large file on host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes reading the file on guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
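<br />
For example, a rough comparison can be scripted on the guest by remounting with different "msize" values and repeating the read test; dropping the guest's page cache in between avoids cached data skewing the result (mount tag, mount point and the list of values are example choices):<br />
<br />
for MSIZE in 131072 1048576 10485760 104857600; do<br />
    umount /mnt/host 2>/dev/null<br />
    mount -t 9p -o trans=virtio,version=9p2000.L,msize=$MSIZE hostshare /mnt/host<br />
    echo 3 > /proc/sys/vm/drop_caches<br />
    echo "msize=$MSIZE:"<br />
    time cat /mnt/host/test.dat > /dev/null<br />
done<br />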
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example, for i386-softmmu with debugging support, use<br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Burn a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debian system you can use the package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=mapped,multidevs=remap \<br />
-device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
<br />
We intentionally add no 'cache' option in this example to avoid confusion. You may add e.g. the cache=loose option to increase performance, however keep in mind that [https://lore.kernel.org/all/ZCHU6k56nF5849xj@bombadil.infradead.org/ currently none of the Linux 9p client's caching implementations <b>ever</b> revalidates file changes made on host side!] In other words: changes made on host side would (currently) never become visible on guest unless you remount or reboot the guest! This is currently in the works, and in a future Linux version caching is planned to be enabled by default once this issue has been addressed properly.<br />
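<br />
If that limitation is acceptable for your use case, the mount command from step 9 with caching enabled would, for instance, look like this:<br />
<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=loose<br />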
<br />
[[Category:User documentation]]
<br />
= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to work on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you should rather look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
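<br />
On a Linux guest the dialect is selected client-side via the "version" mount option, for example (mount tag and mount point are placeholders):<br />
<br />
mount -t 9p -o trans=virtio,version=9p2000.L MOUNT_TAG /mnt/point<br />
mount -t 9p -o trans=virtio,version=9p2000.u MOUNT_TAG /mnt/point<br />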
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a number of 9p client implementations for individual OSes; the most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base, which implements the raw 9p network protocol handling and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when one of the "mapped" security models of the "local" driver is used (as is usually the case on a production system), then the "local" driver pretends to the guest system that it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended), which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (the latter being always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-entered the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (potentially long running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want while still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lightweight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver, because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20230220100815.1624266-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between the 9p server's main thread and some worker thread, back and forth. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the amount of thread hops to a minimum; ideally a 9p request would be dispatched exactly once to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly once to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from the large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied for other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request was actually aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in the future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally used by systems to detect that. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping happens with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
** <b>Case-Insensitive FS</b>: On some host systems like e.g. macOS the host filesystem is usually case-insensitive. This information should be transmitted to guest to [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ adapt its behaviour] accordingly.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response, to avoid that overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and therefore would reduce latency accordingly.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better yet, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
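<br />
For example, to find the people to CC for a change touching the 9p server (the file path here is just an example):<br />
<br />
scripts/get_maintainer.pl -f hw/9pfs/9p.c<br />
<br />
or run it directly on your patch file:<br />
<br />
scripts/get_maintainer.pl 0001-my-9p-change.patch<br />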
<br />
Please post bugs and patches related to the Linux 9p client to the [https://github.com/v9fs/linux/issues v9fs Github page] instead.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=11572Documentation/9psetup2024-01-08T21:16:21Z<p>Schoenebeck: /* Example */ use security_model=mapped and multidevs=remap</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest, if desired even accessible, shared by several guests simultaniously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system ontop of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host (<b>recommended option</b>). <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade, not considered safe and has very poor performance. The "proxy" driver has not seen any development in years and will likely be removed in a future version of QEMU. <b>We recommend NOT using the "proxy" driver</b>. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. if the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. Security model is mandatory only for "local" fsdriver. Other fsdrivers (like "proxy") don't take security model as a parameter. Recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root and therefore using <b>"passthrough" security model is strongly discouraged, especially when running untrusted guests!</b><br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as file attributes. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported by this security model cannot interact with other unix tools.<br />
# none: Same as "passthrough" except the sever won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough like security model usable for people who run kvm as non root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket file for communicating with virtfs-proxy-helper<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared within a 9p export, i.e. how to avoid file ID collisions. Supported behaviours are:<br />
# warn: This is the default behaviour, in which virtfs 9p expects only one device to be shared with the same export; if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by QEMU on the host side.<br />
# remap: To avoid file ID collisions on the guest you should either create a separate virtfs export for each device to be shared with guests (the recommended way), or use "remap" instead, which allows you to share multiple devices with a single export by remapping the original inode numbers from host to guest in a way that prevents such collisions. Remapping inodes is required in this case because the original device IDs from the host are never passed to and exposed on the guest; instead, all files of an export shared with virtfs have the same device ID on the guest, so two files with identical inode numbers but from different devices on the host would otherwise cause a file ID collision and hence potential misbehaviour on the guest.<br />
# forbid: Like "warn", this assumes that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on the guest. Note though that "forbid" does not currently block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
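<br />
Putting the options above together, a typical invocation of a read-write share with the recommended settings might look like this (the host path, ID and mount tag are only placeholders):<br />
<br />
 qemu-system-x86_64 [...] \<br />
   -fsdev local,id=shared0,path=/srv/vmshare,security_model=mapped-xattr,multidevs=remap \<br />
   -device virtio-9p-pci,fsdev=shared0,mount_tag=hostshare<br />
<br />
or, using the -virtfs shorthand:<br />
<br />
 qemu-system-x86_64 [...] \<br />
   -virtfs local,path=/srv/vmshare,mount_tag=hostshare,security_model=mapped-xattr,multidevs=remap<br />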
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to set up 9p sharing for guests:<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory contains the host path that is to be exported. The target directory should be filled with the mount tag for the device, which, despite its name, does not actually have to be a directory path - any string of 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
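<br />
For example, a concrete (purely illustrative) definition exporting the host directory /srv/vmshare under the mount tag "hostshare" with the mapped security model would be:<br />
<br />
 <filesystem type='mount' accessmode='mapped'><br />
   <source dir='/srv/vmshare'/><br />
   <target dir='hostshare'/><br />
 </filesystem><br />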
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified on the QEMU command line.<br />
* mount point: Path to the mount point.<br />
* trans: Transport method (here virtio, for using 9P over virtio).<br />
* version: Protocol version. By default it is 9p2000.u.<br />
<br />
Other options that can be used include (see the combined example after this list):<br />
* msize: Maximum packet size including any headers. The default was only 8 KiB prior to Linux kernel v5.15 and is 128 KiB since; see the "Performance Considerations (msize)" section below.<br />
* access: The following access modes are available:<br />
# access=user : If a user tries to access a file on the v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : Only allows the user with uid=<uid> to access the files on the mounted filesystem.<br />
# access=any : v9fs does a single attach and performs all operations as one user.<br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
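<br />
Putting these options together, an illustrative guest-side mount command and a corresponding /etc/fstab entry (tag, mount point and msize are placeholders) could look like:<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L,msize=512000 hostshare /mnt/hostshare<br />
<br />
 # /etc/fstab<br />
 hostshare  /mnt/hostshare  9p  trans=virtio,version=9p2000.L,msize=512000  0  0<br />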
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for the "msize" option on the client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on the client side. If you do not specify a value for "msize" with a Linux 9P client, the client falls back to its default value, which prior to Linux kernel v5.15 was only 8 kiB and resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on the host side (i.e. a property invisible to the client). You might then still want to trade off between performance gain and additional RAM costs: with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) shrinks continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case. As a starting point, you might pick something between 10 MiB and a bit over 100 MiB for spindle-based SATA storage, whereas for PCIe-based flash storage you might pick several hundred MiB or more. Then create a large file on the host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes to read the file on the guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
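<br />
As a rough sketch, the measurement cycle could be scripted on the guest like this (the mount tag "test_mount", the mount point /mnt/9p and the candidate sizes are only examples):<br />
<br />
 # assumes /mnt/9p exists and test.dat was created on the host side of the share<br />
 for msize in 131072 1048576 10485760 104857600; do<br />
   mount -t 9p -o trans=virtio,version=9p2000.L,msize=$msize test_mount /mnt/9p<br />
   echo "msize=$msize:"<br />
   time cat /mnt/9p/test.dat > /dev/null<br />
   umount /mnt/9p<br />
 done<br />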
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example, for i386-softmmu with debugging support, use<br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
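<br />
On a current QEMU version you would typically build the 64-bit system emulation target instead; an equivalent, purely illustrative configure call would be:<br />
<br />
 ./configure --target-list=x86_64-softmmu --enable-kvm --enable-debug --prefix=/home/guest/9p_setup/qemu/<br />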
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Create a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debian system you can use the package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for Intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
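<br />
You can verify that the modules were loaded with, for example:<br />
<br />
 lsmod | grep kvm<br />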
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=mapped,multidevs=remap \<br />
-device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
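<br />
To check that the share is mounted as expected you can, for example, run on the guest:<br />
<br />
 mount | grep 9p<br />
 df -h /tmp/shared<br />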
<br />
We intentionally add no 'cache' option in this example to avoid confusion. You may add e.g. the cache=loose option to increase performance, however keep in mind that [https://lore.kernel.org/all/ZCHU6k56nF5849xj@bombadil.infradead.org/ currently none of the caching implementations of the Linux 9p client revalidate file changes made on the host side, <b>ever</b>!] In other words: changes made on the host side would (currently) never become visible on the guest unless you remount or reboot the guest! This is being worked on, and in a future Linux version caching is planned to be enabled by default once this issue has been addressed properly.<br />
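<br />
If you accept that limitation, an illustrative variant of the mount command above with caching enabled would be:<br />
<br />
 mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=loose<br />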
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/8.1&diff=11379ChangeLog/8.12023-07-07T09:45:01Z<p>Schoenebeck: /* New deprecated options and features */ -fsdev proxy and -virtfs proxy are deprecated</p>
<hr />
<div>== System emulation ==<br />
<br />
=== Removed features and incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features'] page for details of suggested replacement functionality.<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
* The "-singlestep" command line option is deprecated, as it was very misleadingly named. Its replacement is "-one-insn-per-tb" (for the user-mode emulator) or "-accel one-insn-per-tb=on" (for the system-mode emulator)<br />
* The "-fsdev proxy" and "-virtfs proxy" command line options are deprecated ([https://github.com/qemu/qemu/commit/71d72ececa086114df80fe4cc04d701b59002eb2 commit] / [https://qemu-project.gitlab.io/qemu/about/deprecated.html#fsdev-proxy-and-virtfs-proxy-since-8-1 notes]).<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* KVM VMs on a host which supports MTE (the Memory Tagging Extension) can now use MTE in the guest<br />
* Pointer-authentication information is now reported to the gdbstub (a GDB 13 or later will produce better backtraces when pauth is in use by the guest)<br />
* Orangepi-PC, Cubieboard: Add Allwinner WDT watchdog emulation<br />
* mcimx7d-sabre, mcimx6ul-evk: The second ethernet controller PHY is now usable<br />
* fsl-imx6: The SNVS is now implemented, sufficient for the guest to be able to shut down the machine<br />
* The SMMUv3 model can now emulate stage-2 translations (but only as an alternative to, not together with, stage-1)<br />
* Debugging via the gdbstub is now supported when using the hvf acceleration on macOS hosts<br />
* xlnx-versal board now emulates a CANFD controller<br />
* sbsa-ref now provides the GIC ITS<br />
* New board model: bpim2u (Banana Pi BPI-M2 Ultra)<br />
* TCG plugin memory instrumentation now catches all SVE accesses<br />
<br />
* New architectural features now emulated:<br />
** FEAT_PAN3 (Support for SCTLR_ELx.EPAN)<br />
** FEAT_LSE2 (Large System Extensions v2)<br />
** FEAT_RME (Realm Management Extensions) -- support is currently experimental only<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
* New SeaBIOS-hppa version 8 firmware<br />
* Fixes boot failure of Debian-12 install CD-ROM (ramdisk could not be loaded)<br />
* Fixes operating system boot and reboot issues on HP-UX and Linux with SMP installations<br />
* Enables PSW-Q bit by default (for MPE-UX operating system)<br />
* Show QEMU version in firmware boot menu<br />
* Adds EXIT menu entry to firmware boot menu<br />
* Enhances PDC CHASSIS codes debugging possibilities<br />
<br />
=== LoongArch ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
* Allow FPCSR special purpose register to be accessed in user mode<br />
* Configure FPU to detect tininess before rounding, to align QEMU with the architecture specification<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== ISA and Extensions ====<br />
* Support subsets of code size reduction extension<br />
* A large collection of mstatus sum changes and cleanups<br />
* Zero init APLIC internal state<br />
* Implement query-cpu-definitions<br />
* Fix Guest Physical Address Translation<br />
* Make sure an exception is raised if a pte is malformed<br />
* Move zc* out of the experimental properties<br />
* Mask the implicitly enabled extensions in isa_string based on priv version<br />
* Updates and improvements for Smstateen<br />
* Support disas for Zcm* extensions<br />
* Support disas for Z*inx extensions<br />
* Add vector registers to log<br />
<br />
==== Machines ====<br />
* Add signature dump function for spike to run ACT tests<br />
* Add Ventana's Veyron V1 CPU<br />
* Assume M-mode FW in pflash0 only when "-bios none"<br />
* Support using pflash via -blockdev option<br />
<br />
==== Fixes and Misc ====<br />
* Fix invalid riscv,event-to-mhpmcounters entry<br />
* Fix itrigger when icount is used<br />
* Fix mstatus.MPP related support<br />
* Fix the H extension TVM trap<br />
* Restore the predicate() NULL check behavior<br />
* Skip Vector set tail when vta is zero<br />
* Fixup PMP TLB caching errors<br />
* Writing to pmpaddr and MML/MMWP correctly triggers TLB flushes<br />
* Fixup PMP bypass checks<br />
* Deny access if access is partially inside a PMP entry<br />
* Fix QEMU crash when NUMA nodes exceed available CPUs<br />
* Fix pointer mask transformation for vector address<br />
* Remove the check for extra Vector tail elements<br />
* Smepmp: Return error when access permission not allowed in PMP<br />
* Fixes for smsiaddrcfg and smsiaddrcfgh in AIA<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
* Fix block device error when trying to boot niagara machine<br />
* Allow keyboard language DIP switches to be set via the -global escc.chnA-sunkbd-layout option<br />
* Update target/sparc to use tcg_gen_lookup_and_goto_ptr() for improved performance<br />
<br />
=== Tricore ===<br />
* Handles PCXI and ICR registers correctly for ISA version 1.6.1 upwards<br />
* Added POPCNT.W, LHA, CRC32L.W, CRC32.B, SHUFFLE, SYSCALL, and DISABLE instructions<br />
* Implemented privilege levels<br />
* Introduced TC37x CPU that supports ISA v1.6.2<br />
* Fix out of bounds index for instructions using 64 register pairs<br />
<br />
=== x86 ===<br />
* The following features are now exposed by TCG (but were already implemented): RDSEED, XSAVEERPTR, 3DNOWPREFETCH, WBNOINVD<br />
* RDPID is now implemented by TCG<br />
* SYSCALL is now implemented by TCG in 32-bit emulators (only for AMD processors; Intel processors hide the feature unless the processor is in long mode).<br />
* On Linux, qemu-i386 will run 32-bit programs as if they were run by a 64-bit kernel, if the chosen CPU model includes the LM feature<br />
* User-mode emulation will not warn about features that TCG does not implement, if those features are not visible to user mode (e.g. PCID)<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
* add "virtio-multitouch-pci", a multitouch-capable input device<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
* Added TPM TIS I2C device model<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== vDPA ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* [https://github.com/qemu/qemu/commit/f6b0de53fb87ddefed348a39284c8e2f28dc4eda Security fix] for CVE-2023-2861.<br />
* [https://github.com/qemu/qemu/commit/71d72ececa086114df80fe4cc04d701b59002eb2 'Proxy' backend is deprecated].<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
* new PipeWire audio backend (<tt>-audiodev pipewire</tt>)<br />
<br />
=== Character devices ===<br />
<br />
* It's now possible to specify the input independently from the output with ''-chardev file'' (e.g. ''-chardev file,id=repro,path=/dev/null,input-path=input.txt'')<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
* gtk: enable multi-touch events<br />
* sdl: various keyboard grab fixes<br />
* dbus: add multi-touch and win32 support<br />
<br />
=== GDBStub ===<br />
* debugging linux-user guests now reports the correct pid<br />
* now supports "info proc" and the host I/O features<br />
* properly responds to the "b" packet when reverse debugging<br />
<br />
=== TCG Plugins ===<br />
* cputlb API change now forces slow path for all memory helpers under instrumentation<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
<br />
=== Tracing ===<br />
* The final parts of per-vcpu trace events were removed. Those looking to monitor TCG code should look at https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html<br />
<br />
=== Semihosting ===<br />
<br />
=== Miscellaneous ===<br />
* Command-line parsing of sizes using a fraction of a scale (such as "1.5M") has been improved: it is now possible to write ".5G" as a synonym for "512M", and no longer possible to cause qemu to read out of bounds on garbage input such as "9.999e999".<br />
<br />
== User-mode emulation ==<br />
<br />
=== build ===<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Nios2 ===<br />
<br />
=== HPPA ===<br />
<br />
=== x86 ===<br />
<br />
=== Xtensa ===<br />
<br />
== TCG backends ==<br />
<br />
=== RISC-V ===<br />
<br />
* Support Zba, Zbb, and Zicond standard extensions.<br />
<br />
== Guest agent ==<br />
* The guest-exec command supports the values "stdout", "stderr" and "merged" for the capture-output parameter. The <tt>true</tt> and <tt>false</tt> values for the parameter can also be written as "separated" and "none" respectively.<br />
* The guest-get-fsinfo command can return "usb" as the bus type too.<br />
<br />
== Build Information ==<br />
<br />
=== Build Dependencies ===<br />
* The <tt>--meson</tt> and <tt>--sphinx-build</tt> options to configure have been removed. Meson and Sphinx will always be invoked through the Python interpreter specified (optionally) with <tt>--python</tt> or the <tt>$PYTHON</tt> environment variable; in order to use a host installation of Meson or Sphinx, the corresponding distribution packages (including metadata) will have to be installed in the <tt>site-packages</tt> directory of that Python interpreter.<br />
* Either pip+setuptools or ensurepip must now be installed to build QEMU. It is recommended to install distlib as well, but the build process tries to cope with its absence and it shouldn't be necessary.<br />
* A new option <tt>--enable-download</tt> will direct configure to find some missing Python build dependencies. For now this applies to sphinx (downloaded from PyPI) and libslirp (which is then built as a meson subproject). Only required and explicitly enabled dependencies (e.g. only for <tt>--enable-docs</tt> in the case of Sphinx) are downloaded.<br />
** The use of <tt>subprojects/wrapdb.json</tt> (downloaded by "meson wrap update-db") isn't supported yet.<br />
* Starting with QEMU 8.1, only Python 3.8 and newer will be supported (3.7 might work but it is not included in any of the environments that we run CI with).<br />
* new pipewire audio backend requires libpipewire (currently >= 0.3.60)<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
* riscv-cross image now using lcitool<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/8.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/8.1&diff=11378ChangeLog/8.12023-07-07T09:34:57Z<p>Schoenebeck: /* 9pfs */ proxy backend is deprecated</p>
<hr />
<div>== System emulation ==<br />
<br />
=== Removed features and incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features'] page for details of suggested replacement functionality.<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
* The "-singlestep" command line option is deprecated, as it was very misleadingly named. Its replacement is "-one-insn-per-tb" (for the user-mode emulator) or "-accel one-insn-per-tb=on" (for the system-mode emulator)<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* KVM VMs on a host which supports MTE (the Memory Tagging Extension) can now use MTE in the guest<br />
* Pointer-authentication information is now reported to the gdbstub (a GDB 13 or later will produce better backtraces when pauth is in use by the guest)<br />
* Orangepi-PC, Cubieboard: Add Allwinner WDT watchdog emulation<br />
* mcimx7d-sabre, mcimx6ul-evk: The second ethernet controller PHY is now usable<br />
* fsl-imx6: The SNVS is now implemented, sufficient for the guest to be able to shut down the machine<br />
* The SMMUv3 model can now emulate stage-2 translations (but only as an alternative to, not together with, stage-1)<br />
* Debugging via the gdbstub is now supported when using the hvf acceleration on macos hosts<br />
* xlnx-versal board now emulates a CANFD controller<br />
* sbsa-ref now provides the GIC ITS<br />
* New board model: bpim2u (Banana Pi BPI-M2 Ultra)<br />
* TCG plugin memory instrumentation now catches all SVE accesses<br />
<br />
* New architectural features now emulated:<br />
** FEAT_PAN3 (Support for SCTLR_ELx.EPAN)<br />
** FEAT_LSE2 (Large System Extensions v2)<br />
** FEAT_RME (Realm Management Extensions) -- support is currently experimental only<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
* New SeaBIOS-hppa version 8 firmware<br />
* Fixes boot failure of Debian-12 install CD-ROM (ramdisk could not be loaded)<br />
* Fixes operating system boot and reboot issues on HP-UX and Linux with SMP installations<br />
* Enables PSW-Q bit by default (for MPE-UX operating system)<br />
* Show QEMU version in firmware boot menu<br />
* Adds EXIT menu entry to firmware boot menu<br />
* Enhances PDC CHASSIS codes debugging possibilities<br />
<br />
=== LoongArch ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
* Allow FPCSR special purpose register to be accessed in user mode<br />
* Configure FPU to detect tininess before rounding, to align QEMU with the architecture specification<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== ISA and Extensions ====<br />
* Support subsets of code size reduction extension<br />
* A large collection of mstatus sum changes and cleanups<br />
* Zero init APLIC internal state<br />
* Implement query-cpu-definitions<br />
* Fix Guest Physical Address Translation<br />
* Make sure an exception is raised if a pte is malformed<br />
* Move zc* out of the experimental properties<br />
* Mask the implicitly enabled extensions in isa_string based on priv version<br />
* Updates and improvements for Smstateen<br />
* Support disas for Zcm* extensions<br />
* Support disas for Z*inx extensions<br />
* Add vector registers to log<br />
<br />
==== Machines ====<br />
* Add signature dump function for spike to run ACT tests<br />
* Add Ventana's Veyron V1 CPU<br />
* Assume M-mode FW in pflash0 only when "-bios none"<br />
* Support using pflash via -blockdev option<br />
<br />
==== Fixes and Misc ====<br />
* Fix invalid riscv,event-to-mhpmcounters entry<br />
* Fix itrigger when icount is used<br />
* Fix mstatus.MPP related support<br />
* Fix the H extension TVM trap<br />
* Restore the predicate() NULL check behavior<br />
* Skip Vector set tail when vta is zero<br />
* Fixup PMP TLB caching errors<br />
* Writing to pmpaddr and MML/MMWP correctly triggers TLB flushes<br />
* Fixup PMP bypass checks<br />
* Deny access if access is partially inside a PMP entry<br />
* Fix QEMU crash when NUMA nodes exceed available CPUs<br />
* Fix pointer mask transformation for vector address<br />
* Remove the check for extra Vector tail elements<br />
* Smepmp: Return error when access permission not allowed in PMP<br />
* Fixes for smsiaddrcfg and smsiaddrcfgh in AIA<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
* Fix block device error when trying to boot niagara machine<br />
* Allow keyboard language DIP switches to be set via the -global escc.chnA-sunkbd-layout option<br />
* Update target/sparc to use tcg_gen_lookup_and_goto_ptr() for improved performance<br />
<br />
=== Tricore ===<br />
* Handles PCXI and ICR registers correctly for ISA version 1.6.1 upwards<br />
* Added POPCNT.W, LHA, CRC32L.W, CRC32.B, SHUFFLE, SYSCALL, and DISABLE instructions<br />
* Implemented privilege levels<br />
* Introduced TC37x CPU that supports ISA v1.6.2<br />
* Fix out of bounds index for instructions using 64 register pairs<br />
<br />
=== x86 ===<br />
* The following features are now exposed by TCG (but were already implemented): RDSEED, XSAVEERPTR, 3DNOWPREFETCH, WBNOINVD<br />
* RDPID is now implemented by TCG<br />
* SYSCALL is now implemented by TCG in 32-bit emulators (only for AMD processors; Intel processors hide the feature unless the processor is in long mode).<br />
* On Linux, qemu-i386 will run 32-bit programs as if they were run by a 64-bit kernel, if the chosen CPU model includes the LM feature<br />
* User-mode emulation will not warn about features that TCG does not implement, if those features are not visible to user mode (e.g. PCID)<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
* add "virtio-multitouch-pci", a multitouch-capable input device<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
* Added TPM TIS I2C device model<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== vDPA ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* [https://github.com/qemu/qemu/commit/f6b0de53fb87ddefed348a39284c8e2f28dc4eda Security fix] for CVE-2023-2861.<br />
* [https://github.com/qemu/qemu/commit/71d72ececa086114df80fe4cc04d701b59002eb2 'Proxy' backend is deprecated].<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
* new PipeWire audio backend (<tt>-audiodev pipewire</tt>)<br />
<br />
=== Character devices ===<br />
<br />
* It's now possible to specify the input independently from the output with ''-chardev file'' (e.g. ''-chardev file,id=repro,path=/dev/null,input-path=input.txt'')<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
* gtk: enable multi-touch events<br />
* sdl: various keyboard grab fixes<br />
* dbus: add multi-touch and win32 support<br />
<br />
=== GDBStub ===<br />
* debugging linux-user guests now report correct pid<br />
* now support "info proc" and the host IO features<br />
* properly respond to "b" packet when reverse debugging<br />
<br />
=== TCG Plugins ===<br />
* cputlb API change now forces slow path for all memory helpers under instrumentation<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
<br />
=== Tracing ===<br />
* The final parts of per-vcpu trace events were removed. Those looking to monitor TCG code should look at https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html<br />
<br />
=== Semihosting ===<br />
<br />
=== Miscellaneous ===<br />
* Command-line parsing of sizes using a fraction of a scale (such as "1.5M") has been improved: it is now possible to write ".5G" as a synonym for "512M", and no longer possible to cause qemu to read out of bounds on garbage input such as "9.999e999".<br />
<br />
== User-mode emulation ==<br />
<br />
=== build ===<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Nios2 ===<br />
<br />
=== HPPA ===<br />
<br />
=== x86 ===<br />
<br />
=== Xtensa ===<br />
<br />
== TCG backends ==<br />
<br />
=== RISC-V ===<br />
<br />
* Support Zba, Zbb, and Zicond standard extensions.<br />
<br />
== Guest agent ==<br />
* The guest-exec command supports the values "stdout", "stderr" and "merged" for the capture-output parameter. The <tt>true</tt> and <tt>false</tt> values for the parameter can also be written as "separated" and "none" respectively.<br />
* The guest-get-fsinfo command can return "usb" as the bus type too.<br />
<br />
== Build Information ==<br />
<br />
=== Build Dependencies ===<br />
* The <tt>--meson</tt> and <tt>--sphinx-build</tt> options to configure have been removed. Meson and Sphinx will always be invoked through the Python interpreter specified (optionally) with <tt>--python</tt> or the <tt>$PYTHON</tt> environment variable; in order to use a host installation of Meson or Sphinx, the corresponding distribution packages (including metadata) will have to be installed in the <tt>site-packages</tt> directory of that Python interpreter.<br />
* Either pip+setuptools or ensurepip must now be installed to build QEMU. It is recommended to install distlib as well, but the build process tries to cope with its absence and it shouldn't be necessary.<br />
* A new option <tt>--enable-download</tt> will direct configure to find some missing Python build dependencies. For now this applies to sphinx (downloaded from PyPI) and libslirp (which is then built as a meson subproject). Only required and explicitly enabled dependencies (e.g. only for <tt>--enable-docs</tt> in the case of Sphinx) are downloaded.<br />
** The use of <tt>subprojects/wrapdb.json</tt> (downloaded by "meson wrap update-db") isn't supported yet.<br />
* Starting with QEMU 8.1, only Python 3.8 and newer will be supported (3.7 might work but it is not included in any of the environments that we run CI with).<br />
* new pipewire audio backend requires libpipewire (currently >= 0.3.60)<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
* riscv-cross image now using lcitool<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/8.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/8.1&diff=11350ChangeLog/8.12023-06-09T12:06:41Z<p>Schoenebeck: /* 9pfs */ security fix for CVE-2023-2861</p>
<hr />
<div>== System emulation ==<br />
<br />
=== Removed features and incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features'] page for details of suggested replacement functionality.<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
* The "-singlestep" command line option is deprecated, as it was very misleadingly named. Its replacement is "-one-insn-per-tb" (for the user-mode emulator) or "-accel one-insn-per-tb=on" (for the system-mode emulator)<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* KVM VMs on a host which supports MTE (the Memory Tagging Extension) can now use MTE in the guest<br />
* Pointer-authentication information is now reported to the gdbstub (a GDB 13 or later will produce better backtraces when pauth is in use by the guest)<br />
* Orangepi-PC, Cubieboard: Add Allwinner WDT watchdog emulation<br />
* mcimx7d-sabre, mcimx6ul-evk: The second ethernet controller PHY is now usable<br />
* fsl-imx6: The SNVS is now implemented, sufficient for the guest to be able to shut down the machine<br />
* The SMMUv3 model can now emulate stage-2 translations (but only as an alternative to, not together with, stage-1)<br />
* Debugging via the gdbstub is now supported when using the hvf acceleration on macos hosts<br />
* xlnx-versal board now emulates a CANFD controller<br />
* New board model: bpim2u (Banana Pi BPI-M2 Ultra)<br />
<br />
* New architectural features now emulated:<br />
** FEAT_PAN3 (Support for SCTLR_ELx.EPAN)<br />
** FEAT_LSE2 (Large System Extensions v2)<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
* Allow FPCSR special purpose register to be accessed in user mode<br />
* Configure FPU to detect tininess before rounding, to align QEMU with the architecture specification<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== ISA and Extensions ====<br />
* Support subsets of code size reduction extension<br />
* A large collection of mstatus sum changes and cleanups<br />
* Zero init APLIC internal state<br />
* Implement query-cpu-definitions<br />
* Fix Guest Physical Address Translation<br />
* Make sure an exception is raised if a pte is malformed<br />
<br />
==== Machines ====<br />
* Add signature dump function for spike to run ACT tests<br />
* Add Ventana's Veyron V1 CPU<br />
<br />
==== Fixes and Misc ====<br />
* Fix invalid riscv,event-to-mhpmcounters entry<br />
* Fix itrigger when icount is used<br />
* Fix mstatus.MPP related support<br />
* Fix the H extension TVM trap<br />
* Restore the predicate() NULL check behavior<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
* Handles PCXI and ICR registers correctly for ISA version 1.6.1 upwards<br />
<br />
=== x86 ===<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
* add "virtio-multitouch-pci", a multitouch-capable input device<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
* Added TPM TIS I2C device model<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== vDPA ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* [https://github.com/qemu/qemu/commit/f6b0de53fb87ddefed348a39284c8e2f28dc4eda Security fix] for CVE-2023-2861.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
* new PipeWire audio backend (<tt>-audiodev pipewire</tt>)<br />
<br />
=== Character devices ===<br />
<br />
* It's now possible to specify the input independently from the output with ''-chardev file'' (e.g. ''-chardev file,id=repro,path=/dev/null,input-path=input.txt'')<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
* gtk: enable multi-touch events<br />
* sdl: various keyboard grab fixes<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
<br />
=== Tracing ===<br />
* The final parts of per-vcpu trace events were removed. Those looking to monitor TCG code should look at https://qemu.readthedocs.io/en/latest/devel/tcg-plugins.html<br />
<br />
=== Semihosting ===<br />
<br />
=== Miscellaneous ===<br />
* Command-line parsing of sizes using a fraction of a scale (such as "1.5M") has been improved: it is now possible to write ".5G" as a synonym for "512M", and no longer possible to cause qemu to read out of bounds on garbage input such as "9.999e999".<br />
<br />
== User-mode emulation ==<br />
<br />
=== build ===<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Nios2 ===<br />
<br />
=== HPPA ===<br />
<br />
=== x86 ===<br />
<br />
=== Xtensa ===<br />
<br />
== TCG backends ==<br />
<br />
=== RISC-V ===<br />
<br />
* Support Zba, Zbb, and Zicond standard extensions.<br />
<br />
== Guest agent ==<br />
* The guest-exec command supports the values "stdout", "stderr" and "merged" for the capture-output parameter. The <tt>true</tt> and <tt>false</tt> values for the parameter can also be written as "separated" and "none" respectively.<br />
* The guest-get-fsinfo command can return "usb" as the bus type too.<br />
<br />
== Build Information ==<br />
<br />
=== Build Dependencies ===<br />
* The <tt>--meson</tt> and <tt>--sphinx-build</tt> options to configure have been removed. Meson and Sphinx will always be invoked through the Python interpreter specified (optionally) with <tt>--python</tt> or the <tt>$PYTHON</tt> environment variable; in order to use a host installation of Meson or Sphinx, the corresponding distribution packages (including metadata) will have to be installed in the <tt>site-packages</tt> directory of that Python interpreter.<br />
* Either pip+setuptools or ensurepip must now be installed to build QEMU. It is recommended to install distlib as well, but the build process tries to cope with its absence and it shouldn't be necessary.<br />
* A new option <tt>--enable-download</tt> will direct configure to find some missing Python build dependencies. For now this applies to sphinx (downloaded from PyPI) and libslirp (which is then built as a meson subproject). Only required and explicitly enabled dependencies (e.g. only for <tt>--enable-docs</tt> in the case of Sphinx) are downloaded.<br />
** The use of <tt>subprojects/wrapdb.json</tt> (downloaded by "meson wrap update-db") isn't supported yet.<br />
* Starting with QEMU 8.1, only Python 3.8 and newer will be supported (3.7 might work but it is not included in any of the environments that we run CI with).<br />
* new pipewire audio backend requires libpipewire (currently >= 0.3.60)<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/8.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=11312Documentation/9psetup2023-05-17T12:28:04Z<p>Schoenebeck: /* Example */ drop cache=none option in example (as this is default anyway) and describe cache option in more detail</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest, if desired even accessible, shared by several guests simultaneously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system on top of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host (<b>recommended option</b>). <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade, not considered safe and has very poor performance. The "proxy" driver has not seen any development in years and will likely be removed in a future version of QEMU. <b>We recommend NOT using the "proxy" driver</b>. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. if the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. Security model is mandatory only for "local" fsdriver. Other fsdrivers (like "proxy") don't take security model as a parameter. Recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root and therefore using <b>"passthrough" security model is strongly discouraged, especially when running untrusted guests!</b><br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as file attributes. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported by this security model cannot interact with other unix tools.<br />
# none: Same as "passthrough" except the server won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough-like security model usable for people who run KVM as non-root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket file for communicating with virtfs-proxy-helper<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared with a 9p export, i.e. to avoid file ID collisions. Supported behaviours are either:<br />
# warn: This is the default behaviour on which virtfs 9p expects only one device to be shared with the same export, and if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by qemu on host side.<br />
# remap: In order to avoid file ID collisions on guest you should either create a separate virtfs export for each device to be shared with guests (recommended way) or you might use "remap" instead which allows you to share multiple devices with only one export instead, which is achieved by remapping the original inode numbers from host to guest in a way that would prevent such collisions. Remapping inodes in such use cases is required because the original device IDs from host are never passed and exposed on guest. Instead all files of an export shared with virtfs always share the same device id on guest. So two files with identical inode numbers but from actually different devices on host would otherwise cause a file ID collision and hence potential misbehaviours on guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on guest. Note though that "forbid" does currently not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to setup 9p sharing for guests<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory will contain the host path that is to be exported. The target directory should be filled with the mount tag for the device, which despite its name, does not have to actually be a directory path - any string 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified in Qemu commandline.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u .<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. By default it is 8KB.<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does single attach and performs all operations as one user <br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for option "msize" on client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on client side. If you omit to specify a value for "msize" with a Linux 9P client, the client would fall back to its default value which was prior to Linux kernel v5.15 only 8 kiB which resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on host side (i.e. a feature invisible to the client), and then you still might want to trade off between performance profit and additional RAM costs, i.e. with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) will shrink continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case by yourself. As a starting point, you might start by picking something between 10 MiB .. >100 MiB for a spindle based SATA storage, whereas for a PCIe based Flash storage you might pick several hundred MiB or more. Then create some large file on host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes reading the file on guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example, for i386-softmmu with debugging support, use<br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Burn a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debian system you can use the package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
<br />
We intentionally add no 'cache' option in this example to avoid confusion. You may add e.g. cache=loose option to increase performance, however keep in mind that [https://lore.kernel.org/all/ZCHU6k56nF5849xj@bombadil.infradead.org/ currently all caching implementations of Linux 9p client do not revalidate file changes made on host side <b>ever</b>!] In other words: changes made on host side would (currently) never become visible on guest unless you would remount or reboot guest! This is currently in the works, and in a future Linux version caching is planned to be enabled by default once this issue was addressed properly.<br />
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=11311Documentation/9psetup2023-05-17T12:08:01Z<p>Schoenebeck: /* Starting the Guest directly */ discourage using passthrough security model</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest, if desired even accessible, shared by several guests simultaneously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system on top of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host (<b>recommended option</b>). <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade, not considered safe and has very poor performance. The "proxy" driver has not seen any development in years and will likely be removed in a future version of QEMU. <b>We recommend NOT using the "proxy" driver</b>. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. if the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. Security model is mandatory only for "local" fsdriver. Other fsdrivers (like "proxy") don't take security model as a parameter. Recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root and therefore using <b>"passthrough" security model is strongly discouraged, especially when running untrusted guests!</b><br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as file attributes. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported by this security model cannot interact with other unix tools.<br />
# none: Same as "passthrough" except the sever won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough like security model usable for people who run kvm as non root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket file for communicating with virtfs-proxy-helper<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared with a 9p export, i.e. to avoid file ID collisions. Supported behaviours are either:<br />
# warn: This is the default behaviour on which virtfs 9p expects only one device to be shared with the same export, and if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by qemu on host side.<br />
# remap: In order to avoid file ID collisions on guest you should either create a separate virtfs export for each device to be shared with guests (recommended way) or you might use "remap" instead which allows you to share multiple devices with only one export instead, which is achieved by remapping the original inode numbers from host to guest in a way that would prevent such collisions. Remapping inodes in such use cases is required because the original device IDs from host are never passed and exposed on guest. Instead all files of an export shared with virtfs always share the same device id on guest. So two files with identical inode numbers but from actually different devices on host would otherwise cause a file ID collision and hence potential misbehaviours on guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on guest. Note though that "forbid" does currently not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to setup 9p sharing for guests<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory will contain the host path that is to be exported. The target directory should be filled with the mount tag for the device, which despite its name, does not have to actually be a directory path - any string 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified in Qemu commandline.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u .<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. By default it is 8KB.<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does single attach and performs all operations as one user <br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for option "msize" on client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on client side. If you omit to specify a value for "msize" with a Linux 9P client, the client would fall back to its default value which was prior to Linux kernel v5.15 only 8 kiB which resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on host side (i.e. a feature invisible to the client), and then you still might want to trade off between performance profit and additional RAM costs, i.e. with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) will shrink continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case by yourself. As a starting point, you might start by picking something between 10 MiB .. >100 MiB for a spindle based SATA storage, whereas for a PCIe based Flash storage you might pick several hundred MiB or more. Then create some large file on host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes reading the file on guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example for i386-softmm with debugging support, use <br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Burn a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debain system you can use package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=none<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
We use no cache because [https://lore.kernel.org/all/ZCHU6k56nF5849xj@bombadil.infradead.org/ current caching mechanisms need more work and the results are not what you would expect].<br />
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=11310Documentation/9psetup2023-05-17T11:58:04Z<p>Schoenebeck: /* Starting the Guest directly */ recommend not using the proxy driver</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest, if desired even accessible, shared by several guests simultaniously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system ontop of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host (<b>recommended option</b>). <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade, not considered safe and has very poor performance. The "proxy" driver has not seen any development in years and will likely be removed in a future version of QEMU. <b>We recommend NOT using the "proxy" driver</b>. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. if the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. Security model is mandatory only for "local" fsdriver. Other fsdrivers (like "proxy") don't take security model as a parameter. Recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root.<br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as file attributes. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported by this security model cannot interact with other unix tools.<br />
# none: Same as "passthrough" except the sever won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough like security model usable for people who run kvm as non root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket file for communicating with virtfs-proxy-helper<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables "proxy" filesystem driver to use passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared with a 9p export, i.e. to avoid file ID collisions. Supported behaviours are either:<br />
# warn: This is the default behaviour on which virtfs 9p expects only one device to be shared with the same export, and if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by qemu on host side.<br />
# remap: In order to avoid file ID collisions on guest you should either create a separate virtfs export for each device to be shared with guests (recommended way) or you might use "remap" instead which allows you to share multiple devices with only one export instead, which is achieved by remapping the original inode numbers from host to guest in a way that would prevent such collisions. Remapping inodes in such use cases is required because the original device IDs from host are never passed and exposed on guest. Instead all files of an export shared with virtfs always share the same device id on guest. So two files with identical inode numbers but from actually different devices on host would otherwise cause a file ID collision and hence potential misbehaviours on guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on guest. Note though that "forbid" does currently not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to setup 9p sharing for guests<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory will contain the host path that is to be exported. The target directory should be filled with the mount tag for the device, which despite its name, does not have to actually be a directory path - any string 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified in Qemu commandline.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u .<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. By default it is 8KB.<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does single attach and performs all operations as one user <br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for option "msize" on client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on client side. If you omit to specify a value for "msize" with a Linux 9P client, the client would fall back to its default value which was prior to Linux kernel v5.15 only 8 kiB which resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on host side (i.e. a feature invisible to the client), and then you still might want to trade off between performance profit and additional RAM costs, i.e. with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) will shrink continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case by yourself. As a starting point, you might start by picking something between 10 MiB .. >100 MiB for a spindle based SATA storage, whereas for a PCIe based Flash storage you might pick several hundred MiB or more. Then create some large file on host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes reading the file on guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example for i386-softmm with debugging support, use <br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Burn a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debain system you can use package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=none<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
We use no cache because [https://lore.kernel.org/all/ZCHU6k56nF5849xj@bombadil.infradead.org/ current caching mechanisms need more work and the results are not what you would expect].<br />
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p_root_fs&diff=11257Documentation/9p root fs2023-03-28T11:38:23Z<p>Schoenebeck: /* Boot the 9p Root FS System */ clarify that cache=loose makes host side changes not visible</p>
<hr />
<div>= 9P as root filesystem (Howto) =<br />
<br />
It is possible to run a whole virtualized guest system entirely on top of<br />
QEMU's <b>9p pass-through filesystem</b> ([[Documentation/9psetup]])<br />
such that all guest system's files are<br />
directly visible inside a subdirectory on the host system and therefore directly<br />
accessible by both sides.<br />
<br />
This howto shows a way to install and setup<br />
[https://www.debian.org/releases/bullseye/ Debian 11 "Bullseye"] as guest system<br />
as an example with 9p being guest's root filesystem.<br />
<br />
Roughly summarized we are first booting a Debian Live CD with QEMU and then using<br />
the [https://wiki.debian.org/Debootstrap debootstrap] tool to install a<br />
standard basic Debian system into a manually mounted 9p directory.<br />
The same approach can be used almost identically for many other distributions,<br />
e.g. for related .deb package based distros like [https://ubuntu.com Ubuntu] you<br />
probably just need to adjust the debootstrap command with a different URL as<br />
argument.<br />
<br />
== Motivation ==<br />
<br />
There are several advantages to run a guest OS entirely on top of 9pfs:<br />
<br />
* <b>Transparency and Shared File Access</b>: The classical way to deploy a virtualized OS (a.k.a. "guest") on a physical machine (a.k.a. "host") is to create a virtual block device (i.e. one huge file on host's filesystem) and leave it to the guest OS to format and maintain a filesystem on top of that virtualized block device. As that filesystem would be managed by the guest OS, shared file access by host and guest simultaneously is usually cumbersome and problematic, if not even dangerous. A 9p passthrough-filesystem instead allows convenient file access by both host and guest simultaneously as the filesystem is just a regular subdirectory somewhere inside host's own filesystem.<br />
<br />
* <b>Partitioning of Guest's Filesystem</b>: in early UNIX days it was common to subdivide a machine's filesystem into several subdirectories by creating multiple partitions on the hard disk(s) and mounting those partitions to common points of the system's abstract file system tree. Later this became less common as one had to decide upfront at installation how large those individual partitions shall be, and resizing the partitions later on was often considered to be not worth the hassle (e.g. due to system down time, admin work time, potential issues). With modern hybrid filesystems like [https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs] and [https://en.wikipedia.org/wiki/ZFS ZFS] however, subdividing a filesystem tree into multiple, separate parts sees a revival as subdivision into their "data sets" (equivalent to classical hard disk "partitions") comes with almost zero cost now as those "data sets" acquire and release individual data blocks from a shared pool on-demand, so they don't require any size decisions upfront, nor any resizing later on. If we deployed filesystems like btrfs or ZFS on guest side on top of a virtualized block device however, we would defeat many of those filesystems' advantages. Instead, if the filesystem is deployed solely on host side by using 9p, we preserve their advantages and allow a much more convenient and powerful way to manage any of their filesystem aspects, as the guest OS is running completely independently and without knowledge of what filesystem it is actually running on.<br />
<br />
* <b>(Partial) Live Rollback</b>: As the filesystem is on host side, we can snapshot and rollback the filesystem from host side while guest is still running. By using "data sets" (as described above) we can even rollback only certain part(s) of guest's filesystem, e.g. rolling back a software installation while preserving user data, or the other way around.<br />
<br />
* <b>Deduplication</b>: with either ZFS or (even better) btrfs on host we can reduce the overall storage size and therefore storage costs for deploying a large amount of virtual machines (VMs), as both filesystems support data deduplication. In practice VMs usually share a significant amount of identical data as VMs often use identical operating systems, so they typically have identical versions of applications, libraries, and so forth. Both ZFS and btrfs can automatically detect and unify identical blocks and therefore save the enormous storage space that would otherwise be wasted with a large amount of VMs.<br />
<br />
<span id="start"></span><br />
== Let's start the Installation ==<br />
<br />
In this entire howto we are running QEMU <b>always</b> as <b>regular user</b>.<br />
You don't need to run QEMU with root privileges (on host) for anything in this<br />
article, and for production systems it is in general discouraged to run QEMU<br />
as user root.<br />
<br />
First we create an empty directory where we want to install the guest system to,<br />
for instance somewhere in your (regular) user's home directory on host.<br />
<br />
mkdir -p ~/vm/bullseye<br />
<br />
At this point, if you are using a filesystem on host like btrfs or ZFS, you now<br />
might want to create the individual filesystem data sets and create the<br />
respective (yet empty) subdirectories below ~/vm/bullseye<br />
(for instance home/, var/, var/log/, root/, etc.), this is optional though.<br />
We are not describing how to configure those filesystems in this howto, but we<br />
will outline noteworthy aspects during the process if required.<br />
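<br />
For example, with btrfs on the host (assuming ~/vm/bullseye itself resides on a btrfs filesystem) you might create such data sets as subvolumes like in the following sketch; the exact layout is up to you, and with ZFS you would use 'zfs create' accordingly:<br />
<br />
 btrfs subvolume create ~/vm/bullseye/home<br />
 btrfs subvolume create ~/vm/bullseye/var<br />
 btrfs subvolume create ~/vm/bullseye/var/log<br />
 btrfs subvolume create ~/vm/bullseye/root<br />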
<br />
Next we download the latest Debian Live CD image. Before blindly pasting the<br />
following command, you probably want to<br />
[https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/ check this URL]<br />
to see whether there is a newer version of the live CD image available (likely).<br />
<br />
cd ~/vm<br />
wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.2.0-amd64-standard.iso<br />
<br />
Boot the Debian Live CD image and make our target installation directory<br />
~/vm/bullseye/ on host available to the VM via 9p.<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot d -cdrom ~/vm/debian-live-11.2.0-amd64-standard.iso \<br />
-fsdev local,security_model=mapped,id=fsdev-fs0,multidevs=remap,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fs0<br />
<br />
You should now see the following message:<br />
<br />
VNC server running on ::1:5900<br />
<br />
If the machine where you are running QEMU on (i.e. where you are currently<br />
installing to), and the machine from where you are currently typing the commands<br />
are not the same, then you need to establish a SSH tunnel to make the remote<br />
machine's VNC port available on your workstation.<br />
<br />
ssh user@machine -L 5900:127.0.0.1:5900<br />
<br />
Now start any VNC client of your choice on your workstation and connect to<br />
localhost. You should now see the Debian Live CD's boot menu screen inside your<br />
VNC client's window.<br />
<br />
[[File:Debian_11_live_boot_menu_screenshot.png|frameless|upright=2.4]]<br />
<br />
From the boot menu select "Debian GNU/Linux Live". You should now see the following prompt:<br />
<br />
user@debian:/home/user#<br />
<br />
Which tells you that you are in a shell with a regular user named "user".<br />
Let's get super power (inside that Live CD VM):<br />
<br />
sudo bash<br />
<br />
Now mount the target installation directory created on host via 9p pass-through<br />
filesystem inside guest.<br />
<br />
mkdir /mnt/inst<br />
mount -t 9p -o trans=virtio fs0 /mnt/inst -oversion=9p2000.L,posixacl,msize=5000000,cache=mmap<br />
<br />
Next we need to get the <b>debootstrap</b> tool. Note: at this point you might<br />
be tempted to [https://en.wikipedia.org/wiki/Ping_(networking_utility) ping]<br />
some host to check whether Internet connection is working inside the booted Live<br />
CD VM. This will <b>not</b> work (pinging), but the Internet connection should<br />
already be working nevertheless. That's because we were omitting any network<br />
configuration arguments with the QEMU command above, in which case QEMU<br />
defaults to SLiRP user networking where ICMP is not working (see [[Documentation/Networking#Network_Basics]]).<br />
<br />
apt update<br />
apt install debootstrap<br />
<br />
If you are using something like btrfs or ZFS for the installation directory and<br />
already subdivided the installation directory with some empty directories, you<br />
should now fix the permissions the guest system sees (i.e. guest should think<br />
it has root permissions on everything, even though the actual filesystem<br />
directories on host are probably owned by another user on host).<br />
<br />
 chown -R root:root /mnt/inst<br />
<br />
Now download and install a "minimal" Debian 11 ("Bullseye") system into the<br />
target directory.<br />
<br />
debootstrap bullseye /mnt/inst https://deb.debian.org/debian/<br />
<br />
Note: you might see some warnings like:<br />
<br />
FS-Cache: Duplicate cookie detected<br />
<br />
Ignore those warnings. The debootstrap process might take quite some time, so<br />
now would be a good time for a coffee break. Once debootstrap is done, you<br />
should see the following final message:<br />
<br />
I: Basesystem installed successfully.<br />
<br />
Now you have a minimal system installation. But it is so minimal that you won't<br />
be able to do much with it. So it is not yet the basic system that you would<br />
have after completing the standard Debian installer.<br />
<br />
So let's chroot into the minimal system that we have so far, to be able to<br />
install the missing packages.<br />
<br />
mount -o bind /proc /mnt/inst/proc<br />
mount -o bind /dev /mnt/inst/dev<br />
mount -o bind /dev/pts /mnt/inst/dev/pts<br />
mount -o bind /sys /mnt/inst/sys<br />
chroot /mnt/inst /bin/bash<br />
<br />
<b>Important</b>: now we need to mount a tmpfs on /tmp (inside the chroot environment<br />
that we are in now).<br />
<br />
mount -t tmpfs -o noatime,size=500M tmpfs /tmp<br />
<br />
If you omit the previous step, you will most likely get error messages like the<br />
following with the subsequent <i>apt</i> commands below:<br />
<br />
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)<br />
<br />
Let's install the next fundamental packages. At this point you might get some<br />
locale warnings yet. Ignore them.<br />
<br />
apt update<br />
apt install console-data console-common tzdata locales keyboard-configuration<br />
<br />
We need a kernel to boot from. Let's use Bullseye's standard Linux kernel.<br />
<br />
apt install linux-image-amd64<br />
<br />
Select the time zone for the VM.<br />
<br />
dpkg-reconfigure tzdata<br />
<br />
Configure and generate locales.<br />
<br />
dpkg-reconfigure locales<br />
<br />
In the first dialog select at least "en_US.UTF-8", then "Next", then in the<br />
subsequent dialog select "C.UTF-8" and finish the dialog.<br />
<br />
The basic installation that you might be used to after running the regular<br />
Debian installer is called the "standard" installation. Let's install the<br />
missing "standard" packages. For this we are using the <b>tasksel</b> tool. It<br />
should already be installed, if it is not then install it now.<br />
<br />
apt install tasksel<br />
<br />
The following simple command should usually be sufficient to install all missing<br />
packages for a Debian "standard" installation automatically.<br />
<br />
tasksel install standard<br />
<br />
For some people however the tasksel command above does not work (it would hang<br />
with output "100%"). If you are encountering that issue, then use the following<br />
workaround by using tasksel to just dump the list of packages to be installed<br />
and then manually install the packages via apt by passing those package names as<br />
arguments to apt.<br />
<br />
tasksel --task-packages standard<br />
apt install ...<br />
<br />
Before being able to boot from the installation directory, we need to adjust the<br />
initramfs to contain the 9p drivers, remember we will run 9p as root filesystem,<br />
so 9p drivers are required before the actual system is starting.<br />
<br />
cd /etc/initramfs-tools<br />
echo 9p >> modules<br />
echo 9pnet >> modules<br />
echo 9pnet_virtio >> modules<br />
update-initramfs -u<br />
<br />
The previous update-initramfs might take some time. Once it is done, check that<br />
we really have the three 9p kernel drivers inside the generated initramfs now.<br />
<br />
lsinitramfs /boot/initrd.img-5.10.0-10-amd64 | grep 9p<br />
<br />
Let's set the root password for the installed Debian system.<br />
<br />
passwd<br />
<br />
We probably want Internet connectivity on the installed Debian system. Let's<br />
keep it simple here and just configure DHCP for it to automatically acquire<br />
IP address, gateway/router IP and DNS servers.<br />
<br />
printf 'allow-hotplug ens3\niface ens3 inet dhcp\n' > /etc/network/interfaces.d/ens3<br />
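<br />
The resulting file /etc/network/interfaces.d/ens3 then simply contains:<br />
<br />
 allow-hotplug ens3<br />
 iface ens3 inet dhcp<br />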
<br />
<span id="use-after-unlink"></span><br />
<b>Important</b>: Finally we setup a tmpfs (permanently) on /tmp for the<br />
installed Debian system, similar to what we already did (temporarily) above for<br />
the Live CD VM that we are currently still running.<br />
There are various ways to configure that permanently for the installed system.<br />
In this case we are using the systemd approach to configure it.<br />
<br />
cp /usr/share/systemd/tmp.mount /etc/systemd/system/<br />
systemctl enable tmp.mount<br />
<br />
Alternatively you could of course also configure it by adding an entry to<br />
/etc/fstab instead, e.g. something like:<br />
<br />
echo 'tmpfs /tmp tmpfs rw,nosuid,nodev,size=524288k,nr_inodes=204800 0 0' >> /etc/fstab<br />
<br />
Yet another alternative would be to configure mounting tmpfs from host side (~/vm/bullseye/tmp).<br />
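<br />
For instance (requires root privileges on the host, and could likewise be made permanent via host's /etc/fstab):<br />
<br />
 sudo mount -t tmpfs -o noatime,size=500M tmpfs ~/vm/bullseye/tmp<br />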
<br />
This tmpfs on /tmp is currently required to avoid<br />
[https://gitlab.com/qemu-project/qemu/-/issues/103 issues with use-after-unlink]<br />
patterns, which in practice however only happen for files below /tmp. At least<br />
I have not encountered any software so far that used this pattern at locations<br />
other than /tmp.<br />
<br />
Installation is now complete, so let's leave the chroot environment.<br />
<br />
exit<br />
<br />
And shutdown the Live CD VM at this point.<br />
<br />
sync<br />
shutdown -h now<br />
<br />
You can close the VNC client at this point and also close the VNC SSH tunnel<br />
(if you had one), we no longer need them. Finally hit <b>Ctrl-C</b> to quit<br />
QEMU that is still running the remainders of the Live CD VM.<br />
<br />
== Boot the 9p Root FS System ==<br />
<br />
The standard basic installation is now complete.<br />
<br />
Run this command from host to boot the fresh installed Debian 11 ("Bullseye")<br />
system with 9p being guest's root filesystem:<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot strict=on -kernel ~/vm/bullseye/boot/vmlinuz-5.10.0-10-amd64 \<br />
-initrd ~/vm/bullseye/boot/initrd.img-5.10.0-10-amd64 \<br />
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=mmap,posixacl console=ttyS0' \<br />
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fsRoot,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \<br />
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \<br />
-nographic<br />
<br />
Note: you need to use at least <b>cache=mmap</b> with the command above. That's<br />
actually not about performance, but rather allows the [https://en.wikipedia.org/wiki/Mmap mmap()]<br />
call to work on the<br />
guest system at all. Without this the guest system would even fail to boot, as<br />
many software components rely on the availability of the mmap() call.<br />
<br />
To speed up things you can also consider using e.g. <b>cache=loose</b> instead.<br />
That will deploy a filesystem cache on guest side and reduce the amount of 9p<br />
requests to the host. As a consequence however the guest might not see file changes<br />
performed on host side <b>at all</b> (as the Linux kernel's 9p client currently<br />
does not revalidate fs changes made on host side, which is<br />
[https://lore.kernel.org/all/CAFkjPTmVbyuA0jEAjYhsOsg-SE99yXgehmjqUZb4_uWS_L-ZTQ@mail.gmail.com/ planned to be changed on Linux kernel side soon] though).<br />
So choose wisely depending on your intended use case. You can switch between<br />
<b>cache=mmap</b> and e.g. <b>cache=loose</b> at any time.<br />
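<br />
For example, to try <b>cache=loose</b> you would only change the rootflags portion of the -append option from the boot command above:<br />
<br />
 -append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=loose,posixacl console=ttyS0' \<br />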
<br />
Another aspect to consider is the performance impact of the <b>msize</b> argument<br />
(see [[Documentation/9psetup#msize]] for details).<br />
<br />
Finally, log in as user root on the booted guest and install any other<br />
packages that you need, like a web server, SMTP server, etc.<br />
<br />
apt update<br />
apt search ...<br />
apt install ...<br />
<br />
That's it!<br />
<br />
== Questions and Feedback ==<br />
<br />
Refer to [[Documentation/9p#Contribute]] for patches, issues etc.</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=11207Documentation/9p2023-03-14T16:17:52Z<p>Schoenebeck: /* Contribute */ add link to Linux client Github issues page</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
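<br />
As a rough illustration of how these messages are used in practice: reading a file with the "9p2000.L" dialect typically boils down to a request/response sequence like the following (schematic only, fields abbreviated; tag and fid are arbitrary numbers chosen by the client):<br />
<br />
 Twalk  tag=1 fid=0 newfid=1 nwname=1 wname='foo.txt'   ->  Rwalk  tag=1 nwqid=1 qid=...<br />
 Tlopen tag=2 fid=1 flags=O_RDONLY                      ->  Rlopen tag=2 qid=... iounit=...<br />
 Tread  tag=3 fid=1 offset=0 count=4096                 ->  Rread  tag=3 count=... data=...<br />
 Tclunk tag=4 fid=1                                     ->  Rclunk tag=4<br />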
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' Linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that would usually be put on the thread's own stack memory (which is always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, since Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each Coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and the entire following function call stack (including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as they usually would be). The thread then calls arbitrary functions, runs loops, creates local variables inside them, etc., until at a certain point it realizes that the next part of the task needs to be handled by a different thread. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
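<br />
The following minimal sketch illustrates this hand-over behaviour with QEMU's generic coroutine API (qemu/coroutine.h). It is not actual 9pfs code; the function and variable names my_task() and step are purely illustrative:<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 static void coroutine_fn my_task(void *opaque)<br />
 {<br />
     int *step = opaque;<br />
     *step = 1;                   /* first part of the task runs on the coroutine's stack */<br />
     qemu_coroutine_yield();      /* hand control back; the coroutine stack stays intact  */<br />
     *step = 2;                   /* resumed later: call stack and locals are unchanged   */<br />
 }<br />
 <br />
 void example(void)<br />
 {<br />
     int step = 0;<br />
     Coroutine *co = qemu_coroutine_create(my_task, &step);<br />
     qemu_coroutine_enter(co);    /* runs my_task() until the yield, step == 1 */<br />
     /* ... the Coroutine 'co' could now be handed over to another thread ... */<br />
     qemu_coroutine_enter(co);    /* resumes after the yield, step == 2 */<br />
 }<br />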
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O portion of the task it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, when the main thread re-enters the Coroutine it finds the call stack and local variables exactly as they were left by the worker thread.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (possibly long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there following the naming scheme *_co_*(). If you find a call with such a function name pattern you immediately know that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and that by the time the *_co_*() call returns, it has already dispatched the Coroutine back to the main thread.<br />
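<br />
For illustration, this is roughly how such a *_co_*() wrapper looks, slightly simplified from the v9fs_co_lstat() wrapper in the hw/9pfs/co*.c files (treat it as a sketch of the pattern, not a verbatim copy): the code block passed to v9fs_co_run_in_worker() is what ends up being executed on a worker thread, everything around it runs on the main thread:<br />
<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;           /* client already aborted this request via Tflush */<br />
     }<br />
     v9fs_path_read_lock(s);<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* this block runs on a worker thread, so blocking here is fine */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread again at this point */<br />
     v9fs_path_unlock(s);<br />
     return err;<br />
 }<br />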
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in exactly the same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging on server side for good. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and requires neither a guest OS installation nor booting any guest OS, so you can run them in a few seconds. The test cases are also a very efficient way to check, while still coding, whether your 9pfs changes are actually doing what you want them to.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 components:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lightweight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with the 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated, and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: such tests should be performed with the "local" fs driver, because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they actually operate on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20230220100815.1624266-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (i.e. its coroutine) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem related I/O subtasks and then dispatched back to the main thread exactly once. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering most from the large amount of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied for other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request has actually been aborted. According to the specs though, Tflush should return immediately, and this blocking behaviour currently has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally what systems use to detect that. Certain services like Samba rely on this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number, which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we assume that only one filesystem is shared per 9p share (see the sketch below this list). If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously a future protocol change should address this by simply increasing qid.path to e.g. 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems, to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response, to avoid that overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and would therefore reduce latency accordingly.<br />
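<br />
The following minimal sketch illustrates the qid.path collision problem mentioned in the "Increase qid.path Size" item above. It is not the actual QEMU code (the function name is purely illustrative); it merely shows that the default mapping drops the host device ID, whereas a remapping (or a wider qid.path) would have to take both values into account:<br />
<br />
 #include <stdint.h><br />
 #include <sys/stat.h><br />
 <br />
 /* Default (non-remapping) behaviour: only the inode number is forwarded as<br />
  * qid.path, so (st_dev=A, st_ino=42) and (st_dev=B, st_ino=42) from two<br />
  * different host filesystems below the share collide on guest side. */<br />
 static uint64_t qid_path_default(const struct stat *st)<br />
 {<br />
     return (uint64_t)st->st_ino;   /* st_dev is not encoded anywhere */<br />
 }<br />
 <br />
 /* With multidevs=remap the server instead derives the guest-visible ID from<br />
  * both st_dev and st_ino (conceptually a (dev, ino) -> ID mapping); a future<br />
  * 128-bit qid.path could simply carry both values without any remapping. */<br />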
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better though, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
Please post bugs and patches related to the Linux 9p client to the [https://github.com/v9fs/linux/issues v9fs Github page] instead.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeck
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lite-weight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here : they should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20230220100815.1624266-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links which means different pathes on the filesystem might actually point to the same file and a numeric file ID in general is used to detect that by systems. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would would save one useless Tread / Twrite request at EOF, one round-trip message and therefore would reduce latency accordingly.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
On doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen; better though run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=11065Documentation/9p2023-01-05T11:27:40Z<p>Schoenebeck: /* Implementation Plans */ update link to latest Windows host patches</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process ([https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), increasing security by that separation; however, the "proxy" driver is currently not considered production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose stats, information or any knob you can think of to the guest ''à la'' the Linux kernel's /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for simulating all kinds of multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that would usually be put on the thread's own stack memory (which is always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, since Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using call stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of it, that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as would usually be the case). So the thread calls arbitrary functions, runs loops, creates local variables inside them, etc., and at a certain point it realizes that the next part of the task needs to be handled by a different thread. At this point the thread leaves the Coroutine scope (e.g. by "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects; they do not deal with any multi-threading aspects by themselves. This has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
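<br />
The following minimal sketch (hand-written for this page, not code taken from hw/9pfs) illustrates the idea with QEMU's generic coroutine API from include/qemu/coroutine.h: the local variable survives between the two enters because it lives on the Coroutine's stack, and the second enter could just as well be performed by a different thread:<br />
<br />
 /* Minimal Coroutine sketch; compiles only inside the QEMU tree. */<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 static void coroutine_fn handle_task(void *opaque)<br />
 {<br />
     int step = 1;               /* lives on the Coroutine's stack       */<br />
     qemu_coroutine_yield();     /* leave the Coroutine scope            */<br />
     step++;                     /* resumed later, possibly by another   */<br />
                                 /* thread; 'step' is still there        */<br />
     *(int *)opaque = step;<br />
 }<br />
 <br />
 static void example(void)<br />
 {<br />
     int result = 0;<br />
     Coroutine *co = qemu_coroutine_create(handle_task, &result);<br />
     qemu_coroutine_enter(co);   /* runs until the yield                 */<br />
     /* ... hand 'co' over to whoever should continue the task ... */<br />
     qemu_coroutine_enter(co);   /* resumes right after the yield        */<br />
 }<br />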
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. The 9P server's main thread then "enters" the Coroutine scope to start processing the client's 9P request. At a certain point some part of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) to fulfill the actual file system I/O task(s). Once the worker thread is done with the fs I/O portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, when the main thread re-enters the Coroutine it finds the call stack and local variables exactly as they were left by the worker thread.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (potentially long-running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern, you know immediately that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
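<br />
As an illustration of that naming scheme, here is a condensed, hand-written sketch of the typical shape of such a helper, modeled on the helpers in hw/9pfs/co*.c; the function name is made up and details are simplified, so treat it as pseudocode rather than a verbatim copy:<br />
<br />
 /* Condensed sketch of a v9fs_co_*() helper: it runs on the main        */<br />
 /* thread, except for the block handed to v9fs_co_run_in_worker().      */<br />
 int coroutine_fn v9fs_co_example_lstat(V9fsPDU *pdu, V9fsPath *path,<br />
                                        struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* this block executes on a QEMU worker thread */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread here */<br />
     return err;<br />
 }<br />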
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles other 9p requests (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
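<br />
For reference, a Tflush request is tiny; the sketch below (illustrative only, not taken from any implementation) shows its wire layout, where "oldtag" names the still-pending request the client wants to have aborted:<br />
<br />
 /* Illustrative sketch of the Tflush wire layout (little-endian fields). */<br />
 #include <stdint.h><br />
 <br />
 struct p9_tflush {<br />
     uint32_t size;      /* total message size, always 9 bytes here */<br />
     uint8_t  type;      /* message type, Tflush = 108              */<br />
     uint16_t tag;       /* tag of this Tflush request              */<br />
     uint16_t oldtag;    /* tag of the pending request to abort     */<br />
 };<br />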
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests, e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller set of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) via the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lightweight 9p client implementation in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with the 9p server, implemented in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and you can basically add all kinds of hacks to the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver, because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they actually operate on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that is usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody has started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20221219102022.2167736-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched back and forth multiple times between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum: ideally each 9p request would be dispatched exactly once to a worker thread for all required filesystem I/O subtasks and then dispatched back to the main thread exactly once. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the other I/O request in question has actually been aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling requests in parallel.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change at the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally used by systems to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number, which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we assume that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we no longer need to remap file IDs.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems, to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information with a single Rreaddir response already, to avoid that overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and would therefore reduce latency accordingly.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better though, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeck
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lite-weight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here : they should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20221111042225.1115931-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links which means different pathes on the filesystem might actually point to the same file and a numeric file ID in general is used to detect that by systems. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would would save one useless Tread / Twrite request at EOF, one round-trip message and therefore would reduce latency accordingly.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
On doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen; better though run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=11045Documentation/9p2022-11-02T12:48:11Z<p>Schoenebeck: /* Implementation Plans */ update link to latest Windows patches</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
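<br />
As a practical illustration (not part of the protocol specs; the mount tag and mount point below are placeholders): the Linux kernel's 9p client selects the dialect via its "version" mount option, so a guest can explicitly request the "9p2000.L" dialect when mounting a QEMU 9p share:<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L MOUNT_TAG /mnt/host_share<br />
<br />
If the "version" option is omitted, the client falls back to its built-in default dialect.<br />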
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are several 9p client implementations for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel: [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol and the general high-level control flow of the 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly has the highest code complexity in the 9pfs code base; most of it lives in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the fs driver used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a highly privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, add, change and remove attributes, change any file permissions and so forth. Without these assumed permissions, it would be nearly impossible to run any useful service on guest side on top of a 9pfs filesystem. The QEMU binary on the host system, however, is usually not running as a privileged user for security reasons, so the 9pfs server actually cannot do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (as is usually the case on a production system), the "local" driver pretends to the guest system that it can do all those things, but in reality it just stores things like permissions and owning users and groups as additional data on the filesystem, either in hidden files or as extended attributes (the latter being recommended), which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem (see [[Documentation/9p_root_fs]]).<br />
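<br />
For example (a hypothetical illustration, not a verbatim transcript; the path is a placeholder and the exact attribute names and their encoding may differ between QEMU versions), on a host directory exported with the xattr-based mapped security model you can inspect the metadata the "local" driver stored for a guest-created file with standard host tools:<br />
<br />
 getfattr -d -m - /path/to/host/share/some_file_created_by_guest<br />
<br />
The remapped ownership and permission information then shows up as extended attributes in a dedicated namespace (typically prefixed with "user.virtfs.") on the host file, while the guest only ever sees the mapped values via 9p.<br />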
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), increasing security through that separation; however, the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some of the potential of 9pfs. As a fs driver for 9pfs is just a thin, lightweight VFS layer to the actual fs data, it would for instance be conceivable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system, thereby increasing security and availability. If an attacker then managed to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data from before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail, as the attacker would not be able to wipe their traces from the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux kernel's /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that would usually be put on the thread's own stack memory (which is always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines make it possible to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as would usually be the case). So the thread then calls arbitrary functions, runs loops, creates local variables inside them, etc., until at a certain point the thread realizes that some part of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves. This has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
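<br />
As a minimal sketch (purely illustrative, not actual 9pfs code; the function names example_task() and start_example_task() are made up, and includes and error handling are omitted), this is what the stack handover looks like with QEMU's own coroutine API from include/qemu/coroutine.h:<br />
<br />
 /* Runs inside coroutine scope: local variables live on the coroutine's stack. */<br />
 static void coroutine_fn example_task(void *opaque)<br />
 {<br />
     int progress = 1;           /* stored on the coroutine's stack              */<br />
     qemu_coroutine_yield();     /* leave coroutine scope; the entering thread   */<br />
                                 /* is back on its own stack after this point    */<br />
     progress = 2;               /* whichever thread re-enters the coroutine     */<br />
                                 /* finds 'progress' exactly as it was left      */<br />
     (void)progress;<br />
 }<br />
 <br />
 /* Thread A: allocate the coroutine and run its first part. */<br />
 void start_example_task(void)<br />
 {<br />
     Coroutine *co = qemu_coroutine_create(example_task, NULL);<br />
     qemu_coroutine_enter(co);   /* returns once the coroutine yields; 'co' can  */<br />
                                 /* then be handed to another thread, which      */<br />
                                 /* resumes it with another qemu_coroutine_enter */<br />
 }<br />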
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (potentially long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern, you know immediately that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call has returned, it has already dispatched the Coroutine back to the main thread.<br />
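<br />
A simplified sketch of this pattern (loosely modeled on the existing helpers in hw/9pfs/cofile.c; path locking and other details are omitted here for brevity):<br />
<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;          /* client already aborted this request via Tflush */<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* this block is executed on a worker thread */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread at this point */<br />
     return err;<br />
 }<br />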
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request, that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check, while still coding, whether your 9pfs changes are actually doing what you want them to.<br />
<br />
To run the 9pfs tests, e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just a single test or a smaller set of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lightweight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio-based transport to communicate with the 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so these tests actually operate on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20221024045759.448014-1-bin.meng@windriver.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched back and forth multiple times between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum: ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then dispatched back exactly once to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and the reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request has actually been aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (not to be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and is currently a 64-bit number. A host filesystem often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba rely on this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number, which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that file IDs no longer need to be remapped (a command line example for the current multidevs=remap workaround follows after this list).<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, clients send a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request. In practice, this request is followed by a large number of individual requests for more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response to avoid this overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and would therefore reduce latency accordingly.<br />
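<br />
Until such a protocol change exists, collisions on a share spanning multiple host filesystems can be avoided with the existing multidevs=remap option. An illustrative command line fragment (path, id and mount tag are placeholders) using the "local" fs driver and the virtio PCI transport:<br />
<br />
 -fsdev local,id=fsdev0,path=/path/to/share,security_model=mapped-xattr,multidevs=remap -device virtio-9p-pci,fsdev=fsdev0,mount_tag=hostshare<br />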
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better yet, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=11037Documentation/9p2022-10-25T15:33:24Z<p>Schoenebeck: /* Test Cases */ 9p tests have been divided into 3 source files in QEMU 7.2</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Source Files of Tests ===<br />
<br />
The 9pfs test code is divided into 3 source files:<br />
<br />
* <b>Test Cases</b>: All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file.<br />
* <b>Test Client</b>: The test cases use their own lite-weight 9p client implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.c tests/qtest/libqos/virtio-9p-client.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p-client.h tests/qtest/libqos/virtio-9p-client.h] source files.<br />
* <b>Test Transport</b>: The test client uses a virtio based transport to communicate with 9p server, in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.c tests/qtest/libqos/virtio-9p.c] and [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/libqos/virtio-9p.h tests/qtest/libqos/virtio-9p.h] source files.<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here : they should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20220425142705.2099270-1-bmeng.cn@gmail.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the other I/O request in question has actually been aborted. According to the specs, though, Tflush should return immediately, and this blocking behaviour currently has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file; it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally used by systems to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions (see the example command line after this list). In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and would therefore reduce latency accordingly.<br />
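To illustrate the current workaround mentioned under "Increase qid.path Size" above, this is roughly what a command line with file ID remapping enabled looks like (share path, fsdev id and mount tag are just placeholders):<br />
<br />
 -fsdev local,id=fsdev0,path=/srv/9p_share,security_model=mapped-xattr,multidevs=remap -device virtio-9p-pci,fsdev=fsdev0,mount_tag=hostshare<br />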
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better still, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/7.2&diff=11034ChangeLog/7.22022-10-25T13:44:25Z<p>Schoenebeck: /* 9pfs */ performance improvement</p>
<hr />
<div>== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features'] page for details of suggested replacement functionality.<br />
<br />
==== Removal of the "slirp" submodule (affects "-netdev user") ====<br />
<br />
The "slirp" submodule / code (which is the code behind "-netdev user" / "-nic user") has been removed from the QEMU source tree, so you now need to install your distributions libslirp development package before compiling QEMU to get the user-mode networking feature included again. For example, if you see an error message like this:<br />
<br />
<code>Parameter 'type' expects a netdev backend type</code><br />
<br />
... this might be caused by the missing "user" mode backend. In that case, please install libslirp first ("<code>dnf install libslirp-devel</code>" on Fedora and "<code>apt-get install libslirp-dev</code>" on Debian for example), recompile your QEMU with <code>--enable-slirp</code>, then try again.<br />
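In other words, the rough sequence is (package names as mentioned above; configure flags may vary with your build setup):<br />
<br />
 # Debian / Ubuntu<br />
 apt-get install libslirp-dev<br />
 # Fedora<br />
 dnf install libslirp-devel<br />
 # then rebuild QEMU with slirp support<br />
 ./configure --enable-slirp<br />
 make<br />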
<br />
==== Semihosting calls from userspace ====<br />
<br />
For some target architectures (arm, m68k, mips, nios2, riscv, xtensa) QEMU supports a "semihosting" style ABI where guest code can make calls to directly print messages, read and write host files, and so on. Handling of when this is enabled in system emulation has been made consistent across target architectures. By default it is not enabled; if enabled via the commandline "-semihosting" or "-semihosting-config enable=on" then it is only permitted from non-userspace guest code; if the new-in-7.2 "-semihosting-config userspace=on" option is given then it is also permitted from guest userspace. For some target architectures this is a change in behaviour: mips, nios2 and xtensa previously allowed userspace access by default, and riscv allowed all access by default. If you were using semihosting on these targets and relying on that previous default behaviour, you need to update your commandline to explicitly enable semihosting to the desired level.<br />
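For example, to restore the old behaviour on such a target (i.e. also permitting semihosting calls from guest userspace), a command line along the following lines can be used; machine, kernel and further options are placeholders:<br />
<br />
 qemu-system-arm -M virt -kernel my-image.elf -semihosting-config enable=on,userspace=on<br />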
<br />
==== Other removed features ====<br />
<br />
* The <tt>-watchdog</tt> option has been removed, use <tt>-device</tt> instead.<br />
* The PPC ''taihu'' machine has been removed, use ''ref405ep'' instead.<br />
<br />
=== New deprecated options and features ===<br />
* Big endian 32-bit MIPS hosts are now deprecated due to lack of CI coverage.<br />
* The "--blacklist" command line option for the QEMU guest agent has been renamed to "--block-rpcs". The old name is still supported for now, but will be removed in the future; "-b" can be used on old and new versions alike.<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* The following CPU architecture features are now emulated:<br />
** FEAT_ETS (Enhanced Translation Synchronization)<br />
** FEAT_PMUv3p5 (PMU Extensions v3.5)<br />
** FEAT_GTG (Guest translation granule size)<br />
* New emulated CPU types:<br />
** Cortex-A35<br />
<br />
==== Machines ====<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
* Deprecated 32-bit big endian hosts<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
* Stability improvements<br />
* Performance improvements by supporting MTTCG<br />
* New '''virt''' platform is added to assist with CI and device testing<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== ISA and Extensions ====<br />
* Update [m|h]tinst CSR in interrupt handling<br />
* Force disable extensions if priv spec version does not match<br />
* fix shifts shamt value for rv128c<br />
* Move zmmul out of the experimental extensions<br />
* Add checks for supported extension combinations<br />
* Fix typo and restore Pointer Masking functionality for RISC-V<br />
* Add mask agnostic behaviour (rvv_ma_all_1s) for vector extension<br />
* Add Zihintpause support<br />
* Add xicondops in ISA entry<br />
* Use official extension names for AIA CSRs<br />
* Fix the CSR check for cycle{h}, instret{h}, time{h}, hpmcounter3-31{h}<br />
* Improvements to the RISC-V debugger spec<br />
* Add disas support for vector instructions<br />
<br />
==== Machines ====<br />
* virt: pass random seed to fdt<br />
* opentitan: bump opentitan version<br />
* virt machine device tree improvements<br />
* Allow setting the resetvec for the OpenTitan machine<br />
* Enable booting S-mode firmware from pflash on virt machine<br />
<br />
==== Fixes and Misc ====<br />
* Upgrade OpenSBI to v1.1<br />
* microchip_pfsoc: fix kernel panics due to missing peripherals<br />
* Remove additional priv version check for mcountinhibit<br />
* Fixup register addresses for Ibex SPI<br />
* Cleanup the RISC-V virt machine documentation<br />
* Remove fixed numbering from GDB xml feature files<br />
* Priority level fixes for PLIC<br />
* Fixup TLB size calculation when using PMP<br />
<br />
=== s390x ===<br />
<br />
* Fix emulation of LZRF instruction<br />
* Implement Message-Security-Assist Extension 5 (random number generation via PRNO instruction)<br />
* Implement SHA-512 via KIMD/KLMD instructions<br />
* Enhanced zPCI interpretation support for KVM guests<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
* Support for passing a random seed to the Linux kernel when booted with -kernel<br />
* Support for the MSR_CORE_THREAD_COUNT MSR<br />
==== TCG ====<br />
* Performance improvements in full-system emulation<br />
* Fixes in SSE implementation<br />
* TCG support for AVX, AVX2 and VAES instructions<br />
<br />
==== KVM ====<br />
* Support for the "notify vmexit" mechanism, preventing processor bugs from hanging the whole system, through the ''-accel kvm,notify-vmexit='' and ''-accel kvm,notify-window='' options<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
===== Controllers =====<br />
<br />
===== Devices =====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
* Fixed bug that could cause a stack or heap overflow with the emulated "tulip" NIC (CVE-2022-2962)<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
* Support for setting CD-ROM block size using the physical-block-size property of the scsi-cd device.<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* Massive general [https://github.com/qemu/qemu/commit/f5265c8f917ea8c71a30e549b7e3017c1038db63 performance improvement], somewhere between a factor of 6 and 12.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
* UNIX socket support on Windows has been added<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
* On macOS systems, the same QEMU binary can include both the Cocoa user interface and the SDL or GTK+ user interfaces.<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
* The "slirp" submodule has been removed from the QEMU source tree. Use libslirp from your OS distribution instead.<br />
<br />
=== Block device backends and tools ===<br />
=== Tracing ===<br />
<br />
=== Semihosting ===<br />
<br />
Semihosting calls were generally not permitted for userspace guest code in system emulation. This can now be enabled with the "-semihosting-config userspace=on" option. Note that the usual remarks about semihosting apply -- because it permits direct guest access to the host filesystem, it should only be used with trusted guest binaries.<br />
<br />
=== Miscellaneous ===<br />
<br />
== User-mode emulation ==<br />
<br />
* Dump failing executable on CPU exception<br />
* support for system calls pidfd_open(), pidfd_send_signal() and pidfd_getfd()<br />
* support for FUTEX_WAKE_BITSET and PI futexes<br />
* support for madvise(MADV_DONTNEED) on file mappings<br />
<br />
=== build ===<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== LoongArch ===<br />
<br />
=== Nios2 ===<br />
<br />
=== HPPA ===<br />
<br />
* Increased guest stack to 80MB<br />
* Fix signal handling<br />
* Add vDSO emulation and thus avoid an executable stack<br />
* Changed guest memory layout to match that of a real hppa kernel<br />
<br />
=== x86 ===<br />
<br />
* The qemu-i386 and qemu-x86_64 binaries now default to the 'max' CPU model instead of 'qemu32' / 'qemu64'<br />
* Support for saving/restoring SSE registers in signal frames in qemu-i386 (when FXSR is set in CPUID)<br />
* Support for saving/restoring XSAVE state in signal frames (when XSAVE is set in CPUID)<br />
<br />
== TCG ==<br />
<br />
=== ARM ===<br />
<br />
== Guest agent ==<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
* Python 3.7 or newer is now required.<br />
<br />
=== GIT submodules ===<br />
* The libslirp library is not included in QEMU anymore. The development packages for libslirp must be installed in the system to build QEMU with user-mode networking support. <!-- As of version 7.2, QEMU will fail to build without libslirp unless <tt>--disable-libslirp</tt> is passed explicitly to the configure script. This may change in the future --><br />
<br />
=== Container Based Builds ===<br />
* All containers are now "flat" containers (often generated by lci-tool)<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
* Meson 0.61 or newer is now required. QEMU ships with Meson 0.61.5, which will be used if necessary.<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/7.2]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/7.1&diff=10956ChangeLog/7.12022-06-16T18:48:58Z<p>Schoenebeck: /* 9pfs */ fixed Twalk error handling</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
* The <tt>--enable-fips</tt> option to QEMU system emulators has been removed<br />
* The <tt>-writeconfig</tt> option to QEMU system emulators has been removed<br />
* The deprecated x86 CPU model <tt>Icelake-Client</tt> has been removed<br />
* The deprecated properties <tt>loaded</tt> (for crypto objects) and <tt>opened</tt> (for RNG backends) are now read-only<br />
* The deprecated <tt>-soundhw</tt> option has been replaced by <tt>-audio</tt> (e.g. <tt>-audio pa,model=hda</tt>)<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* The following CPU architecture features are now emulated:<br />
** FEAT_TTL (Translation Table Level)<br />
** FEAT_BBM at level 2 (Translation table break-before-make levels)<br />
** FEAT_Debugv8p2 (Debug changes for v8.2)<br />
** FEAT_Debugv8p4 (Debug changes for v8.4)<br />
** FEAT_DoubleFault<br />
** FEAT_RAS (Reliability, Availability and Serviceability extension, minimal version only)<br />
** FEAT_RASv1p1 (RAS extension v1.1, minimal version only)<br />
** FEAT_IESB (Implicit error synchronization event)<br />
** FEAT_CSV2 (Cache speculation variant 2)<br />
** FEAT_CSV2_2 (Cache speculation variant 2, version 2)<br />
** FEAT_CSV3 (Cache speculation variant 3)<br />
** FEAT_DGH (Data gathering hint)<br />
** FEAT_S2FWB (Stage 2 forced Write-Back)<br />
** FEAT_IDST (ID space trap handling)<br />
** FEAT_HCX (Support for the HCRX_EL2 register)<br />
* The emulated SMMUv3 now advertises support for SMMUv3.2-BBML2<br />
* The xlnx-zynqmp SoC model now implements the 4 TTC timers<br />
* The versal machine now models the Cortex-R5s in the Real-Time Processing Unit (RPU) subsystem<br />
* The virt board now supports emulation of the GICv4.0<br />
* New Aspeed AST1030 SoC and eval board<br />
* New emulated CPU types:<br />
** Cortex-A76<br />
** Neoverse-N1<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
* Update to SeaBIOS-hppa firmware version 6:<br />
** supports emulated PS/2 keyboard in boot menu when running in GTK UI <br />
** assigns serial port #1 to LASI and serial port #2 to DINO (as on real hardware) <br />
** includes additional STI text fonts<br />
* Fix performance issue with X11 artist framebuffer (makes the GTK UI faster and thus usable)<br />
* Fix X11 graphics cursor position when running HP-UX 10 or HP-UX 11<br />
* Allows the screensaver to blank the screen in X11<br />
* Allows the X11 server to turn cursor on/off <br />
* Fix serial port pass-through from host to guest<br />
* Lots of general code improvements and tidy-ups<br />
<br />
=== LoongArch ===<br />
<br />
* Add initial support for the LoongArch64 architecture, the Loongson 3A5000 multiprocessor SoC, and the Loongson 7A1000 host bridge.<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
* Implement the Vectored Interrupt Controller (enable with <code>-machine 10m50-ghrd,vic=on</code>).<br />
* Implement shadow register sets, and enable them with the VIC.<br />
* Raise supervisor-only instruction exception for <code>ERET</code> and <code>BRET</code>.<br />
* Raise misaligned data exception for misaligned memory accesses.<br />
* Raise misaligned destination exception for misaligned branch addresses.<br />
* Raise division error exception for divide by zero and divide overflow (disable with <code>-cpu diverr_present=off</code>).<br />
<br />
=== OpenRISC ===<br />
<br />
* The or1k-sim machine now supports 4 16550A UART serial devices, expanded from 1.<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
* Fix the <code>clrpsw</code> and <code>setpsw</code> instructions with respect to changes to <code>PSW.U</code>.<br />
* Fix the <code>wait</code> instruction corrupting the PC and setting <code>PSW.I</code>.<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
<br />
==== ISA and Extensions ====<br />
<br />
* Add support for privileged spec version 1.12.0<br />
* Use privileged spec version 1.12.0 for virt machine by default<br />
* Allow software access to MIP SEIP<br />
* Add initial support for the Sdtrig extension<br />
* Optimisations and improvements for the vector extension<br />
* Improvements to the misa ISA string<br />
* Add isa extension strings to the device tree<br />
* Add and enable native debug feature<br />
* Support configurable marchid, mvendorid, mimpid CSR values<br />
* Add support for the Zbkb, Zbkc, Zbkx, Zknd/Zkne, Zknh, Zksed/Zksh and Zkr extensions<br />
* Enforce floating point extension requirements<br />
* Add support for Zmmul extension<br />
* Support Vector extension tail agnostic setting elements' bits to all 1<br />
<br />
==== Machines ====<br />
<br />
* Add support for Ibex SPI to OpenTitan<br />
* Make RISC-V ACLINT mtime MMIO register writable<br />
* Add TPM support to the virt board<br />
* Improvements to RISC-V machine error handling<br />
* Don't expose the CPU properties on named CPUs<br />
<br />
==== Fixes and Misc ====<br />
* Don't allow `-bios` options with KVM machines<br />
* Fix NAPOT range computation overflow<br />
* Fix DT property mmu-type when CPU mmu option is disabled<br />
* Support 64bit fdt addresses<br />
* Fix incorrect PTE merge in walk_pte<br />
* Fixes for accessing VS hypervisor CSRs<br />
* Fixes for accessing mtimecmp<br />
* Add new short-isa-string CPU option<br />
* Disable the "G" extension by default internally, no functional change<br />
* Improvements for virtualisation<br />
* Add zicsr/zifencei to isa_string<br />
* Support for VxWorks uImage<br />
* Fixup FDT errors when supplying device tree from the command line for virt machine<br />
* Avoid overflowing the addr_config buffer in the SiFive PLIC<br />
* Support -device loader addresses above 2GB<br />
* Correctly wake from WFI on VS-level external interrupts<br />
* Fixes for RV128 support<br />
* Fix vector extension assert for RV32<br />
<br />
=== s390x ===<br />
<br />
* Fix condition code generation for the <code>ICMH</code> instruction.<br />
* Emulate the s390x Vector-Enhancements Facility 2 with TCG<br />
* Remove the old libopcode-based s390 disassembler (use Capstone instead)<br />
* Silence the warning about the msa5 feature when using the "max" CPU on s390x. The "max" CPU now matches the "qemu" CPU of the newest machine type.<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
* Support for architectural LBRs on KVM virtual machines.<br />
<br />
=== Xtensa ===<br />
<br />
* Implement cache testing opcodes.<br />
* Add lx106 core.<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
* Experimental <tt>--object x-vfio-user-server,id=<id>,type=unix,path=<socket-path>,device=<pci-dev-id></tt> for exposing emulated PCI devices over the new vfio-user protocol. A vfio-user client is not yet available in QEMU.<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
<br />
* macOS: [https://github.com/qemu/qemu/commit/f5643914a9e8f79c606a76e6a9d7ea82a3fc3e65 Several fixes] for recently (in QEMU 7.0) added 9p support for macOS hosts.<br />
* [https://lore.kernel.org/all/cover.1647339025.git.qemu_oss@crudebyte.com/ Fixed 'Twalk' error handling] from having violated 9p2000.L protocol spec.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
* Support for zero-copy-send on Linux, which reduces CPU usage on the source host. Note that locked memory is needed to support this.<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
* The ''block-export-add'' QMP command, when exporting an NBD image with dirty bitmaps, now supports passing a specific paired bitmap and node name, rather than a less-specific bitmap name that requires a search for the bitmap through a backing chain of nodes.<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
* QEMU can be compiled with the system slirp library even when using CFI. This requires libslirp 4.7.<br />
<br />
=== Block device backends and tools ===<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
* The ''-m'' and ''-boot'' options are also available via ''-M mem.*'' and ''-M boot.*''.<br />
<br />
== User-mode emulation ==<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== Nios2 ===<br />
<br />
* Fix the <code>rt_sigreturn</code> system call.<br />
* Fix the <code>siginfo_t</code> data for <code>SIGSEGV</code>.<br />
<br />
== TCG ==<br />
<br />
=== ARM ===<br />
<br />
== Guest agent ==<br />
<br />
* guest-get-disks can now return NVMe SMART information (on Linux)<br />
* guest-get-fsinfo can now return NVMe bus-type<br />
* Improve Solaris support<br />
* Add guest-get-diskstats command (for Linux guests only)<br />
<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
* The final Python 3.6 release was 3.6.15 in September 2021. This release series is now End-of-Life (EOL). As a result, we will begin requiring Python 3.7 or newer in QEMU 7.2, which is the '''next''' release.<br />
* The minimum supported version of libslirp is 4.1. Please note the QEMU project will drop the slirp submodule in future releases. The QEMU tarball won't embed the code for user mode networking in the future anymore, so that an external libslirp installation will be required.<br />
* QEMU does not ship with the "capstone" disassembler code anymore. If you need disassembler support for certain CPU types (x86, ppc, arm or s390x), you now should make sure to have the capstone package of your OS distribution installed first.<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
* Bump Fedora image version for cross-compilation<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/7.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10918Documentation/9p2022-05-09T09:38:32Z<p>Schoenebeck: /* Protocol Plans */ separate error field for Rread / Rwrite</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to get their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you should rather look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
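For instance, a Linux guest's 9p client selects the dialect with the version mount option. A typical mount of a QEMU 9p share from inside the guest looks roughly like this (mount tag and mount point are placeholders):<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L hostshare /mnt/hostshare<br />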
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol, and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security through that separation; however, the "proxy" driver is currently not considered production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux
kernel's /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that is usually put on the thread's own stack memory (the latter always being firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow the use of memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, when the main thread re-enters the Coroutine it finds the call stack and local variables exactly as they were left by the worker thread.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread does the (potentially long-running) fs driver I/O subtask(s), and yet
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern, you know immediately that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated; you can add all kinds of hacks to the synth driver to simulate whatever you need for testing certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: such tests should be performed with the "local" fs driver, because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody has started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20220425142705.2099270-1-bmeng.cn@gmail.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum: ideally each 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then dispatched back to the main thread exactly once. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the other I/O request in question has actually been aborted. According to the specs, though, Tflush should return immediately, and this blocking behaviour currently has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file; it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally used by systems to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
** <b>Separate error field for Rread and Rwrite</b>: this would save one useless Tread / Twrite request at EOF, i.e. one round-trip message, and would therefore reduce latency accordingly.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better still, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible persons for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10916Documentation/9p2022-05-09T09:29:26Z<p>Schoenebeck: /* Implementation Plans */ link latest patches for Windows hosts support</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to get their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you should rather look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol from which the other two dialects derive. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol and the general high-level control flow of the 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security through that separation; however, the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, then everything that is usually put on the thread's own stack memory (the latter always being firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as they would be usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc., and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting") and then passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They are not dealing with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
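<br />
To get a feel for the API, here is a minimal sketch using QEMU's coroutine primitives from include/qemu/coroutine.h (qemu_coroutine_create(), qemu_coroutine_enter(), qemu_coroutine_yield()). It is not taken from the 9pfs code and it glosses over the locking rules that real QEMU code has to respect when moving a coroutine between threads:<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 /* Entry function: runs on the coroutine's own stack memory. Every local<br />
  * variable here lives on that stack, no matter which thread currently<br />
  * executes the coroutine. */<br />
 static void coroutine_fn my_co_entry(void *opaque)<br />
 {<br />
     int *step = opaque;<br />
     *step = 1;<br />
     qemu_coroutine_yield();   /* leave the coroutine; the caller resumes */<br />
     *step = 2;                /* continues exactly here on the next enter */<br />
 }<br />
 <br />
 static void example(void)<br />
 {<br />
     int step = 0;<br />
     Coroutine *co = qemu_coroutine_create(my_co_entry, &step);<br />
     qemu_coroutine_enter(co); /* runs until the yield; step == 1 now */<br />
     /* ... the Coroutine pointer could now be handed to another thread,<br />
      *     which would call qemu_coroutine_enter(co) to resume it ... */<br />
     qemu_coroutine_enter(co); /* resumes after the yield; step == 2 now */<br />
 }<br />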
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-entered the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (maybe long-taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
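<br />
Schematically, such a function follows the pattern below. This is a simplified, from-memory sketch of the shape found in hw/9pfs/cofile.c and the other co*.c files (locking and some error handling trimmed), so treat it as an approximation rather than the exact current code:<br />
<br />
 /* The statement block passed to v9fs_co_run_in_worker() is executed on a<br />
  * worker thread; everything before and after it runs on the main thread.<br />
  * The request's coroutine hops threads at both boundaries. */<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {   /* e.g. aborted by a Tflush */<br />
         return -EINTR;<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* worker thread: may block inside the fs driver / syscall */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread */<br />
     return err;<br />
 }<br />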
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutines) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests, e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller selection of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
** <b>Adding support for Windows hosts</b>: See [https://lore.kernel.org/qemu-devel/20220425142705.2099270-1-bmeng.cn@gmail.com/ latest suggested Windows patch set] for issues yet to be resolved.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the number of thread hops to a minimum: ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then dispatched exactly once back to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under the large number of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting enabling the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in the future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response to avoid that overhead.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen; better still, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people who should be CCed (or, if you don't have the QEMU sources at hand to execute the script, manually find the people currently responsible for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10896Documentation/9p2022-05-02T16:52:13Z<p>Schoenebeck: /* Implementation Plans */ case-insensitive host filesystems</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here : they should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
** <b>Appropriate handling for case-insensitive filesystems on host</b>: [https://lore.kernel.org/qemu-devel/1757498.AyhHxzoH2B@silver/ See discussion] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links which means different pathes on the filesystem might actually point to the same file and a numeric file ID in general is used to detect that by systems. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems, to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice this request is followed by a large number of individual requests for more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to optionally return such common detailed information already within a single Rreaddir response to avoid that overhead.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from going unseen. Better yet, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible people for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
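<br />
For example, assuming you run the script from a QEMU source tree, pointing it at the 9p server source file lists the people to CC (the file name is just an example):<br />
<br />
 scripts/get_maintainer.pl -f hw/9pfs/9p.c<br />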
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/7.1&diff=10895ChangeLog/7.12022-05-02T12:19:49Z<p>Schoenebeck: /* 9pfs */ macOS host fixes</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
* The '''--enable-fips''' option to QEMU system emulators has been removed<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* Emulation of the CPU architecture feature FEAT_TTL is now supported<br />
* Emulation of the CPU architecture feature FEAT_BBM level 2 is now supported<br />
* The emulated SMMUv3 now advertises support for SMMUv3.2-BBML2<br />
* The xlnx-zynqmp SoC model now implements the 4 TTC timers<br />
* The versal machine now models the Cortex-R5s in the Real-Time Processing Unit (RPU) subsystem<br />
* The virt board now supports emulation of the GICv4.0<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
* Implement the Vectored Interrupt Controller (enable with <code>-machine 10m50-ghrd,vic=on</code>).<br />
* Implement shadow register sets, and enable them with the VIC.<br />
* Raise supervisor-only instruction exception for <code>ERET</code> and <code>BRET</code>.<br />
* Raise misaligned data exception for misaligned memory accesses.<br />
* Raise misaligned destination exception for misaligned branch addresses.<br />
* Raise division error exception for divide by zero and divide overflow (disable with <code>-cpu diverr_present=off</code>).<br />
<br />
=== OpenRISC ===<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
* Fix the <code>clrpsw</code> and <code>setpsw</code> instructions with respect to changes to <code>PSW.U</code>.<br />
* Fix the <code>wait</code> instruction corrupting the PC and setting <code>PSW.I</code>.<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
<br />
* Add support for Ibex SPI to OpenTitan<br />
* Add support for privileged spec version 1.12.0<br />
* Use privileged spec version 1.12.0 for virt machine by default<br />
* Allow software access to MIP SEIP<br />
* Add initial support for the Sdtrig extension<br />
* Optimisations for vector extensions<br />
* Improvements to the misa ISA string<br />
* Add ISA extension strings to the device tree<br />
* Don't allow `-bios` options with KVM machines<br />
* Fix NAPOT range computation overflow<br />
* Fix DT property mmu-type when CPU mmu option is disabled<br />
* Make RISC-V ACLINT mtime MMIO register writable<br />
* Add and enable native debug feature<br />
* Support 64bit fdt addresses<br />
* Support configurable marchid, mvendorid and mimpid CSR values<br />
* Add support for the Zbkb, Zbkc, Zbkx, Zknd/Zkne, Zknh, Zksed/Zksh and Zkr extensions<br />
* Fix incorrect PTE merge in walk_pte<br />
* Add TPM support to the virt board<br />
<br />
=== s390x ===<br />
<br />
* Fix condition code generation for the <code>ICMH</code> instruction.<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI / SMBIOS ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
<br />
* macOS: [https://github.com/qemu/qemu/commit/f5643914a9e8f79c606a76e6a9d7ea82a3fc3e65 Several fixes] for the 9p support for macOS hosts that was recently added in QEMU 7.0.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
* The ''block-export-add'' QMP command, when exporting an NBD image with dirty bitmaps, now supports passing a specific paired bitmap and node name, rather than a less-specific bitmap name that requires a search for the bitmap through a backing chain of nodes.<br />
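<br />
For illustration, a QMP invocation using the new paired form might look roughly like this (the export ID, node name and bitmap name are placeholder values):<br />
<br />
 { "execute": "block-export-add",<br />
   "arguments": { "id": "exp0", "type": "nbd", "node-name": "disk0",<br />
                  "bitmaps": [ { "node": "disk0", "name": "bitmap0" } ] } }<br />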
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
<br />
== User-mode emulation ==<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
=== Nios2 ===<br />
<br />
* Fix the <code>rt_sigreturn</code> system call.<br />
* Fix the <code>siginfo_t</code> data for <code>SIGSEGV</code>.<br />
<br />
== TCG ==<br />
<br />
=== ARM ===<br />
<br />
== Guest agent ==<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/7.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10811Documentation/9p2022-03-09T09:16:55Z<p>Schoenebeck: /* Implementation Plans */ macOS support added in QEMU 7.0</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
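<br />
For example, a Linux guest selects the dialect with the version mount option; a minimal mount command (the mount tag "host0" and the mount point are just example values) might look like this:<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt<br />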
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on the guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base, which handles the raw 9p network protocol and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver, the one used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and thereby increase security through that separation; however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that is usually put on the thread's own stack memory (the latter being always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as would usually be the case). So the thread calls arbitrary functions, runs loops, creates local variables inside them, etc., and at a certain point the thread realizes that some part of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They are not dealing with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
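<br />
The following is a minimal, hypothetical sketch of QEMU's coroutine API as described above (the function names example_task and example_run and the variable progress are made up for illustration; error handling is omitted):<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 /* Everything below lives on the coroutine's own stack, so any thread<br />
  * that enters the coroutine later finds this state intact. */<br />
 static void coroutine_fn example_task(void *opaque)<br />
 {<br />
     int progress = 0;          /* stored on the coroutine stack */<br />
     progress++;<br />
     qemu_coroutine_yield();    /* hand control back to whoever entered us */<br />
     progress++;                /* execution resumes right here on re-enter */<br />
 }<br />
 <br />
 void example_run(void)<br />
 {<br />
     Coroutine *co = qemu_coroutine_create(example_task, NULL);<br />
     qemu_coroutine_enter(co);  /* runs example_task() until the yield */<br />
     qemu_coroutine_enter(co);  /* resumes after the yield, then finishes */<br />
 }<br />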
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-entered the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (possibly long-running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
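<br />
The general shape of such a *_co_*() function looks roughly like the following hypothetical sketch (the names v9fs_co_do_something and do_something are made up for illustration; details such as request cancellation checks are omitted):<br />
<br />
 /* Called on the main thread from a request coroutine in 9p.c. */<br />
 static int coroutine_fn v9fs_co_do_something(V9fsPDU *pdu, V9fsPath *path)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* this block runs on a QEMU worker thread */<br />
             err = s->ops->do_something(&s->ctx, path);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread here */<br />
     return err;<br />
 }<br />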
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted; for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which are not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver, because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they actually operate on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that is usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS guests</b>: nobody has started work on this yet (NOTE: macOS hosts are already [[ChangeLog/7.0#9pfs|supported since QEMU 7.0]]).<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched back and forth multiple times between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then exactly once back to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from the large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request has actually been aborted. According to the spec, however, Tflush should return immediately, and this blocking behaviour currently has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change at the 9p protocol level in the future. Right now this list just serves to roughly collect ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] field (not to be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file; it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba rely on this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem is that 9p might share more than one filesystem anywhere under a 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number, which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously a future protocol change should address this by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs in the future anymore. (Enabling multidevs=remap is shown in the example command line after this list.)<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems, to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice this request is followed by a large number of individual requests for more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to optionally return such common detailed information already within a single Rreaddir response to avoid that overhead.<br />
<br />
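For reference, the multidevs=remap workaround mentioned above can be enabled on the QEMU command line like this (share path and mount tag are example values):<br />
<br />
 -virtfs local,path=/srv/vm-share,mount_tag=host0,security_model=mapped-xattr,multidevs=remap<br />
<br />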
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from going unseen. Better yet, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the currently responsible people for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/7.0&diff=10809ChangeLog/7.02022-03-08T12:51:32Z<p>Schoenebeck: /* 9pfs */ added support for macOS hosts</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* The virt board has gained a new control knob to disable passing an RNG seed in the DTB (dtb-kaslr-seed; see the example command line after this list)<br />
* The AST2600 SoC now supports a dummy version of the i3c device<br />
* The virt board can now run guests with KVM on hosts with restricted IPA ranges<br />
* The virt board now supports virtio-mem-pci<br />
* The virt board now supports specifying the guest CPU topology<br />
* On the virt board, we now enable PAuth when using KVM or hvf and the host CPU supports it<br />
* xlnx-versal-virt now emulates the PMC SLCR<br />
* xlnx-versal-virt now emulates the OSPI flash memory controller<br />
* The Arm GICv3 ITS now emulates the previously missing MOVI and MOVALL commands<br />
* New board model: mori-bmc<br />
* We now support emulating FEAT_LVA<br />
* We now support emulating FEAT_LPA<br />
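<br />
As a sketch (other options omitted, and assuming the boolean property accepts on/off), the new dtb-kaslr-seed knob can be set on the command line like this:<br />
<br />
 -machine virt,dtb-kaslr-seed=off<br />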
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
* Support up to 16 virtual CPUs<br />
* Improved artist graphics driver for HP-UX VDE, HP-UX CDE and Linux framebuffer<br />
* Mouse cursor focus and positioning now works much better under HP-UX X11 <br />
* Emulated TOC button can be triggered with "nmi" in the qemu monitor<br />
* Added support for QEMU SCSI boot order option<br />
* Possibility to change system HOSTID for HP-UX and Linux<br />
* Added firmware 16x32 pixel bitmap font for use on HDPI screens<br />
* Ability to choose serial or graphical console as default firmware console<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
==== Machines ====<br />
* Support up to 4 cores (up from 2) on the OpenRISC sim machine<br />
* Support loading an external initrd image on the OpenRISC sim machine<br />
* OpenRISC sim machine now automatically generates a device tree and passes it to the kernel<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== Extensions ====<br />
* Add support for ratified 1.0 Vector extension<br />
* Support for the Zve64f and Zve32f extensions<br />
* Drop support for draft 0.7.1 Vector extension<br />
* Support Zfhmin and Zfh extensions<br />
* RISC-V KVM support<br />
* Mark Hypervisor extension as non experimental<br />
* Enable Hypervisor extension by default<br />
* Support for svnapot, svinval and svpbmt extensions<br />
* Experimental support for 128-bit CPUs<br />
* Initial support for XVentanaCondOps custom extension<br />
* stval and mtval support for illegal instructions<br />
* Support for the UXL field in xstatus<br />
* Add support for zfinx, zdinx and zhinx{min} extensions<br />
<br />
==== Machines ====<br />
* OpenSBI binary loading support for the Spike machine<br />
* Improve kernel loading for non-Linux platforms<br />
* SiFive PDMA 64-bit support<br />
* Support 32 cores on the virt machine<br />
* Add AIA support for virt machine<br />
<br />
==== Fixes ====<br />
* Fix illegal instruction when PMP is disabled<br />
* Corrections for the Vector extension<br />
* Fixes for OpenTitan timer<br />
* Correction of OpenTitan PLIC stride length<br />
* Removal of OpenSBI ELFs<br />
* Fix trap cause for RV32 HS-mode CSR access from RV64 HS-mode<br />
* Fixup OpenTitan SPI address<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
==== KVM ====<br />
<br />
==== x86_64 ====<br />
<br />
==== AMD SEV ====<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* [https://gitlab.com/qemu-project/qemu/-/commit/e64e27d5cb103b7764f1a05b6eda7e7fedd517c5 Fixed 9p server crash] ([https://gitlab.com/qemu-project/qemu/-/issues/841 issue #841]) that happened on some host systems due to incorrect (system-dependent) handling of struct dirent size.<br />
* [https://gitlab.com/qemu-project/qemu/-/commit/f45cc81911adc7726e8a2801986b6998b91b816e Added support for macOS hosts].<br />
<br />
==== virtiofs ====<br />
* Fix for CVE-2022-0358 - behaviour with supplementary groups and SGID directories<br />
* Improved security label support<br />
* The virtiofsd in qemu is now starting to be deprecated; please start using and contributing to [https://gitlab.com/virtio-fs/virtiofsd Rust virtiofsd]<br />
<br />
==== Semihosting ====<br />
<br />
* We now generate sane numbers for SYS_HEAPINFO under system emulation<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
* new coverage plugin in contrib which support drcov format traces<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
* A bug in caching block status has been fixed that was causing over-eager treatment of a format layer as all data rather than detecting holes, if an earlier block status query had merely been checking for which portions of the backing chain were allocated. While the bug did not affect guest-visible data, it caused some performance regressions, particularly noticeable and easy to trigger when using 'qemu-nbd --allocation-depth'.<br />
* The SSH driver supports sha256 fingerprints with pre-blockdev command line configuration syntax.<br />
* The SSH driver will print the actual fingerprint and its type when failing to validate a host key.<br />
<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
<br />
* The -sandbox 'spawn' filter (enabled as shown in the example after this list) will now correctly block use of the clone syscall for spawning processes, while allowing thread creation<br />
* The -sandbox 'spawn' filter will now block use of the clone3 syscall entirely, since there is no way to access its flags parameter from seccomp to distinguish thread vs process creation<br />
* The -sandbox 'spawn' filter will now block the setns, unshare and execveat syscalls, since they are not desired.<br />
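<br />
As a sketch (other sandbox sub-options left at their defaults), the 'spawn' filter is enabled on the command line like this:<br />
<br />
 -sandbox on,spawn=deny<br />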
<br />
== User-mode emulation ==<br />
<br />
* fixed a bug that caused issues mapping the ARM commpage on 32 bit builds<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
== TCG ==<br />
<br />
User-mode emulation (linux-user, bsd-user) will enforce guest alignment constraints and raise SIGBUS to the guest program as appropriate.<br />
<br />
=== ARM ===<br />
<br />
Support for ARMv4 and ARMv5 hosts has been dropped. These older Arm versions do not have support for misaligned memory access; such support was added to ARMv6. Since ARMv5 is quite old, it is presumed that such systems do not have sufficient RAM to even run QEMU, and so practically speaking no systems are impacted.<br />
<br />
== Guest agent ==<br />
* Support Windows 11 for <code>guest-get-osinfo</code> command<br />
* Fix memory leaks in Windows <code>guest-get-fsinfo</code> command<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
* a large number of containers are now updated by lcitool<br />
* TESTS and IMAGES environment variables can be used to filter again when building against all docker targets<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/7.0]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10787Documentation/9p2022-02-22T17:01:18Z<p>Schoenebeck: /* Implementation Plans */ update macOS patch set link to v8</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on the guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base, which handles the raw 9p network protocol and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver, the one used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and thereby increase security through that separation; however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that is usually put on the thread's own stack memory (the latter being always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow using memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each Coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as would usually be the case). So the thread calls arbitrary functions, runs loops, creates local variables inside them, etc. until at a certain point the thread realizes that some part of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread using the Coroutine instance.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves. This has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (potentially long-running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). If you find a call with such a function name pattern, you know immediately that this function dispatches the Coroutine to a worker thread at that point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrive. However, while a 9p request (i.e. its coroutine) is dispatched for filesystem I/O to a worker thread, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response has eventually been sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after modifying the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests, e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
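<br />
Note that these paths assume the commands are executed from a build directory with the older in-tree layout. With a typical out-of-tree build (here assuming a build directory simply named ''build''; adjust the paths to your own setup), the equivalent commands would roughly look like this:<br />
<br />
 export QTEST_QEMU_BINARY=./build/qemu-system-x86_64<br />
 ./build/tests/qtest/qos-test -m slow<br />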
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller set of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
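<br />
Since this list is quite long, you may want to narrow it down to the 9p related tests, for instance by piping the output through grep:<br />
<br />
 tests/qtest/qos-test -l | grep virtio-9p<br />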
<br />
Then pass the respective test case name(s) as argument to -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
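<br />
The -p switch can also be given more than once to run a small selection of tests in one go; the test names below are just examples taken from the output listing shown above:<br />
<br />
 tests/qtest/qos-test \<br />
   -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success \<br />
   -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored<br />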
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: such tests should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
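<br />
If you want to try fuzzing the 9p device locally, the following is only a rough sketch (assumptions: a recent clang/libFuzzer toolchain is available, and the concrete 9p fuzz target names should be taken from the list printed by the fuzzer binary itself, as they may change between QEMU versions):<br />
<br />
 ./configure --cc=clang --cxx=clang++ --enable-fuzzing<br />
 make qemu-fuzz-x86_64<br />
 ./qemu-fuzz-x86_64                          # should print the available fuzz targets<br />
 ./qemu-fuzz-x86_64 --fuzz-target=TARGET     # run one of the listed virtio-9p targets<br />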
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS hosts</b>: See [https://lore.kernel.org/qemu-devel/20220220165056.72289-1-wwcohen@gmail.com/ latest suggested patch set (and comments about unresolved issues)].<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly once to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and the reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied for other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted other I/O request has actually been aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file; it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option (see the example command line after this list), which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense to optionally return such commonly needed detailed information already with a single Rreaddir response to avoid that overhead.<br />
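<br />
For reference, regarding the ''Increase qid.path Size'' item above: enabling the described file ID remapping is just a matter of adding the multidevs=remap option to the fsdev definition on the QEMU command line; the path and mount tag in the following example are merely placeholders to be adjusted to your setup:<br />
<br />
 -fsdev local,id=fsdev0,path=/path/to/share,security_model=mapped-xattr,multidevs=remap -device virtio-9p-pci,fsdev=fsdev0,mount_tag=hostshare<br />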
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "<b>9p</b>" to the subject line to prevent your message from ending up unseen. Better yet, run [https://github.com/qemu/qemu/blob/master/scripts/get_maintainer.pl scripts/get_maintainer.pl] to get all relevant people that should be CCed (or, if you don't have the QEMU sources at hand for executing the script, manually find the people currently responsible for 9p in QEMU's latest [https://github.com/qemu/qemu/blob/master/MAINTAINERS MAINTAINERS] file).<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/7.0&diff=10774ChangeLog/7.02022-02-19T16:44:33Z<p>Schoenebeck: /* 9pfs */ fixed 9p server crash due to incorrect struct dirent size handling</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* The virt board has gained a new control knob to disable passing an RNG seed in the DTB (dtb-kaslr-seed)<br />
* The AST2600 SoC now supports a dummy version of the i3c device<br />
* The virt board can now run guests with KVM on hosts with restricted IPA ranges<br />
* The virt board now supports virtio-mem-pci<br />
* The virt board now supports specifying the guest CPU topology<br />
* On the virt board, we now enable PAuth when using KVM and the host CPU supports it<br />
* xlnx-versal-virt now emulates the PMC SLCR<br />
* xlnx-versal-virt now emulates the OSPI flash memory controller<br />
* The Arm GICv3 ITS now emulates the previously missing MOVI and MOVALL commands<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
* Support up to 16 virtual CPUs<br />
* Improved artist graphics driver for HP-UX VDE, HP-UX CDE and Linux framebuffer<br />
* Mouse cursor focus and positioning now works much better under HP-UX X11 <br />
* Emulated TOC button can be triggered with "nmi" in the QEMU monitor<br />
* Added support for QEMU SCSI boot order option<br />
* Possibility to change system HOSTID for HP-UX and Linux<br />
* Added firmware 16x32 pixel bitmap font for use on HDPI screens<br />
* Ability to choose serial or graphical console as default firmware console<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
<br />
=== PowerPC ===<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
==== Extensions ====<br />
* Add support for ratified 1.0 Vector extension<br />
* Support for the Zve64f and Zve32f extensions<br />
* Drop support for draft 0.7.1 Vector extension<br />
* Support Zfhmin and Zfh extensions<br />
* RISC-V KVM support<br />
* Mark Hypervisor extension as non experimental<br />
* Enable Hypervisor extension by default<br />
* Support for svnapot, svinval and svpbmt extensions<br />
* Experimental support for 128-bit CPUs<br />
* Initial support for XVentanaCondOps custom extension<br />
* stval and mtval support for illegal instructions<br />
* Support for the UXL field in xstatus<br />
<br />
==== Machines ====<br />
* OpenSBI binary loading support for the Spike machine<br />
* Improve kernel loading for non-Linux platforms<br />
* SiFive PDMA 64-bit support<br />
* Support 32 cores on the virt machine<br />
<br />
==== Fixes ====<br />
* Fix illegal instruction when PMP is disabled<br />
* Corrections for the Vector extension<br />
* Fixes for OpenTitan timer<br />
* Correction of OpenTitan PLIC stride length<br />
* Removal of OpenSBI ELFs<br />
* Fix trap cause for RV32 HS-mode CSR access from RV64 HS-mode<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
==== KVM ====<br />
<br />
==== x86_64 ====<br />
<br />
==== AMD SEV ====<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* [https://gitlab.com/qemu-project/qemu/-/commit/e64e27d5cb103b7764f1a05b6eda7e7fedd517c5 Fixed 9p server crash] ([https://gitlab.com/qemu-project/qemu/-/issues/841 issue #841]) that happened on some host systems due to incorrect (system-dependent) handling of struct dirent size.<br />
<br />
==== virtiofs ====<br />
* Fix for CVE-2022-0358 - behaviour with supplementary groups and SGID directories<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
* new coverage plugin in contrib which supports drcov format traces<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
* A bug in block status caching has been fixed that caused a format layer to be treated over-eagerly as all data rather than detecting holes, if an earlier block status query had merely checked which portions of the backing chain were allocated. While the bug did not affect guest-visible data, it caused some performance regressions, particularly noticeable and easy to trigger when using 'qemu-nbd --allocation-depth'.<br />
<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
<br />
== User-mode emulation ==<br />
<br />
* fixed a bug that caused issues mapping the ARM commpage on 32-bit builds<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
== TCG ==<br />
<br />
User-mode emulation (linux-user, bsd-user) will enforce guest alignment constraints and raise SIGBUS to the guest program as appropriate.<br />
<br />
=== ARM ===<br />
<br />
Support for ARMv4 and ARMv5 hosts has been dropped. These older Arm versions do not have support for misaligned memory access; such support was added to ARMv6. Since ARMv5 is quite old, it is presumed that such systems do not have sufficient RAM to even run QEMU, and so practically speaking no systems are impacted.<br />
<br />
== Guest agent ==<br />
* Support Windows 11 for <code>guest-get-osinfo</code> command<br />
* Fix memory leaks in Windows <code>guest-get-fsinfo</code> command<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
* a large number of containers are now updated by lcitool<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/7.0]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p_root_fs&diff=10733Documentation/9p root fs2022-01-19T11:30:55Z<p>Schoenebeck: /* Boot the 9p Root FS System */ add posixacl to boot command</p>
<hr />
<div>= 9P as root filesystem (Howto) =<br />
<br />
It is possible to run a whole virtualized guest system entirely on top of<br />
QEMU's <b>9p pass-through filesystem</b> ([[Documentation/9psetup]])<br />
such that all of the guest system's files are<br />
directly visible inside a subdirectory on the host system and therefore directly<br />
accessible by both sides.<br />
<br />
This howto shows, as an example, a way to install and set up<br />
[https://www.debian.org/releases/bullseye/ Debian 11 "Bullseye"] as guest system<br />
with 9p being the guest's root filesystem.<br />
<br />
Roughly summarized we are first booting a Debian Live CD with QEMU and then using<br />
the [https://wiki.debian.org/Debootstrap debootstrap] tool to install a<br />
standard basic Debian system into a manually mounted 9p directory.<br />
The same approach can be used almost identically for many other distributions,<br />
e.g. for related .deb package based distros like [https://ubuntu.com Ubuntu] you<br />
probably just need to adjust the debootstrap command with a different release<br />
name and mirror URL as arguments.<br />
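<br />
For instance, a hypothetical adjusted debootstrap call for an Ubuntu release might look roughly like this (the release name "jammy" and the mirror URL are just assumptions for illustration; check Ubuntu's own documentation):<br />
<br />
 debootstrap jammy /mnt/inst http://archive.ubuntu.com/ubuntu/<br />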
<br />
== Motivation ==<br />
<br />
There are several advantages to running a guest OS entirely on top of 9pfs:<br />
<br />
* <b>Transparency and Shared File Access</b>: The classical way to deploy a virtualized OS (a.k.a. "guest") on a physical machine (a.k.a. "host") is to create a virtual block device (i.e. one huge file on the host's filesystem) and leave it to the guest OS to format and maintain a filesystem on top of that virtualized block device. As that filesystem would be managed by the guest OS, shared file access by host and guest simultaneously is usually cumbersome and problematic, if not even dangerous. A 9p passthrough-filesystem instead allows convenient file access by both host and guest simultaneously, as the filesystem is just a regular subdirectory somewhere inside the host's own filesystem.<br />
<br />
* <b>Partitioning of Guest's Filesystem</b>: in early UNIX days it was common to subdivide a machine's filesystem into several subdirectories by creating multiple partitions on the hard disk(s) and mounting those partitions to common points of the system's abstract file system tree. Later this became less common as one had to decide upfront at installation how large those individual partitions should be, and resizing the partitions later on was often considered to be not worth the hassle (e.g. due to system down time, admin work time, potential issues). With modern hybrid filesystems like [https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs] and [https://en.wikipedia.org/wiki/ZFS ZFS] however, subdividing a filesystem tree into multiple, separate parts sees a revival, as subdivision into their "data sets" (equivalent to classical hard disk "partitions") comes with almost zero cost now: those "data sets" acquire and release individual data blocks from a shared pool on demand, so they don't require any size decisions upfront, nor any resizing later on. If we deployed filesystems like btrfs or ZFS on the guest side on top of a virtualized block device, however, we would defeat many of those filesystems' advantages. If the filesystem is instead deployed solely on the host side by using 9p, we preserve their advantages and allow a much more convenient and powerful way to manage any of their filesystem aspects, as the guest OS runs completely independently and without knowledge of what filesystem it is actually running on.<br />
<br />
* <b>(Partial) Live Rollback</b>: As the filesystem is on the host side, we can snapshot and roll back the filesystem from the host side while the guest is still running. By using "data sets" (as described above) we can even roll back only certain part(s) of the guest's filesystem, e.g. rolling back a software installation while preserving user data, or the other way around (see the sketch after this list).<br />
<br />
* <b>Deduplication</b>: with either ZFS or (even better) btrfs on the host we can reduce the overall storage size and therefore storage costs for deploying a large amount of virtual machines (VMs), as both filesystems support data deduplication. In practice VMs usually share a significant amount of identical data, as VMs often use identical operating systems, so they typically have identical versions of applications, libraries, and so forth. Both ZFS and btrfs can automatically detect and unify identical blocks and therefore save an enormous amount of storage space that would otherwise be wasted with a large amount of VMs.<br />
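<br />
As a rough sketch of the live rollback idea mentioned above, assuming the guest's home directory lives on its own ZFS data set (pool and data set names are placeholders):<br />
<br />
 zfs snapshot tank/vm/bullseye/home@before-upgrade   # taken on host while the guest keeps running
 zfs rollback tank/vm/bullseye/home@before-upgrade   # later: revert just that data set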
<br />
<span id="start"></span><br />
== Let's start the Installation ==<br />
<br />
In this entire howto we are running QEMU <b>always</b> as <b>regular user</b>.<br />
You don't need to run QEMU with root privileges (on host) for anything in this<br />
article, and for production systems it is generally discouraged to run QEMU<br />
as user root.<br />
<br />
First we create an empty directory where we want to install the guest system to,<br />
for instance somewhere in your (regular) user's home directory on host.<br />
<br />
mkdir -p ~/vm/bullseye<br />
<br />
At this point, if you are using a filesystem like btrfs or ZFS on the host, you<br />
might now want to create the individual filesystem data sets and create the<br />
respective (yet empty) subdirectories below ~/vm/bullseye<br />
(for instance home/, var/, var/log/, root/, etc.); this is optional though.<br />
We are not describing how to configure those filesystems in this howto, but we<br />
will outline noteworthy aspects during the process if required.<br />
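<br />
As a minimal sketch, assuming ~/vm already resides on a btrfs filesystem, creating such data sets (btrfs "subvolumes") for the example directories above could look like this:<br />
<br />
 btrfs subvolume create ~/vm/bullseye/home
 btrfs subvolume create ~/vm/bullseye/var
 btrfs subvolume create ~/vm/bullseye/var/log
 btrfs subvolume create ~/vm/bullseye/root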
<br />
Next we download the latest Debian Live CD image. Before blindly pasting the<br />
following command, you probably want to<br />
[https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/ check this URL]<br />
whether there is a newer version of the live CD image available (likely).<br />
<br />
cd ~/vm<br />
wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.2.0-amd64-standard.iso<br />
<br />
Boot the Debian Live CD image and make our target installation directory<br />
~/vm/bullseye/ on the host available to the VM via 9p.<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot d -cdrom ~/vm/debian-live-11.2.0-amd64-standard.iso \<br />
-fsdev local,security_model=mapped,id=fsdev-fs0,multidevs=remap,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fs0<br />
<br />
You should now see the following message:<br />
<br />
VNC server running on ::1:5900<br />
<br />
If the machine where you are running QEMU on (i.e. where you are currently<br />
installing to), and the machine from where you are currently typing the commands<br />
are not the same, then you need to establish an SSH tunnel to make the remote<br />
machine's VNC port available on your workstation.<br />
<br />
ssh user@machine -L 5900:127.0.0.1:5900<br />
<br />
Now start any VNC client of your choice on your workstation and connect to<br />
localhost. You should now see the Debian Live CD's boot menu screen inside your<br />
VNC client's window.<br />
<br />
[[File:Debian_11_live_boot_menu_screenshot.png|frameless|upright=2.4]]<br />
<br />
From the boot menu select "Debian GNU/Linux Live". You should now see the following prompt:<br />
<br />
user@debian:/home/user#<br />
<br />
This tells you that you are in a shell as a regular user named "user".<br />
Let's get super power (inside that Live CD VM):<br />
<br />
sudo bash<br />
<br />
Now mount the target installation directory created on host via 9p pass-through<br />
filesystem inside guest.<br />
<br />
mkdir /mnt/inst<br />
mount -t 9p -o trans=virtio fs0 /mnt/inst -oversion=9p2000.L,posixacl,msize=5000000,cache=mmap<br />
<br />
Next we need to get the <b>debootstrap</b> tool. Note: at this point you might<br />
be tempted to [https://en.wikipedia.org/wiki/Ping_(networking_utility) ping]<br />
some host to check whether Internet connection is working inside the booted Live<br />
CD VM. This will <b>not</b> work (pinging), but the Internet connection should<br />
already be working nevertheless. That's because we omitted any network<br />
configuration arguments with the QEMU command above, in which case QEMU<br />
defaults to SLiRP user networking where ICMP is not working (see [[Documentation/Networking#Network_Basics]]).<br />
<br />
apt update<br />
apt install debootstrap<br />
<br />
If you are using something like btrfs or ZFS for the installation directory and<br />
already subdivided the installation directory with some empty directories, you<br />
should now fix the permissions the guest system sees (i.e. guest should think<br />
it has root permissions on everything, even though the actual filesystem<br />
directories on host are probably owned by another user on host).<br />
<br />
 chown -R root:root /mnt/inst<br />
<br />
Now download and install a "minimal" Debian 11 ("Bullseye") system into the<br />
target directory.<br />
<br />
debootstrap bullseye /mnt/inst https://deb.debian.org/debian/<br />
<br />
Note: you might see some warnings like:<br />
<br />
FS-Cache: Duplicate cookie detected<br />
<br />
Ignore those warnings. The debootstrap process might take quite some time, so<br />
now would be a good time for a coffee break. Once debootstrap is done, you<br />
should see the following final message:<br />
<br />
I: Basesystem installed successfully.<br />
<br />
Now you have a minimal system installation. But it is so minimal that you won't<br />
be able to do much with it. So it is not yet the basic system that you would<br />
have after completing the standard Debian installer.<br />
<br />
So let's chroot into the minimal system that we have so far, to be able to<br />
install the missing packages.<br />
<br />
mount -o bind /proc /mnt/inst/proc<br />
mount -o bind /dev /mnt/inst/dev<br />
mount -o bind /dev/pts /mnt/inst/dev/pts<br />
mount -o bind /sys /mnt/inst/sys<br />
chroot /mnt/inst /bin/bash<br />
<br />
<b>Important</b>: now we need to mount a tmpfs on /tmp (inside the chroot environment<br />
that we are in now).<br />
<br />
mount -t tmpfs -o noatime,size=500M tmpfs /tmp<br />
<br />
If you omit the previous step, you will most likely get error messages like the<br />
following with the subsequent <i>apt</i> commands below:<br />
<br />
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)<br />
<br />
Let's install the next fundamental packages. At this point you might still get some<br />
locale warnings. Ignore them.<br />
<br />
apt update<br />
apt install console-data console-common tzdata locales keyboard-configuration<br />
<br />
We need a kernel to boot from. Let's use Bullseye's standard Linux kernel.<br />
<br />
apt install linux-image-amd64<br />
<br />
Select the time zone for the VM.<br />
<br />
dpkg-reconfigure tzdata<br />
<br />
Configure and generate locales.<br />
<br />
dpkg-reconfigure locales<br />
<br />
In the first dialog select at least "en_US.UTF-8", then "Next", then in the<br />
subsequent dialog select "C.UTF-8" and finish the dialog.<br />
<br />
The basic installation that you might be used to after running the regular<br />
Debian installer is called the "standard" installation. Let's install the<br />
missing "standard" packages. For this we are using the <b>tasksel</b> tool. It<br />
should already be installed; if it is not, then install it now.<br />
<br />
apt install tasksel<br />
<br />
The following simple command should usually be sufficient to install all missing<br />
packages for a Debian "standard" installation automatically.<br />
<br />
tasksel install standard<br />
<br />
For some people however the tasksel command above does not work (it hangs<br />
with the output "100%"). If you encounter that issue, then use the following<br />
workaround: use tasksel to just dump the list of packages to be installed<br />
and then install the packages manually by passing those package names as<br />
arguments to apt.<br />
<br />
tasksel --task-packages standard<br />
apt install ...<br />
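<br />
A minimal sketch of how both steps can be combined into a single command via shell command substitution:<br />
<br />
 apt install $(tasksel --task-packages standard)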
<br />
Before being able to boot from the installation directory, we need to adjust the<br />
initramfs to contain the 9p drivers; remember we will run 9p as root filesystem,<br />
so 9p drivers are required before the actual system starts.<br />
<br />
cd /etc/initramfs-tools<br />
echo 9p >> modules<br />
echo 9pnet >> modules<br />
echo 9pnet_virtio >> modules<br />
update-initramfs -u<br />
<br />
The previous update-initramfs might take some time. Once it is done, check that<br />
we really have the three 9p kernel drivers inside the generated initramfs now.<br />
<br />
lsinitramfs /boot/initrd.img-5.10.0-10-amd64 | grep 9p<br />
<br />
Let's set the root password for the installed Debian system.<br />
<br />
passwd<br />
<br />
We probably want Internet connectivity on the installed Debian system. Let's<br />
keep it simple here and just configure DHCP for it to automatically acquire<br />
IP address, gateway/router IP and DNS servers.<br />
<br />
printf 'allow-hotplug ens3\niface ens3 inet dhcp\n' > /etc/network/interfaces.d/ens3<br />
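<br />
The resulting file /etc/network/interfaces.d/ens3 then simply contains these two lines:<br />
<br />
 allow-hotplug ens3
 iface ens3 inet dhcp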
<br />
<span id="use-after-unlink"></span><br />
<b>Important</b>: Finally we setup a tmpfs (permanently) on /tmp for the<br />
installed Debian system, similar to what we already did (temporarily) above for<br />
the Live CD VM that we are currently still running.<br />
There are various ways to configure that permanently for the installed system.<br />
In this case we are using the systemd approach to configure it.<br />
<br />
cp /usr/share/systemd/tmp.mount /etc/systemd/system/<br />
systemctl enable tmp.mount<br />
<br />
Alternatively you could of course also configure it by adding an entry to<br />
/etc/fstab instead, e.g. something like:<br />
<br />
echo 'tmpfs /tmp tmpfs rw,nosuid,nodev,size=524288k,nr_inodes=204800 0 0' >> /etc/fstab<br />
<br />
Yet another alternative would be to configure mounting tmpfs from host side (~/vm/bullseye/tmp).<br />
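<br />
As a rough sketch of that host side alternative (note that mounting on the host requires appropriate privileges, e.g. via sudo or a corresponding entry in the host's /etc/fstab; size and options are just examples):<br />
<br />
 sudo mount -t tmpfs -o noatime,size=500M tmpfs ~/vm/bullseye/tmp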
<br />
This tmpfs on /tmp is currently required to avoid<br />
[https://gitlab.com/qemu-project/qemu/-/issues/103 issues with use-after-unlink]<br />
patterns, which in practice however only happen for files below /tmp. At least<br />
I have not encountered any software so far that used this pattern at locations<br />
other than /tmp.<br />
<br />
Installation is now complete, so let's leave the chroot environment.<br />
<br />
exit<br />
<br />
And shutdown the Live CD VM at this point.<br />
<br />
sync<br />
shutdown -h now<br />
<br />
You can close the VNC client at this point and also close the VNC SSH tunnel<br />
(if you had one), we no longer need them. Finally hit <b>Ctrl-C</b> to quit<br />
QEMU that is still running the remainders of the Live CD VM.<br />
<br />
== Boot the 9p Root FS System ==<br />
<br />
The standard basic installation is now complete.<br />
<br />
Run this command from host to boot the fresh installed Debian 11 ("Bullseye")<br />
system with 9p being guest's root filesystem:<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot strict=on -kernel ~/vm/bullseye/boot/vmlinuz-5.10.0-10-amd64 \<br />
-initrd ~/vm/bullseye/boot/initrd.img-5.10.0-10-amd64 \<br />
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=mmap,posixacl console=ttyS0' \<br />
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fsRoot,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \<br />
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \<br />
-nographic<br />
<br />
Note: you need to use at least <b>cache=mmap</b> with the command above. That's<br />
actually not about caching, but rather allows the [https://en.wikipedia.org/wiki/Mmap mmap()]<br />
call to work on the<br />
guest system at all. Without this the guest system would even fail to boot, as<br />
many software components rely on the availability of the mmap() call.<br />
<br />
To speed things up you can also consider using e.g. <b>cache=loose</b> instead.<br />
That will deploy a filesystem cache on the guest side and reduce the number of 9p<br />
requests sent to the host. As a consequence however the guest might not immediately see<br />
file changes performed on the host side. So choose wisely based on the intended use case<br />
scenario.<br />
You can change between <b>cache=mmap</b> or e.g. <b>cache=loose</b> at any time.<br />
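<br />
For example, the -append line of the boot command above would then look like this (everything else unchanged):<br />
<br />
 -append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=loose,posixacl console=ttyS0' \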
<br />
Another aspect to consider is the performance impact of the <b>msize</b> argument<br />
(see [[Documentation/9psetup#msize]] for details).<br />
<br />
Finally you would log in as user root to the booted guest and install any other<br />
packages that you need, like a webserver, SMTP server, etc.<br />
<br />
apt update<br />
apt search ...<br />
apt install ...<br />
<br />
That's it!<br />
<br />
== Questions and Feedback ==<br />
<br />
Refer to [[Documentation/9p#Contribute]] for patches, issues etc.</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10731Documentation/9p2022-01-18T11:15:14Z<p>Schoenebeck: /* 9p Filesystem Drivers */ add link to 9p root fs HOWTO</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to get their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects, have a look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
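<br />
For orientation, this is roughly how a guest using the Linux 9p client mounts a share exported by QEMU (mount tag and mount point are placeholders; see [[Documentation/9psetup]] and [[Documentation/9p_root_fs]] for details):<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L MOUNT_TAG /mnt/point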
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already (see [[Documentation/9p_root_fs]]).<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security by that separation; however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' Linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that would usually be put on the thread's own stack memory (the latter always being firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow the use of memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves. This has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread performs the (potentially long running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while 9p requests (i.e. their coroutines) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which are not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
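<br />
As a rough sketch for trying this locally (note: the configure flags, make target and fuzz target name below are assumptions and may differ between QEMU versions; the fuzzer binary prints the list of available fuzz targets if it is started without a valid --fuzz-target argument):<br />
<br />
 CC=clang CXX=clang++ ./configure --enable-fuzzing --enable-sanitizers<br />
 make qemu-fuzz-x86_64<br />
 ./qemu-fuzz-x86_64 --fuzz-target=generic-fuzz-virtio-9p<br />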
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS hosts</b>: See [https://lore.kernel.org/all/2B4D46DD-074E-4070-BAF0-AADAD1183B33@icloud.com/T/ latest suggested patch set (and comments about unresolved issues)].<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally a 9p request would be dispatched exactly once to a worker thread for all required filesystem I/O subtasks and then exactly once back to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering most from the large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to the other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the other I/O request in question has actually been aborted. According to the spec though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change at the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file; it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems in general use a numeric file ID to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number (see the sketch after this list), which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we assume that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response, to avoid that overhead.<br />
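<br />
To see the file identity problem described under "Increase qid.path Size" above on a Linux host, you can print the two numbers that together uniquely identify a file (the path used here is just an arbitrary example; GNU coreutils' stat is assumed):<br />
<br />
 stat -c 'device ID: %d  inode: %i' /tmp/example.txt<br />
<br />
Only the combination of both numbers is unique, and combined they exceed the 64 bits offered by qid.path.<br />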
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen. Better yet, run scripts/get_maintainer.pl to get all relevant people that should be CCed.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p_root_fs&diff=10728Documentation/9p root fs2022-01-17T13:24:06Z<p>Schoenebeck: /* Let's start the Installation */ mention alternative of mounting tmpfs from host side</p>
<hr />
<div>= 9P as root filesystem (Howto) =<br />
<br />
It is possible to run a whole virtualized guest system entirely on top of<br />
QEMU's <b>9p pass-through filesystem</b> ([[Documentation/9psetup]])<br />
such that all guest system's files are<br />
directly visible inside a subdirectory on the host system and therefore directly<br />
accessible by both sides.<br />
<br />
This howto shows a way to install and setup<br />
[https://www.debian.org/releases/bullseye/ Debian 11 "Bullseye"] as guest system<br />
as an example with 9p being guest's root filesystem.<br />
<br />
Roughly summarized we are first booting a Debian Live CD with QEMU and then using<br />
the [https://wiki.debian.org/Debootstrap debootstrap] tool to install a<br />
standard basic Debian system into a manually mounted 9p directory.<br />
The same approach can be used almost identically for many other distributions,<br />
e.g. for related .deb package based distros like [https://ubuntu.com Ubuntu] you<br />
probably just need to adjust the debootstrap command with a different URL as<br />
argument.<br />
<br />
== Motivation ==<br />
<br />
There are several advantages to running a guest OS entirely on top of 9pfs:<br />
<br />
* <b>Transparency and Shared File Access</b>: The classical way to deploy a virtualized OS (a.k.a. "guest") on a physical machine (a.k.a. "host") is to create a virtual block device (i.e. one huge file on the host's filesystem) and leave it to the guest OS to format and maintain a filesystem on top of that virtualized block device. As that filesystem would be managed by the guest OS, shared file access by host and guest simultaneously is usually cumbersome and problematic, if not outright dangerous. A 9p passthrough-filesystem instead allows convenient file access by both host and guest simultaneously, as the filesystem is just a regular subdirectory somewhere inside the host's own filesystem.<br />
<br />
* <b>Partitioning of Guest's Filesystem</b>: in early UNIX days it was common to subdivide a machine's filesystem into several subdirectories by creating multiple partitions on the hard disk(s) and mounting those partitions to common points of the system's abstract file system tree. Later this became less common, as one had to decide upfront at installation how large those individual partitions should be, and resizing the partitions later on was often considered not worth the hassle (e.g. due to system down time, admin work time, potential issues). With modern hybrid filesystems like [https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs] and [https://en.wikipedia.org/wiki/ZFS ZFS] however, subdividing a filesystem tree into multiple, separate parts sees a revival, as subdivision into their "data sets" (equivalent to classical hard disk "partitions") now comes with almost zero cost: those "data sets" acquire and release individual data blocks from a shared pool on demand, so they require neither any size decisions upfront nor any resizing later on. If we deployed filesystems like btrfs or ZFS on the guest side on top of a virtualized block device, however, we would defeat many of those filesystems' advantages. If the filesystem is instead deployed solely on the host side by using 9p, we preserve their advantages and allow a much more convenient and powerful way to manage any of their filesystem aspects, as the guest OS runs completely independently and without knowledge of what filesystem it is actually running on.<br />
<br />
* <b>(Partial) Live Rollback</b>: As the filesystem is on host side, we can snapshot and rollback the filesystem from host side while guest is still running. By using "data sets" (as described above) we can even rollback only certain part(s) of guest's filesystem, e.g. rolling back a software installation while preserving user data, or the other way around.<br />
<br />
* <b>Deduplication</b>: with either ZFS or (even better) btrfs on the host we can reduce the overall storage size and therefore storage costs for deploying a large number of virtual machines (VMs), as both filesystems support data deduplication. In practice VMs usually share a significant amount of identical data, as VMs often use identical operating systems, so they typically have identical versions of applications, libraries, and so forth. Both ZFS and btrfs can automatically detect and unify identical blocks and therefore save enormous amounts of storage space that would otherwise be wasted with a large number of VMs.<br />
<br />
<span id="start"></span><br />
== Let's start the Installation ==<br />
<br />
In this entire howto we are running QEMU <b>always</b> as <b>regular user</b>.<br />
You don't need to run QEMU with root privileges (on host) for anything in this<br />
article, and for production systems it is in general discouraged to run QEMU<br />
as user root.<br />
<br />
First we create an empty directory where we want to install the guest system to,<br />
for instance somewhere in your (regular) user's home directory on host.<br />
<br />
mkdir -p ~/vm/bullseye<br />
<br />
At this point, if you are using a filesystem on host like btrfs or ZFS, you now<br />
might want to create the individual filesystem data sets and create the<br />
respective (yet empty) subdirectories below ~/vm/bullseye<br />
(for instance home/, var/, var/log/, root/, etc.); this is optional though.<br />
We are not describing how to configure those filesystems in this howto, but we<br />
will outline noteworthy aspects during the process if required.<br />
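<br />
For example, with btrfs on the host such data sets could be created as subvolumes like this (just a sketch; which subdirectories you split out is entirely up to you):<br />
<br />
 btrfs subvolume create ~/vm/bullseye/home<br />
 btrfs subvolume create ~/vm/bullseye/var<br />
 btrfs subvolume create ~/vm/bullseye/var/log<br />
 btrfs subvolume create ~/vm/bullseye/root<br />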
<br />
Next we download the latest Debian Live CD image. Before blindly pasting the<br />
following command, you probably want to<br />
[https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/ check this URL]<br />
to see whether there is a newer version of the live CD image available (likely).<br />
<br />
cd ~/vm<br />
wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.2.0-amd64-standard.iso<br />
<br />
Boot the Debian Live CD image and make our target installation directory<br />
~/vm/bullseye/ on host available to the VM via 9p.<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot d -cdrom ~/vm/debian-live-11.2.0-amd64-standard.iso \<br />
-fsdev local,security_model=mapped,id=fsdev-fs0,multidevs=remap,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fs0<br />
<br />
You should now see the following message:<br />
<br />
VNC server running on ::1:5900<br />
<br />
If the machine on which you are running QEMU (i.e. where you are currently<br />
installing to) and the machine from which you are currently typing the commands<br />
are not the same, then you need to establish an SSH tunnel to make the remote<br />
machine's VNC port available on your workstation.<br />
<br />
ssh user@machine -L 5900:127.0.0.1:5900<br />
<br />
Now start any VNC client of your choice on your workstation and connect to<br />
localhost. You should now see the Debian Live CD's boot menu screen inside your<br />
VNC client's window.<br />
<br />
[[File:Debian_11_live_boot_menu_screenshot.png|frameless|upright=2.4]]<br />
<br />
From the boot menu select "Debian GNU/Linux Live". You should now see the following prompt:<br />
<br />
user@debian:/home/user#<br />
<br />
This tells you that you are in a shell as a regular user named "user".<br />
Let's get super power (inside that Live CD VM):<br />
<br />
sudo bash<br />
<br />
Now mount the target installation directory created on host via 9p pass-through<br />
filesystem inside guest.<br />
<br />
mkdir /mnt/inst<br />
mount -t 9p -o trans=virtio fs0 /mnt/inst -oversion=9p2000.L,posixacl,msize=5000000,cache=mmap<br />
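<br />
You can verify that the share is mounted with (just a sanity check; the exact mount options shown in the output will vary):<br />
<br />
 mount | grep /mnt/inst<br />
<br />
which should list a filesystem of type 9p mounted on /mnt/inst.<br />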
<br />
Next we need to get the <b>debootstrap</b> tool. Note: at this point you might<br />
be tempted to [https://en.wikipedia.org/wiki/Ping_(networking_utility) ping]<br />
some host to check whether Internet connection is working inside the booted Live<br />
CD VM. This will <b>not</b> work (pinging), but the Internet connection should<br />
already be working nevertheless. That's because we were omitting any network<br />
configuration arguments with the QEMU command above, in which case QEMU<br />
defaults to SLiRP user networking where ICMP is not working (see [[Documentation/Networking#Network_Basics]]).<br />
<br />
apt update<br />
apt install debootstrap<br />
<br />
If you are using something like btrfs or ZFS for the installation directory and<br />
already subdivided the installation directory with some empty directories, you<br />
should now fix the permissions the guest system sees (i.e. the guest should think<br />
it has root permissions on everything, even though the actual filesystem<br />
directories on host are probably owned by another user on host).<br />
<br />
 chown -R root:root /mnt/inst<br />
<br />
Now download and install a "minimal" Debian 11 ("Bullseye") system into the<br />
target directory.<br />
<br />
debootstrap bullseye /mnt/inst https://deb.debian.org/debian/<br />
<br />
Note: you might see some warnings like:<br />
<br />
FS-Cache: Duplicate cookie detected<br />
<br />
Ignore those warnings. The debootstrap process might take quite some time, so<br />
now would be a good time for a coffee break. Once debootstrap is done, you<br />
should see the following final message:<br />
<br />
 I: Base system installed successfully.<br />
<br />
Now you have a minimal system installation. But it is so minimal that you won't<br />
be able to do much with it. So it is not yet the basic system that you would<br />
have after completing the standard Debian installer.<br />
<br />
So let's chroot into the minimal system that we have so far, to be able to<br />
install the missing packages.<br />
<br />
mount -o bind /proc /mnt/inst/proc<br />
mount -o bind /dev /mnt/inst/dev<br />
mount -o bind /dev/pts /mnt/inst/dev/pts<br />
mount -o bind /sys /mnt/inst/sys<br />
chroot /mnt/inst /bin/bash<br />
<br />
Important: now we need to mount a tmpfs on /tmp (inside the chroot environment<br />
that we are in now).<br />
<br />
mount -t tmpfs -o noatime,size=500M tmpfs /tmp<br />
<br />
If you omit the previous step, you will most likely get error messages like the<br />
following with the subsequent <i>apt</i> commands below:<br />
<br />
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)<br />
<br />
Let's install the next fundamental packages. At this point you might still get<br />
some locale warnings. Ignore them.<br />
<br />
apt update<br />
apt install console-data console-common tzdata locales keyboard-configuration<br />
<br />
We need a kernel to boot from. Let's use Bullseye's standard Linux kernel.<br />
<br />
apt install linux-image-amd64<br />
<br />
Select the time zone for the VM.<br />
<br />
dpkg-reconfigure tzdata<br />
<br />
Configure and generate locales.<br />
<br />
dpkg-reconfigure locales<br />
<br />
In the first dialog select at least "en_US.UTF-8", then "Next", then in the<br />
subsequent dialog select "C.UTF-8" and finish the dialog.<br />
<br />
The basic installation that you might be used to after running the regular<br />
Debian installer is called the "standard" installation. Let's install the<br />
missing "standard" packages. For this we are using the <b>tasksel</b> tool. It<br />
should already be installed; if it is not, then install it now.<br />
<br />
apt install tasksel<br />
<br />
The following simple command should usually be sufficient to install all missing<br />
packages for a Debian "standard" installation automatically.<br />
<br />
tasksel install standard<br />
<br />
For some people, however, the tasksel command above does not work (it hangs<br />
with output "100%"). If you encounter that issue, use the following workaround:<br />
use tasksel to just dump the list of packages to be installed, then manually<br />
install those packages by passing their names as arguments to apt.<br />
<br />
tasksel --task-packages standard<br />
apt install ...<br />
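<br />
If you prefer, those two steps can also be combined into a single command (a small shell sketch; it simply feeds tasksel's package list to apt):<br />
<br />
 apt install $(tasksel --task-packages standard)<br />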
<br />
Before being able to boot from the installation directory, we need to adjust the<br />
initramfs to contain the 9p drivers; remember we will run 9p as the root filesystem,<br />
so the 9p drivers are required before the actual system starts.<br />
<br />
cd /etc/initramfs-tools<br />
echo 9p >> modules<br />
echo 9pnet >> modules<br />
echo 9pnet_virtio >> modules<br />
update-initramfs -u<br />
<br />
The previous update-initramfs might take some time. Once it is done, check that<br />
we really have the three 9p kernel drivers inside the generated initramfs now.<br />
<br />
lsinitramfs /boot/initrd.img-5.10.0-10-amd64 | grep 9p<br />
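<br />
The exact paths depend on the installed kernel version, but the output should contain entries roughly like the following (the paths here are an assumption based on Debian's usual module layout):<br />
<br />
 usr/lib/modules/5.10.0-10-amd64/kernel/fs/9p/9p.ko<br />
 usr/lib/modules/5.10.0-10-amd64/kernel/net/9p/9pnet.ko<br />
 usr/lib/modules/5.10.0-10-amd64/kernel/net/9p/9pnet_virtio.ko<br />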
<br />
Let's set the root password for the installed Debian system.<br />
<br />
passwd<br />
<br />
We probably want Internet connectivity on the installed Debian system. Let's<br />
keep it simple here and just configure DHCP for it to automatically acquire<br />
IP address, gateway/router IP and DNS servers.<br />
<br />
printf 'allow-hotplug ens3\niface ens3 inet dhcp\n' > /etc/network/interfaces.d/ens3<br />
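<br />
The resulting /etc/network/interfaces.d/ens3 file then simply contains:<br />
<br />
 allow-hotplug ens3<br />
 iface ens3 inet dhcp<br />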
<br />
<span id="use-after-unlink"></span><br />
<b>Important</b>: Finally we setup a tmpfs (permanently) on /tmp for the<br />
installed Debian system, similar to what we already did (temporarily) above for<br />
the Live CD VM that we are currently still running.<br />
There are various ways to configure that permanently for the installed system.<br />
In this case we are using the systemd approach to configure it.<br />
<br />
cp /usr/share/systemd/tmp.mount /etc/systemd/system/<br />
systemctl enable tmp.mount<br />
<br />
Alternatively you could of course also configure it by adding an entry to<br />
/etc/fstab instead, e.g. something like:<br />
<br />
echo 'tmpfs /tmp tmpfs rw,nosuid,nodev,size=524288k,nr_inodes=204800 0 0' >> /etc/fstab<br />
<br />
Yet another alternative would be to configure mounting tmpfs from host side (~/vm/bullseye/tmp).<br />
<br />
This tmpfs on /tmp is currently required to avoid<br />
[https://gitlab.com/qemu-project/qemu/-/issues/103 issues with use-after-unlink]<br />
patterns, which in practice however only happen for files below /tmp. At least<br />
I have not encountered any software so far that used this pattern at locations<br />
other than /tmp.<br />
<br />
Installation is now complete, so let's leave the chroot environment.<br />
<br />
exit<br />
<br />
And shutdown the Live CD VM at this point.<br />
<br />
sync<br />
shutdown -h now<br />
<br />
You can close the VNC client at this point and also close the VNC SSH tunnel<br />
(if you had one), we no longer need them. Finally hit <b>Ctrl-C</b> to quit<br />
QEMU, which is still running the remainder of the Live CD VM.<br />
<br />
== Boot the 9p Root FS System ==<br />
<br />
The standard basic installation is now complete.<br />
<br />
Run this command from host to boot the freshly installed Debian 11 ("Bullseye")<br />
system with 9p being guest's root filesystem:<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot strict=on -kernel ~/vm/bullseye/boot/vmlinuz-5.10.0-10-amd64 \<br />
-initrd ~/vm/bullseye/boot/initrd.img-5.10.0-10-amd64 \<br />
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=mmap console=ttyS0' \<br />
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fsRoot,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \<br />
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \<br />
-nographic<br />
<br />
Note: you need to use at least <b>cache=mmap</b> with the command above. That's<br />
actually not about caching, but rather allows the [https://en.wikipedia.org/wiki/Mmap mmap()]<br />
call to work on the<br />
guest system at all. Without this the guest system would even fail to boot, as<br />
many software components rely on the availability of the mmap() call.<br />
<br />
To speed things up you can also consider using e.g. <b>cache=loose</b> instead.<br />
That will deploy a filesystem cache on the guest side and reduce the amount of 9p<br />
requests to the host. As a consequence, however, the guest might not immediately<br />
see file changes performed on the host side. So choose wisely based on your<br />
intended use case scenario.<br />
You can switch between <b>cache=mmap</b> and e.g. <b>cache=loose</b> at any time.<br />
<br />
Another aspect to consider is the performance impact of the <b>msize</b> argument<br />
(see [[Documentation/9psetup#msize]] for details).<br />
<br />
Finally you would log in as user root to the booted guest and install any other<br />
packages that you need, like a webserver, SMTP server, etc.<br />
<br />
apt update<br />
apt search ...<br />
apt install ...<br />
<br />
That's it!<br />
<br />
== Questions and Feedback ==<br />
<br />
Refer to [[Documentation/9p#Contribute]] for patches, issues etc.</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=10727Documentation/9psetup2022-01-17T13:08:12Z<p>Schoenebeck: add link to new 9p root fs HOWTO</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on the host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest; if desired, the directory can even be accessed and shared by several guests simultaneously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
See also [[Documentation/9p_root_fs]] for a complete HOWTO about installing and configuring an entire guest system ontop of 9p as root fs.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host. <br />
# proxy: this driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper), however the "proxy" driver is currently not considered to be production grade. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. If the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. Security model is mandatory only for "local" fsdriver. Other fsdrivers (like "proxy") don't take security model as a parameter. Recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root.<br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as file attributes. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory. Directories exported by this security model cannot interact with other unix tools.<br />
# none: Same as "passthrough" except the server won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough-like security model usable for people who run KVM as non-root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that host page cache will be used to read and write data but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting 9p share as a readonly mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket file for communicating with virtfs-proxy-helper.<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create a socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared with a 9p export, i.e. to avoid file ID collisions. Supported behaviours are either:<br />
# warn: This is the default behaviour: virtfs 9p expects only one device to be shared with the same export, and if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by QEMU on the host side.<br />
# remap: In order to avoid file ID collisions on the guest you should either create a separate virtfs export for each device to be shared with guests (the recommended way), or you may use "remap" instead, which allows you to share multiple devices with only one export. This is achieved by remapping the original inode numbers from host to guest in a way that prevents such collisions. Remapping inodes in such use cases is required because the original device IDs from the host are never passed on and exposed on the guest. Instead, all files of an export shared with virtfs always share the same device ID on the guest, so two files with identical inode numbers but from actually different devices on the host would otherwise cause a file ID collision and hence potential misbehaviours on the guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on the guest. Note though that "forbid" currently does not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
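<br />
For example, putting several of these options together, a host directory /srv/share (a hypothetical path) could be exposed to the guest under the tag "hostshare" like this:<br />
<br />
 -virtfs local,path=/srv/share,mount_tag=hostshare,security_model=mapped-xattr,id=fs0<br />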
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to setup 9p sharing for guests<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory will contain the host path that is to be exported. The target directory should be filled with the mount tag for the device, which despite its name, does not have to actually be a directory path - any string 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squashed'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
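<br />
For instance, a filled-in element could look like this (directory and tag are just example values); the mount tag ("hostshare" here) is what you would then pass as the mount tag in the guest-side mount command shown in the next section:<br />
<br />
 <filesystem type='mount' accessmode='mapped'><br />
   <source dir='/srv/share'/><br />
   <target dir='hostshare'/><br />
 </filesystem><br />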
<br />
== Mounting the shared path ==<br />
You can mount the shared folder using<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified in Qemu commandline.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u.<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. The Linux client's default was only 8 KiB prior to kernel v5.15 (see the "Performance Considerations (msize)" section below).<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on a v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does a single attach and performs all operations as one user.<br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for the option "msize" on the client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on the client side. If you omit to specify a value for "msize" with a Linux 9P client, the client falls back to its default value, which prior to Linux kernel v5.15 was only 8 kiB and resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on the host side (i.e. a property invisible to the client). You then still might want to trade off performance gains against additional RAM costs: with growing "msize" (RAM occupation) performance keeps increasing, but the performance gain (delta) shrinks continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for 'msize' for your use case yourself. As a starting point, you might pick something between 10 MiB and a bit over 100 MiB for spindle-based SATA storage, whereas for PCIe-based flash storage you might pick several hundred MiB or more. Then create some large file on the host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes reading the file on guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
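<br />
For example, to retry with a different msize, unmount and remount the share on the guest (mount tag and mount point are placeholders here):<br />
<br />
 umount /mnt/shared<br />
 mount -t 9p -o trans=virtio,version=9p2000.L,msize=104857600 MOUNT_TAG /mnt/shared<br />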
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example for i386-softmm with debugging support, use <br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Burn a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debian system you can use the package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for intel here)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=loose<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p_root_fs&diff=10726Documentation/9p root fs2022-01-17T13:05:20Z<p>Schoenebeck: /* Let's start the Installation */ add Debian 11 Live CD boot menu screen shot</p>
<hr />
<div>= 9P as root filesystem (Howto) =<br />
<br />
It is possible to run a whole virtualized guest system entirely on top of<br />
QEMU's <b>9p pass-through filesystem</b> ([[Documentation/9psetup]])<br />
such that all guest system's files are<br />
directly visible inside a subdirectory on the host system and therefore directly<br />
accessible by both sides.<br />
<br />
This howto shows a way to install and setup<br />
[https://www.debian.org/releases/bullseye/ Debian 11 "Bullseye"] as guest system<br />
as an example with 9p being guest's root filesystem.<br />
<br />
Roughly summarized we are first booting a Debian Live CD with QEMU and then using<br />
the [https://wiki.debian.org/Debootstrap debootstrap] tool to install a<br />
standard basic Debian system into a manually mounted 9p directory.<br />
The same approach can be used almost identically for many other distributions,<br />
e.g. for related .deb package based distros like [https://ubuntu.com Ubuntu] you<br />
probably just need to adjust the debootstrap command with a different URL as<br />
argument.<br />
<br />
== Motivation ==<br />
<br />
There are several advantages to running a guest OS entirely on top of 9pfs:<br />
<br />
* <b>Transparency and Shared File Access</b>: The classical way to deploy a virtualized OS (a.k.a. "guest") on a physical machine (a.k.a. "host") is to create a virtual block device (i.e. one huge file on host's filesystem) and leave it to the guest OS to format and maintain a filesystem ontop of that virtualized block device. As that filesystem would be managed by the guest OS, shared file access by host and guest simultaniously is usually cumbersome and problematic, if not even dangerous. A 9p passthrough-filesystem instead allows convenient file access by both host and guest simultaniously as the filesystem is just a regular subdirectory somewhere inside host's own filesystem.<br />
<br />
* <b>Partitioning of Guest's Filesystem</b>: in early UNIX days it was common to subdivide a machine's filesystem into several subdirectories by creating multiple partitions on the hard disk(s) and mounting those partitions to common points of the system's abstract file system tree. Later this became less common as one had to decide upfront at installation how large those individual partitions shall be, and resizing the partitions later on was often considered to be not worth the hassle (e.g. due to system down time, admin work time, potential issues). With modern hybrid filesystems like [https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs] and [https://en.wikipedia.org/wiki/ZFS ZFS] however, subdividing a filesystem tree into multiple, separate parts sees a revival as subdivision into their "data sets" (equivalent to classical hard disk "partitions") comes with almost zero cost now as those "data sets" acquire and release individual data blocks from a shared pool on-demand, so they don't require any size decisions upfront, nor any resizing later on. If we would deploy filesystems like btrfs or zfs on guest side however ontop of a virtualized block device, we would defeat many of those filesystem's advantages. Instead if the filesystem is deployed solely on host side by using 9p, we preserve their advantages and allow a much more convenient and powerful way to manage any of their filesystem aspects as the guest OS is running completely independent and without knowledge what filesystem it is actually running on.<br />
<br />
* <b>(Partial) Live Rollback</b>: As the filesystem is on host side, we can snapshot and rollback the filesystem from host side while guest is still running. By using "data sets" (as described above) we can even rollback only certain part(s) of guest's filesystem, e.g. rolling back a software installation while preserving user data, or the other way around.<br />
<br />
* <b>Deduplication</b>: with either ZFS or (even better) btrfs on host we can reduce the overall storage size and therefore storage costs for deploying a large amount of virtual machines (VMs), as both filesystems support data deduplication. In practice VMs usually share a significant amount of identical data as VMs often use identical operating systems, so they typically have identical versions of applications, libraries, and so forth. Both ZFS and btrfs allow to automatically detect and unify identical blocks and thefore reduce enormous storage space that would otherwise be wasted with a large amount of VMs.<br />
<br />
<span id="start"></span><br />
== Let's start the Installation ==<br />
<br />
In this entire howto we are running QEMU <b>always</b> as <b>regular user</b>.<br />
You don't need to run QEMU with root privileges (on host) for anything in this<br />
article, and for production systems it is in general discouraged to run QEMU<br />
as user root.<br />
<br />
First we create an empty directory where we want to install the guest system to,<br />
for instance somewhere in your (regular) user's home directory on host.<br />
<br />
mkdir -p ~/vm/bullseye<br />
<br />
At this point, if you are using a filesystem on host like btrfs or ZFS, you now<br />
might want to create the individual filesystem data sets and create the<br />
respective (yet empty) subdirectories below ~/vm/bullseye<br />
(for instance home/, var/, var/log/, root/, etc.), this is optional though.<br />
We are not describing how to configure those filesystems in this howto, but we<br />
will outline noteworthy aspects during the process if required.<br />
<br />
Next we download the latest Debian Live CD image. Before blindly pasting the<br />
following command, you probably want to<br />
[https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/ check this URL]<br />
whether there is a younger version of the live CD image available (likely).<br />
<br />
cd ~/vm<br />
wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.2.0-amd64-standard.iso<br />
<br />
Boot the Debian Live CD image and make our target installation directory<br />
~/vm/bullseye/ on host available to the VM via 9p.<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot d -cdrom ~/vm/debian-live-11.2.0-amd64-standard.iso \<br />
-fsdev local,security_model=mapped,id=fsdev-fs0,multidevs=remap,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fs0<br />
<br />
You should now see the following message:<br />
<br />
VNC server running on ::1:5900<br />
<br />
If the machine where you are running QEMU on (i.e. where you are currently<br />
installing to), and the machine from where you are currently typing the commands<br />
are not the same, then you need to establish a SSH tunnel to make the remote<br />
machine's VNC port available on your workstation.<br />
<br />
ssh user@machine -L 5900:127.0.0.1:5900<br />
<br />
Now start any VNC client of your choice on your workstation and connect to<br />
localhost. You should now see the Debian Live CD's boot menu screen inside your<br />
VNC client's window.<br />
<br />
[[File:Debian_11_live_boot_menu_screenshot.png|frameless|upright=2.4]]<br />
<br />
From the boot menu select "Debian GNU/Linux Live". You should now see the following prompt:<br />
<br />
user@debian:/home/user#<br />
<br />
Which tells you that you are in a shell with a regular user named "user".<br />
Let's get super power (inside that Live CD VM):<br />
<br />
sudo bash<br />
<br />
Now mount the target installation directory created on host via 9p pass-through<br />
filesystem inside guest.<br />
<br />
mkdir /mnt/inst<br />
mount -t 9p -o trans=virtio fs0 /mnt/inst -oversion=9p2000.L,posixacl,msize=5000000,cache=mmap<br />
<br />
Next we need to get the <b>debootstrap</b> tool. Note: at this point you might<br />
be tempted to [https://en.wikipedia.org/wiki/Ping_(networking_utility) ping]<br />
some host to check whether Internet connection is working inside the booted Live<br />
CD VM. This will <b>not</b> work (pinging), but the Internet connection should<br />
already be working nevertheless. That's because we were omitting any network<br />
configuration arguments with the QEMU command above, in which case QEMU<br />
defaults to SLiRP user networking where ICMP is not working (see [[Documentation/Networking#Network_Basics]]).<br />
<br />
apt update<br />
apt install debootstrap<br />
<br />
If you are using something like btrfs or ZFS for the installation directory and<br />
already subdivided the installation directory with some empty directories, you<br />
should now fix the permissions the guest system sees (i.e. the guest should think<br />
it has root permissions on everything, even though the actual filesystem<br />
directories on host are probably owned by another user on host).<br />
<br />
 chown -R root:root /mnt/inst<br />
<br />
Now download and install a "minimal" Debian 11 ("Bullseye") system into the<br />
target directory.<br />
<br />
debootstrap bullseye /mnt/inst https://deb.debian.org/debian/<br />
<br />
Note: you might see some warnings like:<br />
<br />
FS-Cache: Duplicate cookie detected<br />
<br />
Ignore those warnings. The debootstrap process might take quite some time, so<br />
now would be a good time for a coffee break. Once debootstrap is done, you<br />
should see the following final message:<br />
<br />
 I: Base system installed successfully.<br />
<br />
Now you have a minimal system installation. But it is so minimal that you won't<br />
be able to do much with it. So it is not yet the basic system that you would<br />
have after completing the standard Debian installer.<br />
<br />
So let's chroot into the minimal system that we have so far, to be able to<br />
install the missing packages.<br />
<br />
mount -o bind /proc /mnt/inst/proc<br />
mount -o bind /dev /mnt/inst/dev<br />
mount -o bind /dev/pts /mnt/inst/dev/pts<br />
mount -o bind /sys /mnt/inst/sys<br />
chroot /mnt/inst /bin/bash<br />
<br />
Important: now we need to mount a tmpfs on /tmp (inside the chroot environment<br />
that we are in now).<br />
<br />
mount -t tmpfs -o noatime,size=500M tmpfs /tmp<br />
<br />
If you omit the previous step, you will most likely get error messages like the<br />
following with the subsequent <i>apt</i> commands below:<br />
<br />
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)<br />
<br />
Let's install the next fundamental packages. At this point you might get some<br />
locale warnings yet. Ignore them.<br />
<br />
apt update<br />
apt install console-data console-common tzdata locales keyboard-configuration<br />
<br />
We need a kernel to boot from. Let's use Bullseye's standard Linux kernel.<br />
<br />
apt install linux-image-amd64<br />
<br />
Select the time zone for the VM.<br />
<br />
dpkg-reconfigure tzdata<br />
<br />
Configure and generate locales.<br />
<br />
dpkg-reconfigure locales<br />
<br />
In the first dialog select at least "en_US.UTF-8", then "Next", then in the<br />
subsequent dialog select "C.UTF-8" and finish the dialog.<br />
<br />
The basic installation that you might be used to after running the regular<br />
Debian installer is called the "standard" installation. Let's install the<br />
missing "standard" packages. For this we are using the <b>tasksel</b> tool. It<br />
should already be installed, if it is not then install it now.<br />
<br />
apt install tasksel<br />
<br />
The following simple command should usually be sufficient to install all missing<br />
packages for a Debian "standard" installation automatically.<br />
<br />
tasksel install standard<br />
<br />
For some people however the tasksel command above does not work (it would hang<br />
with output "100%"). If you are encountering that issue, then use the following<br />
workaround by using tasksel to just dump the list of packages to be installed<br />
and then manually install the packages via apt by passing those package names as<br />
arguments to apt.<br />
<br />
tasksel --task-packages standard<br />
apt install ...<br />
<br />
Before being able to boot from the installation directory, we need to adjust the<br />
initramfs to contain the 9p drivers, remember we will run 9p as root filesystem,<br />
so 9p drivers are required before the actual system is starting.<br />
<br />
cd /etc/initramfs-tools<br />
echo 9p >> modules<br />
echo 9pnet >> modules<br />
echo 9pnet_virtio >> modules<br />
update-initramfs -u<br />
<br />
The previous update-initramfs might take some time. Once it is done, check that<br />
we really have the three 9p kernel drivers inside the generated initramfs now.<br />
<br />
lsinitramfs /boot/initrd.img-5.10.0-10-amd64 | grep 9p<br />
<br />
Let's set the root password for the installed Debian system.<br />
<br />
passwd<br />
<br />
We probably want Internet connectivity on the installed Debian system. Let's<br />
keep it simple here and just configure DHCP for it to automatically acquire<br />
IP address, gateway/router IP and DNS servers.<br />
<br />
printf 'allow-hotplug ens3\niface ens3 inet dhcp\n' > /etc/network/interfaces.d/ens3<br />
<br />
<span id="use-after-unlink"></span><br />
<b>Important</b>: Finally we setup a tmpfs (permanently) on /tmp for the<br />
installed Debian system, similar to what we already did (temporarily) above for<br />
the Live CD VM that we are currently still running.<br />
There are various ways to configure that permanently for the installed system.<br />
In this case we are using the systemd approach to configure it.<br />
<br />
cp /usr/share/systemd/tmp.mount /etc/systemd/system/<br />
systemctl enable tmp.mount<br />
<br />
Alternatively you could of course also configure it by adding an entry to<br />
/etc/fstab instead, e.g. something like:<br />
<br />
echo 'tmpfs /tmp tmpfs rw,nosuid,nodev,size=524288k,nr_inodes=204800 0 0' >> /etc/fstab<br />
<br />
This tmpfs on /tmp is currently required to avoid<br />
[https://gitlab.com/qemu-project/qemu/-/issues/103 issues with use-after-unlink]<br />
patterns, which in practice however only happen for files below /tmp. At least<br />
I have not encountered any software so far that used this pattern at locations<br />
other than /tmp.<br />
<br />
Installation is now complete, so let's leave the chroot environment.<br />
<br />
exit<br />
<br />
And shutdown the Live CD VM at this point.<br />
<br />
sync<br />
shutdown -h now<br />
<br />
You can close the VNC client at this point and also close the VNC SSH tunnel<br />
(if you had one), we no longer need them. Finally hit <b>Ctrl-C</b> to quit<br />
QEMU that is still running the remainders of the Live CD VM.<br />
<br />
== Boot the 9p Root FS System ==<br />
<br />
The standard basic installation is now complete.<br />
<br />
Run this command from host to boot the freshly installed Debian 11 ("Bullseye")<br />
system with 9p being guest's root filesystem:<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot strict=on -kernel ~/vm/bullseye/boot/vmlinuz-5.10.0-10-amd64 \<br />
-initrd ~/vm/bullseye/boot/initrd.img-5.10.0-10-amd64 \<br />
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=mmap console=ttyS0' \<br />
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fsRoot,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \<br />
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \<br />
-nographic<br />
<br />
Note: you need to use at least <b>cache=mmap</b> with the command above. That's<br />
actually not about caching, but rather allows the [https://en.wikipedia.org/wiki/Mmap mmap()]<br />
call to work on the<br />
guest system at all. Without this the guest system would even fail to boot, as<br />
many software components rely on the availability of the mmap() call.<br />
<br />
To speed things up you can also consider using e.g. <b>cache=loose</b> instead.<br />
That will deploy a filesystem cache on the guest side and reduce the amount of 9p<br />
requests to the host. As a consequence, however, the guest might not immediately<br />
see file changes performed on the host side. So choose wisely based on your<br />
intended use case scenario.<br />
You can switch between <b>cache=mmap</b> and e.g. <b>cache=loose</b> at any time.<br />
<br />
Another aspect to consider is the performance impact of the <b>msize</b> argument<br />
(see [[Documentation/9psetup#msize]] for details).<br />
<br />
Finally you would login as user root to the booted guest and install any other<br />
packages that you need, like a webserver, SMTP server, etc.<br />
<br />
apt update<br />
apt search ...<br />
apt install ...<br />
<br />
That's it!<br />
<br />
== Questions and Feedback ==<br />
<br />
Refer to [[Documentation/9p#Contribute]] for patches, issues etc.</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=File:Debian_11_live_boot_menu_screenshot.png&diff=10725File:Debian 11 live boot menu screenshot.png2022-01-17T13:02:10Z<p>Schoenebeck: </p>
<hr />
<div></div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p_root_fs&diff=10724Documentation/9p root fs2022-01-17T12:56:24Z<p>Schoenebeck: initial version (with Debian 11 Bullseye as example)</p>
<hr />
<div>= 9P as root filesystem (Howto) =<br />
<br />
It is possible to run a whole virtualized guest system entirely on top of<br />
QEMU's <b>9p pass-through filesystem</b> ([[Documentation/9psetup]])<br />
such that all guest system's files are<br />
directly visible inside a subdirectory on the host system and therefore directly<br />
accessible by both sides.<br />
<br />
This howto shows a way to install and setup<br />
[https://www.debian.org/releases/bullseye/ Debian 11 "Bullseye"] as guest system<br />
as an example with 9p being guest's root filesystem.<br />
<br />
Roughly summarized we are first booting a Debian Live CD with QEMU and then using<br />
the [https://wiki.debian.org/Debootstrap debootstrap] tool to install a<br />
standard basic Debian system into a manually mounted 9p directory.<br />
The same approach can be used almost identically for many other distributions,<br />
e.g. for related .deb package based distros like [https://ubuntu.com Ubuntu] you<br />
probably just need to adjust the debootstrap command with a different URL as<br />
argument.<br />
<br />
== Motivation ==<br />
<br />
There are several advantages to running a guest OS entirely on top of 9pfs:<br />
<br />
* <b>Transparency and Shared File Access</b>: The classical way to deploy a virtualized OS (a.k.a. "guest") on a physical machine (a.k.a. "host") is to create a virtual block device (i.e. one huge file on the host's filesystem) and leave it to the guest OS to format and maintain a filesystem on top of that virtualized block device. As that filesystem would be managed by the guest OS, simultaneous shared file access by host and guest is usually cumbersome and problematic, if not outright dangerous. A 9p passthrough filesystem instead allows convenient file access by both host and guest simultaneously, as the filesystem is just a regular subdirectory somewhere inside the host's own filesystem.<br />
<br />
* <b>Partitioning of Guest's Filesystem</b>: in early UNIX days it was common to subdivide a machine's filesystem into several subdirectories by creating multiple partitions on the hard disk(s) and mounting those partitions to common points of the system's abstract file system tree. Later this became less common as one had to decide upfront at installation how large those individual partitions should be, and resizing the partitions later on was often considered to be not worth the hassle (e.g. due to system down time, admin work time, potential issues). With modern hybrid filesystems like [https://btrfs.wiki.kernel.org/index.php/Main_Page btrfs] and [https://en.wikipedia.org/wiki/ZFS ZFS] however, subdividing a filesystem tree into multiple, separate parts sees a revival, as subdivision into their "data sets" (equivalent to classical hard disk "partitions") now comes with almost zero cost: those "data sets" acquire and release individual data blocks from a shared pool on demand, so they don't require any size decisions upfront, nor any resizing later on. If we deployed filesystems like btrfs or ZFS on the guest side on top of a virtualized block device, however, we would defeat many of those filesystems' advantages. If instead the filesystem is deployed solely on the host side by using 9p, we preserve their advantages and allow a much more convenient and powerful way to manage any of their filesystem aspects, as the guest OS runs completely independently and without knowledge of what filesystem it is actually running on.<br />
<br />
* <b>(Partial) Live Rollback</b>: As the filesystem is on the host side, we can snapshot and roll back the filesystem from the host side while the guest is still running. By using "data sets" (as described above) we can even roll back only certain part(s) of the guest's filesystem, e.g. rolling back a software installation while preserving user data, or the other way around.<br />
<br />
* <b>Deduplication</b>: with either ZFS or (even better) btrfs on the host we can reduce the overall storage size and therefore storage costs for deploying a large number of virtual machines (VMs), as both filesystems support data deduplication. In practice VMs usually share a significant amount of identical data, as VMs often use identical operating systems, so they typically have identical versions of applications, libraries, and so forth. Both ZFS and btrfs allow identical blocks to be detected and unified automatically, and therefore save an enormous amount of storage space that would otherwise be wasted with a large number of VMs.<br />
<br />
<span id="start"></span><br />
== Let's start the Installation ==<br />
<br />
In this entire howto we are running QEMU <b>always</b> as a <b>regular user</b>.<br />
You don't need to run QEMU with root privileges (on the host) for anything in this<br />
article, and for production systems it is generally discouraged to run QEMU<br />
as user root.<br />
<br />
First we create an empty directory where we want to install the guest system to,<br />
for instance somewhere in your (regular) user's home directory on host.<br />
<br />
mkdir -p ~/vm/bullseye<br />
<br />
At this point, if you are using a filesystem on the host like btrfs or ZFS, you now<br />
might want to create the individual filesystem data sets and create the<br />
respective (yet empty) subdirectories below ~/vm/bullseye<br />
(for instance home/, var/, var/log/, root/, etc.); this is optional though.<br />
We are not describing how to configure those filesystems in this howto, but we<br />
will outline noteworthy aspects during the process where required.<br />
<br />
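For illustration, this is a minimal sketch of how such data sets could be created with ZFS; the pool name "tank" is purely hypothetical, btrfs subvolumes would work analogously, and creating data sets usually requires root privileges on the host (child data sets inherit the mountpoint of their parent):<br />
<br />
 zfs create -o mountpoint=$HOME/vm/bullseye tank/bullseye<br />
 zfs create tank/bullseye/home<br />
 zfs create tank/bullseye/var<br />
 zfs create tank/bullseye/var/log<br />
<br />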
Next we download the latest Debian Live CD image. Before blindly pasting the<br />
following command, you probably want to<br />
[https://cdimage.debian.org/debian-cd/current-live/amd64/bt-hybrid/ check this URL]<br />
to see whether a newer version of the live CD image is available (likely).<br />
<br />
cd ~/vm<br />
wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.2.0-amd64-standard.iso<br />
<br />
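Optionally, you may want to verify the downloaded image. A checksum file is usually published in the same directory as the image (adjust the file names if you downloaded a different version):<br />
<br />
 wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/SHA512SUMS<br />
 sha512sum -c SHA512SUMS --ignore-missing<br />
<br />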
Boot the Debian Live CD image and make our target installation directory<br />
~/vm/bullseye/ on the host available to the VM via 9p.<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot d -cdrom ~/vm/debian-live-11.2.0-amd64-standard.iso \<br />
-fsdev local,security_model=mapped,id=fsdev-fs0,multidevs=remap,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fs0<br />
<br />
You should now see the following message:<br />
<br />
VNC server running on ::1:5900<br />
<br />
If the machine you are running QEMU on (i.e. where you are currently<br />
installing to) and the machine where you are currently typing the commands<br />
are not the same, then you need to establish an SSH tunnel to make the remote<br />
machine's VNC port available on your workstation.<br />
<br />
ssh user@machine -L 5900:127.0.0.1:5900<br />
<br />
Now start any VNC client of your choice on your workstation and connect to<br />
localhost. You should now see the Debian Live CD's boot menu screen inside your<br />
VNC client's window.<br />
<br />
From the boot menu select "Debian GNU/Linux Live". You should now see the following prompt:<br />
<br />
user@debian:/home/user#<br />
<br />
This tells you that you are in a shell as a regular user named "user".<br />
Let's get superuser privileges (inside that Live CD VM):<br />
<br />
sudo bash<br />
<br />
Now mount the target installation directory created on the host inside the guest<br />
via the 9p pass-through filesystem.<br />
<br />
mkdir /mnt/inst<br />
mount -t 9p -o trans=virtio fs0 /mnt/inst -oversion=9p2000.L,posixacl,msize=5000000,cache=mmap<br />
<br />
Next we need to get the <b>debootstrap</b> tool. Note: at this point you might<br />
be tempted to [https://en.wikipedia.org/wiki/Ping_(networking_utility) ping]<br />
some host to check whether the Internet connection is working inside the booted Live<br />
CD VM. Pinging will <b>not</b> work, but the Internet connection should<br />
already be working nevertheless. That's because we omitted any network<br />
configuration arguments in the QEMU command above, in which case QEMU<br />
defaults to SLiRP user networking, where ICMP does not work (see [[Documentation/Networking#Network_Basics]]).<br />
<br />
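If you nevertheless want a quick connectivity check, a TCP-based test works where ping does not; for instance (assuming wget is available on the live system, which it usually is):<br />
<br />
 wget -q --spider https://deb.debian.org/ && echo network OK<br />
<br />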
apt update<br />
apt install debootstrap<br />
<br />
If you are using something like btrfs or ZFS for the installation directory and<br />
already subdivided the installation directory with some empty directories, you<br />
should now fix the permissions the guest system sees (i.e. the guest should think<br />
it has root permissions on everything, even though the actual filesystem<br />
directories on the host are probably owned by another user on the host).<br />
<br />
 chown -R root:root /mnt/inst<br />
<br />
Now download and install a "minimal" Debian 11 ("Bullseye") system into the<br />
target directory.<br />
<br />
debootstrap bullseye /mnt/inst https://deb.debian.org/debian/<br />
<br />
Note: you might see some warnings like:<br />
<br />
FS-Cache: Duplicate cookie detected<br />
<br />
Ignore those warnings. The debootstrap process might take quite some time, so<br />
now would be a good time for a coffee break. Once debootstrap is done, you<br />
should see the following final message:<br />
<br />
I: Basesystem installed successfully.<br />
<br />
Now you have a minimal system installation. But it is so minimal that you won't<br />
be able to do much with it. So it is not yet the basic system that you would<br />
have after completing the standard Debian installer.<br />
<br />
So let's chroot into the minimal system that we have so far, to be able to<br />
install the missing packages.<br />
<br />
mount -o bind /proc /mnt/inst/proc<br />
mount -o bind /dev /mnt/inst/dev<br />
mount -o bind /dev/pts /mnt/inst/dev/pts<br />
mount -o bind /sys /mnt/inst/sys<br />
chroot /mnt/inst /bin/bash<br />
<br />
Important: now we need to mount a tmpfs on /tmp (inside the chroot environment<br />
that we are in now).<br />
<br />
mount -t tmpfs -o noatime,size=500M tmpfs /tmp<br />
<br />
If you omit the previous step, you will most likely get error messages like the<br />
following with the subsequent <i>apt</i> commands below:<br />
<br />
E: Unable to determine file size for fd 7 - fstat (2: No such file or directory)<br />
<br />
Let's install the next fundamental packages. At this point you might still get<br />
some locale warnings. Ignore them.<br />
<br />
apt update<br />
apt install console-data console-common tzdata locales keyboard-configuration<br />
<br />
We need a kernel to boot from. Let's use Bullseye's standard Linux kernel.<br />
<br />
apt install linux-image-amd64<br />
<br />
Select the time zone for the VM.<br />
<br />
dpkg-reconfigure tzdata<br />
<br />
Configure and generate locales.<br />
<br />
dpkg-reconfigure locales<br />
<br />
In the first dialog select at least "en_US.UTF-8", then "Next", then in the<br />
subsequent dialog select "C.UTF-8" and finish the dialog.<br />
<br />
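If you prefer to avoid the interactive dialogs (e.g. for a scripted install), roughly the same result can usually be achieved non-interactively; a sketch:<br />
<br />
 sed -i 's/^# *en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen<br />
 locale-gen<br />
 update-locale LANG=C.UTF-8<br />
<br />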
The basic installation that you might be used to after running the regular<br />
Debian installer is called the "standard" installation. Let's install the<br />
missing "standard" packages. For this we are using the <b>tasksel</b> tool. It<br />
should already be installed; if it is not, then install it now.<br />
<br />
apt install tasksel<br />
<br />
The following simple command should usually be sufficient to install all missing<br />
packages for a Debian "standard" installation automatically.<br />
<br />
tasksel install standard<br />
<br />
For some people, however, the tasksel command above does not work (it hangs<br />
with output "100%"). If you encounter that issue, then use the following<br />
workaround: let tasksel just dump the list of packages to be installed<br />
and then install the packages manually by passing those package names as<br />
arguments to apt.<br />
<br />
tasksel --task-packages standard<br />
apt install ...<br />
<br />
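For example, the two steps can be combined into a single command, where the package list printed by tasksel is passed directly to apt:<br />
<br />
 apt install $(tasksel --task-packages standard)<br />
<br />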
Before being able to boot from the installation directory, we need to adjust the<br />
initramfs to contain the 9p drivers. Remember, we will run 9p as the root filesystem,<br />
so the 9p drivers are required before the actual system starts.<br />
<br />
cd /etc/initramfs-tools<br />
echo 9p >> modules<br />
echo 9pnet >> modules<br />
echo 9pnet_virtio >> modules<br />
update-initramfs -u<br />
<br />
The previous update-initramfs might take some time. Once it is done, check that<br />
we really have the three 9p kernel drivers inside the generated initramfs now.<br />
<br />
lsinitramfs /boot/initrd.img-5.10.0-10-amd64 | grep 9p<br />
<br />
Let's set the root password for the installed Debian system.<br />
<br />
passwd<br />
<br />
We probably want Internet connectivity on the installed Debian system. Let's<br />
keep it simple here and just configure DHCP so that it automatically acquires an<br />
IP address, gateway/router IP and DNS servers.<br />
<br />
printf 'allow-hotplug ens3\niface ens3 inet dhcp\n' > /etc/network/interfaces.d/ens3<br />
<br />
<span id="use-after-unlink"></span><br />
<b>Important</b>: Finally we set up a tmpfs (permanently) on /tmp for the<br />
installed Debian system, similar to what we already did (temporarily) above for<br />
the Live CD VM that we are currently still running.<br />
There are various ways to configure that permanently for the installed system.<br />
In this case we are using the systemd approach to configure it.<br />
<br />
cp /usr/share/systemd/tmp.mount /etc/systemd/system/<br />
systemctl enable tmp.mount<br />
<br />
Alternatively you could of course also configure it by adding an entry to<br />
/etc/fstab instead, e.g. something like:<br />
<br />
echo 'tmpfs /tmp tmpfs rw,nosuid,nodev,size=524288k,nr_inodes=204800 0 0' >> /etc/fstab<br />
<br />
This tmpfs on /tmp is currently required to avoid<br />
[https://gitlab.com/qemu-project/qemu/-/issues/103 issues with use-after-unlink]<br />
patterns, which in practice, however, only happen for files below /tmp. At least<br />
I have not encountered any software so far that uses this pattern at locations<br />
other than /tmp.<br />
<br />
Installation is now complete, so let's leave the chroot environment.<br />
<br />
exit<br />
<br />
And shutdown the Live CD VM at this point.<br />
<br />
sync<br />
shutdown -h now<br />
<br />
You can close the VNC client at this point and also close the VNC SSH tunnel<br />
(if you had one); we no longer need them. Finally hit <b>Ctrl-C</b> to quit<br />
QEMU, which is still running the remainder of the Live CD VM.<br />
<br />
== Boot the 9p Root FS System ==<br />
<br />
The standard basic installation is now complete.<br />
<br />
Run this command on the host to boot the freshly installed Debian 11 ("Bullseye")<br />
system with 9p as the guest's root filesystem:<br />
<br />
/usr/bin/qemu-system-x86_64 \<br />
-machine pc,accel=kvm,usb=off,dump-guest-core=off -m 2048 \<br />
-smp 4,sockets=4,cores=1,threads=1 -rtc base=utc \<br />
-boot strict=on -kernel ~/vm/bullseye/boot/vmlinuz-5.10.0-10-amd64 \<br />
-initrd ~/vm/bullseye/boot/initrd.img-5.10.0-10-amd64 \<br />
-append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=mmap console=ttyS0' \<br />
-fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fsRoot,path=$HOME/vm/bullseye/ \<br />
-device virtio-9p-pci,id=fsRoot,fsdev=fsdev-fsRoot,mount_tag=fsRoot \<br />
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \<br />
-nographic<br />
<br />
Note: you need to use at least <b>cache=mmap</b> with the command above. This option is<br />
not really about caching, but rather allows the [https://en.wikipedia.org/wiki/Mmap mmap()]<br />
call to work on the guest system at all. Without it the guest system would not even boot, as<br />
many software components rely on the availability of the mmap() call.<br />
<br />
To speed things up you can also consider using e.g. <b>cache=loose</b> instead.<br />
That will deploy a filesystem cache on the guest side and reduce the number of 9p<br />
requests sent to the host. As a consequence, however, the guest might not immediately see<br />
file changes performed on the host side. So choose according to your intended use case.<br />
You can switch between <b>cache=mmap</b> and e.g. <b>cache=loose</b> at any time.<br />
<br />
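For example, to try <b>cache=loose</b> you would only change the rootflags in the -append option of the boot command above; everything else stays the same:<br />
<br />
 -append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=5000000,cache=loose console=ttyS0' \<br />
<br />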
Another aspect to consider is the performance impact of the <b>msize</b> argument<br />
(see [[Documentation/9psetup#msize]] for details).<br />
<br />
Finally you would log in as user root on the booted guest and install any other<br />
packages that you need, like a web server, SMTP server, etc.<br />
<br />
apt update<br />
apt search ...<br />
apt install ...<br />
<br />
That's it!<br />
<br />
== Questions and Feedback ==<br />
<br />
Refer to [[Documentation/9p#Contribute]] for patches, issues etc.</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10721Documentation/9p2022-01-12T14:36:34Z<p>Schoenebeck: /* Implementation Plans */ replace link to macOS patch set (lists.gnu.org -> lore.kernel.org)</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base, which handles the raw 9p network protocol and the general high-level control flow of the 9p clients' (the guest systems') requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
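For illustration: with the security_model=mapped-xattr variant this metadata is stored in extended attributes on the host side (in a "user.virtfs.*" namespace, as far as I am aware), which can be inspected on the host roughly like this (path and file name are placeholders):<br />
<br />
 getfattr -d -m 'user.virtfs' /path/to/9p/share/some_file<br />
<br />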
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security by that separation; however, the "proxy" driver is currently not considered production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found: use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (the latter always being firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow the use of memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
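To make this a bit more tangible, here is a minimal illustrative sketch using QEMU's generic coroutine API (this is not actual 9pfs code; do_first_part() and do_second_part() are hypothetical placeholders):<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 static void do_first_part(void)  { /* ... some work ... */ }<br />
 static void do_second_part(void) { /* ... more work ... */ }<br />
 <br />
 static void coroutine_fn my_task(void *opaque)<br />
 {<br />
     /* Runs on whatever thread entered the coroutine, on the coroutine's own stack. */<br />
     do_first_part();<br />
     qemu_coroutine_yield();  /* give control back to the code that entered us */<br />
     /* Another thread may re-enter the coroutine later and continue right here. */<br />
     do_second_part();<br />
 }<br />
 <br />
 void example(void)<br />
 {<br />
     Coroutine *co = qemu_coroutine_create(my_task, NULL);<br />
     qemu_coroutine_enter(co);  /* runs my_task() until it yields or returns */<br />
 }<br />
<br />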
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines, every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (possibly long-running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
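Schematically, such a *_co_*() function looks roughly like the following simplified sketch (not literal source code; see hw/9pfs/coth.h and hw/9pfs/cofile.c for the real implementation):<br />
<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* this block is executed on a worker thread */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread from here on */<br />
     return err;<br />
 }<br />
<br />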
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: such tests should be performed with the "local" fs driver, because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS hosts</b>: See [https://lore.kernel.org/all/2B4D46DD-074E-4070-BAF0-AADAD1183B33@icloud.com/T/ latest suggested patch set (and comments about unresolved issues)].<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID in general is used by systems to detect that. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field (see the small illustration after this list). By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response to avoid that overhead.<br />
<br />
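As a small illustration for the qid.path point above: on a Linux host the truly unique identity of a file is the pair of device ID and inode number, which can be inspected e.g. with GNU stat; both values combined can exceed the 64 bits available for qid.path:<br />
<br />
 stat -c 'device=%D (hex)  inode=%i' /etc/hostname<br />
<br />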
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen; better yet, run scripts/get_maintainer.pl to get all relevant people that should be CCed.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10702Documentation/9p2021-12-01T14:26:21Z<p>Schoenebeck: /* Implementation Plans */ add new features plan for supporting macOS</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves. This has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
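<br />
To make this more tangible, here is a minimal sketch using QEMU's coroutine API as declared in include/qemu/coroutine.h (a simplified illustration, not code taken from the 9pfs sources; the function and variable names are made up):<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 /* Runs on the Coroutine's own stack; qemu_coroutine_yield() suspends it and<br />
  * returns control to whatever thread entered the Coroutine. */<br />
 static void coroutine_fn my_task(void *opaque)<br />
 {<br />
     int *step = opaque;<br />
 <br />
     *step = 1;<br />
     qemu_coroutine_yield();    /* leave the Coroutine scope here ... */<br />
     *step = 2;                 /* ... and find the stack intact when resumed */<br />
 }<br />
 <br />
 static void example(void)<br />
 {<br />
     int step = 0;<br />
     Coroutine *co = qemu_coroutine_create(my_task, &step);<br />
 <br />
     qemu_coroutine_enter(co);  /* runs my_task() until the yield; step == 1 */<br />
     /* The Coroutine object could now be handed over to another thread, which<br />
      * would resume it with qemu_coroutine_enter(co) and find call stack and<br />
      * local variables exactly as they were left; here we simply resume it<br />
      * on the same thread. */<br />
     qemu_coroutine_enter(co);  /* my_task() runs to completion; step == 2 */<br />
 }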
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
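<br />
Simplified, the per-request Coroutine allocation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] looks roughly like the following sketch (the real pdu_submit() additionally validates the request and picks special handlers, e.g. for unsupported operations or read-only exports):<br />
<br />
 /* Rough sketch of the pattern used in hw/9pfs/9p.c, not the literal code. */<br />
 void pdu_submit(V9fsPDU *pdu, P9MsgHeader *hdr)<br />
 {<br />
     /* pdu_co_handlers[] maps the 9p request type to its coroutine handler,<br />
      * e.g. v9fs_walk(), v9fs_read(), v9fs_flush(), ... */<br />
     CoroutineEntry *handler = pdu_co_handlers[hdr->id];<br />
     Coroutine *co = qemu_coroutine_create(handler, pdu);<br />
 <br />
     qemu_coroutine_enter(co);  /* main thread enters this request's Coroutine */<br />
 }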
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread does the (possibly long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and that by the time the *_co_*() function call returns, the Coroutine has already been dispatched back to the main thread.<br />
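<br />
Roughly, such a *_co_*() function looks like the following sketch (modeled on the pattern in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/cofile.c hw/9pfs/cofile.c]; details like request cancellation checks and path locking are omitted here):<br />
<br />
 /* Simplified sketch of a typical v9fs_co_*() function; not the literal code. */<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* This block runs on a worker thread: the potentially blocking<br />
              * syscall goes through the fs driver's VFS operations table. */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* Back on the main thread at this point. */<br />
     return err;<br />
 }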
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrive. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request, that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you modify the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If everything runs well and all tests pass, you should see output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and you can basically add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: those should be performed with the "local" fs driver because that is what is used in production.<br />
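<br />
If you want to add a new synth-backed test case, the registration at the end of tests/qtest/virtio-9p-test.c roughly follows the qos framework pattern sketched below; in practice you would add your qos_add_test() line to the existing register_virtio_9p_test() function. The test name "synth/walk/mytest" and the function fs_walk_mytest() are hypothetical; check the existing tests in that file for the exact helper functions and QOSGraphTestOptions usage:<br />
<br />
 /* Hypothetical new test case; the body would use the existing helpers in<br />
  * virtio-9p-test.c to send 9p requests and check the responses. */<br />
 static void fs_walk_mytest(void *obj, void *data, QGuestAllocator *t_alloc)<br />
 {<br />
     /* obj is the virtio-9p device node provided by the qos framework */<br />
 }<br />
 <br />
 static void register_virtio_9p_test(void)<br />
 {<br />
     QOSGraphTestOptions opts = { };<br />
 <br />
     /* 9pfs test cases using the 'synth' fs driver */<br />
     qos_add_test("synth/walk/mytest", "virtio-9p", fs_walk_mytest, &opts);<br />
 }<br />
 <br />
 libqos_init(register_virtio_9p_test);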
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Features</b>:<br />
** <b>Adding support for macOS hosts</b>: See [https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg04325.html latest suggested patch set].<br />
** <b>Adding support for macOS guests</b>: nobody started work on this yet.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum: ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly once to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied for other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request has actually been aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID is generally what systems use to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number (see the sketch after this list), which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously a future protocol change should address this by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response, to avoid that overhead.<br />
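<br />
To illustrate the qid.path size problem mentioned above, here is a small standalone sketch (plain C, independent of QEMU) showing that a file on Linux is only uniquely identified by the combination of device ID and inode number, each of which may itself be up to 64 bits wide:<br />
<br />
 /* Standalone illustration: print the (device, inode) pair that uniquely<br />
  * identifies a file on the host. Two hard links report the same pair; two<br />
  * files on different filesystems below the same 9p share may report the<br />
  * same inode but different devices, which is exactly what the<br />
  * multidevs=remap option has to disambiguate within a 64-bit qid.path. */<br />
 #include <stdio.h><br />
 #include <sys/stat.h><br />
 <br />
 int main(int argc, char **argv)<br />
 {<br />
     struct stat st;<br />
 <br />
     if (argc < 2 || stat(argv[1], &st) != 0) {<br />
         return 1;<br />
     }<br />
     printf("dev=%llu ino=%llu\n",<br />
            (unsigned long long)st.st_dev,<br />
            (unsigned long long)st.st_ino);<br />
     return 0;<br />
 }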
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen. Better yet, run scripts/get_maintainer.pl to get all relevant people that should be CCed.<br />
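<br />
For example, run from the QEMU source tree (the patch file name is just a placeholder):<br />
<br />
 scripts/get_maintainer.pl -f hw/9pfs/9p.c<br />
 scripts/get_maintainer.pl 0001-my-9p-fix.patch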
<br />
[[Category:Developer documentation]]</div>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here : they should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Fixes</b>:<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links which means different pathes on the filesystem might actually point to the same file and a numeric file ID in general is used to detect that by systems. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
On doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen; better though run scripts/get_maintainer.pl to get all relevant people that should be CCed.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10681Documentation/9p2021-11-12T13:54:19Z<p>Schoenebeck: /* Implementation Plans */ add link to "use after unlink()" bug</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' linux<br />
kernel /sys. This never gained momentum and remained totally unused for years, until a new use case was found : use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They are not dealing with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
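<br />
To make this a bit more tangible, here is a minimal sketch using QEMU's generic coroutine API from qemu/coroutine.h. It is not taken from the 9pfs code; the function and variable names are made up for illustration, but it shows the create/enter/yield/re-enter pattern described above:<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 /* Entry point of the coroutine; it runs on the coroutine's own stack. */<br />
 static void coroutine_fn example_co_entry(void *opaque)<br />
 {<br />
     int *step = opaque;<br />
     *step = 1;<br />
     /* Hand control back to whoever entered the coroutine; the call stack<br />
      * and all local variables survive on the coroutine's stack memory. */<br />
     qemu_coroutine_yield();<br />
     *step = 2; /* resumed later, possibly by a different thread */<br />
 }<br />
 <br />
 static void example(void)<br />
 {<br />
     int step = 0;<br />
     Coroutine *co = qemu_coroutine_create(example_co_entry, &step);<br />
     qemu_coroutine_enter(co); /* runs until the yield above, step == 1 */<br />
     /* ... possibly hand 'co' over to another thread here ... */<br />
     qemu_coroutine_enter(co); /* continues right after the yield, step == 2 */<br />
 }<br />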
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point some part of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, when the main thread re-enters the Coroutine it finds the call stack and local variables exactly as they were left by the worker thread.<br />
<br />
The major advantages of this design are that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (potentially long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread. A schematic sketch of this pattern is shown below.<br />
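<br />
The following is a simplified, schematic sketch of such a *_co_*() helper, loosely modeled on the helpers under hw/9pfs/ (e.g. v9fs_co_lstat() in hw/9pfs/cofile.c); path locking and some details are omitted here, so treat it as an illustration of the dispatch pattern rather than the exact upstream code:<br />
<br />
 /* inside hw/9pfs/: coth.h provides the v9fs_co_run_in_worker() macro */<br />
 #include "qemu/osdep.h"<br />
 #include "coth.h"<br />
 <br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR; /* client already flushed this request */<br />
     }<br />
     /* Everything inside this block is executed on a worker thread; the<br />
      * coroutine is dispatched back to the main thread afterwards. */<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     return err;<br />
 }<br />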
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while a 9p request (i.e. its coroutine) is dispatched for filesystem I/O to a worker thread, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while you are still coding.<br />
<br />
To run the 9pfs tests, e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which are not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller set of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because that is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they are actually operating on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally one 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then dispatched back exactly once to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from the large number of thread hops, and reducing those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied to other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted I/O request has actually been aborted. According to the specs, though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
** <b>Fixing use after unlink()</b>: See [https://gitlab.com/qemu-project/qemu/-/issues/103 Gitlab issue 103] for details.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and it is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number (see the small host-side illustration after this list), which combined is larger than 64 bits and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously a future protocol change should address this by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to optionally allow returning such commonly requested detailed information already with a single Rreaddir response to avoid that overhead.<br />
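<br />
As a small host-side illustration of the file ID issue mentioned in the "Increase qid.path Size" item above, the following standalone C snippet (not part of the QEMU sources, just an assumed example) prints the (device ID, inode number) pair that together uniquely identify a file on a Linux host; each field alone may collide across different mounted filesystems:<br />
<br />
 #include <sys/stat.h><br />
 #include <stdio.h><br />
 <br />
 int main(int argc, char **argv)<br />
 {<br />
     struct stat st;<br />
     if (argc < 2 || stat(argv[1], &st) != 0) {<br />
         return 1;<br />
     }<br />
     /* Only the pair (st_dev, st_ino) is unique; 9p's qid.path currently<br />
      * has room for just one 64-bit value. */<br />
     printf("dev=%llu ino=%llu\n",<br />
            (unsigned long long)st.st_dev,<br />
            (unsigned long long)st.st_ino);<br />
     return 0;<br />
 }<br />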
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen; better yet, run scripts/get_maintainer.pl to get the list of relevant people who should be CCed.<br />
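<br />
For example, get_maintainer.pl can be pointed at the file you touched or at your finished patch (the patch file name below is just a made-up placeholder):<br />
<br />
 scripts/get_maintainer.pl -f hw/9pfs/9p.c<br />
 scripts/get_maintainer.pl 0001-9pfs-fix-something.patch<br />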
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/6.2&diff=10669ChangeLog/6.22021-10-28T17:34:25Z<p>Schoenebeck: /* 9pfs */ fixed sub-optimal I/O performance on guest due to incorrect block size</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* On macOS hosts with Apple Silicon CPUs we now support the 'hvf' accelerator for running AArch64 guests<br />
* M-profile CPUs now emulate trapping on division by zero via CCR.DIV_0_TRP<br />
* The pl011 UART model now supports sending 'break'<br />
* The Fujitsu A64FX processor model is now supported in TCG ('-cpu a64fx')<br />
* The M-profile MVE extension is now supported, and enabled in the Cortex-M55<br />
* The deprecated machine names 'raspi2' and 'raspi3' have been removed; use 'raspi2b' and 'raspi3b' instead<br />
* The 'virt' machine now supports an emulated ITS<br />
* New machine type: kudo-bmc<br />
* The xlnx-zcu102 and xlnx-versal-virt machines now support BBRAM and eFUSE devices<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
<br />
=== PowerPC ===<br />
* Improved POWER10 support for the 'powernv' machine<br />
* Initial support for POWER10 DD2.0 CPU added<br />
* Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine type<br />
** With a guest kernel that also has support, this allows for asymmetric and other complex NUMA topologies which previously couldn't be communicated to the guest<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
* Add Zb[abcs] instruction support<br />
* Remove RVB support<br />
* Fixup virt flash node<br />
* Don't override users supplied ISA version<br />
* Fixup some CSR accesses<br />
* Fix an overflow in the SiFive CLINT (https://gitlab.com/qemu-project/qemu/-/issues/493)<br />
* ePMP CSR address updates<br />
* SiFive PWM support<br />
* Support for RISC-V ACLINT<br />
* Support vhost-user and numa mem options on all boards<br />
* mstatus.SD bug fix for hypervisor extensions<br />
* OpenTitan fix for USB dev address<br />
* OpenTitan update to latest bitstream build<br />
* Remove the Ibex PLIC<br />
* Bug fix of setting mstatus_hs.[SD|FS] bits<br />
* Fixes for sifive PDMA<br />
* Mark shakti_c as not user creatable<br />
<br />
=== s390x ===<br />
<br />
* Improved storage key emulation (e.g. fixed address handling, lazy storage key enablement for TCG, ...)<br />
* New gen16 CPU features are now enabled automatically in the latest machine type<br />
<br />
=== SPARC ===<br />
* Fix for booting sun4m machines with more than 1 CPU<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
* New Snowridge-v4 CPU model, with split-lock-detect feature disabled<br />
<br />
==== KVM ====<br />
* Support for SGX in the virtual machine, using the /dev/sgx_vepc device on the host and the "memory-backend-epc" backend in QEMU.<br />
* New "hv-apicv" CPU property (aliased to "hv-avic") sets the HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.<br />
<br />
==== x86_64 ====<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* Fixed an occasional [https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00320.html crash when handling 'Twalk'] requests; this bug was introduced in QEMU 6.1.0.<br />
* Fixed sub-optimal I/O performance on guest due to incorrect IOUNIT which happened with certain applications like 'cat' which retrieve stat's st_blksize at runtime to determine their read/write buffer size.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
* ESCC reset fixes<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
* New event DEVICE_UNPLUG_GUEST_ERROR, which allows guest-reported failures of hot unplugs to be reported to the user or management layer<br />
** Since this relies on the guest, an event can't be guaranteed and only some hotplug mechanisms can generate it at all<br />
** This will eventually replace MEM_UNPLUG_ERROR which reported the same thing, but only for memory unplug<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
* ''qemu-nbd'' now defaults to writeback caching, rather than writethrough, to match the defaults of ''qemu-img''. While this has better performance, it may affect correctness if you were previously relying on writethrough semantics without explicit use of the '--cache=' option.<br />
<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
<br />
== User-mode emulation ==<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
== TCG ==<br />
<br />
* plugins now have a bool arg parsing helper and cleaned up argument syntax<br />
* the cache plugin is now multi-core aware<br />
<br />
== Guest agent ==<br />
<br />
== Build Information ==<br />
* the --enable-git-update and --disable-git-update options to configure were removed<br />
* the --disable-blobs option to configure is deprecated and should be replaced with --disable-install-blobs<br />
* the --enable-trace-backend to configure is deprecated and should be replaced with --enable-trace-backends<br />
* the --enable-jemalloc to configure is deprecated and should be replaced with --enable-malloc=jemalloc<br />
* the --enable-tcmalloc to configure is deprecated and should be replaced with --enable-malloc=tcmalloc<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/6.2]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9psetup&diff=10601Documentation/9psetup2021-09-09T12:20:49Z<p>Schoenebeck: /* Performance Considerations */ default msize will be raised in upcoming Linux kernel 5.15</p>
<hr />
<div>With QEMU's 9pfs you can create virtual filesystem devices (virtio-9p-device) and expose them to guests, which essentially means that a certain directory on the host machine is made directly accessible by a guest OS as a pass-through file system by using the [https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs#9P_protocol 9P network protocol] for communication between host and guest; if desired, the share can even be accessed and shared by several guests simultaneously.<br />
<br />
This section details the steps involved in setting up VirtFS (Plan 9 folder sharing over Virtio - I/O virtualization framework) between the guest and host operating systems. The instructions are followed by an<br />
example usage of the mentioned steps.<br />
<br />
This page is focused on user aspects like setting up 9pfs, configuration, performance tweaks. For the developers documentation of 9pfs refer to [[Documentation/9p]] instead.<br />
<br />
== Preparation ==<br />
<br />
1. Download the latest kernel code (2.6.36.rc4 or newer) from http://www.kernel.org to build the kernel image for the guest.<br />
<br />
2. Ensure the following 9P options are enabled in the kernel configuration.<br />
CONFIG_NET_9P=y<br />
CONFIG_NET_9P_VIRTIO=y<br />
CONFIG_NET_9P_DEBUG=y (Optional)<br />
CONFIG_9P_FS=y<br />
CONFIG_9P_FS_POSIX_ACL=y<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
<br />
and these PCI and virtio options:<br />
CONFIG_PCI=y<br />
CONFIG_VIRTIO_PCI=y<br />
CONFIG_PCI_HOST_GENERIC=y (only needed for the QEMU Arm 'virt' board)<br />
<br />
3. Get the latest git repository from http://git.qemu.org/ or http://repo.or.cz/w/qemu.git. <br />
<br />
4. Configure QEMU for the desired target. Note that if the configuration step prompts ATTR/XATTR as 'no' then you need to install ''libattr'' & ''libattr-dev'' first.<br />
<br />
For debian based systems install packages ''libattr1'' & ''libattr1-dev'' and for rpm based systems install ''libattr'' & ''libattr-devel''. Proceed to configure and build QEMU.<br />
<br />
5. Setup the guest OS image and ensure kvm modules are loaded.<br />
<br />
== Starting the Guest directly ==<br />
To start the guest add the following options to enable 9P sharing in QEMU<br />
-fsdev <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,security_model=mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>] -device <b>TRANSPORT_DRIVER</b>,fsdev=<b>FSDEVID</b>,mount_tag=<b>MOUNT_TAG</b><br />
<br />
You can also just use the following short-cut of the command above:<br />
-virtfs <b>FSDRIVER</b>,path=<b>PATH_TO_SHARE</b>,mount_tag=<b>MOUNT_TAG</b>,security_model=mapped|mapped-xattr|mapped-file|passthrough|none[,id=<b>ID</b>][,writeout=immediate][,readonly][,fmode=<b>FMODE</b>][,dmode=<b>DMODE</b>][,multidevs=remap|forbid|warn][,socket=<b>SOCKET</b>|sock_fd=<b>SOCK_FD</b>]<br />
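<br />
For illustration, a minimal invocation using the "local" fs driver might look like the following (the path and mount tag are just placeholder assumptions; adapt them to your setup):<br />
<br />
 -virtfs local,path=/srv/vm_share,mount_tag=host0,security_model=mapped-xattr<br />
<br />
On a Linux guest such a share can then typically be mounted with something like "mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt/host".<br />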
<br />
Options:<br />
<br />
* <b>FSDRIVER</b>: Either "local", "proxy" or "synth". This option specifies the filesystem driver backend to use. In short: you want to use "local". In detail:<br />
# local: Simply lets QEMU call the individual VFS functions (more or less) directly on host. <br />
# proxy: This driver was supposed to dispatch the VFS functions to be called from a separate process (by virtfs-proxy-helper); however, the "proxy" driver is currently not considered to be production grade. <br />
# synth: This driver is only used for development purposes (i.e. test cases).<br />
<br />
* <b>TRANSPORT_DRIVER</b>: Either "virtio-9p-pci", "virtio-9p-ccw" or "virtio-9p-device", depending on the underlying system. This option specifies the driver used for communication between host and guest. If the -virtfs shorthand form is used then "virtio-9p-pci" is implied.<br />
<br />
* id=<b>ID</b>: Specifies the identifier for this fsdev device.<br />
<br />
* path=<b>PATH_TO_SHARE</b>: Specifies the export path for the file system device. Files under this path on the host will be available to the 9p client on the guest.<br />
<br />
* security_model=mapped-xattr|mapped-file|passthrough|none: Specifies the security model to be used for this export path. A security model is only mandatory for the "local" fsdriver. Other fsdrivers (like "proxy") don't take a security model as a parameter. The recommended option is "mapped-xattr".<br />
# passthrough: Files are stored using the same credentials as they are created on the guest. This requires QEMU to run as root.<br />
# mapped: Equivalent to "mapped-xattr".<br />
# mapped-xattr: Some of the file attributes like uid, gid, mode bits and link target are stored as extended attributes (xattrs) on the host files. This is probably the most reliable and secure option.<br />
# mapped-file: The attributes are stored in the hidden .virtfs_metadata directory instead. Directories exported with this security model cannot be shared with other Unix tools on the host.<br />
# none: Same as "passthrough" except the sever won't report failures if it fails to set file attributes like ownership (chown). This makes a passthrough like security model usable for people who run kvm as non root.<br />
<br />
* writeout=immediate: This is an optional argument. The only supported value is "immediate". This means that the host page cache will be used to read and write data, but write notification will be sent to the guest only when the data has been reported as written by the storage subsystem.<br />
<br />
* readonly: Enables exporting the 9p share as a read-only mount for guests. By default read-write access is given.<br />
<br />
* socket=<b>SOCKET</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket file for communicating with virtfs-proxy-helper.<br />
<br />
* sock_fd=<b>SOCK_FD</b>: This option is only available for the "proxy" fsdriver. It enables the "proxy" filesystem driver to use the passed socket descriptor for communicating with virtfs-proxy-helper. Usually a helper like libvirt will create a socketpair and pass one of the fds as sock_fd.<br />
<br />
* fmode=<b>FMODE</b>: Specifies the default mode for newly created files on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* dmode=<b>DMODE</b>: Specifies the default mode for newly created directories on the host. Works only with security models "mapped-xattr" and "mapped-file".<br />
<br />
* mount_tag=<b>MOUNT_TAG</b>: Specifies the tag name to be used by the guest to mount this export point.<br />
<br />
* multidevs=remap|forbid|warn: Specifies how to deal with multiple devices being shared within a 9p export, i.e. how to avoid file ID collisions. Supported behaviours are:<br />
# warn: This is the default behaviour: virtfs 9p expects only one device to be shared within the same export, and if more than one device is shared and accessed via the same 9p export then only a warning message is logged (once) by QEMU on the host side.<br />
# remap: To avoid file ID collisions on the guest you should either create a separate virtfs export for each device to be shared with guests (the recommended way), or use "remap", which allows you to share multiple devices with only one export by remapping the original inode numbers from host to guest in a way that prevents such collisions. Remapping inodes in such use cases is required because the original device IDs from the host are never passed to and exposed on the guest. Instead, all files of an export shared with virtfs always share the same device ID on the guest, so two files with identical inode numbers but from actually different devices on the host would otherwise cause a file ID collision and hence potential misbehaviour on the guest.<br />
# forbid: Assumes like "warn" that only one device is shared by the same export, however it will not only log a warning message but also deny access to additional devices on the guest. Note though that "forbid" currently does not block all possible file access operations (e.g. readdir() would still return entries from other devices).<br />
<br />
== Starting the Guest using libvirt ==<br />
<br />
If using libvirt for management of QEMU/KVM virtual machines, the <filesystem> element can be used to set up 9p sharing for guests:<br />
<br />
<filesystem type='mount' accessmode='$security_model'><br />
<source dir='$hostpath'/><br />
<target dir='$mount_tag'/><br />
</filesystem><br />
<br />
In the above XML, the source directory will contain the host path that is to be exported. The target directory should be filled with the mount tag for the device, which, despite its name, does not have to actually be a directory path - any string of 32 characters or less can be used. The accessmode attribute determines the sharing mode, one of 'passthrough', 'mapped' or 'squash'.<br />
<br />
There is no equivalent of the QEMU 'id' attribute, since that is automatically filled in by libvirt. Libvirt will also automatically assign a PCI address for the 9p device, though that can be overridden if desired.<br />
<br />
== Mounting the shared path ==<br />
You can mount the shared folder on the guest using the following command (an example /etc/fstab entry is shown at the end of this section):<br />
mount -t 9p -o trans=virtio [mount tag] [mount point] -oversion=9p2000.L<br />
<br />
* mount tag: As specified on the QEMU command line.<br />
* mount point: Path to mount point.<br />
* trans: Transport method (here virtio for using 9P over virtio) <br />
* version: Protocol version. By default it is 9p2000.u.<br />
<br />
Other options that can be used include:<br />
* msize: Maximum packet size including any headers. By default it is 8 KiB (128 KiB since Linux kernel v5.15); see the Performance Considerations section below.<br />
* access: Following are the access modes<br />
# access=user : If a user tries to access a file on v9fs filesystem for the first time, v9fs sends an attach command (Tattach) for that user. This is the default mode.<br />
# access=<uid> : It only allows the user with uid=<uid> to access the files on the mounted filesystem<br />
# access=any : v9fs does a single attach and performs all operations as one user.<br />
# access=client : Fetches access control list values from the server and does an access check on the client.<br />
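<br />
For mounting the share automatically at boot you may also add an entry to the guest's /etc/fstab; the following is just a sketch assuming the mount tag "hostshare" and the mount point "/mnt/host" (adjust tag, mount point and options to your setup):<br />
<br />
 hostshare  /mnt/host  9p  trans=virtio,version=9p2000.L,msize=512000  0  0<br />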
<br />
<!-- NOTE: anchor 'msize' is linked by a QEMU 9pfs log message in 9p.c --><br />
<span id="msize"></span><br />
== Performance Considerations (msize) ==<br />
You should set an appropriate value for the option "msize" on the client (guest OS) side to avoid degraded file I/O performance. This 9P option is only available on the client side. If you omit specifying a value for "msize" with a Linux 9P client, the client falls back to its default value, which prior to Linux kernel v5.15 was only 8 kiB and resulted in very poor performance. With [https://github.com/torvalds/linux/commit/9c4d94dc9a64426d2fa0255097a3a84f6ff2eebe#diff-8ca710cee9d036f79b388ea417a11afa79f70bdbfca99c938e750e4ff3b4402d Linux kernel v5.15 the default msize was raised to 128 kiB], which [https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01003.html still limits performance on most machines].<br />
<br />
A good value for "msize" depends on the file I/O potential of the underlying storage on host side (i.e. a feature invisible to the client), and then you still might want to trade off between performance profit and additional RAM costs, i.e. with growing "msize" (RAM occupation) performance still increases, but the performance gain (delta) will shrink continuously.<br />
<br />
For that reason it is recommended to benchmark and manually pick an appropriate value for "msize" for your use case yourself. As a starting point, pick something between 10 MiB and somewhat over 100 MiB for spindle-based SATA storage, whereas for PCIe-based flash storage you might pick several hundred MiB or more. Then create some large file on the host side (e.g. 12 GiB):<br />
<br />
dd if=/dev/zero of=test.dat bs=1G count=12<br />
<br />
and measure how long it takes to read the file on the guest OS side:<br />
<br />
time cat test.dat > /dev/null<br />
<br />
then repeat with different values for "msize" to find a good value.<br />
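<br />
As a rough sketch of such a benchmark iteration on the guest (mount tag, mount point and the 100 MiB msize value below are merely examples, not recommendations):<br />
<br />
 umount /mnt/host<br />
 mount -t 9p -o trans=virtio,version=9p2000.L,msize=104857600 hostshare /mnt/host<br />
 # optionally drop the guest's page cache (as root) for comparable numbers between runs<br />
 echo 3 > /proc/sys/vm/drop_caches<br />
 time cat /mnt/host/test.dat > /dev/null<br />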
<br />
== Example ==<br />
An example usage of the above steps (tried on an Ubuntu Lucid Lynx system):<br />
<br />
1. Download the latest kernel source from http://www.kernel.org<br />
<br />
2. Build kernel image<br />
* Ensure relevant kernel configuration options are enabled pertaining to <br />
# Virtualization<br />
# KVM<br />
# Virtio<br />
# 9P<br />
<br />
* Compile <br />
<br />
3. Get the latest QEMU git repository in a fresh directory using<br />
git clone git://repo.or.cz/qemu.git<br />
<br />
4. Configure QEMU<br />
<br />
For example, for i386-softmmu with debugging support, use<br />
./configure '--target-list=i386-softmmu' '--enable-debug' '--enable-kvm' '--prefix=/home/guest/9p_setup/qemu/'<br />
<br />
If this step prompts ATTR/XATTR as 'no', install packages libattr1 and libattr1-dev on your system using:<br />
sudo apt-get install libattr1<br />
sudo apt-get install libattr1-dev<br />
<br />
5. Compile QEMU<br />
make<br />
make install<br />
<br />
6. Guest OS installation (Installing Ubuntu Lucid Lynx here)<br />
* Create Guest image (here of size 2 GB)<br />
dd if=/dev/zero of=/home/guest/9p_setup/ubuntu-lucid.img bs=1M count=2000 <br />
* Create a filesystem on the image file (ext4 here)<br />
mkfs.ext4 /home/guest/9p_setup/ubuntu-lucid.img <br />
* Mount the image file <br />
mount -o loop /home/guest/9p_setup/ubuntu-lucid.img /mnt/temp_mount<br />
* Install the Guest OS<br />
<br />
For installing a Debian-based system you can use the package ''debootstrap''<br />
debootstrap lucid /mnt/temp_mount <br />
Once the OS is installed, unmount the guest image.<br />
umount /mnt/temp_mount<br />
<br />
7. Load the KVM modules on the host (for Intel here; a quick sanity check is shown below)<br />
modprobe kvm<br />
modprobe kvm_intel <br />
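<br />
To double-check that KVM is actually usable afterwards (merely a quick sanity check, not part of the original instructions), verify that the modules are loaded and that the /dev/kvm device node exists:<br />
<br />
 lsmod | grep kvm<br />
 ls -l /dev/kvm<br />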
<br />
8. Start the Guest OS<br />
<br />
/home/guest/9p_setup/qemu/bin/qemu -drive file=/home/guest/9p_setup/ubuntu-lucid.img,if=virtio \ <br />
-kernel /path/to/kernel/bzImage -append "console=ttyS0 root=/dev/vda" -m 512 -smp 1 \<br />
-fsdev local,id=test_dev,path=/home/guest/9p_setup/shared,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=test_mount -enable-kvm <br />
<br />
The above command runs a VNC server. To view the guest OS, install and use any VNC viewer (for instance xclientvncviewer).<br />
<br />
9. Mounting shared folder<br />
<br />
Mount the shared folder on guest using<br />
mount -t 9p -o trans=virtio test_mount /tmp/shared/ -oversion=9p2000.L,posixacl,msize=104857600,cache=loose<br />
<br />
In the above example the folder /home/guest/9p_setup/shared of the host is shared with the folder /tmp/shared on the guest.<br />
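<br />
As a quick sanity check (again not part of the original instructions) you can verify that the share actually works by creating a file on one side and reading it on the other:<br />
<br />
 # on the guest<br />
 echo hello > /tmp/shared/hello.txt<br />
 # on the host<br />
 cat /home/guest/9p_setup/shared/hello.txt<br />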
<br />
[[Category:User documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/6.2&diff=10596ChangeLog/6.22021-09-02T12:15:21Z<p>Schoenebeck: /* 9pfs */ fixed crash when handling 'Twalk' requests</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/about/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* M-profile CPUs now emulate trapping on division by zero via CCR.DIV_0_TRP<br />
* The pl011 UART model now supports sending 'break'<br />
* The Fujitsu A64FX processor model is now supported in TCG ('-cpu a64fx')<br />
* The M-profile MVE extension is now supported, and enabled in the Cortex-M55<br />
* The deprecated machine names 'raspi2' and 'raspi3' have been removed; use 'raspi2b' and 'raspi3b' instead<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
<br />
=== PowerPC ===<br />
* Improved POWER10 support for the 'powernv' machine<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
* Fixup virt flash node<br />
* Don't override the user-supplied ISA version<br />
* Fixup some CSR accesses<br />
* Fix an overflow in the SiFive CLINT (https://gitlab.com/qemu-project/qemu/-/issues/493)<br />
<br />
=== s390x ===<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
* New Snowridge-v4 CPU model, with split-lock-detect feature disabled<br />
<br />
==== KVM ====<br />
<br />
==== x86_64 ====<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
* Fixed an occasional [https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg00320.html crash when handling 'Twalk'] requests; this bug was introduced in QEMU 6.1.0.<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
=== Authorization subsystem ===<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
<br />
== User-mode emulation ==<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
== TCG ==<br />
<br />
== Guest agent ==<br />
<br />
== Build Information ==<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
<br />
=== VM Based Builds ===<br />
<br />
=== Build Dependencies ===<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/6.2]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10591Documentation/9p2021-08-27T14:20:45Z<p>Schoenebeck: add 'Fuzzing' section</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
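<br />
For illustration, a Linux guest selects the dialect via the "version" mount option (a minimal sketch; the mount tag and mount point here are placeholders):<br />
<br />
 mount -t 9p -o trans=virtio,version=9p2000.L hostshare /mnt/host   # Linux dialect<br />
 mount -t 9p -o trans=virtio,version=9p2000.u hostshare /mnt/host   # Unix dialect<br />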
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a number of 9p client implementations for individual OSes; the most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
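<br />
As a side note (not part of the QEMU code base itself), you can quickly check on a Linux guest whether the in-kernel 9p client and its virtio transport are available, e.g. (module names apply to a modular kernel build):<br />
<br />
 modprobe -a 9p 9pnet_virtio<br />
 grep 9p /proc/filesystems<br />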
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol and the general high-level control flow of 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base; most of this is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux<br />
kernel's /sys. This never gained momentum and remained totally unused for years, until a new use case was found: use it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that starting at this point every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as they usually would be). So now the thread calls arbitrary functions, runs loops, creates local variables inside them, etc., and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread performs the (potentially long-running) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p<br />
server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because this is what is used in production.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Fuzzing ==<br />
<br />
There is [https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04108.html generic fuzzing support] for 9p in QEMU; [https://github.com/google/oss-fuzz oss-fuzz] can be used to run fuzzing on 9p.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between the 9p server's main thread and some worker thread, back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the number of thread hops to a minimum; ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time to the main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the referenced I/O request has actually been aborted. According to the specs though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and a numeric file ID in general is used by systems to detect that. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=ChangeLog/6.1&diff=10522ChangeLog/6.12021-07-13T16:41:04Z<p>Schoenebeck: /* 9pfs */ add 'Fix potential information leak'</p>
<hr />
<div><br />
== System emulation ==<br />
<br />
=== Incompatible changes ===<br />
<br />
Consult the [https://qemu-project.gitlab.io/qemu/system/removed-features.html 'Removed features' ] page for details of suggested replacement functionality<br />
<br />
* The 'moxie' target has been removed without replacement. There were no known users of this CPU type anymore and no binaries available which could be used for testing.<br />
* The 'lm32' target has been removed without replacement. The only public user of this architecture was the milkymist project, which has been completely inactive for years, and there was never an upstream Linux port.<br />
* The 'unicore32' target has been removed without replacement. Support for this CPU was removed from the upstream Linux kernel a while ago already, and there is no available upstream toolchain to build binaries for it.<br />
* The 'sheepdog' driver has been removed. The corresponding upstream server project is no longer maintained. Users are recommended to switch to an alternative distributed block device driver such as RBD.<br />
* The "info cpustats" HMP command has been removed. It already didn't produce output.<br />
<br />
=== New deprecated options and features ===<br />
<br />
Consult the [https://www.qemu.org/docs/master/system/deprecated.html "Deprecated Features"] chapter of the QEMU System Emulation User's Guide for further details of the deprecations and their suggested replacements.<br />
<br />
* Using non-persistent backing file with pmem=on is now deprecated<br />
<br />
=== 68k ===<br />
<br />
=== Alpha ===<br />
<br />
=== Arm ===<br />
<br />
* New Aspeed machines: rainier-bmc, quanta-q7l1-bmc<br />
* New npcm7xx machine: quanta-gbs-bmc<br />
* New Cortex-M3 based machine: stm32vldiscovery<br />
* Model for Aspeed's Hash and Crypto Engine<br />
* The mps3-an524 board now supports the alternate memory map (via "-machine remap=QSPI")<br />
* SVE2 is now emulated, including bfloat16 support<br />
* FEAT_I8MM is now emulated (integer matrix multiply accumulate)<br />
* FEAT_TLBIOS is now emulated (TLB invalidate instructions in Outer Shareable domain)<br />
* FEAT_TLBRANGE is now emulated (TLB range invalidate instructions)<br />
* FEAT_BF16 and FEAT_AA32BF16 are now emulated (bfloat16 support for AArch64 and AArch32)<br />
* FEAT_MTE3 (MTE asymmetric fault handling) is now emulated<br />
<br />
=== AVR ===<br />
<br />
=== Hexagon ===<br />
<br />
=== HPPA ===<br />
<br />
=== Microblaze ===<br />
<br />
=== MIPS ===<br />
<br />
=== Nios2 ===<br />
<br />
=== OpenRISC ===<br />
<br />
=== PowerPC ===<br />
* With recent enough guests now able to (sometimes) detect hot unplug failures on pseries machine type<br />
* Greatly increased maximum cpu count for pseries; it's now basically arbitrarily high - you will hit KVM or emulation limits before a fixed cut off<br />
* Implemented some POWER10 prefixed instructions in TCG<br />
* Optional support for the H_RPT_INVALIDATE hypercall on pseries machine<br />
* Experimental "Virtual Open Firmware" option for pseries (and Pegasos2) which implements most of the firmware behaviour inside qemu.<br />
* Updated ppce500 firmware image, which should fix pci support<br />
* Added 'pegasos2' machine type emulating the Genesi/bPlan Pegasos II board<br />
* 'mac99' machine now limited to 2GiB of RAM (previously it was allowed on the command line, although it probably wouldn't work properly)<br />
<br />
=== Renesas RX ===<br />
<br />
=== Renesas SH ===<br />
<br />
=== RISC-V ===<br />
* Cleanup of some left-over v1.9 code<br />
* Documentation improvements<br />
* Support for the shakti_c machine<br />
* Internal cleanup of the CSR accesses<br />
* Updates to the OpenTitan platform<br />
* Add support for the OpenTitan timer<br />
* Support for the virtio-vga<br />
* Fix for the saturate subtract in vector extensions (https://bugs.launchpad.net/qemu/+bug/1923629)<br />
* Experimental support for the ePMP spec<br />
* Initial support for the experimental Bit Manip extension<br />
* Update the PLIC and CLINT DT bindings<br />
* Improve documentation for RISC-V machines<br />
* Support direct kernel boot for microchip_pfsoc<br />
* Fix WFI exception behaviour<br />
* Improve CSR printing<br />
* Fix a GDB CSR bug<br />
* A range of other internal code cleanups and bug fixes<br />
<br />
=== s390x ===<br />
<br />
* The s390-ccw bios can now be compiled with Clang, too<br />
* tcg now supports the vector-enhancements facility, and the 'qemu' cpu model has been bumped to a stripped-down z14 GA2<br />
** this should enable distributions built for the z14 to be run under tcg<br />
* cpu models for gen16 have been added<br />
<br />
=== SPARC ===<br />
<br />
=== Tricore ===<br />
<br />
=== x86 ===<br />
<br />
* New CPU model versions added with XSAVES enabled: <code>Skylake-Client-v4</code>, <code>Skylake-Server-v5</code>, <code>Cascadelake-Server-v5</code>, <code>Cooperlake-v2</code>, <code>Icelake-Client-v3</code>, <code>Icelake-Server-v5</code>, <code>Denverton-v3</code>, <code>Snowridge-v3</code>, <code>Dhyana-v2</code><br />
* <code>hv-passthrough</code> won't enable Hyper-V feature flags that are unknown to QEMU<br />
* New <code>bus-lock-ratelimit</code> machine option for rate limiting bus locks by guests<br />
<br />
==== x86_64 ====<br />
<br />
* family/model/stepping of CPU models <code>qemu64</code> (all accelerators) and <code>max</code> (TCG only) were updated to values corresponding to a 64-bit AMD processor (fixes [https://gitlab.com/qemu-project/qemu/-/issues/191 #191])<br />
<br />
=== Xtensa ===<br />
<br />
=== Device emulation and assignment ===<br />
<br />
==== ACPI ====<br />
<br />
==== Audio ====<br />
<br />
==== Block devices ====<br />
<br />
==== Graphics ====<br />
<br />
==== I2C ====<br />
<br />
* Modified the I2C base to allow I2C muxes to be added<br />
* Added support for the pca9546 and pca9548 I2C muxes.<br />
* Added support for PMBus and several PMBus devices.<br />
* Move sensor devices into a new sensor directory.<br />
* Remove the interfaces with the error-prone transfer direction and use the read/write functions instead.<br />
<br />
==== Input devices ====<br />
<br />
==== IPMI ====<br />
<br />
* Fixed type of watchdog_expired so vmstate transfer works. Otherwise a vmstate transfer could end up with the wrong data for that field.<br />
<br />
==== Multi-process QEMU ====<br />
<br />
==== Network devices ====<br />
<br />
==== NVDIMM ====<br />
<br />
==== NVMe ====<br />
<br />
===== Emulated NVMe Controller =====<br />
<br />
==== PCI/PCIe ====<br />
<br />
==== SCSI ====<br />
<br />
==== SD card ====<br />
<br />
==== SMBIOS ====<br />
<br />
==== TPM ====<br />
<br />
==== USB ====<br />
<br />
<br />
==== VFIO ====<br />
<br />
==== virtio ====<br />
<br />
* virtio-mem now works with vfio<br />
<br />
==== Xen ====<br />
<br />
==== fw_cfg ====<br />
<br />
==== 9pfs ====<br />
<br />
* Reduce latency of Twalk request (directory tree traversal)<br />
* Fix potential information leak if mtime of export root directory changed (security impact in practice either none or low).<br />
<br />
==== virtiofs ====<br />
<br />
==== Semihosting ====<br />
<br />
=== Audio ===<br />
<br />
=== Character devices ===<br />
<br />
=== Crypto subsystem ===<br />
<br />
* The SASL configuration now recommends SCRAM-SHA-256 as the mechanism for simple password authentication<br />
* Documentation is provided outlining how to use the secret passing features<br />
<br />
=== Authorization subsystem ===<br />
<br />
* Documentation is provided outlining how to use the authorization framework for access control<br />
<br />
=== GUI ===<br />
<br />
=== GDBStub ===<br />
<br />
=== TCG Plugins ===<br />
* some memory leaks plugged in example plugins<br />
* syscall plugin can now summarise totals<br />
<br />
=== Host support ===<br />
<br />
=== Memory backends ===<br />
<br />
=== Migration ===<br />
<br />
=== Monitor ===<br />
<br />
==== QMP ====<br />
<br />
==== HMP ====<br />
<br />
=== Network ===<br />
<br />
=== Block device backends and tools ===<br />
* Fix a [https://gitlab.com/qemu-project/qemu/-/issues/218 regression] in qemu-nbd and qemu-storage-daemon handling file descriptors via socket activation.<br />
* The NBD client connection code has been refactored to operate as a background task, which in turn allows even better responsiveness in the retry code in dealing with a transient failure connection to a server.<br />
* <code>qemu-img map --output=json</code> now includes a <code>"present":''bool''</code> field to facilitate reconstructing which parts of a backing chain are actually present.<br />
<br />
=== Tracing ===<br />
<br />
=== Miscellaneous ===<br />
* The settings for the "-smp" option can be also passed to -M using a "smp." prefix, for example "-smp cpus=4" is now a synonym of "-M smp.cpus=4".<br />
<br />
== User-mode emulation ==<br />
<br />
=== binfmt_misc ===<br />
<br />
=== Hexagon ===<br />
<br />
== TCG ==<br />
<br />
* tricore now has check-tcg support and tests<br />
* hexagon now has check-tcg support and tests<br />
* fixed bug in replay HMP commands to accept full length icount<br />
<br />
== Guest agent ==<br />
<br />
== Build Information ==<br />
<br />
* CentOS 7 is no longer a supported build platform<br />
<br />
=== Python ===<br />
<br />
=== GIT submodules ===<br />
<br />
=== Container Based Builds ===<br />
* improvements to binfmt_misc containers<br />
<br />
=== Build Dependencies ===<br />
* minimum nettle is now 3.4<br />
* minimum libgcrypt is now 1.8.0<br />
* minimum gnutls is now 3.5.18<br />
* minimum glib is now 2.56<br />
* minimum gcc is now 7.5.0<br />
* minimum clang is now 6.0<br />
* minimum xcode clang is now 10.0<br />
* minimum libssh is now 0.8.7<br />
<br />
=== Windows ===<br />
<br />
=== Testing and CI ===<br />
<br />
== Known issues ==<br />
<br />
* see [[Planning/6.1]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10279Documentation/9p2021-03-16T15:49:48Z<p>Schoenebeck: /* 9p Filesystem Drivers */ fix typos</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen, which is an easily understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol and the general high-level control flow of the 9p clients' (the guest systems') 9p requests. The 9p server is basically a full-fledged file server and accordingly has the highest code complexity in the 9pfs code base; most of it is in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible as to where the file storage data comes from and how exactly that data is accessed (a simplified sketch of this driver interface follows after the three drivers below). There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the fs driver used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in the [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process ([https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), increasing security through that separation; however, the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
The original ambition for this driver was to allow QEMU subsystems to expose a synthetic API to the client, i.e. to expose some stats, information or any knob you can think of to the guest ''à la'' the Linux kernel's /sys. This never gained momentum and remained totally unused for years, until a new use case was found: using it to implement 9p protocol validation tests. This fs driver is now exclusively used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenario. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
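<br />
As mentioned above, all three drivers plug into the 9p server through a common table of callbacks. The following is a strongly simplified, illustrative sketch of that concept; the real interface is struct FileOperations in [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/file-op-9p.h fsdev/file-op-9p.h] and has considerably more (and differently shaped) entries:<br />
<br />
 /* Strongly simplified sketch of the fs driver interface concept; the real */<br />
 /* callback table is struct FileOperations in fsdev/file-op-9p.h.          */<br />
 #include <sys/stat.h><br />
 #include <sys/types.h><br />
 <br />
 typedef struct FsDriverOpsSketch {<br />
     int     (*lstat)(const char *path, struct stat *stbuf);<br />
     int     (*open)(const char *path, int flags);<br />
     ssize_t (*pread)(int fd, void *buf, size_t count, off_t offset);<br />
     ssize_t (*pwrite)(int fd, const void *buf, size_t count, off_t offset);<br />
     int     (*mkdir)(const char *path, mode_t mode);<br />
     /* ... plus readdir, symlink, chown, xattr handling, etc. ... */<br />
 } FsDriverOpsSketch;<br />
 <br />
 /* The "local" driver maps these more or less 1:1 to host syscalls, the    */<br />
 /* "proxy" driver forwards them to a helper process, and the "synth"       */<br />
 /* driver fakes them entirely in memory for test cases.                    */<br />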
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that would usually be put on the thread's own stack memory (which is always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines make it possible to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each Coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as would usually be the case). The thread then calls arbitrary functions, runs loops, creates local variables inside them, etc., until at a certain point it realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread using the Coroutine instance.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
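<br />
The following sketch illustrates the concept with QEMU's coroutine API (see include/qemu/coroutine.h). It is a minimal example written for this page; error handling and the actual thread hand-off are omitted:<br />
<br />
 /* Minimal conceptual example using QEMU's coroutine API; the interesting  */<br />
 /* point is that the call stack of task() lives in the Coroutine, not in   */<br />
 /* any particular thread.                                                  */<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
 <br />
 static void coroutine_fn task(void *opaque)<br />
 {<br />
     int local_state = 42;        /* stored on the Coroutine's stack        */<br />
     (void)opaque;<br />
 <br />
     /* ... work done by whichever thread entered us first ... */<br />
 <br />
     qemu_coroutine_yield();      /* leave the Coroutine scope; whoever     */<br />
                                  /* re-enters it (possibly another thread) */<br />
                                  /* finds local_state == 42 again          */<br />
 <br />
     /* ... continue with the exact same call stack and locals ... */<br />
     (void)local_state;<br />
 }<br />
 <br />
 static void start_task(void)<br />
 {<br />
     Coroutine *co = qemu_coroutine_create(task, NULL);<br />
     qemu_coroutine_enter(co);    /* runs task() until it yields or returns */<br />
 }<br />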
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result as provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-entered the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (possibly long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] can be assumed to run on the main thread, except for function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you know immediately that this function dispatches the Coroutine to a worker thread at this point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and that when the *_co_*() function call has returned, it has already dispatched the Coroutine back to the main thread.<br />
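<br />
To illustrate the pattern, here is a rough sketch of the typical shape of such a *_co_*() wrapper, loosely modeled on the existing wrappers in hw/9pfs/co*.c; names are simplified and details like path locking are omitted, so treat it as an outline rather than actual QEMU code:<br />
<br />
 /* Rough sketch of a typical v9fs_co_*() wrapper, loosely modeled on       */<br />
 /* hw/9pfs/cofs.c; simplified, path locking omitted.                       */<br />
 #include "qemu/osdep.h"<br />
 #include "hw/9pfs/9p.h"<br />
 #include "hw/9pfs/coth.h"<br />
 <br />
 int coroutine_fn v9fs_co_example_lstat(V9fsPDU *pdu, V9fsPath *path,<br />
                                        struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
 <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;                 /* client already sent a Tflush */<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             /* worker thread from here on: may block on real file I/O */<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     /* back on the main thread here, with err as left by the worker */<br />
     return err;<br />
 }<br />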
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request, that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you modify the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If everything runs well and all tests pass, you should see output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see at the beginning of the output exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller set of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated and you can basically add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be. This is the place to validate that the 9p server in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] honors the 9p protocol, e.g. that a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] actually cancels a pending request. Testing of ''real life'' scenarios doesn't belong here: it should be performed with the "local" fs driver because this is what is used in production.<br />
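<br />
If you want to add your own synth test, a new test function gets registered through the qos framework at the end of that file. The sketch below only shows the rough shape of such a registration; the names, signatures and registration options here are assumptions, so copy the exact pattern from an existing test case in tests/qtest/virtio-9p-test.c instead:<br />
<br />
 /* Rough, illustrative shape of registering an additional synth test case; */<br />
 /* copy the exact types and registration options from an existing test in  */<br />
 /* tests/qtest/virtio-9p-test.c rather than from this sketch.              */<br />
 static void synth_my_new_check(void *obj, void *data, QGuestAllocator *t_alloc)<br />
 {<br />
     QVirtio9P *v9p = obj;<br />
     /* ... send 9p requests against the "synth" driver and assert replies ... */<br />
 }<br />
 <br />
 static void register_virtio_9p_my_test(void)<br />
 {<br />
     qos_add_test("synth/my_new_check", "virtio-9p", synth_my_new_check, NULL);<br />
 }<br />
 libqos_init(register_virtio_9p_my_test);<br />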
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they actually operate on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that is usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in the future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine, that is) is dispatched multiple times back and forth between the 9p server's main thread and some worker thread. Every thread hop adds latency to the overall completion time of a request. The plan is to reduce the number of thread hops to a minimum; ideally each 9p request would be dispatched exactly once to a worker thread for all required filesystem-related I/O subtasks and then dispatched back to the main thread exactly once. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most from a large number of thread hops, and the reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. Similar changes should be applied for other request types.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, the server currently blocks the Tflush request's coroutine until the targeted other I/O request has actually been aborted. According to the specs, though, Tflush should return immediately, and currently this blocking behaviour has a negative performance impact, especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on the 9p protocol level in the future. Right now this list just serves to roughly collect some ideas for future protocol changes. Don't expect protocol changes in the near future though; this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, and is currently a 64-bit number. A filesystem on the host often has things like hard links, which means different paths on the filesystem might actually point to the same file, and systems generally use a numeric file ID to detect that. Certain services like Samba use this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviour. The problem, though, is that 9p might share more than one filesystem anywhere under its 9p share's root path. A truly unique file ID under Linux, for instance, is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64 bits combined and hence would exceed the 9p protocol's qid.path field (see the small illustration after this list). By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that prevents such collisions. In practice this remapping should happen with no noticeable overhead, but obviously in a future protocol change this should be addressed by simply increasing qid.path, e.g. to 128 bits, so that we won't need to remap file IDs anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense to merge the individual 9p dialects into just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries, a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large number of individual requests for getting more detailed information about each directory entry, like permissions, ownership and so forth. For that reason it might make sense to allow optionally returning such common detailed information already with a single Rreaddir response, to avoid that overhead.<br />
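<br />
The following small illustration (plain Linux types, written for this page, not QEMU code) shows why the host's truly unique file identity mentioned in the qid.path item above does not fit into the current 64-bit qid.path field:<br />
<br />
 /* Plain Linux types (not QEMU code): a host file is only truly unique as  */<br />
 /* the pair (device ID, inode number), which exceeds the 64 bits available */<br />
 /* in qid.path.                                                            */<br />
 #include <stdint.h><br />
 #include <sys/stat.h><br />
 #include <sys/types.h><br />
 <br />
 struct host_file_identity {<br />
     dev_t dev;   /* device ID of the mounted filesystem (64 bits on Linux)  */<br />
     ino_t ino;   /* inode number within that filesystem (64 bits on Linux)  */<br />
 };               /* => 128 bits in total                                     */<br />
 <br />
 uint64_t qid_path;  /* the protocol only offers 64 bits, so QEMU either     */<br />
                     /* passes the inode number alone (one filesystem per    */<br />
                     /* share) or remaps IDs with multidevs=remap            */<br />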
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
When in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high-traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeck
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to be called from a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]) and increasing security by that separation, however the "proxy" driver is currently not considered to be production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
This fs driver is only used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and therefore is not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver for instance. Right now the synth fs driver is used by the automated [#Synth Tests|9pfs test cases] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already be blocking for a long time. In general the "synth" driver is very useful for effectively simulating any multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine then everything that is usually put on the thread's own stack memory (and the latter being always firmly tied to that thread) is rather put on the Coroutine's stack memory instead. The advantage is, as Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow to use memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, then that thread is back at using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task eventually can be considered as fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of that task, then that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means starting at this point every local variable and all following function calls (function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as it would usually). So now the thread would call arbitrary functions, run loops, create local variables inside them, etc. and then at a certain point the thread realizes that something of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by either "yielding" or "awaiting"), it then passes the Coroutine instance to another thread which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as it was left by the previous thread using the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines are really just covering memory stack aspects. They are not dealing with any multi-threading aspects by themselves. Which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as concrete user of Coroutines, every 9P client request that comes in on 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver which means the fs driver needs to call file I/O syscall(s) which might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread which then executes the fs driver function(s) for fulfilling the actual file system I/O task(s). Once the worker thread is done with the fs I/O task portion it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn would re-enter the Coroutine and continue processing the request with the result as provided by the worker thread. So yet again, main thread finds the call stack and local variables exactly as it was left by the worker thread when it re-rentered the Coroutine.<br />
<br />
The primary major advantages of this design is that the 9P server's main thread can continue handling another 9P request while a worker thread would do the (maybe long taking) fs driver I/O subtask(s), and yet<br />
code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server is running on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] you can assume to run on main thread, except of function calls there with the naming scheme *_co_*(). So if you find a call with such a function name pattern you can know immediately that this function dispatches the Coroutine at this point to a worker thread (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and when the *_co_*() function call returned, it already dispatched the Coroutine back to main thread.<br />
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However while 9p requests (i.e. their coroutine) are dispatched for filesystem I/O to a worker thread, the 9p server's main thread would handle another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response eventually been sent to client. So pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaniously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on server side. We do have a test case for this Tflush behaviour by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing there on the 9pfs code base, please run the automated test cases after you modified the source code to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, and for that reason you can run them in few seconds. The test cases are also a very efficient way to check whether your 9pfs changes are actually doing what you want them to while still coding.<br />
<br />
To run the 9pfs tests e.g. on a x86 system, all you need to do is executing the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all runs well and all tests pass, you should see an output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see exactly which individual tests are enabled and which not at the beginning of the output:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also just run one or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) as argument -p to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups of tests. The first group of tests use the "synth" fs driver, so all file I/O operations are simulated and basically you can add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests use the "local" fs driver, so they are actually operating on real dirs and files in a test directory on the host filesystem. Some issues that happened in the past were caused by a combination of the 9p server and the actual "local" fs driver that's usually used on production machines. For that reason this group of tests are covering issues thay may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independent of third-party 9p client implementations.<br />
<br />
== Roadmap ==<br />
<br />
This is a rough list of things that are planned to be changed in future.<br />
<br />
=== Implementation Plans ===<br />
<br />
* <b>Optimizations</b>:<br />
** <b>Reducing thread hops</b>: Right now in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] almost every request (its coroutine that is) is dispatched multiple times between 9p server's main thread and some worker thread back and forth. Every thread hop adds latency to the overall completion time of a request. The desired plan is to reduce the amount of thread hops to a minimum, ideally one 9p request would be dispatched exactly one time to a worker thread for all required filesystem related I/O subtasks and then dispatched back exactly one time back to main thread. Some work on this has already been done for [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request handling, as this was the request type suffering the most under large amount of thread hops, and reduction of those hops provided [https://lists.gnu.org/archive/html/qemu-devel/2020-01/msg05539.html significant performance improvements for Treaddir] handling. For other request types similar changes should be applied.<br />
** <b>Making Tflush non-blocking</b>: When handling a [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] request, server currently blocks the Tflush request's coroutine until the requested other I/O request was actually aborted. From the specs though Tflush should return immediately, and currently this blocking behaviour has a negative performance impact especially with 9p clients that do not support handling parallel requests.<br />
<br />
=== Protocol Plans ===<br />
<br />
These are some of the things that we might want to change on 9p protocol level in future. Right now this list just serves for roughly collecting some ideas for future protocol changes. Don't expect protocol changes in near future though, this will definitely take a long time.<br />
<br />
* <b>Fixes</b>:<br />
** <b>Increase qid.path Size</b>: The [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor32 qid.path] (which should not be confused with a filesystem path like "/foo/bar/") is an integer supposed to uniquely identify a file, which is currently a 64-bit number. A filesystem on host often has things like hard links which means different pathes on the filesystem might actually point to the same file and a numeric file ID in general is used to detect that by systems. Certain services like Samba are using this information, and incorrect handling (i.e. collisions) of unique file IDs can cause misbehaviours. The problem though is that 9p might share more than one filesystem anywhere under its 9p share's root path. So a truly unique file ID under Linux for instance is the combination of the mounted filesystem's device ID and the individual file's inode number, which is larger than 64-bit combined and hence would exceed 9p protocol's qid.path field. By default we only pass the file's inode number via qid.path, so we are assuming that only one filesystem is shared per 9p share. If multiple filesystems are detected, a warning is logged at runtime noting that file ID collisions are possible, and suggesting to enable the multidevs=remap option, which (if enabled) remaps file IDs from host to guest in a way that would prevent such collisions. In practice this remapping should happen with no noticable overhead, but obviously in a future protocol change this should be addressed by simply increasing the qid.path e.g. to 128 bits so that we won't need to remap file IDs in future anymore.<br />
* <b>Cleanup</b>:<br />
** <b>Merge Dialects</b>: It might make sense merging the individual 9p dialects to just one protocol version for all systems to reduce complexity and confusion.<br />
* <b>Optimizations</b>:<br />
** <b>Extend Treaddir</b>: To retrieve a list of directory entries a [https://github.com/chaos/diod/blob/master/protocol.md#readdir---read-a-directory Treaddir] request is sent by clients. In practice, this request is followed by a large amount of individual requests for getting more detailed information about each directory entry like permissions, ownership and so forth. For that reason it might make sense for allowing to optionally return such common detailed information already with a single Rreaddir response to avoid overhead.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
On doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeckhttps://wiki.qemu.org/index.php?title=Documentation/9p&diff=10231Documentation/9p2021-02-25T14:16:12Z<p>Schoenebeck: added section 'Parallelism'</p>
<hr />
<div>= 9pfs Developers Documentation =<br />
<br />
This page is intended for developers who want to put their hands on the <b>9p passthrough filesystem</b> implementation in QEMU. For regular user aspects you rather want to look at the separate page [[Documentation/9psetup]] instead.<br />
<br />
== 9p Protocol ==<br />
<br />
9pfs uses the [https://en.wikipedia.org/wiki/9P_(protocol) Plan 9 Filesystem Protocol] for communicating the file I/O operations between guest systems (clients) and the [[#9P Server|9p server (see below)]]. There are a bunch of separate documents specifying different variants of the protocol, which might be a bit confusing at first, so here is a summary of the individual protocol flavours.<br />
<br />
=== Introduction ===<br />
If this is your first time getting in touch with the 9p protocol then you might have a look at this introduction by Eric Van Hensbergen which is an easy understandable text explaining how the protocol works, including examples of individual requests and their response messages: [https://www.usenix.org/legacy/events/usenix05/tech/freenix/full_papers/hensbergen/hensbergen_html/index.html Using 9P2000 Under Linux]<br />
<br />
There are currently 3 dialects of the 9p network protocol called "9p2000", "9p2000.u" and "9p2000.L". Note that QEMU's 9pfs implementation only supports either "9p2000.u" or "9p2000.L".<br />
<br />
=== 9p2000 ===<br />
This is the basis of the 9p protocol the other two dialects derive from. This is the specification of the protocol:<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.html 9p2000 Protocol]<br />
<br />
=== 9p2000.u ===<br />
The "9p2000.u" dialect adds extensions and minor adjustments to the protocol for Unix systems, especially for common data types available on a Unix system. For instance the basic "9p2000" protocol version only returns an error text if some error occurred on server side, whereas "9p2000.u" also returns an appropriate, common POSIX error code for the individual error.<br />
[http://ericvh.github.io/9p-rfc/rfc9p2000.u.html 9p2000.u Protocol]<br />
<br />
=== 9p2000.L ===<br />
Similar to the "9p2000.u" dialect, the "9p2000.L" dialect adds extensions and minor adjustments of the protocol specifically for Linux systems. Again this is mostly targeted at specializing for data types of system calls available on a Linux system.<br />
[https://github.com/chaos/diod/blob/master/protocol.md 9p2000.L Protocol]<br />
<br />
== Topology ==<br />
<br />
The following figure shows the basic structure of the 9pfs implementation in QEMU.<br />
<br />
[[File:9pfs_topology.png|frameless|upright=3.0]]<br />
<br />
The implementation consists of 3 modular components: 9p server, 9p filesystem drivers and 9p transport drivers. The 9p client on guest OS side is not part of the QEMU code base. There are a bunch of 9p client implementations e.g. for individual OSes. The most commonly used one is the client that comes with the stock Linux kernel. [https://github.com/torvalds/linux/tree/master/fs/9p Linux 9p Client]<br />
<br />
=== 9p Server ===<br />
<br />
This is the controller portion of the 9pfs code base which handles the raw 9p network protocol handling, and the general high-level control flow of 9p clients' (the guest systems) 9p requests. The 9p server is basically a full-fledged file server and accordingly it has the highest code complexity in the 9pfs code base, most of this is in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] source file.<br />
<br />
=== 9p Filesystem Drivers ===<br />
<br />
The 9p server uses a [https://en.wikipedia.org/wiki/Virtual_file_system VFS] layer for the actual file operations, which makes it flexible from where the file storage data comes from and how exactly that data is actually accessed. There are currently 3 different 9p file system driver implementations available:<br />
<br />
1. <b>local</b> fs driver<br />
<br />
This is the most common fs driver which is used most often with 9p in practice. It basically just maps the individual VFS functions (more or less) directly to the host system's file system functions like open(), read(), write(), etc. You find this fs driver implementation in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-local.c hw/9pfs/9p-local.c] source file.<br />
<br />
Most of the "local" driver's code deals with remapping of permissions, which solves a fundamental problem: a high privileged user like "root" (and the kernel itself) on the guest system expects to have full control over its filesystems. For instance it needs to be able to change the owning user and group of files and directories, be able to add, change and remove attributes, changing any file permissions and so forth. Without these assumed permissions, it would nearly be impossible to run any useful service on guest side ontop of a 9pfs filesystem. The QEMU binary on the host system however is usually not running as privileged user for security reasons, so the 9pfs server can actually not do all those things on the file system it has access to on host side.<br />
<br />
For that reason the "local" driver supports remapping of file permissions and owners. So when the "remap" driver option of the "local" driver is used (like it's usually the case on a production system), then the "local" driver pretends to the guest system it could do all those things, but in reality it just maps things like permissions and owning users and groups as additional data on the filesystem, either as some hidden files, or as extended attributes (the latter being recommended) which are not directly exposed to the guest OS. With remapping enabled, you can actually run an entire guest OS on a single 9pfs root filesystem already.<br />
<br />
2. <b>proxy</b> fs driver<br />
<br />
This fs driver was supposed to dispatch the VFS functions to a separate process (by [https://gitlab.com/qemu-project/qemu/-/blob/master/fsdev/virtfs-proxy-helper.c fsdev/virtfs-proxy-helper]), thereby increasing security through that separation; however, the "proxy" driver is currently not considered production grade. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-proxy.c hw/9pfs/9p-proxy.c]<br />
<br />
However the "proxy" fs driver shows some potential of 9pfs. As a fs driver for 9pfs is just a thin, lite-weight VFS layer to the actual fs data, it would for instance be considerable to implement a fs driver that allows the actual filesystem to be kept entirely on a separate storage system and therefore increasing security and availability. If an attacker would then e.g. be able to gain full control over the 9pfs host system, the attacker would still not have access to the raw filesystem. So with a separate [https://en.wikipedia.org/wiki/Copy-on-write COW] storage system, an attacker might be able to temporarily command data changes on storage side, but the uncompromised data before the attack would remain available and an immediate rollback would therefore be possible. And due to not having direct raw access to the storage filesystem, the attack could then be audited later on in detail as the attacker would not be able to wipe its traces on the storage logs.<br />
<br />
3. <b>synth</b> fs driver<br />
<br />
This fs driver is only used for development purposes. It just simulates individual filesystem operations with specific test scenarios in mind, and is therefore not useful for anything on a production system. The main purpose of the "synth" fs driver is to simulate certain fs behaviours that would be hard to trigger with a regular (production) fs driver like the "local" fs driver. Right now the synth fs driver is used by the automated [[#Synth Tests|9pfs test cases]] and by the automated 9pfs fuzzing code. The automated test cases use the "synth" fs driver for instance to check the 9p server's correct behaviour on 9p [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, which a client may send to abort a file I/O operation that might already have been blocking for a long time. In general the "synth" driver is very useful for effectively simulating arbitrary multi-threaded use case scenarios. [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p-synth.c hw/9pfs/9p-synth.c]<br />
<br />
=== 9p Transport Drivers ===<br />
<br />
The third component of the 9pfs implementation in QEMU is the "transport" driver, which is the communication channel between host system and guest system used by the 9p server. There are currently two 9p transport driver implementations available in QEMU:<br />
<br />
1. <b>virtio</b> transport driver<br />
<br />
The 9p "virtio" transport driver uses e.g. a virtual PCI device and ontop the [https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html virtio] protocol to transfer the 9p messages between clients (guest systems) and 9p server (host system). [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/virtio-9p-device.c hw/9pfs/virtio-9p-device.c]<br />
<br />
2. <b>Xen</b> transport driver<br />
<br />
TODO [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/xen-9p-backend.c hw/9pfs/xen-9p-backend.c]<br />
<br />
== Threads and Coroutines ==<br />
<br />
=== Coroutines ===<br />
<br />
The 9pfs implementation in QEMU heavily uses [https://en.wikipedia.org/wiki/Coroutine Coroutines] to handle individual 9p requests.<br />
<br />
If you haven't used Coroutines before, simply put: a Coroutine manages its own stack memory. That's it. So when a thread enters the scope of a Coroutine, everything that is usually put on the thread's own stack memory (which is always firmly tied to that thread) is put on the Coroutine's stack memory instead. The advantage is that, since Coroutines are just data structures, they can be passed from one thread to another. So Coroutines allow the use of memory stacks that are decoupled from specific threads.<br />
<br />
Another important aspect to know is that once a thread leaves the scope of a Coroutine, that thread is back to using its own thread-owned stack again.<br />
<br />
[[File:Coroutines_stacks.png|frameless|upright=2.4]]<br />
<br />
Each Coroutine instance usually handles a certain "collaborative" task, where "collaborative" means that individual parts of the task usually need to be executed by different threads before the overall task can eventually be considered fulfilled. So if a thread knows it has to start a new task that may also require other threads to process parts of it, that thread allocates a Coroutine instance. The thread then "enters" the Coroutine scope, which means that from this point on every local variable and all following function calls (the function call stack, including function arguments and their return values) are put on the Coroutine's stack memory instead of the thread's own memory stack (as they usually would be). The thread now calls arbitrary functions, runs loops, creates local variables inside them, etc., until at a certain point it realizes that some part of the task needs to be handled by a different thread next. At this point the thread leaves the Coroutine scope (e.g. by "yielding" or "awaiting") and passes the Coroutine instance to another thread, which in turn enters the Coroutine scope and finds the call stack and all local variables exactly as they were left by the previous thread that used the Coroutine instance before.<br />
<br />
It is important to understand that Coroutines really just cover memory stack aspects. They do not deal with any multi-threading aspects by themselves, which has the advantage that Coroutines can be combined with any multi-threading concept & framework (e.g. POSIX threads, Grand Central Dispatch, ...).<br />
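<br />
The following is a minimal, hypothetical sketch of that idea using QEMU's generic coroutine API (this is not taken from the 9pfs code; the function and variable names are made up purely for illustration):<br />
<br />
 #include "qemu/osdep.h"<br />
 #include "qemu/coroutine.h"<br />
  <br />
 /* The task's state lives on the coroutine's own stack, so whichever thread<br />
  * enters the coroutine later finds it exactly as it was left. */<br />
 static void coroutine_fn example_task(void *opaque)<br />
 {<br />
     int progress = 0;        /* stored on the coroutine's stack */<br />
     progress++;<br />
     qemu_coroutine_yield();  /* leave the coroutine scope */<br />
     /* a (possibly different) thread re-enters here; progress is still 1 */<br />
     progress++;<br />
 }<br />
  <br />
 static void start_example(void)<br />
 {<br />
     Coroutine *co = qemu_coroutine_create(example_task, NULL);<br />
     qemu_coroutine_enter(co);  /* runs example_task() until the first yield */<br />
     /* later, possibly from another thread: qemu_coroutine_enter(co); */<br />
 }<br />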
<br />
=== Control Flow ===<br />
<br />
The following figure shows the control flow and relationship of Threads and Coroutines of the 9pfs implementation.<br />
<br />
[[File:9pfs_control_flow.png|frameless|upright=3.5]]<br />
<br />
Getting back to 9pfs as a concrete user of Coroutines: every 9P client request that comes in on the 9P server side is a task the 9P server needs to fulfill on behalf of the client / guest OS. So for every 9P request a Coroutine instance is allocated. Then the 9P server's main thread "enters" the Coroutine scope to start processing the client's 9P request. At a certain point something of that request usually needs to be handled by the fs driver, which means the fs driver needs to call file I/O syscall(s) that might block for a long time. Therefore the 9P server leaves the Coroutine at that point and dispatches the Coroutine instance to a QEMU worker thread, which then executes the fs driver function(s) to fulfill the actual file system I/O task(s). Once the worker thread is done with the fs I/O portion, it leaves the Coroutine scope and dispatches the Coroutine data structure back to the server's main thread, which in turn re-enters the Coroutine and continues processing the request with the result provided by the worker thread. So yet again, the main thread finds the call stack and local variables exactly as they were left by the worker thread when it re-enters the Coroutine.<br />
<br />
The primary advantage of this design is that the 9P server's main thread can continue handling other 9P requests while a worker thread performs the (possibly long-running) fs driver I/O subtask(s), and yet code complexity is reduced substantially in comparison to other multi-threaded task handling concepts, which also improves safety.<br />
<br />
=== Main Thread ===<br />
<br />
Almost the entire 9p server runs on the QEMU main thread, with the exception of some worker threads handling fs driver file I/O tasks as described above. So you can assume that basically everything in [https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c hw/9pfs/9p.c] runs on the main thread, except for function calls there with the naming scheme *_co_*(). If you find a call with such a function name pattern, you know immediately that this function dispatches the Coroutine to a worker thread at that point (by using the macro v9fs_co_run_in_worker(...) inside its function implementation), and that by the time the *_co_*() function call returns, it has already dispatched the Coroutine back to the main thread.<br />
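<br />
For illustration, such a *_co_*() wrapper typically follows roughly the pattern below. This is a simplified sketch modeled on the wrappers under hw/9pfs/ (e.g. hw/9pfs/cofile.c); the real code contains additional locking and error handling:<br />
<br />
 /* Simplified sketch of a v9fs_co_*() wrapper (includes omitted; see<br />
  * hw/9pfs/coth.h for the real macro): the block passed to<br />
  * v9fs_co_run_in_worker() runs on a worker thread, and afterwards the<br />
  * coroutine is dispatched back to the main thread automatically. */<br />
 int coroutine_fn v9fs_co_lstat(V9fsPDU *pdu, V9fsPath *path, struct stat *stbuf)<br />
 {<br />
     int err;<br />
     V9fsState *s = pdu->s;<br />
  <br />
     if (v9fs_request_cancelled(pdu)) {<br />
         return -EINTR;  /* the client already aborted this request (Tflush) */<br />
     }<br />
     v9fs_co_run_in_worker(<br />
         {<br />
             err = s->ops->lstat(&s->ctx, path, stbuf);  /* on a worker thread */<br />
             if (err < 0) {<br />
                 err = -errno;<br />
             }<br />
         });<br />
     return err;  /* back on the main thread here */<br />
 }<br />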
<br />
=== Parallelism ===<br />
<br />
Incoming 9p requests are processed by the 9p server's main thread in the order they arrived. However, while a 9p request (i.e. its coroutine) is dispatched to a worker thread for filesystem I/O, the 9p server's main thread handles another 9p request (if any) in the meantime. Each 9p request (i.e. coroutine) might be dispatched between the main thread and some worker thread several times (for the same 9p request that is) before the 9p request is completed by the server and a 9p response is eventually sent to the client. Pending 9p requests are therefore handled in parallel by the 9p server, and there is no guarantee that 9p replies are transmitted in the exact same order as their 9p requests originally came in.<br />
<br />
Carrying out several 9p requests simultaneously allows higher performance, provided that the 9p client implementation supports parallelism as well. Apart from performance aspects, the 9p protocol requires parallel handling of [http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28 Tflush] requests, to allow aborting I/O requests that might be blocking for a long time, e.g. to prevent them from hanging for good on the server side. We do have a test case for this Tflush behaviour, by the way.<br />
<br />
== Test Cases ==<br />
<br />
Whatever you are doing on the 9pfs code base, please run the automated test cases after you have modified the source code, to ensure that your changes did not break the expected behaviour of 9pfs. Running the tests is very simple and does not require any guest OS installation, nor is any guest OS booted, so you can run them in a few seconds. The test cases are also a very efficient way to check, while still coding, whether your 9pfs changes are actually doing what you want them to.<br />
<br />
To run the 9pfs tests e.g. on an x86 system, all you need to do is execute the following two commands:<br />
<br />
export QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64<br />
tests/qtest/qos-test -m slow<br />
<br />
All 9pfs test cases are in the [https://gitlab.com/qemu-project/qemu/-/blob/master/tests/qtest/virtio-9p-test.c tests/qtest/virtio-9p-test.c] source file. If all goes well and all tests pass, you should see output like this:<br />
<br />
...<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/pci-device/pci-device-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio/virtio-tests/nop: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/version/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/attach/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/no_slash: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/walk/dotdot_from_root: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/lopen/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/write/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/basic: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_512: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_256: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/config: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/create_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/symlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_symlink: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/hardlink_file: OK<br />
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_hardlink: OK<br />
...<br />
<br />
If you don't see all test cases appearing on screen, or if some problem occurs, try adding --verbose to the command line:<br />
<br />
tests/qtest/qos-test -m slow --verbose<br />
<br />
Keep in mind that QEMU's qtest framework automatically enables just those test cases that are supported by your machine and configuration. With the --verbose switch you will see, at the beginning of the output, exactly which individual tests are enabled and which are not:<br />
<br />
...<br />
# ALL QGRAPH NODES: {<br />
# name='e1000e-tests/rx' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/synth/readdir/basic' type=3 cmd_line='(null)' [available]<br />
# name='virtio-scsi-pci' type=1 cmd_line=' -device virtio-scsi-pci' [available]<br />
# name='virtio-9p-tests/synth/readdir/split_128' type=3 cmd_line='(null)' [available]<br />
# name='virtio-net-tests/vhost-user/multiqueue' type=3 cmd_line='(null)' [available]<br />
# name='virtio-9p-tests/local/unlinkat_symlink' type=3 cmd_line='(null)' [available]<br />
...<br />
<br />
And for each test case being executed, you can see the precise QEMU command line that is used for that individual test:<br />
<br />
...<br />
GTest: run: /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/local/unlinkat_dir<br />
# Run QEMU with: '-M pc -fsdev local,id=fsdev0,path='/home/me/src/qemu/build/qtest-9p-local-ELKQGv',security_model=mapped-xattr -device virtio-9p-pci,fsdev=fsdev0,addr=04.0,mount_tag=qtest'<br />
GTest: result: OK<br />
...<br />
<br />
You can also run just one test or a smaller list of tests to concentrate on whatever you are working on. To get a list of all test cases:<br />
<br />
tests/qtest/qos-test -l<br />
<br />
Then pass the respective test case name(s) with the -p argument to run them as "partial" tests, e.g.:<br />
<br />
tests/qtest/qos-test -p /x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/readdir/split_128<br />
<br />
=== Synth Tests ===<br />
<br />
As you can see at the end of the virtio-9p-test.c file, the 9pfs test cases are split into two groups. The first group of tests uses the "synth" fs driver, so all file I/O operations are simulated, and you can basically add all kinds of hacks into the synth driver to simulate whatever you need to test certain fs behaviours, no matter how exotic that behaviour might be.<br />
<br />
=== Local Tests ===<br />
<br />
The second group of tests uses the "local" fs driver, so they actually operate on real directories and files in a test directory on the host filesystem. Some issues that happened in the past were caused by the combination of the 9p server and the actual "local" fs driver that is usually used on production machines. For that reason this group of tests covers issues that may happen across these two components of 9pfs. Again, this works without any guest OS, which has the advantage that you can test the behaviour independently of third-party 9p client implementations.<br />
<br />
== Contribute ==<br />
<br />
Please refer to [[Contribute/SubmitAPatch]] for instructions about how to send your patches.<br />
<br />
If in doubt, just send a message to [https://lists.nongnu.org/mailman/listinfo/qemu-devel qemu-devel] first; but as this is a high traffic mailing list, don't forget to add "9p" to the subject line to prevent your message from ending up unseen.<br />
<br />
[[Category:Developer documentation]]</div>Schoenebeck