Features/GuestAgent: Difference between revisions

From QEMU
Latest revision as of 12:38, 22 March 2022

Summary

Implement support for QMP commands and events that terminate and originate respectively within the guest using an agent built as part of QEMU.

Detailed Summary

Ultimately the QEMU Guest Agent aims to provide access to a system-level agent via standard QMP commands.

This support is targeted for a future QAPI-based rework of QMP, however, so currently, for QEMU 0.15, the guest agent is exposed to the host via a separate QEMU chardev device (generally, a unix socket) that communicates with the agent using the QMP wire protocol (minus the negotiation) over a virtio-serial or isa-serial channel to the guest. Assuming the agent will be listening inside the guest using the virtio-serial device at /dev/virtio-ports/org.qemu.guest_agent.0 (the default), the corresponding host-side QEMU invocation would be something like:

 qemu \
 ...
 -chardev socket,path=/tmp/qga.sock,server=on,wait=off,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

Commands would then be issued by connecting to /tmp/qga.sock, writing the QMP-formatted guest agent command, reading the QMP-formatted response, then disconnecting from the socket. (It's not strictly necessary to disconnect after a command, but it should be done to allow sharing of the guest agent with multiple clients when exposing it as a standalone service in this fashion. When guest agent passthrough support is added to QMP, QEMU/QMP will handle arbitration between multiple clients.)
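
The connect/write/read/disconnect cycle described above can be sketched in Python as follows. This is a minimal illustration, not part of QEMU: the helper names `qga_command` and `qga_oneshot` and the 5 second timeout are assumptions, and the sketch assumes each reply arrives as a single newline-terminated JSON line.

```python
import json
import socket


def qga_command(sock, name, arguments=None, timeout=5.0):
    """Send one guest agent command on a connected socket and return the parsed reply."""
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    sock.settimeout(timeout)  # hypothetical default; the agent may never answer
    sock.sendall(json.dumps(request).encode("utf-8") + b"\n")

    # Read until one complete, newline-terminated reply has arrived.
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full reply arrived")
        buf += chunk
    line, _, _ = buf.partition(b"\n")
    return json.loads(line)


def qga_oneshot(path, name, arguments=None):
    """Connect, issue one command, then disconnect so other clients can use the agent."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return qga_command(sock, name, arguments)
```

With the invocation above, `qga_oneshot("/tmp/qga.sock", "guest-ping")` would be the one-shot equivalent of typing the command into socat.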

When QAPI-based QMP is available (somewhere around the QEMU 0.16 timeframe), a different host-side invocation that doesn't involve access to the guest agent outside of QMP will be used. Something like:

 qemu \
 ...
 -chardev qga_proxy,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
 -qmp tcp:localhost:4444,server

Currently this is planned to be done as a pseudo-chardev that only QEMU/QMP sees or interacts with, but the ultimate implementation may vary to some degree. The net effect should be the same, however: guest agent commands will be exposed in the same manner as QMP commands using the same QMP server, and communication with the agent will be handled by QEMU, transparently to the client.

The current list of supported RPCs is documented in qemu.git/qapi-schema-guest.json.

Example usage

build:

 # for linux
 ./configure
 make qemu-ga #should be built on|for target guest
 # for Windows using MinGW on linux/cygwin (example for Fedora 18)
 ./configure --enable-guest-agent --cross-prefix=i686-w64-mingw32-
 make qemu-ga.exe

install:

 # for linux
 sudo make install
 # for Windows
 1. make sure virtio-serial Windows drivers are installed and
    working correctly (vioser-test utility that ships with
    virtio-win ISO can help to confirm this)
    (http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/)
 2. copy qemu-ga.exe to a directory on your windows guest, along
    with the following libraries all extracted/installed to the
    same directory:
    a) Contents of 'bin' directory from 'GLib' runtime
       (http://www.gtk.org/download/win32.php)
    b) 'intl.dll' for 'gettext' runtime
       (http://www.gtk.org/download/win32.php)
    c) depending on your build environment, you may also need
       'libssp-0.dll', which can be obtained from your mingw sys-root
 3. Make sure C:\Program Files\QEMU\run exists (create it if it doesn't)
 4. open a command prompt (as administrator), and run
    `qemu-ga.exe -s install` to install qemu-ga service
 5. manually start qemu-ga service via `net start qemu-ga`, or enable
    autostart for qemu-ga service via 'Control Panel'>'Services'
    configuration menu.

start guest:

 qemu \
 -drive file=/home/mdroth/vm/rhel6_64_base.raw,snapshot=off,if=virtio \
 -net nic,model=virtio,macaddr=52:54:00:12:34:00 \
 -net tap,script=/etc/qemu-ifup \
 -vnc :1 -m 1024 --enable-kvm \
 -chardev socket,path=/tmp/qga.sock,server=on,wait=off,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

use guest agent:

 ./qemu-ga -h
 ./qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

start/use qmp:

 mdroth@illuin:~$ sudo socat unix-connect:/tmp/qga.sock readline
 {"execute":"guest-sync", "arguments":{"id":1234}}
 {"return": 1234}
 {"execute":"guest-ping"}
 {"return": {}}
 {"execute": "guest-info"}
 {"return": {"version": "1.0"}}
 // write "hello world!\n" to /tmp/testqga
 {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}}
 {"return": 0}
 {"execute":"guest-file-write", "arguments":{"handle":0,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}
 {"return": {"count": 13, "eof": false}}
 {"execute":"guest-file-close", "arguments":{"handle":0}}
 {"return": {}}
 // read back the "hello world!\n" from /tmp/testqga
 {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
 {"return": 1}
 {"execute":"guest-file-read", "arguments":{"handle":1,"count":1024}}
 {"return": {"buf-b64": "aGVsbG8gd29ybGQhCg==", "count": 13, "eof": true}}
 {"execute":"guest-file-close","arguments":{"handle":1}}
 {"return": {}}
 // freeze and unfreeze freezable guest filesystems
 {"execute":"guest-fsfreeze-status"}
 {"return": "thawed"}
 {"execute":"guest-fsfreeze-freeze"}
 {"return": 3}
 {"execute":"guest-fsfreeze-status"}
 {"return": "frozen"}
 {"execute":"guest-fsfreeze-thaw"}
 {"return": 3}
 {"execute":"guest-fsfreeze-status"}
 {"return": "thawed"}
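
File contents travel base64-encoded in the buf-b64 field, as in the session above. A small Python sketch of building the write request and decoding a read reply; the helper names `file_write_request` and `decode_file_read` are illustrative only and not part of any QEMU tooling.

```python
import base64
import json


def file_write_request(handle, data):
    """Build a guest-file-write request; the payload is base64-encoded into buf-b64."""
    return {
        "execute": "guest-file-write",
        "arguments": {
            "handle": handle,
            "buf-b64": base64.b64encode(data).decode("ascii"),
        },
    }


def decode_file_read(reply):
    """Extract the raw bytes from a guest-file-read reply."""
    return base64.b64decode(reply["return"]["buf-b64"])


request = file_write_request(0, b"hello world!\n")
print(json.dumps(request))
```

The 13-byte string "hello world!\n" encodes to the same buf-b64 value that appears in the session above.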

Example using vsock

start guest (cid = 3):

 host$ qemu -device vhost-vsock-pci,guest-cid=3 ...

start guest agent using vsock device:

 guest$ qemu-ga -m vsock-listen -p 3:1234

start/use qmp:

 host$ nc --vsock 3 1234                                                      
 {"execute":"guest-sync", "arguments":{"id":1234}}                            
 {"return": 1234}

Schema Definition

All guest commands will use a guest- prefix to distinguish the fact that the commands are handled by the guest. Type names (complex types and enums) do not require a special prefix. The following is an example of the proposed guest agent schema:

##
# @guest-ping:
#
# Ping the guest agent, a non-error return implies success
#
# Since: 0.15.0
##
{ 'command': 'guest-ping' }

##
# @guest-info:
#
# Get some information about the guest agent.
#
# Since: 0.15.0
##
{ 'type': 'GuestAgentInfo', 'data': {'version': 'str'} }
{ 'command': 'guest-info',
  'returns': 'GuestAgentInfo' }

This would result in types being created as described for QAPI, with signatures as follows:

void qmp_guest_ping(Error **errp);
GuestAgentInfo * qmp_guest_info(Error **errp);

libqmp

In libqmp, the code generated for a guest command is nearly identical to the code generated for a normal command.

For instance, the guest-info command will have the following signature:

GuestAgentInfo *qmp_guest_info(QmpSession *sess, Error **errp);

QEMU

The only role QEMU plays in guest commands is unmarshalling and remarshalling the input and output. This means that data from the guest is not being sent directly to a management tool which significantly decreases the guest attack surface.

Here is an example of the code that will be generated to handle agent commands:

static void qmp_marshal_output_guest_info(GuestAgentInfo * ret_in, QObject **ret_out, Error **errp)
{
    QapiDeallocVisitor *md = qapi_dealloc_visitor_new();
    QmpOutputVisitor *mo = qmp_output_visitor_new();
    Visitor *v;

    v = qmp_output_get_visitor(mo);
    visit_type_GuestAgentInfo(v, &ret_in, "unused", errp);
    if (!error_is_set(errp)) {
        *ret_out = qmp_output_get_qobject(mo);
    }
    qmp_output_visitor_cleanup(mo);
    v = qapi_dealloc_get_visitor(md);
    visit_type_GuestAgentInfo(v, &ret_in, "unused", errp);
    qapi_dealloc_visitor_cleanup(md);
}

static void qmp_marshal_input_guest_info(QDict *args, QObject **ret, Error **errp)
{
    GuestAgentInfo * retval = NULL;
    if (error_is_set(errp)) {
        goto out;
    }
    retval = qmp_guest_info(errp);
    qmp_marshal_output_guest_info(retval, ret, errp);

out:

    return;
}

QEMU Guest Agent Protocol

In general, qemu-ga uses the same protocol as QMP. There are a couple of issues regarding its isa-serial/virtio-serial transport that incur some additional caveats, however:

1) there is no way for qemu-ga to detect whether or not a client has connected
   to the channel (usually a chardev with a unix socket front-end and
   virtio-serial backend)
2) there is no way for a client to detect whether or not qemu-ga has
   [re-]connected or disconnected to the backend
3) if qemu-ga has not connected to the channel since the virtio-serial device
   has been reset (generally after reboot or hotplug), data from the client
   will be dropped
4) if qemu-ga has connected to the channel since the virtio-serial device has
   been reset, data from the client will be queued (and eventually throttled
   if available buffers are exhausted), regardless of whether or not qemu-ga
   is still running/connected.

Because of 1) and 2), a qemu-ga channel must be treated as "always-on", even if qemu-ga hasn't even been installed on the guest. We could add start-up notifications to the agent, but there's no way of detecting if, after a notification, qemu-ga was stopped and uninstalled, and the machine subsequently rebooted (we can probe for this, but that only tells us the state at that exact instant in time. Stop notifications would be needed to build any notion of a "session" around such events, but there's no way to guarantee a stop notification's delivery before agent shutdown or device/buffer reset).

This means robust clients *must* implement a client-side timeout mechanism when attempting to communicate with the agent. It also means that when a client connects, or after a client times out waiting for a response to a request, there may be garbage received due to the agent starting up and responding to requests that were queued by previous client connections, or to stale requests from the current client connection that had timed-out on the client-side.

It also means that, due to 4), a client can block indefinitely when writing to a channel that's been throttled due to a backlog of unhandled/queued requests, and so should be written with this possibility in mind (separate thread, event loop, etc.).
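
One simple way to keep a write from blocking forever is to put a deadline on the send itself. The sketch below is an assumption-laden illustration rather than a recommended design: `send_or_give_up` is a hypothetical helper, and a send that times out may leave a partial request in the channel, which then has to be cleaned up by re-syncing.

```python
import socket


def send_or_give_up(sock, payload, timeout=2.0):
    """Try to write payload; return False instead of blocking forever on a throttled channel."""
    sock.settimeout(timeout)  # bounds the total time sendall() may take
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        # The channel did not drain in time. Part of the payload may already
        # have been written, so the caller should re-sync before retrying.
        return False
```

A client built around an event loop or a dedicated writer thread would achieve the same goal without stalling its main flow for the length of the timeout.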

qemu-ga uses the guest-sync or guest-sync-delimited command to address the problem of re-sync'ing the channel after [re-]connection or client-side timeouts. These are described below.

guest-sync

The guest-sync request/response exchange is simple. The client provides a unique numerical token, the agent sends it back in a response:

   > { "execute": "guest-sync", "arguments": { "id": 123456 } }
   < { "return": 123456}

A successful exchange guarantees that the channel is now in sync and no unexpected data/responses will be sent.

Note that for the reasons mentioned above there's no guarantee this request will be answered, so a client should implement a timeout and re-issue this periodically until a response is received for the most recent request.
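
The timeout-and-retry loop described above could look roughly like this in Python. It is a sketch under stated assumptions: the function name `guest_sync`, the `new_token` parameter, the attempt count and the timeout are all illustrative, and replies are assumed to be newline-delimited. Anything that is not the echo of the most recent token is treated as stale and discarded.

```python
import json
import random
import socket


def guest_sync(sock, attempts=5, timeout=1.0,
               new_token=lambda: random.randrange(1, 2**31)):
    """Re-issue guest-sync until the agent echoes the most recent token."""
    sock.settimeout(timeout)
    buf = b""
    for _ in range(attempts):
        token = new_token()  # a fresh token for every attempt
        request = {"execute": "guest-sync", "arguments": {"id": token}}
        sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
        try:
            while True:
                while b"\n" not in buf:
                    chunk = sock.recv(4096)
                    if not chunk:
                        raise ConnectionError("channel closed")
                    buf += chunk
                line, _, buf = buf.partition(b"\n")
                try:
                    reply = json.loads(line)
                except ValueError:
                    continue  # garbage or a truncated stale response
                if isinstance(reply, dict) and reply.get("return") == token:
                    return token  # channel is now in sync
                # Otherwise this is a stale reply to an earlier request.
        except socket.timeout:
            continue  # no answer in time; retry with a new token
    raise TimeoutError("guest agent did not answer guest-sync")
```

Note that the timeout here applies to each read rather than to the attempt as a whole, so a long stream of stale replies can stretch an attempt beyond `timeout` seconds.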

This alone does not handle synchronisation for all cases, however. For instance, if qemu-ga's parser previously received a partial request from a previous client connection, subsequent attempts to issue the guest-sync request can be misconstrued as being part of the previous partial request. Eventually qemu-ga will hit its recursion or token size limit and flush its parser state, at which point it will begin processing the backlog of requests, but there's no guarantee this will occur before the channel is throttled due to exhausting all available buffers. Thus a deadlock is possible in certain cases.

To avoid this, qemu-ga/QEMU's JSON parser has special handling for the 0xFF byte, which is an invalid UTF-8 character. Clients should precede the guest-sync request with a 0xFF byte to ensure that qemu-ga flushes its parser state as soon as possible. So long as all clients abide by this, the deadlock state should be reliably avoidable.
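
On the wire, this amounts to writing a single 0xFF byte immediately before the JSON text of the sync request. A minimal Python sketch; `framed_sync_request` and `FLUSH` are hypothetical names introduced only for illustration.

```python
import json

FLUSH = b"\xff"  # never valid in UTF-8, so it cannot occur inside valid JSON text


def framed_sync_request(token, delimited=False):
    """Return the bytes of a sync request preceded by the parser-flush byte."""
    name = "guest-sync-delimited" if delimited else "guest-sync"
    body = json.dumps({"execute": name, "arguments": {"id": token}})
    return FLUSH + body.encode("utf-8") + b"\n"
```

Because 0xFF can never be part of a well-formed request, sending it is harmless even when the agent's parser is already idle.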

A similar situation can happen with respect to qemu-ga attempting to communicate with a client, however. If the client receives a partial response from a previous qemu-ga instance, the client might misconstrue responses to guest-sync as being part of this previous response. For client implementations that treat newlines as a delimiter for qemu-ga responses, this is easy to recover from (one valid response may be lost, but we can recover on the subsequent try).

For some implementations, in particular JSON stream-based implementations which do not rely on newline delimiters, it may be invasive to implement a client's response/JSON handling in such a way that this same deadlock scenario can be avoided on the client-side. To make this situation easier to deal with, the guest-sync-delimited command can be used to tell qemu-ga to precede the response with this same 0xFF character.

guest-sync-delimited

> { "execute": "guest-sync-delimited", "arguments": { "id": 123456 } }
< { "return": 123456}

Actual hex values sent:

> 7b 27 65 78 65 63 75 74 65 27 3a 27 67 75 65 73 74 2d 73 79 6e 63 2d 64 65
  6c 69 6d 69 74 65 64 27 2c 27 61 72 67 75 6d 65 6e 74 73 27 3a 7b 27 69 64
  27 3a 31 32 33 34 35 36 7d 7d 0a
< ff 7b 22 72 65 74 75 72 6e 22 3a 20 31 32 33 34 35 36 7d 0a

As stated above, the request should also be preceded with a 0xFF to flush qemu-ga's parser state.
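
On the client side, the 0xFF sentinel makes it possible to throw away everything received before it, including a truncated response from an earlier agent instance. The following Python sketch illustrates the idea; `parse_delimited_reply` is a hypothetical helper, and it assumes the reply after the sentinel is terminated by a newline, as in the hex dump above.

```python
import json


def parse_delimited_reply(stream, token):
    """Discard everything up to the last 0xFF sentinel, then parse the reply after it.

    Returns the parsed reply, or None if no complete matching reply is buffered yet.
    """
    _, sentinel, tail = stream.rpartition(b"\xff")
    if not sentinel:
        return None  # sentinel not seen yet; keep reading
    line, newline, _ = tail.partition(b"\n")
    if not newline:
        return None  # the reply after the sentinel is still incomplete
    try:
        reply = json.loads(line)
    except ValueError:
        return None  # not valid JSON; treat as out of sync
    if isinstance(reply, dict) and reply.get("return") == token:
        return reply
    return None
```

Using the last sentinel in the buffer means that if several guest-sync-delimited requests were issued during retries, only the most recent response is considered.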

Guest Agent

The guest agent will be a daemon that connects to a virtio-serial device and feeds the input to a JSON parser. When a new command is received, it will hand the command over to the QAPI generated dispatch routines.

The guest agent will implement the server side of the QMP commands using the native signature for the function.
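
The parse-then-dispatch flow can be pictured with a toy dispatch table in Python. This is purely illustrative and is not how qemu-ga is implemented (the real agent is written in C and uses QAPI-generated dispatch code); the names `HANDLERS`, `command` and `dispatch` are assumptions, and the error object only mirrors the general class/desc shape of QMP error replies.

```python
import json

HANDLERS = {}  # command name -> handler function


def command(name):
    """Register a handler under its wire name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@command("guest-ping")
def guest_ping():
    return {}


@command("guest-info")
def guest_info():
    return {"version": "1.0"}


def dispatch(line):
    """Parse one request line and route it to the matching handler."""
    request = json.loads(line)
    handler = HANDLERS.get(request.get("execute"))
    if handler is None:
        return {"error": {"class": "CommandNotFound", "desc": "unknown command"}}
    return {"return": handler(**request.get("arguments", {}))}
```

Each handler keeps a plain, native signature, and the table is the only piece that knows about the wire format.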

Asynchronous Commands

Since QEMU cannot rely on the guest agent responding immediately to a command (it is in fact impossible for it to do so), all guest commands must be implemented as asynchronous commands within QEMU. This does not change anything from a protocol visible perspective but is simply an implementation detail within QEMU.

These details will be worked out in the context of QAPI-based QMP. The current, standalone host service requires that clients provide their own timeout mechanisms. The reset mechanism described under "QEMU Guest Agent Protocol" should be employed upon each connection to the guest agent to re-sync the streams with the guest agent in case a timeout from a client left the stream in a bad state.

Security Considerations

The following security issues need to be resolved in QMP:

  1. The JSON parser uses a recursive descent parser. Malicious input could potentially cause a stack overflow. Either implement a recursion depth counter, or switch the parser to only use tail recursion.
  2. The JSON parser may not handle premature EOI all that well. I think I've worked out most of these issues but more rigorous testing is needed.
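
The recursion depth counter mentioned in item 1 can be illustrated with a small pre-scan that rejects input before it ever reaches a recursive parser. This Python sketch is an assumption-laden illustration, not QEMU's parser: the name `nesting_depth_ok` and the limit of 1024 are hypothetical, and only standard double-quoted JSON strings are recognised.

```python
def nesting_depth_ok(text, limit=1024):
    """Return False as soon as JSON brackets nest deeper than limit."""
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Brackets inside string literals do not count towards nesting.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > limit:
                return False
        elif ch in "]}":
            depth -= 1
    return True
```

The scan runs in a single pass with constant stack usage, so deeply nested malicious input is rejected without any recursion taking place.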