Features/GuestAgent: Difference between revisions

From QEMU

Revision as of 23:34, 6 February 2012

Summary

Implement support for QMP commands and events that terminate and originate respectively within the guest using an agent built as part of QEMU.

Detailed Summary

Ultimately the QEMU Guest Agent aims to provide access to a system-level agent via standard QMP commands.

This support is targeted for a future QAPI-based rework of QMP, however. Currently, for QEMU 0.15, the guest agent is exposed to the host via a separate QEMU chardev device (generally a unix socket) that communicates with the agent using the QMP wire protocol (minus the negotiation) over a virtio-serial or isa-serial channel to the guest. Assuming the agent is listening inside the guest on the virtio-serial device at /dev/virtio-ports/org.qemu.guest_agent.0 (the default), the corresponding host-side QEMU invocation would be something like:

 qemu \
 ...
 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

Commands would then be issued by connecting to /tmp/qga.sock, writing the QMP-formatted guest agent command, reading the QMP-formatted response, then disconnecting from the socket. (It's not strictly necessary to disconnect after a command, but doing so allows the guest agent to be shared among multiple clients when it is exposed as a standalone service in this fashion. When guest agent passthrough support is added to QMP, QEMU/QMP will handle arbitration between multiple clients.)
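The connect/write/read/disconnect cycle described above could be sketched in Python roughly as follows. This is a minimal illustration, not part of QEMU: the helper names (encode_command, issue_command, run_once) are hypothetical, and it assumes the agent answers with a single newline-terminated JSON object, as in the hex dump shown later on this page.

```python
import json
import socket


def encode_command(name, arguments=None):
    """Serialize a guest agent command as one newline-terminated JSON line."""
    request = {"execute": name}
    if arguments is not None:
        request["arguments"] = arguments
    return json.dumps(request).encode("utf-8") + b"\n"


def issue_command(sock, name, arguments=None, timeout=5.0):
    """Write one command to a connected socket and read one response line."""
    sock.settimeout(timeout)  # recv/send raise socket.timeout if the agent is silent
    sock.sendall(encode_command(name, arguments))
    buf = b""
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full response arrived")
        buf += chunk
    line, _, _ = buf.partition(b"\n")
    return json.loads(line.decode("utf-8"))


def run_once(path, name, arguments=None):
    """Connect, issue a single command, then disconnect so other clients can connect."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return issue_command(sock, name, arguments)
```

With the invocation above, run_once("/tmp/qga.sock", "guest-ping") would be expected to return an empty "return" object if the agent is running. The sketch does not re-sync the channel first, so stale data from an earlier client could still be read as the response.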

When QAPI-based QMP is available (somewhere around the QEMU 0.16 timeframe), a different host-side invocation that doesn't involve access to the guest agent outside of QMP will be used. Something like:

 qemu \
 ...
 -chardev qga_proxy,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
 -qmp tcp:localhost:4444,server

Currently this is planned to be done as a pseudo-chardev that only QEMU/QMP sees or interacts with, but the ultimate implementation may vary to some degree. The net effect should be the same, however: guest agent commands will be exposed in the same manner as QMP commands using the same QMP server, and communication with the agent will be handled by QEMU, transparently to the client.

The current list of supported RPCs is documented in qemu.git/qapi-schema-guest.json.

Example usage

build:

 ./configure
 make qemu-ga #should be built on|for target guest

start guest:

 qemu \
 -drive file=/home/mdroth/vm/rhel6_64_base.raw,snapshot=off,if=virtio \
 -net nic,model=virtio,macaddr=52:54:00:12:34:00 \
 -net tap,script=/etc/qemu-ifup \
 -vnc :1 -m 1024 --enable-kvm \
 -chardev socket,path=/tmp/qga.sock,server,nowait,id=qga0 \
 -device virtio-serial \
 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0

use guest agent:

 ./qemu-ga -h
 ./qemu-ga -m virtio-serial -p /dev/virtio-ports/org.qemu.guest_agent.0

start/use qmp:

 mdroth@illuin:~$ sudo socat unix-connect:/tmp/qga.sock readline
 {"execute":"guest-sync", "arguments":{"id":1234}}
 {"return": 1234}
 {"execute":"guest-ping"}
 {"return": {}}
 {"execute": "guest-info"}
 {"return": {"version": "1.0"}}
 // write "hello world!\n" to /tmp/testqga
 {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"w+"}}
 {"return": 0}
 {"execute":"guest-file-write", "arguments":{"handle":0,"buf-b64":"aGVsbG8gd29ybGQhCg=="}}
 {"return": {"count": 13, "eof": false}}
 {"execute":"guest-file-close", "arguments":{"handle":0}}
 {"return": {}}
 // read back the "hello world!\n" from /tmp/testqga
 {"execute":"guest-file-open", "arguments":{"path":"/tmp/testqga","mode":"r"}}
 {"return": 1}
 {"execute":"guest-file-read", "arguments":{"handle":1,"count":1024}}
 {"return": {"buf-b64": "aGVsbG8gd29ybGQhCg==", "count": 13, "eof": true}}
 {"execute":"guest-file-close","arguments":{"handle":1}}
 {"return": {}}
 // freeze and unfreeze freezable guest filesystems
 {"execute":"guest-fsfreeze-status"}
 {"return": "thawed"}
 {"execute":"guest-fsfreeze-freeze"}
 {"return": 3}
 {"execute":"guest-fsfreeze-status"}
 {"return": "frozen"}
 {"execute":"guest-fsfreeze-thaw"}
 {"return": 3}
 {"execute":"guest-fsfreeze-status"}
 {"return": "thawed"}
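The buf-b64 values in the session above are plain base64 encodings of the file data. A small Python sketch of building the write request and decoding the read response might look like this; the helper names are hypothetical and only the command and field names come from the transcript.

```python
import base64
import json


def file_write_request(handle, data):
    """Build a guest-file-write request carrying `data` (bytes) as base64."""
    payload = base64.b64encode(data).decode("ascii")
    return json.dumps({
        "execute": "guest-file-write",
        "arguments": {"handle": handle, "buf-b64": payload},
    })


def file_read_payload(response):
    """Extract the raw bytes from a parsed guest-file-read response."""
    return base64.b64decode(response["return"]["buf-b64"])
```

For example, file_write_request(0, b"hello world!\n") produces the same buf-b64 string used in the session above.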

Schema Definition

All guest commands will use a guest- prefix to distinguish the fact that the commands are handled by the guest. Type names (complex types and enums) do not require a special prefix. The following is an example of the proposed guest agent schema:

##
# @guest-ping:
#
# Ping the guest agent, a non-error return implies success
#
# Since: 0.15.0
##
{ 'command': 'guest-ping' }

##
# @guest-info:
#
# Get some information about the guest agent.
#
# Since: 0.15.0
##
{ 'type': 'GuestAgentInfo', 'data': {'version': 'str'} }
{ 'command': 'guest-info',
  'returns': 'GuestAgentInfo' }

This would result in types being created as described for QAPI, with signatures as follows:

void qmp_guest_ping(Error **errp);
GuestAgentInfo * qmp_guest_info(Error **errp);

libqmp

In libqmp, the code generated for a guest command is nearly identical to the code generated for a normal command.

For instance, the guest-info command will have the following signature:

GuestAgentInfo *qmp_guest_info(QmpSession *sess, Error **errp);

QEMU

The only role QEMU plays in guest commands is unmarshalling and remarshalling the input and output. This means that data from the guest is not sent directly to a management tool, which significantly decreases the guest attack surface.

Here is an example of the code that will be generated to handle agent commands:

static void qmp_marshal_output_guest_info(GuestAgentInfo * ret_in, QObject **ret_out, Error **errp)
{
    QapiDeallocVisitor *md = qapi_dealloc_visitor_new();
    QmpOutputVisitor *mo = qmp_output_visitor_new();
    Visitor *v;

    v = qmp_output_get_visitor(mo);
    visit_type_GuestAgentInfo(v, &ret_in, "unused", errp);
    if (!error_is_set(errp)) {
        *ret_out = qmp_output_get_qobject(mo);
    }
    qmp_output_visitor_cleanup(mo);
    v = qapi_dealloc_get_visitor(md);
    visit_type_GuestAgentInfo(v, &ret_in, "unused", errp);
    qapi_dealloc_visitor_cleanup(md);
}

static void qmp_marshal_input_guest_info(QDict *args, QObject **ret, Error **errp)
{
    GuestAgentInfo * retval = NULL;
    if (error_is_set(errp)) {
        goto out;
    }
    retval = qmp_guest_info(errp);
    qmp_marshal_output_guest_info(retval, ret, errp);

out:

    return;
}

QEMU Guest Agent Protocol

In general, qemu-ga uses the same protocol as QMP. There are a few issues regarding its isa-serial/virtio-serial transport that incur some additional caveats, however:

1) there is no way for qemu-ga to detect whether or not a client has connected
   to the channel (usually a chardev with a unix socket front-end and
   virtio-serial backend)
2) there is no way for a client to detect whether or not qemu-ga has
   [re-]connected or disconnected to the backend
3) if qemu-ga has not connected to the channel since the virtio-serial device
   has been reset (generally after reboot or hotplug), data from the client
   will be dropped
4) if qemu-ga has connected to the channel since the virtio-serial device has
   been reset, data from the client will be queued (and eventually throttled
   if available buffers are exhausted)

Because of 1) and 2), a qemu-ga channel must be treated as "always-on", even if qemu-ga hasn't been installed on the guest. We could add start-up notifications to the agent, but there's no way of detecting if, after a notification, qemu-ga was stopped and uninstalled, and the machine subsequently rebooted. (We can probe for this, but that only tells us the state at that exact instant in time. Stop notifications would be needed to build any notion of a "session" around such events, but there's no way to guarantee a stop notification's delivery before agent shutdown or device/buffer reset.)

This means robust clients *must* implement a client-side timeout mechanism when attempting to communicate with the agent. It also means that when a client connects, or after a client times out waiting for a response to a request, garbage may be received because the agent has started up and is responding to requests that were queued by previous client connections, or to stale requests from the current client connection that had timed out on the client side.

It also means that, due to 4), a client can block indefinitely when writing to a channel that's been throttled due to a backlog of unhandled/queued requests, and so should be written with this possibility in mind (separate thread, event loop, etc.).
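One way to keep a client from blocking forever on a throttled channel is to write in non-blocking mode and give up at a deadline. The following Python sketch illustrates the idea using select; send_with_deadline is a hypothetical helper, and a real client might prefer a dedicated thread or an event loop as suggested above.

```python
import select
import socket
import time


def send_with_deadline(sock, data, timeout):
    """Send all of `data`, or raise TimeoutError once `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    sock.setblocking(False)
    view = memoryview(data)
    while len(view) > 0:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("channel appears throttled; %d bytes unsent" % len(view))
        # Wait until the socket can accept more data, but no longer than the deadline.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue  # the loop re-checks the deadline
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # buffer filled up again between select() and send()
        view = view[sent:]
```

Note that after a timeout some of the request may already have been written, so the client should re-sync the channel before issuing further commands.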

qemu-ga uses the guest-sync or guest-sync-delimited command to address the problem of re-sync'ing the channel after [re-]connection or client-side timeouts. These are described below.

guest-sync

The guest-sync request/response exchange is simple. The client provides a unique numerical token, the agent sends it back in a response:

 > { "execute": "guest-sync", "arguments": { "id": 123456 } }
 < { "return": 123456}

A successful exchange guarantees that the channel is now in sync and no unexpected data/responses will be sent.

Note that for the reasons mentioned above there's no guarantee this request will be answered, so a client should implement a timeout and re-issue this periodically until a response is received for the most recent request.
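The retry behaviour described above could be sketched in Python as follows. This is an illustrative client-side loop under stated assumptions, not QEMU code: guest_sync is a hypothetical helper, responses are assumed to be newline-terminated, and the token is drawn at random so that a stale response is unlikely to match it.

```python
import json
import random
import socket


def guest_sync(sock, attempts=5, timeout=1.0):
    """Return True once the agent echoes the id of the most recent guest-sync."""
    sock.settimeout(timeout)
    buf = b""
    for _ in range(attempts):
        sync_id = random.randint(1, 2**31 - 1)  # fresh token for every attempt
        request = {"execute": "guest-sync", "arguments": {"id": sync_id}}
        try:
            sock.sendall(json.dumps(request).encode("utf-8") + b"\n")
            while True:
                while b"\n" not in buf:
                    chunk = sock.recv(4096)
                    if not chunk:
                        return False  # peer closed the connection
                    buf += chunk
                line, _, buf = buf.partition(b"\n")
                try:
                    reply = json.loads(line.decode("utf-8", errors="replace"))
                except ValueError:
                    continue  # garbage or a truncated stale response: skip it
                if isinstance(reply, dict) and reply.get("return") == sync_id:
                    return True
                # Anything else is a response to an older request: discard it.
        except socket.timeout:
            continue  # no answer yet: re-issue with a new id
    return False
```

Only a response carrying the id of the latest request counts as success; responses to earlier attempts or to other stale commands are read and dropped.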

This alone does not handle sync for all cases, however. For instance, if qemu-ga's parser previously received a partial request from a previous client connection, subsequent attempts to issue the guest-sync request can be misconstrued as being part of the previous partial request. Eventually qemu-ga will hit its recursion or token size limit and flush its parser state, at which point it will begin processing the backlog of requests, but there's no guarantee this will occur before the channel is throttled due to exhausting all available buffers. Thus there is potential for a deadlock in certain cases.

To avoid this, qemu-ga/QEMU's JSON parser has special handling for the 0xFF byte, which is an invalid UTF-8 character. Clients should precede the guest-sync request with this byte to ensure that qemu-ga flushes its parser state as soon as possible. So long as all clients abide by this, the deadlock state should be reliably avoidable.

A similar situation can arise when qemu-ga communicates with a client, however. If the client receives a partial response from a previous qemu-ga instance, the client might misconstrue responses to guest-sync as being part of this previous response. While this is completely avoidable in theory, it may be invasive to implement a client's response/JSON handling in such a way that this same deadlock scenario can be avoided on the client side. To make this situation easier to deal with, the guest-sync-delimited command can be used to tell qemu-ga to precede the response with this same 0xFF character.

guest-sync-delimited

 > { "execute": "guest-sync-delimited", "arguments": { "id": 123456 } }
 < { "return": 123456}

Actual hex values sent:

 > 7b 27 65 78 65 63 75 74 65 27 3a 27 67 75 65 73 74 2d 73 79 6e 63 2d 64 65
   6c 69 6d 69 74 65 64 27 2c 27 61 72 67 75 6d 65 6e 74 73 27 3a 7b 27 69 64
   27 3a 31 32 33 34 35 36 7d 7d 0a
 < ff 7b 22 72 65 74 75 72 6e 22 3a 20 31 32 33 34 35 36 7d 0a

As stated above, the request should also be preceded with a 0xFF to flush qemu-ga's parser state.
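On the client side, the 0xFF byte gives a simple resynchronization rule: prefix the request with it, and in the data read back, discard everything up to and including the last 0xFF before parsing. A rough Python sketch of both halves, with hypothetical helper names, might be:

```python
import json

SENTINEL = b"\xff"  # invalid in UTF-8, so it cannot occur inside valid JSON text


def build_sync_delimited(sync_id):
    """Return the request bytes, preceded by 0xFF to flush qemu-ga's parser."""
    request = {"execute": "guest-sync-delimited", "arguments": {"id": sync_id}}
    return SENTINEL + json.dumps(request).encode("utf-8") + b"\n"


def parse_sync_delimited(stream, sync_id):
    """Return True if the data after the last 0xFF is the expected reply."""
    _, found, tail = stream.rpartition(SENTINEL)
    if not found:
        return False  # no delimiter seen yet
    line, newline, _ = tail.partition(b"\n")
    if not newline:
        return False  # reply is still incomplete
    try:
        reply = json.loads(line.decode("utf-8"))
    except ValueError:
        return False
    return isinstance(reply, dict) and reply.get("return") == sync_id
```

Because everything before the sentinel is thrown away, a partial response left over from a previous qemu-ga instance cannot be merged with the sync reply.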

Guest Agent

The guest agent will be a daemon that connects to a virtio-serial device and feeds the input to a JSON parser. When a new command is received, it will hand the command over to the QAPI generated dispatch routines.

The guest agent will implement the server side of the QMP commands using the native signature for the function.

Asynchronous Commands

Since QEMU cannot rely on the guest agent responding immediately to a command (it is in fact impossible for it to do so), all guest commands must be implemented as asynchronous commands within QEMU. This does not change anything visible at the protocol level; it is simply an implementation detail within QEMU.

These details will be worked out in the context of QAPI-based QMP. The current, standalone host service requires that clients provide their own timeout mechanisms. The re-sync mechanism described under "QEMU Guest Agent Protocol" should be employed upon each connection to the guest agent to re-sync the streams with the guest agent in case a timeout from a client left the stream in a bad state.

Security Considerations

The following security issues need to be resolved in QMP:

  1. The JSON parser uses a recursive descent parser. Malicious input could potentially cause a stack overflow. Either implement a recursion depth counter, or switch the parser to only use tail recursion.
  2. The JSON parser may not handle premature EOI all that well. I think I've worked out most of these issues but more rigorous testing is needed.
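To illustrate the recursion depth counter suggested in item 1, the following Python sketch measures bracket nesting in a single pass, without recursion, so that over-deep input can be rejected before it reaches a recursive parser. It is not QEMU's implementation: the function names and the limit of 1024 are assumptions, and only double-quoted strings are recognized.

```python
def max_nesting_depth(text):
    """Return the deepest level of {} / [] nesting in `text`, ignoring string contents."""
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "}]":
            depth = max(depth - 1, 0)  # tolerate unmatched closers
    return deepest


def check_depth(text, limit=1024):
    """Raise ValueError if `text` nests deeper than `limit` (a hypothetical bound)."""
    if max_nesting_depth(text) > limit:
        raise ValueError("input nested too deeply")
```

A parser could run such a check on each message, or keep an equivalent counter inside its own recursive routines.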