Problem Statement

A gating CI is a prerequisite for moving to a multi-maintainer model of merging. With a common set of tests that are run before every merge, we no longer depend on whoever is currently doing merge duties having access to the current set of test machines.

Currently, pre-merge testing is done by ad-hoc shell scripts which run a set of tests on a set of machines, using the overall maintainer's personal accounts. We want to replace this ad-hoc system with one which:

does not use any machines which aren't usable with generic project role accounts
uses a known and maintainable CI system (e.g. GitLab) rather than hand-hacked scripts
can be handed over to another person so that they can handle releases

Current Tests

This section describes the current ad-hoc setup. It isn't intended to imply that we necessarily want to carry over all of these tests and host types.

The scripts are kept in:

https://git.linaro.org/people/peter.maydell/misc-scripts.git/tree

though they are best treated as a reference for what we currently do rather than used as a base for anything.

The machines I currently test on are:

an s390x box (provided to the project by IBM's Community Cloud, so it can be used for the new CI setup)
aarch32 (as a chroot on an aarch64 system)
aarch64
ppc64 (on the GCC compile farm)
OSX
Windows crossbuilds
NetBSD, FreeBSD and OpenBSD using the tests/vm VMs
x86-64 Linux with a variety of different build configs (see the 'remake-merge-builds' script for how these are set up)

I also have access to a SPARC box, but I am not currently testing with it because of hangs which I have not had time to investigate.

Testing process:

When I get a pull request email, I run the "apply-pullreq" script, which takes the git URL and the tag or branch name to test.
apply-pullreq performs the merge into a 'staging' branch
apply-pullreq also performs some simple local tests:
- does 'git verify-tag' accept the GPG signature?
- are we trying to apply the pull before the dev tree has been reopened for a new release?
- does the pull include commits with bad UTF-8 or bogus qemu-devel email addresses?
- does the pull update any submodules? This is only allowed if the --submodule-ok option was explicitly passed
apply-pullreq then invokes parallel-buildtest to do the actual testing
parallel-buildtest is a trivial wrapper around GNU Parallel which invokes 'mergebuild' on each of the test machines
if all is OK, the user then does a 'git push' of the staging branch to master
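As a rough illustration of the simpler local checks, the sketch below flags commits whose messages are not valid UTF-8, commits whose author or committer address is the qemu-devel list address itself (our assumption about what a "bogus" address looks like, e.g. a From: line rewritten by the list), and submodule updates when --submodule-ok was not given. The Commit record and the function names are hypothetical; collecting the data from git, the GPG check and the dev-tree-reopened check are left out. This is not how the real apply-pullreq script is written.

```python
# Hypothetical sketch of apply-pullreq-style commit sanity checks.
# Commit, LIST_ADDRESS and the function names are illustrative, not taken
# from the real scripts; collecting the data from git is left out.
from dataclasses import dataclass, field
from typing import List

# Assumption: a "bogus" address is the list address itself, as produced
# when the mailing list rewrites the sender's From: header.
LIST_ADDRESS = "qemu-devel@nongnu.org"


@dataclass
class Commit:
    sha: str
    raw_message: bytes              # commit message exactly as git stores it
    author_email: str
    committer_email: str
    submodules_changed: List[str] = field(default_factory=list)


def check_commit(commit: Commit, submodule_ok: bool) -> List[str]:
    """Return a list of human-readable problems found in one commit."""
    problems = []
    try:
        commit.raw_message.decode("utf-8")
    except UnicodeDecodeError:
        problems.append(f"{commit.sha}: message is not valid UTF-8")
    for role, email in (("author", commit.author_email),
                        ("committer", commit.committer_email)):
        if email.strip().lower() == LIST_ADDRESS:
            problems.append(f"{commit.sha}: {role} address is the list address")
    if commit.submodules_changed and not submodule_ok:
        names = ", ".join(commit.submodules_changed)
        problems.append(f"{commit.sha}: updates submodule(s) {names} "
                        "without --submodule-ok")
    return problems


def check_pull(commits: List[Commit], submodule_ok: bool = False) -> List[str]:
    """Check every commit in the pull; an empty result means all passed."""
    problems = []
    for commit in commits:
        problems.extend(check_commit(commit, submodule_ok))
    return problems
```

A wrapper would refuse to proceed to the build stage whenever check_pull returns a non-empty list, printing the problems for the person doing the merge.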

In almost all cases 'mergebuild' is simply "run 'make -C build' and then 'make -C build check'". The exceptions are:

the Windows crossbuilds don't try to run 'make check'
the x86-64 host runs the 'pull-buildtest' script, which:
- does make/make check for multiple configs
- includes one build from 'make clean' (almost everything else does an incremental build)
- runs 'make check-tcg' on the all-linux-static config
- runs a trivial set of 'ls' binaries for a bunch of linux-user guests (this is probably mostly redundant now that we have check-tcg)
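The common mergebuild case and the GNU Parallel fan-out can be approximated as below. This is a hypothetical sketch: the host names are placeholders, the x86-64 pull-buildtest variant is omitted, and run_on_host stands in for whatever actually executes a command on a machine (ssh, a CI runner, ...) and returns its exit status.

```python
# Hypothetical sketch of parallel-buildtest + mergebuild.
# Host names are placeholders, and run_on_host stands in for whatever
# really runs a command on a machine and returns its exit status.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# host -> whether to run 'make check' (the Windows crossbuilds do not)
HOSTS: Dict[str, bool] = {
    "s390x": True,
    "aarch32-chroot": True,
    "aarch64": True,
    "ppc64": True,
    "osx": True,
    "windows-cross-32": False,
    "windows-cross-64": False,
}

Runner = Callable[[str, List[str]], int]


def mergebuild_commands(run_check: bool) -> List[List[str]]:
    """The common case: 'make -C build', then 'make -C build check'."""
    commands = [["make", "-C", "build"]]
    if run_check:
        commands.append(["make", "-C", "build", "check"])
    return commands


def parallel_buildtest(hosts: Dict[str, bool],
                       run_on_host: Runner) -> Dict[str, bool]:
    """Run mergebuild on every host concurrently; map host -> passed."""
    def build_one(host: str) -> bool:
        # all() stops at the first non-zero exit status, like 'make && make check'.
        return all(run_on_host(host, cmd) == 0
                   for cmd in mergebuild_commands(hosts[host]))

    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(zip(hosts, pool.map(build_one, hosts)))
```

In practice run_on_host might wrap an ssh invocation via subprocess, and the staging branch would only be pushed if every value in the returned dict is True.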

The parallel-buildtest script makes GNU parallel print a series of lines naming the logfiles which contain the captured stdout/stderr from each machine. I run those logfiles through the 'greplogs' script, which looks for lines that look like error messages. This is intended to catch warnings and errors which, for one reason or another, didn't cause the make process to return failure.
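A greplogs-style scan could be as simple as the following. The regular expression is only a guess at what "looks like an error message" and is not taken from the real script; in practice it would need tuning to avoid false positives.

```python
# Hypothetical sketch of a greplogs-style log scan. The patterns are
# guesses at "things that look like error messages", not the real list.
import re
from typing import Iterable, List, Tuple

SUSPICIOUS = re.compile(
    r"\berror\b|\bwarning\b|\bFAIL\b|Segmentation fault|Assertion .* failed",
    re.IGNORECASE,
)

Hit = Tuple[str, int, str]  # (log name, line number, line text)


def grep_log(name: str, lines: Iterable[str]) -> List[Hit]:
    """Return every line in one log that matches SUSPICIOUS."""
    return [(name, lineno, line.rstrip("\n"))
            for lineno, line in enumerate(lines, start=1)
            if SUSPICIOUS.search(line)]


def grep_logs(paths: Iterable[str]) -> List[Hit]:
    """Scan several log files; an empty result means nothing suspicious."""
    hits: List[Hit] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as log:
            hits.extend(grep_log(path, log))
    return hits
```

The same function could run locally over downloaded logs or as the last step of each CI job, failing the job if it returns any hits.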

Sketch of 'phase one' Solution

To keep the scope of the initial implementation constrained the plan is:

identifying pull request emails, performing the actual git merge, and pushing the resulting staging branch to a public location should remain manual (or locally shell-scripted) tasks initially
the CI should have some mechanism for "start CI on this git repo + branch-or-tag"
that mechanism being scriptable and returning a success/failure code is preferable but not essential in phase one
the CI should have a web UI for looking at current status, logs from failed tests, etc
pushing the staging branch to master on success should also be locally scripted

(This roughly corresponds to "start by doing just the parts handled by 'parallel-buildtest' in the existing scripts".)

I'm not sure whether the equivalent of "run 'greplogs' on the build logs" should be done locally, or as part of the build process on the job runners. This may depend on whether the CI system conveniently gives us the build logs in a greppable way or not.

The initial load here is relatively low, as the CI will only be testing merge builds and we don't have very many of these. (We will likely want to scale up later, though.)

Two possible technologies have been suggested for this:

patchew

Patchew is our current CI robot, which tests all patchsets sent to the qemu-devel mailing list. Patchew parses emails, creates git trees with the patches applied and dispatches them to a variety of platforms to run make/make-check style tests, with feedback via both a web UI and email. It could be made to handle pull requests as well. However, we would prefer not to put more work into patchew if we can avoid it -- we want to be spending our time on QEMU, not on CI systems of which we're the only serious user. So this is a fallback if GitLab is not viable for some reason.

gitlab

The consensus on the mailing list was that the preferred approach would be to use GitLab CI (presumably on the public gitlab.com instance). GitLab permits projects to define their own private custom 'runners', which would allow us to run build tests on project-owned PPC, s390x, etc. hardware.

We have, however, identified one possible problem with using GitLab: at the moment their 'runner' process (written in Go) only works out of the box on x86. This seems to be mostly an issue with how they build and package it, so it could probably be worked around by building the runner executable locally on each host rather than trying to cross-compile it on x86. The relevant GitLab upstream merge request is https://gitlab.com/gitlab-org/gitlab-runner/merge_requests/725 (and it has been open for a long time...). Talk to Alex Bennée (alex.bennee@linaro.org) for more details/information. We need to identify early in the process whether this is something we can work around or get fixed, or whether we need to abandon the GitLab approach in favour of enhancing patchew.

Ideas for later phases

These are described mostly for context and to give an idea of where we want to go in future. We should definitely not try to do any of this before we have a basic working phase one solution. Some of them likely need more thought and discussion.

Using the same CI infrastructure for stable releases

At the moment Mike Roth does the stable releases using a testing and release process entirely separate from the one I use for mainline releases. Once we have working CI and automation for mainline releases, we should look at moving the stable-release process over as well.

Folding into patchew

Today we have a CI setup which tests all patchsets sent to the qemu-devel mailing list. This is handled by a robot called 'patchew', which parses emails, creates git trees with the patches applied and dispatches to a variety of platforms to do make/make-check style tests, with feedback via both web UI and email. This obviously has significant overlap with the CI we're doing for merge requests. The suggestion is that we could get rid of the half of patchew which is implementing "dispatch to CI job runners for testing and get back logs and pass/fail indication". The "parse emails and create git trees to be tested" and "web UI" parts would remain, but instead of doing its own dispatch and CI runners it would just invoke the same CI setup as the merge tests.

Automated identification of pull request emails

This is probably most easily done via patchew. We could automate more of the pull request workflow so that tests are kicked off automatically when a pull request email is sent to the list, rather than requiring a human to do this. There would then need to be some way to tell the automated system "ok, actually push that staging branch to master", as we still want a human to eyeball each merge first, especially during releases.

Automated and decentralised application of merges

My personal preference for where we finally end up is to have something similar to the Rust community's automation, where testing of potential merges is automatic, and the pushing of a successful merge to master is done via comments in the web UI. We could have a setup where, for instance, an ack by any two other devs who've successfully submitted merges in the last 4 months is sufficient to permit a merge. (Criteria for acceptance could perhaps be tightened during release freezes.)
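To make the proposed acceptance rule concrete, here is a sketch of the check. The data model (a map from developer to the date of their last successful merge), the 4-month window approximated as 122 days and the stricter three-ack rule during freezes are all illustrative assumptions rather than a settled policy.

```python
# Hypothetical sketch of the "acks from two recent mergers" rule.
# The data model, the ~4-month window and the freeze threshold are
# illustrative assumptions, not an agreed policy.
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set

RECENT_WINDOW = timedelta(days=122)  # roughly four months


def eligible_ackers(last_merge: Dict[str, datetime], now: datetime) -> Set[str]:
    """Developers whose last successful merge falls within the window."""
    return {dev for dev, when in last_merge.items()
            if now - when <= RECENT_WINDOW}


def merge_permitted(submitter: str, acks: Iterable[str],
                    last_merge: Dict[str, datetime], now: datetime,
                    freeze: bool = False) -> bool:
    """True if enough distinct eligible developers, other than the submitter, acked."""
    required = 3 if freeze else 2  # assumption: tighten during release freezes
    eligible = eligible_ackers(last_merge, now)
    valid = {dev for dev in acks if dev != submitter and dev in eligible}
    return len(valid) >= required
```

Using a set for the valid acks means the same developer acking twice still counts once, and the submitter can never ack their own merge.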

Automation of the 'create release tarballs' process

Currently I do releases (rcN and the final release) by making tags in git. I then email Mike Roth to let him know, and he produces source tarballs and makes release announcements. I don't know what this process involves but it seems likely we could automate at least some of it.