FlakyTests

This page documents tests that currently seem to be flaky, either in CI or locally. If you have information, possible causes, or fixes for any of these, please feel free to update the page. If particular tests seem flaky to you, please note that too. Add new issues at the top of the page. Try to include the dates when flakiness was observed, so we have some hope of distinguishing old-and-fixed problems from current ones.

N8x0Machine.test_n810 avocado test sometimes times out

Seen: 2023-03-10

https://gitlab.com/qemu-project/qemu/-/jobs/3907923626

 (076/217) tests/avocado/machine_arm_n8x0.py:N8x0Machine.test_n810:  INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '076-tests/avocado/machine_arm_n8x0.py:N8x0Machine.test_n810', 'logdir': '/builds/qemu-project/qemu/build/tests/results/job-2023-03-09T22.29-2da5ef9/test-re... (90.26 s)

segfaults in linux-user tests

Seen: 2023-03-06

Several of the linux-user tests seem to crash intermittently.

SIGSEGV for mte-3, aarch64 host aarch64 guest: https://gitlab.com/qemu-project/qemu/-/jobs/3881303104

test-blockjob in msys2-64bit job

Seen: 2023-03-03, 2023-03-06

The msys2-64bit job fails intermittently on the test-blockjob test.

https://gitlab.com/qemu-project/qemu/-/jobs/3872508803
https://gitlab.com/qemu-project/qemu/-/jobs/3871061024
https://gitlab.com/qemu-project/qemu/-/jobs/3865312440

Sample output:

| 53/89 ERROR:../tests/unit/test-blockjob.c:499:test_complete_in_standby: assertion failed: (job->status == JOB_STATUS_STANDBY) ERROR
53/89 qemu:unit / test-blockjob ERROR 0.08s exit status 3

Also seen on macOS x86:

172/621 qemu:unit / test-blockjob           ERROR           0.26s   killed by signal 6 SIGABRT
11:03:46 MALLOC_PERTURB_=176 G_TEST_SRCDIR=/Users/pm215/src/qemu-for-merges/tests/unit G_TEST_BUILDDIR=/Users/pm215/src/qemu-for-merges/build/all/tests/unit /Users/pm215/src/qemu-for-merges/build/all/tests/unit/test-blockjob --tap -k
----------------------------------- output -----------------------------------
stdout:
# random seed: R02S8c79d6e1c01ce0b25475b2210a253242
1..9
# Start of blockjob tests
ok 1 /blockjob/ids
stderr:
Assertion failed: (job->status == JOB_STATUS_STANDBY), function test_complete_in_standby, file ../../tests/unit/test-blockjob.c, line 499.


TAP parsing error: Too few tests run (expected 9, got 1)
(test program exited with status code -6)
------------------------------------------------------------------------------

tricore-debian-cross container build

Seen: 2023-02-21 (and previously in other variants that we have tried to address)

There's something strange about this container build, which is trying to build the tricore cross-toolchain. Usually it works, but sometimes it fails to compile: https://gitlab.com/qemu-project/qemu/-/jobs/3806772043

This seems to happen because, while building binutils, the build occasionally decides it needs to re-run flex/bison, and their output doesn't rebuild correctly. Most of the time it decides the pre-generated output files are fine, and then the build works. Maybe this is some timestamp issue?

The proposal, I think, is to avoid this entirely by putting prebuilt tricore cross tools into a container.
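To illustrate the timestamp hypothesis: make re-runs a generator when the generated file is missing or older than one of its sources, so pre-generated flex/bison output is only reused while it carries the later modification time. The sketch below shows that rule in isolation. The file names are hypothetical, and it is not taken from the binutils build system; it only demonstrates how the decision can flip when timestamps change (for example, depending on the order in which files are unpacked or checked out).

```python
import os
import tempfile
from pathlib import Path


def needs_regeneration(generated: Path, sources: list[Path]) -> bool:
    """Apply make's basic rule: rebuild if the target is missing
    or older than any of its sources."""
    if not generated.exists():
        return True
    target_mtime = generated.stat().st_mtime_ns
    return any(src.stat().st_mtime_ns > target_mtime for src in sources)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        grammar = Path(tmp, "parser.y")
        output = Path(tmp, "parser.c")
        grammar.write_text("/* grammar */\n")
        output.write_text("/* pre-generated */\n")

        # Pre-generated output is newer than its source: it would be reused.
        os.utime(grammar, ns=(1_000_000_000, 1_000_000_000))
        os.utime(output, ns=(2_000_000_000, 2_000_000_000))
        print(needs_regeneration(output, [grammar]))

        # Same content, but the source now has the later timestamp:
        # the generator (bison, in this scenario) would be re-run.
        os.utime(grammar, ns=(3_000_000_000, 3_000_000_000))
        print(needs_regeneration(output, [grammar]))
```

Running a check like this over the unpacked binutils tree inside the container could help confirm or rule out the timestamp theory, assuming the relevant source/output pairs are identified by hand.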

ast2500_evb_sdk avocado test

Seen: 2023-02-21 (and for months at least before that)

This one is flaky for me on local 'make check-avocado' builds; I haven't seen it in the CI.

 (41/69) tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '41-tests/avocado/machine_aspeed.py:AST2x00MachineSDK.test_arm_ast2500_evb_sdk', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests... (240.87 s)

migration-test

Seen: 2023-02-21, and going back at least to December 2022

Fails a lot for me on my local macOS x86 box, but not when run individually. Possibly it misbehaves when the host is under heavy load? It also fails elsewhere, but much less often.

https://gitlab.com/qemu-project/qemu/-/jobs/3806090216 (a FreeBSD job)

  32/648 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR

On a local macOS x86 box:

▶  34/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
 34/621 qemu:qtest+qtest-i386 / qtest-i386/migration-test                         ERROR          168.12s   killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-i386: Failed to peek at channel
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))

(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

▶  37/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed")) ERROR
 37/621 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                     ERROR          174.37s   killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion failed: (!g_str_equal(status, "failed"))

(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I've seen this on other CI jobs as well, but GitLab's UI makes it pretty much impossible to re-find failed jobs, since you can't search for them by failure reason.
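One possible workaround is to save job logs locally and tag them by failure reason yourself. The sketch below matches a log against substrings copied from the failure output on this page. The labels and the `classify_log` helper are made up for illustration, and how the logs are obtained is left open.

```python
# Substrings copied from the failure logs on this page.
# The labels on the left are hypothetical names, not an existing convention.
SIGNATURES = {
    "migration-status-timeout": "wait_for_migration_status: assertion failed",
    "migration-failed": "migrate_query_not_failed: assertion failed",
    "blockjob-standby": "job->status == JOB_STATUS_STANDBY",
    "avocado-timeout": "Runner error occurred: Timeout reached",
}


def classify_log(log_text: str) -> list[str]:
    """Return the sorted labels of all known signatures found in a job log."""
    return sorted(
        label for label, needle in SIGNATURES.items() if needle in log_text
    )


if __name__ == "__main__":
    sample = (
        "ERROR:../../tests/qtest/migration-helpers.c:151:"
        'migrate_query_not_failed: assertion failed: '
        '(!g_str_equal(status, "failed"))'
    )
    print(classify_log(sample))
```

Plain substring matching keeps the table easy to extend when a new flaky failure is added to this page; a log that matches nothing simply gets an empty list.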

I've also seen this fail on the OpenBSD VM build.

I've seen the migration-test hang on the s390 private CI runner in such a way that even though the CI job has timed out, the stale QEMU and migration-test processes are still lying around on the host.
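Since the failures above are intermittent and sometimes show up as hangs, a rough way to measure how often a test fails is to run it repeatedly with a timeout and tally the outcomes. This is a minimal sketch; `run_repeatedly` is a hypothetical helper, and the demo command is a stand-in to be replaced by the real test invocation from your build tree.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd: list[str], runs: int, timeout_s: float) -> Counter:
    """Run cmd `runs` times and tally outcomes as
    'pass', 'timeout', or 'exit <code>'."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child on timeout, but not
            # processes that child spawned (such as QEMU instances started
            # by a test), so stale processes can still be left behind.
            outcomes["timeout"] += 1
            continue
        key = "pass" if result.returncode == 0 else f"exit {result.returncode}"
        outcomes[key] += 1
    return outcomes


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere.
    demo = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(run_repeatedly(demo, runs=3, timeout_s=30))
```

Running this while the host is otherwise busy (for example during a parallel build) would be one way to test the heavy-load theory. Note that it does not clean up grandchild processes after a hang.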

segfaults in check-tcg

Seen: 2023-02-21

Here's a failed job on aarch64 host, aarch64 guest, segfault on bti-3 in tcg-tests: https://gitlab.com/qemu-project/qemu/-/jobs/3806772144

TEST bti-3 on aarch64
Segmentation fault
make[1]: *** [Makefile:170: run-bti-3] Error 139

This seems to be intermittent; it didn't happen on a re-run of the job.
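When comparing logs like the ones on this page, note that the same kind of crash is reported in two ways: make prints the shell convention of 128 plus the signal number ("Error 139" above), while the meson harness prints a negative status ("status code -6"). The sketch below decodes both forms on a POSIX host. `describe_exit` is a hypothetical helper, and treating any status above 128 as a signal is a convention rather than a guarantee, since a program can also exit with such a code directly.

```python
import signal


def describe_exit(status: int) -> str:
    """Describe an exit status given either as a negative signal number
    or in the shell's 128+N form."""
    if status < 0:
        signum = -status
    elif status > 128:
        # Assumption: statuses above 128 follow the shell's 128+N convention.
        signum = status - 128
    else:
        return f"exited with status {status}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # Not a signal number known on this host.
        return f"exited with status {status}"


if __name__ == "__main__":
    for status in (139, -6, 3):
        print(status, "->", describe_exit(status))
```

With this, the bti-3 failure (139) and the test-blockjob and migration-test failures (-6) decode to a segmentation fault and an abort respectively, matching what the logs say in words.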