On Mon, Jan 28, 2019 at 03:14:53PM -0500, Sasha Levin wrote:
On Mon, Jan 28, 2019 at 08:25:20PM +0100, Thomas Lindroth wrote:
I run a qemu/kvm VM with Debian and I've started getting segfaults and failing checksums on downloaded files. The failures are nondeterministic and similar to the failures you get with bad RAM. I tried to diagnose the problem with various testing tools and found that "stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors, usually within 60 seconds:
stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
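For readers unfamiliar with this kind of check, here is a rough sketch of what such a self-verifying CPU test does: it runs a deterministic computation and compares the result against a known answer, so any mismatch points at corrupted state rather than at the program. This is not stress-ng's actual code; the function names, the iteration count, and the tolerance are assumptions made for illustration only.

```python
import math


def newton_raphson_sqrt(n: float, iterations: int = 56) -> float:
    """Approximate sqrt(n) with a fixed number of Newton-Raphson steps.

    The iteration count is an arbitrary, generous choice for this sketch.
    """
    x = n / 2.0
    for _ in range(iterations):
        x = (x + n / x) / 2.0
    return x


def count_primes(limit: int) -> int:
    """Count the primes below `limit` with a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, math.isqrt(limit - 1) + 1):
        if sieve[i]:
            # Clear every multiple of i, starting at i*i.
            sieve[i * i :: i] = bytes(len(range(i * i, limit, i)))
    return sum(sieve)


def verify() -> list[str]:
    """Return a list of detected errors; empty on a healthy machine."""
    errors = []
    for n in range(1, 1000):
        r = newton_raphson_sqrt(float(n))
        # Hypothetical tolerance, far looser than double precision needs.
        if abs(r * r - n) > 1e-9 * n:
            errors.append(f"sqrt({n}) not accurate enough: {r!r}")
    # There are 78498 primes below 1,000,000.
    if count_primes(1_000_000) != 78498:
        errors.append("number of primes between 0 and 1000000 miscalculated")
    return errors


if __name__ == "__main__":
    for message in verify():
        print(message)
```

On correct hardware and a correct kernel, verify() should never report anything, which is why intermittent failures of this kind suggest bad RAM or corrupted register state.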
Nothing relevant has changed recently in the VM but the host kernel was upgraded from 4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There is only one kvm related change in that range so I tried to revert that one.
After reverting commit 4124a4cff344abbf8187775eb643d9827830e715 "x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96, I can no longer reproduce the stress-ng error and I have no segfaults or other problems with the guest.
[...]
Interesting, thank you for the report.
Could you confirm whether this issue reproduces on a newer kernel that has that patch (4.19.18 for example)?
The bug is specific to 4.14: two dependent commits were applied in the wrong order, which introduced the bug. I have a patch and am in the process of typing up the changelog.