On Thu, Feb 21, 2019 at 05:20:32PM +0100, Greg Kroah-Hartman wrote:
On Thu, Feb 21, 2019 at 03:47:01PM +0100, Joerg Roedel wrote:
On Thu, Feb 21, 2019 at 03:15:30PM +0100, Greg Kroah-Hartman wrote:
Ugh, good catch!
Any hint as to what type of testing you did that caught this? I keep asking people to run some KVM tests, but so far no one is :(
We caught this at SUSE while testing candidate kernel updates for one of our service packs using a 4.4-based kernel, and debugging showed that this issue came in via stable-updates. We also build a vanilla-flavour of the kernel which is nearly identical to the upstream stable tree, but what usually ends up in testing is the full tree with other backports.
This particular issue was found by updating some OpenStack machines with the candidate kernel, which then triggered the problem in some guests. It is also a very special one, since I was only able to trigger the problem on Westmere-based machines with a specific guest-config.
Nice work. Any chance that "test" could be added to the kvm testing scripts that I think are being worked on somewhere? Ideally we would have caught this before it ever hit the stable tree. Due to the lack of good KVM testing, that's one of the areas I am always most worried about.
This bug exists only in the 4.4.y backport; upstream, 4.9.y, and 4.14.y all had the correct code from the get-go. And there is already a KVM unit test that *should* hit this, albeit somewhat indirectly. I'll verify that the tests that touch the TPR actually run with x2APIC enabled.
Assuming the KVM unit test actually works, it's not a stretch for the bug to escape, e.g. if the tests weren't run on 4.4.y at all, or were only run on hardware with x2APIC.