On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream
commit dfc9327ab7c99bc13e12106448615efba833886b upstream
commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying with the PVH specification.
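The difference between the two lookup strategies described above can be sketched roughly as follows. This is an illustration only, not the kernel's code: `demo_start_info`, `demo_scan_legacy` and `demo_find_rsdp` are hypothetical names, memory is modelled as a plain byte buffer, and the legacy scan is simplified to the 0xE0000-0xFFFFF BIOS area (the EBDA scan and RSDP checksum validation are omitted).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG     "RSD PTR "   /* 8-byte ACPI RSDP signature */
#define RSDP_SIG_LEN 8
#define LEGACY_START 0xE0000u     /* start of legacy BIOS scan area */
#define LEGACY_END   0x100000u    /* end of scan area (exclusive) */

/* Hypothetical stand-in for the PVH start_info; only the field used here. */
struct demo_start_info {
    uint64_t rsdp_paddr;          /* physical address of the RSDP, 0 if absent */
};

/*
 * Legacy method: look for the signature on 16-byte boundaries in low
 * memory. Returns the address found, or 0 if there is no match.
 */
static uint64_t demo_scan_legacy(const uint8_t *mem, size_t mem_size)
{
    for (uint64_t addr = LEGACY_START;
         addr + RSDP_SIG_LEN <= LEGACY_END && addr + RSDP_SIG_LEN <= mem_size;
         addr += 16) {
        if (memcmp(mem + addr, RSDP_SIG, RSDP_SIG_LEN) == 0)
            return addr;
    }
    return 0;
}

/*
 * Spec-following method: trust the address handed over in start_info
 * and fall back to the legacy scan only if none was provided.
 */
static uint64_t demo_find_rsdp(const struct demo_start_info *si,
                               const uint8_t *mem, size_t mem_size)
{
    if (si && si->rsdp_paddr)
        return si->rsdp_paddr;
    return demo_scan_legacy(mem, mem_size);
}
```

With the RSDP placed above 1 MiB, `demo_scan_legacy()` alone finds nothing, while `demo_find_rsdp()` still returns the address passed in `rsdp_paddr`.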
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
It does. A working setup will no longer work if the Xen version is upgraded to 4.11.
Why isn't this a regression in Xen that they fix? Why are we responsible for adding new kernel features to work on newer versions of Xen and backport them to older kernels?
In case a Linux user program relies on undocumented behavior of the kernel (e.g. a register being non-zero on return from a syscall), does the kernel have to support that behavior eternally? I don't think so.
Yes, the kernel does have to support it. It's called "do not break working systems", or as some like to call it, the "Cambridge Promise" that we made to userspace well over a decade ago at a kernel summit in Cambridge.
We do this all the time, sometimes going through great gyrations in order to achieve it. Or sometimes we just "wait it out" and delay 4+ years to make these types of changes to allow everyone to update their userspace programs before we make a change like this.
Not breaking users' systems is part of the job of running a good software project. I suggest that Xen also adopt this same behavior if they want to keep a happy userbase. Otherwise we can just tell everyone to go use KVM :)
This is a similar case.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
So why can't Xen just tell all Linux users to update to a more modern kernel, i.e. 4.17.y and newer, in order to run with the new Xen kernel if they want to enforce this previously working behavior? Why does Linux have to be the one to change here?
thanks,
greg k-h