Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen. In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
For kernel 4.14 I'll send a slightly modified version of the patches soon.
Juergen
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
thanks,
greg k-h
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
It does. A working setup will no longer work if Xen version is upgraded to 4.11.
Juergen
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
It does. A working setup will no longer work if Xen version is upgraded to 4.11.
Why isn't this a regression in Xen that they fix? Why are we responsible for adding new kernel features to work on newer versions of Xen and backport them to older kernels?
thanks,
greg k-h
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
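To make the difference between the two lookup strategies concrete, here is a minimal user-space sketch in C. It is not the code from the patches requested above. The names `demo_start_info`, `rsdp_scan` and `rsdp_find` are hypothetical, the structure is a trimmed-down stand-in that keeps only an `rsdp_paddr` field, "memory" is simulated by a plain byte buffer, and the RSDP checksum is deliberately not verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG     "RSD PTR "  /* 8-byte ACPI RSDP signature */
#define RSDP_SIG_LEN 8
#define RSDP_ALIGN   16          /* the signature sits on a 16-byte boundary */

/* Hypothetical, trimmed-down stand-in for the PVH start_info structure. */
struct demo_start_info {
    uint64_t rsdp_paddr;         /* address of the RSDP, 0 if not provided */
};

/*
 * Legacy method: scan the window [start, end) of a simulated memory
 * buffer for the RSDP signature. Returns the offset of the signature,
 * or -1 if it is not inside the window.
 */
static int64_t rsdp_scan(const uint8_t *mem, uint64_t start, uint64_t end)
{
    for (uint64_t off = start; off + RSDP_SIG_LEN <= end; off += RSDP_ALIGN) {
        if (memcmp(mem + off, RSDP_SIG, RSDP_SIG_LEN) == 0)
            return (int64_t)off;
    }
    return -1;
}

/*
 * Interface-following method: prefer the address handed over in
 * start_info and fall back to the legacy scan only if none was given.
 */
static int64_t rsdp_find(const struct demo_start_info *si,
                         const uint8_t *mem,
                         uint64_t scan_start, uint64_t scan_end)
{
    if (si != NULL && si->rsdp_paddr != 0)
        return (int64_t)si->rsdp_paddr;
    return rsdp_scan(mem, scan_start, scan_end);
}
```

With the signature placed outside the scanned window, `rsdp_scan()` alone returns -1, which corresponds to the boot failure described above. `rsdp_find()` with `rsdp_paddr` set returns the address without scanning at all.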
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
It does. A working setup will no longer work if Xen version is upgraded to 4.11.
Why isn't this a regression in Xen that they fix? Why are we responsible for adding new kernel features to work on newer versions of Xen and backport them to older kernels?
In case a Linux user program relies on undocumented behavior of the kernel (e.g. a register being non-zero on return from a syscall), does the kernel have to support that behavior eternally? I don't think so. This is a similar case.
Juergen
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
In PVH mode there is no guarantee the kernel can find the RSDP table at the legacy location in low memory, which is a requirement for the kernel to boot successfully without those patches.
Why not just use newer kernels for new Xen features? This really doesn't look like a bugfix to me, does it to you?
It does. A working setup will no longer work if Xen version is upgraded to 4.11.
Why isn't this a regression in Xen that they fix? Why are we responsible for adding new kernel features to work on newer versions of Xen and backport them to older kernels?
In case a Linux user program relies on undocumented behavior of the kernel (e.g. a register being non-zero on return from a syscall), does the kernel have to support that behavior eternally? I don't think so.
Yes, the kernel does have to support it. It's called "do not break working systems", or as some like to call it, the "Cambridge Promise" that we made to userspace well over a decade ago at a kernel summit in Cambridge.
We do this all the time, sometimes going through great gyrations in order to achieve it. Or sometimes we just "wait it out" and delay 4+ years to make these types of changes to allow everyone to update their userspace programs before we make a change like this.
It's part of the job of running a good software project by not breaking users' systems. I suggest that Xen also adopt this same behavior if they want to keep a happy userbase. Otherwise we can just tell everyone to go use KVM :)
This is a similar case.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
So why can't Xen just tell all Linux users to update to a more modern kernel, i.e. 4.17.y and newer, in order to run with the new Xen kernel if they want to enforce this previously working behavior? Why does Linux have to be the one to change here?
thanks,
greg k-h
On 04/04/18 17:42, Greg KH wrote:
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
None.
BSD did it correctly. I guess Mini-OS doesn't count, as it is mostly Xen-internal, but it was not hit by this change.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
We have a working kernel just by luck. Would your reasoning be the same if the kernel would use an EFI runtime service wrong and an EFI update would lead to a crash?
So why can't Xen just tell all Linux users to update to a more modern kernel, i.e. 4.17.y and newer, in order to run with the new Xen kernel if they want to enforce this previously working behavior? Why does Linux have to be the one to change here?
I wanted to have those patches in 4.15, but problems with grub2 (not the upstream version, but multiple distro versions) and the Meltdown/Spectre disaster pushed them back to 4.17.
Juergen
On Wed, Apr 04, 2018 at 06:32:17PM +0200, Juergen Gross wrote:
On 04/04/18 17:42, Greg KH wrote:
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
None.
BSD did it correctly. I guess Mini-OS doesn't count, as it is mostly Xen-internal, but it was not hit by this change.
Xen doesn't support anything other than BSD, Linux, and Mini-OS? :)
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
We have a working kernel just by luck. Would your reasoning be the same if the kernel would use an EFI runtime service wrong and an EFI update would lead to a crash?
If a UEFI/BIOS update broke working systems, first we would go yell at the BIOS engineers for doing something foolish (like I am doing here...) Then we would grumble and go fix the issue in the latest kernel version and tell people to update to a new release and never buy from that vendor ever again as they obviously do not care about their users.
So, I'll gladly tell everyone who hits this bug, to stop using Xen as they don't care about their users, and to work around it they have to use the 4.17 kernel release.
There, that was simple :)
thanks,
greg k-h
On 05/04/18 08:33, Greg KH wrote:
On Wed, Apr 04, 2018 at 06:32:17PM +0200, Juergen Gross wrote:
On 04/04/18 17:42, Greg KH wrote:
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
None.
BSD did it correctly. I guess Mini-OS doesn't count, as it is mostly Xen-internal, but it was not hit by this change.
Xen doesn't support anything other than BSD, Linux, and Mini-OS? :)
No other OS supports PVH mode so far.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
We have a working kernel just by luck. Would your reasoning be the same if the kernel would use an EFI runtime service wrong and an EFI update would lead to a crash?
If a UEFI/BIOS update broke working systems, first we would go yell at the BIOS engineers for doing something foolish (like I am doing here...) Then we would grumble and go fix the issue in the latest kernel version and tell people to update to a new release and never buy from that vendor ever again as they obviously do not care about their users.
Even if the kernel wasn't using the EFI interfaces correctly and just worked by accident? Sorry, that's ridiculous.
So, I'll gladly tell everyone who hits this bug, to stop using Xen as they don't care about their users, and to work around it they have to use the 4.17 kernel release.
The kernel is wrong here. You don't want to take the patches fixing the issue. That's rather sad as PVH mode was meant to replace PV in the future, which will remove the need for most of the paravirt ops stuff. You are just shifting that possibility some months further into the future.
I won't fight against you any longer.
Juergen
On Thu, Apr 05, 2018 at 09:02:27AM +0200, Juergen Gross wrote:
On 05/04/18 08:33, Greg KH wrote:
On Wed, Apr 04, 2018 at 06:32:17PM +0200, Juergen Gross wrote:
On 04/04/18 17:42, Greg KH wrote:
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
None.
BSD did it correctly. I guess Mini-OS doesn't count, as it is mostly Xen-internal, but it was not hit by this change.
Xen doesn't support anything other than BSD, Linux, and Mini-OS? :)
No other OS supports PVH mode so far.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
We have a working kernel just by luck. Would your reasoning be the same if the kernel would use an EFI runtime service wrong and an EFI update would lead to a crash?
If a UEFI/BIOS update broke working systems, first we would go yell at the BIOS engineers for doing something foolish (like I am doing here...) Then we would grumble and go fix the issue in the latest kernel version and tell people to update to a new release and never buy from that vendor ever again as they obviously do not care about their users.
Even if the kernel wasn't using the EFI interfaces correctly and just worked by accident? Sorry, that's ridiculous.
So, I'll gladly tell everyone who hits this bug, to stop using Xen as they don't care about their users, and to work around it they have to use the 4.17 kernel release.
The kernel is wrong here. You don't want to take the patches fixing the issue.
These are not just "patches to fix the issue", they are "patches to add new features" that touch core acpi bits, right? Support for new hardware and platforms and such are not normally part of the stable kernel patches at all (with the exceptions of tiny patches that add device ids and quirks.)
That's my main objection here, combined with the obvious one of "Xen does not care about their users".
That's rather sad as PVH mode was meant to replace PV in the future, which will remove the need for most of the paravirt ops stuff. You are just shifting that possibility some months further into the future.
So if you run in PV mode, all is fine, right? Great, then just use 4.17 or newer for PVH, what's the issue? Who cares about this for older kernel versions, those are all in running systems that would not be changing their version of Xen.
But again, I still claim that Xen doesn't care about their users by breaking existing systems, no matter if Linux was wrong or not. That's just how the world is, Linux has to handle stupid userspace programs, and Xen needs to handle stupid operating system kernels, if those projects wish to succeed over time.
Personally, I use KVM and now will strongly recommend others do the same.
thanks,
greg k-h
On 05/04/18 09:14, Greg KH wrote:
On Thu, Apr 05, 2018 at 09:02:27AM +0200, Juergen Gross wrote:
On 05/04/18 08:33, Greg KH wrote:
On Wed, Apr 04, 2018 at 06:32:17PM +0200, Juergen Gross wrote:
On 04/04/18 17:42, Greg KH wrote:
On Wed, Apr 04, 2018 at 05:12:32PM +0200, Juergen Gross wrote:
On 04/04/18 16:46, Greg KH wrote:
On Wed, Apr 04, 2018 at 04:30:30PM +0200, Juergen Gross wrote:
On 04/04/18 16:27, Greg KH wrote:
On Wed, Apr 04, 2018 at 12:38:43PM +0200, Juergen Gross wrote:
Please add the patches:
commit 038bac2b02989acf1fc938cedcb7944c02672b9f upstream commit dfc9327ab7c99bc13e12106448615efba833886b upstream commit b17d9d1df3c33a4f1d2bf397e2257aecf9dc56d4 upstream
to the 4.15 and 4.16 stable kernels.
Those patches are needed to boot Linux as PVH guest on recent Xen.
So a new feature? Why is that ok for stable kernels?
It works for kernels since at least 4.11 on Xen 4.10.
Great, so what commit caused this to fail?
So far, in reading those commits, it sounds like they are "make Linux work again due to changes in Xen". That sounds like a pretty bad thing that Xen did, why do we have to fix up their mess?
Xen did nothing bad. It was the "old" kernel implementation which relied on an assumption which happened to be true by accident. Xen had to be changed in order to enable grub2 to support PVH mode.
The PVH interface specifies that the RSDP address is available via the start_info structure handed over to the PVH boot entry. The Linux kernel didn't look at that address, but used the legacy method scanning low memory for the RSDP table. As soon as Xen moved the RSDP to a higher address (which is covered by the PVH interface specification) the kernel could no longer be booted.
So it was clearly a fault of the kernel not complying to the PVH specification.
But it worked previously, so you can't fault Linux here :)
How many other operating systems broke with this change?
None.
BSD did it correctly. I guess Mini-OS doesn't count, as it is mostly Xen-internal, but it was not hit by this change.
Xen doesn't support anything other than BSD, Linux, and Mini-OS? :)
No other OS supports PVH mode so far.
Not at all. We have a working kernel here. Xen changed and broke working Linux systems. Now I understand the goal of wanting to also change Linux to work properly, but these changes are really a new feature addition if you read the patches.
We have a working kernel just by luck. Would your reasoning be the same if the kernel would use an EFI runtime service wrong and an EFI update would lead to a crash?
If a UEFI/BIOS update broke working systems, first we would go yell at the BIOS engineers for doing something foolish (like I am doing here...) Then we would grumble and go fix the issue in the latest kernel version and tell people to update to a new release and never buy from that vendor ever again as they obviously do not care about their users.
Even if the kernel wasn't using the EFI interfaces correctly and just worked by accident? Sorry, that's ridiculous.
So, I'll gladly tell everyone who hits this bug, to stop using Xen as they don't care about their users, and to work around it they have to use the 4.17 kernel release.
The kernel is wrong here. You don't want to take the patches fixing the issue.
These are not just "patches to fix the issue", they are "patches to add new features" that touch core acpi bits, right? Support for new hardware and platforms and such are not normally part of the stable kernel patches at all (with the exceptions of tiny patches that add device ids and quirks.)
The way the patches are written is the result of requests from the maintainers (x86, acpi). This way they don't break layering of the components. I'd be happy to rewrite them for stable kernels if you like that better.
That's my main objection here, combined with the obvious one of "Xen does not care about their users".
Xen does care. PVH support in Linux is relatively new (the first working kernel was 4.11), Xen has full PVH guest support since Xen 4.10.
For being able to replace PV mode it is mandatory for PVH to not add unnecessary performance overhead, as performance is the main reason for customers to run their guests in PV mode (yes, PV guests _are_ faster, especially with many vcpus).
It was discovered that placing the RSDP table in low memory is bad for performance as it adds more memory map holes than necessary. So moving the RSDP to the same memory area as all other ACPI tables was the right move and completely compliant to the specified interface. So in order not to hit performance for future guests it was decided to write patches for Linux to comply to the interface and move the RSDP. We hoped to get those patches into stable kernels, too, but it seems we were wrong.
That's rather sad as PVH mode was meant to replace PV in the future, which will remove the need for most of the paravirt ops stuff. You are just shifting that possibility some months further into the future.
So if you run in PV mode, all is fine, right? Great, then just use 4.17 or newer for PVH, what's the issue? Who cares about this for older kernel versions, those are all in running systems that would not be changing their version of Xen.
The idea was to be able to use kernel 4.14 or newer for PVH.
As you don't seem to take the patches it will be 4.17, of course.
Juergen
On Thu, Apr 5, 2018 at 9:00 AM, Juergen Gross jgross@suse.com wrote:
These are not just "patches to fix the issue", they are "patches to add new features" that touch core acpi bits, right? Support for new hardware and platforms and such are not normally part of the stable kernel patches at all (with the exceptions of tiny patches that add device ids and quirks.)
The way the patches are written is the result of requests from the maintainers (x86, acpi). This way they don't break layering of the components. I'd be happy to rewrite them for stable kernels if you like that better.
That's my main objection here, combined with the obvious one of "Xen does not care about their users".
Xen does care. PVH support in Linux is relatively new (the first working kernel was 4.11), Xen has full PVH guest support since Xen 4.10.
For being able to replace PV mode it is mandatory for PVH to not add unnecessary performance overhead, as performance is the main reason for customers to run their guests in PV mode (yes, PV guests _are_ faster, especially with many vcpus).
I'm afraid I have to agree with Greg here regarding the meaning of "supported"; and I remember expressing a similar sentiment when I discovered that a recent Linux kernel wouldn't boot on the development version of Xen. Either we declare PVH in Linux 4.11-4.16 as "supported", in which case we have to maintain backwards compatibility and attempt not to break it; or we declare PVH in Linux 4.11-4.16 as "tech preview" (retroactively), and Greg should feel free to ignore these backports.
It's unfortunate that Linux 4.11 didn't follow the spec, but whose fault is that?
The fact is, that as it stands, a user could have a perfectly working system with Xen 4.10 and a load of PVH guests running stock Linux 4.15, and then upgrade to Xen 4.11 and have all those guests break for no apparent reason. That's a pretty obnoxious thing to do, particularly as we made such a fanfare about Xen 4.10 finally having PVH support, and encouraging everyone to go and use it. How are all of those users going to feel about Xen?
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
-George
On Thu, Apr 5, 2018 at 11:06 AM, George Dunlap dunlapg@umich.edu wrote:
The fact is, that as it stands, a user could have a perfectly working system with Xen 4.10 and a load of PVH guests running stock Linux 4.15, and then upgrade to Xen 4.11 and have all those guests break for no apparent reason. That's a pretty obnoxious thing to do, particularly as we made such a fanfare about Xen 4.10 finally having PVH support, and encouraging everyone to go and use it. How are all of those users going to feel about Xen?
I mean, imagine a cloud provider that's managed to get a bunch of *customers* using PVH, because it's more secure than either PV or HVM (fewer hypercalls and no device emulation). Then she upgrades to Xen 4.11, and suddenly all the guests break on reboot! Worse yet, the only way to fix it is to have the customers either boot into classic PV mode or HVM mode in order to actually get an updated kernel! That's a real jerk move to pull on the early adopters who are so critical for widespread adoption and feedback.
-George
On 05/04/18 12:06, George Dunlap wrote:
On Thu, Apr 5, 2018 at 9:00 AM, Juergen Gross jgross@suse.com wrote:
These are not just "patches to fix the issue", they are "patches to add new features" that touch core acpi bits, right? Support for new hardware and platforms and such are not normally part of the stable kernel patches at all (with the exceptions of tiny patches that add device ids and quirks.)
The way the patches are written is the result of requests from the maintainers (x86, acpi). This way they don't break layering of the components. I'd be happy to rewrite them for stable kernels if you like that better.
That's my main objection here, combined with the obvious one of "Xen does not care about their users".
Xen does care. PVH support in Linux is relatively new (the first working kernel was 4.11), Xen has full PVH guest support since Xen 4.10.
For being able to replace PV mode it is mandatory for PVH to not add unnecessary performance overhead, as performance is the main reason for customers to run their guests in PV mode (yes, PV guests _are_ faster, especially with many vcpus).
I'm afraid I have to agree with Greg here regarding the meaning of "supported"; and I remember expressing a similar sentiment when I discovered that a recent Linux kernel wouldn't boot on the development version of Xen.
You finally said:
My subsequent response to Roger ("FWIW I can buy this argument") was meant to indicate I didn't have any more objection to the approach you guys were planning on taking.
Either we declare PVH in Linux 4.11-4.16 as "supported", in which case we have to maintain backwards compatibility and attempt not to break it; or we declare PVH in Linux 4.11-4.16 as "tech preview" (retroactively), and Greg should feel free to ignore these backports.
I still believe he should take them, as they are correcting a bug in the kernel.
It's unfortunate that Linux 4.11 didn't follow the spec, but whose fault is that?
Linux? ;-)
I have no problem to admit that the patches adding PVH support to the Linux kernel were wrong in this regard and I didn't detect that when reviewing them.
The fact is, that as it stands, a user could have a perfectly working system with Xen 4.10 and a load of PVH guests running stock Linux 4.15, and then upgrade to Xen 4.11 and have all those guests break for no apparent reason. That's a pretty obnoxious thing to do, particularly as we made such a fanfare about Xen 4.10 finally having PVH support, and encouraging everyone to go and use it. How are all of those users going to feel about Xen?
Point taken.
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Juergen
On 04/05/2018 08:19 AM, Juergen Gross wrote:
On 05/04/18 12:06, George Dunlap wrote:
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Can we backport latest struct hvm_start_info changes (which bumped interface version) to 4.11 and pass RSDP only for versions >=1?
-boris
On 05/04/18 15:00, Boris Ostrovsky wrote:
On 04/05/2018 08:19 AM, Juergen Gross wrote:
On 05/04/18 12:06, George Dunlap wrote:
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Can we backport latest struct hvm_start_info changes (which bumped interface version) to 4.11 and pass RSDP only for versions >=1?
And this would help how?
RSDP address is passed today, the kernel just doesn't read it. And how should Xen know which interface version the kernel is supporting? And Xen needs to know that in advance in order to place the RSDP in low memory in case the kernel isn't reading the RSDP address from start_info.
Juergen
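The failure mode described in the message above can be sketched in C. This is only an illustration under stated assumptions: `start_info_sketch` is a simplified stand-in for the PVH start_info structure (only the RSDP address the thread calls rsdp_addr is modelled), and `find_rsdp`, `legacy_scan_hit` and `legacy_scan_miss` are hypothetical names, not kernel or Xen functions. A kernel that consults the start_info field boots wherever the RSDP is placed, while one that relies only on the legacy scan finds nothing once the table sits outside low memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the PVH start_info structure. */
struct start_info_sketch {
    uint64_t rsdp_addr; /* physical address of the RSDP, 0 if not provided */
};

/* Hypothetical callback modelling the legacy low-memory scan.
 * Returns the RSDP address, or 0 when nothing is found. */
typedef uint64_t (*legacy_scan_fn)(void);

/* Illustrative stub: the table happens to sit at a legacy location. */
static uint64_t legacy_scan_hit(void)  { return 0x000f0000ULL; }

/* Illustrative stub: the table was placed elsewhere, so the scan fails. */
static uint64_t legacy_scan_miss(void) { return 0; }

/* Prefer the address handed over in start_info; fall back to the
 * legacy scan only when no address was provided. */
static uint64_t find_rsdp(const struct start_info_sketch *si,
                          legacy_scan_fn scan)
{
    assert(si != NULL);
    if (si->rsdp_addr != 0)
        return si->rsdp_addr;
    return scan ? scan() : 0;
}
```

Calling `legacy_scan_miss()` on its own models the unpatched kernel on a guest whose RSDP is not in low memory: it returns 0, which is the case the patches address.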
On Thu, Apr 5, 2018 at 2:06 PM, Juergen Gross jgross@suse.com wrote:
On 05/04/18 15:00, Boris Ostrovsky wrote:
On 04/05/2018 08:19 AM, Juergen Gross wrote:
On 05/04/18 12:06, George Dunlap wrote:
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Can we backport latest struct hvm_start_info changes (which bumped interface version) to 4.11 and pass RSDP only for versions >=1?
And this would help how?
RSDP address is passed today, the kernel just doesn't read it. And how should Xen know which interface version the kernel is supporting? And Xen needs to know that in advance in order to place the RSDP in low memory in case the kernel isn't reading the RSDP address from start_info.
But the kernel image has ELF notes, right? You can put one that indicates that this binary *does* know how to read the RSDP from the start_info, and if you don't find that, put it in lowmem.
-George
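The suggestion above amounts to a small decision on the toolstack side, roughly as in the following C sketch. Everything here is an assumption made for illustration: the thread does not name such an ELF note, and `kernel_image_info`, `has_rsdp_note` and `choose_rsdp_placement` are hypothetical. How the note would actually be parsed from the image is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where the toolstack may place the RSDP for a guest. */
enum rsdp_placement {
    RSDP_LOWMEM,   /* legacy location the kernel can find by scanning */
    RSDP_ANYWHERE  /* free placement, address passed via start_info */
};

/* Hypothetical summary of what was learned from the kernel image. */
struct kernel_image_info {
    /* Hypothetical ELF note: "this kernel reads the RSDP address
     * from start_info". Absent in kernels that predate the note. */
    bool has_rsdp_note;
};

/* Default to low memory unless the image declares that it reads the
 * address from start_info. */
static enum rsdp_placement
choose_rsdp_placement(const struct kernel_image_info *k)
{
    assert(k != NULL);
    return k->has_rsdp_note ? RSDP_ANYWHERE : RSDP_LOWMEM;
}
```

With this default, any image lacking the note is treated conservatively, including one that would in fact read the address from start_info.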
On 05/04/18 15:42, George Dunlap wrote:
On Thu, Apr 5, 2018 at 2:06 PM, Juergen Gross jgross@suse.com wrote:
On 05/04/18 15:00, Boris Ostrovsky wrote:
On 04/05/2018 08:19 AM, Juergen Gross wrote:
On 05/04/18 12:06, George Dunlap wrote:
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Can we backport latest struct hvm_start_info changes (which bumped interface version) to 4.11 and pass RSDP only for versions >=1?
And this would help how?
RSDP address is passed today, the kernel just doesn't read it. And how should Xen know which interface version the kernel is supporting? And Xen needs to know that in advance in order to place the RSDP in low memory in case the kernel isn't reading the RSDP address from start_info.
But the kernel image has ELF notes, right? You can put one that indicates that this binary *does* know how to read the RSDP from the start_info, and if you don't find that, put it in lowmem.
So you would hurt BSD which does read the RSDP address correctly but (today) has no such ELF note.
I think extending the PVH interface in such a way is no good idea.
Juergen
On Thu, Apr 5, 2018 at 3:09 PM, Juergen Gross jgross@suse.com wrote:
On 05/04/18 15:42, George Dunlap wrote:
On Thu, Apr 5, 2018 at 2:06 PM, Juergen Gross jgross@suse.com wrote:
On 05/04/18 15:00, Boris Ostrovsky wrote:
On 04/05/2018 08:19 AM, Juergen Gross wrote:
On 05/04/18 12:06, George Dunlap wrote:
Aren't there flags in the binary somewhere that could tell the toolstack / Xen whether the kernel in question needs the RSDP table in lowmem, or whether it can be put higher?
Not really. Analyzing the binary whether it accesses the rsdp_addr in the start_info isn't the way to go, IMO.
I've sent a patch to xen-devel adding a quirk flag to the domain's config to enable the admin special casing such an "old" kernel.
Can we backport latest struct hvm_start_info changes (which bumped interface version) to 4.11 and pass RSDP only for versions >=1?
And this would help how?
RSDP address is passed today, the kernel just doesn't read it. And how should Xen know which interface version the kernel is supporting? And Xen needs to know that in advance in order to place the RSDP in low memory in case the kernel isn't reading the RSDP address from start_info.
But the kernel image has ELF notes, right? You can put one that indicates that this binary *does* know how to read the RSDP from the start_info, and if you don't find that, put it in lowmem.
So you would hurt BSD which does read the RSDP address correctly but (today) has no such ELF note.
I think extending the PVH interface in such a way is no good idea.
Option 1: Put the RSDP in lowmem unless we know the guest will use the address in start_info
  Pro: Existing Linux instances boot
  Con: Existing BSD instances whose memory is an exact multiple of 1 GiB will have slightly slower TLB miss times.

Option 2: Put the RSDP in highmem regardless
  Pro: Existing BSD instances whose memory is an exact multiple of 1 GiB will have slightly faster TLB miss times
  Con: Existing Linux instances don't boot at all
This seems like a no-brainer to me. But anyway, maybe we should move the discussion elsewhere and stop bothering Greg. :-)
-George
Den 05.04.2018 kl. 10:02, skrev Juergen Gross:
The kernel is wrong here. You don't want to take the patches fixing the issue. That's rather sad as PVH mode was meant to replace PV in the future, which will remove the need for most of the paravirt ops stuff. You are just shifting that possibility some months further into the future.
I won't fight against you any longer.
Please post the series adapted for 4.14 anyway for distros & users wanting to have the 4.14 series kernels working with new Xen.
-- Thomas
linux-stable-mirror@lists.linaro.org