On Sun, 10 Nov 2019 16:26:15 +0000, Ard Biesheuvel ard.biesheuvel@linaro.org wrote:
(adding Marc, the GIC maintainer)
On Sun, 10 Nov 2019 at 15:57, Sasha Levin sashal@kernel.org wrote:
On Sun, Nov 10, 2019 at 02:16:57PM +0000, Ard Biesheuvel wrote:
On Sun, 10 Nov 2019 at 13:27, Sasha Levin sashal@kernel.org wrote:
On Sun, Nov 10, 2019 at 08:33:47AM +0100, Ard Biesheuvel wrote:
On Sun, 10 Nov 2019 at 03:44, Sasha Levin sashal@kernel.org wrote:
From: Ard Biesheuvel ard.biesheuvel@linaro.org
[ Upstream commit 71e0940d52e107748b270213a01d3b1546657d74 ]
In order to allow the OS to reserve memory persistently across a kexec, introduce a Linux-specific UEFI configuration table that points to the head of a linked list in memory, allowing each kernel to add list items describing memory regions that the next kernel should treat as reserved.
This is useful, e.g., for GICv3 based ARM systems that cannot disable DMA access to the LPI tables, forcing them to reuse the same memory region again after a kexec reboot.
Tested-by: Jeremy Linton jeremy.linton@arm.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org
NAK
This doesn't belong in -stable, and I'd be interested in understanding how this got autoselected, and how I can prevent this from happening again in the future.
It was selected because it's part of a fix for a real issue reported by users:
For my understanding, are you saying your AI is reading launchpad bug reports etc? Because it is marked AUTOSEL.
Not quite. This review set was me feeding all the patches Ubuntu has on top of stable trees into AUTOSEL, and sending out the output for review. I doesn't look into launchpad bug reports on it's own, but in my experience one can find a bug report for mostly everything AUTOSEL considers to be a bug.
So the assumption is that taking an arbitrary subset of what Ubuntu backported (and tested extensively), and letting that subset be chosen by a bot is a process that improves the quality of stable trees? I'm rather skeptical of that tbh.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1806766
That pages mentions
""" 2 upstream patch series are required to fix this: https://<email address hidden>/msg10328.html Which provides an EFI facility consumed by: https://lkml.org/lkml/2018/9/21/1066 There were also some follow-on fixes to deal with ARM-specific problems associated with this usage: https://www.spinics.net/lists/arm-kernel/msg685751.html """
and without the other patches, we only add bugs and don't fix any.
Besides ubuntu, it is also carried by:
SUSE: https://www.suse.com/support/update/announcement/2019/suse-su-20191530-1/ CentOS: https://koji.mbox.centos.org/koji/buildinfo?buildID=4558
As a way to resolve the reported bug.
Backporting a feature/fix like this requires careful consideration of the patches involved, and doing actual testing on hardware.
Any reason this *shouldn't* be in stable?
Yes. By itself, it causes crashes at early boot and does not actually solve the problem.
Sure, let's work on gathering all the needed patches then and testing it out.
No, let's not. This is a feature that was introduced to address a shortcoming in some hardware that makes kexec/kdump problematic on them. If you want kexec/kdump on that hardware, use a newer kernel.
That's my position as well. This isn't a bug fix at all, but a workaround for a HW defect with complex dependencies. It wasn't cc'd stable for good reasons, and I wish stable maintainers would respect this decision.
I'm aware that there might be dependencies that are not obvious to me, but the solution here is to take those dependencies as well rather than ignore the process completely.
This is not a bugfix. kexec/kdump never worked correctly on the hardware involved, and backporting a feature like that goes way beyond what I am willing to accept for stable backports affecting the EFI subsystem.
I'm a bit confused. The bug report starts with:
[Impact] kdump support isn't usable on HiSilicon D05 systems. This previously worked in bionic.
So it seems like it did use to work, but not anymore?
I have no idea what Ubuntu shipped in the previous kernel, but labelling this as a software regression is dubious at least, and wholly inaccurate for upstream.
I have this exact machine keeping my feet warm, and kexec never worked on it (nor on any GICv3 machine that is LPI-capable) before we added this workaround. Whatever distros carried as hacks to make it work at the time, I don't want to know. What is more likely is that they always kexec'd a kernel with the exact same allocation layout and that it worked by luck (or maybe resulted in silent memory corruption).
If anything, I'd merge a patch *disabling* kexec altogether on these systems, because pretending that it ever worked is a damn lie.
Either way, I understand that you want to keep the stable tree conservative, but keep in mind that the flip side of not taking fixes that users ask for means that distros end up having to carry them anyway, which means that they don't get the review and testing they need.
I'd say it is the opposite. At least the distros test their backports on actual hardware. Taking any part of this set without testing it by doing kexec/kdump on an affected ARM system, and regression testing it on the hardware that got broken by it (with hundreds of cores IIRC) is totally irresponsible, and I don't have the time or the hardware to do the testing.
We can argue all we want around whether it's a fix or not, but if most distros carry it then I think our argument is moot.
If someone cares enough to backport these as a coherent set, with boot tests on the affected hardware etc, then I am not going to object.
Ideally, I'd like the distros to test these backports, because only them have access to the variety of HW that's required. But they'd have to majorly up their game, if the above is anything to go by...
Thanks,
M.