On Sat, Aug 1, 2020 at 3:38 AM Greg KH gregkh@linuxfoundation.org wrote:
On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:
From: Geert Uytterhoeven geert@linux-m68k.org
commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.
If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few hundred identical messages:
unwind: Unknown symbol address c0800300 unwind: Index not found c0800300
c0800300 is the return address from the last subroutine call (to __memzero()) in __mmap_switched(). Apparently having this address in the link register confuses the unwinder.
To fix this, reset the link register to zero before jumping to start_kernel().
Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code") Suggested-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Nicolas Pitre nico@linaro.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Nick Desaulniers ndesaulniers@google.com
Looks like this first landed in v4.15-rc1. Without this, we can't tell during an unwind initiated from start_kernel() when to stop unwinding, which for the clang specific implementation of the arm frame pointer unwinder leads to dereferencing a garbage value, triggering an exception which has no fixup, triggering a panic, triggering an unwind, triggering an infinite loop that prevents booting. I have more patches to send upstream to make the unwinder more resilient, but it's ambiguous as to when to stop unwinding without this patch.
Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you _SURE_ this is needed in 4.14.y?
It's a fair question, sorry I didn't notice and pre-emptively address.
Yes. Prior to 59b6359dd92d, the value in the link register (LR) was garbage left over from the calls to __memzero added in 9520b1a1b5f7. I suspect that after a `ret` instruction, the value in LR really shouldn't be used again.
Having garbage in LR when chasing frame pointers from an unwind started in start_kernel() makes it appear that there are further frames to unwind, hence the error noted in the commit message of 59b6359dd92d.
Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references the latest change that noticeably changed the value of LR, but it was still previously undefined what its last value was set to). In fact, digging up the original suggestion from Ard regarding 59b6359dd92d: https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4...
I don't think the patch itself is to blame here, it simply triggers an existing issue in the unwinder (if my analysis is correct, of course)
and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d. (I agree with Ard's analysis). Yes, "c0800300 is the return address from the last subroutine call (to __memzero()) in __mmap_switched()" is correct, but I'd have argued this was broken even before 59b6359dd92d (which is Ard's point). Forgive me if I'm misinterpreting your analysis, Ard. Maybe that Fixes tag was the simplest to avoid backports to stable which would have had conflicts due to 9520b1a1b5f7 being a dependency?
Looking at the callers of __mmap_switched, it's hard to tell who would have set the last value of LR, as there's divergent implementations based on whether or not there's MMU support. I don't think it matters though, and that unwinding via frame pointer on ARM out of a path in start_kernel() was broken until 59b6359dd92d. I suspect a combination of the frame pointer unwinder not being as popular on ARM (vs CONFIG_UNWINDER_ARM) and the lack of needing to unwind from start_kernel() (since unwinding occurs for exceptional cases like WARN_ON or panics) as sources that may have helped to keep this bug latent for a while.
Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of interest there IMO. https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-g...