This problem, reported by Clement LE GOFFIC, manifests when using CONFIG_KASAN_VMALLOC and VMAP_STACK: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6...
After some analysis, it seems we fail to sync the VMALLOC shadow memory in the top-level PGD to all CPUs.
Add some code to perform this sync, and the bug appears to go away.
As suggested by Ard, also perform a dummy read from the shadow memory of the new VMAP_STACK in the low level assembly.
Signed-off-by: Linus Walleij linus.walleij@linaro.org
---
Changes in v3:
- Collect Mark Rutland's ACK on patch 1
- Change the simplified assembly add r2, ip, lsr #n to the canonical
  add r2, r2, ip, lsr #n in patch 2.
- Link to v2: https://lore.kernel.org/r/20241016-arm-kasan-vmalloc-crash-v2-0-0a52fd086eef...

Changes in v2:
- Implement the two helper functions suggested by Russell making the
  KASAN PGD copying less messy.
- Link to v1: https://lore.kernel.org/r/20241015-arm-kasan-vmalloc-crash-v1-0-dbb23592ca83...

---
Linus Walleij (2):
      ARM: ioremap: Sync PGDs for VMALLOC shadow
      ARM: entry: Do a dummy read from VMAP shadow

 arch/arm/kernel/entry-armv.S |  8 ++++++++
 arch/arm/mm/ioremap.c        | 25 +++++++++++++++++++++----
 2 files changed, 29 insertions(+), 4 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241015-arm-kasan-vmalloc-crash-fcbd51416457
Best regards,
When sync:ing the VMALLOC area to other CPUs, make sure to also sync the KASAN shadow memory for the VMALLOC area, so that we don't get stale entries for the shadow memory in the top level PGD.
Since we are now copying PGDs in two instances, create a helper function named memcpy_pgd() to do the actual copying, and create a helper to map the addresses of VMALLOC_START and VMALLOC_END into the corresponding shadow memory.
Cc: stable@vger.kernel.org
Fixes: 565cbaad83d8 ("ARM: 9202/1: kasan: support CONFIG_KASAN_VMALLOC")
Link: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6...
Reported-by: Clement LE GOFFIC clement.legoffic@foss.st.com
Suggested-by: Mark Rutland mark.rutland@arm.com
Suggested-by: Russell King (Oracle) linux@armlinux.org.uk
Acked-by: Mark Rutland mark.rutland@arm.com
Signed-off-by: Linus Walleij linus.walleij@linaro.org
---
 arch/arm/mm/ioremap.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index 794cfea9f9d4..94586015feed 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -23,6 +23,7 @@
  */
 #include <linux/module.h>
 #include <linux/errno.h>
+#include <linux/kasan.h>
 #include <linux/mm.h>
 #include <linux/vmalloc.h>
 #include <linux/io.h>
@@ -115,16 +116,32 @@ int ioremap_page(unsigned long virt, unsigned long phys,
 }
 EXPORT_SYMBOL(ioremap_page);
 
+static unsigned long arm_kasan_mem_to_shadow(unsigned long addr)
+{
+	return (unsigned long)kasan_mem_to_shadow((void *)addr);
+}
+
+static void memcpy_pgd(struct mm_struct *mm, unsigned long start,
+		       unsigned long end)
+{
+	memcpy(pgd_offset(mm, start), pgd_offset_k(start),
+	       sizeof(pgd_t) * (pgd_index(end) - pgd_index(start)));
+}
+
 void __check_vmalloc_seq(struct mm_struct *mm)
 {
 	int seq;
 
 	do {
 		seq = atomic_read(&init_mm.context.vmalloc_seq);
-		memcpy(pgd_offset(mm, VMALLOC_START),
-		       pgd_offset_k(VMALLOC_START),
-		       sizeof(pgd_t) * (pgd_index(VMALLOC_END) -
-					pgd_index(VMALLOC_START)));
+		memcpy_pgd(mm, VMALLOC_START, VMALLOC_END);
+		if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+			unsigned long start =
+				arm_kasan_mem_to_shadow(VMALLOC_START);
+			unsigned long end =
+				arm_kasan_mem_to_shadow(VMALLOC_END);
+			memcpy_pgd(mm, start, end);
+		}
 		/*
 		 * Use a store-release so that other CPUs that observe the
 		 * counter's new value are guaranteed to see the results of the
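To make the PGD copying in this patch easier to follow, here is a minimal userspace sketch of the same idea: copy the top-level entries for the vmalloc range, then, if KASAN vmalloc support is on, copy the entries for the corresponding shadow range. This is a model, not kernel code. Every `demo_`/`DEMO_` name is hypothetical, the entry layout and the shift of 21 are assumptions modelled on a classic two-level ARM setup, and the shadow offset 0x9f000000 is an assumed value that depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants only; the real values depend on the kernel config. */
#define DEMO_PGDIR_SHIFT        21          /* assumed: 2 MiB per top-level entry */
#define DEMO_PTRS_PER_PGD       2048        /* assumed: covers a 4 GiB address space */
#define DEMO_SHADOW_SCALE_SHIFT 3           /* generic KASAN: 1 shadow byte per 8 bytes */
#define DEMO_SHADOW_OFFSET      0x9f000000u /* hypothetical shadow offset */

/* Stand-in for pgd_t; the layout is an assumption for this model. */
typedef struct { uint32_t pmd[2]; } demo_pgd_t;

static uint32_t demo_pgd_index(uint32_t addr)
{
	return addr >> DEMO_PGDIR_SHIFT;
}

/* Same shape as kasan_mem_to_shadow(): scale the address, then add the offset. */
static uint32_t demo_mem_to_shadow(uint32_t addr)
{
	return (addr >> DEMO_SHADOW_SCALE_SHIFT) + DEMO_SHADOW_OFFSET;
}

/*
 * Copy the top-level entries from pgd_index(start) up to, but not
 * including, pgd_index(end) from src to dst.
 */
static void demo_memcpy_pgd(demo_pgd_t *dst, const demo_pgd_t *src,
			    uint32_t start, uint32_t end)
{
	uint32_t first = demo_pgd_index(start);
	uint32_t count = demo_pgd_index(end) - first;

	assert(end >= start);
	memcpy(&dst[first], &src[first], sizeof(demo_pgd_t) * count);
}

/* Models the loop body: always sync vmalloc, optionally sync its shadow. */
static void demo_sync(demo_pgd_t *task, const demo_pgd_t *kernel,
		      uint32_t vm_start, uint32_t vm_end, int kasan_vmalloc)
{
	demo_memcpy_pgd(task, kernel, vm_start, vm_end);
	if (kasan_vmalloc)
		demo_memcpy_pgd(task, kernel,
				demo_mem_to_shadow(vm_start),
				demo_mem_to_shadow(vm_end));
}
```

With the hypothetical range 0xe0000000 to 0xff800000, the first copy covers 252 top-level entries and the shadow copy covers 31. Without the second copy, a task's page table can keep stale (empty) entries for the shadow range, which is the situation this patch is meant to avoid.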
On 10/17/24 14:59, Linus Walleij wrote:
[...]
+static unsigned long arm_kasan_mem_to_shadow(unsigned long addr)
+{
+	return (unsigned long)kasan_mem_to_shadow((void *)addr);
+}
`kasan_mem_to_shadow` function symbol is only exported with : CONFIG_KASAN_GENERIC or defined(CONFIG_KASAN_SW_TAGS) from kasan.h
To me, the if condition you added below should be expanded with those two macros.
[...]
 void __check_vmalloc_seq(struct mm_struct *mm)
 {
 	int seq;

 	do {
 		seq = atomic_read(&init_mm.context.vmalloc_seq);
-		memcpy(pgd_offset(mm, VMALLOC_START),
-		       pgd_offset_k(VMALLOC_START),
-		       sizeof(pgd_t) * (pgd_index(VMALLOC_END) -
-					pgd_index(VMALLOC_START)));
+		memcpy_pgd(mm, VMALLOC_START, VMALLOC_END);
+		if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+			unsigned long start =
+				arm_kasan_mem_to_shadow(VMALLOC_START);
+			unsigned long end =
+				arm_kasan_mem_to_shadow(VMALLOC_END);
+			memcpy_pgd(mm, start, end);
+		}
 		/*
 		 * Use a store-release so that other CPUs that observe the
 		 * counter's new value are guaranteed to see the results of the
Otherwise it compiles with KASAN enabled. I am running some tests with your patches.
Regards,
Clément
On Thu, Oct 17, 2024 at 4:22 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
On 10/17/24 14:59, Linus Walleij wrote:
[...]
+static unsigned long arm_kasan_mem_to_shadow(unsigned long addr) +{
return (unsigned long)kasan_mem_to_shadow((void *)addr);
+}
`kasan_mem_to_shadow` function symbol is only exported with : CONFIG_KASAN_GENERIC or defined(CONFIG_KASAN_SW_TAGS) from kasan.h
To me, the if condition you added below should be expanded with those two macros.
(...)
if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
Let's check this with the KASAN authors, I think looking for CONFIG_KASAN_VMALLOC should be enough as it is inside the if KASAN clause in lib/Kconfig.kasan, i.e. the symbol KASAN must be enabled for CONFIG_KASAN_VMALLOC to be enabled, and if KASAN is enabled then either KASAN_GENERIC or KASAN_SW_TAGS is enabled (the third option KASAN_HW_TAGS, also known as memory tagging is only available on ARM64 and we are not ARM64.)
But I might be wrong! Kconfig regularly bites me in the foot...
Yours, Linus Walleij
On 10/17/24 21:00, Linus Walleij wrote:
On Thu, Oct 17, 2024 at 4:22 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
On 10/17/24 14:59, Linus Walleij wrote:
[...]
+static unsigned long arm_kasan_mem_to_shadow(unsigned long addr) +{
return (unsigned long)kasan_mem_to_shadow((void *)addr);
+}
`kasan_mem_to_shadow` function symbol is only exported with : CONFIG_KASAN_GENERIC or defined(CONFIG_KASAN_SW_TAGS) from kasan.h
To me, the if condition you added below should be expanded with those two macros.
(...)
if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
Let's check this with the KASAN authors, I think looking for CONFIG_KASAN_VMALLOC should be enough as it is inside the if KASAN clause in lib/Kconfig.kasan, i.e. the symbol KASAN must be enabled for CONFIG_KASAN_VMALLOC to be enabled, and if KASAN is enabled then either KASAN_GENERIC or KASAN_SW_TAGS is enabled (the third option KASAN_HW_TAGS, also known as memory tagging is only available on ARM64 and we are not ARM64.)
But I might be wrong! Kconfig regularly bites me in the foot...
Yours, Linus Walleij
Hi Linus,
I saw your email about Melon's patch targeting the same subject. If we don't enable KASAN, neither your patch nor Melon's compiles.
[...] + if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) [...]
This should be replaced with an #ifdef directive. The `kasan_mem_to_shadow` symbol is hidden behind:
include/linux/kasan.h:32:#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
So the symbol doesn't exist without KASAN enabled.
I don't know how to submit a comment on Melon's patch on Russell's website.
With KASAN enabled there is no build issue, so I ran some tests with your series on an stm32mp157f-dk2 board and on qemu. I only hit an issue on my board, while testing with an ext4 filesystem on an SD card, and I am unable to reproduce it with qemu.
It is perhaps not related to this topic, but since the backtrace contains some keywords from the start of our exchange, I am dumping the crash below. If this backtrace is somehow related to our issue, please have a look.
[ 1439.267852] 8<--- cut here --- [ 1439.269570] Unable to handle kernel paging request at virtual address 809b8480 when read [ 1439.277631] [809b8480] *pgd=00000000 [ 1439.281287] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 1439.286534] Modules linked in: aes_arm aes_generic cmac algif_hash algif_skcipher af_alg stm32_adc stm32_timer_trigger stm32_lptimer_trigger snd_soc_stm32_sai_sub snd_soc_audio_graph_card snd_soc_simple_card_utils usb_f_ncm stusb160x u_ether brcmfmac_wcc typec hci_uart btbcm stm32_hash stm32_cryp crypto_engine snd_soc_cs42l51_i2c libdes snd_soc_stm32_i2s snd_soc_cs42l51 brcmfmac snd_soc_hdmi_codec bluetooth brcmutil stm32_vrefbuf snd_soc_core libcomposite stm32_adc_core snd_pcm_dmaengine ac97_bus snd_pcm cfg80211 snd_timer snd_soc_stm32_sai ecdh_generic ecc snd libaes stm32_cec soundcore stm32_ddr_pmu stm32_crc32 [ 1439.340928] CPU: 0 PID: 20797 Comm: grep Not tainted 6.6.48 #1 [ 1439.346767] Hardware name: STM32 (Device Tree Support) [ 1439.351945] PC is at __read_once_word_nocheck+0x0/0x8 [ 1439.356965] LR is at unwind_exec_insn+0x364/0x658 [ 1439.361662] pc : [<c011369c>] lr : [<c0113bb0>] psr: 600f0193 [ 1439.367953] sp : de803358 ip : c20584b8 fp : bad0067c [ 1439.373135] r10: de80344c r9 : 0000000b r8 : de80342c [ 1439.378416] r7 : 00000009 r6 : c20584b8 r5 : 809b8480 r4 : de803400 [ 1439.384911] r3 : 00000007 r2 : 00000000 r1 : 00000000 r0 : 809b8480 [ 1439.391405] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none [ 1439.398717] Control: 10c5387d Table: c222c06a DAC: 00000051 [ 1439.404398] Register r0 information: non-paged memory [ 1439.409505] Register r1 information: NULL pointer [ 1439.414197] Register r2 information: NULL pointer [ 1439.418888] Register r3 information: non-paged memory [ 1439.423881] Register r4 information: 4-page vmalloc region starting at 0xde800000 allocated at start_kernel+0x1a8/0x334 [ 1439.434669] Register r5 information: non-paged memory [ 1439.439764] Register r6 information: 
non-slab/vmalloc memory [ 1439.445466] Register r7 information: non-paged memory [ 1439.450459] Register r8 information: 4-page vmalloc region starting at 0xde800000 allocated at start_kernel+0x1a8/0x334 [ 1439.461236] Register r9 information: non-paged memory [ 1439.466330] Register r10 information: 4-page vmalloc region starting at 0xde800000 allocated at start_kernel+0x1a8/0x334 [ 1439.477207] Register r11 information: non-paged memory [ 1439.482302] Register r12 information: non-slab/vmalloc memory [ 1439.488103] Process grep (pid: 20797, stack limit = 0xd736559a) [ 1439.494000] Stack: (0xde803358 to 0xde804000) [ 1439.498385] 3340: de803480 c01137f0 [ 1439.506512] 3360: c0100ff4 df113fb0 de803448 00000000 bad00678 809b8480 de8034a0 de8034c8 [ 1439.514739] 3380: 00000001 809b8480 de803400 c1241d90 bad0067c c0114114 00004e20 de8034b0 [ 1439.522966] 33a0: de8034d4 de8034cc de803400 809b8480 c1241da8 c1241da8 809b8480 de8034d0 [ 1439.531093] 33c0: 41b58ab3 c194f7f4 c0113ea4 c5349bc0 da2ee5c0 da2ee604 dd811334 c034a6e4 [ 1439.539318] 33e0: 41b58ab3 c194f7f4 c0113ea4 00000000 00000000 00000000 00000000 00000000 [ 1439.547440] 3400: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.555664] 3420: 00000000 00000000 00000000 809b8480 00000000 809b8480 c1241da8 00000000 [ 1439.563887] 3440: c1ae9dc8 809bc000 00000000 00000000 00000000 00000003 00000000 c2047e40 [ 1439.572011] 3460: 00000003 bad0069c 000002c2 00000000 c2048988 00000003 c2047e40 8bf8f8ba [ 1439.580238] 3480: de803500 c01f5200 de803500 00000000 c3020900 00000001 00000000 faacd85f [ 1439.588363] 34a0: de8034cc c01f5200 de803520 00000000 c3020900 00000001 00000000 de803720 [ 1439.596591] 34c0: de8034ec c010ed7c 809b8480 de803abc c1241da8 c1241da8 de803ab8 faacd85f [ 1439.604818] 34e0: de803560 bad006a0 c207c280 c01f539c 00000000 bad006a8 de8035a0 c01f52c4 [ 1439.612944] 3500: 41b58ab3 c195e47c c01f530c de803560 00000001 00000000 de803700 c01736fc [ 1439.621169] 3520: de803580 
00000040 00000000 00000011 00000000 d0c85340 00000000 de8036a0 [ 1439.629296] 3540: bad006b4 c01f0328 d0b58f00 c1c06cc0 de8036a0 c07aa568 da2eae80 0000281e [ 1439.637521] 3560: 00000005 bad006b4 de803600 faacd85f 00000000 00000001 00000001 c036bcb8 [ 1439.645649] 3580: c036bcb8 c036d268 c015f29c c01803e4 c01fceb4 c0218570 c0218bd0 c01fe05c [ 1439.653876] 35a0: c01ffe5c c0eb80d8 c01bd890 c01b463c c01012c4 c12a6bc4 c0100bc8 c0113850 [ 1439.662098] 35c0: c1241da8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.670219] 35e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.678441] 3600: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.686563] 3620: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.694786] 3640: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.703008] 3660: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1439.711134] 3680: 00000001 c5419b40 dd805360 c036d268 c5419904 c54199dc bad006d8 c5419280 [ 1439.719360] 36a0: 00000001 c015f29c c65e7f00 1b0aeebb 0000014f c0e463fc d07d3800 faacd85f [ 1439.727586] 36c0: 41b58ab3 c1955000 c015f20c 00000000 c0e46390 c01a28b0 0000014f c5419580 [ 1439.735711] 36e0: 00000000 c195849c c01a282c c5419280 da2eae40 00000000 0016e360 00000000 [ 1439.743937] 3700: 00000000 c01fd6f0 c5419280 da2eae40 da2eae80 da2eae40 c5419288 00000000 [ 1439.752063] 3720: 00000000 c01850f0 1b0aeebb faacd85f c1b8be40 da2eae40 c1b8be40 00000000 [ 1439.760289] 3740: c1c07144 c5419280 00000000 c1c07144 de803794 c01803e4 00000000 00000000 [ 1439.768414] 3760: da2eb640 00000000 c207dac0 c1a3750c 1875f000 da2e3940 bad006f4 ffffffae [ 1439.776640] 3780: 00000000 c5419280 fffffff7 de803800 00000000 c01fceb4 bad006f8 c1b8be40 [ 1439.784866] 37a0: 41b58ab3 c195ea00 c01fcd90 c1b8be40 00000000 00000000 c01362fc c1241da8 [ 1439.792992] 37c0: 41b58ab3 c1956708 c017f024 c01736fc 00000000 00000000 
00000000 c01728b0 [ 1439.801219] 37e0: 00000000 c1c05d80 c20925e0 c5419280 c20925e0 1b2d5580 da2e4a80 faacd85f [ 1439.809344] 3800: c1c05d80 da2e4da0 c1c06e40 00000000 0000014f 1a952e8f da2e4a40 da2e4da0 [ 1439.817571] 3820: de803a30 de803a30 da2e4da0 1a952e8f da2e4a40 da2e4db0 da2e4a80 c0218570 [ 1439.825698] 3840: 0000014f da2e4db0 de803a30 c0218bd0 600f0193 da2e4a40 c0218b70 da2e4a90 [ 1439.833925] 3860: da2e4a40 00000000 da2e4a40 c01fe05c da2e4a8c da2e4dd8 de803920 da2e4a98 [ 1439.842151] 3880: 00000050 c5419280 da2e4a94 00000006 de8039e8 1a94faa3 0000014f bad00718 [ 1439.850277] 38a0: 41b58ab3 c195f258 c01fdd98 c01b5444 60010193 c02024c8 c66e23cc 0000014f [ 1439.858504] 38c0: 41b58ab3 c195f258 c01fdd98 0000014f 00000000 1875f000 da2ee000 00023f1b [ 1439.866630] 38e0: 1a94faa3 0000014f 7fffffff da2e4a68 da2ee000 00000000 0000014f 1a94bf00 [ 1439.874857] 3900: 1875f000 7fffffff da2e4a50 c020540c 05fd815f 1a94bf00 da2e4b18 faacd85f [ 1439.883084] 3920: da2e4b18 da2e4a40 1a94faa3 0000014f 600f0193 ffffffff 7fffffff da2e4a50 [ 1439.891212] 3940: da2e4b90 c01ffe5c 600f0193 0000000f da2e4b68 da2e4a60 da2e4af0 da2e4b18 [ 1439.899438] 3960: 00000003 da2e4a70 da2e4b68 da2e4ac8 da2e4a4c da2e4bb8 da2e4b70 da2e4bc0 [ 1439.907565] 3980: 1a94faa3 0000014f 0000001a da2ee000 c0eb80a4 c146d9e0 c30c3c00 c30b7428 [ 1439.915791] 39a0: 0000001a 1875f000 de8039b8 c0eb80d8 c30b7400 c01bd890 c30b7400 c02b9c30 [ 1439.923918] 39c0: c1b8aab0 c30b7400 c5419280 c1b8aab0 c1c07538 c1dbae00 de80e000 de80e00c [ 1439.932145] 39e0: bad0075c c01b463c 0000001b 0000001b c1b8aab0 c01012c4 c1b8ba74 de8039f8 [ 1439.940373] 3a00: de803a30 00000000 200f0013 ffffffff de803a64 e5333f40 c5419280 c1241d90 [ 1439.948498] 3a20: bad0075c c12a6bc4 c0113850 c0100bc8 de803b00 00000000 00000001 000000c0 [ 1439.956725] 3a40: e5333f40 de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c [ 1439.964851] 3a60: c20584b8 de803a7c c0114114 c0113850 200f0013 ffffffff 00000051 e5333f40 [ 1439.973078] 3a80: 
de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c c0114114 [ 1439.981305] 3aa0: de803ae0 00000001 de803bdc de803bd4 de803b00 809b8480 c1241da8 c1241da8 [ 1439.989432] 3ac0: e5333f40 de803bd8 c0113ea4 c07aa5e8 da2eae80 b778d32a ffffffff bad00760 [ 1439.997656] 3ae0: 41b58ab3 c194f7f4 c0113ea4 00000000 00000000 00000000 00000000 00000000 [ 1440.005778] 3b00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.014002] 3b20: 00000000 00000000 00000000 e5333f40 00000000 e5333f40 c1241da8 00000000 [ 1440.022127] 3b40: c1ae9dc4 e5334000 00000000 00000000 00000001 00000001 c015f29c c01803e4 [ 1440.030354] 3b60: c01fceb4 c0218570 c0218bd0 c01fe05c c01ffe5c c0eb80d8 c01bd890 891bf569 [ 1440.038580] 3b80: c01012c4 c01f5200 de803c00 891bf569 00000000 faacd85f de803c20 c01f5200 [ 1440.046705] 3ba0: de803c20 faacd85f 00000000 c01f5200 de803c20 00000000 00000000 c01d8bb4 [ 1440.054932] 3bc0: c62ff600 c1253858 de803bf4 c010ed7c e5333f40 de804000 c1241da8 c1241da8 [ 1440.063059] 3be0: de803ffc faacd85f de803c60 bad00780 c204e68c c01f539c 0000000d c01f52c4 [ 1440.071285] 3c00: 41b58ab3 c195e47c c01f530c c01f52c4 c45904b4 c01f52c4 c507a104 00000000 [ 1440.079510] 3c20: de803c88 00000040 00000000 00000008 c3020000 de803c88 00000001 c62ff400 [ 1440.087635] 3c40: c1253858 c0887230 00000000 00000000 83b6dbea c507a104 00000000 faacd85f [ 1440.095860] 3c60: 00000000 c62ff404 00000000 c3020000 00000000 faacd85f c62ff604 00000000 [ 1440.103987] 3c80: c3020000 c036bd10 c036bd10 c036d64c c036bfb4 c0369864 c01d8bb4 c01de580 [ 1440.112211] 3ca0: c0135ac4 c1241da8 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.120333] 3cc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.128555] 3ce0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.136777] 3d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.144899] 3d20: 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 [ 1440.153121] 3d40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.161343] 3d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1440.169469] 3d80: 00000000 00000000 c3020000 b7c5fec0 c62ff600 c036d64c dd826bb8 c036bfb4 [ 1440.177696] 3da0: c3020000 c62ff600 c01d8bb4 dd826bb8 00000000 c62ff604 c1253858 c0369864 [ 1440.185823] 3dc0: da2eb640 d19ca500 d19ca500 d19cad20 da2eb640 c5419280 0000000a da2eb640 [ 1440.194049] 3de0: c1c70940 c62ff608 c62ff604 c1253858 c204e68c c01d8bb4 c3130000 00000000 [ 1440.202174] 3e00: 00000000 de803ee0 c31306ac c5419284 de803e80 00000009 00000000 0000000a [ 1440.210400] 3e20: da2eb680 c541948c bad007cc c1c06d14 00000001 c31306ac c5419288 da2eb6a4 [ 1440.218627] 3e40: da2eb6c8 de803ec0 de803f2c 00000000 c3130200 d0721280 de803f2c c017f120 [ 1440.226754] 3e60: 41b58ab3 c195cd48 c01d8858 80080093 c206f70c c01dd1a0 bad007d4 c01dd8c4 [ 1440.234980] 3e80: c62fea04 c2b8a004 00000055 c01e3e48 00000000 00000001 00020330 c206f70c [ 1440.243106] 3ea0: 41b58ab3 c1956708 c1b8ba80 00000009 c1c06d14 c5419280 c204e44c c0136398 [ 1440.251333] 3ec0: c5419280 faacd85f c1b8c649 da2eb640 da2eb680 c1c06d14 c5419280 c204e44c [ 1440.259460] 3ee0: c206f70c da2eb650 c1331e80 c01de580 da2e4a50 1875f000 c1c70940 da2eb64a [ 1440.267686] 3f00: c5419284 c01736fc 00400000 c1c050a4 00000200 c5419280 c5419284 00000101 [ 1440.275911] 3f20: 1875f000 c204dc48 0000000a c0135ac4 00000009 00000009 c1b843d8 c1c05080 [ 1440.284037] 3f40: de803f30 c1b843b0 00000009 00000001 c1b8ba80 00400000 00000000 0001bd06 [ 1440.292264] 3f60: de803fc0 c1c05d40 c30c3c00 00400000 c541948c bad007f0 de803f88 c0eb80d8 [ 1440.300391] 3f80: 41b58ab3 c1952a48 c01358f4 c02b9c30 00000000 c30b7400 d0c85340 c088be94 [ 1440.308618] 3fa0: 00000000 c02b9c30 c1b8aab0 c1c07538 c1dbae00 c1b8a2f4 c1b8a2fc de803fd8 [ 1440.316745] 3fc0: c1b8a2f4 c1b8a2fc de803fd8 1875f000 da2e92f4 c12a7030 c1c0d940 c1b8ba80 [ 1440.324973] 
3fe0: 1875f000 600f0013 1875f000 c20795c0 c5419360 c5419470 e5333f40 c1241da8 [ 1440.333183] __read_once_word_nocheck from unwind_exec_insn+0x364/0x658 [ 1440.339726] unwind_exec_insn from unwind_frame+0x270/0x618 [ 1440.345352] unwind_frame from arch_stack_walk+0x6c/0xe0 [ 1440.350674] arch_stack_walk from stack_trace_save+0x90/0xc0 [ 1440.356308] stack_trace_save from kasan_save_stack+0x30/0x4c [ 1440.362042] kasan_save_stack from __kasan_record_aux_stack+0x84/0x8c [ 1440.368473] __kasan_record_aux_stack from task_work_add+0x90/0x210 [ 1440.374706] task_work_add from scheduler_tick+0x18c/0x250 [ 1440.380245] scheduler_tick from update_process_times+0x124/0x148 [ 1440.386287] update_process_times from tick_sched_handle+0x64/0x88 [ 1440.392521] tick_sched_handle from tick_sched_timer+0x60/0xcc [ 1440.398341] tick_sched_timer from __hrtimer_run_queues+0x2c4/0x59c [ 1440.404572] __hrtimer_run_queues from hrtimer_interrupt+0x1bc/0x3a0 [ 1440.411009] hrtimer_interrupt from arch_timer_handler_virt+0x34/0x3c [ 1440.417447] arch_timer_handler_virt from handle_percpu_devid_irq+0xf4/0x368 [ 1440.424480] handle_percpu_devid_irq from generic_handle_domain_irq+0x38/0x48 [ 1440.431618] generic_handle_domain_irq from gic_handle_irq+0x90/0xa8 [ 1440.437953] gic_handle_irq from generic_handle_arch_irq+0x30/0x40 [ 1440.444094] generic_handle_arch_irq from __irq_svc+0x88/0xc8 [ 1440.449920] Exception stack(0xde803a30 to 0xde803a78) [ 1440.454914] 3a20: de803b00 00000000 00000001 000000c0 [ 1440.463141] 3a40: e5333f40 de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c [ 1440.471262] 3a60: c20584b8 de803a7c c0114114 c0113850 200f0013 ffffffff [ 1440.477959] __irq_svc from unwind_exec_insn+0x4/0x658 [ 1440.483078] unwind_exec_insn from call_with_stack+0x18/0x20 [ 1440.488722] 8<--- cut here ---
On Mon, Oct 21, 2024 at 2:12 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
I saw your email about Melon's patch targeting the same subject. If we don't enable KASAN either you patch or Melon's one do not compile.
[...]
if (IS_ENABLED(CONFIG_KASAN_VMALLOC))
[...]
Should be replaced with an #ifdef directive. `kasan_mem_to_shadow` symbol is hiden behind :
include/linux/kasan.h:32:#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
So symbol doesn't exist without KASAN enabled.
Yeah sorry for missing this. :(
The absence of stubs in the Kasan header makes it necessary to rely on ifdefs.
I will fold the ideas from Melon's patch into mine and also develop a version that works with ifdefs.
Yours, Linus Walleij
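The build failure discussed above can be illustrated with a small userspace sketch. It uses a simplified copy of the kernel's IS_ENABLED() macro trick (built-in options only) and a made-up option, CONFIG_DEMO_KASAN. All `demo_`/`DEMO_` names are hypothetical and the shadow offset is an assumed value. The point it tries to show: the body of an `if (IS_ENABLED(...))` is still parsed by the compiler, so it needs a declaration (or a stub) for every function it calls, whereas an #ifdef removes the call before the compiler ever sees it.

```c
#include <assert.h>

/* Simplified userspace copy of the IS_ENABLED() idea (built-in options only). */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND_ARG(ignored, val, ...) val
#define DEMO_IS_DEFINED_2(arg1_or_junk) DEMO_TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED_1(val) DEMO_IS_DEFINED_2(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED_1(option)

/* Uncomment to model a build with the (hypothetical) option enabled. */
/* #define CONFIG_DEMO_KASAN 1 */

#ifdef CONFIG_DEMO_KASAN
/* Only declared when the option is on, like kasan_mem_to_shadow(). */
static unsigned long demo_mem_to_shadow(unsigned long addr)
{
	return (addr >> 3) + 0x9f000000ul; /* hypothetical offset */
}
#endif

/* Returns how many ranges would be copied: vmalloc, plus shadow if enabled. */
static int demo_sync_count(void)
{
	int copies = 1; /* the vmalloc range is always copied */

	/*
	 * Writing this instead would NOT build with the option unset,
	 * because demo_mem_to_shadow() is then undeclared:
	 *
	 *	if (DEMO_IS_ENABLED(CONFIG_DEMO_KASAN))
	 *		demo_mem_to_shadow(0xe0000000ul);
	 */
#ifdef CONFIG_DEMO_KASAN
	(void)demo_mem_to_shadow(0xe0000000ul);
	copies++;
#endif
	/* The macro and the #ifdef agree on whether the option is set. */
	assert(copies == 1 + DEMO_IS_ENABLED(CONFIG_DEMO_KASAN));
	return copies;
}
```

The alternative to the #ifdef would be a no-op stub for the disabled case, which keeps the `if (IS_ENABLED(...))` form compiling. Since the KASAN header provides no such stub, the #ifdef is the route taken here.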
Hi Clement,
I see I failed to look more closely at the new bug found in ext4 on the STM32:
On Mon, Oct 21, 2024 at 2:12 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
Perhaps not related with this topic but as in the backtrace I am getting some keyword from our start exchange, I dump the crash below. If this backtrace is somehow related with our issue, please have a look.
(...)
[ 1439.351945] PC is at __read_once_word_nocheck+0x0/0x8 [ 1439.356965] LR is at unwind_exec_insn+0x364/0x658
(...)
[ 1440.333183] __read_once_word_nocheck from unwind_exec_insn+0x364/0x658 [ 1440.339726] unwind_exec_insn from unwind_frame+0x270/0x618 [ 1440.345352] unwind_frame from arch_stack_walk+0x6c/0xe0 [ 1440.350674] arch_stack_walk from stack_trace_save+0x90/0xc0 [ 1440.356308] stack_trace_save from kasan_save_stack+0x30/0x4c [ 1440.362042] kasan_save_stack from __kasan_record_aux_stack+0x84/0x8c [ 1440.368473] __kasan_record_aux_stack from task_work_add+0x90/0x210 [ 1440.374706] task_work_add from scheduler_tick+0x18c/0x250 [ 1440.380245] scheduler_tick from update_process_times+0x124/0x148 [ 1440.386287] update_process_times from tick_sched_handle+0x64/0x88 [ 1440.392521] tick_sched_handle from tick_sched_timer+0x60/0xcc [ 1440.398341] tick_sched_timer from __hrtimer_run_queues+0x2c4/0x59c [ 1440.404572] __hrtimer_run_queues from hrtimer_interrupt+0x1bc/0x3a0 [ 1440.411009] hrtimer_interrupt from arch_timer_handler_virt+0x34/0x3c [ 1440.417447] arch_timer_handler_virt from handle_percpu_devid_irq+0xf4/0x368 [ 1440.424480] handle_percpu_devid_irq from generic_handle_domain_irq+0x38/0x48 [ 1440.431618] generic_handle_domain_irq from gic_handle_irq+0x90/0xa8 [ 1440.437953] gic_handle_irq from generic_handle_arch_irq+0x30/0x40 [ 1440.444094] generic_handle_arch_irq from __irq_svc+0x88/0xc8 [ 1440.449920] Exception stack(0xde803a30 to 0xde803a78) [ 1440.454914] 3a20: de803b00 00000000 00000001 000000c0 [ 1440.463141] 3a40: e5333f40 de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c [ 1440.471262] 3a60: c20584b8 de803a7c c0114114 c0113850 200f0013 ffffffff [ 1440.477959] __irq_svc from unwind_exec_insn+0x4/0x658 [ 1440.483078] unwind_exec_insn from call_with_stack+0x18/0x20
This is hard to analyze without being able to reproduce it, but it talks about the stack and Kasan and unwinding, so could it (also) be related to the VMAP:ed stack?
Did you try to revert (or check out the commit before and after) b6506981f880 ARM: unwind: support unwinding across multiple stacks to see if this is again fixing the issue?
Yours, Linus Walleij
On 10/24/24 23:58, Linus Walleij wrote:
Hi Clement,
I saw I missed to look closer at the new bug found in ext4 on the STM32:
On Mon, Oct 21, 2024 at 2:12 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
Perhaps not related with this topic but as in the backtrace I am getting some keyword from our start exchange, I dump the crash below. If this backtrace is somehow related with our issue, please have a look.
(...)
[ 1439.351945] PC is at __read_once_word_nocheck+0x0/0x8 [ 1439.356965] LR is at unwind_exec_insn+0x364/0x658
(...)
[ 1440.333183] __read_once_word_nocheck from unwind_exec_insn+0x364/0x658 [ 1440.339726] unwind_exec_insn from unwind_frame+0x270/0x618 [ 1440.345352] unwind_frame from arch_stack_walk+0x6c/0xe0 [ 1440.350674] arch_stack_walk from stack_trace_save+0x90/0xc0 [ 1440.356308] stack_trace_save from kasan_save_stack+0x30/0x4c [ 1440.362042] kasan_save_stack from __kasan_record_aux_stack+0x84/0x8c [ 1440.368473] __kasan_record_aux_stack from task_work_add+0x90/0x210 [ 1440.374706] task_work_add from scheduler_tick+0x18c/0x250 [ 1440.380245] scheduler_tick from update_process_times+0x124/0x148 [ 1440.386287] update_process_times from tick_sched_handle+0x64/0x88 [ 1440.392521] tick_sched_handle from tick_sched_timer+0x60/0xcc [ 1440.398341] tick_sched_timer from __hrtimer_run_queues+0x2c4/0x59c [ 1440.404572] __hrtimer_run_queues from hrtimer_interrupt+0x1bc/0x3a0 [ 1440.411009] hrtimer_interrupt from arch_timer_handler_virt+0x34/0x3c [ 1440.417447] arch_timer_handler_virt from handle_percpu_devid_irq+0xf4/0x368 [ 1440.424480] handle_percpu_devid_irq from generic_handle_domain_irq+0x38/0x48 [ 1440.431618] generic_handle_domain_irq from gic_handle_irq+0x90/0xa8 [ 1440.437953] gic_handle_irq from generic_handle_arch_irq+0x30/0x40 [ 1440.444094] generic_handle_arch_irq from __irq_svc+0x88/0xc8 [ 1440.449920] Exception stack(0xde803a30 to 0xde803a78) [ 1440.454914] 3a20: de803b00 00000000 00000001 000000c0 [ 1440.463141] 3a40: e5333f40 de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c [ 1440.471262] 3a60: c20584b8 de803a7c c0114114 c0113850 200f0013 ffffffff [ 1440.477959] __irq_svc from unwind_exec_insn+0x4/0x658 [ 1440.483078] unwind_exec_insn from call_with_stack+0x18/0x20
This is hard to analyze without being able to reproduce it, but it talks about the stack and Kasan and unwinding, so could it (also) be related to the VMAP:ed stack?
Did you try to revert (or check out the commit before and after) b6506981f880 ARM: unwind: support unwinding across multiple stacks to see if this is again fixing the issue?
Hi Linus,
Yes, I've tried to revert this particular commit on top of your last patches but I have some conflicts inside arch/arm/kernel/unwind.c
On Fri, Oct 25, 2024 at 11:27 AM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
On 10/24/24 23:58, Linus Walleij wrote:
Hi Clement,
I saw I missed to look closer at the new bug found in ext4 on the STM32:
On Mon, Oct 21, 2024 at 2:12 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
Perhaps not related with this topic but as in the backtrace I am getting some keyword from our start exchange, I dump the crash below. If this backtrace is somehow related with our issue, please have a look.
(...)
[ 1439.351945] PC is at __read_once_word_nocheck+0x0/0x8 [ 1439.356965] LR is at unwind_exec_insn+0x364/0x658
(...)
[ 1440.333183] __read_once_word_nocheck from unwind_exec_insn+0x364/0x658 [ 1440.339726] unwind_exec_insn from unwind_frame+0x270/0x618 [ 1440.345352] unwind_frame from arch_stack_walk+0x6c/0xe0 [ 1440.350674] arch_stack_walk from stack_trace_save+0x90/0xc0 [ 1440.356308] stack_trace_save from kasan_save_stack+0x30/0x4c [ 1440.362042] kasan_save_stack from __kasan_record_aux_stack+0x84/0x8c [ 1440.368473] __kasan_record_aux_stack from task_work_add+0x90/0x210 [ 1440.374706] task_work_add from scheduler_tick+0x18c/0x250 [ 1440.380245] scheduler_tick from update_process_times+0x124/0x148 [ 1440.386287] update_process_times from tick_sched_handle+0x64/0x88 [ 1440.392521] tick_sched_handle from tick_sched_timer+0x60/0xcc [ 1440.398341] tick_sched_timer from __hrtimer_run_queues+0x2c4/0x59c [ 1440.404572] __hrtimer_run_queues from hrtimer_interrupt+0x1bc/0x3a0 [ 1440.411009] hrtimer_interrupt from arch_timer_handler_virt+0x34/0x3c [ 1440.417447] arch_timer_handler_virt from handle_percpu_devid_irq+0xf4/0x368 [ 1440.424480] handle_percpu_devid_irq from generic_handle_domain_irq+0x38/0x48 [ 1440.431618] generic_handle_domain_irq from gic_handle_irq+0x90/0xa8 [ 1440.437953] gic_handle_irq from generic_handle_arch_irq+0x30/0x40 [ 1440.444094] generic_handle_arch_irq from __irq_svc+0x88/0xc8 [ 1440.449920] Exception stack(0xde803a30 to 0xde803a78) [ 1440.454914] 3a20: de803b00 00000000 00000001 000000c0 [ 1440.463141] 3a40: e5333f40 de803ba0 de803bd0 00000001 e5333f40 de803b00 c1241d90 bad0075c [ 1440.471262] 3a60: c20584b8 de803a7c c0114114 c0113850 200f0013 ffffffff [ 1440.477959] __irq_svc from unwind_exec_insn+0x4/0x658 [ 1440.483078] unwind_exec_insn from call_with_stack+0x18/0x20
This is hard to analyze without being able to reproduce it, but it talks about the stack and Kasan and unwinding, so could it (also) be related to the VMAP:ed stack?
Did you try to revert (or check out the commit before and after) b6506981f880 ARM: unwind: support unwinding across multiple stacks to see if this is again fixing the issue?
I Linus,
Yes, I've tried to revert this particular commit on top of your last patches but I have some conflicts inside arch/arm/kernel/unwind.c
What happens if you just
git checkout b6506981f880^
And build and boot that? It's just running the commit right before the unwinding patch.
Yours, Linus Walleij
[Me]
What happens if you just
git checkout b6506981f880^
And build and boot that? It's just running the commit right before the unwinding patch.
Another thing you can test is to disable vmap:ed stacks and see what happens. (General architecture-dependent options uncheck "Use a virtually-mapped stack".)
Yours, Linus Walleij
On 10/25/24 22:57, Linus Walleij wrote:
What happens if you just
git checkout b6506981f880^
And build and boot that? It's just running the commit right before the unwinding patch.
Another thing you can test is to disable vmap:ed stacks and see what happens. (General architecture-dependent options uncheck "Use a virtually-mapped stack".)
Hi Linus,
I have tested your patches against a few kernel versions without reproducing the issue.
- b6506981f880^
- v6.6.48
- v6.12-rc4
I didn't touch CONFIG_VMAP_STACK though.
The main difference from my crash report is my test environment, which was a downstream one.
So it seems related to the ST downstream kernel, which is based on v6.6.48, even though the backtrace mentioned unwinding and KASAN.
I will continue to investigate on my side over the next weeks, but I don't want to block the patch integration process.
Best regards,
Clément
On Tue, Oct 29, 2024 at 4:03 PM Clement LE GOFFIC clement.legoffic@foss.st.com wrote:
I have tested your patches against few kernel versions without reproducing the issue.
- b6506981f880^
- v6.6.48
- v6.12-rc4
I didn't touch to CONFIG_VMAP_STACK though.
The main difference from my crash report is my test environment which was a downstream one.
So it seems related to ST downstream kernel version based on a v6.6.48. Even though the backtrace was talking about unwinding and kasan.
I will continue to investigate on my side in the next weeks but I don't want to block the patch integration process if I was.
I think we can assume that the patches we have queued in Russell's patch tracker at least don't make things worse, so let's merge those and then see if there is more fallout we need to dig into as you test.
Thanks Clement!
Yours, Linus Walleij
When switching task, in addition to a dummy read from the new VMAP stack, also do a dummy read from the VMAP stack's corresponding KASAN shadow memory to sync things up in the new MM context.
Cc: stable@vger.kernel.org
Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks")
Link: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6...
Reported-by: Clement LE GOFFIC clement.legoffic@foss.st.com
Suggested-by: Ard Biesheuvel ardb@kernel.org
Signed-off-by: Linus Walleij linus.walleij@linaro.org
---
 arch/arm/kernel/entry-armv.S | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 1dfae1af8e31..ef6a657c8d13 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -25,6 +25,7 @@
 #include <asm/tls.h>
 #include <asm/system_info.h>
 #include <asm/uaccess-asm.h>
+#include <asm/kasan_def.h>
 
 #include "entry-header.S"
 #include <asm/probes.h>
@@ -561,6 +562,13 @@ ENTRY(__switch_to)
 	@ entries covering the vmalloc region.
 	@
 	ldr	r2, [ip]
+#ifdef CONFIG_KASAN_VMALLOC
+	@ Also dummy read from the KASAN shadow memory for the new stack if we
+	@ are using KASAN
+	mov_l	r2, KASAN_SHADOW_OFFSET
+	add	r2, r2, ip, lsr #KASAN_SHADOW_SCALE_SHIFT
+	ldr	r2, [r2]
+#endif
 #endif
 
 	@ When CONFIG_THREAD_INFO_IN_TASK=n, the update of SP itself is what
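For readers less used to ARM assembly, the address arithmetic in the added instructions can be sketched in C. `add r2, r2, ip, lsr #n` computes r2 + (ip >> n), so after the `mov_l` the register holds the shadow address for the new stack pointer, and the final `ldr` reads from it. The names below are hypothetical. The shift of 3 matches generic KASAN, and the offset 0x9f000000 is an assumption, picked only because it is consistent with the 0xbad006xx values visible in the crash dump earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KASAN_SHADOW_SCALE_SHIFT 3           /* generic KASAN granule: 8 bytes */
#define DEMO_KASAN_SHADOW_OFFSET      0x9f000000u /* assumed, config dependent */

/*
 * C rendering of:
 *	mov_l	r2, KASAN_SHADOW_OFFSET
 *	add	r2, r2, ip, lsr #KASAN_SHADOW_SCALE_SHIFT
 */
static uint32_t demo_stack_shadow_addr(uint32_t ip)
{
	uint32_t r2 = DEMO_KASAN_SHADOW_OFFSET;

	r2 = r2 + (ip >> DEMO_KASAN_SHADOW_SCALE_SHIFT);
	return r2;
}

/*
 * C rendering of the dummy read (ldr r2, [r2]): the value is discarded,
 * the access itself is the point. Only meaningful on a valid pointer.
 */
static void demo_dummy_read(const volatile uint32_t *p)
{
	assert(p != 0);
	(void)*p;
}
```

With the stack pointer from the crash report, 0xde803358, this yields 0xbad0066b under the assumed offset. In the kernel, a fault taken on that read can be handled while the old stack is still in use, which is why the read is done before switching to the new stack.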