On Sun, Apr 17, 2022 at 02:32:03PM -0700, KernelCI bot wrote:
The KernelCI bisection bot found that commit 6026d4032dbbe3 ("arm: extend pfn_valid to take into account freed memory map alignment") triggered a regression in v5.4.x on 32 bit ARM with a qemu platform booting UEFI firmware. We try to dereference an invalid pointer parsing the DMI tables:
<1>[ 0.084476] 8<--- cut here --- <1>[ 0.084595] Unable to handle kernel paging request at virtual address dfb76000 <1>[ 0.084938] pgd = (ptrval) <1>[ 0.085038] [dfb76000] *pgd=5f7fe801, *pte=00000000, *ppte=00000000
...
<4>[ 0.093923] [<c0ed6ce8>] (memcpy) from [<c16a06f8>] (dmi_setup+0x60/0x418) <4>[ 0.094204] [<c16a06f8>] (dmi_setup) from [<c16a38d4>] (arm_dmi_init+0x8/0x10) <4>[ 0.094408] [<c16a38d4>] (arm_dmi_init) from [<c0302e9c>] (do_one_initcall+0x50/0x228) <4>[ 0.094619] [<c0302e9c>] (do_one_initcall) from [<c16011e4>] (kernel_init_freeable+0x15c/0x1f8) <4>[ 0.094841] [<c16011e4>] (kernel_init_freeable) from [<c0f028cc>] (kernel_init+0x8/0x10c) <4>[ 0.095057] [<c0f028cc>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
This particular bisect is from GICv2 but GICv3 shows the same issue, and it persists in the latest stable -rc:
https://linux.kernelci.org/test/job/stable-rc/branch/linux-5.4.y/kernel/v5.4...
A quick check seems to show that other stable branches are unaffected. I've left all the context from the report (including full boot logs and a Reported-by tag) below:
- This automated bisection report was sent to you on the basis *
- that you may be involved with the breaking commit it has *
- found. No manual investigation has been done to verify it, *
- and the root cause of the problem may be somewhere else. *
*
- If you do send a fix, please include this trailer: *
- Reported-by: "kernelci.org bot" bot@kernelci.org *
*
- Hope this helps! *
stable-rc/linux-5.4.y bisection: baseline.login on qemu_arm-virt-gicv2-uefi
Summary: Start: e7f5213d755bc Linux 5.4.189 Plain log: https://storage.kernelci.org/stable-rc/linux-5.4.y/v5.4.189/arm/multi_v7_def... HTML log: https://storage.kernelci.org/stable-rc/linux-5.4.y/v5.4.189/arm/multi_v7_def... Result: 6026d4032dbbe arm: extend pfn_valid to take into account freed memory map alignment
Checks: revert: PASS verify: PASS
Parameters: Tree: stable-rc URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Branch: linux-5.4.y Target: qemu_arm-virt-gicv2-uefi CPU arch: arm Lab: lab-baylibre Compiler: gcc-10 Config: multi_v7_defconfig Test case: baseline.login
Breaking commit found:
commit 6026d4032dbbe3d7f4ac2c8daa923fe74dcf41c4 Author: Mike Rapoport rppt@linux.ibm.com Date: Mon Dec 13 16:57:09 2021 +0800
arm: extend pfn_valid to take into account freed memory map alignment
commit a4d5613c4dc6d413e0733e37db9d116a2a36b9f3 upstream. When unused memory map is freed the preserved part of the memory map is extended to match pageblock boundaries because lots of core mm functionality relies on homogeneity of the memory map within pageblock boundaries. Since pfn_valid() is used to check whether there is a valid memory map entry for a PFN, make it return true also for PFNs that have memory map entries even if there is no actual memory populated there. Signed-off-by: Mike Rapoport rppt@linux.ibm.com Tested-by: Kefeng Wang wangkefeng.wang@huawei.com Tested-by: Tony Lindgren tony@atomide.com Link: https://lore.kernel.org/lkml/20210630071211.21011-1-rppt@kernel.org/ Signed-off-by: Mark-PK Tsai mark-pk.tsai@mediatek.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 5635bcc419af8..ff2cd985d20e0 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -176,11 +176,22 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low, int pfn_valid(unsigned long pfn) { phys_addr_t addr = __pfn_to_phys(pfn);
- unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
if (__phys_to_pfn(addr) != pfn) return 0;
- return memblock_is_map_memory(__pfn_to_phys(pfn));
- /*
* If address less than pageblock_size bytes away from a present
* memory chunk there still will be a memory map entry for it
* because we round freed memory map to the pageblock boundaries.
*/
- if (memblock_overlaps_region(&memblock.memory,
ALIGN_DOWN(addr, pageblock_size),
pageblock_size))
return 1;
- return 0;
} EXPORT_SYMBOL(pfn_valid);
#endif
Git bisection log:
git bisect start # good: [7f70428f0109470aa9177d1a9e5ce02de736f480] Linux 5.4.165 git bisect good 7f70428f0109470aa9177d1a9e5ce02de736f480 # bad: [e7f5213d755bc34f366d36f08825c0b446117d96] Linux 5.4.189 git bisect bad e7f5213d755bc34f366d36f08825c0b446117d96 # bad: [902528183f4d94945a0c1ed6048d4a5d4e1e712e] mmc: block: fix read single on recovery logic git bisect bad 902528183f4d94945a0c1ed6048d4a5d4e1e712e # bad: [c7e4004b38aa7ad482fc46ab76e28879f84ec77e] batman-adv: allow netlink usage in unprivileged containers git bisect bad c7e4004b38aa7ad482fc46ab76e28879f84ec77e # bad: [db0c834abbc186bda56b1e13b4eb61f7126c12c5] rndis_host: support Hytera digital radios git bisect bad db0c834abbc186bda56b1e13b4eb61f7126c12c5 # bad: [0b01c51c4f47f59ad7eb1ea5bac47fab14b188a5] qlcnic: potential dereference null pointer of rx_queue->page_ring git bisect bad 0b01c51c4f47f59ad7eb1ea5bac47fab14b188a5 # bad: [e7660f9535ade84ea57aed1c55d102bfb23dd2ff] mac80211: fix lookup when adding AddBA extension element git bisect bad e7660f9535ade84ea57aed1c55d102bfb23dd2ff # bad: [802a1a8501563714a5fe8824f4ed27fec04a0719] firmware: arm_scpi: Fix string overflow in SCPI genpd driver git bisect bad 802a1a8501563714a5fe8824f4ed27fec04a0719 # good: [2fb8e4267c47d69d6bada6310607ea3762f6c962] KVM: x86: Ignore sparse banks size for an "all CPUs", non-sparse IPI req git bisect good 2fb8e4267c47d69d6bada6310607ea3762f6c962 # good: [492f4d3cde95aadcd1d070db5dd4796ae8019165] memblock: ensure there is no overflow in memblock_overlaps_region() git bisect good 492f4d3cde95aadcd1d070db5dd4796ae8019165 # bad: [e8ef940326efd17ca7fdd3cb8791c29a24b04f28] Linux 5.4.167 git bisect bad e8ef940326efd17ca7fdd3cb8791c29a24b04f28 # bad: [c97579584fa88df65ff6e4653b175acba154862d] arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM git bisect bad c97579584fa88df65ff6e4653b175acba154862d # bad: [6026d4032dbbe3d7f4ac2c8daa923fe74dcf41c4] arm: extend pfn_valid to take into account freed memory map alignment git bisect bad 6026d4032dbbe3d7f4ac2c8daa923fe74dcf41c4
# first bad commit: [6026d4032dbbe3d7f4ac2c8daa923fe74dcf41c4] arm: extend pfn_valid to take into account freed memory map alignment
-=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#25917): https://groups.io/g/kernelci-results/message/25917 Mute This Topic: https://groups.io/mt/90529234/1131744 Group Owner: kernelci-results+owner@groups.io Unsubscribe: https://groups.io/g/kernelci-results/unsub [broonie@kernel.org] -=-=-=-=-=-=-=-=-=-=-=-
On Tue, Apr 19, 2022 at 02:31:59PM +0100, Mark Brown wrote:
On Sun, Apr 17, 2022 at 02:32:03PM -0700, KernelCI bot wrote:
The KernelCI bisection bot found that commit 6026d4032dbbe3 ("arm: extend pfn_valid to take into account freed memory map alignment") triggered a regression in v5.4.x on 32 bit ARM with a qemu platform booting UEFI firmware. We try to dereference an invalid pointer parsing the DMI tables:
<1>[ 0.084476] 8<--- cut here --- <1>[ 0.084595] Unable to handle kernel paging request at virtual address dfb76000 <1>[ 0.084938] pgd = (ptrval) <1>[ 0.085038] [dfb76000] *pgd=5f7fe801, *pte=00000000, *ppte=00000000
...
<4>[ 0.093923] [<c0ed6ce8>] (memcpy) from [<c16a06f8>] (dmi_setup+0x60/0x418) <4>[ 0.094204] [<c16a06f8>] (dmi_setup) from [<c16a38d4>] (arm_dmi_init+0x8/0x10) <4>[ 0.094408] [<c16a38d4>] (arm_dmi_init) from [<c0302e9c>] (do_one_initcall+0x50/0x228) <4>[ 0.094619] [<c0302e9c>] (do_one_initcall) from [<c16011e4>] (kernel_init_freeable+0x15c/0x1f8) <4>[ 0.094841] [<c16011e4>] (kernel_init_freeable) from [<c0f028cc>] (kernel_init+0x8/0x10c) <4>[ 0.095057] [<c0f028cc>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
This particular bisect is from GICv2 but GICv3 shows the same issue, and it persists in the latest stable -rc:
https://linux.kernelci.org/test/job/stable-rc/branch/linux-5.4.y/kernel/v5.4.189-64-gab55553793398/plan/baseline/
I don't know how exactly kernel-ci runs qemu with UEFI, I've tried this:
$QEMU -serial stdio -M virt-2.11,gic-version=2 -cpu cortex-a15 -m 1G \ -drive file=$QEMU_EFI,if=pflash,format=raw,unit=0,readonly=on \ -drive file=flash1.img,if=pflash,format=raw,unit=1,readonly=off \ -kernel $kernel \ -append "console=ttyAMA0"
with stable-rc/linux-5.4.y and I've got as far as to failure to mount rootfs and the crash in dmu_setup() didn't reproduce for me.
My understanding is that with HEAD pointing to 6026d4032dbbe3 crash happens because ioremap uses pfn_valid() to check if a PFN is in RAM which is fixed by c97579584fa8 ("arm: ioremap: don't abuse pfn_valid() to check if pfn is in RAM") that comes on top of 6026d4032dbbe3.
No clues why ab55553793398 fails, though...
On Wed, Apr 20, 2022 at 12:18:04PM +0300, Mike Rapoport wrote:
I don't know how exactly kernel-ci runs qemu with UEFI, I've tried this:
$QEMU -serial stdio -M virt-2.11,gic-version=2 -cpu cortex-a15 -m 1G \ -drive file=$QEMU_EFI,if=pflash,format=raw,unit=0,readonly=on \ -drive file=flash1.img,if=pflash,format=raw,unit=1,readonly=off \ -kernel $kernel \ -append "console=ttyAMA0"
with stable-rc/linux-5.4.y and I've got as far as to failure to mount rootfs and the crash in dmu_setup() didn't reproduce for me.
The command should be something to the effect of:
qemu-system-aarch64 -cpu max -machine virt,gic-version=3,mte=on,accel=tcg -nographic -net nic,model=virtio,macaddr=52:54:00:12:34:58 -net user -m 512 -monitor none -bios /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/bios/QEMU_EFI.fd -kernel /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/kernel/Image -append "console=ttyAMA0,115200 root=/dev/ram0 debug verbose console_msg_format=syslog earlycon" -initrd /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/ramdisk/rootfs.cpio.gz -drive format=qcow2,file=/var/lib/lava/dispatcher/tmp/75061/apply-overlay-guest-6hyfr8ki/lava-guest.qcow2,media=disk,if=virtio,id=lavatest
(with different paths) where QEMU_EFI.fd is:
http://storage.kernelci.org/images/uefi/edk2-stable202005/arm64/QEMU_EFI.fd
I didn't pull an exact job, sorry. A full job showing the expected flow (for GICv3 which shows the same thing) is at:
On Wed, Apr 20, 2022 at 01:07:03PM +0100, Mark Brown wrote:
On Wed, Apr 20, 2022 at 12:18:04PM +0300, Mike Rapoport wrote:
I don't know how exactly kernel-ci runs qemu with UEFI, I've tried this:
$QEMU -serial stdio -M virt-2.11,gic-version=2 -cpu cortex-a15 -m 1G \ -drive file=$QEMU_EFI,if=pflash,format=raw,unit=0,readonly=on \ -drive file=flash1.img,if=pflash,format=raw,unit=1,readonly=off \ -kernel $kernel \ -append "console=ttyAMA0"
with stable-rc/linux-5.4.y and I've got as far as to failure to mount rootfs and the crash in dmu_setup() didn't reproduce for me.
The command should be something to the effect of:
qemu-system-aarch64 -cpu max -machine virt,gic-version=3,mte=on,accel=tcg -nographic -net nic,model=virtio,macaddr=52:54:00:12:34:58 -net user -m 512 -monitor none -bios /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/bios/QEMU_EFI.fd -kernel /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/kernel/Image -append "console=ttyAMA0,115200 root=/dev/ram0 debug verbose console_msg_format=syslog earlycon" -initrd /var/lib/lava/dispatcher/tmp/75061/deployimages-ptln1wlp/ramdisk/rootfs.cpio.gz -drive format=qcow2,file=/var/lib/lava/dispatcher/tmp/75061/apply-overlay-guest-6hyfr8ki/lava-guest.qcow2,media=disk,if=virtio,id=lavatest
I could reproduce with this, thanks!
The problem is that memremap() uses pfn_valid() to check if RAM resource can be accessed via linear mapping and this check is wrong. The problem does not manifest on more recent kernels because the way ARM registers "System RAM" resources was updated so that it skipped NOMAP memory.
Can you please give a whirl to the patch below? If it's Ok I'll extend it to include arm64 and will send a formal patch.
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h index 7a0596fcb2e7..2df7454be649 100644 --- a/arch/arm/include/asm/io.h +++ b/arch/arm/include/asm/io.h @@ -144,6 +144,10 @@ extern void __iomem * (*arch_ioremap_caller)(phys_addr_t, size_t, unsigned int, void *); extern void (*arch_iounmap)(volatile void __iomem *);
+extern bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size, + unsigned long flags); +#define arch_memremap_can_ram_remap arch_memremap_can_ram_remap + /* * Bad read/write accesses... */ diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c index 513c26b46db3..98f90cd5a948 100644 --- a/arch/arm/mm/ioremap.c +++ b/arch/arm/mm/ioremap.c @@ -42,6 +42,13 @@ #include <asm/mach/pci.h> #include "mm.h"
+bool arch_memremap_can_ram_remap(resource_size_t offset, size_t size, + unsigned long flags) +{ + unsigned long pfn = PHYS_PFN(offset); + + return memblock_is_map_memory(pfn); +}
LIST_HEAD(static_vmlist);
diff --git a/kernel/iomem.c b/kernel/iomem.c index 62c92e43aa0d..e85bed24c0a9 100644 --- a/kernel/iomem.c +++ b/kernel/iomem.c @@ -33,7 +33,7 @@ static void *try_ram_remap(resource_size_t offset, size_t size, unsigned long pfn = PHYS_PFN(offset);
/* In the simple case just return the existing linear address */ - if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) && + if (!PageHighMem(pfn_to_page(pfn)) && arch_memremap_can_ram_remap(offset, size, flags)) return __va(offset);
(with different paths) where QEMU_EFI.fd is:
http://storage.kernelci.org/images/uefi/edk2-stable202005/arm64/QEMU_EFI.fd
I didn't pull an exact job, sorry. A full job showing the expected flow (for GICv3 which shows the same thing) is at:
On Thu, Apr 21, 2022 at 09:42:36AM +0300, Mike Rapoport wrote:
The problem is that memremap() uses pfn_valid() to check if RAM resource can be accessed via linear mapping and this check is wrong. The problem does not manifest on more recent kernels because the way ARM registers "System RAM" resources was updated so that it skipped NOMAP memory.
Can you please give a whirl to the patch below? If it's Ok I'll extend it to include arm64 and will send a formal patch.
Looks to have tested out OK:
https://linux.kernelci.org/test/job/broonie-misc/branch/for-kernelci/kernel/...
Tested-by: Mark Brown broonie@kernel.org
linux-stable-mirror@lists.linaro.org