Dear ARM developers, could you please help me find the cause of this problem?
On 6/7/22 18:29, Naresh Kamboju wrote:
On Tue, 7 Jun 2022 at 19:47, Shakeel Butt <shakeelb@google.com> wrote:
On Tue, Jun 7, 2022 at 3:28 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
Hi Shakeel,
Can you test v5.19-rc1, please? If that does not fail, then you could bisect between that and next-20220606 ...
This is already reported at https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/ and I think we know the underlying issue (which is calling virt_to_page() on a vmalloc address).
Sorry, I might be wrong. I just checked the stack trace again, and it seems the failure in this report happens during early boot, although the error "Unable to handle kernel paging request at virtual address" is raised in mem_cgroup_from_obj().
Naresh, can you repro the issue if you revert the patch "net: set proper memcg for net_init hooks allocations"?
Yes, you are right! After reverting this single commit, 19ee3818b7c6 ("net: set proper memcg for net_init hooks allocations"), I am able to boot arm64 successfully.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Can you please run scripts/faddr2line on "mem_cgroup_from_obj+0x2c/0x120"?
./scripts/faddr2line vmlinux mem_cgroup_from_obj+0x2c/0x120
mem_cgroup_from_obj+0x2c/0x120:
mem_cgroup_from_obj at ??:?
Please find below the build artifacts of the kernel that crashes:
vmlinux: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/vmlinux.xz
System.map: https://builds.tuxbuild.com/2ABl8X9kHAAU5MlL3E3xExHFrNy/System.map
Dear Naresh, thank you very much
mem_cgroup_from_obj():
ffff80000836cf40:   d503245f   bti   c
ffff80000836cf44:   d503201f   nop
ffff80000836cf48:   d503201f   nop
ffff80000836cf4c:   d503233f   paciasp
ffff80000836cf50:   d503201f   nop
ffff80000836cf54:   d2e00021   mov   x1, #0x1000000000000      // #281474976710656
ffff80000836cf58:   8b010001   add   x1, x0, x1
ffff80000836cf5c:   b25657e4   mov   x4, #0xfffffc0000000000   // #-4398046511104
ffff80000836cf60:   d34cfc21   lsr   x1, x1, #12
ffff80000836cf64:   d37ae421   lsl   x1, x1, #6
ffff80000836cf68:   8b040022   add   x2, x1, x4
ffff80000836cf6c:   f9400443   ldr   x3, [x2, #8]
x5 : ffff80000a96f000  x4 : fffffc0000000000  x3 : ffff80000ad5e680
x2 : fffffe00002bc240  x1 : 00000200002bc240  x0 : ffff80000af09740
x0 = 0xffff80000af09740 is the argument of mem_cgroup_from_obj(); according to System.map, it is init_net.
This issue is caused by calling virt_to_page() on the address of the static variable init_net. On arm64, virt_to_page() is only valid for linear-map addresses, and static variables such as init_net live in the kernel image, outside the linear map. On x86_64 the same API works without any problem.
Unfortunately, I do not understand the cause of the problem. I do not see any bugs in my patch. I'm using an existing API, mem_cgroup_from_obj(), to find the memory cgroup used to account for the specified object. In this particular case, I want to get the memory cgroup of each network namespace returned by for_each_net(). The first object in this list is the static structure init_net.
On x86_64 I can translate its address to a page:
crash> p &init_net
$1 = (struct net *) 0xffffffff90c7bdc0 <init_net>
crash> vtop 0xffffffff90c7bdc0
VIRTUAL           PHYSICAL
ffffffff90c7bdc0  402c7bdc0

PGD DIRECTORY: ffffffff8fe10000
PAGE DIRECTORY: 401e15067
   PUD: 401e15ff0 => 401e16063
   PMD: 401e16430 => 8000000402c000e3
  PAGE: 402c00000  (2MB)

      PTE         PHYSICAL   FLAGS
8000000402c000e3  402c00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|NX)

      PAGE        PHYSICAL   MAPPING  INDEX  CNT  FLAGS
fffff227d00b1ec0  402c7b000        0      0    1  17ffffc0001000 reserved
However, as far as I understand, this does not work on arm64. Could you please help me understand what is wrong here?
Below are a link to my patch, https://lore.kernel.org/all/20220603182442.63750C385B8@smtp.kernel.org/, and a quote from my investigation of a similar report, https://lore.kernel.org/all/Yp4F6n2Ie32re7Ed@qian/:
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
WARNING: CPU: 87 PID: 3170 at arch/arm64/mm/physaddr.c:12 __virt_to_phys
...
Call trace:
 __virt_to_phys
 mem_cgroup_from_obj
 __register_pernet_operations
@@ -1143,7 +1144,13 @@ static int __register_pernet_operations(struct list_head *list,
 	 * setup_net() and cleanup_net() are not possible.
 	 */
 	for_each_net(net) {
+		struct mem_cgroup *old, *memcg;
+
+		memcg = mem_cgroup_or_root(get_mem_cgroup_from_obj(net));   <<<< Here
+		old = set_active_memcg(memcg);
 		error = ops_init(ops, net);
+		set_active_memcg(old);
+		mem_cgroup_put(memcg);
...
+static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	do {
+		memcg = mem_cgroup_from_obj(p);   <<<<
+	} while (memcg && !css_tryget(&memcg->css));
...
struct mem_cgroup *mem_cgroup_from_obj(void *p)
{
	struct folio *folio;

	if (mem_cgroup_disabled())
		return NULL;

	folio = virt_to_folio(p);   <<<< here
...
static inline struct folio *virt_to_folio(const void *x)
{
	struct page *page = virt_to_page(x);   <<< here
...
(arm64)
#define virt_to_page(x)		pfn_to_page(virt_to_pfn(x))
...
#define virt_to_pfn(x)		__phys_to_pfn(__virt_to_phys((unsigned long)(x)))
...
phys_addr_t __virt_to_phys(unsigned long x)
{
	WARN(!__is_lm_address(__tag_reset(x)),
	     "virt_to_phys used for non-linear address: %pK (%pS)\n",
...
virt_to_phys used for non-linear address: ffffd8efe2d2fe00 (init_net)
Thank you,
Vasily Averin