Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries") introduced a regression when running as a Xen PV guest.
Today pmd/pud_populate() for Xen PV assumes that the inserted PFN references a page table that is not yet in use. In the case of move_normal_pmd/pud() this is not true, resulting in WARN splats like:
[34321.304270] ------------[ cut here ]------------
[34321.304277] WARNING: CPU: 0 PID: 23628 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x176/0x1a0
[34321.304288] Modules linked in:
[34321.304291] CPU: 0 PID: 23628 Comm: apt-get Not tainted 5.14.1-20210906-doflr-mac80211debug+ #1
[34321.304294] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640) , BIOS V1.8B1 09/13/2010
[34321.304296] RIP: e030:xen_mc_flush+0x176/0x1a0
[34321.304300] Code: 89 45 18 48 c1 e9 3f 48 89 ce e9 20 ff ff ff e8 60 03 00 00 66 90 5b 5d 41 5c 41 5d c3 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 10 97 aa 82 31 db 49 c7 c5 38 97 aa 82 65
[34321.304303] RSP: e02b:ffffc90000a97c90 EFLAGS: 00010002
[34321.304305] RAX: ffff88807d416398 RBX: ffff88807d416350 RCX: ffff88807d416398
[34321.304306] RDX: 0000000000000001 RSI: 0000000000000001 RDI: deadbeefdeadf00d
[34321.304308] RBP: ffff88807d416300 R08: aaaaaaaaaaaaaaaa R09: ffff888006160cc0
[34321.304309] R10: deadbeefdeadf00d R11: ffffea000026a600 R12: 0000000000000000
[34321.304310] R13: ffff888012f6b000 R14: 0000000012f6b000 R15: 0000000000000001
[34321.304320] FS: 00007f5071177800(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
[34321.304322] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[34321.304323] CR2: 00007f506f542000 CR3: 00000000160cc000 CR4: 0000000000000660
[34321.304326] Call Trace:
[34321.304331] xen_alloc_pte+0x294/0x320
[34321.304334] move_pgt_entry+0x165/0x4b0
[34321.304339] move_page_tables+0x6fa/0x8d0
[34321.304342] move_vma.isra.44+0x138/0x500
[34321.304345] __x64_sys_mremap+0x296/0x410
[34321.304348] do_syscall_64+0x3a/0x80
[34321.304352] entry_SYSCALL_64_after_hwframe+0x44/0xae
[34321.304355] RIP: 0033:0x7f507196301a
[34321.304358] Code: 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 0e 0c 00 f7 d8 64 89 01 48
[34321.304360] RSP: 002b:00007ffda1eecd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
[34321.304362] RAX: ffffffffffffffda RBX: 000056205f950f30 RCX: 00007f507196301a
[34321.304363] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f506dc56000
[34321.304364] RBP: 0000000001a00000 R08: 0000000000000010 R09: 0000000000000004
[34321.304365] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f506dc56060
[34321.304367] R13: 00007f506dc56000 R14: 00007f506dc56060 R15: 000056205f950f30
[34321.304368] ---[ end trace a19885b78fe8f33e ]---
[34321.304370] 1 of 2 multicall(s) failed: cpu 0
[34321.304371] call 2: op=12297829382473034410 arg=[aaaaaaaaaaaaaaaa] result=-22
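To make the failure mode concrete, the following is a small user-space model (not kernel code) of the pre-fix behavior. It assumes, consistent with result=-22 (-EINVAL) in the log above, that the hypervisor refuses to pin a frame that is already pinned. The names hv_pin_l1() and old_alloc_pte() and the fixed-size frame table are hypothetical stand-ins for the MMUEXT_PIN_L1_TABLE hypercall and the xen_alloc_pte() path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy frame table; real PFNs are not bounded like this. */
#define NR_FRAMES 16

/* Hypervisor-side "pinned" state per frame (model only). */
static bool hv_pinned[NR_FRAMES];

/*
 * Models MMUEXT_PIN_L1_TABLE. Assumption: pinning an already pinned
 * frame is rejected with -EINVAL, matching "result=-22" in the log.
 */
static int hv_pin_l1(size_t pfn)
{
	assert(pfn < NR_FRAMES);
	if (hv_pinned[pfn])
		return -EINVAL;
	hv_pinned[pfn] = true;
	return 0;
}

/*
 * Pre-fix model of the pmd_populate() -> xen_alloc_pte() path for a
 * pinned mm: every PTE page is treated as brand new and always pinned.
 */
static int old_alloc_pte(size_t pfn)
{
	return hv_pin_l1(pfn);
}
```

Calling old_alloc_pte(3) for a freshly allocated table returns 0. Calling it again for the same frame, which is effectively what move_normal_pmd() does through pmd_populate(), returns -EINVAL in this model.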
Fix that by modifying xen_alloc_ptpage() to pin the page table only if it isn't pinned already.
Fixes: 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries")
Cc: stable@vger.kernel.org
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/mmu_pv.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 1df5f01529e5..8d751939c6f3 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -1518,14 +1518,17 @@ static inline void xen_alloc_ptpage(struct mm_struct *mm, unsigned long pfn,
 	if (pinned) {
 		struct page *page = pfn_to_page(pfn);
 
-		if (static_branch_likely(&xen_struct_pages_ready))
+		pinned = false;
+		if (static_branch_likely(&xen_struct_pages_ready)) {
+			pinned = PagePinned(page);
 			SetPagePinned(page);
+		}
 
 		xen_mc_batch();
 
 		__set_pfn_prot(pfn, PAGE_KERNEL_RO);
 
-		if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS)
+		if (level == PT_PTE && USE_SPLIT_PTE_PTLOCKS && !pinned)
 			__pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn);
 
 		xen_mc_issue(PARAVIRT_LAZY_MMU);
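The patched logic can be modeled the same way. This is a user-space approximation, not the kernel function: page_pinned[] stands in for the PagePinned() page flag, struct_pages_ready for the xen_struct_pages_ready static key, use_split_pte_ptlocks for USE_SPLIT_PTE_PTLOCKS, and pin_calls counts how often __pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn) would be issued. These model names, and model_alloc_ptpage() itself, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_FRAMES 16	/* hypothetical toy frame table */

/* Level names modeled on those used by mmu_pv.c (illustrative). */
enum pt_level { PT_PGD, PT_P4D, PT_PUD, PT_PMD, PT_PTE };

static bool page_pinned[NR_FRAMES];	/* models PagePinned() per page */
static bool struct_pages_ready = true;	/* models xen_struct_pages_ready */
static const bool use_split_pte_ptlocks = true; /* models USE_SPLIT_PTE_PTLOCKS */
static int pin_calls;			/* L1 pin hypercalls issued */

/*
 * Model of the patched xen_alloc_ptpage(); mm_pinned stands in for
 * xen_page_pinned(mm->pgd).
 */
static void model_alloc_ptpage(bool mm_pinned, size_t pfn, enum pt_level level)
{
	assert(pfn < NR_FRAMES);
	if (!mm_pinned)
		return;

	/* An unknown previous state counts as "not pinned", as in the patch. */
	bool pinned = false;
	if (struct_pages_ready) {
		pinned = page_pinned[pfn];	/* state before this call */
		page_pinned[pfn] = true;	/* SetPagePinned(page) */
	}

	/* Making the frame read-only (__set_pfn_prot()) is omitted here. */

	if (level == PT_PTE && use_split_pte_ptlocks && !pinned)
		pin_calls++;	/* __pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, pfn) */
}
```

Calling model_alloc_ptpage(true, 5, PT_PTE) twice, once for the fresh table and once for the mremap() move, leaves pin_calls at 1, so the second pin that used to fail is never issued. When struct_pages_ready is false, pinned stays false and every call pins, which matches the unpatched behavior.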
On 08.09.2021 09:36, Juergen Gross wrote:
> Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries") introduced a regression when running as a Xen PV guest.

The description of that change starts with "pmd/pud_populate is the right interface to be used to set the respective page table entries." If this is deemed true, I don't think pmd_populate() should call paravirt_alloc_pte(): The latter function, as its name says, is supposed to be called for newly allocated page tables only (aiui).

> Today pmd/pud_populate() for Xen PV assumes that the inserted PFN references a page table that is not yet in use. In the case of move_normal_pmd/pud() this is not true, resulting in WARN splats like:

I agree for the PMD part, but is including PUD here really correct? While I don't know why that is, xen_alloc_ptpage() pins L1 tables only. Hence a PUD update shouldn't be able to find a pinned L2 table.
Jan
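Jan's PUD argument can be checked against the pin condition in the diff. This sketch assumes the usual x86 wiring: pmd_populate() calls paravirt_alloc_pte(), which Xen PV routes to xen_alloc_ptpage() with level PT_PTE, while pud_populate() calls paravirt_alloc_pmd(), which ends up there with level PT_PMD. The predicate would_pin_l1() is hypothetical; it is just the patched condition without the !pinned term.

```c
#include <stdbool.h>

/* Level names modeled on those used by mmu_pv.c (illustrative). */
enum pt_level { PT_PGD, PT_P4D, PT_PUD, PT_PMD, PT_PTE };

/*
 * Hypothetical predicate: would xen_alloc_ptpage() issue an L1 pin for a
 * page table at this level, given a pinned mm and a page that was not
 * pinned before?
 */
static bool would_pin_l1(enum pt_level level, bool use_split_pte_ptlocks)
{
	return level == PT_PTE && use_split_pte_ptlocks;
}
```

Since would_pin_l1(PT_PMD, true) is false, moving a PMD table through pud_populate() never reaches the pin hypercall, so only the PMD (PT_PTE) path can hit an already pinned table.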
On 08.09.21 13:07, Jan Beulich wrote:
> On 08.09.2021 09:36, Juergen Gross wrote:
>> Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries") introduced a regression when running as a Xen PV guest.
>
> The description of that change starts with "pmd/pud_populate is the right interface to be used to set the respective page table entries." If this is deemed true, I don't think pmd_populate() should call paravirt_alloc_pte(): The latter function, as its name says, is supposed to be called for newly allocated page tables only (aiui).

In theory you are correct, but my experience with reality tells me that another set of macros for this case will not be appreciated.

>> Today pmd/pud_populate() for Xen PV assumes that the inserted PFN references a page table that is not yet in use. In the case of move_normal_pmd/pud() this is not true, resulting in WARN splats like:
>
> I agree for the PMD part, but is including PUD here really correct? While I don't know why that is, xen_alloc_ptpage() pins L1 tables only. Hence a PUD update shouldn't be able to find a pinned L2 table.

I agree that I should drop the mention of PUD here.
I will make that change when committing, in case no other changes are required.
Juergen
On 08.09.2021 15:32, Juergen Gross wrote:
> On 08.09.21 13:07, Jan Beulich wrote:
>> On 08.09.2021 09:36, Juergen Gross wrote:
>>> Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries") introduced a regression when running as a Xen PV guest.
>>
>> The description of that change starts with "pmd/pud_populate is the right interface to be used to set the respective page table entries." If this is deemed true, I don't think pmd_populate() should call paravirt_alloc_pte(): The latter function, as its name says, is supposed to be called for newly allocated page tables only (aiui).
>
> In theory you are correct, but my experience with reality tells me that another set of macros for this case will not be appreciated.

Perhaps a new parameter to the macros / inlines identifying fresh vs moved? Or perhaps the offending change wasn't really correct in what its description said?
Jan
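Jan's alternative, a parameter telling the hook whether a table is fresh or moved, can be sketched as below. Everything in this block is hypothetical: no pt_origin parameter exists in the kernel, and model_pmd_populate() and model_alloc_pte() are illustrative names. The idea is that the hook could skip pinning for moved tables without consulting page flags.

```c
#include <stdbool.h>

/*
 * HYPOTHETICAL: not a kernel interface. It only illustrates the
 * "fresh vs moved" parameter suggested in the review.
 */
enum pt_origin { PT_ORIGIN_FRESH, PT_ORIGIN_MOVED };

static int l1_pins;	/* pin hypercalls issued in this model */

/* Hypothetical alloc hook that is told where the page table came from. */
static void model_alloc_pte(enum pt_origin origin)
{
	/*
	 * In this model only a newly allocated table needs pinning; a table
	 * moved by mremap() was pinned when it was first populated.
	 */
	if (origin == PT_ORIGIN_FRESH)
		l1_pins++;
}

/*
 * Hypothetical populate wrapper: every caller would have to supply the
 * origin. Writing the PMD entry itself is omitted.
 */
static void model_pmd_populate(enum pt_origin origin)
{
	model_alloc_pte(origin);
}
```

The cost raised in the reply shows up in model_pmd_populate()'s signature: every caller, and every architecture's own pmd_populate()/pud_populate() definition, would need the extra argument.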
On 08.09.21 16:28, Jan Beulich wrote:
> On 08.09.2021 15:32, Juergen Gross wrote:
>> On 08.09.21 13:07, Jan Beulich wrote:
>>> On 08.09.2021 09:36, Juergen Gross wrote:
>>>> Commit 0881ace292b662 ("mm/mremap: use pmd/pud_poplulate to update page table entries") introduced a regression when running as a Xen PV guest.
>>>
>>> The description of that change starts with "pmd/pud_populate is the right interface to be used to set the respective page table entries." If this is deemed true, I don't think pmd_populate() should call paravirt_alloc_pte(): The latter function, as its name says, is supposed to be called for newly allocated page tables only (aiui).
>>
>> In theory you are correct, but my experience with reality tells me that another set of macros for this case will not be appreciated.
>
> Perhaps a new parameter to the macros / inlines identifying fresh vs moved? Or perhaps the offending change wasn't really correct in what its description said?

The problem is that those macros are spread across all architectures, with each architecture defining them separately. Changing all of them would not really be welcomed.

And the change was correct IMO, as the replaced pmd_set() should be used for leaf entries only (at least in arch-independent code). pmd_populate() is the correct one for non-leaf entries.
Juergen
linux-stable-mirror@lists.linaro.org