On 2024-05-02 09:47, Ryan Roberts wrote:
On 02/05/2024 14:08, David Hildenbrand wrote:
On 01.05.24 16:33, Ryan Roberts wrote:
__split_huge_pmd_locked() can be called for a present THP, devmap or (non-present) migration entry. It calls pmdp_invalidate() unconditionally on the pmdp and only determines if it is present or not based on the returned old pmd. This is a problem for the migration entry case because pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a present pmd.
On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future call to pmd_present() will return true. Therefore any lockless pgtable walker could see the migration entry pmd in this state and start interpreting the fields as if it were present, leading to BadThings (TM). GUP-fast appears to be one such lockless pgtable walker.
x86 does not suffer the above problem, but instead pmd_mkinvalid() will corrupt the offset field of the swap entry within the swap pte. See link below for discussion of that problem.
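To make the arm64 failure mode above concrete, here is a small user-space model. It is only a sketch: the bit layout and every toy_* name are hypothetical and do not match the real arm64 or x86 encodings, and the guarded variant is one plausible shape for a fix, not necessarily what the patch under review does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy pmd: hypothetical layout, NOT the real arm64 or x86 encoding. */
typedef uint64_t toy_pmd_t;

#define TOY_VALID           (1ULL << 0) /* hardware-valid mapping */
#define TOY_PRESENT_INVALID (1ULL << 1) /* software bit: present, but temporarily invalid */
#define TOY_SWP_SHIFT       8           /* assumed start of the swap-entry payload */

/* Present means either hardware-valid or marked "present but invalid". */
static bool toy_pmd_present(toy_pmd_t pmd)
{
	return (pmd & (TOY_VALID | TOY_PRESENT_INVALID)) != 0;
}

/* Clear VALID and set PRESENT_INVALID; only meaningful for a present pmd. */
static toy_pmd_t toy_pmd_mkinvalid(toy_pmd_t pmd)
{
	return (pmd | TOY_PRESENT_INVALID) & ~TOY_VALID;
}

/* A migration entry is non-present: neither low bit is set. */
static toy_pmd_t toy_mk_migration_entry(uint64_t offset)
{
	return offset << TOY_SWP_SHIFT;
}

static bool toy_is_migration_entry(toy_pmd_t pmd)
{
	return pmd != 0 && !toy_pmd_present(pmd);
}

/* Problematic pattern: invalidate first, look at the old value afterwards. */
static toy_pmd_t toy_split_unconditional(toy_pmd_t *pmdp)
{
	toy_pmd_t old = *pmdp;

	*pmdp = toy_pmd_mkinvalid(old);
	return old;
}

/* Guarded pattern: leave a migration entry untouched. */
static toy_pmd_t toy_split_guarded(toy_pmd_t *pmdp)
{
	toy_pmd_t old = *pmdp;

	if (!toy_is_migration_entry(old))
		*pmdp = toy_pmd_mkinvalid(old);
	return old;
}
```

In this model, a migration entry passed through toy_split_unconditional() starts reporting as present, which is the window a lockless walker could observe. The same entry passed through toy_split_guarded() stays non-present and bit-for-bit unchanged.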
Could that explain:
https://lore.kernel.org/all/YjoGbhreg8lGCGIJ@linutronix.de/
Where the PFN of a migration entry might have been corrupted?
Ahh interesting! Yes, it seems to fit...
Ccing Felix
Are you able to reliably reproduce the bug, Felix? If so, would you mind trying with this patch to see if it goes away?
Sorry, this question got lost and I found it while cleaning my inbox. The team that ran into this problem reported that it was a one-off failure. They didn't see this problem again. So I can't help with verifying the fix.
Regards, Felix
Patch itself looks good to me
Acked-by: David Hildenbrand <david@redhat.com>
Thanks!