syzbot reports oops in lockdep's __lock_acquire(), called from __pte_offset_map_lock() called from filemap_map_pages(); or when I run the repro, the oops comes in pmd_install(), called from filemap_map_pmd() called from filemap_map_pages(), just before the __pte_offset_map_lock().
The problem is that filemap_map_pmd() has been assuming that when it finds pmd_none(), a page table has already been prepared in prealloc_pte; and indeed do_fault_around() has been careful to preallocate one there, when it finds pmd_none(): but what if *pmd became none in between?
My 6.6 mods in mm/khugepaged.c, avoiding mmap_lock for write, have made it easy for *pmd to be cleared while servicing a page fault; but even before those, a huge *pmd might be zapped while a fault is serviced.
The difference in symptomatic stack traces comes from the "memory model" in use: pmd_install() calls pmd_populate(), which calls page_to_pfn(). In some models that is strict, and will oops on the NULL prealloc_pte; in other models, it will construct a bogus value to be populated into *pmd, and then __pte_offset_map_lock() oopses when trying to access the split ptlock pointer (or shows some other symptom in the normal case, where the ptlock is embedded rather than a pointer).
Link: https://lore.kernel.org/linux-mm/20231115065506.19780-1-jose.pekkarinen@foxh...
Link: https://lkml.kernel.org/r/6ed0c50c-78ef-0719-b3c5-60c0c010431c@google.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-and-tested-by: syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/0000000000005e44550608a0806c@google.com/
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: José Pekkarinen <jose.pekkarinen@foxhound.fi>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9aa1345d66b8132745ffb99b348b1492088da9e2)
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 81e28722edfa..84a5b0213e0e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3209,7 +3209,7 @@ static bool filemap_map_pmd(struct vm_fault *vmf, struct page *page)
 		}
 	}
 
-	if (pmd_none(*vmf->pmd)) {
+	if (pmd_none(*vmf->pmd) && vmf->prealloc_pte) {
 		vmf->ptl = pmd_lock(mm, vmf->pmd);
 		if (likely(pmd_none(*vmf->pmd))) {
 			mm_inc_nr_ptes(mm);
On Sat, Dec 09, 2023 at 09:18:42PM -0800, Hugh Dickins wrote:
Now queued up, thanks.
greg k-h
On Mon, 11 Dec 2023, Greg KH wrote:
> Now queued up, thanks.
> greg k-h
Thanks Greg: but Sasha appears to have a competing queue, in which he's cherry-picked in a dependency from 5.16 ahead of a clean cherry-pick for this one.
He posted his the next day: I expect it's more to your taste (pull in dependency rather than edit cherry-pick) and it looked fine to me. Please sort out with Sasha which goes forward, either will do.
Hugh
On Mon, Dec 11, 2023 at 09:18:06AM -0800, Hugh Dickins wrote:
> Thanks Greg: but Sasha appears to have a competing queue, in which he's cherry-picked in a dependency from 5.16 ahead of a clean cherry-pick for this one.
> He posted his the next day: I expect it's more to your taste (pull in dependency rather than edit cherry-pick) and it looked fine to me. Please sort out with Sasha which goes forward, either will do.
I dropped his version and took yours :)
linux-stable-mirror@lists.linaro.org