Hi Gerald,
Thanks, we were also trying to reproduce on x86, w/o success so far. But I guess that matches David's latest observations wrt our exception handling code on s390.
Good news is that the problem goes away when I add this simple patch, which should result in a proper VM_WRITE check against the vma flags before triggering a FAULT_FLAG_WRITE fault:
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -379,7 +379,9 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	flags = FAULT_FLAG_DEFAULT;
 	if (user_mode(regs))
 		flags |= FAULT_FLAG_USER;
-	if (access == VM_WRITE || is_write)
+	if (is_write)
+		access = VM_WRITE;
+	if (access == VM_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	mmap_read_lock(mm);
That's what I had in mind, good.
Still find it a bit hard to believe that this logic, more than 10 years old, really was broken all along. I guess it simply did not matter for normal PTE faults, probably because the common fault handling code would later do its own check via maybe_mkwrite(). And for hugetlb PTEs, it might not have mattered before commit bcd51a3c679d.
It is awkward, but maybe we never really noticed for hugetlb (not sure how common read-only mappings are after all).
bcd51a3c679d eliminates the copying of page tables at fork for non-anon hugetlb vmas. So, in these tests you would likely see more pte_none() faults.
Yes, makes sense, assuming now that it actually is related to s390 exception handling code, not checking for VM_WRITE before triggering a write fault for pte_none().
Thanks for checking! And thanks a lot to David for finding that issue in the s390 exception handling code!
Thanks! Looks like adding the WARN_ON_ONCE was the right decision.