On Tue, Apr 29, 2025 at 10:33 PM Suren Baghdasaryan surenb@google.com wrote:
On Tue, Apr 29, 2025 at 11:55 AM Jann Horn jannh@google.com wrote:
On Tue, Apr 29, 2025 at 8:04 PM Suren Baghdasaryan surenb@google.com wrote:
On Tue, Apr 29, 2025 at 10:21 AM Jann Horn jannh@google.com wrote:
Hi!
(I just noticed that I incorrectly assumed that VMAs use kfree_rcu (not SLAB_TYPESAFE_BY_RCU) when I wrote my review of this, somehow I forgot all about that...)
Does this fact affect your previous comments? Just want to make sure I'm not missing something...
When I suggested using "WRITE_ONCE(vma->vm_file, NULL)" when tearing down a VMA, and using get_file_rcu() for the lockless lookup, I did not realize that you could actually also race with all the other places that set ->vm_file, like __mmap_new_file_vma() and so on; and I did not think about whether any of those code paths might leave a VMA with a dangling ->vm_file pointer.
So, let me summarize my understanding and see if it's correct.
If we copy the original vma, ensure that it hasn't changed while we were copying (with mmap_lock_speculate_retry()) and then use get_file_rcu(©->vm_file) I think we are guaranteed no UAF because we are in RCU read section. At this point the only issue is that copy->vm_file might have lost its last refcount and get_file_rcu() would enter an infinite loop.
Yeah. (Using get_file_active() would avoid that.)
So, to avoid that we have to use the original vma when calling get_file_rcu()
Sorry - I originally said that, but I didn't think about SLAB_TYPESAFE_BY_RCU when I had that in mind.
but then we should also ensure that vma itself does not change from under us due to SLAB_TYPESAFE_BY_RCU used for vm_area_struct cache. If it does change from under us we might end up accessing an invalid address if vma->vm_file is being modified concurrently.
Yeah, I think in theory we would have data races, since the file* reads in get_file_rcu() could race with all the (plain) ->vm_file pointer stores. So I guess it might actually be safer to use the copied VMA's ->vm_file for this, with get_file_active(). Though that would be abusing get_file_active() quite a bit, so brauner@ should probably look over this early and see whether he thinks that's acceptable...
I guess maybe that means you really do need to do the lookup from the copied data, as you did in your patch; and that might require calling get_file_active() on the copied ->vm_file pointer (instead of get_file_rcu()), even though I think that is not really how get_file_active() is supposed to be used (it's supposed to be used when you know the original file hasn't been freed yet). Really what you'd want for that is basically a raw __get_file_rcu(), but that is static and I think Christian wouldn't want to expose more of these internals outside VFS... (In that case, all the stuff below about get_file_rcu() would be moot.)
Or you could pepper WRITE_ONCE() over all the places that write ->vm_file, and ensure that ->vm_file is always NULLed before its reference is dropped... but that seems a bit more ugly to me.
Ugh, yes. We either ensure no vma->vm_file tearing or use __get_file_rcu() on a copy of the vma. Or we have to stabilize the vma by locking it... Let me think about all these options. Thanks!