On Wed, Sep 25, 2024 at 7:48 PM Carlos Llamas cmllamas@google.com wrote:
On Wed, Sep 25, 2024 at 10:02:51AM +0200, 'Alice Ryhl' via kernel-team wrote:
On Tue, Sep 24, 2024 at 8:44 PM Carlos Llamas cmllamas@google.com wrote:
In binder_add_freeze_work() we iterate over the proc->nodes with the proc->inner_lock held. However, this lock is temporarily dropped to acquire the node->lock first (lock nesting order). This can race with binder_deferred_release() which removes the nodes from the proc->nodes rbtree and adds them into binder_dead_nodes list. This leads to a broken iteration in binder_add_freeze_work() as rb_next() will use data from binder_dead_nodes, triggering an out-of-bounds access:
================================================================== BUG: KASAN: global-out-of-bounds in rb_next+0xfc/0x124 Read of size 8 at addr ffffcb84285f7170 by task freeze/660
CPU: 8 UID: 0 PID: 660 Comm: freeze Not tainted 6.11.0-07343-ga727812a8d45 #18 Hardware name: linux,dummy-virt (DT) Call trace: rb_next+0xfc/0x124 binder_add_freeze_work+0x344/0x534 binder_ioctl+0x1e70/0x25ac __arm64_sys_ioctl+0x124/0x190
The buggy address belongs to the variable: binder_dead_nodes+0x10/0x40 [...] ==================================================================
This is possible because proc->nodes (rbtree) and binder_dead_nodes (list) share entries in binder_node through a union:
struct binder_node { [...] union { struct rb_node rb_node; struct hlist_node dead_node; };
Fix the race by checking that the proc is still alive. If not, simply break out of the iteration.
Fixes: d579b04a52a1 ("binder: frozen notification") Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas cmllamas@google.com
This change LGTM. Reviewed-by: Alice Ryhl aliceryhl@google.com
I reviewed some other code paths to verify whether there are other problems with processes dying concurrently with operations on freeze notifications. I didn't notice any other memory safety issues, but I
Yeah most other paths are protected with binder_procs_lock mutex.
noticed that binder_request_freeze_notification returns EINVAL if you try to use it with a node from a dead process. That seems problematic, as this means that there's no way to invoke that command without risking an EINVAL error if the remote process dies. We should not return EINVAL errors on correct usage of the driver.
Agreed, this should probably be -ESRCH or something. I'll add it to v2, thanks for the suggestion.
Well, maybe? I think it's best to not return errnos from these commands at all, as they obscure how many commands were processed.
Since the node still exists even if the process dies, perhaps we can just let you create the freeze notification even if it's dead? We can make it end up in the same state as if you request a freeze notification and the process then dies afterwards.
Alice