Re: [PATCH 4.9.y] net: sched: prevent UAF on tc_ctl_tfilter when temporarily dropping rtnl_lock

3 May 2022


On Tue, May 03, 2022 at 04:27:20PM +0200, Greg KH wrote:
...
On Tue, May 03, 2022 at 11:24:01AM -0300, Thadeu Lima de Souza Cascardo wrote:
...
On Tue, May 03, 2022 at 03:49:15PM +0200, Greg KH wrote:
...
On Mon, May 02, 2022 at 05:49:24PM -0300, Thadeu Lima de Souza Cascardo wrote:
...
When the rtnl_lock is dropped to look up a module, the device may be
removed, releasing the qdisc and class memory. Right after trying to load
the module, cl_ops->put is called, leading to a potential use-after-free.
Though commit e368fdb61d8e ("net: sched: use Qdisc rcu API instead of
relying on rtnl lock") fixes this, it involves a lot of refactoring of the
net/sched/ code, complicating its backport.
What about 4.14.y?  We cannot take a commit for 4.9.y with it also
being broken in 4.14.y, and yet fixed in 4.19.y, right?  Anyone who
updates from 4.9 to 4.14 will have a regression.
thanks,
greg k-h
4.14.y does not call cl_ops->put (the get/put calls and the class refcount
have been done away with in 4.14.y). However, on the error path after the lock
has been dropped, tcf_chain_put is called. That call does not touch the qdisc,
only the chain and block objects, which cannot be released in a race
condition, as far as I was able to investigate.
So what changed between 4.9 and 4.14 that requires this out-of-tree
change to 4.9 for the issue?  Shouldn't we backport that change instead
of this custom one?
thanks,
greg k-h
143976ce992f ("net_sched: remove tc class reference counting") removed the call
to cops->put as that reference counting was removed and the get call was
replaced by find.
Backporting it is an alternative fix, but it carries a higher risk of breaking
something else, as it is not a trivial cherry-pick.
Cascardo.
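
For readers following the thread, the race from the patch description can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the patch under discussion and not kernel code: every
toy_* name is hypothetical, the lock is a plain flag, and "device removal" is
simulated inside the window where the lock is dropped. It shows one way to
avoid the problem: do the put while the qdisc is still known to be valid,
before dropping the lock, then have the caller replay the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel objects; none of these are kernel APIs. */
struct toy_class { int refcnt; };
struct toy_qdisc { struct toy_class cls; };
struct toy_dev   { struct toy_qdisc *qdisc; bool locked; };

static void toy_lock(struct toy_dev *d)   { assert(!d->locked); d->locked = true; }
static void toy_unlock(struct toy_dev *d) { assert(d->locked);  d->locked = false; }

/* Models device removal by another task; only possible while unlocked.
 * As in the scenario described, the class refcount does not keep the
 * qdisc memory alive. */
static void toy_remove_dev(struct toy_dev *d)
{
    assert(!d->locked);
    free(d->qdisc);
    d->qdisc = NULL;
}

/* Models the module lookup: the lock is dropped, the device may vanish. */
static void toy_load_module(struct toy_dev *d, bool dev_removed_meanwhile)
{
    toy_unlock(d);
    if (dev_removed_meanwhile)
        toy_remove_dev(d);
    toy_lock(d);
}

/*
 * Returns 0 when done, 1 when the caller should replay the request,
 * and -1 when the device has no qdisc.
 */
static int toy_filter_request(struct toy_dev *d, bool need_module, bool race)
{
    int ret = 0;

    toy_lock(d);
    struct toy_qdisc *q = d->qdisc;
    if (!q) {
        toy_unlock(d);
        return -1;
    }

    struct toy_class *cl = &q->cls;
    cl->refcnt++;                 /* models the class "get" */

    if (need_module) {
        cl->refcnt--;             /* "put" while q is still valid */
        cl = NULL;                /* q and cl must not be used past here */
        toy_load_module(d, race);
        ret = 1;                  /* caller looks everything up again */
    }

    if (cl)
        cl->refcnt--;             /* normal path: "put" under the lock */
    toy_unlock(d);
    return ret;
}
```

If the put were instead issued after toy_load_module() returned, it would
dereference memory that toy_remove_dev() may already have freed, which is the
use-after-free the patch description refers to.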
