Syzkaller reports use-after-free for net_device's in 5.10 stable releases. The problem has been fixed by the following patch series and it can be cleanly applied to the 5.10 branch.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
From: Jakub Kicinski kuba@kernel.org
commit 2b446e650b418f9a9e75f99852e2f2560cabfa17 upstream.
Explain the two basic flows of struct net_device's operation.
Signed-off-by: Jakub Kicinski kuba@kernel.org --- Documentation/networking/netdevices.rst | 171 +++++++++++++++++++++++- net/core/rtnetlink.c | 2 +- 2 files changed, 166 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index e65665c5ab50..17bdcb746dcf 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -10,18 +10,177 @@ Introduction The following is a random collection of documentation regarding network devices.
-struct net_device allocation rules -================================== +struct net_device lifetime rules +================================ Network device structures need to persist even after module is unloaded and must be allocated with alloc_netdev_mqs() and friends. If device has registered successfully, it will be freed on last use -by free_netdev(). This is required to handle the pathologic case cleanly -(example: rmmod mydriver </sys/class/net/myeth/mtu ) +by free_netdev(). This is required to handle the pathological case cleanly +(example: ``rmmod mydriver </sys/class/net/myeth/mtu``)
-alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver +alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver private data which gets freed when the network device is freed. If separately allocated data is attached to the network device -(netdev_priv(dev)) then it is up to the module exit handler to free that. +(netdev_priv()) then it is up to the module exit handler to free that. + +There are two groups of APIs for registering struct net_device. +First group can be used in normal contexts where ``rtnl_lock`` is not already +held: register_netdev(), unregister_netdev(). +Second group can be used when ``rtnl_lock`` is already held: +register_netdevice(), unregister_netdevice(), free_netdevice(). + +Simple drivers +-------------- + +Most drivers (especially device drivers) handle lifetime of struct net_device +in context where ``rtnl_lock`` is not held (e.g. driver probe and remove paths). + +In that case the struct net_device registration is done using +the register_netdev(), and unregister_netdev() functions: + +.. code-block:: c + + int probe() + { + struct my_device_priv *priv; + int err; + + dev = alloc_netdev_mqs(...); + if (!dev) + return -ENOMEM; + priv = netdev_priv(dev); + + /* ... do all device setup before calling register_netdev() ... + */ + + err = register_netdev(dev); + if (err) + goto err_undo; + + /* net_device is visible to the user! */ + + err_undo: + /* ... undo the device setup ... */ + free_netdev(dev); + return err; + } + + void remove() + { + unregister_netdev(dev); + free_netdev(dev); + } + +Note that after calling register_netdev() the device is visible in the system. +Users can open it and start sending / receiving traffic immediately, +or run any other callback, so all initialization must be done prior to +registration. + +unregister_netdev() closes the device and waits for all users to be done +with it. The memory of struct net_device itself may still be referenced +by sysfs but all operations on that device will fail. + +free_netdev() can be called after unregister_netdev() returns on when +register_netdev() failed. + +Device management under RTNL +---------------------------- + +Registering struct net_device while in context which already holds +the ``rtnl_lock`` requires extra care. In those scenarios most drivers +will want to make use of struct net_device's ``needs_free_netdev`` +and ``priv_destructor`` members for freeing of state. + +Example flow of netdev handling under ``rtnl_lock``: + +.. code-block:: c + + static void my_setup(struct net_device *dev) + { + dev->needs_free_netdev = true; + } + + static void my_destructor(struct net_device *dev) + { + some_obj_destroy(priv->obj); + some_uninit(priv); + } + + int create_link() + { + struct my_device_priv *priv; + int err; + + ASSERT_RTNL(); + + dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup); + if (!dev) + return -ENOMEM; + priv = netdev_priv(dev); + + /* Implicit constructor */ + err = some_init(priv); + if (err) + goto err_free_dev; + + priv->obj = some_obj_create(); + if (!priv->obj) { + err = -ENOMEM; + goto err_some_uninit; + } + /* End of constructor, set the destructor: */ + dev->priv_destructor = my_destructor; + + err = register_netdevice(dev); + if (err) + /* register_netdevice() calls destructor on failure */ + goto err_free_dev; + + /* If anything fails now unregister_netdevice() (or unregister_netdev()) + * will take care of calling my_destructor and free_netdev(). + */ + + return 0; + + err_some_uninit: + some_uninit(priv); + err_free_dev: + free_netdev(dev); + return err; + } + +If struct net_device.priv_destructor is set it will be called by the core +some time after unregister_netdevice(), it will also be called if +register_netdevice() fails. The callback may be invoked with or without +``rtnl_lock`` held. + +There is no explicit constructor callback, driver "constructs" the private +netdev state after allocating it and before registration. + +Setting struct net_device.needs_free_netdev makes core call free_netdevice() +automatically after unregister_netdevice() when all references to the device +are gone. It only takes effect after a successful call to register_netdevice() +so if register_netdevice() fails driver is responsible for calling +free_netdev(). + +free_netdev() is safe to call on error paths right after unregister_netdevice() +or when register_netdevice() fails. Parts of netdev (de)registration process +happen after ``rtnl_lock`` is released, therefore in those cases free_netdev() +will defer some of the processing until ``rtnl_lock`` is released. + +Devices spawned from struct rtnl_link_ops should never free the +struct net_device directly. + +.ndo_init and .ndo_uninit +~~~~~~~~~~~~~~~~~~~~~~~~~ + +``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device +registration and de-registration, under ``rtnl_lock``. Drivers can use +those e.g. when parts of their init process need to run under ``rtnl_lock``. + +``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit`` +runs during de-registering after device is closed but other subsystems +may still have outstanding references to the netdevice.
MTU === diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index bb0596c41b3e..79f514afb17d 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3441,7 +3441,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
if (ops->newlink) { err = ops->newlink(link_net ? : net, dev, tb, data, extack); - /* Drivers should call free_netdev() in ->destructor + /* Drivers should set dev->needs_free_netdev * and unregister it on failure after registration * so that device could be finally freed in rtnl_unlock. */
On Thu, Jul 14, 2022 at 07:11:32PM +0300, Fedor Pchelkin wrote:
From: Jakub Kicinski kuba@kernel.org
commit 2b446e650b418f9a9e75f99852e2f2560cabfa17 upstream.
Explain the two basic flows of struct net_device's operation.
Signed-off-by: Jakub Kicinski kuba@kernel.org
If you pass on patches to others, you also must sign off on them.
Please do so for all of the patches in this series.
And why is a documentation patch needed for stable?
thanks,
greg k-h
If you pass on patches to others, you also must sign off on them.
Sorry. I've fixed that for all the patches.
I also found out that the previous series was incomplete. It solved the use-after-free bug but produced a new one. It turns out that several more patches are required in order to fully solve this problem.
And why is a documentation patch needed for stable?
I think the documentation patch is needed because it safely fixes a docs file section on the issue and a comment fragment. It is important for consistency because that docs patch describes the problem and gives more clarity to the code, and it was included in the original patch series as well.
Fedor
From: Jakub Kicinski kuba@kernel.org
commit c269a24ce057abfc31130960e96ab197ef6ab196 upstream.
There are two flavors of handling netdev registration: - ones called without holding rtnl_lock: register_netdev() and unregister_netdev(); and - those called with rtnl_lock held: register_netdevice() and unregister_netdevice().
While the semantics of the former are pretty clear, the same can't be said about the latter. The netdev_todo mechanism is utilized to perform some of the device unregistering tasks and it hooks into rtnl_unlock() so the locked variants can't actually finish the work. In general free_netdev() does not mix well with locked calls. Most drivers operating under rtnl_lock set dev->needs_free_netdev to true and expect core to make the free_netdev() call some time later.
The part where this becomes most problematic is error paths. There is no way to unwind the state cleanly after a call to register_netdevice(), since unreg can't be performed fully without dropping locks.
Make free_netdev() more lenient, and defer the freeing if device is being unregistered. This allows error paths to simply call free_netdev() both after register_netdevice() failed, and after a call to unregister_netdevice() but before dropping rtnl_lock.
Simplify the error paths which are currently doing gymnastics around free_netdev() handling.
Signed-off-by: Jakub Kicinski kuba@kernel.org --- net/8021q/vlan.c | 4 +--- net/core/dev.c | 11 +++++++++++ net/core/rtnetlink.c | 23 ++++++----------------- 3 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 15bbfaf943fd..8b644113715e 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -284,9 +284,7 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id) return 0;
out_free_newdev: - if (new_dev->reg_state == NETREG_UNINITIALIZED || - new_dev->reg_state == NETREG_UNREGISTERED) - free_netdev(new_dev); + free_netdev(new_dev); return err; }
diff --git a/net/core/dev.c b/net/core/dev.c index 8fa739259041..adde93cbca9f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10631,6 +10631,17 @@ void free_netdev(struct net_device *dev) struct napi_struct *p, *n;
might_sleep(); + + /* When called immediately after register_netdevice() failed the unwind + * handling may still be dismantling the device. Handle that case by + * deferring the free. + */ + if (dev->reg_state == NETREG_UNREGISTERING) { + ASSERT_RTNL(); + dev->needs_free_netdev = true; + return; + } + netif_free_tx_queues(dev); netif_free_rx_queues(dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 79f514afb17d..3d6ab194d0f5 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3439,26 +3439,15 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
dev->ifindex = ifm->ifi_index;
- if (ops->newlink) { + if (ops->newlink) err = ops->newlink(link_net ? : net, dev, tb, data, extack); - /* Drivers should set dev->needs_free_netdev - * and unregister it on failure after registration - * so that device could be finally freed in rtnl_unlock. - */ - if (err < 0) { - /* If device is not registered at all, free it now */ - if (dev->reg_state == NETREG_UNINITIALIZED || - dev->reg_state == NETREG_UNREGISTERED) - free_netdev(dev); - goto out; - } - } else { + else err = register_netdevice(dev); - if (err < 0) { - free_netdev(dev); - goto out; - } + if (err < 0) { + free_netdev(dev); + goto out; } + err = rtnl_configure_link(dev, ifm); if (err < 0) goto out_unregister;
From: Jakub Kicinski kuba@kernel.org
commit 766b0515d5bec4b780750773ed3009b148df8c0a upstream.
If register_netdevice() fails at the very last stage - the notifier call - some subsystems may have already seen it and grabbed a reference. struct net_device can't be freed right away without calling netdev_wait_all_refs().
Now that we have a clean interface in form of dev->needs_free_netdev and lenient free_netdev() we can undo what commit 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.") has done and complete the unregistration path by bringing the net_set_todo() call back.
After registration fails user is still expected to explicitly free the net_device, so make sure ->needs_free_netdev is cleared, otherwise rolling back the registration will cause the old double free for callers who release rtnl_lock before the free.
This also solves the problem of priv_destructor not being called on notifier error.
net_set_todo() will be moved back into unregister_netdevice_queue() in a follow up.
Reported-by: Hulk Robot hulkci@huawei.com Reported-by: Yang Yingliang yangyingliang@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org --- net/core/dev.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index adde93cbca9f..0071a11a6dc3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10077,17 +10077,11 @@ int register_netdevice(struct net_device *dev) ret = call_netdevice_notifiers(NETDEV_REGISTER, dev); ret = notifier_to_errno(ret); if (ret) { + /* Expect explicit free_netdev() on failure */ + dev->needs_free_netdev = false; rollback_registered(dev); - rcu_barrier(); - - dev->reg_state = NETREG_UNREGISTERED; - /* We should put the kobject that hold in - * netdev_unregister_kobject(), otherwise - * the net device cannot be freed when - * driver calls free_netdev(), because the - * kobject is being hold. - */ - kobject_put(&dev->dev.kobj); + net_set_todo(dev); + goto out; } /* * Prevent userspace races by waiting until the network
linux-stable-mirror@lists.linaro.org