On Friday 09/05 at 10:25 -0700, Breno Leitao wrote:
commit efa95b01da18 ("netpoll: fix use after free") incorrectly ignored the refcount and prematurely set dev->npinfo to NULL during netpoll cleanup, leading to improper behavior and memory leaks.
Scenario causing lack of proper cleanup:
A netpoll is associated with a NIC (e.g., eth0) and netdev->npinfo is allocated, and refcnt = 1
- Keep in mind that npinfo is shared among all netpoll instances. In this case, there is just one.
Another netpoll is also associated with the same NIC and npinfo->refcnt += 1.
- Now dev->npinfo->refcnt = 2;
- There is just one npinfo associated to the netdev.
When the first netpolls goes to clean up:
- The first cleanup succeeds and clears np->dev->npinfo, ignoring refcnt.
- It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
- Set dev->npinfo = NULL, without proper cleanup
- No ->ndo_netpoll_cleanup() is either called
Now the second target tries to clean up
- The second cleanup fails because np->dev->npinfo is already NULL.
- In this case, ops->ndo_netpoll_cleanup() was never called, and the skb pool is not cleaned as well (for the second netpoll instance)
- This leaks npinfo and skbpool skbs, which is clearly reported by kmemleak.
Revert commit efa95b01da18 ("netpoll: fix use after free") and adds clarifying comments emphasizing that npinfo cleanup should only happen once the refcount reaches zero, ensuring stable and correct netpoll behavior.
This makes sense to me.
Just curious, did you try the original OOPS reproducer? https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.140485...
I wonder if there might be a demon lurking in bonding+netpoll that this was papering over? Not a reason not to fix the leaks IMO, I'm just curious, I don't want to spend time on it if you already did :)
The discussion on v1 isn't enlightening either: https://lore.kernel.org/lkml/0f692012238337f2c40893319830ae042523ce18.140417...
Thanks, Calvin
Cc: stable@vger.kernel.org Cc: jv@jvosburgh.net Fixes: efa95b01da18 ("netpoll: fix use after free") Signed-off-by: Breno Leitao leitao@debian.org
net/core/netpoll.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 5f65b62346d4e..19676cd379640 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np) if (!npinfo) return;
- /* At this point, there is a single npinfo instance per netdevice, and
* its refcnt tracks how many netpoll structures are linked to it. We
* only perform npinfo cleanup when the refcnt decrements to zero.
if (refcount_dec_and_test(&npinfo->refcnt)) { const struct net_device_ops *ops;*/
@@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np) RCU_INIT_POINTER(np->dev->npinfo, NULL); call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
- } else
RCU_INIT_POINTER(np->dev->npinfo, NULL);
- }
skb_pool_flush(np); }
-- 2.47.3