On 23/11/2021 13:33, Nikolay Aleksandrov wrote:
On 23/11/2021 13:09, Ido Schimmel wrote:
On Tue, Nov 23, 2021 at 12:27:19PM +0200, Nikolay Aleksandrov wrote:
From: Nikolay Aleksandrov <nikolay@nvidia.com>
When we try to add an IPv6 nexthop and IPv6 is not enabled (!CONFIG_IPV6), we hit a NULL pointer dereference[1] in the error path of nh_create_ipv6() because it calls ipv6_stub->fib6_nh_release. The bug has been present since the beginning of IPv6 nexthop gateway support. Commit 1aefd3de7bc6 ("ipv6: Add fib6_nh_init and release to stubs") tells us that only fib6_nh_init has a dummy stub, because fib6_nh_release should not be called if fib6_nh_init returns an error, but the commit below added a call to ipv6_stub->fib6_nh_release in its error path. To fix it, return the dummy stub's -EAFNOSUPPORT error directly, without calling ipv6_stub->fib6_nh_release in nh_create_ipv6()'s error path.
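To make the failure mode concrete, here is a minimal userspace sketch of the situation described above. It is only a model under stated assumptions: model_stub, model_nh, dummy_nh_init and model_nh_create are hypothetical names, not the kernel's ipv6_stub or its types. It shows an ops table where only init has a dummy, and a caller that skips the release hook on -EAFNOSUPPORT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's real types. */
struct model_nh {
	int initialized;
};

struct model_stub {
	int (*nh_init)(struct model_nh *nh);
	void (*nh_release)(struct model_nh *nh); /* NULL when IPv6 is disabled */
};

/* Dummy init used when IPv6 is compiled out: it always fails. */
static int dummy_nh_init(struct model_nh *nh)
{
	(void)nh;
	return -EAFNOSUPPORT;
}

/* Only init has a dummy; release is left NULL, as the commit message says. */
static const struct model_stub disabled_stub = {
	.nh_init = dummy_nh_init,
	.nh_release = NULL,
};

/* Same shape as the proposed fix: skip release on -EAFNOSUPPORT. */
static int model_nh_create(const struct model_stub *stub, struct model_nh *nh)
{
	int err;

	assert(stub->nh_init != NULL); /* init always has at least a dummy */
	err = stub->nh_init(nh);
	if (err) {
		/* IPv6 is not enabled, so there is no release hook to call. */
		if (err == -EAFNOSUPPORT)
			goto out;
		stub->nh_release(nh);
	}
out:
	return err;
}
```

Calling model_nh_create(&disabled_stub, &nh) returns -EAFNOSUPPORT without touching the NULL nh_release pointer. Without the early goto, the same call would dereference NULL.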
[...]
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index a69a9e76f99f..5dbd4b5505eb 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -2565,11 +2565,15 @@ static int nh_create_ipv6(struct net *net, struct nexthop *nh,
 	/* sets nh_dev if successful */
 	err = ipv6_stub->fib6_nh_init(net, fib6_nh, &fib6_cfg, GFP_KERNEL,
 				      extack);
-	if (err)
+	if (err) {
+		/* IPv6 is not enabled, don't call fib6_nh_release */
+		if (err == -EAFNOSUPPORT)
+			goto out;
 		ipv6_stub->fib6_nh_release(fib6_nh);
Is the call actually necessary? If fib6_nh_init() failed, then I believe it should clean up after itself and not rely on fib6_nh_release().
I think it doesn't do that, or at least not entirely. For example, take the following sequence of events:

fib6_nh_init:
 ...
	err = fib_nh_common_init(net, &fib6_nh->nh_common, cfg->fc_encap,
				 cfg->fc_encap_type, cfg, gfp_flags, extack);
(passes)

then after:

	fib6_nh->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, gfp_flags);
	if (!fib6_nh->rt6i_pcpu) {
		err = -ENOMEM;
		goto out;
	}
(fails)
I don't see anything in the error path that would free the fib_nh_common_init() resources, i.e. nothing calls fib_nh_common_release(), which is called by fib6_nh_release().
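The sequence above can be modelled in userspace to show where the resources go. This is a rough sketch only: leak_nh, leak_nh_init, leak_nh_release and the live_allocs counter are hypothetical stand-ins, with malloc() playing the role of both fib_nh_common_init() and alloc_percpu_gfp(). The init function deliberately mirrors the problem: its error path does not undo the first step.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs; /* allocations that have not been freed yet */

struct leak_nh {
	void *common; /* stands in for fib_nh_common_init() resources */
	void *pcpu;   /* stands in for the rt6i_pcpu allocation */
};

/* fail_pcpu forces the second step to fail, like a failed per-CPU alloc. */
static int leak_nh_init(struct leak_nh *nh, int fail_pcpu)
{
	nh->pcpu = NULL;
	nh->common = malloc(16);
	if (!nh->common)
		return -ENOMEM;
	live_allocs++;

	if (fail_pcpu)
		return -ENOMEM; /* error path does not undo the first step */

	nh->pcpu = malloc(16);
	if (!nh->pcpu)
		return -ENOMEM;
	live_allocs++;
	return 0;
}

/* Counterpart of the release call: frees whatever init managed to set up. */
static void leak_nh_release(struct leak_nh *nh)
{
	if (nh->common) {
		free(nh->common);
		nh->common = NULL;
		live_allocs--;
	}
	if (nh->pcpu) {
		free(nh->pcpu);
		nh->pcpu = NULL;
		live_allocs--;
	}
}
```

In this model, after leak_nh_init(&nh, 1) fails, live_allocs stays at 1 until the caller invokes leak_nh_release(). That is the reason the release call cannot simply be dropped from the caller's error path.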
By the way, I haven't checked but it looks like fib_check_nh_v6_gw() might leak memory if fib6_nh_init() fails like that unless I'm missing something.
Making fib6_nh_init() clean up after itself might be doable, but it is much riskier because at least one call site relies on fib6_info_release() -> fib6_info_destroy_rcu() to call fib6_nh_release() in its error path.
I'd prefer to fix these bugs in a straightforward way and would go with the bigger change for fib6_nh_init() cleanup for net-next. WDYT?
Cheers, Nik
Just to let everyone know, Ido and I had a quick offline discussion about the issue. I'll try to untangle the call sites that have different cleanup expectations of fib6_nh_init() and make it clean up after itself, as that would automatically fix more bugs (e.g. the memory leak I mentioned earlier). If the change is too risky or becomes bigger than expected, we can always continue with the simpler fixes for -net and clean it all up in net-next.
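For illustration, the "clean up after itself" direction could look roughly like the usual goto-unwind pattern. This is a userspace sketch and not the actual change: sc_nh, sc_nh_init, sc_nh_release and sc_live_allocs are hypothetical names, and malloc() stands in for the real allocations. The release function is kept safe to call after a failed init, on the assumption that some call sites would still reach it from their own error paths.

```c
#include <errno.h>
#include <stdlib.h>

static int sc_live_allocs; /* allocations that have not been freed yet */

struct sc_nh {
	void *common;
	void *pcpu;
};

/* On failure, init undoes its own earlier steps before returning. */
static int sc_nh_init(struct sc_nh *nh, int fail_pcpu)
{
	int err;

	nh->pcpu = NULL;
	nh->common = malloc(16);
	if (!nh->common)
		return -ENOMEM;
	sc_live_allocs++;

	nh->pcpu = fail_pcpu ? NULL : malloc(16);
	if (!nh->pcpu) {
		err = -ENOMEM;
		goto err_free_common;
	}
	sc_live_allocs++;
	return 0;

err_free_common:
	free(nh->common);
	nh->common = NULL; /* keeps a later release call harmless */
	sc_live_allocs--;
	return err;
}

/* Safe after both successful and failed init: NULL members are skipped. */
static void sc_nh_release(struct sc_nh *nh)
{
	if (nh->common) {
		free(nh->common);
		nh->common = NULL;
		sc_live_allocs--;
	}
	if (nh->pcpu) {
		free(nh->pcpu);
		nh->pcpu = NULL;
		sc_live_allocs--;
	}
}
```

With this shape, a failed sc_nh_init() leaves nothing allocated, and a caller that still calls sc_nh_release() afterwards does not double free. The risk discussed earlier in the thread is exactly about callers whose expectations differ from this.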
I'll update the thread soon.
Thanks, Nik