From: msizanoen1 msizanoen@qtmlabs.xyz
The kernel leaks memory when a `fib` rule is present in IPv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not decreasing the refcount when needed.
How to reproduce: - Add the following nftables rule to a prerouting chain: meta nfproto ipv6 fib saddr . mark . iif oif missing drop This can be done with: sudo nft create table inet test sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }' sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop - Run: sudo ip -6 rule add table main suppress_prefixlength 0 - Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol specific `suppress` function, and check the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount, like this.
[1]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266... [2]: https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266...
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105 Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com --- The original author of this commit and commit message is anonymous and is therefore unable to sign off on it. Greg suggested that I do the sign off, extracting it from the bugzilla entry above, and post it properly. The patch "seems to work" on first glance, but I haven't looked deeply at it yet and therefore it doesn't have my Reviewed-by, even though I'm submitting this patch on the author's behalf. And it should probably get a good look from the v6 fib folks. The original author should be on this thread to address issues that come off, and I'll shephard additional versions that he has.
include/net/fib_rules.h | 4 +++- net/core/fib_rules.c | 2 +- net/ipv4/fib_rules.c | 1 + net/ipv6/fib6_rules.c | 4 ++-- 4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 4b10676c69d1..bd07484ab9dd 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -69,7 +69,7 @@ struct fib_rules_ops { int (*action)(struct fib_rule *, struct flowi *, int, struct fib_lookup_arg *); - bool (*suppress)(struct fib_rule *, + bool (*suppress)(struct fib_rule *, int, struct fib_lookup_arg *); int (*match)(struct fib_rule *, struct flowi *, int); @@ -218,7 +218,9 @@ INDIRECT_CALLABLE_DECLARE(int fib4_rule_action(struct fib_rule *rule, struct fib_lookup_arg *arg));
INDIRECT_CALLABLE_DECLARE(bool fib6_rule_suppress(struct fib_rule *rule, + int flags, struct fib_lookup_arg *arg)); INDIRECT_CALLABLE_DECLARE(bool fib4_rule_suppress(struct fib_rule *rule, + int flags, struct fib_lookup_arg *arg)); #endif diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 79df7cd9dbc1..1bb567a3b329 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -323,7 +323,7 @@ int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl, if (!err && ops->suppress && INDIRECT_CALL_MT(ops->suppress, fib6_rule_suppress, fib4_rule_suppress, - rule, arg)) + rule, flags, arg)) continue;
if (err != -EAGAIN) { diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c index ce54a30c2ef1..364ad3446b2f 100644 --- a/net/ipv4/fib_rules.c +++ b/net/ipv4/fib_rules.c @@ -141,6 +141,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_action(struct fib_rule *rule, }
INDIRECT_CALLABLE_SCOPE bool fib4_rule_suppress(struct fib_rule *rule, + int flags, struct fib_lookup_arg *arg) { struct fib_result *result = (struct fib_result *) arg->result; diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c index 40f3e4f9f33a..dcedfe29d9d9 100644 --- a/net/ipv6/fib6_rules.c +++ b/net/ipv6/fib6_rules.c @@ -267,6 +267,7 @@ INDIRECT_CALLABLE_SCOPE int fib6_rule_action(struct fib_rule *rule, }
INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule, + int flags, struct fib_lookup_arg *arg) { struct fib6_result *res = arg->result; @@ -294,8 +295,7 @@ INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule, return false;
suppress_route: - if (!(arg->flags & FIB_LOOKUP_NOREF)) - ip6_rt_put(rt); + ip6_rt_put_flags(rt, flags); return true; }
On 11/23/21 19:48, Jason A. Donenfeld wrote:
From: msizanoen1 msizanoen@qtmlabs.xyz
The kernel leaks memory when a `fib` rule is present in IPv6 nftables firewall rules and a suppress_prefix rule is present in the IPv6 routing rules (used by certain tools such as wg-quick). In such scenarios, every incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked down the issue to ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have `FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag `RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not decreasing the refcount when needed.
How to reproduce:
- Add the following nftables rule to a prerouting chain: meta nfproto ipv6 fib saddr . mark . iif oif missing drop This can be done with: sudo nft create table inet test sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }' sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
- Run: sudo ip -6 rule add table main suppress_prefixlength 0
- Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol specific `suppress` function, and check the protocol-specific `flags` argument for RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when decreasing the refcount, like this.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105 Fixes: ca7a03c41753 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com
The original author of this commit and commit message is anonymous and is therefore unable to sign off on it. Greg suggested that I do the sign off, extracting it from the bugzilla entry above, and post it properly. The patch "seems to work" on first glance, but I haven't looked deeply at it yet and therefore it doesn't have my Reviewed-by, even though I'm submitting this patch on the author's behalf. And it should probably get a good look from the v6 fib folks. The original author should be on this thread to address issues that come off, and I'll shephard additional versions that he has.
This patch has been running on my personal laptop since I debugged the issue, so you can also add a `Tested-by: msizanoen@qtmlabs.xyz`.
include/net/fib_rules.h | 4 +++- net/core/fib_rules.c | 2 +- net/ipv4/fib_rules.c | 1 + net/ipv6/fib6_rules.c | 4 ++-- 4 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 4b10676c69d1..bd07484ab9dd 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -69,7 +69,7 @@ struct fib_rules_ops { int (*action)(struct fib_rule *, struct flowi *, int, struct fib_lookup_arg *);
- bool (*suppress)(struct fib_rule *,
- bool (*suppress)(struct fib_rule *, int, struct fib_lookup_arg *); int (*match)(struct fib_rule *, struct flowi *, int);
@@ -218,7 +218,9 @@ INDIRECT_CALLABLE_DECLARE(int fib4_rule_action(struct fib_rule *rule, struct fib_lookup_arg *arg)); INDIRECT_CALLABLE_DECLARE(bool fib6_rule_suppress(struct fib_rule *rule,
INDIRECT_CALLABLE_DECLARE(bool fib4_rule_suppress(struct fib_rule *rule,int flags, struct fib_lookup_arg *arg));
#endifint flags, struct fib_lookup_arg *arg));
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 79df7cd9dbc1..1bb567a3b329 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -323,7 +323,7 @@ int fib_rules_lookup(struct fib_rules_ops *ops, struct flowi *fl, if (!err && ops->suppress && INDIRECT_CALL_MT(ops->suppress, fib6_rule_suppress, fib4_rule_suppress,
rule, arg))
rule, flags, arg)) continue;
if (err != -EAGAIN) { diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c index ce54a30c2ef1..364ad3446b2f 100644 --- a/net/ipv4/fib_rules.c +++ b/net/ipv4/fib_rules.c @@ -141,6 +141,7 @@ INDIRECT_CALLABLE_SCOPE int fib4_rule_action(struct fib_rule *rule, } INDIRECT_CALLABLE_SCOPE bool fib4_rule_suppress(struct fib_rule *rule,
{ struct fib_result *result = (struct fib_result *) arg->result;int flags, struct fib_lookup_arg *arg)
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c index 40f3e4f9f33a..dcedfe29d9d9 100644 --- a/net/ipv6/fib6_rules.c +++ b/net/ipv6/fib6_rules.c @@ -267,6 +267,7 @@ INDIRECT_CALLABLE_SCOPE int fib6_rule_action(struct fib_rule *rule, } INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule,
{ struct fib6_result *res = arg->result;int flags, struct fib_lookup_arg *arg)
@@ -294,8 +295,7 @@ INDIRECT_CALLABLE_SCOPE bool fib6_rule_suppress(struct fib_rule *rule, return false; suppress_route:
- if (!(arg->flags & FIB_LOOKUP_NOREF))
ip6_rt_put(rt);
- ip6_rt_put_flags(rt, flags); return true; }
On Tue, 23 Nov 2021 13:48:32 +0100 Jason A. Donenfeld wrote:
The original author of this commit and commit message is anonymous and is therefore unable to sign off on it. Greg suggested that I do the sign off, extracting it from the bugzilla entry above, and post it properly. The patch "seems to work" on first glance, but I haven't looked deeply at it yet and therefore it doesn't have my Reviewed-by, even though I'm submitting this patch on the author's behalf. And it should probably get a good look from the v6 fib folks. The original author should be on this thread to address issues that come off, and I'll shephard additional versions that he has.
Does the fact that the author responded to the patch undermine the need for this special handling?
On Tue, Nov 23, 2021 at 08:03:47PM -0800, Jakub Kicinski wrote:
On Tue, 23 Nov 2021 13:48:32 +0100 Jason A. Donenfeld wrote:
The original author of this commit and commit message is anonymous and is therefore unable to sign off on it. Greg suggested that I do the sign off, extracting it from the bugzilla entry above, and post it properly. The patch "seems to work" on first glance, but I haven't looked deeply at it yet and therefore it doesn't have my Reviewed-by, even though I'm submitting this patch on the author's behalf. And it should probably get a good look from the v6 fib folks. The original author should be on this thread to address issues that come off, and I'll shephard additional versions that he has.
Does the fact that the author responded to the patch undermine the need for this special handling?
Unless the author wishes to use their real name, sadly it does not :(
linux-stable-mirror@lists.linaro.org