As pointed out by Al Viro and Eric Dumazet in [1], using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns as it is usually done. This could cause unexpected issues when other operations are done on the wrong netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' or 'pernet' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly in more places, but that would increase the size of this fix to replace all accesses via 'net'. Probably best to avoid that for fixes.
Patches 2-9 remove access of net via current->nsproxy in sysfs handlers in MPTCP, SCTP and RDS. There are multiple patches doing almost the same thing, but the reason is to ease the backports.
Patch 1 is not directly linked to this, but it is a small fix for MPTCP available_schedulers sysctl knob to explicitly mark it as read-only.
Please note that this series does not address Al's comment [2]. In SCTP, some sysctl knobs set other sysfs-exposed variables for the min/max: two processes could then write two linked values at the same time, resulting in new values being outside the new boundaries. It would be great if SCTP developers can look at this problem.
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Link: https://lore.kernel.org/netdev/20250105211158.GL1977892@ZenIV/ [2] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Matthieu Baerts (NGI0) (9): mptcp: sysctl: avail sched: remove write access mptcp: sysctl: sched: avoid using current->nsproxy mptcp: sysctl: blackhole timeout: avoid using current->nsproxy sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy sctp: sysctl: rto_min/max: avoid using current->nsproxy sctp: sysctl: auth_enable: avoid using current->nsproxy sctp: sysctl: udp_port: avoid using current->nsproxy sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy
net/mptcp/ctrl.c | 17 +++++++++-------- net/rds/tcp.c | 39 ++++++++++++++++++++++++++++++++------- net/sctp/sysctl.c | 14 ++++++++------ 3 files changed, 49 insertions(+), 21 deletions(-) --- base-commit: db78475ba0d3c66d430f7ded2388cc041078a542 change-id: 20250108-net-sysctl-current-nsproxy-672ae21a873f
Best regards,
'net.mptcp.available_schedulers' sysctl knob is there to list available schedulers, not to modify this list.
There are then no reasons to give write access to it.
Nothing would have been written anyway, but no errors would have been returned, which is unexpected.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/ctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index 38d8121331d4a981d4a60ebd8f6cd9482fc2b50c..d9b57fab2a13e64b6c8585e821ed5212f59f8651 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -228,7 +228,7 @@ static struct ctl_table mptcp_sysctl_table[] = { { .procname = "available_schedulers", .maxlen = MPTCP_SCHED_BUF_MAX, - .mode = 0644, + .mode = 0444, .proc_handler = proc_available_schedulers, }, {
On Wed, Jan 8, 2025 at 4:35 PM Matthieu Baerts (NGI0) matttbe@kernel.org wrote:
'net.mptcp.available_schedulers' sysctl knob is there to list available schedulers, not to modify this list.
There are then no reasons to give write access to it.
Nothing would have been written anyway, but no errors would have been returned, which is unexpected.
Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Reviewed-by: Eric Dumazet edumazet@google.com
Using the 'net' structure via 'current' is not recommended for different reasons.
First, if the goal is to use it to read or write per-netns data, this is inconsistent with how the "generic" sysctl entries are doing: directly by only using pointers set to the table entry, e.g. table->data. Linked to that, the per-netns data should always be obtained from the table linked to the netns it had been created for, which may not coincide with the reader's or writer's netns.
Another reason is that access to current->nsproxy->netns can oops if attempted when current->nsproxy had been dropped when the current task is exiting. This is what syzbot found, when using acct(2):
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601 __kernel_write_iter+0x318/0xa80 fs/read_write.c:612 __kernel_write+0xf6/0x140 fs/read_write.c:632 do_acct_process+0xcb0/0x14a0 kernel/acct.c:539 acct_pin_kill+0x2d/0x100 kernel/acct.c:192 pin_kill+0x194/0x7c0 fs/fs_pin.c:44 mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81 cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366 task_work_run+0x14e/0x250 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xad8/0x2d70 kernel/exit.c:938 do_group_exit+0xd3/0x2a0 kernel/exit.c:1087 get_signal+0x2576/0x2610 kernel/signal.c:3017 arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop kernel/entry/common.c:111 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218 do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fee3cb87a6a Code: Unable to access opcode bytes at 0x7fee3cb87a40. RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037 RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7 R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500 R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125 Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00 RSP: 0018:ffffc900034774e8 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620 RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028 RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040 R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000 R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ---------------- Code disassembly (best guess), 1 bytes skipped: 0: 42 80 3c 38 00 cmpb $0x0,(%rax,%r15,1) 5: 0f 85 fe 02 00 00 jne 0x309 b: 4d 8b a4 24 08 09 00 mov 0x908(%r12),%r12 12: 00 13: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 1a: fc ff df 1d: 49 8d 7c 24 28 lea 0x28(%r12),%rdi 22: 48 89 fa mov %rdi,%rdx 25: 48 c1 ea 03 shr $0x3,%rdx * 29: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction 2d: 0f 85 cc 02 00 00 jne 0x2ff 33: 4d 8b 7c 24 28 mov 0x28(%r12),%r15 38: 48 rex.W 39: 8d .byte 0x8d 3a: 84 24 c8 test %ah,(%rax,%rcx,8)
Here with 'net.mptcp.scheduler', the 'net' structure is not really needed, because the table->data already has a pointer to the current scheduler, the only thing needed from the per-netns data. Simply use 'data', instead of getting (most of the time) the same thing, but from a longer and indirect way.
Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler") Cc: stable@vger.kernel.org Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com Suggested-by: Al Viro viro@zeniv.linux.org.uk Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/ctrl.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index d9b57fab2a13e64b6c8585e821ed5212f59f8651..81c30aa02196d69c55799e5963f6591e416c8831 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -102,16 +102,15 @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet) }
#ifdef CONFIG_SYSCTL -static int mptcp_set_scheduler(const struct net *net, const char *name) +static int mptcp_set_scheduler(char *scheduler, const char *name) { - struct mptcp_pernet *pernet = mptcp_get_pernet(net); struct mptcp_sched_ops *sched; int ret = 0;
rcu_read_lock(); sched = mptcp_sched_find(name); if (sched) - strscpy(pernet->scheduler, name, MPTCP_SCHED_NAME_MAX); + strscpy(scheduler, name, MPTCP_SCHED_NAME_MAX); else ret = -ENOENT; rcu_read_unlock(); @@ -122,7 +121,7 @@ static int mptcp_set_scheduler(const struct net *net, const char *name) static int proc_scheduler(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - const struct net *net = current->nsproxy->net_ns; + char (*scheduler)[MPTCP_SCHED_NAME_MAX] = ctl->data; char val[MPTCP_SCHED_NAME_MAX]; struct ctl_table tbl = { .data = val, @@ -130,11 +129,11 @@ static int proc_scheduler(const struct ctl_table *ctl, int write, }; int ret;
- strscpy(val, mptcp_get_scheduler(net), MPTCP_SCHED_NAME_MAX); + strscpy(val, *scheduler, MPTCP_SCHED_NAME_MAX);
ret = proc_dostring(&tbl, write, buffer, lenp, ppos); if (write && ret == 0) - ret = mptcp_set_scheduler(net, val); + ret = mptcp_set_scheduler(*scheduler, val);
return ret; }
As mentioned in the previous commit, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'pernet' structure can be obtained from the table->data using container_of().
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/ctrl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index 81c30aa02196d69c55799e5963f6591e416c8831..b0dd008e2114bce65ee3906bbdc19a5a4316cefa 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -160,7 +160,9 @@ static int proc_blackhole_detect_timeout(const struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct mptcp_pernet *pernet = mptcp_get_pernet(current->nsproxy->net_ns); + struct mptcp_pernet *pernet = container_of(table->data, + struct mptcp_pernet, + blackhole_timeout); int ret;
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly, as this is the only member needed from the 'net' structure, but that would increase the size of this fix, to use '*data' everywhere 'net->sctp.sctp_hmac_alg' is used.
Fixes: 3c68198e7511 ("sctp: Make hmac algorithm selection for cookie generation dynamic") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/sctp/sysctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index e5a5af343c4c98de1afb27359c104f5030583ac4..9848d19630a4f760238a3a2abd3ec823f012d34a 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -387,7 +387,8 @@ static struct ctl_table sctp_net_table[] = { static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, + sctp.sctp_hmac_alg); struct ctl_table tbl; bool changed = false; char *none = "none";
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly, as this is the only member needed from the 'net' structure, but that would increase the size of this fix, to use '*data' everywhere 'net->sctp.rto_min/max' is used.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/sctp/sysctl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 9848d19630a4f760238a3a2abd3ec823f012d34a..a5285815264dfa9d88d1d71244f309448e97a506 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -433,7 +433,7 @@ static int proc_sctp_do_hmac_alg(const struct ctl_table *ctl, int write, static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, sctp.rto_min); unsigned int min = *(unsigned int *) ctl->extra1; unsigned int max = *(unsigned int *) ctl->extra2; struct ctl_table tbl; @@ -461,7 +461,7 @@ static int proc_sctp_do_rto_min(const struct ctl_table *ctl, int write, static int proc_sctp_do_rto_max(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, sctp.rto_max); unsigned int min = *(unsigned int *) ctl->extra1; unsigned int max = *(unsigned int *) ctl->extra2; struct ctl_table tbl;
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly, but that would increase the size of this fix, while 'sctp.ctl_sock' still needs to be retrieved from 'net' structure.
Fixes: b14878ccb7fa ("net: sctp: cache auth_enable per endpoint") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/sctp/sysctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index a5285815264dfa9d88d1d71244f309448e97a506..9d29611621feaf0d2e8d7c923601ab374515563b 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -499,7 +499,7 @@ static int proc_sctp_do_alpha_beta(const struct ctl_table *ctl, int write, static int proc_sctp_do_auth(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, sctp.auth_enable); struct ctl_table tbl; int new_value, ret;
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly, but that would increase the size of this fix, while 'sctp.ctl_sock' still needs to be retrieved from 'net' structure.
Fixes: 046c052b475e ("sctp: enable udp tunneling socks") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/sctp/sysctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 9d29611621feaf0d2e8d7c923601ab374515563b..18fa4f44e8ec8c86f8415b1251ef8a2979c7f823 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -528,7 +528,7 @@ static int proc_sctp_do_auth(const struct ctl_table *ctl, int write, static int proc_sctp_do_udp_port(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, sctp.udp_port); unsigned int min = *(unsigned int *)ctl->extra1; unsigned int max = *(unsigned int *)ctl->extra2; struct ctl_table tbl;
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The 'net' structure can be obtained from the table->data using container_of().
Note that table->data could also be used directly, as this is the only member needed from the 'net' structure, but that would increase the size of this fix, to use '*data' everywhere 'net->sctp.probe_interval' is used.
Fixes: d1e462a7a5f3 ("sctp: add probe_interval in sysctl and sock/asoc/transport") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/sctp/sysctl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c index 18fa4f44e8ec8c86f8415b1251ef8a2979c7f823..8e1e97be4df79f3245e2bbbeb0a75841abc67f58 100644 --- a/net/sctp/sysctl.c +++ b/net/sctp/sysctl.c @@ -569,7 +569,8 @@ static int proc_sctp_do_udp_port(const struct ctl_table *ctl, int write, static int proc_sctp_do_probe_interval(const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos) { - struct net *net = current->nsproxy->net_ns; + struct net *net = container_of(ctl->data, struct net, + sctp.probe_interval); struct ctl_table tbl; int ret, new_value;
As mentioned in a previous commit of this series, using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns.
- current->nsproxy can be NULL in some cases, resulting in an 'Oops' (null-ptr-deref), e.g. when the current task is exiting, as spotted by syzbot [1] using acct(2).
The per-netns structure can be obtained from the table->data using container_of(), then the 'net' one can be retrieved from the listen socket (if available).
Fixes: c6a58ffed536 ("RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1] Suggested-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/rds/tcp.c | 39 ++++++++++++++++++++++++++++++++------- 1 file changed, 32 insertions(+), 7 deletions(-)
diff --git a/net/rds/tcp.c b/net/rds/tcp.c index 351ac1747224a3a1c8b0e297ba53cdbbcbc55401..0581c53e6517043ad6c2ad4207b26ab169989ed8 100644 --- a/net/rds/tcp.c +++ b/net/rds/tcp.c @@ -61,8 +61,10 @@ static atomic_t rds_tcp_unloading = ATOMIC_INIT(0);
static struct kmem_cache *rds_tcp_conn_slab;
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write, - void *buffer, size_t *lenp, loff_t *fpos); +static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write, + void *buffer, size_t *lenp, loff_t *fpos); +static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write, + void *buffer, size_t *lenp, loff_t *fpos);
static int rds_tcp_min_sndbuf = SOCK_MIN_SNDBUF; static int rds_tcp_min_rcvbuf = SOCK_MIN_RCVBUF; @@ -74,7 +76,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = { /* data is per-net pointer */ .maxlen = sizeof(int), .mode = 0644, - .proc_handler = rds_tcp_skbuf_handler, + .proc_handler = rds_tcp_sndbuf_handler, .extra1 = &rds_tcp_min_sndbuf, }, #define RDS_TCP_RCVBUF 1 @@ -83,7 +85,7 @@ static struct ctl_table rds_tcp_sysctl_table[] = { /* data is per-net pointer */ .maxlen = sizeof(int), .mode = 0644, - .proc_handler = rds_tcp_skbuf_handler, + .proc_handler = rds_tcp_rcvbuf_handler, .extra1 = &rds_tcp_min_rcvbuf, }, }; @@ -682,10 +684,10 @@ static void rds_tcp_sysctl_reset(struct net *net) spin_unlock_irq(&rds_tcp_conn_lock); }
-static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write, +static int rds_tcp_skbuf_handler(struct rds_tcp_net *rtn, + const struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *fpos) { - struct net *net = current->nsproxy->net_ns; int err;
err = proc_dointvec_minmax(ctl, write, buffer, lenp, fpos); @@ -694,11 +696,34 @@ static int rds_tcp_skbuf_handler(const struct ctl_table *ctl, int write, *(int *)(ctl->extra1)); return err; } - if (write) + + if (write && rtn->rds_tcp_listen_sock && rtn->rds_tcp_listen_sock->sk) { + struct net *net = sock_net(rtn->rds_tcp_listen_sock->sk); + rds_tcp_sysctl_reset(net); + } + return 0; }
+static int rds_tcp_sndbuf_handler(const struct ctl_table *ctl, int write, + void *buffer, size_t *lenp, loff_t *fpos) +{ + struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net, + sndbuf_size); + + return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos); +} + +static int rds_tcp_rcvbuf_handler(const struct ctl_table *ctl, int write, + void *buffer, size_t *lenp, loff_t *fpos) +{ + struct rds_tcp_net *rtn = container_of(ctl->data, struct rds_tcp_net, + rcvbuf_size); + + return rds_tcp_skbuf_handler(rtn, ctl, write, buffer, lenp, fpos); +} + static void rds_tcp_exit(void) { rds_tcp_set_unloading();
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Wed, 08 Jan 2025 16:34:28 +0100 you wrote:
As pointed out by Al Viro and Eric Dumazet in [1], using the 'net' structure via 'current' is not recommended for different reasons:
- Inconsistency: getting info from the reader's/writer's netns vs only from the opener's netns as it is usually done. This could cause unexpected issues when other operations are done on the wrong netns.
[...]
Here is the summary with links: - [net,1/9] mptcp: sysctl: avail sched: remove write access https://git.kernel.org/netdev/net/c/771ec78dc8b4 - [net,2/9] mptcp: sysctl: sched: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/d38e26e36206 - [net,3/9] mptcp: sysctl: blackhole timeout: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/92cf7a51bdae - [net,4/9] sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/ea62dd138391 - [net,5/9] sctp: sysctl: rto_min/max: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/9fc17b76fc70 - [net,6/9] sctp: sysctl: auth_enable: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/15649fd5415e - [net,7/9] sctp: sysctl: udp_port: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/c10377bbc197 - [net,8/9] sctp: sysctl: plpmtud_probe_interval: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/6259d2484d0c - [net,9/9] rds: sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy https://git.kernel.org/netdev/net/c/7f5611cbc487
You are awesome, thank you!
linux-stable-mirror@lists.linaro.org