This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.
Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is
mentioned as CVE-2024-54680, but is actually did not fix anything;
The issue can be reproduced on top of it. [0]
Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
Fix use-after-free of network namespace.") and introduced a real
issue by reviving the kernel TCP socket.
When a reconnect happens for a CIFS connection, the socket state
transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
in tcp_close() stops all timers for the socket.
If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.
Usually, FIN can be retransmitted by the peer, but if the peer aborts
the connection, the issue comes into reality.
I warned about this privately by pointing out the exact report [1],
but the bogus fix was finally merged.
So, we should not stop the timers to finally kill the connection on
our side in that case, meaning we must not use a kernel socket for
TCP whose sk->sk_net_refcnt is 0.
The kernel socket does not have a reference to its netns to make it
possible to tear down netns without cleaning up every resource in it.
For example, tunnel devices use a UDP socket internally, but we can
destroy netns without removing such devices and let it complete
during exit. Otherwise, netns would be leaked when the last application
died.
However, this is problematic for TCP sockets because TCP has timers to
close the connection gracefully even after the socket is close()d. The
lifetime of the socket and its netns is different from the lifetime of
the underlying connection.
If the socket user does not maintain the netns lifetime, the timer could
be fired after the socket is close()d and its netns is freed up, resulting
in use-after-free.
Actually, we have seen so many similar issues and converted such sockets
to have a reference to netns.
That's why I converted the CIFS client socket to have a reference to
netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope
of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right
fix**.
Regarding the LOCKDEP issue, we can prevent the module unload by
bumping the module refcount when switching the LOCKDDEP key in
sock_lock_init_class_and_name(). [2]
For a while, let's revert the bogus fix.
Note that now we can use sk_net_refcnt_upgrade() for the socket
conversion, but I'll do so later separately to make backport easy.
Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0]
Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1]
Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2]
Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod")
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Cc: stable(a)vger.kernel.org
---
fs/smb/client/connect.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 137a611c5ab0..989d8808260b 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -1073,13 +1073,9 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
msleep(125);
if (cifs_rdma_enabled(server))
smbd_destroy(server);
-
if (server->ssocket) {
sock_release(server->ssocket);
server->ssocket = NULL;
-
- /* Release netns reference for the socket. */
- put_net(cifs_net_ns(server));
}
if (!list_empty(&server->pending_mid_q)) {
@@ -1127,7 +1123,6 @@ clean_demultiplex_info(struct TCP_Server_Info *server)
*/
}
- /* Release netns reference for this server. */
put_net(cifs_net_ns(server));
kfree(server->leaf_fullpath);
kfree(server->hostname);
@@ -1773,8 +1768,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
tcp_ses->ops = ctx->ops;
tcp_ses->vals = ctx->vals;
-
- /* Grab netns reference for this server. */
cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns));
tcp_ses->sign = ctx->sign;
@@ -1902,7 +1895,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
out_err_crypto_release:
cifs_crypto_secmech_release(tcp_ses);
- /* Release netns reference for this server. */
put_net(cifs_net_ns(tcp_ses));
out_err:
@@ -1911,10 +1903,8 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
cifs_put_tcp_session(tcp_ses->primary_server, false);
kfree(tcp_ses->hostname);
kfree(tcp_ses->leaf_fullpath);
- if (tcp_ses->ssocket) {
+ if (tcp_ses->ssocket)
sock_release(tcp_ses->ssocket);
- put_net(cifs_net_ns(tcp_ses));
- }
kfree(tcp_ses);
}
return ERR_PTR(rc);
@@ -3356,20 +3346,20 @@ generic_ip_connect(struct TCP_Server_Info *server)
socket = server->ssocket;
} else {
struct net *net = cifs_net_ns(server);
+ struct sock *sk;
- rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket);
+ rc = __sock_create(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket, 1);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
}
- /*
- * Grab netns reference for the socket.
- *
- * It'll be released here, on error, or in clean_demultiplex_info() upon server
- * teardown.
- */
- get_net(net);
+ sk = server->ssocket->sk;
+ __netns_tracker_free(net, &sk->ns_tracker, false);
+ sk->sk_net_refcnt = 1;
+ get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+ sock_inuse_add(net, 1);
/* BB other socket options to set KEEPALIVE, NODELAY? */
cifs_dbg(FYI, "Socket created\n");
@@ -3383,10 +3373,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
}
rc = bind_socket(server);
- if (rc < 0) {
- put_net(cifs_net_ns(server));
+ if (rc < 0)
return rc;
- }
/*
* Eventually check for other socket options to change from
@@ -3423,7 +3411,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
if (rc < 0) {
cifs_dbg(FYI, "Error %d connecting to server\n", rc);
trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc);
- put_net(cifs_net_ns(server));
sock_release(socket);
server->ssocket = NULL;
return rc;
@@ -3441,9 +3428,6 @@ generic_ip_connect(struct TCP_Server_Info *server)
(server->rfc1001_sessinit == -1 && sport == htons(RFC1001_PORT)))
rc = ip_rfc1001_connect(server);
- if (rc < 0)
- put_net(cifs_net_ns(server));
-
return rc;
}
--
2.48.1
This reverts commit 4e7f1644f2ac6d01dc584f6301c3b1d5aac4eaef.
The commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") is not only a bogus fix for LOCKDEP null-ptr-deref but also
introduces a real issue, TCP sockets leak, which will be explained in
detail in the next revert.
Also, CNA assigned CVE-2024-54680 to it but is rejecting it. [0]
Thus, we are reverting the commit and its follow-up commit 4e7f1644f2ac
("smb: client: Fix netns refcount imbalance causing leaks and
use-after-free").
Link: https://lore.kernel.org/all/2025040248-tummy-smilingly-4240@gregkh/ #[0]
Fixes: 4e7f1644f2ac ("smb: client: Fix netns refcount imbalance causing leaks and use-after-free")
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Cc: stable(a)vger.kernel.org
---
fs/smb/client/connect.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 10a7c28d2d44..137a611c5ab0 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -300,7 +300,6 @@ cifs_abort_connection(struct TCP_Server_Info *server)
server->ssocket->flags);
sock_release(server->ssocket);
server->ssocket = NULL;
- put_net(cifs_net_ns(server));
}
server->sequence_number = 0;
server->session_estab = false;
@@ -3367,12 +3366,8 @@ generic_ip_connect(struct TCP_Server_Info *server)
/*
* Grab netns reference for the socket.
*
- * This reference will be released in several situations:
- * - In the failure path before the cifsd thread is started.
- * - In the all place where server->socket is released, it is
- * also set to NULL.
- * - Ultimately in clean_demultiplex_info(), during the final
- * teardown.
+ * It'll be released here, on error, or in clean_demultiplex_info() upon server
+ * teardown.
*/
get_net(net);
@@ -3388,8 +3383,10 @@ generic_ip_connect(struct TCP_Server_Info *server)
}
rc = bind_socket(server);
- if (rc < 0)
+ if (rc < 0) {
+ put_net(cifs_net_ns(server));
return rc;
+ }
/*
* Eventually check for other socket options to change from
@@ -3444,6 +3441,9 @@ generic_ip_connect(struct TCP_Server_Info *server)
(server->rfc1001_sessinit == -1 && sport == htons(RFC1001_PORT)))
rc = ip_rfc1001_connect(server);
+ if (rc < 0)
+ put_net(cifs_net_ns(server));
+
return rc;
}
--
2.48.1
From: Joe Damato <jdamato(a)fastly.com>
In commit b65969856d4f ("igc: Link queues to NAPI instances"), the XSK
queues were incorrectly unmapped from their NAPI instances. After
discussion on the mailing list and the introduction of a test to codify
the expected behavior, we can see that the unmapping causes the
check_xsk test to fail:
NETIF=enp86s0 ./tools/testing/selftests/drivers/net/queues.py
[...]
# Check| ksft_eq(q.get('xsk', None), {},
# Check failed None != {} xsk attr on queue we configured
not ok 4 queues.check_xsk
After this commit, the test passes:
ok 4 queues.check_xsk
Note that the test itself is only in net-next, so I tested this change
by applying it to my local net-next tree, booting, and running the test.
Cc: stable(a)vger.kernel.org
Fixes: b65969856d4f ("igc: Link queues to NAPI instances")
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
Reviewed-by: Gerhard Engleder <gerhard(a)engleder-embedded.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
---
drivers/net/ethernet/intel/igc/igc.h | 2 --
drivers/net/ethernet/intel/igc/igc_main.c | 4 ++--
drivers/net/ethernet/intel/igc/igc_xdp.c | 2 --
3 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index cd1d7b6c1782..c35cc5cb1185 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -337,8 +337,6 @@ struct igc_adapter {
struct igc_led_classdev *leds;
};
-void igc_set_queue_napi(struct igc_adapter *adapter, int q_idx,
- struct napi_struct *napi);
void igc_up(struct igc_adapter *adapter);
void igc_down(struct igc_adapter *adapter);
int igc_open(struct net_device *netdev);
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 491d942cefca..27c99ff59ed4 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5022,8 +5022,8 @@ static int igc_sw_init(struct igc_adapter *adapter)
return 0;
}
-void igc_set_queue_napi(struct igc_adapter *adapter, int vector,
- struct napi_struct *napi)
+static void igc_set_queue_napi(struct igc_adapter *adapter, int vector,
+ struct napi_struct *napi)
{
struct igc_q_vector *q_vector = adapter->q_vector[vector];
diff --git a/drivers/net/ethernet/intel/igc/igc_xdp.c b/drivers/net/ethernet/intel/igc/igc_xdp.c
index c538e6b18aad..9eb47b4beb06 100644
--- a/drivers/net/ethernet/intel/igc/igc_xdp.c
+++ b/drivers/net/ethernet/intel/igc/igc_xdp.c
@@ -97,7 +97,6 @@ static int igc_xdp_enable_pool(struct igc_adapter *adapter,
napi_disable(napi);
}
- igc_set_queue_napi(adapter, queue_id, NULL);
set_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
set_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
@@ -147,7 +146,6 @@ static int igc_xdp_disable_pool(struct igc_adapter *adapter, u16 queue_id)
xsk_pool_dma_unmap(pool, IGC_RX_DMA_ATTR);
clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &rx_ring->flags);
clear_bit(IGC_RING_FLAG_AF_XDP_ZC, &tx_ring->flags);
- igc_set_queue_napi(adapter, queue_id, napi);
if (needs_reset) {
napi_enable(napi);
--
2.47.1
From: Steven Rostedt <rostedt(a)goodmis.org>
Some architectures do not have data cache coherency between user and
kernel space. For these architectures, the cache needs to be flushed on
both the kernel and user addresses so that user space can see the updates
the kernel has made.
Instead of using flush_dcache_folio() and playing with virt_to_folio()
within the call to that function, use flush_kernel_vmap_range() which
takes the virtual address and does the work for those architectures that
need it.
This also fixes a bug where the flush of the reader page only flushed one
page. If the sub-buffer order is 1 or more, where the sub-buffer size
would be greater than a page, it would miss the rest of the sub-buffer
content, as the "reader page" is not just a page, but the size of a
sub-buffer.
Link: https://lore.kernel.org/all/CAG48ez3w0my4Rwttbc5tEbNsme6tc0mrSN95thjXUFaJ3a…
Cc: stable(a)vger.kernel.org
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions");
Suggested-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f25966b3a1fc..d4b0f7b55cce 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -6016,7 +6016,7 @@ static void rb_update_meta_page(struct ring_buffer_per_cpu *cpu_buffer)
meta->read = cpu_buffer->read;
/* Some archs do not have data cache coherency between kernel and user-space */
- flush_dcache_folio(virt_to_folio(cpu_buffer->meta_page));
+ flush_kernel_vmap_range(cpu_buffer->meta_page, PAGE_SIZE);
}
static void
@@ -7319,7 +7319,8 @@ int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu)
out:
/* Some archs do not have data cache coherency between kernel and user-space */
- flush_dcache_folio(virt_to_folio(cpu_buffer->reader_page->page));
+ flush_kernel_vmap_range(cpu_buffer->reader_page->page,
+ buffer->subbuf_size + BUF_PAGE_HDR_SIZE);
rb_update_meta_page(cpu_buffer);
--
2.47.2
In nilfs_direct_propagate(), the printer get from nilfs_direct_get_ptr()
need to be checked to ensure it is not an invalid pointer. A proper
implementation can be found in nilfs_direct_delete(). Add a value check
and return -ENOENT when it is an invalid pointer.
Fixes: 10ff885ba6f5 ("nilfs2: get rid of nilfs_direct uses")
Cc: stable(a)vger.kernel.org # v2.6+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
fs/nilfs2/direct.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/nilfs2/direct.c b/fs/nilfs2/direct.c
index 893ab36824cc..ff1c9fe72bec 100644
--- a/fs/nilfs2/direct.c
+++ b/fs/nilfs2/direct.c
@@ -273,6 +273,9 @@ static int nilfs_direct_propagate(struct nilfs_bmap *bmap,
dat = nilfs_bmap_get_dat(bmap);
key = nilfs_bmap_data_get_key(bmap, bh);
ptr = nilfs_direct_get_ptr(bmap, key);
+ if (ptr == NILFS_BMAP_INVALID_PTR)
+ return -ENOENT;
+
if (!buffer_nilfs_volatile(bh)) {
oldreq.pr_entry_nr = ptr;
newreq.pr_entry_nr = ptr;
--
2.42.0.windows.2