From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 23daf1b4c91db9b26f8425cc7039cf96d22ccbfe ]
Setting the AP channel width is meant for use with the normal 20/40/... MHz channel width progression, and switching around in S1G or narrow channels isn't supported. Disallow that.
Reported-by: syzbot+bc0f5b92cc7091f45fb6@syzkaller.appspotmail.com Link: https://msgid.link/20240515141600.d4a9590bfe32.I19a32d60097e81b527eafe6b0924... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 8f8f077e6cd40..053258b4e28d2 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3398,6 +3398,33 @@ static int __nl80211_set_channel(struct cfg80211_registered_device *rdev, if (chandef.chan != cur_chan) return -EBUSY;
+ /* only allow this for regular channel widths */ + switch (wdev->links[link_id].ap.chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + case NL80211_CHAN_WIDTH_40: + case NL80211_CHAN_WIDTH_80: + case NL80211_CHAN_WIDTH_80P80: + case NL80211_CHAN_WIDTH_160: + case NL80211_CHAN_WIDTH_320: + break; + default: + return -EINVAL; + } + + switch (chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + case NL80211_CHAN_WIDTH_40: + case NL80211_CHAN_WIDTH_80: + case NL80211_CHAN_WIDTH_80P80: + case NL80211_CHAN_WIDTH_160: + case NL80211_CHAN_WIDTH_320: + break; + default: + return -EINVAL; + } + result = rdev_set_ap_chanwidth(rdev, dev, link_id, &chandef); if (result)
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index d759f3373b175..dfd114845a3ed 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2135,7 +2135,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 }, - { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + /* This one is very old and rare, let's see if anybody complains. + * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + */
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 3d60041543189438cd1b03a1fa40ff6681c77970 ]
Currently the resource allocated by crypto_alloc_shash() is not freed in case ath12k_peer_find() fails, resulting in memory leak.
Add crypto_free_shash() to fix it.
This is found during code review, compile tested only.
Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Acked-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://msgid.link/20240526124226.24661-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath12k/dp_rx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp_rx.c b/drivers/net/wireless/ath/ath12k/dp_rx.c index dbcbe7e0cd2a7..a5b6e2e078d33 100644 --- a/drivers/net/wireless/ath/ath12k/dp_rx.c +++ b/drivers/net/wireless/ath/ath12k/dp_rx.c @@ -2756,6 +2756,7 @@ int ath12k_dp_rx_peer_frag_setup(struct ath12k *ar, const u8 *peer_mac, int vdev peer = ath12k_peer_find(ab, vdev_id, peer_mac); if (!peer) { spin_unlock_bh(&ab->base_lock); + crypto_free_shash(tfm); ath12k_warn(ab, "failed to find the peer to set up fragment info\n"); return -ENOENT; }
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit fba8334721e266f92079632598e46e5f89082f30 ]
When all the strides in a WQE have been consumed, the WQE is unlinked from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible to receive CQEs with 0 consumed strides for the same WQE even after the WQE is fully consumed and unlinked. This triggers an additional unlink for the same wqe which corrupts the linked list.
Fix this scenario by accepting 0 sized consumed strides without unlinking the WQE again.
Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 8d9743a5e42c7..79ec6fcc9e259 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -2374,6 +2374,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq if (likely(wi->consumed_strides < rq->mpwqe.num_strides)) return;
+ if (unlikely(!cstrides)) + return; + wq = &rq->mpwqe.wq; wqe = mlx5_wq_ll_get_wqe(wq, wqe_id); mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
From: Yonghong Song yonghong.song@linux.dev
[ Upstream commit 7015843afcaf68c132784c89528dfddc0005e483 ]
Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue.
The main qemu command line includes:
-enable-kvm -smp 16 -cpu host
The failure log looks like:
$ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK>
In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe.
The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed.
In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well.
[1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/
Reported-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/prog_tests/send_signal.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c index b15b343ebb6b1..9adcda7f1fedc 100644 --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c @@ -156,7 +156,8 @@ static void test_send_signal_tracepoint(bool signal_thread) static void test_send_signal_perf(bool signal_thread) { struct perf_event_attr attr = { - .sample_period = 1, + .freq = 1, + .sample_freq = 1000, .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_CLOCK, };
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1ca27e0c8c13ac50a4acf9cdf77069e2d94a547d ]
When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need to lock the two sockets to check their states in unix_stream_connect().
We use unix_state_lock() for the server and unix_state_lock_nested() for client with tricky sk->sk_state check to avoid deadlock.
The possible deadlock scenario are the following:
1) Self connect() 2) Simultaneous connect()
The former is simple, attempt to grab the same lock, and the latter is AB-BA deadlock.
After the server's unix_state_lock(), we check the server socket's state, and if it's not TCP_LISTEN, connect() fails with -EINVAL.
Then, we avoid the former deadlock by checking the client's state before unix_state_lock_nested(). If its state is not TCP_LISTEN, we can make sure that the client and the server are not identical based on the state.
Also, the latter deadlock can be avoided in the same way. Due to the server sk->sk_state requirement, AB-BA deadlock could happen only with TCP_LISTEN sockets. So, if the client's state is TCP_LISTEN, we can give up the second lock to avoid the deadlock.
CPU 1 CPU 2 CPU 3 connect(A -> B) connect(B -> A) listen(A) --- --- --- unix_state_lock(B) B->sk_state == TCP_LISTEN READ_ONCE(A->sk_state) == TCP_CLOSE ^^^^^^^^^ ok, will lock A unix_state_lock(A) .--------------' WRITE_ONCE(A->sk_state, TCP_LISTEN) | unix_state_unlock(A) | | unix_state_lock(A) | A->sk_sk_state == TCP_LISTEN | READ_ONCE(B->sk_state) == TCP_LISTEN v ^^^^^^^^^^ unix_state_lock_nested(A) Don't lock B !!
Currently, while checking the client's state, we also check if it's TCP_ESTABLISHED, but this is unlikely and can be checked after we know the state is not TCP_CLOSE.
Moreover, if it happens after the second lock, we now jump to the restart label, but it's unlikely that the server is not found during the retry, so the jump is mostly to revist the client state check.
Let's remove the retry logic and check the state against TCP_CLOSE first.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 34 +++++++++------------------------- 1 file changed, 9 insertions(+), 25 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5a26e785ce70d..bb66bcd4a50ea 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1483,6 +1483,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, struct unix_sock *u = unix_sk(sk), *newu, *otheru; struct net *net = sock_net(sk); struct sk_buff *skb = NULL; + unsigned char state; long timeo; int err;
@@ -1529,7 +1530,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, goto out; }
- /* Latch state of peer */ unix_state_lock(other);
/* Apparently VFS overslept socket death. Retry. */ @@ -1559,37 +1559,21 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, goto restart; }
- /* Latch our state. - - It is tricky place. We need to grab our state lock and cannot - drop lock on peer. It is dangerous because deadlock is - possible. Connect to self case and simultaneous - attempt to connect are eliminated by checking socket - state. other is TCP_LISTEN, if sk is TCP_LISTEN we - check this before attempt to grab lock. - - Well, and we have to recheck the state after socket locked. + /* self connect and simultaneous connect are eliminated + * by rejecting TCP_LISTEN socket to avoid deadlock. */ - switch (READ_ONCE(sk->sk_state)) { - case TCP_CLOSE: - /* This is ok... continue with connect */ - break; - case TCP_ESTABLISHED: - /* Socket is already connected */ - err = -EISCONN; - goto out_unlock; - default: - err = -EINVAL; + state = READ_ONCE(sk->sk_state); + if (unlikely(state != TCP_CLOSE)) { + err = state == TCP_ESTABLISHED ? -EISCONN : -EINVAL; goto out_unlock; }
unix_state_lock_nested(sk, U_LOCK_SECOND);
- if (sk->sk_state != TCP_CLOSE) { + if (unlikely(sk->sk_state != TCP_CLOSE)) { + err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL; unix_state_unlock(sk); - unix_state_unlock(other); - sock_put(other); - goto restart; + goto out_unlock; }
err = security_unix_stream_connect(sk, other, newsk);
From: FUJITA Tomonori fujita.tomonori@gmail.com
[ Upstream commit eee5528890d54b22b46f833002355a5ee94c3bb4 ]
Add the Edimax Vendor ID (0x1432) for an ethernet driver for Tehuti Networks TN40xx chips. This ID can be used for Realtek 8180 and Ralink rt28xx wireless drivers.
Signed-off-by: FUJITA Tomonori fujita.tomonori@gmail.com Acked-by: Bjorn Helgaas bhelgaas@google.com Link: https://patch.msgid.link/20240623235507.108147-2-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 0a85ff5c8db3c..abff4e3b6a58b 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2124,6 +2124,8 @@
#define PCI_VENDOR_ID_CHELSIO 0x1425
+#define PCI_VENDOR_ID_EDIMAX 0x1432 + #define PCI_VENDOR_ID_ADLINK 0x144a
#define PCI_VENDOR_ID_SAMSUNG 0x144d
From: Roman Smirnov r.smirnov@omp.ru
[ Upstream commit 56e69e59751d20993f243fb7dd6991c4e522424c ]
An overflow may occur if the function is called with the last block and an offset greater than zero. It is necessary to add a check to avoid this.
Found by Linux Verification Center (linuxtesting.org) with Svace.
[JK: Make test cover also unalloc table freeing]
Link: https://patch.msgid.link/20240620072413.7448-1-r.smirnov@omp.ru Suggested-by: Jan Kara jack@suse.com Signed-off-by: Roman Smirnov r.smirnov@omp.ru Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/udf/balloc.c | 36 +++++++++++++----------------------- 1 file changed, 13 insertions(+), 23 deletions(-)
diff --git a/fs/udf/balloc.c b/fs/udf/balloc.c index ab3ffc355949d..2192c4a1ba98b 100644 --- a/fs/udf/balloc.c +++ b/fs/udf/balloc.c @@ -18,6 +18,7 @@ #include "udfdecl.h"
#include <linux/bitops.h> +#include <linux/overflow.h>
#include "udf_i.h" #include "udf_sb.h" @@ -129,7 +130,6 @@ static void udf_bitmap_free_blocks(struct super_block *sb, { struct udf_sb_info *sbi = UDF_SB(sb); struct buffer_head *bh = NULL; - struct udf_part_map *partmap; unsigned long block; unsigned long block_group; unsigned long bit; @@ -138,19 +138,9 @@ static void udf_bitmap_free_blocks(struct super_block *sb, unsigned long overflow;
mutex_lock(&sbi->s_alloc_mutex); - partmap = &sbi->s_partmaps[bloc->partitionReferenceNum]; - if (bloc->logicalBlockNum + count < count || - (bloc->logicalBlockNum + count) > partmap->s_partition_len) { - udf_debug("%u < %d || %u + %u > %u\n", - bloc->logicalBlockNum, 0, - bloc->logicalBlockNum, count, - partmap->s_partition_len); - goto error_return; - } - + /* We make sure this cannot overflow when mounting the filesystem */ block = bloc->logicalBlockNum + offset + (sizeof(struct spaceBitmapDesc) << 3); - do { overflow = 0; block_group = block >> (sb->s_blocksize_bits + 3); @@ -380,7 +370,6 @@ static void udf_table_free_blocks(struct super_block *sb, uint32_t count) { struct udf_sb_info *sbi = UDF_SB(sb); - struct udf_part_map *partmap; uint32_t start, end; uint32_t elen; struct kernel_lb_addr eloc; @@ -389,16 +378,6 @@ static void udf_table_free_blocks(struct super_block *sb, struct udf_inode_info *iinfo;
mutex_lock(&sbi->s_alloc_mutex); - partmap = &sbi->s_partmaps[bloc->partitionReferenceNum]; - if (bloc->logicalBlockNum + count < count || - (bloc->logicalBlockNum + count) > partmap->s_partition_len) { - udf_debug("%u < %d || %u + %u > %u\n", - bloc->logicalBlockNum, 0, - bloc->logicalBlockNum, count, - partmap->s_partition_len); - goto error_return; - } - iinfo = UDF_I(table); udf_add_free_space(sb, sbi->s_partition, count);
@@ -673,6 +652,17 @@ void udf_free_blocks(struct super_block *sb, struct inode *inode, { uint16_t partition = bloc->partitionReferenceNum; struct udf_part_map *map = &UDF_SB(sb)->s_partmaps[partition]; + uint32_t blk; + + if (check_add_overflow(bloc->logicalBlockNum, offset, &blk) || + check_add_overflow(blk, count, &blk) || + bloc->logicalBlockNum + count > map->s_partition_len) { + udf_debug("Invalid request to free blocks: (%d, %u), off %u, " + "len %u, partition len %u\n", + partition, bloc->logicalBlockNum, offset, count, + map->s_partition_len); + return; + }
if (map->s_partition_flags & UDF_PART_FLAG_UNALLOC_BITMAP) { udf_bitmap_free_blocks(sb, map->s_uspace.s_bitmap,
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit a7e5793035792cc46a1a4b0a783655ffa897dfe9 ]
When a key is requested by userspace, there's really no need to include the key data, the sequence counter is really what userspace needs in this case. The fact that it's included is just a historic quirk.
Remove the key data.
Reviewed-by: Miriam Rachel Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240627104411.b6a4f097e4ea.I7e6cc976cb9e8a80ef25a3... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 053258b4e28d2..be5c42d6ffbea 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4473,10 +4473,7 @@ static void get_key_callback(void *c, struct key_params *params) struct nlattr *key; struct get_key_cookie *cookie = c;
- if ((params->key && - nla_put(cookie->msg, NL80211_ATTR_KEY_DATA, - params->key_len, params->key)) || - (params->seq && + if ((params->seq && nla_put(cookie->msg, NL80211_ATTR_KEY_SEQ, params->seq_len, params->seq)) || (params->cipher && @@ -4488,10 +4485,7 @@ static void get_key_callback(void *c, struct key_params *params) if (!key) goto nla_put_failure;
- if ((params->key && - nla_put(cookie->msg, NL80211_KEY_DATA, - params->key_len, params->key)) || - (params->seq && + if ((params->seq && nla_put(cookie->msg, NL80211_KEY_SEQ, params->seq_len, params->seq)) || (params->cipher &&
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit b8e0ddd36ce9536ad7478dd27df06c9ae92370ba ]
This is a preparatory patch to work around a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO.
When handling the TEF interrupt, the driver reads the FIFO header index from the TEF FIFO STA register of the chip.
In the bad case, the driver reads a too large head index. In the original code, the driver always trusted the read value, which caused old CAN transmit complete events that were already processed to be re-processed.
Instead of reading and trusting the head index, read the head index and calculate the number of CAN frames that were supposedly received - replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().
The mcp251xfd_handle_tefif() function reads the CAN transmit complete events from the chip, iterates over them and pushes them into the network stack. The original driver already contains code to detect old CAN transmit complete events, that will be updated in the next patch.
Cc: Stefan Althöfer Stefan.Althoefer@janztec.com Cc: Thomas Kopp thomas.kopp@microchip.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/can/spi/mcp251xfd/mcp251xfd-ring.c | 2 + drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 54 +++++++++++++------ drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 13 ++--- 3 files changed, 43 insertions(+), 26 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c index bfe4caa0c99d4..4cb79a4f24612 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c @@ -485,6 +485,8 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv) clear_bit(MCP251XFD_FLAGS_FD_MODE, priv->flags); }
+ tx_ring->obj_num_shift_to_u8 = BITS_PER_TYPE(tx_ring->obj_num) - + ilog2(tx_ring->obj_num); tx_ring->obj_size = tx_obj_size;
rem = priv->rx_obj_num; diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index e5bd57b65aafe..b41fad3b37c06 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -2,7 +2,7 @@ // // mcp251xfd - Microchip MCP251xFD Family CAN controller driver // -// Copyright (c) 2019, 2020, 2021 Pengutronix, +// Copyright (c) 2019, 2020, 2021, 2023 Pengutronix, // Marc Kleine-Budde kernel@pengutronix.de // // Based on: @@ -16,6 +16,11 @@
#include "mcp251xfd.h"
+static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta) +{ + return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF); +} + static inline int mcp251xfd_tef_tail_get_from_chip(const struct mcp251xfd_priv *priv, u8 *tef_tail) @@ -120,28 +125,44 @@ mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv, return 0; }
-static int mcp251xfd_tef_ring_update(struct mcp251xfd_priv *priv) +static int +mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p) { const struct mcp251xfd_tx_ring *tx_ring = priv->tx; - unsigned int new_head; - u8 chip_tx_tail; + const u8 shift = tx_ring->obj_num_shift_to_u8; + u8 chip_tx_tail, tail, len; + u32 fifo_sta; int err;
- err = mcp251xfd_tx_tail_get_from_chip(priv, &chip_tx_tail); + err = regmap_read(priv->map_reg, MCP251XFD_REG_FIFOSTA(priv->tx->fifo_nr), + &fifo_sta); if (err) return err;
- /* chip_tx_tail, is the next TX-Object send by the HW. - * The new TEF head must be >= the old head, ... + if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) { + *len_p = tx_ring->obj_num; + return 0; + } + + chip_tx_tail = FIELD_GET(MCP251XFD_REG_FIFOSTA_FIFOCI_MASK, fifo_sta); + + err = mcp251xfd_check_tef_tail(priv); + if (err) + return err; + tail = mcp251xfd_get_tef_tail(priv); + + /* First shift to full u8. The subtraction works on signed + * values, that keeps the difference steady around the u8 + * overflow. The right shift acts on len, which is an u8. */ - new_head = round_down(priv->tef->head, tx_ring->obj_num) + chip_tx_tail; - if (new_head <= priv->tef->head) - new_head += tx_ring->obj_num; + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(chip_tx_tail)); + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(tail)); + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(len));
- /* ... but it cannot exceed the TX head. */ - priv->tef->head = min(new_head, tx_ring->head); + len = (chip_tx_tail << shift) - (tail << shift); + *len_p = len >> shift;
- return mcp251xfd_check_tef_tail(priv); + return 0; }
static inline int @@ -182,13 +203,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) u8 tef_tail, len, l; int err, i;
- err = mcp251xfd_tef_ring_update(priv); + err = mcp251xfd_get_tef_len(priv, &len); if (err) return err;
tef_tail = mcp251xfd_get_tef_tail(priv); - len = mcp251xfd_get_tef_len(priv); - l = mcp251xfd_get_tef_linear_len(priv); + l = mcp251xfd_get_tef_linear_len(priv, len); err = mcp251xfd_tef_obj_read(priv, hw_tef_obj, tef_tail, l); if (err) return err; @@ -223,6 +243,8 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) struct mcp251xfd_tx_ring *tx_ring = priv->tx; int offset;
+ ring->head += len; + /* Increment the TEF FIFO tail pointer 'len' times in * a single SPI message. * diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h index b35bfebd23f29..4628bf847bc9b 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h @@ -524,6 +524,7 @@ struct mcp251xfd_tef_ring {
/* u8 obj_num equals tx_ring->obj_num */ /* u8 obj_size equals sizeof(struct mcp251xfd_hw_tef_obj) */ + /* u8 obj_num_shift_to_u8 equals tx_ring->obj_num_shift_to_u8 */
union mcp251xfd_write_reg_buf irq_enable_buf; struct spi_transfer irq_enable_xfer; @@ -542,6 +543,7 @@ struct mcp251xfd_tx_ring { u8 nr; u8 fifo_nr; u8 obj_num; + u8 obj_num_shift_to_u8; u8 obj_size;
struct mcp251xfd_tx_obj obj[MCP251XFD_TX_OBJ_NUM_MAX]; @@ -861,17 +863,8 @@ static inline u8 mcp251xfd_get_tef_tail(const struct mcp251xfd_priv *priv) return priv->tef->tail & (priv->tx->obj_num - 1); }
-static inline u8 mcp251xfd_get_tef_len(const struct mcp251xfd_priv *priv) +static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv, u8 len) { - return priv->tef->head - priv->tef->tail; -} - -static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv) -{ - u8 len; - - len = mcp251xfd_get_tef_len(priv); - return min_t(u8, len, priv->tx->obj_num - mcp251xfd_get_tef_tail(priv)); }
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit 3a0a88fcbaf9e027ecca3fe8775be9700b4d6460 ]
This patch updates the workaround for a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO.
In the bad case, the driver reads a too large head index. As the FIFO is implemented as a ring buffer, this results in re-handling old CAN transmit complete events.
Every transmit complete event contains with a sequence number that equals to the sequence number of the corresponding TX request. This way old TX complete events can be detected.
If the original driver detects a non matching sequence number, it prints an info message and tries again later. As wrong sequence numbers can be explained by the erratum DS80000789E 6, demote the info message to debug level, streamline the code and update the comments.
Keep the behavior: If an old CAN TX complete event is detected, abort the iteration and mark the number of valid CAN TX complete events as processed in the chip by incrementing the FIFO's tail index.
Cc: Stefan Althöfer Stefan.Althoefer@janztec.com Cc: Thomas Kopp thomas.kopp@microchip.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 71 +++++++------------ 1 file changed, 27 insertions(+), 44 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index b41fad3b37c06..5b0c7890d4b44 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -60,56 +60,39 @@ static int mcp251xfd_check_tef_tail(const struct mcp251xfd_priv *priv) return 0; }
-static int -mcp251xfd_handle_tefif_recover(const struct mcp251xfd_priv *priv, const u32 seq) -{ - const struct mcp251xfd_tx_ring *tx_ring = priv->tx; - u32 tef_sta; - int err; - - err = regmap_read(priv->map_reg, MCP251XFD_REG_TEFSTA, &tef_sta); - if (err) - return err; - - if (tef_sta & MCP251XFD_REG_TEFSTA_TEFOVIF) { - netdev_err(priv->ndev, - "Transmit Event FIFO buffer overflow.\n"); - return -ENOBUFS; - } - - netdev_info(priv->ndev, - "Transmit Event FIFO buffer %s. (seq=0x%08x, tef_tail=0x%08x, tef_head=0x%08x, tx_head=0x%08x).\n", - tef_sta & MCP251XFD_REG_TEFSTA_TEFFIF ? - "full" : tef_sta & MCP251XFD_REG_TEFSTA_TEFNEIF ? - "not empty" : "empty", - seq, priv->tef->tail, priv->tef->head, tx_ring->head); - - /* The Sequence Number in the TEF doesn't match our tef_tail. */ - return -EAGAIN; -} - static int mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv, const struct mcp251xfd_hw_tef_obj *hw_tef_obj, unsigned int *frame_len_ptr) { struct net_device_stats *stats = &priv->ndev->stats; + u32 seq, tef_tail_masked, tef_tail; struct sk_buff *skb; - u32 seq, seq_masked, tef_tail_masked, tef_tail;
- seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2518FD_MASK, + /* Use the MCP2517FD mask on the MCP2518FD, too. We only + * compare 7 bits, this is enough to detect old TEF objects. + */ + seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK, hw_tef_obj->flags); - - /* Use the MCP2517FD mask on the MCP2518FD, too. We only - * compare 7 bits, this should be enough to detect - * net-yet-completed, i.e. old TEF objects. - */ - seq_masked = seq & - field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK); tef_tail_masked = priv->tef->tail & field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK); - if (seq_masked != tef_tail_masked) - return mcp251xfd_handle_tefif_recover(priv, seq); + + /* According to mcp2518fd erratum DS80000789E 6. the FIFOCI + * bits of a FIFOSTA register, here the TX FIFO tail index + * might be corrupted and we might process past the TEF FIFO's + * head into old CAN frames. + * + * Compare the sequence number of the currently processed CAN + * frame with the expected sequence number. Abort with + * -EBADMSG if an old CAN frame is detected. + */ + if (seq != tef_tail_masked) { + netdev_dbg(priv->ndev, "%s: chip=0x%02x ring=0x%02x\n", __func__, + seq, tef_tail_masked); + stats->tx_fifo_errors++; + + return -EBADMSG; + }
tef_tail = mcp251xfd_get_tef_tail(priv); skb = priv->can.echo_skb[tef_tail]; @@ -223,12 +206,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) unsigned int frame_len = 0;
err = mcp251xfd_handle_tefif_one(priv, &hw_tef_obj[i], &frame_len); - /* -EAGAIN means the Sequence Number in the TEF - * doesn't match our tef_tail. This can happen if we - * read the TEF objects too early. Leave loop let the - * interrupt handler call us again. + /* -EBADMSG means we're affected by mcp2518fd erratum + * DS80000789E 6., i.e. the Sequence Number in the TEF + * doesn't match our tef_tail. Don't process any + * further and mark processed frames as good. */ - if (err == -EAGAIN) + if (err == -EBADMSG) goto out_netif_wake_queue; if (err) return err;
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit 3c466d6537b99f801b3f68af3d8124d4312437a0 ]
On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at the time of the DMA reset. We can however extract the RX clock from the internal SERDES block. Once the link is up, we can revert to the previous state.
The AQR115C PHY doesn't support in-band signalling so we can count on getting the link up notification and safely reuse existing callbacks which are already used by another HW quirk workaround which enables the functional clock to avoid a DMA reset due to timeout.
Only enable loopback on revision 3 of the board - check the phy_mode to make sure.
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://patch.msgid.link/20240703181500.28491-3-brgl@bgdev.pl Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../stmicro/stmmac/dwmac-qcom-ethqos.c | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c index d5d2a4c776c1c..ded1bbda5266f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c @@ -21,6 +21,7 @@ #define RGMII_IO_MACRO_CONFIG2 0x1C #define RGMII_IO_MACRO_DEBUG1 0x20 #define EMAC_SYSTEM_LOW_POWER_DEBUG 0x28 +#define EMAC_WRAPPER_SGMII_PHY_CNTRL1 0xf4
/* RGMII_IO_MACRO_CONFIG fields */ #define RGMII_CONFIG_FUNC_CLK_EN BIT(30) @@ -79,6 +80,9 @@ #define ETHQOS_MAC_CTRL_SPEED_MODE BIT(14) #define ETHQOS_MAC_CTRL_PORT_SEL BIT(15)
+/* EMAC_WRAPPER_SGMII_PHY_CNTRL1 bits */ +#define SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN BIT(3) + #define SGMII_10M_RX_CLK_DVDR 0x31
struct ethqos_emac_por { @@ -95,6 +99,7 @@ struct ethqos_emac_driver_data { bool has_integrated_pcs; u32 dma_addr_width; struct dwmac4_addrs dwmac4_addrs; + bool needs_sgmii_loopback; };
struct qcom_ethqos { @@ -113,6 +118,7 @@ struct qcom_ethqos { unsigned int num_por; bool rgmii_config_loopback_en; bool has_emac_ge_3; + bool needs_sgmii_loopback; };
static int rgmii_readl(struct qcom_ethqos *ethqos, unsigned int offset) @@ -187,8 +193,22 @@ ethqos_update_link_clk(struct qcom_ethqos *ethqos, unsigned int speed) clk_set_rate(ethqos->link_clk, ethqos->link_clk_rate); }
+static void +qcom_ethqos_set_sgmii_loopback(struct qcom_ethqos *ethqos, bool enable) +{ + if (!ethqos->needs_sgmii_loopback || + ethqos->phy_mode != PHY_INTERFACE_MODE_2500BASEX) + return; + + rgmii_updatel(ethqos, + SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN, + enable ? SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN : 0, + EMAC_WRAPPER_SGMII_PHY_CNTRL1); +} + static void ethqos_set_func_clk_en(struct qcom_ethqos *ethqos) { + qcom_ethqos_set_sgmii_loopback(ethqos, true); rgmii_updatel(ethqos, RGMII_CONFIG_FUNC_CLK_EN, RGMII_CONFIG_FUNC_CLK_EN, RGMII_IO_MACRO_CONFIG); } @@ -273,6 +293,7 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = { .has_emac_ge_3 = true, .link_clk_name = "phyaux", .has_integrated_pcs = true, + .needs_sgmii_loopback = true, .dma_addr_width = 36, .dwmac4_addrs = { .dma_chan = 0x00008100, @@ -646,6 +667,7 @@ static void ethqos_fix_mac_speed(void *priv, unsigned int speed, unsigned int mo { struct qcom_ethqos *ethqos = priv;
+ qcom_ethqos_set_sgmii_loopback(ethqos, false); ethqos->speed = speed; ethqos_update_link_clk(ethqos, speed); ethqos_configure(ethqos); @@ -781,6 +803,7 @@ static int qcom_ethqos_probe(struct platform_device *pdev) ethqos->num_por = data->num_por; ethqos->rgmii_config_loopback_en = data->rgmii_config_loopback_en; ethqos->has_emac_ge_3 = data->has_emac_ge_3; + ethqos->needs_sgmii_loopback = data->needs_sgmii_loopback;
ethqos->link_clk = devm_clk_get(dev, data->link_clk_name ?: "rgmii"); if (IS_ERR(ethqos->link_clk))
From: Qu Wenruo wqu@suse.com
[ Upstream commit 97713b1a2ced1e4a2a6c40045903797ebd44d7e0 ]
[BUG] For subpage + zoned case, the following workload can lead to rsv data leak at unmount time:
# mkfs.btrfs -f -s 4k $dev # mount $dev $mnt # fsstress -w -n 8 -d $mnt -s 1709539240 0/0: fiemap - no filename 0/1: copyrange read - no filename 0/2: write - no filename 0/3: rename - no source filename 0/4: creat f0 x:0 0 0 0/4: creat add id=0,parent=-1 0/5: writev f0[259 1 0 0 0 0] [778052,113,965] 0 0/6: ioctl(FIEMAP) f0[259 1 0 0 224 887097] [1294220,2291618343991484791,0x10000] -1 0/7: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 224 887097] return 25, fallback to stat() 0/7: dwrite f0[259 1 0 0 224 887097] [696320,102400] 0 # umount $mnt
The dmesg includes the following rsv leak detection warning (all call trace skipped):
------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8653 btrfs_destroy_inode+0x1e0/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8654 btrfs_destroy_inode+0x1a8/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8660 btrfs_destroy_inode+0x1a0/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): last unmount of filesystem 1b4abba9-de34-4f07-9e7f-157cf12a18d6 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): space_info DATA has 268218368 free, is not full BTRFS info (device sda): space_info total=268435456, used=204800, pinned=0, reserved=0, may_use=12288, readonly=0 zone_unusable=0 BTRFS info (device sda): global_block_rsv: size 0 reserved 0 BTRFS info (device sda): trans_block_rsv: size 0 reserved 0 BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): space_info METADATA has 267796480 free, is not full BTRFS info (device sda): space_info total=268435456, used=131072, pinned=0, reserved=0, may_use=262144, readonly=0 zone_unusable=245760 BTRFS info (device sda): global_block_rsv: size 0 reserved 0 BTRFS info (device sda): trans_block_rsv: size 0 reserved 0 BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone append size of 64K, and the system has 64K page size.
[CAUSE] I have added several trace_printk() to show the events (header skipped):
btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688 btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288 btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536 btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The above lines show our buffered write has dirtied 3 pages of inode 259 of root 5:
704K 768K 832K 896K I |////I/////////////////I///////////| I 756K 868K
|///| is the dirtied range using subpage bitmaps. and 'I' is the page boundary.
Meanwhile all three pages (704K, 768K, 832K) have their PageDirty flag set.
btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
Then direct IO write starts, since the range [680K, 780K) covers the beginning part of the above dirty range, we need to writeback the two pages at 704K and 768K.
cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536 extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
Now the above 2 lines show that we're writing back for dirty range [756K, 756K + 64K). We only writeback 64K because the zoned device has max zone append size as 64K.
extent_write_locked_range: r/i=5/259 clear dirty for page=786432
!!! The above line shows the root cause. !!!
We're calling clear_page_dirty_for_io() inside extent_write_locked_range(), for the page 768K. This is because extent_write_locked_range() can go beyond the current locked page, here we hit the page at 768K and clear its page dirt.
In fact this would lead to the desync between subpage dirty and page dirty flags. We have the page dirty flag cleared, but the subpage range [820K, 832K) is still dirty.
After the writeback of range [756K, 820K), the dirty flags look like this, as page 768K no longer has dirty flag set.
704K 768K 832K 896K I I | I/////////////| I 820K 868K
This means we will no longer writeback range [820K, 832K), thus the reserved data/metadata space would never be properly released.
extent_write_cache_pages: r/i=5/259 skip non-dirty folio=786432
Now even though we try to start writeback for page 768K, since the page is not dirty, we completely skip it at extent_write_cache_pages() time.
btrfs_direct_write: r/i=5/259 dio done filepos=696320 len=0
Now the direct IO finished.
cow_file_range: r/i=5/259 add ordered extent filepos=851968 len=36864 extent_write_locked_range: r/i=5/259 locked page=851968 start=851968 len=36864
Now we writeback the remaining dirty range, which is [832K, 868K). Causing the range [820K, 832K) never to be submitted, thus leaking the reserved space.
This bug only affects subpage and zoned case. For non-subpage and zoned case, we have exactly one sector for each page, thus no such partial dirty cases.
For subpage and non-zoned case, we never go into run_delalloc_cow(), and normally all the dirty subpage ranges would be properly submitted inside __extent_writepage_io().
[FIX] Just do not clear the page dirty at all inside extent_write_locked_range(). As __extent_writepage_io() would do a more accurate, subpage compatible clear for page and subpage dirty flags anyway.
Now the correct trace would look like this:
btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688 btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288 btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536 btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The page dirty part is still the same 3 pages.
btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400 cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536 extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
And the writeback for the first 64K is still correct.
cow_file_range: r/i=5/259 add ordered extent filepos=839680 len=49152 extent_write_locked_range: r/i=5/259 locked page=786432 start=839680 len=49152
Now with the fix, we can properly writeback the range [820K, 832K), and properly release the reserved data/metadata space.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9fbffd84b16c5..c6a95dfa59c81 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2172,10 +2172,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
page = find_get_page(mapping, cur >> PAGE_SHIFT); ASSERT(PageLocked(page)); - if (pages_dirty && page != locked_page) { + if (pages_dirty && page != locked_page) ASSERT(PageDirty(page)); - clear_page_dirty_for_io(page); - }
ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl, i_size, &nr);
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 320d8dc612660da84c3b70a28658bb38069e5a9a ]
If we failed to link a free space entry because there's already a conflicting entry for the same offset, we free the free space entry but we don't free the associated bitmap that we had just allocated before. Fix that by freeing the bitmap before freeing the entry.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/free-space-cache.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index dcfc0425115e9..003b865676142 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -855,6 +855,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, spin_unlock(&ctl->tree_lock); btrfs_err(fs_info, "Duplicate entries in free space cache, dumping"); + kmem_cache_free(btrfs_free_space_bitmap_cachep, e->bitmap); kmem_cache_free(btrfs_free_space_cachep, e); goto free_cache; }
From: Luke Wang ziniu.wang_1@nxp.com
[ Upstream commit 0d0df1e750bac0fdaa77940e711c1625cff08d33 ]
When unload the btnxpuart driver, its associated timer will be deleted. If the timer happens to be modified at this moment, it leads to the kernel call this timer even after the driver unloaded, resulting in kernel panic. Use timer_shutdown_sync() instead of del_timer_sync() to prevent rearming.
panic log: Internal error: Oops: 0000000086000007 [#1] PREEMPT SMP Modules linked in: algif_hash algif_skcipher af_alg moal(O) mlan(O) crct10dif_ce polyval_ce polyval_generic snd_soc_imx_card snd_soc_fsl_asoc_card snd_soc_imx_audmux mxc_jpeg_encdec v4l2_jpeg snd_soc_wm8962 snd_soc_fsl_micfil snd_soc_fsl_sai flexcan snd_soc_fsl_utils ap130x rpmsg_ctrl imx_pcm_dma can_dev rpmsg_char pwm_fan fuse [last unloaded: btnxpuart] CPU: 5 PID: 723 Comm: memtester Tainted: G O 6.6.23-lts-next-06207-g4aef2658ac28 #1 Hardware name: NXP i.MX95 19X19 board (DT) pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0xffff80007a2cf464 lr : call_timer_fn.isra.0+0x24/0x80 ... Call trace: 0xffff80007a2cf464 __run_timers+0x234/0x280 run_timer_softirq+0x20/0x40 __do_softirq+0x100/0x26c ____do_softirq+0x10/0x1c call_on_irq_stack+0x24/0x4c do_softirq_own_stack+0x1c/0x2c irq_exit_rcu+0xc0/0xdc el0_interrupt+0x54/0xd8 __el0_irq_handler_common+0x18/0x24 el0t_64_irq_handler+0x10/0x1c el0t_64_irq+0x190/0x194 Code: ???????? ???????? ???????? ???????? (????????) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,c0000000,40028143,1000721b Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
Signed-off-by: Luke Wang ziniu.wang_1@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btnxpuart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bluetooth/btnxpuart.c b/drivers/bluetooth/btnxpuart.c index abccd571cf3ee..45a06c94a8606 100644 --- a/drivers/bluetooth/btnxpuart.c +++ b/drivers/bluetooth/btnxpuart.c @@ -324,7 +324,7 @@ static void ps_cancel_timer(struct btnxpuart_dev *nxpdev) struct ps_data *psdata = &nxpdev->psdata;
flush_work(&psdata->work); - del_timer_sync(&psdata->ps_timer); + timer_shutdown_sync(&psdata->ps_timer); }
static void ps_control(struct hci_dev *hdev, u8 ps_state)
linux-stable-mirror@lists.linaro.org