From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 23daf1b4c91db9b26f8425cc7039cf96d22ccbfe ]
Setting the AP channel width is meant for use with the normal 20/40/... MHz channel width progression, and switching around in S1G or narrow channels isn't supported. Disallow that.
Reported-by: syzbot+bc0f5b92cc7091f45fb6@syzkaller.appspotmail.com Link: https://msgid.link/20240515141600.d4a9590bfe32.I19a32d60097e81b527eafe6b0924... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 72c7bf5585816..81d5bf186180f 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -3419,6 +3419,33 @@ static int __nl80211_set_channel(struct cfg80211_registered_device *rdev, if (chandef.chan != cur_chan) return -EBUSY;
+ /* only allow this for regular channel widths */ + switch (wdev->links[link_id].ap.chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + case NL80211_CHAN_WIDTH_40: + case NL80211_CHAN_WIDTH_80: + case NL80211_CHAN_WIDTH_80P80: + case NL80211_CHAN_WIDTH_160: + case NL80211_CHAN_WIDTH_320: + break; + default: + return -EINVAL; + } + + switch (chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + case NL80211_CHAN_WIDTH_40: + case NL80211_CHAN_WIDTH_80: + case NL80211_CHAN_WIDTH_80P80: + case NL80211_CHAN_WIDTH_160: + case NL80211_CHAN_WIDTH_320: + break; + default: + return -EINVAL; + } + result = rdev_set_ap_chanwidth(rdev, dev, link_id, &chandef); if (result)
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 0a993772e0f0934d730c0d451622c80e03a40ab1 ]
Commit 5082b3e3027e ("wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early") fixes a race in ath11k driver. Since ath12k shares the same logic as ath11k, currently the race also exists in ath12k: in ath12k_pci_ext_irq_enable(), ATH12K_FLAG_EXT_IRQ_ENABLED is set before NAPI is enabled. In cases where only one MSI vector is allocated, this results in a race condition: after ATH12K_FLAG_EXT_IRQ_ENABLED is set but before NAPI enabled, CE interrupt breaks in. Since IRQ is shared by CE and data path, ath12k_pci_ext_interrupt_handler() is also called where we call disable_irq_nosync() to disable IRQ. Then napi_schedule() is called but it does nothing because NAPI is not enabled at that time, meaning that ath12k_pci_ext_grp_napi_poll() will never run, so we have no chance to call enable_irq() to enable IRQ back. Since IRQ is shared, all interrupts are disabled and we would finally get no response from target.
So port ath11k fix here, this is done by setting ATH12K_FLAG_EXT_IRQ_ENABLED after all NAPI and IRQ work are done. With the fix, we are sure that by the time ATH12K_FLAG_EXT_IRQ_ENABLED is set, NAPI is enabled.
Note that the fix above also introduce some side effects: if ath12k_pci_ext_interrupt_handler() breaks in after NAPI enabled but before ATH12K_FLAG_EXT_IRQ_ENABLED set, nothing will be done by the handler this time, the work will be postponed till the next time the IRQ fires.
This is found during code review.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Acked-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://msgid.link/20240524023642.37030-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath12k/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/pci.c b/drivers/net/wireless/ath/ath12k/pci.c index 16af046c33d9e..b08e64a5c8932 100644 --- a/drivers/net/wireless/ath/ath12k/pci.c +++ b/drivers/net/wireless/ath/ath12k/pci.c @@ -1090,14 +1090,14 @@ void ath12k_pci_ext_irq_enable(struct ath12k_base *ab) { int i;
- set_bit(ATH12K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags); - for (i = 0; i < ATH12K_EXT_IRQ_GRP_NUM_MAX; i++) { struct ath12k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i];
napi_enable(&irq_grp->napi); ath12k_pci_ext_grp_enable(irq_grp); } + + set_bit(ATH12K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags); }
void ath12k_pci_ext_irq_disable(struct ath12k_base *ab)
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7b9e04884575e..d2d46fe17631a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 }, - { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + /* This one is very old and rare, let's see if anybody complains. + * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + */
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
On 28.07.2024 02:52, Sasha Levin wrote:
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7b9e04884575e..d2d46fe17631a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii) /* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 },
{ 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
/* This one is very old and rare, let's see if anybody complains.
* { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
*/
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
It may be the case that there are still few users out there with this ancient hw. We will know better once 6.11 is out for a few month. In this case we would have to revert this change. I don't think it's a change which should go to stable.
On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
On 28.07.2024 02:52, Sasha Levin wrote:
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7b9e04884575e..d2d46fe17631a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 },
{ 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
/* This one is very old and rare, let's see if anybody complains.
* { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 },
*/
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
It may be the case that there are still few users out there with this ancient hw. We will know better once 6.11 is out for a few month. In this case we would have to revert this change. I don't think it's a change which should go to stable.
Sure, I'll drop it.
On 10.08.2024 11:12, Sasha Levin wrote:
On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
On 28.07.2024 02:52, Sasha Levin wrote:
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7b9e04884575e..d2d46fe17631a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 }, - { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + /* This one is very old and rare, let's see if anybody complains. + * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + */
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
It may be the case that there are still few users out there with this ancient hw. We will know better once 6.11 is out for a few month. In this case we would have to revert this change. I don't think it's a change which should go to stable.
Sure, I'll drop it.
Just saw that this patch has been added to stable again an hour ago. Technical issue?
On Sun, Aug 11, 2024 at 04:32:07PM +0200, Heiner Kallweit wrote:
On 10.08.2024 11:12, Sasha Levin wrote:
On Mon, Jul 29, 2024 at 10:45:15AM +0200, Heiner Kallweit wrote:
On 28.07.2024 02:52, Sasha Levin wrote:
From: Heiner Kallweit hkallweit1@gmail.com
[ Upstream commit 982300c115d229565d7af8e8b38aa1ee7bb1f5bd ]
This early RTL8168b version was the first PCIe chip version, and it's quite quirky. Last sign of life is from more than 15 yrs ago. Let's remove detection of this chip version, we'll see whether anybody complains. If not, support for this chip version can be removed a few kernel versions later.
Signed-off-by: Heiner Kallweit hkallweit1@gmail.com Link: https://lore.kernel.org/r/875cdcf4-843c-420a-ad5d-417447b68572@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org
drivers/net/ethernet/realtek/r8169_main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 7b9e04884575e..d2d46fe17631a 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2274,7 +2274,9 @@ static enum mac_version rtl8169_get_mac_version(u16 xid, bool gmii)
/* 8168B family. */ { 0x7c8, 0x380, RTL_GIGA_MAC_VER_17 }, - { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + /* This one is very old and rare, let's see if anybody complains. + * { 0x7c8, 0x300, RTL_GIGA_MAC_VER_11 }, + */
/* 8101 family. */ { 0x7c8, 0x448, RTL_GIGA_MAC_VER_39 },
It may be the case that there are still few users out there with this ancient hw. We will know better once 6.11 is out for a few month. In this case we would have to revert this change. I don't think it's a change which should go to stable.
Sure, I'll drop it.
Just saw that this patch has been added to stable again an hour ago. Technical issue?
Indeed, I ended up dropping from the wrong local branch yesterday, sorry!
From: Ping-Ke Shih pkshih@realtek.com
[ Upstream commit 9c4fde42cce05719120cf892a44b76ff61d908c7 ]
Handle error code to cause failed to USB probe result from unexpected USB EP number, otherwise when USB disconnect skb_dequeue() an uninitialized skb list and cause warnings below.
usb 2-1: USB disconnect, device number 76 INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 54060 Comm: kworker/0:1 Not tainted 6.9.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x116/0x1b0 lib/dump_stack.c:114 assign_lock_key kernel/locking/lockdep.c:976 [inline] register_lock_class+0xc18/0xfa0 kernel/locking/lockdep.c:1289 __lock_acquire+0x108/0x3bc0 kernel/locking/lockdep.c:5014 lock_acquire kernel/locking/lockdep.c:5754 [inline] lock_acquire+0x1b0/0x550 kernel/locking/lockdep.c:5719 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 skb_dequeue+0x20/0x180 net/core/skbuff.c:3846 rtl_usb_cleanup drivers/net/wireless/realtek/rtlwifi/usb.c:706 [inline] rtl_usb_deinit drivers/net/wireless/realtek/rtlwifi/usb.c:721 [inline] rtl_usb_disconnect+0x4a4/0x850 drivers/net/wireless/realtek/rtlwifi/usb.c:1051 usb_unbind_interface+0x1e8/0x980 drivers/usb/core/driver.c:461 device_remove drivers/base/dd.c:568 [inline] device_remove+0x122/0x170 drivers/base/dd.c:560 __device_release_driver drivers/base/dd.c:1270 [inline] device_release_driver_internal+0x443/0x620 drivers/base/dd.c:1293 bus_remove_device+0x22f/0x420 drivers/base/bus.c:574 device_del+0x395/0x9f0 drivers/base/core.c:3909 usb_disable_device+0x360/0x7b0 drivers/usb/core/message.c:1418 usb_disconnect+0x2db/0x930 drivers/usb/core/hub.c:2305 hub_port_connect drivers/usb/core/hub.c:5362 [inline] hub_port_connect_change drivers/usb/core/hub.c:5662 [inline] port_event drivers/usb/core/hub.c:5822 [inline] hub_event+0x1e39/0x4ce0 drivers/usb/core/hub.c:5904 process_one_work+0x97b/0x1a90 kernel/workqueue.c:3267 process_scheduled_works kernel/workqueue.c:3348 [inline] worker_thread+0x680/0xf00 kernel/workqueue.c:3429 kthread+0x2c7/0x3b0 kernel/kthread.c:388 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK>
Reported-by: Shichao Lai shichaorai@gmail.com Closes: https://lore.kernel.org/linux-wireless/CAEk6kZuuezkH1dVRJf3EAVZK-83=OpTz62qC... Tested-by: Shichao Lai shichaorai@gmail.com Signed-off-by: Ping-Ke Shih pkshih@realtek.com Link: https://msgid.link/20240524003248.5952-1-pkshih@realtek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/realtek/rtlwifi/usb.c | 34 +++++++++++++++++----- 1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/usb.c b/drivers/net/wireless/realtek/rtlwifi/usb.c index 2ea72d9e39577..4d2931e544278 100644 --- a/drivers/net/wireless/realtek/rtlwifi/usb.c +++ b/drivers/net/wireless/realtek/rtlwifi/usb.c @@ -23,6 +23,8 @@ MODULE_DESCRIPTION("USB basic driver for rtlwifi");
#define MAX_USBCTRL_VENDORREQ_TIMES 10
+static void _rtl_usb_cleanup_tx(struct ieee80211_hw *hw); + static void _usbctrl_vendorreq_sync(struct usb_device *udev, u8 reqtype, u16 value, void *pdata, u16 len) { @@ -285,9 +287,23 @@ static int _rtl_usb_init(struct ieee80211_hw *hw) } /* usb endpoint mapping */ err = rtlpriv->cfg->usb_interface_cfg->usb_endpoint_mapping(hw); - rtlusb->usb_mq_to_hwq = rtlpriv->cfg->usb_interface_cfg->usb_mq_to_hwq; - _rtl_usb_init_tx(hw); - _rtl_usb_init_rx(hw); + if (err) + return err; + + rtlusb->usb_mq_to_hwq = rtlpriv->cfg->usb_interface_cfg->usb_mq_to_hwq; + + err = _rtl_usb_init_tx(hw); + if (err) + return err; + + err = _rtl_usb_init_rx(hw); + if (err) + goto err_out; + + return 0; + +err_out: + _rtl_usb_cleanup_tx(hw); return err; }
@@ -691,17 +707,13 @@ static int rtl_usb_start(struct ieee80211_hw *hw) }
/*======================= tx =========================================*/ -static void rtl_usb_cleanup(struct ieee80211_hw *hw) +static void _rtl_usb_cleanup_tx(struct ieee80211_hw *hw) { u32 i; struct sk_buff *_skb; struct rtl_usb *rtlusb = rtl_usbdev(rtl_usbpriv(hw)); struct ieee80211_tx_info *txinfo;
- /* clean up rx stuff. */ - _rtl_usb_cleanup_rx(hw); - - /* clean up tx stuff */ for (i = 0; i < RTL_USB_MAX_EP_NUM; i++) { while ((_skb = skb_dequeue(&rtlusb->tx_skb_queue[i]))) { rtlusb->usb_tx_cleanup(hw, _skb); @@ -715,6 +727,12 @@ static void rtl_usb_cleanup(struct ieee80211_hw *hw) usb_kill_anchored_urbs(&rtlusb->tx_submitted); }
+static void rtl_usb_cleanup(struct ieee80211_hw *hw) +{ + _rtl_usb_cleanup_rx(hw); + _rtl_usb_cleanup_tx(hw); +} + /* We may add some struct into struct rtl_usb later. Do deinit here. */ static void rtl_usb_deinit(struct ieee80211_hw *hw) {
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 3d60041543189438cd1b03a1fa40ff6681c77970 ]
Currently the resource allocated by crypto_alloc_shash() is not freed in case ath12k_peer_find() fails, resulting in memory leak.
Add crypto_free_shash() to fix it.
This is found during code review, compile tested only.
Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Acked-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://msgid.link/20240526124226.24661-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath12k/dp_rx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp_rx.c b/drivers/net/wireless/ath/ath12k/dp_rx.c index 75df622f25d85..f05c45c8732e7 100644 --- a/drivers/net/wireless/ath/ath12k/dp_rx.c +++ b/drivers/net/wireless/ath/ath12k/dp_rx.c @@ -2762,6 +2762,7 @@ int ath12k_dp_rx_peer_frag_setup(struct ath12k *ar, const u8 *peer_mac, int vdev peer = ath12k_peer_find(ab, vdev_id, peer_mac); if (!peer) { spin_unlock_bh(&ab->base_lock); + crypto_free_shash(tfm); ath12k_warn(ab, "failed to find the peer to set up fragment info\n"); return -ENOENT; }
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit fba8334721e266f92079632598e46e5f89082f30 ]
When all the strides in a WQE have been consumed, the WQE is unlinked from the WQ linked list (mlx5_wq_ll_pop()). For SHAMPO, it is possible to receive CQEs with 0 consumed strides for the same WQE even after the WQE is fully consumed and unlinked. This triggers an additional unlink for the same wqe which corrupts the linked list.
Fix this scenario by accepting 0 sized consumed strides without unlinking the WQE again.
Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240603212219.1037656-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index b5333da20e8a7..cdc84a27a04ed 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -2374,6 +2374,9 @@ static void mlx5e_handle_rx_cqe_mpwrq_shampo(struct mlx5e_rq *rq, struct mlx5_cq if (likely(wi->consumed_strides < rq->mpwqe.num_strides)) return;
+ if (unlikely(!cstrides)) + return; + wq = &rq->mpwqe.wq; wqe = mlx5_wq_ll_get_wqe(wq, wqe_id); mlx5_wq_ll_pop(wq, cqe->wqe_id, &wqe->next.next_wqe_index);
From: Yonghong Song yonghong.song@linux.dev
[ Upstream commit 7015843afcaf68c132784c89528dfddc0005e483 ]
Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue.
The main qemu command line includes:
-enable-kvm -smp 16 -cpu host
The failure log looks like:
$ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK>
In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe.
The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed.
In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well.
[1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/
Reported-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/bpf/prog_tests/send_signal.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c index 920aee41bd58c..6cc69900b3106 100644 --- a/tools/testing/selftests/bpf/prog_tests/send_signal.c +++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c @@ -156,7 +156,8 @@ static void test_send_signal_tracepoint(bool signal_thread) static void test_send_signal_perf(bool signal_thread) { struct perf_event_attr attr = { - .sample_period = 1, + .freq = 1, + .sample_freq = 1000, .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_CLOCK, };
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f ]
Now that we have an intermediate layer of code for handling rtnl-level netlink dump quirks, we can move the rtnl_lock taking there.
For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can avoid taking rtnl_lock just to generate NLM_DONE, once again.
Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/rtnetlink.c | 9 +++++++-- net/netlink/af_netlink.c | 2 -- 2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 4668d67180407..eabfc8290f5e2 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -6486,6 +6486,7 @@ static int rtnl_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { + const bool needs_lock = !(cb->flags & RTNL_FLAG_DUMP_UNLOCKED); rtnl_dumpit_func dumpit = cb->data; int err;
@@ -6495,7 +6496,11 @@ static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb) if (!dumpit) return 0;
+ if (needs_lock) + rtnl_lock(); err = dumpit(skb, cb); + if (needs_lock) + rtnl_unlock();
/* Old dump handlers used to send NLM_DONE as in a separate recvmsg(). * Some applications which parse netlink manually depend on this. @@ -6515,7 +6520,8 @@ static int rtnetlink_dump_start(struct sock *ssk, struct sk_buff *skb, const struct nlmsghdr *nlh, struct netlink_dump_control *control) { - if (control->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE) { + if (control->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE || + !(control->flags & RTNL_FLAG_DUMP_UNLOCKED)) { WARN_ON(control->data); control->data = control->dump; control->dump = rtnl_dumpit; @@ -6703,7 +6709,6 @@ static int __net_init rtnetlink_net_init(struct net *net) struct netlink_kernel_cfg cfg = { .groups = RTNLGRP_MAX, .input = rtnetlink_rcv, - .cb_mutex = &rtnl_mutex, .flags = NL_CFG_F_NONROOT_RECV, .bind = rtnetlink_bind, }; diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index fa9c090cf629e..8bbbe75e75dbe 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2330,8 +2330,6 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
cb->extack = &extack;
- if (cb->flags & RTNL_FLAG_DUMP_UNLOCKED) - extra_mutex = NULL; if (extra_mutex) mutex_lock(extra_mutex); nlk->dump_done_errno = cb->dump(skb, cb);
On Sat, 27 Jul 2024 20:52:51 -0400 Sasha Levin wrote:
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5380d64f8d766576ac5c0f627418b2d0e1d2641f ]
Now that we have an intermediate layer of code for handling rtnl-level netlink dump quirks, we can move the rtnl_lock taking there.
For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can avoid taking rtnl_lock just to generate NLM_DONE, once again.
Not really a fix, FWIW, but not scary either so your call.
From: Ping-Ke Shih pkshih@realtek.com
[ Upstream commit 94298477f81a1701fc4e1b5a0ce9672acab5dcb2 ]
Read 32 bits RX info to a local variable to fix race condition between reading RX length and RX tag.
Another solution is to get RX tag at first statement, but adopted solution can save some memory read, and also save 15 bytes binary code.
RX tag, a sequence number, is used to ensure that RX data has been DMA to memory completely, so driver must check sequence number is expected before reading other data.
This potential problem happens only after enabling 36-bit DMA.
Signed-off-by: Ping-Ke Shih pkshih@realtek.com Link: https://msgid.link/20240611021901.26394-2-pkshih@realtek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/realtek/rtw89/pci.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c index 03bbcf9b6737c..1ba812fae12f0 100644 --- a/drivers/net/wireless/realtek/rtw89/pci.c +++ b/drivers/net/wireless/realtek/rtw89/pci.c @@ -183,14 +183,17 @@ static void rtw89_pci_sync_skb_for_device(struct rtw89_dev *rtwdev, static void rtw89_pci_rxbd_info_update(struct rtw89_dev *rtwdev, struct sk_buff *skb) { - struct rtw89_pci_rxbd_info *rxbd_info; struct rtw89_pci_rx_info *rx_info = RTW89_PCI_RX_SKB_CB(skb); + struct rtw89_pci_rxbd_info *rxbd_info; + __le32 info;
rxbd_info = (struct rtw89_pci_rxbd_info *)skb->data; - rx_info->fs = le32_get_bits(rxbd_info->dword, RTW89_PCI_RXBD_FS); - rx_info->ls = le32_get_bits(rxbd_info->dword, RTW89_PCI_RXBD_LS); - rx_info->len = le32_get_bits(rxbd_info->dword, RTW89_PCI_RXBD_WRITE_SIZE); - rx_info->tag = le32_get_bits(rxbd_info->dword, RTW89_PCI_RXBD_TAG); + info = rxbd_info->dword; + + rx_info->fs = le32_get_bits(info, RTW89_PCI_RXBD_FS); + rx_info->ls = le32_get_bits(info, RTW89_PCI_RXBD_LS); + rx_info->len = le32_get_bits(info, RTW89_PCI_RXBD_WRITE_SIZE); + rx_info->tag = le32_get_bits(info, RTW89_PCI_RXBD_TAG); }
static int rtw89_pci_validate_rx_tag(struct rtw89_dev *rtwdev,
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 401cb7dae8130fd34eb84648e02ab4c506df7d5e ]
The XDP redirect process is two staged: - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used.
- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list.
At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists.
The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking.
PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided.
Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. First invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from memset because it is only used once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is moved past nh to exclude it from memset.
The pointer to bpf_net_context is saved task's task_struct. Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.
Cc: Andrii Nakryiko andrii@kernel.org Cc: Eduard Zingerman eddyz87@gmail.com Cc: Hao Luo haoluo@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Fastabend john.fastabend@gmail.com Cc: KP Singh kpsingh@kernel.org Cc: Martin KaFai Lau martin.lau@linux.dev Cc: Song Liu song@kernel.org Cc: Stanislav Fomichev sdf@google.com Cc: Yonghong Song yonghong.song@linux.dev Acked-by: Alexei Starovoitov ast@kernel.org Acked-by: Jesper Dangaard Brouer hawk@kernel.org Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++-------- include/linux/sched.h | 3 +++ kernel/bpf/cpumap.c | 3 +++ kernel/bpf/devmap.c | 9 ++++++- kernel/fork.c | 1 + net/bpf/test_run.c | 11 ++++++++- net/core/dev.c | 29 +++++++++++++++++++++- net/core/filter.c | 44 +++++++++------------------------ net/core/lwt_bpf.c | 3 +++ 9 files changed, 114 insertions(+), 45 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h index 5669da513cd7c..b2dc932fdc35e 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -733,21 +733,59 @@ struct bpf_nh_params { }; };
+/* flags for bpf_redirect_info kern_flags */ +#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ +#define BPF_RI_F_RI_INIT BIT(1) + struct bpf_redirect_info { u64 tgt_index; void *tgt_value; struct bpf_map *map; u32 flags; - u32 kern_flags; u32 map_id; enum bpf_map_type map_type; struct bpf_nh_params nh; + u32 kern_flags; };
-DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); +struct bpf_net_context { + struct bpf_redirect_info ri; +};
-/* flags for bpf_redirect_info kern_flags */ -#define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ +static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *bpf_net_ctx) +{ + struct task_struct *tsk = current; + + if (tsk->bpf_net_context != NULL) + return NULL; + bpf_net_ctx->ri.kern_flags = 0; + + tsk->bpf_net_context = bpf_net_ctx; + return bpf_net_ctx; +} + +static inline void bpf_net_ctx_clear(struct bpf_net_context *bpf_net_ctx) +{ + if (bpf_net_ctx) + current->bpf_net_context = NULL; +} + +static inline struct bpf_net_context *bpf_net_ctx_get(void) +{ + return current->bpf_net_context; +} + +static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void) +{ + struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get(); + + if (!(bpf_net_ctx->ri.kern_flags & BPF_RI_F_RI_INIT)) { + memset(&bpf_net_ctx->ri, 0, offsetof(struct bpf_net_context, ri.nh)); + bpf_net_ctx->ri.kern_flags |= BPF_RI_F_RI_INIT; + } + + return &bpf_net_ctx->ri; +}
/* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, @@ -1018,25 +1056,23 @@ struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt);
-void bpf_clear_redirect_map(struct bpf_map *map); - static inline bool xdp_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT; }
static inline void xdp_set_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT; }
static inline void xdp_clear_return_frame_no_direct(void) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT; } @@ -1592,7 +1628,7 @@ static __always_inline long __bpf_xdp_redirect_map(struct bpf_map *map, u64 inde u64 flags, const u64 flag_mask, void *lookup_elem(struct bpf_map *map, u32 key)) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); const u64 action_mask = XDP_ABORTED | XDP_DROP | XDP_PASS | XDP_TX;
/* Lower bits of the flags are used as return code on lookup failure */ diff --git a/include/linux/sched.h b/include/linux/sched.h index a5f4b48fca184..6d7170259a538 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -53,6 +53,7 @@ struct bio_list; struct blk_plug; struct bpf_local_storage; struct bpf_run_ctx; +struct bpf_net_context; struct capture_control; struct cfs_rq; struct fs_struct; @@ -1506,6 +1507,8 @@ struct task_struct { /* Used for BPF run context */ struct bpf_run_ctx *bpf_ctx; #endif + /* Used by BPF for per-TASK xdp storage */ + struct bpf_net_context *bpf_net_context;
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK unsigned long lowest_stack; diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index a8e34416e960f..66974bd027109 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -240,12 +240,14 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, int xdp_n, struct xdp_cpumap_stats *stats, struct list_head *list) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int nframes;
if (!rcpu->prog) return xdp_n;
rcu_read_lock_bh(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
nframes = cpu_map_bpf_prog_run_xdp(rcpu, frames, xdp_n, stats);
@@ -255,6 +257,7 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, if (unlikely(!list_empty(list))) cpu_map_bpf_prog_run_skb(rcpu, list, stats);
+ bpf_net_ctx_clear(bpf_net_ctx); rcu_read_unlock_bh(); /* resched point, may call do_softirq() */
return nframes; diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 7f3b34452243c..fbfdfb60db8d7 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -196,7 +196,14 @@ static void dev_map_free(struct bpf_map *map) list_del_rcu(&dtab->list); spin_unlock(&dev_map_lock);
- bpf_clear_redirect_map(map); + /* bpf_redirect_info->map is assigned in __bpf_xdp_redirect_map() + * during NAPI callback and cleared after the XDP redirect. There is no + * explicit RCU read section which protects bpf_redirect_info->map but + * local_bh_disable() also marks the beginning an RCU section. This + * makes the complete softirq callback RCU protected. Thus after + * following synchronize_rcu() there no bpf_redirect_info->map == map + * assignment. + */ synchronize_rcu();
/* Make sure prior __dev_map_entry_free() have completed. */ diff --git a/kernel/fork.c b/kernel/fork.c index 99076dbe27d83..f314bdd7e6108 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2355,6 +2355,7 @@ __latent_entropy struct task_struct *copy_process( RCU_INIT_POINTER(p->bpf_storage, NULL); p->bpf_ctx = NULL; #endif + p->bpf_net_context = NULL;
/* Perform scheduler related setup. Assign this task to a CPU. */ retval = sched_fork(clone_flags, p); diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 36ae54f57bf57..a6d7f790cdda8 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -283,9 +283,10 @@ static int xdp_recv_frames(struct xdp_frame **frames, int nframes, static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, u32 repeat) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int err = 0, act, ret, i, nframes = 0, batch_sz; struct xdp_frame **frames = xdp->frames; + struct bpf_redirect_info *ri; struct xdp_page_head *head; struct xdp_frame *frm; bool redirect = false; @@ -295,6 +296,8 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, batch_sz = min_t(u32, repeat, xdp->batch_size);
local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + ri = bpf_net_ctx_get_ri(); xdp_set_return_frame_no_direct();
for (i = 0; i < batch_sz; i++) { @@ -359,6 +362,7 @@ static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog, }
xdp_clear_return_frame_no_direct(); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); return err; } @@ -394,6 +398,7 @@ static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx, static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, u32 *retval, u32 *time, bool xdp) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; struct bpf_prog_array_item item = {.prog = prog}; struct bpf_run_ctx *old_ctx; struct bpf_cg_run_ctx run_ctx; @@ -419,10 +424,14 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, do { run_ctx.prog_item = &item; local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + if (xdp) *retval = bpf_prog_run_xdp(prog, ctx); else *retval = bpf_prog_run(prog, ctx); + + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); } while (bpf_test_timer_continue(&t, 1, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); diff --git a/net/core/dev.c b/net/core/dev.c index 2b4819b610b8a..195aa4d488bec 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4029,10 +4029,13 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, { struct bpf_mprog_entry *entry = rcu_dereference_bh(skb->dev->tcx_ingress); enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_INGRESS; + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int sch_ret;
if (!entry) return skb; + + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (*pt_prev) { *ret = deliver_skb(skb, *pt_prev, orig_dev); *pt_prev = NULL; @@ -4061,10 +4064,12 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, break; } *ret = NET_RX_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; case TC_ACT_SHOT: kfree_skb_reason(skb, drop_reason); *ret = NET_RX_DROP; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; /* used by tc_run */ case TC_ACT_STOLEN: @@ -4074,8 +4079,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, fallthrough; case TC_ACT_CONSUMED: *ret = NET_RX_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; } + bpf_net_ctx_clear(bpf_net_ctx);
return skb; } @@ -4085,11 +4092,14 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) { struct bpf_mprog_entry *entry = rcu_dereference_bh(dev->tcx_egress); enum skb_drop_reason drop_reason = SKB_DROP_REASON_TC_EGRESS; + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int sch_ret;
if (!entry) return skb;
+ bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + /* qdisc_skb_cb(skb)->pkt_len & tcx_set_ingress() was * already set by the caller. */ @@ -4105,10 +4115,12 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) /* No need to push/pop skb's mac_header here on egress! */ skb_do_redirect(skb); *ret = NET_XMIT_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; case TC_ACT_SHOT: kfree_skb_reason(skb, drop_reason); *ret = NET_XMIT_DROP; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; /* used by tc_run */ case TC_ACT_STOLEN: @@ -4118,8 +4130,10 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) fallthrough; case TC_ACT_CONSUMED: *ret = NET_XMIT_SUCCESS; + bpf_net_ctx_clear(bpf_net_ctx); return NULL; } + bpf_net_ctx_clear(bpf_net_ctx);
return skb; } @@ -6301,6 +6315,7 @@ enum { static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, unsigned flags, u16 budget) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; bool skip_schedule = false; unsigned long timeout; int rc; @@ -6318,6 +6333,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
if (flags & NAPI_F_PREFER_BUSY_POLL) { napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs); @@ -6340,6 +6356,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, netpoll_poll_unlock(have_poll_lock); if (rc == budget) __busy_poll_stop(napi, skip_schedule); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); }
@@ -6349,6 +6366,7 @@ static void __napi_busy_loop(unsigned int napi_id, { unsigned long start_time = loop_end ? busy_loop_current_time() : 0; int (*napi_poll)(struct napi_struct *napi, int budget); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; void *have_poll_lock = NULL; struct napi_struct *napi;
@@ -6367,6 +6385,7 @@ static void __napi_busy_loop(unsigned int napi_id, int work = 0;
local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); if (!napi_poll) { unsigned long val = READ_ONCE(napi->state);
@@ -6397,6 +6416,7 @@ static void __napi_busy_loop(unsigned int napi_id, __NET_ADD_STATS(dev_net(napi->dev), LINUX_MIB_BUSYPOLLRXPACKETS, work); skb_defer_free_flush(this_cpu_ptr(&softnet_data)); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable();
if (!loop_end || loop_end(loop_end_arg, start_time)) @@ -6824,6 +6844,7 @@ static int napi_thread_wait(struct napi_struct *napi)
static void napi_threaded_poll_loop(struct napi_struct *napi) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; struct softnet_data *sd; unsigned long last_qs = jiffies;
@@ -6832,6 +6853,8 @@ static void napi_threaded_poll_loop(struct napi_struct *napi) void *have;
local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); + sd = this_cpu_ptr(&softnet_data); sd->in_napi_threaded_poll = true;
@@ -6847,6 +6870,7 @@ static void napi_threaded_poll_loop(struct napi_struct *napi) net_rps_action_and_irq_enable(sd); } skb_defer_free_flush(sd); + bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable();
if (!repoll) @@ -6872,10 +6896,12 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + usecs_to_jiffies(READ_ONCE(net_hotdata.netdev_budget_usecs)); + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int budget = READ_ONCE(net_hotdata.netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll);
+ bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); start: sd->in_net_rx_action = true; local_irq_disable(); @@ -6928,7 +6954,8 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) sd->in_net_rx_action = false;
net_rps_action_and_irq_enable(sd); -end:; +end: + bpf_net_ctx_clear(bpf_net_ctx); }
struct netdev_adjacent { diff --git a/net/core/filter.c b/net/core/filter.c index 9933851c685e7..c0f350b963272 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2476,9 +2476,6 @@ static const struct bpf_func_proto bpf_clone_redirect_proto = { .arg3_type = ARG_ANYTHING, };
-DEFINE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); -EXPORT_PER_CPU_SYMBOL_GPL(bpf_redirect_info); - static struct net_device *skb_get_peer_dev(struct net_device *dev) { const struct net_device_ops *ops = dev->netdev_ops; @@ -2491,7 +2488,7 @@ static struct net_device *skb_get_peer_dev(struct net_device *dev)
int skb_do_redirect(struct sk_buff *skb) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct net *net = dev_net(skb->dev); struct net_device *dev; u32 flags = ri->flags; @@ -2524,7 +2521,7 @@ int skb_do_redirect(struct sk_buff *skb)
BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
if (unlikely(flags & (~(BPF_F_INGRESS) | BPF_F_REDIRECT_INTERNAL))) return TC_ACT_SHOT; @@ -2545,7 +2542,7 @@ static const struct bpf_func_proto bpf_redirect_proto = {
BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
if (unlikely(flags)) return TC_ACT_SHOT; @@ -2567,7 +2564,7 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = { BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params, int, plen, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
if (unlikely((plen && plen < sizeof(*params)) || flags)) return TC_ACT_SHOT; @@ -4293,30 +4290,13 @@ void xdp_do_check_flushed(struct napi_struct *napi) } #endif
-void bpf_clear_redirect_map(struct bpf_map *map) -{ - struct bpf_redirect_info *ri; - int cpu; - - for_each_possible_cpu(cpu) { - ri = per_cpu_ptr(&bpf_redirect_info, cpu); - /* Avoid polluting remote cacheline due to writes if - * not needed. Once we pass this test, we need the - * cmpxchg() to make sure it hasn't been changed in - * the meantime by remote CPU. - */ - if (unlikely(READ_ONCE(ri->map) == map)) - cmpxchg(&ri->map, map, NULL); - } -} - DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key); EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
u32 xdp_master_redirect(struct xdp_buff *xdp) { + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct net_device *master, *slave; - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev); slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp); @@ -4388,7 +4368,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, map = READ_ONCE(ri->map);
/* The map pointer is cleared when the map is being torn - * down by bpf_clear_redirect_map() + * down by dev_map_free() */ if (unlikely(!map)) { err = -ENOENT; @@ -4433,7 +4413,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri, int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type;
if (map_type == BPF_MAP_TYPE_XSKMAP) @@ -4447,7 +4427,7 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect); int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp, struct xdp_frame *xdpf, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type;
if (map_type == BPF_MAP_TYPE_XSKMAP) @@ -4464,7 +4444,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, enum bpf_map_type map_type, u32 map_id, u32 flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); struct bpf_map *map; int err;
@@ -4476,7 +4456,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, map = READ_ONCE(ri->map);
/* The map pointer is cleared when the map is being torn - * down by bpf_clear_redirect_map() + * down by dev_map_free() */ if (unlikely(!map)) { err = -ENOENT; @@ -4518,7 +4498,7 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *xdp_prog) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri(); enum bpf_map_type map_type = ri->map_type; void *fwd = ri->tgt_value; u32 map_id = ri->map_id; @@ -4554,7 +4534,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) { - struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); + struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
if (unlikely(flags)) return XDP_ABORTED; diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 4a0797f0a154b..5350dce2e52d6 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -38,6 +38,7 @@ static inline struct bpf_lwt *bpf_lwt_lwtunnel(struct lwtunnel_state *lwt) static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, struct dst_entry *dst, bool can_redirect) { + struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx; int ret;
/* Migration disable and BH disable are needed to protect per-cpu @@ -45,6 +46,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, */ migrate_disable(); local_bh_disable(); + bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx); bpf_compute_data_pointers(skb); ret = bpf_prog_run_save_cb(lwt->prog, skb);
@@ -77,6 +79,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt, break; }
+ bpf_net_ctx_clear(bpf_net_ctx); local_bh_enable(); migrate_enable();
On Sat, 27 Jul 2024 20:52:53 -0400 Sasha Levin wrote:
Subject: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
no no no, let's drop this one, it's not a fix, and there's a ton of fallout
On Mon, Jul 29, 2024 at 08:00:14AM -0700, Jakub Kicinski wrote:
On Sat, 27 Jul 2024 20:52:53 -0400 Sasha Levin wrote:
Subject: [PATCH AUTOSEL 6.10 10/27] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
no no no, let's drop this one, it's not a fix, and there's a ton of fallout
Ack, will do. Thanks!
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1ca27e0c8c13ac50a4acf9cdf77069e2d94a547d ]
When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need to lock the two sockets to check their states in unix_stream_connect().
We use unix_state_lock() for the server and unix_state_lock_nested() for client with tricky sk->sk_state check to avoid deadlock.
The possible deadlock scenario are the following:
1) Self connect() 2) Simultaneous connect()
The former is simple, attempt to grab the same lock, and the latter is AB-BA deadlock.
After the server's unix_state_lock(), we check the server socket's state, and if it's not TCP_LISTEN, connect() fails with -EINVAL.
Then, we avoid the former deadlock by checking the client's state before unix_state_lock_nested(). If its state is not TCP_LISTEN, we can make sure that the client and the server are not identical based on the state.
Also, the latter deadlock can be avoided in the same way. Due to the server sk->sk_state requirement, AB-BA deadlock could happen only with TCP_LISTEN sockets. So, if the client's state is TCP_LISTEN, we can give up the second lock to avoid the deadlock.
CPU 1 CPU 2 CPU 3 connect(A -> B) connect(B -> A) listen(A) --- --- --- unix_state_lock(B) B->sk_state == TCP_LISTEN READ_ONCE(A->sk_state) == TCP_CLOSE ^^^^^^^^^ ok, will lock A unix_state_lock(A) .--------------' WRITE_ONCE(A->sk_state, TCP_LISTEN) | unix_state_unlock(A) | | unix_state_lock(A) | A->sk_sk_state == TCP_LISTEN | READ_ONCE(B->sk_state) == TCP_LISTEN v ^^^^^^^^^^ unix_state_lock_nested(A) Don't lock B !!
Currently, while checking the client's state, we also check if it's TCP_ESTABLISHED, but this is unlikely and can be checked after we know the state is not TCP_CLOSE.
Moreover, if it happens after the second lock, we now jump to the restart label, but it's unlikely that the server is not found during the retry, so the jump is mostly to revist the client state check.
Let's remove the retry logic and check the state against TCP_CLOSE first.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 34 +++++++++------------------------- 1 file changed, 9 insertions(+), 25 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 142f56770b77f..ff0029e9a7f91 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1473,6 +1473,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, struct unix_sock *u = unix_sk(sk), *newu, *otheru; struct net *net = sock_net(sk); struct sk_buff *skb = NULL; + unsigned char state; long timeo; int err;
@@ -1523,7 +1524,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, goto out; }
- /* Latch state of peer */ unix_state_lock(other);
/* Apparently VFS overslept socket death. Retry. */ @@ -1553,37 +1553,21 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, goto restart; }
- /* Latch our state. - - It is tricky place. We need to grab our state lock and cannot - drop lock on peer. It is dangerous because deadlock is - possible. Connect to self case and simultaneous - attempt to connect are eliminated by checking socket - state. other is TCP_LISTEN, if sk is TCP_LISTEN we - check this before attempt to grab lock. - - Well, and we have to recheck the state after socket locked. + /* self connect and simultaneous connect are eliminated + * by rejecting TCP_LISTEN socket to avoid deadlock. */ - switch (READ_ONCE(sk->sk_state)) { - case TCP_CLOSE: - /* This is ok... continue with connect */ - break; - case TCP_ESTABLISHED: - /* Socket is already connected */ - err = -EISCONN; - goto out_unlock; - default: - err = -EINVAL; + state = READ_ONCE(sk->sk_state); + if (unlikely(state != TCP_CLOSE)) { + err = state == TCP_ESTABLISHED ? -EISCONN : -EINVAL; goto out_unlock; }
unix_state_lock_nested(sk, U_LOCK_SECOND);
- if (sk->sk_state != TCP_CLOSE) { + if (unlikely(sk->sk_state != TCP_CLOSE)) { + err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL; unix_state_unlock(sk); - unix_state_unlock(other); - sock_put(other); - goto restart; + goto out_unlock; }
err = security_unix_stream_connect(sk, other, newsk);
From: FUJITA Tomonori fujita.tomonori@gmail.com
[ Upstream commit eee5528890d54b22b46f833002355a5ee94c3bb4 ]
Add the Edimax Vendor ID (0x1432) for an ethernet driver for Tehuti Networks TN40xx chips. This ID can be used for Realtek 8180 and Ralink rt28xx wireless drivers.
Signed-off-by: FUJITA Tomonori fujita.tomonori@gmail.com Acked-by: Bjorn Helgaas bhelgaas@google.com Link: https://patch.msgid.link/20240623235507.108147-2-fujita.tomonori@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 942a587bb97e3..677aea20d3e11 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2126,6 +2126,8 @@
#define PCI_VENDOR_ID_CHELSIO 0x1425
+#define PCI_VENDOR_ID_EDIMAX 0x1432 + #define PCI_VENDOR_ID_ADLINK 0x144a
#define PCI_VENDOR_ID_SAMSUNG 0x144d
From: Zong-Zhe Yang kevin_yang@realtek.com
[ Upstream commit 021d53a3d87eeb9dbba524ac515651242a2a7e3b ]
In MLD connection, link_data/link_conf are dynamically allocated. They don't point to vif->bss_conf. So, there will be no chanreq assigned to vif->bss_conf and then the chan will be NULL. Tweak the code to check ht_supported/vht_supported/has_he/has_eht on sta deflink.
Crash log (with rtw89 version under MLO development): [ 9890.526087] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 9890.526102] #PF: supervisor read access in kernel mode [ 9890.526105] #PF: error_code(0x0000) - not-present page [ 9890.526109] PGD 0 P4D 0 [ 9890.526114] Oops: 0000 [#1] PREEMPT SMP PTI [ 9890.526119] CPU: 2 PID: 6367 Comm: kworker/u16:2 Kdump: loaded Tainted: G OE 6.9.0 #1 [ 9890.526123] Hardware name: LENOVO 2356AD1/2356AD1, BIOS G7ETB3WW (2.73 ) 11/28/2018 [ 9890.526126] Workqueue: phy2 rtw89_core_ba_work [rtw89_core] [ 9890.526203] RIP: 0010:ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211 [ 9890.526279] Code: f7 e8 d5 93 3e ea 48 83 c4 28 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8b 84 24 e0 f1 ff ff 48 8b 80 90 1b 00 00 <83> 38 03 0f 84 37 fe ff ff bb ea ff ff ff eb cc 49 8b 84 24 10 f3 All code ======== 0: f7 e8 imul %eax 2: d5 (bad) 3: 93 xchg %eax,%ebx 4: 3e ea ds (bad) 6: 48 83 c4 28 add $0x28,%rsp a: 89 d8 mov %ebx,%eax c: 5b pop %rbx d: 41 5c pop %r12 f: 41 5d pop %r13 11: 41 5e pop %r14 13: 41 5f pop %r15 15: 5d pop %rbp 16: c3 retq 17: cc int3 18: cc int3 19: cc int3 1a: cc int3 1b: 49 8b 84 24 e0 f1 ff mov -0xe20(%r12),%rax 22: ff 23: 48 8b 80 90 1b 00 00 mov 0x1b90(%rax),%rax 2a:* 83 38 03 cmpl $0x3,(%rax) <-- trapping instruction 2d: 0f 84 37 fe ff ff je 0xfffffffffffffe6a 33: bb ea ff ff ff mov $0xffffffea,%ebx 38: eb cc jmp 0x6 3a: 49 rex.WB 3b: 8b .byte 0x8b 3c: 84 24 10 test %ah,(%rax,%rdx,1) 3f: f3 repz
Code starting with the faulting instruction =========================================== 0: 83 38 03 cmpl $0x3,(%rax) 3: 0f 84 37 fe ff ff je 0xfffffffffffffe40 9: bb ea ff ff ff mov $0xffffffea,%ebx e: eb cc jmp 0xffffffffffffffdc 10: 49 rex.WB 11: 8b .byte 0x8b 12: 84 24 10 test %ah,(%rax,%rdx,1) 15: f3 repz [ 9890.526285] RSP: 0018:ffffb8db09013d68 EFLAGS: 00010246 [ 9890.526291] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9308e0d656c8 [ 9890.526295] RDX: 0000000000000000 RSI: ffffffffab99460b RDI: ffffffffab9a7685 [ 9890.526300] RBP: ffffb8db09013db8 R08: 0000000000000000 R09: 0000000000000873 [ 9890.526304] R10: ffff9308e0d64800 R11: 0000000000000002 R12: ffff9308e5ff6e70 [ 9890.526308] R13: ffff930952500e20 R14: ffff9309192a8c00 R15: 0000000000000000 [ 9890.526313] FS: 0000000000000000(0000) GS:ffff930b4e700000(0000) knlGS:0000000000000000 [ 9890.526316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9890.526318] CR2: 0000000000000000 CR3: 0000000391c58005 CR4: 00000000001706f0 [ 9890.526321] Call Trace: [ 9890.526324] <TASK> [ 9890.526327] ? show_regs (arch/x86/kernel/dumpstack.c:479) [ 9890.526335] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 9890.526340] ? page_fault_oops (arch/x86/mm/fault.c:713) [ 9890.526347] ? search_module_extables (kernel/module/main.c:3256 (discriminator 3)) [ 9890.526353] ? ieee80211_start_tx_ba_session (net/mac80211/agg-tx.c:618 (discriminator 1)) mac80211
Signed-off-by: Zong-Zhe Yang kevin_yang@realtek.com Link: https://patch.msgid.link/20240617115217.22344-1-kevin_yang@realtek.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/agg-tx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c index 21d55dc539f6c..677bbbac9f169 100644 --- a/net/mac80211/agg-tx.c +++ b/net/mac80211/agg-tx.c @@ -616,7 +616,9 @@ int ieee80211_start_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid, return -EINVAL;
if (!pubsta->deflink.ht_cap.ht_supported && - sta->sdata->vif.bss_conf.chanreq.oper.chan->band != NL80211_BAND_6GHZ) + !pubsta->deflink.vht_cap.vht_supported && + !pubsta->deflink.he_cap.has_he && + !pubsta->deflink.eht_cap.has_eht) return -EINVAL;
if (WARN_ON_ONCE(!local->ops->ampdu_action))
From: Roman Smirnov r.smirnov@omp.ru
[ Upstream commit 56e69e59751d20993f243fb7dd6991c4e522424c ]
An overflow may occur if the function is called with the last block and an offset greater than zero. It is necessary to add a check to avoid this.
Found by Linux Verification Center (linuxtesting.org) with Svace.
[JK: Make test cover also unalloc table freeing]
Link: https://patch.msgid.link/20240620072413.7448-1-r.smirnov@omp.ru Suggested-by: Jan Kara jack@suse.com Signed-off-by: Roman Smirnov r.smirnov@omp.ru Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/udf/balloc.c | 36 +++++++++++++----------------------- 1 file changed, 13 insertions(+), 23 deletions(-)
diff --git a/fs/udf/balloc.c b/fs/udf/balloc.c index ab3ffc355949d..2192c4a1ba98b 100644 --- a/fs/udf/balloc.c +++ b/fs/udf/balloc.c @@ -18,6 +18,7 @@ #include "udfdecl.h"
#include <linux/bitops.h> +#include <linux/overflow.h>
#include "udf_i.h" #include "udf_sb.h" @@ -129,7 +130,6 @@ static void udf_bitmap_free_blocks(struct super_block *sb, { struct udf_sb_info *sbi = UDF_SB(sb); struct buffer_head *bh = NULL; - struct udf_part_map *partmap; unsigned long block; unsigned long block_group; unsigned long bit; @@ -138,19 +138,9 @@ static void udf_bitmap_free_blocks(struct super_block *sb, unsigned long overflow;
mutex_lock(&sbi->s_alloc_mutex); - partmap = &sbi->s_partmaps[bloc->partitionReferenceNum]; - if (bloc->logicalBlockNum + count < count || - (bloc->logicalBlockNum + count) > partmap->s_partition_len) { - udf_debug("%u < %d || %u + %u > %u\n", - bloc->logicalBlockNum, 0, - bloc->logicalBlockNum, count, - partmap->s_partition_len); - goto error_return; - } - + /* We make sure this cannot overflow when mounting the filesystem */ block = bloc->logicalBlockNum + offset + (sizeof(struct spaceBitmapDesc) << 3); - do { overflow = 0; block_group = block >> (sb->s_blocksize_bits + 3); @@ -380,7 +370,6 @@ static void udf_table_free_blocks(struct super_block *sb, uint32_t count) { struct udf_sb_info *sbi = UDF_SB(sb); - struct udf_part_map *partmap; uint32_t start, end; uint32_t elen; struct kernel_lb_addr eloc; @@ -389,16 +378,6 @@ static void udf_table_free_blocks(struct super_block *sb, struct udf_inode_info *iinfo;
mutex_lock(&sbi->s_alloc_mutex); - partmap = &sbi->s_partmaps[bloc->partitionReferenceNum]; - if (bloc->logicalBlockNum + count < count || - (bloc->logicalBlockNum + count) > partmap->s_partition_len) { - udf_debug("%u < %d || %u + %u > %u\n", - bloc->logicalBlockNum, 0, - bloc->logicalBlockNum, count, - partmap->s_partition_len); - goto error_return; - } - iinfo = UDF_I(table); udf_add_free_space(sb, sbi->s_partition, count);
@@ -673,6 +652,17 @@ void udf_free_blocks(struct super_block *sb, struct inode *inode, { uint16_t partition = bloc->partitionReferenceNum; struct udf_part_map *map = &UDF_SB(sb)->s_partmaps[partition]; + uint32_t blk; + + if (check_add_overflow(bloc->logicalBlockNum, offset, &blk) || + check_add_overflow(blk, count, &blk) || + bloc->logicalBlockNum + count > map->s_partition_len) { + udf_debug("Invalid request to free blocks: (%d, %u), off %u, " + "len %u, partition len %u\n", + partition, bloc->logicalBlockNum, offset, count, + map->s_partition_len); + return; + }
if (map->s_partition_flags & UDF_PART_FLAG_UNALLOC_BITMAP) { udf_bitmap_free_blocks(sb, map->s_uspace.s_bitmap,
From: Matt Bobrowski mattbobrowski@google.com
[ Upstream commit ec2b9a5e11e51fea1bb04c1e7e471952e887e874 ]
Currently, it's possible to pass in a modified CONST_PTR_TO_DYNPTR to a global function as an argument. The adverse effects of this is that BPF helpers can continue to make use of this modified CONST_PTR_TO_DYNPTR from within the context of the global function, which can unintentionally result in out-of-bounds memory accesses and therefore compromise overall system stability i.e.
[ 244.157771] BUG: KASAN: slab-out-of-bounds in bpf_dynptr_data+0x137/0x140 [ 244.161345] Read of size 8 at addr ffff88810914be68 by task test_progs/302 [ 244.167151] CPU: 0 PID: 302 Comm: test_progs Tainted: G O E 6.10.0-rc3-00131-g66b586715063 #533 [ 244.174318] Call Trace: [ 244.175787] <TASK> [ 244.177356] dump_stack_lvl+0x66/0xa0 [ 244.179531] print_report+0xce/0x670 [ 244.182314] ? __virt_addr_valid+0x200/0x3e0 [ 244.184908] kasan_report+0xd7/0x110 [ 244.187408] ? bpf_dynptr_data+0x137/0x140 [ 244.189714] ? bpf_dynptr_data+0x137/0x140 [ 244.192020] bpf_dynptr_data+0x137/0x140 [ 244.194264] bpf_prog_b02a02fdd2bdc5fa_global_call_bpf_dynptr_data+0x22/0x26 [ 244.198044] bpf_prog_b0fe7b9d7dc3abde_callback_adjust_bpf_dynptr_reg_off+0x1f/0x23 [ 244.202136] bpf_user_ringbuf_drain+0x2c7/0x570 [ 244.204744] ? 0xffffffffc0009e58 [ 244.206593] ? __pfx_bpf_user_ringbuf_drain+0x10/0x10 [ 244.209795] bpf_prog_33ab33f6a804ba2d_user_ringbuf_callback_const_ptr_to_dynptr_reg_off+0x47/0x4b [ 244.215922] bpf_trampoline_6442502480+0x43/0xe3 [ 244.218691] __x64_sys_prlimit64+0x9/0xf0 [ 244.220912] do_syscall_64+0xc1/0x1d0 [ 244.223043] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 244.226458] RIP: 0033:0x7ffa3eb8f059 [ 244.228582] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8f 1d 0d 00 f7 d8 64 89 01 48 [ 244.241307] RSP: 002b:00007ffa3e9c6eb8 EFLAGS: 00000206 ORIG_RAX: 000000000000012e [ 244.246474] RAX: ffffffffffffffda RBX: 00007ffa3e9c7cdc RCX: 00007ffa3eb8f059 [ 244.250478] RDX: 00007ffa3eb162b4 RSI: 0000000000000000 RDI: 00007ffa3e9c7fb0 [ 244.255396] RBP: 00007ffa3e9c6ed0 R08: 00007ffa3e9c76c0 R09: 0000000000000000 [ 244.260195] R10: 0000000000000000 R11: 0000000000000206 R12: ffffffffffffff80 [ 244.264201] R13: 000000000000001c R14: 00007ffc5d6b4260 R15: 00007ffa3e1c7000 [ 244.268303] </TASK>
Add a check_func_arg_reg_off() to the path in which the BPF verifier verifies the arguments of global function arguments, specifically those which take an argument of type ARG_PTR_TO_DYNPTR | MEM_RDONLY. Also, process_dynptr_func() doesn't appear to perform any explicit and strict type matching on the supplied register type, so let's also enforce that a register either type PTR_TO_STACK or CONST_PTR_TO_DYNPTR is by the caller.
Reported-by: Kumar Kartikeya Dwivedi memxor@gmail.com Acked-by: Kumar Kartikeya Dwivedi memxor@gmail.com Acked-by: Eduard Zingerman eddyz87@gmail.com Signed-off-by: Matt Bobrowski mattbobrowski@google.com Link: https://lore.kernel.org/r/20240625062857.92760-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/verifier.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 214a9fa8c6fb7..87e242f37f46e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7715,6 +7715,13 @@ static int process_dynptr_func(struct bpf_verifier_env *env, int regno, int insn struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; int err;
+ if (reg->type != PTR_TO_STACK && reg->type != CONST_PTR_TO_DYNPTR) { + verbose(env, + "arg#%d expected pointer to stack or const struct bpf_dynptr\n", + regno); + return -EINVAL; + } + /* MEM_UNINIT and MEM_RDONLY are exclusive, when applied to an * ARG_PTR_TO_DYNPTR (or ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_*): */ @@ -9464,6 +9471,10 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog, return -EINVAL; } } else if (arg->arg_type == (ARG_PTR_TO_DYNPTR | MEM_RDONLY)) { + ret = check_func_arg_reg_off(env, reg, regno, ARG_PTR_TO_DYNPTR); + if (ret) + return ret; + ret = process_dynptr_func(env, regno, -1, arg->arg_type, 0); if (ret) return ret; @@ -11957,12 +11968,6 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR; int clone_ref_obj_id = 0;
- if (reg->type != PTR_TO_STACK && - reg->type != CONST_PTR_TO_DYNPTR) { - verbose(env, "arg#%d expected pointer to stack or dynptr_ptr\n", i); - return -EINVAL; - } - if (reg->type == CONST_PTR_TO_DYNPTR) dynptr_arg_type |= MEM_RDONLY;
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit a7e5793035792cc46a1a4b0a783655ffa897dfe9 ]
When a key is requested by userspace, there's really no need to include the key data, the sequence counter is really what userspace needs in this case. The fact that it's included is just a historic quirk.
Remove the key data.
Reviewed-by: Miriam Rachel Korenblit miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20240627104411.b6a4f097e4ea.I7e6cc976cb9e8a80ef25a3... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/nl80211.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 81d5bf186180f..715ecbb3689e7 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -4482,10 +4482,7 @@ static void get_key_callback(void *c, struct key_params *params) struct nlattr *key; struct get_key_cookie *cookie = c;
- if ((params->key && - nla_put(cookie->msg, NL80211_ATTR_KEY_DATA, - params->key_len, params->key)) || - (params->seq && + if ((params->seq && nla_put(cookie->msg, NL80211_ATTR_KEY_SEQ, params->seq_len, params->seq)) || (params->cipher && @@ -4497,10 +4494,7 @@ static void get_key_callback(void *c, struct key_params *params) if (!key) goto nla_put_failure;
- if ((params->key && - nla_put(cookie->msg, NL80211_KEY_DATA, - params->key_len, params->key)) || - (params->seq && + if ((params->seq && nla_put(cookie->msg, NL80211_KEY_SEQ, params->seq_len, params->seq)) || (params->cipher &&
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit b8e0ddd36ce9536ad7478dd27df06c9ae92370ba ]
This is a preparatory patch to work around a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO.
When handling the TEF interrupt, the driver reads the FIFO header index from the TEF FIFO STA register of the chip.
In the bad case, the driver reads a too large head index. In the original code, the driver always trusted the read value, which caused old CAN transmit complete events that were already processed to be re-processed.
Instead of reading and trusting the head index, read the head index and calculate the number of CAN frames that were supposedly received - replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().
The mcp251xfd_handle_tefif() function reads the CAN transmit complete events from the chip, iterates over them and pushes them into the network stack. The original driver already contains code to detect old CAN transmit complete events, that will be updated in the next patch.
Cc: Stefan Althöfer Stefan.Althoefer@janztec.com Cc: Thomas Kopp thomas.kopp@microchip.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/can/spi/mcp251xfd/mcp251xfd-ring.c | 2 + drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 54 +++++++++++++------ drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 13 ++--- 3 files changed, 43 insertions(+), 26 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c index bfe4caa0c99d4..4cb79a4f24612 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c @@ -485,6 +485,8 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv) clear_bit(MCP251XFD_FLAGS_FD_MODE, priv->flags); }
+ tx_ring->obj_num_shift_to_u8 = BITS_PER_TYPE(tx_ring->obj_num) - + ilog2(tx_ring->obj_num); tx_ring->obj_size = tx_obj_size;
rem = priv->rx_obj_num; diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index e5bd57b65aafe..b41fad3b37c06 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -2,7 +2,7 @@ // // mcp251xfd - Microchip MCP251xFD Family CAN controller driver // -// Copyright (c) 2019, 2020, 2021 Pengutronix, +// Copyright (c) 2019, 2020, 2021, 2023 Pengutronix, // Marc Kleine-Budde kernel@pengutronix.de // // Based on: @@ -16,6 +16,11 @@
#include "mcp251xfd.h"
+static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta) +{ + return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF); +} + static inline int mcp251xfd_tef_tail_get_from_chip(const struct mcp251xfd_priv *priv, u8 *tef_tail) @@ -120,28 +125,44 @@ mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv, return 0; }
-static int mcp251xfd_tef_ring_update(struct mcp251xfd_priv *priv) +static int +mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p) { const struct mcp251xfd_tx_ring *tx_ring = priv->tx; - unsigned int new_head; - u8 chip_tx_tail; + const u8 shift = tx_ring->obj_num_shift_to_u8; + u8 chip_tx_tail, tail, len; + u32 fifo_sta; int err;
- err = mcp251xfd_tx_tail_get_from_chip(priv, &chip_tx_tail); + err = regmap_read(priv->map_reg, MCP251XFD_REG_FIFOSTA(priv->tx->fifo_nr), + &fifo_sta); if (err) return err;
- /* chip_tx_tail, is the next TX-Object send by the HW. - * The new TEF head must be >= the old head, ... + if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) { + *len_p = tx_ring->obj_num; + return 0; + } + + chip_tx_tail = FIELD_GET(MCP251XFD_REG_FIFOSTA_FIFOCI_MASK, fifo_sta); + + err = mcp251xfd_check_tef_tail(priv); + if (err) + return err; + tail = mcp251xfd_get_tef_tail(priv); + + /* First shift to full u8. The subtraction works on signed + * values, that keeps the difference steady around the u8 + * overflow. The right shift acts on len, which is an u8. */ - new_head = round_down(priv->tef->head, tx_ring->obj_num) + chip_tx_tail; - if (new_head <= priv->tef->head) - new_head += tx_ring->obj_num; + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(chip_tx_tail)); + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(tail)); + BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(len));
- /* ... but it cannot exceed the TX head. */ - priv->tef->head = min(new_head, tx_ring->head); + len = (chip_tx_tail << shift) - (tail << shift); + *len_p = len >> shift;
- return mcp251xfd_check_tef_tail(priv); + return 0; }
static inline int @@ -182,13 +203,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) u8 tef_tail, len, l; int err, i;
- err = mcp251xfd_tef_ring_update(priv); + err = mcp251xfd_get_tef_len(priv, &len); if (err) return err;
tef_tail = mcp251xfd_get_tef_tail(priv); - len = mcp251xfd_get_tef_len(priv); - l = mcp251xfd_get_tef_linear_len(priv); + l = mcp251xfd_get_tef_linear_len(priv, len); err = mcp251xfd_tef_obj_read(priv, hw_tef_obj, tef_tail, l); if (err) return err; @@ -223,6 +243,8 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) struct mcp251xfd_tx_ring *tx_ring = priv->tx; int offset;
+ ring->head += len; + /* Increment the TEF FIFO tail pointer 'len' times in * a single SPI message. * diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h index b35bfebd23f29..4628bf847bc9b 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h @@ -524,6 +524,7 @@ struct mcp251xfd_tef_ring {
/* u8 obj_num equals tx_ring->obj_num */ /* u8 obj_size equals sizeof(struct mcp251xfd_hw_tef_obj) */ + /* u8 obj_num_shift_to_u8 equals tx_ring->obj_num_shift_to_u8 */
union mcp251xfd_write_reg_buf irq_enable_buf; struct spi_transfer irq_enable_xfer; @@ -542,6 +543,7 @@ struct mcp251xfd_tx_ring { u8 nr; u8 fifo_nr; u8 obj_num; + u8 obj_num_shift_to_u8; u8 obj_size;
struct mcp251xfd_tx_obj obj[MCP251XFD_TX_OBJ_NUM_MAX]; @@ -861,17 +863,8 @@ static inline u8 mcp251xfd_get_tef_tail(const struct mcp251xfd_priv *priv) return priv->tef->tail & (priv->tx->obj_num - 1); }
-static inline u8 mcp251xfd_get_tef_len(const struct mcp251xfd_priv *priv) +static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv, u8 len) { - return priv->tef->head - priv->tef->tail; -} - -static inline u8 mcp251xfd_get_tef_linear_len(const struct mcp251xfd_priv *priv) -{ - u8 len; - - len = mcp251xfd_get_tef_len(priv); - return min_t(u8, len, priv->tx->obj_num - mcp251xfd_get_tef_tail(priv)); }
From: Marc Kleine-Budde mkl@pengutronix.de
[ Upstream commit 3a0a88fcbaf9e027ecca3fe8775be9700b4d6460 ]
This patch updates the workaround for a problem similar to erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip family (mcp2517fd and mcp251863) are probably also affected.
Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA register for an RX FIFO may be corrupted". However observation shows that this problem is not limited to RX FIFOs but also effects the TEF FIFO.
In the bad case, the driver reads a too large head index. As the FIFO is implemented as a ring buffer, this results in re-handling old CAN transmit complete events.
Every transmit complete event contains with a sequence number that equals to the sequence number of the corresponding TX request. This way old TX complete events can be detected.
If the original driver detects a non matching sequence number, it prints an info message and tries again later. As wrong sequence numbers can be explained by the erratum DS80000789E 6, demote the info message to debug level, streamline the code and update the comments.
Keep the behavior: If an old CAN TX complete event is detected, abort the iteration and mark the number of valid CAN TX complete events as processed in the chip by incrementing the FIFO's tail index.
Cc: Stefan Althöfer Stefan.Althoefer@janztec.com Cc: Thomas Kopp thomas.kopp@microchip.com Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 71 +++++++------------ 1 file changed, 27 insertions(+), 44 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index b41fad3b37c06..5b0c7890d4b44 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -60,56 +60,39 @@ static int mcp251xfd_check_tef_tail(const struct mcp251xfd_priv *priv) return 0; }
-static int -mcp251xfd_handle_tefif_recover(const struct mcp251xfd_priv *priv, const u32 seq) -{ - const struct mcp251xfd_tx_ring *tx_ring = priv->tx; - u32 tef_sta; - int err; - - err = regmap_read(priv->map_reg, MCP251XFD_REG_TEFSTA, &tef_sta); - if (err) - return err; - - if (tef_sta & MCP251XFD_REG_TEFSTA_TEFOVIF) { - netdev_err(priv->ndev, - "Transmit Event FIFO buffer overflow.\n"); - return -ENOBUFS; - } - - netdev_info(priv->ndev, - "Transmit Event FIFO buffer %s. (seq=0x%08x, tef_tail=0x%08x, tef_head=0x%08x, tx_head=0x%08x).\n", - tef_sta & MCP251XFD_REG_TEFSTA_TEFFIF ? - "full" : tef_sta & MCP251XFD_REG_TEFSTA_TEFNEIF ? - "not empty" : "empty", - seq, priv->tef->tail, priv->tef->head, tx_ring->head); - - /* The Sequence Number in the TEF doesn't match our tef_tail. */ - return -EAGAIN; -} - static int mcp251xfd_handle_tefif_one(struct mcp251xfd_priv *priv, const struct mcp251xfd_hw_tef_obj *hw_tef_obj, unsigned int *frame_len_ptr) { struct net_device_stats *stats = &priv->ndev->stats; + u32 seq, tef_tail_masked, tef_tail; struct sk_buff *skb; - u32 seq, seq_masked, tef_tail_masked, tef_tail;
- seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2518FD_MASK, + /* Use the MCP2517FD mask on the MCP2518FD, too. We only + * compare 7 bits, this is enough to detect old TEF objects. + */ + seq = FIELD_GET(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK, hw_tef_obj->flags); - - /* Use the MCP2517FD mask on the MCP2518FD, too. We only - * compare 7 bits, this should be enough to detect - * net-yet-completed, i.e. old TEF objects. - */ - seq_masked = seq & - field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK); tef_tail_masked = priv->tef->tail & field_mask(MCP251XFD_OBJ_FLAGS_SEQ_MCP2517FD_MASK); - if (seq_masked != tef_tail_masked) - return mcp251xfd_handle_tefif_recover(priv, seq); + + /* According to mcp2518fd erratum DS80000789E 6. the FIFOCI + * bits of a FIFOSTA register, here the TX FIFO tail index + * might be corrupted and we might process past the TEF FIFO's + * head into old CAN frames. + * + * Compare the sequence number of the currently processed CAN + * frame with the expected sequence number. Abort with + * -EBADMSG if an old CAN frame is detected. + */ + if (seq != tef_tail_masked) { + netdev_dbg(priv->ndev, "%s: chip=0x%02x ring=0x%02x\n", __func__, + seq, tef_tail_masked); + stats->tx_fifo_errors++; + + return -EBADMSG; + }
tef_tail = mcp251xfd_get_tef_tail(priv); skb = priv->can.echo_skb[tef_tail]; @@ -223,12 +206,12 @@ int mcp251xfd_handle_tefif(struct mcp251xfd_priv *priv) unsigned int frame_len = 0;
err = mcp251xfd_handle_tefif_one(priv, &hw_tef_obj[i], &frame_len); - /* -EAGAIN means the Sequence Number in the TEF - * doesn't match our tef_tail. This can happen if we - * read the TEF objects too early. Leave loop let the - * interrupt handler call us again. + /* -EBADMSG means we're affected by mcp2518fd erratum + * DS80000789E 6., i.e. the Sequence Number in the TEF + * doesn't match our tef_tail. Don't process any + * further and mark processed frames as good. */ - if (err == -EAGAIN) + if (err == -EBADMSG) goto out_netif_wake_queue; if (err) return err;
From: Bartosz Golaszewski bartosz.golaszewski@linaro.org
[ Upstream commit 3c466d6537b99f801b3f68af3d8124d4312437a0 ]
On sa8775p-ride-r3 the RX clocks from the AQR115C PHY are not available at the time of the DMA reset. We can however extract the RX clock from the internal SERDES block. Once the link is up, we can revert to the previous state.
The AQR115C PHY doesn't support in-band signalling so we can count on getting the link up notification and safely reuse existing callbacks which are already used by another HW quirk workaround which enables the functional clock to avoid a DMA reset due to timeout.
Only enable loopback on revision 3 of the board - check the phy_mode to make sure.
Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://patch.msgid.link/20240703181500.28491-3-brgl@bgdev.pl Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../stmicro/stmmac/dwmac-qcom-ethqos.c | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c index 466c4002f00d4..3a7f3a8b06718 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c @@ -21,6 +21,7 @@ #define RGMII_IO_MACRO_CONFIG2 0x1C #define RGMII_IO_MACRO_DEBUG1 0x20 #define EMAC_SYSTEM_LOW_POWER_DEBUG 0x28 +#define EMAC_WRAPPER_SGMII_PHY_CNTRL1 0xf4
/* RGMII_IO_MACRO_CONFIG fields */ #define RGMII_CONFIG_FUNC_CLK_EN BIT(30) @@ -79,6 +80,9 @@ #define ETHQOS_MAC_CTRL_SPEED_MODE BIT(14) #define ETHQOS_MAC_CTRL_PORT_SEL BIT(15)
+/* EMAC_WRAPPER_SGMII_PHY_CNTRL1 bits */ +#define SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN BIT(3) + #define SGMII_10M_RX_CLK_DVDR 0x31
struct ethqos_emac_por { @@ -95,6 +99,7 @@ struct ethqos_emac_driver_data { bool has_integrated_pcs; u32 dma_addr_width; struct dwmac4_addrs dwmac4_addrs; + bool needs_sgmii_loopback; };
struct qcom_ethqos { @@ -114,6 +119,7 @@ struct qcom_ethqos { unsigned int num_por; bool rgmii_config_loopback_en; bool has_emac_ge_3; + bool needs_sgmii_loopback; };
static int rgmii_readl(struct qcom_ethqos *ethqos, unsigned int offset) @@ -191,8 +197,22 @@ ethqos_update_link_clk(struct qcom_ethqos *ethqos, unsigned int speed) clk_set_rate(ethqos->link_clk, ethqos->link_clk_rate); }
+static void +qcom_ethqos_set_sgmii_loopback(struct qcom_ethqos *ethqos, bool enable) +{ + if (!ethqos->needs_sgmii_loopback || + ethqos->phy_mode != PHY_INTERFACE_MODE_2500BASEX) + return; + + rgmii_updatel(ethqos, + SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN, + enable ? SGMII_PHY_CNTRL1_SGMII_TX_TO_RX_LOOPBACK_EN : 0, + EMAC_WRAPPER_SGMII_PHY_CNTRL1); +} + static void ethqos_set_func_clk_en(struct qcom_ethqos *ethqos) { + qcom_ethqos_set_sgmii_loopback(ethqos, true); rgmii_updatel(ethqos, RGMII_CONFIG_FUNC_CLK_EN, RGMII_CONFIG_FUNC_CLK_EN, RGMII_IO_MACRO_CONFIG); } @@ -277,6 +297,7 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = { .has_emac_ge_3 = true, .link_clk_name = "phyaux", .has_integrated_pcs = true, + .needs_sgmii_loopback = true, .dma_addr_width = 36, .dwmac4_addrs = { .dma_chan = 0x00008100, @@ -674,6 +695,7 @@ static void ethqos_fix_mac_speed(void *priv, unsigned int speed, unsigned int mo { struct qcom_ethqos *ethqos = priv;
+ qcom_ethqos_set_sgmii_loopback(ethqos, false); ethqos->speed = speed; ethqos_update_link_clk(ethqos, speed); ethqos_configure(ethqos); @@ -809,6 +831,7 @@ static int qcom_ethqos_probe(struct platform_device *pdev) ethqos->num_por = data->num_por; ethqos->rgmii_config_loopback_en = data->rgmii_config_loopback_en; ethqos->has_emac_ge_3 = data->has_emac_ge_3; + ethqos->needs_sgmii_loopback = data->needs_sgmii_loopback;
ethqos->link_clk = devm_clk_get(dev, data->link_clk_name ?: "rgmii"); if (IS_ERR(ethqos->link_clk))
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 0970836c348b6bc2ea77ce4348a136d6febfd440 ]
The driver triggers a "Secondary Bus Reset" (SBR) by calling __pci_reset_function_locked() which asserts the SBR bit in the "Bridge Control Register" in the configuration space of the upstream bridge for 2ms. This is done without locking the configuration space of the upstream bridge port, allowing user space to access it concurrently.
Linux 6.11 will start warning about such unlocked resets [1][2]:
pcieport 0000:00:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x51c/0x6a0
Avoid the warning and the concurrent access by locking the configuration space of the upstream bridge prior to the reset and unlocking it afterwards.
[1] https://lore.kernel.org/all/171711746953.1628941.4692125082286867825.stgit@d... [2] https://lore.kernel.org/all/20240531213150.GA610983@bhelgaas/
Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Link: https://patch.msgid.link/9937b0afdb50f2f2825945393c94c093c04a5897.1720447210... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index c0ced4d315f3d..d92f640bae575 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1599,6 +1599,7 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci, { struct pci_dev *pdev = mlxsw_pci->pdev; char mrsr_pl[MLXSW_REG_MRSR_LEN]; + struct pci_dev *bridge; int err;
if (!pci_reset_sbr_supported) { @@ -1615,6 +1616,9 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci, sbr: device_lock_assert(&pdev->dev);
+ bridge = pci_upstream_bridge(pdev); + if (bridge) + pci_cfg_access_lock(bridge); pci_cfg_access_lock(pdev); pci_save_state(pdev);
@@ -1624,6 +1628,8 @@ static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci,
pci_restore_state(pdev); pci_cfg_access_unlock(pdev); + if (bridge) + pci_cfg_access_unlock(bridge);
return err; }
From: Qu Wenruo wqu@suse.com
[ Upstream commit 97713b1a2ced1e4a2a6c40045903797ebd44d7e0 ]
[BUG] For subpage + zoned case, the following workload can lead to rsv data leak at unmount time:
# mkfs.btrfs -f -s 4k $dev # mount $dev $mnt # fsstress -w -n 8 -d $mnt -s 1709539240 0/0: fiemap - no filename 0/1: copyrange read - no filename 0/2: write - no filename 0/3: rename - no source filename 0/4: creat f0 x:0 0 0 0/4: creat add id=0,parent=-1 0/5: writev f0[259 1 0 0 0 0] [778052,113,965] 0 0/6: ioctl(FIEMAP) f0[259 1 0 0 224 887097] [1294220,2291618343991484791,0x10000] -1 0/7: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 224 887097] return 25, fallback to stat() 0/7: dwrite f0[259 1 0 0 224 887097] [696320,102400] 0 # umount $mnt
The dmesg includes the following rsv leak detection warning (all call trace skipped):
------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8653 btrfs_destroy_inode+0x1e0/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8654 btrfs_destroy_inode+0x1a8/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ WARNING: CPU: 2 PID: 4528 at fs/btrfs/inode.c:8660 btrfs_destroy_inode+0x1a0/0x200 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): last unmount of filesystem 1b4abba9-de34-4f07-9e7f-157cf12a18d6 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): space_info DATA has 268218368 free, is not full BTRFS info (device sda): space_info total=268435456, used=204800, pinned=0, reserved=0, may_use=12288, readonly=0 zone_unusable=0 BTRFS info (device sda): global_block_rsv: size 0 reserved 0 BTRFS info (device sda): trans_block_rsv: size 0 reserved 0 BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0 ------------[ cut here ]------------ WARNING: CPU: 3 PID: 4528 at fs/btrfs/block-group.c:4434 btrfs_free_block_groups+0x338/0x500 [btrfs] ---[ end trace 0000000000000000 ]--- BTRFS info (device sda): space_info METADATA has 267796480 free, is not full BTRFS info (device sda): space_info total=268435456, used=131072, pinned=0, reserved=0, may_use=262144, readonly=0 zone_unusable=245760 BTRFS info (device sda): global_block_rsv: size 0 reserved 0 BTRFS info (device sda): trans_block_rsv: size 0 reserved 0 BTRFS info (device sda): chunk_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_block_rsv: size 0 reserved 0 BTRFS info (device sda): delayed_refs_rsv: size 0 reserved 0
Above $dev is a tcmu-runner emulated zoned HDD, which has a max zone append size of 64K, and the system has 64K page size.
[CAUSE] I have added several trace_printk() to show the events (header skipped):
btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688 btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288 btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536 btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The above lines show our buffered write has dirtied 3 pages of inode 259 of root 5:
704K 768K 832K 896K I |////I/////////////////I///////////| I 756K 868K
|///| is the dirtied range using subpage bitmaps. and 'I' is the page boundary.
Meanwhile all three pages (704K, 768K, 832K) have their PageDirty flag set.
btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400
Then direct IO write starts, since the range [680K, 780K) covers the beginning part of the above dirty range, we need to writeback the two pages at 704K and 768K.
cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536 extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
Now the above 2 lines show that we're writing back for dirty range [756K, 756K + 64K). We only writeback 64K because the zoned device has max zone append size as 64K.
extent_write_locked_range: r/i=5/259 clear dirty for page=786432
!!! The above line shows the root cause. !!!
We're calling clear_page_dirty_for_io() inside extent_write_locked_range(), for the page 768K. This is because extent_write_locked_range() can go beyond the current locked page, here we hit the page at 768K and clear its page dirt.
In fact this would lead to the desync between subpage dirty and page dirty flags. We have the page dirty flag cleared, but the subpage range [820K, 832K) is still dirty.
After the writeback of range [756K, 820K), the dirty flags look like this, as page 768K no longer has dirty flag set.
704K 768K 832K 896K I I | I/////////////| I 820K 868K
This means we will no longer writeback range [820K, 832K), thus the reserved data/metadata space would never be properly released.
extent_write_cache_pages: r/i=5/259 skip non-dirty folio=786432
Now even though we try to start writeback for page 768K, since the page is not dirty, we completely skip it at extent_write_cache_pages() time.
btrfs_direct_write: r/i=5/259 dio done filepos=696320 len=0
Now the direct IO finished.
cow_file_range: r/i=5/259 add ordered extent filepos=851968 len=36864 extent_write_locked_range: r/i=5/259 locked page=851968 start=851968 len=36864
Now we writeback the remaining dirty range, which is [832K, 868K). Causing the range [820K, 832K) never to be submitted, thus leaking the reserved space.
This bug only affects subpage and zoned case. For non-subpage and zoned case, we have exactly one sector for each page, thus no such partial dirty cases.
For subpage and non-zoned case, we never go into run_delalloc_cow(), and normally all the dirty subpage ranges would be properly submitted inside __extent_writepage_io().
[FIX] Just do not clear the page dirty at all inside extent_write_locked_range(). As __extent_writepage_io() would do a more accurate, subpage compatible clear for page and subpage dirty flags anyway.
Now the correct trace would look like this:
btrfs_dirty_pages: r/i=5/259 dirty start=774144 len=114688 btrfs_dirty_pages: r/i=5/259 dirty part of page=720896 off_in_page=53248 len_in_page=12288 btrfs_dirty_pages: r/i=5/259 dirty part of page=786432 off_in_page=0 len_in_page=65536 btrfs_dirty_pages: r/i=5/259 dirty part of page=851968 off_in_page=0 len_in_page=36864
The page dirty part is still the same 3 pages.
btrfs_direct_write: r/i=5/259 start dio filepos=696320 len=102400 cow_file_range: r/i=5/259 add ordered extent filepos=774144 len=65536 extent_write_locked_range: r/i=5/259 locked page=720896 start=774144 len=65536
And the writeback for the first 64K is still correct.
cow_file_range: r/i=5/259 add ordered extent filepos=839680 len=49152 extent_write_locked_range: r/i=5/259 locked page=786432 start=839680 len=49152
Now with the fix, we can properly writeback the range [820K, 832K), and properly release the reserved data/metadata space.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 958155cc43a81..0486b1f911248 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2246,10 +2246,8 @@ void extent_write_locked_range(struct inode *inode, struct page *locked_page,
page = find_get_page(mapping, cur >> PAGE_SHIFT); ASSERT(PageLocked(page)); - if (pages_dirty && page != locked_page) { + if (pages_dirty && page != locked_page) ASSERT(PageDirty(page)); - clear_page_dirty_for_io(page); - }
ret = __extent_writepage_io(BTRFS_I(inode), page, &bio_ctrl, i_size, &nr);
From: Filipe Manana fdmanana@suse.com
[ Upstream commit bb3868033a4cccff7be57e9145f2117cbdc91c11 ]
When freeing a tree block, at btrfs_free_tree_block(), if we fail to create a delayed reference we don't deal with the error and just do a BUG_ON(). The error most likely to happen is -ENOMEM, and we have a comment mentioning that only -ENOMEM can happen, but that is not true, because in case qgroups are enabled any error returned from btrfs_qgroup_trace_extent_post() (can be -EUCLEAN or anything returned from btrfs_search_slot() for example) can be propagated back to btrfs_free_tree_block().
So stop doing a BUG_ON() and return the error to the callers and make them abort the transaction to prevent leaking space. Syzbot was triggering this, likely due to memory allocation failure injection.
Reported-by: syzbot+a306f914b4d01b3958fe@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/000000000000fcba1e05e998263c@google.com/ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/ctree.c | 53 ++++++++++++++++++++++++++++++-------- fs/btrfs/extent-tree.c | 24 ++++++++++------- fs/btrfs/extent-tree.h | 8 +++--- fs/btrfs/free-space-tree.c | 10 ++++--- fs/btrfs/ioctl.c | 6 ++++- fs/btrfs/qgroup.c | 6 +++-- 6 files changed, 76 insertions(+), 31 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 1a49b92329908..ca372068226d5 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -620,10 +620,16 @@ int btrfs_force_cow_block(struct btrfs_trans_handle *trans, atomic_inc(&cow->refs); rcu_assign_pointer(root->node, cow);
- btrfs_free_tree_block(trans, btrfs_root_id(root), buf, - parent_start, last_ref); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), buf, + parent_start, last_ref); free_extent_buffer(buf); add_root_to_dirty_list(root); + if (ret < 0) { + btrfs_tree_unlock(cow); + free_extent_buffer(cow); + btrfs_abort_transaction(trans, ret); + return ret; + } } else { WARN_ON(trans->transid != btrfs_header_generation(parent)); ret = btrfs_tree_mod_log_insert_key(parent, parent_slot, @@ -648,8 +654,14 @@ int btrfs_force_cow_block(struct btrfs_trans_handle *trans, return ret; } } - btrfs_free_tree_block(trans, btrfs_root_id(root), buf, - parent_start, last_ref); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), buf, + parent_start, last_ref); + if (ret < 0) { + btrfs_tree_unlock(cow); + free_extent_buffer(cow); + btrfs_abort_transaction(trans, ret); + return ret; + } } if (unlock_orig) btrfs_tree_unlock(buf); @@ -983,9 +995,13 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, free_extent_buffer(mid);
root_sub_used_bytes(root); - btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1); /* once for the root ptr */ free_extent_buffer_stale(mid); + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + goto out; + } return 0; } if (btrfs_header_nritems(mid) > @@ -1053,10 +1069,14 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, goto out; } root_sub_used_bytes(root); - btrfs_free_tree_block(trans, btrfs_root_id(root), right, - 0, 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), + right, 0, 1); free_extent_buffer_stale(right); right = NULL; + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + goto out; + } } else { struct btrfs_disk_key right_key; btrfs_node_key(right, &right_key, 0); @@ -1111,9 +1131,13 @@ static noinline int balance_level(struct btrfs_trans_handle *trans, goto out; } root_sub_used_bytes(root); - btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1); free_extent_buffer_stale(mid); mid = NULL; + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + goto out; + } } else { /* update the parent key to reflect our changes */ struct btrfs_disk_key mid_key; @@ -2883,7 +2907,11 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans, old = root->node; ret = btrfs_tree_mod_log_insert_root(root->node, c, false); if (ret < 0) { - btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1); + int ret2; + + ret2 = btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1); + if (ret2 < 0) + btrfs_abort_transaction(trans, ret2); btrfs_tree_unlock(c); free_extent_buffer(c); return ret; @@ -4452,9 +4480,12 @@ static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans, root_sub_used_bytes(root);
atomic_inc(&leaf->refs); - btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1); free_extent_buffer_stale(leaf); - return 0; + if (ret < 0) + btrfs_abort_transaction(trans, ret); + + return ret; } /* * delete the item at the leaf level in path. If that empties diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 3774c191e36dc..41db538e4959e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3419,10 +3419,10 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans, return 0; }
-void btrfs_free_tree_block(struct btrfs_trans_handle *trans, - u64 root_id, - struct extent_buffer *buf, - u64 parent, int last_ref) +int btrfs_free_tree_block(struct btrfs_trans_handle *trans, + u64 root_id, + struct extent_buffer *buf, + u64 parent, int last_ref) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_block_group *bg; @@ -3449,11 +3449,12 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, btrfs_init_tree_ref(&generic_ref, btrfs_header_level(buf), 0, false); btrfs_ref_tree_mod(fs_info, &generic_ref); ret = btrfs_add_delayed_tree_ref(trans, &generic_ref, NULL); - BUG_ON(ret); /* -ENOMEM */ + if (ret < 0) + return ret; }
if (!last_ref) - return; + return 0;
if (btrfs_header_generation(buf) != trans->transid) goto out; @@ -3510,6 +3511,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, * matter anymore. */ clear_bit(EXTENT_BUFFER_CORRUPT, &buf->bflags); + return 0; }
/* Can return -ENOMEM */ @@ -5643,7 +5645,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans, struct walk_control *wc) { struct btrfs_fs_info *fs_info = root->fs_info; - int ret; + int ret = 0; int level = wc->level; struct extent_buffer *eb = path->nodes[level]; u64 parent = 0; @@ -5730,12 +5732,14 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans, goto owner_mismatch; }
- btrfs_free_tree_block(trans, btrfs_root_id(root), eb, parent, - wc->refs[level] == 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(root), eb, parent, + wc->refs[level] == 1); + if (ret < 0) + btrfs_abort_transaction(trans, ret); out: wc->refs[level] = 0; wc->flags[level] = 0; - return 0; + return ret;
owner_mismatch: btrfs_err_rl(fs_info, "unexpected tree owner, have %llu expect %llu", diff --git a/fs/btrfs/extent-tree.h b/fs/btrfs/extent-tree.h index af9f8800d5aca..2ad51130c037e 100644 --- a/fs/btrfs/extent-tree.h +++ b/fs/btrfs/extent-tree.h @@ -127,10 +127,10 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans, u64 empty_size, u64 reloc_src_root, enum btrfs_lock_nesting nest); -void btrfs_free_tree_block(struct btrfs_trans_handle *trans, - u64 root_id, - struct extent_buffer *buf, - u64 parent, int last_ref); +int btrfs_free_tree_block(struct btrfs_trans_handle *trans, + u64 root_id, + struct extent_buffer *buf, + u64 parent, int last_ref); int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 owner, u64 offset, u64 ram_bytes, diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c index 90f2938bd743d..7ba50e133921a 100644 --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -1300,10 +1300,14 @@ int btrfs_delete_free_space_tree(struct btrfs_fs_info *fs_info) btrfs_tree_lock(free_space_root->node); btrfs_clear_buffer_dirty(trans, free_space_root->node); btrfs_tree_unlock(free_space_root->node); - btrfs_free_tree_block(trans, btrfs_root_id(free_space_root), - free_space_root->node, 0, 1); - + ret = btrfs_free_tree_block(trans, btrfs_root_id(free_space_root), + free_space_root->node, 0, 1); btrfs_put_root(free_space_root); + if (ret < 0) { + btrfs_abort_transaction(trans, ret); + btrfs_end_transaction(trans); + return ret; + }
return btrfs_commit_transaction(trans); } diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index efd5d6e9589e0..c1b0556e40368 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -719,6 +719,8 @@ static noinline int create_subvol(struct mnt_idmap *idmap, ret = btrfs_insert_root(trans, fs_info->tree_root, &key, root_item); if (ret) { + int ret2; + /* * Since we don't abort the transaction in this case, free the * tree block so that we don't leak space and leave the @@ -729,7 +731,9 @@ static noinline int create_subvol(struct mnt_idmap *idmap, btrfs_tree_lock(leaf); btrfs_clear_buffer_dirty(trans, leaf); btrfs_tree_unlock(leaf); - btrfs_free_tree_block(trans, objectid, leaf, 0, 1); + ret2 = btrfs_free_tree_block(trans, objectid, leaf, 0, 1); + if (ret2 < 0) + btrfs_abort_transaction(trans, ret2); free_extent_buffer(leaf); goto out; } diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 39a15cca58ca9..29d6ca3b874ec 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1446,9 +1446,11 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info) btrfs_tree_lock(quota_root->node); btrfs_clear_buffer_dirty(trans, quota_root->node); btrfs_tree_unlock(quota_root->node); - btrfs_free_tree_block(trans, btrfs_root_id(quota_root), - quota_root->node, 0, 1); + ret = btrfs_free_tree_block(trans, btrfs_root_id(quota_root), + quota_root->node, 0, 1);
+ if (ret < 0) + btrfs_abort_transaction(trans, ret);
out: btrfs_put_root(quota_root);
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 5c83b3beaee06aa88d4015408ac2d8bb35380b06 ]
Instead of using an if-else statement when processing the extent item at btrfs_lookup_extent_info(), use a single if statement for the error case since it does a goto at the end and leave the success (expected) case following the if statement, reducing indentation and making the logic a bit easier to follow. Also make the if statement's condition as unlikely since it's not expected to ever happen, as it signals some corruption, making it clear and hint the compiler to generate more efficient code.
Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent-tree.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 41db538e4959e..a7190fd747e07 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -104,10 +104,7 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_head *head; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_path *path; - struct btrfs_extent_item *ei; - struct extent_buffer *leaf; struct btrfs_key key; - u32 item_size; u64 num_refs; u64 extent_flags; u64 owner = 0; @@ -157,16 +154,11 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans, }
if (ret == 0) { - leaf = path->nodes[0]; - item_size = btrfs_item_size(leaf, path->slots[0]); - if (item_size >= sizeof(*ei)) { - ei = btrfs_item_ptr(leaf, path->slots[0], - struct btrfs_extent_item); - num_refs = btrfs_extent_refs(leaf, ei); - extent_flags = btrfs_extent_flags(leaf, ei); - owner = btrfs_get_extent_owner_root(fs_info, leaf, - path->slots[0]); - } else { + struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_extent_item *ei; + const u32 item_size = btrfs_item_size(leaf, path->slots[0]); + + if (unlikely(item_size < sizeof(*ei))) { ret = -EUCLEAN; btrfs_err(fs_info, "unexpected extent item size, has %u expect >= %zu", @@ -179,6 +171,10 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans, goto out_free; }
+ ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item); + num_refs = btrfs_extent_refs(leaf, ei); + extent_flags = btrfs_extent_flags(leaf, ei); + owner = btrfs_get_extent_owner_root(fs_info, leaf, path->slots[0]); BUG_ON(num_refs == 0); } else { num_refs = 0;
From: Filipe Manana fdmanana@suse.com
[ Upstream commit ca84529a842f3a15a5f17beac6252aa11955923f ]
KCSAN complains about a data race when accessing the last_trans field of a root:
[ 199.553628] BUG: KCSAN: data-race in btrfs_record_root_in_trans [btrfs] / record_root_in_trans [btrfs]
[ 199.555186] read to 0x000000008801e308 of 8 bytes by task 2812 on cpu 1: [ 199.555210] btrfs_record_root_in_trans+0x9a/0x128 [btrfs] [ 199.555999] start_transaction+0x154/0xcd8 [btrfs] [ 199.556780] btrfs_join_transaction+0x44/0x60 [btrfs] [ 199.557559] btrfs_dirty_inode+0x9c/0x140 [btrfs] [ 199.558339] btrfs_update_time+0x8c/0xb0 [btrfs] [ 199.559123] touch_atime+0x16c/0x1e0 [ 199.559151] pipe_read+0x6a8/0x7d0 [ 199.559179] vfs_read+0x466/0x498 [ 199.559204] ksys_read+0x108/0x150 [ 199.559230] __s390x_sys_read+0x68/0x88 [ 199.559257] do_syscall+0x1c6/0x210 [ 199.559286] __do_syscall+0xc8/0xf0 [ 199.559318] system_call+0x70/0x98
[ 199.559431] write to 0x000000008801e308 of 8 bytes by task 2808 on cpu 0: [ 199.559464] record_root_in_trans+0x196/0x228 [btrfs] [ 199.560236] btrfs_record_root_in_trans+0xfe/0x128 [btrfs] [ 199.561097] start_transaction+0x154/0xcd8 [btrfs] [ 199.561927] btrfs_join_transaction+0x44/0x60 [btrfs] [ 199.562700] btrfs_dirty_inode+0x9c/0x140 [btrfs] [ 199.563493] btrfs_update_time+0x8c/0xb0 [btrfs] [ 199.564277] file_update_time+0xb8/0xf0 [ 199.564301] pipe_write+0x8ac/0xab8 [ 199.564326] vfs_write+0x33c/0x588 [ 199.564349] ksys_write+0x108/0x150 [ 199.564372] __s390x_sys_write+0x68/0x88 [ 199.564397] do_syscall+0x1c6/0x210 [ 199.564424] __do_syscall+0xc8/0xf0 [ 199.564452] system_call+0x70/0x98
This is because we update and read last_trans concurrently without any type of synchronization. This should be generally harmless and in the worst case it can make us do extra locking (btrfs_record_root_in_trans()) trigger some warnings at ctree.c or do extra work during relocation - this would probably only happen in case of load or store tearing.
So fix this by always reading and updating the field using READ_ONCE() and WRITE_ONCE(), this silences KCSAN and prevents load and store tearing.
Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/ctree.c | 4 ++-- fs/btrfs/ctree.h | 10 ++++++++++ fs/btrfs/defrag.c | 2 +- fs/btrfs/disk-io.c | 4 ++-- fs/btrfs/relocation.c | 8 ++++---- fs/btrfs/transaction.c | 8 ++++---- 6 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index ca372068226d5..8a791b648ac53 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -321,7 +321,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && trans->transid != fs_info->running_transaction->transid); WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && - trans->transid != root->last_trans); + trans->transid != btrfs_get_root_last_trans(root));
level = btrfs_header_level(buf); if (level == 0) @@ -551,7 +551,7 @@ int btrfs_force_cow_block(struct btrfs_trans_handle *trans, WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && trans->transid != fs_info->running_transaction->transid); WARN_ON(test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && - trans->transid != root->last_trans); + trans->transid != btrfs_get_root_last_trans(root));
level = btrfs_header_level(buf);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c03c58246033b..b2e4b30b8fae9 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -354,6 +354,16 @@ static inline void btrfs_set_root_last_log_commit(struct btrfs_root *root, int c WRITE_ONCE(root->last_log_commit, commit_id); }
+static inline u64 btrfs_get_root_last_trans(const struct btrfs_root *root) +{ + return READ_ONCE(root->last_trans); +} + +static inline void btrfs_set_root_last_trans(struct btrfs_root *root, u64 transid) +{ + WRITE_ONCE(root->last_trans, transid); +} + /* * Structure that conveys information about an extent that is going to replace * all the extents in a file range. diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index 407ccec3e57ed..f664678c71d15 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -139,7 +139,7 @@ int btrfs_add_inode_defrag(struct btrfs_trans_handle *trans, if (trans) transid = trans->transid; else - transid = inode->root->last_trans; + transid = btrfs_get_root_last_trans(root);
defrag = kmem_cache_zalloc(btrfs_inode_defrag_cachep, GFP_NOFS); if (!defrag) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cabb558dbdaa8..3791813dc7b62 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -658,7 +658,7 @@ static void __setup_root(struct btrfs_root *root, struct btrfs_fs_info *fs_info, root->state = 0; RB_CLEAR_NODE(&root->rb_node);
- root->last_trans = 0; + btrfs_set_root_last_trans(root, 0); root->free_objectid = 0; root->nr_delalloc_inodes = 0; root->nr_ordered_extents = 0; @@ -1010,7 +1010,7 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans, return ret; }
- log_root->last_trans = trans->transid; + btrfs_set_root_last_trans(log_root, trans->transid); log_root->root_key.offset = btrfs_root_id(root);
inode_item = &log_root->root_item.inode; diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 8b24bb5a0aa18..f2935252b981a 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -817,7 +817,7 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans, goto abort; } set_bit(BTRFS_ROOT_SHAREABLE, &reloc_root->state); - reloc_root->last_trans = trans->transid; + btrfs_set_root_last_trans(reloc_root, trans->transid); return reloc_root; fail: kfree(root_item); @@ -864,7 +864,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, */ if (root->reloc_root) { reloc_root = root->reloc_root; - reloc_root->last_trans = trans->transid; + btrfs_set_root_last_trans(reloc_root, trans->transid); return 0; }
@@ -1739,7 +1739,7 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc, * btrfs_update_reloc_root() and update our root item * appropriately. */ - reloc_root->last_trans = trans->transid; + btrfs_set_root_last_trans(reloc_root, trans->transid); trans->block_rsv = rc->block_rsv;
replaced = 0; @@ -2082,7 +2082,7 @@ static int record_reloc_root_in_trans(struct btrfs_trans_handle *trans, struct btrfs_root *root; int ret;
- if (reloc_root->last_trans == trans->transid) + if (btrfs_get_root_last_trans(reloc_root) == trans->transid) return 0;
root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, false); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 3388c836b9a56..76117bb2c726c 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -405,7 +405,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, int ret = 0;
if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && - root->last_trans < trans->transid) || force) { + btrfs_get_root_last_trans(root) < trans->transid) || force) { WARN_ON(!force && root->commit_root != root->node);
/* @@ -421,7 +421,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, smp_wmb();
spin_lock(&fs_info->fs_roots_radix_lock); - if (root->last_trans == trans->transid && !force) { + if (btrfs_get_root_last_trans(root) == trans->transid && !force) { spin_unlock(&fs_info->fs_roots_radix_lock); return 0; } @@ -429,7 +429,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans, (unsigned long)btrfs_root_id(root), BTRFS_ROOT_TRANS_TAG); spin_unlock(&fs_info->fs_roots_radix_lock); - root->last_trans = trans->transid; + btrfs_set_root_last_trans(root, trans->transid);
/* this is pretty tricky. We don't want to * take the relocation lock in btrfs_record_root_in_trans @@ -491,7 +491,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans, * and barriers */ smp_rmb(); - if (root->last_trans == trans->transid && + if (btrfs_get_root_last_trans(root) == trans->transid && !test_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state)) return 0;
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 320d8dc612660da84c3b70a28658bb38069e5a9a ]
If we failed to link a free space entry because there's already a conflicting entry for the same offset, we free the free space entry but we don't free the associated bitmap that we had just allocated before. Fix that by freeing the bitmap before freeing the entry.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/free-space-cache.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index dabc3d0793cf4..5007dabc45d3d 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -858,6 +858,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode, spin_unlock(&ctl->tree_lock); btrfs_err(fs_info, "Duplicate entries in free space cache, dumping"); + kmem_cache_free(btrfs_free_space_bitmap_cachep, e->bitmap); kmem_cache_free(btrfs_free_space_cachep, e); goto free_cache; }
From: Luke Wang ziniu.wang_1@nxp.com
[ Upstream commit 0d0df1e750bac0fdaa77940e711c1625cff08d33 ]
When unload the btnxpuart driver, its associated timer will be deleted. If the timer happens to be modified at this moment, it leads to the kernel call this timer even after the driver unloaded, resulting in kernel panic. Use timer_shutdown_sync() instead of del_timer_sync() to prevent rearming.
panic log: Internal error: Oops: 0000000086000007 [#1] PREEMPT SMP Modules linked in: algif_hash algif_skcipher af_alg moal(O) mlan(O) crct10dif_ce polyval_ce polyval_generic snd_soc_imx_card snd_soc_fsl_asoc_card snd_soc_imx_audmux mxc_jpeg_encdec v4l2_jpeg snd_soc_wm8962 snd_soc_fsl_micfil snd_soc_fsl_sai flexcan snd_soc_fsl_utils ap130x rpmsg_ctrl imx_pcm_dma can_dev rpmsg_char pwm_fan fuse [last unloaded: btnxpuart] CPU: 5 PID: 723 Comm: memtester Tainted: G O 6.6.23-lts-next-06207-g4aef2658ac28 #1 Hardware name: NXP i.MX95 19X19 board (DT) pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0xffff80007a2cf464 lr : call_timer_fn.isra.0+0x24/0x80 ... Call trace: 0xffff80007a2cf464 __run_timers+0x234/0x280 run_timer_softirq+0x20/0x40 __do_softirq+0x100/0x26c ____do_softirq+0x10/0x1c call_on_irq_stack+0x24/0x4c do_softirq_own_stack+0x1c/0x2c irq_exit_rcu+0xc0/0xdc el0_interrupt+0x54/0xd8 __el0_irq_handler_common+0x18/0x24 el0t_64_irq_handler+0x10/0x1c el0t_64_irq+0x190/0x194 Code: ???????? ???????? ???????? ???????? (????????) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0,c0000000,40028143,1000721b Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
Signed-off-by: Luke Wang ziniu.wang_1@nxp.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btnxpuart.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/bluetooth/btnxpuart.c b/drivers/bluetooth/btnxpuart.c index 9bfa9a6ad56c8..8fa06eb051d1e 100644 --- a/drivers/bluetooth/btnxpuart.c +++ b/drivers/bluetooth/btnxpuart.c @@ -328,7 +328,7 @@ static void ps_cancel_timer(struct btnxpuart_dev *nxpdev) struct ps_data *psdata = &nxpdev->psdata;
flush_work(&psdata->work); - del_timer_sync(&psdata->ps_timer); + timer_shutdown_sync(&psdata->ps_timer); }
static void ps_control(struct hci_dev *hdev, u8 ps_state)
From: Hilda Wu hildawu@realtek.com
[ Upstream commit 295ef07a9dae6182ad4b689aa8c6a7dbba21474c ]
Add the support ID 0489:e125 to usb_device_id table for Realtek RTL8852B chip.
The device info from /sys/kernel/debug/usb/devices as below.
T: Bus=01 Lev=01 Prnt=01 Port=07 Cnt=03 Dev#= 5 Spd=12 MxCh= 0 D: Ver= 1.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0489 ProdID=e125 Rev= 0.00 S: Manufacturer=Realtek S: Product=Bluetooth Radio S: SerialNumber=00e04c000001 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
Signed-off-by: Hilda Wu hildawu@realtek.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btusb.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index e384ef6ff050d..aec6da3e37ef5 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -555,6 +555,8 @@ static const struct usb_device_id quirks_table[] = { BTUSB_WIDEBAND_SPEECH }, { USB_DEVICE(0x13d3, 0x3572), .driver_info = BTUSB_REALTEK | BTUSB_WIDEBAND_SPEECH }, + { USB_DEVICE(0x0489, 0xe125), .driver_info = BTUSB_REALTEK | + BTUSB_WIDEBAND_SPEECH },
/* Realtek 8852BT/8852BE-VT Bluetooth devices */ { USB_DEVICE(0x0bda, 0x8520), .driver_info = BTUSB_REALTEK |
linux-stable-mirror@lists.linaro.org