Hi All,
I dropped Harry Yoo's Reviewed-by from this version.
Chages since v8:
- fixed page_owner=on issue preventing bulk allocations on x86
Chages since v7:
- drop "unnecessary free pages" optimization
- fix error path page leak
Chages since v6:
- do not unnecessary free pages across iterations
Chages since v5:
- full error message included into commit description
Chages since v4:
- unused pages leak is avoided
Chages since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Chages since v2:
- page allocation moved out of the atomic context
Chages since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 92 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 78 insertions(+), 14 deletions(-)
--
2.45.2
Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state()
helper") introduced an optimization to xhci_reset() during xhci removal,
allowing it to bail out early without waiting for the reset to complete.
This behavior can cause issues on SNPS DWC3 USB controller with dual-role
capability. When the DWC3 controller exits host mode and removes xhci
while a reset is still in progress, and then tries to configure its
hardware for device mode, the ongoing reset leads to register access
issues; specifically, all register reads returns 0. These issues extend
beyond the xhci register space (which is expected during a reset) and
affect the entire DWC3 IP block, causing the DWC3 device mode to
malfunction.
To address this, introduce the `XHCI_FULL_RESET_ON_REMOVE` quirk. When this
quirk is set, xhci_reset() always completes its reset handshake, ensuring
the controller is in a fully reset state before proceeding.
Cc: stable(a)vger.kernel.org
Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Signed-off-by: Roy Luo <royluo(a)google.com>
---
drivers/usb/host/xhci-plat.c | 3 +++
drivers/usb/host/xhci.c | 8 +++++++-
drivers/usb/host/xhci.h | 1 +
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 3155e3a842da..19c5c26a8e63 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -265,6 +265,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
if (device_property_read_bool(tmpdev, "xhci-skip-phy-init-quirk"))
xhci->quirks |= XHCI_SKIP_PHY_INIT;
+ if (device_property_read_bool(tmpdev, "xhci-full-reset-on-remove-quirk"))
+ xhci->quirks |= XHCI_FULL_RESET_ON_REMOVE;
+
device_property_read_u32(tmpdev, "imod-interval-ns",
&xhci->imod_interval);
}
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..4f091d618c01 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -198,6 +198,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us)
u32 command;
u32 state;
int ret;
+ unsigned int exit_state;
state = readl(&xhci->op_regs->status);
@@ -226,8 +227,13 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us)
if (xhci->quirks & XHCI_INTEL_HOST)
udelay(1000);
+ if (xhci->quirks & XHCI_FULL_RESET_ON_REMOVE)
+ exit_state = 0;
+ else
+ exit_state = XHCI_STATE_REMOVING;
+
ret = xhci_handshake_check_state(xhci, &xhci->op_regs->command,
- CMD_RESET, 0, timeout_us, XHCI_STATE_REMOVING);
+ CMD_RESET, 0, timeout_us, exit_state);
if (ret)
return ret;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 242ab9fbc8ae..ac65af788298 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1637,6 +1637,7 @@ struct xhci_hcd {
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
#define XHCI_ETRON_HOST BIT_ULL(49)
+#define XHCI_FULL_RESET_ON_REMOVE BIT_ULL(50)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.49.0.1112.g889b7c5bd8-goog
If s390_wiggle_split_folio() returns 0 because splitting a large folio
succeeded, we will return 0 from make_hva_secure() even though a retry
is required. Return -EAGAIN in that case.
Otherwise, we'll return 0 from gmap_make_secure(), and consequently from
unpack_one(). In kvm_s390_pv_unpack(), we assume that unpacking
succeeded and skip unpacking this page. Later on, we run into issues
and fail booting the VM.
So far, this issue was only observed with follow-up patches where we
split large pagecache XFS folios. Maybe it can also be triggered with
shmem?
We'll cleanup s390_wiggle_split_folio() a bit next, to also return 0
if no split was required.
Fixes: d8dfda5af0be ("KVM: s390: pv: fix race when making a page secure")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
arch/s390/kernel/uv.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index 9a5d5be8acf41..2cc3b599c7fe3 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -393,8 +393,11 @@ int make_hva_secure(struct mm_struct *mm, unsigned long hva, struct uv_cb_header
folio_walk_end(&fw, vma);
mmap_read_unlock(mm);
- if (rc == -E2BIG || rc == -EBUSY)
+ if (rc == -E2BIG || rc == -EBUSY) {
rc = s390_wiggle_split_folio(mm, folio, rc == -E2BIG);
+ if (!rc)
+ rc = -EAGAIN;
+ }
folio_put(folio);
return rc;
--
2.49.0
Generally PASID support requires ACS settings that usually create
single device groups, but there are some niche cases where we can get
multi-device groups and still have working PASID support. The primary
issue is that PCI switches are not required to treat PASID tagged TLPs
specially so appropriate ACS settings are required to route all TLPs to
the host bridge if PASID is going to work properly.
pci_enable_pasid() does check that each device that will use PASID has
the proper ACS settings to achieve this routing.
However, no-PASID devices can be combined with PASID capable devices
within the same topology using non-uniform ACS settings. In this case
the no-PASID devices may not have strict route to host ACS flags and
end up being grouped with the PASID devices.
This configuration fails to allow use of the PASID within the iommu
core code which wrongly checks if the no-PASID device supports PASID.
Fix this by ignoring no-PASID devices during the PASID validation. They
will never issue a PASID TLP anyhow so they can be ignored.
Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tushar Dave <tdave(a)nvidia.com>
---
changes in v3:
- addressed review comment from Vasant.
drivers/iommu/iommu.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60aed01e54f2..636fc68a8ec0 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3329,10 +3329,12 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
int ret;
for_each_group_device(group, device) {
- ret = domain->ops->set_dev_pasid(domain, device->dev,
- pasid, NULL);
- if (ret)
- goto err_revert;
+ if (device->dev->iommu->max_pasids > 0) {
+ ret = domain->ops->set_dev_pasid(domain, device->dev,
+ pasid, NULL);
+ if (ret)
+ goto err_revert;
+ }
}
return 0;
@@ -3342,7 +3344,8 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain,
for_each_group_device(group, device) {
if (device == last_gdev)
break;
- iommu_remove_dev_pasid(device->dev, pasid, domain);
+ if (device->dev->iommu->max_pasids > 0)
+ iommu_remove_dev_pasid(device->dev, pasid, domain);
}
return ret;
}
@@ -3353,8 +3356,10 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
{
struct group_device *device;
- for_each_group_device(group, device)
- iommu_remove_dev_pasid(device->dev, pasid, domain);
+ for_each_group_device(group, device) {
+ if (device->dev->iommu->max_pasids > 0)
+ iommu_remove_dev_pasid(device->dev, pasid, domain);
+ }
}
/*
@@ -3391,7 +3396,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
mutex_lock(&group->mutex);
for_each_group_device(group, device) {
- if (pasid >= device->dev->iommu->max_pasids) {
+ /*
+ * Skip PASID validation for devices without PASID support
+ * (max_pasids = 0). These devices cannot issue transactions
+ * with PASID, so they don't affect group's PASID usage.
+ */
+ if ((device->dev->iommu->max_pasids > 0) &&
+ (pasid >= device->dev->iommu->max_pasids)) {
ret = -EINVAL;
goto out_unlock;
}
--
2.34.1
Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces
the device and then reactivates it for the ESXi to know about the new mtu.
So, technically the OS stack can start using the new mtu before ESXi knows
about the new mtu.
This can lead to issues for TSO packets which use mss as per the new mtu
configured. This patch fixes this issue by moving the mtu write after
device quiesce.
Cc: stable(a)vger.kernel.org
Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Signed-off-by: Ronak Doshi <ronak.doshi(a)broadcom.com>
Acked-by: Guolin Yang <guolin.yang(a)broadcom.com>
Changes v1-> v2:
Moved MTU write after destroy of rx rings
---
drivers/net/vmxnet3/vmxnet3_drv.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 3df6aabc7e33..c676979c7ab9 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -3607,8 +3607,6 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
struct vmxnet3_adapter *adapter = netdev_priv(netdev);
int err = 0;
- WRITE_ONCE(netdev->mtu, new_mtu);
-
/*
* Reset_work may be in the middle of resetting the device, wait for its
* completion.
@@ -3622,6 +3620,7 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
/* we need to re-create the rx queue based on the new mtu */
vmxnet3_rq_destroy_all(adapter);
+ WRITE_ONCE(netdev->mtu, new_mtu);
vmxnet3_adjust_rx_ring_size(adapter);
err = vmxnet3_rq_create_all(adapter);
if (err) {
@@ -3638,6 +3637,8 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
"Closing it\n", err);
goto out;
}
+ } else {
+ WRITE_ONCE(netdev->mtu, new_mtu);
}
out:
--
2.45.2
Add the missing memory barrier to make sure that the REO dest ring
descriptor is read after the head pointer to avoid using stale data on
weakly ordered architectures like aarch64.
This may fix the ring-buffer corruption worked around by commit
f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination
ring") by silently discarding data, and may possibly also address user
reported errors like:
ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set
Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Cc: stable(a)vger.kernel.org # 5.6
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
As I reported here:
https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@hovoldconsulting.com/
the ath11k and ath12k appear to be missing a number of memory barriers
that are required on weakly ordered architectures like aarch64 to avoid
memory corruption issues.
Here's a fix for one more such case which people already seem to be
hitting.
Note that I've seen one "msdu_done" bit not set warning also with this
patch so whether it helps with that at all remains to be seen. I'm CCing
Jens and Steev that see these warnings frequently and that may be able
to help out with testing.
Johan
drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index 029ecf51c9ef..0a57b337e4c6 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
struct ath11k *ar;
struct hal_reo_dest_ring *desc;
enum hal_reo_dest_ring_push_reason push_reason;
- u32 cookie;
+ u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0;
int i;
for (i = 0; i < MAX_RADIOS; i++)
@@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
try_again:
ath11k_hal_srng_access_begin(ab, srng);
+ /* Make sure descriptor is read after the head pointer. */
+ dma_rmb();
+
while (likely(desc =
(struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab,
srng))) {
cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE,
- desc->buf_addr_info.info1);
+ READ_ONCE(desc->buf_addr_info.info1));
buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID,
cookie);
mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie);
@@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
num_buffs_reaped[mac_id]++;
+ info0 = READ_ONCE(desc->info0);
push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON,
- desc->info0);
+ info0);
if (unlikely(push_reason !=
HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) {
dev_kfree_skb_any(msdu);
@@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
continue;
}
- rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 &
+ rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0);
+ rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0);
+
+ rxcb->is_first_msdu = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU);
- rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 &
+ rxcb->is_last_msdu = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU);
- rxcb->is_continuation = !!(desc->rx_msdu_info.info0 &
+ rxcb->is_continuation = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_MSDU_CONTINUATION);
rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID,
- desc->rx_mpdu_info.meta_data);
+ READ_ONCE(desc->rx_mpdu_info.meta_data));
rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM,
- desc->rx_mpdu_info.info0);
+ rx_mpdu_info0);
rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM,
- desc->info0);
+ info0);
rxcb->mac_id = mac_id;
__skb_queue_tail(&msdu_list[mac_id], msdu);
--
2.48.1
Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
breaks and the log fills up with errors like:
ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
which based on a quick look at the driver seemed to indicate some kind
of ring-buffer corruption.
Miaoqing Pan tracked it down to the host seeing the updated destination
ring head pointer before the updated descriptor, and the error handling
for that in turn leaves the ring buffer in an inconsistent state.
Add the missing memory barrier to make sure that the descriptor is read
after the head pointer to address the root cause of the corruption while
fixing up the error handling in case there are ever any (ordering) bugs
on the device side.
Note that the READ_ONCE() are only needed to avoid compiler mischief in
case the ring-buffer helpers are ever inlined.
Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623
Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com
Cc: Miaoqing Pan <quic_miaoqing(a)quicinc.com>
Cc: stable(a)vger.kernel.org # 5.6
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/net/wireless/ath/ath11k/ce.c | 11 +++++------
drivers/net/wireless/ath/ath11k/hal.c | 4 ++--
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c
index e66e86bdec20..9d8efec46508 100644
--- a/drivers/net/wireless/ath/ath11k/ce.c
+++ b/drivers/net/wireless/ath/ath11k/ce.c
@@ -393,11 +393,10 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe,
goto err;
}
+ /* Make sure descriptor is read after the head pointer. */
+ dma_rmb();
+
*nbytes = ath11k_hal_ce_dst_status_get_length(desc);
- if (*nbytes == 0) {
- ret = -EIO;
- goto err;
- }
*skb = pipe->dest_ring->skb[sw_index];
pipe->dest_ring->skb[sw_index] = NULL;
@@ -430,8 +429,8 @@ static void ath11k_ce_recv_process_cb(struct ath11k_ce_pipe *pipe)
dma_unmap_single(ab->dev, ATH11K_SKB_RXCB(skb)->paddr,
max_nbytes, DMA_FROM_DEVICE);
- if (unlikely(max_nbytes < nbytes)) {
- ath11k_warn(ab, "rxed more than expected (nbytes %d, max %d)",
+ if (unlikely(max_nbytes < nbytes || nbytes == 0)) {
+ ath11k_warn(ab, "unexpected rx length (nbytes %d, max %d)",
nbytes, max_nbytes);
dev_kfree_skb_any(skb);
continue;
diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index 61f4b6dd5380..8cb1505a5a0c 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -599,7 +599,7 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf)
struct hal_ce_srng_dst_status_desc *desc = buf;
u32 len;
- len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags);
+ len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, READ_ONCE(desc->flags));
desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN;
return len;
@@ -829,7 +829,7 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng)
srng->u.src_ring.cached_tp =
*(volatile u32 *)srng->u.src_ring.tp_addr;
} else {
- srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
+ srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
/* Try to prefetch the next descriptor in the ring */
if (srng->flags & HAL_SRNG_FLAGS_CACHED)
--
2.48.1
Embryo socket is not queued in gc_candidates, so we can't drop
a reference held by its oob_skb.
Let's say we create listener and embryo sockets, send the
listener's fd to the embryo as OOB data, and close() them
without recv()ing the OOB data.
There is a self-reference cycle like
listener -> embryo.oob_skb -> listener
, so this must be cleaned up by GC. Otherwise, the listener's
refcnt is not released and sockets are leaked:
# unshare -n
# cat /proc/net/protocols | grep UNIX-STREAM
UNIX-STREAM 1024 0 -1 NI 0 yes kernel ...
# python3
>>> from array import array
>>> from socket import *
>>>
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind('\0test\0')
>>> s.listen()
>>>
>>> c = socket(AF_UNIX, SOCK_STREAM)
>>> c.connect(s.getsockname())
>>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB)
1
>>> quit()
# cat /proc/net/protocols | grep UNIX-STREAM
UNIX-STREAM 1024 3 -1 NI 0 yes kernel ...
^^^
3 sockets still in use after FDs are close()d
Let's drop the embryo socket's oob_skb ref in scan_inflight().
This also fixes a racy access to oob_skb that commit 9841991a446c
("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue
lock.") fixed for the new Tarjan's algo-based GC.
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: Lei Lu <llfamsec(a)gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
---
This has no upstream commit because I replaced the entire GC in
6.10 and the new GC does not have this bug, and this fix is only
applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6.
---
---
net/unix/garbage.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2a758531e102..b3fbdf129944 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -102,13 +102,14 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
/* Process the descriptors of this socket */
int nfd = UNIXCB(skb).fp->count;
struct file **fp = UNIXCB(skb).fp->fp;
+ struct unix_sock *u;
while (nfd--) {
/* Get the socket the fd matches if it indeed does so */
struct sock *sk = unix_get_socket(*fp++);
if (sk) {
- struct unix_sock *u = unix_sk(sk);
+ u = unix_sk(sk);
/* Ignore non-candidates, they could
* have been added to the queues after
@@ -122,6 +123,13 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
}
}
if (hit && hitlist != NULL) {
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+ u = unix_sk(x);
+ if (u->oob_skb) {
+ WARN_ON_ONCE(skb_unref(u->oob_skb));
+ u->oob_skb = NULL;
+ }
+#endif
__skb_unlink(skb, &x->sk_receive_queue);
__skb_queue_tail(hitlist, skb);
}
@@ -299,17 +307,9 @@ void unix_gc(void)
* which are creating the cycle(s).
*/
skb_queue_head_init(&hitlist);
- list_for_each_entry(u, &gc_candidates, link) {
+ list_for_each_entry(u, &gc_candidates, link)
scan_children(&u->sk, inc_inflight, &hitlist);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
- if (u->oob_skb) {
- kfree_skb(u->oob_skb);
- u->oob_skb = NULL;
- }
-#endif
- }
-
/* not_cycle_list contains those sockets which do not make up a
* cycle. Restore these to the inflight list.
*/
--
2.39.5 (Apple Git-154)
From: Amir Goldstein <amir73il(a)gmail.com>
commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream.
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
fs/notify/fdinfo.c | 4 +---
fs/overlayfs/copy_up.c | 5 ++---
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 55081ae3a6ec..dd5bc6ffae85 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f.handle.handle_bytes >> 2;
ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f.handle.handle_type = ret;
f.handle.handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 65ac504595ba..8557180bd7e1 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -302,9 +302,8 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
--
2.34.1