- Linux-stable-mirror - lists.linaro.org

[PATCH v3 rc] iommu: Skip PASID validation for devices without PASID capability

by Tushar Dave

Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly. pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing. However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices. This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID. Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored. Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()") Cc: stable(a)vger.kernel.org Signed-off-by: Tushar Dave <tdave(a)nvidia.com> --- changes in v3: - addressed review comment from Vasant. drivers/iommu/iommu.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60aed01e54f2..636fc68a8ec0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3329,10 +3329,12 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain, int ret; for_each_group_device(group, device) { - ret = domain->ops->set_dev_pasid(domain, device->dev, - pasid, NULL); - if (ret) - goto err_revert; + if (device->dev->iommu->max_pasids > 0) { + ret = domain->ops->set_dev_pasid(domain, device->dev, + pasid, NULL); + if (ret) + goto err_revert; + } } return 0; @@ -3342,7 +3344,8 @@ static int __iommu_set_group_pasid(struct iommu_domain *domain, for_each_group_device(group, device) { if (device == last_gdev) break; - iommu_remove_dev_pasid(device->dev, pasid, domain); + if (device->dev->iommu->max_pasids > 0) + iommu_remove_dev_pasid(device->dev, pasid, domain); } return ret; } @@ -3353,8 +3356,10 @@ static void __iommu_remove_group_pasid(struct iommu_group *group, { struct group_device *device; - for_each_group_device(group, device) - iommu_remove_dev_pasid(device->dev, pasid, domain); + for_each_group_device(group, device) { + if (device->dev->iommu->max_pasids > 0) + iommu_remove_dev_pasid(device->dev, pasid, domain); + } } /* @@ -3391,7 +3396,13 @@ int iommu_attach_device_pasid(struct iommu_domain *domain, mutex_lock(&group->mutex); for_each_group_device(group, device) { - if (pasid >= device->dev->iommu->max_pasids) { + /* + * Skip PASID validation for devices without PASID support + * (max_pasids = 0). These devices cannot issue transactions + * with PASID, so they don't affect group's PASID usage. + */ + if ((device->dev->iommu->max_pasids > 0) && + (pasid >= device->dev->iommu->max_pasids)) { ret = -EINVAL; goto out_unlock; } -- 2.34.1

2 months

6
8
0 0

[PATCH net v2] vmxnet3: update MTU after device quiesce

by Ronak Doshi

Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces the device and then reactivates it for the ESXi to know about the new mtu. So, technically the OS stack can start using the new mtu before ESXi knows about the new mtu. This can lead to issues for TSO packets which use mss as per the new mtu configured. This patch fixes this issue by moving the mtu write after device quiesce. Cc: stable(a)vger.kernel.org Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3") Signed-off-by: Ronak Doshi <ronak.doshi(a)broadcom.com> Acked-by: Guolin Yang <guolin.yang(a)broadcom.com> Changes v1-> v2: Moved MTU write after destroy of rx rings --- drivers/net/vmxnet3/vmxnet3_drv.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 3df6aabc7e33..c676979c7ab9 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -3607,8 +3607,6 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu) struct vmxnet3_adapter *adapter = netdev_priv(netdev); int err = 0; - WRITE_ONCE(netdev->mtu, new_mtu); - /* * Reset_work may be in the middle of resetting the device, wait for its * completion. @@ -3622,6 +3620,7 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu) /* we need to re-create the rx queue based on the new mtu */ vmxnet3_rq_destroy_all(adapter); + WRITE_ONCE(netdev->mtu, new_mtu); vmxnet3_adjust_rx_ring_size(adapter); err = vmxnet3_rq_create_all(adapter); if (err) { @@ -3638,6 +3637,8 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu) "Closing it\n", err); goto out; } + } else { + WRITE_ONCE(netdev->mtu, new_mtu); } out: -- 2.45.2

2 months

2
1
0 0

[PATCH net v3] net: dsa: microchip: linearize skb for tail-tagging switches

by Jakob Unterwurzacher

The pointer arithmentic for accessing the tail tag only works for linear skbs. For nonlinear skbs, it reads uninitialized memory inside the skb headroom, essentially randomizing the tag. I have observed it gets set to 6 most of the time. Example where ksz9477_rcv thinks that the packet from port 1 comes from port 6 (which does not exist for the ksz9896 that's in use), dropping the packet. Debug prints added by me (not included in this patch): [ 256.645337] ksz9477_rcv:323 tag0=6 [ 256.645349] skb len=47 headroom=78 headlen=0 tailroom=0 mac=(64,14) mac_len=14 net=(78,0) trans=78 shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0)) csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x00f8 pkttype=1 iif=3 priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) [ 256.645377] dev name=end1 feat=0x0002e10200114bb3 [ 256.645386] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645395] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645403] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645411] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 256.645420] skb headroom: 00000040: ff ff ff ff ff ff 00 1c 19 f2 e2 db 08 06 [ 256.645428] skb frag: 00000000: 00 01 08 00 06 04 00 01 00 1c 19 f2 e2 db 0a 02 [ 256.645436] skb frag: 00000010: 00 83 00 00 00 00 00 00 0a 02 a0 2f 00 00 00 00 [ 256.645444] skb frag: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 [ 256.645452] ksz_common_rcv:92 dsa_conduit_find_user returned NULL Call skb_linearize before trying to access the tag. This patch fixes ksz9477_rcv which is used by the ksz9896 I have at hand, and also applies the same fix to ksz8795_rcv which seems to have the same problem. Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher(a)cherry.de> CC: stable(a)vger.kernel.org Fixes: 016e43a26bab ("net: dsa: ksz: Add KSZ8795 tag code") Fixes: 8b8010fb7876 ("dsa: add support for Microchip KSZ tail tagging") Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com> --- v1: Initial submission https://lore.kernel.org/netdev/20250509071820.4100022-1-jakob.unterwurzache… v2: Add Fixes tags, Cc stable, "[PATCH net]" prefix https://lore.kernel.org/netdev/20250512144416.3697054-1-jakob.unterwurzache… v3: Move declarations before skb_linearize call, pick up Reviewed-By net/dsa/tag_ksz.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c index c33d4bf17929..0b7564b53790 100644 --- a/net/dsa/tag_ksz.c +++ b/net/dsa/tag_ksz.c @@ -140,7 +140,12 @@ static struct sk_buff *ksz8795_xmit(struct sk_buff *skb, struct net_device *dev) static struct sk_buff *ksz8795_rcv(struct sk_buff *skb, struct net_device *dev) { - u8 *tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN; + u8 *tag; + + if (skb_linearize(skb)) + return NULL; + + tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN; return ksz_common_rcv(skb, dev, tag[0] & KSZ8795_TAIL_TAG_EG_PORT_M, KSZ_EGRESS_TAG_LEN); @@ -311,10 +316,16 @@ static struct sk_buff *ksz9477_xmit(struct sk_buff *skb, static struct sk_buff *ksz9477_rcv(struct sk_buff *skb, struct net_device *dev) { - /* Tag decoding */ - u8 *tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN; - unsigned int port = tag[0] & KSZ9477_TAIL_TAG_EG_PORT_M; unsigned int len = KSZ_EGRESS_TAG_LEN; + unsigned int port; + u8 *tag; + + if (skb_linearize(skb)) + return NULL; + + /* Tag decoding */ + tag = skb_tail_pointer(skb) - KSZ_EGRESS_TAG_LEN; + port = tag[0] & KSZ9477_TAIL_TAG_EG_PORT_M; /* Extra 4-bytes PTP timestamp */ if (tag[0] & KSZ9477_PTP_TAG_INDICATION) { -- 2.39.5

2 months

2
1
0 0

[PATCH] wifi: ath11k: fix rx completion meta data corruption

by Johan Hovold

Add the missing memory barrier to make sure that the REO dest ring descriptor is read after the head pointer to avoid using stale data on weakly ordered architectures like aarch64. This may fix the ring-buffer corruption worked around by commit f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination ring") by silently discarding data, and may possibly also address user reported errors like: ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Cc: stable(a)vger.kernel.org # 5.6 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- As I reported here: https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@hovoldconsulting.com/ the ath11k and ath12k appear to be missing a number of memory barriers that are required on weakly ordered architectures like aarch64 to avoid memory corruption issues. Here's a fix for one more such case which people already seem to be hitting. Note that I've seen one "msdu_done" bit not set warning also with this patch so whether it helps with that at all remains to be seen. I'm CCing Jens and Steev that see these warnings frequently and that may be able to help out with testing. Johan drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c index 029ecf51c9ef..0a57b337e4c6 100644 --- a/drivers/net/wireless/ath/ath11k/dp_rx.c +++ b/drivers/net/wireless/ath/ath11k/dp_rx.c @@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, struct ath11k *ar; struct hal_reo_dest_ring *desc; enum hal_reo_dest_ring_push_reason push_reason; - u32 cookie; + u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0; int i; for (i = 0; i < MAX_RADIOS; i++) @@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, try_again: ath11k_hal_srng_access_begin(ab, srng); + /* Make sure descriptor is read after the head pointer. */ + dma_rmb(); + while (likely(desc = (struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab, srng))) { cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE, - desc->buf_addr_info.info1); + READ_ONCE(desc->buf_addr_info.info1)); buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID, cookie); mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie); @@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, num_buffs_reaped[mac_id]++; + info0 = READ_ONCE(desc->info0); push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON, - desc->info0); + info0); if (unlikely(push_reason != HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) { dev_kfree_skb_any(msdu); @@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, continue; } - rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 & + rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0); + rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0); + + rxcb->is_first_msdu = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU); - rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 & + rxcb->is_last_msdu = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU); - rxcb->is_continuation = !!(desc->rx_msdu_info.info0 & + rxcb->is_continuation = !!(rx_msdu_info0 & RX_MSDU_DESC_INFO0_MSDU_CONTINUATION); rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID, - desc->rx_mpdu_info.meta_data); + READ_ONCE(desc->rx_mpdu_info.meta_data)); rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM, - desc->rx_mpdu_info.info0); + rx_mpdu_info0); rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM, - desc->info0); + info0); rxcb->mac_id = mac_id; __skb_queue_tail(&msdu_list[mac_id], msdu); -- 2.48.1

2 months

5
7
0 0

[PATCH] wifi: ath11k: fix ring-buffer corruption

by Johan Hovold

Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes breaks and the log fills up with errors like: ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492 ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484 which based on a quick look at the driver seemed to indicate some kind of ring-buffer corruption. Miaoqing Pan tracked it down to the host seeing the updated destination ring head pointer before the updated descriptor, and the error handling for that in turn leaves the ring buffer in an inconsistent state. Add the missing memory barrier to make sure that the descriptor is read after the head pointer to address the root cause of the corruption while fixing up the error handling in case there are ever any (ordering) bugs on the device side. Note that the READ_ONCE() are only needed to avoid compiler mischief in case the ring-buffer helpers are ever inlined. Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623 Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com Cc: Miaoqing Pan <quic_miaoqing(a)quicinc.com> Cc: stable(a)vger.kernel.org # 5.6 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> --- drivers/net/wireless/ath/ath11k/ce.c | 11 +++++------ drivers/net/wireless/ath/ath11k/hal.c | 4 ++-- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c index e66e86bdec20..9d8efec46508 100644 --- a/drivers/net/wireless/ath/ath11k/ce.c +++ b/drivers/net/wireless/ath/ath11k/ce.c @@ -393,11 +393,10 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe, goto err; } + /* Make sure descriptor is read after the head pointer. */ + dma_rmb(); + *nbytes = ath11k_hal_ce_dst_status_get_length(desc); - if (*nbytes == 0) { - ret = -EIO; - goto err; - } *skb = pipe->dest_ring->skb[sw_index]; pipe->dest_ring->skb[sw_index] = NULL; @@ -430,8 +429,8 @@ static void ath11k_ce_recv_process_cb(struct ath11k_ce_pipe *pipe) dma_unmap_single(ab->dev, ATH11K_SKB_RXCB(skb)->paddr, max_nbytes, DMA_FROM_DEVICE); - if (unlikely(max_nbytes < nbytes)) { - ath11k_warn(ab, "rxed more than expected (nbytes %d, max %d)", + if (unlikely(max_nbytes < nbytes || nbytes == 0)) { + ath11k_warn(ab, "unexpected rx length (nbytes %d, max %d)", nbytes, max_nbytes); dev_kfree_skb_any(skb); continue; diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c index 61f4b6dd5380..8cb1505a5a0c 100644 --- a/drivers/net/wireless/ath/ath11k/hal.c +++ b/drivers/net/wireless/ath/ath11k/hal.c @@ -599,7 +599,7 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf) struct hal_ce_srng_dst_status_desc *desc = buf; u32 len; - len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags); + len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, READ_ONCE(desc->flags)); desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN; return len; @@ -829,7 +829,7 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng) srng->u.src_ring.cached_tp = *(volatile u32 *)srng->u.src_ring.tp_addr; } else { - srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr; + srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); /* Try to prefetch the next descriptor in the ring */ if (srng->flags & HAL_SRNG_FLAGS_CACHED) -- 2.48.1

2 months

7
6
0 0

[PATCH stable 5.15/6.1/6.6] af_unix: Clear oob_skb in scan_inflight().

by Kuniyuki Iwashima

Embryo socket is not queued in gc_candidates, so we can't drop a reference held by its oob_skb. Let's say we create listener and embryo sockets, send the listener's fd to the embryo as OOB data, and close() them without recv()ing the OOB data. There is a self-reference cycle like listener -> embryo.oob_skb -> listener , so this must be cleaned up by GC. Otherwise, the listener's refcnt is not released and sockets are leaked: # unshare -n # cat /proc/net/protocols | grep UNIX-STREAM UNIX-STREAM 1024 0 -1 NI 0 yes kernel ... # python3 >>> from array import array >>> from socket import * >>> >>> s = socket(AF_UNIX, SOCK_STREAM) >>> s.bind('\0test\0') >>> s.listen() >>> >>> c = socket(AF_UNIX, SOCK_STREAM) >>> c.connect(s.getsockname()) >>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB) 1 >>> quit() # cat /proc/net/protocols | grep UNIX-STREAM UNIX-STREAM 1024 3 -1 NI 0 yes kernel ... ^^^ 3 sockets still in use after FDs are close()d Let's drop the embryo socket's oob_skb ref in scan_inflight(). This also fixes a racy access to oob_skb that commit 9841991a446c ("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue lock.") fixed for the new Tarjan's algo-based GC. Fixes: 314001f0bf92 ("af_unix: Add OOB support") Reported-by: Lei Lu <llfamsec(a)gmail.com> Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com> --- This has no upstream commit because I replaced the entire GC in 6.10 and the new GC does not have this bug, and this fix is only applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6. --- --- net/unix/garbage.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2a758531e102..b3fbdf129944 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -102,13 +102,14 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), /* Process the descriptors of this socket */ int nfd = UNIXCB(skb).fp->count; struct file **fp = UNIXCB(skb).fp->fp; + struct unix_sock *u; while (nfd--) { /* Get the socket the fd matches if it indeed does so */ struct sock *sk = unix_get_socket(*fp++); if (sk) { - struct unix_sock *u = unix_sk(sk); + u = unix_sk(sk); /* Ignore non-candidates, they could * have been added to the queues after @@ -122,6 +123,13 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), } } if (hit && hitlist != NULL) { +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + u = unix_sk(x); + if (u->oob_skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb = NULL; + } +#endif __skb_unlink(skb, &x->sk_receive_queue); __skb_queue_tail(hitlist, skb); } @@ -299,17 +307,9 @@ void unix_gc(void) * which are creating the cycle(s). */ skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) { + list_for_each_entry(u, &gc_candidates, link) scan_children(&u->sk, inc_inflight, &hitlist); -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif - } - /* not_cycle_list contains those sockets which do not make up a * cycle. Restore these to the inflight list. */ -- 2.39.5 (Apple Git-154)

2 months

5
8
0 0

[PATCH 5.10.y] fs: relax assertions on failure to encode file handles

by jianqi.ren.cn＠windriver.com

From: Amir Goldstein <amir73il(a)gmail.com> commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream. Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog… Reported-by: Dmitry Safonov <dima(a)arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3… Cc: stable(a)vger.kernel.org Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner(a)kernel.org> [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com> Signed-off-by: He Zhe <zhe.he(a)windriver.com> --- Verified the build test --- fs/notify/fdinfo.c | 4 +--- fs/overlayfs/copy_up.c | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 55081ae3a6ec..dd5bc6ffae85 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode) size = f.handle.handle_bytes >> 2; ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL); - if ((ret == FILEID_INVALID) || (ret < 0)) { - WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret); + if ((ret == FILEID_INVALID) || (ret < 0)) return; - } f.handle.handle_type = ret; f.handle.handle_bytes = size * sizeof(u32); diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 65ac504595ba..8557180bd7e1 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -302,9 +302,8 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper) buflen = (dwords << 2); err = -EIO; - if (WARN_ON(fh_type < 0) || - WARN_ON(buflen > MAX_HANDLE_SZ) || - WARN_ON(fh_type == FILEID_INVALID)) + if (fh_type < 0 || fh_type == FILEID_INVALID || + WARN_ON(buflen > MAX_HANDLE_SZ)) goto out_err; fh->fb.version = OVL_FH_VERSION; -- 2.34.1

2 months

2
1
0 0

[PATCH 6.1.y] fs: relax assertions on failure to encode file handles

by jianqi.ren.cn＠windriver.com

From: Amir Goldstein <amir73il(a)gmail.com> commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream. Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog… Reported-by: Dmitry Safonov <dima(a)arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3… Cc: stable(a)vger.kernel.org Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner(a)kernel.org> [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com> Signed-off-by: He Zhe <zhe.he(a)windriver.com> --- Verified the build test --- fs/notify/fdinfo.c | 4 +--- fs/overlayfs/copy_up.c | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 55081ae3a6ec..dd5bc6ffae85 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode) size = f.handle.handle_bytes >> 2; ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL); - if ((ret == FILEID_INVALID) || (ret < 0)) { - WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret); + if ((ret == FILEID_INVALID) || (ret < 0)) return; - } f.handle.handle_type = ret; f.handle.handle_bytes = size * sizeof(u32); diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 203b88293f6b..ced56696beeb 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -361,9 +361,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real, buflen = (dwords << 2); err = -EIO; - if (WARN_ON(fh_type < 0) || - WARN_ON(buflen > MAX_HANDLE_SZ) || - WARN_ON(fh_type == FILEID_INVALID)) + if (fh_type < 0 || fh_type == FILEID_INVALID || + WARN_ON(buflen > MAX_HANDLE_SZ)) goto out_err; fh->fb.version = OVL_FH_VERSION; -- 2.34.1

2 months

2
1
0 0

[PATCH 5.15.y] fs: relax assertions on failure to encode file handles

by jianqi.ren.cn＠windriver.com

From: Amir Goldstein <amir73il(a)gmail.com> commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream. Encoding file handles is usually performed by a filesystem >encode_fh() method that may fail for various reasons. The legacy users of exportfs_encode_fh(), namely, nfsd and name_to_handle_at(2) syscall are ready to cope with the possibility of failure to encode a file handle. There are a few other users of exportfs_encode_{fh,fid}() that currently have a WARN_ON() assertion when ->encode_fh() fails. Relax those assertions because they are wrong. The second linked bug report states commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles") in v6.6 as the regressing commit, but this is not accurate. The aforementioned commit only increases the chances of the assertion and allows triggering the assertion with the reproducer using overlayfs, inotify and drop_caches. Triggering this assertion was always possible with other filesystems and other reasons of ->encode_fh() failures and more particularly, it was also possible with the exact same reproducer using overlayfs that is mounted with options index=on,nfs_export=on also on kernels < v6.6. Therefore, I am not listing the aforementioned commit as a Fixes commit. Backport hint: this patch will have a trivial conflict applying to v6.6.y, and other trivial conflicts applying to stable kernels < v6.6. Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog… Reported-by: Dmitry Safonov <dima(a)arista.com> Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3… Cc: stable(a)vger.kernel.org Signed-off-by: Amir Goldstein <amir73il(a)gmail.com> Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com Signed-off-by: Christian Brauner <brauner(a)kernel.org> [Minor conflict resolved due to code context change.] Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com> Signed-off-by: He Zhe <zhe.he(a)windriver.com> --- Verified the build test --- fs/notify/fdinfo.c | 4 +--- fs/overlayfs/copy_up.c | 5 ++--- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c index 55081ae3a6ec..dd5bc6ffae85 100644 --- a/fs/notify/fdinfo.c +++ b/fs/notify/fdinfo.c @@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode) size = f.handle.handle_bytes >> 2; ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL); - if ((ret == FILEID_INVALID) || (ret < 0)) { - WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret); + if ((ret == FILEID_INVALID) || (ret < 0)) return; - } f.handle.handle_type = ret; f.handle.handle_bytes = size * sizeof(u32); diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index 5fc32483afed..c5d8ad610a37 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -366,9 +366,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real, buflen = (dwords << 2); err = -EIO; - if (WARN_ON(fh_type < 0) || - WARN_ON(buflen > MAX_HANDLE_SZ) || - WARN_ON(fh_type == FILEID_INVALID)) + if (fh_type < 0 || fh_type == FILEID_INVALID || + WARN_ON(buflen > MAX_HANDLE_SZ)) goto out_err; fh->fb.version = OVL_FH_VERSION; -- 2.34.1

2 months

2
1
0 0

[PATCH 6.1.y] riscv: mm: Fix the out of bound issue of vmemmap address

by Zhaoyang Li

From: Xu Lu <luxu.kernel(a)bytedance.com> [ Upstream commit f754f27e98f88428aaf6be6e00f5cbce97f62d4b ] In sparse vmemmap model, the virtual address of vmemmap is calculated as: ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)). And the struct page's va can be calculated with an offset: (vmemmap + (pfn)). However, when initializing struct pages, kernel actually starts from the first page from the same section that phys_ram_base belongs to. If the first page's physical address is not (phys_ram_base >> PAGE_SHIFT), then we get an va below VMEMMAP_START when calculating va for it's struct page. For example, if phys_ram_base starts from 0x82000000 with pfn 0x82000, the first page in the same section is actually pfn 0x80000. During init_unavailable_range(), we will initialize struct page for pfn 0x80000 with virtual address ((struct page *)VMEMMAP_START - 0x2000), which is below VMEMMAP_START as well as PCI_IO_END. This commit fixes this bug by introducing a new variable 'vmemmap_start_pfn' which is aligned with memory section size and using it to calculate vmemmap address instead of phys_ram_base. Fixes: a11dd49dcb93 ("riscv: Sparse-Memory/vmemmap out-of-bounds fix") Signed-off-by: Xu Lu <luxu.kernel(a)bytedance.com> Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Tested-by: Björn Töpel <bjorn(a)rivosinc.com> Reviewed-by: Björn Töpel <bjorn(a)rivosinc.com> Link: https://lore.kernel.org/r/20241209122617.53341-1-luxu.kernel@bytedance.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> Signed-off-by: Zhaoyang Li <lizy04(a)hust.edu.cn> --- arch/riscv/include/asm/page.h | 1 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/mm/init.c | 17 ++++++++++++++++- 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h index 86048c60f700..fd861ac47a89 100644 --- a/arch/riscv/include/asm/page.h +++ b/arch/riscv/include/asm/page.h @@ -115,6 +115,7 @@ struct kernel_mapping { extern struct kernel_mapping kernel_map; extern phys_addr_t phys_ram_base; +extern unsigned long vmemmap_start_pfn; #define is_kernel_mapping(x) \ ((x) >= kernel_map.virt_addr && (x) < (kernel_map.virt_addr + kernel_map.size)) diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 7d1688f850c3..bb19a643c5c2 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -79,7 +79,7 @@ * Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if kernel * is configured with CONFIG_SPARSEMEM_VMEMMAP enabled. */ -#define vmemmap ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)) +#define vmemmap ((struct page *)VMEMMAP_START - vmemmap_start_pfn) #define PCI_IO_SIZE SZ_16M #define PCI_IO_END VMEMMAP_START diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index ba2210b553f9..72b3462babbf 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -22,6 +22,7 @@ #include <linux/hugetlb.h> #include <asm/fixmap.h> +#include <asm/sparsemem.h> #include <asm/tlbflush.h> #include <asm/sections.h> #include <asm/soc.h> @@ -52,6 +53,13 @@ EXPORT_SYMBOL(pgtable_l5_enabled); phys_addr_t phys_ram_base __ro_after_init; EXPORT_SYMBOL(phys_ram_base); +#ifdef CONFIG_SPARSEMEM_VMEMMAP +#define VMEMMAP_ADDR_ALIGN (1ULL << SECTION_SIZE_BITS) + +unsigned long vmemmap_start_pfn __ro_after_init; +EXPORT_SYMBOL(vmemmap_start_pfn); +#endif + unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss; EXPORT_SYMBOL(empty_zero_page); @@ -210,8 +218,12 @@ static void __init setup_bootmem(void) memblock_reserve(vmlinux_start, vmlinux_end - vmlinux_start); phys_ram_end = memblock_end_of_DRAM(); - if (!IS_ENABLED(CONFIG_XIP_KERNEL)) + if (!IS_ENABLED(CONFIG_XIP_KERNEL)) { phys_ram_base = memblock_start_of_DRAM(); +#ifdef CONFIG_SPARSEMEM_VMEMMAP + vmemmap_start_pfn = round_down(phys_ram_base, VMEMMAP_ADDR_ALIGN) >> PAGE_SHIFT; +#endif +} /* * Reserve physical address space that would be mapped to virtual * addresses greater than (void *)(-PAGE_SIZE) because: @@ -946,6 +958,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa) kernel_map.xiprom_sz = (uintptr_t)(&_exiprom) - (uintptr_t)(&_xiprom); phys_ram_base = CONFIG_PHYS_RAM_BASE; +#ifdef CONFIG_SPARSEMEM_VMEMMAP + vmemmap_start_pfn = round_down(phys_ram_base, VMEMMAP_ADDR_ALIGN) >> PAGE_SHIFT; +#endif kernel_map.phys_addr = (uintptr_t)CONFIG_PHYS_RAM_BASE; kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_start); -- 2.25.1

2 months

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror