From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
Hi,
Commit 653143ed73ec ("serial: sh-sci: Check if TX data was written
to device in .tx_empty()") doesn't apply cleanly on top of v6.6.y
stable tree. This series adjusts it. Along with it, other sh-sci fixes
are proposed for backporting.
Please provide your feedback.
Thank you,
Claudiu Beznea
Claudiu Beznea (4):
serial: sh-sci: Check if TX data was written to device in .tx_empty()
serial: sh-sci: Move runtime PM enable to sci_probe_single()
serial: sh-sci: Clean sci_ports[0] after at earlycon exit
serial: sh-sci: Increment the runtime usage counter for the earlycon
device
drivers/tty/serial/sh-sci.c | 97 ++++++++++++++++++++++++++++++-------
1 file changed, 79 insertions(+), 18 deletions(-)
--
2.43.0
From: Chuck Lever <chuck.lever(a)oracle.com>
[ Upstream commit e6faac3f58c7c4176b66f63def17a34232a17b0e ]
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and
NFSv4 both define file size as an unsigned 64-bit type. Thus there
is a range of valid file size values an NFS client can send that is
already larger than Linux can handle.
Currently decode_fattr4() dumps a full u64 value into ia_size. If
that value happens to be larger than S64_MAX, then ia_size
underflows. I'm about to fix up the NFSv3 behavior as well, so let's
catch the underflow in the common code path: nfsd_setattr().
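For illustration only (not part of the patch; the variable names below
are made up), the wraparound that the new check guards against looks
like this:

	u64 client_size = (u64)S64_MAX + 1;	/* valid NFSv4 file size */
	loff_t ia_size = (loff_t)client_size;	/* wraps to a negative value */

	if (ia_size < 0)			/* the new check rejects it */
		return -EFBIG;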
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
(cherry picked from commit e6faac3f58c7c4176b66f63def17a34232a17b0e)
[Larry: backport to 5.4.y. Minor conflict resolved due to missing commit 2f221d6f7b88
attr: handle idmapped mounts]
Signed-off-by: Larry Bassel <larry.bassel(a)oracle.com>
---
fs/nfsd/vfs.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6aa968bee0ce..bee4fdf6e239 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -448,6 +448,10 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
.ia_size = iap->ia_size,
};
+ host_err = -EFBIG;
+ if (iap->ia_size < 0)
+ goto out_unlock;
+
host_err = notify_change(dentry, &size_attr, NULL);
if (host_err)
goto out_unlock;
--
2.46.0
A not-so-careful NAT46 BPF program can crash the kernel
if it indiscriminately flips ingress packets from v4 to v6:
BUG: kernel NULL pointer dereference, address: 0000000000000000
ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
ipv6_rcv (net/ipv6/ip6_input.c:306:8)
process_backlog (net/core/dev.c:6186:4)
napi_poll (net/core/dev.c:6906:9)
net_rx_action (net/core/dev.c:7028:13)
do_softirq (kernel/softirq.c:462:3)
netif_rx (net/core/dev.c:5326:3)
dev_loopback_xmit (net/core/dev.c:4015:2)
ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
NF_HOOK (./include/linux/netfilter.h:314:9)
ip_mc_output (net/ipv4/ip_output.c:400:5)
dst_output (./include/net/dst.h:459:9)
ip_local_out (net/ipv4/ip_output.c:130:9)
ip_send_skb (net/ipv4/ip_output.c:1496:8)
udp_send_skb (net/ipv4/udp.c:1040:8)
udp_sendmsg (net/ipv4/udp.c:1328:10)
The output interface has a 4->6 program attached at ingress.
We try to loop the multicast skb back to the sending socket.
Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
and changes skb->protocol to v6. We enter ip6_rcv_core which
tries to use skb_dst(). But the dst is still an IPv4 one left
after IPv4 mcast output.
Clear the dst in all BPF helpers which change the protocol.
Try to preserve metadata dsts, those may carry non-routing
metadata.
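A minimal sketch of the kind of indiscriminate NAT46 ingress program
described above (hypothetical, not the selftest added in this series):

	#include <linux/bpf.h>
	#include <linux/if_ether.h>
	#include <linux/pkt_cls.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_endian.h>

	SEC("tc")
	int nat46_ingress(struct __sk_buff *skb)
	{
		/* Unconditionally flip v4 -> v6. bpf_skb_change_proto()
		 * ends up in bpf_skb_proto_4_to_6(), which with this fix
		 * also drops the stale IPv4 dst left by the mcast output.
		 */
		if (bpf_skb_change_proto(skb, bpf_htons(ETH_P_IPV6), 0))
			return TC_ACT_SHOT;

		/* a real program would rewrite the IPv6 header here */
		return TC_ACT_OK;
	}

	char _license[] SEC("license") = "GPL";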
Cc: stable(a)vger.kernel.org
Reviewed-by: Maciej Żenczykowski <maze(a)google.com>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v3:
- go back to v1, the encap / decap which don't change proto
will be added in -next
- split out the test
v2: https://lore.kernel.org/20250607204734.1588964-1-kuba@kernel.org
- drop on encap/decap
- fix typo (protcol)
- add the test to the Makefile
v1: https://lore.kernel.org/20250604210604.257036-1-kuba@kernel.org
I wonder if we should not skip ingress (tc_skip_classify?)
for looped back packets in the first place. But that doesn't
seem robust enough vs multiple redirections to solve the crash.
Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
work either, since BPF can change pkt_type arbitrarily.
CC: martin.lau(a)linux.dev
CC: daniel(a)iogearbox.net
CC: john.fastabend(a)gmail.com
CC: eddyz87(a)gmail.com
CC: sdf(a)fomichev.me
CC: haoluo(a)google.com
CC: willemb(a)google.com
CC: william.xuanziyang(a)huawei.com
CC: alan.maguire(a)oracle.com
CC: bpf(a)vger.kernel.org
CC: edumazet(a)google.com
CC: maze(a)google.com
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
CC: yonghong.song(a)linux.dev
---
net/core/filter.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 327ca73f9cd7..7a72f766aacf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3233,6 +3233,13 @@ static const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
.arg1_type = ARG_PTR_TO_CTX,
};
+static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
+{
+ skb->protocol = htons(proto);
+ if (skb_valid_dst(skb))
+ skb_dst_drop(skb);
+}
+
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
{
/* Caller already did skb_cow() with len as headroom,
@@ -3329,7 +3336,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
}
}
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
skb_clear_hash(skb);
return 0;
@@ -3359,7 +3366,7 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
}
}
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
skb_clear_hash(skb);
return 0;
@@ -3550,10 +3557,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
/* Match skb->protocol to new outer l3 protocol */
if (skb->protocol == htons(ETH_P_IP) &&
flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV6)
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
else if (skb->protocol == htons(ETH_P_IPV6) &&
flags & BPF_F_ADJ_ROOM_ENCAP_L3_IPV4)
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
}
if (skb_is_gso(skb)) {
@@ -3606,10 +3613,10 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
/* Match skb->protocol to new outer l3 protocol */
if (skb->protocol == htons(ETH_P_IP) &&
flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
- skb->protocol = htons(ETH_P_IPV6);
+ bpf_skb_change_protocol(skb, ETH_P_IPV6);
else if (skb->protocol == htons(ETH_P_IPV6) &&
flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
- skb->protocol = htons(ETH_P_IP);
+ bpf_skb_change_protocol(skb, ETH_P_IP);
if (skb_is_gso(skb)) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
--
2.49.0
From: Kairui Song <kasong(a)tencent.com>
On seeing a swap entry PTE, userfaultfd_move does a lockless swap
cache lookup, and tries to move the found folio to the faulting vma.
Currently, it relies on checking the PTE value to ensure that the moved
folio still belongs to the src swap entry and that no new folio has
been added to the swap cache, which turns out to be unreliable.
While working on and reviewing the swap table series with Barry, the
following existing races were observed and reproduced [1]:
In the example below, move_pages_pte is moving src_pte to dst_pte,
where src_pte is a swap entry PTE holding swap entry S1, and S1
is not in the swap cache:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1
... < interrupted> ...
<swapin src_pte, alloc and use folio A>
// folio A is a new allocated folio
// and get installed into src_pte
<frees swap entry S1>
// src_pte now points to folio A, S1
// has swap count == 0, it can be freed
// by folio_free_swap or swap
// allocator's reclaim.
<try to swap out another folio B>
// folio B is a folio in another VMA.
<put folio B to swap cache using S1 >
// S1 is freed, folio B can use it
// for swap out with no problem.
...
folio = filemap_get_folio(S1)
// Got folio B here !!!
... < interrupted again> ...
<swapin folio B and free S1>
// Now S1 is free to be used again.
<swapout src_pte & folio A using S1>
// Now src_pte is a swap entry PTE
// holding S1 again.
folio_trylock(folio)
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// Moved invalid folio B here !!!
The race window is very short and requires multiple collisions of
multiple rare events, so it's very unlikely to happen, but with a
deliberately constructed reproducer and increased time window, it
can be reproduced easily.
This can be fixed by checking if the folio returned by filemap is the
valid swap cache folio after acquiring the folio lock.
Another similar race is possible: filemap_get_folio may return NULL, but
folio (A) could be swapped in and then swapped out again using the same
swap entry after the lookup. In such a case, folio (A) may remain in the
swap cache, so it must be moved too:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1, and S1 is not in swap cache
folio = filemap_get_folio(S1)
// Got NULL
... < interrupted again> ...
<swapin folio A and free S1>
<swapout folio A re-using S1>
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// folio A is ignored !!!
Fix this by checking the swap cache again after acquiring the src_pte
lock. And to avoid the filemap overhead, we check swap_map directly [2].
The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so
far we don't need to worry about that, since folios can only be exposed
to the swap cache in the swap out path, and this is covered in this
patch by checking the swap cache again after acquiring the src_pte lock.
Testing with a simple C program that allocates and moves several GB of
memory did not show any observable performance change.
Cc: <stable(a)vger.kernel.org>
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoS… [1]
Link: https://lore.kernel.org/all/CAGsJ_4yJhJBo16XhiC-nUzSheyX-V3-nFE+tAi=8Y560K8… [2]
Signed-off-by: Kairui Song <kasong(a)tencent.com>
Reviewed-by: Lokesh Gidra <lokeshgidra(a)google.com>
---
V1: https://lore.kernel.org/linux-mm/20250530201710.81365-1-ryncsn@gmail.com/
Changes:
- Check swap_map instead of doing a filemap lookup after acquiring the
PTE lock to minimize critical section overhead [ Barry Song, Lokesh Gidra ]
V2: https://lore.kernel.org/linux-mm/20250601200108.23186-1-ryncsn@gmail.com/
Changes:
- Move the folio and swap check inside move_swap_pte to avoid skipping
the check and potential overhead [ Lokesh Gidra ]
- Add a READ_ONCE for the swap_map read to ensure it reads an
up-to-date value.
V3: https://lore.kernel.org/all/20250602181419.20478-1-ryncsn@gmail.com/
Changes:
- Add more comments and more context in commit message.
mm/userfaultfd.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc473ad21202..8253978ee0fb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1084,8 +1084,18 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
pte_t orig_dst_pte, pte_t orig_src_pte,
pmd_t *dst_pmd, pmd_t dst_pmdval,
spinlock_t *dst_ptl, spinlock_t *src_ptl,
- struct folio *src_folio)
+ struct folio *src_folio,
+ struct swap_info_struct *si, swp_entry_t entry)
{
+ /*
+ * Check if the folio still belongs to the target swap entry after
+ * acquiring the lock. Folio can be freed in the swap cache while
+ * not locked.
+ */
+ if (src_folio && unlikely(!folio_test_swapcache(src_folio) ||
+ entry.val != src_folio->swap.val))
+ return -EAGAIN;
+
double_pt_lock(dst_ptl, src_ptl);
if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1102,6 +1112,25 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
if (src_folio) {
folio_move_anon_rmap(src_folio, dst_vma);
src_folio->index = linear_page_index(dst_vma, dst_addr);
+ } else {
+ /*
+ * Check if the swap entry is cached after acquiring the src_pte
+ * lock. Otherwise, we might miss a newly loaded swap cache folio.
+ *
+ * Check swap_map directly to minimize overhead, READ_ONCE is sufficient.
+ * We are trying to catch newly added swap cache, the only possible case is
+ * when a folio is swapped in and out again staying in swap cache, using the
+ * same entry before the PTE check above. The PTL is acquired and released
+ * twice, each time after updating the swap_map's flag. So holding
+ * the PTL here ensures we see the updated value. False positive is possible,
+ * e.g. SWP_SYNCHRONOUS_IO swapin may set the flag without touching the
+ * cache, or during the tiny synchronization window between swap cache and
+ * swap_map, but it will be gone very quickly, worst result is retry jitters.
+ */
+ if (READ_ONCE(si->swap_map[swp_offset(entry)]) & SWAP_HAS_CACHE) {
+ double_pt_unlock(dst_ptl, src_ptl);
+ return -EAGAIN;
+ }
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
@@ -1412,7 +1441,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
}
err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
- dst_ptl, src_ptl, src_folio);
+ dst_ptl, src_ptl, src_folio, si, entry);
}
out:
--
2.49.0
The patch titled
Subject: mm/huge_memory: don't ignore queried cachemode in vmf_insert_pfn_pud()
has been added to the -mm mm-new branch. Its filename is
mm-huge_memory-dont-ignore-queried-cachemode-in-vmf_insert_pfn_pud.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/huge_memory: don't ignore queried cachemode in vmf_insert_pfn_pud()
Date: Wed, 11 Jun 2025 14:06:52 +0200
Patch series "mm/huge_memory: vmf_insert_folio_*() and
vmf_insert_pfn_pud() fixes", v2.
While working on improving vm_normal_page() and friends, I stumbled over
this issue: refcounted "normal" pages must not be marked using
pmd_special() / pud_special().
Fortunately, so far there doesn't seem to be serious damage.
This patch (of 3):
We set up the cache mode but ... don't forward the updated pgprot to
insert_pfn_pud().
This is only a problem on x86-64 PAT when mapping PFNs using PUDs that
require a special cachemode.
Fix it by using the proper pgprot where the cachemode was set up.
Identified by code inspection.
Link: https://lkml.kernel.org/r/20250611120654.545963-1-david@redhat.com
Link: https://lkml.kernel.org/r/20250611120654.545963-2-david@redhat.com
Fixes: 7b806d229ef1 ("mm: remove vmf_insert_pfn_xxx_prot() for huge page-table entries")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-dont-ignore-queried-cachemode-in-vmf_insert_pfn_pud
+++ a/mm/huge_memory.c
@@ -1516,10 +1516,9 @@ static pud_t maybe_pud_mkwrite(pud_t pud
}
static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
- pud_t *pud, pfn_t pfn, bool write)
+ pud_t *pud, pfn_t pfn, pgprot_t prot, bool write)
{
struct mm_struct *mm = vma->vm_mm;
- pgprot_t prot = vma->vm_page_prot;
pud_t entry;
if (!pud_none(*pud)) {
@@ -1581,7 +1580,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_
pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
ptl = pud_lock(vma->vm_mm, vmf->pud);
- insert_pfn_pud(vma, addr, vmf->pud, pfn, write);
+ insert_pfn_pud(vma, addr, vmf->pud, pfn, pgprot, write);
spin_unlock(ptl);
return VM_FAULT_NOPAGE;
@@ -1625,7 +1624,7 @@ vm_fault_t vmf_insert_folio_pud(struct v
add_mm_counter(mm, mm_counter_file(folio), HPAGE_PUD_NR);
}
insert_pfn_pud(vma, addr, vmf->pud, pfn_to_pfn_t(folio_pfn(folio)),
- write);
+ vma->vm_page_prot, write);
spin_unlock(ptl);
return VM_FAULT_NOPAGE;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-gup-revert-mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
mm-gup-remove-vm_bug_ons.patch
mm-gup-remove-vm_bug_ons-fix.patch
mm-huge_memory-dont-ignore-queried-cachemode-in-vmf_insert_pfn_pud.patch
mm-huge_memory-dont-mark-refcounted-folios-special-in-vmf_insert_folio_pmd.patch
mm-huge_memory-dont-mark-refcounted-folios-special-in-vmf_insert_folio_pud.patch
From: Francesco Dolcini <francesco.dolcini(a)toradex.com>
This reverts commit 4fcfcbe457349267fe048524078e8970807c1a5b.
That commit introduces a regression: when HT40 mode is enabled,
received packets are lost. This was experienced with the W8997 with
both SDIO-UART and SDIO-SDIO variants. From an initial investigation
the issue resolves on its own after some time, but the reason is not
clear. Given that this was just a performance optimization, let's
revert it until we have a better understanding of the issue and a
proper fix.
Cc: Jeff Chen <jeff.chen_1(a)nxp.com>
Cc: stable(a)vger.kernel.org
Fixes: 4fcfcbe45734 ("wifi: mwifiex: Fix HT40 bandwidth issue.")
Closes: https://lore.kernel.org/all/20250603203337.GA109929@francesco-nb/
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
---
v2: fix reverted commit sha
v1: https://lore.kernel.org/all/20250605100313.34014-1-francesco@dolcini.it/
---
drivers/net/wireless/marvell/mwifiex/11n.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/11n.c b/drivers/net/wireless/marvell/mwifiex/11n.c
index 738bafc3749b..66f0f5377ac1 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n.c
@@ -403,14 +403,12 @@ mwifiex_cmd_append_11n_tlv(struct mwifiex_private *priv,
if (sband->ht_cap.cap & IEEE80211_HT_CAP_SUP_WIDTH_20_40 &&
bss_desc->bcn_ht_oper->ht_param &
- IEEE80211_HT_PARAM_CHAN_WIDTH_ANY) {
- chan_list->chan_scan_param[0].radio_type |=
- CHAN_BW_40MHZ << 2;
+ IEEE80211_HT_PARAM_CHAN_WIDTH_ANY)
SET_SECONDARYCHAN(chan_list->chan_scan_param[0].
radio_type,
(bss_desc->bcn_ht_oper->ht_param &
IEEE80211_HT_PARAM_CHA_SEC_OFFSET));
- }
+
*buffer += struct_size(chan_list, chan_scan_param, 1);
ret_len += struct_size(chan_list, chan_scan_param, 1);
}
--
2.39.5
The patch titled
Subject: mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-revert-mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked"
Date: Wed, 11 Jun 2025 15:13:14 +0200
After commit 1aaf8c122918 ("mm: gup: fix infinite loop within
__get_longterm_locked") we are able to longterm pin folios that are not
supposed to get longterm pinned, simply because they temporarily have the
LRU flag cleared (esp. temporarily isolated).
For example, two __get_longterm_locked() callers can race, or
__get_longterm_locked() can race with anything else that temporarily
isolates folios.
The introducing commit mentions the use case of a driver that uses
vm_ops->fault to insert pages allocated through cma_alloc() into the page
tables, assuming they can later get longterm pinned. These pages/folios
would never have the LRU flag set and consequently cannot get isolated.
There is no known in-tree user making use of that so far, fortunately.
To handle that in the future -- and avoid retrying forever to
isolate/migrate them -- we will need a different mechanism for the CMA
area *owner* to indicate that it actually already allocated the page and
is fine with longterm pinning it. The LRU flag is not suitable for that.
Probably we can look up the relevant CMA area and query the bitmap; we
only have to care about some races, probably. If already allocated, we
could just allow longterm pinning.
Anyhow, let's fix the "must not be longterm pinned" problem first by
reverting the original commit.
Link: https://lkml.kernel.org/r/20250611131314.594529-1-david@redhat.com
Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/all/20250522092755.GA3277597@tiffany/
Reported-by: Hyesoo Yu <hyesoo.yu(a)samsung.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: Aijun Sun <aijun.sun(a)unisoc.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/mm/gup.c~mm-gup-revert-mm-gup-fix-infinite-loop-within-__get_longterm_locked
+++ a/mm/gup.c
@@ -2303,13 +2303,13 @@ static void pofs_unpin(struct pages_or_f
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
-static void collect_longterm_unpinnable_folios(
+static unsigned long collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
+ unsigned long i, collected = 0;
struct folio *prev_folio = NULL;
bool drain_allow = true;
- unsigned long i;
for (i = 0; i < pofs->nr_entries; i++) {
struct folio *folio = pofs_get_folio(pofs, i);
@@ -2321,6 +2321,8 @@ static void collect_longterm_unpinnable_
if (folio_is_longterm_pinnable(folio))
continue;
+ collected++;
+
if (folio_is_device_coherent(folio))
continue;
@@ -2342,6 +2344,8 @@ static void collect_longterm_unpinnable_
NR_ISOLATED_ANON + folio_is_file_lru(folio),
folio_nr_pages(folio));
}
+
+ return collected;
}
/*
@@ -2418,9 +2422,11 @@ static long
check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
{
LIST_HEAD(movable_folio_list);
+ unsigned long collected;
- collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
- if (list_empty(&movable_folio_list))
+ collected = collect_longterm_unpinnable_folios(&movable_folio_list,
+ pofs);
+ if (!collected)
return 0;
return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-gup-revert-mm-gup-fix-infinite-loop-within-__get_longterm_locked.patch
mm-gup-remove-vm_bug_ons.patch
mm-gup-remove-vm_bug_ons-fix.patch
From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>
Hi,
Jürgen Groß reported some bugs in the interaction of the ITS mitigation
with execmem [1] when running on a Xen PV guest.
These patches fix the issue by moving all the permissions management of
ITS memory allocated from execmem into ITS code.
I didn't test on a real Xen PV guest, but I emulated the !PSE variant
by force-disabling the ROX cache in x86::execmem_arch_setup().
Peter, I took the liberty of putting your SoB in the patch that actually
implements the execmem permissions management in ITS; please let me know
if I need to update anything about the authorship.
The patches are against v6.15.
They are also available in git:
https://web.git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=i…
[1] https://lore.kernel.org/all/20250528123557.12847-2-jgross@suse.com/
Juergen Gross (1):
x86/mm/pat: don't collapse pages without PSE set
Mike Rapoport (Microsoft) (3):
x86/Kconfig: only enable ROX cache in execmem when STRICT_MODULE_RWX is set
x86/its: move its_pages array to struct mod_arch_specific
Revert "mm/execmem: Unify early execmem_cache behaviour"
Peter Zijlstra (Intel) (1):
x86/its: explicitly manage permissions for ITS pages
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/module.h | 8 ++++
arch/x86/kernel/alternative.c | 89 ++++++++++++++++++++++++++---------
arch/x86/mm/init_32.c | 3 --
arch/x86/mm/init_64.c | 3 --
arch/x86/mm/pat/set_memory.c | 3 ++
include/linux/execmem.h | 8 +---
include/linux/module.h | 5 --
mm/execmem.c | 40 ++--------------
9 files changed, 82 insertions(+), 79 deletions(-)
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.47.2
The number of APQN target list entries contained in the 'nr_apqns'
variable is determined by userspace via an ioctl call, so the result of
the product in the calculation of the size passed to memdup_user() may
overflow.
In this case the actual size of the allocated area and the value
describing it won't be in sync, leading to various types of
unpredictable behaviour later.
Return an error if an overflow is detected. Note that it is different
from when nr_apqns is zero - that case is considered valid and should be
handled in subsequent pkey_handler implementations.
Found by Linux Verification Center (linuxtesting.org).
Fixes: f2bbc96e7cfa ("s390/pkey: add CCA AES cipher key support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/s390/crypto/pkey_api.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/pkey_api.c b/drivers/s390/crypto/pkey_api.c
index cef60770f68b..a731fc9c62a7 100644
--- a/drivers/s390/crypto/pkey_api.c
+++ b/drivers/s390/crypto/pkey_api.c
@@ -83,10 +83,15 @@ static void *_copy_key_from_user(void __user *ukey, size_t keylen)
static void *_copy_apqns_from_user(void __user *uapqns, size_t nr_apqns)
{
+ size_t size;
+
if (!uapqns || nr_apqns == 0)
return NULL;
- return memdup_user(uapqns, nr_apqns * sizeof(struct pkey_apqn));
+ if (check_mul_overflow(nr_apqns, sizeof(struct pkey_apqn), &size))
+ return ERR_PTR(-EINVAL);
+
+ return memdup_user(uapqns, size);
}
static int pkey_ioctl_genseck(struct pkey_genseck __user *ugs)
--
2.49.0
Hello Cassio,
On 10/06/2025 21:31:48+0100, Cassio Neri wrote:
> Hi all,
>
> Although untested, I'm pretty sure that with very small changes, the
> previous revision (1d1bb12) can handle dates prior to 1970-01-01 with no
> need to add extra branches or arithmetic operations. Indeed, 1d1bb12
> contains:
>
> <code>
> /* time must be positive */
> days = div_s64_rem(time, 86400, &secs);
>
> /* day of the week, 1970-01-01 was a Thursday */
> tm->tm_wday = (days + 4) % 7;
>
> /* long comments */
>
> udays = ((u32) days) + 719468;
> </code>
>
> This could have been changed to:
>
> <code>
> /* time must be >= -719468 * 86400 which corresponds to 0000-03-01 */
> udays = div_u64_rem(time + 719468 * 86400, 86400, &secs);
>
> /* day of the week, 0000-03-01 was a Wednesday (in the proleptic Gregorian
> calendar) */
> tm->tm_wday = (days + 3) % 7;
>
> /* long comments */
> </code>
>
> Indeed, the addition of 719468 * 86400 to `time` makes `days` to be 719468
> more than it should be. Therefore, in the calculation of `udays`, the
> addition of 719468 becomes unnecessary and thus, `udays == days`. Moreover,
> this means that `days` can be removed altogether and replaced by `udays`.
> (Not the other way around because in the remaining code `udays` must be
> u32.)
>
> Now, 719468 % 7 = 1 and thus tm->wday is 1 day after what it should be and
> we correct that by adding 3 instead of 4.
>
> Therefore, I suggest these changes on top of 1d1bb12 instead of those made
> in 7df4cfe. Since you're working on this, can I please kindly suggest two
> other changes?
>
> 1) Change the reference provided in the long comment. It should say, "The
> following algorithm is, basically, Figure 12 of Neri and Schneider [1]" and
> [1] should refer to the published article:
>
> Neri C, Schneider L. Euclidean affine functions and their application to
> calendar algorithms. Softw Pract Exper. 2023;53(4):937-970. doi:
> 10.1002/spe.3172
> https://doi.org/10.1002/spe.3172
>
> The article is much better written and clearer than the pre-print currently
> referred to.
>
Thanks for your input. I wanted to look again at your paper and make those
optimizations, which is why I took so long to review the original patch.
Unfortunately, I didn't have the time before the merge window.
I would also gladly take patches for this if you are up for the task.
> 2) Function rtc_time64_to_tm_test_date_range in drivers/rtc/lib_test.c, is
> a kunit test that checks the result for everyday in a 160000 years range
> starting at 1970-01-01. It'd be nice if this test is adapted to the new
> code and starts at 1900-01-01 (technically, it could start at 0000-03-01
> but since tm->year counts from 1900, it would be weird to see tm->year ==
> -1900 to mean that the calendar year is 0.) Also 160000 is definitely an
> overkill (my bad!) and a couple of thousands of years, say 3000, should be
> more than safe for anyone. :-)
This is also something on my radar as some have been complaining about the time
it takes to run those tests.
>
> Many thanks,
> Cassio.
>
>
>
> On Tue, 10 Jun 2025 at 08:35, Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> wrote:
>
> > From: Alexandre Mergnat <amergnat(a)baylibre.com>
> >
> > commit 7df4cfef8b351fec3156160bedfc7d6d29de4cce upstream.
> >
> > Conversion of dates before 1970 is still relevant today because these
> > dates are reused on some hardwares to store dates bigger than the
> > maximal date that is representable in the device's native format.
> > This prominently and very soon affects the hardware covered by the
> > rtc-mt6397 driver that can only natively store dates in the interval
> > 1900-01-01 up to 2027-12-31. So to store the date 2028-01-01 00:00:00
> > to such a device, rtc_time64_to_tm() must do the right thing for
> > time=-2208988800.
> >
> > Signed-off-by: Alexandre Mergnat <amergnat(a)baylibre.com>
> > Reviewed-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> > Link:
> > https://lore.kernel.org/r/20250428-enable-rtc-v4-1-2b2f7e3f9349@baylibre.com
> > Signed-off-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
> > ---
> > drivers/rtc/lib.c | 24 +++++++++++++++++++-----
> > 1 file changed, 19 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/rtc/lib.c b/drivers/rtc/lib.c
> > index fe361652727a..13b5b1f20465 100644
> > --- a/drivers/rtc/lib.c
> > +++ b/drivers/rtc/lib.c
> > @@ -46,24 +46,38 @@ EXPORT_SYMBOL(rtc_year_days);
> > * rtc_time64_to_tm - converts time64_t to rtc_time.
> > *
> > * @time: The number of seconds since 01-01-1970 00:00:00.
> > - * (Must be positive.)
> > + * Works for values since at least 1900
> > * @tm: Pointer to the struct rtc_time.
> > */
> > void rtc_time64_to_tm(time64_t time, struct rtc_time *tm)
> > {
> > - unsigned int secs;
> > - int days;
> > + int days, secs;
> >
> > u64 u64tmp;
> > u32 u32tmp, udays, century, day_of_century, year_of_century, year,
> > day_of_year, month, day;
> > bool is_Jan_or_Feb, is_leap_year;
> >
> > - /* time must be positive */
> > + /*
> > + * Get days and seconds while preserving the sign to
> > + * handle negative time values (dates before 1970-01-01)
> > + */
> > days = div_s64_rem(time, 86400, &secs);
> >
> > + /*
> > + * We need 0 <= secs < 86400 which isn't given for negative
> > + * values of time. Fixup accordingly.
> > + */
> > + if (secs < 0) {
> > + days -= 1;
> > + secs += 86400;
> > + }
> > +
> > /* day of the week, 1970-01-01 was a Thursday */
> > tm->tm_wday = (days + 4) % 7;
> > + /* Ensure tm_wday is always positive */
> > + if (tm->tm_wday < 0)
> > + tm->tm_wday += 7;
> >
> > /*
> > * The following algorithm is, basically, Proposition 6.3 of Neri
> > @@ -93,7 +107,7 @@ void rtc_time64_to_tm(time64_t time, struct rtc_time
> > *tm)
> > * thus, is slightly different from [1].
> > */
> >
> > - udays = ((u32) days) + 719468;
> > + udays = days + 719468;
> >
> > u32tmp = 4 * udays + 3;
> > century = u32tmp / 146097;
> > --
> > 2.49.0
> >
> >
This reverts commit e70c301faece15b618e54b613b1fd6ece3dd05b4.
Commit <e70c301faece> ("block: don't reorder requests in
blk_add_rq_to_plug") reversed how requests are stored in the blk_plug
list. This had a significant impact on merging bios with requests
already on the plug list. The impact has been reported in [1] and can
easily be reproduced using a 4k randwrite fio benchmark on an
NVMe-based SSD without any filesystem on the disk.
My benchmark is:
fio --time_based --name=benchmark --size=50G --rw=randwrite \
--runtime=60 --filename="/dev/nvme1n1" --ioengine=psync \
--randrepeat=0 --iodepth=1 --fsync=64 --invalidate=1 \
--verify=0 --verify_fatal=0 --blocksize=4k --numjobs=4 \
--group_reporting
On 1.9TiB SSD(180K Max IOPS) attached to i3.16xlarge AWS EC2 instance.
Kernel        | fio BW (MiB/sec) | I/O size (iostat)
--------------+------------------+------------------
6.15.1        | 362              | 2KiB
6.15.1+revert | 660 (+82%)       | 4KiB
--------------+------------------+------------------
I ran iostat while the fio benchmark was running and could see that the
I/O size on the disk is 2KiB without this revert, while it is 4KiB with
the revert. In the bad case the write bandwidth is capped at around
362MiB/sec, which is almost 2KiB * 180K IOPS, so we are hitting the
SSD's IOPS limit of 180K. After the revert the I/O size is doubled to
4KiB, hence the bandwidth almost doubles as we no longer hit the disk
IOPS limit.
I did some tracing using bpftrace & bcc and concluded that the reason
behind the I/O size discrepancy is that this fio benchmark submits each
4k I/O as 2 contiguous 2KB bios.
In the good case each pair of 2KB bios is merged into a 4KB request
that is then submitted to the disk, while in the bad case the 2KB bios
are submitted to the disk without merging because
blk_attempt_plug_merge() fails to merge them, as seen below.
**Without the revert**
[12:12:28]
r::blk_attempt_plug_merge():int:$retval
COUNT EVENT
5618 $retval = 1
176578 $retval = 0
**With the revert**
[12:11:43]
r::blk_attempt_plug_merge():int:$retval
COUNT EVENT
146684 $retval = 0
146686 $retval = 1
In blk_attempt_plug_merge() we iterate through the plug list from head
to tail looking for a request with which we can merge the most recently
submitted bio.
With commit <e70c301faece> ("block: don't reorder requests in
blk_add_rq_to_plug") the most recent request will be at the tail, so
blk_attempt_plug_merge() will fail because it tries to merge the bio
with the plug list head. We don't iterate across the whole plug list
because we exit the loop once merging fails in blk_attempt_bio_merge().
In commit <bc490f81731> ("block: change plugging to use a singly linked
list") the plug list was changed to a singly linked list, so there is
no way to iterate the list from tail to head, which would be the only
way to mitigate the impact on bio merging if we want to keep commit
<e70c301faece> ("block: don't reorder requests in blk_add_rq_to_plug").
Given that moving the plug list to a singly linked list was mainly done
for performance reasons, let's revert commit <e70c301faece> ("block:
don't reorder requests in blk_add_rq_to_plug") for now to mitigate the
reported performance regression.
[1] https://lore.kernel.org/lkml/202412122112.ca47bcec-lkp@intel.com/
Cc: stable(a)vger.kernel.org # 6.12
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Hagar Hemdan <hagarhem(a)amazon.com>
Reported-and-bisected-by: Shaoying Xu <shaoyi(a)amazon.com>
Signed-off-by: Hazem Mohamed Abuelfotoh <abuehaze(a)amazon.com>
---
block/blk-mq.c | 4 ++--
drivers/block/virtio_blk.c | 2 +-
drivers/nvme/host/pci.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c2697db59109..28965cac19fb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1394,7 +1394,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
*/
if (!plug->has_elevator && (rq->rq_flags & RQF_SCHED_TAGS))
plug->has_elevator = true;
- rq_list_add_tail(&plug->mq_list, rq);
+ rq_list_add_head(&plug->mq_list, rq);
plug->rq_count++;
}
@@ -2846,7 +2846,7 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
rq_list_add_tail(&requeue_list, rq);
continue;
}
- list_add_tail(&rq->queuelist, &list);
+ list_add(&rq->queuelist, &list);
depth++;
} while (!rq_list_empty(&plug->mq_list));
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 7cffea01d868..7992a171f905 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -513,7 +513,7 @@ static void virtio_queue_rqs(struct rq_list *rqlist)
vq = this_vq;
if (virtblk_prep_rq_batch(req))
- rq_list_add_tail(&submit_list, req);
+ rq_list_add_head(&submit_list, req); /* reverse order */
else
rq_list_add_tail(&requeue_list, req);
}
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f1dd804151b1..5f7da42f9dac 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1026,7 +1026,7 @@ static void nvme_queue_rqs(struct rq_list *rqlist)
nvmeq = req->mq_hctx->driver_data;
if (nvme_prep_rq_batch(nvmeq, req))
- rq_list_add_tail(&submit_list, req);
+ rq_list_add_head(&submit_list, req); /* reverse order */
else
rq_list_add_tail(&requeue_list, req);
}
--
2.47.1
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
When requesting pins whose intr_detection_width setting is not 1 or 2
for interrupts (for example by running `gpiomon -c 0 113` on RB2), we'll
hit a BUG() in msm_gpio_irq_set_type(). Potentially crashing the kernel
due to an invalid request from user-space is not optimal, so let's go
through the pins and mark those that would fail the check as invalid for
the irq chip as we should not even register them as available irqs.
This function can be extended if we determine that there are more
corner-cases like this.
Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
drivers/pinctrl/qcom/pinctrl-msm.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index f012ea88aa22c..77e0c2f023455 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -1038,6 +1038,24 @@ static bool msm_gpio_needs_dual_edge_parent_workaround(struct irq_data *d,
test_bit(d->hwirq, pctrl->skip_wake_irqs);
}
+static void msm_gpio_irq_init_valid_mask(struct gpio_chip *gc,
+ unsigned long *valid_mask,
+ unsigned int ngpios)
+{
+ struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+ const struct msm_pingroup *g;
+ int i;
+
+ bitmap_fill(valid_mask, ngpios);
+
+ for (i = 0; i < ngpios; i++) {
+ g = &pctrl->soc->groups[i];
+ if (g->intr_detection_width != 1 &&
+ g->intr_detection_width != 2)
+ clear_bit(i, valid_mask);
+ }
+}
+
static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -1441,6 +1459,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
girq->default_type = IRQ_TYPE_NONE;
girq->handler = handle_bad_irq;
girq->parents[0] = pctrl->irq;
+ girq->init_valid_mask = msm_gpio_irq_init_valid_mask;
ret = gpiochip_add_data(&pctrl->chip, pctrl);
if (ret) {
--
2.48.1
From: Dan Aloni <dan.aloni(a)vastdata.com>
[ Upstream commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33 ]
If there are failures then we must not leave the non-NULL pointers with
the error value, otherwise `rpcrdma_ep_destroy` gets confused and tries
to free them, resulting in an Oops.
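The failure mode as a generic sketch (hypothetical, not the rpcrdma
code itself): a destroy path that only checks for NULL would otherwise
hand an ERR_PTR value to the free routine:

	struct ib_cq *cq;
	int rc = 0;

	cq = ib_alloc_cq_any(device, NULL, nr_cqe, IB_POLL_WORKQUEUE);
	if (IS_ERR(cq)) {
		rc = PTR_ERR(cq);
		cq = NULL;	/* keep the destroy path from freeing an ERR_PTR */
	}

	/* later, in the shared destroy path: */
	if (cq)			/* plain NULL check, not IS_ERR() */
		ib_free_cq(cq);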
Signed-off-by: Dan Aloni <dan.aloni(a)vastdata.com>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
(cherry picked from commit a9c10b5b3b67b3750a10c8b089b2e05f5e176e33)
[Larry: backport to 5.4.y. Minor conflict resolved due to missing commit 93aa8e0a9de80
xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep]
Signed-off-by: Larry Bassel <larry.bassel(a)oracle.com>
---
net/sunrpc/xprtrdma/verbs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cfae1a871578..4fd3f632a2af 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -525,6 +525,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(sendcq)) {
rc = PTR_ERR(sendcq);
+ sendcq = NULL;
goto out1;
}
@@ -533,6 +534,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
IB_POLL_WORKQUEUE);
if (IS_ERR(recvcq)) {
rc = PTR_ERR(recvcq);
+ recvcq = NULL;
goto out2;
}
--
2.46.0
Hi Kees,
Bill's PR to disable __counted_by for "whole struct" __bdos cases has now
been merged into 19.1.3 [1], so here's the patch to disable __counted_by
for clang versions < 19.1.3 in the kernel.
Hopefully in the near future __counted_by for whole struct __bdos can be
enabled once again in coordination between the kernel, gcc, and clang.
There has been recent progress on this in [2] thanks to Tavian.
Also see previous discussion on the mailing list [3]
Thanks to everyone for moving this issue along. In particular, Bill for
his PR to clang/llvm, Kees and Thorsten for reproducers of the two issues,
Nathan for Kconfig-ifying this patch, and Miguel for reviewing.
Info for the stable team:
This patch should be backported to kernels >= 6.6 to make sure that those
build correctly with the affected clang versions. This patch cherry-picks
cleanly onto linux-6.11.y. For linux-6.6.y three prerequisite commits are
needed:
16c31dd7fdf6: Compiler Attributes: counted_by: bump min gcc version
2993eb7a8d34: Compiler Attributes: counted_by: fixup clang URL
231dc3f0c936: lkdtm/bugs: Improve warning message for compilers without counted_by support
There are still two merge conflicts even with those prerequisites.
Here's the correct resolution:
1. include/linux/compiler_types.h:
use the incoming change until before (but not including) the
"Apply __counted_by() when the Endianness matches to increase test coverage."
comment
2. lib/overflow_kunit.c:
HEAD is correct
[1] https://github.com/llvm/llvm-project/pull/112786
[2] https://github.com/llvm/llvm-project/pull/112636
[3] https://lore.kernel.org/lkml/3E304FB2-799D-478F-889A-CDFC1A52DCD8@toblux.co…
Best Regards
Jan
Jan Hendrik Farr (1):
Compiler Attributes: disable __counted_by for clang < 19.1.3
drivers/misc/lkdtm/bugs.c | 2 +-
include/linux/compiler_attributes.h | 13 -------------
include/linux/compiler_types.h | 19 +++++++++++++++++++
init/Kconfig | 9 +++++++++
lib/overflow_kunit.c | 2 +-
5 files changed, 30 insertions(+), 15 deletions(-)
--
2.47.0
Hi Greg,
Please cherry-pick this patch series into 5.10.y stable. It
includes a feature that fixes CVE-2022-0500 which allows a user with
cap_bpf privileges to get root privileges. The patch that fixes
the bug is
patch 6/8: bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM
The rest are the dependencies required by the fix patch.
This patchset has been merged into mainline v5.17 and backported to
v5.16 [1] and v5.15 [2].
Tested by compiling, building and running the bpf selftest test_progs.
Before:
./test_progs -t ksyms_btf/write_check
test_ksyms_btf:PASS:btf_exists 0 nsec
test_write_check:FAIL:skel_open unexpected load of a prog writing to ksym memory
#44/3 write_check:FAIL
#44 ksyms_btf:FAIL
Summary: 0/0 PASSED, 0 SKIPPED, 2 FAILED
After:
./test_progs -t ksyms_btf/write_check
#44/3 write_check:OK
#44 ksyms_btf:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
[1] https://lore.kernel.org/all/Yg6cixLJFoxDmp+I@kroah.com/
[2] https://lore.kernel.org/all/Ymupcl2JshcWjmMD@kroah.com/
Hao Luo (8):
bpf: Introduce composable reg, ret and arg types.
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
bpf: Introduce MEM_RDONLY flag
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
bpf/selftests: Test PTR_TO_RDONLY_MEM
include/linux/bpf.h | 98 +++-
include/linux/bpf_verifier.h | 18 +
kernel/bpf/btf.c | 8 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/helpers.c | 10 +-
kernel/bpf/map_iter.c | 4 +-
kernel/bpf/ringbuf.c | 2 +-
kernel/bpf/verifier.c | 477 +++++++++---------
kernel/trace/bpf_trace.c | 22 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 62 +--
net/core/sock_map.c | 2 +-
.../selftests/bpf/prog_tests/ksyms_btf.c | 14 +
.../bpf/progs/test_ksyms_btf_write_check.c | 29 ++
14 files changed, 441 insertions(+), 309 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf_write_check.c
--
2.47.1
Hello,
this is a followup to
https://lore.kernel.org/stable/cover.1749223334.git.u.kleine-koenig@baylibr…
that handled backporting the two patches by Alexandre to the active
stable kernels between 6.15 and 5.15. Here comes a backport to 5.10.y;
git am handles application to 5.4.y just fine.
Compared to the backport for later kernels I included a major rework of
rtc_time64_to_tm() by Cassio Neri. (FTR: I checked; that commit by
Cassio Neri isn't the reason we need to fix rtc_time64_to_tm(); the
actual problem is older.)
Now that I completed the backport and did some final checks on it, I
noticed that the problem fixed here is (TTBOMK) a theoretical one,
because only drivers with .start_secs < 0 are known to have issues and
in 5.10 and before there is no such driver. I'm uncertain whether this
should result in not backporting the changes. I would tend to pick them
anyhow, but I won't argue against a veto.
Best regards
Uwe
Alexandre Mergnat (2):
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: Fix offset calculation for .start_secs < 0
Cassio Neri (1):
rtc: Improve performance of rtc_time64_to_tm(). Add tests.
drivers/rtc/Kconfig | 10 ++++
drivers/rtc/Makefile | 1 +
drivers/rtc/class.c | 2 +-
drivers/rtc/lib.c | 121 ++++++++++++++++++++++++++++++++---------
drivers/rtc/lib_test.c | 79 +++++++++++++++++++++++++++
5 files changed, 185 insertions(+), 28 deletions(-)
create mode 100644 drivers/rtc/lib_test.c
base-commit: 01e7e36b8606e5d4fddf795938010f7bfa3aa277
--
2.49.0
From: Jakub Kicinski <kuba(a)kernel.org>
commit f22b4b55edb507a2b30981e133b66b642be4d13f upstream.
I find the behavior of xa_for_each_start() slightly counter-intuitive.
It doesn't end the iteration by making the index point after the last
element. IOW calling xa_for_each_start() again after it "finished"
will run the body of the loop for the last valid element, instead
of doing nothing.
This works fine for netlink dumps if they terminate correctly
(i.e. coalesce or carefully handle NLM_DONE), but as we keep getting
reminded legacy dumps are unlikely to go away.
Fixing this generically at the xa_for_each_start() level seems hard -
there is no index reserved for "end of iteration".
ifindexes are 31b wide, tho, and iterator is ulong so for
for_each_netdev_dump() it's safe to go to the next element.
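For context, a sketch of the usual caller pattern (fill_one() below is
a made-up helper): a dump handler saves the iterator between ->dumpit()
calls, so with the old macro a dump that resumed after "finishing"
would visit the last netdev a second time, while with this change the
saved index already points past it:

	static int some_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
	{
		struct net *net = sock_net(skb->sk);
		unsigned long ifindex = cb->args[0];
		struct net_device *dev;

		for_each_netdev_dump(net, dev, ifindex) {
			if (fill_one(skb, dev) < 0)	/* hypothetical helper */
				break;	/* resume from this ifindex next time */
		}
		cb->args[0] = ifindex;

		return skb->len;
	}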
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Jeremy Kerr <jk(a)codeconstruct.com.au>
---
The mctp RTM_GETADDR rework backport of acab78ae12c7 ("net: mctp: Don't
access ifa_index when missing") pulled 2d45eeb7d5d7 ("mctp: no longer
rely on net->dev_index_head[]") as a dependency. However, that change
relies on this backport for correct behaviour of for_each_netdev_dump().
Jakub mentions[1] that nothing should be relying on the old behaviour of
for_each_netdev_dump(), hence the backport.
[1]: https://lore.kernel.org/netdev/20250609083749.741c27f5@kernel.org/
This backport is only applicable to 6.6.y; the change hit upstream in
6.10.
---
include/linux/netdevice.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0b0a172337dbac5716e5e5556befd95b4c201f5b..030d9de2ba2d23aa80b4b02182883f022f553964 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3036,7 +3036,8 @@ extern rwlock_t dev_base_lock; /* Device list lock */
#define net_device_entry(lh) list_entry(lh, struct net_device, dev_list)
#define for_each_netdev_dump(net, d, ifindex) \
- xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex))
+ for (; (d = xa_find(&(net)->dev_by_index, &ifindex, \
+ ULONG_MAX, XA_PRESENT)); ifindex++)
static inline struct net_device *next_net_device(struct net_device *dev)
{
---
base-commit: c2603c511feb427b2b09f74b57816a81272932a1
change-id: 20250610-nl-dump-618700905d4f
Best regards,
--
Jeremy Kerr <jk(a)codeconstruct.com.au>
Fix compilation warning:
In file included from ./include/linux/kernel.h:15,
from ./include/linux/list.h:9,
from ./include/linux/module.h:12,
from net/ipv4/inet_hashtables.c:12:
net/ipv4/inet_hashtables.c: In function ‘inet_ehash_locks_alloc’:
./include/linux/minmax.h:20:35: warning: comparison of distinct pointer types lacks a cast
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^~
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
26 | (__typecheck(x, y) && __no_side_effects(x, y))
| ^~~~~~~~~~~
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
./include/linux/minmax.h:52:25: note: in expansion of macro ‘__careful_cmp’
52 | #define max(x, y) __careful_cmp(x, y, >)
| ^~~~~~~~~~~~~
net/ipv4/inet_hashtables.c:946:19: note: in expansion of macro ‘max’
946 | nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
| ^~~
CC block/badblocks.o
When warnings are treated as errors, this causes the build to fail.
The issue is a type mismatch between the operands passed to the max()
macro. Here, nblocks is an unsigned int, while the expression
num_online_nodes() * PAGE_SIZE / locksz is promoted to unsigned long.
This happens because:
- num_online_nodes() returns int
- PAGE_SIZE is typically defined as an unsigned long (depending on the
architecture)
- locksz is unsigned int
The resulting arithmetic expression is promoted to unsigned long.
Thus, the max() macro compares values of different types: unsigned int
vs unsigned long.
This issue was introduced in commit f8ece40786c9 ("tcp: bring back NUMA
dispersion in inet_ehash_locks_alloc()") during the update from kernel
v5.10.237 to v5.10.238.
It does not exist in newer kernel branches (e.g., v5.15.185 and all 6.x
branches), because they include commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness.")
Fix the issue by using max_t(unsigned int, ...) to explicitly cast both
operands to the same type, avoiding the type mismatch and ensuring
correctness.
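A minimal illustration of the difference (hypothetical values,
mirroring the types involved here):

	unsigned int  nblocks  = 16;		/* unsigned int             */
	unsigned long per_node = 4096UL / 64;	/* promoted to unsigned long */

	/* max(nblocks, per_node) trips the distinct-types check in the
	 * v5.10 minmax.h; max_t() casts both operands to unsigned int first.
	 */
	nblocks = max_t(unsigned int, nblocks, per_node);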
Fixes: f8ece40786c9 ("tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()")
Signed-off-by: Eliav Farber <farbere(a)amazon.com>
---
V1 -> V2: Use upstream commit SHA1 in reference
net/ipv4/inet_hashtables.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index fea74ab2a4be..ac2d185c04ef 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -943,7 +943,7 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U) * num_possible_cpus();
/* At least one page per NUMA node. */
- nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
+ nblocks = max_t(unsigned int, nblocks, num_online_nodes() * PAGE_SIZE / locksz);
nblocks = roundup_pow_of_two(nblocks);
--
2.47.1
Use the common wrappers operating directly on struct sg_table objects to
fix incorrect use of the scatterlist sync calls. The dma_sync_sg_for_*()
functions have to be called with the number of elements originally passed
to dma_map_sg_*(), not the one returned in the sg_table's nents.
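For reference, the sg_table wrappers (as defined in
<linux/dma-mapping.h>, shown here in slightly simplified form) pass
orig_nents rather than the DMA-mapped nents:

	static inline void dma_sync_sgtable_for_cpu(struct device *dev,
			struct sg_table *sgt, enum dma_data_direction dir)
	{
		dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->orig_nents, dir);
	}

	static inline void dma_sync_sgtable_for_device(struct device *dev,
			struct sg_table *sgt, enum dma_data_direction dir)
	{
		dma_sync_sg_for_device(dev, sgt->sgl, sgt->orig_nents, dir);
	}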
Fixes: 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
CC: stable(a)vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
---
drivers/dma-buf/udmabuf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 7eee3eb47a8e..c9d0c68d2fcb 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -264,8 +264,7 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
ubuf->sg = NULL;
}
} else {
- dma_sync_sg_for_cpu(dev, ubuf->sg->sgl, ubuf->sg->nents,
- direction);
+ dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
}
return ret;
@@ -280,7 +279,7 @@ static int end_cpu_udmabuf(struct dma_buf *buf,
if (!ubuf->sg)
return -EINVAL;
- dma_sync_sg_for_device(dev, ubuf->sg->sgl, ubuf->sg->nents, direction);
+ dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
return 0;
}
--
2.34.1