We found some bugs when testing the XDP function of enetc driver,
and these bugs are easy to reproduce. This is not only causes XDP
to not work, but also the network cannot be restored after exiting
the XDP program. So the patch set is mainly to fix these bugs. For
details, please see the commit message of each patch.
---
v1 link: https://lore.kernel.org/bpf/20240919084104.661180-1-wei.fang@nxp.com/T/
v2 link: https://lore.kernel.org/netdev/20241008224806.2onzkt3gbslw5jxb@skbuf/T/
v3 link: https://lore.kernel.org/imx/20241009090327.146461-1-wei.fang@nxp.com/T/
---
Wei Fang (4):
net: enetc: remove xdp_drops statistic from enetc_xdp_drop()
net: enetc: block concurrent XDP transmissions during ring
reconfiguration
net: enetc: disable Tx BD rings after they are empty
net: enetc: disable NAPI after all rings are disabled
drivers/net/ethernet/freescale/enetc/enetc.c | 56 +++++++++++++++-----
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
2 files changed, 44 insertions(+), 13 deletions(-)
--
2.34.1
From: Bob Pearson <rpearsonhpe(a)gmail.com>
[ Upstream commit 2b23b6097303ed0ba5f4bc036a1c07b6027af5c6 ]
In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the
resp_pkts queue and then a decision is made whether to run the completer
task inline or schedule it. Finally the skb is dereferenced to bump a 'hw'
performance counter. This is wrong because if the completer task is
already running in a separate thread it may have already processed the skb
and freed it which can cause a seg fault. This has been observed
infrequently in testing at high scale.
This patch fixes this by changing the order of enqueuing the packet until
after the counter is accessed.
Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe(a)gmail.com>
Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats")
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Sherry: bp to fix CVE-2024-38544. Fix conflict due to missing commit:
dccb23f6c312 ("RDMA/rxe: Split rxe_run_task() into two subroutines")
which is not necessary to backport]
Signed-off-by: Sherry Yang <sherry.yang(a)oracle.com>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 48a3864ada29..250494306932 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -124,12 +124,12 @@ void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
{
int must_sched;
- skb_queue_tail(&qp->resp_pkts, skb);
-
- must_sched = skb_queue_len(&qp->resp_pkts) > 1;
+ must_sched = skb_queue_len(&qp->resp_pkts) > 0;
if (must_sched != 0)
rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
+ skb_queue_tail(&qp->resp_pkts, skb);
+
rxe_run_task(&qp->comp.task, must_sched);
}
--
2.46.0
This series fixes a wrong handling of the child node within the
for_each_child_of_node() by adding the missing calls to of_node_put() to
make it compatible with stable kernels that don't provide the scoped
variant of the macro, which is more secure and was introduced early this
year. The switch to the scoped macro is the next patch, which makes the
coe more robust and will avoid such issues in new error paths.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
dmaengine: mv_xor: fix child node refcount handling in early exit
dmaengine: mv_xor: switch to for_each_child_of_node_scoped()
drivers/dma/mv_xor.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241011-dma_mv_xor_of_node_put-5cc4126746a2
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.
Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:
kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: leases_conflict+0x68/0x370
kernel: __break_lease+0x204/0xc38
kernel: nfsd_open_break_lease+0x8c/0xf0 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20
Proposing to have laundromat thread hold the state_lock over both
marking thru revoking the delegation as well as making free_stateid
acquire state_lock before accessing the list. Making sure that
revoke_delegation() (ie kernel_setlease(unlock)) is called for
every delegation that was revoked and added to the reaper list.
CC: stable(a)vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
--- I can't figure out the Fixes: tag. Laundromat's behaviour has
been like that forever. But the free_stateid bits wont apply before
the 1e3577a4521e ("SUNRPC: discard sv_refcnt, and svc_get/svc_put").
But we used that fixes tag already with a previous fix for a different
problem.
---
fs/nfsd/nfs4state.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9c2b1d251ab3..c97907d7fb38 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6605,13 +6605,13 @@ nfs4_laundromat(struct nfsd_net *nn)
unhash_delegation_locked(dp, SC_STATUS_REVOKED);
list_add(&dp->dl_recall_lru, &reaplist);
}
- spin_unlock(&state_lock);
while (!list_empty(&reaplist)) {
dp = list_first_entry(&reaplist, struct nfs4_delegation,
dl_recall_lru);
list_del_init(&dp->dl_recall_lru);
revoke_delegation(dp);
}
+ spin_unlock(&state_lock);
spin_lock(&nn->client_lock);
while (!list_empty(&nn->close_lru)) {
@@ -7213,7 +7213,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (s->sc_status & SC_STATUS_REVOKED) {
spin_unlock(&s->sc_lock);
dp = delegstateid(s);
+ spin_lock(&state_lock);
list_del_init(&dp->dl_recall_lru);
+ spin_unlock(&state_lock);
spin_unlock(&cl->cl_lock);
nfs4_put_stid(s);
ret = nfs_ok;
--
2.43.5
As explained in the comment block this change adds, we can't tell what
userspace's intent is when the stack grows towards an inaccessible VMA.
We should ensure that, as long as code is compiled with something like
-fstack-check, a stack overflow in this code can never cause the main stack
to overflow into adjacent heap memory - so the bottom of a stack should
never be directly adjacent to an accessible VMA.
As suggested by Lorenzo, enforce this by blocking attempts to:
- make an inaccessible VMA accessible with mprotect() when it is too close
to a stack
- replace an inaccessible VMA with another VMA using MAP_FIXED when it is
too close to a stack
I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc
that mixes malloc(), pthread creation, and recursion in just the right way
such that the main stack overflows into malloc() arena memory, see the
linked list post.
I don't know of any specific scenario where this is actually exploitable,
but it seems like it could be a security problem for sufficiently unlucky
userspace.
Link: https://lore.kernel.org/r/CAG48ez2v=r9-37JADA5DgnZdMLCjcbVxAjLt5eH5uoBohRdq…
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
This is an attempt at the alternate fix approach suggested by Lorenzo.
This turned into more code than I would prefer to have for a scenario
like this...
Also, the way the gap is enforced in my patch for MAP_FIXED_NOREPLACE is
a bit ugly. In the existing code, __get_unmapped_area() normally already
enforces the stack gap even when it is called with a hint; but when
MAP_FIXED_NOREPLACE is used, we kinda lie to __get_unmapped_area() and
tell it we'll do a MAP_FIXED mapping (introduced in commit
a4ff8e8620d3f when MAP_FIXED_NOREPLACE was created), then afterwards
manually reject overlapping mappings.
So I ended up also doing the gap check separately for
MAP_FIXED_NOREPLACE.
The following test program exercises scenarios that could lead to the
stack becoming directly adjacent to another accessible VMA,
and passes with this patch applied:
<<<
int main(void) {
setbuf(stdout, NULL);
char *ptr = (char*)( (unsigned long)(STACK_POINTER() - (1024*1024*4)/*4MiB*/) & ~0xfffUL );
if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr)
err(1, "mmap distant-from-stack");
*(volatile char *)(ptr + 0x1000); /* expand stack */
system("echo;cat /proc/$PPID/maps;echo");
/* test transforming PROT_NONE mapping adjacent to stack */
if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC) == 0)
errx(1, "mprotect adjacent to stack allowed");
if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) != MAP_FAILED)
errx(1, "MAP_FIXED adjacent to stack allowed");
if (munmap(ptr, 0x1000))
err(1, "munmap failed???");
/* test creating new mapping adjacent to stack */
if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0) != MAP_FAILED)
errx(1, "MAP_FIXED_NOREPLACE adjacent to stack allowed");
if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) == ptr)
errx(1, "mmap hint adjacent to stack accepted");
printf("all tests passed\n");
}
>>>
---
Changes in v2:
- Entirely new approach (suggested by Lorenzo)
- Link to v1: https://lore.kernel.org/r/20241008-stack-gap-inaccessible-v1-1-848d4d891f21…
---
include/linux/mm.h | 1 +
mm/mmap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
mm/mprotect.c | 6 +++++
3 files changed, 72 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b0582..ecd4afc304ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3520,6 +3520,7 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
struct vm_area_struct *find_extend_vma_locked(struct mm_struct *,
unsigned long addr);
+bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..937361be3c48 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -359,6 +359,20 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
return -EEXIST;
}
+ /*
+ * This does two things:
+ *
+ * 1. Disallow MAP_FIXED replacing a PROT_NONE VMA adjacent to a stack
+ * with an accessible VMA.
+ * 2. Disallow MAP_FIXED_NOREPLACE creating a new accessible VMA
+ * adjacent to a stack.
+ */
+ if ((flags & (MAP_FIXED_NOREPLACE | MAP_FIXED)) &&
+ (prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
+ !(vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
+ overlaps_stack_gap(mm, addr, len))
+ return (flags & MAP_FIXED) ? -ENOMEM : -EEXIST;
+
if (flags & MAP_LOCKED)
if (!can_do_mlock())
return -EPERM;
@@ -1341,6 +1355,57 @@ struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
return vma;
}
+/*
+ * Does the specified VA range overlap the stack gap of a preceding or following
+ * stack VMA?
+ * Overlapping stack VMAs are ignored - so if someone deliberately creates a
+ * MAP_FIXED mapping in the middle of a stack or such, we let that go through.
+ *
+ * This is needed partly because userspace's intent when making PROT_NONE
+ * mappings is unclear; there are two different reasons for creating PROT_NONE
+ * mappings:
+ *
+ * A) Userspace wants to create its own guard mapping, for example for stacks.
+ * According to
+ * <https://lore.kernel.org/all/1499126133.2707.20.camel@decadent.org.uk/T/>,
+ * some Rust/Java programs do this with the main stack.
+ * Enforcing the kernel's stack gap between these userspace guard mappings and
+ * the main stack breaks stuff.
+ *
+ * B) Userspace wants to reserve some virtual address space for later mappings.
+ * This is done by memory allocators.
+ * In this case, we want to enforce a stack gap between the mapping and the
+ * stack.
+ *
+ * Because we can't tell these cases apart when a PROT_NONE mapping is created,
+ * we instead enforce the stack gap when a PROT_NONE mapping is made accessible
+ * (using mprotect()) or replaced with an accessible one (using MAP_FIXED).
+ */
+bool overlaps_stack_gap(struct mm_struct *mm, unsigned long addr, unsigned long len)
+{
+
+ struct vm_area_struct *vma, *prev_vma;
+
+ /* step 1: search for a non-overlapping following stack VMA */
+ vma = find_vma(mm, addr+len);
+ if (vma && vma->vm_start >= addr+len) {
+ /* is it too close? */
+ if (vma->vm_start - (addr+len) < stack_guard_start_gap(vma))
+ return true;
+ }
+
+ /* step 2: search for a non-overlapping preceding stack VMA */
+ if (!IS_ENABLED(CONFIG_STACK_GROWSUP))
+ return false;
+ vma = find_vma_prev(mm, addr, &prev_vma);
+ /* don't handle cases where the VA start overlaps a VMA */
+ if (vma && vma->vm_start < addr)
+ return false;
+ if (!prev_vma || !(prev_vma->vm_flags & VM_GROWSUP))
+ return false;
+ return addr - prev_vma->vm_end < stack_guard_gap;
+}
+
/* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
* @mm: The mm_struct
* @start: The start address to munmap
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0c5d6d06107d..2300e2eff956 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -772,6 +772,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
}
}
+ error = -ENOMEM;
+ if ((prot & (PROT_READ | PROT_WRITE | PROT_EXEC)) &&
+ !(vma->vm_flags & (VM_GROWSUP|VM_GROWSDOWN)) &&
+ overlaps_stack_gap(current->mm, start, end - start))
+ goto out;
+
prev = vma_prev(&vmi);
if (start > vma->vm_start)
prev = vma;
---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b
--
Jann Horn <jannh(a)google.com>
A problem was introduced with commit f69759be251d ("x86/CPU/AMD: Move
Zenbleed check to the Zen2 init function") where a bit in the DE_CFG MSR
is getting set after a microcode late load.
The problem seems to be that the microcode late load path calls into
amd_check_microcode() and subsequently zen2_zenbleed_check(). Since the
patch removes the cpu_has_amd_erratum() check from zen2_zenbleed_check(),
this will cause all non-Zen2 CPUs to go through the function and set
the bit in the DE_CFG MSR.
Call into the zenbleed fix path on Zen2 CPUs only.
Fixes: f69759be251d ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Cc: <stable(a)vger.kernel.org>
Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Signed-off-by: John Allen <john.allen(a)amd.com>
---
v2:
- Clean up commit description
---
arch/x86/kernel/cpu/amd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 015971adadfc..368344e1394b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -1202,5 +1202,6 @@ void amd_check_microcode(void)
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return;
- on_each_cpu(zenbleed_check_cpu, NULL, 1);
+ if (boot_cpu_has(X86_FEATURE_ZEN2))
+ on_each_cpu(zenbleed_check_cpu, NULL, 1);
}
--
2.34.1
'new_map' is allocated using devm_* which takes care of freeing the
allocated data on device removal, call to
.dt_free_map = pinconf_generic_dt_free_map
double frees the map as pinconf_generic_dt_free_map() calls
pinctrl_utils_free_map().
Fix this by using kcalloc() instead of auto-managed devm_kcalloc().
Cc: stable(a)vger.kernel.org
Fixes: f805e356313b ("pinctrl: nuvoton: Add ma35d1 pinctrl and GPIO driver")
Reported-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
This is based on static analysis and reading code, only compile tested.
Added the stable tag as the commit in Fixes is also in 6.11.y
---
drivers/pinctrl/nuvoton/pinctrl-ma35.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/nuvoton/pinctrl-ma35.c b/drivers/pinctrl/nuvoton/pinctrl-ma35.c
index 1fa00a23534a..59c4e7c6cdde 100644
--- a/drivers/pinctrl/nuvoton/pinctrl-ma35.c
+++ b/drivers/pinctrl/nuvoton/pinctrl-ma35.c
@@ -218,7 +218,7 @@ static int ma35_pinctrl_dt_node_to_map_func(struct pinctrl_dev *pctldev,
}
map_num += grp->npins;
- new_map = devm_kcalloc(pctldev->dev, map_num, sizeof(*new_map), GFP_KERNEL);
+ new_map = kcalloc(map_num, sizeof(*new_map), GFP_KERNEL);
if (!new_map)
return -ENOMEM;
--
2.39.3
Hi,
The following commit is to make amd_pmc and amd_pmf drivers work
properly on AMD family 1Ah model 60h processors,
59c34008d3bd x86/amd_nb: Add new PCI IDs for AMD family 1Ah model 60h
Please add this commit to stable kernel 6.11.y. Thanks!
Regards,
Richard
This series fixes a wrong handling of the child node within the
for_each_child_of_node() by adding the missing call to of_node_put() to
make it compatible with stable kernels that don't provide the scoped
variant of the macro, which is more secure and was introduced early this
year.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
drm/mediatek: Fix child node refcount handling in early exit
drm/mediatek: Switch to for_each_child_of_node_scoped()
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241011-mtk_drm_drv_memleak-5e8b8e45ed1c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>