Hello,
This series is based on commit
320475fbd590 Merge tag 'mtd/fixes-for-6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
of Mainline Linux.
The first patch in the series has been posted as a Fix in contrast to
its predecessor at:
https://lore.kernel.org/r/20250903124505.365913-10-s-vadapalli@ti.com/
based on the feedback provided by Jiri Slaby <jirislaby(a)kernel.org> at:
https://lore.kernel.org/r/3d3a4b52-e343-42f3-9d69-94c259812143@kernel.org/
Since the Fix is independent of enabling loadable module support for the
pci-keystone.c driver, it is being posted as a new patch.
Checking out at the commit of Mainline Linux which this series is based
on, I noticed an exception triggered by the pci-keystone.c driver during
its probe. Although this is not a fatal exception and Linux continues to
boot, the driver is non-functional. I root-caused the exception to
free_initmem() freeing the memory associated with the ks_pcie_host_init()
function in the driver before the driver's probe was invoked. This
appears to be a race condition but it is easily reproducible with the
Linux .config that I have used. The fix therefore is to remove the
__init macro which is implemented by the second patch in the series.
For reference, the logs for the case where Linux is built by checking
out at the base commit of Mainline Linux are:
https://gist.github.com/Siddharth-Vadapalli-at-TI/f4891b707921c53dfb464ad2f…
and the logs clearly prove that the print associated with free_initmem()
which is:
[ 2.446834] Freeing unused kernel memory: 4864K
is displayed prior to the prints associated with the pci-keystone.c
driver being probed which is:
[ 7.707103] keystone-pcie 5500000.pcie: host bridge /bus@100000/pcie@5500000 ranges:
Building Linux by applying both patches in the series on the base commit of
Mainline Linux, the driver probes successfully without any exceptions or
errors. This was tested on AM654-EVM with an NVMe SSD connected to the
PCIe Connector on the board. The NVMe SSD enumerates successfully.
Additionally, the 'hdparm' utility was used to read from the SSD
confirming that the SSD is functional. The logs corresponding to this are:
https://gist.github.com/Siddharth-Vadapalli-at-TI/1b09a12a53db4233e82c5bcfc…
Regards,
Siddharth.
Siddharth Vadapalli (2):
PCI: keystone: Use devm_request_irq() to free "ks-pcie-error-irq" on
exit
PCI: keystone: Remove the __init macro for the ks_pcie_host_init()
callback
drivers/pci/controller/dwc/pci-keystone.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--
2.43.0
From: Andrey Konovalov <andreyknvl(a)gmail.com>
Drop the check on the maximum transfer length in Raw Gadget for both
control and non-control transfers.
Limiting the transfer length causes a problem with emulating USB devices
whose full configuration descriptor exceeds PAGE_SIZE in length.
Overall, there does not appear to be any reason to enforce any kind of
transfer length limit on the Raw Gadget side for either control or
non-control transfers, so let's just drop the related check.
Cc: stable(a)vger.kernel.org
Fixes: f2c2e717642c ("usb: gadget: add raw-gadget interface")
Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com>
---
drivers/usb/gadget/legacy/raw_gadget.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
index 20165e1582d9..b71680c58de6 100644
--- a/drivers/usb/gadget/legacy/raw_gadget.c
+++ b/drivers/usb/gadget/legacy/raw_gadget.c
@@ -667,8 +667,6 @@ static void *raw_alloc_io_data(struct usb_raw_ep_io *io, void __user *ptr,
return ERR_PTR(-EINVAL);
if (!usb_raw_io_flags_valid(io->flags))
return ERR_PTR(-EINVAL);
- if (io->length > PAGE_SIZE)
- return ERR_PTR(-EINVAL);
if (get_from_user)
data = memdup_user(ptr + sizeof(*io), io->length);
else {
--
2.43.0
Hi. This is supposed to be a patch, but I think it's worth discussing
how it should be backported to -stable, so I've labeled it as [DISCUSSION].
The bug described below was unintentionally fixed in v6.5 and not
backported to -stable. So technically I would need to use "Option 3" [A],
but since the original patch [B] did not intend to fix a bug (and it's also
part of a larger patch series), it looks quite different from the patch below,
and I'm not sure what the backport should look like.
I think there are probably two options:
1. Provide the description of the original patch along with a very long,
detailed explanation of why the patch deviates from the upstream version, or
2. Post the patch below with a clarification that it was fixed upstream
by commit 670ddd8cdcbd1.
Any thoughts?
[A] https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opt…
[B] https://lkml.kernel.org/r/725a42a9-91e9-c868-925-e3a5fd40bb4f@google.com
(Upstream commit 670ddd8cdcbd1)
In any case, no matter how we backport this, it needs some review and
feedback would be appreciated. The patch applies to v6.1 and v5.15, and
v5.10 but not v5.4.
From cf45867ab8e48b42160b7253390db7bdecef1455 Mon Sep 17 00:00:00 2001
From: Harry Yoo <harry.yoo(a)oracle.com>
Date: Thu, 11 Sep 2025 20:05:40 +0900
Subject: [PATCH] mm, numa: fix bad pmd by atomically checking is_swap_pmd() in
change_prot_numa()
It was observed that a bad pmd is seen when automatic NUMA balancing
is marking page table entries as prot_numa:
[2437548.196018] mm/pgtable-generic.c:50: bad pmd 00000000af22fc02(dffffffe71fbfe02)
With some kernel modification, the call stack was dumped:
[2437548.235022] Call Trace:
[2437548.238234] <TASK>
[2437548.241060] dump_stack_lvl+0x46/0x61
[2437548.245689] panic+0x106/0x2e5
[2437548.249497] pmd_clear_bad+0x3c/0x3c
[2437548.253967] change_pmd_range.isra.0+0x34d/0x3a7
[2437548.259537] change_p4d_range+0x156/0x20e
[2437548.264392] change_protection_range+0x116/0x1a9
[2437548.269976] change_prot_numa+0x15/0x37
[2437548.274774] task_numa_work+0x1b8/0x302
[2437548.279512] task_work_run+0x62/0x95
[2437548.283882] exit_to_user_mode_loop+0x1a4/0x1a9
[2437548.289277] exit_to_user_mode_prepare+0xf4/0xfc
[2437548.294751] ? sysvec_apic_timer_interrupt+0x34/0x81
[2437548.300677] irqentry_exit_to_user_mode+0x5/0x25
[2437548.306153] asm_sysvec_apic_timer_interrupt+0x16/0x1b
This is due to a race condition between change_prot_numa() and
THP migration because the kernel doesn't check is_swap_pmd() and
pmd_trans_huge() atomically:
change_prot_numa() THP migration
======================================================================
- change_pmd_range()
-> is_swap_pmd() returns false,
meaning it's not a PMD migration
entry.
- do_huge_pmd_numa_page()
-> migrate_misplaced_page() sets
migration entries for the THP.
- change_pmd_range()
-> pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_none() and pmd_trans_huge() returns false
- pmd_none_or_clear_bad_unless_trans_huge()
-> pmd_bad() returns true for the migration entry!
For the race condition described above to occur:
1) AutoNUMA must be unmapping a range of pages, with at least part of the
range already unmapped by AutoNUMA.
2) While AutoNUMA is in the process of unmapping, a NUMA hinting fault
occurs within that range, specifically when we are about to unmap
the PMD entry, between the is_swap_pmd() and pmd_trans_huge() checks.
So this is a really rare race condition and it's observed that it takes
usually a few days of autonuma-intensive testing to trigger.
A bit of history on a similar race condition in the past:
In fact, a similar race condition caused by not checking pmd_trans_huge()
atomically was reported [1] in 2017. However, instead of the patch [1],
another patch series [3] fixed the problem [2] by not clearing the pmd
entry but invaliding it instead (so that pmd_trans_huge() would still
return true).
Despite patch series [3], the bad pmd error continued to be reported
in mainline. As a result, [1] was resurrected [4] and it landed mainline
in 2020 in a hope that it would resolve the issue. However, now it turns
out that [3] was not sufficient.
Fix this race condition by checking is_swap_pmd() and pmd_trans_huge()
atomically. With that, the kernel should see either
pmd_trans_huge() == true, or is_swap_pmd() == true when another task is
migrating the page concurrently.
This bug was introduced when THP migration support was added. More
specifically, by commit 84c3fc4e9c56 ("mm: thp: check pmd migration entry
in common path")).
It is unintentionally fixed since v6.5 by commit 670ddd8cdcbd1
("mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()") while
removing pmd_none_or_clear_bad_unless_trans_huge() function. But it's not
backported to -stable because it was fixed unintentionally.
Link: https://lore.kernel.org/linux-mm/20170410094825.2yfo5zehn7pchg6a@techsingul… [1]
Link: https://lore.kernel.org/linux-mm/8A6309F4-DB76-48FA-BE7F-BF9536A4C4E5@cs.ru… [2]
Link: https://lore.kernel.org/linux-mm/20170302151034.27829-1-kirill.shutemov@lin… [3]
Link: https://lore.kernel.org/linux-mm/20200216191800.22423-1-aquini@redhat.com [4]
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
mm/mprotect.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 668bfaa6ed2a..c0e796c0f9b0 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -303,7 +303,7 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
if (pmd_none(pmdval))
return 1;
- if (pmd_trans_huge(pmdval))
+ if (is_swap_pmd(pmdval) || pmd_trans_huge(pmdval))
return 0;
if (unlikely(pmd_bad(pmdval))) {
pmd_clear_bad(pmd);
@@ -373,7 +373,7 @@ static inline unsigned long change_pmd_range(struct mmu_gather *tlb,
* Hence, it's necessary to atomically read the PMD value
* for all the checks.
*/
- if (!is_swap_pmd(*pmd) && !pmd_devmap(*pmd) &&
+ if (!pmd_devmap(*pmd) &&
pmd_none_or_clear_bad_unless_trans_huge(pmd))
goto next;
--
2.43.0
Several crypto user API contexts and requests allocated with
sock_kmalloc() were left uninitialized, relying on callers to
set fields explicitly. This resulted in the use of uninitialized
data in certain error paths or when new fields are added in the
future.
The ACVP patches also contain two user-space interface files:
algif_kpp.c and algif_akcipher.c. These too rely on proper
initialization of their context structures.
A particular issue has been observed with the newly added
'inflight' variable introduced in af_alg_ctx by commit:
67b164a871af ("crypto: af_alg - Disallow multiple in-flight AIO requests")
Because the context is not memset to zero after allocation,
the inflight variable has contained garbage values. As a result,
af_alg_alloc_areq() has incorrectly returned -EBUSY randomly when
the garbage value was interpreted as true:
https://github.com/gregkh/linux/blame/master/crypto/af_alg.c#L1209
The check directly tests ctx->inflight without explicitly
comparing against true/false. Since inflight is only ever set to
true or false later, an uninitialized value has triggered
-EBUSY failures. Zero-initializing memory allocated with
sock_kmalloc() ensures inflight and other fields start in a known
state, removing random issues caused by uninitialized data.
Fixes: fe869cdb89c9 ("crypto: algif_hash - User-space interface for hash operations")
Fixes: 5afdfd22e6ba ("crypto: algif_rng - add random number generator support")
Fixes: 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code")
Fixes: 67b164a871af ("crypto: af_alg - Disallow multiple in-flight AIO requests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
Changes in v2:
- Dropped algif_skcipher_export changes, The ctx->state will immediately
be overwritten by crypto_skcipher_export.
- No other changes.
---
crypto/af_alg.c | 5 ++---
crypto/algif_hash.c | 3 +--
crypto/algif_rng.c | 3 +--
3 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index ca6fdcc6c54a..6c271e55f44d 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1212,15 +1212,14 @@ struct af_alg_async_req *af_alg_alloc_areq(struct sock *sk,
if (unlikely(!areq))
return ERR_PTR(-ENOMEM);
+ memset(areq, 0, areqlen);
+
ctx->inflight = true;
areq->areqlen = areqlen;
areq->sk = sk;
areq->first_rsgl.sgl.sgt.sgl = areq->first_rsgl.sgl.sgl;
- areq->last_rsgl = NULL;
INIT_LIST_HEAD(&areq->rsgl_list);
- areq->tsgl = NULL;
- areq->tsgl_entries = 0;
return areq;
}
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index e3f1a4852737..4d3dfc60a16a 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -416,9 +416,8 @@ static int hash_accept_parent_nokey(void *private, struct sock *sk)
if (!ctx)
return -ENOMEM;
- ctx->result = NULL;
+ memset(ctx, 0, len);
ctx->len = len;
- ctx->more = false;
crypto_init_wait(&ctx->wait);
ask->private = ctx;
diff --git a/crypto/algif_rng.c b/crypto/algif_rng.c
index 10c41adac3b1..1a86e40c8372 100644
--- a/crypto/algif_rng.c
+++ b/crypto/algif_rng.c
@@ -248,9 +248,8 @@ static int rng_accept_parent(void *private, struct sock *sk)
if (!ctx)
return -ENOMEM;
+ memset(ctx, 0, len);
ctx->len = len;
- ctx->addtl = NULL;
- ctx->addtl_len = 0;
/*
* No seeding done at that point -- if multiple accepts are
--
2.40.4
commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection
in the CPU hotplug path")
adds a call to detect_cache_attributes() to populate the cacheinfo
before updating the siblings mask. detect_cache_attributes() allocates
memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
kernels, on secondary CPUs, this triggers a:
'BUG: sleeping function called from invalid context'
as the code is executed with preemption and interrupts disabled:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
| preempt_count: 1, expected: 0
| RCU nest depth: 1, expected: 1
| 3 locks held by swapper/111/0:
| #0: (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
| #1: (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
| #2: (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last enabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last disabled at (0): 0x0
| Preemption disabled at:
| migrate_enable+0x30/0x130
| CPU: 111 PID: 0 Comm: swapper/111 Tainted: G W 6.0.0-rc4-rt6-[...]
| Call trace:
| __kmalloc+0xbc/0x1e8
| detect_cache_attributes+0x2d4/0x5f0
| update_siblings_masks+0x30/0x368
| store_cpu_topology+0x78/0xb8
| secondary_start_kernel+0xd0/0x198
| __secondary_switched+0xb0/0xb4
Pierre fixed this issue in the upstream 6.3 and the original series is follows:
https://lore.kernel.org/all/167404285593.885445.6219705651301997538.b4-ty@a…
We also encountered the same issue on 6.1 stable branch, and need to backport this series.
Pierre Gondois (6):
cacheinfo: Use RISC-V's init_cache_level() as generic OF
implementation
cacheinfo: Return error code in init_of_cache_level()
cacheinfo: Check 'cache-unified' property to count cache leaves
ACPI: PPTT: Remove acpi_find_cache_levels()
ACPI: PPTT: Update acpi_find_last_cache_level() to
acpi_get_cache_info()
arch_topology: Build cacheinfo from primary CPU
arch/arm64/kernel/cacheinfo.c | 11 ++-
arch/riscv/kernel/cacheinfo.c | 42 -----------
drivers/acpi/pptt.c | 93 +++++++++++++----------
drivers/base/arch_topology.c | 12 ++-
drivers/base/cacheinfo.c | 134 +++++++++++++++++++++++++++++-----
include/linux/cacheinfo.h | 11 ++-
6 files changed, 196 insertions(+), 107 deletions(-)
--
2.25.1