In the SDSP probe path, qcom_scm_assign_mem() is used to assign the
reserved memory to the configured VMIDs, but its return value was not
checked.
Fail the probe if the SCM call fails to avoid continuing with an
unexpected/incorrect memory permission configuration.
Fixes: c3c0363bc72d4 ("misc: fastrpc: support complete DMA pool access to the DSP")
Cc: stable(a)vger.kernel.org # 6.11-rc1
Signed-off-by: Xingjing Deng <xjdeng(a)buaa.edu.cn>
---
v4:
- Fix the indentation.
- Link to v3: https://lore.kernel.org/linux-arm-msm/20260113084352.72itrloj5w7qb5o3@hu-mo…
v3:
- Add missing linux-kernel(a)vger.kernel.org to cc list.
- Standardize changelog placement/format.
- Link to v2: https://lore.kernel.org/linux-arm-msm/20260113063618.e2ke47gy3hnfi67e@hu-mo…
v2:
- Add Fixes: and Cc: stable tags.
- Link to v1: https://lore.kernel.org/linux-arm-msm/20260113022550.4029635-1-xjdeng@buaa.…
Signed-off-by: Xingjing Deng <xjdeng(a)buaa.edu.cn>
---
drivers/misc/fastrpc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index cbb12db110b3..9c41b51d80ee 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -2339,10 +2339,10 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
src_perms = BIT(QCOM_SCM_VMID_HLOS);
err = qcom_scm_assign_mem(res.start, resource_size(&res), &src_perms,
- data->vmperms, data->vmcount);
+ data->vmperms, data->vmcount);
if (err) {
dev_err(rdev, "Failed to assign memory phys 0x%llx size 0x%llx err %d",
- res.start, resource_size(&res), err);
+ res.start, resource_size(&res), err);
goto err_free_data;
}
}
--
2.25.1
Hi there,
While running performance benchmarks for the 5.15.196 LTS tag, we
observed several regressions across different benchmarks compared to
the previous 5.15.193 kernel tag. An automated bisect between the two
narrowed the culprit commit down to:
- 5666bcc3c00f7 Revert "cpuidle: menu: Avoid discarding useful
information" for 5.15
Regressions on 5.15.196 include:
-9.3% : Phoronix pts/sqlite using 2 processes on OnPrem X6-2
-6.3% : Phoronix system/sqlite on OnPrem X6-2
-18% : rds-stress -M 1 (readonly rdma-mode) metrics with 1 depth & 1
thread & 1M buffer size on OnPrem X6-2
-4 -> -8% : rds-stress -M 2 (writeonly rdma-mode) metrics with 1 depth &
1 thread & 1M buffer size on OnPrem X6-2
Up to -30% : Some Netpipe metrics on OnPrem X5-2
The culprit commit's message mentions that the revert was done due to
performance regressions introduced on Intel Jasper Lake systems, but
the revert is unfortunately causing issues on other systems. I wanted
to ask the maintainers how we should proceed to fix this: reapplying
the original change would bring back the earlier regressions on Jasper
Lake systems, while keeping the revert leaves us stuck with the current
regressions. If this problem has already been reported and a fix is in
the works, please let me know and I will follow that mail thread.
Thanks & Regards,
Harshvardhan
Currently pivot_root() doesn't work on the real rootfs because it
cannot be unmounted. Userspace has to do a recursive removal of the
initramfs contents manually before continuing the boot.
Really all we want from the real rootfs is to serve as the parent mount
for anything that is actually useful such as the tmpfs or ramfs for
initramfs unpacking or the rootfs itself. There's no need for the real
rootfs to actually be anything meaningful or useful. Add an immutable
rootfs called "nullfs" that can be selected via the "nullfs_rootfs"
kernel command line option.
The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs
and fire up userspace, which mounts the rootfs and can then just do:
chdir(rootfs);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
and be done with it. (Of course, userspace can also choose to retain the
initramfs contents by using something like pivot_root(".", "/initramfs")
without unmounting it.)
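For illustration, here is a minimal userspace sketch of that sequence
(the error handling and the "/newroot" path are mine, not part of this
series):

  #include <sys/mount.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* pivot_root() has no glibc wrapper, so go through syscall(). */
  static int switch_root(const char *newroot)
  {
  	if (chdir(newroot) < 0)
  		return -1;
  	/* Stack the new root over itself; the old rootfs ends up below. */
  	if (syscall(SYS_pivot_root, ".", ".") < 0)
  		return -1;
  	/* Detach the now-hidden (null) rootfs underneath us. */
  	return umount2(".", MNT_DETACH);
  }

Booting with "nullfs_rootfs" on the kernel command line is what makes
the underlying rootfs the immutable nullfs in the first place.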
Technically this also means that the rootfs mount in unprivileged
namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed
that the immutable rootfs remains permanently empty so there cannot be
anything revealed by unmounting the covering mount.
In the future this will also allow us to create completely empty mount
namespaces without risking leaking anything.
systemd already handles this all correctly as it tries to pivot_root()
first and falls back to MS_MOVE only when that fails.
This goes back to various discussions in previous years and an LPC 2024
presentation about this very topic.
Now in vfs-7.0.nullfs.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- Rename to "nullfs".
- Update documentation.
- Link to v1: https://patch.msgid.link/20260102-work-immutable-rootfs-v1-0-f2073b2d1602@k…
---
Christian Brauner (4):
fs: ensure that internal tmpfs mount gets mount id zero
fs: add init_pivot_root()
fs: add immutable rootfs
docs: mention nullfs
.../filesystems/ramfs-rootfs-initramfs.rst | 32 +++-
fs/Makefile | 2 +-
fs/init.c | 17 ++
fs/internal.h | 1 +
fs/mount.h | 1 +
fs/namespace.c | 181 ++++++++++++++-------
fs/nullfs.c | 70 ++++++++
include/linux/init_syscalls.h | 1 +
include/uapi/linux/magic.h | 1 +
init/do_mounts.c | 14 ++
init/do_mounts.h | 1 +
11 files changed, 254 insertions(+), 67 deletions(-)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260102-work-immutable-rootfs-b5f23e0f5a27
During system shutdown, KFENCE can cause IPI synchronization issues if
it remains active through the reboot process. To prevent this, register
a reboot notifier that disables KFENCE and cancels any pending timer
work early in the shutdown sequence.
This is only necessary when CONFIG_KFENCE_STATIC_KEYS is enabled, as
this configuration sends IPIs that can interfere with shutdown. Without
static keys, no IPIs are generated and KFENCE can safely remain active.
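To make the IPI connection concrete, here is a rough sketch of how a
static-key allocation gate behaves (simplified from mm/kfence/core.c;
try_kfence_alloc() and toggle_gate() are illustrative stand-ins, not
the real helpers):

  #include <linux/jump_label.h>
  #include <linux/types.h>

  static DEFINE_STATIC_KEY_FALSE(kfence_allocation_key);

  static void *try_kfence_alloc(size_t size)
  {
  	return NULL; /* stand-in for the real KFENCE slow path */
  }

  static void *alloc_gate(size_t size)
  {
  	/* Compiled to a patched-out branch while the gate is closed. */
  	if (static_branch_unlikely(&kfence_allocation_key))
  		return try_kfence_alloc(size);
  	return NULL;
  }

  static void toggle_gate(bool open)
  {
  	/* Flipping a static key rewrites kernel text and IPIs all CPUs. */
  	if (open)
  		static_branch_enable(&kfence_allocation_key);
  	else
  		static_branch_disable(&kfence_allocation_key);
  }

Those text-patching IPIs are exactly what can wedge against a CPU that
is spinning with IRQs off late in shutdown.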
The notifier uses maximum priority (INT_MAX) to ensure KFENCE shuts
down before other subsystems that might still depend on stable memory
allocation behavior.
This fixes a late kexec CSD lockup[1] seen when KFENCE tries to IPI a
CPU that is busy in an IRQ-disabled context printing characters to the
console.
Link: https://lore.kernel.org/all/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6… [1]
Cc: stable(a)vger.kernel.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Marco Elver <elver(a)google.com>
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
---
Changes in v2:
- Adding Fixes: tag and CCing stable (akpm)
- Link to v1: https://patch.msgid.link/20251126-kfence-v1-1-5a6e1d7c681c@debian.org
---
mm/kfence/core.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 727c20c94ac5..162a026871ab 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -26,6 +26,7 @@
#include <linux/panic_notifier.h>
#include <linux/random.h>
#include <linux/rcupdate.h>
+#include <linux/reboot.h>
#include <linux/sched/clock.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
@@ -820,6 +821,25 @@ static struct notifier_block kfence_check_canary_notifier = {
static struct delayed_work kfence_timer;
#ifdef CONFIG_KFENCE_STATIC_KEYS
+static int kfence_reboot_callback(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ /*
+ * Disable kfence to avoid static keys IPI synchronization during
+ * late shutdown/kexec.
+ */
+ WRITE_ONCE(kfence_enabled, false);
+ /* Cancel any pending timer work */
+ cancel_delayed_work_sync(&kfence_timer);
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block kfence_reboot_notifier = {
+ .notifier_call = kfence_reboot_callback,
+ .priority = INT_MAX, /* Run early to stop timers ASAP */
+};
+
/* Wait queue to wake up allocation-gate timer task. */
static DECLARE_WAIT_QUEUE_HEAD(allocation_wait);
@@ -901,6 +921,10 @@ static void kfence_init_enable(void)
if (kfence_check_on_panic)
atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier);
+#ifdef CONFIG_KFENCE_STATIC_KEYS
+ register_reboot_notifier(&kfence_reboot_notifier);
+#endif
+
WRITE_ONCE(kfence_enabled, true);
queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
---
base-commit: ab084f0b8d6d2ee4b1c6a28f39a2a7430bdfa7f0
change-id: 20251126-kfence-42c93f9b3979
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Sparse inode cluster allocation sets min/max agbno values to avoid
allocating an inode cluster that might map to an invalid inode
chunk. For example, we can't have an inode record mapped to agbno 0
or that extends past the end of a runt AG of misaligned size.
The initial calculation of max_agbno is unnecessarily conservative,
however. This has triggered a corner case allocation failure where a
small runt AG (i.e. 2063 blocks) is mostly full save for an extent
to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
case, which happens to be the offset of the last possible valid
inode chunk in the AG. In practice, we should be able to allocate
the 4-block cluster at agbno 2052 to map to the parent inode record
at agbno 2048, but the max_agbno value precludes it.
Note that this can result in filesystem shutdown via dirty trans
cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
the tail AG selection by the allocator sets t_highest_agno on the
transaction. If the inode allocator spins around and finds an inode
chunk with free inodes in an earlier AG, the subsequent dir name
creation path may still fail to allocate due to the AG restriction
and cancel.
To avoid this problem, update the max_agbno calculation to the agbno
prior to the last chunk aligned agbno in the AG. This is not
necessarily the last valid allocation target for a sparse chunk, but
since inode chunks (i.e. records) are chunk aligned and sparse
allocs are cluster sized/aligned, this allows the sb_spino_align
alignment restriction to take over and round down the max effective
agbno to within the last valid inode chunk in the AG.
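As a worked example with the runt AG above (assuming sb_inoalignmt =
ialloc_blks = 8 blocks and sb_spino_align = 4, consistent with the
numbers quoted earlier):

  agsize = 2063, free extent = [2050, 13]
  old: max_agbno = round_down(2063, 8) - 8 = 2048
       -> the sparse cluster at [2052, 2055] is rejected
  new: max_agbno = round_down(2063, 8) - 1 = 2055
       -> sb_spino_align keeps the allocation 4-block aligned, so the
          cluster at [2052, 2055] maps into the chunk at 2048 and is
          now allowed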
Note that even though the allocator improvements in the
aforementioned commit seem to avoid this particular dirty trans
cancel situation, the max_agbno logic improvement still applies as
we should be able to allocate from an AG that has been appropriately
selected. The more important targets for this patch, however, are
older/stable kernels prior to this allocator rework/improvement.
Cc: <stable(a)vger.kernel.org> # v4.2
Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
---
v2:
- Added misc. commit log tags.
v1: https://lore.kernel.org/linux-xfs/20260108141129.7765-1-bfoster@redhat.com/
fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index d97295eaebe6..c19d6d713780 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
* invalid inode records, such as records that start at agbno 0
* or extend beyond the AG.
*
- * Set min agbno to the first aligned, non-zero agbno and max to
- * the last aligned agbno that is at least one full chunk from
- * the end of the AG.
+ * Set min agbno to the first chunk aligned, non-zero agbno and
+ * max to one less than the last chunk aligned agbno from the
+ * end of the AG. We subtract 1 from max so that the cluster
+ * allocation alignment takes over and allows allocation within
+ * the last full inode chunk in the AG.
*/
args.min_agbno = args.mp->m_sb.sb_inoalignmt;
args.max_agbno = round_down(xfs_ag_block_count(args.mp,
pag_agno(pag)),
- args.mp->m_sb.sb_inoalignmt) -
- igeo->ialloc_blks;
+ args.mp->m_sb.sb_inoalignmt) - 1;
error = xfs_alloc_vextent_near_bno(&args,
xfs_agbno_to_fsb(pag,
--
2.52.0
Percpu sheaves caching was introduced as opt-in but the goal was to
eventually move all caches to them. This is the next step, enabling
sheaves for all caches (except the two bootstrap ones) and then removing
the percpu (partial) slabs and lots of associated code.
Besides (hopefully) improved performance, this removes the rather
complicated code related to the lockless fastpaths (using
this_cpu_try_cmpxchg128/64) and its complications with PREEMPT_RT or
kmalloc_nolock().
The lockless slab freelist+counters update operation using
try_cmpxchg128/64 remains, and is crucial both for freeing remote NUMA
objects without repeating the "alien" array flushing of SLAB, and for
flushing objects from sheaves to slabs mostly without taking the node
list_lock.
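For readers unfamiliar with that mechanism, a rough sketch of the
double-width update (the type and field layout here are illustrative,
not SLUB's actual struct slab internals):

  #include <linux/atomic.h>
  #include <linux/types.h>

  /* Freelist head and packed counters are updated as one 128-bit unit. */
  union freelist_word {
  	u128 full;
  	struct {
  		void *freelist;
  		unsigned long counters; /* packed inuse/objects bits */
  	};
  };

  static bool freelist_cas(union freelist_word *w, union freelist_word old,
  			 union freelist_word new)
  {
  	/* Succeeds only if both halves are unchanged; no list_lock needed. */
  	return try_cmpxchg128(&w->full, &old.full, new.full);
  }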
This v2 is the first non-RFC. I would consider exposing the series to
linux-next at this point.
Git branch for the v2:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=she…
Based on:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab…
- includes a sheaves optimization that seemed minor, but an lkp test
robot result showed significant improvements:
https://lore.kernel.org/all/202512291555.56ce2e53-lkp@intel.com/
(could be an uncommon corner case workload though)
Significant (but not critical) remaining TODOs:
- Integration of rcu sheaves handling with kfree_rcu batching.
- Currently the kfree_rcu batching is almost completely bypassed. I'm
thinking it could be adjusted to handle rcu sheaves in addition to
individual objects, to get the best of both.
- Performance evaluation. Petr Tesarik has been doing that on the RFC
with some promising results (thanks!) and also found a memory leak.
Note that, as with many things, this caching scheme change is a
tradeoff, as summarized by Christoph:
https://lore.kernel.org/all/f7c33974-e520-387e-9e2f-1e523bfe1545@gentwo.org/
- Objects allocated from sheaves should have better temporal locality
(likely recently freed, thus cache hot) but worse spatial locality
(likely from many different slabs, increasing memory usage and
possibly TLB pressure on kernel's direct map).
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
---
Changes in v2:
- Rebased to v6.19-rc1+slab.git slab/for-7.0/sheaves
- Some of the preliminary patches from the RFC went in there.
- Incorporate feedback/reports from many people (thanks!), including:
- Make caches with sheaves mergeable.
- Fix a major memory leak.
- Cleanup of stat items.
- Link to v1: https://patch.msgid.link/20251023-sheaves-for-all-v1-0-6ffa2c9941c0@suse.cz
---
Vlastimil Babka (20):
mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache()
mm/slab: move and refactor __kmem_cache_alias()
mm/slab: make caches with sheaves mergeable
slab: add sheaves to most caches
slab: introduce percpu sheaves bootstrap
slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock()
slab: handle kmalloc sheaves bootstrap
slab: add optimized sheaf refill from partial list
slab: remove cpu (partial) slabs usage from allocation paths
slab: remove SLUB_CPU_PARTIAL
slab: remove the do_slab_free() fastpath
slab: remove defer_deactivate_slab()
slab: simplify kmalloc_nolock()
slab: remove struct kmem_cache_cpu
slab: remove unused PREEMPT_RT specific macros
slab: refill sheaves from all nodes
slab: update overview comments
slab: remove frozen slab checks from __slab_free()
mm/slub: remove DEACTIVATE_TO_* stat items
mm/slub: cleanup and repurpose some stat items
include/linux/slab.h | 6 -
mm/Kconfig | 11 -
mm/internal.h | 1 +
mm/page_alloc.c | 5 +
mm/slab.h | 53 +-
mm/slab_common.c | 56 +-
mm/slub.c | 2591 +++++++++++++++++---------------------------------
7 files changed, 950 insertions(+), 1773 deletions(-)
---
base-commit: aff9fb2fffa1175bd5ae3b4630f3d4ae53af450b
change-id: 20251002-sheaves-for-all-86ac13dc47a5
Best regards,
--
Vlastimil Babka <vbabka(a)suse.cz>
From: Biju Das <biju.das.jz(a)bp.renesas.com>
A glitch in the edge detection circuit can cause a spurious interrupt. The
hardware manual recommends clearing the status flag after setting the
ICU_TSSRk register as a countermeasure.
Currently, a spurious IRQ is generated on the resume path of s2idle for
the PMIC RTC TINT interrupt due to a glitch related to unnecessary
enabling/disabling of the TINT enable bit.
Fix this issue by not writing the TSSR (TINT Source) and TITSR (TINT
Detection Method Selection) registers when the values to be written
match what is already programmed in them.
Fixes: 0d7605e75ac2 ("irqchip: Add RZ/V2H(P) Interrupt Control Unit (ICU) driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com>
---
drivers/irqchip/irq-renesas-rzv2h.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-renesas-rzv2h.c b/drivers/irqchip/irq-renesas-rzv2h.c
index 0c44b6109842..9b4565375e83 100644
--- a/drivers/irqchip/irq-renesas-rzv2h.c
+++ b/drivers/irqchip/irq-renesas-rzv2h.c
@@ -328,6 +328,7 @@ static int rzv2h_tint_set_type(struct irq_data *d, unsigned int type)
u32 titsr, titsr_k, titsel_n, tien;
struct rzv2h_icu_priv *priv;
u32 tssr, tssr_k, tssel_n;
+ u32 titsr_cur, tssr_cur;
unsigned int hwirq;
u32 tint, sense;
int tint_nr;
@@ -376,12 +377,18 @@ static int rzv2h_tint_set_type(struct irq_data *d, unsigned int type)
guard(raw_spinlock)(&priv->lock);
tssr = readl_relaxed(priv->base + priv->info->t_offs + ICU_TSSR(tssr_k));
+ titsr = readl_relaxed(priv->base + priv->info->t_offs + ICU_TITSR(titsr_k));
+
+ tssr_cur = field_get(ICU_TSSR_TSSEL_MASK(tssel_n, priv->info->field_width), tssr);
+ titsr_cur = field_get(ICU_TITSR_TITSEL_MASK(titsel_n), titsr);
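+	/* Nothing to do if the requested TINT source and sense are already set. */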
+ if (tssr_cur == tint && titsr_cur == sense)
+ return 0;
+
tssr &= ~(ICU_TSSR_TSSEL_MASK(tssel_n, priv->info->field_width) | tien);
tssr |= ICU_TSSR_TSSEL_PREP(tint, tssel_n, priv->info->field_width);
writel_relaxed(tssr, priv->base + priv->info->t_offs + ICU_TSSR(tssr_k));
- titsr = readl_relaxed(priv->base + priv->info->t_offs + ICU_TITSR(titsr_k));
titsr &= ~ICU_TITSR_TITSEL_MASK(titsel_n);
titsr |= ICU_TITSR_TITSEL_PREP(sense, titsel_n);
--
2.43.0