Hi Greg and Sasha,
Please apply commit 56778b49c9a2 ("kunit: Add a macro to wrap a deferred
action function") to the 6.7 branch, as there are reported kCFI failures
that are resolved by that change:
https://github.com/ClangBuiltLinux/linux/issues/1998
It applies cleanly for me. We may want this in earlier branches as well
but there are some conflicts that I did not have too much time to look
at, we can always wait for other reports to come in before going further
back as well.
Cheers,
Nathan
From: Bjorn Helgaas <bhelgaas(a)google.com>
When booting with "pci=noaer", we don't request control of AER, but we
previously *did* request control of DPC, as in the dmesg log attached at
the bugzilla below:
Command line: ... pci=noaer
acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR DPC]
That's illegal per PCI Firmware Spec, r3.3, sec 4.5.1, table 4-5, which
says:
If the operating system sets this bit [OSC_PCI_EXPRESS_DPC_CONTROL], it
must also set bit 7 of the Support field (indicating support for Error
Disconnect Recover notifications) and bits 3 and 4 of the Control field
(requesting control of PCI Express Advanced Error Reporting and the PCI
Express Capability Structure).
Request DPC control only if we have also requested AER control.
Fixes: ac1c8e35a326 ("PCI/DPC: Add Error Disconnect Recover (EDR) support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218491#c12
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org> # v5.7+
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com>
Cc: Matthew W Carlis <mattc(a)purestorage.com>
Cc: Keith Busch <kbusch(a)kernel.org>
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
---
drivers/acpi/pci_root.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 58b89b8d950e..1c16965427b3 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -518,17 +518,19 @@ static u32 calculate_control(void)
if (IS_ENABLED(CONFIG_HOTPLUG_PCI_SHPC))
control |= OSC_PCI_SHPC_NATIVE_HP_CONTROL;
- if (pci_aer_available())
+ if (pci_aer_available()) {
control |= OSC_PCI_EXPRESS_AER_CONTROL;
- /*
- * Per the Downstream Port Containment Related Enhancements ECN to
- * the PCI Firmware Spec, r3.2, sec 4.5.1, table 4-5,
- * OSC_PCI_EXPRESS_DPC_CONTROL indicates the OS supports both DPC
- * and EDR.
- */
- if (IS_ENABLED(CONFIG_PCIE_DPC) && IS_ENABLED(CONFIG_PCIE_EDR))
- control |= OSC_PCI_EXPRESS_DPC_CONTROL;
+ /*
+ * Per PCI Firmware Spec, r3.3, sec 4.5.1, table 4-5, the
+ * OS can request DPC control only if it has advertised
+ * OSC_PCI_EDR_SUPPORT and requested both
+ * OSC_PCI_EXPRESS_CAPABILITY_CONTROL and
+ * OSC_PCI_EXPRESS_AER_CONTROL.
+ */
+ if (IS_ENABLED(CONFIG_PCIE_DPC) && IS_ENABLED(CONFIG_PCIE_EDR))
+ control |= OSC_PCI_EXPRESS_DPC_CONTROL;
+ }
return control;
}
--
2.34.1
From: Paul E. McKenney <paulmck(a)kernel.org>
commit bc31e6cb27a9334140ff2f0a209d59b08bc0bc8c upstream.
Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock. This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch. However, such deadlocks are becoming
more frequent. In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.
In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
just as well take advantage of that fact.
This commit therefore eliminates these deadlock by replacing the
SRCU-based wait for do_exit() completion with per-CPU lists of tasks
currently exiting. A given task will be on one of these per-CPU lists for
the same period of time that this task would previously have been in the
previous SRCU read-side critical section. These lists enable RCU Tasks
to find the tasks that have already been removed from the tasks list,
but that must nevertheless be waited upon.
The RCU Tasks grace period gathers any of these do_exit() tasks that it
must wait on, and adds them to the list of holdouts. Per-CPU locking
and get_task_struct() are used to synchronize addition to and removal
from these lists.
Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
Reported-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211(a)gmail.com>
---
include/linux/sched.h | 1 +
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/rcu/update.c | 65 ++++++++++++++++++++++++++++++-------------
4 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index fd4899236037..0b555d8e9d5e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -679,6 +679,7 @@ struct task_struct {
u8 rcu_tasks_idx;
int rcu_tasks_idle_cpu;
struct list_head rcu_tasks_holdout_list;
+ struct list_head rcu_tasks_exit_list;
#endif /* #ifdef CONFIG_TASKS_RCU */
struct sched_info sched_info;
diff --git a/init/init_task.c b/init/init_task.c
index 994ffe018120..f741cbfd891c 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -139,6 +139,7 @@ struct task_struct init_task
.rcu_tasks_holdout = false,
.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
.rcu_tasks_idle_cpu = -1,
+ .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
#endif
#ifdef CONFIG_CPUSETS
.mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
diff --git a/kernel/fork.c b/kernel/fork.c
index b65871600507..d416d16df62f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1626,6 +1626,7 @@ static inline void rcu_copy_process(struct task_struct *p)
p->rcu_tasks_holdout = false;
INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
p->rcu_tasks_idle_cpu = -1;
+ INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
#endif /* #ifdef CONFIG_TASKS_RCU */
}
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 81688a133552..5227cb5c1bea 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -527,7 +527,8 @@ static DECLARE_WAIT_QUEUE_HEAD(rcu_tasks_cbs_wq);
static DEFINE_RAW_SPINLOCK(rcu_tasks_cbs_lock);
/* Track exiting tasks in order to allow them to be waited for. */
-DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
+static LIST_HEAD(rtp_exit_list);
+static DEFINE_RAW_SPINLOCK(rtp_exit_list_lock);
/* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */
#define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
@@ -661,6 +662,17 @@ static void check_holdout_task(struct task_struct *t,
sched_show_task(t);
}
+static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
+{
+ if (t != current && READ_ONCE(t->on_rq) &&
+ !is_idle_task(t)) {
+ get_task_struct(t);
+ t->rcu_tasks_nvcsw = READ_ONCE(t->nvcsw);
+ WRITE_ONCE(t->rcu_tasks_holdout, true);
+ list_add(&t->rcu_tasks_holdout_list, hop);
+ }
+}
+
/* RCU-tasks kthread that detects grace periods and invokes callbacks. */
static int __noreturn rcu_tasks_kthread(void *arg)
{
@@ -726,14 +738,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
*/
rcu_read_lock();
for_each_process_thread(g, t) {
- if (t != current && READ_ONCE(t->on_rq) &&
- !is_idle_task(t)) {
- get_task_struct(t);
- t->rcu_tasks_nvcsw = READ_ONCE(t->nvcsw);
- WRITE_ONCE(t->rcu_tasks_holdout, true);
- list_add(&t->rcu_tasks_holdout_list,
- &rcu_tasks_holdouts);
- }
+ rcu_tasks_pertask(t, &rcu_tasks_holdouts);
}
rcu_read_unlock();
@@ -744,8 +749,12 @@ static int __noreturn rcu_tasks_kthread(void *arg)
* where they have disabled preemption, allowing the
* later synchronize_sched() to finish the job.
*/
- synchronize_srcu(&tasks_rcu_exit_srcu);
-
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_for_each_entry(t, &rtp_exit_list, rcu_tasks_exit_list) {
+ if (list_empty(&t->rcu_tasks_holdout_list))
+ rcu_tasks_pertask(t, &rcu_tasks_holdouts);
+ }
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
/*
* Each pass through the following loop scans the list
* of holdout tasks, removing any that are no longer
@@ -802,8 +811,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
*
* In addition, this synchronize_sched() waits for exiting
* tasks to complete their final preempt_disable() region
- * of execution, cleaning up after the synchronize_srcu()
- * above.
+ * of execution.
*/
synchronize_sched();
@@ -834,20 +842,39 @@ static int __init rcu_spawn_tasks_kthread(void)
}
core_initcall(rcu_spawn_tasks_kthread);
-/* Do the srcu_read_lock() for the above synchronize_srcu(). */
+/*
+ * Protect against tasklist scan blind spot while the task is exiting and
+ * may be removed from the tasklist. Do this by adding the task to yet
+ * another list.
+ */
void exit_tasks_rcu_start(void)
{
+ unsigned long flags;
+ struct task_struct *t = current;
+
+ WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
+ get_task_struct(t);
preempt_disable();
- current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_add(&t->rcu_tasks_exit_list, &rtp_exit_list);
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
preempt_enable();
}
-/* Do the srcu_read_unlock() for the above synchronize_srcu(). */
+/*
+ * Remove the task from the "yet another list" because do_exit() is now
+ * non-preemptible, allowing synchronize_rcu() to wait beyond this point.
+ */
void exit_tasks_rcu_finish(void)
{
- preempt_disable();
- __srcu_read_unlock(&tasks_rcu_exit_srcu, current->rcu_tasks_idx);
- preempt_enable();
+ unsigned long flags;
+ struct task_struct *t = current;
+
+ WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_del_init(&t->rcu_tasks_exit_list);
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
+ put_task_struct(t);
}
#endif /* #ifdef CONFIG_TASKS_RCU */
--
2.17.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d877550eaf2dc9090d782864c96939397a3c6835
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024021941-reprimand-grudge-7734@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d877550eaf2d ("x86/fpu: Stop relying on userspace for info to fault in xsave buffer")
c03098d4b9ad ("Merge tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d877550eaf2dc9090d782864c96939397a3c6835 Mon Sep 17 00:00:00 2001
From: Andrei Vagin <avagin(a)google.com>
Date: Mon, 29 Jan 2024 22:36:03 -0800
Subject: [PATCH] x86/fpu: Stop relying on userspace for info to fault in xsave
buffer
Before this change, the expected size of the user space buffer was
taken from fx_sw->xstate_size. fx_sw->xstate_size can be changed
from user-space, so it is possible construct a sigreturn frame where:
* fx_sw->xstate_size is smaller than the size required by valid bits in
fx_sw->xfeatures.
* user-space unmaps parts of the sigrame fpu buffer so that not all of
the buffer required by xrstor is accessible.
In this case, xrstor tries to restore and accesses the unmapped area
which results in a fault. But fault_in_readable succeeds because buf +
fx_sw->xstate_size is within the still mapped area, so it goes back and
tries xrstor again. It will spin in this loop forever.
Instead, fault in the maximum size which can be touched by XRSTOR (taken
from fpstate->user_size).
[ dhansen: tweak subject / changelog ]
Fixes: fcb3635f5018 ("x86/fpu/signal: Handle #PF in the direct restore path")
Reported-by: Konstantin Bogomolov <bogomolov(a)google.com>
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Andrei Vagin <avagin(a)google.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240130063603.3392627-1-avagin%40google.com
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 558076dbde5b..247f2225aa9f 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -274,12 +274,13 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
* Attempt to restore the FPU registers directly from user memory.
* Pagefaults are handled and any errors returned are fatal.
*/
-static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
- bool fx_only, unsigned int size)
+static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
struct fpu *fpu = ¤t->thread.fpu;
int ret;
+ /* Restore enabled features only. */
+ xrestore &= fpu->fpstate->user_xfeatures;
retry:
fpregs_lock();
/* Ensure that XFD is up to date */
@@ -309,7 +310,7 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
if (ret != X86_TRAP_PF)
return false;
- if (!fault_in_readable(buf, size))
+ if (!fault_in_readable(buf, fpu->fpstate->user_size))
goto retry;
return false;
}
@@ -339,7 +340,6 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
- unsigned int state_size;
u64 user_xfeatures = 0;
if (use_xsave()) {
@@ -349,17 +349,14 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
return false;
fx_only = !fx_sw_user.magic1;
- state_size = fx_sw_user.xstate_size;
user_xfeatures = fx_sw_user.xfeatures;
} else {
user_xfeatures = XFEATURE_MASK_FPSSE;
- state_size = fpu->fpstate->user_size;
}
if (likely(!ia32_fxstate)) {
/* Restore the FPU registers directly from user memory. */
- return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only,
- state_size);
+ return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only);
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012650-altitude-gush-572f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3c12466b6b7b ("erofs: fix lz4 inplace decompression")
123ec246ebe3 ("erofs: get rid of the remaining kmap_atomic()")
ab749badf9f4 ("erofs: support unaligned data decompression")
10e5f6e482e1 ("erofs: introduce z_erofs_fixup_insize")
d67aee76d418 ("erofs: tidy up z_erofs_lz4_decompress")
7e508f2ca8bb ("erofs: rename lz4_0pading to zero_padding")
eaa9172ad988 ("erofs: get rid of ->lru usage")
622ceaddb764 ("erofs: lzma compression support")
966edfb0a3dc ("erofs: rename some generic methods in decompressor")
386292919c25 ("erofs: introduce readmore decompression strategy")
8f89926290c4 ("erofs: get compression algorithms directly on mapping")
dfeab2e95a75 ("erofs: add multiple device support")
e62424651f43 ("erofs: decouple basic mount options from fs_context")
5b6e7e120e71 ("erofs: remove the fast path of per-CPU buffer decompression")
2e5fd489a4e5 ("Merge tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de Mon Sep 17 00:00:00 2001
From: Gao Xiang <xiang(a)kernel.org>
Date: Wed, 6 Dec 2023 12:55:34 +0800
Subject: [PATCH] erofs: fix lz4 inplace decompression
Currently EROFS can map another compressed buffer for inplace
decompression, that was used to handle the cases that some pages of
compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed
data is arranged at the end of the decompressed buffer and it
explicitly uses memmove() to handle overlapping:
__________________________________________________________
|_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two
individual virtual buffers so the relative order is uncertain.
Previously, it was hardly observed since LZ4 only uses memmove() for
short overlapped literals and x86/arm64 memmove implementations seem to
completely cover it up and they don't have this issue. Juhyung reported
that EROFS data corruption can be found on a new Intel x86 processor.
After some analysis, it seems that recent x86 processors with the new
FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace
decompression for now. Later, as an useful improvement, we could try
to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park <qkrwngud825(a)gmail.com>
Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR…
Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace")
Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend")
Cc: stable <stable(a)vger.kernel.org> # 5.4+
Tested-by: Yifan Zhao <zhaoyifan(a)sjtu.edu.cn>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.…
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 021be5feb1bc..e0d609c3958f 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -121,11 +121,11 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
}
static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
- void *inpage, unsigned int *inputmargin, int *maptype,
- bool may_inplace)
+ void *inpage, void *out, unsigned int *inputmargin,
+ int *maptype, bool may_inplace)
{
struct z_erofs_decompress_req *rq = ctx->rq;
- unsigned int omargin, total, i, j;
+ unsigned int omargin, total, i;
struct page **in;
void *src, *tmp;
@@ -135,12 +135,13 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize))
goto docopy;
- for (i = 0; i < ctx->inpages; ++i) {
- DBG_BUGON(rq->in[i] == NULL);
- for (j = 0; j < ctx->outpages - ctx->inpages + i; ++j)
- if (rq->out[j] == rq->in[i])
- goto docopy;
- }
+ for (i = 0; i < ctx->inpages; ++i)
+ if (rq->out[ctx->outpages - ctx->inpages + i] !=
+ rq->in[i])
+ goto docopy;
+ kunmap_local(inpage);
+ *maptype = 3;
+ return out + ((ctx->outpages - ctx->inpages) << PAGE_SHIFT);
}
if (ctx->inpages <= 1) {
@@ -148,7 +149,6 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
return inpage;
}
kunmap_local(inpage);
- might_sleep();
src = erofs_vm_map_ram(rq->in, ctx->inpages);
if (!src)
return ERR_PTR(-ENOMEM);
@@ -204,12 +204,12 @@ int z_erofs_fixup_insize(struct z_erofs_decompress_req *rq, const char *padbuf,
}
static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
- u8 *out)
+ u8 *dst)
{
struct z_erofs_decompress_req *rq = ctx->rq;
bool support_0padding = false, may_inplace = false;
unsigned int inputmargin;
- u8 *headpage, *src;
+ u8 *out, *headpage, *src;
int ret, maptype;
DBG_BUGON(*rq->in == NULL);
@@ -230,11 +230,12 @@ static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
}
inputmargin = rq->pageofs_in;
- src = z_erofs_lz4_handle_overlap(ctx, headpage, &inputmargin,
+ src = z_erofs_lz4_handle_overlap(ctx, headpage, dst, &inputmargin,
&maptype, may_inplace);
if (IS_ERR(src))
return PTR_ERR(src);
+ out = dst + rq->pageofs_out;
/* legacy format could compress extra data in a pcluster. */
if (rq->partial_decoding || !support_0padding)
ret = LZ4_decompress_safe_partial(src + inputmargin, out,
@@ -265,7 +266,7 @@ static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
vm_unmap_ram(src, ctx->inpages);
} else if (maptype == 2) {
erofs_put_pcpubuf(src);
- } else {
+ } else if (maptype != 3) {
DBG_BUGON(1);
return -EFAULT;
}
@@ -308,7 +309,7 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq,
}
dstmap_out:
- ret = z_erofs_lz4_decompress_mem(&ctx, dst + rq->pageofs_out);
+ ret = z_erofs_lz4_decompress_mem(&ctx, dst);
if (!dst_maptype)
kunmap_local(dst);
else if (dst_maptype == 2)
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Now that we are reading the full FIFO in the interrupt handler,
it is possible to have an emply FIFO since we are still receiving
1 interrupt per data. Handle correctly this case instead of having
an error causing a reset of the FIFO.
Fixes: 0829edc43e0a ("iio: imu: inv_mpu6050: read the full fifo when processing data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
V2: add missing stable tag
drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 66d4ba088e70..d4f9b5d8d28d 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -109,6 +109,8 @@ irqreturn_t inv_mpu6050_read_fifo(int irq, void *p)
/* compute and process only all complete datum */
nb = fifo_count / bytes_per_datum;
fifo_count = nb * bytes_per_datum;
+ if (nb == 0)
+ goto end_session;
/* Each FIFO data contains all sensors, so same number for FIFO and sensor data */
fifo_period = NSEC_PER_SEC / INV_MPU6050_DIVIDER_TO_FIFO_RATE(st->chip_config.divider);
inv_sensors_timestamp_interrupt(&st->timestamp, fifo_period, nb, nb, pf->timestamp);
--
2.34.1
Resolving a frequency to an efficient one should not transgress policy->max
(which can be set for thermal reason) and policy->min. Currently there is
possibility where scaling_cur_freq can exceed scaling_max_freq when
scaling_max_freq is inefficient frequency. Add additional check to ensure
that resolving a frequency will respect policy->min/max.
Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan(a)quicinc.com>
---
include/linux/cpufreq.h | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index afda5f24d3dd..42d98b576a36 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1021,6 +1021,19 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
efficiencies);
}
+static inline bool cpufreq_table_index_is_in_limits(struct cpufreq_policy *policy,
+ int idx)
+{
+ unsigned int freq;
+
+ if (idx < 0)
+ return false;
+
+ freq = policy->freq_table[idx].frequency;
+
+ return (freq == clamp_val(freq, policy->min, policy->max));
+}
+
static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
@@ -1054,7 +1067,10 @@ static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
return 0;
}
- if (idx < 0 && efficiencies) {
+ /*
+ * Limit frequency index to honor policy->min/max
+ */
+ if (!cpufreq_table_index_is_in_limits(policy, idx) && efficiencies) {
efficiencies = false;
goto retry;
}
--
2.25.1
Resolving a frequency to an efficient one should not transgress policy->max
(which can be set for thermal reason) and policy->min. Currently there is
possibility where scaling_cur_freq can exceed scaling_max_freq when
scaling_max_freq is inefficient frequency. Add additional check to ensure
that resolving a frequency will respect policy->min/max.
Cc: <stable(a)vger.kernel.org>
Fixes: 1f39fa0dccff ("cpufreq: Introducing CPUFREQ_RELATION_E")
Signed-off-by: Shivnandan Kumar <quic_kshivnan(a)quicinc.com>
--
Changes in v2:
-rename function name from cpufreq_table_index_is_in_limits to cpufreq_is_in_limits
-remove redundant outer parenthesis in return statement
-Make comment single line
--
---
include/linux/cpufreq.h | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index afda5f24d3dd..7741244dee6e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1021,6 +1021,19 @@ static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
efficiencies);
}
+static inline bool cpufreq_is_in_limits(struct cpufreq_policy *policy,
+ int idx)
+{
+ unsigned int freq;
+
+ if (idx < 0)
+ return false;
+
+ freq = policy->freq_table[idx].frequency;
+
+ return freq == clamp_val(freq, policy->min, policy->max);
+}
+
static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
@@ -1054,7 +1067,8 @@ static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
return 0;
}
- if (idx < 0 && efficiencies) {
+ /* Limit frequency index to honor policy->min/max */
+ if (!cpufreq_is_in_limits(policy, idx) && efficiencies) {
efficiencies = false;
goto retry;
}
--
2.25.1
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().
Note that the upper limit of ida_simple_get() is exclusive, but the one of
ida_alloc_range() is inclusive. So change this change allows one more
device. Previously address 0xFE was never used.
Fixes: 46a2bb5a7f7e ("slimbus: core: Add slim controllers support")
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
---
drivers/slimbus/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/slimbus/core.c b/drivers/slimbus/core.c
index d43873bb5fe6..01cbd4621981 100644
--- a/drivers/slimbus/core.c
+++ b/drivers/slimbus/core.c
@@ -436,8 +436,8 @@ static int slim_device_alloc_laddr(struct slim_device *sbdev,
if (ret < 0)
goto err;
} else if (report_present) {
- ret = ida_simple_get(&ctrl->laddr_ida,
- 0, SLIM_LA_MANAGER - 1, GFP_KERNEL);
+ ret = ida_alloc_max(&ctrl->laddr_ida,
+ SLIM_LA_MANAGER - 1, GFP_KERNEL);
if (ret < 0)
goto err;
--
2.25.1
Some Makefiles under tools/ use the 'override CFLAGS += ...' construct
to add a few required options to CFLAGS passed by the user.
Unfortunately that only works when user passes CFLAGS as an environment
variable, i.e.
CFLAGS=... make ...
and not in case when CFLAGS are passed as make command line arguments:
make ... CFLAGS=...
It happens because in the latter case CFLAGS=... is recorded in the make
variable MAKEOVERRIDES and this variable is passed in its original form
to all $(MAKE) subcommands, taking precedence over modified CFLAGS value
passed in the environment variable. E.g. this causes build failure for
gpio and iio tools when the build is run with user CFLAGS because of
missing _GNU_SOURCE definition needed for the asprintf().
One way to fix it is by removing overridden variables from the
MAKEOVERRIDES. Add macro 'drop-var-from-overrides' that removes a
definition of a variable passed to it from the MAKEOVERRIDES and use it
to fix CFLAGS passing for tools/gpio and tools/iio.
This implementation tries to be precise in string processing and handle
variables with embedded spaces and backslashes correctly. To achieve
that it replaces every '\\' sequence with '\-' to make sure that every
'\' in the resulting string is an escape character. It then replaces
every '\ ' sequence with '\_' to turn string values with embedded spaces
into single words. After filtering the overridden variable definition
out of the resulting string these two transformations are reversed.
Cc: stable(a)vger.kernel.org
Fixes: 4ccc98a48958 ("tools gpio: Allow overriding CFLAGS")
Fixes: 572974610273 ("tools iio: Override CFLAGS assignments")
Signed-off-by: Max Filippov <jcmvbkbc(a)gmail.com>
---
Changes v1->v2:
- make drop-var-from-overrides-code work correctly with arbitrary
variables, including thoses ending with '\'.
tools/gpio/Makefile | 1 +
tools/iio/Makefile | 1 +
tools/scripts/Makefile.include | 9 +++++++++
3 files changed, 11 insertions(+)
diff --git a/tools/gpio/Makefile b/tools/gpio/Makefile
index d29c9c49e251..46fc38d51639 100644
--- a/tools/gpio/Makefile
+++ b/tools/gpio/Makefile
@@ -24,6 +24,7 @@ ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
all: $(ALL_PROGRAMS)
export srctree OUTPUT CC LD CFLAGS
+$(call drop-var-from-overrides,CFLAGS)
include $(srctree)/tools/build/Makefile.include
#
diff --git a/tools/iio/Makefile b/tools/iio/Makefile
index fa720f062229..04307588dd3f 100644
--- a/tools/iio/Makefile
+++ b/tools/iio/Makefile
@@ -20,6 +20,7 @@ ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
all: $(ALL_PROGRAMS)
export srctree OUTPUT CC LD CFLAGS
+$(call drop-var-from-overrides,CFLAGS)
include $(srctree)/tools/build/Makefile.include
#
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 6fba29f3222d..0f68b95cf55c 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -51,6 +51,15 @@ define allow-override
$(eval $(1) = $(2)))
endef
+# When a Makefile overrides a variable and exports it for the nested $(MAKE)
+# invocations to use its modified value, it must remove that variable definition
+# from the MAKEOVERRIDES variable, otherwise the original definition from the
+# MAKEOVERRIDES takes precedence over the exported value.
+drop-var-from-overrides = $(eval $(drop-var-from-overrides-code))
+define drop-var-from-overrides-code
+MAKEOVERRIDES := $(subst \-,\\,$(subst \_,\ ,$(filter-out $(1)=%,$(subst \ ,\_,$(subst \\,\-,$(MAKEOVERRIDES))))))
+endef
+
ifneq ($(LLVM),)
ifneq ($(filter %/,$(LLVM)),)
LLVM_PREFIX := $(LLVM)
--
2.39.2
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024012648-backwater-colt-7290@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
3c12466b6b7b ("erofs: fix lz4 inplace decompression")
123ec246ebe3 ("erofs: get rid of the remaining kmap_atomic()")
ab749badf9f4 ("erofs: support unaligned data decompression")
10e5f6e482e1 ("erofs: introduce z_erofs_fixup_insize")
d67aee76d418 ("erofs: tidy up z_erofs_lz4_decompress")
7e508f2ca8bb ("erofs: rename lz4_0pading to zero_padding")
eaa9172ad988 ("erofs: get rid of ->lru usage")
622ceaddb764 ("erofs: lzma compression support")
966edfb0a3dc ("erofs: rename some generic methods in decompressor")
386292919c25 ("erofs: introduce readmore decompression strategy")
8f89926290c4 ("erofs: get compression algorithms directly on mapping")
dfeab2e95a75 ("erofs: add multiple device support")
e62424651f43 ("erofs: decouple basic mount options from fs_context")
5b6e7e120e71 ("erofs: remove the fast path of per-CPU buffer decompression")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de Mon Sep 17 00:00:00 2001
From: Gao Xiang <xiang(a)kernel.org>
Date: Wed, 6 Dec 2023 12:55:34 +0800
Subject: [PATCH] erofs: fix lz4 inplace decompression
Currently EROFS can map another compressed buffer for inplace
decompression, that was used to handle the cases that some pages of
compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed
data is arranged at the end of the decompressed buffer and it
explicitly uses memmove() to handle overlapping:
__________________________________________________________
|_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two
individual virtual buffers so the relative order is uncertain.
Previously, it was hardly observed since LZ4 only uses memmove() for
short overlapped literals and x86/arm64 memmove implementations seem to
completely cover it up and they don't have this issue. Juhyung reported
that EROFS data corruption can be found on a new Intel x86 processor.
After some analysis, it seems that recent x86 processors with the new
FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace
decompression for now. Later, as an useful improvement, we could try
to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park <qkrwngud825(a)gmail.com>
Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR…
Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace")
Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend")
Cc: stable <stable(a)vger.kernel.org> # 5.4+
Tested-by: Yifan Zhao <zhaoyifan(a)sjtu.edu.cn>
Signed-off-by: Gao Xiang <hsiangkao(a)linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.…
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 021be5feb1bc..e0d609c3958f 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -121,11 +121,11 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
}
static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
- void *inpage, unsigned int *inputmargin, int *maptype,
- bool may_inplace)
+ void *inpage, void *out, unsigned int *inputmargin,
+ int *maptype, bool may_inplace)
{
struct z_erofs_decompress_req *rq = ctx->rq;
- unsigned int omargin, total, i, j;
+ unsigned int omargin, total, i;
struct page **in;
void *src, *tmp;
@@ -135,12 +135,13 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize))
goto docopy;
- for (i = 0; i < ctx->inpages; ++i) {
- DBG_BUGON(rq->in[i] == NULL);
- for (j = 0; j < ctx->outpages - ctx->inpages + i; ++j)
- if (rq->out[j] == rq->in[i])
- goto docopy;
- }
+ for (i = 0; i < ctx->inpages; ++i)
+ if (rq->out[ctx->outpages - ctx->inpages + i] !=
+ rq->in[i])
+ goto docopy;
+ kunmap_local(inpage);
+ *maptype = 3;
+ return out + ((ctx->outpages - ctx->inpages) << PAGE_SHIFT);
}
if (ctx->inpages <= 1) {
@@ -148,7 +149,6 @@ static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx,
return inpage;
}
kunmap_local(inpage);
- might_sleep();
src = erofs_vm_map_ram(rq->in, ctx->inpages);
if (!src)
return ERR_PTR(-ENOMEM);
@@ -204,12 +204,12 @@ int z_erofs_fixup_insize(struct z_erofs_decompress_req *rq, const char *padbuf,
}
static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
- u8 *out)
+ u8 *dst)
{
struct z_erofs_decompress_req *rq = ctx->rq;
bool support_0padding = false, may_inplace = false;
unsigned int inputmargin;
- u8 *headpage, *src;
+ u8 *out, *headpage, *src;
int ret, maptype;
DBG_BUGON(*rq->in == NULL);
@@ -230,11 +230,12 @@ static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
}
inputmargin = rq->pageofs_in;
- src = z_erofs_lz4_handle_overlap(ctx, headpage, &inputmargin,
+ src = z_erofs_lz4_handle_overlap(ctx, headpage, dst, &inputmargin,
&maptype, may_inplace);
if (IS_ERR(src))
return PTR_ERR(src);
+ out = dst + rq->pageofs_out;
/* legacy format could compress extra data in a pcluster. */
if (rq->partial_decoding || !support_0padding)
ret = LZ4_decompress_safe_partial(src + inputmargin, out,
@@ -265,7 +266,7 @@ static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx,
vm_unmap_ram(src, ctx->inpages);
} else if (maptype == 2) {
erofs_put_pcpubuf(src);
- } else {
+ } else if (maptype != 3) {
DBG_BUGON(1);
return -EFAULT;
}
@@ -308,7 +309,7 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq,
}
dstmap_out:
- ret = z_erofs_lz4_decompress_mem(&ctx, dst + rq->pageofs_out);
+ ret = z_erofs_lz4_decompress_mem(&ctx, dst);
if (!dst_maptype)
kunmap_local(dst);
else if (dst_maptype == 2)
The quilt patch titled
Subject: mm: cachestat: fix folio read-after-free in cache walk
has been removed from the -mm tree. Its filename was
mm-cachestat-fix-folio-read-after-free-in-cache-walk.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Nhat Pham <nphamcs(a)gmail.com>
Subject: mm: cachestat: fix folio read-after-free in cache walk
Date: Mon, 19 Feb 2024 19:01:21 -0800
In cachestat, we access the folio from the page cache's xarray to compute
its page offset, and check for its dirty and writeback flags. However, we
do not hold a reference to the folio before performing these actions,
which means the folio can concurrently be released and reused as another
folio/page/slab.
Get around this altogether by just using xarray's existing machinery for
the folio page offsets and dirty/writeback states.
This changes behavior for tmpfs files to now always report zeroes in their
dirty and writeback counters. This is okay as tmpfs doesn't follow
conventional writeback cache behavior: its pages get "cleaned" during
swapout, after which they're no longer resident etc.
Link: https://lkml.kernel.org/r/20240220153409.GA216065@cmpxchg.org
Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Reported-by: Jann Horn <jannh(a)google.com>
Suggested-by: Matthew Wilcox <willy(a)infradead.org>
Signed-off-by: Nhat Pham <nphamcs(a)gmail.com>
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Tested-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> [6.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 51 ++++++++++++++++++++++++-------------------------
1 file changed, 26 insertions(+), 25 deletions(-)
--- a/mm/filemap.c~mm-cachestat-fix-folio-read-after-free-in-cache-walk
+++ a/mm/filemap.c
@@ -4111,28 +4111,40 @@ static void filemap_cachestat(struct add
rcu_read_lock();
xas_for_each(&xas, folio, last_index) {
+ int order;
unsigned long nr_pages;
pgoff_t folio_first_index, folio_last_index;
+ /*
+ * Don't deref the folio. It is not pinned, and might
+ * get freed (and reused) underneath us.
+ *
+ * We *could* pin it, but that would be expensive for
+ * what should be a fast and lightweight syscall.
+ *
+ * Instead, derive all information of interest from
+ * the rcu-protected xarray.
+ */
+
if (xas_retry(&xas, folio))
continue;
+ order = xa_get_order(xas.xa, xas.xa_index);
+ nr_pages = 1 << order;
+ folio_first_index = round_down(xas.xa_index, 1 << order);
+ folio_last_index = folio_first_index + nr_pages - 1;
+
+ /* Folios might straddle the range boundaries, only count covered pages */
+ if (folio_first_index < first_index)
+ nr_pages -= first_index - folio_first_index;
+
+ if (folio_last_index > last_index)
+ nr_pages -= folio_last_index - last_index;
+
if (xa_is_value(folio)) {
/* page is evicted */
void *shadow = (void *)folio;
bool workingset; /* not used */
- int order = xa_get_order(xas.xa, xas.xa_index);
-
- nr_pages = 1 << order;
- folio_first_index = round_down(xas.xa_index, 1 << order);
- folio_last_index = folio_first_index + nr_pages - 1;
-
- /* Folios might straddle the range boundaries, only count covered pages */
- if (folio_first_index < first_index)
- nr_pages -= first_index - folio_first_index;
-
- if (folio_last_index > last_index)
- nr_pages -= folio_last_index - last_index;
cs->nr_evicted += nr_pages;
@@ -4150,24 +4162,13 @@ static void filemap_cachestat(struct add
goto resched;
}
- nr_pages = folio_nr_pages(folio);
- folio_first_index = folio_pgoff(folio);
- folio_last_index = folio_first_index + nr_pages - 1;
-
- /* Folios might straddle the range boundaries, only count covered pages */
- if (folio_first_index < first_index)
- nr_pages -= first_index - folio_first_index;
-
- if (folio_last_index > last_index)
- nr_pages -= folio_last_index - last_index;
-
/* page is in cache */
cs->nr_cache += nr_pages;
- if (folio_test_dirty(folio))
+ if (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY))
cs->nr_dirty += nr_pages;
- if (folio_test_writeback(folio))
+ if (xas_get_mark(&xas, PAGECACHE_TAG_WRITEBACK))
cs->nr_writeback += nr_pages;
resched:
_
Patches currently in -mm which might be from nphamcs(a)gmail.com are
The patch titled
Subject: init/Kconfig: lower GCC version check for -Warray-bounds
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
init-kconfig-lower-gcc-version-check-for-warray-bounds.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Kees Cook <keescook(a)chromium.org>
Subject: init/Kconfig: lower GCC version check for -Warray-bounds
Date: Fri, 23 Feb 2024 09:08:27 -0800
We continue to see false positives from -Warray-bounds even in GCC 10,
which is getting reported in a few places[1] still:
security/security.c:811:2: warning: `memcpy' offset 32 is out of the bounds [0, 0] [-Warray-bounds]
Lower the GCC version check from 11 to 10.
Link: https://lkml.kernel.org/r/20240223170824.work.768-kees@kernel.org
Reported-by: Lu Yao <yaolu(a)kylinos.cn>
Closes: https://lore.kernel.org/lkml/20240117014541.8887-1-yaolu@kylinos.cn/
Link: https://lore.kernel.org/linux-next/65d84438.620a0220.7d171.81a7@mx.google.c… [1]
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Gustavo A. R. Silva" <gustavoars(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Marc Aur��le La France <tsi(a)tuyoix.net>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
init/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/init/Kconfig~init-kconfig-lower-gcc-version-check-for-warray-bounds
+++ a/init/Kconfig
@@ -876,14 +876,14 @@ config CC_IMPLICIT_FALLTHROUGH
default "-Wimplicit-fallthrough=5" if CC_IS_GCC && $(cc-option,-Wimplicit-fallthrough=5)
default "-Wimplicit-fallthrough" if CC_IS_CLANG && $(cc-option,-Wunreachable-code-fallthrough)
-# Currently, disable gcc-11+ array-bounds globally.
+# Currently, disable gcc-10+ array-bounds globally.
# It's still broken in gcc-13, so no upper bound yet.
-config GCC11_NO_ARRAY_BOUNDS
+config GCC10_NO_ARRAY_BOUNDS
def_bool y
config CC_NO_ARRAY_BOUNDS
bool
- default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC11_NO_ARRAY_BOUNDS
+ default y if CC_IS_GCC && GCC_VERSION >= 100000 && GCC10_NO_ARRAY_BOUNDS
# Currently, disable -Wstringop-overflow for GCC globally.
config GCC_NO_STRINGOP_OVERFLOW
_
Patches currently in -mm which might be from keescook(a)chromium.org are
init-kconfig-lower-gcc-version-check-for-warray-bounds.patch
Limit the WiFi PCIe link speed to Gen2 speed (500 MB/s), which is the
speed that the boot firmware has brought up the link at (and that
Windows uses).
This is specifically needed to avoid a large amount of link errors when
restarting the link during boot (but which are currently not reported).
This also appears to fix intermittent failures to download the ath11k
firmware during boot which can be seen when there is a longer delay
between restarting the link and loading the WiFi driver (e.g. when using
full disk encryption).
Fixes: 123b30a75623 ("arm64: dts: qcom: sc8280xp-x13s: enable WiFi controller")
Cc: stable(a)vger.kernel.org # 6.2
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
index 2c17e137563a..a67756ada990 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
+++ b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
@@ -768,6 +768,8 @@ &pcie3a_phy {
};
&pcie4 {
+ max-link-speed = <2>;
+
perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
--
2.43.0
From: Paul E. McKenney <paulmck(a)kernel.org>
commit bc31e6cb27a9334140ff2f0a209d59b08bc0bc8c upstream.
Holding a mutex across synchronize_rcu_tasks() and acquiring
that same mutex in code called from do_exit() after its call to
exit_tasks_rcu_start() but before its call to exit_tasks_rcu_stop()
results in deadlock. This is by design, because tasks that are far
enough into do_exit() are no longer present on the tasks list, making
it a bit difficult for RCU Tasks to find them, let alone wait on them
to do a voluntary context switch. However, such deadlocks are becoming
more frequent. In addition, lockdep currently does not detect such
deadlocks and they can be difficult to reproduce.
In addition, if a task voluntarily context switches during that time
(for example, if it blocks acquiring a mutex), then this task is in an
RCU Tasks quiescent state. And with some adjustments, RCU Tasks could
just as well take advantage of that fact.
This commit therefore eliminates these deadlock by replacing the
SRCU-based wait for do_exit() completion with per-CPU lists of tasks
currently exiting. A given task will be on one of these per-CPU lists for
the same period of time that this task would previously have been in the
previous SRCU read-side critical section. These lists enable RCU Tasks
to find the tasks that have already been removed from the tasks list,
but that must nevertheless be waited upon.
The RCU Tasks grace period gathers any of these do_exit() tasks that it
must wait on, and adds them to the list of holdouts. Per-CPU locking
and get_task_struct() are used to synchronize addition to and removal
from these lists.
Link: https://lore.kernel.org/all/20240118021842.290665-1-chenzhongjin@huawei.com/
Reported-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Zqiang <qiang.zhang1211(a)gmail.com>
---
include/linux/sched.h | 1 +
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/rcu/tasks.h | 54 ++++++++++++++++++++++++++++---------------
4 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa015416c569..80499f7ab39a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -740,6 +740,7 @@ struct task_struct {
u8 rcu_tasks_idx;
int rcu_tasks_idle_cpu;
struct list_head rcu_tasks_holdout_list;
+ struct list_head rcu_tasks_exit_list;
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_TRACE_RCU
diff --git a/init/init_task.c b/init/init_task.c
index 5fa18ed59d33..59454d6e2c2a 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -151,6 +151,7 @@ struct task_struct init_task
.rcu_tasks_holdout = false,
.rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
.rcu_tasks_idle_cpu = -1,
+ .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
#endif
#ifdef CONFIG_TASKS_TRACE_RCU
.trc_reader_nesting = 0,
diff --git a/kernel/fork.c b/kernel/fork.c
index 633b0af1d1a7..86803165aa00 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1699,6 +1699,7 @@ static inline void rcu_copy_process(struct task_struct *p)
p->rcu_tasks_holdout = false;
INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
p->rcu_tasks_idle_cpu = -1;
+ INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
#endif /* #ifdef CONFIG_TASKS_RCU */
#ifdef CONFIG_TASKS_TRACE_RCU
p->trc_reader_nesting = 0;
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index c5624ab0580c..901cd7bc78ed 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -81,9 +81,6 @@ static struct rcu_tasks rt_name = \
.kname = #rt_name, \
}
-/* Track exiting tasks in order to allow them to be waited for. */
-DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
-
/* Avoid IPIing CPUs early in the grace period. */
#define RCU_TASK_IPI_DELAY (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB) ? HZ / 2 : 0)
static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
@@ -383,6 +380,9 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
// rates from multiple CPUs. If this is required, per-CPU callback lists
// will be needed.
+static LIST_HEAD(rtp_exit_list);
+static DEFINE_RAW_SPINLOCK(rtp_exit_list_lock);
+
/* Pre-grace-period preparation. */
static void rcu_tasks_pregp_step(void)
{
@@ -416,15 +416,18 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
/* Processing between scanning taskslist and draining the holdout list. */
static void rcu_tasks_postscan(struct list_head *hop)
{
+ unsigned long flags;
+ struct task_struct *t;
+
/*
* Exiting tasks may escape the tasklist scan. Those are vulnerable
* until their final schedule() with TASK_DEAD state. To cope with
* this, divide the fragile exit path part in two intersecting
* read side critical sections:
*
- * 1) An _SRCU_ read side starting before calling exit_notify(),
- * which may remove the task from the tasklist, and ending after
- * the final preempt_disable() call in do_exit().
+ * 1) A task_struct list addition before calling exit_notify(),
+ * which may remove the task from the tasklist, with the
+ * removal after the final preempt_disable() call in do_exit().
*
* 2) An _RCU_ read side starting with the final preempt_disable()
* call in do_exit() and ending with the final call to schedule()
@@ -433,7 +436,12 @@ static void rcu_tasks_postscan(struct list_head *hop)
* This handles the part 1). And postgp will handle part 2) with a
* call to synchronize_rcu().
*/
- synchronize_srcu(&tasks_rcu_exit_srcu);
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_for_each_entry(t, &rtp_exit_list, rcu_tasks_exit_list) {
+ if (list_empty(&t->rcu_tasks_holdout_list))
+ rcu_tasks_pertask(t, hop);
+ }
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
}
/* See if tasks are still holding out, complain if so. */
@@ -498,7 +506,6 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
*
* In addition, this synchronize_rcu() waits for exiting tasks
* to complete their final preempt_disable() region of execution,
- * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
* enforcing the whole region before tasklist removal until
* the final schedule() with TASK_DEAD state to be an RCU TASKS
* read side critical section.
@@ -591,25 +598,36 @@ static void show_rcu_tasks_classic_gp_kthread(void)
#endif /* #ifndef CONFIG_TINY_RCU */
/*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Protect against tasklist scan blind spot while the task is exiting and
+ * may be removed from the tasklist. Do this by adding the task to yet
+ * another list.
*/
-void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_start(void)
{
- current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
+ unsigned long flags;
+ struct task_struct *t = current;
+
+ WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
+ get_task_struct(t);
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_add(&t->rcu_tasks_exit_list, &rtp_exit_list);
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
}
/*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Remove the task from the "yet another list" because do_exit() is now
+ * non-preemptible, allowing synchronize_rcu() to wait beyond this point.
*/
-void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_stop(void)
{
+ unsigned long flags;
struct task_struct *t = current;
- __srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
+ WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
+ raw_spin_lock_irqsave(&rtp_exit_list_lock, flags);
+ list_del_init(&t->rcu_tasks_exit_list);
+ raw_spin_unlock_irqrestore(&rtp_exit_list_lock, flags);
+ put_task_struct(t);
}
/*
--
2.17.1
Hi stable team - please don't take patches for fs/bcachefs/ except from
myself; I'll be doing backports and sending pull requests after stuff
has been tested by my CI.
Thanks, and let me know if there's any other workflow things I should
know about
-Kent
From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Fix link error:
ld.bfd: drivers/mfd/twl-core.o: in function `twl_probe':
git/drivers/mfd/twl-core.c:846: undefined reference to `devm_mfd_add_devices'
Cc: <stable(a)vger.kernel.org>
Fixes: 63416320419e ("mfd: twl-core: Add a clock subdevice for the TWL6032")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
drivers/mfd/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 90ce58fd629e5..1195a27c881e4 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1772,6 +1772,7 @@ config TWL4030_CORE
bool "TI TWL4030/TWL5030/TWL6030/TPS659x0 Support"
depends on I2C=y
select IRQ_DOMAIN
+ select MFD_CORE
select REGMAP_I2C
help
Say yes here if you have TWL4030 / TWL6030 family chip on your board.
--
2.43.0
From: Cyril Hrubis <chrubis(a)suse.cz>
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the
sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Petr Vorel <pvorel(a)suse.cz>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Tested-by: Petr Vorel <pvorel(a)suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
[ pvorel: rebased for 4.19 ]
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
kernel/sched/rt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 394c66442cff..ce4594215728 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,7 +8,7 @@
#include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE;
-int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
+int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
--
2.35.3
From: Jan Kiszka <jan.kiszka(a)siemens.com>
commit afb2a4fb84555ef9e61061f6ea63ed7087b295d5 upstream.
The cflags for the RISC-V efistub were missing -mno-relax, thus were
under the risk that the compiler could use GP-relative addressing. That
happened for _edata with binutils-2.41 and kernel 6.1, causing the
relocation to fail due to an invalid kernel_size in handle_kernel_image.
It was not yet observed with newer versions, but that may just be luck.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/firmware/efi/libstub/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index ef5045a53ce0..b6e1dcb98a64 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM) := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
-fno-builtin -fpic \
$(call cc-option,-mno-single-pic-base)
cflags-$(CONFIG_RISCV) := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
- -fpic
+ -fpic -mno-relax
cflags-$(CONFIG_LOONGARCH) := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
-fpie
--
2.35.3
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 14db5f64a971fce3d8ea35de4dfc7f443a3efb92
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024021942-driven-backhand-7edd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
14db5f64a971 ("zonefs: Improve error handling")
77af13ba3c7f ("zonefs: Do not propagate iomap_dio_rw() ENOTBLK error to user space")
aa7f243f32e1 ("zonefs: Separate zone information from inode information")
34422914dc00 ("zonefs: Reduce struct zonefs_inode_info size")
46a9c526eef7 ("zonefs: Simplify IO error handling")
4008e2a0b01a ("zonefs: Reorganize code")
a608da3bd730 ("zonefs: Detect append writes at invalid locations")
db58653ce0c7 ("zonefs: Fix active zone accounting")
7dd12d65ac64 ("zonefs: fix zone report size in __zonefs_io_error()")
8745889a7fd0 ("Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 14db5f64a971fce3d8ea35de4dfc7f443a3efb92 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Thu, 8 Feb 2024 17:26:59 +0900
Subject: [PATCH] zonefs: Improve error handling
Write error handling is racy and can sometime lead to the error recovery
path wrongly changing the inode size of a sequential zone file to an
incorrect value which results in garbage data being readable at the end
of a file. There are 2 problems:
1) zonefs_file_dio_write() updates a zone file write pointer offset
after issuing a direct IO with iomap_dio_rw(). This update is done
only if the IO succeed for synchronous direct writes. However, for
asynchronous direct writes, the update is done without waiting for
the IO completion so that the next asynchronous IO can be
immediately issued. However, if an asynchronous IO completes with a
failure right before the i_truncate_mutex lock protecting the update,
the update may change the value of the inode write pointer offset
that was corrected by the error path (zonefs_io_error() function).
2) zonefs_io_error() is called when a read or write error occurs. This
function executes a report zone operation using the callback function
zonefs_io_error_cb(), which does all the error recovery handling
based on the current zone condition, write pointer position and
according to the mount options being used. However, depending on the
zoned device being used, a report zone callback may be executed in a
context that is different from the context of __zonefs_io_error(). As
a result, zonefs_io_error_cb() may be executed without the inode
truncate mutex lock held, which can lead to invalid error processing.
Fix both problems as follows:
- Problem 1: Perform the inode write pointer offset update before a
direct write is issued with iomap_dio_rw(). This is safe to do as
partial direct writes are not supported (IOMAP_DIO_PARTIAL is not
set) and any failed IO will trigger the execution of zonefs_io_error()
which will correct the inode write pointer offset to reflect the
current state of the one on the device.
- Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error()
and call this function directly from __zonefs_io_error() after
obtaining the zone information using blkdev_report_zones() with a
simple callback function that copies to a local stack variable the
struct blk_zone obtained from the device. This ensures that error
handling is performed holding the inode truncate mutex.
This change also simplifies error handling for conventional zone files
by bypassing the execution of report zones entirely. This is safe to
do because the condition of conventional zones cannot be read-only or
offline and conventional zone files are always fully mapped with a
constant file size.
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani(a)oracle.com>
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 6ab2318a9c8e..dba5dcb62bef 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -348,7 +348,12 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
struct zonefs_inode_info *zi = ZONEFS_I(inode);
if (error) {
- zonefs_io_error(inode, true);
+ /*
+ * For Sync IOs, error recovery is called from
+ * zonefs_file_dio_write().
+ */
+ if (!is_sync_kiocb(iocb))
+ zonefs_io_error(inode, true);
return error;
}
@@ -491,6 +496,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
ret = -EINVAL;
goto inode_unlock;
}
+ /*
+ * Advance the zone write pointer offset. This assumes that the
+ * IO will succeed, which is OK to do because we do not allow
+ * partial writes (IOMAP_DIO_PARTIAL is not set) and if the IO
+ * fails, the error path will correct the write pointer offset.
+ */
+ z->z_wpoffset += count;
+ zonefs_inode_account_active(inode);
mutex_unlock(&zi->i_truncate_mutex);
}
@@ -504,20 +517,19 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
if (ret == -ENOTBLK)
ret = -EBUSY;
- if (zonefs_zone_is_seq(z) &&
- (ret > 0 || ret == -EIOCBQUEUED)) {
- if (ret > 0)
- count = ret;
-
- /*
- * Update the zone write pointer offset assuming the write
- * operation succeeded. If it did not, the error recovery path
- * will correct it. Also do active seq file accounting.
- */
- mutex_lock(&zi->i_truncate_mutex);
- z->z_wpoffset += count;
- zonefs_inode_account_active(inode);
- mutex_unlock(&zi->i_truncate_mutex);
+ /*
+ * For a failed IO or partial completion, trigger error recovery
+ * to update the zone write pointer offset to a correct value.
+ * For asynchronous IOs, zonefs_file_write_dio_end_io() may already
+ * have executed error recovery if the IO already completed when we
+ * reach here. However, we cannot know that and execute error recovery
+ * again (that will not change anything).
+ */
+ if (zonefs_zone_is_seq(z)) {
+ if (ret > 0 && ret != count)
+ ret = -EIO;
+ if (ret < 0 && ret != -EIOCBQUEUED)
+ zonefs_io_error(inode, true);
}
inode_unlock:
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 93971742613a..b6e8e7c96251 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -246,16 +246,18 @@ static void zonefs_inode_update_mode(struct inode *inode)
z->z_mode = inode->i_mode;
}
-struct zonefs_ioerr_data {
- struct inode *inode;
- bool write;
-};
-
static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
void *data)
{
- struct zonefs_ioerr_data *err = data;
- struct inode *inode = err->inode;
+ struct blk_zone *z = data;
+
+ *z = *zone;
+ return 0;
+}
+
+static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone,
+ bool write)
+{
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct super_block *sb = inode->i_sb;
struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
@@ -270,8 +272,8 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
data_size = zonefs_check_zone_condition(sb, z, zone);
isize = i_size_read(inode);
if (!(z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)) &&
- !err->write && isize == data_size)
- return 0;
+ !write && isize == data_size)
+ return;
/*
* At this point, we detected either a bad zone or an inconsistency
@@ -292,7 +294,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
* In all cases, warn about inode size inconsistency and handle the
* IO error according to the zone condition and to the mount options.
*/
- if (zonefs_zone_is_seq(z) && isize != data_size)
+ if (isize != data_size)
zonefs_warn(sb,
"inode %lu: invalid size %lld (should be %lld)\n",
inode->i_ino, isize, data_size);
@@ -352,8 +354,6 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
zonefs_i_size_write(inode, data_size);
z->z_wpoffset = data_size;
zonefs_inode_account_active(inode);
-
- return 0;
}
/*
@@ -367,23 +367,25 @@ void __zonefs_io_error(struct inode *inode, bool write)
{
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct super_block *sb = inode->i_sb;
- struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
unsigned int noio_flag;
- unsigned int nr_zones = 1;
- struct zonefs_ioerr_data err = {
- .inode = inode,
- .write = write,
- };
+ struct blk_zone zone;
int ret;
/*
- * The only files that have more than one zone are conventional zone
- * files with aggregated conventional zones, for which the inode zone
- * size is always larger than the device zone size.
+ * Conventional zone have no write pointer and cannot become read-only
+ * or offline. So simply fake a report for a single or aggregated zone
+ * and let zonefs_handle_io_error() correct the zone inode information
+ * according to the mount options.
*/
- if (z->z_size > bdev_zone_sectors(sb->s_bdev))
- nr_zones = z->z_size >>
- (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
+ if (!zonefs_zone_is_seq(z)) {
+ zone.start = z->z_sector;
+ zone.len = z->z_size >> SECTOR_SHIFT;
+ zone.wp = zone.start + zone.len;
+ zone.type = BLK_ZONE_TYPE_CONVENTIONAL;
+ zone.cond = BLK_ZONE_COND_NOT_WP;
+ zone.capacity = zone.len;
+ goto handle_io_error;
+ }
/*
* Memory allocations in blkdev_report_zones() can trigger a memory
@@ -394,12 +396,20 @@ void __zonefs_io_error(struct inode *inode, bool write)
* the GFP_NOIO context avoids both problems.
*/
noio_flag = memalloc_noio_save();
- ret = blkdev_report_zones(sb->s_bdev, z->z_sector, nr_zones,
- zonefs_io_error_cb, &err);
- if (ret != nr_zones)
+ ret = blkdev_report_zones(sb->s_bdev, z->z_sector, 1,
+ zonefs_io_error_cb, &zone);
+ memalloc_noio_restore(noio_flag);
+
+ if (ret != 1) {
zonefs_err(sb, "Get inode %lu zone information failed %d\n",
inode->i_ino, ret);
- memalloc_noio_restore(noio_flag);
+ zonefs_warn(sb, "remounting filesystem read-only\n");
+ sb->s_flags |= SB_RDONLY;
+ return;
+ }
+
+handle_io_error:
+ zonefs_handle_io_error(inode, &zone, write);
}
static struct kmem_cache *zonefs_inode_cachep;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 67695f18d55924b2013534ef3bdc363bc9e14605
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024021850-vaseline-mongrel-489e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67695f18d55924b2013534ef3bdc363bc9e14605 Mon Sep 17 00:00:00 2001
From: Lokesh Gidra <lokeshgidra(a)google.com>
Date: Wed, 17 Jan 2024 14:37:29 -0800
Subject: [PATCH] userfaultfd: fix mmap_changing checking in
mfill_atomic_hugetlb
In mfill_atomic_hugetlb(), mmap_changing isn't being checked
again if we drop mmap_lock and reacquire it. When the lock is not held,
mmap_changing could have been incremented. This is also inconsistent
with the behavior in mfill_atomic().
Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com
Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Signed-off-by: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Nicolas Geoffray <ngeoffray(a)google.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 20e3b0d9cf7e..75fcf1f783bc 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -357,6 +357,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
+ atomic_t *mmap_changing,
uffd_flags_t flags)
{
struct mm_struct *dst_mm = dst_vma->vm_mm;
@@ -472,6 +473,15 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
goto out;
}
mmap_read_lock(dst_mm);
+ /*
+ * If memory mappings are changing because of non-cooperative
+ * operation (e.g. mremap) running in parallel, bail out and
+ * request the user to retry later
+ */
+ if (mmap_changing && atomic_read(mmap_changing)) {
+ err = -EAGAIN;
+ break;
+ }
dst_vma = NULL;
goto retry;
@@ -506,6 +516,7 @@ extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
unsigned long dst_start,
unsigned long src_start,
unsigned long len,
+ atomic_t *mmap_changing,
uffd_flags_t flags);
#endif /* CONFIG_HUGETLB_PAGE */
@@ -622,8 +633,8 @@ static __always_inline ssize_t mfill_atomic(struct mm_struct *dst_mm,
* If this is a HUGETLB vma, pass off to appropriate routine
*/
if (is_vm_hugetlb_page(dst_vma))
- return mfill_atomic_hugetlb(dst_vma, dst_start,
- src_start, len, flags);
+ return mfill_atomic_hugetlb(dst_vma, dst_start, src_start,
+ len, mmap_changing, flags);
if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
goto out_unlock;
commit 5124a0a549857c4b87173280e192eea24dea72ad upstream.
If DAT metadata file block access fails due to corruption of the DAT file
or abnormal virtual block numbers held by b-trees or inodes, a kernel
warning is generated.
This replaces the WARN_ONs by error output, so that a kernel, booted with
panic_on_warn, does not panic. This patch also replaces the detected
return code -ENOENT with another internal code -EINVAL to notify the bmap
layer of metadata corruption. When the bmap layer sees -EINVAL, it
handles the abnormal situation with nilfs_bmap_convert_error() and finally
returns code -EIO as it should.
Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com
Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+5d5d25f90f195a3cfcb4(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please use this patch for these versions instead of the patch I asked
you to drop in the previous review comments.
This replacement patch uses an equivalent call using nilfs_msg()
instead of nilfs_err(), which does not exist in these versions.
Thanks,
Ryusuke Konishi
fs/nilfs2/dat.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index e2a5320f2718..b9c759addd50 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -40,8 +40,21 @@ static inline struct nilfs_dat_info *NILFS_DAT_I(struct inode *dat)
static int nilfs_dat_prepare_entry(struct inode *dat,
struct nilfs_palloc_req *req, int create)
{
- return nilfs_palloc_get_entry_block(dat, req->pr_entry_nr,
- create, &req->pr_entry_bh);
+ int ret;
+
+ ret = nilfs_palloc_get_entry_block(dat, req->pr_entry_nr,
+ create, &req->pr_entry_bh);
+ if (unlikely(ret == -ENOENT)) {
+ nilfs_msg(dat->i_sb, KERN_ERR,
+ "DAT doesn't have a block to manage vblocknr = %llu",
+ (unsigned long long)req->pr_entry_nr);
+ /*
+ * Return internal code -EINVAL to notify bmap layer of
+ * metadata corruption.
+ */
+ ret = -EINVAL;
+ }
+ return ret;
}
static void nilfs_dat_commit_entry(struct inode *dat,
@@ -123,11 +136,7 @@ static void nilfs_dat_commit_free(struct inode *dat,
int nilfs_dat_prepare_start(struct inode *dat, struct nilfs_palloc_req *req)
{
- int ret;
-
- ret = nilfs_dat_prepare_entry(dat, req, 0);
- WARN_ON(ret == -ENOENT);
- return ret;
+ return nilfs_dat_prepare_entry(dat, req, 0);
}
void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req,
@@ -154,10 +163,8 @@ int nilfs_dat_prepare_end(struct inode *dat, struct nilfs_palloc_req *req)
int ret;
ret = nilfs_dat_prepare_entry(dat, req, 0);
- if (ret < 0) {
- WARN_ON(ret == -ENOENT);
+ if (ret < 0)
return ret;
- }
kaddr = kmap_atomic(req->pr_entry_bh->b_page);
entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
--
2.39.3
When bpf_trace_printk is called without any args in a second depth level,
it will enable preemption without disabling it.
These patch series fix this for 5.15 and 6.1. The fix was introduced in
6.3, so later kernels already have it. And 5.10 and earlier did not have
the code that disabled preemption, so they are fine in that regard.
This was tested by attaching a bpf program doing a non-0 arguments
trace_printk at sys_enter and a 0 arguments snprintf at local_timer_entry.
Dave Marchevsky (1):
bpf: Merge printk and seq_printf VARARG max macros
Jiri Olsa (3):
bpf: Add struct for bin_args arg in bpf_bprintf_prepare
bpf: Do cleanup in bpf_bprintf_cleanup only when needed
bpf: Remove trace_printk_lock
include/linux/bpf.h | 14 ++++++--
kernel/bpf/helpers.c | 71 ++++++++++++++++++++++------------------
kernel/bpf/verifier.c | 3 +-
kernel/trace/bpf_trace.c | 39 ++++++++++------------
4 files changed, 72 insertions(+), 55 deletions(-)
--
2.34.1
There are indications that ASPM L0s is not working very well on this
machine so disable it also for the NVMe and modem controllers for now.
Note that this is done as a precaution based on problems with the Wi-Fi
on the X13s as well as the NVMe, modem and Wi-Fi on the sc8280xp-crd
reference design (the NVMe controller on my X13s does not support L0 and
the machine lacks a modem).
Fixes: 9f4f3dfad8cf ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops")
Cc: stable(a)vger.kernel.org # 6.7
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
index 70824294108e..06fc88d5d025 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
+++ b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
@@ -730,6 +730,8 @@ keyboard@68 {
};
&pcie2a {
+ aspm-no-l0s;
+
perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 145 GPIO_ACTIVE_LOW>;
@@ -749,6 +751,8 @@ &pcie2a_phy {
};
&pcie3a {
+ aspm-no-l0s;
+
perst-gpios = <&tlmm 151 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 148 GPIO_ACTIVE_LOW>;
--
2.43.0
There are indications that ASPM L0s is not working very well on this
machine so disable it also for the modem and Wi-Fi controllers for now.
This specifically avoids having the modem and Wi-Fi controllers bounce
in an out of L0s when not used (the modem still bounces in and out of
L1) as well as intermittent Correctable errors on the Wi-Fi link when
not used.
Fixes: 9f4f3dfad8cf ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops")
Cc: stable(a)vger.kernel.org # 6.7
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
arch/arm64/boot/dts/qcom/sc8280xp-crd.dts | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
index 7e94a68d5d9f..8fc0380f65a0 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
+++ b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
@@ -546,6 +546,8 @@ &pcie2a_phy {
};
&pcie3a {
+ aspm-no-l0s;
+
perst-gpios = <&tlmm 151 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 148 GPIO_ACTIVE_LOW>;
@@ -566,6 +568,7 @@ &pcie3a_phy {
&pcie4 {
max-link-speed = <2>;
+ aspm-no-l0s;
perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
--
2.43.0
While debugging the `X86_DECODER_SELFTEST` failure first reported in [1],
we noticed that the line numbers reported by the `insn_decoder_test` tool
do not correspond to the line in the output of `objdump_reformat.awk` that
was causing the failure:
# TEST posttest
llvm-objdump -d -j .text ./vmlinux | \
awk -f ./arch/x86/tools/objdump_reformat.awk | \
arch/x86/tools/insn_decoder_test -y -v
arch/x86/tools/insn_decoder_test: error: malformed line 1657116:
68db0
$ llvm-objdump -d -j .text ./vmlinux | \
awk -f ./arch/x86/tools/objdump_reformat.awk > objdump_reformat.txt
$ head -n `echo 1657116+2 | bc` objdump_reformat.txt | tail -n 5
ffffffff815430b1 41 8b 47 1c movl
ffffffff815430b5 89 c1 movl
ffffffff815430b7 81 c9 00 40 00 00 orl
ffffffff815430bd 41 89 4e 18 movl
ffffffff815430c1 a8 40 testb
These lines are perfectly fine. The reason is that the line count reported
by the tool only includes instruction lines, i.e., it does not count symbol
lines. This behavior was introduced in Commit 35039eb6b199 ("x86: Show
symbol name if insn decoder test failed"), which included symbol lines
in the output of the awk script. This broke the `instuction lines == total
lines` property without accounting for it in `insn_decoder_test.c`.
Add a new variable to count the combined (insn+symbol) line count and
report this in the error message. With this patch, the line reported by the
tool is the line causing the failure (long line wrapped at 75 chars):
# TEST posttest
llvm-objdump -d -j .text ./vmlinux | \
awk -f ./arch/x86/tools/objdump_reformat.awk | \
arch/x86/tools/insn_decoder_test -y -v
arch/x86/tools/insn_decoder_test: error: malformed line 1699686:
68db0
$ head -n ` echo 1699686+2 | bc` objdump_reformat.txt | tail -n 5
ffffffff81568dac c3 retq
<_RNvXsP_NtCs7qddEHlz8fK_4core3fmtRINtNtNtNtB7_4iter8adapters5chain5Chain
INtNtBA_7flatten7FlattenINtNtB7_6option8IntoIterNtNtB7_4char11EscapeDebug
EEINtB1a_7FlatMapNtNtNtB7_3str4iter5CharsB1T_NtB2D_23CharEscapeDebugConti
nueEENtB5_5Debug3fmtB7_>:ffffffff81568db0
ffffffff81568dad 0f 1f 00 nopl
ffffffff81568db0 f3 0f 1e fa endbr64
ffffffff81568db4 41 56 pushq
[In this case the line causing the failure is interpreted as two lines by
the tool (due to its length, but this is fixed by [1, 2]), and the second
line is reported. Still the spatial closeness between the reported line and
the line causing the failure would have made debugging a lot easier.]
Link: https://lore.kernel.org/lkml/Y9ES4UKl%2F+DtvAVS@gmail.com/T/ [1]
Link: https://lore.kernel.org/rust-for-linux/20231119180145.157455-1-sergio.colla… [2]
Fixes: 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
Reviewed-by: Miguel Ojeda <ojeda(a)kernel.org>
Tested-by: Miguel Ojeda <ojeda(a)kernel.org>
Reported-by: John Baublitz <john.m.baublitz(a)gmail.com>
Debugged-by: John Baublitz <john.m.baublitz(a)gmail.com>
Signed-off-by: Valentin Obst <kernel(a)valentinobst.de>
---
Changes in v2:
- Added tags 'Reviewed-by', 'Tested-by', 'Reported-by', 'Debugged-by',
'Link', and 'Fixes'.
- Explain why this patch fixes the commit mentioned in the 'Fixes' tag.
- CCed the stable list and sent to all x86 maintainers.
- Link to v1: https://lore.kernel.org/r/20240221-x86-insn-decoder-line-fix-v1-1-47cd5a171…
---
arch/x86/tools/insn_decoder_test.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/tools/insn_decoder_test.c b/arch/x86/tools/insn_decoder_test.c
index 472540aeabc2..727017a3c3c7 100644
--- a/arch/x86/tools/insn_decoder_test.c
+++ b/arch/x86/tools/insn_decoder_test.c
@@ -114,6 +114,7 @@ int main(int argc, char **argv)
unsigned char insn_buff[16];
struct insn insn;
int insns = 0;
+ int lines = 0;
int warnings = 0;
parse_args(argc, argv);
@@ -123,6 +124,8 @@ int main(int argc, char **argv)
int nb = 0, ret;
unsigned int b;
+ lines++;
+
if (line[0] == '<') {
/* Symbol line */
strcpy(sym, line);
@@ -134,12 +137,12 @@ int main(int argc, char **argv)
strcpy(copy, line);
tab1 = strchr(copy, '\t');
if (!tab1)
- malformed_line(line, insns);
+ malformed_line(line, lines);
s = tab1 + 1;
s += strspn(s, " ");
tab2 = strchr(s, '\t');
if (!tab2)
- malformed_line(line, insns);
+ malformed_line(line, lines);
*tab2 = '\0'; /* Characters beyond tab2 aren't examined */
while (s < tab2) {
if (sscanf(s, "%x", &b) == 1) {
---
base-commit: b401b621758e46812da61fa58a67c3fd8d91de0d
change-id: 20240221-x86-insn-decoder-line-fix-7b1f2e1732ff
Best regards,
--
Valentin Obst <kernel(a)valentinobst.de>
I'm announcing the release of the 5.10.210 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Aaron Conole (1):
net: openvswitch: limit the number of recursions from action sets
Adrian Reber (1):
tty: allow TIOCSLCKTRMIOS with CAP_CHECKPOINT_RESTORE
Al Viro (2):
rename(): fix the locking of subdirectories
fast_dput(): handle underflows gracefully
Aleksander Mazur (1):
x86/Kconfig: Transmeta Crusoe is CPU family 5, not 6
Alex Henrie (1):
HID: apple: Add support for the 2021 Magic Keyboard
Alex Lyakas (1):
md: Whenassemble the array, consult the superblock of the freshest device
Alexander Stein (4):
ARM: dts: imx7d: Fix coresight funnel ports
ARM: dts: imx7s: Fix lcdif compatible
ARM: dts: imx7s: Fix nand-controller #size-cells
mmc: slot-gpio: Allow non-sleeping GPIO ro
Alexandra Winter (1):
s390/qeth: Fix potential loss of L3-IP@ in case of network issues
Alexey Dobriyan (1):
uapi: stddef.h: Fix __DECLARE_FLEX_ARRAY for C++
Alexey Khoroshilov (1):
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
Alfred Piccioni (1):
lsm: new security_file_ioctl_compat() hook
Amelie Delaunay (1):
dmaengine: fix NULL pointer in channel unregistration function
Andrii Nakryiko (1):
selftests/bpf: satisfy compiler by having explicit return in btf test
Andrii Staikov (1):
i40e: Fix VF disable behavior to block all traffic
Andrzej Hajda (1):
debugobjects: Stop accessing objects after releasing hash bucket lock
Andy Shevchenko (1):
mmc: mmc_spi: remove custom DMA mapped buffers
Anna Schumaker (1):
SUNRPC: Fix a suspicious RCU usage warning
Antoine Tenart (1):
tunnels: fix out of bounds access when building IPv6 PMTU error
Anton Ivanov (1):
um: Fix naming clash between UML and scheduler
Arjun Roy (1):
net-zerocopy: Refactor frag-is-remappable test.
Arnd Bergmann (1):
drm/exynos: fix accidental on-stack copy of exynos_drm_plane
Avri Altman (1):
mmc: core: Use mrq.sbc in close-ended ffu
Baokun Li (4):
ext4: unify the type of flexbg_size to unsigned int
ext4: remove unnecessary check from alloc_flex_gd()
ext4: avoid online resizing failures due to oversized flex bg
ext4: fix double-free of blocks due to wrong extents moved_len
Bart Van Assche (1):
scsi: core: Introduce enum scsi_disposition
Benjamin Berg (3):
wifi: cfg80211: free beacon_ies when overridden from hidden BSS
um: Don't use vfprintf() for os_info()
HID: apple: Add 2021 magic keyboard FN key mapping
Bernd Edlinger (1):
exec: Fix error handling in begin_new_exec()
Bjorn Helgaas (2):
PM: sleep: Use dev_printk() when possible
PCI/AER: Decode Requester ID when no error info found
Boris Burkov (2):
btrfs: forbid creating subvol qgroups
btrfs: forbid deleting live subvol qgroup
Breno Leitao (2):
net: sysfs: Fix /sys/class/net/<iface> path
net: sysfs: Fix /sys/class/net/<iface> path for statistics
Carlos Llamas (2):
binder: signal epoll threads of self-work
scripts/decode_stacktrace.sh: optionally use LLVM utilities
Chao Yu (1):
f2fs: fix to check return value of f2fs_reserve_new_block()
Charan Teja Kalla (1):
mm/sparsemem: fix race in accessing memory_section->usage
Chris Riches (1):
audit: Send netlink ACK before setting connection in auditd_set
Christian A. Ehrhardt (2):
of: unittest: Fix compile in the non-dynamic case
usb: ucsi_acpi: Fix command completion handling
Christoph Hellwig (1):
block: prevent an integer overflow in bvec_try_merge_hw_page
Christophe JAILLET (3):
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA
dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA
Chung-Chiang Cheng (1):
btrfs: tree-checker: fix inline ref size in error messages
Cristian Ciocaltea (1):
ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
Dan Carpenter (4):
drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
drm/bridge: nxp-ptn3460: simplify some error checking
netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq()
Daniel Basilio (1):
nfp: use correct macro for LengthSelect in BAR config
Daniel Lezcano (2):
units: change from 'L' to 'UL'
units: add the HZ macros
Daniel Stodden (1):
PCI: switchtec: Fix stdev_release() crash after surprise hot remove
Daniel Vacek (1):
IB/ipoib: Fix mcast list locking
Daniel de Villiers (1):
nfp: flower: prevent re-adding mac index for bonded port
Dave Airlie (1):
nouveau/vmm: don't set addr on the fail path to avoid warning
David Howells (2):
afs: Hide silly-rename files from userspace
rxrpc: Fix response to PING RESPONSE ACKs to a dead call
David Schiller (1):
staging: iio: ad5933: fix type mismatch regression
David Senoner (1):
ALSA: hda/realtek: Fix the external mic not being recognised for Acer Swift 1 SF114-32
David Sterba (2):
btrfs: don't warn if discard range is not aligned to sector
btrfs: send: return EOPNOTSUPP on unknown flags
Davidlohr Bueso (1):
hrtimer: Ignore slack time for RT tasks in schedule_hrtimeout_range()
Dmitry Antipov (1):
PNP: ACPI: fix fortify warning
Dmitry Baryshkov (1):
PM: runtime: add devm_pm_runtime_enable helper
Doug Berger (1):
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
Douglas Anderson (2):
drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend()
Edson Juliano Drosdeck (1):
ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL
Edward Adam Davis (3):
jfs: fix uaf in jfs_evict_inode
jfs: fix array-index-out-of-bounds in diNewExt
wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
Ekansh Gupta (1):
misc: fastrpc: Mark all sessions as invalid in cb_remove
Emmanuel Grumbach (1):
wifi: iwlwifi: fix a memory corruption
Eric Dumazet (9):
llc: make llc_ui_sendmsg() more robust against bonding changes
ip6_tunnel: use dev_sw_netstats_rx_add()
ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()
tcp: add sanity checks to rx zerocopy
llc: call sock_orphan() at release time
af_unix: fix lockdep positive in sk_diag_dump_icons()
inet: read sk->sk_family once in inet_recv_error()
ppp_async: limit MRU to 64K
net: prevent mss overflow in skb_segment()
Fabio Estevam (9):
ARM: dts: imx25/27-eukrea: Fix RTC node name
ARM: dts: imx: Use flash@0,0 pattern
ARM: dts: imx27: Fix sram node
ARM: dts: imx1: Fix sram node
ARM: dts: imx25: Fix the iim compatible string
ARM: dts: imx25/27: Pass timing0
ARM: dts: imx27-apf27dev: Fix LED name
ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
ARM: dts: imx23/28: Fix the DMA controller node name
Fedor Pchelkin (3):
btrfs: ref-verify: free ref cache before clearing mount opt
drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume
nfc: nci: free rx_data_reassembly skb on NCI device cleanup
Felix Kuehling (1):
drm/amdgpu: Let KFD sync with VM fences
Florian Fainelli (1):
net: bcmgenet: Fix EEE implementation
Florian Westphal (5):
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
netfilter: nft_set_pipapo: store index in scratch maps
netfilter: nft_set_pipapo: add helper to release pcpu scratch area
netfilter: nft_set_pipapo: remove scratch_aligned pointer
Frank Li (5):
usb: cdns3: fix uvc failure work since sg support enabled
usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config
usb: cdns3: fix iso transfer error when mult is not zero
usb: cdns3: Fix uvc fail when DMA cross 4k boundery since sg enabled
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
Frederic Weisbecker (1):
hrtimer: Report offline hrtimer enqueue
Frédéric Danis (1):
Bluetooth: L2CAP: Fix possible multiple reject send
Furong Xu (2):
net: stmmac: xgmac: fix handling of DPP safety error for DMA channels
net: stmmac: xgmac: fix a typo of register name in DPP safety handling
Gabriel Krisman Bertazi (1):
ecryptfs: Reject casefold directory inodes
Ghanshyam Agrawal (1):
media: stk1160: Fixed high volume of stk1160_dbg messages
Greg KH (1):
perf/core: Fix narrow startup race when creating the perf nr_addr_filters sysfs file
Greg Kroah-Hartman (1):
Linux 5.10.210
Guanhua Gao (1):
dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools
Guenter Roeck (1):
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
Guilherme G. Piccoli (1):
PCI: Only override AMD USB controller if required
Hannes Reinecke (2):
scsi: libfc: Don't schedule abort twice
scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
Hans de Goede (1):
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
Hardik Gajjar (1):
usb: hub: Replace hardcoded quirk value with BIT() macro
Harshit Shah (1):
i3c: master: cdns: Update maximum prescaler value for i2c clock
Heiko Carstens (2):
s390/ptrace: handle setting of fpc register correctly
KVM: s390: fix setting of fpc register
Heiner Kallweit (2):
leds: trigger: panic: Don't register panic notifier if creating the trigger failed
i2c: i801: Remove i801_set_block_buffer_mode
Helge Deller (2):
parisc/firmware: Fix F-extend for PDC addresses
ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
Herbert Xu (3):
crypto: api - Disallow identical driver names
hwrng: core - Fix page fault dead lock on mmap-ed hwrng
crypto: s390/aes - Fix buffer overread in CTR mode
Hongchen Zhang (1):
PM: hibernate: Enforce ordering during image compression/decompression
Hou Tao (2):
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
bpf: Set uattr->batch.count as zero before batched update or deletion
Hugo Villeneuve (4):
serial: sc16is7xx: set safe default SPI clock frequency
serial: sc16is7xx: add check for unsupported SPI modes during probe
serial: max310x: set default value when reading clock ready bit
serial: max310x: improve crystal stable clock detection
Ian Rogers (1):
libsubcmd: Fix memory leak in uniq()
Ido Schimmel (1):
PCI: Add no PM reset quirk for NVIDIA Spectrum devices
Ilpo Järvinen (2):
serial: Add rs485_supported to uart_port
serial: 8250_exar: Fill in rs485_supported
Ilya Dryomov (1):
rbd: don't move requests to the running list on errors
Ivan Vecera (1):
i40e: Fix waiting for queues of all VSIs to be disabled
Jack Wang (1):
RDMA/IPoIB: Fix error code return in ipoib_mcast_join
JackBB Wu (1):
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
Jaegeuk Kim (1):
f2fs: fix write pointers on zoned device after roll forward
Jai Luthra (1):
dmaengine: ti: k3-udma: Report short packet errors
Jakub Kicinski (1):
selftests: netdevsim: fix the udp_tunnel_nic test
Jan Beulich (1):
xen-netback: properly sync TX responses
Jason Gerecke (1):
HID: wacom: Do not register input devices until after hid_hw_start
Jean Delvare (1):
i2c: i801: Fix block process call transactions
Jedrzej Jagielski (2):
ixgbe: Refactor returning internal error codes
ixgbe: Refactor overtemp event handling
Jenishkumar Maheshbhai Patel (1):
net: mvpp2: clear BM pool before initialization
Jiangfeng Xiao (1):
powerpc/kasan: Fix addr error caused by page alignment
Jiri Wiesner (1):
clocksource: Skip watchdog check for large watchdog intervals
Johan Hovold (3):
arm64: dts: qcom: sdm845: fix USB wakeup interrupt types
arm64: dts: qcom: sdm845: fix USB DP/DM HS PHY interrupts
arm64: dts: qcom: sc7180: fix USB wakeup interrupt types
Johan Jonker (1):
ARM: dts: rockchip: fix rk3036 hdmi ports node
Johannes Berg (1):
wifi: mac80211: reload info pointer in ieee80211_tx_dequeue()
Jonathan Cameron (1):
iio:adc:ad7091r: Move exports into IIO_AD7091R namespace.
Jozsef Kadlecsik (2):
netfilter: ipset: fix performance regression in swap operation
netfilter: ipset: Missing gc cancellations fixed
Julian Wiedmann (1):
net/af_iucv: clean up a try_then_request_module()
Jun'ichi Nomura (1):
x86/boot: Ignore NMIs during very early boot
Junxiao Bi (1):
Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
Justin Tee (1):
scsi: lpfc: Fix possible file string name overflow when updating firmware
Kamal Dasu (1):
spi: bcm-qspi: fix SFDP BFPT read by usig mspi read
Kees Cook (3):
stddef: Introduce DECLARE_FLEX_ARRAY() helper
smb3: Replace smb2pdu 1-element arrays with flex-arrays
block/rnbd-srv: Check for unlikely string overflow
Kim Phillips (1):
crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
Konrad Dybcio (2):
pmdomain: core: Move the unused cleanup to a _sync initcall
drm/msm/dsi: Enable runtime PM
Kuan-Wei Chiu (2):
clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
Kuniyuki Iwashima (1):
llc: Drop support for ETH_P_TR_802_2.
Kunwu Chan (1):
powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
Kuogee Hsieh (1):
drm/msm/dp: return correct Colorimetry for DP_TEST_DYNAMIC_RANGE_CEA case
Lee Duncan (1):
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
Leonard Dallmayr (1):
USB: serial: cp210x: add ID for IMST iM871A-USB
Li zeming (1):
PM: core: Remove unnecessary (void *) conversions
Lin Ma (1):
vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
Lino Sanfilippo (1):
serial: 8250_exar: Set missing rs485_supported flag
Linus Torvalds (1):
sched/membarrier: reduce the ability to hammer on sys_membarrier
Loic Prylli (1):
hwmon: (aspeed-pwm-tacho) mutex for tach reading
Luka Guzenko (1):
ALSA: hda/realtek: Enable Mute LED on HP Laptop 14-fq0xxx
Lukas Schauer (1):
pipe: wakeup wr_wait after setting max_usage
Manas Ghandat (2):
jfs: fix slab-out-of-bounds Read in dtSearch
jfs: fix array-index-out-of-bounds in dbAdjTree
Mao Jinlong (2):
arm64: dts: qcom: msm8996: Fix 'in-ports' is a required property
arm64: dts: qcom: msm8998: Fix 'out-ports' is a required property
Marc Zyngier (1):
irqchip/gic-v3-its: Fix GICv4.1 VPE affinity update
Marcelo Schmitt (3):
iio: adc: ad7091r: Set alert bit in config register
iio: adc: ad7091r: Allow users to configure device events
iio: adc: ad7091r: Enable internal vref if external vref is not supplied
Mario Limonciello (3):
rtc: Adjust failure return code for cmos_set_alarm()
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
iio: accel: bma400: Fix a compilation problem
Mark Rutland (1):
drivers/perf: pmuv3: don't expose SW_INCR event in sysfs
Markus Niebel (1):
drm: panel-simple: add missing bus flags for Tianma tm070jvhg[30/33]
Masami Hiramatsu (Google) (1):
tracing/trigger: Fix to return error if failed to alloc snapshot
Matthew Wilcox (Oracle) (1):
block: Remove special-casing of compound pages
Max Kellermann (2):
fs/pipe: move check to pipe_has_watch_queue()
fs/kernfs/dir: obey S_ISGID
Meenakshikumar Somasundaram (1):
drm/amd/display: Fix tiled display misalignment
Michael Chan (1):
bnxt_en: Wait for FLR to complete during probe
Michael Ellerman (2):
powerpc: Fix build error due to is_valid_bugaddr()
powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
Michael Tretter (1):
media: rockchip: rga: fix swizzling for RGB formats
Miguel Ojeda (1):
scripts: decode_stacktrace: demangle Rust symbols
Mikulas Patocka (1):
dm: limit the number of targets and parameter size area
Ming Lei (3):
blk-mq: fix IO hang from sbitmap wakeup race
scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
Minsuk Kang (1):
wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
Mukesh Ojha (1):
PM / devfreq: Synchronize devfreq_monitor_[start/stop]
Nathan Chancellor (2):
um: net: Fix return type of uml_net_start_xmit()
kbuild: Fix changing ELF file type for output of gen_btf for big endian
Naveen N Rao (1):
powerpc/lib: Validate size for vector operations
Nikita Zhandarovich (1):
net: hsr: remove WARN_ONCE() in send_hsr_supervision_frame()
Niklas Cassel (1):
PCI: dwc: endpoint: Fix dw_pcie_ep_raise_msix_irq() alignment support
Nikolay Borisov (1):
btrfs: remove err variable from btrfs_delete_subvolume
Nuno Sa (1):
of: property: fix typo in io-channels
Oleg Nesterov (3):
afs: fix the usage of read_seqbegin_or_lock() in afs_lookup_volume_rcu()
afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
Oleksandr Tyshchenko (1):
xen/gntdev: Fix the abuse of underlying struct page in DMA-buf import
Oleksij Rempel (2):
spi: introduce SPI_MODE_X_MASK macro
can: j1939: Fix UAF in j1939_sk_match_filter during setsockopt(SO_J1939_FILTER)
Oliver Neukum (1):
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
Omar Sandoval (2):
btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
Ondrej Mosnacek (1):
lsm: fix the logic in security_inode_getsecctx()
Osama Muhammad (2):
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
UBSAN: array-index-out-of-bounds in dtSplitRoot
Pablo Neira Ayuso (8):
netfilter: nf_tables: validate NFPROTO_* family
netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
netfilter: nft_compat: reject unused compat flag
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_ct: reject direction for ct id
netfilter: nft_set_rbtree: skip end interval element from gc
Paolo Abeni (1):
selftests: net: avoid just another constant wait
Paolo Bonzini (2):
mm: vmalloc: introduce array allocation functions
KVM: use __vcalloc for very large allocations
Paul Cercueil (1):
ARM: dts: samsung: exynos4210-i9100: Unconditionally enable LDO12
Pawel Laszczak (1):
usb: cdns3: Fixes for sparse warnings
Peter Robinson (1):
mfd: ti_am335x_tscadc: Fix TI SoC dependencies
Peter Zijlstra (1):
perf: Fix the nr_addr_filters fix
Petr Pavlu (1):
tracing: Ensure visibility when inserting an element into tracing_map
Pierre-Louis Bossart (3):
PCI: add INTEL_HDA_ARL to pci_ids.h
ALSA: hda: Intel: add HDA_ARL PCI ID support
ALSA: hda: intel-dspcfg: add filters for ARL-S and ARL
Piotr Skajewski (1):
ixgbe: Remove non-inclusive language
Prarit Bhargava (1):
ACPI: extlog: fix NULL pointer dereference check
Prashanth K (1):
usb: host: xhci-plat: Add support for XHCI_SG_TRB_CACHE_SIZE_QUIRK
Prathu Baronia (1):
vhost: use kzalloc() instead of kmalloc() followed by memset()
Puliang Lu (1):
USB: serial: option: add Fibocom FM101-GL variant
Qiang Yu (1):
bus: mhi: host: Drop chan lock before queuing buffers
Qu Wenruo (2):
btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
btrfs: do not ASSERT() if the newly created subvolume already got read
Radek Krejci (1):
modpost: trim leading spaces when processing source files list
Rafael J. Wysocki (5):
async: Split async_schedule_node_domain()
async: Introduce async_schedule_dev_nocall()
PM: sleep: Avoid calling put_device() under dpm_list_mtx
PM: sleep: Fix possible deadlocks in core system-wide PM code
PM: sleep: Fix error handling in dpm_prepare()
Richard Palethorpe (1):
x86/entry/ia32: Ensure s32 is sign extended to s64
Rishabh Dave (1):
ceph: prevent use-after-free in encode_cap_msg()
Rob Clark (1):
drm/msm/dpu: Ratelimit framedone timeout msgs
Rolf Eike Beer (1):
mm: use __pfn_to_section() instead of open coding it
Rui Zhang (1):
regulator: core: Only increment use_count when enable_count changes
Ryusuke Konishi (4):
nilfs2: fix data corruption in dsync block recovery for small block sizes
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
nilfs2: fix potential bug in end_buffer_async_write
nilfs2: replace WARN_ONs for invalid DAT metadata block requests
Salvatore Dipietro (1):
tcp: Add memory barrier to tcp_push()
Sandeep Maheswaram (1):
arm64: dts: qcom: sc7180: Use pdc interrupts for USB instead of GIC interrupts
Schspa Shi (1):
scripts/decode_stacktrace.sh: support old bash version
Sean Young (1):
media: rc: bpf attach/detach requires write permission
Serge Semin (1):
mips: Fix max_mapnr being uninitialized on early stages
Shannon Nelson (1):
ionic: pass opcode to devcmd_wait
Sharath Srinivasan (1):
net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
Shenwei Wang (1):
net: fec: fix the unhandled context fault from smmu
Shigeru Yoshida (1):
tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
Shiji Yang (1):
wifi: rt2x00: restart beacon queue when hardware reset
Shuai Xue (1):
ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events
Simon Horman (1):
net: stmmac: xgmac: use #define for string constants
Sjoerd Simons (1):
bus: moxtet: Add spi device table
Souradeep Chakrabarti (1):
hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove
Srinivasan Shanmugam (3):
drm/amdgpu: Drop 'fence' check in 'to_amdgpu_amdkfd_fence()'
drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()'
drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
Stephen Boyd (1):
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
Stephen Rothwell (2):
powerpc: pmd_move_must_withdraw() is only needed for CONFIG_TRANSPARENT_HUGEPAGE
drm: using mul_u32_u32() requires linux/math64.h
Steve Wahl (1):
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
Steven Rostedt (Google) (2):
tracing: Fix wasted memory in saved_cmdlines logic
tracing: Inform kmemleak of saved_cmdlines allocation
Su Hui (3):
wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
media: ddbridge: fix an error code problem in ddb_probe
scsi: isci: Fix an error code problem in isci_io_request_build()
Suraj Jitindar Singh (1):
ext4: allow for the last group to be marked as trimmed
Takashi Iwai (1):
ALSA: hda: Refer to correct stream index at loops
Takashi Sakamoto (1):
firewire: core: correct documentation of fw_csr_string() kernel API
Tatsunosuke Tobita (1):
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
Tejun Heo (1):
blk-iocost: Fix an UBSAN shift-out-of-bounds warning
Thomas Bourgoin (1):
crypto: stm32/crc32 - fix parsing list of devices
Tianjia Zhang (1):
crypto: lib/mpi - Fix unexpected pointer access in mpi_ec_init
Tim Chen (1):
tick/sched: Preserve number of idle sleeps across CPU hotplug events
Tobias Waldekranz (1):
net: dsa: mv88e6xxx: Fix mv88e6352_serdes_get_stats error path
Tomi Valkeinen (4):
drm/tidss: Fix atomic_flush check
drm/drm_file: fix use of uninitialized variable
drm/framebuffer: Fix use of uninitialized variable
drm/mipi-dsi: Fix detach call without attach
Tony Lindgren (1):
phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
Uwe Kleine-König (1):
spi: ppc4xx: Drop write-only variable
Vegard Nossum (1):
scripts/get_abi: fix source path leak
Ville Syrjälä (1):
drm: Don't unref the same fb many times by mistake due to deadlock handling
Vincent Donnefort (1):
ring-buffer: Clean ring_buffer_poll_wait() error return
Weichen Chen (1):
pstore/ram: Fix crash when setting number of cpus to an odd number
Wen Gu (1):
net/smc: fix illegal rmb_desc access in SMC-D connection dump
Wenhua Lin (1):
gpio: eic-sprd: Clear interrupt after set the interrupt type
Werner Fischer (1):
watchdog: it87_wdt: Keep WDTCTRL bit 3 unmodified for IT8784/IT8786
Werner Sembach (1):
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Xi Ruoyao (1):
mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
Xiang Yang (1):
Revert "arm64: Stash shadow stack pointer in the task struct on interrupt"
Xiaolei Wang (1):
rpmsg: virtio: Free driver_override when rpmsg_remove()
Xiubo Li (1):
ceph: fix deadlock or deadcode of misusing dget()
Ye Bin (1):
ext4: fix inconsistent between segment fstrim and full fstrim
Yevgeny Kliteynik (1):
net/mlx5: DR, Use the right GVMI number for drop action
Yonghong Song (1):
selftests/bpf: Fix pyperf180 compilation failure with clang18
Yoshihiro Shimoda (1):
phy: renesas: rcar-gen3-usb2: Fix returning wrong error code
Yuluo Qiu (1):
ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
Zach O'Keefe (1):
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
Zenm Chen (1):
wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
Zhang Rui (2):
hwmon: (coretemp) Fix out-of-bounds memory access
hwmon: (coretemp) Fix bogus core_id to attr name mapping
Zheng Wang (1):
media: mtk-jpeg: Fix use after free bug due to error path handling in mtk_jpeg_dec_device_run
Zhengchao Shao (5):
tcp: make sure init the accept_queue's spinlocks once
netlink: fix potential sleeping issue in mqueue_flush_file
ipv6: init the accept_queue's spinlocks in inet6_create
bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
bonding: remove print in bond_verify_device_path
Zhihao Cheng (1):
ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path
Zhipeng Lu (5):
net/mlx5e: fix a double-free in arfs_create_groups
fjes: fix memleaks in fjes_hw_setup
net: ipv4: fix a memleak in ip_setup_cork
atm: idt77252: fix a memleak in open_card_ubr0
media: ir_toy: fix a memleak in irtoy_tx
Zhiquan Li (1):
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Zhu Yanjun (1):
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
Zijun Hu (1):
Bluetooth: qca: Set both WIDEBAND_SPEECH and LE_STATES quirks for QCA2066
bo liu (1):
ALSA: hda/conexant: Add quirk for SWS JS201D
ching Huang (1):
scsi: arcmsr: Support new PCI device IDs 1883 and 1886
qizhong cheng (1):
PCI: mediatek: Clear interrupt status before dispatching handler
yuan linyu (1):
usb: f_mass_storage: forbid async queue when shutdown happen
zhili.liu (1):
iio: magnetometer: rm3100: add boundary check for the value read from RM3100_REG_TMRC
Hi,
Please include commit be80e9cdbca8 ("libbpf: Rename DECLARE_LIBBPF_OPTS
into LIBBPF_OPTS")
to the 5.15 stable branch.
Commit 3eefb2fbf4ec ("selftests/bpf: Test tail call counting with
bpf2bpf and data on stack")
introduced in v5.15.39 is dependent on it, and now building selftests
fails with:
...
linux/tools/testing/selftests/bpf/prog_tests/tailcalls.c: In function
‘test_tailcall_bpf2bpf_6’:
linux/tools/testing/selftests/bpf/prog_tests/tailcalls.c:822:9: warning:
implicit declaration of function ‘LIBBPF_OPTS’; did you mean
‘LIBBPF_API’? [-Wimplicit-function-declaration]
822 | LIBBPF_OPTS(bpf_test_run_opts, topts,
| ^~~~~~~~~~~
| LIBBPF_API
linux/tools/testing/selftests/bpf/prog_tests/tailcalls.c:822:21: error:
‘bpf_test_run_opts’ undeclared (first use in this function)
822 | LIBBPF_OPTS(bpf_test_run_opts, topts,
| ^~~~~~~~~~~~~~~~~
linux/tools/testing/selftests/bpf/prog_tests/tailcalls.c:822:21: note:
each undeclared identifier is reported only once for each function it
appears in
linux/tools/testing/selftests/bpf/prog_tests/tailcalls.c:822:40:
error: ‘topts’ undeclared (first use in this function)
822 | LIBBPF_OPTS(bpf_test_run_opts, topts,
| ^~~~~
tools/testing/selftests/bpf/prog_tests/tailcalls.c:823:17: error:
expected expression before ‘.’ token
823 | .data_in = &pkt_v4,
| ^
...
Thanks in advance,
Roxana
Fix kernel freeze reproduced when probing stmmac devices on kernel 4.19:
Upstream commit 474a31e13a4e9749fb3ee55794d69d0f17ee0998 to fix freeze and
upstream commit 8d72ab119f42f25abb393093472ae0ca275088b6 to apply the fix correctly.
Hi,
The commit `c1fc6484e1fb sched/rt: sysctl_sched_rr_timeslice show
default timeslice after reset` is a clean cherry-pick for v5.4+ kernels
and the commit `079be8fc6309 sched/rt: sysctl_sched_rr_timeslice show
default timeslice after reset` cleanly applicable for v6.1
These are trivial fixes which fix the parsing the negative values when
read from the userspace, but will also make LTP test `proc_sched_rt01.c`
happy instead of ignoring it.
-MNAdam
Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of
-EPROBE_DEFER")).
Move registration of the typec switch to after looking up clocks and
other resources.
Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).
Fixes: 2851117f8f42 ("phy: qcom-qmp-combo: Introduce orientation switching")
Cc: stable(a)vger.kernel.org # 6.5
Cc: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index e19d6a084f10..17c4ad7553a5 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -3562,10 +3562,6 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
return ret;
- ret = qmp_combo_typec_switch_register(qmp);
- if (ret)
- return ret;
-
/* Check for legacy binding with child nodes. */
usb_np = of_get_child_by_name(dev->of_node, "usb3-phy");
if (usb_np) {
@@ -3585,6 +3581,10 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
goto err_node_put;
+ ret = qmp_combo_typec_switch_register(qmp);
+ if (ret)
+ goto err_node_put;
+
ret = drm_aux_bridge_register(dev);
if (ret)
goto err_node_put;
--
2.43.0
Due to a long-standing issue in driver core, drivers may not probe defer
after having registered child devices to avoid triggering a probe
deferral loop (see fbc35b45f9f6 ("Add documentation on meaning of
-EPROBE_DEFER")).
This could potentially also trigger a bug in the DRM bridge
implementation which does not expect bridges to go away even if device
links may avoid triggering this (when enabled).
Move registration of the DRM aux bridge to after looking up clocks and
other resources.
Note that PHY creation can in theory also trigger a probe deferral when
a 'phy' supply is used. This does not seem to affect the QMP PHY driver
but the PHY subsystem should be reworked to address this (i.e. by
separating initialisation and registration of the PHY).
Fixes: 35921910bbd0 ("phy: qcom: qmp-combo: switch to DRM_AUX_BRIDGE")
Fixes: 1904c3f578dc ("phy: qcom-qmp-combo: Introduce drm_bridge")
Cc: stable(a)vger.kernel.org # 6.5
Cc: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/phy/qualcomm/phy-qcom-qmp-combo.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index 1ad10110dd25..e19d6a084f10 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -3566,10 +3566,6 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
return ret;
- ret = drm_aux_bridge_register(dev);
- if (ret)
- return ret;
-
/* Check for legacy binding with child nodes. */
usb_np = of_get_child_by_name(dev->of_node, "usb3-phy");
if (usb_np) {
@@ -3589,6 +3585,10 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
goto err_node_put;
+ ret = drm_aux_bridge_register(dev);
+ if (ret)
+ goto err_node_put;
+
pm_runtime_set_active(dev);
ret = devm_pm_runtime_enable(dev);
if (ret)
--
2.43.0
The PTDMA driver sets DMA masks in two different places for the same
device inconsistently. First call is in pt_pci_probe(), where it uses
48bit mask. The second call is in pt_dmaengine_register(), where it
uses a 64bit mask. Using 64bit dma mask causes IO_PAGE_FAULT errors
on DMA transfers between main memory and other devices.
Without the extra call it works fine. Additionally the second call
doesn't check the return value so it can silently fail.
Remove the superfluous dma_set_mask() call and only use 48bit mask.
Cc: stable(a)vger.kernel.org
Fixes: b0b4a6b10577 ("dmaengine: ptdma: register PTDMA controller as a DMA resource")
Signed-off-by: Tadeusz Struk <tstruk(a)gigaio.com>
---
drivers/dma/ptdma/ptdma-dmaengine.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/dma/ptdma/ptdma-dmaengine.c b/drivers/dma/ptdma/ptdma-dmaengine.c
index 1aa65e5de0f3..f79240734807 100644
--- a/drivers/dma/ptdma/ptdma-dmaengine.c
+++ b/drivers/dma/ptdma/ptdma-dmaengine.c
@@ -385,8 +385,6 @@ int pt_dmaengine_register(struct pt_device *pt)
chan->vc.desc_free = pt_do_cleanup;
vchan_init(&chan->vc, dma_dev);
- dma_set_mask_and_coherent(pt->dev, DMA_BIT_MASK(64));
-
ret = dma_async_device_register(dma_dev);
if (ret)
goto err_reg;
--
2.43.2
Hello,
Do you need help with your ongoing or any new Project in the following?
*Followed are our key services*-
· Mobile app development (iOS and Android)
· Custom software development
Let me know if you need help with the above services and would be happy to
discuss over a no-obligation consultation on how you may plan your project
and how we may help.
Look forward to your reply.
Thanks & Regards
Anjali
For LUN crossing boundaries, it is handy to know what is the index of
the last page in a LUN. This helper will soon be reused. At the same
time I rename page_per_lun to ppl in the calling function to clarify the
lines.
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
This is a dependency for the next patch, so I Cc'd stable on it as well.
---
drivers/mtd/nand/raw/nand_base.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index bcfd99a1699f..d6a27e08b112 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1211,19 +1211,25 @@ static int nand_lp_exec_read_page_op(struct nand_chip *chip, unsigned int page,
return nand_exec_op(chip, &op);
}
+static unsigned int rawnand_last_page_of_lun(unsigned int pages_per_lun, unsigned int lun)
+{
+ /* lun is expected to be very small */
+ return (lun * pages_per_lun) + pages_per_lun - 1;
+}
+
static void rawnand_cap_cont_reads(struct nand_chip *chip)
{
struct nand_memory_organization *memorg;
- unsigned int pages_per_lun, first_lun, last_lun;
+ unsigned int ppl, first_lun, last_lun;
memorg = nanddev_get_memorg(&chip->base);
- pages_per_lun = memorg->pages_per_eraseblock * memorg->eraseblocks_per_lun;
- first_lun = chip->cont_read.first_page / pages_per_lun;
- last_lun = chip->cont_read.last_page / pages_per_lun;
+ ppl = memorg->pages_per_eraseblock * memorg->eraseblocks_per_lun;
+ first_lun = chip->cont_read.first_page / ppl;
+ last_lun = chip->cont_read.last_page / ppl;
/* Prevent sequential cache reads across LUN boundaries */
if (first_lun != last_lun)
- chip->cont_read.pause_page = first_lun * pages_per_lun + pages_per_lun - 1;
+ chip->cont_read.pause_page = rawnand_last_page_of_lun(ppl, first_lun);
else
chip->cont_read.pause_page = chip->cont_read.last_page;
}
--
2.34.1
From: Rob Clark <robdclark(a)chromium.org>
We need to bail out before adding/removing devices if we are going to
-EPROBE_DEFER. Otherwise boot can get stuck in a probe deferral loop due
to a long-standing issue in driver core (see fbc35b45f9f6 ("Add
documentation on meaning of -EPROBE_DEFER")).
Deregistering the altmode child device can potentially also trigger bugs
in the DRM bridge implementation, which does not expect bridges to go
away.
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
Link: https://lore.kernel.org/r/20231213210644.8702-1-robdclark@gmail.com
[ johan: rebase on 6.8-rc4, amend commit message and mention DRM ]
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Cc: stable(a)vger.kernel.org # 6.3
Cc: Bjorn Andersson <andersson(a)kernel.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/soc/qcom/pmic_glink.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index f4bfd24386f1..f913e9bd57ed 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -265,10 +265,17 @@ static int pmic_glink_probe(struct platform_device *pdev)
pg->client_mask = *match_data;
+ pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
+ if (IS_ERR(pg->pdr)) {
+ ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr),
+ "failed to initialize pdr\n");
+ return ret;
+ }
+
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) {
ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi");
if (ret)
- return ret;
+ goto out_release_pdr_handle;
}
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) {
ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode");
@@ -281,17 +288,11 @@ static int pmic_glink_probe(struct platform_device *pdev)
goto out_release_altmode_aux;
}
- pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
- if (IS_ERR(pg->pdr)) {
- ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
- goto out_release_aux_devices;
- }
-
service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd");
if (IS_ERR(service)) {
ret = dev_err_probe(&pdev->dev, PTR_ERR(service),
"failed adding pdr lookup for charger_pd\n");
- goto out_release_pdr_handle;
+ goto out_release_aux_devices;
}
mutex_lock(&__pmic_glink_lock);
@@ -300,8 +301,6 @@ static int pmic_glink_probe(struct platform_device *pdev)
return 0;
-out_release_pdr_handle:
- pdr_handle_release(pg->pdr);
out_release_aux_devices:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT))
pmic_glink_del_aux_device(pg, &pg->ps_aux);
@@ -311,6 +310,8 @@ static int pmic_glink_probe(struct platform_device *pdev)
out_release_ucsi_aux:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI))
pmic_glink_del_aux_device(pg, &pg->ucsi_aux);
+out_release_pdr_handle:
+ pdr_handle_release(pg->pdr);
return ret;
}
--
2.43.0
On Wed, Feb 21, 2024 at 10:00 AM Valentin Obst <kernel(a)valentinobst.de> wrote:
>
> While debugging the `X86_DECODER_SELFTEST` failure first reported in [1],
> [In this case the line causing the failure is interpreted as two lines by
> the tool (due to its length, but this is fixed by [1, 2]), and the second
> line is reported. Still the spatial closeness between the reported line and
> the line causing the failure would have made debugging a lot easier.]
Thanks Valentin, John et al. for digging into this (and the related
issue) -- very much appreciated.
It looks good to me:
Reviewed-by: Miguel Ojeda <ojeda(a)kernel.org>
Tested-by: Miguel Ojeda <ojeda(a)kernel.org>
This should probably have a Fixes tag -- from a quick look, the
original test did not seem to have the problem because `insns` was
equivalent to the number of lines since there was no `if ... {
continue; }` for the symbol case. At some point that branch was added,
so that was not true anymore, thus that one should probably be the
Fixes tag, though please double-check:
Fixes: 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
It is a minor issue for sure, so perhaps not worth backporting, but
nevertheless the hash is in a very old kernel, and thus the issue
applies to all stable kernels. So it does not hurt flagging it to the
stable team:
Cc: stable(a)vger.kernel.org
In addition, John reported the original issue, but this one was found
due to that one, and I am not exactly sure who did what here. Please
consider whether one of the following (or similar) may be fair:
Reported-by: John Baublitz <john.m.baublitz(a)gmail.com>
Debugged-by: John Baublitz <john.m.baublitz(a)gmail.com>
An extra Link to the discussion in Zulip could be nice too:
Link: https://rust-for-linux.zulipchat.com/#narrow/stream/291565-Help/topic/insn_…
Finally, a nit: links are typically written like the following -- you
can still use bracket references at the end:
Link: https://lore.kernel.org/lkml/Y9ES4UKl%2F+DtvAVS@gmail.com/T/ [1]
Link: https://lore.kernel.org/rust-for-linux/20231119180145.157455-1-sergio.colla…
[2]
Cheers,
Miguel
The dwc3->gadget_driver is not initialized during the dwc3 probe
process. This leads to a warning when the runtime power management (PM)
attempts to suspend the gadget using dwc3_gadget_suspend().
This patch adds a check to prevent the warning.
Cc: stable(a)vger.kernel.org
Fixes: 61a348857e86 ("usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend")
Signed-off-by: Ray Chi <raychi(a)google.com>
---
drivers/usb/dwc3/gadget.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 28f49400f3e8..de987cffe1ec 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4708,6 +4708,9 @@ int dwc3_gadget_suspend(struct dwc3 *dwc)
unsigned long flags;
int ret;
+ if (!dwc->gadget_driver)
+ return 0;
+
ret = dwc3_gadget_soft_disconnect(dwc);
if (ret)
goto err;
--
2.44.0.rc0.258.g7320e95886-goog
psci_init_system_suspend() invokes suspend_set_ops() very early during
bootup even before kernel command line for mem_sleep_default is setup.
This leads to kernel command line mem_sleep_default=s2idle not working
as mem_sleep_current gets changed to deep via suspend_set_ops() and never
changes back to s2idle.
Move psci_init_system_suspend() to late_initcall() to make sure kernel
command line mem_sleep_default=s2idle sets up s2idle as default suspend
mode.
Fixes: faf7ec4a92c0 ("drivers: firmware: psci: add system suspend support")
CC: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Maulik Shah <quic_mkshah(a)quicinc.com>
---
drivers/firmware/psci/psci.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index d9629ff87861..655a2db70a67 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -523,18 +523,26 @@ static void __init psci_init_system_reset2(void)
psci_system_reset2_supported = true;
}
-static void __init psci_init_system_suspend(void)
+static int __init psci_init_system_suspend(void)
{
int ret;
+ u32 ver;
if (!IS_ENABLED(CONFIG_SUSPEND))
- return;
+ return 0;
+
+ ver = psci_0_2_get_version();
+ if (PSCI_VERSION_MAJOR(ver) < 1)
+ return 0;
ret = psci_features(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND));
if (ret != PSCI_RET_NOT_SUPPORTED)
suspend_set_ops(&psci_suspend_ops);
+
+ return ret;
}
+late_initcall(psci_init_system_suspend)
static void __init psci_init_cpu_suspend(void)
{
@@ -651,7 +659,6 @@ static int __init psci_probe(void)
if (PSCI_VERSION_MAJOR(ver) >= 1) {
psci_init_smccc();
psci_init_cpu_suspend();
- psci_init_system_suspend();
psci_init_system_reset2();
kvm_init_hyp_services();
}
---
base-commit: d37e1e4c52bc60578969f391fb81f947c3e83118
change-id: 20240219-suspend_ops_late_init-27fb0b15baee
Best regards,
--
Maulik Shah <quic_mkshah(a)quicinc.com>
I'm new here, first time reporting a regression, apologies in advance if
I'm doing something wrong of if this was already reported (I found some
CIFS issues but not exactly this one).
I'm using x86-64 Arch Linux and LTS kernel (6.1.71 as I write this) and
I noticed a regression that I could reproduce in other boxes with other
architectures as well (aarch64 with 6.1.70).
# mount.cifs //server/share /mnt
# mount
//server/share on /mnt type cifs (rw,relatime,vers=3.1.1...)
# cd /mnt
# df .
df: .: Resource temporarily unavailable
# ls -al
ls: .: Resource temporarily unavailable
ls: file1: Resource temporarily unavailable
ls: file2: Resource temporarily unavailable
[...then ls shows the listing...]
If I use strace with df, the problem is:
statfs(".", 0x.....) = -1 EAGAIN (Resource temporarily unavailable)
And with ls:
listxattr(".", 0x..., 152): -1 EAGAIN (Resource temporarily unavailable)
listxattr("file1", ..., 152): -1 EAGAIN (same as above)
...
Initially I thought the problem was with the Samba server and/or the
client mount flags, but I've spent a day trying a *lot* of different
combinations and nothing worked. This happens with any share that I try,
and I've tried mounting shares from multiple Linux boxes running
different Samba and kernel versions.
Then I tried changing kernel versions at my client box. I booted latest
6.6.9 and the problem simply disappeared. My Debian server with 6.5.11
also doesn't have it. I then started a VM and tried a "bisection" of
6.1.x versions, leading to kernel 6.1.70 when this started to happen.
6.1.69 and older look fine.
I hope that this is enough information to reproduce this issue. I will
be glad to provide more info if necessary.
// Leonardo.
When debugging issues with a workload using SysV shmem, Michal Hocko has
come up with a reproducer that shows how a series of mprotect()
operations can result in an elevated shm_nattch and thus leak of the
resource.
The problem is caused by wrong assumptions in vma_merge() commit
714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in
mergeability test"). The shmem vmas have a vma_ops->close callback
that decrements shm_nattch, and we remove the vma without calling it.
vma_merge() has thus historically avoided merging vma's with
vma_ops->close and commit 714965ca8252 was supposed to keep it that way.
It relaxed the checks for vma_ops->close in can_vma_merge_after()
assuming that it is never called on a vma that would be a candidate for
removal. However, the vma_merge() code does also use the result of this
check in the decision to remove a different vma in the merge case 7.
A robust solution would be to refactor vma_merge() code in a way that
the vma_ops->close check is only done for vma's that are actually going
to be removed, and not as part of the preliminary checks. That would
both solve the existing bug, and also allow additional merges that the
checks currently prevent unnecessarily in some cases.
However to fix the existing bug first with a minimized risk, and for
easier stable backports, this patch only adds a vma_ops->close check to
the buggy case 7 specifically. All other cases of vma removal are
covered by the can_vma_merge_before() check that includes the test for
vma_ops->close.
The reproducer code, adapted from Michal Hocko's code:
int main(int argc, char *argv[]) {
int segment_id;
size_t segment_size = 20 * PAGE_SIZE;
char * sh_mem;
struct shmid_ds shmid_ds;
key_t key = 0x1234;
segment_id = shmget(key, segment_size,
IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
sh_mem = (char *)shmat(segment_id, NULL, 0);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE);
mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
shmdt(sh_mem);
shmctl(segment_id, IPC_STAT, &shmid_ds);
printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch);
if (shmctl(segment_id, IPC_RMID, 0))
printf("IPCRM failed %d\n", errno);
return (shmid_ds.shm_nattch) ? 1 : 0;
}
Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Michal Hocko <mhocko(a)suse.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
---
v2: deduplicate code, per Lorenzo
mm/mmap.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..3281287771c9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -954,13 +954,21 @@ static struct vm_area_struct
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing prev vma, so it skipped the check
+ * for vm_ops->close, but we are removing curr
+ */
+ if (curr->vm_ops && curr->vm_ops->close)
+ err = -EINVAL;
remove = curr;
} else { /* case 5 */
adjust = curr;
adj_start = (end - curr->vm_start);
}
+ if (!err)
+ err = dup_anon_vma(prev, curr, &anon_dup);
}
} else { /* merge_next */
vma_start_write(next);
--
2.43.1
From: Fabio Estevam <festevam(a)denx.de>
Since commit bfac19e239a7 ("fbdev: mx3fb: Remove the driver") backlight
is no longer functional.
The fbdev mx3fb driver used to automatically select
CONFIG_BACKLIGHT_CLASS_DEVICE.
Now that the mx3fb driver has been removed, enable the
CONFIG_BACKLIGHT_CLASS_DEVICE option so that backlight can still work
by default.
Tested on a imx6dl-sabresd board.
Cc: stable(a)vger.kernel.org
Fixes: bfac19e239a7 ("fbdev: mx3fb: Remove the driver")
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
arch/arm/configs/imx_v6_v7_defconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 0a90583f9f01..8f9dbe8d9029 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -297,6 +297,7 @@ CONFIG_FB_MODE_HELPERS=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_L4F00242T03=y
CONFIG_LCD_PLATFORM=y
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_PWM=y
CONFIG_BACKLIGHT_GPIO=y
CONFIG_FRAMEBUFFER_CONSOLE=y
--
2.34.1
The quilt patch titled
Subject: fat: fix uninitialized field in nostale filehandles
has been removed from the -mm tree. Its filename was
fat-fix-uninitialized-field-in-nostale-filehandles.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: fat: fix uninitialized field in nostale filehandles
Date: Mon, 5 Feb 2024 13:26:26 +0100
When fat_encode_fh_nostale() encodes file handle without a parent it
stores only first 10 bytes of the file handle. However the length of the
file handle must be a multiple of 4 so the file handle is actually 12
bytes long and the last two bytes remain uninitialized. This is not
great at we potentially leak uninitialized information with the handle
to userspace. Properly initialize the full handle length.
Link: https://lkml.kernel.org/r/20240205122626.13701-1-jack@suse.cz
Reported-by: syzbot+3ce5dea5b1539ff36769(a)syzkaller.appspotmail.com
Fixes: ea3983ace6b7 ("fat: restructure export_operations")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Acked-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Amir Goldstein <amir73il(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/nfs.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/fs/fat/nfs.c~fat-fix-uninitialized-field-in-nostale-filehandles
+++ a/fs/fat/nfs.c
@@ -130,6 +130,12 @@ fat_encode_fh_nostale(struct inode *inod
fid->parent_i_gen = parent->i_generation;
type = FILEID_FAT_WITH_PARENT;
*lenp = FAT_FID_SIZE_WITH_PARENT;
+ } else {
+ /*
+ * We need to initialize this field because the fh is actually
+ * 12 bytes long
+ */
+ fid->parent_i_pos_hi = 0;
}
return type;
_
Patches currently in -mm which might be from jack(a)suse.cz are
shmem-properly-report-quota-mount-options.patch
The quilt patch titled
Subject: bounds: support non-power-of-two CONFIG_NR_CPUS
has been removed from the -mm tree. Its filename was
bounds-support-non-power-of-two-config_nr_cpus.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: bounds: support non-power-of-two CONFIG_NR_CPUS
Date: Tue, 10 Oct 2023 15:55:49 +0100
ilog2() rounds down, so for example when PowerPC 85xx sets CONFIG_NR_CPUS
to 24, we will only allocate 4 bits to store the number of CPUs instead of
5. Use bits_per() instead, which rounds up. Found by code inspection.
The effect of this would probably be a misaccounting when doing NUMA
balancing, so to a user, it would only be a performance penalty. The
effects may be more wide-spread; it's hard to tell.
Link: https://lkml.kernel.org/r/20231010145549.1244748-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Fixes: 90572890d202 ("mm: numa: Change page last {nid,pid} into {cpu,pid}")
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/bounds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/bounds.c~bounds-support-non-power-of-two-config_nr_cpus
+++ a/kernel/bounds.c
@@ -19,7 +19,7 @@ int main(void)
DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS);
DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES);
#ifdef CONFIG_SMP
- DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
+ DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS));
#endif
DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t));
#ifdef CONFIG_LRU_GEN
_
Patches currently in -mm which might be from willy(a)infradead.org are
writeback-remove-a-duplicate-prototype-for-tag_pages_for_writeback.patch
writeback-factor-folio_prepare_writeback-out-of-write_cache_pages.patch
writeback-factor-writeback_get_batch-out-of-write_cache_pages.patch
writeback-simplify-the-loops-in-write_cache_pages.patch
pagevec-add-ability-to-iterate-a-queue.patch
writeback-use-the-folio_batch-queue-iterator.patch
writeback-move-the-folio_prepare_writeback-loop-out-of-write_cache_pages.patch
writeback-remove-a-use-of-write_cache_pages-from-do_writepages.patch
Hi,
this series does basically two things:
1. Disables automatic load balancing as adviced by the hardware
workaround.
2. Assigns all the CCS slices to one single user engine. The user
will then be able to query only one CCS engine
Changelog
=========
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI
engine list and adapt the engine counting accordingly (thanks
Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (2):
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/gt/intel_engine_user.c | 9 +++++++++
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 3 +++
drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++
drivers/gpu/drm/i915/i915_query.c | 1 +
5 files changed, 30 insertions(+)
--
2.43.0
The patch titled
Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close
Date: Thu, 22 Feb 2024 22:59:31 +0100
When debugging issues with a workload using SysV shmem, Michal Hocko has
come up with a reproducer that shows how a series of mprotect() operations
can result in an elevated shm_nattch and thus leak of the resource.
The problem is caused by wrong assumptions in vma_merge() commit
714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in
mergeability test"). The shmem vmas have a vma_ops->close callback that
decrements shm_nattch, and we remove the vma without calling it.
vma_merge() has thus historically avoided merging vma's with
vma_ops->close and commit 714965ca8252 was supposed to keep it that way.
It relaxed the checks for vma_ops->close in can_vma_merge_after() assuming
that it is never called on a vma that would be a candidate for removal.
However, the vma_merge() code does also use the result of this check in
the decision to remove a different vma in the merge case 7.
A robust solution would be to refactor vma_merge() code in a way that the
vma_ops->close check is only done for vma's that are actually going to be
removed, and not as part of the preliminary checks. That would both solve
the existing bug, and also allow additional merges that the checks
currently prevent unnecessarily in some cases.
However to fix the existing bug first with a minimized risk, and for
easier stable backports, this patch only adds a vma_ops->close check to
the buggy case 7 specifically. All other cases of vma removal are covered
by the can_vma_merge_before() check that includes the test for
vma_ops->close.
The reproducer code, adapted from Michal Hocko's code:
int main(int argc, char *argv[]) {
int segment_id;
size_t segment_size = 20 * PAGE_SIZE;
char * sh_mem;
struct shmid_ds shmid_ds;
key_t key = 0x1234;
segment_id = shmget(key, segment_size,
IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
sh_mem = (char *)shmat(segment_id, NULL, 0);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE);
mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
shmdt(sh_mem);
shmctl(segment_id, IPC_STAT, &shmid_ds);
printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch);
if (shmctl(segment_id, IPC_RMID, 0))
printf("IPCRM failed %d\n", errno);
return (shmid_ds.shm_nattch) ? 1 : 0;
}
Link: https://lkml.kernel.org/r/20240222215930.14637-2-vbabka@suse.cz
Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/mm/mmap.c~mm-mmap-fix-vma_merge-case-7-with-vma_ops-close
+++ a/mm/mmap.c
@@ -954,13 +954,21 @@ static struct vm_area_struct
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing prev vma, so it skipped the check
+ * for vm_ops->close, but we are removing curr
+ */
+ if (curr->vm_ops && curr->vm_ops->close)
+ err = -EINVAL;
remove = curr;
} else { /* case 5 */
adjust = curr;
adj_start = (end - curr->vm_start);
}
+ if (!err)
+ err = dup_anon_vma(prev, curr, &anon_dup);
}
} else { /* merge_next */
vma_start_write(next);
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch
When debugging issues with a workload using SysV shmem, Michal Hocko has
come up with a reproducer that shows how a series of mprotect()
operations can result in an elevated shm_nattch and thus leak of the
resource.
The problem is caused by wrong assumptions in vma_merge() commit
714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in
mergeability test"). The shmem vmas have a vma_ops->close callback
that decrements shm_nattch, and we remove the vma without calling it.
vma_merge() has thus historically avoided merging vma's with
vma_ops->close and commit 714965ca8252 was supposed to keep it that way.
It relaxed the checks for vma_ops->close in can_vma_merge_after()
assuming that it is never called on a vma that would be a candidate for
removal. However, the vma_merge() code does also use the result of this
check in the decision to remove a different vma in the merge case 7.
A robust solution would be to refactor vma_merge() code in a way that
the vma_ops->close check is only done for vma's that are actually going
to be removed, and not as part of the preliminary checks. That would
both solve the existing bug, and also allow additional merges that the
checks currently prevent unnecessarily in some cases.
However to fix the existing bug first with a minimized risk, and for
easier stable backports, this patch only adds a vma_ops->close check to
the buggy case 7 specifically. All other cases of vma removal are
covered by the can_vma_merge_before() check that includes the test for
vma_ops->close.
The reproducer code, adapted from Michal Hocko's code:
int main(int argc, char *argv[]) {
int segment_id;
size_t segment_size = 20 * PAGE_SIZE;
char * sh_mem;
struct shmid_ds shmid_ds;
key_t key = 0x1234;
segment_id = shmget(key, segment_size,
IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
sh_mem = (char *)shmat(segment_id, NULL, 0);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE);
mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
shmdt(sh_mem);
shmctl(segment_id, IPC_STAT, &shmid_ds);
printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch);
if (shmctl(segment_id, IPC_RMID, 0))
printf("IPCRM failed %d\n", errno);
return (shmid_ds.shm_nattch) ? 1 : 0;
}
Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test")
Reported-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
---
mm/mmap.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..a4238373ee9b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -954,10 +954,19 @@ static struct vm_area_struct
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing prev vma, so it skipped the check
+ * for vm_ops->close, but we are removing curr
+ */
+ if (curr->vm_ops && curr->vm_ops->close)
+ err = -EINVAL;
+ else
+ err = dup_anon_vma(prev, curr, &anon_dup);
remove = curr;
} else { /* case 5 */
+ err = dup_anon_vma(prev, curr, &anon_dup);
adjust = curr;
adj_start = (end - curr->vm_start);
}
--
2.43.1
A recent DRM series purporting to simplify support for "transparent
bridges" and handling of probe deferrals ironically exposed a
use-after-free issue on pmic_glink_altmode probe deferral.
This has manifested itself as the display subsystem occasionally failing
to initialise and NULL-pointer dereferences during boot of machines like
the Lenovo ThinkPad X13s.
Specifically, the dp-hpd bridge is currently registered before all
resources have been acquired which means that it can also be
deregistered on probe deferrals.
In the meantime there is a race window where the new aux bridge driver
(or PHY driver previously) may have looked up the dp-hpd bridge and
stored a (non-reference-counted) pointer to the bridge which is about to
be deallocated.
When the display controller is later initialised, this triggers a
use-after-free when attaching the bridges:
dp -> aux -> dp-hpd (freed)
which may, for example, result in the freed bridge failing to attach:
[drm:drm_bridge_attach [drm]] *ERROR* failed to attach bridge /soc@0/phy@88eb000 to encoder TMDS-31: -16
or a NULL-pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
Call trace:
drm_bridge_attach+0x70/0x1a8 [drm]
drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
drm_bridge_attach+0x80/0x1a8 [drm]
dp_bridge_init+0xa8/0x15c [msm]
msm_dp_modeset_init+0x28/0xc4 [msm]
The DRM bridge implementation is clearly fragile and implicitly built on
the assumption that bridges may never go away. In this case, the fix is
to move the bridge registration in the pmic_glink_altmode driver to
after all resources have been looked up.
Incidentally, with the new dp-hpd bridge implementation, which registers
child devices, this is also a requirement due to a long-standing issue
in driver core that can otherwise lead to a probe deferral loop (see
fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")).
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support")
Fixes: 2bcca96abfbf ("soc: qcom: pmic-glink: switch to DRM_AUX_HPD_BRIDGE")
Cc: stable(a)vger.kernel.org # 6.3
Cc: Bjorn Andersson <andersson(a)kernel.org>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/soc/qcom/pmic_glink_altmode.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c
index 5fcd0fdd2faa..b3808fc24c69 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -76,7 +76,7 @@ struct pmic_glink_altmode_port {
struct work_struct work;
- struct device *bridge;
+ struct auxiliary_device *bridge;
enum typec_orientation orientation;
u16 svid;
@@ -230,7 +230,7 @@ static void pmic_glink_altmode_worker(struct work_struct *work)
else
pmic_glink_altmode_enable_usb(altmode, alt_port);
- drm_aux_hpd_bridge_notify(alt_port->bridge,
+ drm_aux_hpd_bridge_notify(&alt_port->bridge->dev,
alt_port->hpd_state ?
connector_status_connected :
connector_status_disconnected);
@@ -454,7 +454,7 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
alt_port->index = port;
INIT_WORK(&alt_port->work, pmic_glink_altmode_worker);
- alt_port->bridge = drm_dp_hpd_bridge_register(dev, to_of_node(fwnode));
+ alt_port->bridge = devm_drm_dp_hpd_bridge_alloc(dev, to_of_node(fwnode));
if (IS_ERR(alt_port->bridge)) {
fwnode_handle_put(fwnode);
return PTR_ERR(alt_port->bridge);
@@ -510,6 +510,16 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
}
}
+ for (port = 0; port < ARRAY_SIZE(altmode->ports); port++) {
+ alt_port = &altmode->ports[port];
+ if (!alt_port->bridge)
+ continue;
+
+ ret = devm_drm_dp_hpd_bridge_add(dev, alt_port->bridge);
+ if (ret)
+ return ret;
+ }
+
altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
--
2.43.0
From: Elad Nachman <enachman(a)marvell.com>
AC5X spec says PHY init complete bit must be polled until zero.
We see cases in which timeout can take longer than the standard
calculation on AC5X, which is expected following the spec comment above.
According to the spec, we must wait as long as it takes for that bit to
toggle on AC5X.
Cap that with 100 delay loops so we won't get stuck forever.
Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC")
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Elad Nachman <enachman(a)marvell.com>
---
drivers/mmc/host/sdhci-xenon-phy.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/mmc/host/sdhci-xenon-phy.c b/drivers/mmc/host/sdhci-xenon-phy.c
index c3096230a969..cc9d28b75eb9 100644
--- a/drivers/mmc/host/sdhci-xenon-phy.c
+++ b/drivers/mmc/host/sdhci-xenon-phy.c
@@ -110,6 +110,8 @@
#define XENON_EMMC_PHY_LOGIC_TIMING_ADJUST (XENON_EMMC_PHY_REG_BASE + 0x18)
#define XENON_LOGIC_TIMING_VALUE 0x00AA8977
+#define XENON_MAX_PHY_TIMEOUT_LOOPS 100
+
/*
* List offset of PHY registers and some special register values
* in eMMC PHY 5.0 or eMMC PHY 5.1
@@ -278,18 +280,27 @@ static int xenon_emmc_phy_init(struct sdhci_host *host)
/* get the wait time */
wait /= clock;
wait++;
- /* wait for host eMMC PHY init completes */
- udelay(wait);
- reg = sdhci_readl(host, phy_regs->timing_adj);
- reg &= XENON_PHY_INITIALIZAION;
- if (reg) {
+ /*
+ * AC5X spec says bit must be polled until zero.
+ * We see cases in which timeout can take longer
+ * than the standard calculation on AC5X, which is
+ * expected following the spec comment above.
+ * According to the spec, we must wait as long as
+ * it takes for that bit to toggle on AC5X.
+ * Cap that with 100 delay loops so we won't get
+ * stuck here forever:
+ */
+
+ ret = read_poll_timeout(sdhci_readl, reg,
+ !(reg & XENON_PHY_INITIALIZAION),
+ wait, XENON_MAX_PHY_TIMEOUT_LOOPS * wait,
+ false, host, phy_regs->timing_adj);
+ if (ret)
dev_err(mmc_dev(host->mmc), "eMMC PHY init cannot complete after %d us\n",
- wait);
- return -ETIMEDOUT;
- }
+ wait * XENON_MAX_PHY_TIMEOUT_LOOPS);
- return 0;
+ return ret;
}
#define ARMADA_3700_SOC_PAD_1_8V 0x1
--
2.25.1
The patch titled
Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close
Date: Thu, 22 Feb 2024 17:55:50 +0100
When debugging issues with a workload using SysV shmem, Michal Hocko has
come up with a reproducer that shows how a series of mprotect() operations
can result in an elevated shm_nattch and thus leak of the resource.
The problem is caused by wrong assumptions in vma_merge() commit
714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in
mergeability test"). The shmem vmas have a vma_ops->close callback that
decrements shm_nattch, and we remove the vma without calling it.
vma_merge() has thus historically avoided merging vma's with
vma_ops->close and commit 714965ca8252 was supposed to keep it that way.
It relaxed the checks for vma_ops->close in can_vma_merge_after() assuming
that it is never called on a vma that would be a candidate for removal.
However, the vma_merge() code does also use the result of this check in
the decision to remove a different vma in the merge case 7.
A robust solution would be to refactor vma_merge() code in a way that the
vma_ops->close check is only done for vma's that are actually going to be
removed, and not as part of the preliminary checks. That would both solve
the existing bug, and also allow additional merges that the checks
currently prevent unnecessarily in some cases.
However to fix the existing bug first with a minimized risk, and for
easier stable backports, this patch only adds a vma_ops->close check to
the buggy case 7 specifically. All other cases of vma removal are covered
by the can_vma_merge_before() check that includes the test for
vma_ops->close.
The reproducer code, adapted from Michal Hocko's code:
int main(int argc, char *argv[]) {
int segment_id;
size_t segment_size = 20 * PAGE_SIZE;
char * sh_mem;
struct shmid_ds shmid_ds;
key_t key = 0x1234;
segment_id = shmget(key, segment_size,
IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
sh_mem = (char *)shmat(segment_id, NULL, 0);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE);
mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE);
shmdt(sh_mem);
shmctl(segment_id, IPC_STAT, &shmid_ds);
printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch);
if (shmctl(segment_id, IPC_RMID, 0))
printf("IPCRM failed %d\n", errno);
return (shmid_ds.shm_nattch) ? 1 : 0;
}
Link: https://lkml.kernel.org/r/20240222165549.32753-2-vbabka@suse.cz
Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Michal Hocko <mhocko(a)suse.com>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/mmap.c~mm-mmap-fix-vma_merge-case-7-with-vma_ops-close
+++ a/mm/mmap.c
@@ -954,10 +954,19 @@ static struct vm_area_struct
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing prev vma, so it skipped the check
+ * for vm_ops->close, but we are removing curr
+ */
+ if (curr->vm_ops && curr->vm_ops->close)
+ err = -EINVAL;
+ else
+ err = dup_anon_vma(prev, curr, &anon_dup);
remove = curr;
} else { /* case 5 */
+ err = dup_anon_vma(prev, curr, &anon_dup);
adjust = curr;
adj_start = (end - curr->vm_start);
}
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch
mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch
Hi,
maybe you will not like introducing 'static int int_max = INT_MAX;' for
this old kernel which EOL in 10 months.
Cyril Hrubis (3):
sched/rt: Fix sysctl_sched_rr_timeslice intial value
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
sched/rt: Disallow writing invalid values to sched_rt_period_us
kernel/sched/rt.c | 10 +++++-----
kernel/sysctl.c | 5 +++++
2 files changed, 10 insertions(+), 5 deletions(-)
--
2.35.3
From: Cyril Hrubis <chrubis(a)suse.cz>
[ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ]
The validation of the value written to sched_rt_period_us was broken
because:
- the sysclt_sched_rt_period is declared as unsigned int
- parsed by proc_do_intvec()
- the range is asserted after the value parsed by proc_do_intvec()
Because of this negative values written to the file were written into a
unsigned integer that were later on interpreted as large positive
integers which did passed the check:
if (sysclt_sched_rt_period <= 0)
return EINVAL;
This commit fixes the parsing by setting explicit range for both
perid_us and runtime_us into the sched_rt_sysctls table and processes
the values with proc_dointvec_minmax() instead.
Alternatively if we wanted to use full range of unsigned int for the
period value we would have to split the proc_handler and use
proc_douintvec() for it however even the
Documentation/scheduller/sched-rt-group.rst describes the range as 1 to
INT_MAX.
As far as I can tell the only problem this causes is that the sysctl
file allows writing negative values which when read back may confuse
userspace.
There is also a LTP test being submitted for these sysctl files at:
http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis…
Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
kernel/sched/rt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 904dd8534597..4ac36eb4cdee 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -37,6 +37,8 @@ static struct ctl_table sched_rt_sysctls[] = {
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = sched_rt_handler,
+ .extra1 = SYSCTL_ONE,
+ .extra2 = SYSCTL_INT_MAX,
},
{
.procname = "sched_rt_runtime_us",
@@ -44,6 +46,8 @@ static struct ctl_table sched_rt_sysctls[] = {
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = sched_rt_handler,
+ .extra1 = SYSCTL_NEG_ONE,
+ .extra2 = SYSCTL_INT_MAX,
},
{
.procname = "sched_rr_timeslice_ms",
@@ -2989,9 +2993,6 @@ static int sched_rt_global_constraints(void)
#ifdef CONFIG_SYSCTL
static int sched_rt_global_validate(void)
{
- if (sysctl_sched_rt_period <= 0)
- return -EINVAL;
-
if ((sysctl_sched_rt_runtime != RUNTIME_INF) &&
((sysctl_sched_rt_runtime > sysctl_sched_rt_period) ||
((u64)sysctl_sched_rt_runtime *
@@ -3022,7 +3023,7 @@ static int sched_rt_handler(struct ctl_table *table, int write, void *buffer,
old_period = sysctl_sched_rt_period;
old_runtime = sysctl_sched_rt_runtime;
- ret = proc_dointvec(table, write, buffer, lenp, ppos);
+ ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) {
ret = sched_rt_global_validate();
--
2.35.3
From: Cyril Hrubis <chrubis(a)suse.cz>
[ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ]
The sched_rr_timeslice can be reset to default by writing value that is
<= 0. However after reading from this file we always got the last value
written, which is not useful at all.
$ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms
$ cat /proc/sys/kernel/sched_rr_timeslice_ms
-1
Fix this by setting the variable that holds the sysctl file value to the
jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written.
Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Petr Vorel <pvorel(a)suse.cz>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Tested-by: Petr Vorel <pvorel(a)suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
kernel/sched/rt.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 76bafa8d331a..f79a6f36777a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -3047,6 +3047,9 @@ static int sched_rr_handler(struct ctl_table *table, int write, void *buffer,
sched_rr_timeslice =
sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE :
msecs_to_jiffies(sysctl_sched_rr_timeslice);
+
+ if (sysctl_sched_rr_timeslice <= 0)
+ sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE);
}
mutex_unlock(&mutex);
--
2.35.3
From: Cyril Hrubis <chrubis(a)suse.cz>
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the
sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Petr Vorel <pvorel(a)suse.cz>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Tested-by: Petr Vorel <pvorel(a)suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
[ pvorel: rebased for 5.4 ]
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
kernel/sched/rt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 186e7d78ded5..e40f32e3ab06 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,7 +8,7 @@
#include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE;
-int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
+int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
/* More than 4 hours if BW_SHIFT equals 20. */
static const u64 max_rt_runtime = MAX_BW;
--
2.35.3
From: Cyril Hrubis <chrubis(a)suse.cz>
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the intial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
What this test does is to compare the return value from the
sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
fails if they do not match.
The problem it found is the intial sysctl file value which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
introduces 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Petr Vorel <pvorel(a)suse.cz>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Tested-by: Petr Vorel <pvorel(a)suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
[ pvorel: rebased for 5.15, 5.10 ]
Signed-off-by: Petr Vorel <pvorel(a)suse.cz>
---
kernel/sched/rt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7045595aacac..3394b7f923a0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,7 +8,7 @@
#include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE;
-int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
+int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ;
/* More than 4 hours if BW_SHIFT equals 20. */
static const u64 max_rt_runtime = MAX_BW;
--
2.35.3
We observed a corruption during on-line resize of a file system that is
larger than 16 TiB with 4k block size. With having more then 2^32 blocks
resize_inode is turned off by default by mke2fs. The issue can be
reproduced on a smaller file system for convenience by explicitly
turning off resize_inode. An on-line resize across an 8 GiB boundary (the
size of a meta block group in this setup) then leads to a corruption:
dev=/dev/<some_dev> # should be >= 16 GiB
mkdir -p /corruption
/sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15))
mount -t ext4 $dev /corruption
dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15))
sha1sum /corruption/test
# 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test
/sbin/resize2fs $dev $((2*2**21))
# drop page cache to force reload the block from disk
echo 1 > /proc/sys/vm/drop_caches
sha1sum /corruption/test
# 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test
2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per
block group and 2^6 are the number of block groups that make a meta
block group.
The last checksum might be different depending on how the file is laid
out across the physical blocks. The actual corruption occurs at physical
block 63*2^15 = 2064384 which would be the location of the backup of the
meta block group's block descriptor. During the on-line resize the file
system will be converted to meta_bg starting at s_first_meta_bg which is
2 in the example - meaning all block groups after 16 GiB. However, in
ext4_flex_group_add we might add block groups that are not part of the
first meta block group yet. In the reproducer we achieved this by
substracting the size of a whole block group from the point where the
meta block group would start. This must be considered when updating the
backup block group descriptors to follow the non-meta_bg layout. The fix
is to add a test whether the group to add is already part of the meta
block group or not.
Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems")
Cc: stable(a)vger.kernel.org
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
---
fs/ext4/resize.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4d4a5a32e310..3c0d12382e06 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1602,7 +1602,8 @@ static int ext4_flex_group_add(struct super_block *sb,
int gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
int gdb_num_end = ((group + flex_gd->count - 1) /
EXT4_DESC_PER_BLOCK(sb));
- int meta_bg = ext4_has_feature_meta_bg(sb);
+ int meta_bg = ext4_has_feature_meta_bg(sb) &&
+ gdb_num >= le32_to_cpu(es->s_first_meta_bg);
sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr -
ext4_group_first_block_no(sb, 0);
--
2.40.1
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
When the PCI device is surprise removed, requests won't complete from
the device. These IOs are never completed and disk deletion hangs
indefinitely.
Fix it by aborting the IOs which the device will never complete
when the VQ is broken.
With this fix now fio completes swiftly.
An alternative of IO timeout has been considered, however
when the driver knows about unresponsive block device, swiftly clearing
them enables users and upper layers to react quickly.
Verified with multiple device unplug cycles with pending IOs in virtio
used ring and some pending with device.
In future instead of VQ broken, a more elegant method can be used. At the
moment the patch is kept to its minimal changes given its urgency to fix
broken kernels.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable(a)vger.kernel.org
Reported-by: lirongqing(a)baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@bai…
Co-developed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Signed-off-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Signed-off-by: Parav Pandit <parav(a)nvidia.com>
---
drivers/block/virtio_blk.c | 54 ++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2bf14a0e2815..59b49899b229 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1562,10 +1562,64 @@ static int virtblk_probe(struct virtio_device *vdev)
return err;
}
+static bool virtblk_cancel_request(struct request *rq, void *data)
+{
+ struct virtblk_req *vbr = blk_mq_rq_to_pdu(rq);
+
+ vbr->in_hdr.status = VIRTIO_BLK_S_IOERR;
+ if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq))
+ blk_mq_complete_request(rq);
+
+ return true;
+}
+
+static void virtblk_cleanup_reqs(struct virtio_blk *vblk)
+{
+ struct virtio_blk_vq *blk_vq;
+ struct request_queue *q;
+ struct virtqueue *vq;
+ unsigned long flags;
+ int i;
+
+ vq = vblk->vqs[0].vq;
+ if (!virtqueue_is_broken(vq))
+ return;
+
+ q = vblk->disk->queue;
+ /* Block upper layer to not get any new requests */
+ blk_mq_quiesce_queue(q);
+
+ for (i = 0; i < vblk->num_vqs; i++) {
+ blk_vq = &vblk->vqs[i];
+
+ /* Synchronize with any ongoing virtblk_poll() which may be
+ * completing the requests to uppper layer which has already
+ * crossed the broken vq check.
+ */
+ spin_lock_irqsave(&blk_vq->lock, flags);
+ spin_unlock_irqrestore(&blk_vq->lock, flags);
+ }
+
+ blk_sync_queue(q);
+
+ /* Complete remaining pending requests with error */
+ blk_mq_tagset_busy_iter(&vblk->tag_set, virtblk_cancel_request, vblk);
+ blk_mq_tagset_wait_completed_request(&vblk->tag_set);
+
+ /*
+ * Unblock any pending dispatch I/Os before we destroy device. From
+ * del_gendisk() -> __blk_mark_disk_dead(disk) will set GD_DEAD flag,
+ * that will make sure any new I/O from bio_queue_enter() to fail.
+ */
+ blk_mq_unquiesce_queue(q);
+}
+
static void virtblk_remove(struct virtio_device *vdev)
{
struct virtio_blk *vblk = vdev->priv;
+ virtblk_cleanup_reqs(vblk);
+
/* Make sure no work handler is accessing the device. */
flush_work(&vblk->config_work);
--
2.34.1
From: Michal Kazior <michal(a)plume.com>
[ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ]
The nl80211_dump_interface() supports resumption
in case nl80211_send_iface() doesn't have the
resources to complete its work.
The logic would store the progress as iteration
offsets for rdev and wdev loops.
However the logic did not properly handle
resumption for non-last rdev. Assuming a system
with 2 rdevs, with 2 wdevs each, this could
happen:
dump(cb=[0, 0]):
if_start=cb[1] (=0)
send rdev0.wdev0 -> ok
send rdev0.wdev1 -> yield
cb[1] = 1
dump(cb=[0, 1]):
if_start=cb[1] (=1)
send rdev0.wdev1 -> ok
// since if_start=1 the rdev0.wdev0 got skipped
// through if_idx < if_start
send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev
loop end.
The problem is actually hard to hit on a desktop,
and even on most routers. The prerequisites for
this manifesting was:
- more than 1 wiphy
- a few handful of interfaces
- dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each.
It'd miss 6 interfaces from the last wiphy
reported to userspace.
Signed-off-by: Michal Kazior <michal(a)plume.com>
Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/wireless/nl80211.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 1cbbb11ea503..fbf95b7ff6b4 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4008,6 +4008,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback *
}
wiphy_unlock(&rdev->wiphy);
+ if_start = 0;
wp_idx++;
}
out:
--
2.43.0
From: "Guilherme G. Piccoli" <gpiccoli(a)igalia.com>
[ Upstream commit e585a37e5061f6d5060517aed1ca4ccb2e56a34c ]
By running a Van Gogh device (Steam Deck), the following message
was noticed in the kernel log:
pci 0000:04:00.3: PCI class overridden (0x0c03fe -> 0x0c03fe) so dwc3 driver can claim this instead of xhci
Effectively this means the quirk executed but changed nothing, since the
class of this device was already the proper one (likely adjusted by newer
firmware versions).
Check and perform the override only if necessary.
Link: https://lore.kernel.org/r/20231120160531.361552-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Vicki Pfau <vi(a)endrift.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/pci/quirks.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8765544bac35..75b297c15cf5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -607,10 +607,13 @@ static void quirk_amd_dwc_class(struct pci_dev *pdev)
{
u32 class = pdev->class;
- /* Use "USB Device (not host controller)" class */
- pdev->class = PCI_CLASS_SERIAL_USB_DEVICE;
- pci_info(pdev, "PCI class overridden (%#08x -> %#08x) so dwc3 driver can claim this instead of xhci\n",
- class, pdev->class);
+ if (class != PCI_CLASS_SERIAL_USB_DEVICE) {
+ /* Use "USB Device (not host controller)" class */
+ pdev->class = PCI_CLASS_SERIAL_USB_DEVICE;
+ pci_info(pdev,
+ "PCI class overridden (%#08x -> %#08x) so dwc3 driver can claim this instead of xhci\n",
+ class, pdev->class);
+ }
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_NL_USB,
quirk_amd_dwc_class);
--
2.43.0
From: Michal Kazior <michal(a)plume.com>
[ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ]
The nl80211_dump_interface() supports resumption
in case nl80211_send_iface() doesn't have the
resources to complete its work.
The logic would store the progress as iteration
offsets for rdev and wdev loops.
However the logic did not properly handle
resumption for non-last rdev. Assuming a system
with 2 rdevs, with 2 wdevs each, this could
happen:
dump(cb=[0, 0]):
if_start=cb[1] (=0)
send rdev0.wdev0 -> ok
send rdev0.wdev1 -> yield
cb[1] = 1
dump(cb=[0, 1]):
if_start=cb[1] (=1)
send rdev0.wdev1 -> ok
// since if_start=1 the rdev0.wdev0 got skipped
// through if_idx < if_start
send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev
loop end.
The problem is actually hard to hit on a desktop,
and even on most routers. The prerequisites for
this manifesting was:
- more than 1 wiphy
- a few handful of interfaces
- dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each.
It'd miss 6 interfaces from the last wiphy
reported to userspace.
Signed-off-by: Michal Kazior <michal(a)plume.com>
Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/wireless/nl80211.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 0ac829c8f188..279f4977e2ee 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -3595,6 +3595,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback *
if_idx++;
}
+ if_start = 0;
wp_idx++;
}
out:
--
2.43.0
From: Michal Kazior <michal(a)plume.com>
[ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ]
The nl80211_dump_interface() supports resumption
in case nl80211_send_iface() doesn't have the
resources to complete its work.
The logic would store the progress as iteration
offsets for rdev and wdev loops.
However the logic did not properly handle
resumption for non-last rdev. Assuming a system
with 2 rdevs, with 2 wdevs each, this could
happen:
dump(cb=[0, 0]):
if_start=cb[1] (=0)
send rdev0.wdev0 -> ok
send rdev0.wdev1 -> yield
cb[1] = 1
dump(cb=[0, 1]):
if_start=cb[1] (=1)
send rdev0.wdev1 -> ok
// since if_start=1 the rdev0.wdev0 got skipped
// through if_idx < if_start
send rdev1.wdev1 -> ok
The if_start needs to be reset back to 0 upon wdev
loop end.
The problem is actually hard to hit on a desktop,
and even on most routers. The prerequisites for
this manifesting was:
- more than 1 wiphy
- a few handful of interfaces
- dump without rdev or wdev filter
I was seeing this with 4 wiphys 9 interfaces each.
It'd miss 6 interfaces from the last wiphy
reported to userspace.
Signed-off-by: Michal Kazior <michal(a)plume.com>
Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/wireless/nl80211.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 70fb14b8bab0..c259d3227a9e 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -3960,6 +3960,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback *
if_idx++;
}
+ if_start = 0;
wp_idx++;
}
out:
--
2.43.0
From: Baokun Li <libaokun1(a)huawei.com>
[ Upstream commit 4530b3660d396a646aad91a787b6ab37cf604b53 ]
Determine if the group block bitmap is corrupted before using ac_b_ex in
ext4_mb_try_best_found() to avoid allocating blocks from a group with a
corrupted block bitmap in the following concurrency and making the
situation worse.
ext4_mb_regular_allocator
ext4_lock_group(sb, group)
ext4_mb_good_group
// check if the group bbitmap is corrupted
ext4_mb_complex_scan_group
// Scan group gets ac_b_ex but doesn't use it
ext4_unlock_group(sb, group)
ext4_mark_group_bitmap_corrupted(group)
// The block bitmap was corrupted during
// the group unlock gap.
ext4_mb_try_best_found
ext4_lock_group(ac->ac_sb, group)
ext4_mb_use_best_found
mb_mark_used
// Allocating blocks in block bitmap corrupted group
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/ext4/mballoc.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 3babc07ae613..d7724601f42b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1854,6 +1854,9 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
return err;
ext4_lock_group(ac->ac_sb, group);
+ if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+ goto out;
+
max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
if (max > 0) {
@@ -1861,6 +1864,7 @@ int ext4_mb_try_best_found(struct ext4_allocation_context *ac,
ext4_mb_use_best_found(ac, e4b);
}
+out:
ext4_unlock_group(ac->ac_sb, group);
ext4_mb_unload_buddy(e4b);
--
2.43.0
l2tp_ip6_sendmsg needs to avoid accounting for the transport header
twice when splicing more data into an already partially-occupied skbuff.
To manage this, we check whether the skbuff contains data using
skb_queue_empty when deciding how much data to append using
ip6_append_data.
However, the code which performed the calculation was incorrect:
ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
...due to C operator precedence, this ends up setting ulen to
transhdrlen for messages with a non-zero length, which results in
corrupted packets on the wire.
Add parentheses to correct the calculation in line with the original
intent.
Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()")
Cc: David Howells <dhowells(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Tom Parkin <tparkin(a)katalix.com>
---
This issue was uncovered by Debian build-testing for the
golang-github-katalix-go-l2tp package[1].
It seems 9d4c75800f61 has been backported to the linux-6.1.y stable
kernel (and possibly others), so I think this fix will also need
backporting.
The bug is currently seen on at least Debian Bookworm, Ubuntu Jammy, and
Debian testing/unstable.
Unfortunately tests using "ip l2tp" and which focus on dataplane
transport will not uncover this bug: it's necessary to send a packet
using an L2TPIP6 socket opened by userspace, and to verify the packet on
the wire. The l2tp-ktest[2] test suite has been extended to cover this.
[1]. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1063746
[2]. https://github.com/katalix/l2tp-ktest
---
net/l2tp/l2tp_ip6.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index dd3153966173..7bf14cf9ffaa 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -627,7 +627,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
back_from_confirm:
lock_sock(sk);
- ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
+ ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0);
err = ip6_append_data(sk, ip_generic_getfrag, msg,
ulen, transhdrlen, &ipc6,
&fl6, (struct rt6_info *)dst,
--
2.34.1
If caching mode change fails due to, for example, OOM we
free the allocated pages in a two-step process. First the pages
for which the caching change has already succeeded. Secondly
the pages for which a caching change did not succeed.
However the second step was incorrectly freeing the pages already
freed in the first step.
Fix.
Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Christian Koenig <christian.koenig(a)amd.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.4+
---
drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index b62f420a9f96..112438d965ff 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
enum ttm_caching caching,
pgoff_t start_page, pgoff_t end_page)
{
- struct page **pages = tt->pages;
+ struct page **pages = &tt->pages[start_page];
unsigned int order;
pgoff_t i, nr;
--
2.43.0
From: Ard Biesheuvel <ardb(a)kernel.org>
The bit-sliced implementation of AES-CTR operates on blocks of 128
bytes, and will fall back to the plain NEON version for tail blocks or
inputs that are shorter than 128 bytes to begin with.
It will call straight into the plain NEON asm helper, which performs all
memory accesses in granules of 16 bytes (the size of a NEON register).
For this reason, the associated plain NEON glue code will copy inputs
shorter than 16 bytes into a temporary buffer, given that this is a rare
occurrence and it is not worth the effort to work around this in the asm
code.
The fallback from the bit-sliced NEON version fails to take this into
account, potentially resulting in out-of-bounds accesses. So clone the
same workaround, and use a temp buffer for short in/outputs.
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+f1ceaa1a09ab891e1934(a)syzkaller.appspotmail.com
Tested-by: syzbot+f1ceaa1a09ab891e1934(a)syzkaller.appspotmail.com
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/crypto/aes-neonbs-glue.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index bac4cabef607..849dc41320db 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -227,8 +227,19 @@ static int ctr_encrypt(struct skcipher_request *req)
src += blocks * AES_BLOCK_SIZE;
}
if (nbytes && walk.nbytes == walk.total) {
+ u8 buf[AES_BLOCK_SIZE];
+ u8 *d = dst;
+
+ if (unlikely(nbytes < AES_BLOCK_SIZE))
+ src = dst = memcpy(buf + sizeof(buf) - nbytes,
+ src, nbytes);
+
neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
nbytes, walk.iv);
+
+ if (unlikely(nbytes < AES_BLOCK_SIZE))
+ memcpy(d, buf + sizeof(buf) - nbytes, nbytes);
+
nbytes = 0;
}
kernel_neon_end();
--
2.44.0.rc0.258.g7320e95886-goog