From: Vasiliy Kovalev <kovalev(a)altlinux.org>
When hci_disconnect() returns, conn->state is still set to BT_CONNECTED, and hci_conn_drop() is then executed, which decrements conn->refcnt.
Syzkaller has generated a reproducer that results in multiple calls to hci_encrypt_change_evt() on the same conn object.
--
hci_encrypt_change_evt() {
	// conn->state == BT_CONNECTED
	hci_disconnect() {
		hci_abort_conn();
	}
	hci_conn_drop();
	// conn->state == BT_CONNECTED
}
--
This behavior can drive conn->refcnt far into negative values and cause problems. To avoid this, conn->state needs to be changed to BT_DISCONN, as it was before the blamed commit.
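For clarity, the resulting pattern in the affected event handlers looks roughly like this (condensed from the hunks below; not a verbatim excerpt of hci_event.c):
--
	if (ev->status && conn->state == BT_CONNECTED) {
		hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
		hci_conn_drop(conn);
		/* Mark the connection as disconnecting so a repeated event
		 * no longer matches BT_CONNECTED and cannot drop the refcnt
		 * again.
		 */
		conn->state = BT_DISCONN;
	}
--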
Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
net/bluetooth/hci_event.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 64477e1bde7cec..e0477021183f9b 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2989,6 +2989,7 @@ static void hci_cs_le_start_enc(struct hci_dev *hdev, u8 status)
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
unlock:
hci_dev_unlock(hdev);
@@ -3654,6 +3655,7 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, void *data,
hci_encrypt_cfm(conn, ev->status);
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
@@ -5248,6 +5250,7 @@ static void hci_key_refresh_complete_evt(struct hci_dev *hdev, void *data,
if (ev->status && conn->state == BT_CONNECTED) {
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
--
2.33.8
After a recent discussion on whether we need a 'nobackport' tag, I set out to create one change for stable-kernel-rules.rst. That change is now the second patch in this series, which links to that discussion; the other patches are fine-tuning that happened along the way.
Ciao, Thorsten
Thorsten Leemhuis (4):
docs: stable-kernel-rules: reduce redundancy
docs: stable-kernel-rules: mention "no semi-automatic backport"
docs: stable-kernel-rules: call mainline by its name and change
example
docs: stable-kernel-rules: remove code-labels tags
Documentation/process/stable-kernel-rules.rst | 50 +++++++------------
1 file changed, 18 insertions(+), 32 deletions(-)
base-commit: 3f86ed6ec0b390c033eae7f9c487a3fea268e027
--
2.44.0
Many architectures' switch_mm() implementations (e.g. arm64) do not include an smp_mb(), which the core scheduler code has depended upon since commit
223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid").
If switch_mm() doesn't imply smp_mb(), sched_mm_cid_remote_clear() can clear an actively used cid when it fails to observe the active task after it sets lazy_put.
There *is* a memory barrier between the store to rq->curr and the _return to userspace_ (as required by membarrier), but the rseq mm_cid has stricter requirements: the barrier needs to be issued between the store to rq->curr and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it is fine when the architecture's switch_mm() happens to provide that barrier already, but not when the architecture only provides the full barrier in switch_to() or spin_unlock().
This is a bug in the rseq switch_mm_cid() implementation: all architectures that do not have a memory barrier in switch_mm(), but instead provide the full barrier in finish_lock_switch() or switch_to(), provide it too late for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their switch_mm() implementation provides an implicit memory barrier. Override it with a no-op on x86, which implicitly provides this memory barrier by writing to CR3.
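Condensed from the hunks below, the override pattern looks like this (the no-op form is only correct on architectures whose switch_mm() already implies a full barrier, as x86 does via the CR3 write):
--
/* include/asm-generic/barrier.h: default to a full barrier */
#ifndef smp_mb__after_switch_mm
#define smp_mb__after_switch_mm()	smp_mb()
#endif

/* arch/x86/include/asm/barrier.h: the CR3 write in switch_mm() already
 * acts as a full barrier, so the override is a no-op.
 */
#define smp_mb__after_switch_mm()	do { } while (0)
--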
Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: levi.yun <yeoreum.yun(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 20 ++++++++++++++------
3 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..044d842c696c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3481,13 +3483,19 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit
+ * memory barrier in switch_mm() when
+ * current->mm changes. If the architecture
+ * switch_mm() does not have an implicit memory
+ * barrier, it is emitted here. If current->mm
+ * is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
--
2.39.2
smp_call_function_single() disables IRQs while executing the callback. To prevent deadlocks, we must also disable IRQs when taking cgr_lock elsewhere. This is already done by qman_update_cgr() and qman_delete_cgr(); fix the other lockers.
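To spell out the scenario (a hypothetical interleaving, not taken from an actual trace): a task holding cgr_lock via plain spin_lock() can be interrupted on the same CPU by the callback run from smp_call_function_single(), which then spins on cgr_lock forever because the lock owner cannot run again until the interrupt returns. Converting the remaining lockers as below closes that window:
--
	/* IRQs must be off while cgr_lock is held; otherwise the callback
	 * invoked via smp_call_function_single(), which takes cgr_lock with
	 * IRQs already disabled, can interrupt us here and deadlock on the
	 * lock we hold.
	 */
	spin_lock_irq(&p->cgr_lock);
	/* ... query or program CGR state ... */
	spin_unlock_irq(&p->cgr_lock);
--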
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
CC: stable(a)vger.kernel.org
Signed-off-by: Sean Anderson <sean.anderson(a)seco.com>
Reviewed-by: Camelia Groza <camelia.groza(a)nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
I got no response the first time I sent this, so I am resending to net.
This issue was introduced in a series which went through net, so I hope
it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ande…
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
drivers/soc/fsl/qbman/qman.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 739e4eee6b75..1bf1f1ea67f0 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1456,11 +1456,11 @@ static void qm_congestion_task(struct work_struct *work)
union qm_mc_result *mcr;
struct qman_cgr *cgr;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
qm_mc_start(&p->p);
qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION);
if (!qm_mc_result_timeout(&p->p, &mcr)) {
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
dev_crit(p->config->dev, "QUERYCONGESTION timeout\n");
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
return;
@@ -1476,7 +1476,7 @@ static void qm_congestion_task(struct work_struct *work)
list_for_each_entry(cgr, &p->cgr_cbs, node)
if (cgr->cb && qman_cgrs_get(&c, cgr->cgrid))
cgr->cb(p, cgr, qman_cgrs_get(&rr, cgr->cgrid));
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
}
@@ -2440,7 +2440,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
preempt_enable();
cgr->chan = p->config->channel;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
if (opts) {
struct qm_mcc_initcgr local_opts = *opts;
@@ -2477,7 +2477,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
qman_cgrs_get(&p->cgrs[1], cgr->cgrid))
cgr->cb(p, cgr, 1);
out:
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
put_affine_portal();
return ret;
}
--
2.35.1.1320.gc452695387.dirty
On 11.04.24 09:20, Toralf Förster wrote:
> It is a remote system, nothing in the logs, system is a hardened Gentoo
> Linux, 6.8.4 was fine.
>
> Linux mr-fox 6.8.4 #4 SMP Thu Apr 4 22:10:47 UTC 2024 x86_64 AMD Ryzen
> 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
>
> Another Gentoo dev reported problems too.
>
> config is below.
Thx for the report, but the harsh reality is: nearly no developer will see your initial report, as you just sent it to LKML, which nearly nobody reads. I CCed a few lists, which might help. But that is unlikely, as this could be caused by all sorts of changes. Which is why we likely need a bisection (
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
) from somebody affected to make some progress here.
That being said: there are a few EFI changes in there that are suspects in a case like this. I CCed the developer, maybe something rings a bell.
Ciao, Thorsten
In the current driver, qcom_slim_ngd_up_worker() waits indefinitely for the ctrl->qmi_up completion object. This results in a workqueue lockup on the kthread.
Use wait_for_completion_interruptible_timeout() so the thread waits for a specific timeout period and bails out instead of waiting forever.
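For reference, wait_for_completion_interruptible_timeout() returns -ERESTARTSYS when interrupted, 0 on timeout, and the remaining jiffies (at least 1) when the completion fired; the hunk below treats the zero (timeout) result as the failure path. A purely illustrative, expanded way to distinguish the cases:
--
	long ret;

	ret = wait_for_completion_interruptible_timeout(&ctrl->qmi_up,
					msecs_to_jiffies(MSEC_PER_SEC));
	if (!ret) {
		/* 0: timed out, the QMI service never came up */
		dev_err(ctrl->dev, "QMI wait timeout\n");
		return;
	} else if (ret < 0) {
		/* -ERESTARTSYS: the wait was interrupted by a signal */
		return;
	}
	/* ret > 0: completion fired with 'ret' jiffies to spare */
--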
Fixes: a899d324863a ("slimbus: qcom-ngd-ctrl: add Sub System Restart support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
Signed-off-by: Viken Dadhaniya <quic_vdadhani(a)quicinc.com>
---
v1 -> v2:
- Remove macro and add value inline.
- Add Fixes, Cc and Reviewed-by tags.
---
drivers/slimbus/qcom-ngd-ctrl.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/slimbus/qcom-ngd-ctrl.c b/drivers/slimbus/qcom-ngd-ctrl.c
index efeba8275a66..a09a26bf4988 100644
--- a/drivers/slimbus/qcom-ngd-ctrl.c
+++ b/drivers/slimbus/qcom-ngd-ctrl.c
@@ -1451,7 +1451,11 @@ static void qcom_slim_ngd_up_worker(struct work_struct *work)
ctrl = container_of(work, struct qcom_slim_ngd_ctrl, ngd_up_work);
/* Make sure qmi service is up before continuing */
- wait_for_completion_interruptible(&ctrl->qmi_up);
+ if (!wait_for_completion_interruptible_timeout(&ctrl->qmi_up,
+ msecs_to_jiffies(MSEC_PER_SEC))) {
+ dev_err(ctrl->dev, "QMI wait timeout\n");
+ return;
+ }
mutex_lock(&ctrl->ssr_lock);
qcom_slim_ngd_enable(ctrl, true);
--
Hi,
there has been a report of a failure in a 5.4-based kernel which was fixed in kernel 5.10 by commit abee7c494d8c41bb388839bccc47e06247f0d7de.
Please apply the attached backported patch to the stable 5.4 kernel.
Juergen
1. About v3-0001-tracing-Remove-unnecessary-hist_data-destroy-in-d.patch:
I wrote the changelog myself because no one had found the bug at the time; the code was later removed upstream, but 4.19-stable still has the bug.
2. About v3-0002-tracing-Remove-unnecessary-var-destroy-in-onmax_d.patch:
I also wrote the changelog myself because the upstream API has changed.
Referenced commits:
466f4528fbc6 ("tracing: Generalize hist trigger onmax and save action")
ff9d31d0d466 ("tracing: Remove unnecessary var_ref destroy in track_data_destroy()")
George Guo (2):
tracing: Remove unnecessary hist_data destroy in
destroy_synth_var_refs()
tracing: Remove unnecessary var destroy in onmax_destroy()
kernel/trace/trace_events_hist.c | 27 ++-------------------------
1 file changed, 2 insertions(+), 25 deletions(-)
--
2.34.1