The ISRs of the tps25750 and tps6598x do not handle generated events
properly under all circumstances.
The tps6598x ISR does not read all bits of the INT_EVENTX registers,
leaving events signaled with bits above 64 unattended. Moreover, these
events are not cleared, leaving the interrupt enabled.
The tps25750 ISR reads all bits of the INT_EVENT1 register, but the event
checking is incorrect: the same event is checked in two different regions
of the same register by means of an OR operation.
This series aims to fix both issues by reading all bits of the
INT_EVENTX registers, and limiting the event checking to the region
where the supported events are defined (currently they are limited to
the first 64 bits of the registers, as they are defined as BIT_ULL()).
If the need for events above the first 64 bits of the INT_EVENTX
registers arises, a different mechanism might be required. But for the
current needs, all definitions can be left as they are.
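
As a rough userspace illustration of the approach described above (not the
driver code itself), the sketch below reads a whole event register but
checks events only against its first 64 bits. The register length
EVENT_REG_LEN, the event bit EVENT_PLUG and both helper names are
hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_REG_LEN 11          /* hypothetical register length in bytes */
#define BIT_ULL(n)    (1ULL << (n))
#define EVENT_PLUG    BIT_ULL(3)  /* hypothetical event bit, for illustration */

/* Assemble the first 8 bytes (little-endian) into a 64-bit event mask. */
static uint64_t events_low64(const uint8_t *reg, size_t len)
{
	uint64_t v = 0;

	assert(len >= 8);
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | reg[i];
	return v;
}

/* Return nonzero if any byte in the given span is set. */
static int any_event(const uint8_t *reg, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (reg[i])
			return 1;
	return 0;
}
```

With bit 3 and bit 72 set in an 11-byte buffer, events_low64() reports only
EVENT_PLUG, while any_event() on the remaining three bytes still shows a
pending bit that has to be cleared even though no handler is defined for it.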
Note: resend to add the Cc tag for 'stable' (fixes in the series).
Signed-off-by: Javier Carrasco <javier.carrasco(a)wolfvision.net>
---
Javier Carrasco (2):
usb: typec: tipd: fix event checking for tps25750
usb: typec: tipd: fix event checking for tps6598x
drivers/usb/typec/tipd/core.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240328-tps6598x_fix_event_handling-3398d3d82f85
Best regards,
--
Javier Carrasco <javier.carrasco(a)wolfvision.net>
Hi,
These patches fix failures reported by xfstests tests xfs/179, xfs/270,
xfs/557 and xfs/606. The patchset was tested to confirm that it fixes
those tests. All are clean picks.
thanks,
MNAdam
From: Vasiliy Kovalev <kovalev(a)altlinux.org>
When returning from the hci_disconnect() function, the conn->state
continues to be set to BT_CONNECTED and hci_conn_drop() is executed,
which decrements the conn->refcnt.
Syzkaller has generated a reproducer that results in multiple calls to
hci_encrypt_change_evt() of the same conn object.
--
hci_encrypt_change_evt(){
// conn->state == BT_CONNECTED
hci_disconnect(){
hci_abort_conn();
}
hci_conn_drop();
// conn->state == BT_CONNECTED
}
--
This behavior can cause the conn->refcnt to go far into negative values
and cause problems. To get around this, you need to change the conn->state,
namely to BT_DISCONN, as it was before.
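
The effect can be shown with a toy userspace model. This is only a sketch:
struct toy_conn, toy_encrypt_change() and toy_replay() are hypothetical
stand-ins, not kernel code. The decrement models hci_conn_drop() and the
optional state assignment models the line added by this patch.

```c
#include <assert.h>

enum conn_state { BT_CONNECTED, BT_DISCONN };

struct toy_conn {
	int refcnt;
	enum conn_state state;
};

/* Toy stand-in for an event handler that drops a reference on failure. */
static void toy_encrypt_change(struct toy_conn *c, int status, int mark_disconn)
{
	if (status && c->state == BT_CONNECTED) {
		c->refcnt--;			/* models hci_conn_drop() */
		if (mark_disconn)
			c->state = BT_DISCONN;	/* models the added assignment */
	}
}

/* Deliver the same failing event n times and return the final refcnt. */
static int toy_replay(int n, int mark_disconn)
{
	struct toy_conn c = { .refcnt = 1, .state = BT_CONNECTED };

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		toy_encrypt_change(&c, 1, mark_disconn);
	return c.refcnt;
}
```

In this model, replaying the event three times without the state change
drives the count from 1 to -2, whereas with the state change only the first
event drops a reference.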
Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org>
---
net/bluetooth/hci_event.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 64477e1bde7cec..e0477021183f9b 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2989,6 +2989,7 @@ static void hci_cs_le_start_enc(struct hci_dev *hdev, u8 status)
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
unlock:
hci_dev_unlock(hdev);
@@ -3654,6 +3655,7 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, void *data,
hci_encrypt_cfm(conn, ev->status);
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
@@ -5248,6 +5250,7 @@ static void hci_key_refresh_complete_evt(struct hci_dev *hdev, void *data,
if (ev->status && conn->state == BT_CONNECTED) {
hci_disconnect(conn, HCI_ERROR_AUTH_FAILURE);
hci_conn_drop(conn);
+ conn->state = BT_DISCONN;
goto unlock;
}
--
2.33.8
After a recent discussion regarding "do we need a 'nobackport' tag" I
set out to create one change for stable-kernel-rules.rst. This is now
the second patch in the series, which links to that discussion; the
other stuff is fine-tuning that happened along the way.
Ciao, Thorsten
Thorsten Leemhuis (4):
docs: stable-kernel-rules: reduce redundancy
docs: stable-kernel-rules: mention "no semi-automatic backport"
docs: stable-kernel-rules: call mainline by its name and change
example
docs: stable-kernel-rules: remove code-labels tags
Documentation/process/stable-kernel-rules.rst | 50 +++++++------------
1 file changed, 18 insertions(+), 32 deletions(-)
base-commit: 3f86ed6ec0b390c033eae7f9c487a3fea268e027
--
2.44.0
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe the active task after it
sets lazy_put.
There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it's fine when the architecture switch_mm happens to have that barrier
already, but less so when the architecture only provides the full barrier
in switch_to() or spin_unlock().
It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3.
Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: Juri Lelli <juri.lelli(a)redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Ben Segall <bsegall(a)google.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: Valentin Schneider <vschneid(a)redhat.com>
Cc: levi.yun <yeoreum.yun(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 20 ++++++++++++++------
3 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..044d842c696c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3481,13 +3483,19 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit
+ * memory barrier in switch_mm() when
+ * current->mm changes. If the architecture
+ * switch_mm() does not have an implicit memory
+ * barrier, it is emitted here. If current->mm
+ * is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
--
2.39.2
smp_call_function_single disables IRQs when executing the callback. To
prevent deadlocks, we must disable IRQs when taking cgr_lock elsewhere.
This is already done by qman_update_cgr and qman_delete_cgr; fix the
other lockers.
Fixes: 96f413f47677 ("soc/fsl/qbman: fix issue in qman_delete_cgr_safe()")
CC: stable(a)vger.kernel.org
Signed-off-by: Sean Anderson <sean.anderson(a)seco.com>
Reviewed-by: Camelia Groza <camelia.groza(a)nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
I got no response the first time I sent this, so I am resending to net.
This issue was introduced in a series which went through net, so I hope
it makes sense to take it via net.
[1] https://lore.kernel.org/linux-arm-kernel/20240108161904.2865093-1-sean.ande…
(no changes since v3)
Changes in v3:
- Change blamed commit to something more appropriate
Changes in v2:
- Fix one additional call to spin_unlock
drivers/soc/fsl/qbman/qman.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 739e4eee6b75..1bf1f1ea67f0 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -1456,11 +1456,11 @@ static void qm_congestion_task(struct work_struct *work)
union qm_mc_result *mcr;
struct qman_cgr *cgr;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
qm_mc_start(&p->p);
qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION);
if (!qm_mc_result_timeout(&p->p, &mcr)) {
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
dev_crit(p->config->dev, "QUERYCONGESTION timeout\n");
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
return;
@@ -1476,7 +1476,7 @@ static void qm_congestion_task(struct work_struct *work)
list_for_each_entry(cgr, &p->cgr_cbs, node)
if (cgr->cb && qman_cgrs_get(&c, cgr->cgrid))
cgr->cb(p, cgr, qman_cgrs_get(&rr, cgr->cgrid));
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
qman_p_irqsource_add(p, QM_PIRQ_CSCI);
}
@@ -2440,7 +2440,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
preempt_enable();
cgr->chan = p->config->channel;
- spin_lock(&p->cgr_lock);
+ spin_lock_irq(&p->cgr_lock);
if (opts) {
struct qm_mcc_initcgr local_opts = *opts;
@@ -2477,7 +2477,7 @@ int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
qman_cgrs_get(&p->cgrs[1], cgr->cgrid))
cgr->cb(p, cgr, 1);
out:
- spin_unlock(&p->cgr_lock);
+ spin_unlock_irq(&p->cgr_lock);
put_affine_portal();
return ret;
}
--
2.35.1.1320.gc452695387.dirty
On 11.04.24 09:20, Toralf Förster wrote:
> It is a remote system, nothing in the logs, system is a hardened Gentoo
> Linux, 6.8.4 was fine.
>
> Linux mr-fox 6.8.4 #4 SMP Thu Apr 4 22:10:47 UTC 2024 x86_64 AMD Ryzen
> 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
>
> Another Gentoo dev reported problems too.
>
> config is below.
Thx for the report, but the harsh reality is: nearly no developer will
see your initial report, as you just sent it to LKML, which nearly
nobody reads. I CCed a few lists, which might help. But that is
unlikely, as this could be caused by all sorts of changes. Which is why
we likely need a bisection (
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
) from somebody affected to make some progress here.
That being said: there are a few EFI changes in there that are suspects
in a case like this. I CCed the developer, maybe something rings a bell.
Ciao, Thorsten