struct rdma_cm_id has member "struct work_struct net_work"
that is reused for enqueuing cma_netevent_work_handler()s
onto cma_wq.
Below crash[1] can occur if more than one call to
cma_netevent_callback() occurs in quick succession,
which further enqueues cma_netevent_work_handler()s for the
same rdma_cm_id, overwriting any previously queued work-item(s)
that was just scheduled to run i.e. there is no guarantee
the queued work item may run between two successive calls
to cma_netevent_callback() and the 2nd INIT_WORK would overwrite
the 1st work item (for the same rdma_cm_id), despite grabbing
id_table_lock during enqueue.
Also drgn analysis [2] indicates the work item was likely overwritten.
Fix this by moving the INIT_WORK() to __rdma_create_id(),
so that it doesn't race with any existing queue_work() or
its worker thread.
[1] Trimmed crash stack:
=============================================
BUG: kernel NULL pointer dereference, address: 0000000000000008
kworker/u256:6 ... 6.12.0-0...
Workqueue: cma_netevent_work_handler [rdma_cm] (rdma_cm)
RIP: 0010:process_one_work+0xba/0x31a
Call Trace:
worker_thread+0x266/0x3a0
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
=============================================
[2] drgn crash analysis:
>>> trace = prog.crashed_thread().stack_trace()
>>> trace
(0) crash_setup_regs (./arch/x86/include/asm/kexec.h:111:15)
(1) __crash_kexec (kernel/crash_core.c:122:4)
(2) panic (kernel/panic.c:399:3)
(3) oops_end (arch/x86/kernel/dumpstack.c:382:3)
...
(8) process_one_work (kernel/workqueue.c:3168:2)
(9) process_scheduled_works (kernel/workqueue.c:3310:3)
(10) worker_thread (kernel/workqueue.c:3391:4)
(11) kthread (kernel/kthread.c:389:9)
Line workqueue.c:3168 for this kernel version is in process_one_work():
3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
>>> trace[8]["work"]
*(struct work_struct *)0xffff92577d0a21d8 = {
.data = (atomic_long_t){
.counter = (s64)536870912, <=== Note
},
.entry = (struct list_head){
.next = (struct list_head *)0xffff924d075924c0,
.prev = (struct list_head *)0xffff924d075924c0,
},
.func = (work_func_t)cma_netevent_work_handler+0x0 = 0xffffffffc2cec280,
}
Suspicion is that pwq is NULL:
>>> trace[8]["pwq"]
(struct pool_workqueue *)<absent>
In process_one_work(), pwq is assigned from:
struct pool_workqueue *pwq = get_work_pwq(work);
and get_work_pwq() is:
static struct pool_workqueue *get_work_pwq(struct work_struct *work)
{
unsigned long data = atomic_long_read(&work->data);
if (data & WORK_STRUCT_PWQ)
return work_struct_pwq(data);
else
return NULL;
}
WORK_STRUCT_PWQ is 0x4:
>>> print(repr(prog['WORK_STRUCT_PWQ']))
Object(prog, 'enum work_flags', value=4)
But work->data is 536870912 which is 0x20000000.
So, get_work_pwq() returns NULL and we crash in process_one_work():
3168 strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
=============================================
Fixes: 925d046e7e52 ("RDMA/core: Add a netevent notifier to cma")
Cc: stable(a)vger.kernel.org
Co-developed-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Signed-off-by: Sharath Srinivasan <sharath.srinivasan(a)oracle.com>
---
v1->v2 cc:stable@vger.kernel.org
---
drivers/infiniband/core/cma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 91db10515d74..176d0b3e4488 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -72,6 +72,8 @@ static const char * const cma_events[] = {
static void cma_iboe_set_mgid(struct sockaddr *addr, union ib_gid *mgid,
enum ib_gid_type gid_type);
+static void cma_netevent_work_handler(struct work_struct *_work);
+
const char *__attribute_const__ rdma_event_msg(enum rdma_cm_event_type event)
{
size_t index = event;
@@ -1033,6 +1035,7 @@ __rdma_create_id(struct net *net, rdma_cm_event_handler event_handler,
get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
id_priv->id.route.addr.dev_addr.net = get_net(net);
id_priv->seq_num &= 0x00ffffff;
+ INIT_WORK(&id_priv->id.net_work, cma_netevent_work_handler);
rdma_restrack_new(&id_priv->res, RDMA_RESTRACK_CM_ID);
if (parent)
@@ -5227,7 +5230,6 @@ static int cma_netevent_callback(struct notifier_block *self,
if (!memcmp(current_id->id.route.addr.dev_addr.dst_dev_addr,
neigh->ha, ETH_ALEN))
continue;
- INIT_WORK(¤t_id->id.net_work, cma_netevent_work_handler);
cma_id_get(current_id);
queue_work(cma_wq, ¤t_id->id.net_work);
}
--
2.39.5 (Apple Git-154)
The function brcmf_usb_dlneeded() calls the function brcmf_usb_dl_cmd()
but dose not check its return value. The 'id.chiprev' is uninitialized if
the function brcmf_usb_dl_cmd() fails, and may propagate to
'devinfo->bus_pub.chiprev'.
Add error handling for brcmf_usb_dl_cmd() to return the function if the
'id.chiprev' is uninitialized.
Fixes: 71bb244ba2fd ("brcm80211: fmac: add USB support for bcm43235/6/8 chipsets")
Cc: stable(a)vger.kernel.org # v3.4+
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
index 2821c27f317e..50dddac8a2ab 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/usb.c
@@ -790,6 +790,7 @@ brcmf_usb_dlneeded(struct brcmf_usbdev_info *devinfo)
{
struct bootrom_id_le id;
u32 chipid, chiprev;
+ int err;
brcmf_dbg(USB, "Enter\n");
@@ -798,7 +799,11 @@ brcmf_usb_dlneeded(struct brcmf_usbdev_info *devinfo)
/* Check if firmware downloaded already by querying runtime ID */
id.chip = cpu_to_le32(0xDEAD);
- brcmf_usb_dl_cmd(devinfo, DL_GETVER, &id, sizeof(id));
+ err = brcmf_usb_dl_cmd(devinfo, DL_GETVER, &id, sizeof(id));
+ if (err) {
+ brcmf_err("DL_GETID Failed\n");
+ return false;
+ }
chipid = le32_to_cpu(id.chip);
chiprev = le32_to_cpu(id.chiprev);
--
2.42.0.windows.2
The of_find_device_by_node() function increments the reference count of
the embedded device, which should be released with put_device() when it
is no longer needed.
In ill_acc_of_setup(), put_device() is only called on error paths, but
not on the success path. Fix this by calling put_device() before
returning successfully.
Compile-tested only.
Cc: stable(a)vger.kernel.org
Fixes: 5433acd81e873 ("MIPS: ralink: add illegal access driver")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
arch/mips/ralink/ill_acc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/mips/ralink/ill_acc.c b/arch/mips/ralink/ill_acc.c
index 25341b2319d0..6d1d829854b6 100644
--- a/arch/mips/ralink/ill_acc.c
+++ b/arch/mips/ralink/ill_acc.c
@@ -84,6 +84,7 @@ static int __init ill_acc_of_setup(void)
rt_memc_w32(ILL_INT_STATUS, REG_ILL_ACC_TYPE);
dev_info(&pdev->dev, "irq registered\n");
+ put_device(&pdev->dev);
return 0;
}
--
2.49.0
The RK3399 Puma SoM contains the internal Cypress CYUSB3304 USB
hub, that shows instability due to improper reset pin configuration.
Currently reset pin is modeled as a vcc5v0_host regulator, that
might result in too short reset pulse duration.
Starting with the v6.6, the Onboard USB hub driver (later renamed
to Onboard USB dev) contains support for Cypress HX3 hub family.
It can be now used to correctly model the RK3399 Puma SoM hardware.
The first commits in this series fix the onboard USB dev driver to
support all HX3 hub variants, including the CYUSB3304 found in
the RK3399 Puma SoM.
This allows to introduce fix for internal USB hub instability on
RK3399 Puma, by replacing the vcc5v0_host regulator with
cy3304_reset, used inside the hub node.
Please be aware that the patch that fixes USB hub instability in
arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi can me merged only
after updating the Onboard USB dev driver, otherwise the hub
will not work.
Two last commits in the series disable unrouted USB controllers
and PHYs on RK3399 Puma SOM and Haikou carrier board, with no
intended functional changes.
Signed-off-by: Lukasz Czechowski <lukasz.czechowski(a)thaumatec.com>
---
Lukasz Czechowski (3):
usb: misc: onboard_usb_dev: fix support for Cypress HX3 hubs
dt-bindings: usb: cypress,hx3: Add support for all variants
arm64: dts: rockchip: fix internal USB hub instability on RK3399 Puma
Quentin Schulz (2):
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma
arm64: dts: rockchip: disable unrouted USB controllers and PHY on RK3399 Puma with Haikou
.../devicetree/bindings/usb/cypress,hx3.yaml | 6 +++
.../arm64/boot/dts/rockchip/rk3399-puma-haikou.dts | 8 ----
arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 43 ++++++++++------------
drivers/usb/misc/onboard_usb_dev.c | 10 ++++-
drivers/usb/misc/onboard_usb_dev.h | 6 +++
5 files changed, 39 insertions(+), 34 deletions(-)
---
base-commit: 1e26c5e28ca5821a824e90dd359556f5e9e7b89f
change-id: 20250326-onboard_usb_dev-a7c063a8a515
Best regards,
--
Lukasz Czechowski <lukasz.czechowski(a)thaumatec.com>
Good day/night to everyone.
I am sorry about the duplicate post, I had missed that I was also supposed to CC the regressions list.
I have had S3 sleep broken for well over half a year at this point where the kernel simply crashes upon entering sleep, and the PC turns off (literally stops pulling power from the power supply), and turning it on, at first, causes to to run through 1 on/off cycle, with I believe is a diagnostic step on this board, as it happens whenever something causes the PC to unexpectedly turn off (unless the source is external, like a blackout, in which case once power is restored and I turn on the PC, it turns on normally).
At first I thought that it might be an AMDGPU issue, however removing my dGPU, and also, the rest of my hardware, changed nothing.
However I have had at one point had asus-wmi try to put the system into S0ix, despite the firmware lacking support for S0ix, and now FastFetch no longer prints out my motherboards model number on the Host line, instead just showing ASUS MB, so I have reason to suspect that asus-wmi may be the culprit for this regression.
Unfortunately, I am unable to procure logs, since the kernel crashes before anything meaningful is logged.
I know that this is not a hardware issue, since on Windows and older live-images (Solus is my distribution of choice) these issues do not exist.
I am unsure as to which kernel update broke S3 for me, however I believe it was either late in the 6.10.x cycle or the 6.11.x cycle, since the Solus 4.6 live-image has no such issues on 6.10.13, which was also our last 6.10.x kernel update, and I only started experienced it when we updated to 6.11.5, our first update to 6.11.x, and updating to 6.12.x did not fix anything. The current Solus 4.7 live-image has 6.12.9, and my installed system is currently on 6.12.21.
Hardware:
Motherboard: Asus Z97 Pro Gamer
CPU: i5-4570
dGPU: Sapphire Radeon 540 4GB
RAM: 2x Crucial 8GB DDR3L@1600MT/s, 1x Crucial 4GB DDR3L@1600MT/s
Storage: 500GB Western Digital WD5000AAKX SATA3 7200RPM HDD
DVD Drive: Lite-On DH16ABSH
Add-in cards: Intel 3168 PCIe+USB (by means of a simple adapter) VIA VT6315 Firewire 400 PCIe, MosChip MCS9865 Parallel PCI (over integrated ASMedia ASM1083/1085 PCIe to PCI bridge)
PSU: Seasonix SS-860XP² 860W