On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.06.23 04:36, Bagas Sanjaya wrote:
On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
Hello,
chiming in here as I'm experiencing what looks like the exact same issue, also on a Lenovo Z13 notebook, also on Arch: an Oops during startup in task udev-worker, followed by udev-worker blocking all attempts to suspend or cleanly shut down/reboot the machine. In fact, I first noticed because the machine surprised me by repeatedly running out of battery after it had supposedly been in standby but had actually failed to suspend. Only then did I notice the error on boot.
bisect result:

904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
commit 904e28c6de083fa4834cdbd0026470ddc30676fc
Merge: a738688177dc 2f7f4efb9411
Author: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Date:   Wed Feb 22 10:44:31 2023 +0100

    Merge branch 'for-6.3/hid-bpf' into for-linus
Hmm, this looks like a bad bisect (it landed on HID-BPF, which IMO isn't related to amd_sfh). Can you repeat the bisection?
Well, amd_sfh afaics apparently interacts with HID (see trace earlier in the thread), so it's not that far away. But it's a merge commit, which is possible, but doesn't happen every day. So a recheck might really be a good idea.
Let's not rule out that there is a bad interaction between HID-BPF and AMD SFH. HID-BPF is able to process any incoming HID event, whether it comes from AMD SFH, USB, BT, I2C or anything else.
However, looking at the stack trace in the initial report[0], it seems we are getting the oops/stack traces while we are still in amd_sfh:
list_add corruption. next is NULL.
WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
...
RIP: 0010:__list_add_valid+0x57/0xa0
...
Call Trace:
 <TASK>
 amd_sfh_get_report+0xba/0x110 [amd_sfh 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
 ...
If HID-BPF were involved, we should see a call to hid_input_report() IMO. Also AMD SFH calls hid_input_report() in a workqueue, so I would expect a different stack trace.
I suspect commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are enabled, clean up"), because the stack trace reports a bad list_add, which could happen if the list object is not correctly initialized.
However, that commit was present in v6.2, so it might not be that one.
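To make the "bad list_add on an uninitialized object" theory concrete, here is a minimal user-space sketch. It is not the kernel's lib/list_debug.c code; the names list_node, list_init, list_add_valid and list_add_checked are illustrative assumptions that only mimic the shape of the check. A zero-filled list head has next == NULL, which is the condition the warning above reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative stand-in for a kernel-style circular doubly linked list node. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* A properly initialized empty head points at itself in both directions. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Sanity check before linking a new node between prev and next. */
static bool list_add_valid(const struct list_node *prev,
			   const struct list_node *next)
{
	if (next == NULL) {
		fprintf(stderr, "list_add corruption. next is NULL.\n");
		return false;
	}
	if (prev == NULL) {
		fprintf(stderr, "list_add corruption. prev is NULL.\n");
		return false;
	}
	/* Neighbours must still point at each other. */
	return next->prev == prev && prev->next == next;
}

/* Insert new_node right after head; refuse if the list looks corrupted. */
static bool list_add_checked(struct list_node *new_node,
			     struct list_node *head)
{
	struct list_node *prev = head;
	struct list_node *next = head->next;

	if (!list_add_valid(prev, next))
		return false;

	next->prev = new_node;
	new_node->next = next;
	new_node->prev = prev;
	prev->next = new_node;
	return true;
}

/* Contrast a zero-filled head (never initialized) with an initialized one. */
static void demo_list_add(void)
{
	struct list_node zeroed = { NULL, NULL }; /* e.g. memory that was only zeroed */
	struct list_node good;
	struct list_node a, b;

	list_init(&good);
	assert(!list_add_checked(&a, &zeroed)); /* rejected: next is NULL */
	assert(list_add_checked(&b, &good));    /* accepted */
}
```

Calling demo_list_add() exercises both cases: the add on the zero-filled head is refused, while the add on the initialized head succeeds.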
Back to the merge commit: the hid-bpf branch was forked during the v6.1 cycle and only later merged into the hid tree. That might be why bisection lands on the merge: the AMD SFH code in the hid-bpf branch is the one from the v6.1 kernel, and once it is merged into the v6.2+ branch, you get different code for that driver.
Cheers, Benjamin
[0] https://lore.kernel.org/regressions/f40e3897-76f1-2cd0-2d83-e48d87130eab@hex...
If I'm not mistaken, the Z13 doesn't actually have any sensors connected to SFH. So I think both the suspicion on 7bcfdab3f0c6 and the theory that this is triggered by HID init make a lot of sense.
Can you try this patch?
diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index d9b7b01900b5..fa693a5224c6 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
 			devm_kfree(dev, cl_data->report_descr[i]);
 		}
 		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled);
+		cl_data->num_hid_devices = 0;
 		return -EOPNOTSUPP;
 	}
 	schedule_delayed_work(&cl_data->work_buffer, msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
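
For anyone wondering why a one-line reset could matter, here is a rough user-space sketch of the general pattern, not the driver's actual code. It assumes that some later code path loops over the device count; client_state, client_init_fail and dangling_slots are hypothetical names made up for illustration. If the error path frees the per-device resources but leaves the count untouched, that later loop still visits slots that were already torn down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEVICES 4

/* Hypothetical stand-in for per-client driver state. */
struct client_state {
	int num_devices;          /* how many slots later code will walk */
	int *device[MAX_DEVICES]; /* per-device resources */
};

/* Count the slots a later loop would visit that have already been freed. */
static int dangling_slots(const struct client_state *st)
{
	int n = 0;

	for (int i = 0; i < st->num_devices; i++)
		if (st->device[i] == NULL)
			n++;
	return n;
}

/*
 * Simulate an init that sets up two devices, then fails and cleans up.
 * reset_count selects whether the error path also zeroes the count.
 */
static int client_init_fail(struct client_state *st, int reset_count)
{
	st->num_devices = 2;
	for (int i = 0; i < st->num_devices; i++)
		st->device[i] = malloc(sizeof(int));

	/* Error path: release everything that was set up. */
	for (int i = 0; i < st->num_devices; i++) {
		free(st->device[i]);
		st->device[i] = NULL;
	}
	if (reset_count)
		st->num_devices = 0;
	return -1;
}

/* Compare the two error paths. */
static void demo_stale_count(void)
{
	struct client_state st = { 0 };

	client_init_fail(&st, 0);
	assert(dangling_slots(&st) == 2); /* stale count: loop sees freed slots */

	client_init_fail(&st, 1);
	assert(dangling_slots(&st) == 0); /* count reset: loop visits nothing */
}
```

With the count reset, any later loop bounded by it simply does nothing, which is the effect the patch above aims for on the no-sensors path.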