On 27/03/2024 17:31, Hans de Goede wrote:
Hi Philip,
On 3/19/24 5:52 AM, Philip Müller wrote:
On 18/03/2024 17:58, Philip Müller wrote:
I'm currently developing on the OrangePi Neo-01, which ships with two similar touchpads using the Synaptics driver. On 6.7.10 those two devices get detected normally. On 6.8.1 it seems to error out.
I either get none, or at best only one of those two devices.
i2c_hid_acpi: probe of i2c-XXX0001:00 failed with error -110 i2c_hid_acpi: probe of i2c-XXX0002:00 failed with error -110
what would be the best way to debug this?
I found the regression in commit aa69d6974185e9f7a552ba982540a38e34f69690 HID: i2c-hid: Switch i2c_hid_parse() to goto style error handling
I just checked that patch and I don't see anyway how that can create this regression. I assume you did a git bisect ?
Did you try the last commit in the tree before that commit got added and verified that that one works where as building a kernel from commit aa69d6974185e9f itself does not work ?
No I didn't do a git bisect actually. I did it more manually like git bisect might do it. What I did was to compile just that module and load it against the 6.7.10 kernel. a9f68ffe1170ca4bc17ab29067d806a354a026e0 is the last commit you find in 6.7 series. All the other commits are part of the 6.8 series or later. So I looked at the commit titles and picked a9f68ffe1170ca4bc17ab29067d806a354a026e0 which I compiled the "out of tree" module. I unloaded the 6.7.10 module, copied the new compiled one over and did the rmod, modprobe batch script to see how the touchpads react.
Then I went to compile and tested af93a167eda90192563bce64c4bb989836afac1f which created the issue with -110 errors. So I went to aa69d6974185e9f7a552ba982540a38e34f69690 which also results in the error. So I went back to 96d3098db835d58649b73a5788898bd7672a319b which still works. Since now 6.9 development started, I may try 00aab7dcb2267f2aef59447602f34501efe1a07f to see if that fixes anything.
I can get the touchpads both working in 6.8 code when I do enough tries by a rmod/modprobe loop. Based on the logs you see, sometimes both are up, mostly one or none is up. having only two commits of 6.7 code in the module it works as 6.7 code. Only when "HID: i2c-hid: Switch i2c_hid_parse() to goto style error handling" gets added I see the error and can reproduce it on my device as shown in the log.
Before we had
hid_err(hid, "reading report descriptor failed\n"); kfree(rdesc);
followed by the hardcoded return -EIO;
Now we have an redirect to the out function but don't do kfree(rdesc) if you hit the error anymore.
Also the last return 0; got removed. So maybe what ever error the device might have hit before might be reverted due to return 0 or not freeing rdesc when hitting the error might prevent it to load.
Anyhow, that patch is the first patch which changes the function in that series. Sure it might not be able to compile the whole kernel with it. I only tested the module based on the given code in an out of tree module style.
So I might test 6.9rc1 on the device and let you know if that kernel has the same issue still.
When I use the commit before I can rmmod and modprobe in a batch script using a loop without erroring out to -110. Attached the testing script and dmesg log snippets
#!/bin/bash for ((n=0;n<5;n++)) do sudo rmmod i2c_hid_acpi sleep 1 sudo modprobe i2c_hid_acpi --force-vermagic sleep 2 done
Ok, so you did try the commit before and that did work. Are you sure that aa69d6974185e9f was not actually the last working commit ?
For me 96d3098db835d58649b73a5788898bd7672a319b was the last working code, as it doesn't change functionality.
AFAICT aa69d6974185e9f makes no functional changes, except for actually propagating the error from i2c_hid_read_register() rather then hardcoding -EIO. But that should not matter...
See return 0 and kfree(rdesc) which is now missing.
Note that commit aa69d6974185e9f is part of a series and I would not be surprised if some other commit in that series is causing your problem, but aa69d6974185e9f itself seems rather harmless.
Well, it might also revealed an issue, which was ignored all the time and by erroring out it might stop the process to continue something. Cos maybe return 0 had skipped any recorded error before.
If you want me to test or try something else, let me know. I'm open for ideas.
Regards,
Hans