Hi Marex,
We backported this patch to the Ubuntu 4.15.0-generic kernel and found that it makes the rsi driver crash on system resume on the Dell 300x IoT platform (100% reproducible). The log is below. After this log appears, the rsi wifi no longer works; we need to run 'rmmod rsi_sdio; modprobe rsi_sdio' to make it work again.
So do you know what is missing apart from this patch, or is this patch not suitable for the 4.15 kernel at all?
Thanks,
Hui.
[ 118.494238] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 118.495866] OOM killer disabled.
[ 118.495868] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 118.497772] Suspending console(s) (use no_console_suspend to debug)
[ 118.499120] rsi_91x: ===> Interface DOWN <===
[ 129.013207] mmc1: Controller never released inhibit bit(s).
[ 129.013216] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 129.013226] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 129.013233] mmc1: sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 129.013240] mmc1: sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 129.013247] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 129.013254] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 129.013261] mmc1: sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 129.013268] mmc1: sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
[ 129.013276] mmc1: sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff
[ 129.013283] mmc1: sdhci: ACmd stat: 0x0000ffff | Slot int: 0x0000ffff
[ 129.013290] mmc1: sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff
[ 129.013297] mmc1: sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff
[ 129.013304] mmc1: sdhci: Resp[0]: 0xffffffff | Resp[1]: 0xffffffff
[ 129.013311] mmc1: sdhci: Resp[2]: 0xffffffff | Resp[3]: 0xffffffff
[ 129.013316] mmc1: sdhci: Host ctl2: 0x0000ffff
[ 129.013323] mmc1: sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff
[ 129.013327] mmc1: sdhci: ============================================
[ 129.113415] mmc1: Reset 0x2 never completed.
[ 129.113417] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 129.113421] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 129.113424] mmc1: sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 129.113428] mmc1: sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 129.113431] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 129.113435] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 129.113439] mmc1: sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 129.113442] mmc1: sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
[ 129.113446] mmc1: sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff
[ 129.113449] mmc1: sdhci: ACmd stat: 0x0000ffff | Slot int: 0x0000ffff
[ 129.113453] mmc1: sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff
[ 129.113457] mmc1: sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff
[ 129.113460] mmc1: sdhci: Resp[0]: 0xffffffff | Resp[1]: 0xffffffff
[ 129.113464] mmc1: sdhci: Resp[2]: 0xffffffff | Resp[3]: 0xffffffff
[ 129.113466] mmc1: sdhci: Host ctl2: 0x0000ffff
[ 129.113470] mmc1: sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff
[ 129.113472] mmc1: sdhci: ============================================
[ 129.213489] mmc1: Reset 0x4 never completed.
[ 129.213490] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 129.213494] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 129.213498] mmc1: sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 129.213501] mmc1: sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 129.213505] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 129.213508] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 129.213512] mmc1: sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 129.213515] mmc1: sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
[ 129.213519] mmc1: sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff
[ 129.213523] mmc1: sdhci: ACmd stat: 0x0000ffff | Slot int: 0x0000ffff
[ 129.213526] mmc1: sdhci: Caps: 0xffffffff | Caps_1: 0xffffffff
[ 129.213530] mmc1: sdhci: Cmd: 0x0000ffff | Max curr: 0xffffffff
[ 129.213534] mmc1: sdhci: Resp[0]: 0xffffffff | Resp[1]: 0xffffffff
[ 129.213537] mmc1: sdhci: Resp[2]: 0xffffffff | Resp[3]: 0xffffffff
[ 129.213540] mmc1: sdhci: Host ctl2: 0x0000ffff
[ 129.213543] mmc1: sdhci: ADMA Err: 0xffffffff | ADMA Ptr: 0xffffffff
[ 129.213545] mmc1: sdhci: ============================================
[ 129.213882] rsi_91x: rsi_sdio_enable_interrupts: Failed to read int enable register
[ 129.240392] rsi_91x: ===> Interface UP <===
[ 129.240443] rsi_91x: rsi_disable_ps: Cannot accept disable PS in PS_NONE state
On Wed, Aug 18, 2021 at 12:06:15PM +0800, Hui Wang wrote:
So do you know what is missing apart from this patch, or is this patch not suitable for the 4.15 kernel at all?
Does 4.19.191 work for this system? Why not just use that or newer instead?
thanks,
greg k-h
On 8/18/21 7:33 AM, Greg Kroah-Hartman wrote:
Does 4.19.191 work for this system? Why not just use that or newer instead?
I haven't seen this on linux-stable 5.4.y or 5.10.y, if that information is of any use.
But I have to admit, I am tempted to mark the whole driver as BROKEN and submit that for stable backports.
Because that is what it is, it is buggy, broken, and the hardware lacks any documentation. I spent an insane amount of time talking to RedPine Signals / SiLabs trying to get help with basic things like association problems against various APs, no result there. I tried getting hardware docs from them so I can fix the driver myself, no result either. So far I tried to pick various fixes from their downstream driver and submit them, but that is massively time consuming and the changes there are not separated or documented, it is just one large chunk of code.
As far as I can tell, they also have no interest in fixing the driver or helping others with fixing it, so maybe we should just mark it as broken ... :-(
On 8/18/21 5:04 PM, Marek Vasut wrote:
I haven't seen this on linux-stable 5.4.y or 5.10.y, if that information is of any use.
But I have to admit, I am tempted to mark the whole driver as BROKEN and submit that for stable backports.
Because that is what it is, it is buggy, broken, and the hardware lacks any documentation. I spent an insane amount of time talking to RedPine Signals / SiLabs trying to get help with basic things like association problems against various APs, no result there. I tried getting hardware docs from them so I can fix the driver myself, no result either. So far I tried to pick various fixes from their downstream driver and submit them, but that is massively time consuming and the changes there are not separated or documented, it is just one large chunk of code.
As far as I can tell, they also have no interest in fixing the driver or helping others with fixing it, so maybe we should just mark it as broken ... :-(
Hi Marek,
Got it, thanks for sharing it.
Hi Greg,
I just tested 4.19.191 and got the same result; the wifi crashes after resume under 4.19.191:
admin@HW6VB02:~$ uname -a
Linux HW6VB02 4.19.191 #1 SMP Thu Aug 19 10:19:32 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
[ 59.682908] sdhci-acpi INT33BB:00: pre_suspend failed for non-removable host: -38
[ 59.682917] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 59.686063] OOM killer disabled.
[ 59.686065] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 59.687385] Suspending console(s) (use no_console_suspend to debug)
[ 59.687931] rsi_91x: ===> Interface DOWN <===
[ 70.068983] mmc1: Controller never released inhibit bit(s).
[ 70.068992] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 70.069002] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 70.069009] mmc1: sdhci: Blk size: 0x0000ffff | Blk cnt: 0x0000ffff
[ 70.069016] mmc1: sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff
[ 70.069023] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 70.069030] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
[ 70.069036] mmc1: sdhci: Wake-up: 0x000000ff | Clock: 0x0000ffff
[ 70.069043] mmc1: sdhci: Timeout: 0x000000ff | Int stat: 0xffffffff
So let us revert this commit from 4.19.y?
Thanks,
Hui.
On Thu, Aug 19, 2021 at 10:57:03AM +0800, Hui Wang wrote:
So let us revert this commit from 4.19.y?
If you revert it, does it work properly? What about in Linus's tree?
thanks,
greg k-h
On 8/19/21 7:31 AM, Greg Kroah-Hartman wrote:
If you revert it, does it work properly? What about in Linus's tree?
I suspect in that case, sdio_claim_host() will spin indefinitely and never finish, see the c434e5e48dc4e ("rsi: Use resume_noirq for SDIO") commit message.
Note that I did my tests on ARM MMCI (stm32mp1 variant).
This "[ 70.068983] mmc1: Controller never released inhibit bit(s)" looks suspicious in the log above.
Also, newer versions of the RSI downstream driver [1] as of 390542d ("Updated Readme.txt file") simply comment out rsi_sdio_enable_interrupts() in rsi/rsi_91x_sdio.c rsi_resume(), which looks like RSI ran into the same problem, but "fixed" it differently. I think that approach RSI took is wrong and it just hid the issue.
[1] git://github.com/SiliconLabs/RS911X-nLink-OSD
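As a side note on the register dump quoted in this thread: every field in it reads back as all ones at its width (0xffffffff, 0x0000ffff, 0x000000ff). A minimal Python sketch of how one might check a captured dmesg excerpt for that pattern is below. The helper names (parse_sdhci_dump, all_ones, dump_reads_all_ones) are hypothetical, the sample lines are copied from the log in this thread, and the idea that an all-ones dump means the controller registers were not reachable at that moment is an assumption, not something confirmed in this thread.

```python
# Sample lines copied from the dmesg output posted in this thread.
SAMPLE = """\
[ 129.013226] mmc1: sdhci: Sys addr: 0xffffffff | Version: 0x0000ffff
[ 129.013247] mmc1: sdhci: Present: 0xffffffff | Host ctl: 0x000000ff
[ 129.013254] mmc1: sdhci: Power: 0x000000ff | Blk gap: 0x000000ff
"""


def parse_sdhci_dump(log_text):
    """Return {register name: hex string} from 'mmcN: sdhci:' dump lines."""
    regs = {}
    for line in log_text.splitlines():
        if "sdhci:" not in line:
            continue
        payload = line.split("sdhci:", 1)[1]
        # Each dump line holds one or two "Name: 0x..." fields split by '|'.
        for field in payload.split("|"):
            name, sep, value = field.partition(":")
            value = value.strip()
            if sep and value.startswith("0x"):
                regs[name.strip()] = value.lower()
    return regs


def all_ones(hex_value):
    """Heuristic: value is 0xff, 0xffff or 0xffffffff after zero padding."""
    digits = hex_value[2:].lstrip("0")
    return len(digits) in (2, 4, 8) and set(digits) == {"f"}


def dump_reads_all_ones(log_text):
    """True if a dump was found and every register in it is all ones."""
    regs = parse_sdhci_dump(log_text)
    return bool(regs) and all(all_ones(v) for v in regs.values())
```

Calling dump_reads_all_ones(SAMPLE) on the excerpt above flags it; a dump with ordinary register values would not be flagged.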
On 8/19/21 3:49 PM, Marek Vasut wrote:
On 8/19/21 7:31 AM, Greg Kroah-Hartman wrote:
If you revert it, does it work properly? What about in Linus's tree?
I reverted the commit in 4.19.191, and then the wifi worked both before and after system resume. I also tested the mainline kernel linux-5.13: before suspend the wifi worked, but after suspend the whole system could not wake up, and I couldn't recover it since I have no physical access to the machine. I did all the testing remotely via ssh, so there is no test result for Linus' tree.
I suspect in that case, sdio_claim_host() will spin indefinitely and never finish, see the c434e5e48dc4e ("rsi: Use resume_noirq for SDIO") commit message.
At least, we have never seen this issue on the 4.15 kernel; without commit c434e5e48dc4e ("rsi: Use resume_noirq for SDIO"), the wifi and bluetooth work well before and after suspend.
Note that I did my tests on ARM MMCI (stm32mp1 variant).
The platform I am testing is an x86 one, and the sdhci controller driver is sdhci_acpi.c.
This "[ 70.068983] mmc1: Controller never released inhibit bit(s)" looks suspicious in the log above.
Also, newer versions of the RSI downstream driver [1] as of 390542d ("Updated Readme.txt file") simply comment out rsi_sdio_enable_interrupts() in rsi/rsi_91x_sdio.c rsi_resume(), which looks like RSI ran into the same problem, but "fixed" it differently. I think that approach RSI took is wrong and it just hid the issue.
[1] git://github.com/SiliconLabs/RS911X-nLink-OSD
On 8/19/21 10:52 AM, Hui Wang wrote:
On 8/19/21 3:49 PM, Marek Vasut wrote:
On 8/19/21 7:31 AM, Greg Kroah-Hartman wrote:
If you revert it, does it work properly? What about in Linus's tree?
I reverted the commit in 4.19.191, and then the wifi worked both before and after system resume. I also tested the mainline kernel linux-5.13: before suspend the wifi worked, but after suspend the whole system could not wake up, and I couldn't recover it since I have no physical access to the machine. I did all the testing remotely via ssh, so there is no test result for Linus' tree.
I suspect you just hit the issue this patch was trying to fix then.
If you have console access, use no_console_suspend to see the backtrace on wake up.
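A small Python sketch for checking whether no_console_suspend is already on the running kernel's command line before attempting another suspend cycle. The function names are hypothetical and the command line in the usage note is made up for illustration; /proc/cmdline is where Linux exposes the boot parameters it was started with.

```python
def has_kernel_param(cmdline, name):
    """True if `name` appears as a bare flag or as name=value in `cmdline`."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line, or an empty string if unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

For example, has_kernel_param(read_cmdline(), "no_console_suspend") reports whether the flag is set on the current boot; with a made-up command line such as "root=/dev/sda1 quiet" it returns False.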
I suspect in that case, sdio_claim_host() will spin indefinitely and never finish, see the c434e5e48dc4e ("rsi: Use resume_noirq for SDIO") commit message.
At least, we have never seen this issue on the 4.15 kernel; without commit c434e5e48dc4e ("rsi: Use resume_noirq for SDIO"), the wifi and bluetooth work well before and after suspend.
I suspect you might've just been lucky with that, because it seems RSI did hit it too (see below). This could also be something which triggers only on specific controller drivers (?).
Note that I did my tests on ARM MMCI (stm32mp1 variant).
The platform I am testing is an x86 one, and the sdhci controller driver is sdhci_acpi.c.
Do you have an RSI module which can be plugged into an SD card slot there, or is that RSI module soldered onto some devkit/board?
Mine is the latter, soldered on a SoM, so I have a hard time testing on other SDIO controllers.
This "[ 70.068983] mmc1: Controller never released inhibit bit(s)" looks suspicious in the log above.
Also, newer versions of the RSI downstream driver [1] as of 390542d ("Updated Readme.txt file") simply comment out rsi_sdio_enable_interrupts() in rsi/rsi_91x_sdio.c rsi_resume(), which looks like RSI ran into the same problem, but "fixed" it differently. I think that approach RSI took is wrong and it just hid the issue.
[1] git://github.com/SiliconLabs/RS911X-nLink-OSD
The bottom line is, I would really prefer to figure out what the problem you see on Linux 5.13.y is, fix that, and backport that fix, so that suspend/resume works correctly for everyone, rather than revert a patch without really understanding the underlying problem.
Sadly, the RSI driver is buggy.
linux-stable-mirror@lists.linaro.org