On 20 Dec 2017, at 7:11 AM, Brian Norris briannorris@chromium.org wrote:
Hi Kai,
On Tue, Dec 19, 2017 at 12:28:17PM +0800, Kai Heng Feng wrote:
On 19 Dec 2017, at 2:13 AM, Brian Norris briannorris@chromium.org wrote: On Mon, Dec 18, 2017 at 12:43:48PM +0100, Greg Kroah-Hartman wrote:
On Fri, Dec 15, 2017 at 07:05:39PM -0800, Matthias Kaehlcke wrote:
We identified the above patch as the culprit, in combination with USB autosuspend being enabled for the Bluetooth controller.
We found that the following (recent) upstream patch fixes the issue on most devices (we are still investigating one case on a specific device):
A key word to highlight here is "most". I have at least one device that is consistently having problems even with both patches included; the only way things work reliably so far is to simply revert the $subject patch in 4.4.x.
The problem we have is that the QCA Rome Bluetooth stops working after system S3. The reset resume quirk workaround the issue on both mainline and 4.4.x (at least for me).
Understood. But that's not the case for all systems, and on some, you're regressing functionality.
Yes. So reverting the commit is a reasonable path we should take.
Many systems keep power enabled for system suspend, and so that "fix" is not necessary for them. It's also not very precise, because it seems to act in many cases where it need not -- for one, it takes effect on *all* resume attempts, even resuming from autosuspend. I really doubt your system is losing USB power in S0 + USB autosuspend?
No, USB power is on. Not seeing my XHCI gets put to D3cold.
So, if you really need this patch for some systems, it seems like it should be much better targeted. You're currently adding a lot of unnecessary overhead and fragility.
You are right. Applying this unconditionally to all QCA Rome may be overkill.
I'll try to investigate further, since this result is a bit confusing still. If there's a real problem with the patch (and I suspect there might be), it probably shouldn't be in mainline either…
Do you have the same issue on mainline?
Refer to my note [1] :)
But because you asked, I did the work necessary to boot mainline on this system, and in fact, I see the same problem. I think that's enough reason to revert your patch in all branches (upstream, and any -stable branch that already took it).
To be clear, here's the kernel logs on 4.15-rc4+, where the 55-second mark is around where I try to power on and scan for BT devices (power-on and scan both fail):
[ 2.323475] usb 3-1: new full-speed USB device number 2 using ohci-platform ... [ 2.731719] usb 3-1: New USB device found, idVendor=0cf3, idProduct=e300 ... [ 2.867870] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 ...(udev doesn't run until here)... [ 35.733139] usbcore: registered new interface driver btusb [ 35.740912] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin [ 35.749216] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111 ... [ 35.816104] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin ... [ 55.073503] usb 3-1: reset full-speed USB device number 2 using ohci-platform ... [ 57.772513] Bluetooth: hci0: urb 00000000ec39086b failed to resubmit (113) [ 57.780217] Bluetooth: hci0: urb 0000000066228fda failed to resubmit (113)
Whereas reverting the $subject patch yields no reset and URB failure; Bluetooth seems to work fine.
Thanks for your testing on mainline!
commit a0085f2510e8976614ad8f766b209448b385492f Author: Sukumar Ghorai sukumar.ghorai@intel.com Date: Wed Aug 16 14:46:55 2017 -0700
Bluetooth: btusb: driver to enable the usb-wakeup feature
[...]
stable branches are currently broken for BTUSB_QCA_ROME with USB autosuspend enabled, since the above patch is not included (I only checked v4.4 and v4.9), so we probably want to integrate it.
Now queued up, thanks for letting me know.
I think you have almost as much information as I do at this point, and I'll try to reply back here if I figure out anything more, so I'll leave you to your decisions. But I would personally revert, not backport more patches.
Among the reasons: (a) I have at least one device that does not work better with both patches [1] (b) So far, I haven't seen any explanation why commit a0085f2510e8 should be a dependency for $subject patch; the closest I can imagine would be that commit a0085f2510e8 could effectively neutralize $subject patch -- if the device is marked as a wakeup source, then it will never try to RESET on resume -- but that's still a fuzzy signal; just because it's marked a wakeup source sometimes doesn't mean it always will be. We can disable it in user space too.
Hi Leif,
Can you try if a0085f2510e8 breaks your $subject commit? If it’s a yes, then what Brian suggests is correct.
Also, can you share the output of "usb-device” for your btusb device?
FWIW, here's mine:
T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0cf3 ProdID=e300 Rev=00.01 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
This is an integrated Wifi+BT QCA6174A-5 M.2 module.
Thanks for the info. I found the same BT on a Killer Wireless, I’ll do some testing on it.
(c) Isn't it safer to revert than to backport more? You're delving into feature-land, not bugfix-land…
Sounds reasonable.
Shall I post a proper revert for mainline, with Fixes tags and CC stable? Or did one of you want to do that?
Let me clean my own mess ;)
We’ll need more information from Leif if we need to do ID-specific quirks.
As I see it, this probably isn't just an ID-specific thing; this patch looks bad for its intended purpose, per my comments above (it's overkill, for one). I think there are many factors involved here, and it needs more study than "I just booted, suspended/resumed once, and it worked better."
Among the things to consider:
- How do we determine the likelihood that the device gets powered off in
S3? The device_may_wakeup() check can be influenced (among other things) by user space policy, so that doesn't really look right to me. (ID-based checks might help a little, since you can probably differentiate chips that are likely used on discrete/removable USB parts, rather than internal chips?)
Actually, I think we can’t know for sure. From my own experience, it’s pretty ODM specific. Some external chips keep its power under S3.
- The ordering: right now, you're enabling "reset on resume" at probe
time, but QCA ROME devices only load their firmware in btusb_open(). This gives a window of time in which you're adding weird reset behavior while the device may be running its ROM firmware, not the patched firmware loaded from /lib/firmware/. That might be one reason things don't work out so well?
You are right, the reset on resume should be used when USB probing, instead of in the middle of btusb_open().
I also suspect that might be one reason that Matthias sometimes saw the problem disappear on his devices, where I didn't. I have my device configured to cold reset the BT device on every reboot, whereas his isn't; it might retain the FW in the BT device's RAM across reboot. So my behavior was more deterministic :)
- The behavior here is affected by USB autosuspend, not just S3. I
mentioned this above, but more specifically, ChromeOS has udev rules like this, to enable autosuspend on various sorts of devices (including bluetooth controllers + BT HID devices):
<udev rule snippet> ATTR{idVendor}=="0cf3", ATTR{idProduct}=="e300", GOTO="autosuspend_enable" ... # Enable autosuspend LABEL="autosuspend_enable" TEST=="power/control", ATTR{power/control}="auto", GOTO="autosuspend_end"
LABEL="autosuspend_end" </udev rule snippet>
I suspect some desktop systems might not have the same policies, so you're not testing autosuspend in the same way I am?
Actually it’s same from kernel’s perspective. I use TLP, it does the same thing to the power/control knob.
Kai-Heng
Brian
Kai-Heng
Brian
[1] Before you ask: this particular device is not quite fully supported on upstream yet, though its sibling devices are. With a bit of effort, I might be able to test a clean upstream. At the moment, I'm using our Chrom{e,ium}OS 4.4 kernel, where we regularly merge in -stable updates.