Trying again since gmail didn't use plain text and the message got rejected.
We're waiting for these patches to appear in a 5.15 kernel so that we
can ship with an unpatched kernel. Will they be queued for the stable
kernels sometime soon?
Thanks,
Pat
On Mon, Dec 20, 2021 at 12:43 PM Patrick J. Volkerding
<volkerdi(a)gmail.com> wrote:
>
> Will these patches be queued for the stable kernels soon?
>
> Thanks,
>
> Pat
>
> On Mon, Dec 13, 2021, 03:33 Borislav Petkov <bp(a)alien8.de> wrote:
>>
>> On Mon, Dec 13, 2021 at 10:20:03AM +0200, Mike Rapoport wrote:
>> > Thanks for taking care of this!
>>
>> Sure, no probs.
>>
>> Lemme send them out officially so they're on the list. Will queue them
>> this week.
>>
>> Thx.
>>
>> --
>> Regards/Gruss,
>> Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette
Syzbot reports the following WARNING:
[200~raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 1206 at kernel/locking/irqflag-debug.c:10
warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
Hardware initialization for the rtl8188cu can run for as long as 350 ms,
and the routine may be called with interrupts disabled. To avoid locking
the machine for this long, the current routine saves the interrupt flags
and enables local interrupts. The problem is that it restores the flags
at the end without disabling local interrupts first.
This patch fixes commit a53268be0cb9 ("rtlwifi: rtl8192cu: Fix too long
disable of IRQs").
Reported-by: syzbot+cce1ee31614c171f5595(a)syzkaller.appspotmail.com
Cc: stable(a)vger.kernel.org
Fixes: a53268be0cb9 ("rtlwifi: rtl8192cu: Fix too long disable of IRQs")
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
---
drivers/net/wireless/realtek/rtlwifi/rtl8192cu/hw.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/hw.c
index 6312fddd9c00..eaba66113328 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/hw.c
@@ -1000,6 +1000,7 @@ int rtl92cu_hw_init(struct ieee80211_hw *hw)
_initpabias(hw);
rtl92c_dm_init(hw);
exit:
+ local_irq_disable();
local_irq_restore(flags);
return err;
}
--
2.34.1
Hi, this is your Linux kernel regression tracker speaking.
I noticed a bugreport from Tomasz C. (CCed) that sounds a lot like a
regression between v5.15.7..v5.15.8 and likely better dealt with by email:
To quote from: https://bugzilla.kernel.org/show_bug.cgi?id=215341
> After updating kernel from 5.15.7 to 5.15.8 on ArchLinux distribution, Holtek USB mouse stopped working.
> Exact model:
> 04d9:a067 Holtek Semiconductor, Inc. USB Gaming Mouse
>
> The dmesg output for this device from kernel version 5.15.8:
>
> [ 2.501958] usb 2-1.2.3: new full-speed USB device number 6 using ehci-pci
> [ 2.624369] usb 2-1.2.3: New USB device found, idVendor=04d9, idProduct=a067, bcdDevice= 1.16
> [ 2.624376] usb 2-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 2.624379] usb 2-1.2.3: Product: USB Gaming Mouse
> [ 2.624382] usb 2-1.2.3: Manufacturer: Holtek
>
> After disconnecting and connecting the USB:
>
> [ 71.976731] usb 2-1.2.3: USB disconnect, device number 6
> [ 75.013021] usb 2-1.2.3: new full-speed USB device number 8 using ehci-pci
> [ 75.135865] usb 2-1.2.3: New USB device found, idVendor=04d9, idProduct=a067, bcdDevice= 1.16
> [ 75.135873] usb 2-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 75.135877] usb 2-1.2.3: Product: USB Gaming Mouse
> [ 75.135880] usb 2-1.2.3: Manufacturer: Holtek
>
>
> On kernel version 5.15.7:
>
> [ 2.280515] usb 2-1.2.3: new full-speed USB device number 6 using ehci-pci
> [ 2.379777] usb 2-1.2.3: New USB device found, idVendor=04d9, idProduct=a067, bcdDevice= 1.16
> [ 2.379784] usb 2-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 2.379787] usb 2-1.2.3: Product: USB Gaming Mouse
> [ 2.379790] usb 2-1.2.3: Manufacturer: Holtek
> [ 2.398578] input: Holtek USB Gaming Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2.3/2-1.2.3:1.0/0003:04D9:A067.0005/input/input11
> [ 2.450977] holtek_mouse 0003:04D9:A067.0005: input,hidraw4: USB HID v1.10 Keyboard [Holtek USB Gaming Mouse] on usb-0000:00:1d.0-1.2.3/input0
> [ 2.451013] holtek_mouse 0003:04D9:A067.0006: Fixing up report descriptor
> [ 2.452189] input: Holtek USB Gaming Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2.3/2-1.2.3:1.1/0003:04D9:A067.0006/input/input12
> [ 2.468510] usb 2-1.2.4: new high-speed USB device number 7 using ehci-pci
> [ 2.503913] holtek_mouse 0003:04D9:A067.0006: input,hiddev96,hidraw5: USB HID v1.10 Mouse [Holtek USB Gaming Mouse] on usb-0000:00:1d.0-1.2.3/input1
> [ 2.504105] holtek_mouse 0003:04D9:A067.0007: hiddev97,hidraw6: USB HID v1.10 Device [Holtek USB Gaming Mouse] on usb-0000:00:1d.0-1.2.3/input2
>
> Rolling back the kernel to version 5.15.7 solves the problem.
[TLDR for the rest of the mail: adding this regression to regzbot; most
text you find below is compiled from a few templates paragraphs some of
you might have seen already.]
To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:
#regzbot introduced v5.15.7..v5.15.8
#regzbot title usb: Holtek mouse stopped working
Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to this report and the bugzilla ticket, then regzbot will automatically
mark the regression as resolved once the fix lands in the appropriate
tree. For more details about regzbot see footer.
Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).
P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave thus might sent someone reading this down the wrong
rabbit hole, which none of us wants.
BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 651740a502411793327e2f0741104749c4eedcd1 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Mon, 13 Dec 2021 14:22:33 -0500
Subject: [PATCH] btrfs: check WRITE_ERR when trying to read an extent buffer
Filipe reported a hang when we have errors on btrfs. This turned out to
be a side-effect of my fix c2e39305299f01 ("btrfs: clear extent buffer
uptodate when we fail to write it") which made it so we clear
EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out.
Below is a paste of Filipe's analysis he got from using drgn to debug
the hang
"""
btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to
a value while writeback of all pages has not yet completed:
--> writeback for the first 3 pages finishes, we clear
EXTENT_BUFFER_UPTODATE from eb on the first page when we get an
error.
--> at this point eb->io_pages is 1 and we cleared Uptodate bit from the
first 3 pages
--> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so
it continues, it's able to lock the pages since we obviously don't
hold the pages locked during writeback
--> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets
eb->io_pages to 3, since only the first page does not have Uptodate
bit set at this point
--> writeback for the remaining page completes, we ended decrementing
eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore
never calling end_extent_buffer_writeback(), so
EXTENT_BUFFER_WRITEBACK remains in the eb's flags
--> of course, when the read bio completes, it doesn't and shouldn't
call end_extent_buffer_writeback()
--> we should clear EXTENT_BUFFER_UPTODATE only after all pages of
the eb finished writeback? or maybe make the read pages code
wait for writeback of all pages of the eb to complete before
checking which pages need to be read, touch ->io_pages, submit
read bio, etc
writeback bit never cleared means we can hang when aborting a
transaction, at:
btrfs_cleanup_one_transaction()
btrfs_destroy_marked_extents()
wait_on_extent_buffer_writeback()
"""
This is a problem because our writes are not synchronized with reads in
any way. We clear the UPTODATE flag and then we can easily come in and
try to read the EB while we're still waiting on other bio's to
complete.
We have two options here, we could lock all the pages, and then check to
see if eb->io_pages != 0 to know if we've already got an outstanding
write on the eb.
Or we can simply check to see if we have WRITE_ERR set on this extent
buffer. We set this bit _before_ we clear UPTODATE, so if the read gets
triggered because we aren't UPTODATE because of a write error we're
guaranteed to have WRITE_ERR set, and in this case we can simply return
-EIO. This will fix the reported hang.
Reported-by: Filipe Manana <fdmanana(a)suse.com>
Fixes: c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3258b6f01e85..9234d96a7fd5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -6611,6 +6611,14 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
return 0;
+ /*
+ * We could have had EXTENT_BUFFER_UPTODATE cleared by the write
+ * operation, which could potentially still be in flight. In this case
+ * we simply want to return an error.
+ */
+ if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
+ return -EIO;
+
if (eb->fs_info->sectorsize < PAGE_SIZE)
return read_extent_buffer_subpage(eb, wait, mirror_num);