The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040851-hamster-canary-7b07@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in COW mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 3 Apr 2024 23:21:30 +0200
Subject: [PATCH] x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..36b603d0cdde 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
diff --git a/mm/memory.c b/mm/memory.c
index 904f70b99498..d2155ced45f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From 310227f42882c52356b523e2f4e11690eebcd2ab Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 13 Feb 2024 14:54:25 +0100
Subject: [PATCH] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Message-Id: <20240213135425.795001-1-david(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
drivers/virtio/virtio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b3513..f513ee21b1c18 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
if (drv && drv->freeze) {
ret = drv->freeze(dev);
- if (ret)
+ if (ret) {
+ virtio_config_enable(dev);
return ret;
+ }
}
if (dev->config->destroy_avq)
--
2.43.0
Dave,
>> Could you please try the patch below on top of v6.1.80?
> Works okay on top of v6.1.80:
>
> [ 30.952668] scsi 6:0:0:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3
> [ 31.072592] scsi target6:0:0: Beginning Domain Validation
> [ 31.139334] scsi 6:0:0:0: Power-on or device reset occurred
> [ 31.186227] scsi target6:0:0: Ending Domain Validation
> [ 31.240482] scsi target6:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
> [ 31.462587] ata5: SATA link down (SStatus 0 SControl 0)
> [ 31.618798] scsi 6:0:2:0: Direct-Access HP 73.4G ST373207LW HPC1 PQ: 0 ANSI: 3
> [ 31.732588] scsi target6:0:2: Beginning Domain Validation
> [ 31.799201] scsi 6:0:2:0: Power-on or device reset occurred
> [ 31.846724] scsi target6:0:2: Ending Domain Validation
> [ 31.900822] scsi target6:0:2: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
Great, thanks for testing!
Greg, please revert the following commits from linux-6.1.y:
b73dd5f99972 ("scsi: sd: usb_storage: uas: Access media prior to querying device properties")
cf33e6ca12d8 ("scsi: core: Add struct for args to execution functions")
and include the patch below instead.
Thank you!
--
Martin K. Petersen Oracle Linux Engineering
From 87441914d491c01b73b949663c101056a9d9b8c7 Mon Sep 17 00:00:00 2001
From: "Martin K. Petersen" <martin.petersen(a)oracle.com>
Date: Tue, 13 Feb 2024 09:33:06 -0500
Subject: [PATCH] scsi: sd: usb_storage: uas: Access media prior to querying
device properties
[ Upstream commit 321da3dc1f3c92a12e3c5da934090d2992a8814c ]
It has been observed that some USB/UAS devices return generic properties
hardcoded in firmware for mode pages for a period of time after a device
has been discovered. The reported properties are either garbage or they do
not accurately reflect the characteristics of the physical storage device
attached in the case of a bridge.
Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to
avoid calling revalidate twice") we would call revalidate several
times during device discovery. As a result, incorrect values would
eventually get replaced with ones accurately describing the attached
storage. When we did away with the redundant revalidate pass, several
cases were reported where devices reported nonsensical values or would
end up in write-protected state.
An initial attempt at addressing this issue involved introducing a
delayed second revalidate invocation. However, this approach still
left some devices reporting incorrect characteristics.
Tasos Sahanidis debugged the problem further and identified that
introducing a READ operation prior to MODE SENSE fixed the problem and that
it wasn't a timing issue. Issuing a READ appears to cause the devices to
update their state to reflect the actual properties of the storage
media. Device properties like vendor, model, and storage capacity appear to
be correctly reported from the get-go. It is unclear why these devices
defer populating the remaining characteristics.
Match the behavior of a well known commercial operating system and
trigger a READ operation prior to querying device characteristics to
force the device to populate the mode pages.
The additional READ is triggered by a flag set in the USB storage and
UAS drivers. We avoid issuing the READ for other transport classes
since some storage devices identify Linux through our particular
discovery command sequence.
Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.c…
Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice")
Cc: stable(a)vger.kernel.org
Reported-by: Tasos Sahanidis <tasos(a)tasossah.com>
Reviewed-by: Ewan D. Milne <emilne(a)redhat.com>
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Tested-by: Tasos Sahanidis <tasos(a)tasossah.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 31b5273f43a7..349b1455a2c6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3284,6 +3284,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp,
return true;
}
+static void sd_read_block_zero(struct scsi_disk *sdkp)
+{
+ unsigned int buf_len = sdkp->device->sector_size;
+ char *buffer, cmd[10] = { };
+
+ buffer = kmalloc(buf_len, GFP_KERNEL);
+ if (!buffer)
+ return;
+
+ cmd[0] = READ_10;
+ put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */
+ put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */
+
+ scsi_execute_req(sdkp->device, cmd, DMA_FROM_DEVICE, buffer, buf_len,
+ NULL, SD_TIMEOUT, sdkp->max_retries, NULL);
+ kfree(buffer);
+}
+
/**
* sd_revalidate_disk - called the first time a new disk is seen,
* performs disk spin up, read_capacity, etc.
@@ -3323,7 +3341,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
*/
if (sdkp->media_present) {
sd_read_capacity(sdkp, buffer);
-
+ /*
+ * Some USB/UAS devices return generic values for mode pages
+ * until the media has been accessed. Trigger a READ operation
+ * to force the device to populate mode pages.
+ */
+ if (sdp->read_before_ms)
+ sd_read_block_zero(sdkp);
/*
* set the default to rotational. All non-rotational devices
* support the block characteristics VPD page, which will
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index c54e9805da53..12cf9940e5b6 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -179,6 +179,13 @@ static int slave_configure(struct scsi_device *sdev)
*/
sdev->use_192_bytes_for_3f = 1;
+ /*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
/*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index de3836412bf3..ed22053b3252 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -878,6 +878,13 @@ static int uas_slave_configure(struct scsi_device *sdev)
if (devinfo->flags & US_FL_CAPACITY_HEURISTICS)
sdev->guess_capacity = 1;
+ /*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
/*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index d2751ed536df..1504d3137cc6 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -204,6 +204,7 @@ struct scsi_device {
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
+ unsigned read_before_ms:1; /* perform a READ before MODE SENSE */
unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
unsigned no_write_same:1; /* no WRITE SAME command */
unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
please backport
e7d24c0aa8e678f41
gcc-plugins/stackleak: Avoid .head.text section
to stable kernels v5.15 and newer. This addresses the regression reported here:
https://lkml.kernel.org/r/dc118105-b97c-4e51-9a42-a918fa875967%40hardfalcon…
On v5.15, there is a dependency that needs to be backported first:
ae978009fc013e3166c9f523f8b17e41a3c0286e
gcc-plugins/stackleak: Ignore .noinstr.text and .entry.text
The particular issue that this patch fixes does not exist [yet] in
v6.1 and v5.15, but I am working on backports that would introduce it.
But even without those backports, this change is important as it
prevents input sections from being instrumented by stackleak that may
not tolerate this for other reasons too.
Thanks,
Ard.
Hi stable team, there is a report that the recent backport of
5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
for unbound workqueues") [from Tejun] to 6.6.y (as 5a70baec2294) broke
hibernate for a user. 6.6.24-rc1 did not fix this problem; reverting the
culprit does.
> With kernel 6.6.23 hibernating usually hangs here: the display stays
> on but the mouse pointer does not move and the keyboard does not work.
> But SysRq REISUB does reboot. Sometimes it seems to hibernate: the
> computer powers down and can be waked up and the previous display comes
> visible, but it is stuck there.
See https://bugzilla.kernel.org/show_bug.cgi?id=218658 for details.
Note, you have to use bugzilla to reach the reporter, as I sadly[1] can
not CCed them in mails like this.
Side note: there is a mainline report about problems due to
5797b1c18919cd ("workqueue: Implement system-wide nr_active enforcement
for unbound workqueues") as well, but it's about "nohz_full=0 prevents
kernel from booting":
https://bugzilla.kernel.org/show_bug.cgi?id=218665; will forward that
separately to Tejun.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
#regzbot introduced: 5a70baec2294e8a7d0fcc4558741c23e752dad
#regzbot from: Petri Kaukasoina
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218658
#regzbot title: workqueue: hubernate usually hangs when going to sleep
#regzbot ignore-activity
Larry Finger <Larry.Finger(a)gmail.com> wrote:
> As discussed in the links below, the SDIO part of RTW8821CS fails to
> start correctly if such startup happens while the UART portion of
> the chip is initializing.
I checked with SDIO team internally, but they didn't meet this case, so we may
take this workaround.
SDIO team wonder if something other than BT cause this failure, and after
system boots everything will be well. Could you boot the system without WiFi/BT
drivers, but insmod drivers manually after booting?
> ---
> drivers/net/wireless/realtek/rtw88/sdio.c | 28 +++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c
> index 0cae5746f540..eec0ad85be72 100644
> --- a/drivers/net/wireless/realtek/rtw88/sdio.c
> +++ b/drivers/net/wireless/realtek/rtw88/sdio.c
> @@ -1325,6 +1325,34 @@ int rtw_sdio_probe(struct sdio_func *sdio_func,
[...]
> + mdelay(500);
Will it better to use sleep function?
As discussed in the links below, the SDIO part of RTW8821CS fails to
start correctly if such startup happens while the UART portion of
the chip is initializing. The logged results with such failure is
[ 10.230516] rtw_8821cs mmc3:0001:1: Start of rtw_sdio_probe
[ 10.306569] Bluetooth: HCI UART driver ver 2.3
[ 10.306717] Bluetooth: HCI UART protocol Three-wire (H5) registered
[ 10.307167] of_dma_request_slave_channel: dma-names property of node '/serial@fe650000' missing or empty
[ 10.307199] dw-apb-uart fe650000.serial: failed to request DMA
[ 10.543474] rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
[ 10.730744] rtw_8821cs mmc3:0001:1: sdio read32 failed (0x11080): -110
[ 10.730923] rtw_8821cs mmc3:0001:1: sdio write32 failed (0x11080): -110
Due to the above errors, wifi fails to work.
For those instances when wifi works, the following is logged:
[ 10.452861] Bluetooth: HCI UART protocol Three-wire (H5) registered
[ 10.453580] of_dma_request_slave_channel: dma-names property of node '/serial@fe650000' missing or empty
[ 10.453621] dw-apb-uart fe650000.serial: failed to request DMA
[ 10.455741] rtw_8821cs mmc3:0001:1: Start of rtw_sdio_probe
[ 10.639186] rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
In this case, SDIO wifi works correctly. The correct case is ensured by
adding an mdelay(500) statement before the call to rtw_core_init(). No
adverse effects are observed.
Link: https://1EHFQ.trk.elasticemail.com/tracking/click?d=1UfsVowwwMAM6kBoyumkHP3…
Link: https://1EHFQ.trk.elasticemail.com/tracking/click?d=XUEf4t8W9xt0czASPOeeDt8…
Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets")
Signed-off-by: Larry Finger <Larry.Finger(a)gmail.com>
Cc: stable(a)vger.kernel.org # v6.4+
---
drivers/net/wireless/realtek/rtw88/sdio.c | 28 +++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c
index 0cae5746f540..eec0ad85be72 100644
--- a/drivers/net/wireless/realtek/rtw88/sdio.c
+++ b/drivers/net/wireless/realtek/rtw88/sdio.c
@@ -1325,6 +1325,34 @@ int rtw_sdio_probe(struct sdio_func *sdio_func,
rtwdev->hci.ops = &rtw_sdio_ops;
rtwdev->hci.type = RTW_HCI_TYPE_SDIO;
+ /* Insert a delay of 500 ms. Without the delay, the wifi part
+ * and the UART that controls Bluetooth interfere with one
+ * another resulting in the following being logged:
+ *
+ * Start of SDIO probe function.
+ * Bluetooth: HCI UART driver ver 2.3
+ * Bluetooth: HCI UART protocol Three-wire (H5) registered
+ * of_dma_request_slave_channel: dma-names property of node '/serial@fe650000'
+ * missing or empty
+ * dw-apb-uart fe650000.serial: failed to request DMA
+` * rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
+ * rtw_8821cs mmc3:0001:1: sdio read32 failed (0x11080): -110
+ *
+ * If the UART is finished initializing before the SDIO probe
+ * function startw, the following is logged:
+ *
+ * Bluetooth: HCI UART protocol Three-wire (H5) registered
+ * of_dma_request_slave_channel: dma-names property of node '/serial@fe650000'
+ * missing or empty
+ * dw-apb-uart fe650000.serial: failed to request DMA
+ * Start of SDIO probe function.
+ * rtw_8821cs mmc3:0001:1: Firmware version 24.8.0, H2C version 12
+ * Bluetooth: hci0: RTL: examining hci_ver=08 hci_rev=000c lmp_ver=08 lmp_subver=8821
+ * SDIO wifi works correctly.
+ *
+ * No adverse effects are observed from the delay.
+ */
+ mdelay(500);
ret = rtw_core_init(rtwdev);
if (ret)
goto err_release_hw;
--
2.44.0
https://1EHFQ.trk.elasticemail.com/tracking/unsubscribe?d=XjvOA0R6jwFES_UmJ…
On Wed, Apr 10, 2024 at 10:31 PM Cem Topcuoglu
<topcuoglu.c(a)northeastern.edu> wrote:
>
> Hi,
>
>
>
> We encountered a bug labelled “KASAN: slab-out-of-bounds Write in ops_init” while fuzzing kernel version 5.15.124 with Syzkaller (lines exist in 5.15.154 as well).
>
>
>
> In the net_namespace.c file, we have an if condition at line 89. Subsequently, Syzkaller encounters the bug at line 90.
>
>
>
> 89 if (old_ng->s.len > id) {
>
> 90 old_ng->ptr[id] = data;
>
> 91 return 0;
>
> 92 }
>
>
>
> Upon inspecting the net_generic struct, we noticed that this struct uses union which puts the array and the header (including the array length information) together.
>
> We suspect that with this union, modifying the ng->ptr[0] is essentially modifying ng->s.len, which might fail the check in 89. This might be the cause for Syzkaller detecting this slab-out-of-bound.
>
Look for MIN_PERNET_OPS_ID (this should be 3)
ng->ptr[0] , [1], [2] can not be overwritten.
Do you have a repro ?
Also please use the latest stable (5.15.154).
> Since we are CS PhD students and Linux hobbyists, we do not have a full understanding of what could lead to this. We would really appreciate if you guys can share some insights into this matter : )
>
>
>
> We attached the syzkaller’s bug report below.
>
>
>
> ==================================================================
>
> BUG: KASAN: slab-out-of-bounds in net_assign_generic
>
> usr/src/kernel/net/core/net_namespace.c:90 [inline]
>
> BUG: KASAN: slab-out-of-bounds in ops_init+0x44b/0x4d0
>
> usr/src/kernel/net/core/net_namespace.c:129
>
> Write of size 8 at addr ffff888043c62ae8 by task (coredump)/5424
>
> CPU: 1 PID: 5424 Comm: (coredump) Not tainted 5.15.124-yocto-standard #1
>
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
>
> Call Trace:
>
> <TASK>
>
> __dump_stack usr/src/kernel/lib/dump_stack.c:88 [inline]
>
> dump_stack_lvl+0x51/0x70 usr/src/kernel/lib/dump_stack.c:106
>
> print_address_description.constprop.0+0x24/0x140 usr/src/kernel/mm/kasan/report.c:248
>
> __kasan_report usr/src/kernel/mm/kasan/report.c:434 [inline]
>
> kasan_report.cold+0x7d/0x117 usr/src/kernel/mm/kasan/report.c:451
>
> __asan_report_store8_noabort+0x17/0x20 usr/src/kernel/mm/kasan/report_generic.c:314
>
> net_assign_generic usr/src/kernel/net/core/net_namespace.c:90 [inline]
>
> ops_init+0x44b/0x4d0 usr/src/kernel/net/core/net_namespace.c:129
>
> setup_net+0x40a/0x970 usr/src/kernel/net/core/net_namespace.c:329
>
> copy_net_ns+0x2ac/0x680 usr/src/kernel/net/core/net_namespace.c:473
>
> create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
>
> unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
>
> ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
>
> __do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
>
> __se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
>
> __x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
>
> do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
>
> do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
>
> entry_SYSCALL_64_after_hwframe+0x61/0xcb
>
> RIP: 0033:0x7fbafce1b39b
>
> Code: 73 01 c3 48 8b 0d 85 2a 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00
>
> 00 90 f3 0f 1e fa b8 10 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 2a 0e 00 f7
>
> d8 64 89 01 48
>
> RSP: 002b:00007ffddc8dfda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
>
> RAX: ffffffffffffffda RBX: 0000557e645dd018 RCX: 00007fbafce1b39b
>
> RDX: 0000000000000000 RSI: 00007ffddc8dfd10 RDI: 0000000040000000
>
> RBP: 00007ffddc8dfde0 R08: 0000000000000000 R09: 00007ffd00000067
>
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000fffffff5
>
> R13: 00007fbafd26ba60 R14: 0000000040000000 R15: 0000000000000000
>
> </TASK>
>
> Allocated by task 5424:
>
> kasan_save_stack+0x26/0x60 usr/src/kernel/mm/kasan/common.c:38
>
> kasan_set_track usr/src/kernel/mm/kasan/common.c:46 [inline]
>
> set_alloc_info usr/src/kernel/mm/kasan/common.c:434 [inline]
>
> ____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:513 [inline]
>
> ____kasan_kmalloc usr/src/kernel/mm/kasan/common.c:472 [inline]
>
> __kasan_kmalloc+0xae/0xe0 usr/src/kernel/mm/kasan/common.c:522
>
> kasan_kmalloc usr/src/kernel/include/linux/kasan.h:264 [inline]
>
> __kmalloc+0x308/0x560 usr/src/kernel/mm/slub.c:4407
>
> kmalloc usr/src/kernel/include/linux/slab.h:596 [inline]
>
> kzalloc usr/src/kernel/include/linux/slab.h:721 [inline]
>
> net_alloc_generic+0x28/0x80 usr/src/kernel/net/core/net_namespace.c:74
>
> net_alloc usr/src/kernel/net/core/net_namespace.c:401 [inline]
>
> copy_net_ns+0xc3/0x680 usr/src/kernel/net/core/net_namespace.c:460
>
> create_new_namespaces+0x390/0xa50 usr/src/kernel/kernel/nsproxy.c:110
>
> unshare_nsproxy_namespaces+0xb0/0x1d0 usr/src/kernel/kernel/nsproxy.c:226
>
> ksys_unshare+0x30c/0x850 usr/src/kernel/kernel/fork.c:3094
>
> __do_sys_unshare usr/src/kernel/kernel/fork.c:3168 [inline]
>
> __se_sys_unshare usr/src/kernel/kernel/fork.c:3166 [inline]
>
> __x64_sys_unshare+0x36/0x50 usr/src/kernel/kernel/fork.c:3166
>
> do_syscall_x64 usr/src/kernel/arch/x86/entry/common.c:50 [inline]
>
> do_syscall_64+0x40/0x90 usr/src/kernel/arch/x86/entry/common.c:80
>
> entry_SYSCALL_64_after_hwframe+0x61/0xcb
>
> The buggy address belongs to the object at ffff888043c62a00
>
> which belongs to the cache kmalloc-256 of size 256
>
> The buggy address is located 232 bytes inside of
>
> 256-byte region [ffff888043c62a00, ffff888043c62b00)
>
> The buggy address belongs to the page:
>
> page:000000008dd0a6b6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43c62
>
> head:000000008dd0a6b6 order:1 compound_mapcount:0
>
> flags: 0x4000000000010200(slab|head|zone=1)
>
> raw: 4000000000010200 ffffea0001108f00 0000000700000007 ffff888001041b40
>
> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>
> ffff888043c62980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ffff888043c62a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> >ffff888043c62a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>
> ^
>
> ffff888043c62b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ffff888043c62b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>
> ==================================================================
>
> kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
>
>
>
> Best
>
>