This commit adds a new flag, devmem_only, to the drm_gpusvm structure. The
purpose of this flag is to ensure that the get_pages function allocates
memory exclusively from the device's memory. If the allocation from
device memory fails, the function will return an -EFAULT error.
Required for shared CPU and GPU atomics on certain devices.
v3:
- s/vram_only/devmem_only/
Fixes: 99624bdff867 ("drm/gpusvm: Add support for GPU Shared Virtual Memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Reviewed-by: Matthew Brost <matthew.brost(a)intel.com>
---
drivers/gpu/drm/drm_gpusvm.c | 5 +++++
include/drm/drm_gpusvm.h | 2 ++
2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index de424e670995..a58d03e6cac2 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -1454,6 +1454,11 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
goto err_unmap;
}
+ if (ctx->devmem_only) {
+ err = -EFAULT;
+ goto err_unmap;
+ }
+
addr = dma_map_page(gpusvm->drm->dev,
page, 0,
PAGE_SIZE << order,
diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
index df120b4d1f83..9fd25fc880a4 100644
--- a/include/drm/drm_gpusvm.h
+++ b/include/drm/drm_gpusvm.h
@@ -286,6 +286,7 @@ struct drm_gpusvm {
* @in_notifier: entering from a MMU notifier
* @read_only: operating on read-only memory
* @devmem_possible: possible to use device memory
+ * @devmem_only: use only device memory
*
* Context that is DRM GPUSVM is operating in (i.e. user arguments).
*/
@@ -294,6 +295,7 @@ struct drm_gpusvm_ctx {
unsigned int in_notifier :1;
unsigned int read_only :1;
unsigned int devmem_possible :1;
+ unsigned int devmem_only :1;
};
int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
--
2.34.1
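For illustration only (not part of the patch): a minimal user-space sketch of the behaviour the commit message describes. The names model_ctx and model_get_pages are hypothetical, and the per-page is_devmem array stands in for whatever the real code learns about each page. It assumes the new check is reached only for pages that are not device memory, as the placement just before dma_map_page() in the diff suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bits of struct drm_gpusvm_ctx. */
struct model_ctx {
	unsigned int devmem_possible :1;
	unsigned int devmem_only :1;
};

/*
 * Hypothetical model of the per-page decision. is_devmem[i] says whether
 * page i is backed by device memory. Returns 0 on success, or -EFAULT if
 * a system-memory page is found while devmem_only is set.
 */
static int model_get_pages(const struct model_ctx *ctx,
			   const bool *is_devmem, size_t npages)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (is_devmem[i])
			continue;	/* device page: always acceptable */
		if (ctx->devmem_only)
			return -EFAULT;	/* mirrors the check added above */
		/* otherwise the real code would dma_map_page() here */
	}
	return 0;
}
```

With devmem_only set, a range containing even one system-memory page fails with -EFAULT, while the same range succeeds when the flag is clear.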
#regzbot introduced: v6.12..v6.13
I use RX6600 on arm64 Orion o6 board and it seems that amdgpu is broken on recent kernels, fails on boot:
[drm] amdgpu: 7886M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
SError Interrupt on CPU11, code 0x00000000be000011 -- SError
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S 6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan 1 1980
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : amdgpu_device_rreg+0x60/0xe4 [amdgpu]
lr : hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
sp : ffffffc08321b490
x29: ffffffc08321b490 x28: ffffff80b8b80000 x27: ffffff80b8bd0178
x26: ffffff80b8b8fe88 x25: 0000000000000001 x24: ffffff8081647000
x23: ffffffc079d6e000 x22: ffffff80b8bd5000 x21: 000000000007f000
x20: 000000000001fc00 x19: 00000000ffffffff x18: 00000000000015fc
x17: 00000000000015fc x16: 00000000000015cf x15: 00000000000015ce
x14: 00000000000015d0 x13: 00000000000015d1 x12: 00000000000015d2
x11: 00000000000015d3 x10: 000000000000ec00 x9 : 00000000000015fd
x8 : 00000000000015fd x7 : 0000000000001689 x6 : 0000000000555401
x5 : 0000000000000001 x4 : 0000000000100000 x3 : 0000000000100000
x2 : 0000000000000000 x1 : 000000000007f000 x0 : 0000000000000000
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 11 UID: 0 PID: 255 Comm: (udev-worker) Tainted: G S 6.15.0-rc2+ #1 VOLUNTARY
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Radxa Computer (Shenzhen) Co., Ltd. Radxa Orion O6/Radxa Orion O6, BIOS 1.0 Jan 1 1980
Call trace:
show_stack+0x2c/0x84 (C)
dump_stack_lvl+0x60/0x80
dump_stack+0x18/0x24
panic+0x148/0x330
add_taint+0x0/0xbc
arm64_serror_panic+0x64/0x7c
do_serror+0x28/0x68
el1h_64_error_handler+0x30/0x48
el1h_64_error+0x6c/0x70
amdgpu_device_rreg+0x60/0xe4 [amdgpu] (P)
hdp_v5_0_flush_hdp+0x6c/0x80 [amdgpu]
gmc_v10_0_hw_init+0xec/0x1fc [amdgpu]
amdgpu_device_init+0x19f8/0x2480 [amdgpu]
amdgpu_driver_load_kms+0x20/0xb0 [amdgpu]
amdgpu_pci_probe+0x1b8/0x5d4 [amdgpu]
pci_device_probe+0xbc/0x1a8
really_probe+0xc0/0x39c
__driver_probe_device+0x7c/0x14c
driver_probe_device+0x3c/0x120
__driver_attach+0xc4/0x200
bus_for_each_dev+0x68/0xb4
driver_attach+0x24/0x30
bus_add_driver+0x110/0x240
driver_register+0x68/0x124
__pci_register_driver+0x44/0x50
amdgpu_init+0x84/0xf94 [amdgpu]
do_one_initcall+0x60/0x1e0
do_init_module+0x54/0x200
load_module+0x18f8/0x1e68
init_module_from_file+0x74/0xa0
__arm64_sys_finit_module+0x1e0/0x3f0
invoke_syscall+0x64/0xe4
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x34/0xd0
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x19c
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x1000,000000e0,f169a650,9b7ff667
Memory Limit: none
---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
(bios version seems to be 45 years old but that is the state of the board
when I received it)
Also saw this crash with RX6700. Old radeons like HD5450 and nvidia gt1030
work fine on that board.
A little bit of testing showed that it was introduced between 6.12 and 6.13.
It also seems that some distro kernels have already picked up the change:
several ISO images I tried failed to boot until I found one with kernel 6.8,
which worked just fine.
The only change related to hdp_v5_0_flush_hdp() was
cf424020e040 drm/amdgpu/hdp5.0: do a posting read when flushing HDP
Reverting that commit ^^ did help and resolved the problem. Before sending
the revert as-is, I would like to know whether there is supposed to be a
proper fix for this, or whether someone is interested in debugging this or
has any suggestions.
In theory I also need to confirm that exactly that change introduced the
regression.
Thanks,
Alexey
Hi All,
Changes since v5:
- full error message included into commit description
Changes since v4:
- unused pages leak is avoided
Changes since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 14 deletions(-)
--
2.45.2
commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
[BUG]
It is a long known bug that VM image on btrfs can lead to data csum
mismatch, if the qemu is using direct-io for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve the concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTFS assume they can modify the folio under writeback, but
btrfs requires no modification, this leads to the false csum mismatch.
This is only a controlled example, there are even cases where
multi-thread programs can submit a direct IO write, then another thread
modifies the direct IO buffer for whatever reason.
Btrfs has no sane way to detect such cases, which leads to a false data
csum mismatch.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO to always skip data checksum
This not only requires a new incompatible flag, as it breaks the
current per-inode NODATASUM flag.
But also requires extra handling for no csum found cases.
And this also reduces our checksum protection.
- Let hardware handle all the checksum
AKA, just nodatasum mount option.
That requires trust in the hardware (which is not that trustworthy in a
lot of cases), and it's not generic at all.
- Always fallback to buffered write if the inode requires checksum
This was suggested by Christoph, and is the solution utilized by this
patch.
The cost is obvious, the extra buffer copying into page cache, thus it
reduces the performance.
But at least it's still user configurable, if the end user still wants
the zero-copy performance, just set NODATASUM flag for the inode
(which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but always falling back
to buffered IO. At least by this, we avoid the more deadly false data
checksum mismatch error.
CC: stable(a)vger.kernel.org # 6.12+
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/direct-io.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/direct-io.c b/fs/btrfs/direct-io.c
index 8567af46e16f..eacbb74bf6bc 100644
--- a/fs/btrfs/direct-io.c
+++ b/fs/btrfs/direct-io.c
@@ -855,6 +855,22 @@ ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
goto buffered;
}
+ /*
+ * We can't control the folios being passed in, applications can write
+ * to them while a direct IO write is in progress. This means the
+ * content might change after we calculated the data checksum.
+ * Therefore we can end up storing a checksum that doesn't match the
+ * persisted data.
+ *
+ * To be extra safe and avoid false data checksum mismatch, if the
+ * inode requires data checksum, just fallback to buffered IO.
+ * For buffered IO we have full control of page cache and can ensure
+ * no one is modifying the content during writeback.
+ */
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+ btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+ goto buffered;
+ }
/*
* The iov_iter can be mapped to the same file range we are writing to.
--
2.49.0
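For illustration only (not part of the patch): a small user-space model of the race described in the commit message. All names here are hypothetical, and FNV-1a is used purely as a stand-in checksum; it is not the checksum btrfs uses. The sketch assumes the checksum is taken before the data reaches disk, as the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* FNV-1a, a stand-in checksum for this model only. */
static uint32_t model_csum(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Direct IO model: the checksum is computed from the caller's buffer,
 * the caller may then modify it, and the modified buffer reaches "disk".
 * Returns true if the stored checksum matches the stored data.
 */
static bool model_direct_write(unsigned char *user_buf, bool racing_write)
{
	unsigned char disk[BUF_SIZE];
	uint32_t csum = model_csum(user_buf, BUF_SIZE);

	if (racing_write)
		user_buf[0] ^= 0xff;	/* application touches the buffer */
	memcpy(disk, user_buf, BUF_SIZE);
	return model_csum(disk, BUF_SIZE) == csum;
}

/*
 * Buffered IO model: the data is copied into a private "page cache"
 * first, so a later change to the user buffer cannot affect what is
 * checksummed or written.
 */
static bool model_buffered_write(unsigned char *user_buf, bool racing_write)
{
	unsigned char page_cache[BUF_SIZE];
	unsigned char disk[BUF_SIZE];
	uint32_t csum;

	memcpy(page_cache, user_buf, BUF_SIZE);
	csum = model_csum(page_cache, BUF_SIZE);
	if (racing_write)
		user_buf[0] ^= 0xff;	/* only the user copy changes */
	memcpy(disk, page_cache, BUF_SIZE);
	return model_csum(disk, BUF_SIZE) == csum;
}
```

In this model the direct path reports a mismatch whenever the buffer is modified after checksumming, while the buffered path stays consistent at the cost of the extra copy.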
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 687b2bae0efff9b25e071737d6af5004e6e35af5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025051212-antirust-outshoot-07f7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 687b2bae0efff9b25e071737d6af5004e6e35af5 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Wed, 7 May 2025 07:34:24 -0600
Subject: [PATCH] io_uring: ensure deferred completions are flushed for
multishot
Multishot normally uses io_req_post_cqe() to post completions, but when
stopping it, it may finish up with a deferred completion. This is fine,
except if another multishot event triggers before the deferred completions
get flushed. If this occurs, then CQEs may get reordered in the CQ ring,
as new multishot completions get posted before the deferred ones are
flushed. This can cause confusion on the application side, if strict
ordering is required for the use case.
When multishot posting via io_req_post_cqe(), flush any pending deferred
completions first, if any.
Cc: stable(a)vger.kernel.org # 6.1+
Reported-by: Norman Maurer <norman_maurer(a)apple.com>
Reported-by: Christian Mazakas <christian.mazakas(a)gmail.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 769814d71153..541e65a1eebf 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -848,6 +848,14 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
struct io_ring_ctx *ctx = req->ctx;
bool posted;
+ /*
+ * If multishot has already posted deferred completions, ensure that
+ * those are flushed first before posting this one. If not, CQEs
+ * could get reordered.
+ */
+ if (!wq_list_empty(&ctx->submit_state.compl_reqs))
+ __io_submit_flush_completions(ctx);
+
lockdep_assert(!io_wq_current_is_worker());
lockdep_assert_held(&ctx->uring_lock);
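For illustration only (not part of the patch): a heavily simplified user-space model of the reordering described in the commit message. The struct and function names are hypothetical and do not correspond to io_uring internals. The flush_first argument selects between the patched and unpatched ordering, and the arrays are assumed never to hold more than RING_MAX entries.

```c
#include <assert.h>
#include <stddef.h>

#define RING_MAX 8

/* Hypothetical model of a CQ ring plus a list of deferred completions. */
struct model_ring {
	int cq[RING_MAX];	/* completions visible to the application */
	size_t cq_len;
	int deferred[RING_MAX];	/* completions queued but not yet flushed */
	size_t deferred_len;
};

/* Queue a completion for a later flush. */
static void model_defer(struct model_ring *r, int res)
{
	r->deferred[r->deferred_len++] = res;
}

/* Move all deferred completions into the CQ, preserving their order. */
static void model_flush(struct model_ring *r)
{
	size_t i;

	for (i = 0; i < r->deferred_len; i++)
		r->cq[r->cq_len++] = r->deferred[i];
	r->deferred_len = 0;
}

/*
 * Post a completion directly. With flush_first set (patched behaviour),
 * pending deferred completions are flushed before the new one is posted.
 */
static void model_post_cqe(struct model_ring *r, int res, int flush_first)
{
	if (flush_first && r->deferred_len)
		model_flush(r);
	r->cq[r->cq_len++] = res;
}
```

Deferring completion 1 and then posting completion 2 yields the CQ order 1, 2 with flush_first set, and 2, 1 without it once the deferred list is eventually flushed.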
Hi stable(a)vger.kernel.org,
Could you please apply:
1. Commit a7208610761ae ("Bluetooth: btmtk: Remove resetting mt7921
before downloading the fw") to v6.12.x (it's already in
v6.14).
2. Commit 33634e2ab7c6 ("Bluetooth: btmtk: Remove the resetting step
before downloading the fw") to v6.12.x and v6.14.x.
These fixes address an issue with some audio interfaces failing to
initialise during boot on kernels 6.11+. As noted in my original
analysis below, the MediaTek Bluetooth controller reset increases the
device setup time from ~200ms to ~20s and can interfere with other USB
devices on the bus.
Thanks,
Geoffrey.
On Fri, Mar 28, 2025 at 08:44:49AM +1030, Geoffrey D. Bennett wrote:
> Hi all,
>
> Sorry, I see that an identical patch has already been applied to
> bluetooth-next
> https://lore.kernel.org/linux-bluetooth/20250315022730.11071-1-hao.qin@medi…
>
> While I'm glad the issue is being addressed, my original patch
> https://lore.kernel.org/linux-bluetooth/Z8ybV04CVUfVAykH@m.b4.vu/
> contained useful context and tags that didn't make it into the final
> commit.
>
> For getting this fix into current kernel releases 6.12/6.13/6.14, I
> think the patch needs the "Cc: stable(a)vger.kernel.org" tag that was in
> my original submission but missing from Hao's. Since this is causing
> significant issues for users on kernels 6.11+ (audio interfaces
> failing to work), it's important this gets backported.
>
> Hao, is this something you can do? I think the instructions at
> https://www.kernel.org/doc/html/v6.14/process/stable-kernel-rules.html#opti…
> need to be followed, but I've not done this before.
>
> Thanks,
> Geoffrey.
>
> On Fri, Mar 28, 2025 at 07:22:06AM +1030, Geoffrey D. Bennett wrote:
> > This reverts commit ccfc8948d7e4d93cab341a99774b24586717d89a.
> >
> > The MediaTek Bluetooth controller reset that was added increases the
> > Bluetooth device setup time from ~200ms to ~20s and interferes with
> > other devices on the bus.
> >
> > Three users (with Focusrite Scarlett 2nd Gen 6i6 and 3rd Gen Solo and
> > 4i4 audio interfaces) reported that since 6.11 (which added this
> > commit) their audio interface fails to initialise if connected during
> > boot. Two of the users confirmed they have an MT7922.
> >
> > Errors like this are observed in dmesg for the audio interface:
> >
> > usb 3-4: parse_audio_format_rates_v2v3(): unable to find clock source (clock -110)
> > usb 3-4: uac_clock_source_is_valid(): cannot get clock validity for id 41
> > usb 3-4: clock source 41 is not valid, cannot use
> >
> > The problem only occurs when both devices and kernel modules are
> > present and loaded during system boot, so it can be worked around by
> > connecting the audio interface after booting.
> >
> > Fixes: ccfc8948d7e4 ("Bluetooth: btusb: mediatek: reset the controller before downloading the fw")
> > Closes: https://github.com/geoffreybennett/linux-fcp/issues/24
> > Bisected-by: Benedikt Ziemons <ben(a)rs485.network>
> > Tested-by: Benedikt Ziemons <ben(a)rs485.network>
> > Cc: stable(a)vger.kernel.org
> > Signed-off-by: Geoffrey D. Bennett <g(a)b4.vu>
> > ---
> > Changelog:
> >
> > v1 -> v2:
> >
> > - Updated commit message with additional information.
> > - No change to this patch's diff.
> > - Dropped alternate patch that only reverted for 0x7922.
> > - Chris, Sean, Hao agreed to reverting the change:
> > https://lore.kernel.org/linux-bluetooth/2025031352-octopus-quadrant-f7ca@gr…
> >
> > drivers/bluetooth/btmtk.c | 10 ----------
> > 1 file changed, 10 deletions(-)
> >
> > diff --git a/drivers/bluetooth/btmtk.c b/drivers/bluetooth/btmtk.c
> > index 68846c5bd4f7..4390fd571dbd 100644
> > --- a/drivers/bluetooth/btmtk.c
> > +++ b/drivers/bluetooth/btmtk.c
> > @@ -1330,13 +1330,6 @@ int btmtk_usb_setup(struct hci_dev *hdev)
> > break;
> > case 0x7922:
> > case 0x7925:
> > - /* Reset the device to ensure it's in the initial state before
> > - * downloading the firmware to ensure.
> > - */
> > -
> > - if (!test_bit(BTMTK_FIRMWARE_LOADED, &btmtk_data->flags))
> > - btmtk_usb_subsys_reset(hdev, dev_id);
> > - fallthrough;
> > case 0x7961:
> > btmtk_fw_get_filename(fw_bin_name, sizeof(fw_bin_name), dev_id,
> > fw_version, fw_flavor);
> > @@ -1345,12 +1338,9 @@ int btmtk_usb_setup(struct hci_dev *hdev)
> > btmtk_usb_hci_wmt_sync);
> > if (err < 0) {
> > bt_dev_err(hdev, "Failed to set up firmware (%d)", err);
> > - clear_bit(BTMTK_FIRMWARE_LOADED, &btmtk_data->flags);
> > return err;
> > }
> >
> > - set_bit(BTMTK_FIRMWARE_LOADED, &btmtk_data->flags);
> > -
> > /* It's Device EndPoint Reset Option Register */
> > err = btmtk_usb_uhw_reg_write(hdev, MTK_EP_RST_OPT,
> > MTK_EP_RST_IN_OUT_OPT);
> > --
> > 2.45.0
> >
> >
> >