The patch titled
Subject: userfaultfd: fix PTE unmapping stack-allocated PTE copies
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
userfaultfd-fix-pte-unmapping-stack-allocated-pte-copies.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: fix PTE unmapping stack-allocated PTE copies
Date: Wed, 26 Feb 2025 10:55:09 -0800
Current implementation of move_pages_pte() copies source and destination
PTEs in order to detect concurrent changes to PTEs involved in the move.
However these copies are also used to unmap the PTEs, which will fail if
CONFIG_HIGHPTE is enabled because the copies are allocated on the stack.
Fix this by using the actual PTEs which were kmap()ed.
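
The distinction between a mapped PTE pointer and a stack copy of its value can be illustrated with a small userspace model. This is only a sketch under stated assumptions: toy_table, toy_pte_map() and toy_pte_unmap() are hypothetical stand-ins, not kernel APIs, and the "mapping" is reduced to a counter. It shows that an unmap keyed on the address of a stack copy cannot release the mapping, while the pointer returned by the map call can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long pte_val_t;	/* hypothetical stand-in for pte_t */

#define TABLE_ENTRIES 4

/* A toy "page table page" plus a count of live temporary mappings. */
struct toy_table {
	pte_val_t entries[TABLE_ENTRIES];
	int live_mappings;
};

/* Model of mapping: returns a pointer into the table itself. */
static pte_val_t *toy_pte_map(struct toy_table *t, size_t idx)
{
	assert(idx < TABLE_ENTRIES);
	t->live_mappings++;
	return &t->entries[idx];
}

/*
 * Model of unmapping: only a pointer that lies inside the mapped table
 * can release the mapping. The address of a stack copy never matches.
 */
static bool toy_pte_unmap(struct toy_table *t, const pte_val_t *ptep)
{
	for (size_t i = 0; i < TABLE_ENTRIES; i++) {
		if (ptep == &t->entries[i]) {
			t->live_mappings--;
			return true;
		}
	}
	return false;
}
```

In this model, copying *src_pte into a local orig_src_pte and passing &orig_src_pte to toy_pte_unmap() leaves live_mappings unchanged, whereas passing src_pte releases it.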
Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-pte-unmapping-stack-allocated-pte-copies
+++ a/mm/userfaultfd.c
@@ -1290,8 +1290,8 @@ retry:
spin_unlock(src_ptl);
if (!locked) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
@@ -1307,8 +1307,8 @@ retry:
/* at this point we have src_folio locked */
if (folio_test_large(src_folio)) {
/* split_folio() can block */
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
err = split_folio(src_folio);
if (err)
@@ -1333,8 +1333,8 @@ retry:
goto out;
}
if (!anon_vma_trylock_write(src_anon_vma)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
anon_vma_lock_write(src_anon_vma);
@@ -1352,8 +1352,8 @@ retry:
entry = pte_to_swp_entry(orig_src_pte);
if (non_swap_entry(entry)) {
if (is_migration_entry(entry)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
migration_entry_wait(mm, src_pmd, src_addr);
err = -EAGAIN;
@@ -1396,8 +1396,8 @@ retry:
src_folio = folio;
src_folio_pte = orig_src_pte;
if (!folio_trylock(src_folio)) {
- pte_unmap(&orig_src_pte);
- pte_unmap(&orig_dst_pte);
+ pte_unmap(src_pte);
+ pte_unmap(dst_pte);
src_pte = dst_pte = NULL;
/* now we can block and wait */
folio_lock(src_folio);
_
Patches currently in -mm which might be from surenb(a)google.com are
userfaultfd-do-not-block-on-locking-a-large-folio-with-raised-refcount.patch
userfaultfd-fix-pte-unmapping-stack-allocated-pte-copies.patch
mm-avoid-extra-mem_alloc_profiling_enabled-checks.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-slab-allocator.patch
alloc_tag-uninline-code-gated-by-mem_alloc_profiling_key-in-page-allocator.patch
mm-introduce-vma_start_read_locked_nested-helpers.patch
mm-move-per-vma-lock-into-vm_area_struct.patch
mm-mark-vma-as-detached-until-its-added-into-vma-tree.patch
mm-introduce-vma_iter_store_attached-to-use-with-attached-vmas.patch
mm-mark-vmas-detached-upon-exit.patch
types-move-struct-rcuwait-into-typesh.patch
mm-allow-vma_start_read_locked-vma_start_read_locked_nested-to-fail.patch
mm-move-mmap_init_lock-out-of-the-header-file.patch
mm-uninline-the-main-body-of-vma_start_write.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused.patch
refcount-provide-ops-for-cases-when-objects-memory-can-be-reused-fix.patch
refcount-introduce-__refcount_addinc_not_zero_limited_acquire.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count.patch
mm-replace-vm_lock-and-detached-flag-with-a-reference-count-fix.patch
mm-move-lesser-used-vma_area_struct-members-into-the-last-cacheline.patch
mm-debug-print-vm_refcnt-state-when-dumping-the-vma.patch
mm-remove-extra-vma_numab_state_init-call.patch
mm-prepare-lock_vma_under_rcu-for-vma-reuse-possibility.patch
mm-make-vma-cache-slab_typesafe_by_rcu.patch
mm-make-vma-cache-slab_typesafe_by_rcu-fix.patch
docs-mm-document-latest-changes-to-vm_lock.patch
The patch titled
Subject: userfaultfd: do not block on locking a large folio with raised refcount
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
userfaultfd-do-not-block-on-locking-a-large-folio-with-raised-refcount.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: do not block on locking a large folio with raised refcount
Date: Wed, 26 Feb 2025 10:55:08 -0800
Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
state when it goes into split_folio() with raised folio refcount.
split_folio() expects the reference count to be exactly mapcount +
num_pages_in_folio + 1 (see can_split_folio()) and fails with EAGAIN
otherwise.
If multiple processes are trying to move the same large folio, they raise
the refcount (all tasks succeed in that) then one of them succeeds in
locking the folio, while others will block in folio_lock() while keeping
the refcount raised. The winner of this race will proceed with calling
split_folio() and will fail returning EAGAIN to the caller and unlocking
the folio. The next competing process will get the folio locked and will
go through the same flow. In the meantime the original winner will be
retried and will block in folio_lock(), getting into the queue of waiting
processes only to repeat the same path. All this results in a livelock.
An easy fix would be to avoid waiting for the folio lock while holding
folio refcount, similar to madvise_free_huge_pmd() where folio lock is
acquired before raising the folio refcount. Since we lock and take a
refcount of the folio while holding the PTE lock, changing the order of
these operations should not break anything.
Modify move_pages_pte() to try locking the folio first and if that fails
and the folio is large then return EAGAIN without touching the folio
refcount. If the folio is single-page then split_folio() is not called,
so we don't have this issue. Lokesh has a reproducer [1] and I verified
that this change fixes the issue.
[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
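
The ordering change described above can be sketched as a single-threaded userspace model. Everything here is an assumption made for illustration: toy_folio and the toy_* helpers are hypothetical, the lock is a plain flag, and the split condition is simply the "mapcount + num_pages_in_folio + 1" rule quoted in the description rather than the kernel's actual check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: lock flag plus the three counters. */
struct toy_folio {
	bool locked;
	int refcount;
	int mapcount;
	int nr_pages;
};

static bool toy_trylock(struct toy_folio *f)
{
	if (f->locked)
		return false;
	f->locked = true;
	return true;
}

/* Split condition as stated in the description: exactly one extra pin. */
static bool toy_can_split(const struct toy_folio *f)
{
	return f->refcount == f->mapcount + f->nr_pages + 1;
}

/*
 * Try the lock first and only then take a reference.
 * Returns 0 if the folio is locked and pinned, 1 if it is pinned but the
 * caller still has to wait for the lock (small folio), and -EAGAIN if the
 * folio is large and contended, in which case no reference is taken.
 */
static int toy_pin_for_move(struct toy_folio *f)
{
	bool locked = toy_trylock(f);

	if (!locked && f->nr_pages > 1)
		return -EAGAIN;

	f->refcount++;
	return locked ? 0 : 1;
}
```

With a four-page folio, the first caller gets 0 and the split condition holds; a second caller gets -EAGAIN and leaves the refcount alone, so it does not spoil the winner's split.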
Link: https://lkml.kernel.org/r/20250226185510.2732648-2-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Lokesh Gidra <lokeshgidra(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Barry Song <v-songbaohua(a)oppo.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
--- a/mm/userfaultfd.c~userfaultfd-do-not-block-on-locking-a-large-folio-with-raised-refcount
+++ a/mm/userfaultfd.c
@@ -1250,6 +1250,7 @@ retry:
*/
if (!src_folio) {
struct folio *folio;
+ bool locked;
/*
* Pin the page while holding the lock to be sure the
@@ -1269,12 +1270,26 @@ retry:
goto out;
}
+ locked = folio_trylock(folio);
+ /*
+ * We avoid waiting for folio lock with a raised refcount
+ * for large folios because extra refcounts will result in
+ * split_folio() failing later and retrying. If multiple
+ * tasks are trying to move a large folio we can end
+ * livelocking.
+ */
+ if (!locked && folio_test_large(folio)) {
+ spin_unlock(src_ptl);
+ err = -EAGAIN;
+ goto out;
+ }
+
folio_get(folio);
src_folio = folio;
src_folio_pte = orig_src_pte;
spin_unlock(src_ptl);
- if (!folio_trylock(src_folio)) {
+ if (!locked) {
pte_unmap(&orig_src_pte);
pte_unmap(&orig_dst_pte);
src_pte = dst_pte = NULL;
_
The quilt patch titled
Subject: mm: zswap: fix crypto_free_acomp deadlock in zswap_cpu_comp_dead
has been removed from the -mm tree. Its filename was
mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Subject: mm: zswap: fix crypto_free_acomp deadlock in zswap_cpu_comp_dead
Date: Tue, 25 Feb 2025 16:53:58 +0800
Call crypto_free_acomp outside of the mutex in zswap_cpu_comp_dead() as
otherwise this could deadlock as the allocation path may lead back into
zswap while holding the same lock. Zap the pointers to acomp and buffer
after freeing.
Also move the NULL check on acomp_ctx so that it takes place before
the mutex dereference.
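
The pattern described above (detach the pointer while holding the lock, free it only after unlocking) can be sketched in userspace as follows. This is a toy model under stated assumptions: toy_ctx, toy_free_acomp() and toy_cpu_dead() are hypothetical names, the mutex is a plain flag, and frees_under_lock merely counts how often the free routine ran with the lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-CPU context; mutex_held stands in for a real mutex. */
struct toy_ctx {
	bool mutex_held;
	void *acomp;
	void *buffer;
};

/* Counts frees that happened while the lock was held (should stay 0). */
static int frees_under_lock;

/* Stand-in for a free routine that must not run under the lock. */
static void toy_free_acomp(const struct toy_ctx *ctx, void *acomp)
{
	if (ctx->mutex_held)
		frees_under_lock++;
	free(acomp);
}

static int toy_cpu_dead(struct toy_ctx *ctx)
{
	void *acomp;

	/* Check the context before touching its lock. */
	if (!ctx)
		return 0;

	assert(!ctx->mutex_held);	/* toy lock is not recursive */
	ctx->mutex_held = true;
	acomp = ctx->acomp;		/* detach under the lock ... */
	ctx->acomp = NULL;
	free(ctx->buffer);
	ctx->buffer = NULL;
	ctx->mutex_held = false;

	toy_free_acomp(ctx, acomp);	/* ... free after unlocking */
	return 0;
}
```

Because the pointers are zapped under the lock, calling toy_cpu_dead() a second time on the same context is harmless in this model.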
Link: https://lkml.kernel.org/r/Z72FJnbA39zWh4zS@gondor.apana.org.au
Fixes: 12dcb0ef5406 ("mm: zswap: properly synchronize freeing resources during CPU hotunplug")
Reported-by: syzbot+1a517ccfcbc6a7ab0f82(a)syzkaller.appspotmail.com
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Yosry Ahmed <yosry.ahmed(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zswap.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
--- a/mm/zswap.c~mm-zswap-fix-crypto_free_acomp-deadlock-in-zswap_cpu_comp_dead
+++ a/mm/zswap.c
@@ -881,18 +881,23 @@ static int zswap_cpu_comp_dead(unsigned
{
struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
+ struct crypto_acomp *acomp = NULL;
+
+ if (IS_ERR_OR_NULL(acomp_ctx))
+ return 0;
mutex_lock(&acomp_ctx->mutex);
- if (!IS_ERR_OR_NULL(acomp_ctx)) {
- if (!IS_ERR_OR_NULL(acomp_ctx->req))
- acomp_request_free(acomp_ctx->req);
- acomp_ctx->req = NULL;
- if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
- crypto_free_acomp(acomp_ctx->acomp);
- kfree(acomp_ctx->buffer);
- }
+ if (!IS_ERR_OR_NULL(acomp_ctx->req))
+ acomp_request_free(acomp_ctx->req);
+ acomp_ctx->req = NULL;
+ acomp = acomp_ctx->acomp;
+ acomp_ctx->acomp = NULL;
+ kfree(acomp_ctx->buffer);
+ acomp_ctx->buffer = NULL;
mutex_unlock(&acomp_ctx->mutex);
+ crypto_free_acomp(acomp);
+
return 0;
}
_
Patches currently in -mm which might be from herbert(a)gondor.apana.org.au are
Patchset bundles two *unrelated* fixes in move_pages_pte because otherwise
they would create a merge conflict. The first fix which was posted before
at [1] fixes a livelock issue. The second change corrects the use of PTEs
when unmapping them.
The patchset applies cleanly over mm-hotfixes-unstable which contains
Barry's fix [2] that changes related code.
[1] https://lore.kernel.org/all/20250225204613.2316092-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250226003234.0B98FC4CEDD@smtp.kernel.org/
Suren Baghdasaryan (2):
userfaultfd: do not block on locking a large folio with raised
refcount
userfaultfd: fix PTE unmapping stack-allocated PTE copies
mm/userfaultfd.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
base-commit: a88b5ef577dd7ddb8606ef233c0634f05e884d4a
--
2.48.1.658.g4767266eb4-goog
Hi Greg, hi Sasha
A while back the following regression after 4bce37a68ff8 ("mips/mm:
Convert to using lock_mm_and_find_vma()") was reported:
https://lore.kernel.org/all/75e9fd7b08562ad9b456a5bdaacb7cc220311cc9.camel@…
affecting mips64el. This was later fixed by 8fa507083388
("mm/memory: Use exception ip to search exception tables") in 6.8-rc5,
which was backported to 6.7.6 and 6.6.18.
The breaking commit was part of a series covering a security fix
(CVE-2023-3269); it landed in 6.5-rc1 and was backported to 6.4.1, 6.3.11
and 6.1.37.
So far 6.1.y has remained unfixed, and in Debian we have received reports
of this issue on the build infrastructure when building
various packages. Details are in:
https://bugs.debian.org/1086028
https://bugs.debian.org/1087809
https://bugs.debian.org/1093200
The fix probably did not get backported because one dependency is
missing, which was not CC'ed for stable afaics.
Thus, can you please cherry-pick the following two commits for 6.1.y
as well?
11ba1728be3e ("ptrace: Introduce exception_ip arch hook")
8fa507083388 ("mm/memory: Use exception ip to search exception tables")
Sergei Golovan also confirmed by testing that this fixes the issue
in 6.1.y, cf. https://bugs.debian.org/1086028#95
Thanks in advance already.
Regards,
Salvatore
Currently we just leave it uninitialised, which at first looks harmless.
However, we also don't zero out the pfn array, and with pfn_flags_mask
the idea is to be able to set individual flags for a given range of pfn or
completely ignore them, outside of default_flags. So here we end up with
pfn[i] & pfn_flags_mask, and if both are uninitialised we might get back
an unexpected flags value, like asking for read only with default_flags,
but getting back write on top, leading to potentially bogus behaviour.
To fix this ensure we zero the pfn_flags_mask, such that hmm only
considers the default_flags and not also the initial pfn[i] value.
v2 (Thomas):
- Prefer proper initializer.
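
The flag combination described above can be sketched in plain C. This is a minimal model under stated assumptions: TOY_REQ_FAULT and TOY_REQ_WRITE use made-up bit positions, and toy_effective_flags() is a hypothetical helper that only mirrors the "pfn[i] & pfn_flags_mask, plus default_flags" description; it is not the hmm code itself.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define TOY_REQ_FAULT (1UL << 0)
#define TOY_REQ_WRITE (1UL << 1)

/*
 * Per-entry request flags as described above: the entry's own bits are
 * filtered through pfn_flags_mask, then default_flags are added on top.
 */
static unsigned long toy_effective_flags(unsigned long pfn_entry,
					 unsigned long pfn_flags_mask,
					 unsigned long default_flags)
{
	/* Sanity check for the model: only the two toy bits are defaults. */
	assert((default_flags & ~(TOY_REQ_FAULT | TOY_REQ_WRITE)) == 0);

	return (pfn_entry & pfn_flags_mask) | default_flags;
}
```

If both the entry and the mask hold leftover garbage such as ~0UL, a read-only request (TOY_REQ_FAULT alone) comes back with TOY_REQ_WRITE set as well; with the mask zeroed, only default_flags survive.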
Fixes: 81e058a3e7fd ("drm/xe: Introduce helper to populate userptr")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.10+
---
drivers/gpu/drm/xe/xe_hmm.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index 089834467880..2e4ae61567d8 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -166,13 +166,20 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
{
unsigned long timeout =
jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
- unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
+ unsigned long *pfns;
struct xe_userptr *userptr;
struct xe_vma *vma = &uvma->vma;
u64 userptr_start = xe_vma_userptr(vma);
u64 userptr_end = userptr_start + xe_vma_size(vma);
struct xe_vm *vm = xe_vma_vm(vma);
- struct hmm_range hmm_range;
+ struct hmm_range hmm_range = {
+ .pfn_flags_mask = 0, /* ignore pfns */
+ .default_flags = HMM_PFN_REQ_FAULT,
+ .start = userptr_start,
+ .end = userptr_end,
+ .notifier = &uvma->userptr.notifier,
+ .dev_private_owner = vm->xe,
+ };
bool write = !xe_vma_read_only(vma);
unsigned long notifier_seq;
u64 npages;
@@ -199,19 +206,14 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
return -ENOMEM;
if (write)
- flags |= HMM_PFN_REQ_WRITE;
+ hmm_range.default_flags |= HMM_PFN_REQ_WRITE;
if (!mmget_not_zero(userptr->notifier.mm)) {
ret = -EFAULT;
goto free_pfns;
}
- hmm_range.default_flags = flags;
hmm_range.hmm_pfns = pfns;
- hmm_range.notifier = &userptr->notifier;
- hmm_range.start = userptr_start;
- hmm_range.end = userptr_end;
- hmm_range.dev_private_owner = vm->xe;
while (true) {
hmm_range.notifier_seq = mmu_interval_read_begin(&userptr->notifier);
--
2.48.1
A null pointer dereference could occur when pipe_ctx->plane_state
is null. Add a check to ensure 'pipe_ctx->plane_state' is not
null before accessing it.
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 3be5262e353b ("drm/amd/display: Rename more dc_surface stuff to plane_state")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- modified the patch as suggested.
---
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 520a34a42827..a45037cb4cc0 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1455,7 +1455,8 @@ bool resource_build_scaling_params(struct pipe_ctx *pipe_ctx)
DC_LOGGER_INIT(pipe_ctx->stream->ctx->logger);
/* Invalid input */
- if (!plane_state->dst_rect.width ||
+ if (!plane_state ||
+ !plane_state->dst_rect.width ||
!plane_state->dst_rect.height ||
!plane_state->src_rect.width ||
!plane_state->src_rect.height) {
--
2.25.1
This patch series attempts to enable the use of the xe DRM driver on non-4KiB
kernel page platforms. This involves fixing the ttm/bo interface, as well
as parts of the userspace API, to use the kernel `PAGE_SIZE' for
alignment instead of the assumed `SZ_4K'. It also fixes incorrect usage of
`PAGE_SIZE' in the GuC and ring buffer interface code, so that all
instructions/commands are aligned to 4KiB boundaries (per the Programmer's
Manual for the GPUs covered by this DRM driver).
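
To illustrate why the two alignments diverge on larger pages, here is a small sketch. It assumes nothing about the driver: toy_align_up() and TOY_SZ_4K are hypothetical names for a generic power-of-two round-up and the fixed 4KiB constant, and the 16KiB page size is passed in explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SZ_4K 0x1000u	/* fixed 4KiB alignment, independent of page size */

/* Round x up to the next multiple of a, where a is a power of two. */
static uint64_t toy_align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}
```

For a 0x1800-byte object, toy_align_up(0x1800, TOY_SZ_4K) yields 0x2000, while aligning to a 16KiB page (0x4000) yields 0x4000. The two results coincide only when the kernel page size happens to be 4KiB, which is why each call site has to pick the alignment it actually means.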
This issue was first discovered and reported by members of the LoongArch
user communities, whose hardware commonly runs 16KiB-page kernels. The
patch series began on an unassuming branch of a downstream kernel tree
maintained by Shang Yatsen.[^1]
It worked well but remained sparsely documented. A lot of the work done
here relies on Shang Yatsen's original patch.
AOSC OS then picked it up[^2] to provide Intel Xe/Arc support for users of
its LoongArch port, on which I have worked extensively. After months of
positive user feedback, and with encouragement from Kexy Biscuit, my
colleague at the community, I decided to examine its potential for
upstreaming and to cross-reference kernel and Intel documentation to better
document and revise this patch.
This series has now been tested (boot-up, OpenGL, and playback of a
standardised set of video samples[^3]) on the platforms listed below
(motherboard + GPU model). The one exception is the Intel Arc B580, which
seems to segfault in intel-media-driver (iHD_drv_video.so), although,
strangely, hardware-accelerated video playback works well with Firefox.
- x86-64, 4KiB kernel page:
- MS-7D42 + Intel Arc A580
- LoongArch, 16KiB kernel page:
- XA61200 + GUNNIR DG1 Blue Halberd (Intel DG1)
- XA61200 + ASRock Arc A380 Challenger ITX OC (Intel Arc A380)
- XA61200 + Intel Arc 580
- XA61200 + GUNNIR Intel Arc A750 Photon 8G OC (Intel Arc A750)
- ASUS XC-LS3A6M + GUNNIR Intel Arc B580 INDEX 12G (Intel Arc B580)
On the following platform, basic functionality tested fine but the driver was
unstable, with occasional resets (I suspect, however, that this platform
suffers from PCIe coherence issues, as the instability only occurs under heavy
VRAM I/O load):
- AArch64, 4KiB/64KiB kernel pages:
- ERUN-FD3000 (Phytium D3000) + GUNNIR Intel Arc A750 Photon 8G OC
(Intel Arc A750)
I think that this patch series is now ready for your comment and review.
Please forgive me if I have made any simple mistakes or used the wrong
terminology; I have never worked on a patch for the DRM subsystem before and
my experience is still quite thin.
But anyway, just letting you all know that Intel Xe/Arc works on non-4KiB
kernel page platforms (and honestly, it's great to use, especially for
games and media playback)!
[^1]: https://github.com/FanFansfan/loongson-linux/tree/loongarch-xe
[^2]: We maintained Shang Yatsen's patch up to our v6.13.3 tree, at which
point we decided to test and send this series upstream,
https://github.com/AOSC-Tracking/linux/tree/aosc/v6.13.3
[^3]: Delicious hot pot!
https://repo.aosc.io/ahvl/sample-videos-20250223.tar.zst
Suggested-by: Kexy Biscuit <kexybiscuit(a)aosc.io>
Co-developed-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Shang Yatsen <429839446(a)qq.com>
Signed-off-by: Mingcong Bai <jeffbai(a)aosc.io>
---
Mingcong Bai (5):
drm/xe/bo: fix alignment with non-4K kernel page sizes
drm/xe/guc: use SZ_4K for alignment
drm/xe/regs: fix RING_CTL_SIZE(size) calculation
drm/xe: use 4K alignment for cursor jumps
drm/xe/query: use PAGE_SIZE as the minimum page alignment
drivers/gpu/drm/xe/regs/xe_engine_regs.h | 3 +--
drivers/gpu/drm/xe/xe_bo.c | 8 ++++----
drivers/gpu/drm/xe/xe_guc.c | 4 ++--
drivers/gpu/drm/xe/xe_guc_ads.c | 32 ++++++++++++++++----------------
drivers/gpu/drm/xe/xe_guc_capture.c | 8 ++++----
drivers/gpu/drm/xe/xe_guc_ct.c | 2 +-
drivers/gpu/drm/xe/xe_guc_log.c | 4 ++--
drivers/gpu/drm/xe/xe_guc_pc.c | 4 ++--
drivers/gpu/drm/xe/xe_migrate.c | 4 ++--
drivers/gpu/drm/xe/xe_query.c | 2 +-
include/uapi/drm/xe_drm.h | 2 +-
11 files changed, 36 insertions(+), 37 deletions(-)
---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250226-xe-non-4k-fix-6b2eded0a564
Best regards,
--
Mingcong Bai <jeffbai(a)aosc.io>