6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit fe52c649438b8489c9456681d93a9b3de3d38263 upstream.
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
The proposed advantage of this would be readability, but one concern
Christian raised was that there could be many fences, that they are
typically mostly sorted, and so the kernel's heap sort used by the proposed
algorithm would perform much worse.
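A minimal userspace sketch of the sort-then-squash idea, for illustration only and not the kernel code. It makes several assumptions. A hypothetical `struct fake_fence` stands in for `struct dma_fence`. libc `qsort()` replaces the kernel's `sort()`. Seqnos are compared with a plain `>` instead of `dma_fence_is_later()`, so wraparound is ignored. Reference counting is left out. The comparator uses the same ordering as `fence_cmp()` in the patch: context ascending, then seqno descending. With that ordering, the squashing pass only has to keep the first entry of each context. The function name `merge_fences()` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dma_fence: only what the merge needs. */
struct fake_fence {
	uint64_t context;
	uint64_t seqno;
	bool signaled;
	int64_t timestamp_ns; /* only meaningful when signaled */
};

/* Context ascending, then seqno descending (newest first per context). */
static int fake_fence_cmp(const void *_a, const void *_b)
{
	const struct fake_fence *a = *(const struct fake_fence *const *)_a;
	const struct fake_fence *b = *(const struct fake_fence *const *)_b;

	if (a->context < b->context)
		return -1;
	if (a->context > b->context)
		return 1;
	if (b->seqno > a->seqno)
		return 1;
	if (a->seqno > b->seqno)
		return -1;
	return 0;
}

/*
 * Collect unsignaled fences into out[] while recording the latest signaled
 * timestamp, sort them, then keep only the newest fence per context.
 * Returns the number of fences left in out[].
 */
static size_t merge_fences(struct fake_fence *in, size_t n,
			   struct fake_fence **out, int64_t *timestamp)
{
	size_t count = 0, i, j;

	*timestamp = 0;
	for (i = 0; i < n; i++) {
		if (!in[i].signaled)
			out[count++] = &in[i];
		else if (in[i].timestamp_ns > *timestamp)
			*timestamp = in[i].timestamp_ns;
	}
	if (count <= 1)
		return count; /* fast path: nothing to sort or squash */

	qsort(out, count, sizeof(*out), fake_fence_cmp);

	/* Squash: out[j] is the newest fence seen for the current context. */
	j = 0;
	for (i = 1; i < count; i++) {
		if (out[i]->context != out[j]->context)
			out[++j] = out[i];
	}
	return j + 1;
}
```

For unsignaled inputs (context, seqno) = (2,5), (1,3), (2,7), (3,1), plus a signaled (1,9) with timestamp 100, the sketch keeps (1,3), (2,7) and (3,1) and reports 100 as the timestamp.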
I ran some games and vkcube to see what the typical number of input
fences is. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentage of calls per bucket of input fence counts, before and after
checking for signalled status, sorting and flattening:
  N        Before      After
  0         0.91%
  1        69.40%
  2-3      28.72%       9.4%  (90.6% resolved to one fence)
  4-5       0.93%
  6-9       0.03%
  10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
  N        Before      After
  0         1.13%
  1        52.30%
  2-3      40.34%     55.57%
  4-5       1.46%      0.50%
  6-9       2.44%
  10+       2.34%
3) vkcube under Plasma
90 calls per second.
  N        Before      After
  0
  1
  2-3        100%         0%  (i.e. all resolved to a single fence)
  4-5
  6-9
  10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given that the dominant case is <= 2 input fences, which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort(), and so keep the simplicity,
potentially moving the new sort under lib/ if it turns out to be more
widely useful.
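One possible shape for the "different sorting algorithm under the same API" idea is a hypothetical insertion sort. It is a sketch only, not an existing lib/ function. It takes the same arguments as the kernel's sort(): `base`, `num`, `size`, `cmp_func` and `swap_func`. The callback typedefs are redeclared locally, mirroring the kernel's `cmp_func_t`/`swap_func_t`, so the sketch builds in userspace. On already sorted input, insertion sort needs only `num - 1` comparisons, which addresses the "mostly sorted" concern. The cost is O(n^2) behaviour in the worst case.

```c
#include <stddef.h>

/* Local copies of the callback shapes used by the kernel's sort(). */
typedef int (*cmp_func_t)(const void *a, const void *b);
typedef void (*swap_func_t)(void *a, void *b, int size);

/* Fallback used when swap_func is NULL: swap 'size' bytes one at a time. */
static void generic_swap(void *a, void *b, int size)
{
	unsigned char *pa = a, *pb = b;

	while (size--) {
		unsigned char t = *pa;

		*pa++ = *pb;
		*pb++ = t;
	}
}

/*
 * Hypothetical insertion sort with the same argument list as sort().
 * Stable, adaptive: each element is swapped backwards only while its
 * predecessor compares greater, so sorted input costs num - 1 compares.
 */
static void insertion_sort(void *base, size_t num, size_t size,
			   cmp_func_t cmp_func, swap_func_t swap_func)
{
	char *b = base;
	size_t i, j;

	if (!swap_func)
		swap_func = generic_swap;

	for (i = 1; i < num; i++) {
		for (j = i; j > 0; j--) {
			char *prev = b + (j - 1) * size;
			char *cur = b + j * size;

			if (cmp_func(prev, cur) <= 0)
				break;
			swap_func(prev, cur, (int)size);
		}
	}
}

/* Example comparator that also counts how often it is called. */
static int cmp_calls;

static int cmp_int_counted(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	cmp_calls++;
	return (x > y) - (x < y);
}
```

Sorting an already ordered five-element int array with `cmp_int_counted` makes exactly four comparator calls.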
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
v3:
* Reverse the seqno sort order for a simpler squashing pass. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-3-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 126 +++++++++++++++++--------------------
1 file changed, 60 insertions(+), 66 deletions(-)
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return 1;
+ else if (dma_fence_is_later(a, b))
+ return -1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_mer
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,80 +115,55 @@ struct dma_fence *__dma_fence_unwrap_mer
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
+ }
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- }
- } while (tmp);
+ if (count == 0 || count == 1)
+ goto return_fastpath;
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
- }
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
- if (count == 1) {
- tmp = array[0];
- goto return_tmp;
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ for (i = 1; i < count; i++) {
+ if (array[i]->context == array[j]->context)
+ dma_fence_put(array[i]);
+ else
+ array[++j] = array[i];
}
+ count = ++j;
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- for (i = 0; i < count; i++)
- dma_fence_put(array[i]);
- tmp = NULL;
- goto return_tmp;
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
+ tmp = NULL;
+ goto return_tmp;
+ }
+ return &result->base;
}
- return &result->base;
+
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
+ tmp = array[0];
return_tmp:
kfree(array);
This is a note to let you know that I've just added the patch titled
dma-fence: Use kernel's sort for merging fences
to the 6.12-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-use-kernel-s-sort-for-merging-fences.patch
and it can be found in the queue-6.12 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From fe52c649438b8489c9456681d93a9b3de3d38263 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:50 +0000
Subject: dma-fence: Use kernel's sort for merging fences
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit fe52c649438b8489c9456681d93a9b3de3d38263 upstream.
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
The proposed advantage of this would be readability, but one concern
Christian raised was that there could be many fences, that they are
typically mostly sorted, and so the kernel's heap sort used by the proposed
algorithm would perform much worse.
I ran some games and vkcube to see what the typical number of input
fences is. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentage of calls per bucket of input fence counts, before and after
checking for signalled status, sorting and flattening:
  N        Before      After
  0         0.91%
  1        69.40%
  2-3      28.72%       9.4%  (90.6% resolved to one fence)
  4-5       0.93%
  6-9       0.03%
  10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
  N        Before      After
  0         1.13%
  1        52.30%
  2-3      40.34%     55.57%
  4-5       1.46%      0.50%
  6-9       2.44%
  10+       2.34%
3) vkcube under Plasma
90 calls per second.
  N        Before      After
  0
  1
  2-3        100%         0%  (i.e. all resolved to a single fence)
  4-5
  6-9
  10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given that the dominant case is <= 2 input fences, which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort(), and so keep the simplicity,
potentially moving the new sort under lib/ if it turns out to be more
widely useful.
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
v3:
* Reverse the seqno sort order for a simpler squashing pass. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-3-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 126 +++++++++++++++++--------------------
1 file changed, 60 insertions(+), 66 deletions(-)
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return 1;
+ else if (dma_fence_is_later(a, b))
+ return -1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_mer
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,80 +115,55 @@ struct dma_fence *__dma_fence_unwrap_mer
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
+ }
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- }
- } while (tmp);
+ if (count == 0 || count == 1)
+ goto return_fastpath;
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
- }
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
- if (count == 1) {
- tmp = array[0];
- goto return_tmp;
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ for (i = 1; i < count; i++) {
+ if (array[i]->context == array[j]->context)
+ dma_fence_put(array[i]);
+ else
+ array[++j] = array[i];
}
+ count = ++j;
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- for (i = 0; i < count; i++)
- dma_fence_put(array[i]);
- tmp = NULL;
- goto return_tmp;
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
+ tmp = NULL;
+ goto return_tmp;
+ }
+ return &result->base;
}
- return &result->base;
+
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
+ tmp = array[0];
return_tmp:
kfree(array);
Patches currently in stable-queue which might be from tvrtko.ursulin(a)igalia.com are
queue-6.12/dma-fence-use-kernel-s-sort-for-merging-fences.patch
queue-6.12/dma-buf-fix-dma_fence_array_signaled-v4.patch
queue-6.12/dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
This is a note to let you know that I've just added the patch titled
dma-fence: Fix reference leak on fence merge failure path
to the 6.12-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
and it can be found in the queue-6.12 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 949291c5314009b4f6e252391edbb40fdd5d5414 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:49 +0000
Subject: dma-fence: Fix reference leak on fence merge failure path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit 949291c5314009b4f6e252391edbb40fdd5d5414 upstream.
Release all fence references if the output dma-fence-array could not be
allocated.
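The fix follows the usual reference-counting rule: every reference taken with a get must be dropped again on the error path. Below is a minimal userspace sketch of that pattern. It makes several assumptions. A hypothetical refcounted `struct obj` stands in for `struct dma_fence`. A hypothetical `struct container` stands in for the dma-fence-array. A `simulate_oom` flag plays the role of a failing dma_fence_array_create(). None of these names come from the kernel.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct dma_fence. */
struct obj {
	int refcount;
};

/* Hypothetical container standing in for struct dma_fence_array. */
struct container {
	struct obj **items;
	size_t count;
};

static struct obj *obj_get(struct obj *o)
{
	o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	o->refcount--;
}

/*
 * Take a reference on every input and hand them to a new container.
 * simulate_oom forces the container allocation to fail.
 */
static struct container *merge(struct obj **in, size_t n, bool simulate_oom)
{
	struct obj **array = malloc(n * sizeof(*array));
	struct container *c;
	size_t i;

	if (!array)
		return NULL;

	for (i = 0; i < n; i++)
		array[i] = obj_get(in[i]);

	c = simulate_oom ? NULL : malloc(sizeof(*c));
	if (!c) {
		/* Error path: drop every reference taken above. */
		for (i = 0; i < n; i++)
			obj_put(array[i]);
		free(array);
		return NULL;
	}

	/* Success: the container now owns the array and the references. */
	c->items = array;
	c->count = n;
	return c;
}
```

Without the two `obj_put()` lines, a simulated allocation failure would leave every input with one reference too many.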
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-2-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -164,6 +164,8 @@ restart:
dma_fence_context_alloc(1),
1, false);
if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
tmp = NULL;
goto return_tmp;
}
Patches currently in stable-queue which might be from tvrtko.ursulin(a)igalia.com are
queue-6.12/dma-fence-use-kernel-s-sort-for-merging-fences.patch
queue-6.12/dma-buf-fix-dma_fence_array_signaled-v4.patch
queue-6.12/dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
This is a note to let you know that I've just added the patch titled
dma-fence: Use kernel's sort for merging fences
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-use-kernel-s-sort-for-merging-fences.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From fe52c649438b8489c9456681d93a9b3de3d38263 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:50 +0000
Subject: dma-fence: Use kernel's sort for merging fences
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit fe52c649438b8489c9456681d93a9b3de3d38263 upstream.
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
The proposed advantage of this would be readability, but one concern
Christian raised was that there could be many fences, that they are
typically mostly sorted, and so the kernel's heap sort used by the proposed
algorithm would perform much worse.
I ran some games and vkcube to see what the typical number of input
fences is. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentage of calls per bucket of input fence counts, before and after
checking for signalled status, sorting and flattening:
  N        Before      After
  0         0.91%
  1        69.40%
  2-3      28.72%       9.4%  (90.6% resolved to one fence)
  4-5       0.93%
  6-9       0.03%
  10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
  N        Before      After
  0         1.13%
  1        52.30%
  2-3      40.34%     55.57%
  4-5       1.46%      0.50%
  6-9       2.44%
  10+       2.34%
3) vkcube under Plasma
90 calls per second.
  N        Before      After
  0
  1
  2-3        100%         0%  (i.e. all resolved to a single fence)
  4-5
  6-9
  10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given that the dominant case is <= 2 input fences, which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort(), and so keep the simplicity,
potentially moving the new sort under lib/ if it turns out to be more
widely useful.
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
v3:
* Reverse the seqno sort order for a simpler squashing pass. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-3-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 126 +++++++++++++++++--------------------
1 file changed, 60 insertions(+), 66 deletions(-)
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return 1;
+ else if (dma_fence_is_later(a, b))
+ return -1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_mer
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,80 +115,55 @@ struct dma_fence *__dma_fence_unwrap_mer
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
+ }
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- }
- } while (tmp);
+ if (count == 0 || count == 1)
+ goto return_fastpath;
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
- }
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
- if (count == 1) {
- tmp = array[0];
- goto return_tmp;
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ for (i = 1; i < count; i++) {
+ if (array[i]->context == array[j]->context)
+ dma_fence_put(array[i]);
+ else
+ array[++j] = array[i];
}
+ count = ++j;
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- for (i = 0; i < count; i++)
- dma_fence_put(array[i]);
- tmp = NULL;
- goto return_tmp;
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
+ tmp = NULL;
+ goto return_tmp;
+ }
+ return &result->base;
}
- return &result->base;
+
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
+ tmp = array[0];
return_tmp:
kfree(array);
Patches currently in stable-queue which might be from tvrtko.ursulin(a)igalia.com are
queue-6.6/dma-fence-use-kernel-s-sort-for-merging-fences.patch
queue-6.6/dma-buf-fix-dma_fence_array_signaled-v4.patch
queue-6.6/dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
This is a note to let you know that I've just added the patch titled
dma-fence: Fix reference leak on fence merge failure path
to the 6.6-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
and it can be found in the queue-6.6 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 949291c5314009b4f6e252391edbb40fdd5d5414 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:49 +0000
Subject: dma-fence: Fix reference leak on fence merge failure path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit 949291c5314009b4f6e252391edbb40fdd5d5414 upstream.
Release all fence references if the output dma-fence-array could not be
allocated.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-2-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -164,6 +164,8 @@ restart:
dma_fence_context_alloc(1),
1, false);
if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
tmp = NULL;
goto return_tmp;
}
Patches currently in stable-queue which might be from tvrtko.ursulin(a)igalia.com are
queue-6.6/dma-fence-use-kernel-s-sort-for-merging-fences.patch
queue-6.6/dma-buf-fix-dma_fence_array_signaled-v4.patch
queue-6.6/dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
This is a note to let you know that I've just added the patch titled
dma-fence: Use kernel's sort for merging fences
to the 6.1-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-use-kernel-s-sort-for-merging-fences.patch
and it can be found in the queue-6.1 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From fe52c649438b8489c9456681d93a9b3de3d38263 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:50 +0000
Subject: dma-fence: Use kernel's sort for merging fences
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit fe52c649438b8489c9456681d93a9b3de3d38263 upstream.
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
The proposed advantage of this would be readability, but one concern
Christian raised was that there could be many fences, that they are
typically mostly sorted, and so the kernel's heap sort used by the proposed
algorithm would perform much worse.
I ran some games and vkcube to see what the typical number of input
fences is. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentage of calls per bucket of input fence counts, before and after
checking for signalled status, sorting and flattening:
  N        Before      After
  0         0.91%
  1        69.40%
  2-3      28.72%       9.4%  (90.6% resolved to one fence)
  4-5       0.93%
  6-9       0.03%
  10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
  N        Before      After
  0         1.13%
  1        52.30%
  2-3      40.34%     55.57%
  4-5       1.46%      0.50%
  6-9       2.44%
  10+       2.34%
3) vkcube under Plasma
90 calls per second.
  N        Before      After
  0
  1
  2-3        100%         0%  (i.e. all resolved to a single fence)
  4-5
  6-9
  10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given that the dominant case is <= 2 input fences, which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort(), and so keep the simplicity,
potentially moving the new sort under lib/ if it turns out to be more
widely useful.
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
v3:
* Reverse the seqno sort order for a simpler squashing pass. (Christian)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-3-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 128 ++++++++++++++---------------
1 file changed, 61 insertions(+), 67 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index b19d0adf6086..6345062731f1 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor)
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return 1;
+ else if (dma_fence_is_later(a, b))
+ return -1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,80 +115,55 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
-
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- }
- } while (tmp);
-
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
}
- if (count == 1) {
- tmp = array[0];
- goto return_tmp;
- }
+ if (count == 0 || count == 1)
+ goto return_fastpath;
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- for (i = 0; i < count; i++)
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
+
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ for (i = 1; i < count; i++) {
+ if (array[i]->context == array[j]->context)
dma_fence_put(array[i]);
- tmp = NULL;
- goto return_tmp;
+ else
+ array[++j] = array[i];
}
- return &result->base;
+ count = ++j;
+
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
+ tmp = NULL;
+ goto return_tmp;
+ }
+ return &result->base;
+ }
+
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
+ tmp = array[0];
return_tmp:
kfree(array);
--
2.47.1
Patches currently in stable-queue which might be from tvrtko.ursulin(a)igalia.com are
queue-6.1/dma-fence-use-kernel-s-sort-for-merging-fences.patch
queue-6.1/dma-buf-fix-dma_fence_array_signaled-v4.patch
queue-6.1/dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
This is a note to let you know that I've just added the patch titled
dma-fence: Fix reference leak on fence merge failure path
to the 6.1-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
dma-fence-fix-reference-leak-on-fence-merge-failure-path.patch
and it can be found in the queue-6.1 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From 949291c5314009b4f6e252391edbb40fdd5d5414 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Date: Fri, 15 Nov 2024 10:21:49 +0000
Subject: dma-fence: Fix reference leak on fence merge failure path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
commit 949291c5314009b4f6e252391edbb40fdd5d5414 upstream.
Release all fence references if the output dma-fence-array could not be
allocated.
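The failure-path pattern this fix restores can be illustrated with a small userspace model. Everything here is hypothetical: struct obj stands in for a refcounted dma_fence, and container_create() for dma_fence_array_create(), with a 'fail' flag that simulates allocation failure. The point is that every reference taken before the container is created must be dropped again if creation fails, because no container exists to own them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct dma_fence. */
struct obj {
	int refcount;
};

static struct obj *obj_get(struct obj *o) { o->refcount++; return o; }
static void obj_put(struct obj *o) { o->refcount--; }

/* Hypothetical container that takes ownership of the references. */
struct container {
	size_t count;
	struct obj **objs;
};

/* 'fail' simulates the allocation failure being tested. */
static struct container *container_create(size_t count, struct obj **objs,
					   bool fail)
{
	struct container *c;

	if (fail)
		return NULL;
	c = malloc(sizeof(*c));
	if (!c)
		return NULL;
	c->count = count;
	c->objs = objs;
	return c;
}

/*
 * Take a reference on each input, then hand the references to a new
 * container. If creation fails, drop every reference taken above;
 * omitting this loop leaks one reference per input object.
 */
struct container *merge(struct obj **in, size_t count, bool fail)
{
	struct obj **array = malloc(count * sizeof(*array));
	struct container *c;
	size_t i;

	if (!array)
		return NULL;
	for (i = 0; i < count; i++)
		array[i] = obj_get(in[i]);

	c = container_create(count, array, fail);
	if (!c) {
		for (i = 0; i < count; i++)
			obj_put(array[i]);
		free(array);
		return NULL;
	}
	return c;
}
```

After a failed merge() every input's refcount is back where it started; after a successful one each input holds one extra reference owned by the container.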
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241115102153.1980-2-tursuli…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dma-buf/dma-fence-unwrap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index 628af51c81af..b19d0adf6086 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -164,6 +164,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
dma_fence_context_alloc(1),
1, false);
if (!result) {
+ for (i = 0; i < count; i++)
+ dma_fence_put(array[i]);
tmp = NULL;
goto return_tmp;
}
--
2.47.1
From: Rob Clark <robdclark(a)chromium.org>
Conversion to DRM GPU VA Manager[1], and adding support for Vulkan Sparse
Memory[2] in the form of:
1. A new VM_BIND submitqueue type for executing VM MSM_SUBMIT_BO_OP_MAP/
MAP_NULL/UNMAP commands
2. Extending the `SUBMIT` ioctl to allow submitting batches of one or more
MAP/MAP_NULL/UNMAP commands to a VM_BIND submitqueue
The UABI takes a slightly different approach from what other drivers have
done, and from what would make sense if starting from a clean sheet, i.e.
separate VM_BIND and EXEC ioctls. But since we have to maintain support for
the existing SUBMIT ioctl, and because the fence, syncobj, and BO pinning
handling is largely the same between legacy "BO-table" style SUBMIT ioctls
and new-style VM updates submitted to a VM_BIND submitqueue, I chose to
extend the existing `SUBMIT` ioctl rather than add a new ioctl.
I also did not implement support for synchronous VM_BIND commands. Since
userspace could just immediately wait for the `SUBMIT` to complete, I don't
think we need this extra complexity in the kernel.
The corresponding mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32533
### Notes/TODOs/Open Questions:
1. The first handful of patches are from Bibek Kumar Patro's series,
"iommu/arm-smmu: introduction of ACTLR implementation for Qualcomm SoCs"[3],
which introduces PRR (Partially-Resident-Region) support, needed to
implement MAP_NULL (for Vulkan Sparse Residency[4]).
2. Why do VM_BIND commands need fence fd support, instead of just syncobjs?
Mainly for the benefit of virtgpu drm native context guest<->host fence
passing[5], where the host VMM operates in terms of fence fds
(syncobjs are just a convenience wrapper around a dma_fence, and don't
exist below the guest kernel).
3. Shrinker support is currently disabled (hence this being in Draft/RFC
state). To properly support the shrinker, we need to pre-allocate the
various objects and pages needed for the pagetables themselves, to
move memory allocations out of the fence signaling path. This shortcut
was taken to unblock the userspace implementation of sparse buffer/image
support.
4. Could/should we do all the vm/vma updates synchronously and defer _only_
the io-pgtable updates to the VM_BIND scheduler queue? This would
simplify the previous point, in that we'd only have to pre-allocate
pages for the io-pgtable updates.
5. Currently we lose support for BO dumping for devcoredump. Ideally we'd
plumb the `MSM_SUBMIT_BO_DUMP` flag in `MAP` commands through to the
resulting drm_gpuvas. To do this, I think we need to extend drm_gpuva with
a flags field. The flags can be driver-defined, but drm_gpuvm needs to
know not to merge drm_gpuvas with different flags.
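The merge restriction in the last item can be sketched as a simple predicate. This is purely illustrative: struct va_mapping and va_can_merge() are hypothetical and are not drm_gpuvm's actual types or merge logic. The sketch assumes two mappings are mergeable when the second directly follows the first in VA space, they share a backing object with contiguous offsets (offsets are ignored for object-less, MAP_NULL-style ranges), and, as proposed above, their driver-defined flags are identical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mapping: not the real struct drm_gpuva. */
struct va_mapping {
	uint64_t va;		/* start GPU virtual address */
	uint64_t range;		/* size in bytes */
	const void *obj;	/* backing BO, NULL for a MAP_NULL-style range */
	uint64_t offset;	/* offset into the backing BO */
	uint32_t flags;		/* driver-defined, e.g. a "dump" flag */
};

/*
 * 'b' may be merged into 'a' only if it directly follows 'a' in VA space,
 * uses the same backing object at a contiguous offset, and carries the
 * same flags. The flags check is the extra rule being proposed.
 */
bool va_can_merge(const struct va_mapping *a, const struct va_mapping *b)
{
	if (a->va + a->range != b->va)
		return false;
	if (a->obj != b->obj)
		return false;
	if (a->obj && a->offset + a->range != b->offset)
		return false;
	return a->flags == b->flags;
}
```

Without the final flags comparison, a dumpable mapping adjacent to a non-dumpable one of the same BO would be merged and one of the two flag values would be lost.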
This series can be found in MR form, if you prefer:
https://gitlab.freedesktop.org/drm/msm/-/merge_requests/144
[1] https://www.kernel.org/doc/html/next/gpu/drm-mm.html#drm-gpuvm
[2] https://docs.vulkan.org/spec/latest/chapters/sparsemem.html
[3] https://patchwork.kernel.org/project/linux-arm-kernel/list/?series=909700
[4] https://docs.vulkan.org/spec/latest/chapters/sparsemem.html#sparsememory-pa…
[5] https://patchew.org/linux/20231007194747.788934-1-dmitry.osipenko@collabora…
Rob Clark (24):
HACK: drm/msm: Disable shrinker
drm/gpuvm: Don't require obj lock in destructor path
drm/gpuvm: Remove bogus lock assert
drm/msm: Rename msm_file_private -> msm_context
drm/msm: Improve msm_context comments
drm/msm: Rename msm_gem_address_space -> msm_gem_vm
drm/msm: Remove vram carveout support
drm/msm: Collapse vma allocation and initialization
drm/msm: Collapse vma close and delete
drm/msm: drm_gpuvm conversion
drm/msm: Use drm_gpuvm types more
drm/msm: Split submit_pin_objects()
drm/msm: Lazily create context VM
drm/msm: Add opt-in for VM_BIND
drm/msm: Mark VM as unusable on faults
drm/msm: Extend SUBMIT ioctl for VM_BIND
drm/msm: Add VM_BIND submitqueue
drm/msm: Add _NO_SHARE flag
drm/msm: Split out helper to get iommu prot flags
drm/msm: Add mmu support for non-zero offset
drm/msm: Add PRR support
drm/msm: Rename msm_gem_vma_purge() -> _unmap()
drm/msm: Wire up gpuvm ops
drm/msm: Bump UAPI version
drivers/gpu/drm/drm_gpuvm.c | 10 +-
drivers/gpu/drm/msm/Kconfig | 1 +
drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 19 +-
drivers/gpu/drm/msm/adreno/a2xx_gpummu.c | 5 +-
drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 4 +-
drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 4 +-
drivers/gpu/drm/msm/adreno/a5xx_debugfs.c | 4 +-
drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 24 +-
drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +-
drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 10 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 32 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 51 +-
drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 6 +-
drivers/gpu/drm/msm/adreno/a6xx_preempt.c | 10 +-
drivers/gpu/drm/msm/adreno/adreno_gpu.c | 78 ++-
drivers/gpu/drm/msm/adreno/adreno_gpu.h | 22 +-
.../drm/msm/disp/dpu1/dpu_encoder_phys_wb.c | 14 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_formats.c | 18 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_formats.h | 2 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 18 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 14 +-
drivers/gpu/drm/msm/disp/dpu1/dpu_plane.h | 4 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 6 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 28 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 12 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_crtc.c | 4 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 19 +-
drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 12 +-
drivers/gpu/drm/msm/dsi/dsi_host.c | 14 +-
drivers/gpu/drm/msm/msm_drv.c | 175 ++----
drivers/gpu/drm/msm/msm_drv.h | 31 +-
drivers/gpu/drm/msm/msm_fb.c | 18 +-
drivers/gpu/drm/msm/msm_fbdev.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 403 ++++++-------
drivers/gpu/drm/msm/msm_gem.h | 193 +++++--
drivers/gpu/drm/msm/msm_gem_prime.c | 15 +
drivers/gpu/drm/msm/msm_gem_submit.c | 223 +++++--
drivers/gpu/drm/msm/msm_gem_vma.c | 543 +++++++++++++++---
drivers/gpu/drm/msm/msm_gpu.c | 66 ++-
drivers/gpu/drm/msm/msm_gpu.h | 132 +++--
drivers/gpu/drm/msm/msm_iommu.c | 84 ++-
drivers/gpu/drm/msm/msm_kms.c | 14 +-
drivers/gpu/drm/msm/msm_kms.h | 2 +-
drivers/gpu/drm/msm/msm_mmu.h | 2 +-
drivers/gpu/drm/msm/msm_ringbuffer.c | 4 +-
drivers/gpu/drm/msm/msm_submitqueue.c | 86 ++-
include/uapi/drm/msm_drm.h | 98 +++-
48 files changed, 1637 insertions(+), 903 deletions(-)
--
2.47.1