Hi Daniel,
just a gentle ping, since you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in
the dma_resv_get_singleton error handling.
Regards,
Christian.
On Tue, Dec 07, 2021 at 01:34:05PM +0100, Christian König wrote:
> This change adds the dma_resv_usage enum and allows us to specify why a
> dma_resv object is queried for its containing fences.
>
> In addition to that, a dma_resv_usage_rw() helper function is added to aid
> retrieving the fences for a read or write userspace submission.
>
> This is then deployed to the different query functions of the dma_resv
> object and all of their users. Where the write parameter was previously
> true we now use DMA_RESV_USAGE_WRITE, and DMA_RESV_USAGE_READ otherwise.
>
> v2: add KERNEL/OTHER in separate patch
> v3: some kerneldoc suggestions by Daniel
>
> Signed-off-by: Christian König <christian.koenig(a)amd.com>
Just commenting on the kerneldoc here.
> diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> index 40ac9d486f8f..d96d8ca9af56 100644
> --- a/include/linux/dma-resv.h
> +++ b/include/linux/dma-resv.h
> @@ -49,6 +49,49 @@ extern struct ww_class reservation_ww_class;
>
> struct dma_resv_list;
>
> +/**
> + * enum dma_resv_usage - how the fences from a dma_resv obj are used
> + *
> + * This enum describes the different use cases for a dma_resv object and
> + * controls which fences are returned when queried.
We need to link from here to &dma_buf.resv and from there back to here.
Also, we had a fair amount of text in the old dma_resv field documentation which
should probably be included here.
> + */
> +enum dma_resv_usage {
> + /**
> + * @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
> + *
> + * This should only be used for userspace command submissions which add
> + * an implicit write dependency.
> + */
> + DMA_RESV_USAGE_WRITE,
> +
> + /**
> + * @DMA_RESV_USAGE_READ: Implicit read synchronization.
> + *
> + * This should only be used for userspace command submissions which add
> + * an implicit read dependency.
I think the above would benefit from at least a link each to &dma_buf.resv
for further discussion.
Plus the READ flag needs a huge warning that in general it does _not_
guarantee that there are no writes possible, nor that any such writes can
be assumed to be mistakes and dropped (e.g. on buffer moves).
Drivers can only make further assumptions for driver-internal dma_resv
objects (e.g. on vm/pagetables) or when the fences are all fences of the
same driver (e.g. the special sync rules amd has which take the fence
owner into account).
We have this documented in the dma_buf.resv rules, but since it came up
again in a discussion with Thomas H. somewhere, it's better to hammer this
in a few more times. Specifically, ignoring READ fences for buffer moves
in general (well, for the copy job; the memory freeing still has to wait
for all of them) is a correctness bug.
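As a rough sketch (not from the patch): the memory freeing side would
always have to wait with the READ usage, i.e. for all fences:

	/* Sketch only: freeing the backing store must wait for _all_ fences. */
	r = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_READ,
				  false, MAX_SCHEDULE_TIMEOUT);

Only the copy job itself could possibly get away with less, and only under
the driver-specific rules above.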
Maybe include a big warning that the difference between READ and WRITE
really only matters for implicit sync, and _not_ for anything else the
kernel does.
I'm assuming the actual replacement is all mechanical, so I skipped that
one for now, that's for next year :-)
-Daniel
> + */
> + DMA_RESV_USAGE_READ,
> +};
> +
> +/**
> + * dma_resv_usage_rw - helper for implicit sync
> + * @write: true if we create a new implicit sync write
> + *
> + * This returns the implicit synchronization usage for write or read accesses,
> + * see enum dma_resv_usage.
> + */
> +static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
> +{
> + /* This looks confusing at first sight, but is indeed correct.
> + *
> + * The rationale is that new write operations need to wait for the
> + * existing read and write operations to finish.
> + * But a new read operation only needs to wait for the existing write
> + * operations to finish.
> + */
> + return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
> +}
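Just as illustration (hypothetical call site, not from the patch), an
implicit-sync user would then do something like:

	/* Collect the fences a new read or write submission has to sync to. */
	ret = dma_resv_get_fences(obj, dma_resv_usage_rw(is_write),
				  &num_fences, &fences);

where is_write describes the new submission and the helper translates that
into which existing fences need to be returned.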
> +
> /**
> * struct dma_resv - a reservation object manages fences for a buffer
> *
> @@ -147,8 +190,8 @@ struct dma_resv_iter {
> /** @obj: The dma_resv object we iterate over */
> struct dma_resv *obj;
>
> - /** @all_fences: If all fences should be returned */
> - bool all_fences;
> + /** @usage: Controls which fences are returned */
> + enum dma_resv_usage usage;
>
> /** @fence: the currently handled fence */
> struct dma_fence *fence;
> @@ -178,14 +221,14 @@ struct dma_fence *dma_resv_iter_next(struct dma_resv_iter *cursor);
> * dma_resv_iter_begin - initialize a dma_resv_iter object
> * @cursor: The dma_resv_iter object to initialize
> * @obj: The dma_resv object which we want to iterate over
> - * @all_fences: If all fences should be returned or just the exclusive one
> + * @usage: controls which fences to include, see enum dma_resv_usage.
> */
> static inline void dma_resv_iter_begin(struct dma_resv_iter *cursor,
> struct dma_resv *obj,
> - bool all_fences)
> + enum dma_resv_usage usage)
> {
> cursor->obj = obj;
> - cursor->all_fences = all_fences;
> + cursor->usage = usage;
> cursor->fence = NULL;
> }
>
> @@ -242,7 +285,7 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor)
> * dma_resv_for_each_fence - fence iterator
> * @cursor: a struct dma_resv_iter pointer
> * @obj: a dma_resv object pointer
> - * @all_fences: true if all fences should be returned
> + * @usage: controls which fences to return
> * @fence: the current fence
> *
> * Iterate over the fences in a struct dma_resv object while holding the
> @@ -251,8 +294,8 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor)
> * valid as long as the lock is held and so no extra reference to the fence is
> * taken.
> */
> -#define dma_resv_for_each_fence(cursor, obj, all_fences, fence) \
> - for (dma_resv_iter_begin(cursor, obj, all_fences), \
> +#define dma_resv_for_each_fence(cursor, obj, usage, fence) \
> + for (dma_resv_iter_begin(cursor, obj, usage), \
> fence = dma_resv_iter_first(cursor); fence; \
> fence = dma_resv_iter_next(cursor))
>
> @@ -419,14 +462,14 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
> void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
> struct dma_fence *fence);
> void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
> -int dma_resv_get_fences(struct dma_resv *obj, bool write,
> +int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage,
> unsigned int *num_fences, struct dma_fence ***fences);
> -int dma_resv_get_singleton(struct dma_resv *obj, bool write,
> +int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage,
> struct dma_fence **fence);
> int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
> -long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
> - unsigned long timeout);
> -bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all);
> +long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
> + bool intr, unsigned long timeout);
> +bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage);
> void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq);
>
> #endif /* _LINUX_RESERVATION_H */
> --
> 2.25.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Useful for checking dma-fence signalling annotations, since they don't
quite nest as freely as we'd like.
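A rough sketch of the intended use (hypothetical driver code, not part of
this patch):

	/* Hypothetical low-level helper which relies on being called from
	 * inside a dma_fence_begin_signalling()/dma_fence_end_signalling()
	 * critical section.
	 */
	static void my_driver_kick_signalling_work(struct my_driver *drv)
	{
		dma_fence_assert_in_signalling_section();

		queue_work(drv->wq, &drv->signal_work);
	}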
Cc: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/dma-fence.c | 19 +++++++++++++++++++
include/linux/dma-fence.h | 2 ++
2 files changed, 21 insertions(+)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..2b7c3fc965e6 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -307,6 +307,25 @@ bool dma_fence_begin_signalling(void)
}
EXPORT_SYMBOL(dma_fence_begin_signalling);
+/**
+ * dma_fence_assert_in_signalling_section - check fence signalling annotations
+ *
+ * Since dma_fence_begin_signalling() and dma_fence_end_signalling() are built
+ * using lockdep annotations they have limitations on how freely they can be
+ * nested. Specifically, they cannot be on both inside and outside of locked
+ * sections, which in practice means the annotations often have to be pushed out
+ * to the top level callers.
+ *
+ * To ensure low-level functions are only called with the correct
+ * annotations, this function can be used to check for that.
+ */
+void dma_fence_assert_in_signalling_section(void)
+{
+ if (!in_atomic())
+ lockdep_assert(lock_is_held(&dma_fence_lockdep_map));
+}
+EXPORT_SYMBOL(dma_fence_assert_in_signalling_section);
+
/**
* dma_fence_end_signalling - end a critical DMA fence signalling section
* @cookie: opaque cookie from dma_fence_begin_signalling()
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..7179a5692f72 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -356,6 +356,7 @@ dma_fence_get_rcu_safe(struct dma_fence __rcu **fencep)
#ifdef CONFIG_LOCKDEP
bool dma_fence_begin_signalling(void);
+void dma_fence_assert_in_signalling_section(void);
void dma_fence_end_signalling(bool cookie);
void __dma_fence_might_wait(void);
#else
@@ -363,6 +364,7 @@ static inline bool dma_fence_begin_signalling(void)
{
return true;
}
+static inline void dma_fence_assert_in_signalling_section(void) {}
static inline void dma_fence_end_signalling(bool cookie) {}
static inline void __dma_fence_might_wait(void) {}
#endif
--
2.34.1
From: Xiaoke Wang <xkernel.wang(a)foxmail.com>
kstrdup() is a memory allocation function which can return NULL when an
internal memory allocation fails. It is better to check its return value
to prevent invalid memory accesses later on.
Signed-off-by: Xiaoke Wang <xkernel.wang(a)foxmail.com>
---
drivers/dma-buf/selftest.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/dma-buf/selftest.c b/drivers/dma-buf/selftest.c
index c60b694..2c29e2a 100644
--- a/drivers/dma-buf/selftest.c
+++ b/drivers/dma-buf/selftest.c
@@ -50,6 +50,9 @@ static bool apply_subtest_filter(const char *caller, const char *name)
bool result = true;
filter = kstrdup(__st_filter, GFP_KERNEL);
+ if (!filter)
+ return false;
+
for (sep = filter; (tok = strsep(&sep, ","));) {
bool allow = true;
char *sl;
--
The workstation application ANSA/META gets this error in dmesg:
[drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
This is caused by:
1. create a 256MB buffer in invisible VRAM
2. CPU mapping the buffer and accessing it causes a vm_fault which tries
   to move it to visible VRAM
3. forcing visible VRAM space traverses all VRAM bos to check if
   evicting each of them is valuable
4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable()
   sets amdgpu_vm->evicting, but later, because the bo is not in visible
   VRAM, it won't really be evicted and so is not added to
   amdgpu_vm->evicted
5. before the next CS clears amdgpu_vm->evicting, a user VM ops ioctl
   passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted) but fails
   in amdgpu_vm_bo_update_mapping() (which checks amdgpu_vm->evicting),
   producing this error log
This error won't affect functionality, as the next CS will finish the
waiting VM ops. But we'd better make amdgpu_vm->evicting correctly
reflect the vm status and get rid of the error log.
Signed-off-by: Qiang Yu <qiang.yu(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 85 ++++++++++++++-----------
1 file changed, 47 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5a32ee66d8c8..88a27911054f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1306,45 +1306,11 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device *adev, struct ttm_tt *ttm,
return flags;
}
-/*
- * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
- * object.
- *
- * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
- * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
- * it can find space for a new object and by ttm_bo_force_list_clean() which is
- * used to clean out a memory space.
- */
-static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
- const struct ttm_place *place)
+static bool amdgpu_ttm_mem_eviction_valuable(struct ttm_buffer_object *bo,
+ const struct ttm_place *place)
{
unsigned long num_pages = bo->resource->num_pages;
struct amdgpu_res_cursor cursor;
- struct dma_resv_list *flist;
- struct dma_fence *f;
- int i;
-
- /* Swapout? */
- if (bo->resource->mem_type == TTM_PL_SYSTEM)
- return true;
-
- if (bo->type == ttm_bo_type_kernel &&
- !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
- return false;
-
- /* If bo is a KFD BO, check if the bo belongs to the current process.
- * If true, then return false as any KFD process needs all its BOs to
- * be resident to run successfully
- */
- flist = dma_resv_shared_list(bo->base.resv);
- if (flist) {
- for (i = 0; i < flist->shared_count; ++i) {
- f = rcu_dereference_protected(flist->shared[i],
- dma_resv_held(bo->base.resv));
- if (amdkfd_fence_check_mm(f, current->mm))
- return false;
- }
- }
switch (bo->resource->mem_type) {
case AMDGPU_PL_PREEMPT:
@@ -1377,10 +1343,53 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
return false;
default:
- break;
+ return ttm_bo_eviction_valuable(bo, place);
}
+}
- return ttm_bo_eviction_valuable(bo, place);
+/*
+ * amdgpu_ttm_bo_eviction_valuable - Check to see if we can evict a buffer
+ * object.
+ *
+ * Return true if eviction is sensible. Called by ttm_mem_evict_first() on
+ * behalf of ttm_bo_mem_force_space() which tries to evict buffer objects until
+ * it can find space for a new object and by ttm_bo_force_list_clean() which is
+ * used to clean out a memory space.
+ */
+static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
+ const struct ttm_place *place)
+{
+ struct dma_resv_list *flist;
+ struct dma_fence *f;
+ int i;
+
+ /* Swapout? */
+ if (bo->resource->mem_type == TTM_PL_SYSTEM)
+ return true;
+
+ /* If bo is a KFD BO, check if the bo belongs to the current process.
+ * If true, then return false as any KFD process needs all its BOs to
+ * be resident to run successfully
+ */
+ flist = dma_resv_shared_list(bo->base.resv);
+ if (flist) {
+ for (i = 0; i < flist->shared_count; ++i) {
+ f = rcu_dereference_protected(flist->shared[i],
+ dma_resv_held(bo->base.resv));
+ if (amdkfd_fence_check_mm(f, current->mm))
+ return false;
+ }
+ }
+
+ /* Check by different mem type. */
+ if (!amdgpu_ttm_mem_eviction_valuable(bo, place))
+ return false;
+
+ /* VM bo should be checked at last because it will mark VM evicting. */
+ if (bo->type == ttm_bo_type_kernel)
+ return amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo));
+
+ return true;
}
static void amdgpu_ttm_vram_mm_access(struct amdgpu_device *adev, loff_t pos,
--
2.25.1
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.
Changelog:
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
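Regarding the struct dma_buf change mentioned above, a minimal sketch of
the association (member names here are assumptions, see the actual patches
for the real interface):

	struct dma_buf {
		/* ... existing members ... */
	#ifdef CONFIG_CGROUP_GPU
		/* cgroup and device this buffer is currently charged to */
		struct gpucg *gpucg;
		struct gpucg_device *gpucg_dev;
	#endif
	};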
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
T.J. Mercier (6):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
dmabuf: Use the GPU cgroup charge/uncharge APIs
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
android: binder: Add a buffer flag to relinquish ownership of fds
Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++
Documentation/gpu/rfc/index.rst | 4 +
drivers/android/binder.c | 26 +++
drivers/dma-buf/dma-buf.c | 100 +++++++++
drivers/dma-buf/dma-heap.c | 27 +++
drivers/dma-buf/heaps/system_heap.c | 3 +
include/linux/cgroup_gpu.h | 127 +++++++++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 22 +-
include/linux/dma-heap.h | 11 +
include/uapi/linux/android/binder.h | 1 +
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 304 +++++++++++++++++++++++++++
14 files changed, 830 insertions(+), 2 deletions(-)
create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
--
2.35.1.265.g69c8d7142f-goog
Hi guys,
by now this should be a rather well-known set of changes.
The basic idea is that instead of the fixed exclusive/shared classes, we now
attach a usage to each fence in the dma_resv object, describing how the
operation represented by the fence uses the resources protected by
the dma_resv.
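As a rough driver-side sketch (not taken from the series, add_dependency()
is a made-up helper), consumers then ask for exactly the class of fences
they care about:

	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	/* With the dma_resv lock held: only the fences a new read
	 * submission has to wait for, i.e. the writers.
	 */
	dma_resv_for_each_fence(&cursor, resv, dma_resv_usage_rw(false), fence)
		add_dependency(job, fence);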
I've already addressed quite a few comments and I think this set has been
discussed fairly thoroughly by now. As an improvement over the last version
I've added CCs for all the relevant maintainers to the patches that change
functionality inside drivers.
Please review and comment,
Christian.