- Linaro-mm-sig - lists.linaro.org

[PATCH] dma-buf: Delete the DMA-BUF attachment sysfs statistics

by Hridya Valsaraju

The DMA-BUF attachment statistics form a subset of the DMA-BUF
sysfs statistics that recently merged to the drm-misc tree. Since a
performance regression has been reported due to the overhead of sysfs
directory creation/teardown during dma_buf_attach()/dma_buf_detach(),
this patch deletes the DMA-BUF attachment statistics from sysfs.

Fixes: bdb8d06dfefd (dmabuf: Add the capability to expose DMA-BUF stats in sysfs)
Signed-off-by: Hridya Valsaraju <hridya(a)google.com>
---
Hello all,

One of our partners recently reported a perf regression in a driver,
caused by the overhead of setting up and tearing down the sysfs
attachment statistics in the dma_buf_attach()/dma_buf_detach()
invocations. Since the driver's latency requirements were on the order
of microseconds (~100us), the overhead was significant.

Since this indicates that the solution might not work well for all
DMA-BUF importers, I think the right thing to do might be to delete the
attachment statistics before they reach upstream and become ABI :(
I apologize for the inconvenience. This patch is based on the
drm-misc-next branch. Please feel free to let me know if you would
prefer that I send a full revert and a new patch that adds the rest of
the statistics.

Regards,
Hridya

 .../ABI/testing/sysfs-kernel-dmabuf-buffers   |  28 ----
 drivers/dma-buf/dma-buf-sysfs-stats.c         | 140 +-----------------
 drivers/dma-buf/dma-buf-sysfs-stats.h         |  27 ----
 drivers/dma-buf/dma-buf.c                     |  16 --
 include/linux/dma-buf.h                       |  17 ---
 5 files changed, 4 insertions(+), 224 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
index a243984ed420..5d3bc997dc64 100644
--- a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
+++ b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
@@ -22,31 +22,3 @@ KernelVersion:	v5.13
 Contact:	Hridya Valsaraju <hridya(a)google.com>
 Description:	This file is read-only and specifies the size of the DMA-BUF in
 		bytes.
-
-What:		/sys/kernel/dmabuf/buffers/<inode_number>/attachments
-Date:		May 2021
-KernelVersion:	v5.13
-Contact:	Hridya Valsaraju <hridya(a)google.com>
-Description:	This directory will contain subdirectories representing every
-		attachment of the DMA-BUF.
-
-What:		/sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attachment_uid>
-Date:		May 2021
-KernelVersion:	v5.13
-Contact:	Hridya Valsaraju <hridya(a)google.com>
-Description:	This directory will contain information on the attached device
-		and the number of current distinct device mappings.
-
-What:		/sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attachment_uid>/device
-Date:		May 2021
-KernelVersion:	v5.13
-Contact:	Hridya Valsaraju <hridya(a)google.com>
-Description:	This file is read-only and is a symlink to the attached device's
-		sysfs entry.
-
-What:		/sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attachment_uid>/map_counter
-Date:		May 2021
-KernelVersion:	v5.13
-Contact:	Hridya Valsaraju <hridya(a)google.com>
-Description:	This file is read-only and contains a map_counter indicating the
-		number of distinct device mappings of the attachment.
diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
index a2638e84199c..053baadcada9 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
@@ -40,14 +40,11 @@
  *
  * * ``/sys/kernel/dmabuf/buffers/<inode_number>/exporter_name``
  * * ``/sys/kernel/dmabuf/buffers/<inode_number>/size``
- * * ``/sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/device``
- * * ``/sys/kernel/dmabuf/buffers/<inode_number>/attachments/<attach_uid>/map_counter``
  *
- * The information in the interface can also be used to derive per-exporter and
- * per-device usage statistics. The data from the interface can be gathered
- * on error conditions or other important events to provide a snapshot of
- * DMA-BUF usage. It can also be collected periodically by telemetry to monitor
- * various metrics.
+ * The information in the interface can also be used to derive per-exporter
+ * statistics. The data from the interface can be gathered on error conditions
+ * or other important events to provide a snapshot of DMA-BUF usage.
+ * It can also be collected periodically by telemetry to monitor various metrics.
  *
  * Detailed documentation about the interface is present in
  * Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers.
@@ -121,120 +118,6 @@ static struct kobj_type dma_buf_ktype = {
 	.default_groups = dma_buf_stats_default_groups,
 };
 
-#define to_dma_buf_attach_entry_from_kobj(x) container_of(x, struct dma_buf_attach_sysfs_entry, kobj)
-
-struct dma_buf_attach_stats_attribute {
-	struct attribute attr;
-	ssize_t (*show)(struct dma_buf_attach_sysfs_entry *sysfs_entry,
-			struct dma_buf_attach_stats_attribute *attr, char *buf);
-};
-#define to_dma_buf_attach_stats_attr(x) container_of(x, struct dma_buf_attach_stats_attribute, attr)
-
-static ssize_t dma_buf_attach_stats_attribute_show(struct kobject *kobj,
-						   struct attribute *attr,
-						   char *buf)
-{
-	struct dma_buf_attach_stats_attribute *attribute;
-	struct dma_buf_attach_sysfs_entry *sysfs_entry;
-
-	attribute = to_dma_buf_attach_stats_attr(attr);
-	sysfs_entry = to_dma_buf_attach_entry_from_kobj(kobj);
-
-	if (!attribute->show)
-		return -EIO;
-
-	return attribute->show(sysfs_entry, attribute, buf);
-}
-
-static const struct sysfs_ops dma_buf_attach_stats_sysfs_ops = {
-	.show = dma_buf_attach_stats_attribute_show,
-};
-
-static ssize_t map_counter_show(struct dma_buf_attach_sysfs_entry *sysfs_entry,
-				struct dma_buf_attach_stats_attribute *attr,
-				char *buf)
-{
-	return sysfs_emit(buf, "%u\n", sysfs_entry->map_counter);
-}
-
-static struct dma_buf_attach_stats_attribute map_counter_attribute =
-	__ATTR_RO(map_counter);
-
-static struct attribute *dma_buf_attach_stats_default_attrs[] = {
-	&map_counter_attribute.attr,
-	NULL,
-};
-ATTRIBUTE_GROUPS(dma_buf_attach_stats_default);
-
-static void dma_buf_attach_sysfs_release(struct kobject *kobj)
-{
-	struct dma_buf_attach_sysfs_entry *sysfs_entry;
-
-	sysfs_entry = to_dma_buf_attach_entry_from_kobj(kobj);
-	kfree(sysfs_entry);
-}
-
-static struct kobj_type dma_buf_attach_ktype = {
-	.sysfs_ops = &dma_buf_attach_stats_sysfs_ops,
-	.release = dma_buf_attach_sysfs_release,
-	.default_groups = dma_buf_attach_stats_default_groups,
-};
-
-void dma_buf_attach_stats_teardown(struct dma_buf_attachment *attach)
-{
-	struct dma_buf_attach_sysfs_entry *sysfs_entry;
-
-	sysfs_entry = attach->sysfs_entry;
-	if (!sysfs_entry)
-		return;
-
-	sysfs_delete_link(&sysfs_entry->kobj, &attach->dev->kobj, "device");
-
-	kobject_del(&sysfs_entry->kobj);
-	kobject_put(&sysfs_entry->kobj);
-}
-
-int dma_buf_attach_stats_setup(struct dma_buf_attachment *attach,
-			       unsigned int uid)
-{
-	struct dma_buf_attach_sysfs_entry *sysfs_entry;
-	int ret;
-	struct dma_buf *dmabuf;
-
-	if (!attach)
-		return -EINVAL;
-
-	dmabuf = attach->dmabuf;
-
-	sysfs_entry = kzalloc(sizeof(struct dma_buf_attach_sysfs_entry),
-			      GFP_KERNEL);
-	if (!sysfs_entry)
-		return -ENOMEM;
-
-	sysfs_entry->kobj.kset = dmabuf->sysfs_entry->attach_stats_kset;
-
-	attach->sysfs_entry = sysfs_entry;
-
-	ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_attach_ktype,
-				   NULL, "%u", uid);
-	if (ret)
-		goto kobj_err;
-
-	ret = sysfs_create_link(&sysfs_entry->kobj, &attach->dev->kobj,
-				"device");
-	if (ret)
-		goto link_err;
-
-	return 0;
-
-link_err:
-	kobject_del(&sysfs_entry->kobj);
-kobj_err:
-	kobject_put(&sysfs_entry->kobj);
-	attach->sysfs_entry = NULL;
-
-	return ret;
-}
 void dma_buf_stats_teardown(struct dma_buf *dmabuf)
 {
 	struct dma_buf_sysfs_entry *sysfs_entry;
@@ -243,7 +126,6 @@ void dma_buf_stats_teardown(struct dma_buf *dmabuf)
 	if (!sysfs_entry)
 		return;
 
-	kset_unregister(sysfs_entry->attach_stats_kset);
 	kobject_del(&sysfs_entry->kobj);
 	kobject_put(&sysfs_entry->kobj);
 }
@@ -290,7 +172,6 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
 {
 	struct dma_buf_sysfs_entry *sysfs_entry;
 	int ret;
-	struct kset *attach_stats_kset;
 
 	if (!dmabuf || !dmabuf->file)
 		return -EINVAL;
@@ -315,21 +196,8 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
 	if (ret)
 		goto err_sysfs_dmabuf;
 
-	/* create the directory for attachment stats */
-	attach_stats_kset = kset_create_and_add("attachments",
-						&dmabuf_sysfs_no_uevent_ops,
-						&sysfs_entry->kobj);
-	if (!attach_stats_kset) {
-		ret = -ENOMEM;
-		goto err_sysfs_attach;
-	}
-
-	sysfs_entry->attach_stats_kset = attach_stats_kset;
-
 	return 0;
 
-err_sysfs_attach:
-	kobject_del(&sysfs_entry->kobj);
 err_sysfs_dmabuf:
 	kobject_put(&sysfs_entry->kobj);
 	dmabuf->sysfs_entry = NULL;
diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.h b/drivers/dma-buf/dma-buf-sysfs-stats.h
index 5f4703249117..a49c6e2650cc 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.h
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.h
@@ -14,23 +14,8 @@ int dma_buf_init_sysfs_statistics(void);
 void dma_buf_uninit_sysfs_statistics(void);
 
 int dma_buf_stats_setup(struct dma_buf *dmabuf);
-int dma_buf_attach_stats_setup(struct dma_buf_attachment *attach,
-			       unsigned int uid);
-static inline void dma_buf_update_attachment_map_count(struct dma_buf_attachment *attach,
-							int delta)
-{
-	struct dma_buf_attach_sysfs_entry *entry = attach->sysfs_entry;
 
-	entry->map_counter += delta;
-}
 void dma_buf_stats_teardown(struct dma_buf *dmabuf);
-void dma_buf_attach_stats_teardown(struct dma_buf_attachment *attach);
-static inline unsigned int dma_buf_update_attach_uid(struct dma_buf *dmabuf)
-{
-	struct dma_buf_sysfs_entry *entry = dmabuf->sysfs_entry;
-
-	return entry->attachment_uid++;
-}
 #else
 
 static inline int dma_buf_init_sysfs_statistics(void)
@@ -44,19 +29,7 @@ static inline int dma_buf_stats_setup(struct dma_buf *dmabuf)
 {
 	return 0;
 }
-static inline int dma_buf_attach_stats_setup(struct dma_buf_attachment *attach,
-					      unsigned int uid)
-{
-	return 0;
-}
 
 static inline void dma_buf_stats_teardown(struct dma_buf *dmabuf) {}
-static inline void dma_buf_attach_stats_teardown(struct dma_buf_attachment *attach) {}
-static inline void dma_buf_update_attachment_map_count(struct dma_buf_attachment *attach,
-							int delta) {}
-static inline unsigned int dma_buf_update_attach_uid(struct dma_buf *dmabuf)
-{
-	return 0;
-}
 #endif
 #endif // _DMA_BUF_SYSFS_STATS_H
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 510b42771974..b1a6db71c656 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -738,7 +738,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 {
 	struct dma_buf_attachment *attach;
 	int ret;
-	unsigned int attach_uid;
 
 	if (WARN_ON(!dmabuf || !dev))
 		return ERR_PTR(-EINVAL);
@@ -764,13 +763,8 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	}
 	dma_resv_lock(dmabuf->resv, NULL);
 	list_add(&attach->node, &dmabuf->attachments);
-	attach_uid = dma_buf_update_attach_uid(dmabuf);
 	dma_resv_unlock(dmabuf->resv);
 
-	ret = dma_buf_attach_stats_setup(attach, attach_uid);
-	if (ret)
-		goto err_sysfs;
-
 	/* When either the importer or the exporter can't handle dynamic
 	 * mappings we cache the mapping here to avoid issues with the
 	 * reservation object lock.
@@ -797,7 +791,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 			dma_resv_unlock(attach->dmabuf->resv);
 		attach->sgt = sgt;
 		attach->dir = DMA_BIDIRECTIONAL;
-		dma_buf_update_attachment_map_count(attach, 1 /* delta */);
 	}
 
 	return attach;
@@ -814,7 +807,6 @@ dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
 	if (dma_buf_is_dynamic(attach->dmabuf))
 		dma_resv_unlock(attach->dmabuf->resv);
 
-err_sysfs:
 	dma_buf_detach(dmabuf, attach);
 	return ERR_PTR(ret);
 }
@@ -864,7 +856,6 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
 			dma_resv_lock(attach->dmabuf->resv, NULL);
 
 		__unmap_dma_buf(attach, attach->sgt, attach->dir);
-		dma_buf_update_attachment_map_count(attach, -1 /* delta */);
 
 		if (dma_buf_is_dynamic(attach->dmabuf)) {
 			dmabuf->ops->unpin(attach);
@@ -878,7 +869,6 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
 	if (dmabuf->ops->detach)
 		dmabuf->ops->detach(dmabuf, attach);
 
-	dma_buf_attach_stats_teardown(attach);
 	kfree(attach);
 }
 EXPORT_SYMBOL_GPL(dma_buf_detach);
@@ -1020,10 +1010,6 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 		}
 	}
 #endif /* CONFIG_DMA_API_DEBUG */
-
-	if (!IS_ERR(sg_table))
-		dma_buf_update_attachment_map_count(attach, 1 /* delta */);
-
 	return sg_table;
 }
 EXPORT_SYMBOL_GPL(dma_buf_map_attachment);
@@ -1061,8 +1047,6 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 	if (dma_buf_is_dynamic(attach->dmabuf) &&
 	    !IS_ENABLED(CONFIG_DMABUF_MOVE_NOTIFY))
 		dma_buf_unpin(attach);
-
-	dma_buf_update_attachment_map_count(attach, -1 /* delta */);
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
 
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 2b814fde0d11..678b2006be78 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -444,15 +444,6 @@ struct dma_buf {
 	struct dma_buf_sysfs_entry {
 		struct kobject kobj;
 		struct dma_buf *dmabuf;
-
-		/**
-		 * @sysfs_entry.attachment_uid:
-		 *
-		 * This is protected by the dma_resv_lock() on @resv and is
-		 * incremented on each attach.
-		 */
-		unsigned int attachment_uid;
-		struct kset *attach_stats_kset;
 	} *sysfs_entry;
 #endif
 };
@@ -504,7 +495,6 @@ struct dma_buf_attach_ops {
  * @importer_ops: importer operations for this attachment, if provided
  * dma_buf_map/unmap_attachment() must be called with the dma_resv lock held.
  * @importer_priv: importer specific attachment data.
- * @sysfs_entry: For exposing information about this attachment in sysfs.
  *
  * This structure holds the attachment information between the dma_buf buffer
  * and its user device(s). The list contains one attachment struct per device
@@ -525,13 +515,6 @@ struct dma_buf_attachment {
 	const struct dma_buf_attach_ops *importer_ops;
 	void *importer_priv;
 	void *priv;
-#ifdef CONFIG_DMABUF_SYSFS_STATS
-	/* for sysfs stats */
-	struct dma_buf_attach_sysfs_entry {
-		struct kobject kobj;
-		unsigned int map_counter;
-	} *sysfs_entry;
-#endif
 };
 
 /**
-- 
2.32.0.93.g670b81a890-goog

4 years, 7 months


[PATCH v4 18/18] dma-resv: Give the docs a do-over

by Daniel Vetter

Specifically document the new/clarified rules around how the shared
fences do not have any ordering requirements against the exclusive
fence.

But also document all the things a bit better, given how central
struct dma_resv is to dynamic buffer management the docs have been very
inadequate.

- Lots more links to other pieces of the puzzle. Unfortunately
  ttm_buffer_object has no docs, so no links :-(

- Explain/complain a bit about dma_resv_locking_ctx(). I still don't
  like that one, but fixing the ttm call chains is going to be
  horrible. Plus we want to plug in real slowpath locking when we do
  that anyway.

- Main part of the patch is some actual docs for struct dma_resv.

Overall I think we still have a lot of bad naming in this area (e.g.
dma_resv.fence is singular, but contains the multiple shared fences),
but I think that's more indicative of how the semantics and rules are
just not great.

Another thing that's real awkward is how chaining exclusive fences
right now means direct dma_resv.exclusive_fence pointer access with an
rcu_assign_pointer. Not so great either.

v2:
- Fix a pile of typos (Matt, Jason)
- Hammer it in that breaking the rules leads to use-after-free issues
  around dma-buf sharing (Christian)

Reviewed-by: Christian König <christian.koenig(a)amd.com>
Cc: Jason Ekstrand <jason(a)jlekstrand.net>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/dma-buf/dma-resv.c |  24 ++++++---
 include/linux/dma-buf.h    |   7 +++
 include/linux/dma-resv.h   | 104 +++++++++++++++++++++++++++++++++++--
 3 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index e744fd87c63c..84fbe60629e3 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -48,6 +48,8 @@
  * write operations) or N shared fences (read operations). The RCU
  * mechanism is used to protect read access to fences from locked
  * write-side updates.
+ *
+ * See struct dma_resv for more details.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
  * @num_fences: number of fences we want to add
  *
  * Should be called before dma_resv_add_shared_fence(). Must
- * be called with obj->lock held.
+ * be called with @obj locked through dma_resv_lock().
+ *
+ * Note that the preallocated slots need to be re-reserved if @obj is unlocked
+ * at any time before calling dma_resv_add_shared_fence(). This is validated
+ * when CONFIG_DEBUG_MUTEXES is enabled.
  *
  * RETURNS
  * Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to a shared slot, obj->lock must be held, and
+ * Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
  * dma_resv_reserve_shared() has been called.
+ *
+ * See also &dma_resv.fence for a discussion of the semantics.
  */
 void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -278,9 +286,11 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
 /**
  * dma_resv_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
- * @fence: the shared fence to add
+ * @fence: the exclusive fence to add
  *
- * Add a fence to the exclusive slot. The obj->lock must be held.
+ * Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
+ * Note that this function replaces all fences attached to @obj, see also
+ * &dma_resv.fence_excl for a discussion of the semantics.
  */
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
  * fence
  *
  * Callers are not required to hold specific locks, but maybe hold
- * dma_resv_lock() already
+ * dma_resv_lock() already.
+ *
  * RETURNS
- * true if all fences signaled, else false
+ *
+ * True if all fences signaled, else false.
  */
 bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 2b814fde0d11..8cc0c55877a6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -420,6 +420,13 @@ struct dma_buf {
 	 * - Dynamic importers should set fences for any access that they can't
 	 *   disable immediately from their &dma_buf_attach_ops.move_notify
 	 *   callback.
+	 *
+	 * IMPORTANT:
+	 *
+	 * All drivers must obey the struct dma_resv rules, specifically the
+	 * rules for updating fences, see &dma_resv.fence_excl and
+	 * &dma_resv.fence. If these dependency rules are broken access tracking
+	 * can be lost resulting in use after free issues.
 	 */
 	struct dma_resv *resv;
 
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index e1ca2080a1ff..9100dd3dc21f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -62,16 +62,90 @@ struct dma_resv_list {
 
 /**
  * struct dma_resv - a reservation object manages fences for a buffer
- * @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ *
+ * There are multiple uses for this, with sometimes slightly different rules in
+ * how the fence slots are used.
+ *
+ * One use is to synchronize cross-driver access to a struct dma_buf, either for
+ * dynamic buffer management or just to handle implicit synchronization between
+ * different users of the buffer in userspace. See &dma_buf.resv for a more
+ * in-depth discussion.
+ *
+ * The other major use is to manage access and locking within a driver in a
+ * buffer based memory manager. struct ttm_buffer_object is the canonical
+ * example here, since this is where reservation objects originated from. But
+ * use in drivers is spreading and some drivers also manage struct
+ * drm_gem_object with the same scheme.
  */
 struct dma_resv {
+	/**
+	 * @lock:
+	 *
+	 * Update side lock. Don't use directly, instead use the wrapper
+	 * functions like dma_resv_lock() and dma_resv_unlock().
+	 *
+	 * Drivers which use the reservation object to manage memory dynamically
+	 * also use this lock to protect buffer object state like placement,
+	 * allocation policies or throughout command submission.
+	 */
 	struct ww_mutex lock;
+
+	/**
+	 * @seq:
+	 *
+	 * Sequence count for managing RCU read-side synchronization, allows
+	 * read-only access to @fence_excl and @fence while ensuring we take a
+	 * consistent snapshot.
+	 */
 	seqcount_ww_mutex_t seq;
 
+	/**
+	 * @fence_excl:
+	 *
+	 * The exclusive fence, if there is one currently.
+	 *
+	 * There are two ways to update this fence:
+	 *
+	 * - First by calling dma_resv_add_excl_fence(), which replaces all
+	 *   fences attached to the reservation object. To guarantee that no
+	 *   fences are lost, this new fence must signal only after all previous
+	 *   fences, both shared and exclusive, have signalled. In some cases it
+	 *   is convenient to achieve that by attaching a struct dma_fence_array
+	 *   with all the new and old fences.
+	 *
+	 * - Alternatively the fence can be set directly, which leaves the
+	 *   shared fences unchanged. To guarantee that no fences are lost, this
+	 *   new fence must signal only after the previous exclusive fence has
+	 *   signalled. Since the shared fences are staying intact, it is not
+	 *   necessary to maintain any ordering against those. If semantically
+	 *   only a new access is added without actually treating the previous
+	 *   one as a dependency the exclusive fences can be strung together
+	 *   using struct dma_fence_chain.
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_fence __rcu *fence_excl;
+
+	/**
+	 * @fence:
+	 *
+	 * List of current shared fences.
+	 *
+	 * There are no ordering constraints of shared fences against the
+	 * exclusive fence slot. If a waiter needs to wait for all access, it
+	 * has to wait for both sets of fences to signal.
+	 *
+	 * A new fence is added by calling dma_resv_add_shared_fence(). Since
+	 * this often needs to be done past the point of no return in command
+	 * submission it cannot fail, and therefore sufficient slots need to be
+	 * reserved by calling dma_resv_reserve_shared().
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_resv_list __rcu *fence;
 };
 
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
+ *
+ * See also dma_resv_lock_interruptible() for the interruptible variant.
  */
 static inline int dma_resv_lock(struct dma_resv *obj,
 				struct ww_acquire_ctx *ctx)
@@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
+ * @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
  */
 static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
 					      struct ww_acquire_ctx *ctx)
@@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
  * Acquires the reservation object after a die case. This function
  * will sleep until the lock becomes available. See dma_resv_lock() as
  * well.
+ *
+ * See also dma_resv_lock_slow_interruptible() for the interruptible variant.
  */
 static inline void dma_resv_lock_slow(struct dma_resv *obj,
 				      struct ww_acquire_ctx *ctx)
@@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
  * if they overlap with a writer.
  *
  * Also note that since no context is provided, no deadlock protection is
- * possible.
+ * possible, which is also not needed for a trylock.
  *
  * Returns true if the lock was acquired, false otherwise.
  */
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
  *
  * Returns the context used to lock a reservation object or NULL if no context
  * was used or the object is not locked at all.
+ *
+ * WARNING: This interface is pretty horrible, but TTM needs it because it
+ * doesn't pass the struct ww_acquire_ctx around in some very long callchains.
+ * Everyone else just uses it to check whether they're holding a reservation or
+ * not.
  */
 static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
 {
-- 
2.32.0
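
The reserve-then-add contract documented for &dma_resv.fence (reserving may fail, adding must not) can be illustrated with a small userspace model. This is only a sketch under loud assumptions: struct toy_resv and the toy_* functions are hypothetical stand-ins, plain integers replace struct dma_fence pointers, and locking and RCU are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-in for the shared fence list of a reservation object. */
struct toy_resv {
	int *shared;		/* fence ids instead of struct dma_fence * */
	unsigned int count;	/* slots in use */
	unsigned int max;	/* slots allocated */
};

/*
 * Fallible step: make room for num more shared fences. In the real API
 * this corresponds to dma_resv_reserve_shared() and has to happen before
 * the point of no return in command submission.
 */
int toy_reserve_shared(struct toy_resv *r, unsigned int num)
{
	unsigned int need = r->count + num;
	int *p;

	if (need <= r->max)
		return 0;

	p = realloc(r->shared, need * sizeof(*p));
	if (!p)
		return -ENOMEM;

	r->shared = p;
	r->max = need;
	return 0;
}

/*
 * Infallible step: consume one previously reserved slot. The assert
 * models the rule that adding without reserving first is a caller bug.
 */
void toy_add_shared(struct toy_resv *r, int fence)
{
	assert(r->count < r->max);
	r->shared[r->count++] = fence;
}
```

Splitting the operation this way moves every allocation in front of the commit point, so the later add has no error path to unwind.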

4 years, 7 months


[PATCH v4 11/18] drm/gem: Delete gem array fencing helpers

by Daniel Vetter

Integrated into the scheduler now and all users converted over.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/drm_gem.c | 96 ---------------------------------------
 include/drm/drm_gem.h     |  5 --
 2 files changed, 101 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 68deb1de8235..24d49a2636e0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1294,99 +1294,3 @@ drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 	ww_acquire_fini(acquire_ctx);
 }
 EXPORT_SYMBOL(drm_gem_unlock_reservations);
-
-/**
- * drm_gem_fence_array_add - Adds the fence to an array of fences to be
- * waited on, deduplicating fences from the same context.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @fence: the dma_fence to add to the list of dependencies.
- *
- * This functions consumes the reference for @fence both on success and error
- * cases.
- *
- * Returns:
- * 0 on success, or an error on failing to expand the array.
- */
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence)
-{
-	struct dma_fence *entry;
-	unsigned long index;
-	u32 id = 0;
-	int ret;
-
-	if (!fence)
-		return 0;
-
-	/* Deduplicate if we already depend on a fence from the same context.
-	 * This lets the size of the array of deps scale with the number of
-	 * engines involved, rather than the number of BOs.
-	 */
-	xa_for_each(fence_array, index, entry) {
-		if (entry->context != fence->context)
-			continue;
-
-		if (dma_fence_is_later(fence, entry)) {
-			dma_fence_put(entry);
-			xa_store(fence_array, index, fence, GFP_KERNEL);
-		} else {
-			dma_fence_put(fence);
-		}
-		return 0;
-	}
-
-	ret = xa_alloc(fence_array, &id, fence, xa_limit_32b, GFP_KERNEL);
-	if (ret != 0)
-		dma_fence_put(fence);
-
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add);
-
-/**
- * drm_gem_fence_array_add_implicit - Adds the implicit dependencies tracked
- * in the GEM object's reservation object to an array of dma_fences for use in
- * scheduling a rendering job.
- *
- * This should be called after drm_gem_lock_reservations() on your array of
- * GEM objects used in the job but before updating the reservations with your
- * own fences.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @obj: the gem object to add new dependencies from.
- * @write: whether the job might write the object (so we need to depend on
- * shared fences in the reservation object).
- */
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write)
-{
-	int ret;
-	struct dma_fence **fences;
-	unsigned int i, fence_count;
-
-	if (!write) {
-		struct dma_fence *fence =
-			dma_resv_get_excl_unlocked(obj->resv);
-
-		return drm_gem_fence_array_add(fence_array, fence);
-	}
-
-	ret = dma_resv_get_fences(obj->resv, NULL,
-						&fence_count, &fences);
-	if (ret || !fence_count)
-		return ret;
-
-	for (i = 0; i < fence_count; i++) {
-		ret = drm_gem_fence_array_add(fence_array, fences[i]);
-		if (ret)
-			break;
-	}
-
-	for (; i < fence_count; i++)
-		dma_fence_put(fences[i]);
-	kfree(fences);
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add_implicit);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..6d5e33b89074 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -409,11 +409,6 @@ int drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
 			      struct ww_acquire_ctx *acquire_ctx);
 void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 				 struct ww_acquire_ctx *acquire_ctx);
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence);
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write);
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);
 
-- 
2.32.0
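
The deduplication rule in the removed drm_gem_fence_array_add() (at most one dependency per fence context, keeping the later fence) can be modelled in userspace roughly as follows. A sketch only: struct toy_fence, struct toy_deps and toy_deps_add() are hypothetical names, a fixed array stands in for the xarray, reference counting is omitted, and the seqno comparison ignores wraparound.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPS 16

/* Toy fence: a timeline (context) and a position on it (seqno). */
struct toy_fence {
	unsigned long context;
	unsigned int seqno;
};

/* Toy dependency set, standing in for the xarray of dma_fence pointers. */
struct toy_deps {
	struct toy_fence slot[TOY_MAX_DEPS];
	size_t count;
};

/*
 * Add f to the dependency set. If a fence from the same context is
 * already present, keep only the later of the two, so the set grows
 * with the number of contexts rather than the number of buffers.
 * Returns 0 on success, -1 if the set is full.
 */
int toy_deps_add(struct toy_deps *d, struct toy_fence f)
{
	size_t i;

	assert(d != NULL);

	for (i = 0; i < d->count; i++) {
		if (d->slot[i].context != f.context)
			continue;

		/* Same timeline: waiting on the later fence covers both. */
		if (f.seqno > d->slot[i].seqno)
			d->slot[i] = f;
		return 0;
	}

	if (d->count == TOY_MAX_DEPS)
		return -1;

	d->slot[d->count++] = f;
	return 0;
}
```

Waiting on the later fence of a context is enough because fences on one context signal in order, which is what lets the earlier entry be dropped.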

4 years, 7 months


[PATCH v4 10/18] drm/etnaviv: Use scheduler dependency handling

by Daniel Vetter

We need to pull the drm_sched_job_init much earlier, but that's very
minor surgery.

v2: Actually fix up cleanup paths by calling drm_sched_job_init, which
I wanted to do in the previous round (and did, for all other drivers).
Spotted by Lucas.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: Russell King <linux+etnaviv(a)armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: etnaviv(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 58 +++++++++---------
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 63 +-------------------
 drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
 4 files changed, 35 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
index 98e60df882b6..63688e6e4580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
@@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
 	u64 va;
 	struct etnaviv_gem_object *obj;
 	struct etnaviv_vram_mapping *mapping;
-	struct dma_fence *excl;
-	unsigned int nr_shared;
-	struct dma_fence **shared;
 };

 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
@@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
 	struct etnaviv_file_private *ctx;
 	struct etnaviv_gpu *gpu;
 	struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
-	struct dma_fence *out_fence, *in_fence;
+	struct dma_fence *out_fence;
 	int out_fence_id;
 	struct list_head node; /* GPU active submit list */
 	struct etnaviv_cmdbuf cmdbuf;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..5b97ce1299ad 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
 			continue;

-		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
-			ret = dma_resv_get_fences(robj, &bo->excl,
-						  &bo->nr_shared,
-						  &bo->shared);
-			if (ret)
-				return ret;
-		} else {
-			bo->excl = dma_resv_get_excl_unlocked(robj);
-		}
-
+		ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
+						   bo->flags & ETNA_SUBMIT_BO_WRITE);
+		if (ret)
+			return ret;
 	}

 	return ret;
@@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)

 	wake_up_all(&submit->gpu->fence_event);

-	if (submit->in_fence)
-		dma_fence_put(submit->in_fence);
 	if (submit->out_fence) {
 		/* first remove from IDR, so fence can not be found anymore */
 		mutex_lock(&submit->gpu->fence_lock);
@@ -529,7 +521,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	ret = etnaviv_cmdbuf_init(priv->cmdbuf_suballoc, &submit->cmdbuf,
 				  ALIGN(args->stream_size, 8) + 8);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_put;

 	submit->ctx = file->driver_priv;
 	etnaviv_iommu_context_get(submit->ctx->mmu);
@@ -537,51 +529,61 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	submit->exec_state = args->exec_state;
 	submit->flags = args->flags;

+	ret = drm_sched_job_init(&submit->sched_job,
+				 &ctx->sched_entity[args->pipe],
+				 submit->ctx);
+	if (ret)
+		goto err_submit_put;
+
 	ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	if ((priv->mmu_global->version != ETNAVIV_IOMMU_V2) &&
 	    !etnaviv_cmd_validate_one(gpu, stream, args->stream_size / 4,
 				      relocs, args->nr_relocs)) {
 		ret = -EINVAL;
-		goto err_submit_objects;
+		goto err_submit_job;
 	}

 	if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
-		submit->in_fence = sync_file_get_fence(args->fence_fd);
-		if (!submit->in_fence) {
+		struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
+		if (!in_fence) {
 			ret = -EINVAL;
-			goto err_submit_objects;
+			goto err_submit_job;
 		}
+
+		ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
+		if (ret)
+			goto err_submit_job;
 	}

 	ret = submit_pin_objects(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_reloc(submit, stream, args->stream_size / 4,
 			   relocs, args->nr_relocs);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_perfmon_validate(submit, args->exec_state, pmrs);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	memcpy(submit->cmdbuf.vaddr, stream, args->stream_size);

 	ret = submit_lock_objects(submit, &ticket);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_fence_sync(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

-	ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
+	ret = etnaviv_sched_push_job(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	submit_attach_object_fences(submit);

@@ -595,7 +597,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 		sync_file = sync_file_create(submit->out_fence);
 		if (!sync_file) {
 			ret = -ENOMEM;
-			goto err_submit_objects;
+			goto err_submit_job;
 		}
 		fd_install(out_fence_fd, sync_file->file);
 	}
@@ -603,7 +605,9 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	args->fence_fd = out_fence_fd;
 	args->fence = submit->out_fence_id;

-err_submit_objects:
+err_submit_job:
+	drm_sched_job_cleanup(&submit->sched_job);
+err_submit_put:
 	etnaviv_submit_put(submit);

 err_submit_ww_acquire:
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 180bb633d5c5..2bbbd6ccc95e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
 static int etnaviv_hw_jobs_limit = 4;
 module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);

-static struct dma_fence *
-etnaviv_sched_dependency(struct drm_sched_job *sched_job,
-			 struct drm_sched_entity *entity)
-{
-	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
-	struct dma_fence *fence;
-	int i;
-
-	if (unlikely(submit->in_fence)) {
-		fence = submit->in_fence;
-		submit->in_fence = NULL;
-
-		if (!dma_fence_is_signaled(fence))
-			return fence;
-
-		dma_fence_put(fence);
-	}
-
-	for (i = 0; i < submit->nr_bos; i++) {
-		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
-		int j;
-
-		if (bo->excl) {
-			fence = bo->excl;
-			bo->excl = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-
-		for (j = 0; j < bo->nr_shared; j++) {
-			if (!bo->shared[j])
-				continue;
-
-			fence = bo->shared[j];
-			bo->shared[j] = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-		kfree(bo->shared);
-		bo->nr_shared = 0;
-		bo->shared = NULL;
-	}
-
-	return NULL;
-}
-
 static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
@@ -140,29 +88,22 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
 }

 static const struct drm_sched_backend_ops etnaviv_sched_ops = {
-	.dependency = etnaviv_sched_dependency,
 	.run_job = etnaviv_sched_run_job,
 	.timedout_job = etnaviv_sched_timedout_job,
 	.free_job = etnaviv_sched_free_job,
 };

-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit)
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
 {
 	int ret = 0;

 	/*
 	 * Hold the fence lock across the whole operation to avoid jobs being
 	 * pushed out of order with regard to their sched fence seqnos as
-	 * allocated in drm_sched_job_init.
+	 * allocated in drm_sched_job_arm.
 	 */
 	mutex_lock(&submit->gpu->fence_lock);

-	ret = drm_sched_job_init(&submit->sched_job, sched_entity,
-				 submit->ctx);
-	if (ret)
-		goto out_unlock;
-
 	drm_sched_job_arm(&submit->sched_job);

 	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
index c0a6796e22c9..baebfa069afc 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
@@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)

 int etnaviv_sched_init(struct etnaviv_gpu *gpu);
 void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit);
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);

 #endif /* __ETNAVIV_SCHED_H__ */
-- 
2.32.0
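
For readers following the label split in this patch (err_submit_job versus err_submit_put), here is a small userspace sketch of the unwinding order on the failure paths. It is not etnaviv code: struct toy_submit and all toy_* functions are made-up stand-ins, the error values are arbitrary, and the success path is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a submit object; not the etnaviv structure. */
struct toy_submit {
	bool job_initialized;	/* set by toy_job_init(), cleared by toy_job_cleanup() */
	int cleanup_calls;	/* how often toy_job_cleanup() ran */
	int put_calls;		/* how often the reference was dropped */
};

static int toy_job_init(struct toy_submit *s, bool fail)
{
	if (fail)
		return -1;
	s->job_initialized = true;
	return 0;
}

static void toy_job_cleanup(struct toy_submit *s)
{
	/* Cleanup is only valid on a job that was initialized. */
	assert(s->job_initialized);
	s->job_initialized = false;
	s->cleanup_calls++;
}

static void toy_submit_put(struct toy_submit *s)
{
	s->put_calls++;
}

/* Failures up to and including job init skip the job cleanup; later ones do not. */
static int toy_submit(struct toy_submit *s, bool init_fails, bool later_step_fails)
{
	int ret;

	ret = toy_job_init(s, init_fails);
	if (ret)
		goto err_put;	/* job never initialized */

	if (later_step_fails) {
		ret = -2;
		goto err_job;	/* job initialized, so it must be cleaned up */
	}

	return 0;	/* success handling is out of scope for this sketch */

err_job:
	toy_job_cleanup(s);
err_put:
	toy_submit_put(s);
	return ret;
}
```

Calling toy_submit() with init_fails set drops the reference without touching the job, while a later failure runs the job cleanup first and then drops the reference.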

4 years, 7 months


[PATCH v4 07/18] drm/lima: use scheduler dependency tracking

by Daniel Vetter

Nothing special going on here.

Aside: reviewing the code, it seems like drm_sched_job_arm() should be
moved into lima_sched_context_queue_task and put under some mutex
together with drm_sched_push_job(). See the kerneldoc for
drm_sched_push_job().

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Qiang Yu <yuq825(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: lima(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/lima/lima_gem.c   |  4 ++--
 drivers/gpu/drm/lima/lima_sched.c | 21 ---------------------
 drivers/gpu/drm/lima/lima_sched.h |  3 ---
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index c528f40981bb..e54a88d5037a 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -267,7 +267,7 @@ static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo,
 	if (explicit)
 		return 0;

-	return drm_gem_fence_array_add_implicit(&task->deps, &bo->base.base, write);
+	return drm_sched_job_await_implicit(&task->base, &bo->base.base, write);
 }

 static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
@@ -285,7 +285,7 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
 		if (err)
 			return err;

-		err = drm_gem_fence_array_add(&submit->task->deps, fence);
+		err = drm_sched_job_await_fence(&submit->task->base, fence);
 		if (err) {
 			dma_fence_put(fence);
 			return err;
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index e968b5a8f0b0..99d5f6f1a882 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -134,24 +134,15 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	task->num_bos = num_bos;
 	task->vm = lima_vm_get(vm);

-	xa_init_flags(&task->deps, XA_FLAGS_ALLOC);
-
 	return 0;
 }

 void lima_sched_task_fini(struct lima_sched_task *task)
 {
-	struct dma_fence *fence;
-	unsigned long index;
 	int i;

 	drm_sched_job_cleanup(&task->base);

-	xa_for_each(&task->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&task->deps);
-
 	if (task->bos) {
 		for (i = 0; i < task->num_bos; i++)
 			drm_gem_object_put(&task->bos[i]->base.base);
@@ -186,17 +177,6 @@ struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task)
 	return fence;
 }

-static struct dma_fence *lima_sched_dependency(struct drm_sched_job *job,
-					       struct drm_sched_entity *entity)
-{
-	struct lima_sched_task *task = to_lima_task(job);
-
-	if (!xa_empty(&task->deps))
-		return xa_erase(&task->deps, task->last_dep++);
-
-	return NULL;
-}
-
 static int lima_pm_busy(struct lima_device *ldev)
 {
 	int ret;
@@ -472,7 +452,6 @@ static void lima_sched_free_job(struct drm_sched_job *job)
 }

 static const struct drm_sched_backend_ops lima_sched_ops = {
-	.dependency = lima_sched_dependency,
 	.run_job = lima_sched_run_job,
 	.timedout_job = lima_sched_timedout_job,
 	.free_job = lima_sched_free_job,
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index ac70006b0e26..6a11764d87b3 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -23,9 +23,6 @@ struct lima_sched_task {
 	struct lima_vm *vm;
 	void *frame;

-	struct xarray deps;
-	unsigned long last_dep;
-
 	struct lima_bo **bos;
 	int num_bos;

-- 
2.32.0
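
The driver-side pattern removed here (a per-task dependency container plus a .dependency callback that hands fences out one at a time) can be modelled roughly as follows. This is a userspace toy under stated assumptions, not the drm_sched implementation: the toy_* names, the fixed-size array and the signaled flag are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEPS 8

struct toy_fence {
	bool signaled;
};

/* Hypothetical job: the dependency list lives in the common job struct. */
struct toy_job {
	struct toy_fence *deps[TOY_MAX_DEPS];
	size_t num_deps;
	size_t next_dep;	/* cursor, playing the role of last_dep in the removed code */
};

/* Driver side: record a fence the job has to wait for. */
static int toy_job_await_fence(struct toy_job *job, struct toy_fence *fence)
{
	assert(job);
	if (!fence)
		return 0;	/* nothing to wait for */
	if (job->num_deps == TOY_MAX_DEPS)
		return -1;	/* toy stand-in for an allocation failure */
	job->deps[job->num_deps++] = fence;
	return 0;
}

/* Core side: return the next unsignaled dependency, or NULL once the job can run. */
static struct toy_fence *toy_job_next_dependency(struct toy_job *job)
{
	assert(job);
	while (job->next_dep < job->num_deps) {
		struct toy_fence *fence = job->deps[job->next_dep++];

		if (!fence->signaled)
			return fence;
	}
	return NULL;
}
```

With the list owned by the common job structure, a driver only calls the await helper at submit time and no longer needs its own callback or teardown loop.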

4 years, 7 months


[PATCH v4 06/18] drm/panfrost: use scheduler dependency tracking

by Daniel Vetter

Just deletes some code that's now more shared.

Note that thanks to the split into drm_sched_job_init/arm we can now
easily pull the _init() part from under the submission lock way ahead
where we're adding the sync file in-fences as dependencies.

v2: Correctly clean up the partially set up job, now that job_init()
and job_arm() are apart (Emma).

Reviewed-by: Steven Price <steven.price(a)arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
Cc: Steven Price <steven.price(a)arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig(a)collabora.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 16 ++++++++---
 drivers/gpu/drm/panfrost/panfrost_job.c | 37 +++----------------------
 drivers/gpu/drm/panfrost/panfrost_job.h |  5 +---
 3 files changed, 17 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9f53bea07d61 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -218,7 +218,7 @@ panfrost_copy_in_sync(struct drm_device *dev,
 		if (ret)
 			goto fail;

-		ret = drm_gem_fence_array_add(&job->deps, fence);
+		ret = drm_sched_job_await_fence(&job->base, fence);

 		if (ret)
 			goto fail;
@@ -236,7 +236,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	struct drm_panfrost_submit *args = data;
 	struct drm_syncobj *sync_out = NULL;
 	struct panfrost_job *job;
-	int ret = 0;
+	int ret = 0, slot;

 	if (!args->jc)
 		return -EINVAL;
@@ -258,14 +258,20 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,

 	kref_init(&job->refcount);

-	xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
-
 	job->pfdev = pfdev;
 	job->jc = args->jc;
 	job->requirements = args->requirements;
 	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
 	job->file_priv = file->driver_priv;

+	slot = panfrost_job_get_slot(job);
+
+	ret = drm_sched_job_init(&job->base,
+				 &job->file_priv->sched_entity[slot],
+				 NULL);
+	if (ret)
+		goto fail_job_put;
+
 	ret = panfrost_copy_in_sync(dev, file, args, job);
 	if (ret)
 		goto fail_job;
@@ -283,6 +289,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 		drm_syncobj_replace_fence(sync_out, job->render_done_fence);

 fail_job:
+	drm_sched_job_cleanup(&job->base);
+fail_job_put:
 	panfrost_job_put(job);
 fail_out_sync:
 	if (sync_out)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 4bc962763e1f..86c843d8822e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -102,7 +102,7 @@ static struct dma_fence *panfrost_fence_create(struct panfrost_device *pfdev, in
 	return &fence->base;
 }

-static int panfrost_job_get_slot(struct panfrost_job *job)
+int panfrost_job_get_slot(struct panfrost_job *job)
 {
 	/* JS0: fragment jobs.
 	 * JS1: vertex/tiler jobs
@@ -242,13 +242,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)

 static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 					  int bo_count,
-					  struct xarray *deps)
+					  struct drm_sched_job *job)
 {
 	int i, ret;

 	for (i = 0; i < bo_count; i++) {
 		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+		ret = drm_sched_job_await_implicit(job, bos[i], true);
 		if (ret)
 			return ret;
 	}
@@ -269,31 +269,21 @@ static void panfrost_attach_object_fences(struct drm_gem_object **bos,
 int panfrost_job_push(struct panfrost_job *job)
 {
 	struct panfrost_device *pfdev = job->pfdev;
-	int slot = panfrost_job_get_slot(job);
-	struct drm_sched_entity *entity = &job->file_priv->sched_entity[slot];
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;

-
 	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
 					    &acquire_ctx);
 	if (ret)
 		return ret;

 	mutex_lock(&pfdev->sched_lock);
-
-	ret = drm_sched_job_init(&job->base, entity, NULL);
-	if (ret) {
-		mutex_unlock(&pfdev->sched_lock);
-		goto unlock;
-	}
-
 	drm_sched_job_arm(&job->base);

 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);

 	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-					     &job->deps);
+					     &job->base);
 	if (ret) {
 		mutex_unlock(&pfdev->sched_lock);
 		goto unlock;
@@ -318,15 +308,8 @@ static void panfrost_job_cleanup(struct kref *ref)
 {
 	struct panfrost_job *job = container_of(ref, struct panfrost_job,
 						refcount);
-	struct dma_fence *fence;
-	unsigned long index;
 	unsigned int i;

-	xa_for_each(&job->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&job->deps);
-
 	dma_fence_put(job->done_fence);
 	dma_fence_put(job->render_done_fence);

@@ -365,17 +348,6 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
 	panfrost_job_put(job);
 }

-static struct dma_fence *panfrost_job_dependency(struct drm_sched_job *sched_job,
-						 struct drm_sched_entity *s_entity)
-{
-	struct panfrost_job *job = to_panfrost_job(sched_job);
-
-	if (!xa_empty(&job->deps))
-		return xa_erase(&job->deps, job->last_dep++);
-
-	return NULL;
-}
-
 static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job)
 {
 	struct panfrost_job *job = to_panfrost_job(sched_job);
@@ -765,7 +737,6 @@ static void panfrost_reset_work(struct work_struct *work)
 }

 static const struct drm_sched_backend_ops panfrost_sched_ops = {
-	.dependency = panfrost_job_dependency,
 	.run_job = panfrost_job_run,
 	.timedout_job = panfrost_job_timedout,
 	.free_job = panfrost_job_free
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..77e6d0e6f612 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -19,10 +19,6 @@ struct panfrost_job {
 	struct panfrost_device *pfdev;
 	struct panfrost_file_priv *file_priv;

-	/* Contains both explicit and implicit fences */
-	struct xarray deps;
-	unsigned long last_dep;
-
 	/* Fence to be signaled by IRQ handler when the job is complete. */
 	struct dma_fence *done_fence;

@@ -42,6 +38,7 @@ int panfrost_job_init(struct panfrost_device *pfdev);
 void panfrost_job_fini(struct panfrost_device *pfdev);
 int panfrost_job_open(struct panfrost_file_priv *panfrost_priv);
 void panfrost_job_close(struct panfrost_file_priv *panfrost_priv);
+int panfrost_job_get_slot(struct panfrost_job *job);
 int panfrost_job_push(struct panfrost_job *job);
 void panfrost_job_put(struct panfrost_job *job);
 void panfrost_job_enable_interrupts(struct panfrost_device *pfdev);
-- 
2.32.0
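
The init/arm split described in this commit message can be pictured with a userspace toy: a fallible init step that needs no lock, and an arm step that allocates the sequence number and therefore runs in the same critical section as the push. Everything here is an assumption made for illustration: the toy_* names, the bool standing in for the submission mutex, and the fixed-size queue.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_QUEUE_LEN 4

struct toy_sched {
	bool locked;				/* toy stand-in for the submission mutex */
	unsigned int next_seqno;
	unsigned int queue[TOY_QUEUE_LEN];	/* seqnos in push order */
	unsigned int queued;
};

struct toy_job {
	bool initialized;
	bool armed;
	unsigned int seqno;
};

/* Fallible setup; needs no lock, so it can run early in the ioctl. */
static int toy_job_init(struct toy_job *job, bool fail)
{
	if (fail)
		return -1;
	job->initialized = true;
	job->armed = false;
	return 0;
}

/* Allocates the seqno, so it has to run under the lock. */
static void toy_job_arm(struct toy_sched *sched, struct toy_job *job)
{
	assert(sched->locked && job->initialized);
	job->seqno = sched->next_seqno++;
	job->armed = true;
}

static void toy_job_push(struct toy_sched *sched, struct toy_job *job)
{
	assert(sched->locked && job->armed && sched->queued < TOY_QUEUE_LEN);
	sched->queue[sched->queued++] = job->seqno;
}

/* Arm and push in one critical section so queue order matches seqno order. */
static void toy_submit(struct toy_sched *sched, struct toy_job *job)
{
	sched->locked = true;
	toy_job_arm(sched, job);
	toy_job_push(sched, job);
	sched->locked = false;
}
```

In this model, jobs may be initialized in any order, and the order of toy_submit() calls alone decides the seqno order, which is the property the lock is there to protect.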

4 years, 7 months


[PATCH v3 20/20] dma-resv: Give the docs a do-over

by Daniel Vetter

Specifically document the new/clarified rules around how the shared
fences do not have any ordering requirements against the exclusive
fence.

But also document all the things a bit better: given how central
struct dma_resv is to dynamic buffer management, the docs have been
very inadequate.

- Lots more links to other pieces of the puzzle. Unfortunately
  ttm_buffer_object has no docs, so no links :-(

- Explain/complain a bit about dma_resv_locking_ctx(). I still don't
  like that one, but fixing the ttm call chains is going to be
  horrible. Plus we want to plug in real slowpath locking when we do
  that anyway.

- Main part of the patch is some actual docs for struct dma_resv.

Overall I think we still have a lot of bad naming in this area (e.g.
dma_resv.fence is singular, but contains the multiple shared fences),
but I think that's more indicative of how the semantics and rules are
just not great.

Another thing that's really awkward is how chaining exclusive fences
right now means direct dma_resv.fence_excl pointer access with an
rcu_assign_pointer. Not so great either.

v2:
- Fix a pile of typos (Matt, Jason)
- Hammer it in that breaking the rules leads to use-after-free issues
  around dma-buf sharing (Christian)

Reviewed-by: Christian König <christian.koenig(a)amd.com>
Cc: Jason Ekstrand <jason(a)jlekstrand.net>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/dma-buf/dma-resv.c |  24 ++++++---
 include/linux/dma-buf.h    |   7 +++
 include/linux/dma-resv.h   | 104 +++++++++++++++++++++++++++++++++++--
 3 files changed, 124 insertions(+), 11 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index f26c71747d43..a3acb6479ddf 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -48,6 +48,8 @@
  * write operations) or N shared fences (read operations). The RCU
  * mechanism is used to protect read access to fences from locked
  * write-side updates.
+ *
+ * See struct dma_resv for more details.
  */

 DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
  * @num_fences: number of fences we want to add
  *
  * Should be called before dma_resv_add_shared_fence(). Must
- * be called with obj->lock held.
+ * be called with @obj locked through dma_resv_lock().
+ *
+ * Note that the preallocated slots need to be re-reserved if @obj is unlocked
+ * at any time before calling dma_resv_add_shared_fence(). This is validated
+ * when CONFIG_DEBUG_MUTEXES is enabled.
  *
  * RETURNS
  * Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to a shared slot, obj->lock must be held, and
+ * Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
  * dma_resv_reserve_shared() has been called.
+ *
+ * See also &dma_resv.fence for a discussion of the semantics.
  */
 void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -278,9 +286,11 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
 /**
  * dma_resv_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
- * @fence: the shared fence to add
+ * @fence: the exclusive fence to add
  *
- * Add a fence to the exclusive slot. The obj->lock must be held.
+ * Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
+ * Note that this function replaces all fences attached to @obj, see also
+ * &dma_resv.fence_excl for a discussion of the semantics.
  */
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
  * fence
  *
  * Callers are not required to hold specific locks, but maybe hold
- * dma_resv_lock() already
+ * dma_resv_lock() already.
+ *
  * RETURNS
- * true if all fences signaled, else false
+ *
+ * True if all fences signaled, else false.
  */
 bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
 {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 2b814fde0d11..8cc0c55877a6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -420,6 +420,13 @@ struct dma_buf {
 	 * - Dynamic importers should set fences for any access that they can't
 	 *   disable immediately from their &dma_buf_attach_ops.move_notify
 	 *   callback.
+	 *
+	 * IMPORTANT:
+	 *
+	 * All drivers must obey the struct dma_resv rules, specifically the
+	 * rules for updating fences, see &dma_resv.fence_excl and
+	 * &dma_resv.fence. If these dependency rules are broken access tracking
+	 * can be lost resulting in use after free issues.
 	 */
 	struct dma_resv *resv;

diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index e1ca2080a1ff..9100dd3dc21f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -62,16 +62,90 @@ struct dma_resv_list {

 /**
  * struct dma_resv - a reservation object manages fences for a buffer
- * @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ *
+ * There are multiple uses for this, with sometimes slightly different rules in
+ * how the fence slots are used.
+ *
+ * One use is to synchronize cross-driver access to a struct dma_buf, either for
+ * dynamic buffer management or just to handle implicit synchronization between
+ * different users of the buffer in userspace. See &dma_buf.resv for a more
+ * in-depth discussion.
+ *
+ * The other major use is to manage access and locking within a driver in a
+ * buffer based memory manager. struct ttm_buffer_object is the canonical
+ * example here, since this is where reservation objects originated from. But
+ * use in drivers is spreading and some drivers also manage struct
+ * drm_gem_object with the same scheme.
  */
 struct dma_resv {
+	/**
+	 * @lock:
+	 *
+	 * Update side lock. Don't use directly, instead use the wrapper
+	 * functions like dma_resv_lock() and dma_resv_unlock().
+	 *
+	 * Drivers which use the reservation object to manage memory dynamically
+	 * also use this lock to protect buffer object state like placement,
+	 * allocation policies or throughout command submission.
+	 */
 	struct ww_mutex lock;
+
+	/**
+	 * @seq:
+	 *
+	 * Sequence count for managing RCU read-side synchronization, allows
+	 * read-only access to @fence_excl and @fence while ensuring we take a
+	 * consistent snapshot.
+	 */
 	seqcount_ww_mutex_t seq;

+	/**
+	 * @fence_excl:
+	 *
+	 * The exclusive fence, if there is one currently.
+	 *
+	 * There are two ways to update this fence:
+	 *
+	 * - First by calling dma_resv_add_excl_fence(), which replaces all
+	 *   fences attached to the reservation object. To guarantee that no
+	 *   fences are lost, this new fence must signal only after all previous
+	 *   fences, both shared and exclusive, have signalled. In some cases it
+	 *   is convenient to achieve that by attaching a struct dma_fence_array
+	 *   with all the new and old fences.
+	 *
+	 * - Alternatively the fence can be set directly, which leaves the
+	 *   shared fences unchanged. To guarantee that no fences are lost, this
+	 *   new fence must signal only after the previous exclusive fence has
+	 *   signalled. Since the shared fences are staying intact, it is not
+	 *   necessary to maintain any ordering against those. If semantically
+	 *   only a new access is added without actually treating the previous
+	 *   one as a dependency the exclusive fences can be strung together
+	 *   using struct dma_fence_chain.
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_fence __rcu *fence_excl;
+
+	/**
+	 * @fence:
+	 *
+	 * List of current shared fences.
+	 *
+	 * There are no ordering constraints of shared fences against the
+	 * exclusive fence slot. If a waiter needs to wait for all access, it
+	 * has to wait for both sets of fences to signal.
+	 *
+	 * A new fence is added by calling dma_resv_add_shared_fence(). Since
+	 * this often needs to be done past the point of no return in command
+	 * submission it cannot fail, and therefore sufficient slots need to be
+	 * reserved by calling dma_resv_reserve_shared().
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_resv_list __rcu *fence;
 };

@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
+ *
+ * See also dma_resv_lock_interruptible() for the interruptible variant.
  */
 static inline int dma_resv_lock(struct dma_resv *obj,
 				struct ww_acquire_ctx *ctx)
@@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
+ * @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
  */
 static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
 					      struct ww_acquire_ctx *ctx)
@@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
  * Acquires the reservation object after a die case. This function
  * will sleep until the lock becomes available. See dma_resv_lock() as
  * well.
+ *
+ * See also dma_resv_lock_slow_interruptible() for the interruptible variant.
  */
 static inline void dma_resv_lock_slow(struct dma_resv *obj,
 				      struct ww_acquire_ctx *ctx)
@@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
  * if they overlap with a writer.
  *
  * Also note that since no context is provided, no deadlock protection is
- * possible.
+ * possible, which is also not needed for a trylock.
  *
  * Returns true if the lock was acquired, false otherwise.
  */
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
  *
  * Returns the context used to lock a reservation object or NULL if no context
  * was used or the object is not locked at all.
+ *
+ * WARNING: This interface is pretty horrible, but TTM needs it because it
+ * doesn't pass the struct ww_acquire_ctx around in some very long callchains.
+ * Everyone else just uses it to check whether they're holding a reservation or
+ * not.
  */
 static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
 {
-- 
2.32.0
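
Three of the rules documented in this patch lend themselves to a small model: shared slots are reserved in a fallible step so that adding a shared fence cannot fail, adding an exclusive fence replaces every fence on the object, and a waiter interested in all access checks both sets. The sketch below is a single-threaded userspace toy with invented toy_* names and a fixed capacity; it has no locking, RCU or reference counting and is not the dma_resv implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHARED_CAP 4

struct toy_fence {
	bool signaled;
};

/* Hypothetical stand-in for struct dma_resv (no locking, no RCU). */
struct toy_resv {
	struct toy_fence *excl;
	struct toy_fence *shared[TOY_SHARED_CAP];
	size_t shared_count;
	size_t shared_max;	/* slots guaranteed by the last reserve call */
};

/* Fallible step: make sure num_fences more shared slots are available. */
static int toy_resv_reserve_shared(struct toy_resv *resv, size_t num_fences)
{
	if (resv->shared_count + num_fences > TOY_SHARED_CAP)
		return -1;
	resv->shared_max = resv->shared_count + num_fences;
	return 0;
}

/* Infallible step: relies on a prior successful reserve. */
static void toy_resv_add_shared_fence(struct toy_resv *resv, struct toy_fence *fence)
{
	assert(resv->shared_count < resv->shared_max);
	resv->shared[resv->shared_count++] = fence;
}

/* Replaces every fence on the object, shared ones included. */
static void toy_resv_add_excl_fence(struct toy_resv *resv, struct toy_fence *fence)
{
	resv->excl = fence;
	resv->shared_count = 0;
}

/* A waiter that cares about all access has to check both sets. */
static bool toy_resv_all_signaled(const struct toy_resv *resv)
{
	size_t i;

	if (resv->excl && !resv->excl->signaled)
		return false;
	for (i = 0; i < resv->shared_count; i++)
		if (!resv->shared[i]->signaled)
			return false;
	return true;
}
```

Because toy_resv_add_excl_fence() drops the shared fences, the caller in this model is responsible for making the new fence signal only after the dropped ones, which is the ordering requirement the kerneldoc spells out.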

4 years, 7 months


[PATCH v3 12/20] drm/gem: Delete gem array fencing helpers

by Daniel Vetter

Integrated into the scheduler now and all users converted over.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/drm_gem.c | 96 ---------------------------------------
 include/drm/drm_gem.h     |  5 --
 2 files changed, 101 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 68deb1de8235..24d49a2636e0 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1294,99 +1294,3 @@ drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 	ww_acquire_fini(acquire_ctx);
 }
 EXPORT_SYMBOL(drm_gem_unlock_reservations);
-
-/**
- * drm_gem_fence_array_add - Adds the fence to an array of fences to be
- * waited on, deduplicating fences from the same context.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @fence: the dma_fence to add to the list of dependencies.
- *
- * This functions consumes the reference for @fence both on success and error
- * cases.
- *
- * Returns:
- * 0 on success, or an error on failing to expand the array.
- */
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence)
-{
-	struct dma_fence *entry;
-	unsigned long index;
-	u32 id = 0;
-	int ret;
-
-	if (!fence)
-		return 0;
-
-	/* Deduplicate if we already depend on a fence from the same context.
-	 * This lets the size of the array of deps scale with the number of
-	 * engines involved, rather than the number of BOs.
-	 */
-	xa_for_each(fence_array, index, entry) {
-		if (entry->context != fence->context)
-			continue;
-
-		if (dma_fence_is_later(fence, entry)) {
-			dma_fence_put(entry);
-			xa_store(fence_array, index, fence, GFP_KERNEL);
-		} else {
-			dma_fence_put(fence);
-		}
-		return 0;
-	}
-
-	ret = xa_alloc(fence_array, &id, fence, xa_limit_32b, GFP_KERNEL);
-	if (ret != 0)
-		dma_fence_put(fence);
-
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add);
-
-/**
- * drm_gem_fence_array_add_implicit - Adds the implicit dependencies tracked
- * in the GEM object's reservation object to an array of dma_fences for use in
- * scheduling a rendering job.
- *
- * This should be called after drm_gem_lock_reservations() on your array of
- * GEM objects used in the job but before updating the reservations with your
- * own fences.
- *
- * @fence_array: array of dma_fence * for the job to block on.
- * @obj: the gem object to add new dependencies from.
- * @write: whether the job might write the object (so we need to depend on
- * shared fences in the reservation object).
- */
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write)
-{
-	int ret;
-	struct dma_fence **fences;
-	unsigned int i, fence_count;
-
-	if (!write) {
-		struct dma_fence *fence =
-			dma_resv_get_excl_unlocked(obj->resv);
-
-		return drm_gem_fence_array_add(fence_array, fence);
-	}
-
-	ret = dma_resv_get_fences(obj->resv, NULL,
-						&fence_count, &fences);
-	if (ret || !fence_count)
-		return ret;
-
-	for (i = 0; i < fence_count; i++) {
-		ret = drm_gem_fence_array_add(fence_array, fences[i]);
-		if (ret)
-			break;
-	}
-
-	for (; i < fence_count; i++)
-		dma_fence_put(fences[i]);
-	kfree(fences);
-	return ret;
-}
-EXPORT_SYMBOL(drm_gem_fence_array_add_implicit);
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 240049566592..6d5e33b89074 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -409,11 +409,6 @@ int drm_gem_lock_reservations(struct drm_gem_object **objs, int count,
 			      struct ww_acquire_ctx *acquire_ctx);
 void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
 				 struct ww_acquire_ctx *acquire_ctx);
-int drm_gem_fence_array_add(struct xarray *fence_array,
-			    struct dma_fence *fence);
-int drm_gem_fence_array_add_implicit(struct xarray *fence_array,
-				     struct drm_gem_object *obj,
-				     bool write);
 int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
 			    u32 handle, u64 *offset);

-- 
2.32.0
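
The deduplication rule in the removed drm_gem_fence_array_add() (one entry per fence context, keeping whichever fence is later) can be sketched in userspace like this. The toy_* types are invented for the example, fences are passed by value so there is no reference counting, and the ordering check is a plain comparison that ignores the seqno wraparound a real implementation has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEPS 8

/* Hypothetical fence: a timeline id plus a position on that timeline. */
struct toy_fence {
	uint64_t context;
	uint64_t seqno;
};

struct toy_deps {
	struct toy_fence fences[TOY_MAX_DEPS];
	size_t count;
};

/* Simplified ordering check, only meaningful within one context. */
static bool toy_fence_is_later(const struct toy_fence *a, const struct toy_fence *b)
{
	assert(a->context == b->context);
	return a->seqno > b->seqno;
}

/* Add a dependency, keeping at most one fence per context. */
static int toy_deps_add(struct toy_deps *deps, struct toy_fence fence)
{
	size_t i;

	for (i = 0; i < deps->count; i++) {
		if (deps->fences[i].context != fence.context)
			continue;
		/* Same timeline: keep only the later fence. */
		if (toy_fence_is_later(&fence, &deps->fences[i]))
			deps->fences[i] = fence;
		return 0;
	}

	if (deps->count == TOY_MAX_DEPS)
		return -1;	/* toy stand-in for failing to grow the array */
	deps->fences[deps->count++] = fence;
	return 0;
}
```

As the removed comment notes, this keeps the number of dependencies proportional to the number of engines involved rather than the number of buffer objects.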

4 years, 7 months


[PATCH v3 11/20] drm/etnaviv: Use scheduler dependency handling

by Daniel Vetter

We need to pull the drm_sched_job_init much earlier, but that's very
minor surgery.

v2: Actually fix up cleanup paths by calling drm_sched_job_init, which
I wanted to do in the previous round (and did, for all other drivers).
Spotted by Lucas.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: Russell King <linux+etnaviv(a)armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: etnaviv(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 58 +++++++++---------
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 63 +-------------------
 drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
 4 files changed, 35 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
index 98e60df882b6..63688e6e4580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
@@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
 	u64 va;
 	struct etnaviv_gem_object *obj;
 	struct etnaviv_vram_mapping *mapping;
-	struct dma_fence *excl;
-	unsigned int nr_shared;
-	struct dma_fence **shared;
 };

 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
@@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
 	struct etnaviv_file_private *ctx;
 	struct etnaviv_gpu *gpu;
 	struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
-	struct dma_fence *out_fence, *in_fence;
+	struct dma_fence *out_fence;
 	int out_fence_id;
 	struct list_head node; /* GPU active submit list */
 	struct etnaviv_cmdbuf cmdbuf;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..5b97ce1299ad 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
 			continue;

-		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
-			ret = dma_resv_get_fences(robj, &bo->excl,
-						  &bo->nr_shared,
-						  &bo->shared);
-			if (ret)
-				return ret;
-		} else {
-			bo->excl = dma_resv_get_excl_unlocked(robj);
-		}
-
+		ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
+						   bo->flags & ETNA_SUBMIT_BO_WRITE);
+		if (ret)
+			return ret;
 	}

 	return ret;
@@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)

 	wake_up_all(&submit->gpu->fence_event);

-	if (submit->in_fence)
-		dma_fence_put(submit->in_fence);
 	if (submit->out_fence) {
 		/* first remove from IDR, so fence can not be found anymore */
 		mutex_lock(&submit->gpu->fence_lock);
@@ -529,7 +521,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	ret = etnaviv_cmdbuf_init(priv->cmdbuf_suballoc, &submit->cmdbuf,
 				  ALIGN(args->stream_size, 8) + 8);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_put;

 	submit->ctx = file->driver_priv;
 	etnaviv_iommu_context_get(submit->ctx->mmu);
@@ -537,51 +529,61 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	submit->exec_state = args->exec_state;
 	submit->flags = args->flags;

+	ret = drm_sched_job_init(&submit->sched_job,
+				 &ctx->sched_entity[args->pipe],
+				 submit->ctx);
+	if (ret)
+		goto err_submit_put;
+
 	ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	if ((priv->mmu_global->version != ETNAVIV_IOMMU_V2) &&
 	    !etnaviv_cmd_validate_one(gpu, stream, args->stream_size / 4,
 				      relocs, args->nr_relocs)) {
 		ret = -EINVAL;
-		goto err_submit_objects;
+		goto err_submit_job;
 	}

 	if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
-		submit->in_fence = sync_file_get_fence(args->fence_fd);
-		if (!submit->in_fence) {
+		struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
+		if (!in_fence) {
 			ret = -EINVAL;
-			goto err_submit_objects;
+			goto err_submit_job;
 		}
+
+		ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
+		if (ret)
+			goto err_submit_job;
 	}

 	ret = submit_pin_objects(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_reloc(submit, stream, args->stream_size / 4,
 			   relocs, args->nr_relocs);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_perfmon_validate(submit, args->exec_state, pmrs);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	memcpy(submit->cmdbuf.vaddr, stream, args->stream_size);

 	ret = submit_lock_objects(submit, &ticket);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	ret = submit_fence_sync(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

-	ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
+	ret = etnaviv_sched_push_job(submit);
 	if (ret)
-		goto err_submit_objects;
+		goto err_submit_job;

 	submit_attach_object_fences(submit);

@@ -595,7 +597,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 		sync_file = sync_file_create(submit->out_fence);
 		if (!sync_file) {
 			ret = -ENOMEM;
-			goto err_submit_objects;
+			goto err_submit_job;
 		}
 		fd_install(out_fence_fd, sync_file->file);
 	}
@@ -603,7 +605,9 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	args->fence_fd = out_fence_fd;
 	args->fence = submit->out_fence_id;

-err_submit_objects:
+err_submit_job:
+	drm_sched_job_cleanup(&submit->sched_job);
+err_submit_put:
 	etnaviv_submit_put(submit);

 err_submit_ww_acquire:
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 180bb633d5c5..2bbbd6ccc95e 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
 static int etnaviv_hw_jobs_limit = 4;
 module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);

-static struct dma_fence *
-etnaviv_sched_dependency(struct drm_sched_job *sched_job,
-			 struct drm_sched_entity *entity)
-{
-	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
-	struct dma_fence *fence;
-	int i;
-
-	if (unlikely(submit->in_fence)) {
-		fence = submit->in_fence;
-		submit->in_fence = NULL;
-
-		if (!dma_fence_is_signaled(fence))
-			return fence;
-
-		dma_fence_put(fence);
-	}
-
-	for (i = 0; i < submit->nr_bos; i++) {
-		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
-		int j;
-
-		if (bo->excl) {
-			fence = bo->excl;
-			bo->excl = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-
-		for (j = 0; j < bo->nr_shared; j++) {
-			if (!bo->shared[j])
-				continue;
-
-			fence = bo->shared[j];
-			bo->shared[j] = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-		kfree(bo->shared);
-		bo->nr_shared = 0;
-		bo->shared = NULL;
-	}
-
-	return NULL;
-}
-
 static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
@@ -140,29 +88,22 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
 }

 static const struct drm_sched_backend_ops etnaviv_sched_ops = {
-	.dependency = etnaviv_sched_dependency,
 	.run_job = etnaviv_sched_run_job,
 	.timedout_job = etnaviv_sched_timedout_job,
 	.free_job = etnaviv_sched_free_job,
 };

-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit)
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
 {
 	int ret = 0;

 	/*
 	 * Hold the fence lock across the whole operation to avoid jobs being
 	 * pushed out of order with regard to their sched fence seqnos as
-	 * allocated in drm_sched_job_init.
+	 * allocated in drm_sched_job_arm.
 	 */
 	mutex_lock(&submit->gpu->fence_lock);

-	ret = drm_sched_job_init(&submit->sched_job, sched_entity,
-				 submit->ctx);
-	if (ret)
-		goto out_unlock;
-
 	drm_sched_job_arm(&submit->sched_job);

 	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
index c0a6796e22c9..baebfa069afc 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
@@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)

 int etnaviv_sched_init(struct etnaviv_gpu *gpu);
 void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit);
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);

 #endif /* __ETNAVIV_SCHED_H__ */
--
2.32.0
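
The two error labels in the submit ioctl encode an ordering rule: a failure before the job is initialised must skip the job cleanup, and a failure after it must run the cleanup before dropping the submit. The following is a minimal userspace sketch of that rule only. Every `model_*` name is hypothetical and stands in for the kernel functions; the success path is deliberately not modelled and simply returns early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a submit object; not the etnaviv structure. */
struct model_submit {
	bool job_initialized;	/* set by model_job_init() */
	int cleanup_calls;	/* how often model_job_cleanup() ran */
	int put_calls;		/* how often model_submit_put() ran */
};

/* Stand-in for the fallible job initialisation step. */
int model_job_init(struct model_submit *s, bool fail)
{
	if (fail)
		return -1;
	s->job_initialized = true;
	return 0;
}

/* Stand-in for the job cleanup; only legal after a successful init. */
void model_job_cleanup(struct model_submit *s)
{
	assert(s->job_initialized);
	s->job_initialized = false;
	s->cleanup_calls++;
}

/* Stand-in for dropping the submit reference. */
void model_submit_put(struct model_submit *s)
{
	s->put_calls++;
}

/*
 * fail_init:  the job initialisation itself fails.
 * fail_later: some step after initialisation fails.
 */
int model_ioctl_submit(struct model_submit *s, bool fail_init, bool fail_later)
{
	int ret;

	ret = model_job_init(s, fail_init);
	if (ret)
		goto err_submit_put;	/* no job state to undo yet */

	if (fail_later) {
		ret = -2;
		goto err_submit_job;	/* job state exists, undo it first */
	}

	return 0;	/* simplification: success path not modelled */

err_submit_job:
	model_job_cleanup(s);
err_submit_put:
	model_submit_put(s);
	return ret;
}
```

With this shape, jumping to the wrong label shows up immediately: using `err_submit_job` for the init failure would trip the assertion in `model_job_cleanup()`.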

4 years, 7 months


[PATCH v3 08/20] drm/lima: use scheduler dependency tracking

by Daniel Vetter

Nothing special going on here.

Aside: reviewing the code, it seems like drm_sched_job_arm() should be
moved into lima_sched_context_queue_task and put under some mutex
together with drm_sched_push_job(). See the kerneldoc for
drm_sched_push_job().

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Qiang Yu <yuq825(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: lima(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/lima/lima_gem.c   |  4 ++--
 drivers/gpu/drm/lima/lima_sched.c | 21 ---------------------
 drivers/gpu/drm/lima/lima_sched.h |  3 ---
 3 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index c528f40981bb..e54a88d5037a 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -267,7 +267,7 @@ static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo,
 	if (explicit)
 		return 0;

-	return drm_gem_fence_array_add_implicit(&task->deps, &bo->base.base, write);
+	return drm_sched_job_await_implicit(&task->base, &bo->base.base, write);
 }

 static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
@@ -285,7 +285,7 @@ static int lima_gem_add_deps(struct drm_file *file, struct lima_submit *submit)
 		if (err)
 			return err;

-		err = drm_gem_fence_array_add(&submit->task->deps, fence);
+		err = drm_sched_job_await_fence(&submit->task->base, fence);
 		if (err) {
 			dma_fence_put(fence);
 			return err;
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index e968b5a8f0b0..99d5f6f1a882 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -134,24 +134,15 @@ int lima_sched_task_init(struct lima_sched_task *task,
 	task->num_bos = num_bos;
 	task->vm = lima_vm_get(vm);

-	xa_init_flags(&task->deps, XA_FLAGS_ALLOC);
-
 	return 0;
 }

 void lima_sched_task_fini(struct lima_sched_task *task)
 {
-	struct dma_fence *fence;
-	unsigned long index;
 	int i;

 	drm_sched_job_cleanup(&task->base);

-	xa_for_each(&task->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&task->deps);
-
 	if (task->bos) {
 		for (i = 0; i < task->num_bos; i++)
 			drm_gem_object_put(&task->bos[i]->base.base);
@@ -186,17 +177,6 @@ struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task)
 	return fence;
 }

-static struct dma_fence *lima_sched_dependency(struct drm_sched_job *job,
-					       struct drm_sched_entity *entity)
-{
-	struct lima_sched_task *task = to_lima_task(job);
-
-	if (!xa_empty(&task->deps))
-		return xa_erase(&task->deps, task->last_dep++);
-
-	return NULL;
-}
-
 static int lima_pm_busy(struct lima_device *ldev)
 {
 	int ret;
@@ -472,7 +452,6 @@ static void lima_sched_free_job(struct drm_sched_job *job)
 }

 static const struct drm_sched_backend_ops lima_sched_ops = {
-	.dependency = lima_sched_dependency,
 	.run_job = lima_sched_run_job,
 	.timedout_job = lima_sched_timedout_job,
 	.free_job = lima_sched_free_job,
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index ac70006b0e26..6a11764d87b3 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -23,9 +23,6 @@ struct lima_sched_task {
 	struct lima_vm *vm;
 	void *frame;

-	struct xarray deps;
-	unsigned long last_dep;
-
 	struct lima_bo **bos;
 	int num_bos;

--
2.32.0

4 years, 7 months


[PATCH v3 07/20] drm/panfrost: use scheduler dependency tracking

by Daniel Vetter

Just deletes some code that's now more shared.

Note that thanks to the split into drm_sched_job_init/arm we can now
easily pull the _init() part from under the submission lock way ahead
where we're adding the sync file in-fences as dependencies.

v2: Correctly clean up the partially set up job, now that job_init()
and job_arm() are apart (Emma).

Reviewed-by: Steven Price <steven.price(a)arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
Cc: Steven Price <steven.price(a)arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig(a)collabora.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/panfrost/panfrost_drv.c | 16 ++++++++---
 drivers/gpu/drm/panfrost/panfrost_job.c | 37 +++----------------------
 drivers/gpu/drm/panfrost/panfrost_job.h |  5 +---
 3 files changed, 17 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 1ffaef5ec5ff..9f53bea07d61 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -218,7 +218,7 @@ panfrost_copy_in_sync(struct drm_device *dev,
 		if (ret)
 			goto fail;

-		ret = drm_gem_fence_array_add(&job->deps, fence);
+		ret = drm_sched_job_await_fence(&job->base, fence);

 		if (ret)
 			goto fail;
@@ -236,7 +236,7 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 	struct drm_panfrost_submit *args = data;
 	struct drm_syncobj *sync_out = NULL;
 	struct panfrost_job *job;
-	int ret = 0;
+	int ret = 0, slot;

 	if (!args->jc)
 		return -EINVAL;
@@ -258,14 +258,20 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,

 	kref_init(&job->refcount);

-	xa_init_flags(&job->deps, XA_FLAGS_ALLOC);
-
 	job->pfdev = pfdev;
 	job->jc = args->jc;
 	job->requirements = args->requirements;
 	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
 	job->file_priv = file->driver_priv;

+	slot = panfrost_job_get_slot(job);
+
+	ret = drm_sched_job_init(&job->base,
+				 &job->file_priv->sched_entity[slot],
+				 NULL);
+	if (ret)
+		goto fail_job_put;
+
 	ret = panfrost_copy_in_sync(dev, file, args, job);
 	if (ret)
 		goto fail_job;
@@ -283,6 +289,8 @@ static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
 		drm_syncobj_replace_fence(sync_out, job->render_done_fence);

 fail_job:
+	drm_sched_job_cleanup(&job->base);
+fail_job_put:
 	panfrost_job_put(job);
 fail_out_sync:
 	if (sync_out)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 4bc962763e1f..86c843d8822e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -102,7 +102,7 @@ static struct dma_fence *panfrost_fence_create(struct panfrost_device *pfdev, in
 	return &fence->base;
 }

-static int panfrost_job_get_slot(struct panfrost_job *job)
+int panfrost_job_get_slot(struct panfrost_job *job)
 {
 	/* JS0: fragment jobs.
 	 * JS1: vertex/tiler jobs
@@ -242,13 +242,13 @@ static void panfrost_job_hw_submit(struct panfrost_job *job, int js)

 static int panfrost_acquire_object_fences(struct drm_gem_object **bos,
 					  int bo_count,
-					  struct xarray *deps)
+					  struct drm_sched_job *job)
 {
 	int i, ret;

 	for (i = 0; i < bo_count; i++) {
 		/* panfrost always uses write mode in its current uapi */
-		ret = drm_gem_fence_array_add_implicit(deps, bos[i], true);
+		ret = drm_sched_job_await_implicit(job, bos[i], true);
 		if (ret)
 			return ret;
 	}
@@ -269,31 +269,21 @@ static void panfrost_attach_object_fences(struct drm_gem_object **bos,
 int panfrost_job_push(struct panfrost_job *job)
 {
 	struct panfrost_device *pfdev = job->pfdev;
-	int slot = panfrost_job_get_slot(job);
-	struct drm_sched_entity *entity = &job->file_priv->sched_entity[slot];
 	struct ww_acquire_ctx acquire_ctx;
 	int ret = 0;

-
 	ret = drm_gem_lock_reservations(job->bos, job->bo_count,
 					    &acquire_ctx);
 	if (ret)
 		return ret;

 	mutex_lock(&pfdev->sched_lock);
-
-	ret = drm_sched_job_init(&job->base, entity, NULL);
-	if (ret) {
-		mutex_unlock(&pfdev->sched_lock);
-		goto unlock;
-	}
-
 	drm_sched_job_arm(&job->base);

 	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);

 	ret = panfrost_acquire_object_fences(job->bos, job->bo_count,
-					     &job->deps);
+					     &job->base);
 	if (ret) {
 		mutex_unlock(&pfdev->sched_lock);
 		goto unlock;
@@ -318,15 +308,8 @@ static void panfrost_job_cleanup(struct kref *ref)
 {
 	struct panfrost_job *job = container_of(ref, struct panfrost_job,
 						refcount);
-	struct dma_fence *fence;
-	unsigned long index;
 	unsigned int i;

-	xa_for_each(&job->deps, index, fence) {
-		dma_fence_put(fence);
-	}
-	xa_destroy(&job->deps);
-
 	dma_fence_put(job->done_fence);
 	dma_fence_put(job->render_done_fence);

@@ -365,17 +348,6 @@ static void panfrost_job_free(struct drm_sched_job *sched_job)
 	panfrost_job_put(job);
 }

-static struct dma_fence *panfrost_job_dependency(struct drm_sched_job *sched_job,
-						 struct drm_sched_entity *s_entity)
-{
-	struct panfrost_job *job = to_panfrost_job(sched_job);
-
-	if (!xa_empty(&job->deps))
-		return xa_erase(&job->deps, job->last_dep++);
-
-	return NULL;
-}
-
 static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job)
 {
 	struct panfrost_job *job = to_panfrost_job(sched_job);
@@ -765,7 +737,6 @@ static void panfrost_reset_work(struct work_struct *work)
 }

 static const struct drm_sched_backend_ops panfrost_sched_ops = {
-	.dependency = panfrost_job_dependency,
 	.run_job = panfrost_job_run,
 	.timedout_job = panfrost_job_timedout,
 	.free_job = panfrost_job_free
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
index 82306a03b57e..77e6d0e6f612 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.h
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -19,10 +19,6 @@ struct panfrost_job {
 	struct panfrost_device *pfdev;
 	struct panfrost_file_priv *file_priv;

-	/* Contains both explicit and implicit fences */
-	struct xarray deps;
-	unsigned long last_dep;
-
 	/* Fence to be signaled by IRQ handler when the job is complete. */
 	struct dma_fence *done_fence;

@@ -42,6 +38,7 @@ int panfrost_job_init(struct panfrost_device *pfdev);
 void panfrost_job_fini(struct panfrost_device *pfdev);
 int panfrost_job_open(struct panfrost_file_priv *panfrost_priv);
 void panfrost_job_close(struct panfrost_file_priv *panfrost_priv);
+int panfrost_job_get_slot(struct panfrost_job *job);
 int panfrost_job_push(struct panfrost_job *job);
 void panfrost_job_put(struct panfrost_job *job);
 void panfrost_job_enable_interrupts(struct panfrost_device *pfdev);
--
2.32.0
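
The commit message above relies on the init/arm split: the fallible init step can run early and outside the submission lock, while the arm step that hands out the sequence number stays under the lock next to the push. The sketch below models only that ordering in plain userspace C. The `model_*` names, the boolean lock flag and the counter are assumptions made for illustration, not the scheduler's real data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler model: a lock flag and a sequence counter. */
struct model_sched {
	bool locked;
	unsigned int next_seqno;
};

struct model_job {
	bool initialized;
	bool armed;
	unsigned int seqno;
};

/* Fallible setup; touches only the job, so it needs no lock. */
int model_job_init(struct model_job *job, bool fail)
{
	if (fail)
		return -1;
	job->initialized = true;
	job->armed = false;
	return 0;
}

/* Cannot fail; must run under the lock so seqnos follow push order. */
void model_job_arm(struct model_sched *sched, struct model_job *job)
{
	assert(sched->locked && job->initialized);
	job->seqno = sched->next_seqno++;
	job->armed = true;
}

int model_submit(struct model_sched *sched, struct model_job *job,
		 bool fail_init)
{
	int ret = model_job_init(job, fail_init);

	if (ret)
		return ret;	/* lock never taken, counter untouched */

	/* Explicit in-fence dependencies would be added here, unlocked. */

	sched->locked = true;
	model_job_arm(sched, job);
	/* The push to the scheduler queue would happen here. */
	sched->locked = false;
	return 0;
}
```

In this model a failed init never consumes a sequence number, which is the property that makes it safe to do the fallible work before taking the lock.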

4 years, 7 months


[PATCH v3 04/20] drm/sched: Add dependency tracking

by Daniel Vetter

Instead of just a callback we can just glue in the gem helpers that
panfrost, v3d and lima currently use. There's really not that many
ways to skin this cat.

On the naming bikeshed: The idea for using _await_ to denote adding
dependencies to a job comes from i915, where that's used quite
extensively all over the place, in lots of data structures.

v2/3: Rebased.

Reviewed-by: Steven Price <steven.price(a)arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Cc: Lee Jones <lee.jones(a)linaro.org>
Cc: Nirmoy Das <nirmoy.aiemd(a)gmail.com>
Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Jack Zhang <Jack.Zhang1(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
 drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
 include/drm/gpu_scheduler.h              |  31 ++++++-
 3 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 4e1124ed80e0..c7e6d29c9a33 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 	job->sched->ops->free_job(job);
 }

+static struct dma_fence *
+drm_sched_job_dependency(struct drm_sched_job *job,
+			 struct drm_sched_entity *entity)
+{
+	if (!xa_empty(&job->dependencies))
+		return xa_erase(&job->dependencies, job->last_dependency++);
+
+	if (job->sched->ops->dependency)
+		return job->sched->ops->dependency(job, entity);
+
+	return NULL;
+}
+
 /**
  * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
  *
@@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 		struct drm_sched_fence *s_fence = job->s_fence;

 		/* Wait for all dependencies to avoid data corruptions */
-		while ((f = job->sched->ops->dependency(job, entity)))
+		while ((f = drm_sched_job_dependency(job, entity)))
 			dma_fence_wait(f, false);

 		drm_sched_fence_scheduled(s_fence);
@@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
  */
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
-	struct drm_gpu_scheduler *sched = entity->rq->sched;
 	struct drm_sched_job *sched_job;

 	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 		return NULL;

 	while ((entity->dependency =
-			sched->ops->dependency(sched_job, entity))) {
+			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);

 		if (drm_sched_entity_add_dependency_cb(entity))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7e94754eb34c..ad62f1d2991c 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -594,6 +594,8 @@ int drm_sched_job_init(struct drm_sched_job *job,

 	INIT_LIST_HEAD(&job->list);

+	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_sched_job_init);
@@ -631,6 +633,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 }
 EXPORT_SYMBOL(drm_sched_job_arm);

+/**
+ * drm_sched_job_await_fence - adds the fence as a job dependency
+ * @job: scheduler job to add the dependencies to
+ * @fence: the dma_fence to add to the list of dependencies.
+ *
+ * Note that @fence is consumed in both the success and error cases.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence)
+{
+	struct dma_fence *entry;
+	unsigned long index;
+	u32 id = 0;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	/* Deduplicate if we already depend on a fence from the same context.
+	 * This lets the size of the array of deps scale with the number of
+	 * engines involved, rather than the number of BOs.
+	 */
+	xa_for_each(&job->dependencies, index, entry) {
+		if (entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
+		} else {
+			dma_fence_put(fence);
+		}
+		return 0;
+	}
+
+	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
+	if (ret != 0)
+		dma_fence_put(fence);
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_fence);
+
+/**
+ * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
+ * @job: scheduler job to add the dependencies to
+ * @obj: the gem object to add new dependencies from.
+ * @write: whether the job might write the object (so we need to depend on
+ * shared fences in the reservation object).
+ *
+ * This should be called after drm_gem_lock_reservations() on your array of
+ * GEM objects used in the job but before updating the reservations with your
+ * own fences.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write)
+{
+	int ret;
+	struct dma_fence **fences;
+	unsigned int i, fence_count;
+
+	if (!write) {
+		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
+
+		return drm_sched_job_await_fence(job, fence);
+	}
+
+	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
+	if (ret || !fence_count)
+		return ret;
+
+	for (i = 0; i < fence_count; i++) {
+		ret = drm_sched_job_await_fence(job, fences[i]);
+		if (ret)
+			break;
+	}
+
+	for (; i < fence_count; i++)
+		dma_fence_put(fences[i]);
+	kfree(fences);
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_implicit);
+
+
 /**
  * drm_sched_job_cleanup - clean up scheduler job resources
  * @job: scheduler job to clean up
@@ -646,6 +740,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
  */
 void drm_sched_job_cleanup(struct drm_sched_job *job)
 {
+	struct dma_fence *fence;
+	unsigned long index;
+
 	if (kref_read(&job->s_fence->finished.refcount)) {
 		/* drm_sched_job_arm() has been called */
 		dma_fence_put(&job->s_fence->finished);
@@ -655,6 +752,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 	}

 	job->s_fence = NULL;
+
+	xa_for_each(&job->dependencies, index, fence) {
+		dma_fence_put(fence);
+	}
+	xa_destroy(&job->dependencies);
+
 }
 EXPORT_SYMBOL(drm_sched_job_cleanup);

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 83afc3aa8e2f..74fb321dbc44 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -27,9 +27,12 @@
 #include <drm/spsc_queue.h>
 #include <linux/dma-fence.h>
 #include <linux/completion.h>
+#include <linux/xarray.h>

 #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)

+struct drm_gem_object;
+
 struct drm_gpu_scheduler;
 struct drm_sched_rq;

@@ -198,6 +201,16 @@ struct drm_sched_job {
 	enum drm_sched_priority s_priority;
 	struct drm_sched_entity *entity;
 	struct dma_fence_cb cb;
+	/**
+	 * @dependencies:
+	 *
+	 * Contains the dependencies as struct dma_fence for this job, see
+	 * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
+	 */
+	struct xarray dependencies;
+
+	/** @last_dependency: tracks @dependencies as they signal */
+	unsigned long last_dependency;
 };

 static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
@@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
  */
 struct drm_sched_backend_ops {
 	/**
-	 * @dependency: Called when the scheduler is considering scheduling
-	 * this job next, to get another struct dma_fence for this job to
-	 * block on. Once it returns NULL, run_job() may be called.
+	 * @dependency:
+	 *
+	 * Called when the scheduler is considering scheduling this job next, to
+	 * get another struct dma_fence for this job to block on. Once it
+	 * returns NULL, run_job() may be called.
+	 *
+	 * If a driver exclusively uses drm_sched_job_await_fence() and
+	 * drm_sched_job_await_implicit() this can be omitted and left as NULL.
 	 */
 	struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
 					struct drm_sched_entity *s_entity);
@@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence);
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write);
+
+
 void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
 				    unsigned int num_sched_list);
--
2.32.0
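
The behaviour of the new helpers is easier to see in isolation: one dependency is kept per fence context, a later fence on the same context replaces an earlier one, and the scheduler then drains the stored dependencies one at a time. The following is a rough userspace model of just that logic. A fixed array stands in for the xarray, a plain `>` comparison stands in for dma_fence_is_later(), and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEPS 8

/* Hypothetical fence: a timeline (context) plus a position on it. */
struct model_fence {
	unsigned long context;
	unsigned long seqno;
	int refcount;
};

struct model_job {
	struct model_fence *deps[MODEL_MAX_DEPS];
	size_t count;	/* slots in use */
	size_t next;	/* next slot to hand out, like last_dependency */
};

void model_fence_put(struct model_fence *f)
{
	f->refcount--;
}

/* Consumes the caller's reference on @fence in every case. */
int model_job_await_fence(struct model_job *job, struct model_fence *fence)
{
	size_t i;

	if (!fence)
		return 0;

	/* One slot per context: keep only the later fence. */
	for (i = 0; i < job->count; i++) {
		struct model_fence *entry = job->deps[i];

		if (entry->context != fence->context)
			continue;

		if (fence->seqno > entry->seqno) {
			model_fence_put(entry);
			job->deps[i] = fence;
		} else {
			model_fence_put(fence);
		}
		return 0;
	}

	/* New context: take a fresh slot, or fail and drop the fence. */
	if (job->count == MODEL_MAX_DEPS) {
		model_fence_put(fence);
		return -1;
	}
	job->deps[job->count++] = fence;
	return 0;
}

/* Hands out one dependency per call, then NULL once drained. */
struct model_fence *model_job_dependency(struct model_job *job)
{
	if (job->next < job->count)
		return job->deps[job->next++];
	return NULL;
}
```

Adding three fences from two contexts leaves two entries in this model, so the dependency count follows the number of timelines rather than the number of buffers, which is the point the comment in the patch makes.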

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

by Daniel Vetter

On Mon, Jul 05, 2021 at 04:03:12PM +0300, Oded Gabbay wrote:
> Hi,
> I'm sending v4 of this patch-set following the long email thread.
> I want to thank Jason for reviewing v3 and pointing out the errors, saving
> us time later to debug it :)
>
> I consulted with Christian on how to fix patch 2 (the implementation) and
> at the end of the day I shamelessly copied the relevant content from
> amdgpu_vram_mgr_alloc_sgt() and amdgpu_dma_buf_attach(), regarding the
> usage of dma_map_resource() and pci_p2pdma_distance_many(), respectively.
>
> I also made a few improvements after looking at the relevant code in amdgpu.
> The details are in the changelog of patch 2.
>
> I took the time to write an import code into the driver, allowing me to
> check real P2P with two Gaudi devices, one as exporter and the other as
> importer. I'm not going to include the import code in the product, it was
> just for testing purposes (although I can share it if anyone wants).
>
> I run it on a bare-metal environment with IOMMU enabled, on a sky-lake CPU
> with a white-listed PCIe bridge (to make the pci_p2pdma_distance_many happy).
>
> Greg, I hope this will be good enough for you to merge this code.

So we're officially going to use dri-devel for technical details review
and then Greg for merging so we don't have to deal with other merge
criteria dri-devel folks have?

I don't expect anything less by now, but it does make the original claim
that drivers/misc will not step all over accelerators folks a complete
farce under the totally-not-a-gpu banner.

This essentially means that for any other accelerator stack that doesn't
fit the dri-devel merge criteria, even if it's acting like a gpu and uses
other gpu driver stuff, you can just send it to Greg and it's good to go.

There's quite a lot of these floating around actually (and many do have
semi-open runtimes, like habanalabs have now too, just not open enough to
be actually useful). It's going to be absolutely lovely having to explain
to these companies in background chats why habanalabs gets away with their
stack and they don't.

Or maybe we should just merge them all and give up on the idea of having
open cross-vendor driver stacks for these accelerators.

Thanks, Daniel

>
> Thanks,
> Oded
>
> Oded Gabbay (1):
>   habanalabs: define uAPI to export FD for DMA-BUF
>
> Tomer Tayar (1):
>   habanalabs: add support for dma-buf exporter
>
>  drivers/misc/habanalabs/Kconfig             |   1 +
>  drivers/misc/habanalabs/common/habanalabs.h |  26 ++
>  drivers/misc/habanalabs/common/memory.c     | 480 +++++++++++++++++++-
>  drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
>  drivers/misc/habanalabs/goya/goya.c         |   1 +
>  include/uapi/misc/habanalabs.h              |  28 +-
>  6 files changed, 532 insertions(+), 5 deletions(-)
>
> --
> 2.25.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

4 years, 7 months


[PATCH v2 08/11] drm/etnaviv: Use scheduler dependency handling

by Daniel Vetter

We need to pull the drm_sched_job_init much earlier, but that's very
minor surgery.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Lucas Stach <l.stach(a)pengutronix.de>
Cc: Russell King <linux+etnaviv(a)armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner(a)gmail.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: etnaviv(a)lists.freedesktop.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.h        |  5 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 32 +++++-----
 drivers/gpu/drm/etnaviv/etnaviv_sched.c      | 61 +-------------------
 drivers/gpu/drm/etnaviv/etnaviv_sched.h      |  3 +-
 4 files changed, 20 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
index 98e60df882b6..63688e6e4580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h
@@ -80,9 +80,6 @@ struct etnaviv_gem_submit_bo {
 	u64 va;
 	struct etnaviv_gem_object *obj;
 	struct etnaviv_vram_mapping *mapping;
-	struct dma_fence *excl;
-	unsigned int nr_shared;
-	struct dma_fence **shared;
 };
 
 /* Created per submit-ioctl, to track bo's and cmdstream bufs, etc,
@@ -95,7 +92,7 @@ struct etnaviv_gem_submit {
 	struct etnaviv_file_private *ctx;
 	struct etnaviv_gpu *gpu;
 	struct etnaviv_iommu_context *mmu_context, *prev_mmu_context;
-	struct dma_fence *out_fence, *in_fence;
+	struct dma_fence *out_fence;
 	int out_fence_id;
 	struct list_head node; /* GPU active submit list */
 	struct etnaviv_cmdbuf cmdbuf;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 4dd7d9d541c0..92478a50a580 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -188,16 +188,10 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
 			continue;
 
-		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
-			ret = dma_resv_get_fences(robj, &bo->excl,
-						  &bo->nr_shared,
-						  &bo->shared);
-			if (ret)
-				return ret;
-		} else {
-			bo->excl = dma_resv_get_excl_unlocked(robj);
-		}
-
+		ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
+						   bo->flags & ETNA_SUBMIT_BO_WRITE);
+		if (ret)
+			return ret;
 	}
 
 	return ret;
@@ -403,8 +397,6 @@ static void submit_cleanup(struct kref *kref)
 
 	wake_up_all(&submit->gpu->fence_event);
 
-	if (submit->in_fence)
-		dma_fence_put(submit->in_fence);
 	if (submit->out_fence) {
 		/* first remove from IDR, so fence can not be found anymore */
 		mutex_lock(&submit->gpu->fence_lock);
@@ -537,6 +529,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	submit->exec_state = args->exec_state;
 	submit->flags = args->flags;
 
+	ret = drm_sched_job_init(&submit->sched_job,
+				 &ctx->sched_entity[args->pipe],
+				 submit->ctx);
+	if (ret)
+		goto err_submit_objects;
+
 	ret = submit_lookup_objects(submit, file, bos, args->nr_bos);
 	if (ret)
 		goto err_submit_objects;
@@ -549,11 +547,15 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	}
 
 	if (args->flags & ETNA_SUBMIT_FENCE_FD_IN) {
-		submit->in_fence = sync_file_get_fence(args->fence_fd);
-		if (!submit->in_fence) {
+		struct dma_fence *in_fence = sync_file_get_fence(args->fence_fd);
+		if (!in_fence) {
 			ret = -EINVAL;
 			goto err_submit_objects;
 		}
+
+		ret = drm_sched_job_await_fence(&submit->sched_job, in_fence);
+		if (ret)
+			goto err_submit_objects;
 	}
 
 	ret = submit_pin_objects(submit);
@@ -579,7 +581,7 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void *data,
 	if (ret)
 		goto err_submit_objects;
 
-	ret = etnaviv_sched_push_job(&ctx->sched_entity[args->pipe], submit);
+	ret = etnaviv_sched_push_job(submit);
 	if (ret)
 		goto err_submit_objects;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index 180bb633d5c5..c98d67320be3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -17,58 +17,6 @@ module_param_named(job_hang_limit, etnaviv_job_hang_limit, int , 0444);
 static int etnaviv_hw_jobs_limit = 4;
 module_param_named(hw_job_limit, etnaviv_hw_jobs_limit, int , 0444);
 
-static struct dma_fence *
-etnaviv_sched_dependency(struct drm_sched_job *sched_job,
-			 struct drm_sched_entity *entity)
-{
-	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
-	struct dma_fence *fence;
-	int i;
-
-	if (unlikely(submit->in_fence)) {
-		fence = submit->in_fence;
-		submit->in_fence = NULL;
-
-		if (!dma_fence_is_signaled(fence))
-			return fence;
-
-		dma_fence_put(fence);
-	}
-
-	for (i = 0; i < submit->nr_bos; i++) {
-		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
-		int j;
-
-		if (bo->excl) {
-			fence = bo->excl;
-			bo->excl = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-
-		for (j = 0; j < bo->nr_shared; j++) {
-			if (!bo->shared[j])
-				continue;
-
-			fence = bo->shared[j];
-			bo->shared[j] = NULL;
-
-			if (!dma_fence_is_signaled(fence))
-				return fence;
-
-			dma_fence_put(fence);
-		}
-		kfree(bo->shared);
-		bo->nr_shared = 0;
-		bo->shared = NULL;
-	}
-
-	return NULL;
-}
-
 static struct dma_fence *etnaviv_sched_run_job(struct drm_sched_job *sched_job)
 {
 	struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job);
@@ -140,14 +88,12 @@ static void etnaviv_sched_free_job(struct drm_sched_job *sched_job)
 }
 
 static const struct drm_sched_backend_ops etnaviv_sched_ops = {
-	.dependency = etnaviv_sched_dependency,
 	.run_job = etnaviv_sched_run_job,
 	.timedout_job = etnaviv_sched_timedout_job,
 	.free_job = etnaviv_sched_free_job,
 };
 
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit)
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit)
 {
 	int ret = 0;
 
@@ -158,11 +104,6 @@ int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
 	 */
 	mutex_lock(&submit->gpu->fence_lock);
 
-	ret = drm_sched_job_init(&submit->sched_job, sched_entity,
-				 submit->ctx);
-	if (ret)
-		goto out_unlock;
-
 	drm_sched_job_arm(&submit->sched_job);
 
 	submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.h b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
index c0a6796e22c9..baebfa069afc 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.h
@@ -18,7 +18,6 @@ struct etnaviv_gem_submit *to_etnaviv_submit(struct drm_sched_job *sched_job)
 int etnaviv_sched_init(struct etnaviv_gpu *gpu);
 void etnaviv_sched_fini(struct etnaviv_gpu *gpu);
 
-int etnaviv_sched_push_job(struct drm_sched_entity *sched_entity,
-			   struct etnaviv_gem_submit *submit);
+int etnaviv_sched_push_job(struct etnaviv_gem_submit *submit);
 
 #endif /* __ETNAVIV_SCHED_H__ */
-- 
2.32.0.rc2

4 years, 7 months


[PATCH v2 02/11] drm/sched: Add dependency tracking

by Daniel Vetter

Instead of just a callback we can just glue in the gem helpers that
panfrost, v3d and lima currently use. There's really not that many ways
to skin this cat.

On the naming bikeshed: The idea for using _await_ to denote adding
dependencies to a job comes from i915, where that's used quite
extensively all over the place, in lots of datastructures.

v2: Rebased.

Reviewed-by: Steven Price <steven.price(a)arm.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Cc: Lee Jones <lee.jones(a)linaro.org>
Cc: Nirmoy Das <nirmoy.aiemd(a)gmail.com>
Cc: Boris Brezillon <boris.brezillon(a)collabora.com>
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Jack Zhang <Jack.Zhang1(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/gpu/drm/scheduler/sched_entity.c |  18 +++-
 drivers/gpu/drm/scheduler/sched_main.c   | 103 +++++++++++++++++++++++
 include/drm/gpu_scheduler.h              |  31 ++++++-
 3 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index f7347c284886..b6f72fafd504 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -211,6 +211,19 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 	job->sched->ops->free_job(job);
 }
 
+static struct dma_fence *
+drm_sched_job_dependency(struct drm_sched_job *job,
+			 struct drm_sched_entity *entity)
+{
+	if (!xa_empty(&job->dependencies))
+		return xa_erase(&job->dependencies, job->last_dependency++);
+
+	if (job->sched->ops->dependency)
+		return job->sched->ops->dependency(job, entity);
+
+	return NULL;
+}
+
 /**
  * drm_sched_entity_kill_jobs - Make sure all remaining jobs are killed
  *
@@ -229,7 +242,7 @@ static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 		struct drm_sched_fence *s_fence = job->s_fence;
 
 		/* Wait for all dependencies to avoid data corruptions */
-		while ((f = job->sched->ops->dependency(job, entity)))
+		while ((f = drm_sched_job_dependency(job, entity)))
 			dma_fence_wait(f, false);
 
 		drm_sched_fence_scheduled(s_fence);
@@ -419,7 +432,6 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
  */
 struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 {
-	struct drm_gpu_scheduler *sched = entity->rq->sched;
 	struct drm_sched_job *sched_job;
 
 	sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
@@ -427,7 +439,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 		return NULL;
 
 	while ((entity->dependency =
-			sched->ops->dependency(sched_job, entity))) {
+			drm_sched_job_dependency(sched_job, entity))) {
 		trace_drm_sched_job_wait_dep(sched_job, entity->dependency);
 
 		if (drm_sched_entity_add_dependency_cb(entity))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 5e84e1500c32..12d533486518 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -605,6 +605,8 @@ int drm_sched_job_init(struct drm_sched_job *job,
 
 	INIT_LIST_HEAD(&job->list);
 
+	xa_init_flags(&job->dependencies, XA_FLAGS_ALLOC);
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_sched_job_init);
@@ -628,6 +630,98 @@ void drm_sched_job_arm(struct drm_sched_job *job)
 }
 EXPORT_SYMBOL(drm_sched_job_arm);
 
+/**
+ * drm_sched_job_await_fence - adds the fence as a job dependency
+ * @job: scheduler job to add the dependencies to
+ * @fence: the dma_fence to add to the list of dependencies.
+ *
+ * Note that @fence is consumed in both the success and error cases.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence)
+{
+	struct dma_fence *entry;
+	unsigned long index;
+	u32 id = 0;
+	int ret;
+
+	if (!fence)
+		return 0;
+
+	/* Deduplicate if we already depend on a fence from the same context.
+	 * This lets the size of the array of deps scale with the number of
+	 * engines involved, rather than the number of BOs.
+	 */
+	xa_for_each(&job->dependencies, index, entry) {
+		if (entry->context != fence->context)
+			continue;
+
+		if (dma_fence_is_later(fence, entry)) {
+			dma_fence_put(entry);
+			xa_store(&job->dependencies, index, fence, GFP_KERNEL);
+		} else {
+			dma_fence_put(fence);
+		}
+		return 0;
+	}
+
+	ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
+	if (ret != 0)
+		dma_fence_put(fence);
+
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_fence);
+
+/**
+ * drm_sched_job_await_implicit - adds implicit dependencies as job dependencies
+ * @job: scheduler job to add the dependencies to
+ * @obj: the gem object to add new dependencies from.
+ * @write: whether the job might write the object (so we need to depend on
+ * shared fences in the reservation object).
+ *
+ * This should be called after drm_gem_lock_reservations() on your array of
+ * GEM objects used in the job but before updating the reservations with your
+ * own fences.
+ *
+ * Returns:
+ * 0 on success, or an error on failing to expand the array.
+ */
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write)
+{
+	int ret;
+	struct dma_fence **fences;
+	unsigned int i, fence_count;
+
+	if (!write) {
+		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
+
+		return drm_sched_job_await_fence(job, fence);
+	}
+
+	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
+	if (ret || !fence_count)
+		return ret;
+
+	for (i = 0; i < fence_count; i++) {
+		ret = drm_sched_job_await_fence(job, fences[i]);
+		if (ret)
+			break;
+	}
+
+	for (; i < fence_count; i++)
+		dma_fence_put(fences[i]);
+	kfree(fences);
+	return ret;
+}
+EXPORT_SYMBOL(drm_sched_job_await_implicit);
+
+
 /**
  * drm_sched_job_cleanup - clean up scheduler job resources
  * @job: scheduler job to clean up
@@ -643,6 +737,9 @@ EXPORT_SYMBOL(drm_sched_job_arm);
  */
 void drm_sched_job_cleanup(struct drm_sched_job *job)
 {
+	struct dma_fence *fence;
+	unsigned long index;
+
 	if (!kref_read(&job->s_fence->finished.refcount)) {
 		/* drm_sched_job_arm() has been called */
 		dma_fence_put(&job->s_fence->finished);
@@ -652,6 +749,12 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
 	}
 
 	job->s_fence = NULL;
+
+	xa_for_each(&job->dependencies, index, fence) {
+		dma_fence_put(fence);
+	}
+	xa_destroy(&job->dependencies);
+
 }
 EXPORT_SYMBOL(drm_sched_job_cleanup);
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 83afc3aa8e2f..74fb321dbc44 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -27,9 +27,12 @@
 #include <drm/spsc_queue.h>
 #include <linux/dma-fence.h>
 #include <linux/completion.h>
+#include <linux/xarray.h>
 
 #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
 
+struct drm_gem_object;
+
 struct drm_gpu_scheduler;
 struct drm_sched_rq;
 
@@ -198,6 +201,16 @@ struct drm_sched_job {
 	enum drm_sched_priority s_priority;
 	struct drm_sched_entity *entity;
 	struct dma_fence_cb cb;
+	/**
+	 * @dependencies:
+	 *
+	 * Contains the dependencies as struct dma_fence for this job, see
+	 * drm_sched_job_await_fence() and drm_sched_job_await_implicit().
+	 */
+	struct xarray dependencies;
+
+	/** @last_dependency: tracks @dependencies as they signal */
+	unsigned long last_dependency;
 };
 
 static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
@@ -220,9 +233,14 @@ enum drm_gpu_sched_stat {
  */
 struct drm_sched_backend_ops {
 	/**
-	 * @dependency: Called when the scheduler is considering scheduling
-	 * this job next, to get another struct dma_fence for this job to
-	 * block on. Once it returns NULL, run_job() may be called.
+	 * @dependency:
+	 *
+	 * Called when the scheduler is considering scheduling this job next, to
+	 * get another struct dma_fence for this job to block on. Once it
+	 * returns NULL, run_job() may be called.
+	 *
+	 * If a driver exclusively uses drm_sched_job_await_fence() and
+	 * drm_sched_job_await_implicit() this can be omitted and left as NULL.
 	 */
 	struct dma_fence *(*dependency)(struct drm_sched_job *sched_job,
 					struct drm_sched_entity *s_entity);
@@ -349,6 +367,13 @@ int drm_sched_job_init(struct drm_sched_job *job,
 		       struct drm_sched_entity *entity,
 		       void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
+int drm_sched_job_await_fence(struct drm_sched_job *job,
+			      struct dma_fence *fence);
+int drm_sched_job_await_implicit(struct drm_sched_job *job,
+				 struct drm_gem_object *obj,
+				 bool write);
+
+
 void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
 				    unsigned int num_sched_list);
-- 
2.32.0.rc2
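
The deduplication rule in drm_sched_job_await_fence() can be tried out away from the kernel. What follows is a minimal user-space sketch, not the scheduler code: struct model_fence, struct model_job, MODEL_MAX_DEPS and model_job_await_fence() are hypothetical names, a fixed array stands in for the xarray, reference counting is left out, and a plain seqno comparison stands in for dma_fence_is_later() (so seqno wraparound is ignored).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dma_fence: a timeline id plus a position. */
struct model_fence {
	uint64_t context;	/* which timeline (engine) the fence belongs to */
	uint64_t seqno;		/* position on that timeline */
};

#define MODEL_MAX_DEPS 8

/* Hypothetical stand-in for the per-job dependency container. */
struct model_job {
	struct model_fence deps[MODEL_MAX_DEPS];
	size_t count;
};

/*
 * Add a dependency, keeping at most one fence per context.
 * Returns 0 on success, -1 if the fixed-size table is full.
 */
static int model_job_await_fence(struct model_job *job, struct model_fence fence)
{
	size_t i;

	for (i = 0; i < job->count; i++) {
		if (job->deps[i].context != fence.context)
			continue;
		/* Same timeline: only the later fence needs to be waited on. */
		if (fence.seqno > job->deps[i].seqno)
			job->deps[i] = fence;
		return 0;
	}

	if (job->count == MODEL_MAX_DEPS)
		return -1;

	job->deps[job->count++] = fence;
	assert(job->count <= MODEL_MAX_DEPS);
	return 0;
}
```

With this rule the table grows with the number of distinct contexts a job depends on, which is the scaling property the comment in the patch describes (engines involved rather than BOs).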

4 years, 7 months


[PATCH 7/7] dma-resv: Give the docs a do-over

by Daniel Vetter

Specifically document the new/clarified rules around how the shared
fences do not have any ordering requirements against the exclusive
fence.

But also document all the things a bit better, given how central
struct dma_resv is to dynamic buffer management the docs have been very
inadequate.

- Lots more links to other pieces of the puzzle. Unfortunately
  ttm_buffer_object has no docs, so no links :-(

- Explain/complain a bit about dma_resv_locking_ctx(). I still don't
  like that one, but fixing the ttm call chains is going to be
  horrible. Plus we want to plug in real slowpath locking when we do
  that anyway.

- Main part of the patch is some actual docs for struct dma_resv.

Overall I think we still have a lot of bad naming in this area (e.g.
dma_resv.fence is singular, but contains the multiple shared fences),
but I think that's more indicative of how the semantics and rules are
just not great.

Another thing that's real awkward is how chaining exclusive fences
right now means direct dma_resv.exclusive_fence pointer access with an
rcu_assign_pointer. Not so great either.

Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
 drivers/dma-buf/dma-resv.c |  22 ++++++--
 include/linux/dma-resv.h   | 104 +++++++++++++++++++++++++++++++++++--
 2 files changed, 116 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index f26c71747d43..898f8d894bbd 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -48,6 +48,8 @@
  * write operations) or N shared fences (read operations). The RCU
  * mechanism is used to protect read access to fences from locked
  * write-side updates.
+ *
+ * See struct dma_resv for more details.
  */
 
 DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
  * @num_fences: number of fences we want to add
  *
  * Should be called before dma_resv_add_shared_fence(). Must
- * be called with obj->lock held.
+ * be called with @obj locked through dma_resv_lock().
+ *
+ * Note that the preallocated slots need to be re-reserved if @obj is unlocked
+ * at any time before calling dma_resv_add_shared_fence(). This is validated when
+ * CONFIG_DEBUG_MUTEXES is enabled.
  *
  * RETURNS
  * Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to a shared slot, obj->lock must be held, and
+ * Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
  * dma_resv_reserve_shared() has been called.
+ *
+ * See also &dma_resv.fence for a discussion of the semantics.
  */
 void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
  * @obj: the reservation object
  * @fence: the shared fence to add
  *
- * Add a fence to the exclusive slot. The obj->lock must be held.
+ * Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
+ * Note that this function replaces all fences attached to @obj, see also
+ * &dma_resv.fence_excl for a discussion of the semantics.
  */
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 {
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
  * fence
  *
  * Callers are not required to hold specific locks, but maybe hold
- * dma_resv_lock() already
+ * dma_resv_lock() already.
+ *
  * RETURNS
- * true if all fences signaled, else false
+ *
+ * True if all fences signaled, else false.
  */
 bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all)
 {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index e1ca2080a1ff..c77fd54d033f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -62,16 +62,90 @@ struct dma_resv_list {
 
 /**
  * struct dma_resv - a reservation object manages fences for a buffer
- * @lock: update side lock
- * @seq: sequence count for managing RCU read-side synchronization
- * @fence_excl: the exclusive fence, if there is one currently
- * @fence: list of current shared fences
+ *
+ * There are multiple uses for this, with sometimes slightly different rules in
+ * how the fence slots are used.
+ *
+ * One use is to synchronize cross-driver access to a struct dma_buf, either for
+ * dynamic buffer management or just to handle implicit synchronization between
+ * different users of the buffer in userspace. See &dma_buf.resv for a more
+ * in-depth discussion.
+ *
+ * The other major use is to manage access and locking within a driver in a
+ * buffer based memory manager. struct ttm_buffer_object is the canonical
+ * example here, since this is where reservation objects originated from. But use
+ * in drivers is spreading and some drivers also manage struct
+ * drm_gem_object with the same scheme.
  */
 struct dma_resv {
+	/**
+	 * @lock:
+	 *
+	 * Update side lock. Don't use directly, instead use the wrapper
+	 * functions like dma_resv_lock() and dma_resv_unlock().
+	 *
+	 * Drivers which use the reservation object to manage memory dynamically
+	 * also use this lock to protect buffer object state like placement,
+	 * allocation policies or throughout command submission.
+	 */
 	struct ww_mutex lock;
+
+	/**
+	 * @seq:
+	 *
+	 * Sequence count for managing RCU read-side synchronization, allows
+	 * read-only access to @fence_excl and @fence while ensuring we take a
+	 * consistent snapshot.
+	 */
 	seqcount_ww_mutex_t seq;
 
+	/**
+	 * @fence_excl:
+	 *
+	 * The exclusive fence, if there is one currently.
+	 *
+	 * There are two ways to update this fence:
+	 *
+	 * - First by calling dma_resv_add_excl_fence(), which replaces all
+	 *   fences attached to the reservation object. To guarantee that no
+	 *   fences are lost this new fence must signal only after all previous
+	 *   fences, both shared and exclusive, have signalled. In some cases it
+	 *   is convenient to achieve that by attaching a struct dma_fence_array
+	 *   with all the new and old fences.
+	 *
+	 * - Alternatively the fence can be set directly, which leaves the
+	 *   shared fences unchanged. To guarantee that no fences are lost this
+	 *   new fence must signal only after the previous exclusive fence has
+	 *   signalled. Since the shared fences are staying intact, it is not
+	 *   necessary to maintain any ordering against those. If semantically
+	 *   only a new access is added without actually treating the previous
+	 *   one as a dependency the exclusive fences can be strung together
+	 *   using struct dma_fence_chain.
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_fence __rcu *fence_excl;
+
+	/**
+	 * @fence:
+	 *
+	 * List of current shared fences.
+	 *
+	 * There are no ordering constraints of shared fences against the
+	 * exclusive fence slot. If a waiter needs to wait for all access, it
+	 * has to wait for both sets of fences to signal.
+	 *
+	 * A new fence is added by calling dma_resv_add_shared_fence(). Since
+	 * this often needs to be done past the point of no return in command
+	 * submission it cannot fail, and therefore sufficient slots need to be
+	 * reserved by calling dma_resv_reserve_shared().
+	 *
+	 * Note that actual semantics of what an exclusive or shared fence mean
+	 * is defined by the user, for reservation objects shared across drivers
+	 * see &dma_buf.resv.
+	 */
 	struct dma_resv_list __rcu *fence;
 };
 
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
+ *
+ * See also dma_resv_lock_interruptible() for the interruptible variant.
  */
 static inline int dma_resv_lock(struct dma_resv *obj,
 				struct ww_acquire_ctx *ctx)
@@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
  * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
  * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
  * object may be locked by itself by passing NULL as @ctx.
+ *
+ * When a die situation is indicated by returning -EDEADLK all locks held by
+ * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
+ * @obj.
+ *
+ * Unlocked by calling dma_resv_unlock().
  */
 static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
 					      struct ww_acquire_ctx *ctx)
@@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
  * Acquires the reservation object after a die case. This function
  * will sleep until the lock becomes available. See dma_resv_lock() as
  * well.
+ *
+ * See also dma_resv_lock_slow_interruptible() for the interruptible variant.
  */
 static inline void dma_resv_lock_slow(struct dma_resv *obj,
 				      struct ww_acquire_ctx *ctx)
@@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
  * if they overlap with a writer.
  *
  * Also note that since no context is provided, no deadlock protection is
- * possible.
+ * possible, which is also not needed for a trylock.
  *
  * Returns true if the lock was acquired, false otherwise.
  */
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
  *
  * Returns the context used to lock a reservation object or NULL if no context
  * was used or the object is not locked at all.
+ *
+ * WARNING: This interface is pretty horrible, but TTM needs it because it
+ * doesn't pass the struct ww_acquire_ctx around in some very long callchains.
+ * Everyone else just uses it to check whether they're holding a reservation or
+ * not.
  */
 static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj)
 {
-- 
2.32.0
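
The slot rules spelled out in the new kernel-doc (adding an exclusive fence replaces everything, and a waiter that cares about all access has to look at both sets) can be illustrated with a toy model. This is only a sketch under loud assumptions: struct slot_fence, struct model_resv and the model_* helpers are hypothetical user-space names, there is no locking, RCU or reference counting, and none of it is the dma-resv implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_SHARED 4

/* Hypothetical stand-in for a fence: just a signalled flag. */
struct slot_fence {
	bool signaled;
};

/* Hypothetical stand-in for struct dma_resv, without locking or RCU. */
struct model_resv {
	struct slot_fence *excl;
	struct slot_fence *shared[MODEL_MAX_SHARED];
	size_t shared_count;
};

/* Adding a shared fence cannot fail, so a free slot must already exist. */
static void model_add_shared(struct model_resv *r, struct slot_fence *f)
{
	assert(r->shared_count < MODEL_MAX_SHARED);
	r->shared[r->shared_count++] = f;
}

/* Adding an exclusive fence replaces every fence attached to the object. */
static void model_add_excl(struct model_resv *r, struct slot_fence *f)
{
	r->shared_count = 0;
	r->excl = f;
}

/* A waiter for all access has to check both the exclusive and shared sets. */
static bool model_all_signaled(const struct model_resv *r)
{
	size_t i;

	if (r->excl && !r->excl->signaled)
		return false;
	for (i = 0; i < r->shared_count; i++)
		if (!r->shared[i]->signaled)
			return false;
	return true;
}
```

Because model_add_excl() drops the shared list, the caller carries the burden the documentation describes: the new exclusive fence must not signal before the fences it replaced.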

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

by Daniel Vetter

On Tue, Jul 6, 2021 at 12:03 PM Oded Gabbay <oded.gabbay(a)gmail.com> wrote:
>
> On Tue, Jul 6, 2021 at 11:40 AM Daniel Vetter <daniel(a)ffwll.ch> wrote:
> >
> > On Mon, Jul 05, 2021 at 04:03:12PM +0300, Oded Gabbay wrote:
> > > Hi,
> > > I'm sending v4 of this patch-set following the long email thread.
> > > I want to thank Jason for reviewing v3 and pointing out the errors, saving
> > > us time later to debug it :)
> > >
> > > I consulted with Christian on how to fix patch 2 (the implementation) and
> > > at the end of the day I shamelessly copied the relevant content from
> > > amdgpu_vram_mgr_alloc_sgt() and amdgpu_dma_buf_attach(), regarding the
> > > usage of dma_map_resource() and pci_p2pdma_distance_many(), respectively.
> > >
> > > I also made a few improvements after looking at the relevant code in amdgpu.
> > > The details are in the changelog of patch 2.
> > >
> > > I took the time to write an import code into the driver, allowing me to
> > > check real P2P with two Gaudi devices, one as exporter and the other as
> > > importer. I'm not going to include the import code in the product, it was
> > > just for testing purposes (although I can share it if anyone wants).
> > >
> > > I run it on a bare-metal environment with IOMMU enabled, on a sky-lake CPU
> > > with a white-listed PCIe bridge (to make the pci_p2pdma_distance_many happy).
> > >
> > > Greg, I hope this will be good enough for you to merge this code.
> >
> > So we're officially going to use dri-devel for technical details review
> > and then Greg for merging so we don't have to deal with other merge
> > criteria dri-devel folks have?
> I'm glad to receive any help or review, regardless of the subsystem
> the person giving that help belongs to.
>
> >
> > I don't expect anything less by now, but it does make the original claim
> > that drivers/misc will not step all over accelerators folks a complete
> > farce under the totally-not-a-gpu banner.
> >
> > This essentially means that for any other accelerator stack that doesn't
> > fit the dri-devel merge criteria, even if it's acting like a gpu and uses
> > other gpu driver stuff, you can just send it to Greg and it's good to go.
>
> What's wrong with Greg ??? ;)
>
> On a more serious note, yes, I do think the dri-devel merge criteria
> is very extreme, and effectively drives-out many AI accelerator
> companies that want to contribute to the kernel but can't/won't open
> their software IP and patents.
>
> I think the expectation from AI startups (who are 90% of the deep
> learning field) to cooperate outside of company boundaries is not
> realistic, especially on the user-side, where the real IP of the
> company resides.
>
> Personally I don't think there is a real justification for that at
> this point of time, but if it will make you (and other people here)
> happy I really don't mind creating a non-gpu accelerator subsystem
> that will contain all the totally-not-a-gpu accelerators, and will
> have a more relaxed criteria for upstreaming. Something along an
> "rdma-core" style library looks like the correct amount of user-level
> open source that should be enough.
>
> The question is, what will happen later ? Will it be sufficient to
> "allow" us to use dmabuf and maybe other gpu stuff in the future (e.g.
> hmm) ?
>
> If the community and dri-devel maintainers (and you among them) will
> assure me it is good enough, then I'll happily contribute my work and
> personal time to organize this effort and implement it.

I think the dri-devel stance is pretty clear and well known: We want the
userspace to be open, because that's where most of the driver stack is.
Without an open driver stack there's no way to ever have anything
cross-vendor.

And that includes the compiler and anything else you need to drive the
hardware.

Afaik linux cpu arch ports are also not accepted if there's no open gcc
or llvm port around, because without that the overall stack just becomes
useless.

If that means AI companies don't want to open up their hw specs enough
to allow that, so be it - all you get in that case is offloading the
kernel side of the stack for convenience, with zero long term prospects
to ever make this into a cross vendor subsystem stack that does
something useful. If the business case says you can't open up your hw
enough for that, I really don't see the point in merging such a driver,
it'll be an unmaintainable stack by anyone else who's not having access
to those NDA covered specs and patents and everything.

If the stack is actually cross vendor to begin with that's just bonus,
but generally that doesn't happen voluntarily and needs a few years to
decades to get there. So that's not really something we require.

tldr; just a runtime isn't enough for dri-devel.

Now Greg seems to be happy to merge kernel drivers that aren't useful
with the open bits provided, so *shrug*.

Cheers, Daniel

PS: If requiring an actually useful open driver stack is somehow
*extreme* I have no idea why we even bother with merging device drivers
to upstream. Just make a stable driver api and done, vendors can then do
whatever they feel like and protect their "valuable IP and patents" or
whatever it is.

> Thanks,
> oded
>
> >
> > There's quite a lot of these floating around actually (and many do have
> > semi-open runtimes, like habanalabs have now too, just not open enough to
> > be actually useful). It's going to be absolutely lovely having to explain
> > to these companies in background chats why habanalabs gets away with their
> > stack and they don't.
> >
> > Or maybe we should just merge them all and give up on the idea of having
> > open cross-vendor driver stacks for these accelerators.
> >
> > Thanks, Daniel
> >
> > >
> > > Thanks,
> > > Oded
> > >
> > > Oded Gabbay (1):
> > >   habanalabs: define uAPI to export FD for DMA-BUF
> > >
> > > Tomer Tayar (1):
> > >   habanalabs: add support for dma-buf exporter
> > >
> > >  drivers/misc/habanalabs/Kconfig             |   1 +
> > >  drivers/misc/habanalabs/common/habanalabs.h |  26 ++
> > >  drivers/misc/habanalabs/common/memory.c     | 480 +++++++++++++++++++-
> > >  drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
> > >  drivers/misc/habanalabs/goya/goya.c         |   1 +
> > >  include/uapi/misc/habanalabs.h              |  28 +-
> > >  6 files changed, 532 insertions(+), 5 deletions(-)
> > >
> > > --
> > > 2.25.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v4 2/2] habanalabs: add support for dma-buf exporter

by Jason Gunthorpe

On Tue, Jul 06, 2021 at 12:44:49PM +0300, Oded Gabbay wrote:
> > > +	/* In case we got a large memory area to export, we need to divide it
> > > +	 * to smaller areas because each entry in the dmabuf sgt can only
> > > +	 * describe unsigned int.
> > > +	 */
> >
> > Huh? This is forming a SGL, it should follow the SGL rules which means
> > you have to fragment based on the dma_get_max_seg_size() of the
> > importer device.
> >
> hmm
> I don't see anyone in drm checking this value (and using it) when
> creating the SGL when exporting dmabuf. (e.g.
> amdgpu_vram_mgr_alloc_sgt)

For dmabuf the only importer is RDMA and it doesn't care, but you
certainly should not introduce a hardwired constant instead of using
the correct function.

Jason
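
The arithmetic behind "fragment based on the dma_get_max_seg_size() of the importer device" is small enough to sketch. The helpers below are hypothetical user-space illustrations, not habanalabs or scatterlist code: `max_seg` is assumed to be whatever limit the importer reports, passed in as a parameter instead of a hardwired constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of entries needed to describe `total` bytes when no single entry
 * may exceed `max_seg` bytes. `max_seg` stands in for the importer's limit.
 */
static size_t model_nr_segments(uint64_t total, uint64_t max_seg)
{
	assert(max_seg > 0);
	/* Round up without risking overflow in total + max_seg - 1. */
	return (size_t)(total / max_seg + (total % max_seg != 0));
}

/* Length of entry `idx` (0-based) under the same split. */
static uint64_t model_segment_len(uint64_t total, uint64_t max_seg, size_t idx)
{
	uint64_t offset = (uint64_t)idx * max_seg;

	assert(offset < total);
	return (total - offset < max_seg) ? total - offset : max_seg;
}
```

Every entry except possibly the last has exactly `max_seg` bytes, so the entry count follows the importer's limit and changes automatically when a different importer attaches.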

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v4 0/2] Add p2p via dmabuf to habanalabs

by Daniel Vetter

On Tue, Jul 6, 2021 at 2:46 PM Oded Gabbay <oded.gabbay(a)gmail.com> wrote:
>
> On Tue, Jul 6, 2021 at 3:23 PM Daniel Vetter <daniel(a)ffwll.ch> wrote:
> >
> > On Tue, Jul 06, 2021 at 02:21:10PM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 06, 2021 at 10:40:37AM +0200, Daniel Vetter wrote:
> > > > > Greg, I hope this will be good enough for you to merge this code.
> > > >
> > > > So we're officially going to use dri-devel for technical details review
> > > > and then Greg for merging so we don't have to deal with other merge
> > > > criteria dri-devel folks have?
> > > >
> > > > I don't expect anything less by now, but it does make the original claim
> > > > that drivers/misc will not step all over accelerators folks a complete
> > > > farce under the totally-not-a-gpu banner.
> > > >
> > > > This essentially means that for any other accelerator stack that doesn't
> > > > fit the dri-devel merge criteria, even if it's acting like a gpu and uses
> > > > other gpu driver stuff, you can just send it to Greg and it's good to go.
> > > >
> > > > There's quite a lot of these floating around actually (and many do have
> > > > semi-open runtimes, like habanalabs have now too, just not open enough to
> > > > be actually useful). It's going to be absolutely lovely having to explain
> > > > to these companies in background chats why habanalabs gets away with their
> > > > stack and they don't.
> > >
> > > FYI, I fully agree with Daniel here. Habanlabs needs to open up their
> > > runtime if they want to push any additional feature in the kernel.
> > > The current situation is not sustainable.
> Well, that's like, your opinion...
> >
> > Before anyone replies: The runtime is open, the compiler is still closed.
> > This has become the new default for accel driver submissions, I think
> > mostly because all the interesting bits for non-3d accelerators are in the
> > accel ISA, and no longer in the runtime. So vendors are fairly happy to
> > throw in the runtime as a freebie.
> >
> > It's still incomplete, and it's still useless if you want to actually hack
> > on the driver stack.
> > -Daniel
> > --
> I don't understand what's not sustainable here.
>
> There is zero code inside the driver that communicates or interacts
> with our TPC code (TPC is the Tensor Processing Core).
> Even submitting works to the TPC is done via a generic queue
> interface. And that queue IP is common between all our engines
> (TPC/DMA/NIC). The driver provides all the specs of that queue IP,
> because the driver's code is handling that queue. But why is the TPC
> compiler code even relevant here ?

Can I use the hw how it's intended to be used without it?

If the answer is no, then essentially what you're doing with your upstream
driver is getting all the benefits of an upstream driver, while upstream
gets nothing. We can't use your stack, not as-is. Sure we can use the
queue, but we can't actually submit anything interesting. And I'm pretty
sure the point of your hw is to do more than submit no-op packets to a
queue.

This is all an "I want my cake and eat it too" approach to upstreaming,
and it's a totally fine attitude to have, but if you don't see why there's
maybe a different side to it then I don't get what you're arguing.
Upstream isn't free lunch for nothing.

Frankly I'm starting to assume you're arguing this all in bad faith just
because habanalabs doesn't want to actually have an open driver stack, so
any attack is good, no matter what. Which is also what everyone else does
who submits their accel driver to upstream, and which gets us back to the
starting point of this sub-thread of me really appreciating how this will
improve background discussions going forward for everyone.

Like if the requirement for accel drivers truly is that you can submit a
dummy command to the queues then I have about 5-10 drivers at least I
could merge instantly. For something like the intel gpu driver it would be
about 50 lines of code (including all the structure boilerplate the ioctls
require) in userspace to submit a dummy queue command. GPU and accel
vendors would really love that, because it would allow them to freeload on
upstream and do essentially nothing in return.

And we'd end up with an unmaintainable disaster of a gpu or well
accelerator subsystem because there's nothing you can change or improve
because all the really useful bits of the stack are closed.

And ofc that's not any company's problem anymore, so ofc you with the
habanalabs hat on don't care and call this *extreme*.

> btw, you can today see our TPC code at
> https://github.com/HabanaAI/Habana_Custom_Kernel
> There is a link there to the TPC user guide and link to download the
> LLVM compiler.

I got stuck clicking links before I found the source for that llvm
compiler. Can you give me a direct link to the repo with source code
instead please?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

4 years, 7 months


[PATCH AUTOSEL 5.4 18/74] drm/sched: Avoid data corruptions

by Sasha Levin

From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>

[ Upstream commit 0b10ab80695d61422337ede6ff496552d8ace99d ]

Wait for all dependencies of a job to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 1a5153197fe9..57f9baad9e36 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -235,11 +235,16 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *job;
+	struct dma_fence *f;
 	int r;
 
 	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
+		/* Wait for all dependencies to avoid data corruptions */
+		while ((f = job->sched->ops->dependency(job, entity)))
+			dma_fence_wait(f, false);
+
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
 
-- 
2.30.2
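The ordering this patch enforces (drain every dependency, and only then mark the job as killed) can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not the scheduler code: `toy_fence`, `toy_job`, `toy_job_dependency()`, `toy_fence_wait()` and `toy_kill_job()` are all hypothetical stand-ins, and "waiting" is reduced to flipping a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence: only tracks completion. */
struct toy_fence {
	bool signaled;
};

/* Hypothetical stand-in for struct drm_sched_job with a fixed dependency list. */
struct toy_job {
	struct toy_fence *deps[4];
	size_t ndeps;
	size_t next_dep;	/* index of the next dependency to hand out */
	int error;		/* error recorded when the job is killed */
};

/* Models ops->dependency(): next unhandled dependency, or NULL when none remain. */
static struct toy_fence *toy_job_dependency(struct toy_job *job)
{
	if (job->next_dep >= job->ndeps)
		return NULL;
	return job->deps[job->next_dep++];
}

/* Models dma_fence_wait(): in this sketch, waiting just marks the fence signaled. */
static void toy_fence_wait(struct toy_fence *f)
{
	f->signaled = true;
}

/* Kill a job only after every one of its dependencies has completed. */
static void toy_kill_job(struct toy_job *job)
{
	struct toy_fence *f;

	while ((f = toy_job_dependency(job)))
		toy_fence_wait(f);

	/* Every dependency handed out above must be signaled by now. */
	for (size_t i = 0; i < job->ndeps; i++)
		assert(job->deps[i]->signaled);

	job->error = -ESRCH;
}
```

Calling `toy_kill_job()` on a job with two pending fences leaves both fences signaled before `error` is set, which is the property the commit message describes.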

4 years, 7 months


[PATCH AUTOSEL 5.10 028/137] drm/sched: Avoid data corruptions

by Sasha Levin

From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>

[ Upstream commit 0b10ab80695d61422337ede6ff496552d8ace99d ]

Wait for all dependencies of a job to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 2006cc057f99..3f7f761df4cd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -219,11 +219,16 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *job;
+	struct dma_fence *f;
 	int r;
 
 	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
+		/* Wait for all dependencies to avoid data corruptions */
+		while ((f = job->sched->ops->dependency(job, entity)))
+			dma_fence_wait(f, false);
+
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
 
-- 
2.30.2

4 years, 7 months


[PATCH AUTOSEL 5.12 031/160] drm/sched: Avoid data corruptions

by Sasha Levin

From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>

[ Upstream commit 0b10ab80695d61422337ede6ff496552d8ace99d ]

Wait for all dependencies of a job to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 72c39608236b..1b2fdf7f3ccd 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -222,11 +222,16 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *job;
+	struct dma_fence *f;
 	int r;
 
 	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
+		/* Wait for all dependencies to avoid data corruptions */
+		while ((f = job->sched->ops->dependency(job, entity)))
+			dma_fence_wait(f, false);
+
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
 
-- 
2.30.2

4 years, 7 months


[PATCH AUTOSEL 5.13 039/189] drm/sched: Avoid data corruptions

by Sasha Levin

From: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>

[ Upstream commit 0b10ab80695d61422337ede6ff496552d8ace99d ]

Wait for all dependencies of a job to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index cb58f692dad9..86a4209d8c77 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -222,11 +222,16 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
 static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 {
 	struct drm_sched_job *job;
+	struct dma_fence *f;
 	int r;
 
 	while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
 		struct drm_sched_fence *s_fence = job->s_fence;
 
+		/* Wait for all dependencies to avoid data corruptions */
+		while ((f = job->sched->ops->dependency(job, entity)))
+			dma_fence_wait(f, false);
+
 		drm_sched_fence_scheduled(s_fence);
 		dma_fence_set_error(&s_fence->finished, -ESRCH);
 
-- 
2.30.2

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v4 2/2] habanalabs: add support for dma-buf exporter

by Jason Gunthorpe

On Mon, Jul 05, 2021 at 04:03:14PM +0300, Oded Gabbay wrote:

> +	rc = sg_alloc_table(*sgt, nents, GFP_KERNEL | __GFP_ZERO);
> +	if (rc)
> +		goto error_free;

If you are not going to include a CPU list then I suggest setting
sg_table->orig_nents == 0

And using only the nents which is the length of the DMA list.

At least it gives some hope that other parts of the system could
detect this.

> +
> +	/* Merge pages and put them into the scatterlist */
> +	cur_page = 0;
> +	for_each_sgtable_sg((*sgt), sg, i) {

for_each_sgtable_sg should never be used when working with
sg_dma_address() type stuff, here and everywhere else. The DMA list
should be iterated using the for_each_sgtable_dma_sg() macro.

> +	/* In case we got a large memory area to export, we need to divide it
> +	 * to smaller areas because each entry in the dmabuf sgt can only
> +	 * describe unsigned int.
> +	 */

Huh? This is forming a SGL, it should follow the SGL rules which means
you have to fragment based on the dma_get_max_seg_size() of the
importer device.

> +	hl_dmabuf->pages = kcalloc(hl_dmabuf->npages, sizeof(*hl_dmabuf->pages),
> +				   GFP_KERNEL);
> +	if (!hl_dmabuf->pages) {
> +		rc = -ENOMEM;
> +		goto err_free_dmabuf_wrapper;
> +	}

Why not just create the SGL directly? Is there a reason it needs to
make a page list?

Jason
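The fragmentation rule raised in the review (no segment may exceed the importer's maximum segment size) can be illustrated with a small userspace C sketch. It assumes a contiguous address range and a caller-supplied limit; `toy_seg`, `toy_count_segs()` and `toy_fill_segs()` are hypothetical names, and `max_seg` merely stands in for whatever dma_get_max_seg_size() would report for the importer device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment descriptor; stands in for one DMA scatterlist entry. */
struct toy_seg {
	uint64_t addr;
	uint64_t len;
};

/* Number of segments needed to cover total_len with at most max_seg bytes each. */
static size_t toy_count_segs(uint64_t total_len, uint64_t max_seg)
{
	assert(max_seg > 0);
	return (size_t)(total_len / max_seg + (total_len % max_seg != 0));
}

/*
 * Split [base, base + total_len) into segments no longer than max_seg.
 * Returns the number of entries written, or 0 if out[] is too small
 * (a zero-length region also yields 0).
 */
static size_t toy_fill_segs(uint64_t base, uint64_t total_len, uint64_t max_seg,
			    struct toy_seg *out, size_t out_cap)
{
	size_t n = toy_count_segs(total_len, max_seg);
	uint64_t off = 0;

	if (n > out_cap)
		return 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t chunk = total_len - off;

		if (chunk > max_seg)
			chunk = max_seg;
		out[i].addr = base + off;
		out[i].len = chunk;
		off += chunk;
	}
	return n;
}
```

Counting first and filling second mirrors the usual pattern of sizing a table before populating it; a real exporter would also have to honour alignment and the actual physical layout, which this sketch ignores.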

4 years, 7 months


Re: [Linaro-mm-sig] [PATCH v7 0/5] drm: address potential UAF bugs with drm_master ptrs

by Daniel Vetter

On Mon, Jul 05, 2021 at 10:15:45AM +0800, Desmond Cheong Zhi Xi wrote:
> On 3/7/21 3:07 am, Daniel Vetter wrote:
> > On Fri, Jul 02, 2021 at 12:53:53AM +0800, Desmond Cheong Zhi Xi wrote:
> > > This patch series addresses potential use-after-free errors when dereferencing pointers to struct drm_master. These were identified after one such bug was caught by Syzbot in drm_getunique():
> > > https://syzkaller.appspot.com/bug?id=148d2f1dfac64af52ffd27b661981a540724f8…
> > >
> > > The series is broken up into five patches:
> > >
> > > 1. Move a call to drm_is_current_master() out from a section locked by &dev->mode_config.mutex in drm_mode_getconnector(). This patch does not apply to stable.
> > >
> > > 2. Move a call to _drm_lease_held() out from the section locked by &dev->mode_config.idr_mutex in __drm_mode_object_find().
> > >
> > > 3. Implement a locked version of drm_is_current_master() function that's used within drm_auth.c.
> > >
> > > 4. Serialize drm_file.master by introducing a new lock that's held whenever the value of drm_file.master changes.
> > >
> > > 5. Identify areas in drm_lease.c where pointers to struct drm_master are dereferenced, and ensure that the master pointers are not freed during use.
> > >
> > > Changes in v6 -> v7:
> > > - Patch 2:
> > > Modify code alignment as suggested by the intel-gfx CI.
> > >
> > > Update commit message based on the changes to patch 5.
> > >
> > > - Patch 4:
> > > Add patch 4 to the series. This patch adds a new lock to serialize drm_file.master, in response to the lockdep splat by the intel-gfx CI.
> > >
> > > - Patch 5:
> > > Move kerneldoc comment about protecting drm_file.master with drm_device.master_mutex into patch 4.
> > >
> > > Update drm_file_get_master to use the new drm_file.master_lock instead of drm_device.master_mutex, in response to the lockdep splat by the intel-gfx CI.
> >
> > So there's another one now because master->leases is protected by the
> > mode_config.idr_mutex, and that's a bit awkward to untangle.
> >
> > Also I'm really surprised that there was no lockdep splat through the
> > atomic code anywhere. The reason seems to be that somehow CI rebooted
> > first before it managed to run any of the kms_atomic tests, and we can
> > only hit this when we go through the atomic kms ioctl, the legacy kms
> > ioctls don't have that specific issue.
> >
> > Anyway I think this approach doesn't look too workable, and we need
> > something new.
> >
> > But first things first: Are you still on board working on this? You
> > started with a simple patch to fix a UAF bug, now we're deep into
> > reworking tricky locking ... If you feel like you want out I'm totally
> > fine with that.
> >
>
> Hi Daniel,
>
> Thanks for asking, but I'm committed to seeing this through :) In fact, I
> really appreciate all your guidance and patience as the simple patch evolved
> into the current state of things.

Cool, it's definitely been fun trying to figure out a good solution for
this tricky problem here :-)

> > Anyway, I think we need to split drm_device->master_mutex up into two
> > parts:
> >
> > - One part that protects the actual access/changes, which I think for
> >   simplicity we'll just leave as the current lock. That lock is a very
> >   inner lock, since for the drm_lease.c stuff it has to nest within
> >   mode_config.idr_mutex even.
> >
> > - Now the issue with checking master status/leases/whatever as an
> >   innermost lock is that you can race, it's a classic time of check vs
> >   time of use race: By the time we actually use the thing we validate
> >   we're allowed to use, we might not have access anymore. There's two
> >   reasons for that:
> >
> >   * DROPMASTER ioctl could remove the master rights, which removes access
> >     rights also for all leases
> >
> >   * REVOKE_LEASE ioctl can do the same but only for a specific lease
> >
> >   This is the thing we're trying to protect against in fbcon code, but
> >   that's very spotty protection because all the ioctls by other users
> >   aren't actually protected against this.
> >
> >   So I think for this we need some kind of big reader lock.
> >
> > Now for the implementation, there's a few things:
> >
> > - I think best option for this big reader lock would be to just use srcu.
> >   We only need to flush out all current readers when we drop master or
> >   revoke a lease, so synchronize_srcu is perfectly good enough for this
> >   purpose.
> >
> > - The fbdev code would switch over to srcu in
> >   drm_master_internal_acquire() and drm_master_internal_release(). Ofc
> >   within drm_master_internal_acquire we'd still need to check master
> >   status with the normal master_mutex.
> >
> > - While we revamp all this we should fix the ioctl checks in drm_ioctl.c.
> >   Just noticed that drm_ioctl_permit() could and should be unexported,
> >   last user was removed.
> >
> >   Within drm_ioctl_kernel we'd then replace the check for
> >   drm_is_current_master with the drm_master_internal_acquire/release.
> >
> > - This alone does nothing, we still need to make sure that dropmaster and
> >   revoke_lease ioctl flush out all other access before they return to
> >   userspace. We can't just call synchronize_srcu because due to the ioctl
> >   code in drm_ioctl_kernel we're in that srcu section, we'd need to add a
> >   DRM_MASTER_FLUSH ioctl flag which we'd check only when DRM_MASTER is
> >   set, and use to call synchronize_srcu. Maybe wrap that in a
> >   drm_master_flush or so, or perhaps a drm_master_internal_release_flush.
> >
> > - Also maybe we should drop the _internal_ from that name. Feels a bit
> >   wrong when we're also going to use this in the ioctl handler.
> >
> > Thoughts? Totally silly and overkill?
> >
> > Cheers, Daniel
> >
> >
>
> Just some thoughts on the previous approach before we move on to something
> new. Regarding the lockdep warning for mode_config.idr_mutex, I think that's
> resolvable now by simply removing patch 2, which is no longer really
> necessary with the introduction of a new mutex at the bottom of the lock
> hierarchy in patch 4.

Oh I missed that, this is essentially part-way to what I'm describing
above.

> I was hesitant to create a new mutex (especially since this means that
> drm_file.master is now protected by either of two mutexes), but it's
> probably the smallest fix in terms of code churn. Is that approach no good?

That's the other approach I considered. It solves the use-after-free
issue, but while I was musing all the different issues here I realized
that we might as well use the opportunity to plug a few functional races
around drm_device ownership rules.

I do think it works. One thing I'd change is make it a spinlock - that
way it's very clear that it's a tiny inner lock that's really only meant
to protect the ->master pointer.

> Otherwise, on a high level, I think using an srcu mechanism makes a lot of
> sense to me to address the issue of data items being reclaimed while some
> readers still have references to them.
>
> The implementation details seem sound to me too, but I'll need to code it up
> a bit before I can comment further.

So maybe this is complete overkill, but what about three locks :-)

- innermost spinlock, just to protect against use-after-free until we
  successfully got a reference. Essentially this is the lookup lock -
  maybe we could call it master_lookup_lock for clarity?

- mutex like we have right now to make sure master state is consistent
  when someone races set/dropmaster in userspace. This would be the only
  write lock we have.

- new srcu to make sure that after a dropmaster/revoke-lease all previous
  users' calls are flushed out with synchronize_srcu(). Essentially this
  wouldn't be a lock, but more a barrier. So maybe should call it
  master_barrier_srcu or so? fbdev emulation in drm_client would use
  this, and also drm_ioctl code to plug the race I've spotted.

So maybe refresh your series with just the pieces you think we need for
the master lookup spinlock, and we try to land that first? I do agree
this should work against the use-after-free.

Cheers, Daniel

>
> Best wishes,
> Desmond
>
> > > Changes in v5 -> v6:
> > > - Patch 2:
> > > Add patch 2 to the series. This patch moves the call to _drm_lease_held out from the section locked by &dev->mode_config.idr_mutex in __drm_mode_object_find.
> > >
> > > - Patch 5:
> > > Clarify the kerneldoc for dereferencing drm_file.master, as suggested by Daniel Vetter.
> > >
> > > Refactor error paths with goto labels so that each function only has a single drm_master_put(), as suggested by Emil Velikov.
> > >
> > > Modify comparison to NULL into "!master", as suggested by the intel-gfx CI.
> > >
> > > Changes in v4 -> v5:
> > > - Patch 1:
> > > Add patch 1 to the series. The changes in patch 1 do not apply to stable because they apply to new changes in the drm-misc-next branch. This patch moves the call to drm_is_current_master in drm_mode_getconnector out from the section locked by &dev->mode_config.mutex.
> > >
> > > Additionally, added a missing semicolon to the patch, caught by the intel-gfx CI.
> > >
> > > - Patch 3:
> > > Move changes to drm_connector.c into patch 1.
> > >
> > > Changes in v3 -> v4:
> > > - Patch 3:
> > > Move the call to drm_is_current_master in drm_mode_getconnector out from the section locked by &dev->mode_config.mutex. As suggested by Daniel Vetter. This avoids a circular lock dependency as reported here https://patchwork.freedesktop.org/patch/440406/
> > >
> > > Additionally, inside drm_is_current_master, instead of grabbing &fpriv->master->dev->master_mutex, we grab &fpriv->minor->dev->master_mutex to avoid dereferencing a null ptr if fpriv->master is not set.
> > >
> > > - Patch 5:
> > > Modify kerneldoc formatting.
> > >
> > > Additionally, add a file_priv->master NULL check inside drm_file_get_master, and handle the NULL result accordingly in drm_lease.c. As suggested by Daniel Vetter.
> > >
> > > Changes in v2 -> v3:
> > > - Patch 3:
> > > Move the definition of drm_is_current_master and the _locked version higher up in drm_auth.c to avoid needing a forward declaration of drm_is_current_master_locked. As suggested by Daniel Vetter.
> > >
> > > - Patch 5:
> > > Instead of leaking drm_device.master_mutex into drm_lease.c to protect drm_master pointers, add a new drm_file_get_master() function that returns drm_file->master while increasing its reference count, to prevent drm_file->master from being freed. As suggested by Daniel Vetter.
> > >
> > > Changes in v1 -> v2:
> > > - Patch 5:
> > > Move the lock and assignment before the DRM_DEBUG_LEASE in drm_mode_get_lease_ioctl, as suggested by Emil Velikov.
> > >
> > > Desmond Cheong Zhi Xi (5):
> > >   drm: avoid circular locks in drm_mode_getconnector
> > >   drm: separate locks in __drm_mode_object_find
> > >   drm: add a locked version of drm_is_current_master
> > >   drm: serialize drm_file.master with a master lock
> > >   drm: protect drm_master pointers in drm_lease.c
> > >
> > >  drivers/gpu/drm/drm_auth.c        | 86 +++++++++++++++++++++++--------
> > >  drivers/gpu/drm/drm_connector.c   |  5 +-
> > >  drivers/gpu/drm/drm_file.c        |  1 +
> > >  drivers/gpu/drm/drm_lease.c       | 81 ++++++++++++++++++++++-------
> > >  drivers/gpu/drm/drm_mode_object.c | 10 ++--
> > >  include/drm/drm_auth.h            |  1 +
> > >  include/drm/drm_file.h            | 18 +++++--
> > >  7 files changed, 153 insertions(+), 49 deletions(-)
> > >
> > > --
> > > 2.25.1
> > >
> >
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
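The "innermost lookup lock" idea from this thread (hold a tiny lock only long enough to read the pointer and take a reference) can be sketched in userspace C11. This is an illustration under stated assumptions, not the DRM implementation: `toy_master`, `toy_file` and every `toy_*` function are hypothetical, and a C11 `atomic_flag` spin loop stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct drm_master. */
struct toy_master {
	atomic_int refcount;
};

/* Hypothetical stand-in for struct drm_file with a small inner lookup lock. */
struct toy_file {
	atomic_flag lookup_lock;	/* plays the role of a master lookup lock */
	struct toy_master *master;
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Allocate a master holding one reference, owned by the caller. */
static struct toy_master *toy_master_create(void)
{
	struct toy_master *m = malloc(sizeof(*m));

	if (m)
		atomic_init(&m->refcount, 1);
	return m;
}

/* Drop one reference; free the object when the last one goes away. */
static void toy_master_put(struct toy_master *m)
{
	if (atomic_fetch_sub(&m->refcount, 1) == 1)
		free(m);
}

/* Read file->master and take a reference while the pointer cannot change. */
static struct toy_master *toy_file_get_master(struct toy_file *f)
{
	struct toy_master *m;

	toy_lock(&f->lookup_lock);
	m = f->master;
	if (m)
		atomic_fetch_add(&m->refcount, 1);
	toy_unlock(&f->lookup_lock);
	return m;	/* may be NULL; a non-NULL result must be put by the caller */
}

/* Swap the pointer under the same lock, then drop the file's old reference. */
static void toy_file_set_master(struct toy_file *f, struct toy_master *new_master)
{
	struct toy_master *old;

	toy_lock(&f->lookup_lock);
	old = f->master;
	f->master = new_master;
	toy_unlock(&f->lookup_lock);
	if (old)
		toy_master_put(old);
}
```

Because the reference is taken before the lock is released, a concurrent `toy_file_set_master()` can no longer free the object out from under a reader. The sketch covers only the use-after-free part; the consistency mutex and the srcu barrier discussed above are deliberately left out.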

4 years, 7 months

