From: Rob Clark robdclark@chromium.org
This series adds a deadline hint to fences, so realtime deadlines such as vblank can be communicated to the fence signaller for power/frequency management decisions.
This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:
1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers
This iteration adds a dma-fence ioctl to set a deadline (both to support igt-tests, and compositors which delay decisions about which client buffer to display), and a sw_sync ioctl to read back the deadline. IGT tests utilizing these can be found at:
https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadli...
v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline support, and sync_file EPOLLPRI based on experience with perf/freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline guarantee, switch to using u64 ns values in UABI (still absolute CLOCK_MONOTONIC values), drop syncobj related cap and driver feature flag in favor of allowing count_handles==0 for probing kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank, and work correctly if the last vblank event was more than a frame ago. Add (mostly unrelated) drm/msm patch which also uses the vblank helper. Use dma_fence_chain_contained(). More verbose syncobj UABI comments. Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper. Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Drop (E)POLLPRI support.. I still like it, but not essential and it can always be revived later. Fix doc build warning.
Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Surface sync-file uABI
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support
 Documentation/driver-api/dma-buf.rst    | 16 ++++-
 drivers/dma-buf/dma-fence-array.c       | 11 ++++
 drivers/dma-buf/dma-fence-chain.c       | 12 ++++
 drivers/dma-buf/dma-fence.c             | 60 ++++++++++++++++++
 drivers/dma-buf/dma-resv.c              | 22 +++++++
 drivers/dma-buf/sw_sync.c               | 81 +++++++++++++++++++++++++
 drivers/dma-buf/sync_debug.h            |  2 +
 drivers/dma-buf/sync_file.c             | 19 ++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 +++++++++++
 drivers/gpu/drm/drm_syncobj.c           | 64 +++++++++++++++----
 drivers/gpu/drm/drm_vblank.c            | 53 +++++++++++++---
 drivers/gpu/drm/i915/i915_request.c     | 20 ++++++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -----
 drivers/gpu/drm/msm/msm_atomic.c        |  8 ++-
 drivers/gpu/drm/msm/msm_drv.c           | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c         | 74 ++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h         | 20 ++++++
 drivers/gpu/drm/msm/msm_gem.c           |  5 ++
 drivers/gpu/drm/msm/msm_kms.h           |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 ++++++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h                |  1 +
 include/drm/gpu_scheduler.h             | 17 ++++++
 include/linux/dma-fence.h               | 22 +++++++
 include/linux/dma-resv.h                |  2 +
 include/uapi/drm/drm.h                  | 17 ++++++
 include/uapi/drm/msm_drm.h              | 14 ++++-
 include/uapi/linux/sync_file.h          | 59 +++++++++++-------
 28 files changed, 639 insertions(+), 79 deletions(-)
From: Rob Clark robdclark@chromium.org
Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, and to make it aware of missed deadlines so that they can be factored into the frequency scaling.
v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
Acked-by: Pekka Paalanen pekka.paalanen@collabora.com
Reviewed-by: Bagas Sanjaya bagasdotme@gmail.com
---
 Documentation/driver-api/dma-buf.rst |  6 +++
 drivers/dma-buf/dma-fence.c          | 59 ++++++++++++++++++++++++++++
 include/linux/dma-fence.h            | 22 +++++++++++
 3 files changed, 87 insertions(+)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 622b8156d212..183e480d8cea 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
 .. kernel-doc:: drivers/dma-buf/dma-fence.c
    :doc: fence signalling annotation
 
+DMA Fence Deadline Hints
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+   :doc: deadline hints
+
 DMA Fences Functions Reference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0de0482cd36e..f177c56269bb 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+/**
+ * DOC: deadline hints
+ *
+ * In an ideal world, it would be possible to pipeline a workload sufficiently
+ * that a utilization based device frequency governor could arrive at a minimum
+ * frequency that meets the requirements of the use-case, in order to minimize
+ * power consumption. But in the real world there are many workloads which
+ * defy this ideal. For example, but not limited to:
+ *
+ * * Workloads that ping-pong between device and CPU, with alternating periods
+ *   of CPU waiting for device, and device waiting on CPU. This can result in
+ *   devfreq and cpufreq seeing idle time in their respective domains and in
+ *   result reduce frequency.
+ *
+ * * Workloads that interact with a periodic time based deadline, such as double
+ *   buffered GPU rendering vs vblank sync'd page flipping. In this scenario,
+ *   missing a vblank deadline results in an *increase* in idle time on the GPU
+ *   (since it has to wait an additional vblank period), sending a signal to
+ *   the GPU's devfreq to reduce frequency, when in fact the opposite is what is
+ *   needed.
+ *
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * The deadline hint provides a way for the waiting driver, or userspace, to
+ * convey an appropriate sense of urgency to the signaling driver.
+ *
+ * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
+ * facing APIs). The time could either be some point in the future (such as
+ * the vblank based deadline for page-flipping, or the start of a compositor's
+ * composition cycle), or the current time to indicate an immediate deadline
+ * hint (i.e. forward progress cannot be made until this fence is signaled).
+ *
+ * Multiple deadlines may be set on a given fence, even in parallel. See the
+ * documentation for &dma_fence_ops.set_deadline.
+ *
+ * The deadline hint is just that, a hint. The driver that created the fence
+ * may react by increasing frequency, making different scheduling choices, etc.
+ * Or doing nothing at all.
+ */
+
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline hint
+ * @fence: the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ *            signaled
+ *
+ * Give the fence signaler a hint about an upcoming deadline, such as
+ * vblank, by which point the waiter would prefer the fence to be
+ * signaled by. This is intended to give feedback to the fence signaler
+ * to aid in power management decisions, such as boosting GPU frequency
+ * if a periodic vblank deadline is approaching but the fence is not
+ * yet signaled.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
+		fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
 /**
  * dma_fence_describe - Dump fence describtion into seq_file
  * @fence: the 6fence to describe
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..d54b595a0fe0 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -257,6 +257,26 @@ struct dma_fence_ops {
 	 */
 	void (*timeline_value_str)(struct dma_fence *fence,
 				   char *str, int size);
+
+	/**
+	 * @set_deadline:
+	 *
+	 * Callback to allow a fence waiter to inform the fence signaler of
+	 * an upcoming deadline, such as vblank, by which point the waiter
+	 * would prefer the fence to be signaled by. This is intended to
+	 * give feedback to the fence signaler to aid in power management
+	 * decisions, such as boosting GPU frequency.
+	 *
+	 * This is called without &dma_fence.lock held, it can be called
+	 * multiple times and from any context. Locking is up to the callee
+	 * if it has some state to manage. If multiple deadlines are set,
+	 * the expectation is to track the soonest one. If the deadline is
+	 * before the current time, it should be interpreted as an immediate
+	 * deadline.
+	 *
+	 * This callback is optional.
+	 */
+	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
 };
 
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -583,6 +603,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
+
 struct dma_fence *dma_fence_get_stub(void);
 struct dma_fence *dma_fence_allocate_private_stub(void);
 u64 dma_fence_context_alloc(unsigned num);
From: Rob Clark robdclark@chromium.org
Propagate the deadline to all the fences in the array.
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-fence-array.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5c8a7084577b..9b3ce8948351 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -123,12 +123,23 @@ static void dma_fence_array_release(struct dma_fence *fence)
 	dma_fence_free(fence);
 }
 
+static void dma_fence_array_set_deadline(struct dma_fence *fence,
+					 ktime_t deadline)
+{
+	struct dma_fence_array *array = to_dma_fence_array(fence);
+	unsigned i;
+
+	for (i = 0; i < array->num_fences; ++i)
+		dma_fence_set_deadline(array->fences[i], deadline);
+}
+
 const struct dma_fence_ops dma_fence_array_ops = {
 	.get_driver_name = dma_fence_array_get_driver_name,
 	.get_timeline_name = dma_fence_array_get_timeline_name,
 	.enable_signaling = dma_fence_array_enable_signaling,
 	.signaled = dma_fence_array_signaled,
 	.release = dma_fence_array_release,
+	.set_deadline = dma_fence_array_set_deadline,
 };
 EXPORT_SYMBOL(dma_fence_array_ops);
From: Rob Clark robdclark@chromium.org
Propagate the deadline to all the fences in the chain.
v2: Use dma_fence_chain_contained [Tvrtko]
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-fence-chain.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
index a0d920576ba6..9663ba1bb6ac 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -206,6 +206,17 @@ static void dma_fence_chain_release(struct dma_fence *fence)
 	dma_fence_free(fence);
 }
 
+
+static void dma_fence_chain_set_deadline(struct dma_fence *fence,
+					 ktime_t deadline)
+{
+	dma_fence_chain_for_each(fence, fence) {
+		struct dma_fence *f = dma_fence_chain_contained(fence);
+
+		dma_fence_set_deadline(f, deadline);
+	}
+}
+
 const struct dma_fence_ops dma_fence_chain_ops = {
 	.use_64bit_seqno = true,
 	.get_driver_name = dma_fence_chain_get_driver_name,
@@ -213,6 +224,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
 	.enable_signaling = dma_fence_chain_enable_signaling,
 	.signaled = dma_fence_chain_signaled,
 	.release = dma_fence_chain_release,
+	.set_deadline = dma_fence_chain_set_deadline,
 };
 EXPORT_SYMBOL(dma_fence_chain_ops);
From: Rob Clark robdclark@chromium.org
Add a way to set a deadline on remaining resv fences according to the requested usage.
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-resv.c | 22 ++++++++++++++++++++++
 include/linux/dma-resv.h   |  2 ++
 2 files changed, 24 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 1c76aed8e262..2a594b754af1 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -684,6 +684,28 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
 }
 EXPORT_SYMBOL_GPL(dma_resv_wait_timeout);
 
+/**
+ * dma_resv_set_deadline - Set a deadline on reservation's objects fences
+ * @obj: the reservation object
+ * @usage: controls which fences to include, see enum dma_resv_usage.
+ * @deadline: the requested deadline (MONOTONIC)
+ *
+ * May be called without holding the dma_resv lock. Sets @deadline on
+ * all fences filtered by @usage.
+ */
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+			   ktime_t deadline)
+{
+	struct dma_resv_iter cursor;
+	struct dma_fence *fence;
+
+	dma_resv_iter_begin(&cursor, obj, usage);
+	dma_resv_for_each_fence_unlocked(&cursor, fence) {
+		dma_fence_set_deadline(fence, deadline);
+	}
+	dma_resv_iter_end(&cursor);
+}
+EXPORT_SYMBOL_GPL(dma_resv_set_deadline);
 
 /**
  * dma_resv_test_signaled - Test if a reservation object's fences have been
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 0637659a702c..8d0e34dad446 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -479,6 +479,8 @@ int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage,
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
 long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
 			   bool intr, unsigned long timeout);
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+			   ktime_t deadline);
 bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage);
 void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq);
From: Rob Clark robdclark@chromium.org
We had all of the internal driver APIs, but not the all important userspace uABI, in the dma-buf doc. Fix that. And re-arrange the comments slightly as otherwise the comments for the ioctl nr defines would not show up.
v2: Fix docs build warning coming from newly including the uabi header in the docs build
Signed-off-by: Rob Clark robdclark@chromium.org
Acked-by: Pekka Paalanen pekka.paalanen@collabora.com
---
 Documentation/driver-api/dma-buf.rst | 10 ++++++--
 include/uapi/linux/sync_file.h       | 37 +++++++++++-----------------
 2 files changed, 23 insertions(+), 24 deletions(-)
diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 183e480d8cea..ff3f8da296af 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -203,8 +203,8 @@ DMA Fence unwrap
 .. kernel-doc:: include/linux/dma-fence-unwrap.h
    :internal:
 
-DMA Fence uABI/Sync File
-~~~~~~~~~~~~~~~~~~~~~~~~
+DMA Fence Sync File
+~~~~~~~~~~~~~~~~~~~
 
 .. kernel-doc:: drivers/dma-buf/sync_file.c
    :export:
@@ -212,6 +212,12 @@ DMA Fence uABI/Sync File
 .. kernel-doc:: include/linux/sync_file.h
    :internal:
 
+DMA Fence Sync File uABI
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: include/uapi/linux/sync_file.h
+   :internal:
+
 Indefinite DMA Fences
 ~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index ee2dcfb3d660..7e42a5b7558b 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -16,12 +16,16 @@
 #include <linux/types.h>
 
 /**
- * struct sync_merge_data - data passed to merge ioctl
+ * struct sync_merge_data - SYNC_IOC_MERGE: merge two fences
  * @name: name of new fence
  * @fd2: file descriptor of second fence
  * @fence: returns the fd of the new fence to userspace
  * @flags: merge_data flags
  * @pad: padding for 64-bit alignment, should always be zero
+ *
+ * Creates a new fence containing copies of the sync_pts in both
+ * the calling fd and sync_merge_data.fd2. Returns the new fence's
+ * fd in sync_merge_data.fence
  */
 struct sync_merge_data {
 	char name[32];
@@ -34,8 +38,8 @@ struct sync_merge_data {
 /**
  * struct sync_fence_info - detailed fence information
  * @obj_name: name of parent sync_timeline
-* @driver_name: name of driver implementing the parent
-* @status: status of the fence 0:active 1:signaled <0:error
+ * @driver_name: name of driver implementing the parent
+ * @status: status of the fence 0:active 1:signaled <0:error
  * @flags: fence_info flags
  * @timestamp_ns: timestamp of status change in nanoseconds
  */
@@ -48,14 +52,19 @@ struct sync_fence_info {
 };
 
 /**
- * struct sync_file_info - data returned from fence info ioctl
+ * struct sync_file_info - SYNC_IOC_FILE_INFO: get detailed information on a sync_file
  * @name: name of fence
  * @status: status of fence. 1: signaled 0:active <0:error
  * @flags: sync_file_info flags
  * @num_fences number of fences in the sync_file
  * @pad: padding for 64-bit alignment, should always be zero
- * @sync_fence_info: pointer to array of structs sync_fence_info with all
+ * @sync_fence_info: pointer to array of struct &sync_fence_info with all
  * fences in the sync_file
+ *
+ * Takes a struct sync_file_info. If num_fences is 0, the field is updated
+ * with the actual number of fences. If num_fences is > 0, the system will
+ * use the pointer provided on sync_fence_info to return up to num_fences of
+ * struct sync_fence_info, with detailed fence information.
  */
 struct sync_file_info {
 	char name[32];
@@ -69,30 +78,14 @@ struct sync_file_info {
 
 #define SYNC_IOC_MAGIC '>'
 
-/**
+/*
  * Opcodes 0, 1 and 2 were burned during a API change to avoid users of the
  * old API to get weird errors when trying to handling sync_files. The API
  * change happened during the de-stage of the Sync Framework when there was
  * no upstream users available.
  */
 
-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data. Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2. Returns the
- * new fence's fd in sync_merge_data.fence
- */
 #define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FILE_INFO - get detailed information on a sync_file
- *
- * Takes a struct sync_file_info. If num_fences is 0, the field is updated
- * with the actual number of fences. If num_fences is > 0, the system will
- * use the pointer provided on sync_fence_info to return up to num_fences of
- * struct sync_fence_info, with detailed fence information.
- */
 #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
 
 #endif /* _UAPI_LINUX_SYNC_H */
From: Rob Clark robdclark@chromium.org
The initial purpose is for igt tests, but this would also be useful for compositors that wait until close to vblank deadline to make decisions about which frame to show.
The igt tests can be found at:
https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadli...
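As a rough illustration of how a compositor or test might drive this from userspace, here is a minimal sketch. The struct layout and the ioctl encoding (magic '>', number 5, _IOW) mirror what this patch adds to include/uapi/linux/sync_file.h; the example_ prefixed names and the two helper functions are hypothetical and exist only so the snippet builds against headers that predate the patch. With a patched header one would use struct sync_set_deadline and SYNC_IOC_SET_DEADLINE directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <time.h>

/*
 * Local mirror of the uAPI proposed in this patch.  The "example_"
 * names are illustrative, not part of the kernel headers.
 */
struct example_sync_set_deadline {
	uint64_t deadline_ns;	/* absolute CLOCK_MONOTONIC time, in ns */
	uint64_t pad;		/* must be zero */
};

#define EXAMPLE_SYNC_IOC_SET_DEADLINE \
	_IOW('>', 5, struct example_sync_set_deadline)

/* Turn "ns from now" into an absolute CLOCK_MONOTONIC deadline. */
static uint64_t example_deadline_from_now(uint64_t ns_until_deadline)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	return (uint64_t)t.tv_sec * 1000000000ull + (uint64_t)t.tv_nsec +
	       ns_until_deadline;
}

/* Set a deadline hint on a sync_file fd.  Returns 0 or -errno. */
static int example_set_fence_deadline(int fence_fd, uint64_t deadline_ns)
{
	struct example_sync_set_deadline args = {
		.deadline_ns = deadline_ns,
		.pad = 0,
	};

	if (ioctl(fence_fd, EXAMPLE_SYNC_IOC_SET_DEADLINE, &args) < 0)
		return -errno;
	return 0;
}
```

A caller targeting the next vblank would pass the time remaining until that vblank to example_deadline_from_now() and hand the result to example_set_fence_deadline(). On a kernel without this patch the ioctl is expected to fail with -ENOTTY, which a caller can treat as "hint not supported".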
v2: Clarify the timebase, add link to igt tests
v3: Use u64 value in ns to express deadline.
v4: More doc
Signed-off-by: Rob Clark robdclark@chromium.org
Acked-by: Pekka Paalanen pekka.paalanen@collabora.com
---
 drivers/dma-buf/dma-fence.c    |  3 ++-
 drivers/dma-buf/sync_file.c    | 19 +++++++++++++++++++
 include/uapi/linux/sync_file.h | 22 ++++++++++++++++++++++
 3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..74e36f6d05b0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -933,7 +933,8 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
  *   the GPU's devfreq to reduce frequency, when in fact the opposite is what is
  *   needed.
  *
- * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline
+ * (or indirectly via userspace facing ioctls like &sync_set_deadline).
  * The deadline hint provides a way for the waiting driver, or userspace, to
  * convey an appropriate sense of urgency to the signaling driver.
  *
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index af57799c86ce..418021cfb87c 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -350,6 +350,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
 	return ret;
 }
 
+static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
+					unsigned long arg)
+{
+	struct sync_set_deadline ts;
+
+	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+		return -EFAULT;
+
+	if (ts.pad)
+		return -EINVAL;
+
+	dma_fence_set_deadline(sync_file->fence, ns_to_ktime(ts.deadline_ns));
+
+	return 0;
+}
+
 static long sync_file_ioctl(struct file *file, unsigned int cmd,
 			    unsigned long arg)
 {
@@ -362,6 +378,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
 	case SYNC_IOC_FILE_INFO:
 		return sync_file_ioctl_fence_info(sync_file, arg);
 
+	case SYNC_IOC_SET_DEADLINE:
+		return sync_file_ioctl_set_deadline(sync_file, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index 7e42a5b7558b..d61752dca4c6 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -76,6 +76,27 @@ struct sync_file_info {
 	__u64 sync_fence_info;
 };
 
+/**
+ * struct sync_set_deadline - SYNC_IOC_SET_DEADLINE - set a deadline hint on a fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad: must be zero
+ *
+ * Allows userspace to set a deadline on a fence, see &dma_fence_set_deadline
+ *
+ * The timebase for the deadline is CLOCK_MONOTONIC (same as vblank). For
+ * example
+ *
+ *     clock_gettime(CLOCK_MONOTONIC, &t);
+ *     deadline_ns = (t.tv_sec * 1000000000L) + t.tv_nsec + ns_until_deadline
+ */
+struct sync_set_deadline {
+	__u64 deadline_ns;
+	/* Not strictly needed for alignment but gives some possibility
+	 * for future extension:
+	 */
+	__u64 pad;
+};
+
 #define SYNC_IOC_MAGIC '>'
 
 /*
@@ -87,5 +108,6 @@ struct sync_file_info {
 
 #define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
 #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
+#define SYNC_IOC_SET_DEADLINE _IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)
 
 #endif /* _UAPI_LINUX_SYNC_H */
From: Rob Clark robdclark@chromium.org
This consists of simply tracking the earliest deadline set on the fence, and adding an ioctl to retrieve it. This can be used in conjunction with the SET_DEADLINE ioctl on a fence fd for testing, i.e. create various sw_sync fences, merge them into a fence-array, set a deadline on the fence-array and confirm that it is propagated properly to each fence.
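The test flow described above can be modelled in plain userspace C, without touching the kernel. This is only a sketch under stated assumptions: struct toy_fence and the toy_ functions are hypothetical stand-ins for a sw_sync fence, a merged fence-array, and the read-back ioctl, and they mimic the behaviour in this patch (keep the earliest hint, report -ENOENT when no hint was ever set).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a fence carrying a deadline hint. */
struct toy_fence {
	bool has_deadline;	/* was any hint recorded? */
	int64_t deadline_ns;	/* earliest hint seen so far */
};

/* Keep only the earliest hint, as a normal fence implementation would. */
static void toy_fence_set_deadline(struct toy_fence *f, int64_t deadline_ns)
{
	if (!f->has_deadline || deadline_ns < f->deadline_ns) {
		f->deadline_ns = deadline_ns;
		f->has_deadline = true;
	}
}

/* A merged fence forwards the hint to every fence it contains. */
static void toy_array_set_deadline(struct toy_fence **fences, size_t count,
				   int64_t deadline_ns)
{
	for (size_t i = 0; i < count; i++)
		toy_fence_set_deadline(fences[i], deadline_ns);
}

/* Read the hint back; -ENOENT if none was ever set. */
static int toy_fence_get_deadline(const struct toy_fence *f, int64_t *out_ns)
{
	if (!f->has_deadline)
		return -ENOENT;
	*out_ns = f->deadline_ns;
	return 0;
}
```

Setting a deadline on the merged set and then reading each member back is the same check the igt test performs through the real ioctls; a member that already had an earlier hint keeps it.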
v2: Switch UABI to express deadline as u64
v3: More verbose UAPI docs, show how to convert from timespec
v4: Better comments, track the soonest deadline, as a normal fence implementation would, return an error if no deadline set.
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
Acked-by: Pekka Paalanen pekka.paalanen@collabora.com
---
 drivers/dma-buf/sw_sync.c    | 81 ++++++++++++++++++++++++++++++++++++
 drivers/dma-buf/sync_debug.h |  2 +
 2 files changed, 83 insertions(+)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 348b3a9170fa..f53071bca3af 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -52,12 +52,33 @@ struct sw_sync_create_fence_data {
 	__s32 fence; /* fd of new fence */
 };
 
+/**
+ * struct sw_sync_get_deadline - get the deadline hint of a sw_sync fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad: must be zero
+ * @fence_fd: the sw_sync fence fd (in)
+ *
+ * Return the earliest deadline set on the fence. The timebase for the
+ * deadline is CLOCK_MONOTONIC (same as vblank). If there is no deadline
+ * set on the fence, this ioctl will return -ENOENT.
+ */
+struct sw_sync_get_deadline {
+	__u64 deadline_ns;
+	__u32 pad;
+	__s32 fence_fd;
+};
+
 #define SW_SYNC_IOC_MAGIC 'W'
 
 #define SW_SYNC_IOC_CREATE_FENCE _IOWR(SW_SYNC_IOC_MAGIC, 0,\
 		struct sw_sync_create_fence_data)
 
 #define SW_SYNC_IOC_INC _IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
+#define SW_SYNC_GET_DEADLINE _IOWR(SW_SYNC_IOC_MAGIC, 2, \
+		struct sw_sync_get_deadline)
+
+
+#define SW_SYNC_HAS_DEADLINE_BIT DMA_FENCE_FLAG_USER_BITS
 
 static const struct dma_fence_ops timeline_fence_ops;
 
@@ -171,6 +192,22 @@ static void timeline_fence_timeline_value_str(struct dma_fence *fence,
 	snprintf(str, size, "%d", parent->value);
 }
 
+static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct sync_pt *pt = dma_fence_to_sync_pt(fence);
+	unsigned long flags;
+
+	spin_lock_irqsave(fence->lock, flags);
+	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+		if (ktime_before(deadline, pt->deadline))
+			pt->deadline = deadline;
+	} else {
+		pt->deadline = deadline;
+		set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
+	}
+	spin_unlock_irqrestore(fence->lock, flags);
+}
+
 static const struct dma_fence_ops timeline_fence_ops = {
 	.get_driver_name = timeline_fence_get_driver_name,
 	.get_timeline_name = timeline_fence_get_timeline_name,
@@ -179,6 +216,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
 	.release = timeline_fence_release,
 	.fence_value_str = timeline_fence_value_str,
 	.timeline_value_str = timeline_fence_timeline_value_str,
+	.set_deadline = timeline_fence_set_deadline,
 };
 
 /**
@@ -387,6 +425,46 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg)
 	return 0;
 }
 
+static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long arg)
+{
+	struct sw_sync_get_deadline data;
+	struct dma_fence *fence;
+	struct sync_pt *pt;
+	int ret = 0;
+
+	if (copy_from_user(&data, (void __user *)arg, sizeof(data)))
+		return -EFAULT;
+
+	if (data.deadline_ns || data.pad)
+		return -EINVAL;
+
+	fence = sync_file_get_fence(data.fence_fd);
+	if (!fence)
+		return -EINVAL;
+
+	pt = dma_fence_to_sync_pt(fence);
+	if (!pt)
+		return -EINVAL;
+
+	spin_lock(fence->lock);
+	if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+		data.deadline_ns = ktime_to_ns(pt->deadline);
+	} else {
+		ret = -ENOENT;
+	}
+	spin_unlock(fence->lock);
+
+	dma_fence_put(fence);
+
+	if (ret)
+		return ret;
+
+	if (copy_to_user((void __user *)arg, &data, sizeof(data)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static long sw_sync_ioctl(struct file *file, unsigned int cmd,
 			  unsigned long arg)
 {
@@ -399,6 +477,9 @@ static long sw_sync_ioctl(struct file *file, unsigned int cmd,
 	case SW_SYNC_IOC_INC:
 		return sw_sync_ioctl_inc(obj, arg);
 
+	case SW_SYNC_GET_DEADLINE:
+		return sw_sync_ioctl_get_deadline(obj, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/drivers/dma-buf/sync_debug.h b/drivers/dma-buf/sync_debug.h
index 6176e52ba2d7..a1bdd62efccd 100644
--- a/drivers/dma-buf/sync_debug.h
+++ b/drivers/dma-buf/sync_debug.h
@@ -55,11 +55,13 @@ static inline struct sync_timeline *dma_fence_parent(struct dma_fence *fence)
  * @base: base fence object
  * @link: link on the sync timeline's list
  * @node: node in the sync timeline's tree
+ * @deadline: the earliest fence deadline hint
  */
 struct sync_pt {
 	struct dma_fence base;
 	struct list_head link;
 	struct rb_node node;
+	ktime_t deadline;
 };
 
 extern const struct file_operations sw_sync_debugfs_fops;
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
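The ordering problem here is that a deadline may arrive on the finished fence before the hw fence exists. A minimal single-threaded model of the two paths is sketched below. All toy_ names are hypothetical, and the model deliberately leaves out the spinlock and the smp_store_release()/smp_load_acquire() pairing that the real code needs when the two paths race on different CPUs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hw fence returned by run_job. */
struct toy_hw_fence {
	bool has_deadline;
	int64_t deadline_ns;
};

/* Hypothetical stand-in for the scheduler's finished fence. */
struct toy_sched_fence {
	bool has_deadline;
	int64_t deadline_ns;
	struct toy_hw_fence *parent;	/* NULL until the job reaches hw */
};

static void toy_hw_set_deadline(struct toy_hw_fence *hw, int64_t deadline_ns)
{
	if (!hw->has_deadline || deadline_ns < hw->deadline_ns) {
		hw->deadline_ns = deadline_ns;
		hw->has_deadline = true;
	}
}

/* Hint on the finished fence: remember it, forward it if a parent exists. */
static void toy_sched_set_deadline(struct toy_sched_fence *s,
				   int64_t deadline_ns)
{
	/* If we already have an earlier deadline, keep it. */
	if (s->has_deadline && s->deadline_ns < deadline_ns)
		return;

	s->deadline_ns = deadline_ns;
	s->has_deadline = true;

	if (s->parent)
		toy_hw_set_deadline(s->parent, deadline_ns);
}

/* Job reaches the hardware: attach the parent and replay a stored hint. */
static void toy_sched_set_parent(struct toy_sched_fence *s,
				 struct toy_hw_fence *hw)
{
	s->parent = hw;
	if (s->has_deadline)
		toy_hw_set_deadline(hw, s->deadline_ns);
}
```

Whichever of the two calls happens first, the hw fence ends up with the earliest hint: either the setter forwards it directly, or set_parent replays the stored one.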
v2: Split into drm_sched_fence_set_parent() (ckoenig)
v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees fence->parent set before drm_sched_fence_set_parent() does this test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT).
Signed-off-by: Rob Clark robdclark@chromium.org
Acked-by: Luben Tuikov luben.tuikov@amd.com
---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +++++++++++++++++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/gpu_scheduler.h             | 17 +++++++++
 3 files changed, 64 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c index 7fd869520ef2..fe9c6468e440 100644 --- a/drivers/gpu/drm/scheduler/sched_fence.c +++ b/drivers/gpu/drm/scheduler/sched_fence.c @@ -123,6 +123,37 @@ static void drm_sched_fence_release_finished(struct dma_fence *f) dma_fence_put(&fence->scheduled); }
+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+						  ktime_t deadline)
+{
+	struct drm_sched_fence *fence = to_drm_sched_fence(f);
+	struct dma_fence *parent;
+	unsigned long flags;
+
+	spin_lock_irqsave(&fence->lock, flags);
+
+	/* If we already have an earlier deadline, keep it: */
+	if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
+	    ktime_before(fence->deadline, deadline)) {
+		spin_unlock_irqrestore(&fence->lock, flags);
+		return;
+	}
+
+	fence->deadline = deadline;
+	set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
+
+	spin_unlock_irqrestore(&fence->lock, flags);
+
+	/*
+	 * smp_load_acquire() to ensure that if we are racing another
+	 * thread calling drm_sched_fence_set_parent(), that we see
+	 * the parent set before it calls test_bit(HAS_DEADLINE_BIT)
+	 */
+	parent = smp_load_acquire(&fence->parent);
+	if (parent)
+		dma_fence_set_deadline(parent, deadline);
+}
+
 static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -133,6 +164,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
 	.release = drm_sched_fence_release_finished,
+	.set_deadline = drm_sched_fence_set_deadline_finished,
 };

 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
@@ -147,6 +179,20 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
 }
 EXPORT_SYMBOL(to_drm_sched_fence);

+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+				struct dma_fence *fence)
+{
+	/*
+	 * smp_store_release() to ensure another thread racing us
+	 * in drm_sched_fence_set_deadline_finished() sees the
+	 * fence's parent set before test_bit()
+	 */
+	smp_store_release(&s_fence->parent, dma_fence_get(fence));
+	if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT,
+		     &s_fence->finished.flags))
+		dma_fence_set_deadline(fence, s_fence->deadline);
+}
+
 struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
 					      void *owner)
 {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 4e6ad6e122bc..007f98c48f8d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1019,7 +1019,7 @@ static int drm_sched_main(void *param)
 		drm_sched_fence_scheduled(s_fence);

 		if (!IS_ERR_OR_NULL(fence)) {
-			s_fence->parent = dma_fence_get(fence);
+			drm_sched_fence_set_parent(s_fence, fence);
 			/* Drop for original kref_init of the fence */
 			dma_fence_put(fence);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..99584e457153 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -41,6 +41,15 @@
  */
 #define DRM_SCHED_FENCE_DONT_PIPELINE	DMA_FENCE_FLAG_USER_BITS

+/**
+ * DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT - A fence deadline hint has been set
+ *
+ * Because a deadline hint can be set before the backing hw fence is
+ * created, we need to keep track of whether a deadline has already
+ * been set.
+ */
+#define DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT	(DMA_FENCE_FLAG_USER_BITS + 1)
+
 enum dma_resv_usage;
 struct dma_resv;
 struct drm_gem_object;
@@ -280,6 +289,12 @@ struct drm_sched_fence {
 	 */
 	struct dma_fence		finished;

+	/**
+	 * @deadline: deadline set on &drm_sched_fence.finished which
+	 * potentially needs to be propagated to &drm_sched_fence.parent
+	 */
+	ktime_t				deadline;
+
 	/**
 	 * @parent: the fence returned by &drm_sched_backend_ops.run_job
 	 * when scheduling the job on hardware. We signal the
@@ -568,6 +583,8 @@ void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
 				   enum drm_sched_priority priority);
 bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);

+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+				struct dma_fence *fence);
 struct drm_sched_fence *drm_sched_fence_alloc(
 	struct drm_sched_entity *s_entity, void *owner);
 void drm_sched_fence_init(struct drm_sched_fence *fence,
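The two halves of the scheduler patch above (keep the earliest deadline, and hand it to the hw fence whichever of "deadline set" and "parent set" happens last) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct sched_fence`, `struct hw_fence`, `keep_earliest()` and the `NO_DEADLINE` sentinel are hypothetical names, deadlines are plain nanosecond integers, and the spinlock plus flag bit of the patch are replaced by a compare-and-swap loop. It also uses sequentially consistent C11 atomics, which is a stronger ordering than the `smp_store_release()`/`smp_load_acquire()` pair in the patch; that substitution is made here for simplicity and is not taken from the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: no deadline hint recorded yet. */
#define NO_DEADLINE INT64_MAX

/* Hypothetical stand-in for the hardware (parent) fence. */
struct hw_fence {
	_Atomic int64_t deadline_ns;
};

/* Hypothetical stand-in for the scheduler's "finished" fence. */
struct sched_fence {
	_Atomic int64_t deadline_ns;
	struct hw_fence *_Atomic parent;
};

static void hw_fence_init(struct hw_fence *hw)
{
	atomic_init(&hw->deadline_ns, NO_DEADLINE);
}

static void sched_fence_init(struct sched_fence *sf)
{
	atomic_init(&sf->deadline_ns, NO_DEADLINE);
	atomic_init(&sf->parent, NULL);
}

/* Lower *slot to deadline_ns if it is earlier; true if the slot changed. */
static bool keep_earliest(_Atomic int64_t *slot, int64_t deadline_ns)
{
	int64_t cur = atomic_load(slot);

	while (deadline_ns < cur) {
		/* On failure, cur is refreshed and the comparison is retried. */
		if (atomic_compare_exchange_weak(slot, &cur, deadline_ns))
			return true;
	}
	return false;
}

static void hw_fence_set_deadline(struct hw_fence *hw, int64_t deadline_ns)
{
	keep_earliest(&hw->deadline_ns, deadline_ns);
}

/* Deadline side: record the hint, then forward it if a parent exists. */
static void sched_fence_set_deadline(struct sched_fence *sf, int64_t deadline_ns)
{
	struct hw_fence *parent;

	if (!keep_earliest(&sf->deadline_ns, deadline_ns))
		return; /* an earlier (or equal) hint is already recorded */

	parent = atomic_load(&sf->parent);
	if (parent)
		hw_fence_set_deadline(parent, deadline_ns);
}

/* Parent side: publish the parent, then forward any recorded hint. */
static void sched_fence_set_parent(struct sched_fence *sf, struct hw_fence *hw)
{
	int64_t deadline_ns;

	assert(hw != NULL);
	atomic_store(&sf->parent, hw);

	deadline_ns = atomic_load(&sf->deadline_ns);
	if (deadline_ns != NO_DEADLINE)
		hw_fence_set_deadline(hw, deadline_ns);
}
```

In this model both sides may forward the same hint when they race, which is harmless because the hw fence also keeps only the earliest value. A hint set before the parent exists is picked up by `sched_fence_set_parent()`, which is the case the HAS_DEADLINE bit exists for in the patch.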
From: Rob Clark robdclark@chromium.org
Track the nearest deadline on a fence timeline and set a timer to expire shortly before to trigger boost if the fence has not yet been signaled.
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++
 2 files changed, 94 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 56641408ea74..51b461f32103 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,35 @@

 #include "msm_drv.h"
 #include "msm_fence.h"
+#include "msm_gpu.h"
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+	struct msm_drm_private *priv = fctx->dev->dev_private;
+	return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+	struct msm_fence_context *fctx = container_of(t,
+			struct msm_fence_context, deadline_timer);
+
+	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+	return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+	struct msm_fence_context *fctx = container_of(work,
+			struct msm_fence_context, deadline_work);
+
+	/* If deadline fence has already passed, nothing to do: */
+	if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+		return;
+
+	msm_devfreq_boost(fctx2gpu(fctx), 2);
+}

 struct msm_fence_context *
@@ -36,6 +65,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
 	fctx->completed_fence = fctx->last_fence;
 	*fctx->fenceptr = fctx->last_fence;

+	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	fctx->deadline_timer.function = deadline_timer;
+
+	kthread_init_work(&fctx->deadline_work, deadline_work);
+
+	fctx->next_deadline = ktime_get();
+
 	return fctx;
 }

@@ -62,6 +98,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
 	spin_lock_irqsave(&fctx->spinlock, flags);
 	if (fence_after(fence, fctx->completed_fence))
 		fctx->completed_fence = fence;
+	if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+		hrtimer_cancel(&fctx->deadline_timer);
 	spin_unlock_irqrestore(&fctx->spinlock, flags);
 }

@@ -92,10 +130,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
 	return msm_fence_completed(f->fctx, f->base.seqno);
 }

+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct msm_fence *f = to_msm_fence(fence);
+	struct msm_fence_context *fctx = f->fctx;
+	unsigned long flags;
+	ktime_t now;
+
+	spin_lock_irqsave(&fctx->spinlock, flags);
+	now = ktime_get();
+
+	if (ktime_after(now, fctx->next_deadline) ||
+			ktime_before(deadline, fctx->next_deadline)) {
+		fctx->next_deadline = deadline;
+		fctx->next_deadline_fence =
+			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+		/*
+		 * Set timer to trigger boost 3ms before deadline, or
+		 * if we are already less than 3ms before the deadline
+		 * schedule boost work immediately.
+		 */
+		deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+		if (ktime_after(now, deadline)) {
+			kthread_queue_work(fctx2gpu(fctx)->worker,
+					&fctx->deadline_work);
+		} else {
+			hrtimer_start(&fctx->deadline_timer, deadline,
+					HRTIMER_MODE_ABS);
+		}
+	}
+
+	spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
 	.get_driver_name = msm_fence_get_driver_name,
 	.get_timeline_name = msm_fence_get_timeline_name,
 	.signaled = msm_fence_signaled,
+	.set_deadline = msm_fence_set_deadline,
 };

 struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 7f1798c54cd1..cdaebfb94f5c 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -52,6 +52,26 @@ struct msm_fence_context {
 	volatile uint32_t *fenceptr;

 	spinlock_t spinlock;
+
+	/*
+	 * TODO this doesn't really deal with multiple deadlines, like
+	 * if userspace got multiple frames ahead.. OTOH atomic updates
+	 * don't queue, so maybe that is ok
+	 */
+
+	/** next_deadline: Time of next deadline */
+	ktime_t next_deadline;
+
+	/**
+	 * next_deadline_fence:
+	 *
+	 * Fence value for next pending deadline.  The deadline timer is
+	 * canceled when this fence is signaled.
+	 */
+	uint32_t next_deadline_fence;
+
+	struct hrtimer deadline_timer;
+	struct kthread_work deadline_work;
 };

 struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
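The decision made in msm_fence_set_deadline() above (replace the tracked deadline only if it is stale or the new one is earlier, then either boost right away or arm a timer 3ms ahead of the deadline) can be pulled out into a pure function for experimentation. The sketch below is a userspace model, not driver code: `struct deadline_tracker`, `enum boost_action`, `track_deadline()` and `BOOST_LEAD_NS` are hypothetical names, times are plain nanosecond integers instead of ktime_t, and the locking, hrtimer and kthread work are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lead time before the deadline at which to boost (3ms, as in the patch). */
#define BOOST_LEAD_NS (3 * 1000 * 1000LL)

/* Hypothetical result type: what the caller should do with the hint. */
enum boost_action {
	BOOST_IGNORE,    /* tracked deadline is still pending and earlier */
	BOOST_NOW,       /* already inside the lead window: boost immediately */
	BOOST_ARM_TIMER  /* arm a timer for *timer_expiry_ns */
};

/* Hypothetical per-timeline state; mirrors fctx->next_deadline. */
struct deadline_tracker {
	int64_t next_deadline_ns;
};

static enum boost_action track_deadline(struct deadline_tracker *t,
					int64_t now_ns, int64_t deadline_ns,
					int64_t *timer_expiry_ns)
{
	int64_t boost_at_ns;

	assert(t != NULL && timer_expiry_ns != NULL);

	/*
	 * Accept the hint only if the tracked deadline has already passed,
	 * or the new deadline is earlier than the tracked one.
	 */
	if (!(now_ns > t->next_deadline_ns ||
	      deadline_ns < t->next_deadline_ns))
		return BOOST_IGNORE;

	t->next_deadline_ns = deadline_ns;

	boost_at_ns = deadline_ns - BOOST_LEAD_NS;
	if (now_ns > boost_at_ns)
		return BOOST_NOW;

	*timer_expiry_ns = boost_at_ns;
	return BOOST_ARM_TIMER;
}
```

With a tracker initialised to the allocation time, a hint 16ms in the future arms the timer for 3ms before that point, a later hint arriving while the first is still pending is ignored, and a hint less than 3ms away asks for an immediate boost.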
On 03/03/2023 01:53, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Track the nearest deadline on a fence timeline and set a timer to expire shortly before to trigger boost if the fence has not yet been signaled.
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++ 2 files changed, 94 insertions(+)
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
A small question: do we boost to fit into the deadline or to miss the deadline for as little as possible? If the former is the case, we might need to adjust 3ms depending on the workload.
On Fri, Mar 3, 2023 at 2:10 AM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
On 03/03/2023 01:53, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Track the nearest deadline on a fence timeline and set a timer to expire shortly before to trigger boost if the fence has not yet been signaled.
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++ 2 files changed, 94 insertions(+)
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
A small question: do we boost to fit into the deadline or to miss the deadline for as little as possible? If the former is the case, we might need to adjust 3ms depending on the workload.
The goal is as much to run with higher clock on the next frame as it is to not miss a deadline. I.e. we don't want devfreq to come to the conclusion that running at <50% clks is best due to the amount of utilization caused by missing every other vblank.
But 3ms is mostly just "seems like a good compromise" value. It might change.
BR, -R
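To make the devfreq point concrete, here is a small sketch with made-up numbers showing how a GPU that misses every other vblank can look less busy than one that makes every frame. It assumes a 60Hz display (a period of 16667us), that a new frame only starts after the previous one has been presented, and that utilization is simply render time over wall time; `frames_spanned()` and `busy_percent()` are hypothetical helpers, and none of this models what devfreq actually measures.

```c
#include <assert.h>
#include <stdint.h>

/* Number of vblank periods a frame occupies when presentation is vsynced. */
static uint32_t frames_spanned(uint32_t render_us, uint32_t period_us)
{
	assert(period_us > 0);
	if (render_us == 0)
		return 1;
	/* Round up: a frame that misses a vblank waits for the next one. */
	return (render_us + period_us - 1) / period_us;
}

/* GPU busy time as a percentage of wall time, rounded down. */
static uint32_t busy_percent(uint32_t render_us, uint32_t period_us)
{
	uint64_t wall_us =
		(uint64_t)frames_spanned(render_us, period_us) * period_us;

	return (uint32_t)((100ull * render_us) / wall_us);
}
```

Under these assumptions a frame that renders in 10ms at 60Hz gives 59% utilization, while a frame that takes 17ms, and so spans two vblank periods, gives only 50%. A governor looking at utilization alone could therefore keep the clock low precisely when frames are being missed.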
On 03/03/2023 19:03, Rob Clark wrote:
On Fri, Mar 3, 2023 at 2:10 AM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
On 03/03/2023 01:53, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Track the nearest deadline on a fence timeline and set a timer to expire shortly before to trigger boost if the fence has not yet been signaled.
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
drivers/gpu/drm/msm/msm_fence.c | 74 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/msm/msm_fence.h | 20 +++++++++ 2 files changed, 94 insertions(+)
Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
A small question: do we boost to fit into the deadline or to miss the deadline for as little as possible? If the former is the case, we might need to adjust 3ms depending on the workload.
The goal is as much to run with higher clock on the next frame as it is to not miss a deadline. I.e. we don't want devfreq to come to the conclusion that running at <50% clks is best due to the amount of utilization caused by missing every other vblank.
Ack, thanks for the explanation.
But 3ms is mostly just "seems like a good compromise" value. It might change.
BR, -R
-- With best wishes Dmitry
From: Rob Clark robdclark@chromium.org
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+
+	if (i915_request_completed(rq))
+		return;
+
+	if (i915_request_started(rq))
+		return;
+
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future.  I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */
+
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };

 static void irq_execute_cb(struct irq_work *wrk)
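The gating in i915_fence_set_deadline() above is easy to misread, so here is a minimal userspace model of it. It is only an illustration: `struct request_model` and `model_set_deadline()` are hypothetical, the two flags stand in for i915_request_completed() and i915_request_started(), and a counter stands in for intel_rps_boost(). As in the patch, the deadline value itself plays no part in the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request; the flags mirror the two checks. */
struct request_model {
	bool completed;
	bool started;
	unsigned int boosts; /* counts calls that would reach the boost */
};

static void model_set_deadline(struct request_model *rq)
{
	assert(rq != NULL);

	/* Nothing left to speed up. */
	if (rq->completed)
		return;

	/* Already executing: this version of the patch skips the boost. */
	if (rq->started)
		return;

	/* Still queued: boost immediately, regardless of the deadline. */
	rq->boosts++;
}
```

In this model only a request that is still queued gets a boost, and a request that has started but not yet completed does not.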
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+
+	if (i915_request_completed(rq))
+		return;
+
+	if (i915_request_started(rq))
+		return;

Why do we skip the boost if already started? Don't we want to boost the freq anyway?

+
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future.  I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */

I'm afraid it will be very hard to find some heuristics of what's late enough for the boost, no? I mean, how early to boost the freq on an upcoming deadline for the timer?

+
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };

 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+
+	if (i915_request_completed(rq))
+		return;
+
+	if (i915_request_started(rq))
+		return;

Why do we skip the boost if already started? Don't we want to boost the freq anyway?

I'd wager Rob is just copying the current i915 wait boost logic.

+
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future.  I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */

I'm afraid it will be very hard to find some heuristics of what's late enough for the boost, no? I mean, how early to boost the freq on an upcoming deadline for the timer?

We can offload this patch from Rob and deal with it separately, or after the fact?

It's a half solution without a smarter scheduler too. Like https://lore.kernel.org/all/20210208105236.28498-10-chris@chris-wilson.co.uk..., or if GuC plans to do something like that at any point.

Or bump the priority too if deadline is looming?

IMO it is not very effective to fiddle with the heuristic on an ad-hoc basis. For instance I have a new heuristic which improves the problematic OpenCL cases by a further 5% (relative to the current waitboost improvement from adding missing syncobj waitboost). But I can't really test properly for regressions over platforms, stacks, workloads.. :(

Regards,

Tvrtko

+
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };

 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1
On Fri, Mar 03, 2023 at 09:58:36AM +0000, Tvrtko Ursulin wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+
+	if (i915_request_completed(rq))
+		return;
+
+	if (i915_request_started(rq))
+		return;

Why do we skip the boost if already started? Don't we want to boost the freq anyway?

I'd wager Rob is just copying the current i915 wait boost logic.

+
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future.  I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */

I'm afraid it will be very hard to find some heuristics of what's late enough for the boost, no? I mean, how early to boost the freq on an upcoming deadline for the timer?

We can offload this patch from Rob and deal with it separately, or after the fact?

It's a half solution without a smarter scheduler too. Like https://lore.kernel.org/all/20210208105236.28498-10-chris@chris-wilson.co.uk..., or if GuC plans to do something like that at any point.

Indeed, we already have the deadline implementation (and not just that), we just need to have some willingness to apply it.

Andi

Or bump the priority too if deadline is looming?

IMO it is not very effective to fiddle with the heuristic on an ad-hoc basis. For instance I have a new heuristic which improves the problematic OpenCL cases by a further 5% (relative to the current waitboost improvement from adding missing syncobj waitboost). But I can't really test properly for regressions over platforms, stacks, workloads.. :(

Regards,

Tvrtko

+
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };

 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }

+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+
+	if (i915_request_completed(rq))
+		return;
+
+	if (i915_request_started(rq))
+		return;

Why do we skip the boost if already started? Don't we want to boost the freq anyway?

I'd wager Rob is just copying the current i915 wait boost logic.

Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug

+
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future.  I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */

I'm afraid it will be very hard to find some heuristics of what's late enough for the boost, no? I mean, how early to boost the freq on an upcoming deadline for the timer?

We can offload this patch from Rob and deal with it separately, or after the fact?

That is completely my intention, I expect you to replace my i915 patch ;-)

Rough idea when everyone is happy with the core bits is to set up an immutable branch without the driver specific patches, which could be merged into drm-next and $driver-next, and then each driver team can add their own driver patches on top

BR, -R

It's a half solution without a smarter scheduler too. Like https://lore.kernel.org/all/20210208105236.28498-10-chris@chris-wilson.co.uk..., or if GuC plans to do something like that at any point.

Or bump the priority too if deadline is looming?

IMO it is not very effective to fiddle with the heuristic on an ad-hoc basis. For instance I have a new heuristic which improves the problematic OpenCL cases by a further 5% (relative to the current waitboost improvement from adding missing syncobj waitboost). But I can't really test properly for regressions over platforms, stacks, workloads.. :(

Regards,

Tvrtko

+
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };

 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1
On Fri, Mar 03, 2023 at 06:48:43AM -0800, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
I don't think i915 calls drm_atomic_helper_wait_for_fences() so that could explain something.
On Fri, Mar 03, 2023 at 05:00:03PM +0200, Ville Syrjälä wrote:
On Fri, Mar 03, 2023 at 06:48:43AM -0800, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
I don't think i915 calls drm_atomic_helper_wait_for_fences() so that could explain something.
Oh, I guess this wasn't even supposed to take over the current display boost stuff since you didn't remove the old one.
The current one just boosts after a missed vblank. The deadline could use your timer approach I suppose and boost already a bit earlier in the hopes of not missing the vblank.
On Fri, Mar 3, 2023 at 7:20 AM Ville Syrjälä ville.syrjala@linux.intel.com wrote:
On Fri, Mar 03, 2023 at 05:00:03PM +0200, Ville Syrjälä wrote:
On Fri, Mar 03, 2023 at 06:48:43AM -0800, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
I don't think i915 calls drm_atomic_helper_wait_for_fences() so that could explain something.
Oh, I guess this wasn't even supposed to take over the current display boost stuff since you didn't remove the old one.
Right, I didn't try to replace the current thing.. but hopefully at least make it possible for i915 to use more of the atomic helpers in the future
BR, -R
The current one just boosts after a missed vblank. The deadline could use your timer approach I suppose and boost already a bit earlier in the hopes of not missing the vblank.
-- Ville Syrjälä Intel
On 03/03/2023 14:48, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
Hm, there I have preserved this same !i915_request_started logic.
Presumably it's not just fewer boosts but lower performance. How is he setting the deadline? Somehow from clFlush or so?
Regards,
Tvrtko
P.S. Take note that I did not post the latest version of my RFC, the one where I fix the fence chain and array misses you pointed out. I did not think it would be worthwhile given no universal love for it, but if people are testing with it more widely than I was aware, perhaps I should.
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future. I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */
I'm afraid it will be very hard to find a heuristic for what's late enough for the boost, no? I mean, how early should the timer boost the freq ahead of an upcoming deadline?
Can we offload this patch from Rob and deal with it separately, or after the fact?
That is completely my intention, I expect you to replace my i915 patch ;-)
The rough idea, once everyone is happy with the core bits, is to set up an immutable branch without the driver-specific patches, which could be merged into drm-next and $driver-next, and then each driver team can add their own driver patches on top
BR, -R
It's a half solution without a smarter scheduler too. Like https://lore.kernel.org/all/20210208105236.28498-10-chris@chris-wilson.co.uk..., or if GuC plans to do something like that at any point.
Or bump the priority too if deadline is looming?
IMO it is not very effective to fiddle with the heuristic on an ad-hoc basis. For instance I have a new heuristic which improves the problematic OpenCL cases by a further 5% (relative to the current waitboost improvement from adding the missing syncobj waitboost). But I can't really test properly for regressions across platforms, stacks, workloads.. :(
Regards,
Tvrtko
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };
 static void irq_execute_cb(struct irq_work *wrk)
--
2.39.1
On Fri, Mar 3, 2023 at 7:08 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 14:48, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
Hm, there I have preserved this same !i915_request_started logic.
Presumably it's not just fewer boosts but lower performance. How is he setting the deadline? Somehow from clFlush or so?
Yeah, fewer boosts, lower freq/perf.. I cobbled together a quick mesa hack to set the DEADLINE flag on syncobj waits, but it seems likely that I missed something somewhere
BR, -R
Regards,
Tvrtko
P.S. Take note that I did not post the latest version of my RFC, the one where I fix the fence chain and array misses you pointed out. I did not think it would be worthwhile given no universal love for it, but if people are testing with it more widely than I was aware, perhaps I should.
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future. I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */
I'm afraid it will be very hard to find a heuristic for what's late enough for the boost, no? I mean, how early should the timer boost the freq ahead of an upcoming deadline?
Can we offload this patch from Rob and deal with it separately, or after the fact?
That is completely my intention, I expect you to replace my i915 patch ;-)
The rough idea, once everyone is happy with the core bits, is to set up an immutable branch without the driver-specific patches, which could be merged into drm-next and $driver-next, and then each driver team can add their own driver patches on top
BR, -R
It's a half solution without a smarter scheduler too. Like https://lore.kernel.org/all/20210208105236.28498-10-chris@chris-wilson.co.uk..., or if GuC plans to do something like that at any point.
Or bump the priority too if deadline is looming?
IMO it is not very effective to fiddle with the heuristic on an ad-hoc basis. For instance I have a new heuristic which improves the problematic OpenCL cases by a further 5% (relative to the current waitboost improvement from adding the missing syncobj waitboost). But I can't really test properly for regressions across platforms, stacks, workloads.. :(
Regards,
Tvrtko
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };
 static void irq_execute_cb(struct irq_work *wrk)
--
2.39.1
On Fri, Mar 3, 2023 at 10:08 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 14:48, Rob Clark wrote:
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 03/03/2023 03:21, Rodrigo Vivi wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
I'd wager Rob is just copying the current i915 wait boost logic.
Yup, and probably incorrectly.. Matt reported fewer boosts/sec compared to your RFC, this could be the bug
Hm, there I have preserved this same !i915_request_started logic.
Presumably it's not just fewer boosts but lower performance. How is he setting the deadline? Somehow from clFlush or so?
Regards,
Tvrtko
P.S. Take note that I did not post the latest version of my RFC, the one where I fix the fence chain and array misses you pointed out. I did not think it would be worthwhile given no universal love for it, but if people are testing with it more widely than I was aware, perhaps I should.
Yep, that would be great. We're interested in it for ChromeOS. Please Cc me on the series when you send it.
On Thu, Mar 2, 2023 at 7:21 PM Rodrigo Vivi rodrigo.vivi@intel.com wrote:
On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
missing some wording here...
the wording should be "Pls replace this patch, kthx" ;-)
v2: rebase
Signed-off-by: Rob Clark robdclark@chromium.org
 drivers/gpu/drm/i915/i915_request.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
 	return i915_request_enable_breadcrumb(to_request(fence));
 }
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct i915_request *rq = to_request(fence);
+	if (i915_request_completed(rq))
+		return;
+	if (i915_request_started(rq))
+		return;
why do we skip the boost if already started? don't we want to boost the freq anyway?
+	/*
+	 * TODO something more clever for deadlines that are in the
+	 * future. I think probably track the nearest deadline in
+	 * rq->timeline and set timer to trigger boost accordingly?
+	 */
I'm afraid it will be very hard to find a heuristic for what's late enough for the boost, no? I mean, how early should the timer boost the freq ahead of an upcoming deadline?
So, from my understanding of i915 boosting, it applies more specifically to a given request (vs msm which just bumps up the freq until the next devfreq sampling period which then recalculates target freq based on busy cycles and avg freq over the last sampling period). For msm I just set a timer for 3ms before the deadline and boost if the fence isn't signaled when the timer fires. It is kinda impossible to predict, even for userspace, how long a job will take to complete, so the goal isn't really to finish the specified job by the deadline, but instead to avoid devfreq landing at a local minimum (maximum?)
AFAIU what I _think_ would make sense for i915 is to do the same thing you do if you miss a vblank pageflip deadline if the deadline passes without the fence signaling.
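To make the two variants above concrete, here is a minimal userspace sketch (not kernel code, and not what msm or i915 actually implement). It assumes a hypothetical `struct fake_fence` and hypothetical helpers `boost_timer_expiry()` and `should_boost_on_timer()`: arm a timer some lead time before the deadline, and boost only if the fence has still not signaled when the timer fires. A lead of 3ms models the msm behaviour described above; a lead of 0 models "boost once the deadline has passed".

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* Hypothetical stand-in for a fence carrying a deadline hint; not a kernel type. */
struct fake_fence {
	bool signaled;
	int64_t deadline_ns;	/* absolute time in ns, e.g. CLOCK_MONOTONIC */
};

/*
 * Absolute time at which the boost timer should fire: lead_ns before the
 * deadline, clamped so that it is never earlier than now_ns.
 */
static int64_t boost_timer_expiry(const struct fake_fence *f,
				  int64_t now_ns, int64_t lead_ns)
{
	int64_t expiry = f->deadline_ns - lead_ns;

	return expiry > now_ns ? expiry : now_ns;
}

/* Timer callback decision: boost only if the fence is still unsignaled. */
static bool should_boost_on_timer(const struct fake_fence *f)
{
	return !f->signaled;
}
```

With a deadline 16ms out and a 3ms lead, the timer would be armed for 13ms from now; if the deadline is already closer than the lead, the timer fires immediately. Picking the lead is exactly the open heuristic question raised above, so it is left as a parameter here.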
BR, -R
+	intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
 	.signaled = i915_fence_signaled,
 	.wait = i915_fence_wait,
 	.release = i915_fence_release,
+	.set_deadline = i915_fence_set_deadline,
 };
 static void irq_execute_cb(struct irq_work *wrk)
--
2.39.1
linaro-mm-sig@lists.linaro.org