We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as the pair of requests are allowed to run in parallel and not in strict succession. So, for example, we do not need to suspend one if the other hangs.
The real significance, though, is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along a single engine, and so can resolve user-level inter-batch scheduling dependencies from user semaphores.
Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 9 +++++++++
 drivers/gpu/drm/i915/i915_request.c         | 8 ++++++--
 drivers/gpu/drm/i915/i915_scheduler.c       | 6 +++---
 drivers/gpu/drm/i915/i915_scheduler.h       | 3 ++-
 drivers/gpu/drm/i915/i915_scheduler_types.h | 1 +
 5 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index dc3f2ee7136d..10109f661bcb 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1880,6 +1880,9 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2726,6 +2729,9 @@ static void __execlists_hold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2850,6 +2856,9 @@ static void __execlists_unhold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Propagate any change in error status */
 			if (rq->fence.error)
 				i915_request_set_error_once(w, rq->fence.error);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4d18f808fda2..3c38d61c90f8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1040,7 +1040,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 	}
 
 	if (to->engine->schedule) {
-		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		ret = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     I915_DEPENDENCY_EXTERNAL);
 		if (ret < 0)
 			return ret;
 	}
@@ -1202,7 +1204,9 @@ __i915_request_await_execution(struct i915_request *to,
 
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->schedule) {
-		err = i915_sched_node_add_dependency(&to->sched, &from->sched);
+		err = i915_sched_node_add_dependency(&to->sched,
+						     &from->sched,
+						     I915_DEPENDENCY_WEAK);
 		if (err < 0)
 			return err;
 	}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 37cfcf5b321b..6e2d4190099f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -462,7 +462,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 }
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal)
+				   struct i915_sched_node *signal,
+				   unsigned long flags)
 {
 	struct i915_dependency *dep;
 
@@ -473,8 +474,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	local_bh_disable();
 
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
-					      I915_DEPENDENCY_EXTERNAL |
-					      I915_DEPENDENCY_ALLOC))
+					      flags | I915_DEPENDENCY_ALLOC))
 		i915_dependency_free(dep);
 
 	local_bh_enable(); /* kick submission tasklet */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index d1dc4efef77b..6f0bf00fc569 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -34,7 +34,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      unsigned long flags);
 
 int i915_sched_node_add_dependency(struct i915_sched_node *node,
-				   struct i915_sched_node *signal);
+				   struct i915_sched_node *signal,
+				   unsigned long flags);
 
 void i915_sched_node_fini(struct i915_sched_node *node);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index d18e70550054..7186875088a0 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -78,6 +78,7 @@ struct i915_dependency {
 	unsigned long flags;
 #define I915_DEPENDENCY_ALLOC		BIT(0)
 #define I915_DEPENDENCY_EXTERNAL	BIT(1)
+#define I915_DEPENDENCY_WEAK		BIT(2)
 };
 
 #endif /* _I915_SCHEDULER_TYPES_H_ */
On 07/05/2020 09:21, Chris Wilson wrote:
> We recorded the dependencies for WAIT_FOR_SUBMIT in order that we
> could correctly perform priority inheritance from the parallel
> branches to the common trunk. However, for the purpose of timeslicing
> and reset handling, the dependency is weak -- as the pair of requests
> are allowed to run in parallel and not in strict succession.
>
> [snip]
>
> @@ -1880,6 +1880,9 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
>  			struct i915_request *w =
>  				container_of(p->waiter, typeof(*w), sched);
>
> +			if (p->flags & I915_DEPENDENCY_WEAK)
> +				continue;
> +
>  			/* Leave semaphores spinning on the other engines */
>  			if (w->engine != rq->engine)
>  				continue;
I did not quite get it - a submit fence dependency would mean different engines, so the check below (w->engine != rq->engine) would effectively have the same effect. What am I missing?
Regards,
Tvrtko
Quoting Tvrtko Ursulin (2020-05-07 15:53:08)
> On 07/05/2020 09:21, Chris Wilson wrote:
> > +			if (p->flags & I915_DEPENDENCY_WEAK)
> > +				continue;
>
> I did not quite get it - a submit fence dependency would mean different
> engines, so the check below (w->engine != rq->engine) would effectively
> have the same effect. What am I missing?
That submit fences can be between different contexts on the same engine. The example (from mesa) is where we have two interdependent clients which are using their own userlevel scheduling inside each batch, i.e. waiting on semaphores.

-Chris
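Concretely, the user-level scheduling described here boils down to each batch polling a shared dword with MI_SEMAPHORE_WAIT until the other client writes the expected value. A minimal sketch of emitting such a wait on gen8+ follows (the opcode encodings mirror the kernel's intel_gpu_commands.h; the address and value are hypothetical):

	#include <stdint.h>

	#define MI_SEMAPHORE_WAIT	((0x1c << 23) | 2)	/* gen8+, 4-dword command */
	#define MI_SEMAPHORE_POLL	(1 << 15)		/* poll instead of signal mode */
	#define MI_SEMAPHORE_SAD_EQ_SDD	(4 << 12)		/* wait until *addr == value */

	static uint32_t *emit_semaphore_wait(uint32_t *cs, uint64_t addr, uint32_t value)
	{
		*cs++ = MI_SEMAPHORE_WAIT | MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
		*cs++ = value;			/* semaphore data dword to compare against */
		*cs++ = (uint32_t)addr;		/* semaphore address, low 32 bits */
		*cs++ = (uint32_t)(addr >> 32);	/* semaphore address, high 32 bits */
		return cs;
	}

Two clients each emit a wait on the other's breadcrumb; on a single engine, whichever batch is submitted first spins until it is timesliced out, which is exactly what the weak dependency permits.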
On 07/05/2020 16:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-07 15:53:08)
> > I did not quite get it - a submit fence dependency would mean
> > different engines, so the check below (w->engine != rq->engine)
> > would effectively have the same effect. What am I missing?
>
> That submit fences can be between different contexts on the same
> engine. The example (from mesa) is where we have two interdependent
> clients which are using their own userlevel scheduling inside each
> batch, i.e. waiting on semaphores.
But if a submit fence was used, that means the waiter should never be submitted ahead of the signaler. And with this change it could get ahead in the priolist, no?
Regards,
Tvrtko
Quoting Tvrtko Ursulin (2020-05-07 16:23:59)
> On 07/05/2020 16:00, Chris Wilson wrote:
> > That submit fences can be between different contexts on the same
> > engine. The example (from mesa) is where we have two interdependent
> > clients which are using their own userlevel scheduling inside each
> > batch, i.e. waiting on semaphores.
>
> But if a submit fence was used, that means the waiter should never be
> submitted ahead of the signaler. And with this change it could get
> ahead in the priolist, no?
You do recall the starting point for this series was future fences :)
The test case for this is:
	execbuf.flags = engine | I915_EXEC_FENCE_OUT;
	execbuf.batch_start_offset = 0;
	gem_execbuf_wr(i915, &execbuf);

	execbuf.rsvd1 = gem_context_clone_with_engines(i915, 0);
	execbuf.rsvd2 >>= 32;
	execbuf.flags = e->flags;
	execbuf.flags |= I915_EXEC_FENCE_SUBMIT | I915_EXEC_FENCE_OUT;
	execbuf.batch_start_offset = offset;
	gem_execbuf_wr(i915, &execbuf);
	gem_context_destroy(i915, execbuf.rsvd1);

	gem_sync(i915, obj.handle);

	/* no hangs! */
	out = execbuf.rsvd2;
	igt_assert_eq(sync_fence_status(out), 1);
	close(out);

	out = execbuf.rsvd2 >> 32;
	igt_assert_eq(sync_fence_status(out), 1);
	close(out);
Where the batches are a couple of semaphore waits, which is the essence of a bonded request but being run on a single engine.
Unless we treat the submit fence as a weak dependency here, we can't timeslice between the two.
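In scheduler terms, the effect of the weak flag is roughly the following (a paraphrase of the defer_request() loop in intel_lrc.c from the patch above; defer_waiter() is a hypothetical stand-in for the actual list manipulation):

	for_each_waiter(p, rq) {
		struct i915_request *w =
			container_of(p->waiter, typeof(*w), sched);

		/* A submit-fence co-runner: leave it queued so the
		 * engine may timeslice between rq and w.
		 */
		if (p->flags & I915_DEPENDENCY_WEAK)
			continue;

		/* Leave semaphores spinning on the other engines */
		if (w->engine != rq->engine)
			continue;

		defer_waiter(w, pl);	/* hypothetical: requeue w behind rq */
	}

With a strong dependency, w would be requeued behind rq and could never run while rq spins; skipping the weak edge leaves both requests in the priolist so the timeslicer can alternate between them.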
The other observation is that we may not have to suspend the request if the other hangs, as the linkage is implicit. If the request does continue to wait on the hung request, we can only hope it too hangs. I should make that a second patch, since it is a distinct change.

-Chris
On 07/05/2020 16:34, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-07 16:23:59)
> > But if a submit fence was used, that means the waiter should never
> > be submitted ahead of the signaler. And with this change it could
> > get ahead in the priolist, no?
>
> You do recall the starting point for this series was future fences :)
>
> [snip the igt/gem_exec_fence/submit test case]
>
> Where the batches are a couple of semaphore waits, which is the
> essence of a bonded request but being run on a single engine.
>
> Unless we treat the submit fence as a weak dependency here, we can't
> timeslice between the two.
Yes, it is fine; once the initial submit was controlled by a fence, it is okay to re-order.
Regards,
Tvrtko
Quoting Tvrtko Ursulin (2020-05-07 18:55:17)
> On 07/05/2020 16:34, Chris Wilson wrote:
> > Where the batches are a couple of semaphore waits, which is the
> > essence of a bonded request but being run on a single engine.
> >
> > Unless we treat the submit fence as a weak dependency here, we can't
> > timeslice between the two.
>
> Yes, it is fine; once the initial submit was controlled by a fence, it
> is okay to re-order.
Right. That we can't submit the second batch before the master, even if the master is blocked, is covered in gem_exec_fence/parallel. Hmm, but that only covers submit fences on other engines (simply because it wants to check the submit fences complete before the master, and didn't assume timeslicing). So there's room for another test here, where we have both requests on the same engine, and block the master.

-Chris
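Such a subtest might look roughly like the sketch below (hypothetical and untested; batch_create() is an assumed helper returning a handle to a valid batch, while the rest are standard IGT library calls):

	igt_spin_t *spin;
	struct drm_i915_gem_exec_object2 obj = { .handle = batch_create(i915) };
	struct drm_i915_gem_execbuffer2 execbuf = {
		.buffers_ptr = to_user_pointer(&obj),
		.buffer_count = 1,
	};

	/* Master: a spinner that blocks the engine and exports a fence. */
	spin = igt_spin_new(i915, .engine = engine, .flags = IGT_SPIN_FENCE_OUT);

	/* Slave: a second context on the *same* engine, coupled on submit. */
	execbuf.rsvd1 = gem_context_clone_with_engines(i915, 0);
	execbuf.rsvd2 = spin->out_fence;
	execbuf.flags = engine | I915_EXEC_FENCE_SUBMIT;
	gem_execbuf(i915, &execbuf);

	/* With the weak dependency, the slave may be timesliced past the
	 * blocked master and complete while the spinner still runs.
	 */
	gem_sync(i915, obj.handle);

	igt_spin_end(spin);
	igt_spin_free(i915, spin);
	gem_context_destroy(i915, execbuf.rsvd1);

The key difference from gem_exec_fence/parallel is that the slave sits on the same engine as the blocked master, so it can only complete if the weak dependency allows it to be timesliced past the spinner.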