[PATCH v3 0/2] Fix error propagation amongst request


Andi Shyti

28 Feb 2023

2:11 a.m.

Hi,

This series of two patches fixes the issue introduced in cf586021642d80 ("drm/i915/gt: Pipelined page migration") where, as reported by Matt, an error in a chain of requests is reported only if it happens in the last request.

However, Chris noticed that without exclusive locking we might end up in a deadlock. That's why patch 1 throttles for ring space before taking the timeline mutex, making sure that no one is holding it.

Version 1 of this patch was reviewed by Matt; this version adds Chris's exclusive locking.

Thanks Chris for this work.

Andi

Changelog
=========
v1 -> v2
 - Add patch 1 for ensuring exclusive locking of the timeline
 - Reword git commit of patch 2.

Andi Shyti (1): drm/i915/gt: Make sure that errors are propagated through request chains

Chris Wilson (1): drm/i915: Throttle for ringspace prior to taking the timeline mutex

 drivers/gpu/drm/i915/gt/intel_context.c | 41 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h |  2 ++
 drivers/gpu/drm/i915/gt/intel_migrate.c | 39 +++++++++++++++++------
 drivers/gpu/drm/i915/i915_request.c     |  3 ++
 4 files changed, 75 insertions(+), 10 deletions(-)

-- 2.39.1


Andi Shyti

28 Feb

2:11 a.m.

New subject: [PATCH v3 1/2] drm/i915: Throttle for ringspace prior to taking the timeline mutex

From: Chris Wilson <chris@chris-wilson.co.uk>

Before taking exclusive ownership of the ring for emitting the request, wait for space in the ring to become available. This allows others to take the timeline->mutex to make forward progress while userspace is blocked.

In particular, this allows regular clients to issue requests on the kernel context, potentially filling the ring, while still allowing the higher-priority heartbeats and pulses to be submitted without being blocked by less critical work.

Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
---
Hi,

I'm not sure I need to add the Fixes tag here, as this patch is mostly preparation for the next one. Together, though, patches 1 and 2 make the fix with a proper locking mechanism.

Andi

 drivers/gpu/drm/i915/gt/intel_context.c | 41 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h |  2 ++
 drivers/gpu/drm/i915/i915_request.c     |  3 ++
 3 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 2aa63ec521b89..59cd612a23561 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -626,6 +626,47 @@ bool intel_context_revoke(struct intel_context *ce)
 	return ret;
 }
 
+int intel_context_throttle(const struct intel_context *ce)
+{
+	const struct intel_ring *ring = ce->ring;
+	const struct intel_timeline *tl = ce->timeline;
+	struct i915_request *rq;
+	int err = 0;
+
+	if (READ_ONCE(ring->space) >= SZ_1K)
+		return 0;
+
+	rcu_read_lock();
+	list_for_each_entry_reverse(rq, &tl->requests, link) {
+		if (__i915_request_is_complete(rq))
+			break;
+
+		if (rq->ring != ring)
+			continue;
+
+		/* Wait until there will be enough space following that rq */
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit,
+				       ring->size) < ring->size / 2) {
+			if (i915_request_get_rcu(rq)) {
+				rcu_read_unlock();
+
+				if (i915_request_wait(rq,
+						      I915_WAIT_INTERRUPTIBLE,
+						      MAX_SCHEDULE_TIMEOUT) < 0)
+					err = -EINTR;
+
+				rcu_read_lock();
+				i915_request_put(rq);
+			}
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return err;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 0a8d553da3f43..f919a66cebf5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -226,6 +226,8 @@ static inline void intel_context_exit(struct intel_context *ce)
 	ce->ops->exit(ce);
 }
 
+int intel_context_throttle(const struct intel_context *ce);
+
 static inline struct intel_context *intel_context_get(struct intel_context *ce)
 {
 	kref_get(&ce->ref);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043bb..a1741c4a8cffd 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1035,6 +1035,9 @@ i915_request_create(struct intel_context *ce)
 	struct i915_request *rq;
 	struct intel_timeline *tl;
 
+	if (intel_context_throttle(ce))
+		return ERR_PTR(-EINTR);
+
 	tl = intel_context_timeline_lock(ce);
 	if (IS_ERR(tl))
 		return ERR_CAST(tl);

-- 2.39.1

Andi Shyti

2:11 a.m.

New subject: [PATCH v3 2/2] drm/i915/gt: Make sure that errors are propagated through request chains

Currently, when we perform operations such as clearing or copying large blocks of memory, we generate multiple requests that are executed in a chain.

However, if one of these requests fails, we may not realize it unless it happens to be the last request in the chain. This is because errors are not properly propagated.

To address this issue, we need to ensure that the chain of fence notifications is always propagated so that we can reach the final fence associated with the last request. By doing so, we will be able to detect any memory operation failures and determine whether the memory is still invalid.

On copy and clear migration, signal fences upon request completion to ensure that the outcome of the operation is reliably propagated.

Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration")
Reported-by: Matthew Auld <matthew.auld@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_migrate.c | 39 ++++++++++++++++++-------
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c
index 3f638f1987968..6b497640d3a0a 100644
--- a/drivers/gpu/drm/i915/gt/intel_migrate.c
+++ b/drivers/gpu/drm/i915/gt/intel_migrate.c
@@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce,
 		dst_offset = 2 * CHUNK_SZ;
 	}
 
+	/*
+	 * While building the chain of requests, we need to ensure
+	 * that no one can sneak into the timeline unnoticed.
+	 */
+	mutex_lock(&ce->timeline->mutex);
+
 	do {
 		int len;
 
 		rq = i915_request_create(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce,
 
 		/* Arbitration is re-enabled between requests. */
 out_rq:
-		if (*out)
-			i915_request_put(*out);
-		*out = i915_request_get(rq);
+		i915_sw_fence_await(&rq->submit);
+		i915_request_get(rq);
 		i915_request_add(rq);
+		if (*out) {
+			i915_sw_fence_complete(&(*out)->submit);
+			i915_request_put(*out);
+		}
+		*out = rq;
 
 		if (err)
 			break;
@@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce,
 		cond_resched();
 	} while (1);
 
-out_ce:
+	mutex_unlock(&ce->timeline->mutex);
+
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }
 
@@ -1005,7 +1018,7 @@ intel_context_migrate_clear(struct intel_context *ce,
 		rq = i915_request_create(ce);
 		if (IS_ERR(rq)) {
 			err = PTR_ERR(rq);
-			goto out_ce;
+			break;
 		}
 
 		if (deps) {
@@ -1056,17 +1069,23 @@ intel_context_migrate_clear(struct intel_context *ce,
 
 		cond_resched();
 	} while (1);
 
-out_ce:
+	if (*out)
+		i915_sw_fence_complete(&(*out)->submit);
 	return err;
 }

-- 2.39.1

Gwan-gyeong Mun

7 Mar

7:33 a.m.

New subject: [Intel-gfx] [PATCH v3 0/2] Fix error propagation amongst request

Hi Andi,

After applying these two patches, a deadlock is detected in the call stack below. Please check whether the change to intel_context_migrate_copy() caused the deadlock.

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114451v1/bat-dg2-8/igt@i9...

<4> [33.070967] ============================================
<4> [33.070968] WARNING: possible recursive locking detected
<4> [33.070969] 6.2.0-Patchwork_114451v1-g8589fd9227ca+ #1 Not tainted
<4> [33.070970] --------------------------------------------
<4> [33.070971] i915_module_loa/948 is trying to acquire lock:
<4> [33.070972] ffff8881127f0478 (migrate){+.+.}-{3:3}, at: i915_request_create+0x1c6/0x230 [i915]
<4> [33.071215] but task is already holding lock:
<4> [33.071235] ffff8881127f0478 (migrate){+.+.}-{3:3}, at: intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.071484] other info that might help us debug this:
<4> [33.071504] Possible unsafe locking scenario:
<4> [33.071522] CPU0
<4> [33.071532] ----
<4> [33.071541] lock(migrate);
<4> [33.071554] lock(migrate);
<4> [33.071567] *** DEADLOCK ***
<4> [33.071585] May be due to missing lock nesting notation
<4> [33.071606] 3 locks held by i915_module_loa/948:
<4> [33.071622] #0: ffffc90001eb7b70 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0xae2/0x21c0 [i915]
<4> [33.071893] #1: ffff8881127b9c28 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __intel_context_do_pin_ww+0x7a/0xa30 [i915]
<4> [33.072133] #2: ffff8881127f0478 (migrate){+.+.}-{3:3}, at: intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.072384] stack backtrace:
<4> [33.072399] CPU: 7 PID: 948 Comm: i915_module_loa Not tainted 6.2.0-Patchwork_114451v1-g8589fd9227ca+ #1
<4> [33.072428] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X220.B00.2103302221 03/30/2021
<4> [33.072465] Call Trace:
<4> [33.072475] <TASK>
<4> [33.072486] dump_stack_lvl+0x5b/0x85
<4> [33.072503] __lock_acquire.cold+0x158/0x33b
<4> [33.072524] lock_acquire+0xd6/0x310
<4> [33.072541] ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.072812] __mutex_lock+0x95/0xf40
<4> [33.072829] ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073093] ? rcu_read_lock_sched_held+0x55/0x80
<4> [33.073112] ? __mutex_lock+0x133/0xf40
<4> [33.073128] ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073388] ? intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.073619] ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073876] i915_request_create+0x1c6/0x230 [i915]
<4> [33.074135] intel_context_migrate_copy+0x1d0/0xa80 [i915]
<4> [33.074360] __i915_ttm_move+0x7a8/0x940 [i915]
<4> [33.074538] ? _raw_spin_unlock_irqrestore+0x41/0x70
<4> [33.074552] ? dma_resv_iter_next+0x91/0xb0
<4> [33.074564] ? dma_resv_iter_first+0x42/0xb0
<4> [33.074576] ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4> [33.074744] i915_ttm_move+0x2ac/0x430 [i915]
<4> [33.074910] ttm_bo_handle_move_mem+0xb5/0x140 [ttm]
<4> [33.074930] ttm_bo_validate+0xe9/0x1a0 [ttm]
<4> [33.074947] __i915_ttm_get_pages+0x4e/0x190 [i915]
<4> [33.075112] i915_ttm_get_pages+0xf3/0x160 [i915]
<4> [33.075280] ____i915_gem_object_get_pages+0x36/0xb0 [i915]
<4> [33.075446] __i915_gem_object_get_pages+0x95/0xa0 [i915]
<4> [33.075608] i915_vma_get_pages+0xfa/0x160 [i915]
<4> [33.075779] i915_vma_pin_ww+0xdc/0xb50 [i915]
<4> [33.075953] eb_validate_vmas+0x1c6/0xac0 [i915]
<4> [33.076114] i915_gem_do_execbuffer+0xb2a/0x21c0 [i915]
<4> [33.076276] ? __stack_depot_save+0x3f/0x4e0
<4> [33.076292] ? 0xffffffff81000000
<4> [33.076301] ? _raw_spin_unlock_irq+0x41/0x50
<4> [33.076312] ? lockdep_hardirqs_on+0xc3/0x140
<4> [33.076325] ? set_track_update+0x25/0x50
<4> [33.076338] ? __lock_acquire+0x5f2/0x2130
<4> [33.076356] i915_gem_execbuffer2_ioctl+0x123/0x2e0 [i915]
<4> [33.076519] ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915]
<4> [33.076679] drm_ioctl_kernel+0xb4/0x150
<4> [33.076692] drm_ioctl+0x21d/0x420
<4> [33.076703] ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915]
<4> [33.076864] ? __vm_munmap+0xd3/0x170
<4> [33.076877] __x64_sys_ioctl+0x76/0xb0
<4> [33.076889] do_syscall_64+0x3c/0x90
<4> [33.076900] entry_SYSCALL_64_after_hwframe+0x72/0xdc
<4> [33.076913] RIP: 0033:0x7f304aa903ab
<4> [33.076923] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
<4> [33.076957] RSP: 002b:00007fffb1424cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [33.076975] RAX: ffffffffffffffda RBX: 00007fffb1424da0 RCX: 00007f304aa903ab
<4> [33.076990] RDX: 00007fffb1424da0 RSI: 0000000040406469 RDI: 0000000000000005
<4> [33.077004] RBP: 0000000040406469 R08: 0000000000000005 R09: 0000000100003000
<4> [33.077019] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000010000
<4> [33.077034] R13: 0000000000000005 R14: 00000000ffffffff R15: 00000000000056a0
<4> [33.077052] </TASK>

Br,

G.G.

On 2/28/23 4:11 AM, Andi Shyti wrote:

...


Andi Shyti

9:45 a.m.

New subject: [Intel-gfx] [PATCH v3 0/2] Fix error propagation amongst request

Hi GG,

On Tue, Mar 07, 2023 at 09:33:12AM +0200, Gwan-gyeong Mun wrote:

...

Hi Andi,

After applying these two patches, a deadlock is detected in the call stack below. Please check whether the change to intel_context_migrate_copy() caused the deadlock.

https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114451v1/bat-dg2-8/igt@i9...

Thanks for looking into this. Yes, there is a basic locking issue here, coming from migrate: migrate() takes the timeline lock and then calls request_create(), which tries to take the same lock again, so we inevitably deadlock.

The locking of the timeline is quite exotic: the lock is taken in request_create() and released in request_add().

It's still in trybot, but this is supposed to be the next version:

https://patchwork.freedesktop.org/series/114645/

This introduces request_create_locked() and request_add_locked(), variants in which the timeline is not locked in the process.

There are still some selftests that need to be fixed, though.

Andi

...



linux-stable-mirror@lists.linaro.org


participants (2)

Andi Shyti
Gwan-gyeong Mun