On 12/10/2024 4:38 PM, Lucas De Marchi wrote:
On Thu, Dec 05, 2024 at 03:33:46PM +0100, Nirmoy Das wrote:
On 12/5/2024 1:40 PM, Matthew Auld wrote:
On 05/12/2024 12:02, Nirmoy Das wrote:
There could still be a migration job in progress while xe_tt_unmap_sg() runs, which could trigger GPU page faults. Fix this by waiting for the migration job to finish before unmapping.
v2: Use intr=false (Matt A)
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3466
Fixes: 75521e8b56e8 ("drm/xe: Perform dma_map when moving system buffer objects to TT")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Cc: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Ok, so this is something like ttm_bo_move_to_ghost() doing a pipeline move for tt -> system, but we then do xe_tt_unmap_sg() too early, which tears down the IOMMU (if enabled) mappings whilst the job is still in progress?
Yes, this is exactly what is happening in this issue.
Maybe add some more info to the commit message?
I will add more details.
Are you going to send a new version?
I was waiting for more reviews. I have sent out v3 with an updated commit message.
Once this is fixed, please also send an MR reverting the kconfig workaround 3940181b1bad in gitlab.freedesktop.org/drm/xe/ci.git.
I will do that.
Thanks,
Nirmoy
Lucas De Marchi
I think this for sure fixes it. Just wondering if it's somehow possible to keep the mapping until the job is done, since all tt -> sys moves are now synced here?
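One hypothetical way to keep the mapping alive until the job is done, without blocking in the move path, is to defer the teardown to a completion callback on the job's fence. The sketch below is a single-threaded userspace model with hypothetical names (struct fence, fence_add_callback(), fence_signal(), bo_move_deferred()); it is not the kernel dma_fence API, and a real version would also have to respect whatever constraints apply to the context a fence callback runs in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fence with a single completion callback (NOT the kernel dma_fence API). */
struct fence;
typedef void (*fence_cb_t)(struct fence *f, void *data);

struct fence {
	bool signaled;
	fence_cb_t cb;
	void *cb_data;
};

/* Runs cb immediately if the fence already signaled, otherwise stores it for later. */
static void fence_add_callback(struct fence *f, fence_cb_t cb, void *data)
{
	if (f->signaled) {
		cb(f, data);
		return;
	}
	f->cb = cb;
	f->cb_data = data;
}

/* Marks the job complete and runs the pending callback, if any, exactly once. */
static void fence_signal(struct fence *f)
{
	f->signaled = true;
	if (f->cb) {
		fence_cb_t cb = f->cb;

		f->cb = NULL;
		cb(f, f->cb_data);
	}
}

/* Hypothetical buffer object; free() stands in for the IOMMU/DMA unmap. */
struct bo {
	int *mapping;
	int unmap_count;
};

/* Deferred teardown: only runs once the job's fence has signaled. */
static void bo_unmap_cb(struct fence *f, void *data)
{
	struct bo *bo = data;

	(void)f;
	free(bo->mapping);
	bo->mapping = NULL;
	bo->unmap_count++;
}

/* Move path: returns immediately, leaving the mapping in place until the job finishes. */
static void bo_move_deferred(struct bo *bo, struct fence *job_done)
{
	fence_add_callback(job_done, bo_unmap_cb, bo);
}
```

Here bo_move_deferred() returns with the mapping still in place, and the unmap happens only when fence_signal() is called on the job's fence. Compared with a synchronous wait, this trades extra lifetime tracking for not stalling the move path.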
Unless Thomas has a better idea here,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Thanks,
Nirmoy
 drivers/gpu/drm/xe/xe_bo.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index b2aa368a23f8..c906a5529db0 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -857,8 +857,16 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 
 out:
 	if ((!ttm_bo->resource || ttm_bo->resource->mem_type == XE_PL_SYSTEM) &&
-	    ttm_bo->ttm)
+	    ttm_bo->ttm) {
+		long timeout = dma_resv_wait_timeout(ttm_bo->base.resv,
+						     DMA_RESV_USAGE_BOOKKEEP,
+						     false,
+						     MAX_SCHEDULE_TIMEOUT);
+		if (timeout < 0)
+			ret = timeout;
+
 		xe_tt_unmap_sg(ttm_bo->ttm);
+	}
 
 	return ret;
 }
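As a userspace analogy only (not kernel code), the ordering this patch enforces can be modeled with a pthread condition variable standing in for the job's fence: the "migration job" keeps reading through the mapping on another thread, and the teardown path blocks on the fence before freeing the mapping, roughly the role dma_resv_wait_timeout() plays ahead of xe_tt_unmap_sg(). struct fence, struct bo, migration_job(), bo_unmap() and move_to_system() are hypothetical names, and free() stands in for tearing down the IOMMU mapping.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a job completion fence, signaled once. */
struct fence {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;
};

static void fence_init(struct fence *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->cond, NULL);
	f->signaled = false;
}

static void fence_fini(struct fence *f)
{
	pthread_cond_destroy(&f->cond);
	pthread_mutex_destroy(&f->lock);
}

static void fence_signal(struct fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaled = true;
	pthread_cond_broadcast(&f->cond);
	pthread_mutex_unlock(&f->lock);
}

/* Blocks until signaled with no timeout, loosely like intr=false + MAX_SCHEDULE_TIMEOUT. */
static void fence_wait(struct fence *f)
{
	pthread_mutex_lock(&f->lock);
	while (!f->signaled)
		pthread_cond_wait(&f->cond, &f->lock);
	pthread_mutex_unlock(&f->lock);
}

/* Hypothetical buffer object; mapping models the DMA/IOMMU mapping. */
struct bo {
	char *mapping;     /* NULL once unmapped */
	char dst[64];      /* destination of the "migration job" copy */
	struct fence done; /* completion fence of the in-flight job */
};

/* Asynchronous "migration job": reads through the mapping, then signals. */
static void *migration_job(void *arg)
{
	struct bo *bo = arg;

	memcpy(bo->dst, bo->mapping, sizeof(bo->dst));
	fence_signal(&bo->done);
	return NULL;
}

/* Teardown path mirroring the patch: wait for the job, then unmap. */
static void bo_unmap(struct bo *bo)
{
	fence_wait(&bo->done);
	free(bo->mapping);
	bo->mapping = NULL;
}

/* Starts the job and immediately tears down; returns 0 on success. */
static int move_to_system(struct bo *bo, const char *src)
{
	pthread_t job;

	bo->mapping = malloc(sizeof(bo->dst));
	if (!bo->mapping)
		return -1;
	strncpy(bo->mapping, src, sizeof(bo->dst) - 1);
	bo->mapping[sizeof(bo->dst) - 1] = '\0';
	fence_init(&bo->done);

	if (pthread_create(&job, NULL, migration_job, bo) != 0) {
		free(bo->mapping);
		bo->mapping = NULL;
		fence_fini(&bo->done);
		return -1;
	}
	/* bo_unmap() waits on the fence; skipping that wait would free the
	 * mapping while memcpy() in the job may still be reading it. */
	bo_unmap(bo);
	pthread_join(job, NULL);
	fence_fini(&bo->done);
	return 0;
}
```

Calling move_to_system(&bo, "hello") should return 0 with bo.mapping reset to NULL and bo.dst holding the copied string. Dropping the fence_wait() call from bo_unmap() turns this into a use-after-free race, the userspace counterpart of the GPU page faults described in the commit message.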