Hi,
this series does basically two things:
1. Disables automatic load balancing as adviced by the hardware workaround.
2. Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog ========= v6 -> v7 - find a more appropriate place where to remove the CCS engines: remove them in init_engine_mask() instead of intel_engines_init_mmio(). (Thanks, Matt) - Add Michal's ACK, thanks Michal!
v5 -> v6 (thanks Matt for the suggestions in v6) - Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all. - In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5 - Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt) - Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt) - create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series. - Fix a selftest that was failing because could not set CCS2. - Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4 - Reword correctly the comment in the workaround - Fix a buffer overflow (Thanks Joonas) - Handle properly the fused engines when setting the CCS mode.
v2 -> v3 - Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko) - Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt) - Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2 - In Patch 1 use the correct workaround number (thanks Matt). - In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko). - Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 ++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 102 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
The hardware should not dynamically balance the load between CCS engines. Wa_14019159160 recommends disabling it across all platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Reviewed-by: Matt Roper matthew.d.roper@intel.com Acked-by: Michal Mrozek michal.mrozek@intel.com --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++++++++++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 50962cfd1353..31b102604e3d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1477,6 +1477,7 @@ #define ECOBITS_PPGTT_CACHE4B (0 << 8)
#define GEN12_RCU_MODE _MMIO(0x14800) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index b079cbbc1897..9963e5725ae5 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -51,7 +51,8 @@ * registers belonging to BCS, VCS or VECS should be implemented in * xcs_engine_wa_init(). Workarounds for registers not belonging to a specific * engine's MMIO range but that are part of of the common RCS/CCS reset domain - * should be implemented in general_render_compute_wa_init(). + * should be implemented in general_render_compute_wa_init(). The settings + * about the CCS load balancing should be added in ccs_engine_wa_mode(). * * - GT workarounds: the list of these WAs is applied whenever these registers * revert to their default values: on GPU reset, suspend/resume [1]_, etc. @@ -2854,6 +2855,22 @@ add_render_compute_tuning_settings(struct intel_gt *gt, wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC); }
+static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal) +{ + struct intel_gt *gt = engine->gt; + + if (!IS_DG2(gt->i915)) + return; + + /* + * Wa_14019159160: This workaround, along with others, leads to + * significant challenges in utilizing load balancing among the + * CCS slices. Consequently, an architectural decision has been + * made to completely disable automatic CCS load balancing. + */ + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); +} + /* * The workarounds in this function apply to shared registers in * the general render reset domain that aren't tied to a @@ -3000,8 +3017,10 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal * to a single RCS/CCS engine's workaround list since * they're reset as part of the general render domain reset. */ - if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) + if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) { general_render_compute_wa_init(engine, wal); + ccs_engine_wa_mode(engine, wal); + }
if (engine->class == COMPUTE_CLASS) ccs_engine_wa_init(engine, wal);
We want a fixed load CCS balancing consisting in all slices sharing one single user engine. For this reason do not create the intel_engine_cs structure with its dedicated command streamer for CCS slices beyond the first.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Acked-by: Michal Mrozek michal.mrozek@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f553cf4e6449..47c4a69e854c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -908,6 +908,21 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt) info->engine_mask &= ~BIT(GSC0); }
+ /* + * Do not create the command streamer for CCS slices beyond the first. + * All the workload submitted to the first engine will be shared among + * all the slices. + * + * Once the user will be allowed to customize the CCS mode, then this + * check needs to be removed. + */ + if (IS_DG2(gt->i915)) { + intel_engine_mask_t first_ccs = BIT((CCS0 + __ffs(CCS_MASK(gt)))); + intel_engine_mask_t all_ccs = CCS_MASK(gt) << CCS0; + + info->engine_mask &= ~(all_ccs &= ~first_ccs); + } + return info->engine_mask; }
On Wed, Mar 27, 2024 at 04:56:18PM +0100, Andi Shyti wrote:
We want a fixed load CCS balancing consisting in all slices sharing one single user engine. For this reason do not create the intel_engine_cs structure with its dedicated command streamer for CCS slices beyond the first.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Acked-by: Michal Mrozek michal.mrozek@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f553cf4e6449..47c4a69e854c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -908,6 +908,21 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt) info->engine_mask &= ~BIT(GSC0); }
- /*
* Do not create the command streamer for CCS slices beyond the first.
* All the workload submitted to the first engine will be shared among
* all the slices.
*
* Once the user will be allowed to customize the CCS mode, then this
* check needs to be removed.
*/
- if (IS_DG2(gt->i915)) {
intel_engine_mask_t first_ccs = BIT((CCS0 + __ffs(CCS_MASK(gt))));
intel_engine_mask_t all_ccs = CCS_MASK(gt) << CCS0;
info->engine_mask &= ~(all_ccs &= ~first_ccs);
Shouldn't the second "&=" just be an "&" since there's no need to modify the all_ccs variable that never gets used again?
In fact since this is DG2-specific, it seems like it might be more intuitive to just write the whole thing more directly as
if (IS_DG2(gt->i915)) { int first_ccs = __ffs(CCS_MASK(gt));
info->engine_mask &= ~GENMASK(CCS3, CCS0); info->engine_mask |= BIT(_CCS(first_ccs)); }
But up to you; if you just want to remove the unnecessary "=" that's fine too. Either way,
Reviewed-by: Matt Roper matthew.d.roper@intel.com
Matt
- }
- return info->engine_mask;
} -- 2.43.0
Hi Matt,
- /*
* Do not create the command streamer for CCS slices beyond the first.
* All the workload submitted to the first engine will be shared among
* all the slices.
*
* Once the user will be allowed to customize the CCS mode, then this
* check needs to be removed.
*/
- if (IS_DG2(gt->i915)) {
intel_engine_mask_t first_ccs = BIT((CCS0 + __ffs(CCS_MASK(gt))));
intel_engine_mask_t all_ccs = CCS_MASK(gt) << CCS0;
info->engine_mask &= ~(all_ccs &= ~first_ccs);
Shouldn't the second "&=" just be an "&" since there's no need to modify the all_ccs variable that never gets used again?
yes, that's a leftover from me trying different ways of removing all the non first CCS engines.
In fact since this is DG2-specific, it seems like it might be more intuitive to just write the whole thing more directly as
if (IS_DG2(gt->i915)) { int first_ccs = __ffs(CCS_MASK(gt)); info->engine_mask &= ~GENMASK(CCS3, CCS0); info->engine_mask |= BIT(_CCS(first_ccs)); }
yes, looks a bit simpler. Will use this way.
But up to you; if you just want to remove the unnecessary "=" that's fine too. Either way,
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Thanks!
Andi
Enable only one CCS engine by default with all the compute sices allocated to it.
While generating the list of UABI engines to be exposed to the user, exclude any additional CCS engines beyond the first instance.
This change can be tested with igt i915_query.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Reviewed-by: Matt Roper matthew.d.roper@intel.com Acked-by: Michal Mrozek michal.mrozek@intel.com --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 7 ++++ 5 files changed, 65 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 3ef6ed41e62b..a6885a1d41a1 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -118,6 +118,7 @@ gt-y += \ gt/intel_ggtt_fencing.o \ gt/intel_gt.o \ gt/intel_gt_buffer_pool.o \ + gt/intel_gt_ccs_mode.o \ gt/intel_gt_clock_utils.o \ gt/intel_gt_debugfs.o \ gt/intel_gt_engines_debugfs.o \ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c new file mode 100644 index 000000000000..044219c5960a --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2024 Intel Corporation + */ + +#include "i915_drv.h" +#include "intel_gt.h" +#include "intel_gt_ccs_mode.h" +#include "intel_gt_regs.h" + +void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{ + int cslice; + u32 mode = 0; + int first_ccs = __ffs(CCS_MASK(gt)); + + if (!IS_DG2(gt->i915)) + return; + + /* Build the value for the fixed CCS load balancing */ + for (cslice = 0; cslice < I915_MAX_CCS; cslice++) { + if (CCS_MASK(gt) & BIT(cslice)) + /* + * If available, assign the cslice + * to the first available engine... + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs); + + else + /* + * ... otherwise, mark the cslice as + * unavailable if no CCS dispatches here + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, + XEHP_CCS_MODE_CSLICE_MASK); + } + + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode); +} diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h new file mode 100644 index 000000000000..9e5549caeb26 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef __INTEL_GT_CCS_MODE_H__ +#define __INTEL_GT_CCS_MODE_H__ + +struct intel_gt; + +void intel_gt_apply_ccs_mode(struct intel_gt *gt); + +#endif /* __INTEL_GT_CCS_MODE_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 31b102604e3d..743fe3566722 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1480,6 +1480,11 @@ #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
+#define XEHP_CCS_MODE _MMIO(0x14804) +#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */ +#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1) +#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH)) + #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) #define CHV_FGT_DISABLE_SS1 (1 << 11) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 9963e5725ae5..8188c9f0b5ce 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -10,6 +10,7 @@ #include "intel_engine_regs.h" #include "intel_gpu_commands.h" #include "intel_gt.h" +#include "intel_gt_ccs_mode.h" #include "intel_gt_mcr.h" #include "intel_gt_print.h" #include "intel_gt_regs.h" @@ -2869,6 +2870,12 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li * made to completely disable automatic CCS load balancing. */ wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); + + /* + * After having disabled automatic load balancing we need to + * assign all slices to a single CCS. We will call it CCS mode 1 + */ + intel_gt_apply_ccs_mode(gt); }
/*
linux-stable-mirror@lists.linaro.org