Hi,
this series does basically two things:
1. Disables automatic load balancing as adviced by the hardware workaround.
2. Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog ========= v5 -> v6 (thanks Matt for the suggestions in v6) - Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all. - In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5 - Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt) - Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt) - create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series. - Fix a selftest that was failing because could not set CCS2. - Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4 - Reword correctly the comment in the workaround - Fix a buffer overflow (Thanks Joonas) - Handle properly the fused engines when setting the CCS mode.
v2 -> v3 - Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko) - Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt) - Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2 - In Patch 1 use the correct workaround number (thanks Matt). - In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko). - Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++--- drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 103 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
The hardware should not dynamically balance the load between CCS engines. Wa_14019159160 recommends disabling it across all platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ Reviewed-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 23 +++++++++++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 50962cfd1353..31b102604e3d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1477,6 +1477,7 @@ #define ECOBITS_PPGTT_CACHE4B (0 << 8)
#define GEN12_RCU_MODE _MMIO(0x14800) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index b079cbbc1897..9963e5725ae5 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -51,7 +51,8 @@ * registers belonging to BCS, VCS or VECS should be implemented in * xcs_engine_wa_init(). Workarounds for registers not belonging to a specific * engine's MMIO range but that are part of of the common RCS/CCS reset domain - * should be implemented in general_render_compute_wa_init(). + * should be implemented in general_render_compute_wa_init(). The settings + * about the CCS load balancing should be added in ccs_engine_wa_mode(). * * - GT workarounds: the list of these WAs is applied whenever these registers * revert to their default values: on GPU reset, suspend/resume [1]_, etc. @@ -2854,6 +2855,22 @@ add_render_compute_tuning_settings(struct intel_gt *gt, wa_write_clr(wal, GEN8_GARBCNTL, GEN12_BUS_HASH_CTL_BIT_EXC); }
+static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_list *wal) +{ + struct intel_gt *gt = engine->gt; + + if (!IS_DG2(gt->i915)) + return; + + /* + * Wa_14019159160: This workaround, along with others, leads to + * significant challenges in utilizing load balancing among the + * CCS slices. Consequently, an architectural decision has been + * made to completely disable automatic CCS load balancing. + */ + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); +} + /* * The workarounds in this function apply to shared registers in * the general render reset domain that aren't tied to a @@ -3000,8 +3017,10 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal * to a single RCS/CCS engine's workaround list since * they're reset as part of the general render domain reset. */ - if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) + if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE) { general_render_compute_wa_init(engine, wal); + ccs_engine_wa_mode(engine, wal); + }
if (engine->class == COMPUTE_CLASS) ccs_engine_wa_init(engine, wal);
We want a fixed load CCS balancing consisting in all slices sharing one single user engine. For this reason do not create the intel_engine_cs structure with its dedicated command streamer for CCS slices beyond the first.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f553cf4e6449..c4fb31bb6e72 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -966,6 +966,7 @@ int intel_engines_init_mmio(struct intel_gt *gt) const unsigned int engine_mask = init_engine_mask(gt); unsigned int mask = 0; unsigned int i, class; + u8 ccs_instance = 0; u8 logical_ids[MAX_ENGINE_INSTANCE + 1]; int err;
@@ -986,6 +987,19 @@ int intel_engines_init_mmio(struct intel_gt *gt) !HAS_ENGINE(gt, i)) continue;
+ /* + * Do not create the command streamer for CCS slices + * beyond the first. All the workload submitted to the + * first engine will be shared among all the slices. + * + * Once the user will be allowed to customize the CCS + * mode, then this check needs to be removed. + */ + if (IS_DG2(i915) && + class == COMPUTE_CLASS && + ccs_instance++) + continue; + err = intel_engine_setup(gt, i, logical_ids[instance]); if (err) @@ -996,11 +1010,9 @@ int intel_engines_init_mmio(struct intel_gt *gt) }
/* - * Catch failures to update intel_engines table when the new engines - * are added to the driver by a warning and disabling the forgotten - * engines. + * Update the intel_engines table. */ - if (drm_WARN_ON(&i915->drm, mask != engine_mask)) + if (mask != engine_mask) gt->info.engine_mask = mask;
gt->info.num_engines = hweight32(mask);
On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
We want a fixed load CCS balancing consisting in all slices sharing one single user engine. For this reason do not create the intel_engine_cs structure with its dedicated command streamer for CCS slices beyond the first.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index f553cf4e6449..c4fb31bb6e72 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -966,6 +966,7 @@ int intel_engines_init_mmio(struct intel_gt *gt) const unsigned int engine_mask = init_engine_mask(gt); unsigned int mask = 0; unsigned int i, class;
- u8 ccs_instance = 0; u8 logical_ids[MAX_ENGINE_INSTANCE + 1]; int err;
@@ -986,6 +987,19 @@ int intel_engines_init_mmio(struct intel_gt *gt) !HAS_ENGINE(gt, i)) continue;
/*
* Do not create the command streamer for CCS slices
* beyond the first. All the workload submitted to the
* first engine will be shared among all the slices.
*
* Once the user will be allowed to customize the CCS
* mode, then this check needs to be removed.
*/
if (IS_DG2(i915) &&
class == COMPUTE_CLASS &&
ccs_instance++)
continue;
Wouldn't it be more intuitive to drop the non-lowest CCS engines in init_engine_mask() since that's the function that's dedicated to building the list of engines we'll use? Then we don't need to kill the assertion farther down either.
Matt
err = intel_engine_setup(gt, i, logical_ids[instance]); if (err)
@@ -996,11 +1010,9 @@ int intel_engines_init_mmio(struct intel_gt *gt) } /*
* Catch failures to update intel_engines table when the new engines
* are added to the driver by a warning and disabling the forgotten
* engines.
*/* Update the intel_engines table.
- if (drm_WARN_ON(&i915->drm, mask != engine_mask))
- if (mask != engine_mask) gt->info.engine_mask = mask;
gt->info.num_engines = hweight32(mask); -- 2.43.0
Hi Matt,
On Tue, Mar 26, 2024 at 09:03:10AM -0700, Matt Roper wrote:
On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
/*
* Do not create the command streamer for CCS slices
* beyond the first. All the workload submitted to the
* first engine will be shared among all the slices.
*
* Once the user will be allowed to customize the CCS
* mode, then this check needs to be removed.
*/
if (IS_DG2(i915) &&
class == COMPUTE_CLASS &&
ccs_instance++)
continue;
Wouldn't it be more intuitive to drop the non-lowest CCS engines in init_engine_mask() since that's the function that's dedicated to building the list of engines we'll use? Then we don't need to kill the assertion farther down either.
Because we don't check the result of init_engine_mask() while creating the engine's structure. We check it only after and indeed I removed the drm_WARN_ON() check.
I think the whole process of creating the engine's structure in the intel_engines_init_mmio() can be simplified, but this goes beyong the scope of the series.
Or am I missing something?
Thanks, Andi
On Tue, Mar 26, 2024 at 07:42:34PM +0100, Andi Shyti wrote:
Hi Matt,
On Tue, Mar 26, 2024 at 09:03:10AM -0700, Matt Roper wrote:
On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
/*
* Do not create the command streamer for CCS slices
* beyond the first. All the workload submitted to the
* first engine will be shared among all the slices.
*
* Once the user will be allowed to customize the CCS
* mode, then this check needs to be removed.
*/
if (IS_DG2(i915) &&
class == COMPUTE_CLASS &&
ccs_instance++)
continue;
Wouldn't it be more intuitive to drop the non-lowest CCS engines in init_engine_mask() since that's the function that's dedicated to building the list of engines we'll use? Then we don't need to kill the assertion farther down either.
Because we don't check the result of init_engine_mask() while creating the engine's structure. We check it only after and indeed I removed the drm_WARN_ON() check.
I think the whole process of creating the engine's structure in the intel_engines_init_mmio() can be simplified, but this goes beyong the scope of the series.
Or am I missing something?
The important part of init_engine_mask isn't the return value, but rather that it's what sets up gt->info.engine_mask. The HAS_ENGINE() check that intel_engines_init_mmio() uses is based on the value stored there, so updating that function will also ensure that we skip the engines we don't want in the loop.
Matt
Thanks, Andi
Hi Matt,
On Tue, Mar 26, 2024 at 02:30:33PM -0700, Matt Roper wrote:
On Tue, Mar 26, 2024 at 07:42:34PM +0100, Andi Shyti wrote:
On Tue, Mar 26, 2024 at 09:03:10AM -0700, Matt Roper wrote:
On Wed, Mar 13, 2024 at 09:19:50PM +0100, Andi Shyti wrote:
/*
* Do not create the command streamer for CCS slices
* beyond the first. All the workload submitted to the
* first engine will be shared among all the slices.
*
* Once the user will be allowed to customize the CCS
* mode, then this check needs to be removed.
*/
if (IS_DG2(i915) &&
class == COMPUTE_CLASS &&
ccs_instance++)
continue;
Wouldn't it be more intuitive to drop the non-lowest CCS engines in init_engine_mask() since that's the function that's dedicated to building the list of engines we'll use? Then we don't need to kill the assertion farther down either.
Because we don't check the result of init_engine_mask() while creating the engine's structure. We check it only after and indeed I removed the drm_WARN_ON() check.
I think the whole process of creating the engine's structure in the intel_engines_init_mmio() can be simplified, but this goes beyong the scope of the series.
Or am I missing something?
The important part of init_engine_mask isn't the return value, but rather that it's what sets up gt->info.engine_mask. The HAS_ENGINE() check that intel_engines_init_mmio() uses is based on the value stored there, so updating that function will also ensure that we skip the engines we don't want in the loop.
Yes, can do like this, as well. After all this is done I'm going to do some cleanup here, as well.
Thanks, Andi
Enable only one CCS engine by default with all the compute sices allocated to it.
While generating the list of UABI engines to be exposed to the user, exclude any additional CCS engines beyond the first instance.
This change can be tested with igt i915_query.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 7 ++++ 5 files changed, 65 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 3ef6ed41e62b..a6885a1d41a1 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -118,6 +118,7 @@ gt-y += \ gt/intel_ggtt_fencing.o \ gt/intel_gt.o \ gt/intel_gt_buffer_pool.o \ + gt/intel_gt_ccs_mode.o \ gt/intel_gt_clock_utils.o \ gt/intel_gt_debugfs.o \ gt/intel_gt_engines_debugfs.o \ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c new file mode 100644 index 000000000000..044219c5960a --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2024 Intel Corporation + */ + +#include "i915_drv.h" +#include "intel_gt.h" +#include "intel_gt_ccs_mode.h" +#include "intel_gt_regs.h" + +void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{ + int cslice; + u32 mode = 0; + int first_ccs = __ffs(CCS_MASK(gt)); + + if (!IS_DG2(gt->i915)) + return; + + /* Build the value for the fixed CCS load balancing */ + for (cslice = 0; cslice < I915_MAX_CCS; cslice++) { + if (CCS_MASK(gt) & BIT(cslice)) + /* + * If available, assign the cslice + * to the first available engine... + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs); + + else + /* + * ... otherwise, mark the cslice as + * unavailable if no CCS dispatches here + */ + mode |= XEHP_CCS_MODE_CSLICE(cslice, + XEHP_CCS_MODE_CSLICE_MASK); + } + + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode); +} diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h new file mode 100644 index 000000000000..9e5549caeb26 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/* + * Copyright © 2024 Intel Corporation + */ + +#ifndef __INTEL_GT_CCS_MODE_H__ +#define __INTEL_GT_CCS_MODE_H__ + +struct intel_gt; + +void intel_gt_apply_ccs_mode(struct intel_gt *gt); + +#endif /* __INTEL_GT_CCS_MODE_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 31b102604e3d..743fe3566722 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1480,6 +1480,11 @@ #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
+#define XEHP_CCS_MODE _MMIO(0x14804) +#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */ +#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1) +#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH)) + #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) #define CHV_FGT_DISABLE_SS1 (1 << 11) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 9963e5725ae5..8188c9f0b5ce 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -10,6 +10,7 @@ #include "intel_engine_regs.h" #include "intel_gpu_commands.h" #include "intel_gt.h" +#include "intel_gt_ccs_mode.h" #include "intel_gt_mcr.h" #include "intel_gt_print.h" #include "intel_gt_regs.h" @@ -2869,6 +2870,12 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li * made to completely disable automatic CCS load balancing. */ wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); + + /* + * After having disabled automatic load balancing we need to + * assign all slices to a single CCS. We will call it CCS mode 1 + */ + intel_gt_apply_ccs_mode(gt); }
/*
On Wed, Mar 13, 2024 at 09:19:51PM +0100, Andi Shyti wrote:
Enable only one CCS engine by default with all the compute sices allocated to it.
While generating the list of UABI engines to be exposed to the user, exclude any additional CCS engines beyond the first instance.
This change can be tested with igt i915_query.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
Reviewed-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 5 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 7 ++++ 5 files changed, 65 insertions(+) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 3ef6ed41e62b..a6885a1d41a1 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -118,6 +118,7 @@ gt-y += \ gt/intel_ggtt_fencing.o \ gt/intel_gt.o \ gt/intel_gt_buffer_pool.o \
- gt/intel_gt_ccs_mode.o \ gt/intel_gt_clock_utils.o \ gt/intel_gt_debugfs.o \ gt/intel_gt_engines_debugfs.o \
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c new file mode 100644 index 000000000000..044219c5960a --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: MIT +/*
- Copyright © 2024 Intel Corporation
- */
+#include "i915_drv.h" +#include "intel_gt.h" +#include "intel_gt_ccs_mode.h" +#include "intel_gt_regs.h"
+void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{
- int cslice;
- u32 mode = 0;
- int first_ccs = __ffs(CCS_MASK(gt));
- if (!IS_DG2(gt->i915))
return;
- /* Build the value for the fixed CCS load balancing */
- for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
if (CCS_MASK(gt) & BIT(cslice))
/*
* If available, assign the cslice
* to the first available engine...
*/
mode |= XEHP_CCS_MODE_CSLICE(cslice, first_ccs);
else
/*
* ... otherwise, mark the cslice as
* unavailable if no CCS dispatches here
*/
mode |= XEHP_CCS_MODE_CSLICE(cslice,
XEHP_CCS_MODE_CSLICE_MASK);
- }
- intel_uncore_write(gt->uncore, XEHP_CCS_MODE, mode);
+} diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h new file mode 100644 index 000000000000..9e5549caeb26 --- /dev/null +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: MIT */ +/*
- Copyright © 2024 Intel Corporation
- */
+#ifndef __INTEL_GT_CCS_MODE_H__ +#define __INTEL_GT_CCS_MODE_H__
+struct intel_gt;
+void intel_gt_apply_ccs_mode(struct intel_gt *gt);
+#endif /* __INTEL_GT_CCS_MODE_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 31b102604e3d..743fe3566722 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1480,6 +1480,11 @@ #define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) +#define XEHP_CCS_MODE _MMIO(0x14804) +#define XEHP_CCS_MODE_CSLICE_MASK REG_GENMASK(2, 0) /* CCS0-3 + rsvd */ +#define XEHP_CCS_MODE_CSLICE_WIDTH ilog2(XEHP_CCS_MODE_CSLICE_MASK + 1) +#define XEHP_CCS_MODE_CSLICE(cslice, ccs) (ccs << (cslice * XEHP_CCS_MODE_CSLICE_WIDTH))
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) #define CHV_FGT_DISABLE_SS1 (1 << 11) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index 9963e5725ae5..8188c9f0b5ce 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -10,6 +10,7 @@ #include "intel_engine_regs.h" #include "intel_gpu_commands.h" #include "intel_gt.h" +#include "intel_gt_ccs_mode.h" #include "intel_gt_mcr.h" #include "intel_gt_print.h" #include "intel_gt_regs.h" @@ -2869,6 +2870,12 @@ static void ccs_engine_wa_mode(struct intel_engine_cs *engine, struct i915_wa_li * made to completely disable automatic CCS load balancing. */ wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
- /*
* After having disabled automatic load balancing we need to
* assign all slices to a single CCS. We will call it CCS mode 1
*/
- intel_gt_apply_ccs_mode(gt);
} /* -- 2.43.0
Ping! Any thoughts here?
Andi
On Wed, Mar 13, 2024 at 09:19:48PM +0100, Andi Shyti wrote:
Hi,
this series does basically two things:
Disables automatic load balancing as adviced by the hardware workaround.
Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where
I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
v5 -> v6 (thanks Matt for the suggestions in v6)
- Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all.
- In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5
- Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt)
- Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt)
- create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series.
- Fix a selftest that was failing because could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt)
- Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++--- drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 103 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
-- 2.43.0
On 20/03/2024 15:06, Andi Shyti wrote:
Ping! Any thoughts here?
I only casually observed the discussion after I saw Matt suggested further simplifications. As I understood it, you will bring back the uabi engine games when adding the dynamic behaviour and that is fine by me.
Regards,
Tvrtko
On Wed, Mar 13, 2024 at 09:19:48PM +0100, Andi Shyti wrote:
Hi,
this series does basically two things:
Disables automatic load balancing as adviced by the hardware workaround.
Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where
I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
v5 -> v6 (thanks Matt for the suggestions in v6)
- Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all.
- In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5
- Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt)
- Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt)
- create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series.
- Fix a selftest that was failing because could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt)
- Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++--- drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 103 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
-- 2.43.0
Hi Tvrtko,
On Wed, Mar 20, 2024 at 03:40:18PM +0000, Tvrtko Ursulin wrote:
On 20/03/2024 15:06, Andi Shyti wrote:
Ping! Any thoughts here?
I only casually observed the discussion after I saw Matt suggested further simplifications. As I understood it, you will bring back the uabi engine games when adding the dynamic behaviour and that is fine by me.
yes, the refactoring suggested by you will come later.
Thanks, Andi
Hi Michal, Mark,
can you please ack from your side this first batch of changes?
Thanks, Andi
On Wed, Mar 13, 2024 at 09:19:48PM +0100, Andi Shyti wrote:
Hi,
this series does basically two things:
Disables automatic load balancing as adviced by the hardware workaround.
Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where
I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
v5 -> v6 (thanks Matt for the suggestions in v6)
- Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all.
- In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5
- Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt)
- Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt)
- create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series.
- Fix a selftest that was failing because could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt)
- Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++--- drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 103 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
-- 2.43.0
On Wed, Mar 13, 2024 at 09:19:48PM +0100, Andi Shyti wrote:
Hi,
this series does basically two things:
Disables automatic load balancing as adviced by the hardware workaround.
Assigns all the CCS slices to one single user engine. The user will then be able to query only one CCS engine
From v5 I have created a new file, gt/intel_gt_ccs_mode.c where
I added the intel_gt_apply_ccs_mode(). In the upcoming patches, this file will contain the implementation for dynamic CCS mode setting.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
v5 -> v6 (thanks Matt for the suggestions in v6)
- Remove the refactoring and the for_each_available_engine() macro and instead do not create the intel_engine_cs structure at all.
- In patch 1 just a trivial reordering of the bit definitions.
v4 -> v5
- Use the workaround framework to do all the CCS balancing settings in order to always apply the modes also when the engine resets. Put everything in its own specific function to be executed for the first CCS engine encountered. (Thanks Matt)
- Calculate the CCS ID for the CCS mode as the first available CCS among all the engines (Thanks Matt)
- create the intel_gt_ccs_mode.c function to host the CCS configuration. We will have it ready for the next series.
- Fix a selftest that was failing because could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list (Patch 2) (Thanks, Matt)
- Patch 4 now uses a the refactoring from patch 1, in a cleaner outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI engine list and adapt the engine counting accordingly (thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Do not generate the command streamer for all the CCS drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 ++++++++--- drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-- 6 files changed, 103 insertions(+), 6 deletions(-) create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
-- 2.43.0
Acked-by: Michal Mrozek michal.mrozek@intel.com
linux-stable-mirror@lists.linaro.org