Hi,
this series does basically two things:
1. Disables automatic load balancing as adviced by the hardware workaround.
2. Forces the sharing of the load submitted to CCS among all the CCS available (as of now only DG2 has more than one CCS). This way the user, when sending a query, will see only one CCS available.
Andi
Andi Shyti (2): drm/i915/gt: Disable HW load balancing for CCS drm/i915/gt: Set default CCS mode '1'
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 3 +++ drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 5 files changed, 40 insertions(+), 2 deletions(-)
The hardware should not dynamically balance the load between CCS engines. Wa_16016805146 recommends disabling it across all platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ --- drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++ 2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 50962cfd1353..cf709f6c05ae 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1478,6 +1478,7 @@
#define GEN12_RCU_MODE _MMIO(0x14800) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
#define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index d67d44611c28..7f42c8015f71 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); } + + /* + * Wa_16016805146: disable the CCS load balancing + * indiscriminately for all the platforms + */ + wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE); }
static void
On Thu, Feb 15, 2024 at 02:59:23PM +0100, Andi Shyti wrote:
The hardware should not dynamically balance the load between CCS engines. Wa_16016805146 recommends disabling it across all
Is this the right workaround number? When I check the database, this workaround was rejected on both DG2-G10 and DG2-G11, and doesn't even have an entry for DG2-G12.
There are other workarounds that sound somewhat related to load balancing (e.g., part 3 of Wa_14019159160), but what's asked there is more involved than just setting one register bit and conflicts a bit with the second patch of this series.
Matt
platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++ 2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 50962cfd1353..cf709f6c05ae 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1478,6 +1478,7 @@ #define GEN12_RCU_MODE _MMIO(0x14800) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index d67d44611c28..7f42c8015f71 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); }
- /*
* Wa_16016805146: disable the CCS load balancing
* indiscriminately for all the platforms
*/
- wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
} static void -- 2.43.0
Hi Matt,
On Thu, Feb 15, 2024 at 08:55:41AM -0800, Matt Roper wrote:
On Thu, Feb 15, 2024 at 02:59:23PM +0100, Andi Shyti wrote:
The hardware should not dynamically balance the load between CCS engines. Wa_16016805146 recommends disabling it across all
Is this the right workaround number? When I check the database, this workaround was rejected on both DG2-G10 and DG2-G11, and doesn't even have an entry for DG2-G12.
There are other workarounds that sound somewhat related to load balancing (e.g., part 3 of Wa_14019159160), but what's asked there is more involved than just setting one register bit and conflicts a bit with the second patch of this series.
thanks for checking it. Indeed the WA I mentioned is limited to a specific platform. This recommendation comes in different WA, e.g. this one: Wa_14019186972 (3rd point). Will start using that as a reference.
Thank you. Andi
Matt
platforms.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 1 + drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++++++ 2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index 50962cfd1353..cf709f6c05ae 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1478,6 +1478,7 @@ #define GEN12_RCU_MODE _MMIO(0x14800) #define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) +#define XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1) #define CHV_FUSE_GT _MMIO(VLV_GUNIT_BASE + 0x2168) #define CHV_FGT_DISABLE_SS0 (1 << 10) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index d67d44611c28..7f42c8015f71 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -2988,6 +2988,12 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li wa_mcr_masked_en(wal, GEN8_HALF_SLICE_CHICKEN1, GEN7_PSD_SINGLE_PORT_DISPATCH_ENABLE); }
- /*
* Wa_16016805146: disable the CCS load balancing
* indiscriminately for all the platforms
*/
- wa_masked_en(wal, GEN12_RCU_MODE, XEHP_RCU_MODE_FIXED_SLICE_CCS_MODE);
} static void -- 2.43.0
-- Matt Roper Graphics Software Engineer Linux GPU Platform Enablement Intel Corporation
On Mon, Feb 19, 2024 at 11:17:33AM +0100, Andi Shyti wrote:
On Thu, Feb 15, 2024 at 08:55:41AM -0800, Matt Roper wrote:
On Thu, Feb 15, 2024 at 02:59:23PM +0100, Andi Shyti wrote:
The hardware should not dynamically balance the load between CCS engines. Wa_16016805146 recommends disabling it across all
Is this the right workaround number? When I check the database, this workaround was rejected on both DG2-G10 and DG2-G11, and doesn't even have an entry for DG2-G12.
There are other workarounds that sound somewhat related to load balancing (e.g., part 3 of Wa_14019159160), but what's asked there is more involved than just setting one register bit and conflicts a bit with the second patch of this series.
thanks for checking it. Indeed the WA I mentioned is limited to a specific platform. This recommendation comes in different WA, e.g. this one: Wa_14019186972 (3rd point). Will start using that as a reference.
actually you are right, I checked with Joonas and I will use Wa_14019159160.
Thanks, Andi
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+ --- drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } }
+static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{ + if (!IS_DG2(gt->i915)) + return; + + intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0); +} + int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; @@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt)
intel_gt_init_swizzling(gt);
+ /* Configure CCS mode */ + intel_gt_apply_ccs_mode(gt); + /* * At least 830 can leave some of the unused rings * "active" (ie. head != tail) after resume which diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11)
+#define XEHP_CCS_MODE _MMIO(0x14804) + #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node)))
+/* + * Exclude unavailable engines. + * + * Only the first CCS engine is utilized due to the disabling of CCS auto load + * balancing. As a result, all CCS engines operate collectively, functioning + * essentially as a single CCS engine, hence the count of active CCS engines is + * considered '1'. + * Currently, this applies to platforms with more than one CCS engine, + * specifically DG2. + */ +#define for_each_available_uabi_engine(engine__, i915__) \ + for_each_uabi_engine(engine__, i915__) \ + if ((IS_DG2(i915__)) && \ + ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \ + ((engine__)->uabi_instance)) { } \ + else + #define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
+ static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item) @@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL;
- for_each_uabi_engine(engine, i915) + for_each_available_uabi_engine(engine, i915) num_uabi_engines++;
len = struct_size(query_ptr, engines, num_uabi_engines); @@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915,
info_ptr = &query_ptr->engines[0];
- for_each_uabi_engine(engine, i915) { + for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
On 2/15/2024 05:59, Andi Shyti wrote:
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
These two paragraphs are mutually exclusive. You can't have four CCS engines 'working together' if only one engine exists. I think you are meaning that we only export 1 CCS engine and that single engine is configured to control all the EUs. As opposed to running in 4 CCS engine mode where the EUs are (dynamically or statically) divided amongst those four engines.
John.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } } +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{
- if (!IS_DG2(gt->i915))
return;
- intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
- int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt) intel_gt_init_swizzling(gt);
- /* Configure CCS mode */
- intel_gt_apply_ccs_mode(gt);
- /*
- At least 830 can leave some of the unused rings
- "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11) +#define XEHP_CCS_MODE _MMIO(0x14804)
- #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node))) +/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of CCS auto load
- balancing. As a result, all CCS engines operate collectively, functioning
- essentially as a single CCS engine, hence the count of active CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \
- for_each_uabi_engine(engine__, i915__) \
if ((IS_DG2(i915__)) && \
((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
((engine__)->uabi_instance)) { } \
else
- #define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
- static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL;
- for_each_uabi_engine(engine, i915)
- for_each_available_uabi_engine(engine, i915) num_uabi_engines++;
len = struct_size(query_ptr, engines, num_uabi_engines); @@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915, info_ptr = &query_ptr->engines[0];
- for_each_uabi_engine(engine, i915) {
- for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
Hi John,
On Thu, Feb 15, 2024 at 01:23:24PM -0800, John Harrison wrote:
On 2/15/2024 05:59, Andi Shyti wrote:
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
These two paragraphs are mutually exclusive. You can't have four CCS engines 'working together' if only one engine exists. I think you are meaning that we only export 1 CCS engine and that single engine is configured to control all the EUs. As opposed to running in 4 CCS engine mode where the EUs are (dynamically or statically) divided amongst those four engines.
The balancing is done statically. The dynamic balancing is disabled in patch 1.
The 2 or 4 CCS engines will share the same workload.
Because the user won't be able anymore to select the CCS engine he wants to use, he will see only one CCS.
I think we are saying the same thing using different words :) I can try in v2 to reword the commit better.
Thanks for looking into this. Andi
John.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } } +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{
- if (!IS_DG2(gt->i915))
return;
- intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
- int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt) intel_gt_init_swizzling(gt);
- /* Configure CCS mode */
- intel_gt_apply_ccs_mode(gt);
- /*
- At least 830 can leave some of the unused rings
- "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11) +#define XEHP_CCS_MODE _MMIO(0x14804)
- #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node))) +/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of CCS auto load
- balancing. As a result, all CCS engines operate collectively, functioning
- essentially as a single CCS engine, hence the count of active CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \
- for_each_uabi_engine(engine__, i915__) \
if ((IS_DG2(i915__)) && \
((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
((engine__)->uabi_instance)) { } \
else
- #define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
- static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL;
- for_each_uabi_engine(engine, i915)
- for_each_available_uabi_engine(engine, i915) num_uabi_engines++; len = struct_size(query_ptr, engines, num_uabi_engines);
@@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915, info_ptr = &query_ptr->engines[0];
- for_each_uabi_engine(engine, i915) {
- for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
On 2/15/2024 14:34, Andi Shyti wrote:
Hi John,
On Thu, Feb 15, 2024 at 01:23:24PM -0800, John Harrison wrote:
On 2/15/2024 05:59, Andi Shyti wrote:
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
These two paragraphs are mutually exclusive. You can't have four CCS engines 'working together' if only one engine exists. I think you are meaning that we only export 1 CCS engine and that single engine is configured to control all the EUs. As opposed to running in 4 CCS engine mode where the EUs are (dynamically or statically) divided amongst those four engines.
The balancing is done statically. The dynamic balancing is disabled in patch 1.
The 2 or 4 CCS engines will share the same workload.
But they don't.
In i915, we use 'engine' to refer to a command streamer and all the associated hardware. This is distinct from the EUs which sit behind and can be driven by one or more command streamers. Saying that multiple engines are sharing a workload implies that you are submitting the context to multiple command streamers in parallel. I.e. a similar process to media frame split where they have a set of LRCA contexts bound together which are submitted in parallel to two or more video decode engines (VCS0, VCS1, etc.). That is not what is happening here.
Here, you are submitting a single context with a singe ring buffer to a single engine - CCS0. That engine is configured to own all EUs. Which actually means that submitting a compute task to another CCS engine will achieve nothing because there are no EUs available to those other engines. They will simply hang when waiting for the walker instruction to complete.
Because the user won't be able anymore to select the CCS engine he wants to use, he will see only one CCS.
I think we are saying the same thing using different words :)
But words are important.
John.
I can try in v2 to reword the commit better.
Thanks for looking into this. Andi
John.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } } +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{
- if (!IS_DG2(gt->i915))
return;
- intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
- int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt) intel_gt_init_swizzling(gt);
- /* Configure CCS mode */
- intel_gt_apply_ccs_mode(gt);
- /*
- At least 830 can leave some of the unused rings
- "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11) +#define XEHP_CCS_MODE _MMIO(0x14804)
- #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node))) +/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of CCS auto load
- balancing. As a result, all CCS engines operate collectively, functioning
- essentially as a single CCS engine, hence the count of active CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \
- for_each_uabi_engine(engine__, i915__) \
if ((IS_DG2(i915__)) && \
((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
((engine__)->uabi_instance)) { } \
else
- #define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps)
diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
- static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL;
- for_each_uabi_engine(engine, i915)
- for_each_available_uabi_engine(engine, i915) num_uabi_engines++; len = struct_size(query_ptr, engines, num_uabi_engines);
@@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915, info_ptr = &query_ptr->engines[0];
- for_each_uabi_engine(engine, i915) {
- for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
On 15/02/2024 13:59, Andi Shyti wrote:
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } } +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{
- if (!IS_DG2(gt->i915))
return;
- intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0);
+}
- int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915;
@@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt) intel_gt_init_swizzling(gt);
- /* Configure CCS mode */
- intel_gt_apply_ccs_mode(gt);
- /*
- At least 830 can leave some of the unused rings
- "active" (ie. head != tail) after resume which
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11) +#define XEHP_CCS_MODE _MMIO(0x14804)
- #define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node))) +/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of CCS auto load
- balancing. As a result, all CCS engines operate collectively, functioning
- essentially as a single CCS engine, hence the count of active CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \
- for_each_uabi_engine(engine__, i915__) \
if ((IS_DG2(i915__)) && \
((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \
((engine__)->uabi_instance)) { } \
else
If you don't want userspace to see some engines, just don't add them to the uabi list in intel_engines_driver_register or thereabouts?
Similar as we do for gsc which uses I915_NO_UABI_CLASS, although for ccs you can choose a different approach, whatever is more elegant.
That is also needed for i915->engine_uabi_class_count to be right, so userspace stats which rely on it are correct.
Regards,
Tvrtko
#define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
- static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item)
@@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL;
- for_each_uabi_engine(engine, i915)
- for_each_available_uabi_engine(engine, i915) num_uabi_engines++;
len = struct_size(query_ptr, engines, num_uabi_engines); @@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915, info_ptr = &query_ptr->engines[0];
- for_each_uabi_engine(engine, i915) {
- for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
On 19/02/2024 11:16, Tvrtko Ursulin wrote:
On 15/02/2024 13:59, Andi Shyti wrote:
Since CCS automatic load balancing is disabled, we will impose a fixed balancing policy that involves setting all the CCS engines to work together on the same load.
Simultaneously, the user will see only 1 CCS rather than the actual number. As of now, this change affects only DG2.
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement") Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Cc: Chris Wilson chris.p.wilson@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: stable@vger.kernel.org # v6.2+
drivers/gpu/drm/i915/gt/intel_gt.c | 11 +++++++++++ drivers/gpu/drm/i915/gt/intel_gt_regs.h | 2 ++ drivers/gpu/drm/i915/i915_drv.h | 17 +++++++++++++++++ drivers/gpu/drm/i915/i915_query.c | 5 +++-- 4 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a425db5ed3a2..e19df4ef47f6 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -168,6 +168,14 @@ static void init_unused_rings(struct intel_gt *gt) } } +static void intel_gt_apply_ccs_mode(struct intel_gt *gt) +{ + if (!IS_DG2(gt->i915)) + return;
+ intel_uncore_write(gt->uncore, XEHP_CCS_MODE, 0); +}
int intel_gt_init_hw(struct intel_gt *gt) { struct drm_i915_private *i915 = gt->i915; @@ -195,6 +203,9 @@ int intel_gt_init_hw(struct intel_gt *gt) intel_gt_init_swizzling(gt); + /* Configure CCS mode */ + intel_gt_apply_ccs_mode(gt);
/* * At least 830 can leave some of the unused rings * "active" (ie. head != tail) after resume which diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h index cf709f6c05ae..c148113770ea 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h @@ -1605,6 +1605,8 @@ #define GEN12_VOLTAGE_MASK REG_GENMASK(10, 0) #define GEN12_CAGF_MASK REG_GENMASK(19, 11) +#define XEHP_CCS_MODE _MMIO(0x14804)
#define GEN11_GT_INTR_DW(x) _MMIO(0x190018 + ((x) * 4)) #define GEN11_CSME (31) #define GEN12_HECI_2 (30) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index e81b3b2858ac..0853ffd3cb8d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -396,6 +396,23 @@ static inline struct intel_gt *to_gt(const struct drm_i915_private *i915) (engine__); \ (engine__) = rb_to_uabi_engine(rb_next(&(engine__)->uabi_node))) +/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of CCS
auto load
- balancing. As a result, all CCS engines operate collectively,
functioning
- essentially as a single CCS engine, hence the count of active CCS
engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \ + for_each_uabi_engine(engine__, i915__) \ + if ((IS_DG2(i915__)) && \ + ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \ + ((engine__)->uabi_instance)) { } \ + else
If you don't want userspace to see some engines, just don't add them to the uabi list in intel_engines_driver_register or thereabouts?
Similar as we do for gsc which uses I915_NO_UABI_CLASS, although for ccs you can choose a different approach, whatever is more elegant.
That is also needed for i915->engine_uabi_class_count to be right, so userspace stats which rely on it are correct.
I later realized it is more than that - everything that uses intel_engine_lookup_user to look up class instance passed in from userspace relies on the engine not being on the user list otherwise userspace could bypass the fact engine query does not list it. Like PMU, Perf/POA, context engine map and SSEU context query.
Regards,
Tvrtko
Regards,
Tvrtko
#define INTEL_INFO(i915) ((i915)->__info) #define RUNTIME_INFO(i915) (&(i915)->__runtime) #define DRIVER_CAPS(i915) (&(i915)->caps) diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index fa3e937ed3f5..2d41bda626a6 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -124,6 +124,7 @@ static int query_geometry_subslices(struct drm_i915_private *i915, return fill_topology_info(sseu, query_item, sseu->geometry_subslice_mask); }
static int query_engine_info(struct drm_i915_private *i915, struct drm_i915_query_item *query_item) @@ -140,7 +141,7 @@ query_engine_info(struct drm_i915_private *i915, if (query_item->flags) return -EINVAL; - for_each_uabi_engine(engine, i915) + for_each_available_uabi_engine(engine, i915) num_uabi_engines++; len = struct_size(query_ptr, engines, num_uabi_engines); @@ -155,7 +156,7 @@ query_engine_info(struct drm_i915_private *i915, info_ptr = &query_ptr->engines[0]; - for_each_uabi_engine(engine, i915) { + for_each_available_uabi_engine(engine, i915) { info.engine.engine_class = engine->uabi_class; info.engine.engine_instance = engine->uabi_instance; info.flags = I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE;
Hi Tvrtko,
On Mon, Feb 19, 2024 at 12:51:44PM +0000, Tvrtko Ursulin wrote:
On 19/02/2024 11:16, Tvrtko Ursulin wrote:
On 15/02/2024 13:59, Andi Shyti wrote:
...
+/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of
CCS auto load
- balancing. As a result, all CCS engines operate collectively,
functioning
- essentially as a single CCS engine, hence the count of active
CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \ + for_each_uabi_engine(engine__, i915__) \ + if ((IS_DG2(i915__)) && \ + ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \ + ((engine__)->uabi_instance)) { } \ + else
If you don't want userspace to see some engines, just don't add them to the uabi list in intel_engines_driver_register or thereabouts?
It will be dynamic. In next series I am preparing the user will be able to increase the number of CCS engines he wants to use.
Similar as we do for gsc which uses I915_NO_UABI_CLASS, although for ccs you can choose a different approach, whatever is more elegant.
That is also needed for i915->engine_uabi_class_count to be right, so userspace stats which rely on it are correct.
Oh yes. Will update it.
I later realized it is more than that - everything that uses intel_engine_lookup_user to look up class instance passed in from userspace relies on the engine not being on the user list otherwise userspace could bypass the fact engine query does not list it. Like PMU, Perf/POA, context engine map and SSEU context query.
Correct, will look into that, thank you!
Andi
On 20/02/2024 10:11, Andi Shyti wrote:
Hi Tvrtko,
On Mon, Feb 19, 2024 at 12:51:44PM +0000, Tvrtko Ursulin wrote:
On 19/02/2024 11:16, Tvrtko Ursulin wrote:
On 15/02/2024 13:59, Andi Shyti wrote:
...
+/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of
CCS auto load
- balancing. As a result, all CCS engines operate collectively,
functioning
- essentially as a single CCS engine, hence the count of active
CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \ + for_each_uabi_engine(engine__, i915__) \ + if ((IS_DG2(i915__)) && \ + ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \ + ((engine__)->uabi_instance)) { } \ + else
If you don't want userspace to see some engines, just don't add them to the uabi list in intel_engines_driver_register or thereabouts?
It will be dynamic. In next series I am preparing the user will be able to increase the number of CCS engines he wants to use.
Oh tricky and new. Does it need to be at runtime or could be boot time?
If you are aiming to make the static single CCS only into the 6.9 release, and you feel running out of time, you could always do a simple solution for now. The one I mentioned of simply not registering on the uabi list. Then you can refine more leisurely for the next release.
Regards,
Tvrtko
Similar as we do for gsc which uses I915_NO_UABI_CLASS, although for ccs you can choose a different approach, whatever is more elegant.
That is also needed for i915->engine_uabi_class_count to be right, so userspace stats which rely on it are correct.
Oh yes. Will update it.
I later realized it is more than that - everything that uses intel_engine_lookup_user to look up class instance passed in from userspace relies on the engine not being on the user list otherwise userspace could bypass the fact engine query does not list it. Like PMU, Perf/POA, context engine map and SSEU context query.
Correct, will look into that, thank you!
Andi
On Tue, Feb 20, 2024 at 11:15:05AM +0000, Tvrtko Ursulin wrote:
On 20/02/2024 10:11, Andi Shyti wrote:
On Mon, Feb 19, 2024 at 12:51:44PM +0000, Tvrtko Ursulin wrote:
On 19/02/2024 11:16, Tvrtko Ursulin wrote:
On 15/02/2024 13:59, Andi Shyti wrote:
...
+/*
- Exclude unavailable engines.
- Only the first CCS engine is utilized due to the disabling of
CCS auto load
- balancing. As a result, all CCS engines operate collectively,
functioning
- essentially as a single CCS engine, hence the count of active
CCS engines is
- considered '1'.
- Currently, this applies to platforms with more than one CCS engine,
- specifically DG2.
- */
+#define for_each_available_uabi_engine(engine__, i915__) \ + for_each_uabi_engine(engine__, i915__) \ + if ((IS_DG2(i915__)) && \ + ((engine__)->uabi_class == I915_ENGINE_CLASS_COMPUTE) && \ + ((engine__)->uabi_instance)) { } \ + else
If you don't want userspace to see some engines, just don't add them to the uabi list in intel_engines_driver_register or thereabouts?
It will be dynamic. In next series I am preparing the user will be able to increase the number of CCS engines he wants to use.
Oh tricky and new. Does it need to be at runtime or could be boot time?
At boot time the CCS mode has to be set to '1', i.e. only one CCS will be visible to the user. Then, during the normal execution of the driver, the user can decide to change the mode and therefore we would need to expose more than one CCS engine.
If you are aiming to make the static single CCS only into the 6.9 release, and you feel running out of time, you could always do a simple solution for now. The one I mentioned of simply not registering on the uabi list. Then you can refine more leisurely for the next release.
Thanks. I started working on a dynamic rebuilt of the uabi_engines, but in this series it wouldn't fit.
I will add the limitation during the list creation.
Thanks a lot, Andi
linux-stable-mirror@lists.linaro.org