intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
---
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen. --- drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14) + if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT); + dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14) + if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock);
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking, so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking,
Actually the patch would keep the counter working, as the counter wouldn't be updated in the dmc==NULL case. However I still think the correct fix would be to check the correct DC state, which from the POV of the counter tracking is the driver's version of the state, not the HW state.
so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
On Tue, 02 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking,
Actually the patch would keep the counter working, as the counter wouldn't be updated in the dmc==NULL case. However I still think the correct fix would be to check the correct DC state, which from the POV of the counter tracking is the driver's version of the state, not the HW state.
One thing I failed to mention is that this happens in a KASAN run in QEMU. So I'm kind of not surprised we haven't hit this before. And it impacts the deductions about the DC state.
I'm not quite sure what exactly you're suggesting, maybe a draft patch would communicate the idea better than plain English? ;)
Anyway, I think "not oopsing" is a lot better than "inaccurate DC counters in debugfs".
BR, Jani.
so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
On Tue, 02 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking,
Actually the patch would keep the counter working, as the counter wouldn't be updated in the dmc==NULL case. However I still think the correct fix would be to check the correct DC state, which from the POV of the counter tracking is the driver's version of the state, not the HW state.
One thing I failed to mention is that this happens in a KASAN run in QEMU. So I'm kind of not surprised we haven't hit this before. And it impacts the deductions about the DC state.
Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register to a non-zero value then. But in any case the driver should handle it.
I'm not quite sure what exactly you're suggesting, maybe a draft patch would communicate the idea better than plain English? ;)
intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as the debugfs entry can be read at any point. With that, what I meant is:
in gen9_set_dc_state(): ... - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6; + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6; ...
in intel_dmc_get_dc6_allowed_count(): ... if (DISPLAY_VER(display) < 14) return false;
+ if (!dmc) { + *count = 0; + return true; + } + mutex_lock(&power_domains->lock); - dc6_enabled = intel_de_read(display, DC_STATE_EN) & - DC_STATE_EN_UPTO_DC6; + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6; ...
Anyway, I think "not oopsing" is a lot better than "inaccurate DC counters in debugfs".
Agreed, the above would ensure both.
BR, Jani.
so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
-- Jani Nikula, Intel
On Wed, 03 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
On Tue, 02 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking,
Actually the patch would keep the counter working, as the counter wouldn't be updated in the dmc==NULL case. However I still think the correct fix would be to check the correct DC state, which from the POV of the counter tracking is the driver's version of the state, not the HW state.
One thing I failed to mention is that this happens in a KASAN run in QEMU. So I'm kind of not surprised we haven't hit this before. And it impacts the deductions about the DC state.
Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register to a non-zero value then. But in any case the driver should handle it.
I'm not quite sure what exactly you're suggesting, maybe a draft patch would communicate the idea better than plain English? ;)
intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as the debugfs entry can be read at any point. With that, what I meant is:
in gen9_set_dc_state(): ...
dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
I still don't understand why we can trust our own value rather than what's in the hardware in this case.
For resume, we even call gen9_sanitize_dc_state(), but not for probe.
...
in intel_dmc_get_dc6_allowed_count(): ... if (DISPLAY_VER(display) < 14) return false;
if (!dmc) {*count = 0;return true;}
This seems neat but is overkill. dmc is never NULL here, but I added the check for completeness.
It's the intel_dmc_update_dc6_allowed_count() that's more fragile, and I want that to have the !dmc check, instead of relying on the subtle dependency on power_domains->dc_state.
mutex_lock(&power_domains->lock);
dc6_enabled = intel_de_read(display, DC_STATE_EN) &DC_STATE_EN_UPTO_DC6;
dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;...
Anyway, I think "not oopsing" is a lot better than "inaccurate DC counters in debugfs".
Agreed, the above would ensure both.
BR, Jani.
so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
-- Jani Nikula, Intel
On Thu, Dec 04, 2025 at 01:30:27PM +0200, Jani Nikula wrote:
On Wed, 03 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
On Tue, 02 Dec 2025, Imre Deak imre.deak@intel.com wrote:
On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been initialized, and dmc is thus NULL.
That would be the case when the call path is intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as intel_power_domains_init_hw() is called *before* intel_dmc_init().
However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count() conditionally, depending on the current and target DC states. At probe, the target is disabled, but if DC6 is enabled, the function is called, and an oops follows. Apparently it's quite unlikely that DC6 is enabled at probe, as we haven't seen this failure mode before.
Add NULL checks and switch the dmc->display references to just display.
Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 counter") Cc: Mohammed Thasleem mohammed.thasleem@intel.com Cc: Imre Deak imre.deak@intel.com Cc: stable@vger.kernel.org # v6.16+ Signed-off-by: Jani Nikula jani.nikula@intel.com
Rare case, but this may also throw off the rc6 counting in debugfs when it does happen.
Yes, I missed the case where the driver is being loaded while DC6 is enabled, this is what happens for the reporter:
i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
That's odd, as DC6 requires the DMC firmware, which - if it's indeed loaded by BIOS for instance - will be overwritten by the driver, not a well specified sequence (even though the driver is trying to handle it correctly by disabling any active firmware handler).
But as you pointed out this would also throw off the cooked-up DC6 counter tracking,
Actually the patch would keep the counter working, as the counter wouldn't be updated in the dmc==NULL case. However I still think the correct fix would be to check the correct DC state, which from the POV of the counter tracking is the driver's version of the state, not the HW state.
One thing I failed to mention is that this happens in a KASAN run in QEMU. So I'm kind of not surprised we haven't hit this before. And it impacts the deductions about the DC state.
Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register to a non-zero value then. But in any case the driver should handle it.
I'm not quite sure what exactly you're suggesting, maybe a draft patch would communicate the idea better than plain English? ;)
intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as the debugfs entry can be read at any point. With that, what I meant is:
in gen9_set_dc_state(): ...
dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;I still don't understand why we can trust our own value rather than what's in the hardware in this case.
The BIOS/FW can set random flags in the register, as in the above case, so it can't be trusted. The counter update ending the tracking of the duration of a DC6 enabled state only works if the driver did in fact enable DC6 previously and has started the tracking accordingly. This is only guaranteed if the driver has set DC_STATE_EN_UPTO_DC6 in power_domains->dc_state, the corrsponding HW flag doesn't guarantee it.
For resume, we even call gen9_sanitize_dc_state(), but not for probe.
After system suspend, the driver enabling DC9 by setting the corrsponding DC9 flag in the DC_STATE_EN register, the HW/firmware will disable DC9 while resuming. The SW version of the DC state will be updated accordingly in the above function to reflect the disabled DC9 state.
...
in intel_dmc_get_dc6_allowed_count(): ... if (DISPLAY_VER(display) < 14) return false;
if (!dmc) {*count = 0;return true;}This seems neat but is overkill. dmc is never NULL here, but I added the check for completeness.
intel_dmc_get_dc6_allowed_count() shouldn't fall back on DISPLAY_VER>=14 to report the DC6 residency in a way that only works for older platforms. Hence the function should return true for DISPLAY_VER>=14.
It's the intel_dmc_update_dc6_allowed_count() that's more fragile, and I want that to have the !dmc check, instead of relying on the subtle dependency on power_domains->dc_state.
The counter tracking should depend on the power_domians->dc_state SW state as described above, so that's the correct thing to do there. dmc==NULL in intel_dmc_update_dc6_allowed_count() would be only a bug in the driver, if you wanted to check for that it should be a WARN_ON(!dmc) check.
mutex_lock(&power_domains->lock);
dc6_enabled = intel_de_read(display, DC_STATE_EN) &DC_STATE_EN_UPTO_DC6;
dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;...
Anyway, I think "not oopsing" is a lot better than "inaccurate DC counters in debugfs".
Agreed, the above would ensure both.
BR, Jani.
so could instead the counter update depend on the driver's DC state instead of the HW state? I.e. set gen9_set_dc_state()/dc6_was_enabled, intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state says that DC6 was indeed enabled by the driver (instead of checking the HW state).
That would fix the reporter's oops when calling intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling it if the driver hasn't actually enabled DC6 and it would also keep the DC6 counter tracking correct.
intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also guaranteed to be called only once the firmware is loaded, as until that point enabling DC6 is blocked (by holding a reference on the DC_off power well).
drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c b/drivers/gpu/drm/i915/display/intel_dmc.c index 2fb6fec6dc99..169bbbc91f6d 100644 --- a/drivers/gpu/drm/i915/display/intel_dmc.c +++ b/drivers/gpu/drm/i915/display/intel_dmc.c @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct intel_display *display, struct intel_dmc *dmc = display_to_dmc(display); u32 dc5_cur_count;
- if (DISPLAY_VER(dmc->display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return;
- dc5_cur_count = intel_de_read(dmc->display, DG1_DMC_DEBUG_DC5_COUNT);
- dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
if (!start_tracking) dmc->dc6_allowed.count += dc5_cur_count - dmc->dc6_allowed.dc5_start; @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * struct intel_dmc *dmc = display_to_dmc(display); bool dc6_enabled;
- if (DISPLAY_VER(display) < 14)
- if (!dmc || DISPLAY_VER(display) < 14) return false;
mutex_lock(&power_domains->lock); -- 2.47.3
-- Jani Nikula, Intel
-- Jani Nikula, Intel
linux-stable-mirror@lists.linaro.org