When EOI virtualization is enabled on VMX, kvm_apic_set_eoi_accelerated()
is called upon EXIT_REASON_EOI_INDUCED, but unlike its non-accelerated
sibling apic_set_eoi(), it leaves Hyper-V SINT vectors unhandled.
Send EOI to Hyper-V SINT vectors when handling accelerated EOI-induced
VM-Exits. KVM Hyper-V needs to handle the SINT EOI irrespective of whether
the EOI is accelerated or not.
Rename kvm_apic_set_eoi_accelerated() to kvm_apic_set_eoi() and let the
non-accelerated helper call the "accelerated" version. This documents
the delta between the non-accelerated path and the accelerated path.
In addition, trace the EOI even when there is no valid vector to EOI in
the non-accelerated path, so that the function's tracing semantics stay
intact.
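As an illustrative sketch (abbreviated from the diff below), the resulting
structure makes both paths converge on the common helper, with only the
software path updating the vISR/vPPR itself:

  /* Common tail, also called directly on accelerated EOI-induced VM-Exits. */
  void kvm_apic_set_eoi(struct kvm_lapic *apic, int vector)
  {
  	trace_kvm_eoi(apic, vector);

  	if (to_hv_vcpu(apic->vcpu) &&
  	    test_bit(vector, to_hv_synic(apic->vcpu)->vec_bitmap))
  		kvm_hv_synic_send_eoi(apic->vcpu, vector);

  	kvm_ioapic_send_eoi(apic, vector);
  	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
  }

  /* Non-accelerated path: clear the vISR and update the vPPR in software
   * before handing off to the common helper. */
  static int apic_set_eoi(struct kvm_lapic *apic)
  {
  	int vector = apic_find_highest_isr(apic);

  	if (vector == -1) {
  		trace_kvm_eoi(apic, vector);
  		return vector;
  	}

  	apic_clear_isr(vector, apic);
  	apic_update_ppr(apic);
  	kvm_apic_set_eoi(apic, vector);
  	return vector;
  }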
Fixes: 5c919412fe61 ("kvm/x86: Hyper-V synthetic interrupt controller")
Cc: <stable@vger.kernel.org>
Tested-by: Wang Guangju <wangguangju@baidu.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Co-developed-by: Li Rongqing <lirongqing@baidu.com>
Signed-off-by: Wang Guangju <wangguangju@baidu.com>
---
v1 -> v2: Updated the commit message and implemented a new inline helper,
apic_set_eoi_vector()
v2 -> v3: Updated the subject and commit message, dropped
apic_set_eoi_vector() and renamed kvm_apic_set_eoi_accelerated()
to kvm_apic_set_eoi()
arch/x86/kvm/lapic.c | 45 ++++++++++++++++++++++-----------------------
arch/x86/kvm/lapic.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 3 ++-
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f03facc..b2e72ab 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1269,46 +1269,45 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
}
+/*
+ * Send EOI for a valid vector. The caller, or hardware when this is invoked
+ * after an accelerated EOI VM-Exit, is responsible for updating the vISR and
+ * vPPR.
+ */
+void kvm_apic_set_eoi(struct kvm_lapic *apic, int vector)
+{
+ trace_kvm_eoi(apic, vector);
+
+ if (to_hv_vcpu(apic->vcpu) &&
+ test_bit(vector, to_hv_synic(apic->vcpu)->vec_bitmap))
+ kvm_hv_synic_send_eoi(apic->vcpu, vector);
+
+ kvm_ioapic_send_eoi(apic, vector);
+ kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_apic_set_eoi);
+
static int apic_set_eoi(struct kvm_lapic *apic)
{
int vector = apic_find_highest_isr(apic);
- trace_kvm_eoi(apic, vector);
-
/*
* Not every EOI write has a corresponding ISR; one example is
* when the kernel checks the timer during setup_IO_APIC
*/
- if (vector == -1)
+ if (vector == -1) {
+ trace_kvm_eoi(apic, vector);
return vector;
+ }
apic_clear_isr(vector, apic);
apic_update_ppr(apic);
- if (to_hv_vcpu(apic->vcpu) &&
- test_bit(vector, to_hv_synic(apic->vcpu)->vec_bitmap))
- kvm_hv_synic_send_eoi(apic->vcpu, vector);
+ kvm_apic_set_eoi(apic, vector);
- kvm_ioapic_send_eoi(apic, vector);
- kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
return vector;
}
-/*
- * this interface assumes a trap-like exit, which has already finished
- * desired side effect including vISR and vPPR update.
- */
-void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
-{
- struct kvm_lapic *apic = vcpu->arch.apic;
-
- trace_kvm_eoi(apic, vector);
-
- kvm_ioapic_send_eoi(apic, vector);
- kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
-}
-EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
-
void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
{
struct kvm_lapic_irq irq;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 762bf61..48260fa 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -126,7 +126,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
-void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
+void kvm_apic_set_eoi(struct kvm_lapic *apic, int vector);
int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9258468..f8b9eb1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5519,9 +5519,10 @@ static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
{
unsigned long exit_qualification = vmx_get_exit_qual(vcpu);
int vector = exit_qualification & 0xff;
+ struct kvm_lapic *apic = vcpu->arch.apic;
/* EOI-induced VM exit is trap-like and thus no need to adjust IP */
- kvm_apic_set_eoi_accelerated(vcpu, vector);
+ kvm_apic_set_eoi(apic, vector);
return 1;
}
--
2.9.4
WEC TPMs (in 1.2 mode) and NTC TPMs (in 2.0 mode) have been observed to
frequently, but intermittently, fail probe with:
tpm_tis: probe of 00:09 failed with error -1
Adding debug output showed that the request_locality() in
tpm_tis_core_init() succeeds, but tpm_chip_start() then fails when its
call to tpm_request_locality() -> request_locality() fails.
The access register in check_locality would show:
0x80 TPM_ACCESS_VALID
0x82 TPM_ACCESS_VALID | TPM_ACCESS_REQUEST_USE
0x80 TPM_ACCESS_VALID
continuing until it times out. TPM_ACCESS_ACTIVE_LOCALITY (0x20) never
gets set, which would end the wait.
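For reference, the wait boils down to roughly the following check (a
simplified sketch of the check_locality() logic; the helper name
locality_granted() is made up for illustration):

  /* The wait for a granted locality only ends once the access register
   * shows both VALID and ACTIVE_LOCALITY. In the failing probes it stays
   * at TPM_ACCESS_VALID (0x80), so the wait times out. */
  static bool locality_granted(struct tpm_tis_data *priv, int l)
  {
  	u8 access;

  	if (tpm_tis_read8(priv, TPM_ACCESS(l), &access) < 0)
  		return false;

  	return (access & (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY)) ==
  	       (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY);
  }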
My best guess is that something racy was going on between
release_locality's write and request_locality's write. There is no wait
in release_locality to ensure the locality has actually been released,
so the subsequent request_locality could confuse the TPM?
tpm_chip_start() grabs locality 0 and updates chip->locality. Call it
before the TPM_INT_ENABLE write, and drop the explicit request/release
calls; tpm_chip_stop() performs the release. With this, we switch to
using chip->locality instead of priv->locality. The probe failure is
no longer seen after this change.
commit 0ef333f5ba7f ("tpm: add request_locality before write
TPM_INT_ENABLE") added a request_locality/release_locality pair around
the tpm_tis_write32() of TPM_INT_ENABLE, but the read of TPM_INT_ENABLE
for the intmask should also be performed with the locality held.
tpm_chip_start() is therefore moved before that read so the locality is
held for it.
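In outline, the resulting tpm_tis_core_init() sequence is (sketch only;
the full diff follows):

  rc = tpm_chip_start(chip);	/* grabs locality 0, sets chip->locality */
  if (rc)
  	goto out_err;

  rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask);
  /* ... adjust intmask ... */
  tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask);

  rc = tpm2_probe(chip);
  tpm_chip_stop(chip);		/* releases locality 0 */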
Fixes: 0ef333f5ba7f ("tpm: add request_locality before write TPM_INT_ENABLE")
CC: stable@vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
The probe failure was seen on 5.4, 5.15 and 5.17.
commit e42acf104d6e ("tpm_tis: Clean up locality release") removed the
release wait. I haven't tried it, but re-introducing that would probably
fix this issue. It's hard to know a priori when a synchronous wait is
needed, and typically one doesn't seem to be needed. Re-introducing the
wait would re-introduce it in all cases.
Grabbing the locality around the read of TPM_INT_ENABLE may not be
necessary? It looks like the code only grabs a locality for writing,
but that asymmetry is surprising to me.
tpm_chip and tpm_tis_data track the locality separately. Should the
tpm_tis_data one be removed so they don't get out of sync?
---
drivers/char/tpm/tpm_tis_core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index dc56b976d816..529c241800c0 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -986,8 +986,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
goto out_err;
}
+ /* Grabs locality 0. */
+ rc = tpm_chip_start(chip);
+ if (rc)
+ goto out_err;
+
/* Take control of the TPM's interrupt hardware and shut it off */
- rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
+ rc = tpm_tis_read32(priv, TPM_INT_ENABLE(chip->locality), &intmask);
if (rc < 0)
goto out_err;
@@ -995,19 +1000,10 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
TPM_INTF_DATA_AVAIL_INT | TPM_INTF_STS_VALID_INT;
intmask &= ~TPM_GLOBAL_INT_ENABLE;
- rc = request_locality(chip, 0);
- if (rc < 0) {
- rc = -ENODEV;
- goto out_err;
- }
-
- tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
- release_locality(chip, 0);
+ tpm_tis_write32(priv, TPM_INT_ENABLE(chip->locality), intmask);
- rc = tpm_chip_start(chip);
- if (rc)
- goto out_err;
rc = tpm2_probe(chip);
+ /* Releases locality 0. */
tpm_chip_stop(chip);
if (rc)
goto out_err;
--
2.36.1
From: Chris Wilson <chris@chris-wilson.co.uk>
Don't allow two engines to be reset in parallel, as they would both
try to select a reset bit (and send requests to common registers) and
wait on that register at the same time. Serialize control of the reset
requests/acks using uncore->lock, which also ensures that no other GT
state changes at the same time as the actual reset.
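Schematically, each per-gen reset entry point becomes a thin locked
wrapper around the existing logic (sketch; the full diff follows):

  static int gen6_reset_engines(struct intel_gt *gt,
  			      intel_engine_mask_t engine_mask,
  			      unsigned int retry)
  {
  	unsigned long flags;
  	int ret;

  	/* Serialize the reset request/ack dance on the shared registers. */
  	spin_lock_irqsave(&gt->uncore->lock, flags);
  	ret = __gen6_reset_engines(gt, engine_mask, retry);
  	spin_unlock_irqrestore(&gt->uncore->lock, flags);

  	return ret;
  }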
Cc: stable@vger.kernel.org # v4.4 and later
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---
See [PATCH v5 0/2] at: https://lore.kernel.org/all/cover.1657639152.git.mchehab@kernel.org/
drivers/gpu/drm/i915/gt/intel_reset.c | 37 ++++++++++++++++++++-------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a5338c3fde7a..c68d36fb5bbd 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -300,9 +300,9 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
return err;
}
-static int gen6_reset_engines(struct intel_gt *gt,
- intel_engine_mask_t engine_mask,
- unsigned int retry)
+static int __gen6_reset_engines(struct intel_gt *gt,
+ intel_engine_mask_t engine_mask,
+ unsigned int retry)
{
struct intel_engine_cs *engine;
u32 hw_mask;
@@ -321,6 +321,20 @@ static int gen6_reset_engines(struct intel_gt *gt,
return gen6_hw_domain_reset(gt, hw_mask);
}
+static int gen6_reset_engines(struct intel_gt *gt,
+ intel_engine_mask_t engine_mask,
+ unsigned int retry)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&gt->uncore->lock, flags);
+ ret = __gen6_reset_engines(gt, engine_mask, retry);
+ spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
+ return ret;
+}
+
static struct intel_engine_cs *find_sfc_paired_vecs_engine(struct intel_engine_cs *engine)
{
int vecs_id;
@@ -487,9 +501,9 @@ static void gen11_unlock_sfc(struct intel_engine_cs *engine)
rmw_clear_fw(uncore, sfc_lock.lock_reg, sfc_lock.lock_bit);
}
-static int gen11_reset_engines(struct intel_gt *gt,
- intel_engine_mask_t engine_mask,
- unsigned int retry)
+static int __gen11_reset_engines(struct intel_gt *gt,
+ intel_engine_mask_t engine_mask,
+ unsigned int retry)
{
struct intel_engine_cs *engine;
intel_engine_mask_t tmp;
@@ -583,8 +597,11 @@ static int gen8_reset_engines(struct intel_gt *gt,
struct intel_engine_cs *engine;
const bool reset_non_ready = retry >= 1;
intel_engine_mask_t tmp;
+ unsigned long flags;
int ret;
+ spin_lock_irqsave(&gt->uncore->lock, flags);
+
for_each_engine_masked(engine, gt, engine_mask, tmp) {
ret = gen8_engine_reset_prepare(engine);
if (ret && !reset_non_ready)
@@ -612,17 +629,19 @@ static int gen8_reset_engines(struct intel_gt *gt,
* This is best effort, so ignore any error from the initial reset.
*/
if (IS_DG2(gt->i915) && engine_mask == ALL_ENGINES)
- gen11_reset_engines(gt, gt->info.engine_mask, 0);
+ __gen11_reset_engines(gt, gt->info.engine_mask, 0);
if (GRAPHICS_VER(gt->i915) >= 11)
- ret = gen11_reset_engines(gt, engine_mask, retry);
+ ret = __gen11_reset_engines(gt, engine_mask, retry);
else
- ret = gen6_reset_engines(gt, engine_mask, retry);
+ ret = __gen6_reset_engines(gt, engine_mask, retry);
skip_reset:
for_each_engine_masked(engine, gt, engine_mask, tmp)
gen8_engine_reset_cancel(engine);
+ spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
return ret;
}
--
2.36.1
Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa5630 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.
Let's only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.
DCE engines seem to work fine for high-precision output at 30 bpp, so
("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.
Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.
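The resulting selection reduces to a single comparison (sketch of the
one-line change below; DCN parts have dce_version above DCE_VERSION_MAX):

  /* Only DCN needs the deeper line buffer for >10 bpc formats such as
   * SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616; all DCE parts stay at 30 bpp. */
  if (plane_state->ctx->dce_version > DCE_VERSION_MAX)
  	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_36BPP;
  else
  	pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_30BPP;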
Fixes: 353ca0fa5630 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # 5.14.0
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 6774dd8bb53e..3fe3fbac1e63 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1117,12 +1117,13 @@ bool resource_build_scaling_params(struct pipe_ctx *pipe_ctx)
* on certain displays, such as the Sharp 4k. 36bpp is needed
* to support SURFACE_PIXEL_FORMAT_GRPH_ARGB16161616 and
* SURFACE_PIXEL_FORMAT_GRPH_ABGR16161616 with actual > 10 bpc
- * precision on at least DCN display engines. However, at least
- * Carrizo with DCE_VERSION_11_0 does not like 36 bpp lb depth,
- * so use only 30 bpp on DCE_VERSION_11_0. Testing with DCE 11.2 and 8.3
- * did not show such problems, so this seems to be the exception.
+ * precision on DCN display engines, but apparently not for DCE, as
+ * far as testing on DCE-11.2 and DCE-8 showed. Various DCE parts have
+ * problems: Carrizo with DCE_VERSION_11_0 does not like 36 bpp lb depth,
+ * neither does DCE-8 at 4k resolution, nor DCE-11.2 (broken identity pixel
+ * passthrough). Therefore only use 36 bpp on DCN where it is actually needed.
*/
- if (plane_state->ctx->dce_version > DCE_VERSION_11_0)
+ if (plane_state->ctx->dce_version > DCE_VERSION_MAX)
pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_36BPP;
else
pipe_ctx->plane_res.scl_data.lb_params.depth = LB_PIXEL_DEPTH_30BPP;
--
2.34.1