Set the DMA mask before calling nvkm_device_ctor(), so that when the
flush page is created in nvkm_fb_ctor(), the allocation will not fail
if the page is outside of DMA address space, which can easily happen if
IOMMU is disable. In such situations, you will get an error like this:
nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0).
Commit 38f5359354d4 ("rm/nouveau/pci: set streaming DMA mask early")
set the mask after calling nvkm_device_ctor(), but back then there was
no flush page being created, which might explain why the mask wasn't
set earlier.
Flush page allocation was added in commit 5728d064190e ("drm/nouveau/fb:
handle sysmem flush page from common code"). nvkm_fb_ctor() calls
alloc_page(), which can allocate a page anywhere in system memory, but
then calls dma_map_page() on that page. But since the DMA mask is still
set to 32, the map can fail if the page is allocated above 4GB. This is
easy to reproduce on systems with a lot of memory and IOMMU disabled.
An alternative approach would be to force the allocation of the flush
page to low memory, by specifying __GFP_DMA32. However, this would
always allocate the page in low memory, even though the hardware can
access high memory.
Fixes: 5728d064190e ("drm/nouveau/fb: handle sysmem flush page from common code")
Signed-off-by: Timur Tabi <ttabi(a)nvidia.com>
---
.../gpu/drm/nouveau/nvkm/engine/device/pci.c | 24 +++++++++----------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
index 8f0261a0d618..7cc5a7499583 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/pci.c
@@ -1695,6 +1695,18 @@ nvkm_device_pci_new(struct pci_dev *pci_dev, const char *cfg, const char *dbg,
*pdevice = &pdev->device;
pdev->pdev = pci_dev;
+ /* Set DMA mask based on capabilities reported by the MMU subdev. */
+ if (pdev->device.mmu && !pdev->device.pci->agp.bridge)
+ bits = pdev->device.mmu->dma_bits;
+ else
+ bits = 32;
+
+ ret = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(bits));
+ if (ret && bits != 32) {
+ dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
+ pdev->device.mmu->dma_bits = 32;
+ }
+
ret = nvkm_device_ctor(&nvkm_device_pci_func, quirk, &pci_dev->dev,
pci_is_pcie(pci_dev) ? NVKM_DEVICE_PCIE :
pci_find_capability(pci_dev, PCI_CAP_ID_AGP) ?
@@ -1708,17 +1720,5 @@ nvkm_device_pci_new(struct pci_dev *pci_dev, const char *cfg, const char *dbg,
if (ret)
return ret;
- /* Set DMA mask based on capabilities reported by the MMU subdev. */
- if (pdev->device.mmu && !pdev->device.pci->agp.bridge)
- bits = pdev->device.mmu->dma_bits;
- else
- bits = 32;
-
- ret = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(bits));
- if (ret && bits != 32) {
- dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(32));
- pdev->device.mmu->dma_bits = 32;
- }
-
return 0;
}
base-commit: 18a7e218cfcdca6666e1f7356533e4c988780b57
--
2.51.0
KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
On first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.
However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1. Apart from the consistency
check, this is currently the case because NP_ENABLE is actually copied
from VMCB01 to VMCB02, not from VMCB12.
On the other hand, the APM does mention two other consistency checks for
NP_ENABLE, both of which are missing (paraphrased):
In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
following conditions are considered illegal state combinations, in
addition to those mentioned in “Canonicalization and Consistency
Checks”:
• Any MBZ bit of nCR3 is set.
• Any G_PAT.PA field has an unsupported type encoding or any
reserved field in G_PAT has a nonzero value.
Replace the existing consistency check with consistency checks on
hCR0.PG and nCR3. Only perform the consistency checks if L1 has
X86_FEATURE_NPT and NP_ENABLE is set in VMCB12. The G_PAT consistency
check will be addressed separately.
As it is now possible for an L1 to run L2 with NP_ENABLE set but
ignored, also check that L1 has X86_FEATURE_NPT in nested_npt_enabled().
Pass L1's CR0 to __nested_vmcb_check_controls(). In
nested_vmcb_check_controls(), L1's CR0 is available through
kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
through nested_vmcb02_prepare_save() -> svm_set_cr0().
In svm_set_nested_state(), L1's CR0 is available in the captured save
area, as svm_get_nested_state() captures L1's save area when running L2,
and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
nested_svm_vmrun()).
Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
arch/x86/kvm/svm/svm.h | 3 ++-
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 74211c5c68026..87bcc5eff96e8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
}
static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_ctrl_area_cached *control)
+ struct vmcb_ctrl_area_cached *control,
+ unsigned long l1_cr0)
{
if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;
@@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
if (CC(control->asid == 0))
return false;
- if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
- return false;
+ if (nested_npt_enabled(to_svm(vcpu))) {
+ if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+ return false;
+ if (CC(!(l1_cr0 & X86_CR0_PG)))
+ return false;
+ }
if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
MSRPM_SIZE)))
@@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
- return __nested_vmcb_check_controls(vcpu, ctl);
+ /*
+ * Make sure we did not enter guest mode yet, in which case
+ * kvm_read_cr0() could return L2's CR0.
+ */
+ WARN_ON_ONCE(is_guest_mode(vcpu));
+ return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
}
static
@@ -1831,7 +1841,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
ret = -EINVAL;
__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
- if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+ /* 'save' contains L1 state saved from before VMRUN */
+ if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
goto out_free;
/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f6fb70ddf7272..3e805a43ffcdb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
static inline bool nested_npt_enabled(struct vcpu_svm *svm)
{
- return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+ return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
+ svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
}
static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
--
2.51.2.1041.gc1ab5b90ca-goog
In preparation for using svm_copy_lbrs() with 'struct vmcb_save_area'
without a containing 'struct vmcb', and later even 'struct
vmcb_save_area_cached', make it a macro. Pull the call to
vmcb_mark_dirty() out to the callers.
Macros are generally not preferred compared to functions, mainly due to
type-safety. However, in this case it seems like having a simple macro
copying a few fields is better than copy-pasting the same 5 lines of
code in different places.
On the bright side, pulling vmcb_mark_dirty() calls to the callers makes
it clear that in one case, vmcb_mark_dirty() was being called on VMCB12.
It is not architecturally defined for the CPU to clear arbitrary clean
bits, and it is not needed, so drop that one call.
Technically fixes the non-architectural behavior of setting the dirty
bit on VMCB12.
Fixes: d20c796ca370 ("KVM: x86: nSVM: implement nested LBR virtualization")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/nested.c | 16 ++++++++++------
arch/x86/kvm/svm/svm.c | 11 -----------
arch/x86/kvm/svm/svm.h | 10 +++++++++-
3 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index da6e80b3ac353..a37bd5c1f36fa 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -675,10 +675,12 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(vmcb02, vmcb12);
+ svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
} else {
- svm_copy_lbrs(vmcb02, vmcb01);
+ svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
}
svm_update_lbrv(&svm->vcpu);
}
@@ -1184,10 +1186,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK)))
- svm_copy_lbrs(vmcb12, vmcb02);
- else
- svm_copy_lbrs(vmcb01, vmcb02);
+ (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+ } else {
+ svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
+ vmcb_mark_dirty(vmcb01, VMCB_LBR);
+ }
svm_update_lbrv(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 10c21e4c5406f..711276e8ee84f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -795,17 +795,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
*/
}
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
-{
- to_vmcb->save.dbgctl = from_vmcb->save.dbgctl;
- to_vmcb->save.br_from = from_vmcb->save.br_from;
- to_vmcb->save.br_to = from_vmcb->save.br_to;
- to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
- to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
-
- vmcb_mark_dirty(to_vmcb, VMCB_LBR);
-}
-
static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c856d8e0f95e7..f6fb70ddf7272 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -687,8 +687,16 @@ static inline void *svm_vcpu_alloc_msrpm(void)
return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
}
+#define svm_copy_lbrs(to, from) \
+({ \
+ (to)->dbgctl = (from)->dbgctl; \
+ (to)->br_from = (from)->br_from; \
+ (to)->br_to = (from)->br_to; \
+ (to)->last_excp_from = (from)->last_excp_from; \
+ (to)->last_excp_to = (from)->last_excp_to; \
+})
+
void svm_vcpu_free_msrpm(void *msrpm);
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
void svm_enable_lbrv(struct kvm_vcpu *vcpu);
void svm_update_lbrv(struct kvm_vcpu *vcpu);
--
2.51.2.1041.gc1ab5b90ca-goog
From: Samuel Zhang <guoqing.zhang(a)amd.com>
[ Upstream commit eb6e7f520d6efa4d4ebf1671455abe4a681f7a05 ]
On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.
Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.
The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.
Signed-off-by: Samuel Zhang <guoqing.zhang(a)amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 5d1b32cfe4a676fe552416cb5ae847b215463a1a)
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
Based on my comprehensive analysis using semantic code analysis tools
and repository examination, here's my assessment:
## **BACKPORT DECISION: YES**
### Analysis Process and Findings:
#### 1. **Semantic Analysis Tools Used:**
- **mcp__semcode__find_function**: Located
`aqua_vanjaram_switch_partition_mode`, `gfx_v9_4_3_cp_resume`, and
`amdgpu_xcp_restore_partition_mode`
- **mcp__semcode__find_callers**: Traced call graph showing
`gfx_v9_4_3_cp_resume` is called during resume via
`gfx_v9_4_3_hw_init`
- **mcp__semcode__find_callchain**: Confirmed the resume path and
analyzed impact scope
- **mcp__semcode__find_type**: Examined `struct amdgpu_device` to verify
`in_suspend` flag management
- **WebSearch**: Found mailing list discussions showing multiple patch
iterations (v2, v3, v4)
- **Repository analysis**: Traced historical context and related commits
#### 2. **Impact Analysis:**
**Severity: HIGH** - This fixes GPU page faults that crash user
workloads
- **Hardware affected**: Aqua Vanjaram/MI300 series datacenter GPUs
(gfx_v9_4_3, IP versions 9.4.4 and 9.5.0)
- **Configuration**: PF passthrough environments (SR-IOV virtualization)
- **Trigger**: User-space reachable via hibernation cycle + workload
execution
- **Root cause**: Out-of-bounds memory access on MQD (Memory Queue
Descriptor) buffer object due to wrong CP register values
(CP_HYP_XCP_CTL)
#### 3. **Code Changes Analysis:**
**Two minimal, targeted changes:**
**Change 1** (drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c:410-411):
```c
-if (adev->kfd.init_complete && !amdgpu_in_reset(adev))
+if (adev->kfd.init_complete && !amdgpu_in_reset(adev) &&
!adev->in_suspend)
flags |= AMDGPU_XCP_OPS_KFD;
```
- Prevents KFD operations during suspend/hibernation
- KFD resume is handled separately in the resume sequence
**Change 2** (drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:2295-2298):
```c
+if (adev->in_suspend)
+ amdgpu_xcp_restore_partition_mode(adev->xcp_mgr);
+else if (amdgpu_xcp_query_partition_mode(...) ==
AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE)
```
- Adds hibernation resume handling to restore partition mode
- Uses existing `amdgpu_xcp_restore_partition_mode()` function (added in
c45e38f21754b, Sept 2023)
- Ensures CP registers get correct values on resume
#### 4. **Scope and Dependencies:**
- **Contained fix**: Only 3 lines changed across 2 files
- **Existing infrastructure**: Depends on
`amdgpu_xcp_restore_partition_mode()` which has been in the kernel
since Sept 2023
- **Low coupling**: Changes are isolated to resume path, don't affect
normal operation
- **Call graph impact**: Minimal - only affects hibernation resume
codepath
#### 5. **Stable Tree Compliance:**
✅ **Bug fix**: Yes - fixes GPU page faults
✅ **Small and contained**: 3-line change
✅ **No new features**: Uses existing restore function
✅ **No architectural changes**: Follows pattern from c45e38f21754b
❌ **Stable tags**: No "Fixes:" or "Cc: stable@" tags present
However, the absence of stable tags appears to be an oversight given the
severity.
#### 6. **Risk Assessment:**
**Low regression risk:**
- Only affects specific hardware (Aqua Vanjaram GPUs)
- Only impacts PF passthrough configuration
- Only touches hibernation resume path
- Multiple patch iterations suggest thorough testing
- Builds on proven pattern from 2023 reset handling
#### 7. **Historical Context:**
- Part of ongoing partition mode fixes (multiple related commits in
2024-2025)
- Web search revealed extensive mailing list discussion
- Multiple patch versions (v2, v3, v4) indicate careful upstream review
- Targeted for drm-next-6.19
### Conclusion:
This commit **SHOULD be backported** because it:
1. Fixes a serious user-visible bug (GPU crashes from page faults)
2. Has minimal code changes with low regression risk
3. Affects critical datacenter hardware (MI300 series)
4. Is well-tested with multiple upstream review cycles
5. Follows established architectural patterns
6. Is confined to a specific use case, limiting blast radius
The fix is essential for users running AMD MI300 GPUs in virtualized
environments with hibernation support.
drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 811124ff88a88..f9e2edf5260bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -407,7 +407,8 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
return -EINVAL;
}
- if (adev->kfd.init_complete && !amdgpu_in_reset(adev))
+ if (adev->kfd.init_complete && !amdgpu_in_reset(adev) &&
+ !adev->in_suspend)
flags |= AMDGPU_XCP_OPS_KFD;
if (flags & AMDGPU_XCP_OPS_KFD) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index 51babf5c78c86..02c69ffd05837 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -2292,7 +2292,9 @@ static int gfx_v9_4_3_cp_resume(struct amdgpu_device *adev)
r = amdgpu_xcp_init(adev->xcp_mgr, num_xcp, mode);
} else {
- if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
+ if (adev->in_suspend)
+ amdgpu_xcp_restore_partition_mode(adev->xcp_mgr);
+ else if (amdgpu_xcp_query_partition_mode(adev->xcp_mgr,
AMDGPU_XCP_FL_NONE) ==
AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE)
r = amdgpu_xcp_switch_partition_mode(
--
2.51.0
In preparation for using svm_copy_lbrs() with 'struct vmcb_save_area'
without a containing 'struct vmcb', and later even 'struct
vmcb_save_area_cached', make it a macro. Pull the call to
vmcb_mark_dirty() out to the callers.
Macros are generally not preferred compared to functions, mainly due to
type-safety. However, in this case it seems like having a simple macro
copying a few fields is better than copy-pasting the same 5 lines of
code in different places.
On the bright side, pulling vmcb_mark_dirty() calls to the callers makes
it clear that in one case, vmcb_mark_dirty() was being called on VMCB12.
It is not architecturally defined for the CPU to clear arbitrary clean
bits, and it is not needed, so drop that one call.
Technically fixes the non-architectural behavior of setting the dirty
bit on VMCB12.
Fixes: d20c796ca370 ("KVM: x86: nSVM: implement nested LBR virtualization")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/nested.c | 16 ++++++++++------
arch/x86/kvm/svm/svm.c | 11 -----------
arch/x86/kvm/svm/svm.h | 10 +++++++++-
3 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c81005b245222..e7861392f2fcd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -676,10 +676,12 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
* Reserved bits of DEBUGCTL are ignored. Be consistent with
* svm_set_msr's definition of reserved bits.
*/
- svm_copy_lbrs(vmcb02, vmcb12);
+ svm_copy_lbrs(&vmcb02->save, &vmcb12->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
vmcb02->save.dbgctl &= ~DEBUGCTL_RESERVED_BITS;
} else {
- svm_copy_lbrs(vmcb02, vmcb01);
+ svm_copy_lbrs(&vmcb02->save, &vmcb01->save);
+ vmcb_mark_dirty(vmcb02, VMCB_LBR);
}
svm_update_lbrv(&svm->vcpu);
}
@@ -1186,10 +1188,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
- (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK)))
- svm_copy_lbrs(vmcb12, vmcb02);
- else
- svm_copy_lbrs(vmcb01, vmcb02);
+ (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
+ svm_copy_lbrs(&vmcb12->save, &vmcb02->save);
+ } else {
+ svm_copy_lbrs(&vmcb01->save, &vmcb02->save);
+ vmcb_mark_dirty(vmcb01, VMCB_LBR);
+ }
svm_update_lbrv(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fc42bcdbb5200..9eb112f0e61f0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -795,17 +795,6 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
*/
}
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
-{
- to_vmcb->save.dbgctl = from_vmcb->save.dbgctl;
- to_vmcb->save.br_from = from_vmcb->save.br_from;
- to_vmcb->save.br_to = from_vmcb->save.br_to;
- to_vmcb->save.last_excp_from = from_vmcb->save.last_excp_from;
- to_vmcb->save.last_excp_to = from_vmcb->save.last_excp_to;
-
- vmcb_mark_dirty(to_vmcb, VMCB_LBR);
-}
-
static void __svm_enable_lbrv(struct kvm_vcpu *vcpu)
{
to_svm(vcpu)->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c2acaa49ee1c5..e510c8183bd87 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -687,8 +687,16 @@ static inline void *svm_vcpu_alloc_msrpm(void)
return svm_alloc_permissions_map(MSRPM_SIZE, GFP_KERNEL_ACCOUNT);
}
+#define svm_copy_lbrs(to, from) \
+({ \
+ (to)->dbgctl = (from)->dbgctl; \
+ (to)->br_from = (from)->br_from; \
+ (to)->br_to = (from)->br_to; \
+ (to)->last_excp_from = (from)->last_excp_from; \
+ (to)->last_excp_to = (from)->last_excp_to; \
+})
+
void svm_vcpu_free_msrpm(void *msrpm);
-void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
void svm_enable_lbrv(struct kvm_vcpu *vcpu);
void svm_update_lbrv(struct kvm_vcpu *vcpu);
--
2.51.2.1041.gc1ab5b90ca-goog
Gestiona el desempeño con Vorecol
body {
margin: 0;
padding: 0;
font-family: Arial, Helvetica, sans-serif;
font-size: 14px;
color: #333;
background-color: #ffffff;
}
table {
border-spacing: 0;
width: 100%;
max-width: 600px;
margin: auto;
}
td {
padding: 12px 20px;
}
a {
color: #1a73e8;
text-decoration: none;
}
.footer {
font-size: 12px;
color: #888888;
text-align: center;
}
Mejora la gestión del desempeño y talento con Vorecol Performance Management.
Hola ,
Gestionar el desempeño de tu equipo puede ser más sencillo y efectivo con las herramientas adecuadas. Sin un buen sistema, es difícil identificar, desarrollar y retener a los mejores colaboradores.
El módulo de Performance Management de Vorecol te ofrece una solución completa para medir y potenciar el talento en tu organización.
Con este módulo puedes:
Evaluar el desempeño y potencial de tus colaboradores con la matriz Nine Box para tomar mejores decisiones.
Establecer y seguir objetivos claros usando la metodología SMART, alineados con las prioridades de tu empresa.
Ajustar el sistema según lo que necesites, desde manejar los periodos hasta recibir notificaciones, todo fácil de usar.
Además, contarás con soporte técnico y capacitación especializada para resolver cualquier duda y aprovechar al máximo la herramienta.
Aprovecha el Buen Fin del 1 al 22 de noviembre con hasta 15% de descuento y descubre cómo mejorar la gestión del desempeño en tu equipo.
Si quieres conocer más, responde este correo o contáctame directamente.
Saludos,
--------------
Atte.: Luis Ramírez
Ciudad de México: (55) 5018 0565
WhatsApp: +52 33 1607 2089
Si no deseas recibir más correos, haz clic aquí para darte de baja.
Para remover su dirección de esta lista haga <a href="https://s1.arrobamail.com/unsuscribe.php?id=yiwtsrewiswqwqseup">click aquí</a>
On 11/8/2025 9:26 AM, Sasha Levin wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> This is a note to let you know that I've just added the patch titled
>
> net: ionic: add dma_wmb() before ringing TX doorbell
>
> to the 6.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-ionic-add-dma_wmb-before-ringing-tx-doorbell.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 05587f91cc2e8b071605aeef6442d2acf6e627c9
> Author: Mohammad Heib <mheib(a)redhat.com>
> Date: Fri Oct 31 17:52:02 2025 +0200
>
> net: ionic: add dma_wmb() before ringing TX doorbell
>
> [ Upstream commit d261f5b09c28850dc63ca1d3018596f829f402d5 ]
>
> The TX path currently writes descriptors and then immediately writes to
> the MMIO doorbell register to notify the NIC. On weakly ordered
> architectures, descriptor writes may still be pending in CPU or DMA
> write buffers when the doorbell is issued, leading to the device
> fetching stale or incomplete descriptors.
>
> Add a dma_wmb() in ionic_txq_post() to ensure all descriptor writes are
> visible to the device before the doorbell MMIO write.
>
> Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling")
> Signed-off-by: Mohammad Heib <mheib(a)redhat.com>
> Link: https://patch.msgid.link/20251031155203.203031-1-mheib@redhat.com
> Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> index 0f5758c273c22..3a094d3ea6f4f 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> @@ -29,6 +29,10 @@ static void ionic_tx_clean(struct ionic_queue *q,
>
> static inline void ionic_txq_post(struct ionic_queue *q, bool ring_dbell)
> {
> + /* Ensure TX descriptor writes reach memory before NIC reads them.
> + * Prevents device from fetching stale descriptors.
> + */
> + dma_wmb();
> ionic_q_post(q, ring_dbell);
> }
>
I posted on the original patch, but I will post here as well.
Apologies for the late and duplicate response, but it's not clear to me
why this is necessary.
In other vendors the "doorbell record" (dbr) is writing another location
in system memory, not an mmio write. These cases do use a dma_wmb().
Why isn't the writeq() sufficient in our case? According to
Documentation/memory-barriers.txt it seems like writeq() should be
sufficient.
Thanks,
Brett