'vfio_device' keeps the ->kvm pointer with an elevated reference count
from the first open of the device up until the last close(). So the kvm
struct and its dependencies (kvm kthreads, cgroups, ...) are kept alive
even for VFIO devices that don't need ->kvm.

Copy the ->kvm pointer from the vfio_device struct and store it in
'intel_vgpu'. Note that kvm_page_track_[un]register_notifier() already
does the get/put calls, keeping the kvm struct alive.

This will allow releasing ->kvm from the vfio_device right after the
first open call, so that devices not using kvm no longer keep it alive.
Devices that do use kvm (like intel_vgpu) will be expected to manage
the lifetime of the kvm struct themselves.
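To illustrate the intended lifetime model (a minimal sketch, not part
of this patch -- kvmgt itself can rely on the get/put performed inside
kvm_page_track_[un]register_notifier()):

	static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
	{
		struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);

		/* Pin kvm for the lifetime of this open */
		vgpu->kvm = vfio_dev->kvm;
		kvm_get_kvm(vgpu->kvm);
		return 0;
	}

	static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
	{
		struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);

		/* Drop the driver-held reference; kvm may now go away */
		kvm_put_kvm(vgpu->kvm);
		vgpu->kvm = NULL;
	}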
Fixes: 2b48f52f2bff ("vfio: fix deadlock between group lock and kvm lock")
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
drivers/gpu/drm/i915/gvt/gvt.h | 1 +
drivers/gpu/drm/i915/gvt/kvmgt.c | 14 +++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2c95aeef4e41..6c62467df22c 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -232,6 +232,7 @@ struct intel_vgpu {
unsigned long nr_cache_entries;
struct mutex cache_lock;
+ struct kvm *kvm;
struct kvm_page_track_notifier_node track_node;
#define NR_BKT (1 << 18)
struct hlist_head ptable[NR_BKT];
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b27ff77bfb50..cf418e2c560d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -36,6 +36,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/kthread.h>
+#include <linux/kvm_host.h>
#include <linux/sched/mm.h>
#include <linux/types.h>
#include <linux/list.h>
@@ -649,7 +650,7 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, itr->status))
continue;
- if (vgpu->vfio_device.kvm == itr->vfio_device.kvm) {
+ if (vgpu->kvm == itr->kvm) {
ret = true;
goto out;
}
@@ -664,13 +665,13 @@ static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
int ret;
+ vgpu->kvm = vgpu->vfio_device.kvm;
if (__kvmgt_vgpu_exist(vgpu))
return -EEXIST;
vgpu->track_node.track_write = kvmgt_page_track_write;
vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
- ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
- &vgpu->track_node);
+ ret = kvm_page_track_register_notifier(vgpu->kvm, &vgpu->track_node);
if (ret) {
gvt_vgpu_err("KVM is required to use Intel vGPU\n");
return ret;
@@ -707,8 +708,7 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)
debugfs_lookup_and_remove(KVMGT_DEBUGFS_FILENAME, vgpu->debugfs);
- kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
- &vgpu->track_node);
+ kvm_page_track_unregister_notifier(vgpu->kvm, &vgpu->track_node);
kvmgt_protect_table_destroy(vgpu);
gvt_cache_destroy(vgpu);
@@ -1560,7 +1560,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
if (kvmgt_gfn_is_write_protected(info, gfn))
return 0;
- r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+ r = kvm_write_track_add_gfn(info->kvm, gfn);
if (r)
return r;
@@ -1578,7 +1578,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
if (!kvmgt_gfn_is_write_protected(info, gfn))
return 0;
- r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+ r = kvm_write_track_remove_gfn(info->kvm, gfn);
if (r)
return r;
--
2.45.3
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x e52e97f09fb66fd868260d05bd6b74a9a3db39ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2025021110-demeaning-mushroom-9922@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e52e97f09fb66fd868260d05bd6b74a9a3db39ee Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi@redhat.com>
Date: Thu, 30 Jan 2025 13:15:00 +0100
Subject: [PATCH] statmount: let unset strings be empty
Just like it's normal for unset values to be zero, unset strings should
be empty instead of containing random values.

It seems to be a typical mistake that the mask returned by statmount()
is not checked, which can result in various bugs.

With this fix, these bugs are prevented, since it is highly likely that
userspace would just want to turn the missing mask case into an empty
string anyway (most of the recently found cases are of this type).
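To illustrate the failure mode (a hypothetical userspace snippet;
string fields such as fs_type are offsets into the trailing str[]
buffer of struct statmount):

	struct statmount *sm = buf;	/* filled in by statmount() */

	/* Common but strictly incorrect: using the string without
	 * checking the mask. Before this fix an unset offset could
	 * point at leftover buffer contents; after it, it points at a
	 * guaranteed empty string. */
	printf("fs type: %s\n", sm->str + sm->fs_type);

	/* The fully correct form still checks the mask first: */
	if (sm->mask & STATMOUNT_FS_TYPE)
		printf("fs type: %s\n", sm->str + sm->fs_type);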
Link: https://lore.kernel.org/all/CAJfpegsVCPfCn2DpM8iiYSS5DpMsLB8QBUCHecoj6s0Vxf…
Fixes: 68385d77c05b ("statmount: simplify string option retrieval")
Fixes: 46eae99ef733 ("add statmount(2) syscall")
Cc: stable@vger.kernel.org # v6.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://lore.kernel.org/r/20250130121500.113446-1-mszeredi@redhat.com
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff --git a/fs/namespace.c b/fs/namespace.c
index a3ed3f2980cb..9c4d307a82cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -5191,39 +5191,45 @@ static int statmount_string(struct kstatmount *s, u64 flag)
size_t kbufsize;
struct seq_file *seq = &s->seq;
struct statmount *sm = &s->sm;
- u32 start = seq->count;
+ u32 start, *offp;
+
+ /* Reserve an empty string at the beginning for any unset offsets */
+ if (!seq->count)
+ seq_putc(seq, 0);
+
+ start = seq->count;
switch (flag) {
case STATMOUNT_FS_TYPE:
- sm->fs_type = start;
+ offp = &sm->fs_type;
ret = statmount_fs_type(s, seq);
break;
case STATMOUNT_MNT_ROOT:
- sm->mnt_root = start;
+ offp = &sm->mnt_root;
ret = statmount_mnt_root(s, seq);
break;
case STATMOUNT_MNT_POINT:
- sm->mnt_point = start;
+ offp = &sm->mnt_point;
ret = statmount_mnt_point(s, seq);
break;
case STATMOUNT_MNT_OPTS:
- sm->mnt_opts = start;
+ offp = &sm->mnt_opts;
ret = statmount_mnt_opts(s, seq);
break;
case STATMOUNT_OPT_ARRAY:
- sm->opt_array = start;
+ offp = &sm->opt_array;
ret = statmount_opt_array(s, seq);
break;
case STATMOUNT_OPT_SEC_ARRAY:
- sm->opt_sec_array = start;
+ offp = &sm->opt_sec_array;
ret = statmount_opt_sec_array(s, seq);
break;
case STATMOUNT_FS_SUBTYPE:
- sm->fs_subtype = start;
+ offp = &sm->fs_subtype;
statmount_fs_subtype(s, seq);
break;
case STATMOUNT_SB_SOURCE:
- sm->sb_source = start;
+ offp = &sm->sb_source;
ret = statmount_sb_source(s, seq);
break;
default:
@@ -5251,6 +5257,7 @@ static int statmount_string(struct kstatmount *s, u64 flag)
seq->buf[seq->count++] = '\0';
sm->mask |= flag;
+ *offp = start;
return 0;
}
From: Saurabh Sengar <ssengar@linux.microsoft.com>
On an x86 system under test with 1780 CPUs, topology_span_sane() takes
around 8 seconds cumulatively across all of its iterations. It is an
expensive operation which checks the sanity of the non-NUMA topology
masks.

CPU topology does not change very frequently, so make this check
optional for systems where the topology is trusted and a faster boot is
desired.

Restrict the check to the sched_verbose kernel cmdline option so that
the penalty can be avoided on systems that want to skip it.
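For example (the dmesg line below is the pr_info_once() message added
in the diff that follows):

	# default boot: the check is skipped, noted once in the log
	$ dmesg | grep topology_span_sane
	topology_span_sane: Skipping topology span sanity check. Use `sched_verbose` boot parameter to enable it.

	# booting with `sched_verbose` on the kernel cmdline re-enables
	# the check (at the cost of the longer boot described above)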
Cc: stable@vger.kernel.org
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Co-developed-by: Naman Jain <namjain@linux.microsoft.com>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
Changes since v2:
https://lore.kernel.org/all/1731922777-7121-1-git-send-email-ssengar@linux.…
- Use sched_debug() instead of using sched_debug_verbose
variable directly (addressing Prateek's comment)
Changes since v1:
https://lore.kernel.org/all/1729619853-2597-1-git-send-email-ssengar@linux.…
- Use kernel cmdline param instead of compile time flag.
Adding a link to the other patch which is under review.
https://lore.kernel.org/lkml/20241031200431.182443-1-steve.wahl@hpe.com/
Above patch tries to optimize the topology sanity check, whereas this
patch makes it optional. We believe both patches can coexist, as even
with optimization, there will still be some performance overhead for
this check.
---
kernel/sched/topology.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..b030c1a2121f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2359,6 +2359,13 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
{
int i = cpu + 1;
+ /* Skip the topology sanity check for non-debug, as it is a time-consuming operation */
+ if (!sched_debug()) {
+ pr_info_once("%s: Skipping topology span sanity check. Use `sched_verbose` boot parameter to enable it.\n",
+ __func__);
+ return true;
+ }
+
/* NUMA levels are allowed to overlap */
if (tl->flags & SDTL_OVERLAP)
return true;
base-commit: 00f3246adeeacbda0bd0b303604e46eb59c32e6e
--
2.43.0
These patches fix some issues with the way KVM manages FPSIMD/SVE/SME
state. The series supersedes my earlier attempt at fixing the host SVE
state corruption issue:
https://lore.kernel.org/linux-arm-kernel/20250121100026.3974971-1-mark.rutl…
Patch 1 addresses the host SVE state corruption issue by always saving
and unbinding the host state when loading a vCPU, as discussed on the
earlier patch:
https://lore.kernel.org/linux-arm-kernel/Z4--YuG5SWrP_pW7@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86plkful48.wl-maz@kernel.org/
Patches 2 to 4 remove code made redundant by patch 1. These probably
warrant backporting along with patch 1 as there is some historical
brokenness in the code they remove.
Patches 5 to 7 are preparatory refactoring for patch 8, and are not
intended to have any functional impact.
Patch 8 addresses some mismanagement of ZCR_EL{1,2} which can result in
the host VMM unexpectedly receiving a SIGKILL. To fix this, we eagerly
switch ZCR_EL{1,2} at guest<->host transitions, as discussed on another
series:
https://lore.kernel.org/linux-arm-kernel/Z4pAMaEYvdLpmbg2@J2N7QTR9R3/
https://lore.kernel.org/linux-arm-kernel/86o6zzukwr.wl-maz@kernel.org/
https://lore.kernel.org/linux-arm-kernel/Z5Dc-WMu2azhTuMn@J2N7QTR9R3/
The end result is that KVM loses ~100 lines of code, and becomes a bit
simpler to reason about.
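As a rough illustration only (a hypothetical sketch of the idea behind
patch 8, not its actual code -- the real change also deals with traps,
nVHE/hVHE differences, and context synchronization):

	/* On guest exit, eagerly reset the EL1 vector length to the
	 * maximum, so host kernel-mode SVE and the host VMM never run
	 * with the guest's (possibly smaller) VL still in effect. */
	write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL1);
	isb();	/* make the new VL visible before any host SVE use */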
I've pushed these patches to the arm64-kvm-fpsimd-fixes-20250210 tag on my
kernel.org repo:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git
The (unstable) arm64/kvm/fpsimd-fixes branch in that repo contains the
fixes plus additional debug patches I've used for testing. I've given
this some basic testing on a virtual platform, booting a host and a
guest with and without constraining the guest's max SVE VL, with:
* kvm_arm.mode=vhe
* kvm_arm.mode=nvhe
* kvm_arm.mode=protected (IIUC this will default to hVHE)
Since v1 [1]:
* Address some additional compiler warnings in patch 7
* Use ZCR_EL1 alias in VHE code
* Fold in Tested-by and Reviewed-by tags
* Fix typos
Since v2 [2]:
* Ensure context synchronization in patch 8
* Fold in Tested-by and Reviewed-by tags
* Fix typos
[1] https://lore.kernel.org/linux-arm-kernel/20250204152100.705610-1-mark.rutla…
[2] https://lore.kernel.org/linux-arm-kernel/20250206141102.954688-1-mark.rutla…
Mark.
Mark Rutland (8):
KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state
KVM: arm64: Remove host FPSIMD saving for non-protected KVM
KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN
KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN
KVM: arm64: Refactor CPTR trap deactivation
KVM: arm64: Refactor exit handlers
KVM: arm64: Mark some header functions as inline
KVM: arm64: Eagerly switch ZCR_EL{1,2}
arch/arm64/include/asm/kvm_emulate.h | 42 --------
arch/arm64/include/asm/kvm_host.h | 22 +---
arch/arm64/kernel/fpsimd.c | 25 -----
arch/arm64/kvm/arm.c | 8 --
arch/arm64/kvm/fpsimd.c | 100 ++----------------
arch/arm64/kvm/hyp/entry.S | 5 +
arch/arm64/kvm/hyp/include/hyp/switch.h | 133 +++++++++++++++++-------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 15 ++-
arch/arm64/kvm/hyp/nvhe/switch.c | 91 ++++++++--------
arch/arm64/kvm/hyp/vhe/switch.c | 33 +++---
10 files changed, 187 insertions(+), 287 deletions(-)
--
2.30.2
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x be92ab2de0ee1a13291c3b47b2d7eb24d80c0a2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2025021058-ruse-paradox-92e6@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be92ab2de0ee1a13291c3b47b2d7eb24d80c0a2c Mon Sep 17 00:00:00 2001
From: Bence Csókás <csokas.bence@prolan.hu>
Date: Thu, 19 Dec 2024 10:12:58 +0100
Subject: [PATCH] spi: atmel-qspi: Memory barriers after memory-mapped I/O
The QSPI peripheral control and status registers are
accessible via the SoC's APB bus, whereas MMIO transactions'
data travels on the AHB bus.
Microchip documentation and even sample code from Atmel emphasise the
need for a memory barrier before the first MMIO transaction to the
AHB-connected QSPI, and before the last write to its registers via APB.
This is achieved by the following lines in `atmel_qspi_transfer()`:

	/* Dummy read of QSPI_IFR to synchronize APB and AHB accesses */
	(void)atmel_qspi_read(aq, QSPI_IFR);

However, the current documentation makes no mention of synchronization
requirements in the other direction, i.e. after the last data is
written via AHB and before the first register access on APB.

In our case, we were facing an issue where the QSPI peripheral would
cease to send any new CSR (nCS Rise) interrupts, leading to a timeout
in `atmel_qspi_wait_for_completion()` and ultimately this panic at
higher levels:

	ubi0 error: ubi_io_write: error -110 while writing 63108 bytes
	to PEB 491:128, written 63104 bytes

After months of extensive research of the codebase, fiddling around in
the debugger with kgdb, and back-and-forth with Microchip, we came to
the conclusion that the issue is probably that the peripheral is still
busy receiving on AHB when the LASTXFER bit is written to its Control
Register on APB; therefore this write gets lost, and the peripheral
still thinks there is more data to come in the MMIO transfer. This was
first suspected when we noticed that doubling the write() of
QSPI_CR_LASTXFER seemed to solve the problem.
Ultimately, the solution is to introduce memory barriers
after the AHB-mapped MMIO transfers, to ensure ordering.
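Condensed, the resulting ordering pattern is (a sketch only; the actual
change is in the diff below):

	/* APB -> AHB: dummy APB read before the MMIO data transfer */
	(void)atmel_qspi_read(aq, QSPI_IFR);

	memcpy_toio(aq->mem + offset, op->data.buf.out, op->data.nbytes);

	/* AHB -> APB: order the MMIO data ahead of the LASTXFER
	 * register write, so that write is not lost while the
	 * peripheral is still busy on AHB */
	wmb();
	atmel_qspi_write(QSPI_CR_LASTXFER, aq, QSPI_CR);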
Fixes: d5433def3153 ("mtd: spi-nor: atmel-quadspi: Add spi-mem support to atmel-quadspi")
Cc: Hari.PrasathGE@microchip.com
Cc: Mahesh.Abotula@microchip.com
Cc: Marco.Cardellini@microchip.com
Cc: stable@vger.kernel.org # c0a0203cf579: ("spi: atmel-quadspi: Create `atmel_qspi_ops`"...)
Cc: stable@vger.kernel.org # 6.x.y
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Link: https://patch.msgid.link/20241219091258.395187-1-csokas.bence@prolan.hu
Signed-off-by: Mark Brown <broonie@kernel.org>
diff --git a/drivers/spi/atmel-quadspi.c b/drivers/spi/atmel-quadspi.c
index f46da363574f..8fdc9d27a95e 100644
--- a/drivers/spi/atmel-quadspi.c
+++ b/drivers/spi/atmel-quadspi.c
@@ -661,13 +661,20 @@ static int atmel_qspi_transfer(struct spi_mem *mem,
(void)atmel_qspi_read(aq, QSPI_IFR);
/* Send/Receive data */
- if (op->data.dir == SPI_MEM_DATA_IN)
+ if (op->data.dir == SPI_MEM_DATA_IN) {
memcpy_fromio(op->data.buf.in, aq->mem + offset,
op->data.nbytes);
- else
+
+ /* Synchronize AHB and APB accesses again */
+ rmb();
+ } else {
memcpy_toio(aq->mem + offset, op->data.buf.out,
op->data.nbytes);
+ /* Synchronize AHB and APB accesses again */
+ wmb();
+ }
+
/* Release the chip-select */
atmel_qspi_write(QSPI_CR_LASTXFER, aq, QSPI_CR);