Hello,
RFC v2 addresses comments on RFC v1 [1]. This series is also rebased on kvm/next (v6.15-rc4).
Here's the series stitched together for your convenience: https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfcv...
Changes from RFC v1:

+ Added patches to make guest mem use guest mem inodes instead of anonymous inodes.
+ Renamed the factored-out gmem allocation function to kvm_gmem_alloc_view().
+ Renamed the flag vm_move_enc_ctxt_supported to use_vm_enc_ctxt_op.
+ Various small changes to keep the patchset compatible with the latest version of kvm/next.
As a refresher, the split file/inode model was proposed in guest_mem v11: memslot bindings belong to the file, and pages belong to the inode. This model lends itself well to having different VMs use separate files pointing to the same inode.
The split file/inode model has also been used by the following recent patch series:

+ mmap support for guest_memfd: [2]
+ NUMA mempolicy support for guest_memfd: [3]
+ HugeTLB support for guest_memfd: [4]
This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and a gmem fd, and returns another gmem fd that references a different file associated with the given VM. This RFC also updates KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context (slot->arch.lpage_info and kvm->mem_attr_array) from the source to the destination VM, intra-host.
Intended usage of the two ioctls:
1. The source VM's gmem fd is passed to the destination VM over a unix socket.
2. The destination VM uses the new KVM_LINK_GUEST_MEMFD ioctl to link the source VM's fd to a new fd.
3. The destination VM passes the new fds to KVM_SET_USER_MEMORY_REGION, binding each new file (which points to the same inode as the source VM's file) to a memslot.
4. KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is used to move kvm->mem_attr_array and slot->arch.lpage_info to the destination VM.
5. The destination VM runs as normal.
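For concreteness, a minimal sketch of steps 2-4 from userspace (illustrative only: link_and_migrate() and its parameters are hypothetical, error handling is elided, and it assumes uapi headers with this series applied; on current kernels the step 3 binding would go through KVM_SET_USER_MEMORY_REGION2 with KVM_MEM_GUEST_MEMFD):

#include <sys/ioctl.h>
#include <linux/kvm.h>

int link_and_migrate(int dst_vm_fd, int src_vm_fd, int src_gmem_fd,
		     __u64 gpa, __u64 size, __u64 uaddr)
{
	struct kvm_link_guest_memfd link = {
		.fd = src_gmem_fd,
		.flags = 0,
	};
	struct kvm_userspace_memory_region2 region = {
		.slot = 0,
		.flags = KVM_MEM_GUEST_MEMFD,
		.guest_phys_addr = gpa,
		.memory_size = size,
		.userspace_addr = uaddr,	/* shared (non-private) view */
	};
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
		.args[0] = src_vm_fd,
	};
	int dst_gmem_fd;

	/* Step 2: get a new file (and fd) backed by the same inode */
	dst_gmem_fd = ioctl(dst_vm_fd, KVM_LINK_GUEST_MEMFD, &link);
	if (dst_gmem_fd < 0)
		return -1;

	/* Step 3: bind the new file to a destination memslot */
	region.guest_memfd = dst_gmem_fd;
	if (ioctl(dst_vm_fd, KVM_SET_USER_MEMORY_REGION2, &region) < 0)
		return -1;

	/* Step 4: move mem_attr_array and lpage_info from the source */
	return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
}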
Some other approaches considered were:
+ Using the linkat() syscall, but that requires a mount/directory for a source fd to be linked to
+ Using the dup() syscall, but that only duplicates the fd, and both fds point to the same file
[1] https://lore.kernel.org/all/cover.1691446946.git.ackerleytng@google.com/T/
[2] https://lore.kernel.org/all/20250328153133.3504118-2-tabba@google.com/
[3] https://lore.kernel.org/all/20250408112402.181574-6-shivankg@amd.com/
[4] https://lore.kernel.org/all/c1ee659c212b5a8b0e7a7f4d1763699176dd3a62.1747264...
---
Ackerley Tng (12):
  KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes
  KVM: guest_mem: Refactor out kvm_gmem_alloc_view()
  KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD
  KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl
  KVM: selftests: Test transferring private memory to another VM
  KVM: x86: Refactor sev's flag migration_in_progress to kvm struct
  KVM: x86: Refactor common code out of sev.c
  KVM: x86: Refactor common migration preparation code out of sev_vm_move_enc_context_from
  KVM: x86: Let moving encryption context be configurable
  KVM: x86: Handle moving of memory context for intra-host migration
  KVM: selftests: Generalize migration functions from sev_migrate_tests.c
  KVM: selftests: Add tests for migration of private mem
David Hildenbrand (1):
  fs: Refactor to provide function that allocates a secure anonymous inode
 arch/x86/include/asm/kvm_host.h              |   3 +-
 arch/x86/kvm/svm/sev.c                       |  82 +------
 arch/x86/kvm/svm/svm.h                       |   3 +-
 arch/x86/kvm/x86.c                           | 218 ++++++++++++++++-
 arch/x86/kvm/x86.h                           |   6 +
 fs/anon_inodes.c                             |  23 +-
 include/linux/fs.h                           |  13 +-
 include/linux/kvm_host.h                     |  18 ++
 include/uapi/linux/kvm.h                     |   8 +
 include/uapi/linux/magic.h                   |   1 +
 mm/secretmem.c                               |   9 +-
 tools/testing/selftests/kvm/Makefile.kvm     |   1 +
 .../testing/selftests/kvm/guest_memfd_test.c |  43 ++++
 .../testing/selftests/kvm/include/kvm_util.h |  31 +++
 .../kvm/x86/private_mem_migrate_tests.c      |  93 ++++++++
 .../selftests/kvm/x86/sev_migrate_tests.c    |  48 ++--
 virt/kvm/guest_memfd.c                       | 225 +++++++++++++++---
 virt/kvm/kvm_main.c                          |  17 +-
 virt/kvm/kvm_mm.h                            |  14 +-
 19 files changed, 697 insertions(+), 159 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c
From: David Hildenbrand <david@redhat.com>
alloc_anon_secure_inode() returns an inode after running checks in security_inode_init_security_anon().
Also refactor secretmem's file creation process to use the new function.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 fs/anon_inodes.c   | 23 ++++++++++++++++-------
 include/linux/fs.h | 13 +++++++------
 mm/secretmem.c     |  9 +--------
 3 files changed, 24 insertions(+), 21 deletions(-)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c index 583ac81669c2..0ce28959c43a 100644 --- a/fs/anon_inodes.c +++ b/fs/anon_inodes.c @@ -55,17 +55,20 @@ static struct file_system_type anon_inode_fs_type = { .kill_sb = kill_anon_super, };
-static struct inode *anon_inode_make_secure_inode( - const char *name, - const struct inode *context_inode) +static struct inode *anon_inode_make_secure_inode(struct super_block *s, + const char *name, const struct inode *context_inode, + bool fs_internal) { struct inode *inode; int error;
- inode = alloc_anon_inode(anon_inode_mnt->mnt_sb); + inode = alloc_anon_inode(s); if (IS_ERR(inode)) return inode; - inode->i_flags &= ~S_PRIVATE; + + if (!fs_internal) + inode->i_flags &= ~S_PRIVATE; + error = security_inode_init_security_anon(inode, &QSTR(name), context_inode); if (error) { @@ -75,6 +78,12 @@ static struct inode *anon_inode_make_secure_inode( return inode; }
+struct inode *alloc_anon_secure_inode(struct super_block *s, const char *name) +{ + return anon_inode_make_secure_inode(s, name, NULL, true); +} +EXPORT_SYMBOL_GPL(alloc_anon_secure_inode); + static struct file *__anon_inode_getfile(const char *name, const struct file_operations *fops, void *priv, int flags, @@ -88,7 +97,8 @@ static struct file *__anon_inode_getfile(const char *name, return ERR_PTR(-ENOENT);
if (make_inode) { - inode = anon_inode_make_secure_inode(name, context_inode); + inode = anon_inode_make_secure_inode(anon_inode_mnt->mnt_sb, + name, context_inode, false); if (IS_ERR(inode)) { file = ERR_CAST(inode); goto err; @@ -318,4 +328,3 @@ static int __init anon_inode_init(void) }
fs_initcall(anon_inode_init); - diff --git a/include/linux/fs.h b/include/linux/fs.h index 016b0fe1536e..8eeef9a7fe07 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -309,7 +309,7 @@ struct iattr { */ #define FILESYSTEM_MAX_STACK_DEPTH 2
-/** +/** * enum positive_aop_returns - aop return codes with specific semantics * * @AOP_WRITEPAGE_ACTIVATE: Informs the caller that page writeback has @@ -319,7 +319,7 @@ struct iattr { * be a candidate for writeback again in the near * future. Other callers must be careful to unlock * the page if they get this return. Returned by - * writepage(); + * writepage(); * * @AOP_TRUNCATED_PAGE: The AOP method that was handed a locked page has * unlocked it and the page might have been truncated. @@ -1141,8 +1141,8 @@ struct file *get_file_active(struct file **f);
#define MAX_NON_LFS ((1UL<<31) - 1)
-/* Page cache limit. The filesystems should put that into their s_maxbytes - limits, otherwise bad things can happen in VM. */ +/* Page cache limit. The filesystems should put that into their s_maxbytes + limits, otherwise bad things can happen in VM. */ #if BITS_PER_LONG==32 #define MAX_LFS_FILESIZE ((loff_t)ULONG_MAX << PAGE_SHIFT) #elif BITS_PER_LONG==64 @@ -2607,7 +2607,7 @@ int sync_inode_metadata(struct inode *inode, int wait); struct file_system_type { const char *name; int fs_flags; -#define FS_REQUIRES_DEV 1 +#define FS_REQUIRES_DEV 1 #define FS_BINARY_MOUNTDATA 2 #define FS_HAS_SUBTYPE 4 #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ @@ -3195,7 +3195,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos); extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *); extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *); extern struct file * open_exec(const char *); - + /* fs/dcache.c -- generic fs support functions */ extern bool is_subdir(struct dentry *, struct dentry *); extern bool path_is_under(const struct path *, const struct path *); @@ -3550,6 +3550,7 @@ extern int simple_write_begin(struct file *file, struct address_space *mapping, extern const struct address_space_operations ram_aops; extern int always_delete_dentry(const struct dentry *); extern struct inode *alloc_anon_inode(struct super_block *); +extern struct inode *alloc_anon_secure_inode(struct super_block *, const char *); extern int simple_nosetlease(struct file *, int, struct file_lease **, void **); extern const struct dentry_operations simple_dentry_operations;
diff --git a/mm/secretmem.c b/mm/secretmem.c index 1b0a214ee558..c0e459e58cb6 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -195,18 +195,11 @@ static struct file *secretmem_file_create(unsigned long flags) struct file *file; struct inode *inode; const char *anon_name = "[secretmem]"; - int err;
- inode = alloc_anon_inode(secretmem_mnt->mnt_sb); + inode = alloc_anon_secure_inode(secretmem_mnt->mnt_sb, anon_name); if (IS_ERR(inode)) return ERR_CAST(inode);
- err = security_inode_init_security_anon(inode, &QSTR(anon_name), NULL); - if (err) { - file = ERR_PTR(err); - goto err_free_inode; - } - file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem", O_RDWR, &secretmem_fops); if (IS_ERR(file))
From: Ackerley Tng <ackerleytng@google.com>
Using guest mem inodes allows us to store metadata for the backing memory on the inode. Metadata will be added in a later patch to support HugeTLB pages.
Metadata about backing memory should not be stored on the file, since the file represents a guest_memfd's binding with a struct kvm, while metadata about the backing memory is not unique to any specific binding or struct kvm.
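Concretely, the intended split looks like this (a sketch annotating the structures used in this series, not new code):

/* Per-binding state: stored on the file, via file->private_data */
struct kvm_gmem {
	struct kvm *kvm;	/* the VM this file is bound to */
	struct xarray bindings;	/* memslot bindings for this VM */
	struct list_head entry;	/* links into the inode's i_private_list */
};

/*
 * Per-memory state: stored on the inode, shared by all files (and thus
 * all VMs) using this memory:
 *   inode->i_private - guest_memfd flags
 *   inode->i_size    - size of the backing memory
 *   (later) HugeTLB metadata
 */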
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 132 +++++++++++++++++++++++++++++++------
 virt/kvm/kvm_main.c        |   7 +-
 virt/kvm/kvm_mm.h          |   9 ++-
 4 files changed, 124 insertions(+), 25 deletions(-)
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index bb575f3ab45e..169dba2a6920 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -103,5 +103,6 @@ #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */ #define SECRETMEM_MAGIC 0x5345434d /* "SECM" */ #define PID_FS_MAGIC 0x50494446 /* "PIDF" */ +#define GUEST_MEMORY_MAGIC 0x474d454d /* "GMEM" */
#endif /* __LINUX_MAGIC_H__ */ diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index b2aa6bf24d3a..2ee26695dc31 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -1,12 +1,16 @@ // SPDX-License-Identifier: GPL-2.0 +#include <linux/fs.h> #include <linux/backing-dev.h> #include <linux/falloc.h> #include <linux/kvm_host.h> +#include <linux/pseudo_fs.h> #include <linux/pagemap.h> #include <linux/anon_inodes.h>
#include "kvm_mm.h"
+static struct vfsmount *kvm_gmem_mnt; + struct kvm_gmem { struct kvm *kvm; struct xarray bindings; @@ -318,9 +322,51 @@ static struct file_operations kvm_gmem_fops = { .fallocate = kvm_gmem_fallocate, };
-void kvm_gmem_init(struct module *module) +static const struct super_operations kvm_gmem_super_operations = { + .statfs = simple_statfs, +}; + +static int kvm_gmem_init_fs_context(struct fs_context *fc) +{ + struct pseudo_fs_context *ctx; + + if (!init_pseudo(fc, GUEST_MEMORY_MAGIC)) + return -ENOMEM; + + ctx = fc->fs_private; + ctx->ops = &kvm_gmem_super_operations; + + return 0; +} + +static struct file_system_type kvm_gmem_fs = { + .name = "kvm_guest_memory", + .init_fs_context = kvm_gmem_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static int kvm_gmem_init_mount(void) +{ + kvm_gmem_mnt = kern_mount(&kvm_gmem_fs); + + if (WARN_ON_ONCE(IS_ERR(kvm_gmem_mnt))) + return PTR_ERR(kvm_gmem_mnt); + + kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC; + return 0; +} + +int kvm_gmem_init(struct module *module) { kvm_gmem_fops.owner = module; + + return kvm_gmem_init_mount(); +} + +void kvm_gmem_exit(void) +{ + kern_unmount(kvm_gmem_mnt); + kvm_gmem_mnt = NULL; }
static int kvm_gmem_migrate_folio(struct address_space *mapping, @@ -402,11 +448,71 @@ static const struct inode_operations kvm_gmem_iops = { .setattr = kvm_gmem_setattr, };
+static struct inode *kvm_gmem_inode_make_secure_inode(const char *name, + loff_t size, u64 flags) +{ + struct inode *inode; + + inode = alloc_anon_secure_inode(kvm_gmem_mnt->mnt_sb, name); + if (IS_ERR(inode)) + return inode; + + inode->i_private = (void *)(unsigned long)flags; + inode->i_op = &kvm_gmem_iops; + inode->i_mapping->a_ops = &kvm_gmem_aops; + inode->i_mode |= S_IFREG; + inode->i_size = size; + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); + mapping_set_inaccessible(inode->i_mapping); + /* Unmovable mappings are supposed to be marked unevictable as well. */ + WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); + + return inode; +} + +static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size, + u64 flags) +{ + static const char *name = "[kvm-gmem]"; + struct inode *inode; + struct file *file; + int err; + + err = -ENOENT; + if (!try_module_get(kvm_gmem_fops.owner)) + goto err; + + inode = kvm_gmem_inode_make_secure_inode(name, size, flags); + if (IS_ERR(inode)) { + err = PTR_ERR(inode); + goto err_put_module; + } + + file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, + &kvm_gmem_fops); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto err_put_inode; + } + + file->f_flags |= O_LARGEFILE; + file->private_data = priv; + +out: + return file; + +err_put_inode: + iput(inode); +err_put_module: + module_put(kvm_gmem_fops.owner); +err: + file = ERR_PTR(err); + goto out; +} + static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) { - const char *anon_name = "[kvm-gmem]"; struct kvm_gmem *gmem; - struct inode *inode; struct file *file; int fd, err;
@@ -420,32 +526,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) goto err_fd; }
- file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem, - O_RDWR, NULL); + file = kvm_gmem_inode_create_getfile(gmem, size, flags); if (IS_ERR(file)) { err = PTR_ERR(file); goto err_gmem; }
- file->f_flags |= O_LARGEFILE; - - inode = file->f_inode; - WARN_ON(file->f_mapping != inode->i_mapping); - - inode->i_private = (void *)(unsigned long)flags; - inode->i_op = &kvm_gmem_iops; - inode->i_mapping->a_ops = &kvm_gmem_aops; - inode->i_mode |= S_IFREG; - inode->i_size = size; - mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); - mapping_set_inaccessible(inode->i_mapping); - /* Unmovable mappings are supposed to be marked unevictable as well. */ - WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); - kvm_get_kvm(kvm); gmem->kvm = kvm; xa_init(&gmem->bindings); - list_add(&gmem->entry, &inode->i_mapping->i_private_list); + list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
fd_install(fd, file); return fd; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 69782df3617f..1e3fd81868bc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -6412,7 +6412,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) if (WARN_ON_ONCE(r)) goto err_vfio;
- kvm_gmem_init(module); + r = kvm_gmem_init(module); + if (r) + goto err_gmem;
r = kvm_init_virtualization(); if (r) @@ -6433,6 +6435,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) err_register: kvm_uninit_virtualization(); err_virt: + kvm_gmem_exit(); +err_gmem: kvm_vfio_ops_exit(); err_vfio: kvm_async_pf_deinit(); @@ -6464,6 +6468,7 @@ void kvm_exit(void) for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); kmem_cache_destroy(kvm_vcpu_cache); + kvm_gmem_exit(); kvm_vfio_ops_exit(); kvm_async_pf_deinit(); kvm_irqfd_exit(); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index acef3f5c582a..dcacb76b8f00 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -68,17 +68,20 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, #endif /* HAVE_KVM_PFNCACHE */
#ifdef CONFIG_KVM_PRIVATE_MEM -void kvm_gmem_init(struct module *module); +int kvm_gmem_init(struct module *module); +void kvm_gmem_exit(void); int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args); int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset); void kvm_gmem_unbind(struct kvm_memory_slot *slot); #else -static inline void kvm_gmem_init(struct module *module) +static inline int kvm_gmem_init(struct module *module) { - + return 0; }
+static inline void kvm_gmem_exit(void) {}; + static inline int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset)
From: Ackerley Tng <ackerleytng@google.com>
kvm_gmem_alloc_view() allocates and builds a file from an inode.

It will be reused later by kvm_gmem_link().
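Roughly, the call flow after this refactor is (a sketch of the two callers, not actual code):

	/* KVM_CREATE_GUEST_MEMFD */
	__kvm_gmem_create()
		inode = kvm_gmem_inode_make_secure_inode(name, size, flags);
		file = kvm_gmem_alloc_view(kvm, inode, name);

	/* KVM_LINK_GUEST_MEMFD (added in a later patch) */
	kvm_gmem_link()
		inode = file_inode(src_file);	/* reuse the existing inode */
		file = kvm_gmem_alloc_view(kvm, inode, name);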
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Ryan Afranji <afranji@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 virt/kvm/guest_memfd.c | 61 +++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 34 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 2ee26695dc31..a3918d1695b9 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -470,49 +470,47 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name, return inode; }
-static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size, - u64 flags) +static struct file *kvm_gmem_alloc_view(struct kvm *kvm, struct inode *inode, + const char *name) { - static const char *name = "[kvm-gmem]"; - struct inode *inode; + struct kvm_gmem *gmem; struct file *file; - int err;
- err = -ENOENT; if (!try_module_get(kvm_gmem_fops.owner)) - goto err; + return ERR_PTR(-ENOENT);
- inode = kvm_gmem_inode_make_secure_inode(name, size, flags); - if (IS_ERR(inode)) { - err = PTR_ERR(inode); + gmem = kzalloc(sizeof(*gmem), GFP_KERNEL); + if (!gmem) { + file = ERR_PTR(-ENOMEM); goto err_put_module; }
file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, &kvm_gmem_fops); - if (IS_ERR(file)) { - err = PTR_ERR(file); - goto err_put_inode; - } + if (IS_ERR(file)) + goto err_gmem;
file->f_flags |= O_LARGEFILE; - file->private_data = priv; + file->private_data = gmem; + + kvm_get_kvm(kvm); + gmem->kvm = kvm; + xa_init(&gmem->bindings); + list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
-out: return file;
-err_put_inode: - iput(inode); +err_gmem: + kfree(gmem); err_put_module: module_put(kvm_gmem_fops.owner); -err: - file = ERR_PTR(err); - goto out; + return file; }
static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) { - struct kvm_gmem *gmem; + static const char *name = "[kvm-gmem]"; + struct inode *inode; struct file *file; int fd, err;
@@ -520,28 +518,23 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) if (fd < 0) return fd;
- gmem = kzalloc(sizeof(*gmem), GFP_KERNEL); - if (!gmem) { - err = -ENOMEM; + inode = kvm_gmem_inode_make_secure_inode(name, size, flags); + if (IS_ERR(inode)) { + err = PTR_ERR(inode); goto err_fd; }
- file = kvm_gmem_inode_create_getfile(gmem, size, flags); + file = kvm_gmem_alloc_view(kvm, inode, name); if (IS_ERR(file)) { err = PTR_ERR(file); - goto err_gmem; + goto err_put_inode; }
- kvm_get_kvm(kvm); - gmem->kvm = kvm; - xa_init(&gmem->bindings); - list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list); - fd_install(fd, file); return fd;
-err_gmem: - kfree(gmem); +err_put_inode: + iput(inode); err_fd: put_unused_fd(fd); return err;
From: Ackerley Tng <ackerleytng@google.com>
KVM_LINK_GUEST_MEMFD will link a gmem fd's underlying inode to a new file (and fd).
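For illustration, a minimal userspace call could look as follows (link_gmem() is a hypothetical helper; it assumes uapi headers with this series applied, and since no flags are defined yet, flags must be zero):

#include <sys/ioctl.h>
#include <linux/kvm.h>

int link_gmem(int vm_fd, int src_gmem_fd)
{
	struct kvm_link_guest_memfd args = {
		.fd = src_gmem_fd,
		.flags = 0,
	};

	/*
	 * On success, returns a new gmem fd whose file points to the same
	 * inode (and hence the same memory) as src_gmem_fd.
	 */
	return ioctl(vm_fd, KVM_LINK_GUEST_MEMFD, &args);
}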
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Ryan Afranji <afranji@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 include/uapi/linux/kvm.h |  8 ++++++
 virt/kvm/guest_memfd.c   | 57 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/kvm_main.c      | 10 +++++++
 virt/kvm/kvm_mm.h        |  7 +++++
 4 files changed, 82 insertions(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index c6988e2c68d5..8f17f0b462aa 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1583,4 +1583,12 @@ struct kvm_pre_fault_memory { __u64 padding[5]; };
+#define KVM_LINK_GUEST_MEMFD _IOWR(KVMIO, 0xd6, struct kvm_link_guest_memfd) + +struct kvm_link_guest_memfd { + __u64 fd; + __u64 flags; + __u64 reserved[6]; +}; + #endif /* __LINUX_KVM_H */ diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index a3918d1695b9..d76bd1119198 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -555,6 +555,63 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args) return __kvm_gmem_create(kvm, size, flags); }
+int kvm_gmem_link(struct kvm *kvm, struct kvm_link_guest_memfd *args) +{ + static const char *name = "[kvm-gmem]"; + u64 flags = args->flags; + u64 valid_flags = 0; + struct file *dst_file, *src_file; + struct kvm_gmem *gmem; + struct timespec64 ts; + struct inode *inode; + struct fd f; + int ret, fd; + + if (flags & ~valid_flags) + return -EINVAL; + + f = fdget(args->fd); + src_file = fd_file(f); + if (!src_file) + return -EINVAL; + + ret = -EINVAL; + if (src_file->f_op != &kvm_gmem_fops) + goto out; + + /* Cannot link a gmem file with the same vm again */ + gmem = src_file->private_data; + if (gmem->kvm == kvm) + goto out; + + ret = fd = get_unused_fd_flags(0); + if (ret < 0) + goto out; + + inode = file_inode(src_file); + dst_file = kvm_gmem_alloc_view(kvm, inode, name); + if (IS_ERR(dst_file)) { + ret = PTR_ERR(dst_file); + goto out_fd; + } + + ts = inode_set_ctime_current(inode); + inode_set_atime_to_ts(inode, ts); + + inc_nlink(inode); + ihold(inode); + + fd_install(fd, dst_file); + fdput(f); + return fd; + +out_fd: + put_unused_fd(fd); +out: + fdput(f); + return ret; +} + int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 1e3fd81868bc..a9b01841a243 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5285,6 +5285,16 @@ static long kvm_vm_ioctl(struct file *filp, r = kvm_gmem_create(kvm, &guest_memfd); break; } + case KVM_LINK_GUEST_MEMFD: { + struct kvm_link_guest_memfd params; + + r = -EFAULT; + if (copy_from_user(¶ms, argp, sizeof(params))) + goto out; + + r = kvm_gmem_link(kvm, ¶ms); + break; + } #endif default: r = kvm_arch_vm_ioctl(filp, ioctl, arg); diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index dcacb76b8f00..85baf8a7e0de 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -71,6 +71,7 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, int kvm_gmem_init(struct module *module); void kvm_gmem_exit(void); int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args); +int kvm_gmem_link(struct kvm *kvm, struct kvm_link_guest_memfd *args); int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset); void kvm_gmem_unbind(struct kvm_memory_slot *slot); @@ -82,6 +83,12 @@ static inline int kvm_gmem_init(struct module *module)
static inline void kvm_gmem_exit(void) {};
+static inline int kvm_gmem_link(struct kvm *kvm, + struct kvm_link_guest_memfd *args) +{ + return -EOPNOTSUPP; +} + static inline int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset)
From: Ackerley Tng <ackerleytng@google.com>
Test that:

+ Invalid inputs are rejected with EINVAL
+ Valid inputs return a new (destination) fd
+ The destination and source fds have the same inode number
+ There is no crash on program exit
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c | 43 +++++++++++++++++++
 .../testing/selftests/kvm/include/kvm_util.h | 18 ++++++++
 2 files changed, 61 insertions(+)
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index ce687f8d248f..9b2a58cd9b64 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -170,6 +170,48 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
+static void test_link(struct kvm_vm *src_vm, int src_fd, size_t total_size)
+{
+	int ret;
+	int dst_fd;
+	struct kvm_vm *dst_vm;
+	struct stat src_stat;
+	struct stat dst_stat;
+
+	dst_vm = vm_create_barebones();
+
+	/* Linking with a nonexistent fd */
+	dst_fd = __vm_link_guest_memfd(dst_vm, 99, 0);
+	TEST_ASSERT_EQ(dst_fd, -1);
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	/* Linking with a non-gmem fd */
+	dst_fd = __vm_link_guest_memfd(dst_vm, 0, 0);
+	TEST_ASSERT_EQ(dst_fd, -1);
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	/* Linking with invalid flags */
+	dst_fd = __vm_link_guest_memfd(dst_vm, src_fd, 1);
+	TEST_ASSERT_EQ(dst_fd, -1);
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	/* Linking with an already-associated vm */
+	dst_fd = __vm_link_guest_memfd(src_vm, src_fd, 0);
+	TEST_ASSERT_EQ(dst_fd, -1);
+	TEST_ASSERT_EQ(errno, EINVAL);
+
+	dst_fd = __vm_link_guest_memfd(dst_vm, src_fd, 0);
+	TEST_ASSERT(dst_fd > 0, "linking should succeed with valid inputs");
+	TEST_ASSERT(src_fd != dst_fd, "linking should return a different fd");
+
+	ret = fstat(src_fd, &src_stat);
+	TEST_ASSERT_EQ(ret, 0);
+	ret = fstat(dst_fd, &dst_stat);
+	TEST_ASSERT_EQ(ret, 0);
+	TEST_ASSERT(src_stat.st_ino == dst_stat.st_ino,
+		    "src and dst files should have the same inode number");
+}
+
 int main(int argc, char *argv[])
 {
 	size_t page_size;
@@ -194,6 +236,7 @@ int main(int argc, char *argv[])
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
+	test_link(vm, fd, total_size);
 	close(fd);
 }
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 373912464fb4..68faa658b69e 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -571,6 +571,24 @@ static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
 	return fd;
 }
+static inline int __vm_link_guest_memfd(struct kvm_vm *vm, int fd, uint64_t flags)
+{
+	struct kvm_link_guest_memfd params = {
+		.fd = fd,
+		.flags = flags,
+	};
+
+	return __vm_ioctl(vm, KVM_LINK_GUEST_MEMFD, &params);
+}
+
+static inline int vm_link_guest_memfd(struct kvm_vm *vm, int fd, uint64_t flags)
+{
+	int new_fd = __vm_link_guest_memfd(vm, fd, flags);
+
+	TEST_ASSERT(new_fd >= 0, KVM_IOCTL_ERROR(KVM_LINK_GUEST_MEMFD, new_fd));
+	return new_fd;
+}
+
 void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
			       uint64_t gpa, uint64_t size, void *hva);
 int __vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t flags,
From: Ackerley Tng <ackerleytng@google.com>
Add a selftest that writes to private memory in a source VM, links the
source VM's guest_memfd into a destination VM, and verifies that the
destination VM reads back the value the source VM wrote.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 .../kvm/x86/private_mem_migrate_tests.c | 87 +++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c
diff --git a/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c b/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c new file mode 100644 index 000000000000..4226de3ebd41 --- /dev/null +++ b/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c @@ -0,0 +1,87 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "kvm_util_base.h" +#include "test_util.h" +#include "ucall_common.h" +#include <linux/kvm.h> +#include <linux/sizes.h> + +#define TRANSFER_PRIVATE_MEM_TEST_SLOT 10 +#define TRANSFER_PRIVATE_MEM_GPA ((uint64_t)(1ull << 32)) +#define TRANSFER_PRIVATE_MEM_GVA TRANSFER_PRIVATE_MEM_GPA +#define TRANSFER_PRIVATE_MEM_VALUE 0xdeadbeef + +static void transfer_private_mem_guest_code_src(void) +{ + uint64_t volatile *const ptr = (uint64_t *)TRANSFER_PRIVATE_MEM_GVA; + + *ptr = TRANSFER_PRIVATE_MEM_VALUE; + + GUEST_SYNC1(*ptr); +} + +static void transfer_private_mem_guest_code_dst(void) +{ + uint64_t volatile *const ptr = (uint64_t *)TRANSFER_PRIVATE_MEM_GVA; + + GUEST_SYNC1(*ptr); +} + +static void test_transfer_private_mem(void) +{ + struct kvm_vm *src_vm, *dst_vm; + struct kvm_vcpu *src_vcpu, *dst_vcpu; + int src_memfd, dst_memfd; + struct ucall uc; + + const struct vm_shape shape = { + .mode = VM_MODE_DEFAULT, + .type = KVM_X86_SW_PROTECTED_VM, + }; + + /* Build the source VM, use it to write to private memory */ + src_vm = __vm_create_shape_with_one_vcpu( + shape, &src_vcpu, 0, transfer_private_mem_guest_code_src); + src_memfd = vm_create_guest_memfd(src_vm, SZ_4K, 0); + + vm_mem_add(src_vm, DEFAULT_VM_MEM_SRC, TRANSFER_PRIVATE_MEM_GPA, + TRANSFER_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_PRIVATE, + src_memfd, 0); + + virt_map(src_vm, TRANSFER_PRIVATE_MEM_GVA, TRANSFER_PRIVATE_MEM_GPA, 1); + vm_set_memory_attributes(src_vm, TRANSFER_PRIVATE_MEM_GPA, SZ_4K, + KVM_MEMORY_ATTRIBUTE_PRIVATE); + + vcpu_run(src_vcpu); + TEST_ASSERT_KVM_EXIT_REASON(src_vcpu, KVM_EXIT_IO); + get_ucall(src_vcpu, &uc); + TEST_ASSERT(uc.args[0] == TRANSFER_PRIVATE_MEM_VALUE, + "Source VM should be able to write to private memory"); + + /* Build the destination VM with linked fd */ + dst_vm = __vm_create_shape_with_one_vcpu( + shape, &dst_vcpu, 0, transfer_private_mem_guest_code_dst); + dst_memfd = vm_link_guest_memfd(dst_vm, src_memfd, 0); + + vm_mem_add(dst_vm, DEFAULT_VM_MEM_SRC, TRANSFER_PRIVATE_MEM_GPA, + TRANSFER_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_PRIVATE, + dst_memfd, 0); + + virt_map(dst_vm, TRANSFER_PRIVATE_MEM_GVA, TRANSFER_PRIVATE_MEM_GPA, 1); + vm_set_memory_attributes(dst_vm, TRANSFER_PRIVATE_MEM_GPA, SZ_4K, + KVM_MEMORY_ATTRIBUTE_PRIVATE); + + vcpu_run(dst_vcpu); + TEST_ASSERT_KVM_EXIT_REASON(dst_vcpu, KVM_EXIT_IO); + get_ucall(dst_vcpu, &uc); + TEST_ASSERT(uc.args[0] == TRANSFER_PRIVATE_MEM_VALUE, + "Destination VM should be able to read value transferred"); +} + +int main(int argc, char *argv[]) +{ + TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM)); + + test_transfer_private_mem(); + + return 0; +}
From: Ackerley Tng <ackerleytng@google.com>
The migration_in_progress flag will also be needed for migration of non-SEV VMs.
Co-developed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 arch/x86/kvm/svm/sev.c   | 17 ++++++-----------
 arch/x86/kvm/svm/svm.h   |  1 -
 include/linux/kvm_host.h |  1 +
 3 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 0bc708ee2788..89c06cfcc200 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1838,8 +1838,6 @@ static bool is_cmd_allowed_from_mirror(u32 cmd_id)
static int sev_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) { - struct kvm_sev_info *dst_sev = to_kvm_sev_info(dst_kvm); - struct kvm_sev_info *src_sev = to_kvm_sev_info(src_kvm); int r = -EBUSY;
if (dst_kvm == src_kvm) @@ -1849,10 +1847,10 @@ static int sev_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) * Bail if these VMs are already involved in a migration to avoid * deadlock between two VMs trying to migrate to/from each other. */ - if (atomic_cmpxchg_acquire(&dst_sev->migration_in_progress, 0, 1)) + if (atomic_cmpxchg_acquire(&dst_kvm->migration_in_progress, 0, 1)) return -EBUSY;
- if (atomic_cmpxchg_acquire(&src_sev->migration_in_progress, 0, 1)) + if (atomic_cmpxchg_acquire(&src_kvm->migration_in_progress, 0, 1)) goto release_dst;
r = -EINTR; @@ -1865,21 +1863,18 @@ static int sev_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) unlock_dst: mutex_unlock(&dst_kvm->lock); release_src: - atomic_set_release(&src_sev->migration_in_progress, 0); + atomic_set_release(&src_kvm->migration_in_progress, 0); release_dst: - atomic_set_release(&dst_sev->migration_in_progress, 0); + atomic_set_release(&dst_kvm->migration_in_progress, 0); return r; }
static void sev_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) { - struct kvm_sev_info *dst_sev = to_kvm_sev_info(dst_kvm); - struct kvm_sev_info *src_sev = to_kvm_sev_info(src_kvm); - mutex_unlock(&dst_kvm->lock); mutex_unlock(&src_kvm->lock); - atomic_set_release(&dst_sev->migration_in_progress, 0); - atomic_set_release(&src_sev->migration_in_progress, 0); + atomic_set_release(&dst_kvm->migration_in_progress, 0); + atomic_set_release(&src_kvm->migration_in_progress, 0); }
/* vCPU mutex subclasses. */ diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index d4490eaed55d..35df8be621c5 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -107,7 +107,6 @@ struct kvm_sev_info { struct list_head mirror_vms; /* List of VMs mirroring */ struct list_head mirror_entry; /* Use as a list entry of mirrors */ struct misc_cg *misc_cg; /* For misc cgroup accounting */ - atomic_t migration_in_progress; void *snp_context; /* SNP guest context page */ void *guest_req_buf; /* Bounce buffer for SNP Guest Request input */ void *guest_resp_buf; /* Bounce buffer for SNP Guest Request output */ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1dedc421b3e3..0c1d637a6e7d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -862,6 +862,7 @@ struct kvm { /* Protected by slots_locks (for writes) and RCU (for reads) */ struct xarray mem_attr_array; #endif + atomic_t migration_in_progress; char stats_id[KVM_STATS_NAME_SIZE]; };
From: Ackerley Tng <ackerleytng@google.com>
Split sev_lock_two_vms() into kvm_mark_migration_in_progress() and kvm_lock_two_vms() and refactor sev.c to use these two new functions.
Co-developed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 arch/x86/kvm/svm/sev.c | 60 ++++++++++------------------------------
 arch/x86/kvm/x86.c     | 62 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |  6 ++++
 3 files changed, 82 insertions(+), 46 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 89c06cfcc200..b3048ec411e2 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1836,47 +1836,6 @@ static bool is_cmd_allowed_from_mirror(u32 cmd_id) return false; }
-static int sev_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) -{ - int r = -EBUSY; - - if (dst_kvm == src_kvm) - return -EINVAL; - - /* - * Bail if these VMs are already involved in a migration to avoid - * deadlock between two VMs trying to migrate to/from each other. - */ - if (atomic_cmpxchg_acquire(&dst_kvm->migration_in_progress, 0, 1)) - return -EBUSY; - - if (atomic_cmpxchg_acquire(&src_kvm->migration_in_progress, 0, 1)) - goto release_dst; - - r = -EINTR; - if (mutex_lock_killable(&dst_kvm->lock)) - goto release_src; - if (mutex_lock_killable_nested(&src_kvm->lock, SINGLE_DEPTH_NESTING)) - goto unlock_dst; - return 0; - -unlock_dst: - mutex_unlock(&dst_kvm->lock); -release_src: - atomic_set_release(&src_kvm->migration_in_progress, 0); -release_dst: - atomic_set_release(&dst_kvm->migration_in_progress, 0); - return r; -} - -static void sev_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) -{ - mutex_unlock(&dst_kvm->lock); - mutex_unlock(&src_kvm->lock); - atomic_set_release(&dst_kvm->migration_in_progress, 0); - atomic_set_release(&src_kvm->migration_in_progress, 0); -} - /* vCPU mutex subclasses. */ enum sev_migration_role { SEV_MIGRATION_SOURCE = 0, @@ -2057,9 +2016,12 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) return -EBADF;
source_kvm = fd_file(f)->private_data; - ret = sev_lock_two_vms(kvm, source_kvm); + ret = kvm_mark_migration_in_progress(kvm, source_kvm); if (ret) return ret; + ret = kvm_lock_two_vms(kvm, source_kvm); + if (ret) + goto out_mark_migration_done;
if (kvm->arch.vm_type != source_kvm->arch.vm_type || sev_guest(kvm) || !sev_guest(source_kvm)) { @@ -2105,7 +2067,9 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) put_misc_cg(cg_cleanup_sev->misc_cg); cg_cleanup_sev->misc_cg = NULL; out_unlock: - sev_unlock_two_vms(kvm, source_kvm); + kvm_unlock_two_vms(kvm, source_kvm); +out_mark_migration_done: + kvm_mark_migration_done(kvm, source_kvm); return ret; }
@@ -2779,9 +2743,12 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd) return -EBADF;
source_kvm = fd_file(f)->private_data; - ret = sev_lock_two_vms(kvm, source_kvm); + ret = kvm_mark_migration_in_progress(kvm, source_kvm); if (ret) return ret; + ret = kvm_lock_two_vms(kvm, source_kvm); + if (ret) + goto e_mark_migration_done;
/* * Mirrors of mirrors should work, but let's not get silly. Also @@ -2821,9 +2788,10 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd) * KVM contexts as the original, and they may have different * memory-views. */ - e_unlock: - sev_unlock_two_vms(kvm, source_kvm); + kvm_unlock_two_vms(kvm, source_kvm); +e_mark_migration_done: + kvm_mark_migration_done(kvm, source_kvm); return ret; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f6ce044b090a..422c66a033d2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4502,6 +4502,68 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) } EXPORT_SYMBOL_GPL(kvm_get_msr_common);
+int kvm_mark_migration_in_progress(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + int r; + + if (dst_kvm == src_kvm) + return -EINVAL; + + /* + * Bail if these VMs are already involved in a migration to avoid + * deadlock between two VMs trying to migrate to/from each other. + */ + r = -EBUSY; + if (atomic_cmpxchg_acquire(&dst_kvm->migration_in_progress, 0, 1)) + return r; + + if (atomic_cmpxchg_acquire(&src_kvm->migration_in_progress, 0, 1)) + goto release_dst; + + return 0; + +release_dst: + atomic_set_release(&dst_kvm->migration_in_progress, 0); + return r; +} +EXPORT_SYMBOL_GPL(kvm_mark_migration_in_progress); + +void kvm_mark_migration_done(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + atomic_set_release(&dst_kvm->migration_in_progress, 0); + atomic_set_release(&src_kvm->migration_in_progress, 0); +} +EXPORT_SYMBOL_GPL(kvm_mark_migration_done); + +int kvm_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + int r; + + if (dst_kvm == src_kvm) + return -EINVAL; + + r = -EINTR; + if (mutex_lock_killable(&dst_kvm->lock)) + return r; + + if (mutex_lock_killable_nested(&src_kvm->lock, SINGLE_DEPTH_NESTING)) + goto unlock_dst; + + return 0; + +unlock_dst: + mutex_unlock(&dst_kvm->lock); + return r; +} +EXPORT_SYMBOL_GPL(kvm_lock_two_vms); + +void kvm_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + mutex_unlock(&dst_kvm->lock); + mutex_unlock(&src_kvm->lock); +} +EXPORT_SYMBOL_GPL(kvm_unlock_two_vms); + /* * Read or write a bunch of msrs. All parameters are kernel addresses. * diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 88a9475899c8..508f9509546c 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -649,4 +649,10 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
+int kvm_mark_migration_in_progress(struct kvm *dst_kvm, struct kvm *src_kvm); +void kvm_mark_migration_done(struct kvm *dst_kvm, struct kvm *src_kvm); + +int kvm_lock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm); +void kvm_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm); + #endif
From: Ackerley Tng <ackerleytng@google.com>
Move the source fd lookup, vm_type check, VM locking, and marking of
migration as in-progress out of sev_vm_move_enc_context_from() into
common x86 code, and pass the source struct kvm to the
vm_move_enc_context_from hook.

Co-developed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/sev.c          | 29 +++---------------------
 arch/x86/kvm/svm/svm.h          |  2 +-
 arch/x86/kvm/x86.c              | 39 ++++++++++++++++++++++++++++++++-
 4 files changed, 43 insertions(+), 29 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 6c06f3d6e081..179618300270 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1871,7 +1871,7 @@ struct kvm_x86_ops { int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); - int (*vm_move_enc_context_from)(struct kvm *kvm, unsigned int source_fd); + int (*vm_move_enc_context_from)(struct kvm *kvm, struct kvm *source_kvm); void (*guest_memory_reclaimed)(struct kvm *kvm);
int (*get_feature_msr)(u32 msr, u64 *data); diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index b3048ec411e2..689521d9e26f 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2000,34 +2000,15 @@ static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src) return 0; }
-int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) +int sev_vm_move_enc_context_from(struct kvm *kvm, struct kvm *source_kvm) { struct kvm_sev_info *dst_sev = to_kvm_sev_info(kvm); struct kvm_sev_info *src_sev, *cg_cleanup_sev; - CLASS(fd, f)(source_fd); - struct kvm *source_kvm; bool charged = false; int ret;
- if (fd_empty(f)) - return -EBADF; - - if (!file_is_kvm(fd_file(f))) - return -EBADF; - - source_kvm = fd_file(f)->private_data; - ret = kvm_mark_migration_in_progress(kvm, source_kvm); - if (ret) - return ret; - ret = kvm_lock_two_vms(kvm, source_kvm); - if (ret) - goto out_mark_migration_done; - - if (kvm->arch.vm_type != source_kvm->arch.vm_type || - sev_guest(kvm) || !sev_guest(source_kvm)) { - ret = -EINVAL; - goto out_unlock; - } + if (sev_guest(kvm) || !sev_guest(source_kvm)) + return -EINVAL;
src_sev = to_kvm_sev_info(source_kvm);
@@ -2066,10 +2047,6 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) sev_misc_cg_uncharge(cg_cleanup_sev); put_misc_cg(cg_cleanup_sev->misc_cg); cg_cleanup_sev->misc_cg = NULL; -out_unlock: - kvm_unlock_two_vms(kvm, source_kvm); -out_mark_migration_done: - kvm_mark_migration_done(kvm, source_kvm); return ret; }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 35df8be621c5..7bd31c0b135a 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -757,7 +757,7 @@ int sev_mem_enc_register_region(struct kvm *kvm, int sev_mem_enc_unregister_region(struct kvm *kvm, struct kvm_enc_region *range); int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd); -int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd); +int sev_vm_move_enc_context_from(struct kvm *kvm, struct kvm *source_kvm); void sev_guest_memory_reclaimed(struct kvm *kvm); int sev_handle_vmgexit(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 422c66a033d2..637540309456 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6597,6 +6597,43 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event, return 0; }
+static int kvm_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) +{ + int r; + struct kvm *source_kvm; + struct fd f = fdget(source_fd); + struct file *file = fd_file(f); + + r = -EBADF; + if (!file) + return r; + + if (!file_is_kvm(file)) + goto out_fdput; + + r = -EINVAL; + source_kvm = file->private_data; + if (kvm->arch.vm_type != source_kvm->arch.vm_type) + goto out_fdput; + + r = kvm_mark_migration_in_progress(kvm, source_kvm); + if (r) + goto out_fdput; + + r = kvm_lock_two_vms(kvm, source_kvm); + if (r) + goto out_mark_migration_done; + + r = kvm_x86_call(vm_move_enc_context_from)(kvm, source_kvm); + + kvm_unlock_two_vms(kvm, source_kvm); +out_mark_migration_done: + kvm_mark_migration_done(kvm, source_kvm); +out_fdput: + fdput(f); + return r; +} + int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap) { @@ -6738,7 +6775,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, if (!kvm_x86_ops.vm_move_enc_context_from) break;
- r = kvm_x86_call(vm_move_enc_context_from)(kvm, cap->args[0]); + r = kvm_vm_move_enc_context_from(kvm, cap->args[0]); break; case KVM_CAP_EXIT_HYPERCALL: if (cap->args[0] & ~KVM_EXIT_HYPERCALL_VALID_MASK) {
From: Ackerley Tng <ackerleytng@google.com>
SEV-capable VMs may also use the KVM_X86_SW_PROTECTED_VM type, but they will still need architecture-specific handling to move the encryption context. Hence, make moving the encryption context configurable and store that configuration in a flag.
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm/sev.c          | 2 ++
 arch/x86/kvm/x86.c              | 9 ++++++++-
 3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 179618300270..db37ce814611 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1576,6 +1576,7 @@ struct kvm_arch { #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1) struct kvm_mmu_memory_cache split_desc_cache;
+ bool use_vm_enc_ctxt_op; gfn_t gfn_direct_bits;
/* diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 689521d9e26f..95083556d321 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -442,6 +442,8 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp, if (ret) goto e_no_asid;
+ kvm->arch.use_vm_enc_ctxt_op = true; + init_args.probe = false; ret = sev_platform_init(&init_args); if (ret) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 637540309456..3a7e05c47aa8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6624,7 +6624,14 @@ static int kvm_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) if (r) goto out_mark_migration_done;
- r = kvm_x86_call(vm_move_enc_context_from)(kvm, source_kvm); + /* + * Different types of VMs will allow userspace to define if moving + * encryption context should be required. + */ + if (kvm->arch.use_vm_enc_ctxt_op && + kvm_x86_ops.vm_move_enc_context_from) { + r = kvm_x86_call(vm_move_enc_context_from)(kvm, source_kvm); + }
kvm_unlock_two_vms(kvm, source_kvm); out_mark_migration_done:
From: Ackerley Tng <ackerleytng@google.com>
Migration of memory context involves moving lpage_info and mem_attr_array from source to destination VM.
Co-developed-by: Sagi Shahar <sagis@google.com>
Signed-off-by: Sagi Shahar <sagis@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 arch/x86/kvm/x86.c       | 110 +++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h |  17 ++++++
 virt/kvm/guest_memfd.c   |  25 +++++++++
 3 files changed, 152 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 3a7e05c47aa8..887702781465 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4564,6 +4564,33 @@ void kvm_unlock_two_vms(struct kvm *dst_kvm, struct kvm *src_kvm) } EXPORT_SYMBOL_GPL(kvm_unlock_two_vms);
+static int kvm_lock_vm_memslots(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + int r = -EINVAL; + + if (dst_kvm == src_kvm) + return r; + + r = -EINTR; + if (mutex_lock_killable(&dst_kvm->slots_lock)) + return r; + + if (mutex_lock_killable_nested(&src_kvm->slots_lock, SINGLE_DEPTH_NESTING)) + goto unlock_dst; + + return 0; + +unlock_dst: + mutex_unlock(&dst_kvm->slots_lock); + return r; +} + +static void kvm_unlock_vm_memslots(struct kvm *dst_kvm, struct kvm *src_kvm) +{ + mutex_unlock(&src_kvm->slots_lock); + mutex_unlock(&dst_kvm->slots_lock); +} + /* * Read or write a bunch of msrs. All parameters are kernel addresses. * @@ -6597,6 +6624,78 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event, return 0; }
+static bool memslot_configurations_match(struct kvm_memslots *src_slots, + struct kvm_memslots *dst_slots) +{ + struct kvm_memslot_iter src_iter; + struct kvm_memslot_iter dst_iter; + + kvm_for_each_memslot_pair(&src_iter, src_slots, &dst_iter, dst_slots) { + if (src_iter.slot->base_gfn != dst_iter.slot->base_gfn || + src_iter.slot->npages != dst_iter.slot->npages || + src_iter.slot->flags != dst_iter.slot->flags) + return false; + + if (kvm_slot_can_be_private(dst_iter.slot) && + !kvm_gmem_params_match(src_iter.slot, dst_iter.slot)) + return false; + } + + /* There should be no more nodes to iterate if configurations match */ + return !src_iter.node && !dst_iter.node; +} + +static int kvm_move_memory_ctxt_from(struct kvm *dst, struct kvm *src) +{ + struct kvm_memslot_iter src_iter; + struct kvm_memslot_iter dst_iter; + struct kvm_memslots *src_slots, *dst_slots; + int i; + + /* TODO: Do we also need to check consistency for as_id == SMM? */ + src_slots = __kvm_memslots(src, 0); + dst_slots = __kvm_memslots(dst, 0); + + if (!memslot_configurations_match(src_slots, dst_slots)) + return -EINVAL; + + /* + * Transferring lpage_info is an optimization, lpage_info can be rebuilt + * by the destination VM. + */ + kvm_for_each_memslot_pair(&src_iter, src_slots, &dst_iter, dst_slots) { + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) { + unsigned long ugfn = dst_iter.slot->userspace_addr >> PAGE_SHIFT; + int level = i + 1; + + /* + * If the gfn and userspace address are not aligned wrt each + * other, skip migrating lpage_info. + */ + if ((dst_iter.slot->base_gfn ^ ugfn) & + (KVM_PAGES_PER_HPAGE(level) - 1)) + continue; + + kvfree(dst_iter.slot->arch.lpage_info[i - 1]); + dst_iter.slot->arch.lpage_info[i - 1] = + src_iter.slot->arch.lpage_info[i - 1]; + src_iter.slot->arch.lpage_info[i - 1] = NULL; + } + } + +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + /* + * For VMs that don't use private memory, this will just be moving an + * empty xarray pointer. + */ + dst->mem_attr_array.xa_head = src->mem_attr_array.xa_head; + src->mem_attr_array.xa_head = NULL; +#endif + + kvm_vm_dead(src); + return 0; +} + static int kvm_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) { int r; @@ -6624,6 +6723,14 @@ static int kvm_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) if (r) goto out_mark_migration_done;
+ r = kvm_lock_vm_memslots(kvm, source_kvm); + if (r) + goto out_unlock; + + r = kvm_move_memory_ctxt_from(kvm, source_kvm); + if (r) + goto out_unlock_memslots; + /* * Different types of VMs will allow userspace to define if moving * encryption context should be required. @@ -6633,6 +6740,9 @@ static int kvm_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd) r = kvm_x86_call(vm_move_enc_context_from)(kvm, source_kvm); }
+out_unlock_memslots: + kvm_unlock_vm_memslots(kvm, source_kvm); +out_unlock: kvm_unlock_two_vms(kvm, source_kvm); out_mark_migration_done: kvm_mark_migration_done(kvm, source_kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0c1d637a6e7d..99abe9879856 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1197,6 +1197,16 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn); struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu); struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+ +/* Iterate over a pair of memslots in gfn order until one of the trees end */ +#define kvm_for_each_memslot_pair(iter1, slots1, iter2, slots2) \ + for (kvm_memslot_iter_start(iter1, slots1, 0), \ + kvm_memslot_iter_start(iter2, slots2, 0); \ + kvm_memslot_iter_is_valid(iter1, U64_MAX) && \ + kvm_memslot_iter_is_valid(iter2, U64_MAX); \ + kvm_memslot_iter_next(iter1), \ + kvm_memslot_iter_next(iter2)) + /* * KVM_SET_USER_MEMORY_REGION ioctl allows the following operations: * - create a new memory slot @@ -2521,6 +2531,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn) int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, kvm_pfn_t *pfn, struct page **page, int *max_order); +bool kvm_gmem_params_match(struct kvm_memory_slot *slot1, + struct kvm_memory_slot *slot2); #else static inline int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn, @@ -2530,6 +2542,11 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm, KVM_BUG_ON(1, kvm); return -EIO; } +static inline bool kvm_gmem_params_match(struct kvm_memory_slot *slot1, + struct kvm_memory_slot *slot2) +{ + return false; +} #endif /* CONFIG_KVM_PRIVATE_MEM */
#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index d76bd1119198..1a4198c4a4dd 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -778,6 +778,31 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot, } EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
+bool kvm_gmem_params_match(struct kvm_memory_slot *slot1, + struct kvm_memory_slot *slot2) +{ + bool ret; + struct file *file1; + struct file *file2; + + if (slot1->gmem.pgoff != slot2->gmem.pgoff) + return false; + + file1 = kvm_gmem_get_file(slot1); + file2 = kvm_gmem_get_file(slot2); + + ret = (file1 && file2 && + file_inode(file1) == file_inode(file2)); + + if (file1) + fput(file1); + if (file2) + fput(file2); + + return ret; +} +EXPORT_SYMBOL_GPL(kvm_gmem_params_match); + #ifdef CONFIG_KVM_GENERIC_PRIVATE_MEM long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages, kvm_gmem_populate_cb post_populate, void *opaque)
From: Ackerley Tng <ackerleytng@google.com>
These functions will be used in private (guest mem) migration tests.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 .../testing/selftests/kvm/include/kvm_util.h | 13 +++++
 .../selftests/kvm/x86/sev_migrate_tests.c    | 48 +++++++-------
 2 files changed, 30 insertions(+), 31 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 68faa658b69e..80375d6456a5 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -378,6 +378,19 @@ static inline void vm_enable_cap(struct kvm_vm *vm, uint32_t cap, uint64_t arg0) vm_ioctl(vm, KVM_ENABLE_CAP, &enable_cap); }
+static inline int __vm_migrate_from(struct kvm_vm *dst, struct kvm_vm *src) +{ + return __vm_enable_cap(dst, KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM, src->fd); +} + +static inline void vm_migrate_from(struct kvm_vm *dst, struct kvm_vm *src) +{ + int ret; + + ret = __vm_migrate_from(dst, src); + TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d\n", ret, errno); +} + static inline void vm_set_memory_attributes(struct kvm_vm *vm, uint64_t gpa, uint64_t size, uint64_t attributes) { diff --git a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c index 0a6dfba3905b..905cdf9b39b1 100644 --- a/tools/testing/selftests/kvm/x86/sev_migrate_tests.c +++ b/tools/testing/selftests/kvm/x86/sev_migrate_tests.c @@ -56,20 +56,6 @@ static struct kvm_vm *aux_vm_create(bool with_vcpus) return vm; }
-static int __sev_migrate_from(struct kvm_vm *dst, struct kvm_vm *src) -{ - return __vm_enable_cap(dst, KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM, src->fd); -} - - -static void sev_migrate_from(struct kvm_vm *dst, struct kvm_vm *src) -{ - int ret; - - ret = __sev_migrate_from(dst, src); - TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d", ret, errno); -} - static void test_sev_migrate_from(bool es) { struct kvm_vm *src_vm; @@ -81,13 +67,13 @@ static void test_sev_migrate_from(bool es) dst_vms[i] = aux_vm_create(true);
 	/* Initial migration from the src to the first dst. */
-	sev_migrate_from(dst_vms[0], src_vm);
+	vm_migrate_from(dst_vms[0], src_vm);

 	for (i = 1; i < NR_MIGRATE_TEST_VMS; i++)
-		sev_migrate_from(dst_vms[i], dst_vms[i - 1]);
+		vm_migrate_from(dst_vms[i], dst_vms[i - 1]);

 	/* Migrate the guest back to the original VM. */
-	ret = __sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
+	ret = __vm_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
 	TEST_ASSERT(ret == -1 && errno == EIO,
 		    "VM that was migrated from should be dead. ret %d, errno: %d",
 		    ret, errno);
@@ -109,7 +95,7 @@ static void *locking_test_thread(void *arg)

 	for (i = 0; i < NR_LOCK_TESTING_ITERATIONS; ++i) {
 		j = i % NR_LOCK_TESTING_THREADS;
-		__sev_migrate_from(input->vm, input->source_vms[j]);
+		__vm_migrate_from(input->vm, input->source_vms[j]);
 	}

 	return NULL;
@@ -146,7 +132,7 @@ static void test_sev_migrate_parameters(void)

 	vm_no_vcpu = vm_create_barebones();
 	vm_no_sev = aux_vm_create(true);
-	ret = __sev_migrate_from(vm_no_vcpu, vm_no_sev);
+	ret = __vm_migrate_from(vm_no_vcpu, vm_no_sev);
 	TEST_ASSERT(ret == -1 && errno == EINVAL,
 		    "Migrations require SEV enabled. ret %d, errno: %d", ret,
 		    errno);
@@ -160,25 +146,25 @@ static void test_sev_migrate_parameters(void)
 	sev_es_vm_init(sev_es_vm_no_vmsa);
 	__vm_vcpu_add(sev_es_vm_no_vmsa, 1);

-	ret = __sev_migrate_from(sev_vm, sev_es_vm);
+	ret = __vm_migrate_from(sev_vm, sev_es_vm);
 	TEST_ASSERT(
 		ret == -1 && errno == EINVAL,
 		"Should not be able migrate to SEV enabled VM. ret: %d, errno: %d",
 		ret, errno);

-	ret = __sev_migrate_from(sev_es_vm, sev_vm);
+	ret = __vm_migrate_from(sev_es_vm, sev_vm);
 	TEST_ASSERT(
 		ret == -1 && errno == EINVAL,
 		"Should not be able migrate to SEV-ES enabled VM. ret: %d, errno: %d",
 		ret, errno);

-	ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm);
+	ret = __vm_migrate_from(vm_no_vcpu, sev_es_vm);
 	TEST_ASSERT(
 		ret == -1 && errno == EINVAL,
 		"SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d",
 		ret, errno);

-	ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm_no_vmsa);
+	ret = __vm_migrate_from(vm_no_vcpu, sev_es_vm_no_vmsa);
 	TEST_ASSERT(
 		ret == -1 && errno == EINVAL,
 		"SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d",
@@ -331,14 +317,14 @@ static void test_sev_move_copy(void)

 	sev_mirror_create(mirror_vm, sev_vm);

-	sev_migrate_from(dst_mirror_vm, mirror_vm);
-	sev_migrate_from(dst_vm, sev_vm);
+	vm_migrate_from(dst_mirror_vm, mirror_vm);
+	vm_migrate_from(dst_vm, sev_vm);

-	sev_migrate_from(dst2_vm, dst_vm);
-	sev_migrate_from(dst2_mirror_vm, dst_mirror_vm);
+	vm_migrate_from(dst2_vm, dst_vm);
+	vm_migrate_from(dst2_mirror_vm, dst_mirror_vm);

-	sev_migrate_from(dst3_mirror_vm, dst2_mirror_vm);
-	sev_migrate_from(dst3_vm, dst2_vm);
+	vm_migrate_from(dst3_mirror_vm, dst2_mirror_vm);
+	vm_migrate_from(dst3_vm, dst2_vm);

 	kvm_vm_free(dst_vm);
 	kvm_vm_free(sev_vm);
@@ -360,8 +346,8 @@ static void test_sev_move_copy(void)

 	sev_mirror_create(mirror_vm, sev_vm);

-	sev_migrate_from(dst_mirror_vm, mirror_vm);
-	sev_migrate_from(dst_vm, sev_vm);
+	vm_migrate_from(dst_mirror_vm, mirror_vm);
+	vm_migrate_from(dst_vm, sev_vm);

 	kvm_vm_free(mirror_vm);
 	kvm_vm_free(dst_mirror_vm);
From: Ackerley Tng <ackerleytng@google.com>
Test that private memory (backed by guest_mem files) can be migrated to another VM, and demonstrate the migration flow.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ryan Afranji <afranji@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../kvm/x86/private_mem_migrate_tests.c       | 56 ++++++++++---------
 2 files changed, 32 insertions(+), 25 deletions(-)
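The destination VM gets its gmem fd from vm_link_guest_memfd(), added earlier in this series. As a rough sketch of that helper's shape (illustrative only: the actual wrapper and the KVM_LINK_GUEST_MEMFD argument struct are defined in the earlier patches, and the field names below are stand-ins):

	static inline int vm_link_guest_memfd(struct kvm_vm *vm, int src_gmem_fd,
					      uint64_t flags)
	{
		/* Illustrative argument layout; see the uAPI patch for the real one. */
		struct kvm_link_guest_memfd link = {
			.fd = src_gmem_fd,	/* source VM's gmem fd */
			.flags = flags,
		};
		int new_fd = __vm_ioctl(vm, KVM_LINK_GUEST_MEMFD, &link);

		TEST_ASSERT(new_fd >= 0, "Linking guest memfd failed");

		/* New file for this VM, backed by the same gmem inode. */
		return new_fd;
	}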
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..e9d53ea6c6c8 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -85,6 +85,7 @@ TEST_GEN_PROGS_x86 += x86/platform_info_test
 TEST_GEN_PROGS_x86 += x86/pmu_counters_test
 TEST_GEN_PROGS_x86 += x86/pmu_event_filter_test
 TEST_GEN_PROGS_x86 += x86/private_mem_conversions_test
+TEST_GEN_PROGS_x86 += x86/private_mem_migrate_tests
 TEST_GEN_PROGS_x86 += x86/private_mem_kvm_exits_test
 TEST_GEN_PROGS_x86 += x86/set_boot_cpu_id
 TEST_GEN_PROGS_x86 += x86/set_sregs_test
diff --git a/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c b/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c
index 4226de3ebd41..4ad94ea04b66 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_migrate_tests.c
@@ -1,32 +1,32 @@
 // SPDX-License-Identifier: GPL-2.0
-#include "kvm_util_base.h"
+#include "kvm_util.h"
 #include "test_util.h"
 #include "ucall_common.h"
 #include <linux/kvm.h>
 #include <linux/sizes.h>

-#define TRANSFER_PRIVATE_MEM_TEST_SLOT	10
-#define TRANSFER_PRIVATE_MEM_GPA	((uint64_t)(1ull << 32))
-#define TRANSFER_PRIVATE_MEM_GVA	TRANSFER_PRIVATE_MEM_GPA
-#define TRANSFER_PRIVATE_MEM_VALUE	0xdeadbeef
+#define MIGRATE_PRIVATE_MEM_TEST_SLOT	10
+#define MIGRATE_PRIVATE_MEM_GPA		((uint64_t)(1ull << 32))
+#define MIGRATE_PRIVATE_MEM_GVA		MIGRATE_PRIVATE_MEM_GPA
+#define MIGRATE_PRIVATE_MEM_VALUE	0xdeadbeef

-static void transfer_private_mem_guest_code_src(void)
+static void migrate_private_mem_data_guest_code_src(void)
 {
-	uint64_t volatile *const ptr = (uint64_t *)TRANSFER_PRIVATE_MEM_GVA;
+	uint64_t volatile *const ptr = (uint64_t *)MIGRATE_PRIVATE_MEM_GVA;

-	*ptr = TRANSFER_PRIVATE_MEM_VALUE;
+	*ptr = MIGRATE_PRIVATE_MEM_VALUE;

 	GUEST_SYNC1(*ptr);
 }

-static void transfer_private_mem_guest_code_dst(void)
+static void migrate_private_mem_guest_code_dst(void)
 {
-	uint64_t volatile *const ptr = (uint64_t *)TRANSFER_PRIVATE_MEM_GVA;
+	uint64_t volatile *const ptr = (uint64_t *)MIGRATE_PRIVATE_MEM_GVA;

 	GUEST_SYNC1(*ptr);
 }

-static void test_transfer_private_mem(void)
+static void test_migrate_private_mem_data(bool migrate)
 {
 	struct kvm_vm *src_vm, *dst_vm;
 	struct kvm_vcpu *src_vcpu, *dst_vcpu;
@@ -40,40 +40,43 @@ static void test_transfer_private_mem(void)

 	/* Build the source VM, use it to write to private memory */
 	src_vm = __vm_create_shape_with_one_vcpu(
-		shape, &src_vcpu, 0, transfer_private_mem_guest_code_src);
+		shape, &src_vcpu, 0, migrate_private_mem_data_guest_code_src);
 	src_memfd = vm_create_guest_memfd(src_vm, SZ_4K, 0);

-	vm_mem_add(src_vm, DEFAULT_VM_MEM_SRC, TRANSFER_PRIVATE_MEM_GPA,
-		   TRANSFER_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_PRIVATE,
+	vm_mem_add(src_vm, DEFAULT_VM_MEM_SRC, MIGRATE_PRIVATE_MEM_GPA,
+		   MIGRATE_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_GUEST_MEMFD,
 		   src_memfd, 0);

-	virt_map(src_vm, TRANSFER_PRIVATE_MEM_GVA, TRANSFER_PRIVATE_MEM_GPA, 1);
-	vm_set_memory_attributes(src_vm, TRANSFER_PRIVATE_MEM_GPA, SZ_4K,
+	virt_map(src_vm, MIGRATE_PRIVATE_MEM_GVA, MIGRATE_PRIVATE_MEM_GPA, 1);
+	vm_set_memory_attributes(src_vm, MIGRATE_PRIVATE_MEM_GPA, SZ_4K,
 				 KVM_MEMORY_ATTRIBUTE_PRIVATE);

 	vcpu_run(src_vcpu);
 	TEST_ASSERT_KVM_EXIT_REASON(src_vcpu, KVM_EXIT_IO);
 	get_ucall(src_vcpu, &uc);
-	TEST_ASSERT(uc.args[0] == TRANSFER_PRIVATE_MEM_VALUE,
+	TEST_ASSERT(uc.args[0] == MIGRATE_PRIVATE_MEM_VALUE,
 		    "Source VM should be able to write to private memory");

 	/* Build the destination VM with linked fd */
 	dst_vm = __vm_create_shape_with_one_vcpu(
-		shape, &dst_vcpu, 0, transfer_private_mem_guest_code_dst);
+		shape, &dst_vcpu, 0, migrate_private_mem_guest_code_dst);
 	dst_memfd = vm_link_guest_memfd(dst_vm, src_memfd, 0);

-	vm_mem_add(dst_vm, DEFAULT_VM_MEM_SRC, TRANSFER_PRIVATE_MEM_GPA,
-		   TRANSFER_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_PRIVATE,
+	vm_mem_add(dst_vm, DEFAULT_VM_MEM_SRC, MIGRATE_PRIVATE_MEM_GPA,
+		   MIGRATE_PRIVATE_MEM_TEST_SLOT, 1, KVM_MEM_GUEST_MEMFD,
 		   dst_memfd, 0);

-	virt_map(dst_vm, TRANSFER_PRIVATE_MEM_GVA, TRANSFER_PRIVATE_MEM_GPA, 1);
-	vm_set_memory_attributes(dst_vm, TRANSFER_PRIVATE_MEM_GPA, SZ_4K,
-				 KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	virt_map(dst_vm, MIGRATE_PRIVATE_MEM_GVA, MIGRATE_PRIVATE_MEM_GPA, 1);
+	if (migrate)
+		vm_migrate_from(dst_vm, src_vm);
+	else
+		vm_set_memory_attributes(dst_vm, MIGRATE_PRIVATE_MEM_GPA, SZ_4K,
+					 KVM_MEMORY_ATTRIBUTE_PRIVATE);

 	vcpu_run(dst_vcpu);
 	TEST_ASSERT_KVM_EXIT_REASON(dst_vcpu, KVM_EXIT_IO);
 	get_ucall(dst_vcpu, &uc);
-	TEST_ASSERT(uc.args[0] == TRANSFER_PRIVATE_MEM_VALUE,
+	TEST_ASSERT(uc.args[0] == MIGRATE_PRIVATE_MEM_VALUE,
 		    "Destination VM should be able to read value transferred");
 }

@@ -81,7 +84,10 @@ int main(int argc, char *argv[])
 {
 	TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));

-	test_transfer_private_mem();
+	test_migrate_private_mem_data(false);
+
+	if (kvm_check_cap(KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM))
+		test_migrate_private_mem_data(true);

 	return 0;
 }
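With the Makefile.kvm hunk above, the new test builds along with the rest of the KVM selftests, e.g.:

	$ make -C tools/testing/selftests/kvm
	$ ./tools/testing/selftests/kvm/x86/private_mem_migrate_tests

The first pass exercises plain gmem linking; the second pass, which also moves memory context, only runs when KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM is available.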