User space can use the MEM_OP ioctl to make storage key checked reads
and writes to the guest. However, it has no way of performing atomic,
key checked accesses to the guest.
Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
operation. For now, support this operation for absolute accesses only.
This operation can be used, for example, to set the device-state-change
indicator and the adapter-local-summary indicator atomically.
The series also contains some fixes/changes for the memop selftest that
are independent of the cmpxchg changes.
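
The calling convention of the new op, where user space infers success from
whether the old value was left untouched, can be illustrated without KVM.
The following is a minimal sketch, not the kernel implementation: it assumes
a plain C11 atomic as a stand-in for guest memory, ignores storage keys
entirely, and all sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the cmpxchg op: "target" plays the role of the
 * guest location, "old" the buffer behind old_addr. On a mismatch, *old is
 * overwritten with the value found at the target; on a match it is left
 * untouched and the target is set to new_val.
 */
void sim_cmpxchg(_Atomic uint64_t *target, uint64_t *old, uint64_t new_val)
{
	atomic_compare_exchange_strong(target, old, new_val);
}

/* Caller side: the exchange took place iff the old value did not change. */
bool sim_cmpxchg_succeeded(_Atomic uint64_t *target, uint64_t expected,
			   uint64_t new_val, uint64_t *observed)
{
	uint64_t old = expected;

	sim_cmpxchg(target, &old, new_val);
	*observed = old;
	return old == expected;
}

/* Retry loop that atomically ORs an indicator bit into the target word. */
uint64_t sim_set_indicator(_Atomic uint64_t *target, uint64_t bit)
{
	uint64_t old = atomic_load(target);
	uint64_t observed;

	assert(bit != 0);
	while (!sim_cmpxchg_succeeded(target, old, old | bit, &observed))
		old = observed;	/* retry with the value that was actually there */
	return old | bit;
}
```

A caller keeps a copy of the expected value, issues the exchange, and
compares afterwards. This is the same pattern the selftest in this series
uses with memcmp() on its saved old_value.
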
v5 -> v6
* move memop selftest fixes/refactoring to front of series so they can
be picked independently from the rest
* use op instead of flag to indicate cmpxchg
* no longer indicate success of cmpxchg to user space, which can infer
it by observing a change in the old value instead
* refactor functions implementing the ioctl
* adjust documentation (drop R-b)
* adjust selftest
* rebase
v4 -> v5
* refuse cmpxchg if not write (thanks Thomas)
* minor doc changes (thanks Claudio)
* picked up R-b's (thanks Thomas & Claudio)
* memop selftest fixes
* rebased
v3 -> v4
* no functional change intended
* rework documentation a bit
* name extension cap cmpxchg bit
* picked up R-b (thanks Thomas)
* various changes (rename variable, comments, ...) see range-diff below
v2 -> v3
* rebase onto the wip/cmpxchg_user_key branch in the s390 kernel repo
* use __uint128_t instead of unsigned __int128
* put moving of testlist into main into separate patch
* pick up R-b's (thanks Nico)
v1 -> v2
* get rid of xrk instruction for cmpxchg byte and short implementation
* pass old parameter via pointer instead of in mem_op struct
* indicate failure of cmpxchg due to wrong old value by special return
code
* picked up R-b's (thanks Thomas)
Janis Schoetterl-Glausch (14):
KVM: s390: selftest: memop: Pass mop_desc via pointer
KVM: s390: selftest: memop: Replace macros by functions
KVM: s390: selftest: memop: Move testlist into main
KVM: s390: selftest: memop: Add bad address test
KVM: s390: selftest: memop: Fix typo
KVM: s390: selftest: memop: Fix wrong address being used in test
KVM: s390: selftest: memop: Fix integer literal
KVM: s390: Move common code of mem_op functions into functions
KVM: s390: Dispatch to implementing function at top level of vm mem_op
KVM: s390: Refactor absolute vm mem_op function
KVM: s390: Refactor absolute vcpu mem_op function
KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
KVM: s390: selftest: memop: Add cmpxchg tests
Documentation/virt/kvm/api.rst | 29 +-
include/uapi/linux/kvm.h | 8 +
arch/s390/kvm/gaccess.h | 3 +
arch/s390/kvm/gaccess.c | 103 ++++
arch/s390/kvm/kvm-s390.c | 249 ++++----
tools/testing/selftests/kvm/s390x/memop.c | 675 +++++++++++++++++-----
6 files changed, 819 insertions(+), 248 deletions(-)
Range-diff against v5:
3: 94c1165ae24a = 1: 512e1a3e0ae5 KVM: s390: selftest: memop: Pass mop_desc via pointer
4: 027c87eee0ac = 2: 47328ea64f80 KVM: s390: selftest: memop: Replace macros by functions
5: 16ac410ecc0f = 3: 224fe37eeec7 KVM: s390: selftest: memop: Move testlist into main
7: 2d6776733e64 = 4: f622d3413cf0 KVM: s390: selftest: memop: Add bad address test
8: 8c49eafd2881 = 5: 431f191a8a57 KVM: s390: selftest: memop: Fix typo
9: 0af907110b34 = 6: 3122187435fb KVM: s390: selftest: memop: Fix wrong address being used in test
10: 886c80b2bdce = 7: 401f51f3ef55 KVM: s390: selftest: memop: Fix integer literal
-: ------------ > 8: df09794e0794 KVM: s390: Move common code of mem_op functions into functions
-: ------------ > 9: 5cbae63357ed KVM: s390: Dispatch to implementing function at top level of vm mem_op
-: ------------ > 10: 76ba77b63a26 KVM: s390: Refactor absolute vm mem_op function
-: ------------ > 11: c848e772e22a KVM: s390: Refactor absolute vcpu mem_op function
1: 6adc166ee141 ! 12: 6ccb200ad85c KVM: s390: Extend MEM_OP ioctl by storage key checked cmpxchg
@@ Commit message
and writes to the guest, however, it has no way of performing atomic,
key checked, accesses to the guest.
Extend the MEM_OP ioctl in order to allow for this, by adding a cmpxchg
- mode. For now, support this mode for absolute accesses only.
+ op. For now, support this op for absolute accesses only.
- This mode can be use, for example, to set the device-state-change
+ This op can be use, for example, to set the device-state-change
indicator and the adapter-local-summary indicator atomically.
Signed-off-by: Janis Schoetterl-Glausch <scgl(a)linux.ibm.com>
@@ include/uapi/linux/kvm.h: struct kvm_s390_mem_op {
__u8 ar; /* the access register number */
__u8 key; /* access key, ignored if flag unset */
+ __u8 pad1[6]; /* ignored */
-+ __u64 old_addr; /* ignored if flag unset */
++ __u64 old_addr; /* ignored if cmpxchg flag unset */
};
__u32 sida_offset; /* offset into the sida */
__u8 reserved[32]; /* ignored */
@@ include/uapi/linux/kvm.h: struct kvm_s390_mem_op {
+ #define KVM_S390_MEMOP_SIDA_WRITE 3
+ #define KVM_S390_MEMOP_ABSOLUTE_READ 4
+ #define KVM_S390_MEMOP_ABSOLUTE_WRITE 5
++#define KVM_S390_MEMOP_ABSOLUTE_CMPXCHG 6
++
+ /* flags for kvm_s390_mem_op->flags */
#define KVM_S390_MEMOP_F_CHECK_ONLY (1ULL << 0)
#define KVM_S390_MEMOP_F_INJECT_EXCEPTION (1ULL << 1)
#define KVM_S390_MEMOP_F_SKEY_PROTECTION (1ULL << 2)
-+#define KVM_S390_MEMOP_F_CMPXCHG (1ULL << 3)
-+/* flags specifying extension support */
-+#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG 0x2
-+/* Non program exception return codes (pgm codes are 16 bit) */
-+#define KVM_S390_MEMOP_R_NO_XCHG (1 << 16)
++/* flags specifying extension support via KVM_CAP_S390_MEM_OP_EXTENSION */
++#define KVM_S390_MEMOP_EXTENSION_CAP_BASE (1 << 0)
++#define KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG (1 << 1)
++
/* for KVM_INTERRUPT */
struct kvm_interrupt {
+ /* in */
## arch/s390/kvm/gaccess.h ##
@@ arch/s390/kvm/gaccess.h: int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
void *data, unsigned long len, enum gacc_mode mode);
-+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
-+ __uint128_t *old, __uint128_t new, u8 access_key);
++int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len, __uint128_t *old,
++ __uint128_t new, u8 access_key, bool *success);
+
/**
* write_guest_with_key - copy data from kernel space to guest space
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ * @gpa: Absolute guest address of the location to be changed.
+ * @len: Operand length of the cmpxchg, required: 1 <= len <= 16. Providing a
+ * non power of two will result in failure.
-+ * @old_addr: Pointer to old value. If the location at @gpa contains this value, the
-+ * exchange will succeed. After calling cmpxchg_guest_abs_with_key() *@old
-+ * contains the value at @gpa before the attempt to exchange the value.
++ * @old_addr: Pointer to old value. If the location at @gpa contains this value,
++ * the exchange will succeed. After calling cmpxchg_guest_abs_with_key()
++ * *@old_addr contains the value at @gpa before the attempt to
++ * exchange the value.
+ * @new: The value to place at @gpa.
+ * @access_key: The access key to use for the guest access.
++ * @success: output value indicating if an exchange occurred.
+ *
+ * Atomically exchange the value at @gpa by @new, if it contains *@old.
+ * Honors storage keys.
+ *
+ * Return: * 0: successful exchange
-+ * * 1: exchange unsuccessful
+ * * a program interruption code indicating the reason cmpxchg could
+ * not be attempted
+ * * -EINVAL: address misaligned or len not power of two
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ */
+int cmpxchg_guest_abs_with_key(struct kvm *kvm, gpa_t gpa, int len,
+ __uint128_t *old_addr, __uint128_t new,
-+ u8 access_key)
++ u8 access_key, bool *success)
+{
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ u8 old;
+
+ ret = cmpxchg_user_key((u8 *)hva, &old, *old_addr, new, access_key);
-+ ret = ret < 0 ? ret : old != *old_addr;
++ *success = !ret && old == *old_addr;
+ *old_addr = old;
+ break;
+ }
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ u16 old;
+
+ ret = cmpxchg_user_key((u16 *)hva, &old, *old_addr, new, access_key);
-+ ret = ret < 0 ? ret : old != *old_addr;
++ *success = !ret && old == *old_addr;
+ *old_addr = old;
+ break;
+ }
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ u32 old;
+
+ ret = cmpxchg_user_key((u32 *)hva, &old, *old_addr, new, access_key);
-+ ret = ret < 0 ? ret : old != *old_addr;
++ *success = !ret && old == *old_addr;
+ *old_addr = old;
+ break;
+ }
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ u64 old;
+
+ ret = cmpxchg_user_key((u64 *)hva, &old, *old_addr, new, access_key);
-+ ret = ret < 0 ? ret : old != *old_addr;
++ *success = !ret && old == *old_addr;
+ *old_addr = old;
+ break;
+ }
@@ arch/s390/kvm/gaccess.c: int access_guest_real(struct kvm_vcpu *vcpu, unsigned l
+ __uint128_t old;
+
+ ret = cmpxchg_user_key((__uint128_t *)hva, &old, *old_addr, new, access_key);
-+ ret = ret < 0 ? ret : old != *old_addr;
++ *success = !ret && old == *old_addr;
+ *old_addr = old;
+ break;
+ }
@@ arch/s390/kvm/kvm-s390.c: int kvm_vm_ioctl_check_extension(struct kvm *kvm, long
+ case KVM_CAP_S390_MEM_OP_EXTENSION:
+ /*
+ * Flag bits indicating which extensions are supported.
-+ * The first extension doesn't use a flag, but pretend it does,
-+ * this way that can be changed in the future.
++ * If r > 0, the base extension must also be supported/indicated,
++ * in order to maintain backwards compatibility.
+ */
-+ r = KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG | 1;
++ r = KVM_S390_MEMOP_EXTENSION_CAP_BASE |
++ KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG;
+ break;
case KVM_CAP_NR_VCPUS:
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
-@@ arch/s390/kvm/kvm-s390.c: static bool access_key_invalid(u8 access_key)
- static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
- {
- void __user *uaddr = (void __user *)mop->buf;
+@@ arch/s390/kvm/kvm-s390.c: static int kvm_s390_vm_mem_op_abs(struct kvm *kvm, struct kvm_s390_mem_op *mop)
+ return r;
+ }
+
++static int kvm_s390_vm_mem_op_cmpxchg(struct kvm *kvm, struct kvm_s390_mem_op *mop)
++{
++ void __user *uaddr = (void __user *)mop->buf;
+ void __user *old_addr = (void __user *)mop->old_addr;
+ union {
+ __uint128_t quad;
+ char raw[sizeof(__uint128_t)];
+ } old = { .quad = 0}, new = { .quad = 0 };
+ unsigned int off_in_quad = sizeof(new) - mop->size;
- u64 supported_flags;
- void *tmpbuf = NULL;
- int r, srcu_idx;
-
- supported_flags = KVM_S390_MEMOP_F_SKEY_PROTECTION
-- | KVM_S390_MEMOP_F_CHECK_ONLY;
-+ | KVM_S390_MEMOP_F_CHECK_ONLY
-+ | KVM_S390_MEMOP_F_CMPXCHG;
- if (mop->flags & ~supported_flags || !mop->size)
- return -EINVAL;
- if (mop->size > MEM_OP_MAX_SIZE)
-@@ arch/s390/kvm/kvm-s390.c: static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
- } else {
- mop->key = 0;
- }
-+ if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
-+ /*
-+ * This validates off_in_quad. Checking that size is a power
-+ * of two is not necessary, as cmpxchg_guest_abs_with_key
-+ * takes care of that
-+ */
-+ if (mop->size > sizeof(new))
-+ return -EINVAL;
-+ if (mop->op != KVM_S390_MEMOP_ABSOLUTE_WRITE)
-+ return -EINVAL;
-+ if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
-+ return -EFAULT;
-+ if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
-+ return -EFAULT;
++ int r, srcu_idx;
++ bool success;
++
++ r = mem_op_validate_common(mop, KVM_S390_MEMOP_F_SKEY_PROTECTION);
++ if (r)
++ return r;
++ /*
++ * This validates off_in_quad. Checking that size is a power
++ * of two is not necessary, as cmpxchg_guest_abs_with_key
++ * takes care of that
++ */
++ if (mop->size > sizeof(new))
++ return -EINVAL;
++ if (copy_from_user(&new.raw[off_in_quad], uaddr, mop->size))
++ return -EFAULT;
++ if (copy_from_user(&old.raw[off_in_quad], old_addr, mop->size))
++ return -EFAULT;
++
++ srcu_idx = srcu_read_lock(&kvm->srcu);
++
++ if (kvm_is_error_gpa(kvm, mop->gaddr)) {
++ r = PGM_ADDRESSING;
++ goto out_unlock;
+ }
- if (!(mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY)) {
- tmpbuf = vmalloc(mop->size);
- if (!tmpbuf)
++
++ r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size, &old.quad,
++ new.quad, mop->key, &success);
++ if (!success && copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
++ r = -EFAULT;
++
++out_unlock:
++ srcu_read_unlock(&kvm->srcu, srcu_idx);
++ return r;
++}
++
+ static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
+ {
+ /*
@@ arch/s390/kvm/kvm-s390.c: static int kvm_s390_vm_mem_op(struct kvm *kvm, struct kvm_s390_mem_op *mop)
- case KVM_S390_MEMOP_ABSOLUTE_WRITE: {
- if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
- r = check_gpa_range(kvm, mop->gaddr, mop->size, GACC_STORE, mop->key);
-+ } else if (mop->flags & KVM_S390_MEMOP_F_CMPXCHG) {
-+ r = cmpxchg_guest_abs_with_key(kvm, mop->gaddr, mop->size,
-+ &old.quad, new.quad, mop->key);
-+ if (r == 1) {
-+ r = KVM_S390_MEMOP_R_NO_XCHG;
-+ if (copy_to_user(old_addr, &old.raw[off_in_quad], mop->size))
-+ r = -EFAULT;
-+ }
- } else {
- if (copy_from_user(tmpbuf, uaddr, mop->size)) {
- r = -EFAULT;
+ case KVM_S390_MEMOP_ABSOLUTE_READ:
+ case KVM_S390_MEMOP_ABSOLUTE_WRITE:
+ return kvm_s390_vm_mem_op_abs(kvm, mop);
++ case KVM_S390_MEMOP_ABSOLUTE_CMPXCHG:
++ return kvm_s390_vm_mem_op_cmpxchg(kvm, mop);
+ default:
+ return -EINVAL;
+ }
2: fce9a063ab70 ! 13: 4d983d179903 Documentation: KVM: s390: Describe KVM_S390_MEMOP_F_CMPXCHG
@@ Commit message
checked) cmpxchg operations on guest memory.
Signed-off-by: Janis Schoetterl-Glausch <scgl(a)linux.ibm.com>
- Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
## Documentation/virt/kvm/api.rst ##
@@ Documentation/virt/kvm/api.rst: The fields in each entry are defined as follows:
@@ Documentation/virt/kvm/api.rst: Parameters are specified via the following struc
};
__u32 sida_offset; /* offset into the sida */
__u8 reserved[32]; /* ignored */
-@@ Documentation/virt/kvm/api.rst: Absolute accesses are permitted for non-protected guests only.
- Supported flags:
+@@ Documentation/virt/kvm/api.rst: Possible operations are:
+ * ``KVM_S390_MEMOP_ABSOLUTE_WRITE``
+ * ``KVM_S390_MEMOP_SIDA_READ``
+ * ``KVM_S390_MEMOP_SIDA_WRITE``
++ * ``KVM_S390_MEMOP_ABSOLUTE_CMPXCHG``
+
+ Logical read/write:
+ ^^^^^^^^^^^^^^^^^^^
+@@ Documentation/virt/kvm/api.rst: the checks required for storage key protection as one operation (as opposed to
+ user space getting the storage keys, performing the checks, and accessing
+ memory thereafter, which could lead to a delay between check and access).
+ Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION
+-is > 0.
++has the KVM_S390_MEMOP_EXTENSION_CAP_BASE bit set.
+ Currently absolute accesses are not permitted for VCPU ioctls.
+ Absolute accesses are permitted for non-protected guests only.
+
+@@ Documentation/virt/kvm/api.rst: Supported flags:
* ``KVM_S390_MEMOP_F_CHECK_ONLY``
* ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
-+ * ``KVM_S390_MEMOP_F_CMPXCHG``
-+
+
+-The semantics of the flags are as for logical accesses.
+The semantics of the flags common with logical accesses are as for logical
+accesses.
+
-+For write accesses, the KVM_S390_MEMOP_F_CMPXCHG flag is supported if
-+KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set.
-+In this case, instead of doing an unconditional write, the access occurs
-+only if the target location contains the value pointed to by "old_addr".
++Absolute cmpxchg:
++^^^^^^^^^^^^^^^^^
++
++Perform cmpxchg on absolute guest memory. Intended for use with the
++KVM_S390_MEMOP_F_SKEY_PROTECTION flag.
++Instead of doing an unconditional write, the access occurs only if the target
++location contains the value pointed to by "old_addr".
+This is performed as an atomic cmpxchg with the length specified by the "size"
+parameter. "size" must be a power of two up to and including 16.
+If the exchange did not take place because the target value doesn't match the
-+old value, KVM_S390_MEMOP_R_NO_XCHG is returned.
-+In this case the value "old_addr" points to is replaced by the target value.
-
--The semantics of the flags are as for logical accesses.
++old value, the value "old_addr" points to is replaced by the target value.
++User space can tell if an exchange took place by checking if this replacement
++occurred. The cmpxchg op is permitted for the VM ioctl if
++KVM_CAP_S390_MEM_OP_EXTENSION has flag KVM_S390_MEMOP_EXTENSION_CAP_CMPXCHG set.
++
++Supported flags:
++ * ``KVM_S390_MEMOP_F_SKEY_PROTECTION``
SIDA read/write:
^^^^^^^^^^^^^^^^
6: 214281b6eb96 ! 14: 5250be3dd58b KVM: s390: selftest: memop: Add cmpxchg tests
@@ tools/testing/selftests/kvm/s390x/memop.c
#include <linux/bits.h>
+@@ tools/testing/selftests/kvm/s390x/memop.c: enum mop_target {
+ enum mop_access_mode {
+ READ,
+ WRITE,
++ CMPXCHG,
+ };
+
+ struct mop_desc {
@@ tools/testing/selftests/kvm/s390x/memop.c: struct mop_desc {
enum mop_access_mode mode;
void *buf;
uint32_t sida_offset;
+ void *old;
++ uint8_t old_value[16];
+ bool *cmpxchg_success;
uint8_t ar;
uint8_t key;
};
+
+ const uint8_t NO_KEY = 0xff;
+
+-static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc)
++static struct kvm_s390_mem_op ksmo_from_desc(struct mop_desc *desc)
+ {
+ struct kvm_s390_mem_op ksmo = {
+ .gaddr = (uintptr_t)desc->gaddr,
@@ tools/testing/selftests/kvm/s390x/memop.c: static struct kvm_s390_mem_op ksmo_from_desc(const struct mop_desc *desc)
- ksmo.flags |= KVM_S390_MEMOP_F_SKEY_PROTECTION;
- ksmo.key = desc->key;
- }
-+ if (desc->old) {
-+ ksmo.flags |= KVM_S390_MEMOP_F_CMPXCHG;
-+ ksmo.old_addr = (uint64_t)desc->old;
-+ }
- if (desc->_ar)
- ksmo.ar = desc->ar;
- else
+ ksmo.op = KVM_S390_MEMOP_ABSOLUTE_READ;
+ if (desc->mode == WRITE)
+ ksmo.op = KVM_S390_MEMOP_ABSOLUTE_WRITE;
++ if (desc->mode == CMPXCHG) {
++ ksmo.op = KVM_S390_MEMOP_ABSOLUTE_CMPXCHG;
++ ksmo.old_addr = (uint64_t)desc->old;
++ memcpy(desc->old_value, desc->old, desc->size);
++ }
+ break;
+ case INVALID:
+ ksmo.op = -1;
@@ tools/testing/selftests/kvm/s390x/memop.c: static void print_memop(struct kvm_vcpu *vcpu, const struct kvm_s390_mem_op *ksm
+ case KVM_S390_MEMOP_ABSOLUTE_WRITE:
printf("ABSOLUTE, WRITE, ");
break;
++ case KVM_S390_MEMOP_ABSOLUTE_CMPXCHG:
++ printf("ABSOLUTE, CMPXCHG, ");
++ break;
}
- printf("gaddr=%llu, size=%u, buf=%llu, ar=%u, key=%u",
- ksmo->gaddr, ksmo->size, ksmo->buf, ksmo->ar, ksmo->key);
@@ tools/testing/selftests/kvm/s390x/memop.c: static void print_memop(struct kvm_vc
if (ksmo->flags & KVM_S390_MEMOP_F_CHECK_ONLY)
printf(", CHECK_ONLY");
if (ksmo->flags & KVM_S390_MEMOP_F_INJECT_EXCEPTION)
- printf(", INJECT_EXCEPTION");
- if (ksmo->flags & KVM_S390_MEMOP_F_SKEY_PROTECTION)
- printf(", SKEY_PROTECTION");
-+ if (ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG)
-+ printf(", CMPXCHG");
+@@ tools/testing/selftests/kvm/s390x/memop.c: static void print_memop(struct kvm_vcpu *vcpu, const struct kvm_s390_mem_op *ksm
puts(")");
}
@@ tools/testing/selftests/kvm/s390x/memop.c: static void print_memop(struct kvm_vc
+ int r;
+
+ r = err_memop_ioctl(info, ksmo, desc);
-+ if (ksmo->flags & KVM_S390_MEMOP_F_CMPXCHG) {
-+ if (desc->cmpxchg_success)
-+ *desc->cmpxchg_success = !r;
-+ if (r == KVM_S390_MEMOP_R_NO_XCHG)
-+ r = 0;
++ if (ksmo->op == KVM_S390_MEMOP_ABSOLUTE_CMPXCHG) {
++ if (desc->cmpxchg_success) {
++ int diff = memcmp(desc->old_value, desc->old, desc->size);
++ *desc->cmpxchg_success = !diff;
++ }
+ }
+ TEST_ASSERT(!r, __KVM_IOCTL_ERROR("KVM_S390_MEM_OP", r));
@@ tools/testing/selftests/kvm/s390x/memop.c: static void default_read(struct test_
+ default_write_read(test->vcpu, test->vcpu, LOGICAL, 16, NO_KEY);
+
+ memcpy(&old, mem1, 16);
-+ CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset,
-+ size, GADDR_V(mem1 + offset),
-+ CMPXCHG_OLD(old + offset),
-+ CMPXCHG_SUCCESS(&succ), KEY(key));
++ MOP(test->vm, ABSOLUTE, CMPXCHG, new + offset,
++ size, GADDR_V(mem1 + offset),
++ CMPXCHG_OLD(old + offset),
++ CMPXCHG_SUCCESS(&succ), KEY(key));
+ HOST_SYNC(test->vcpu, STAGE_COPIED);
+ MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2));
+ TEST_ASSERT(succ, "exchange of values should succeed");
@@ tools/testing/selftests/kvm/s390x/memop.c: static void default_read(struct test_
+ memcpy(&old, mem1, 16);
+ new[offset]++;
+ old[offset]++;
-+ CHECK_N_DO(MOP, test->vm, ABSOLUTE, WRITE, new + offset,
-+ size, GADDR_V(mem1 + offset),
-+ CMPXCHG_OLD(old + offset),
-+ CMPXCHG_SUCCESS(&succ), KEY(key));
++ MOP(test->vm, ABSOLUTE, CMPXCHG, new + offset,
++ size, GADDR_V(mem1 + offset),
++ CMPXCHG_OLD(old + offset),
++ CMPXCHG_SUCCESS(&succ), KEY(key));
+ HOST_SYNC(test->vcpu, STAGE_COPIED);
+ MOP(test->vm, ABSOLUTE, READ, mem2, 16, GADDR_V(mem2));
+ TEST_ASSERT(!succ, "exchange of values should not succeed");
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ do {
+ old = 0;
+ new = 1;
-+ MOP(t.vm, ABSOLUTE, WRITE, &new,
++ MOP(t.vm, ABSOLUTE, CMPXCHG, &new,
+ sizeof(new), GADDR_V(mem1),
+ CMPXCHG_OLD(&old),
+ CMPXCHG_SUCCESS(&success), KEY(1));
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_copy_key(void)
+ choose_block(false, i + j, &size, &offset);
+ do {
+ new = permutate_bits(false, i + j, size, old);
-+ MOP(t.vm, ABSOLUTE, WRITE, quad_to_char(&new, size),
++ MOP(t.vm, ABSOLUTE, CMPXCHG, quad_to_char(&new, size),
+ size, GADDR_V(mem2 + offset),
+ CMPXCHG_OLD(quad_to_char(&old, size)),
+ CMPXCHG_SUCCESS(&success), KEY(1));
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors_key(void)
+ for (i = 1; i <= 16; i *= 2) {
+ __uint128_t old = 0;
+
-+ CHECK_N_DO(ERR_PROT_MOP, t.vm, ABSOLUTE, WRITE, mem2, i, GADDR_V(mem2),
-+ CMPXCHG_OLD(&old), KEY(2));
++ ERR_PROT_MOP(t.vm, ABSOLUTE, CMPXCHG, mem2, i, GADDR_V(mem2),
++ CMPXCHG_OLD(&old), KEY(2));
+ }
+
+ kvm_vm_free(t.kvm_vm);
@@ tools/testing/selftests/kvm/s390x/memop.c: static void test_errors(void)
+ power *= 2;
+ continue;
+ }
-+ rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1),
++ rv = ERR_MOP(t.vm, ABSOLUTE, CMPXCHG, mem1, i, GADDR_V(mem1),
+ CMPXCHG_OLD(&old));
+ TEST_ASSERT(rv == -1 && errno == EINVAL,
+ "ioctl allows bad size for cmpxchg");
+ }
+ for (i = 1; i <= 16; i *= 2) {
-+ rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR((void *)~0xfffUL),
++ rv = ERR_MOP(t.vm, ABSOLUTE, CMPXCHG, mem1, i, GADDR((void *)~0xfffUL),
+ CMPXCHG_OLD(&old));
+ TEST_ASSERT(rv > 0, "ioctl allows bad guest address for cmpxchg");
-+ rv = ERR_MOP(t.vm, ABSOLUTE, READ, mem1, i, GADDR_V(mem1),
-+ CMPXCHG_OLD(&old));
-+ TEST_ASSERT(rv == -1 && errno == EINVAL,
-+ "ioctl allows read cmpxchg call");
+ }
+ for (i = 2; i <= 16; i *= 2) {
-+ rv = ERR_MOP(t.vm, ABSOLUTE, WRITE, mem1, i, GADDR_V(mem1 + 1),
++ rv = ERR_MOP(t.vm, ABSOLUTE, CMPXCHG, mem1, i, GADDR_V(mem1 + 1),
+ CMPXCHG_OLD(&old));
+ TEST_ASSERT(rv == -1 && errno == EINVAL,
+ "ioctl allows bad alignment for cmpxchg");
--
2.34.1
Hi Linus,
Please pull the following KUnit fixes update for Linux 6.2-rc7.
This KUnit fixes update for Linux 6.2-rc7 consists of 3 fixes: one for a
bug that causes a kernel crash, one for a link error during build, and
one for the kunit_test_init_section_suites() extra indirection issue.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 88603b6dc419445847923fcb7fe5080067a30f98:
Linux 6.2-rc2 (2023-01-01 13:53:16 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-fixes-6.2-rc7
for you to fetch changes up to 254c71374a70051a043676b67ba4f7ad392b5fe6:
kunit: fix kunit_test_init_section_suites(...) (2023-01-31 09:10:38 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-fixes-6.2-rc7
This KUnit fixes update for Linux 6.2-rc7 consists of 3 fixes to bugs
that cause kernel crash, link error during build, and a third to fix
kunit_test_init_section_suites() extra indirection issue.
----------------------------------------------------------------
Arnd Bergmann (1):
kunit: Export kunit_running()
Brendan Higgins (1):
kunit: fix kunit_test_init_section_suites(...)
Rae Moar (1):
kunit: fix bug in KUNIT_EXPECT_MEMEQ
include/kunit/test.h | 6 +++---
lib/kunit/assert.c | 40 +++++++++++++++++++++++++---------------
lib/kunit/test.c | 1 +
3 files changed, 29 insertions(+), 18 deletions(-)
----------------------------------------------------------------
There are two spelling mistakes in the test messages. Fix them.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/prctl/disable-tsc-ctxt-sw-stress-test.c | 2 +-
tools/testing/selftests/prctl/disable-tsc-on-off-stress-test.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/prctl/disable-tsc-ctxt-sw-stress-test.c b/tools/testing/selftests/prctl/disable-tsc-ctxt-sw-stress-test.c
index 62a93cc61b7c..6d1a5ee8eb28 100644
--- a/tools/testing/selftests/prctl/disable-tsc-ctxt-sw-stress-test.c
+++ b/tools/testing/selftests/prctl/disable-tsc-ctxt-sw-stress-test.c
@@ -79,7 +79,7 @@ int main(void)
{
int n_tasks = 100, i;
- fprintf(stderr, "[No further output means we're allright]\n");
+ fprintf(stderr, "[No further output means we're all right]\n");
for (i=0; i<n_tasks; i++)
if (fork() == 0)
diff --git a/tools/testing/selftests/prctl/disable-tsc-on-off-stress-test.c b/tools/testing/selftests/prctl/disable-tsc-on-off-stress-test.c
index 79950f9a26fd..d39511eb9b01 100644
--- a/tools/testing/selftests/prctl/disable-tsc-on-off-stress-test.c
+++ b/tools/testing/selftests/prctl/disable-tsc-on-off-stress-test.c
@@ -83,7 +83,7 @@ int main(void)
{
int n_tasks = 100, i;
- fprintf(stderr, "[No further output means we're allright]\n");
+ fprintf(stderr, "[No further output means we're all right]\n");
for (i=0; i<n_tasks; i++)
if (fork() == 0)
--
2.30.2
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
the kernel already has a soft-dirty feature, which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.
The PAGEMAP_SCAN IOCTL on the pagemap file can be used to get and/or
clear the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series addresses gaps in what
the kernel offers today:
- The kernel has no operation that atomically gets the
soft-dirty/written-to status and clears it.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft_dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from the v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. They are too delicate and inaccurate, as they show more
soft-dirty pages than have actually been written. There is no interest in
correcting this [2][3], as this is how the feature was written years ago
and its behaviour shouldn't be changed. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is an
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
the pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to since the time the pages were
write protected. This works just like the soft-dirty flag, without
showing any extra pages which aren't soft-dirty in reality.
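
As a rough illustration of the !PM_UFFD_WP check, the sketch below decodes
64-bit pagemap entries. The bit positions are those documented for
/proc/PID/pagemap; the helper names are hypothetical, and the
interpretation only holds under the assumption that the range was
write-protected through the async UFFD WP mode beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as documented for /proc/PID/pagemap entries. */
#define PM_UFFD_WP	(1ULL << 57)	/* pte is uffd-wp write-protected */
#define PM_SWAP		(1ULL << 62)	/* page is swapped */
#define PM_PRESENT	(1ULL << 63)	/* page is present in RAM */

/*
 * Assumption: the page was write-protected via async uffd-wp earlier.
 * A cleared PM_UFFD_WP bit then means a write fault was resolved by the
 * kernel, i.e. the page has been written to since.
 */
bool pagemap_entry_written(uint64_t entry)
{
	return !(entry & PM_UFFD_WP);
}

/* Count written-to pages in an array of entries read from pagemap. */
size_t count_written(const uint64_t *entries, size_t n)
{
	size_t i, written = 0;

	assert(entries || n == 0);
	for (i = 0; i < n; i++)
		written += pagemap_entry_written(entries[i]);
	return written;
}
```

The entries themselves would come from a pread() on the pagemap file at
offset (vaddr / page_size) * 8, which is left out here to keep the sketch
independent of a running process.
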
The information about whether a page is file mapped, present and
swapped is required for the CRIU project [5][6]. The required mask,
any mask, excluded mask and return mask are also required for the
CRIU project [5].
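
One plausible reading of how the four masks combine is sketched below. The
flag names are taken from this cover letter, but the bit values, the struct
and the function names are made up for illustration; the selection rule
(all required bits set, at least one any-of bit set if that mask is
non-empty, no excluded bit set) is an assumption, not a quote from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names from the cover letter; the bit values are hypothetical. */
#define PAGE_IS_WRITTEN	(1U << 0)
#define PAGE_IS_FILE	(1U << 1)
#define PAGE_IS_PRESENT	(1U << 2)
#define PAGE_IS_SWAPPED	(1U << 3)

/* Hypothetical container for the four masks. */
struct scan_masks {
	uint32_t required;	/* every bit must be set */
	uint32_t anyof;		/* at least one bit must be set, if non-zero */
	uint32_t excluded;	/* no bit may be set */
	uint32_t ret;		/* bits reported back for a selected page */
};

/* Decide whether a page with the given flags is of interest. */
bool page_selected(uint32_t flags, const struct scan_masks *m)
{
	assert(m);
	if ((flags & m->required) != m->required)
		return false;
	if (m->anyof && !(flags & m->anyof))
		return false;
	if (flags & m->excluded)
		return false;
	return true;
}

/* Flags reported for a page: filtered by the return mask, 0 if skipped. */
uint32_t page_reported_flags(uint32_t flags, const struct scan_masks *m)
{
	return page_selected(flags, m) ? (flags & m->ret) : 0;
}
```

With required = PAGE_IS_WRITTEN and excluded = PAGE_IS_FILE, for instance,
only written-to anonymous pages would be selected under this reading.
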
The IOCTL returns the addresses of the pages which match the specified masks.
The page addresses are returned in struct page_region in a compact form.
max_pages is needed to support the use case where the user only wants to get
a specific number of pages, so there is no need to find all the pages of
interest in the range when max_pages is specified. The IOCTL returns when
the maximum number of pages has been found. max_pages is optional. If
max_pages is specified, it must be equal to or greater than vec_size.
This restriction is needed to handle the worst case, in which each
page_region contains the info of only one page and cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
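
The compaction and the two limits can be sketched in user-space C as
follows. This is only an illustration under stated assumptions: struct
region_sketch is a hypothetical stand-in (the real struct page_region
layout is defined by the patch, not here), pages are identified by index
instead of address, a flags value of 0 marks a page that is not of
interest, and the max_pages >= vec_size check is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a compact region record. */
struct region_sketch {
	uint64_t start;	/* index of the first page in the run */
	uint64_t len;	/* number of pages in the run */
	uint32_t flags;	/* flags shared by every page in the run */
};

/*
 * Merge adjacent pages with identical flags into one region. Stops when
 * vec is full or when max_pages pages have been recorded (0 = no limit).
 * Returns the number of regions written to vec.
 */
size_t compact_pages(const uint32_t *flags, size_t npages,
		     struct region_sketch *vec, size_t vec_size,
		     size_t max_pages)
{
	size_t i, nregions = 0, found = 0;

	assert(flags && vec);
	for (i = 0; i < npages; i++) {
		if (!flags[i])
			continue;
		if (max_pages && found == max_pages)
			break;
		if (nregions && vec[nregions - 1].flags == flags[i] &&
		    vec[nregions - 1].start + vec[nregions - 1].len == i) {
			vec[nregions - 1].len++;	/* extend the current run */
		} else {
			if (nregions == vec_size)
				break;			/* no room for a new region */
			vec[nregions].start = i;
			vec[nregions].len = 1;
			vec[nregions].flags = flags[i];
			nregions++;
		}
		found++;
	}
	return nregions;
}
```

In the worst case every page of interest differs from its neighbour, so
each region holds a single page; that is the situation the restriction on
max_pages and vec_size above is meant to cover.
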
The patch series includes a detailed selftest which can be used as an example
for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usage as
well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (3):
userfaultfd: Add UFFD WP Async support
fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about
PTEs
selftests: vm: add pagemap ioctl tests
fs/proc/task_mmu.c | 290 +++++++
fs/userfaultfd.c | 11 +
include/linux/userfaultfd_k.h | 6 +
include/uapi/linux/fs.h | 50 ++
include/uapi/linux/userfaultfd.h | 8 +-
mm/memory.c | 23 +-
tools/include/uapi/linux/fs.h | 50 ++
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 5 +-
tools/testing/selftests/vm/pagemap_ioctl.c | 881 +++++++++++++++++++++
10 files changed, 1319 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2