From: "Tyler Hicks" <code(a)tyhicks.com>
The backport of commit 05c2224d4b04 ("KVM: selftests: Fix number of
pages for memory slot in memslot_modification_stress_test") broke the
build of the KVM selftest memslot_modification_stress_test.c source file
in two ways:
- Incorrectly assumed that max_t() was defined despite commit
5cf67a6051ea ("tools/include: Add _RET_IP_ and math definitions to
kernel.h") not being present
- Incorrectly assumed that kvm_vm struct members could be directly
accessed despite b530eba14c70 ("KVM: selftests: Get rid of
kvm_util_internal.h") not being present
Backport the first commit, as it is simple enough. Work around the lack
of the second commit by using the accessors to get to the kvm_vm struct
members.
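For illustration, the shape of the two changes is roughly the following
(a sketch only: the tools/include hunk is paraphrased, and the accessor
line is a hypothetical example of the pre-6.0 selftests API, not the
exact line from the test):
/* 1) Math helper backported to tools/include/linux/kernel.h (paraphrased): */
#define max_t(type, x, y)	max((type)x, (type)y)
/*
 * 2) Accessor instead of direct member access; without b530eba14c70,
 *    struct kvm_vm is still private to the selftests library:
 */
uint64_t page_size = vm_get_page_size(vm);	/* not vm->page_size */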
Note that the linux-6.0.y backport of commit 05c2224d4b04 ("KVM:
selftests: Fix number of pages for memory slot in
memslot_modification_stress_test") is fine because the two prerequisite
commits, mentioned above, are both present in v6.0.
Tyler
Karolina Drobnik (1):
tools/include: Add _RET_IP_ and math definitions to kernel.h
Tyler Hicks (Microsoft) (1):
KVM: selftests: Fix build regression by using accessor function
tools/include/linux/kernel.h | 6 ++++++
.../selftests/kvm/memslot_modification_stress_test.c | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
--
2.34.1
The patch titled
Subject: mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
Date: Sat, 24 Dec 2022 00:20:35 -0800
SHMEM_HUGE_DENY is for emergency use by the admin, to disable allocation
of shmem huge pages if, for example, a dangerous bug is found in their
usage: see "deny" in Documentation/mm/transhuge.rst. An app using
madvise(,,MADV_COLLAPSE) should not be allowed to override it: restore its
precedence over shmem_huge_force.
Restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE.
Link: https://lkml.kernel.org/r/20221224082035.3197140-2-zokeefe@google.com
Fixes: 7c6c6cc4d3a2 ("mm/shmem: add flag to enforce shmem THP in hugepage_vma_check()")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/shmem.c~mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse
+++ a/mm/shmem.c
@@ -478,12 +478,10 @@ bool shmem_is_huge(struct vm_area_struct
if (vma && ((vma->vm_flags & VM_NOHUGEPAGE) ||
test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)))
return false;
- if (shmem_huge_force)
- return true;
- if (shmem_huge == SHMEM_HUGE_FORCE)
- return true;
if (shmem_huge == SHMEM_HUGE_DENY)
return false;
+ if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+ return true;
switch (SHMEM_SB(inode->i_sb)->huge) {
case SHMEM_HUGE_ALWAYS:
_
Patches currently in -mm which might be from zokeefe(a)google.com are
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
The patch titled
Subject: mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Zach O'Keefe" <zokeefe(a)google.com>
Subject: mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
Date: Sat, 24 Dec 2022 00:20:34 -0800
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped. One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the VMA's new end.
However, it's possible that when refetching vma->vm_end we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results. An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region. An example of the latter is that applying
MADV_COLLAPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.
Don't expand the acted-on region when refetching vma->vm_end.
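For concreteness, a minimal userspace sketch of the "too much success"
case above (assumptions: a kernel with MADV_COLLAPSE, file-backed
collapse enabled, a pre-existing 2TiB file named "bigfile"; error
checking trimmed):
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* uapi madvise value introduced in 6.1 */
#endif

int main(void)
{
	/* hypothetical 2TiB backing file */
	int fd = open("bigfile", O_RDONLY);
	void *p = mmap(NULL, 2UL << 40, PROT_READ, MAP_SHARED, fd, 0);

	/*
	 * Only the first 4MiB is requested. Before this fix, the
	 * per-hugepage loop refetched hend as vma->vm_end instead of
	 * min(hend, vma->vm_end & HPAGE_PMD_MASK), so the call could keep
	 * collapsing toward the end of the 2TiB VMA.
	 */
	return madvise(p, 4UL << 20, MADV_COLLAPSE);
}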
Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/khugepaged.c~mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end
+++ a/mm/khugepaged.c
@@ -2647,7 +2647,7 @@ int madvise_collapse(struct vm_area_stru
goto out_nolock;
}
- hend = vma->vm_end & HPAGE_PMD_MASK;
+ hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
_
Patches currently in -mm which might be from zokeefe(a)google.com are
mm-madv_collapse-dont-expand-collapse-when-vm_end-is-past-requested-end.patch
mm-shmem-restore-shmem_huge_deny-precedence-over-madv_collapse.patch
A recent development on the EFI front has resulted in guests having
their page tables baked into the firmware binary, and mapped into
the IPA space as part of a read-only memslot.
Not only is this legitimate, but it also results in added security,
so thumbs up. However, this clashes mildly with our handling of an S1PTW
as a write to correctly handle AF/DB updates to the S1 PTs, and results
in the guest taking an abort it won't recover from (the PTs mapping the
vectors will suffer from the same problem...).
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:
- On S1PTW translation fault, handle the fault as a read
- On S1PTW permission fault, handle the fault as a write
This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..fd6ad8b21f85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
{
- if (kvm_vcpu_abt_iss1tw(vcpu))
- return true;
+ if (kvm_vcpu_abt_iss1tw(vcpu)) {
+ /*
+ * Only a permission fault on a S1PTW should be
+ * considered as a write. Otherwise, page tables baked
+ * in a read-only memslot will result in an exception
+ * being delivered in the guest.
+ *
+ * The drawback is that we end up faulting twice if the
+ * guest is using any of HW AF/DB: a translation fault
+ * to map the page containing the PT (read only at
+ * first), then a permission fault to allow the flags
+ * to be set.
+ */
+ switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+ case ESR_ELx_FSC_PERM:
+ return true;
+ default:
+ return false;
+ }
+ }
if (kvm_vcpu_trap_is_iabt(vcpu))
return false;
--
2.34.1
The following changes since commit 830b3c68c1fb1e9176028d02ef86f3cf76aa2476:
Linux 6.1 (2022-12-11 14:15:18 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 98dd6b2ef50d6f7876606a86c8d8a767c9fef6f5:
virtio_blk: mark all zone fields LE (2022-12-22 14:32:36 -0500)
Note: merging this upstream results in a conflict
between commit:
de4eda9de2d9 ("use less confusing names for iov_iter direction initializers")
from Linus' tree and commit:
("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
from this tree.
The resolution below, courtesy of Stephen Rothwell, fixes it up:
diff --cc drivers/vhost/vsock.c
index cd6f7776013a,830bc823addc..000000000000
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@@ -165,8 -157,9 +157,9 @@@ vhost_transport_do_send_pkt(struct vhos
break;
}
- iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+ iov_iter_init(&iov_iter, ITER_DEST, &vq->iov[out], in, iov_len);
- payload_len = pkt->len - pkt->off;
+ payload_len = skb->len;
+ hdr = virtio_vsock_hdr(skb);
/* If the packet is greater than the space available in the
* buffer, we split it using multiple buffers.
@@@ -366,18 -340,21 +340,22 @@@ vhost_vsock_alloc_skb(struct vhost_virt
return NULL;
}
- pkt = kzalloc(sizeof(*pkt), GFP_KERNEL);
- if (!pkt)
+ len = iov_length(vq->iov, out);
+
+ /* len contains both payload and hdr */
+ skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
+ if (!skb)
return NULL;
- iov_iter_init(&iov_iter, WRITE, vq->iov, out, len);
+ len = iov_length(vq->iov, out);
+ iov_iter_init(&iov_iter, ITER_SOURCE, vq->iov, out, len);
- nbytes = copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
- if (nbytes != sizeof(pkt->hdr)) {
+ hdr = virtio_vsock_hdr(skb);
+ nbytes = copy_from_iter(hdr, sizeof(*hdr), &iov_iter);
+ if (nbytes != sizeof(*hdr)) {
vq_err(vq, "Expected %zu bytes for pkt->hdr, got %zu bytes\n",
- sizeof(pkt->hdr), nbytes);
- kfree(pkt);
+ sizeof(*hdr), nbytes);
+ kfree_skb(skb);
return NULL;
}
It can also be found in linux-next, see next-20221220.
----------------------------------------------------------------
virtio,vhost,vdpa: features, fixes, cleanups
zoned block device support
lifetime stats support (for virtio devices backed by memory supporting that)
vsock rework to use skbuffs
ifcvf features provisioning
new SolidNET DPU driver
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Alvaro Karsz (5):
Add SolidRun vendor id
New PCI quirk for SolidRun SNET DPU.
virtio: vdpa: new SolidNET DPU driver.
virtio_blk: add VIRTIO_BLK_F_LIFETIME feature support
virtio: vdpa: fix snprintf size argument in snet_vdpa driver
Angus Chen (2):
virtio_pci: modify ENOENT to EINVAL
virtio_blk: use UINT_MAX instead of -1U
Bobby Eshleman (1):
virtio/vsock: replace virtio_vsock_pkt with sk_buff
Cindy Lu (2):
vhost_vdpa: fix the crash in unmap a large memory
vdpa_sim_net: should not drop the multicast/broadcast packet
Colin Ian King (1):
RDMA/mlx5: remove variable i
Davidlohr Bueso (2):
tools/virtio: remove stray characters
tools/virtio: remove smp_read_barrier_depends()
Dawei Li (1):
virtio: Implementing attribute show with sysfs_emit
Dmitry Fomichev (2):
virtio-blk: use a helper to handle request queuing errors
virtio-blk: add support for zoned block devices
Eli Cohen (8):
vdpa/mlx5: Fix rule forwarding VLAN to TIR
vdpa/mlx5: Return error on vlan ctrl commands if not supported
vdpa/mlx5: Fix wrong mac address deletion
vdpa/mlx5: Avoid using reslock in event_handler
vdpa/mlx5: Avoid overwriting CVQ iotlb
vdpa/mlx5: Move some definitions to a new header file
vdpa/mlx5: Add debugfs subtree
vdpa/mlx5: Add RX counters to debugfs
Eugenio Pérez (1):
vdpa_sim_net: Offer VIRTIO_NET_F_STATUS
Harshit Mogalapalli (1):
vduse: Validate vq_num in vduse_validate_config()
Jason Wang (2):
vdpa: conditionally fill max max queue pair for stats
vdpasim: fix memory leak when freeing IOTLBs
Michael S. Tsirkin (3):
virtio_blk: temporary variable type tweak
virtio_blk: zone append in header type tweak
virtio_blk: mark all zone fields LE
Michael Sammler (1):
virtio_pmem: populate numa information
Rafael Mendonca (1):
virtio_blk: Fix signedness bug in virtblk_prep_rq()
Ricardo Cañuelo (2):
tools/virtio: initialize spinlocks in vring_test.c
docs: driver-api: virtio: virtio on Linux
Rong Wang (1):
vdpa/vp_vdpa: fix kfree a wrong pointer in vp_vdpa_remove
Shaomin Deng (1):
tools: Delete the unneeded semicolon after curly braces
Shaoqin Huang (2):
virtio_pci: use helper function is_power_of_2()
virtio_ring: use helper function is_power_of_2()
Si-Wei Liu (1):
vdpa: merge functionally duplicated dev_features attributes
Stefano Garzarella (4):
vringh: fix range used in iotlb_translate()
vhost: fix range used in translate_desc()
vhost-vdpa: fix an iotlb memory leak
vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
Wei Yongjun (1):
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
Yuan Can (1):
vhost/vsock: Fix error handling in vhost_vsock_init()
Zhu Lingshan (12):
vDPA/ifcvf: decouple hw features manipulators from the adapter
vDPA/ifcvf: decouple config space ops from the adapter
vDPA/ifcvf: alloc the mgmt_dev before the adapter
vDPA/ifcvf: decouple vq IRQ releasers from the adapter
vDPA/ifcvf: decouple config IRQ releaser from the adapter
vDPA/ifcvf: decouple vq irq requester from the adapter
vDPA/ifcvf: decouple config/dev IRQ requester and vectors allocator from the adapter
vDPA/ifcvf: ifcvf_request_irq works on ifcvf_hw
vDPA/ifcvf: manage ifcvf_hw in the mgmt_dev
vDPA/ifcvf: allocate the adapter in dev_add()
vDPA/ifcvf: retire ifcvf_private_to_vf
vDPA/ifcvf: implement features provisioning
ruanjinjie (1):
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
wangjianli (1):
tools/virtio: Variable type completion
Documentation/driver-api/index.rst | 1 +
Documentation/driver-api/virtio/index.rst | 11 +
Documentation/driver-api/virtio/virtio.rst | 144 +++
.../driver-api/virtio/writing_virtio_drivers.rst | 197 ++++
MAINTAINERS | 6 +
drivers/block/virtio_blk.c | 522 ++++++++-
.../crypto/virtio/virtio_crypto_skcipher_algs.c | 3 +-
drivers/nvdimm/virtio_pmem.c | 11 +-
drivers/pci/quirks.c | 8 +
drivers/vdpa/Kconfig | 22 +
drivers/vdpa/Makefile | 1 +
drivers/vdpa/ifcvf/ifcvf_base.c | 32 +-
drivers/vdpa/ifcvf/ifcvf_base.h | 10 +-
drivers/vdpa/ifcvf/ifcvf_main.c | 162 ++-
drivers/vdpa/mlx5/Makefile | 2 +-
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 5 +-
drivers/vdpa/mlx5/core/mr.c | 46 +-
drivers/vdpa/mlx5/net/debug.c | 152 +++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 252 +++--
drivers/vdpa/mlx5/net/mlx5_vnet.h | 94 ++
drivers/vdpa/solidrun/Makefile | 6 +
drivers/vdpa/solidrun/snet_hwmon.c | 188 ++++
drivers/vdpa/solidrun/snet_main.c | 1111 ++++++++++++++++++++
drivers/vdpa/solidrun/snet_vdpa.h | 196 ++++
drivers/vdpa/vdpa.c | 11 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +-
drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 +-
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 8 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 3 +
drivers/vdpa/virtio_pci/vp_vdpa.c | 2 +-
drivers/vhost/vdpa.c | 52 +-
drivers/vhost/vhost.c | 4 +-
drivers/vhost/vringh.c | 5 +-
drivers/vhost/vsock.c | 224 ++--
drivers/virtio/virtio.c | 12 +-
drivers/virtio/virtio_pci_modern.c | 4 +-
drivers/virtio/virtio_ring.c | 2 +-
include/linux/pci_ids.h | 2 +
include/linux/virtio_config.h | 8 +-
include/linux/virtio_vsock.h | 126 ++-
include/uapi/linux/vdpa.h | 4 +-
include/uapi/linux/virtio_blk.h | 133 +++
include/uapi/linux/virtio_blk_ioctl.h | 44 +
net/vmw_vsock/virtio_transport.c | 149 +--
net/vmw_vsock/virtio_transport_common.c | 420 ++++----
net/vmw_vsock/vsock_loopback.c | 51 +-
tools/virtio/ringtest/main.h | 37 +-
tools/virtio/virtio-trace/trace-agent-ctl.c | 2 +-
tools/virtio/virtio_test.c | 2 +-
tools/virtio/vringh_test.c | 2 +
50 files changed, 3661 insertions(+), 839 deletions(-)
create mode 100644 Documentation/driver-api/virtio/index.rst
create mode 100644 Documentation/driver-api/virtio/virtio.rst
create mode 100644 Documentation/driver-api/virtio/writing_virtio_drivers.rst
create mode 100644 drivers/vdpa/mlx5/net/debug.c
create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h
create mode 100644 drivers/vdpa/solidrun/Makefile
create mode 100644 drivers/vdpa/solidrun/snet_hwmon.c
create mode 100644 drivers/vdpa/solidrun/snet_main.c
create mode 100644 drivers/vdpa/solidrun/snet_vdpa.h
create mode 100644 include/uapi/linux/virtio_blk_ioctl.h
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds
supplied by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might
have changed while mmap_lock was dropped. One thing that might occur
is that the VMA could be resized, and as such, we refetch vma->vm_end
to make sure we don't collapse past the VMA's new end.
However, it's possible that when refetching vma->vm_end we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false
failure" user-visible results. An example of the former is if we
MADV_COLLAPSE the first 4MiB of a 2TiB mmap()'d file, the incorrect
refetch would cause the operation to block for much longer than
anticipated as we attempt to collapse the entire TiB region. An example
of the latter is that applying MADV_COLLAPSE to a 4MiB file mapped to the
start of a 6MiB VMA will successfully collapse the first 4MiB, then
incorrectly attempt to collapse the last hugepage-aligned/sized region
-- fail (since readahead/page cache lookup will fail) -- and report a
failure to the user.
Don't expand the acted-on region when refetching vma->vm_end.
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Reported-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
v2->v3: Add 'Cc: stable(a)vger.kernel.org' as per stable-kernel-rules.
v1->v2: Updated changelog to make clear what user-visible issues this
patch addresses, as well as making the case for backporting (Andrew
Morton).
While there aren't any stability risks, without this patch there exist
trivial examples where MADV_COLLAPSE won't work; as such, this should be
backported to stable 6.1.X to make MADV_COLLAPSE dependable in such
cases.
v1: https://lore.kernel.org/linux-mm/CAAa6QmRx_b2UCJWE2XZ3=3c3-_N3R4cDGX6Wm4OT7…
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..b4d2ec0a94ed 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2649,7 +2649,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
goto out_nolock;
}
- hend = vma->vm_end & HPAGE_PMD_MASK;
+ hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
--
2.39.0.314.g84b9a713c41-goog
MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds
supplied by the user.
At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might
have changed while mmap_lock was dropped. One thing that might occur
is that the VMA could be resized, and as such, we refetch vma->vm_end
to make sure we don't collapse past the VMA's new end.
However, it's possible that when refetching vma->vm_end we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.
The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false
failure" user-visible results. An example of the former is if we
MADV_COLLAPSE the first 4MiB of a 2TiB mmap()'d file, the incorrect
refetch would cause the operation to block for much longer than
anticipated as we attempt to collapse the entire TiB region. An example
of the latter is that applying MADV_COLLAPSE to a 4MiB file mapped to the
start of a 6MiB VMA will successfully collapse the first 4MiB, then
incorrectly attempt to collapse the last hugepage-aligned/sized region
-- fail (since readahead/page cache lookup will fail) -- and report a
failure to the user.
Don't expand the acted-on region when refetching vma->vm_end.
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Reported-by: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Zach O'Keefe <zokeefe(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
---
v1->v2: Updated changelog to make clear what user-visible issues this
patch addresses, as well as making the case for backporting (Andrew
Morton).
While there aren't any stability risks, without this patch there exist
trivial examples where MADV_COLLAPSE won't work; as such, this should be
backported to stable 6.1.X to make MADV_COLLAPSE dependable in such
cases.
v1: https://lore.kernel.org/linux-mm/CAAa6QmRx_b2UCJWE2XZ3=3c3-_N3R4cDGX6Wm4OT7…
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..b4d2ec0a94ed 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2649,7 +2649,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
goto out_nolock;
}
- hend = vma->vm_end & HPAGE_PMD_MASK;
+ hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
mmap_assert_locked(mm);
memset(cc->node_load, 0, sizeof(cc->node_load));
--
2.39.0.314.g84b9a713c41-goog
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 1662cea4623f75d8251adf07370bbaa958f0355d ]
When injecting a fault while loading a module, kset_register() may fail.
If it fails, the kset.kobj.name allocated by kobject_set_name(),
which must be called before a call to kset_register(), may be
leaked, since the refcount of kobj was set in kset_init().
To mitigate this, we free the name in kset_register() when an
error is encountered, i.e. when kset_register() returns an error.
A kset may be embedded in a larger structure which is dynamically
allocated in callers and needs to be freed in ktype.release() or in the
callers' error path. In this case we cannot call kset_put() in
kset_register(), or it would cause a double free, so just call
kfree_const() to free the name and set it to NULL to avoid accessing a
bad pointer in callers.
With this fix, callers don't need to care about freeing the name
and may call kset_put() if kset_register() fails.
Suggested-by: Luben Tuikov <luben.tuikov(a)amd.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://lore.kernel.org/r/20221025071549.1280528-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kobject.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index bbbb067de8ec..be01d49abb62 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -806,6 +806,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
/**
* kset_register - initialize and add a kset.
* @k: kset.
+ *
+ * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
+ * is freed, it can not be used any more.
*/
int kset_register(struct kset *k)
{
@@ -816,8 +819,12 @@ int kset_register(struct kset *k)
kset_init(k);
err = kobject_add_internal(&k->kobj);
- if (err)
+ if (err) {
+ kfree_const(k->kobj.name);
+ /* Set it to NULL to avoid accessing bad pointer in callers. */
+ k->kobj.name = NULL;
return err;
+ }
kobject_uevent(&k->kobj, KOBJ_ADD);
return 0;
}
--
2.35.1
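For callers, the new contract looks roughly like this (a hypothetical
sketch; example_kset, example_ktype, and example_init() are made-up
names, and the ktype is assumed to have a suitable .release()):
#include <linux/kobject.h>
#include <linux/slab.h>

static struct kobj_type example_ktype;	/* assume .release() frees the kset */
static struct kset *example_kset;

static int example_init(void)
{
	int err;

	example_kset = kzalloc(sizeof(*example_kset), GFP_KERNEL);
	if (!example_kset)
		return -ENOMEM;

	err = kobject_set_name(&example_kset->kobj, "example");
	if (err) {
		kfree(example_kset);
		return err;
	}
	example_kset->kobj.ktype = &example_ktype;

	err = kset_register(example_kset);
	if (err) {
		/*
		 * kset_register() has already freed kobj.name and set it
		 * to NULL, so the caller only drops its reference; the
		 * ktype's .release() frees the embedded structure.
		 */
		kset_put(example_kset);
		return err;
	}
	return 0;
}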
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 1662cea4623f75d8251adf07370bbaa958f0355d ]
When injecting a fault while loading a module, kset_register() may fail.
If it fails, the kset.kobj.name allocated by kobject_set_name(),
which must be called before a call to kset_register(), may be
leaked, since the refcount of kobj was set in kset_init().
To mitigate this, we free the name in kset_register() when an
error is encountered, i.e. when kset_register() returns an error.
A kset may be embedded in a larger structure which is dynamically
allocated in callers and needs to be freed in ktype.release() or in the
callers' error path. In this case we cannot call kset_put() in
kset_register(), or it would cause a double free, so just call
kfree_const() to free the name and set it to NULL to avoid accessing a
bad pointer in callers.
With this fix, callers don't need to care about freeing the name
and may call kset_put() if kset_register() fails.
Suggested-by: Luben Tuikov <luben.tuikov(a)amd.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://lore.kernel.org/r/20221025071549.1280528-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kobject.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index 97d86dc17c42..1eb1230a2d28 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -821,6 +821,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
/**
* kset_register - initialize and add a kset.
* @k: kset.
+ *
+ * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
+ * is freed, it can not be used any more.
*/
int kset_register(struct kset *k)
{
@@ -831,8 +834,12 @@ int kset_register(struct kset *k)
kset_init(k);
err = kobject_add_internal(&k->kobj);
- if (err)
+ if (err) {
+ kfree_const(k->kobj.name);
+ /* Set it to NULL to avoid accessing bad pointer in callers. */
+ k->kobj.name = NULL;
return err;
+ }
kobject_uevent(&k->kobj, KOBJ_ADD);
return 0;
}
--
2.35.1
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 1662cea4623f75d8251adf07370bbaa958f0355d ]
When injecting a fault while loading a module, kset_register() may fail.
If it fails, the kset.kobj.name allocated by kobject_set_name(),
which must be called before a call to kset_register(), may be
leaked, since the refcount of kobj was set in kset_init().
To mitigate this, we free the name in kset_register() when an
error is encountered, i.e. when kset_register() returns an error.
A kset may be embedded in a larger structure which is dynamically
allocated in callers and needs to be freed in ktype.release() or in the
callers' error path. In this case we cannot call kset_put() in
kset_register(), or it would cause a double free, so just call
kfree_const() to free the name and set it to NULL to avoid accessing a
bad pointer in callers.
With this fix, callers don't need to care about freeing the name
and may call kset_put() if kset_register() fails.
Suggested-by: Luben Tuikov <luben.tuikov(a)amd.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://lore.kernel.org/r/20221025071549.1280528-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kobject.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index 0c6d17503a11..3ce572d7c26d 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -869,6 +869,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
/**
* kset_register() - Initialize and add a kset.
* @k: kset.
+ *
+ * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
+ * is freed, it can not be used any more.
*/
int kset_register(struct kset *k)
{
@@ -879,8 +882,12 @@ int kset_register(struct kset *k)
kset_init(k);
err = kobject_add_internal(&k->kobj);
- if (err)
+ if (err) {
+ kfree_const(k->kobj.name);
+ /* Set it to NULL to avoid accessing bad pointer in callers. */
+ k->kobj.name = NULL;
return err;
+ }
kobject_uevent(&k->kobj, KOBJ_ADD);
return 0;
}
--
2.35.1
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 1662cea4623f75d8251adf07370bbaa958f0355d ]
When injecting a fault while loading a module, kset_register() may fail.
If it fails, the kset.kobj.name allocated by kobject_set_name(),
which must be called before a call to kset_register(), may be
leaked, since the refcount of kobj was set in kset_init().
To mitigate this, we free the name in kset_register() when an
error is encountered, i.e. when kset_register() returns an error.
A kset may be embedded in a larger structure which is dynamically
allocated in callers and needs to be freed in ktype.release() or in the
callers' error path. In this case we cannot call kset_put() in
kset_register(), or it would cause a double free, so just call
kfree_const() to free the name and set it to NULL to avoid accessing a
bad pointer in callers.
With this fix, callers don't need to care about freeing the name
and may call kset_put() if kset_register() fails.
Suggested-by: Luben Tuikov <luben.tuikov(a)amd.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://lore.kernel.org/r/20221025071549.1280528-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kobject.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index ea53b30cf483..743e629d60d2 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -866,6 +866,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
/**
* kset_register() - Initialize and add a kset.
* @k: kset.
+ *
+ * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
+ * is freed, it can not be used any more.
*/
int kset_register(struct kset *k)
{
@@ -876,8 +879,12 @@ int kset_register(struct kset *k)
kset_init(k);
err = kobject_add_internal(&k->kobj);
- if (err)
+ if (err) {
+ kfree_const(k->kobj.name);
+ /* Set it to NULL to avoid accessing bad pointer in callers. */
+ k->kobj.name = NULL;
return err;
+ }
kobject_uevent(&k->kobj, KOBJ_ADD);
return 0;
}
--
2.35.1
From: Yang Yingliang <yangyingliang(a)huawei.com>
[ Upstream commit 1662cea4623f75d8251adf07370bbaa958f0355d ]
When injecting a fault while loading a module, kset_register() may fail.
If it fails, the kset.kobj.name allocated by kobject_set_name(),
which must be called before a call to kset_register(), may be
leaked, since the refcount of kobj was set in kset_init().
To mitigate this, we free the name in kset_register() when an
error is encountered, i.e. when kset_register() returns an error.
A kset may be embedded in a larger structure which is dynamically
allocated in callers and needs to be freed in ktype.release() or in the
callers' error path. In this case we cannot call kset_put() in
kset_register(), or it would cause a double free, so just call
kfree_const() to free the name and set it to NULL to avoid accessing a
bad pointer in callers.
With this fix, callers don't need to care about freeing the name
and may call kset_put() if kset_register() fails.
Suggested-by: Luben Tuikov <luben.tuikov(a)amd.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Reviewed-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://lore.kernel.org/r/20221025071549.1280528-1-yangyingliang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kobject.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index 5f0e71ab292c..0f9cc0b93d99 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -834,6 +834,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
/**
* kset_register() - Initialize and add a kset.
* @k: kset.
+ *
+ * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
+ * is freed, it can not be used any more.
*/
int kset_register(struct kset *k)
{
@@ -844,8 +847,12 @@ int kset_register(struct kset *k)
kset_init(k);
err = kobject_add_internal(&k->kobj);
- if (err)
+ if (err) {
+ kfree_const(k->kobj.name);
+ /* Set it to NULL to avoid accessing bad pointer in callers. */
+ k->kobj.name = NULL;
return err;
+ }
kobject_uevent(&k->kobj, KOBJ_ADD);
return 0;
}
--
2.35.1
The patch titled
Subject: mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-enable-writenotify-while-userfaultfd-wp-is-enabled-for-a-vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Date: Fri, 9 Dec 2022 09:09:12 +0100
Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb). The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).
So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect). For example, when enabling softdirty tracking, we enable
writenotify. With uffd-wp on shared mappings, that changed. More details
on vma->vm_page_prot semantics were summarized in [1].
This is problematic for uffd-wp: we'd have to manually check for uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().
Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable. In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.
This fixes two known cases:
(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
writable, resulting in uffd-wp not triggering on write access.
Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. On
such a VMA, userfaultfd-wp is currently non-functional.
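As an aside, a minimal userspace sketch of the affected setup described
above (assumptions: a 5.19+ kernel with UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
error checking elided, and no fault-reading thread shown):
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	size_t len = 4096;
	int memfd = memfd_create("shmem-wp", 0);
	long uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	char *p;

	ftruncate(memfd, len);
	/* A shared, PROT_WRITE shmem VMA: exactly the case that was broken. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
	p[0] = 1;	/* populate the page */

	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
	};
	ioctl(uffd, UFFDIO_API, &api);

	struct uffdio_register reg = {
		.range = { .start = (unsigned long)p, .len = len },
		.mode = UFFDIO_REGISTER_MODE_WP,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	struct uffdio_writeprotect wp = {
		.range = { .start = (unsigned long)p, .len = len },
		.mode = UFFDIO_WRITEPROTECT_MODE_WP,
	};
	ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

	/*
	 * From here on, any write to p must fault and be reported on uffd.
	 * Without this fix, kernel paths that rebuild PTEs from
	 * vma->vm_page_prot (NUMA hinting, migration) could remap the page
	 * writable, and the write would go unreported.
	 */
	return 0;
}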
Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.
Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access. This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though
mixing mprotect with uffd-wp is controversial.
[1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com
Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Ives van Hoorne <ives(a)codesandbox.io>
Debugged-by: Peter Xu <peterx(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++------
mm/mmap.c | 4 ++++
2 files changed, 26 insertions(+), 6 deletions(-)
--- a/fs/userfaultfd.c~mm-userfaultfd-enable-writenotify-while-userfaultfd-wp-is-enabled-for-a-vma
+++ a/fs/userfaultfd.c
@@ -108,6 +108,21 @@ static bool userfaultfd_is_initialized(s
return ctx->features & UFFD_FEATURE_INITIALIZED;
}
+static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP;
+
+ vma->vm_flags = flags;
+ /*
+ * For shared mappings, we want to enable writenotify while
+ * userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
+ * recalculate vma->vm_page_prot whenever userfaultfd-wp changes.
+ */
+ if ((vma->vm_flags & VM_SHARED) && uffd_wp_changed)
+ vma_set_page_prot(vma);
+}
+
static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
int wake_flags, void *key)
{
@@ -618,7 +633,8 @@ static void userfaultfd_event_wait_compl
for_each_vma(vmi, vma) {
if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) {
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
- vma->vm_flags &= ~__VM_UFFD_FLAGS;
+ userfaultfd_set_vm_flags(vma,
+ vma->vm_flags & ~__VM_UFFD_FLAGS);
}
}
mmap_write_unlock(mm);
@@ -652,7 +668,7 @@ int dup_userfaultfd(struct vm_area_struc
octx = vma->vm_userfaultfd_ctx.ctx;
if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
- vma->vm_flags &= ~__VM_UFFD_FLAGS;
+ userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
return 0;
}
@@ -733,7 +749,7 @@ void mremap_userfaultfd_prep(struct vm_a
} else {
/* Drop uffd context if remap feature not enabled */
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
- vma->vm_flags &= ~__VM_UFFD_FLAGS;
+ userfaultfd_set_vm_flags(vma, vma->vm_flags & ~__VM_UFFD_FLAGS);
}
}
@@ -895,7 +911,7 @@ static int userfaultfd_release(struct in
prev = vma;
}
- vma->vm_flags = new_flags;
+ userfaultfd_set_vm_flags(vma, new_flags);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
mmap_write_unlock(mm);
@@ -1463,7 +1479,7 @@ static int userfaultfd_register(struct u
* the next vma was merged into the current one and
* the current one has not been updated yet.
*/
- vma->vm_flags = new_flags;
+ userfaultfd_set_vm_flags(vma, new_flags);
vma->vm_userfaultfd_ctx.ctx = ctx;
if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
@@ -1648,7 +1664,7 @@ static int userfaultfd_unregister(struct
* the next vma was merged into the current one and
* the current one has not been updated yet.
*/
- vma->vm_flags = new_flags;
+ userfaultfd_set_vm_flags(vma, new_flags);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
skip:
--- a/mm/mmap.c~mm-userfaultfd-enable-writenotify-while-userfaultfd-wp-is-enabled-for-a-vma
+++ a/mm/mmap.c
@@ -1524,6 +1524,10 @@ int vma_wants_writenotify(struct vm_area
if (vma_soft_dirty_enabled(vma) && !is_vm_hugetlb_page(vma))
return 1;
+ /* Do we need write faults for uffd-wp tracking? */
+ if (userfaultfd_wp(vma))
+ return 1;
+
/* Specialty mapping? */
if (vm_flags & VM_PFNMAP)
return 0;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-swap-fix-swp_pfn_bits-with-config_phys_addr_t_64bit-on-32bit.patch
mm-swap-fix-swp_pfn_bits-with-config_phys_addr_t_64bit-on-32bit-v2.patch
mm-swap-fix-swp_pfn_bits-with-config_phys_addr_t_64bit-on-32bit-fix.patch
mm-userfaultfd-enable-writenotify-while-userfaultfd-wp-is-enabled-for-a-vma.patch
selftests-vm-add-ksm-unmerge-tests.patch
mm-pagewalk-dont-trigger-test_walk-in-walk_page_vma.patch
selftests-vm-add-test-to-measure-madv_unmergeable-performance.patch
mm-ksm-simplify-break_ksm-to-not-rely-on-vm_fault_write.patch
mm-remove-vm_fault_write.patch
mm-ksm-fix-ksm-cow-breaking-with-userfaultfd-wp-via-fault_flag_unshare.patch
mm-pagewalk-add-walk_page_range_vma.patch
mm-ksm-convert-break_ksm-to-use-walk_page_range_vma.patch
mm-gup-remove-foll_migration.patch
mm-gup_test-fix-pin_longterm_test_read-with-highmem.patch
selftests-vm-madv_populate-fix-missing-madv_populate_readwrite-definitions.patch
selftests-vm-cow-fix-compile-warning-on-32bit.patch
selftests-vm-ksm_functional_tests-fixes-for-32bit.patch
Hello stable team and Greg,
There are 3 commits in Linus' tree for the rtc driver which should be
applied to stable 6.0.y (they're already in 6.1 / 6.1.y).
Without the first two, an x86-64 machine might panic during boot (Mel saw
a 50% chance of panic at boot - 5 out of 10 tries - and my experience
was identical).
https://lore.kernel.org/linux-acpi/20221010141630.zfzi7mk7zvnmclzy@techsing…
And after applying the first two, the kernel will no longer compile
on non-ACPI platforms, so you need the third one as well.
The first two commits:
commit 4919d3eb2ec0ee364f7e3cf2d99646c1b224fae8
Author: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Date: Wed Oct 12 20:07:01 2022 +0200
rtc: cmos: Fix event handler registration ordering issue
commit 0782b66ed2fbb035dda76111df0954515e417b24
Author: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Date: Tue Oct 18 18:09:31 2022 +0200
rtc: cmos: Fix wake alarm breakage
And the third one:
commit db4e955ae333567dea02822624106c0b96a2f84f
Author: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Date: Tue Oct 18 22:35:11 2022 +0200
rtc: cmos: fix build on non-ACPI platforms
Cheers,
--
Mathieu Chouquet-Stringer me(a)mathieu.digital
The sun itself sees not till heaven clears.
-- William Shakespeare --
The ki_complete() function pointer expects 'long' as its second
argument, but we pass an int from ffs_user_copy_worker(). This
might cause a CFI failure, as ki_complete is an indirect call
with a mismatched prototype. Fix this by casting the second
argument to long.
Cc: <stable(a)vger.kernel.org> # 5.15
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/gadget/function/f_fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 73dc10a7..9c26561 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -835,7 +835,7 @@ static void ffs_user_copy_worker(struct work_struct *work)
kthread_unuse_mm(io_data->mm);
}
- io_data->kiocb->ki_complete(io_data->kiocb, ret);
+ io_data->kiocb->ki_complete(io_data->kiocb, (long)ret);
if (io_data->ffs->ffs_eventfd && !kiocb_has_eventfd)
eventfd_signal(io_data->ffs->ffs_eventfd, 1);
--
2.7.4