From: Jeff Xu <jeffxu(a)chromium.org>
Pedro Falcato's optimization [1] for checking sealed VMAs, which replaces
the can_modify_mm() function with an in-loop check, necessitates an update
to the mseal.rst documentation to reflect this change.
Furthermore, the document has received offline comments regarding the code
sample and suggestions for sentence clarification to enhance reader
comprehension.
[1] https://lore.kernel.org/linux-mm/20240817-mseal-depessimize-v3-0-d8d2e037df…
History:
V3: update according to Randy Dunlap's comment
V2: update according to Randy Dunlap's comments.
https://lore.kernel.org/all/20241001002628.2239032-1-jeffxu@chromium.org/
V1: initial version
https://lore.kernel.org/all/20240927185211.729207-1-jeffxu@chromium.org/
Jeff Xu (1):
mseal: update mseal.rst
Documentation/userspace-api/mseal.rst | 307 +++++++++++++-------------
1 file changed, 148 insertions(+), 159 deletions(-)
--
2.47.0.rc0.187.ge670bccf7e-goog
This series introduces a new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It
allows hypervisors to inhibit remote TLB flushing of a vCPU coming from
Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and
HvFlushirtualAddressList(Ex)). It is required to implement the
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate VSM within KVM and QEMU. The hyper-call requires several new KVM
APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
Once the inhibit flag is set, any processor attempting to flush the TLB on
the marked vCPU, with a HyperV hyper-call, will be suspended until the
flag is cleared again. During the suspension the vCPU will not run at all,
neither receiving events nor running other code. It will wake up from
suspension once the vCPU it is waiting on clears the inhibit flag. This
behaviour is specified in Microsoft's "Hypervisor Top Level Functional
Specification" (TLFS).
The vCPU will block execution during the suspension, making it transparent
to the hypervisor. An alternative design to what is proposed here would be
to exit from the Hyper-V hypercall upon finding an inhibited vCPU. We
decided against it, to allow for a simpler and more performant
implementation. Exiting to user space would create an additional
synchronisation burden and make the resulting code more complex.
Additionally, since the suspension is specific to HyperV events, it
wouldn't provide any functional benefits.
The TLFS specifies that the instruction pointer is not moved during the
suspension, so upon unsuspending the hyper-calls is re-executed. This
means that, if the vCPU encounters another inhibited TLB and is
resuspended, any pending events and interrupts are still executed. This is
identical to the vCPU receiving such events right before the hyper-call.
This inhibiting of TLB flushes is necessary, to securely implement
intercepts. These allow a higher "Virtual Trust Level" (VTL) to react to
a lower VTL accessing restricted memory. In such an intercept the VTL may
want to emulate a memory access in software, however, if another processor
flushes the TLB during that operation, incorrect behaviour can result.
The patch series includes basic testing of the ioctl and suspension state.
All previously passing KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2: Implement the suspension state
- 3: Update TLB flush hyper-call in preparation
- 4-5: Implement the ioctl
- 6: Add traces
- 7: Implement testing
As the suspension state is transparent to the hypervisor, testing is
complicated. The current version makes use of a set time intervall to give
the vCPU time to enter the hyper-call and get suspended. Ideas for
improvement on this are very welcome.
This series, alongside my series [1] implementing KVM_TRANSLATE2, the
series by Nicolas Saenz Julienne [2] implementing the core building blocks
for VSM and the accompanying QEMU implementation [3], is capable of
booting Windows Server 2019 with VSM/CredentialGuard enabled.
All three series are also available on GitHub [4].
[1] https://lore.kernel.org/linux-kernel/20240910152207.38974-1-nikwip@amazon.d…
[2] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c…
[3] https://github.com/vianpl/qemu/tree/vsm/next
[4] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (7):
KVM: Add API documentation for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement Hyper-V's vCPU suspended state
KVM: x86: Check vCPUs before enqueuing TLB flushes in
kvm_hv_flush_tlb()
KVM: Introduce KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Add trace events to track Hyper-V suspensions
KVM: selftests: Add tests for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
Documentation/virt/kvm/api.rst | 41 +++
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/kvm/hyperv.c | 86 +++++-
arch/x86/kvm/hyperv.h | 17 ++
arch/x86/kvm/trace.h | 39 +++
arch/x86/kvm/x86.c | 41 ++-
include/uapi/linux/kvm.h | 15 +
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/x86_64/hyperv_tlb_flush_inhibit.c | 274 ++++++++++++++++++
9 files changed, 503 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush_inhibit.c
--
2.40.1
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.
- Patch 1: Explicitly reject such requests. A fix for >= v5.12.
- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- This new version fixes the root cause for the issue Cong Wang sent a
patch for a few weeks ago, see the v1, and the explanations below. The
new version is very different from the v1, from a different author.
Thanks to Cong Wang for the first analysis, and to Paolo for having
spot the root cause, and sent a fix for it.
- Link to v1: https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangcong@gmail.com
- Link: https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com
---
Paolo Abeni (2):
mptcp: prevent MPC handshake on port-based signal endpoints
selftests: mptcp: join: test for prohibited MPC to port-based endp
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 11 +++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +++++++++++++++++-------
6 files changed, 101 insertions(+), 31 deletions(-)
---
base-commit: 174714f0e505070a16be6fbede30d32b81df789f
change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
DAMON debugfs interface was the only user interface of DAMON at the
beginning[1]. However, it turned out the interface would be not good
enough for long-term flexibility and stability.
In Feb 2022[2], we therefore introduced DAMON sysfs interface as an
alternative user interface that aims long-term flexibility and
stability. With its introduction, DAMON debugfs interface has announced
to be deprecated in near future.
In Feb 2023[3], we announced the official deprecation of DAMON debugfs
interface. In Jan 2024[4], we further made the deprecation difficult to
be ignored.
And as of this writing (2024-10-14), no problem or concerns about the
deprecation have reported. Apparently users are already moved to the
alternative, or made good plans for the change.
Remove the DAMON debugfs interface code from the tree. Given the past
timeline and the absence of reported problems or concerns, it is safe
enough to be done. That said, we will not drop the RFC tag of this
patch series at least until the end of this year, to use this as the
real last call for users.
[1] https://lore.kernel.org/20210716081449.22187-1-sj38.park@gmail.com
[2] https://lore.kernel.org/20220228081314.5770-1-sj@kernel.org
[3] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org
[4] https://lore.kernel.org/20240130013549.89538-1-sj@kernel.org
SeongJae Park (7):
Docs/admin-guide/mm/damon/usage: remove DAMON debugfs interface
documentation
Docs/mm/damon/design: update for removal of DAMON debugfs interface
selftests/damon/config: remove configs for DAMON debugfs interface
selftests
selftests/damon: remove tests for DAMON debugfs interface
kunit: configs: remove configs for DAMON debugfs interface tests
mm/damon: remove DAMON debugfs interface kunit tests
mm/damon: remove DAMON debugfs interface
Documentation/admin-guide/mm/damon/usage.rst | 309 -----
Documentation/mm/damon/design.rst | 23 +-
mm/damon/Kconfig | 30 -
mm/damon/Makefile | 1 -
mm/damon/dbgfs.c | 1148 -----------------
mm/damon/tests/.kunitconfig | 7 -
mm/damon/tests/dbgfs-kunit.h | 173 ---
tools/testing/kunit/configs/all_tests.config | 3 -
tools/testing/selftests/damon/.gitignore | 3 -
tools/testing/selftests/damon/Makefile | 11 +-
tools/testing/selftests/damon/config | 1 -
.../testing/selftests/damon/debugfs_attrs.sh | 17 -
.../debugfs_duplicate_context_creation.sh | 27 -
.../selftests/damon/debugfs_empty_targets.sh | 21 -
.../damon/debugfs_huge_count_read_write.sh | 22 -
.../damon/debugfs_rm_non_contexts.sh | 19 -
.../selftests/damon/debugfs_schemes.sh | 19 -
.../selftests/damon/debugfs_target_ids.sh | 19 -
.../damon/debugfs_target_ids_pid_leak.c | 68 -
.../damon/debugfs_target_ids_pid_leak.sh | 22 -
...fs_target_ids_read_before_terminate_race.c | 80 --
...s_target_ids_read_before_terminate_race.sh | 14 -
.../selftests/damon/huge_count_read_write.c | 48 -
23 files changed, 11 insertions(+), 2074 deletions(-)
delete mode 100644 mm/damon/dbgfs.c
delete mode 100644 mm/damon/tests/dbgfs-kunit.h
delete mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_duplicate_context_creation.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_empty_targets.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_rm_non_contexts.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_schemes.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.sh
delete mode 100644 tools/testing/selftests/damon/huge_count_read_write.c
base-commit: 5ef943709a1b88304aa6e8cb8683a25bf81874f0
--
2.39.5
PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.
This scenario was identified while studying DPDK pmd_af_packet_drv.
Since sockets are only created during initialization, there is no reason
to fail the initialization if a single link is temporarily down.
This patch allows PACKET socket to join a fanout while not RUNNING.
Selftest psock_fanout is extended to test this "fanout while link down"
scenario.
Selftest psock_fanout is also extended to test fanout create/join by
socket that did not bind or specified a protocol, which carries an
implicit bind.
This is the only test that was performed.
Changes:
V04:
* Minimized code change.
* Removed test of ifindex. A socket that went through bind "unlisted" race can
join a fanout.
V03: https://lore.kernel.org/netdev/cover.1728555449.git.gur.stavi@huawei.com
* psock_fanout: add test for joining fanout with unbound socket.
* Test that socket can receive packets before adding it to a fanout match.
This is kind of replaces the RUNNING test that was removed.
* Initialize po->ifindex in packet_create. To -1 if no protocol is specified
and add an explicit initialization to 0 if protocol is specified.
* Refactor relevant code in fanout_add within bind_lock, as a sequence of
if {} else if {}, in order to reduce indentation of nested if statements and
provide specific error codes.
V02: https://lore.kernel.org/netdev/cover.1728382839.git.gur.stavi@huawei.com
* psock_fanout: use explicit loopback up/down instead of toggle.
* psock_fanout: don't try to restore loopback state on failure.
* Rephrase commit message about "leaving a fanout".
V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/
Gur Stavi (3):
af_packet: allow fanout_add when socket is not RUNNING
selftests: net/psock_fanout: socket joins fanout when link is down
selftests: net/psock_fanout: unbound socket fanout
net/packet/af_packet.c | 9 +--
tools/testing/selftests/net/psock_fanout.c | 78 +++++++++++++++++++++-
2 files changed, 80 insertions(+), 7 deletions(-)
base-commit: c531f2269a53db5cf64b24baf785ccbcda52970f
--
2.45.2
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 240 ++++++++++++++++++
drivers/connector/cn_proc.c | 58 ++++-
drivers/connector/connector.c | 96 ++++++-
include/linux/connector.h | 47 ++++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 4 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 ++++++++++++
lib/cn_hash_test.h | 12 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 5 +
tools/testing/selftests/connector/thread.c | 90 +++++++
.../selftests/connector/thread_filter.c | 93 +++++++
15 files changed, 847 insertions(+), 10 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
This splits the preparation works of the iommu and the Intel iommu driver
out from the iommufd pasid attach/replace series. [1]
To support domain replacement, the definition of the set_dev_pasid op
needs to be enhanced. Meanwhile, the existing set_dev_pasid callbacks
should be extended as well to suit the new definition.
This series first prepares the Intel iommu set_dev_pasid op for the new
definition, adds the missing set_dev_pasid support for nested domain, makes
ARM SMMUv3 set_dev_pasid op to suit the new definition, and in the end
enhances the definition of set_dev_pasid op. The AMD set_dev_pasid callback
is extended to fail if the caller tries to do domain replacement to meet the
new definition of set_dev_pasid op. AMD iommu driver would support it later
per Vasant [2].
[1] https://lore.kernel.org/linux-iommu/20240412081516.31168-1-yi.l.liu@intel.c…
[2] https://lore.kernel.org/linux-iommu/fa9c4fc3-9365-465e-8926-b4d2d6361b9c@am…
v2:
- Make ARM SMMUv3 set_dev_pasid op support domain replacement (Jason)
- Drop patch 03 of v1 (Kevin)
- Multiple tweaks in VT-d driver (Kevin)
v1: https://lore.kernel.org/linux-iommu/20240628085538.47049-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Jason Gunthorpe (1):
iommu/arm-smmu-v3: Make smmuv3 set_dev_pasid() op support replace
Lu Baolu (1):
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (4):
iommu: Pass old domain to set_dev_pasid op
iommu/vt-d: Move intel_drain_pasid_prq() into
intel_pasid_tear_down_entry()
iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain
replacement
iommu: Make set_dev_pasid op support domain replacement
drivers/iommu/amd/amd_iommu.h | 3 +-
drivers/iommu/amd/pasid.c | 6 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
drivers/iommu/intel/iommu.c | 122 ++++++++++++------
drivers/iommu/intel/iommu.h | 3 +
drivers/iommu/intel/nested.c | 1 +
drivers/iommu/intel/pasid.c | 13 +-
drivers/iommu/intel/pasid.h | 8 +-
drivers/iommu/intel/svm.c | 6 +-
drivers/iommu/iommu.c | 3 +-
include/linux/iommu.h | 5 +-
13 files changed, 129 insertions(+), 56 deletions(-)
--
2.34.1