From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When TCP over IPv4 via INET6 API, sk->sk_family is AF_INET6, but it is a v4 pkt.
inet_csk(sk)->icsk_af_ops is ipv6_mapped and use ip_queue_xmit. Some sockopt did
not take effect, such as tos.
0001: Use sk_is_inet helper to fix it.
0002: Setget_sockopt add a test for tcp over ipv4 via ipv6.
Changelog:
v2->v3: Addressed comments from Eric Dumazet
- Use sk_is_inet() helper
Details in here:
https://lore.kernel.org/bpf/CANn89i+9GmBLCdgsfH=WWe-tyFYpiO27wONyxaxiU6aOBC…
v1->v2: Addressed comments from kernel test robot
- Fix compilation error
Details in here:
https://lore.kernel.org/bpf/202408152058.YXAnhLgZ-lkp@intel.com/T/
Feng Zhou (2):
bpf: Fix bpf_get/setsockopt to tos not take effect when TCP over IPv4
via INET6 API
selftests/bpf: Setget_sockopt add a test for tcp over ipv4 via ipv6
net/core/filter.c | 7 +++-
.../selftests/bpf/prog_tests/setget_sockopt.c | 33 +++++++++++++++++++
.../selftests/bpf/progs/setget_sockopt.c | 13 ++++++--
3 files changed, 49 insertions(+), 4 deletions(-)
--
2.30.2
From: Jeff Xu <jeffxu(a)chromium.org>
Pedro Falcato's optimization [1] for checking sealed VMAs, which replaces
the can_modify_mm() function with an in-loop check, necessitates an update
to the mseal.rst documentation to reflect this change.
Furthermore, the document has received offline comments regarding the code
sample and suggestions for sentence clarification to enhance reader
comprehension.
[1] https://lore.kernel.org/linux-mm/20240817-mseal-depessimize-v3-0-d8d2e037df…
History:
V3: update according to Randy Dunlap's comment
V2: update according to Randy Dunlap's comments.
https://lore.kernel.org/all/20241001002628.2239032-1-jeffxu@chromium.org/
V1: initial version
https://lore.kernel.org/all/20240927185211.729207-1-jeffxu@chromium.org/
Jeff Xu (1):
mseal: update mseal.rst
Documentation/userspace-api/mseal.rst | 307 +++++++++++++-------------
1 file changed, 148 insertions(+), 159 deletions(-)
--
2.47.0.rc0.187.ge670bccf7e-goog
This series introduces a new ioctl KVM_HYPERV_SET_TLB_FLUSH_INHIBIT. It
allows hypervisors to inhibit remote TLB flushing of a vCPU coming from
Hyper-V hyper-calls (namely HvFlushVirtualAddressSpace(Ex) and
HvFlushirtualAddressList(Ex)). It is required to implement the
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate VSM within KVM and QEMU. The hyper-call requires several new KVM
APIs, one of which is KVM_HYPERV_SET_TLB_FLUSH_INHIBIT.
Once the inhibit flag is set, any processor attempting to flush the TLB on
the marked vCPU, with a HyperV hyper-call, will be suspended until the
flag is cleared again. During the suspension the vCPU will not run at all,
neither receiving events nor running other code. It will wake up from
suspension once the vCPU it is waiting on clears the inhibit flag. This
behaviour is specified in Microsoft's "Hypervisor Top Level Functional
Specification" (TLFS).
The vCPU will block execution during the suspension, making it transparent
to the hypervisor. An alternative design to what is proposed here would be
to exit from the Hyper-V hypercall upon finding an inhibited vCPU. We
decided against it, to allow for a simpler and more performant
implementation. Exiting to user space would create an additional
synchronisation burden and make the resulting code more complex.
Additionally, since the suspension is specific to HyperV events, it
wouldn't provide any functional benefits.
The TLFS specifies that the instruction pointer is not moved during the
suspension, so upon unsuspending the hyper-calls is re-executed. This
means that, if the vCPU encounters another inhibited TLB and is
resuspended, any pending events and interrupts are still executed. This is
identical to the vCPU receiving such events right before the hyper-call.
This inhibiting of TLB flushes is necessary, to securely implement
intercepts. These allow a higher "Virtual Trust Level" (VTL) to react to
a lower VTL accessing restricted memory. In such an intercept the VTL may
want to emulate a memory access in software, however, if another processor
flushes the TLB during that operation, incorrect behaviour can result.
The patch series includes basic testing of the ioctl and suspension state.
All previously passing KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2: Implement the suspension state
- 3: Update TLB flush hyper-call in preparation
- 4-5: Implement the ioctl
- 6: Add traces
- 7: Implement testing
As the suspension state is transparent to the hypervisor, testing is
complicated. The current version makes use of a set time intervall to give
the vCPU time to enter the hyper-call and get suspended. Ideas for
improvement on this are very welcome.
This series, alongside my series [1] implementing KVM_TRANSLATE2, the
series by Nicolas Saenz Julienne [2] implementing the core building blocks
for VSM and the accompanying QEMU implementation [3], is capable of
booting Windows Server 2019 with VSM/CredentialGuard enabled.
All three series are also available on GitHub [4].
[1] https://lore.kernel.org/linux-kernel/20240910152207.38974-1-nikwip@amazon.d…
[2] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c…
[3] https://github.com/vianpl/qemu/tree/vsm/next
[4] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (7):
KVM: Add API documentation for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement Hyper-V's vCPU suspended state
KVM: x86: Check vCPUs before enqueuing TLB flushes in
kvm_hv_flush_tlb()
KVM: Introduce KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Implement KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
KVM: x86: Add trace events to track Hyper-V suspensions
KVM: selftests: Add tests for KVM_HYPERV_SET_TLB_FLUSH_INHIBIT
Documentation/virt/kvm/api.rst | 41 +++
arch/x86/include/asm/kvm_host.h | 5 +
arch/x86/kvm/hyperv.c | 86 +++++-
arch/x86/kvm/hyperv.h | 17 ++
arch/x86/kvm/trace.h | 39 +++
arch/x86/kvm/x86.c | 41 ++-
include/uapi/linux/kvm.h | 15 +
tools/testing/selftests/kvm/Makefile | 1 +
.../kvm/x86_64/hyperv_tlb_flush_inhibit.c | 274 ++++++++++++++++++
9 files changed, 503 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush_inhibit.c
--
2.40.1
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
MPTCP connection requests toward a listening socket created by the
in-kernel PM for a port based signal endpoint will never be accepted,
they need to be explicitly rejected.
- Patch 1: Explicitly reject such requests. A fix for >= v5.12.
- Patch 2: Cover this case in the MPTCP selftests to avoid regressions.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- This new version fixes the root cause for the issue Cong Wang sent a
patch for a few weeks ago, see the v1, and the explanations below. The
new version is very different from the v1, from a different author.
Thanks to Cong Wang for the first analysis, and to Paolo for having
spot the root cause, and sent a fix for it.
- Link to v1: https://lore.kernel.org/r/20240908180620.822579-1-xiyou.wangcong@gmail.com
- Link: https://lore.kernel.org/r/a5289a0d-2557-40b8-9575-6f1a0bbf06e4@redhat.com
---
Paolo Abeni (2):
mptcp: prevent MPC handshake on port-based signal endpoints
selftests: mptcp: join: test for prohibited MPC to port-based endp
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 11 +++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 117 +++++++++++++++++-------
6 files changed, 101 insertions(+), 31 deletions(-)
---
base-commit: 174714f0e505070a16be6fbede30d32b81df789f
change-id: 20241014-net-mptcp-mpc-port-endp-4f88bd428ec7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
DAMON debugfs interface was the only user interface of DAMON at the
beginning[1]. However, it turned out the interface would be not good
enough for long-term flexibility and stability.
In Feb 2022[2], we therefore introduced DAMON sysfs interface as an
alternative user interface that aims long-term flexibility and
stability. With its introduction, DAMON debugfs interface has announced
to be deprecated in near future.
In Feb 2023[3], we announced the official deprecation of DAMON debugfs
interface. In Jan 2024[4], we further made the deprecation difficult to
be ignored.
And as of this writing (2024-10-14), no problem or concerns about the
deprecation have reported. Apparently users are already moved to the
alternative, or made good plans for the change.
Remove the DAMON debugfs interface code from the tree. Given the past
timeline and the absence of reported problems or concerns, it is safe
enough to be done. That said, we will not drop the RFC tag of this
patch series at least until the end of this year, to use this as the
real last call for users.
[1] https://lore.kernel.org/20210716081449.22187-1-sj38.park@gmail.com
[2] https://lore.kernel.org/20220228081314.5770-1-sj@kernel.org
[3] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org
[4] https://lore.kernel.org/20240130013549.89538-1-sj@kernel.org
SeongJae Park (7):
Docs/admin-guide/mm/damon/usage: remove DAMON debugfs interface
documentation
Docs/mm/damon/design: update for removal of DAMON debugfs interface
selftests/damon/config: remove configs for DAMON debugfs interface
selftests
selftests/damon: remove tests for DAMON debugfs interface
kunit: configs: remove configs for DAMON debugfs interface tests
mm/damon: remove DAMON debugfs interface kunit tests
mm/damon: remove DAMON debugfs interface
Documentation/admin-guide/mm/damon/usage.rst | 309 -----
Documentation/mm/damon/design.rst | 23 +-
mm/damon/Kconfig | 30 -
mm/damon/Makefile | 1 -
mm/damon/dbgfs.c | 1148 -----------------
mm/damon/tests/.kunitconfig | 7 -
mm/damon/tests/dbgfs-kunit.h | 173 ---
tools/testing/kunit/configs/all_tests.config | 3 -
tools/testing/selftests/damon/.gitignore | 3 -
tools/testing/selftests/damon/Makefile | 11 +-
tools/testing/selftests/damon/config | 1 -
.../testing/selftests/damon/debugfs_attrs.sh | 17 -
.../debugfs_duplicate_context_creation.sh | 27 -
.../selftests/damon/debugfs_empty_targets.sh | 21 -
.../damon/debugfs_huge_count_read_write.sh | 22 -
.../damon/debugfs_rm_non_contexts.sh | 19 -
.../selftests/damon/debugfs_schemes.sh | 19 -
.../selftests/damon/debugfs_target_ids.sh | 19 -
.../damon/debugfs_target_ids_pid_leak.c | 68 -
.../damon/debugfs_target_ids_pid_leak.sh | 22 -
...fs_target_ids_read_before_terminate_race.c | 80 --
...s_target_ids_read_before_terminate_race.sh | 14 -
.../selftests/damon/huge_count_read_write.c | 48 -
23 files changed, 11 insertions(+), 2074 deletions(-)
delete mode 100644 mm/damon/dbgfs.c
delete mode 100644 mm/damon/tests/dbgfs-kunit.h
delete mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_duplicate_context_creation.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_empty_targets.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_huge_count_read_write.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_rm_non_contexts.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_schemes.sh
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_pid_leak.sh
delete mode 100644 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.c
delete mode 100755 tools/testing/selftests/damon/debugfs_target_ids_read_before_terminate_race.sh
delete mode 100644 tools/testing/selftests/damon/huge_count_read_write.c
base-commit: 5ef943709a1b88304aa6e8cb8683a25bf81874f0
--
2.39.5
PACKET socket can retain its fanout membership through link down and up
and leave a fanout while closed regardless of link state.
However, socket was forbidden from joining a fanout while it was not
RUNNING.
This scenario was identified while studying DPDK pmd_af_packet_drv.
Since sockets are only created during initialization, there is no reason
to fail the initialization if a single link is temporarily down.
This patch allows PACKET socket to join a fanout while not RUNNING.
Selftest psock_fanout is extended to test this "fanout while link down"
scenario.
Selftest psock_fanout is also extended to test fanout create/join by
socket that did not bind or specified a protocol, which carries an
implicit bind.
This is the only test that was performed.
Changes:
V04:
* Minimized code change.
* Removed test of ifindex. A socket that went through bind "unlisted" race can
join a fanout.
V03: https://lore.kernel.org/netdev/cover.1728555449.git.gur.stavi@huawei.com
* psock_fanout: add test for joining fanout with unbound socket.
* Test that socket can receive packets before adding it to a fanout match.
This is kind of replaces the RUNNING test that was removed.
* Initialize po->ifindex in packet_create. To -1 if no protocol is specified
and add an explicit initialization to 0 if protocol is specified.
* Refactor relevant code in fanout_add within bind_lock, as a sequence of
if {} else if {}, in order to reduce indentation of nested if statements and
provide specific error codes.
V02: https://lore.kernel.org/netdev/cover.1728382839.git.gur.stavi@huawei.com
* psock_fanout: use explicit loopback up/down instead of toggle.
* psock_fanout: don't try to restore loopback state on failure.
* Rephrase commit message about "leaving a fanout".
V01: https://lore.kernel.org/netdev/cover.1728303615.git.gur.stavi@huawei.com/
Gur Stavi (3):
af_packet: allow fanout_add when socket is not RUNNING
selftests: net/psock_fanout: socket joins fanout when link is down
selftests: net/psock_fanout: unbound socket fanout
net/packet/af_packet.c | 9 +--
tools/testing/selftests/net/psock_fanout.c | 78 +++++++++++++++++++++-
2 files changed, 80 insertions(+), 7 deletions(-)
base-commit: c531f2269a53db5cf64b24baf785ccbcda52970f
--
2.45.2