virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (10):
virtio_net: Add functions for hashing
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Unify vnet implementation
tun: Pad virtio header with zero
tun: Introduce virtio-net hash reporting feature
tun: Introduce virtio-net RSS
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
MAINTAINERS | 1 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 218 ++++--------
drivers/net/tun.c | 293 ++++++----------
drivers/net/tun_vnet.h | 342 +++++++++++++++++++
drivers/vhost/net.c | 16 +-
include/linux/if_tap.h | 2 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 +++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 75 +++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 630 ++++++++++++++++++++++++++++++++++-
16 files changed, 1430 insertions(+), 356 deletions(-)
---
base-commit: 752ebcbe87aceeb6334e846a466116197711a982
change-id: 20240403-rss-e737d89efa77
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This fixes several smaller issues I faced when compiling the arm64
kselftests on my machine.
Patch 1 avoids a warning about the double definition of GNU_SOURCE,
for the arm64/signal tests. Patch 2 fixes a typo, where the f8dp2 hwcap
feature test was looking at the f8dp*4* cpuinfo name. Patch 3 adjusts
the output of the MTE tests when MTE is not available, so that tools
parsing the TAP output don't get confused and report errors.
The remaining patches are about wrong printf format specifiers. I grouped
them by type of error, in patch 4-8.
Please have a look!
Cheers,
Andre
Andre Przywara (8):
kselftest/arm64: signal: drop now redundant GNU_SOURCE definition
kselftest/arm64: hwcap: fix f8dp2 cpuinfo name
kselftest/arm64: mte: use proper SKIP syntax
kselftest/arm64: mte: use string literal for printf-style functions
kselftest/arm64: mte: fix printf type warning about mask
kselftest/arm64: mte: fix printf type warnings about __u64
kselftest/arm64: mte: fix printf type warnings about pointers
kselftest/arm64: mte: fix printf type warnings about longs
tools/testing/selftests/arm64/abi/hwcap.c | 2 +-
.../selftests/arm64/mte/check_buffer_fill.c | 4 ++--
tools/testing/selftests/arm64/mte/check_prctl.c | 4 ++--
.../selftests/arm64/mte/check_tags_inclusion.c | 4 ++--
.../testing/selftests/arm64/mte/mte_common_util.c | 15 +++++++--------
.../testing/selftests/arm64/mte/mte_common_util.h | 6 +++---
tools/testing/selftests/arm64/signal/Makefile | 2 +-
7 files changed, 18 insertions(+), 19 deletions(-)
--
2.25.1
Currently, the situation when guest accesses MMIO during event delivery
is handled differently in VMX and SVM: on VMX KVM returns internal error
with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV, when SVM simply goes
into infinite loop trying to deliver an event again and again.
This patch series eliminates this difference by returning a KVM internal
error with suberror = KVM_INTERNAL_ERROR_DELIVERY_EV when guest is
performing MMIO during event delivery, for both VMX and SVM.
Also, it introduces a selftest test case which covers the MMIO during
event delivery error handling.
Ivan Orlov (3):
KVM: x86, vmx: Add function for event delivery error generation
KVM: vmx, svm, mmu: Process MMIO during event delivery
selftests: KVM: Add test case for MMIO during event delivery
arch/x86/include/asm/kvm_host.h | 8 ++++
arch/x86/kvm/mmu/mmu.c | 15 +++++-
arch/x86/kvm/svm/svm.c | 4 ++
arch/x86/kvm/vmx/vmx.c | 32 ++++---------
arch/x86/kvm/x86.c | 22 +++++++++
.../selftests/kvm/set_memory_region_test.c | 46 +++++++++++++++++++
6 files changed, 104 insertions(+), 23 deletions(-)
--
2.43.0
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications futher
based on instances, which can be identified by task names. Hence, added a
comm field to the packet's struct proc_event, in which task->comm is
stored.
v3->v4 changes:
- Reduce size of exit.log by removing unnecessary text.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one.
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Made threads.c automated, by having an exit.log file created by
proc_filter.c, which threads.c checks to see if the values reported
for thread exits are correct. This was for a comment by Liam Howlett to
make the tests automated.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 ++++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 202 ++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 937 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications futher
based on instances, which can be identified by task names. Hence, added a
comm field to the packet's struct proc_event, in which task->comm is
stored.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one.
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Made threads.c automated, by having an exit.log file created by
proc_filter.c, which threads.c checks to see if the values reported
for thread exits are correct. This was for a comment by Liam Howlett to
make the tests automated.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 ++++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 202 ++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 937 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
Hi all,
This series implements the Permission Overlay Extension introduced in 2022
VMSA enhancements [1]. It is based on v6.11-rc4.
Changes since v4[2]:
- Added Acks and R-bs, thanks!
- KVM:
- Move POR_EL{0,1} handling inside TCR_EL2 blocks
- Add visibility functions for registers [4]
- Make ID_AA64MMFR3_EL1 writable
- use system_supports_poe() more consistently
- use BIT instead of hex constants
- fix off-by-one in arch_max_pkey() macro
- add PKEY_DISABLE_EXECUTE and PKEY_DISABLE_READ
- Update some comments and commit messages.
- No change to when we save/restore POR_EL0 for signals!
Conflicts with GCS:
- Uses the same (last) bit in HWCAP2
- Uses the same VM_HIGH_ARCH_5
Conflicts with arm64 KVM:
- Maz has taken patch 8 into one of his own series
- I have taken and modified a patch from Maz (patch 9)
The Permission Overlay Extension allows to constrain permissions on memory
regions. This can be used from userspace (EL0) without a system call or TLB
invalidation.
POE is used to implement the Memory Protection Keys [3] Linux syscall.
The first few patches add the basic framework, then the PKEYS interface is
implemented, and then the selftests are made to work on arm64.
I have tested the modified protection_keys test on x86_64, but not PPC.
I haven't build tested the x86/ppc arch changes.
Thanks,
Joey
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processor…
[2] https://lore.kernel.org/linux-arm-kernel/20240503130147.1154804-1-joey.goul…
[3] Documentation/core-api/protection-keys.rst
[4] https://lore.kernel.org/linux-arm-kernel/20240806-kvm-arm64-get-reg-list-v2…
Joey Gouly (30):
powerpc/mm: add ARCH_PKEY_BITS to Kconfig
x86/mm: add ARCH_PKEY_BITS to Kconfig
mm: use ARCH_PKEY_BITS to define VM_PKEY_BITN
arm64: disable trapping of POR_EL0 to EL2
arm64: cpufeature: add Permission Overlay Extension cpucap
arm64: context switch POR_EL0 register
KVM: arm64: Save/restore POE registers
KVM: arm64: make kvm_at() take an OP_AT_*
KVM: arm64: use `at s1e1a` for POE
KVM: arm64: Sanitise ID_AA64MMFR3_EL1
arm64: enable the Permission Overlay Extension for EL0
arm64: re-order MTE VM_ flags
arm64: add POIndex defines
arm64: convert protection key into vm_flags and pgprot values
arm64: mask out POIndex when modifying a PTE
arm64: handle PKEY/POE faults
arm64: add pte_access_permitted_no_overlay()
arm64: implement PKEYS support
arm64: add POE signal support
arm64/ptrace: add support for FEAT_POE
arm64: enable POE and PIE to coexist
arm64: enable PKEY support for CPUs with S1POE
arm64: add Permission Overlay Extension Kconfig
kselftest/arm64: move get_header()
selftests: mm: move fpregs printing
selftests: mm: make protection_keys test work on arm64
kselftest/arm64: add HWCAP test for FEAT_S1POE
kselftest/arm64: parse POE_MAGIC in a signal frame
kselftest/arm64: Add test case for POR_EL0 signal frame records
KVM: selftests: get-reg-list: add Permission Overlay registers
Documentation/arch/arm64/elf_hwcaps.rst | 2 +
arch/arm64/Kconfig | 23 +++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 10 +-
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 3 +-
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/mman.h | 10 +-
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/include/asm/mmu_context.h | 46 +++++-
arch/arm64/include/asm/pgtable-hwdef.h | 10 ++
arch/arm64/include/asm/pgtable-prot.h | 8 +-
arch/arm64/include/asm/pgtable.h | 34 ++++-
arch/arm64/include/asm/pkeys.h | 108 ++++++++++++++
arch/arm64/include/asm/por.h | 33 +++++
arch/arm64/include/asm/processor.h | 1 +
arch/arm64/include/asm/sysreg.h | 3 +
arch/arm64/include/asm/traps.h | 1 +
arch/arm64/include/asm/vncr_mapping.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/mman.h | 9 ++
arch/arm64/include/uapi/asm/sigcontext.h | 7 +
arch/arm64/kernel/cpufeature.c | 23 +++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/process.c | 28 ++++
arch/arm64/kernel/ptrace.c | 46 ++++++
arch/arm64/kernel/signal.c | 62 ++++++++
arch/arm64/kernel/traps.c | 6 +
arch/arm64/kvm/hyp/include/hyp/fault.h | 5 +-
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 27 ++++
arch/arm64/kvm/sys_regs.c | 25 +++-
arch/arm64/mm/fault.c | 55 ++++++-
arch/arm64/mm/mmap.c | 11 ++
arch/arm64/mm/mmu.c | 45 ++++++
arch/arm64/tools/cpucaps | 1 +
arch/powerpc/Kconfig | 4 +
arch/x86/Kconfig | 4 +
fs/proc/task_mmu.c | 2 +
include/linux/mm.h | 20 ++-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 14 ++
.../testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/poe_siginfo.c | 86 +++++++++++
.../arm64/signal/testcases/testcases.c | 27 +---
.../arm64/signal/testcases/testcases.h | 28 +++-
.../selftests/kvm/aarch64/get-reg-list.c | 14 ++
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/pkey-arm64.h | 139 ++++++++++++++++++
tools/testing/selftests/mm/pkey-helpers.h | 8 +
tools/testing/selftests/mm/pkey-powerpc.h | 3 +
tools/testing/selftests/mm/pkey-x86.h | 4 +
tools/testing/selftests/mm/protection_keys.c | 109 ++++++++++++--
52 files changed, 1060 insertions(+), 63 deletions(-)
create mode 100644 arch/arm64/include/asm/pkeys.h
create mode 100644 arch/arm64/include/asm/por.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/poe_siginfo.c
create mode 100644 tools/testing/selftests/mm/pkey-arm64.h
--
2.25.1