The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature will be available in the
next version of "AMD64 Architecture Programmer’s Manual".
Testing Done:
Added a selftest to test the Idle HLT intercept functionality.
Tested SEV and SEV-ES guest for the Idle HLT intercept functionality.
Manali Shukla (5):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
tools: Add KVM exit reason for the Idle HLT
selftests: Add an interface to read the data of named vcpu stat
selftests: KVM: SVM: Add Idle HLT intercept test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 11 +-
tools/arch/x86/include/uapi/asm/svm.h | 2 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 11 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 41 ++++++
.../selftests/kvm/x86_64/svm_idlehlt_test.c | 119 ++++++++++++++++++
9 files changed, 186 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/svm_idlehlt_test.c
base-commit: fdd58834d132046149699b88a27a0db26829f4fb
--
2.34.1
Hi,
While running kselftest on vanilla torvalds tree kernel commit v6.8-11167-g4438a810f396,
the test suite reported a number of errors.
I was using the latest iproute2-next suite on an Ubuntu 22.04 LTS box.
# Tests passed: 558
# Tests failed: 84
not ok 90 selftests: net: test_vxlan_mdb.sh # exit=1
495:# TEST: Destination IP - match [FAIL]
496:# TEST: Destination IP - no match [FAIL]
497:# TEST: Default destination port - match [FAIL]
498:# TEST: Default destination port - no match [FAIL]
499:# TEST: Non-default destination port - match [FAIL]
500:# TEST: Non-default destination port - no match [FAIL]
501:# TEST: Default destination VNI - match [FAIL]
502:# TEST: Default destination VNI - no match [FAIL]
503:# TEST: Non-default destination VNI - match [FAIL]
504:# TEST: Non-default destination VNI - no match [FAIL]
521:# TEST: Destination IP - match [FAIL]
522:# TEST: Destination IP - no match [FAIL]
523:# TEST: Default destination port - match [FAIL]
524:# TEST: Default destination port - no match [FAIL]
525:# TEST: Non-default destination port - match [FAIL]
526:# TEST: Non-default destination port - no match [FAIL]
527:# TEST: Default destination VNI - match [FAIL]
528:# TEST: Default destination VNI - no match [FAIL]
529:# TEST: Non-default destination VNI - match [FAIL]
530:# TEST: Non-default destination VNI - no match [FAIL]
549:# TEST: Forward valid source - first VTEP [FAIL]
550:# TEST: Forward valid source - second VTEP [FAIL]
551:# TEST: Block excluded source after removal - first VTEP [FAIL]
552:# TEST: Block excluded source after removal - second VTEP [FAIL]
553:# TEST: Forward valid source after removal - first VTEP [FAIL]
554:# TEST: Forward valid source after removal - second VTEP [FAIL]
571:# TEST: Forward valid source - first VTEP [FAIL]
572:# TEST: Forward valid source - second VTEP [FAIL]
573:# TEST: Block excluded source after removal - first VTEP [FAIL]
574:# TEST: Block excluded source after removal - second VTEP [FAIL]
575:# TEST: Forward valid source after removal - first VTEP [FAIL]
576:# TEST: Forward valid source after removal - second VTEP [FAIL]
593:# TEST: Forward valid source - first VTEP [FAIL]
594:# TEST: Forward valid source - second VTEP [FAIL]
595:# TEST: Block excluded source after removal - first VTEP [FAIL]
596:# TEST: Block excluded source after removal - second VTEP [FAIL]
597:# TEST: Forward valid source after removal - first VTEP [FAIL]
598:# TEST: Forward valid source after removal - second VTEP [FAIL]
615:# TEST: Forward valid source - first VTEP [FAIL]
616:# TEST: Forward valid source - second VTEP [FAIL]
617:# TEST: Block excluded source after removal - first VTEP [FAIL]
618:# TEST: Block excluded source after removal - second VTEP [FAIL]
619:# TEST: Forward valid source after removal - first VTEP [FAIL]
620:# TEST: Forward valid source after removal - second VTEP [FAIL]
636:# TEST: Forward valid source [FAIL]
637:# TEST: Receive of valid source after removal from group [FAIL]
648:# TEST: Forward valid source [FAIL]
649:# TEST: Receive of valid source after removal from group [FAIL]
660:# TEST: Forward valid source [FAIL]
661:# TEST: Receive of valid source after removal from group [FAIL]
672:# TEST: Forward valid source [FAIL]
673:# TEST: Receive of valid source after removal from group [FAIL]
683:# TEST: Egress VNI translation - PVID configured [FAIL]
684:# TEST: Egress VNI translation - no PVID configured [FAIL]
685:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
695:# TEST: Egress VNI translation - PVID configured [FAIL]
696:# TEST: Egress VNI translation - no PVID configured [FAIL]
697:# TEST: Egress VNI translation - PVID reconfigured [FAIL]
707:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
709:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
710:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
711:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
712:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
713:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
714:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
715:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
716:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
734:# TEST: Registered IPv4 multicast - first VTEP [FAIL]
736:# TEST: Unregistered IPv4 multicast - first VTEP [FAIL]
737:# TEST: Unregistered IPv4 multicast - second VTEP [FAIL]
738:# TEST: Link-local IPv4 multicast - first VTEP [FAIL]
739:# TEST: Link-local IPv4 multicast - second VTEP [FAIL]
740:# TEST: Registered IPv4 multicast with a unicast MAC - first VTEP [FAIL]
741:# TEST: Registered IPv4 multicast with a unicast MAC - second VTEP [FAIL]
742:# TEST: Registered IPv4 multicast with a broadcast MAC - first VTEP [FAIL]
743:# TEST: Registered IPv4 multicast with a broadcast MAC - second VTEP [FAIL]
761:# TEST: IP multicast - first VTEP [FAIL]
763:# TEST: Broadcast - first VTEP [FAIL]
765:# TEST: IP multicast after removal - first VTEP [FAIL]
766:# TEST: IP multicast after removal - second VTEP [FAIL]
779:# TEST: IP multicast - first VTEP [FAIL]
781:# TEST: Broadcast - first VTEP [FAIL]
783:# TEST: IP multicast after removal - first VTEP [FAIL]
784:# TEST: IP multicast after removal - second VTEP [FAIL]
The problem is present at least since 6.8-rc7.
Please find attached the config and the full output of test_vxlan_mdb.sh.
Hope this helps.
Best regards,
Mirsad Todorovac
New version of the sleepable bpf_timer code, without the HID changes, as
they can now go through the HID tree independantly.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.
bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.
Cheers,
Benjamin
To: Alexei Starovoitov <ast(a)kernel.org>
To: Daniel Borkmann <daniel(a)iogearbox.net>
To: Andrii Nakryiko <andrii(a)kernel.org>
To: Martin KaFai Lau <martin.lau(a)linux.dev>
To: Eduard Zingerman <eddyz87(a)gmail.com>
To: Song Liu <song(a)kernel.org>
To: Yonghong Song <yonghong.song(a)linux.dev>
To: John Fastabend <john.fastabend(a)gmail.com>
To: KP Singh <kpsingh(a)kernel.org>
To: Stanislav Fomichev <sdf(a)google.com>
To: Hao Luo <haoluo(a)google.com>
To: Jiri Olsa <jolsa(a)kernel.org>
To: Mykola Lysenko <mykolal(a)fb.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: Benjamin Tissoires <bentiss(a)kernel.org>
Cc: <bpf(a)vger.kernel.org>
Cc: <linux-kernel(a)vger.kernel.org>
Cc: <linux-kselftest(a)vger.kernel.org>
---
Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-0-1fb378ca6301@kern…
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/20240214-hid-bpf-sleepable-v2-0-5756b054724d@kern…
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/20240209-hid-bpf-sleepable-v1-0-4cc895b5adbd@kern…
---
Benjamin Tissoires (6):
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 132 ++++++++++++++++++++-
kernel/bpf/verifier.c | 92 +++++++++++++-
tools/include/uapi/linux/bpf.h | 4 +
tools/testing/selftests/bpf/bpf_experimental.h | 4 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 1 +
tools/testing/selftests/bpf/progs/timer.c | 40 ++++++-
tools/testing/selftests/bpf/progs/timer_failure.c | 114 +++++++++++++++++-
11 files changed, 387 insertions(+), 11 deletions(-)
---
base-commit: 9187210eee7d87eea37b45ea93454a88681894a4
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
PASID (Process Address Space ID) is a PCIe extension to tag the DMA
transactions out of a physical device, and most modern IOMMU hardware
have supported PASID granular address translation. So a PASID-capable
device can be attached to multiple hwpts (a.k.a. domains), each attachment
is tagged with a pasid.
This series first adds a missing iommu API to replace domain for a pasid,
then adds iommufd APIs for device drivers to attach/replace/detach pasid
to/from hwpt per userspace's request, and adds selftest to validate the
iommufd APIs.
pasid attach/replace is mandatory on Intel VT-d given the PASID table
locates in the physical address space hence must be managed by the kernel,
both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD
which allow configuring the PASID/CD table either in host physical address
space or nested on top of an GPA address space. This series only add VT-d
support as the minimal requirement.
Complete code can be found in below link:
https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
Change log:
v1:
- Implemnet iommu_replace_device_pasid() to fall back to the original domain
if this replacement failed (Kevin)
- Add check in do_attach() to check corressponding attach_fn per the pasid value.
rfc: https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Kevin Tian (1):
iommufd: Support attach/replace hwpt per pasid
Lu Baolu (2):
iommu: Introduce a replace API for device pasid
iommu/vt-d: Add set_dev_pasid callback for nested domain
Yi Liu (5):
iommufd: replace attach_fn with a structure
iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add coverage for iommufd pasid attach/detach
drivers/iommu/intel/nested.c | 47 +++++
drivers/iommu/iommu-priv.h | 2 +
drivers/iommu/iommu.c | 82 ++++++--
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/device.c | 50 +++--
drivers/iommu/iommufd/iommufd_private.h | 23 +++
drivers/iommu/iommufd/iommufd_test.h | 24 +++
drivers/iommu/iommufd/pasid.c | 138 ++++++++++++++
drivers/iommu/iommufd/selftest.c | 176 ++++++++++++++++--
include/linux/iommufd.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 172 +++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 28 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++
13 files changed, 785 insertions(+), 42 deletions(-)
create mode 100644 drivers/iommu/iommufd/pasid.c
--
2.34.1
This patch enhances the BPF helpers by adding a kfunc to retrieve the
cgroup v2 of a task, addressing a previous limitation where only
bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
particularly useful for scenarios where obtaining the cgroup ID of a
task other than the "current" one is necessary, which the existing
bpf_get_current_cgroup_id helper cannot accommodate. A specific use
case at Netflix involved the sched_switch tracepoint, where we had to
get the cgroup IDs of both the prev and next tasks.
The bpf_task_get_cgroup kfunc acquires and returns a reference to a
task's default cgroup, ensuring thread-safe access by correctly
implementing RCU read locking and unlocking. It leverages the existing
cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.pizza>
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Acked-by: Stanislav Fomichev <sdf(a)google.com>
---
V2 -> V3: No changes
V1 -> V2: Return a pointer to the cgroup instead of the cgroup ID
kernel/bpf/helpers.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a89587859571..bbd19d5eedb6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2266,6 +2266,31 @@ bpf_task_get_cgroup1(struct task_struct *task, int hierarchy_id)
return NULL;
return cgrp;
}
+
+/**
+ * bpf_task_get_cgroup - Acquire a reference to the default cgroup of a task.
+ * @task: The target task
+ *
+ * This function returns the task's default cgroup, primarily
+ * designed for use with cgroup v2. In cgroup v1, the concept of default
+ * cgroup varies by subsystem, and while this function will work with
+ * cgroup v1, it's recommended to use bpf_task_get_cgroup1 instead.
+ * A cgroup returned by this kfunc which is not subsequently stored in a
+ * map, must be released by calling bpf_cgroup_release().
+ *
+ * Return: On success, the cgroup is returned. On failure, NULL is returned.
+ */
+__bpf_kfunc struct cgroup *bpf_task_get_cgroup(struct task_struct *task)
+{
+ struct cgroup *cgrp;
+
+ rcu_read_lock();
+ cgrp = task_dfl_cgroup(task);
+ if (!cgroup_tryget(cgrp))
+ cgrp = NULL;
+ rcu_read_unlock();
+ return cgrp;
+}
#endif /* CONFIG_CGROUPS */
/**
@@ -2573,6 +2598,7 @@ BTF_ID_FLAGS(func, bpf_cgroup_ancestor, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_cgroup_from_id, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_task_under_cgroup, KF_RCU)
BTF_ID_FLAGS(func, bpf_task_get_cgroup1, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_get_cgroup, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
#endif
BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_throw)
base-commit: c733239f8f530872a1f80d8c45dcafbaff368737
--
2.40.1
On riscv, mmap currently returns an address from the largest address
space that can fit entirely inside of the hint address. This makes it
such that the hint address is almost never returned. This patch raises
the mappable area up to and including the hint address. This allows mmap
to often return the hint address, which allows a performance improvement
over searching for a valid address as well as making the behavior more
similar to other architectures.
Note that a previous patch introduced stronger semantics compared to
other architectures for riscv mmap. On riscv, mmap will not use bits in
the upper bits of the virtual address depending on the hint address. On
other architectures, a random address is returned in the address space
requested. On all architectures the hint address will be returned if it
is available. This allows riscv applications to configure how many bits
in the virtual address should be left empty. This has the two benefits
of being able to request address spaces that are smaller than the
default and doesn't require the application to know the page table
layout of riscv.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Add back forgotten "mmap_end = STACK_TOP_MAX"
- Link to v1: https://lore.kernel.org/r/20240129-use_mmap_hint_address-v1-0-4c74da813ba1@…
---
Charlie Jenkins (3):
riscv: mm: Use hint address in mmap if available
selftests: riscv: Generalize mm selftests
docs: riscv: Define behavior of mmap
Documentation/arch/riscv/vm-layout.rst | 16 ++--
arch/riscv/include/asm/processor.h | 22 +++---
tools/testing/selftests/riscv/mm/mmap_bottomup.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_default.c | 20 +----
tools/testing/selftests/riscv/mm/mmap_test.h | 93 +++++++++++++-----------
5 files changed, 67 insertions(+), 104 deletions(-)
---
base-commit: 556e2d17cae620d549c5474b1ece053430cd50bc
change-id: 20240119-use_mmap_hint_address-f9f4b1b6f5f1
--
- Charlie