This patchset refactors non-composite global variables into a common
struct that can be initialized and passed around per-test instead of
relying on the presence of global variables.
This allows:
- Better encapsulation
- Debugging becomes easier -- local variable state can be viewed per
stack frame, and we can more easily reason about the variable
mutations
Patch 1 needs to be applied first and can be followed by any of the
other patches.
I've ensured that the tests are passing locally (or atleast have the
same output as the code on master).
Ujwal Kundur (4):
selftests/mm/uffd: Refactor non-composite global vars into struct
selftests/mm/uffd: Swap global vars with global test options
selftests/mm/uffd: Swap global variables with global test opts
selftests/mm/uffd: Swap global variables with global test opts
tools/testing/selftests/mm/uffd-common.c | 269 +++++-----
tools/testing/selftests/mm/uffd-common.h | 78 +--
tools/testing/selftests/mm/uffd-stress.c | 226 ++++----
tools/testing/selftests/mm/uffd-unit-tests.c | 523 ++++++++++---------
tools/testing/selftests/mm/uffd-wp-mremap.c | 23 +-
5 files changed, 591 insertions(+), 528 deletions(-)
--
2.20.1
The kunit test that checks the longests symbol length [1], has triggered
warnings in some pilelines when symbol prefixes are used [2][3]. The test
will to depend on !PREFIX_SYMBOLS and !CFI_CLANG as sujested in [4] and
on !GCOV_KERNEL.
[1] https://lore.kernel.org/rust-for-linux/CABVgOSm=5Q0fM6neBhxSbOUHBgNzmwf2V22…
[2] https://lore.kernel.org/all/20250328112156.2614513-1-arnd@kernel.org/T/#u
[3] https://lore.kernel.org/rust-for-linux/bbd03b37-c4d9-4a92-9be2-75aaf8c19815…
[4] https://lore.kernel.org/linux-kselftest/20250427200916.GA1661412@ax162/T/#t
Reviewed-by: Rae Moar <rmoar(a)google.com>
Signed-off-by: Sergio González Collado <sergio.collado(a)gmail.com>
---
v2 -> v3: added dependency on !GCOV_KERNEL (to avoid __gcov_ prefix)
---
v1 -> v2: added dependency on !CFI_CLANG as suggested in [3], removed
CONFIG_ prefix
---
lib/Kconfig.debug | 1 +
lib/tests/longest_symbol_kunit.c | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f9051ab610d5..e55c761eae20 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2886,6 +2886,7 @@ config FORTIFY_KUNIT_TEST
config LONGEST_SYM_KUNIT_TEST
tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS
depends on KUNIT && KPROBES
+ depends on !PREFIX_SYMBOLS && !CFI_CLANG && !GCOV_KERNEL
default KUNIT_ALL_TESTS
help
Tests the longest symbol possible
diff --git a/lib/tests/longest_symbol_kunit.c b/lib/tests/longest_symbol_kunit.c
index e3c28ff1807f..9b4de3050ba7 100644
--- a/lib/tests/longest_symbol_kunit.c
+++ b/lib/tests/longest_symbol_kunit.c
@@ -3,8 +3,7 @@
* Test the longest symbol length. Execute with:
* ./tools/testing/kunit/kunit.py run longest-symbol
* --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y
- * --kconfig_add CONFIG_RETPOLINE=n --kconfig_add CONFIG_CFI_CLANG=n
- * --kconfig_add CONFIG_MITIGATION_RETPOLINE=n
+ * --kconfig_add CONFIG_CPU_MITIGATIONS=n --kconfig_add CONFIG_GCOV_KERNEL=n
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
base-commit: 1a80a098c606b285fb0a13aa992af4f86da1ff06
--
2.39.2
The protocol device drivers under drivers/platform/chrome/ are responsible
to communicate to the ChromeOS EC (Embedded Controller). They need to pack
the data in a pre-defined format and check if the EC responds accordingly.
The series adds some fundamental unit tests for the protocol. It calls the
.cmd_xfer() and .pkt_xfer() callbacks (which are the most crucial parts for
the protocol), mocks the rest of the system, and checks if the interactions
are all correct.
The series isn't ready for landing. It's more like a PoC for the
binary-level function redirection and its use cases.
The 1st patch adds ftrace stub which is originally from [1][2]. There is no
follow-up discussion about the ftrace stub. As a result, the patch is still
on the mailing list.
The 2nd patch adds Kunit tests for cros_ec_i2c. It relies on the ftrace stub
for redirecting cros_ec_{un,}register().
The 3rd patch uses static stub instead (if ftrace stub isn't really an option).
However, I'm not a big fan to change the production code (i.e. adding the
prologue in cros_ec_{un,}register()) for testing.
The 4th patch adds Kunit tests for cros_ec_spi. It relies on the ftrace stub
for redirecting cros_ec_{un,}register() again.
The 5th patch calls .probe() directly instead of forcing the driver probe
needs to be synchronous. In comparison with the 4th patch, I don't think
this is simpler. I'd prefer to the way in the 4th patch.
After talked to Masami about the work, he suggested to use Kprobes for
function redirection. The 6th patch adds kprobes stub.
The 7th patch uses kprobes stub instead for cros_ec_spi.
Questions:
- Are we going to support ftrace stub so that tests can use it?
- If ftrace stub isn't on the plate (e.g. due to too many dependencies), how
about the kprobes stub? Is it something we could pursue?
- (minor) I'm unsure if people would prefer 'kprobes stub' vs. 'kprobe stub'.
[1]: https://kunit.dev/mocking.html#binary-level-ftrace-et-al
[2]: https://lore.kernel.org/linux-kselftest/20220318021314.3225240-3-davidgow@g…
Daniel Latypov (1):
kunit: expose ftrace-based API for stubbing out functions during tests
Tzung-Bi Shih (6):
platform/chrome: kunit: cros_ec_i2c: Add tests with ftrace stub
platform/chrome: kunit: cros_ec_i2c: Use static stub instead
platform/chrome: kunit: cros_ec_spi: Add tests with ftrace stub
platform/chrome: kunit: cros_ec_spi: Call .probe() directly
kunit: Expose 'kprobes stub' API to redirect functions
platform/chrome: kunit: cros_ec_spi: Use kprobes stub instead
drivers/platform/chrome/Kconfig | 17 +
drivers/platform/chrome/Makefile | 2 +
drivers/platform/chrome/cros_ec.c | 6 +
drivers/platform/chrome/cros_ec_i2c_test.c | 479 +++++++++++++++++++++
drivers/platform/chrome/cros_ec_spi_test.c | 355 +++++++++++++++
include/kunit/ftrace_stub.h | 84 ++++
include/kunit/kprobes_stub.h | 19 +
kernel/trace/ftrace.c | 3 +
lib/kunit/Kconfig | 18 +
lib/kunit/Makefile | 8 +
lib/kunit/ftrace_stub.c | 139 ++++++
lib/kunit/kprobes_stub.c | 113 +++++
lib/kunit/kunit-example-test.c | 27 +-
lib/kunit/stubs_example.kunitconfig | 11 +
14 files changed, 1280 insertions(+), 1 deletion(-)
create mode 100644 drivers/platform/chrome/cros_ec_i2c_test.c
create mode 100644 drivers/platform/chrome/cros_ec_spi_test.c
create mode 100644 include/kunit/ftrace_stub.h
create mode 100644 include/kunit/kprobes_stub.h
create mode 100644 lib/kunit/ftrace_stub.c
create mode 100644 lib/kunit/kprobes_stub.c
create mode 100644 lib/kunit/stubs_example.kunitconfig
--
2.49.0.1101.gccaa498523-goog
This series introduces VFIO selftests, located in
tools/testing/selftests/vfio/.
VFIO selftests aim to enable kernel developers to write and run tests
that take the form of userspace programs that interact with VFIO and
IOMMUFD uAPIs. VFIO selftests can be used to write functional tests for
new features, regression tests for bugs, and performance tests for
optimizations.
These tests are designed to interact with real PCI devices, i.e. they do
not rely on mocking out or faking any behavior in the kernel. This
allows the tests to exercise not only VFIO but also IOMMUFD, the IOMMU
driver, interrupt remapping, IRQ handling, etc.
We chose selftests to host these tests primarily to enable integration
with the existing KVM selftests. As explained in the next section,
enabling KVM developers to test the interaction between VFIO and KVM is
one of the motivators of this series.
Motivation
-----------------------------------------------------------------------
The main motivation for this series is upcoming development in the
kernel to support Hypervisor Live Updates [1][2]. Live Update is a
specialized reboot process where selected devices are kept operational
and their kernel state is preserved and recreated across a kexec. For
devices, DMA and interrupts may continue during the reboot. VFIO-bound
devices are the main target, since the first usecase of Live Updates is
to enable host kernel upgrades in a Cloud Computing environment without
disrupting running customer VMs.
To prepare for upcoming support for Live Updates in VFIO, IOMMUFD, IOMMU
drivers, the PCI layer, etc., we'd like to first lay the ground work for
exercising and testing VFIO from kernel selftests. This way when we
eventually upstream support for Live Updates, we can also upstream tests
for those changes, rather than purely relying on Live Update integration
tests which would be hard to share and reproduce upstream.
But even without Live Updates, VFIO and IOMMUFD are becoming an
increasingly critical component of running KVM-based VMs in cloud
environments. Virtualized networking and storage are increasingly being
offloaded to smart NICs/cards, and demand for high performance
networking, storage, and AI are also leading to NICs, SSDs, and GPUs
being directly attached to VMs via VFIO.
VFIO selftests increases our ability to test in several ways.
- It enables developers sending VFIO, IOMMUFD, etc. commits upstream to
test their changes against all existing VFIO selftests, reducing the
probability of regressions.
- It enables developers sending VFIO, IOMMUFD, etc. commits upstream to
include tests alongside their changes, increasing the quality of the
code that is merged.
- It enables testing the interaction between VFIO and KVM. There are
some paths in KVM that are only exercised through VFIO, such as IRQ
bypass. VFIO selftests provides a helper library to enable KVM
developers to write KVM selftests to test those interactions [3].
Design
-----------------------------------------------------------------------
VFIO selftests are designed around interacting with with VFIO-managed PCI
devices. As such, the core data struture is struct vfio_pci_device, which
represents a single PCI device.
struct vfio_pci_device *device;
device = vfio_pci_device_init("0000:6a:01.0", iommu_mode);
...
vfio_pci_device_cleanup(device);
vfio_pci_device_init() sets up a container or iommufd, depending on the
iommu_mode argument, to manage DMA mappings, fetches information about
the device and what interrupts it supports from VFIO and caches it, and
mmap()s all mappable BARs for the test to use.
There are helper methods that operate on struct vfio_pci_device to do
things like read and write to PCI config space, enable/disable IRQs, and
map memory for DMA,
struct vfio_pci_device and its methods do not care about what device
they are actually interacting with. It can be a GPU, a NIC, an SSD, etc.
To keep things simple initially, VFIO selftests only support a single
device per group and per container/iommufd. But it should be possible to
relax those restrictions in the future, e.g. to enable testing with
multiple devices in the same container/iommufd.
Driver Framework
-----------------------------------------------------------------------
In order to support VFIO selftests where a device is generating DMA and
interrupts on command, the VFIO selftests supports a driver framework.
This framework abstracts away device-specific details allowing VFIO
selftests to be written in a generic way, and then run against different
devices depending on what hardware developers have access to.
The framework also aims to support carrying drivers out-of-tree, e.g.
so that companies can run VFIO selftests with custom/test hardware.
Drivers must implement the following methods:
- probe(): Check if the driver supports a given device.
- init(): Initialize the driver.
- remove(): Deinitialize the driver and reset the device.
- memcpy_start(): Kick off a series of repeated memcpys (DMA reads and
DMA writes).
- memcpy_wait(): Wait for a memcpy operation to complete.
- send_msi(): Make the device send an MSI interrupt.
memcpy_start/wait() are for generating DMA. We separate the operation
into 2 steps so that tests can trigger a long-running DMA operation. We
expect to use this to stress test Live Updates by kicking off a
long-running mempcy operation and then performing a Live Update. These
methods are required to not generate any interrupts.
send_msi() is used for testing MSI and MSI-x interrupts. The driver
tells the test which MSI it will be using via device->driver.msi.
It's the responsibility of the test to set up a region of memory
and map it into the device for use by the driver, e.g. for in-memory
descriptors, before calling init().
A demo of the driver framework can be found in
tools/testing/selftests/vfio/vfio_pci_driver_test.c.
In addition, this series introduces a new KVM selftest to demonstrate
delivering a device MSI directly into a guest, which can be found in
tools/testing/selftests/kvm/vfio_pci_device_irq_test.c.
Tests
-----------------------------------------------------------------------
There are 5 tests in this series, mostly to demonstrate as a
proof-of-concept:
- tools/testing/selftests/vfio/vfio_pci_device_test.c
- tools/testing/selftests/vfio/vfio_pci_driver_test.c
- tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
- tools/testing/selftests/vfio/vfio_dma_mapping_test.c
- tools/testing/selftests/kvm/vfio_pci_device_irq_test.c
Integrating with KVM selftests
-----------------------------------------------------------------------
To support testing the interactions between VFIO and KVM, the VFIO
selftests support sharing its library with the KVM selftest. The patches
at the end of this series demonstrate how that works.
Essentially, we allow the KVM selftests to build their own copy of
tools/testing/selftests/vfio/lib/ and link it into KVM selftests
binaries. This requires minimal changes to the KVM selftests Makefile.
Future Areas of Development
-----------------------------------------------------------------------
Library:
- Driver support for devices that can be used on AMD, ARM, and other
platforms.
- Driver support for a device available in QEMU VMs.
- Support for tests that use multiple devices.
- Support for IOMMU groups with multiple devices.
- Support for multiple devices sharing the same container/iommufd.
- Sharing TEST_ASSERT() macros and other common code between KVM
and VFIO selftests.
Tests:
- DMA mapping performance tests for BARs/HugeTLB/etc.
- Live Update selftests.
- Porting Sean's KVM selftest for posted interrupts to use the VFIO
selftests library [3]
This series can also be found on GitHub:
https://github.com/dmatlack/linux/tree/vfio/selftests/rfc
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Saeed Mahameed <saeedm(a)nvidia.com>
Cc: Saeed Mahameed <saeedm(a)nvidia.com>
Cc: Adithya Jayachandran <ajayachandra(a)nvidia.com>
Cc: Parav Pandit <parav(a)nvidia.com>
Cc: Leon Romanovsky <leonro(a)nvidia.com>
Cc: Vinicius Costa Gomes <vinicius.gomes(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
[1] https://lore.kernel.org/all/f35359d5-63e1-8390-619f-67961443bfe1@google.com/
[2] https://lore.kernel.org/all/20250515182322.117840-1-pasha.tatashin@soleen.c…
[3] https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/
David Matlack (28):
selftests: Create tools/testing/selftests/vfio
vfio: selftests: Add a helper library for VFIO selftests
vfio: selftests: Introduce vfio_pci_device_test
tools headers: Add stub definition for __iomem
tools headers: Import asm-generic MMIO helpers
tools headers: Import x86 MMIO helper overrides
tools headers: Import iosubmit_cmds512()
tools headers: Import drivers/dma/ioat/{hw.h,registers.h}
tools headers: Import drivers/dma/idxd/registers.h
tools headers: Import linux/pci_ids.h
vfio: selftests: Keep track of DMA regions mapped into the device
vfio: selftests: Enable asserting MSI eventfds not firing
vfio: selftests: Add a helper for matching vendor+device IDs
vfio: selftests: Add driver framework
vfio: sefltests: Add vfio_pci_driver_test
vfio: selftests: Add driver for Intel CBDMA
vfio: selftests: Add driver for Intel DSA
vfio: selftests: Move helper to get cdev path to libvfio
vfio: selftests: Encapsulate IOMMU mode
vfio: selftests: Add [-i iommu_mode] option to all tests
vfio: selftests: Add vfio_type1v2_mode
vfio: selftests: Add iommufd_compat_type1{,v2} modes
vfio: selftests: Add iommufd mode
vfio: selftests: Make iommufd the default iommu_mode
vfio: selftests: Add a script to help with running VFIO selftests
KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
KVM: selftests: Test sending a vfio-pci device IRQ to a VM
KVM: selftests: Use real device MSIs in vfio_pci_device_irq_test
Josh Hilke (5):
vfio: selftests: Test basic VFIO and IOMMUFD integration
vfio: selftests: Move vfio dma mapping test to their own file
vfio: selftests: Add test to reset vfio device.
vfio: selftests: Use command line to set hugepage size for DMA mapping
test
vfio: selftests: Validate 2M/1G HugeTLB are mapped as 2M/1G in IOMMU
MAINTAINERS | 7 +
tools/arch/x86/include/asm/io.h | 101 +
tools/arch/x86/include/asm/special_insns.h | 27 +
tools/include/asm-generic/io.h | 482 +++
tools/include/asm/io.h | 11 +
tools/include/drivers/dma/idxd/registers.h | 601 +++
tools/include/drivers/dma/ioat/hw.h | 270 ++
tools/include/drivers/dma/ioat/registers.h | 251 ++
tools/include/linux/compiler.h | 4 +
tools/include/linux/io.h | 4 +-
tools/include/linux/pci_ids.h | 3212 +++++++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 6 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +
tools/testing/selftests/kvm/lib/kvm_util.c | 21 +
.../selftests/kvm/vfio_pci_device_irq_test.c | 173 +
tools/testing/selftests/vfio/.gitignore | 7 +
tools/testing/selftests/vfio/Makefile | 20 +
.../testing/selftests/vfio/lib/drivers/dsa.c | 416 +++
.../testing/selftests/vfio/lib/drivers/ioat.c | 235 ++
.../selftests/vfio/lib/include/vfio_util.h | 271 ++
tools/testing/selftests/vfio/lib/libvfio.mk | 26 +
.../selftests/vfio/lib/vfio_pci_device.c | 573 +++
.../selftests/vfio/lib/vfio_pci_driver.c | 126 +
tools/testing/selftests/vfio/run.sh | 110 +
.../selftests/vfio/vfio_dma_mapping_test.c | 239 ++
.../selftests/vfio/vfio_iommufd_setup_test.c | 133 +
.../selftests/vfio/vfio_pci_device_test.c | 195 +
.../selftests/vfio/vfio_pci_driver_test.c | 256 ++
29 files changed, 7780 insertions(+), 2 deletions(-)
create mode 100644 tools/arch/x86/include/asm/io.h
create mode 100644 tools/arch/x86/include/asm/special_insns.h
create mode 100644 tools/include/asm-generic/io.h
create mode 100644 tools/include/asm/io.h
create mode 100644 tools/include/drivers/dma/idxd/registers.h
create mode 100644 tools/include/drivers/dma/ioat/hw.h
create mode 100644 tools/include/drivers/dma/ioat/registers.h
create mode 100644 tools/include/linux/pci_ids.h
create mode 100644 tools/testing/selftests/kvm/vfio_pci_device_irq_test.c
create mode 100644 tools/testing/selftests/vfio/.gitignore
create mode 100644 tools/testing/selftests/vfio/Makefile
create mode 100644 tools/testing/selftests/vfio/lib/drivers/dsa.c
create mode 100644 tools/testing/selftests/vfio/lib/drivers/ioat.c
create mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
create mode 100644 tools/testing/selftests/vfio/lib/libvfio.mk
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_device.c
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_driver.c
create mode 100755 tools/testing/selftests/vfio/run.sh
create mode 100644 tools/testing/selftests/vfio/vfio_dma_mapping_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_driver_test.c
base-commit: a11a72229881d8ac1d52ea727101bc9c744189c1
prerequisite-patch-id: 3bae97c9e1093148763235f47a84fa040b512d04
--
2.49.0.1151.ga128411c76-goog
glibc does not define SYS_futex for 32-bit architectures using 64-bit
time_t e.g. riscv32, therefore this test fails to compile since it does not
find SYS_futex in C library headers. Define SYS_futex as SYS_futex_time64
in this situation to ensure successful compilation and compatibility.
Signed-off-by: Ben Zong-You Xie <ben717(a)andestech.com>
Signed-off-by: Cynthia Huang <cynthia(a)andestech.com>
---
tools/testing/selftests/futex/include/futextest.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index ddbcfc9b7bac..7a5fd1d5355e 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -47,6 +47,17 @@ typedef volatile u_int32_t futex_t;
FUTEX_PRIVATE_FLAG)
#endif
+/*
+ * SYS_futex is expected from system C library, in glibc some 32-bit
+ * architectures (e.g. RV32) are using 64-bit time_t, therefore it doesn't have
+ * SYS_futex defined but just SYS_futex_time64. Define SYS_futex as
+ * SYS_futex_time64 in this situation to ensure the compilation and the
+ * compatibility.
+ */
+#if !defined(SYS_futex) && defined(SYS_futex_time64)
+#define SYS_futex SYS_futex_time64
+#endif
+
/**
* futex() - SYS_futex syscall wrapper
* @uaddr: address of first futex
--
2.34.1
Corrected two instances of the misspelled word 'occurences' to
'occurrences' in comments explaining node invariants in sparsebit.c.
These comments describe core behavior of the data structure and
should be clear.
Signed-off-by: Rahul Kumar <rk0006818(a)gmail.com>
---
tools/testing/selftests/kvm/lib/sparsebit.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/sparsebit.c b/tools/testing/selftests/kvm/lib/sparsebit.c
index cfed9d26cc71..a99188f87a38 100644
--- a/tools/testing/selftests/kvm/lib/sparsebit.c
+++ b/tools/testing/selftests/kvm/lib/sparsebit.c
@@ -116,7 +116,7 @@
*
* + A node with all mask bits set only occurs when the last bit
* described by the previous node is not equal to this nodes
- * starting index - 1. All such occurences of this condition are
+ * starting index - 1. All such occurrences of this condition are
* avoided by moving the setting of the nodes mask bits into
* the previous nodes num_after setting.
*
@@ -592,7 +592,7 @@ static struct node *node_split(struct sparsebit *s, sparsebit_idx_t idx)
*
* + A node with all mask bits set only occurs when the last bit
* described by the previous node is not equal to this nodes
- * starting index - 1. All such occurences of this condition are
+ * starting index - 1. All such occurrences of this condition are
* avoided by moving the setting of the nodes mask bits into
* the previous nodes num_after setting.
*/
--
2.43.0
This patch fixes two misspellings of the word 'occurrences' in comments within sparsebit.c used by the KVM selftests.
Fixing the spelling improves readability and clarity of the documented behavior.
Only comment text has been changed — there are no modifications to the functional logic of the tests.
I would appreciate your review and any feedback you may have.
Thank you for your time and support.
Best regards,
Rahul Kumar
Non-KVM folks,
I am hoping to route this through the KVM tree (6.17 or later), as the non-KVM
changes should be glorified nops. Please holler if you object to that idea.
Hyper-V folks in particular, let me know if you want a stable topic branch/tag,
e.g. on the off chance you want to make similar changes to the Hyper-V code,
and I'll make sure that happens.
As for what this series actually does...
Rework KVM's irqfd registration to require that an eventfd is bound to at
most one irqfd throughout the entire system. KVM currently disallows
binding an eventfd to multiple irqfds for a single VM, but doesn't reject
attempts to bind an eventfd to multiple VMs.
This is obviously an ABI change, but I'm fairly confident that it won't
break userspace, because binding an eventfd to multiple irqfds hasn't
truly worked since commit e8dbf19508a1 ("kvm/eventfd: Use priority waitqueue
to catch events before userspace"). A somewhat undocumented, and perhaps
even unintentional, side effect of suppressing eventfd notifications for
userspace is that the priority+exclusive behavior also suppresses eventfd
notifications for any subsequent waiters, even if they are priority waiters.
I.e. only the first VM with an irqfd+eventfd binding will get notifications.
And for IRQ bypass, a.k.a. device posted interrupts, globally unique
bindings are a hard requirement (at least on x86; I assume other archs are
the same). KVM and the IRQ bypass manager kinda sorta handle this, but in
the absolute worst way possible (IMO). Instead of surfacing an error to
userspace, KVM silently ignores IRQ bypass registration errors.
The motivation for this series is to harden against userspace goofs. AFAIK,
we (Google) have never actually had a bug where userspace tries to assign
an eventfd to multiple VMs, but the possibility has come up in more than one
bug investigation (our intra-host, a.k.a. copyless, migration scheme
transfers eventfds from the old to the new VM when updating the host VMM).
v3:
- Retain WQ_FLAG_EXCLUSIVE in mshv_eventfd.c, which snuck in between v1
and v2. [Peter]
- Use EXPORT_SYMBOL_GPL. [Peter]
- Move WQ_FLAG_EXCLUSIVE out of add_wait_queue_priority() in a prep patch
so that the affected subsystems are more explicitly documented (and then
immediately drop the flag from drivers/xen/privcmd.c, which amusingly
hides that file from the diff stats).
v2:
- https://lore.kernel.org/all/20250519185514.2678456-1-seanjc@google.com
- Use guard(spinlock_irqsave). [Prateek]
v1: https://lore.kernel.org/all/20250401204425.904001-1-seanjc@google.com
Sean Christopherson (13):
KVM: Use a local struct to do the initial vfs_poll() on an irqfd
KVM: Acquire SCRU lock outside of irqfds.lock during assignment
KVM: Initialize irqfd waitqueue callback when adding to the queue
KVM: Add irqfd to KVM's list via the vfs_poll() callback
KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock
sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE
sched/wait: Add a waitqueue helper for fully exclusive priority
waiters
KVM: Disallow binding multiple irqfds to an eventfd with a priority
waiter
KVM: Drop sanity check that per-VM list of irqfds is unique
KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test
KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD
KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements
drivers/hv/mshv_eventfd.c | 8 ++
include/linux/kvm_irqfd.h | 1 -
include/linux/wait.h | 2 +
kernel/sched/wait.c | 22 ++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/arm64/vgic_irq.c | 12 +-
.../testing/selftests/kvm/include/kvm_util.h | 40 ++++++
tools/testing/selftests/kvm/irqfd_test.c | 130 ++++++++++++++++++
.../selftests/kvm/x86/xen_shinfo_test.c | 21 +--
virt/kvm/eventfd.c | 130 +++++++++++++-----
10 files changed, 302 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/kvm/irqfd_test.c
base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
--
2.49.0.1151.ga128411c76-goog
NOTE: I'm leaving Daynix Computing Ltd., for which I worked on this
patch series, by the end of this month.
While net-next is closed, this is the last chance for me to send another
version so let me send the local changes now.
Please contact Yuri Benditovich, who is CCed on this email, for anything
about this series.
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20250530-hash-v5-0-343d7d7a8200@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v12:
- Updated tools/testing/selftests/net/config.
- Split TUNSETVNETHASH.
- Link to v11: https://lore.kernel.org/r/20250317-rss-v11-0-4cacca92f31f@daynix.com
Changes in v11:
- Added the missing code to free vnet_hash in patch
"tap: Introduce virtio-net hash feature".
- Link to v10: https://lore.kernel.org/r/20250313-rss-v10-0-3185d73a9af0@daynix.com
Changes in v10:
- Split common code and TUN/TAP-specific code into separate patches.
- Reverted a spurious style change in patch "tun: Introduce virtio-net
hash feature".
- Added a comment explaining disable_ipv6 in tests.
- Used AF_PACKET for patch "selftest: tun: Add tests for
virtio-net hashing". I also added the usage of FIXTURE_VARIANT() as
the testing function now needs access to more variant-specific
variables.
- Corrected the message of patch "selftest: tun: Add tests for
virtio-net hashing"; it mentioned validation of configuration but
it is not scope of this patch.
- Expanded the description of patch "selftest: tun: Add tests for
virtio-net hashing".
- Added patch "tun: Allow steering eBPF program to fall back".
- Changed to handle TUNGETVNETHASHCAP before taking the rtnl lock.
- Removed redundant tests for tun_vnet_ioctl().
- Added patch "selftest: tap: Add tests for virtio-net ioctls".
- Added a design explanation of ioctls for extensibility and migration.
- Removed a few branches in patch
"vhost/net: Support VIRTIO_NET_F_HASH_REPORT".
- Link to v9: https://lore.kernel.org/r/20250307-rss-v9-0-df76624025eb@daynix.com
Changes in v9:
- Added a missing return statement in patch
"tun: Introduce virtio-net hash feature".
- Link to v8: https://lore.kernel.org/r/20250306-rss-v8-0-7ab4f56ff423@daynix.com
Changes in v8:
- Disabled IPv6 to eliminate noises in tests.
- Added a branch in tap to avoid unnecessary dissection when hash
reporting is disabled.
- Removed unnecessary rtnl_lock().
- Extracted code to handle new ioctls into separate functions to avoid
adding extra NULL checks to the code handling other ioctls.
- Introduced variable named "fd" to __tun_chr_ioctl().
- s/-/=/g in a patch message to avoid confusing Git.
- Link to v7: https://lore.kernel.org/r/20250228-rss-v7-0-844205cbbdd6@daynix.com
Changes in v7:
- Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for
VHOST_NET_F_VIRTIO_NET_HDR.
- s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing".
- Added tap_skb_cb type.
- Rebased.
- Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com
Changes in v6:
- Extracted changes to fill vnet header holes into another series.
- Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun:
Introduce virtio-net hash reporting feature", and "tun: Introduce
virtio-net RSS" into patch "tun: Introduce virtio-net hash feature".
- Dropped the RFC tag.
- Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
Changes in v5:
- Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE.
- Optimized the calculation of the hash value according to:
https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca…
- Added patch "tun: Unify vnet implementation".
- Dropped patch "tap: Pad virtio header with zero".
- Added patch "selftest: tun: Test vnet ioctls without device".
- Reworked selftests to skip for older kernels.
- Documented the case when the underlying device is deleted and packets
have queue_mapping set by TC.
- Reordered test harness arguments.
- Added code to handle fragmented packets.
- Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (10):
virtio_net: Add functions for hashing
net: flow_dissector: Export flow_keys_dissector_symmetric
tun: Allow steering eBPF program to fall back
tun: Add common virtio-net hash feature code
tun: Introduce virtio-net hash feature
tap: Introduce virtio-net hash feature
selftest: tun: Test vnet ioctls without device
selftest: tun: Add tests for virtio-net hashing
selftest: tap: Add tests for virtio-net ioctls
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
drivers/net/Kconfig | 1 +
drivers/net/ipvlan/ipvtap.c | 2 +-
drivers/net/macvtap.c | 2 +-
drivers/net/tap.c | 80 +++++-
drivers/net/tun.c | 92 +++++--
drivers/net/tun_vnet.h | 165 +++++++++++-
drivers/vhost/net.c | 68 ++---
include/linux/if_tap.h | 4 +-
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 188 ++++++++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 80 ++++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/tap.c | 131 +++++++++-
tools/testing/selftests/net/tun.c | 485 ++++++++++++++++++++++++++++++++++-
19 files changed, 1234 insertions(+), 85 deletions(-)
---
base-commit: 5cb8274d66c611b7889565c418a8158517810f9b
change-id: 20240403-rss-e737d89efa77
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>