┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v21:
- Align to bar size, try to fix Niklas reported problem.
- Rebase to v6.16-rc5
- Link to v20: https://lore.kernel.org/r/20250709-ep-msi-v20-0-43d56f9bd54a@nxp.com
Changes in v20:
- remove set epf of_node's patch and only support one epf now.
- move imx6's patch to first
- detail change see each patches' change log
- Link to v19: https://lore.kernel.org/r/20250609-ep-msi-v19-0-77362eaa48fa@nxp.com
Changes in v19:
- irq part already in v6.16-rc1, only missed pcie/dts part
- rebase to v6.16-rc1
- update commit message for patch IMMUTABLE check.
- Link to v18: https://lore.kernel.org/r/20250414-ep-msi-v18-0-f69b49917464@nxp.com
Changes in v18:
- pci-ep.yaml: sort property order, fix maxvalue to 0x7ffff for msi-map-mask and
iommu-map-mask
- Link to v17: https://lore.kernel.org/r/20250407-ep-msi-v17-0-633ab45a31d0@nxp.com
Changes in v17:
- move document part to pci-ep.yaml
- Link to v16: https://lore.kernel.org/r/20250404-ep-msi-v16-0-d4919d68c0d0@nxp.com
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (9):
PCI: imx6: Add helper function imx_pcie_add_lut_by_rid()
PCI: imx6: Add LUT configuration for MSI/IOMMU in Endpoint mode
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
arm64: dts: imx95: Add msi-map for pci-ep device
Documentation/PCI/endpoint/pci-test-howto.rst | 14 +++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/misc/pci_endpoint_test.c | 85 ++++++++++++-
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Kconfig | 8 ++
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 136 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 98 +++++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 36 ++++++
include/linux/pci-ep-msi.h | 28 +++++
include/linux/pci-epf.h | 18 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 +++++
13 files changed, 470 insertions(+), 9 deletions(-)
---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
--
Frank Li <Frank.Li(a)nxp.com>
Recently, I reviewed a patch on the mm/kselftest mailing list about a
test which had obvious type mismatch fix in it. It was strange why that
wasn't caught during development and when patch was accepted. This led
me to discover that those extra compiler options to catch these warnings
aren't being used. When I added them, I found tens of warnings in just
mm suite.
In this series, I'm adding these flags and fixing those warnings. In the
last try several months ago [1], I'd patches for individual tests. I've
made patches better by grouping the same type of fixes together. Hence
there is no changelog for individual patches.
The changes have been build tested on x86_64, arm64, powerpc64 and partially
on riscv64. The test run with and without this series has been done on
x86_64.
---
Changes since v1:
- Drop test harness patch which isn't needed anymore
- Revamp how patches are written per same kind of failure
Muhammad Usama Anjum (8):
selftests/mm: Add -Wunreachable-code and fix warnings
selftests/mm: protection_keys: Fix dead code
selftests: kselftest.h: Add __unused macro
selftests/mm: Add -Wunused family of flags
selftests/mm: Remove unused parameters
selftests/mm: Mark unused arguments with __unused
selftests/mm: Mark unused arguments with __unused
selftests/mm: Fix unused parameter warnings for different
architectures
tools/testing/selftests/kselftest.h | 4 ++
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/compaction_test.c | 2 +-
tools/testing/selftests/mm/cow.c | 22 +++++------
tools/testing/selftests/mm/droppable.c | 2 +-
tools/testing/selftests/mm/gup_longterm.c | 2 +-
tools/testing/selftests/mm/hmm-tests.c | 5 +--
tools/testing/selftests/mm/hugepage-vmemmap.c | 2 +-
tools/testing/selftests/mm/hugetlb-madvise.c | 2 +-
.../selftests/mm/hugetlb-soft-offline.c | 2 +-
.../selftests/mm/hugetlb_fault_after_madv.c | 4 +-
.../selftests/mm/hugetlb_madv_vs_map.c | 6 +--
tools/testing/selftests/mm/ksm_tests.c | 17 ++++-----
tools/testing/selftests/mm/madv_populate.c | 2 +-
tools/testing/selftests/mm/map_populate.c | 2 +-
tools/testing/selftests/mm/memfd_secret.c | 6 +--
.../testing/selftests/mm/mlock-random-test.c | 2 +-
tools/testing/selftests/mm/mlock2-tests.c | 2 +-
tools/testing/selftests/mm/mseal_test.c | 8 +++-
tools/testing/selftests/mm/on-fault-limit.c | 2 +-
tools/testing/selftests/mm/pfnmap.c | 2 +-
tools/testing/selftests/mm/pkey-arm64.h | 5 ++-
tools/testing/selftests/mm/pkey-powerpc.h | 2 +-
tools/testing/selftests/mm/pkey-x86.h | 3 +-
.../selftests/mm/pkey_sighandler_tests.c | 35 ++++++++++++-----
tools/testing/selftests/mm/protection_keys.c | 22 +++++------
tools/testing/selftests/mm/soft-dirty.c | 6 +--
.../selftests/mm/split_huge_page_test.c | 8 ++--
tools/testing/selftests/mm/uffd-common.c | 15 ++++----
tools/testing/selftests/mm/uffd-common.h | 2 +-
tools/testing/selftests/mm/uffd-stress.c | 2 +-
tools/testing/selftests/mm/uffd-unit-tests.c | 38 +++++++++----------
tools/testing/selftests/mm/uffd-wp-mremap.c | 2 +-
.../selftests/mm/virtual_address_range.c | 2 +-
34 files changed, 130 insertions(+), 111 deletions(-)
--
2.39.5
This series introduces VFIO selftests, located in
tools/testing/selftests/vfio/.
VFIO selftests aim to enable kernel developers to write and run tests
that take the form of userspace programs that interact with VFIO and
IOMMUFD uAPIs. VFIO selftests can be used to write functional tests for
new features, regression tests for bugs, and performance tests for
optimizations.
These tests are designed to interact with real PCI devices, i.e. they do
not rely on mocking out or faking any behavior in the kernel. This
allows the tests to exercise not only VFIO but also IOMMUFD, the IOMMU
driver, interrupt remapping, IRQ handling, etc.
For more background on the motivation and design of this series, please
see the RFC:
https://lore.kernel.org/kvm/20250523233018.1702151-1-dmatlack@google.com/
This series can also be found on GitHub:
https://github.com/dmatlack/linux/tree/vfio/selftests/v1
Changelog
-----------------------------------------------------------------------
RFC: https://lore.kernel.org/kvm/20250523233018.1702151-1-dmatlack@google.com/
- Add symlink to linux/pci_ids.h instead of copying (Jason)
- Add symlinks to drivers/dma/*/*.h instead of copying (Jason)
- Automatically replicate vfio_dma_mapping_test across backing
sources using fixture variants (Jason)
- Automatically replicate vfio_dma_mapping_test and
vfio_pci_driver_test across all iommu_modes using fixture
variants (Jason)
- Invert access() check in vfio_dma_mapping_test (me)
- Use driver_override instead of add/remove_id (Alex)
- Allow tests to get BDF from env var (Alex)
- Use KSFT_FAIL instead of 1 to exit with failure (Alex)
- Unconditionally create $(LIBVFIO_O_DIRS) to avoid target
conflict with ../cgroup/lib/libcgroup.mk when building
KVM selftests (me)
- Allow VFIO selftests to run automatically by switching from
TEST_GEN_PROGS_EXTENDED to TEST_GEN_PROGS. Automatically run
selftests will use $VFIO_SELFTESTS_BDF environment variable
to know which device to use (Alex)
- Replace hardcoded SZ_4K with getpagesize() in vfio_dma_mapping_test
to support platforms with other page sizes (me)
- Make all global variables static where possible (me)
- Pass argc and argv to test_harness_main() so that users can
pass flags to the kselftest harness (me)
Instructions
-----------------------------------------------------------------------
Running VFIO selftests requires at a PCI device bound to vfio-pci for
the tests to use. The address of this device is passed to the test as
a segment:bus:device.function string, which must match the path to
the device in /sys/bus/pci/devices/ (e.g. 0000:00:04.0).
Once you have chosen a device, there is a helper script provided to
unbind the device from its current driver, bind it to vfio-pci, export
the environment variable $VFIO_SELFTESTS_BDF, and launch a shell:
$ tools/testing/selftests/vfio/run.sh -d 0000:00:04.0 -s
The -d option tells the script which device to use and the -s option
tells the script to launch a shell.
Additionally, the VFIO selftest vfio_dma_mapping_test has test cases
that rely on HugeTLB pages being available, otherwise they are skipped.
To enable those tests make sure at least 1 2MB and 1 1GB HugeTLB pages
are available.
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
To run all VFIO selftests using make:
$ make -C tools/testing/selftests/vfio run_tests
To run individual tests:
$ tools/testing/selftests/vfio/vfio_dma_mapping_test
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous_hugetlb_2mb
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -r vfio_dma_mapping_test.iommufd_anonymous_hugetlb_2mb.dma_map_unmap
The environment variable $VFIO_SELFTESTS_BDF can be overridden for a
specific test by passing in the BDF on the command line as the last
positional argument.
$ tools/testing/selftests/vfio/vfio_dma_mapping_test 0000:00:04.0
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous_hugetlb_2mb 0000:00:04.0
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -r vfio_dma_mapping_test.iommufd_anonymous_hugetlb_2mb.dma_map_unmap 0000:00:04.0
When you are done, free the HugeTLB pages and exit the shell started by
run.sh. Exiting the shell will cause the device to be unbound from
vfio-pci and bound back to its original driver.
$ echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ exit
It's also possible to use run.sh to run just a single test hermetically,
rather than dropping into a shell:
$ tools/testing/selftests/vfio/run.sh -d 0000:00:04.0 -- tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous
Tests
-----------------------------------------------------------------------
There are 5 tests in this series, mostly to demonstrate as a
proof-of-concept:
- tools/testing/selftests/vfio/vfio_pci_device_test.c
- tools/testing/selftests/vfio/vfio_pci_driver_test.c
- tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
- tools/testing/selftests/vfio/vfio_dma_mapping_test.c
- tools/testing/selftests/kvm/vfio_pci_device_irq_test.c
Future Areas of Development
-----------------------------------------------------------------------
Library:
- Driver support for devices that can be used on AMD, ARM, and other
platforms (e.g. mlx5).
- Driver support for a device available in QEMU VMs (e.g.
pcie-ats-testdev [1])
- Support for tests that use multiple devices.
- Support for IOMMU groups with multiple devices.
- Support for multiple devices sharing the same container/iommufd.
- Sharing TEST_ASSERT() macros and other common code between KVM
and VFIO selftests.
Tests:
- DMA mapping performance tests for BARs/HugeTLB/etc.
- Porting tests from
https://github.com/awilliam/tests/commits/for-clg/ to selftests.
- Live Update selftests.
- Porting Sean's KVM selftest for posted interrupts to use the VFIO
selftests library [2]
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Aaron Lewis <aaronlewis(a)google.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Saeed Mahameed <saeedm(a)nvidia.com>
Cc: Adithya Jayachandran <ajayachandra(a)nvidia.com>
Cc: Joel Granados <joel.granados(a)kernel.org>
[1] https://github.com/Joelgranados/qemu/blob/pcie-testdev/hw/misc/pcie-ats-tes…
[2] https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/
David Matlack (28):
selftests: Create tools/testing/selftests/vfio
vfio: selftests: Add a helper library for VFIO selftests
vfio: selftests: Introduce vfio_pci_device_test
tools headers: Add stub definition for __iomem
tools headers: Import asm-generic MMIO helpers
tools headers: Import x86 MMIO helper overrides
tools headers: Import iosubmit_cmds512()
tools headers: Add symlink to linux/pci_ids.h
vfio: selftests: Keep track of DMA regions mapped into the device
vfio: selftests: Enable asserting MSI eventfds not firing
vfio: selftests: Add a helper for matching vendor+device IDs
vfio: selftests: Add driver framework
vfio: sefltests: Add vfio_pci_driver_test
dmaengine: ioat: Move system_has_dca_enabled() to dma.h
vfio: selftests: Add driver for Intel CBDMA
dmaengine: idxd: Allow registers.h to be included from tools/
vfio: selftests: Add driver for Intel DSA
vfio: selftests: Move helper to get cdev path to libvfio
vfio: selftests: Encapsulate IOMMU mode
vfio: selftests: Replicate tests across all iommu_modes
vfio: selftests: Add vfio_type1v2_mode
vfio: selftests: Add iommufd_compat_type1{,v2} modes
vfio: selftests: Add iommufd mode
vfio: selftests: Make iommufd the default iommu_mode
vfio: selftests: Add a script to help with running VFIO selftests
KVM: selftests: Build and link sefltests/vfio/lib into KVM selftests
KVM: selftests: Test sending a vfio-pci device IRQ to a VM
KVM: selftests: Add -d option to vfio_pci_device_irq_test for
device-sent MSIs
Josh Hilke (5):
vfio: selftests: Test basic VFIO and IOMMUFD integration
vfio: selftests: Move vfio dma mapping test to their own file
vfio: selftests: Add test to reset vfio device.
vfio: selftests: Add DMA mapping tests for 2M and 1G HugeTLB
vfio: selftests: Validate 2M/1G HugeTLB are mapped as 2M/1G in IOMMU
MAINTAINERS | 7 +
drivers/dma/idxd/registers.h | 4 +
drivers/dma/ioat/dma.h | 2 +
drivers/dma/ioat/hw.h | 3 -
tools/arch/x86/include/asm/io.h | 101 +++
tools/arch/x86/include/asm/special_insns.h | 27 +
tools/include/asm-generic/io.h | 482 ++++++++++++++
tools/include/asm/io.h | 11 +
tools/include/linux/compiler.h | 4 +
tools/include/linux/io.h | 4 +-
tools/include/linux/pci_ids.h | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 4 +
.../testing/selftests/kvm/include/kvm_util.h | 4 +
tools/testing/selftests/kvm/lib/kvm_util.c | 21 +
.../selftests/kvm/vfio_pci_device_irq_test.c | 172 +++++
tools/testing/selftests/vfio/.gitignore | 7 +
tools/testing/selftests/vfio/Makefile | 21 +
.../selftests/vfio/lib/drivers/dsa/dsa.c | 416 ++++++++++++
.../vfio/lib/drivers/dsa/registers.h | 1 +
.../selftests/vfio/lib/drivers/ioat/hw.h | 1 +
.../selftests/vfio/lib/drivers/ioat/ioat.c | 235 +++++++
.../vfio/lib/drivers/ioat/registers.h | 1 +
.../selftests/vfio/lib/include/vfio_util.h | 295 +++++++++
tools/testing/selftests/vfio/lib/libvfio.mk | 24 +
.../selftests/vfio/lib/vfio_pci_device.c | 594 ++++++++++++++++++
.../selftests/vfio/lib/vfio_pci_driver.c | 126 ++++
tools/testing/selftests/vfio/run.sh | 109 ++++
.../selftests/vfio/vfio_dma_mapping_test.c | 199 ++++++
.../selftests/vfio/vfio_iommufd_setup_test.c | 127 ++++
.../selftests/vfio/vfio_pci_device_test.c | 176 ++++++
.../selftests/vfio/vfio_pci_driver_test.c | 247 ++++++++
32 files changed, 3423 insertions(+), 4 deletions(-)
create mode 100644 tools/arch/x86/include/asm/io.h
create mode 100644 tools/arch/x86/include/asm/special_insns.h
create mode 100644 tools/include/asm-generic/io.h
create mode 100644 tools/include/asm/io.h
create mode 120000 tools/include/linux/pci_ids.h
create mode 100644 tools/testing/selftests/kvm/vfio_pci_device_irq_test.c
create mode 100644 tools/testing/selftests/vfio/.gitignore
create mode 100644 tools/testing/selftests/vfio/Makefile
create mode 100644 tools/testing/selftests/vfio/lib/drivers/dsa/dsa.c
create mode 120000 tools/testing/selftests/vfio/lib/drivers/dsa/registers.h
create mode 120000 tools/testing/selftests/vfio/lib/drivers/ioat/hw.h
create mode 100644 tools/testing/selftests/vfio/lib/drivers/ioat/ioat.c
create mode 120000 tools/testing/selftests/vfio/lib/drivers/ioat/registers.h
create mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
create mode 100644 tools/testing/selftests/vfio/lib/libvfio.mk
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_device.c
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_driver.c
create mode 100755 tools/testing/selftests/vfio/run.sh
create mode 100644 tools/testing/selftests/vfio/vfio_dma_mapping_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_driver_test.c
base-commit: e271ed52b344ac02d4581286961d0c40acc54c03
prerequisite-patch-id: c1decca4653262d3d2451e6fd4422ebff9c0b589
--
2.50.0.rc2.701.gf1e915cc24-goog
This introduces signal->exec_bprm, which is used to
fix the case when at least one of the sibling threads
is traced, and therefore the trace process may dead-lock
in ptrace_attach, but de_thread will need to wait for the
tracer to continue execution.
The solution is to detect this situation and allow
ptrace_attach to continue by temporarily releasing the
cred_guard_mutex, while de_thread() is still waiting for
traced zombies to be eventually released by the tracer.
In the case of the thread group leader we only have to wait
for the thread to become a zombie, which may also need
co-operation from the tracer due to PTRACE_O_TRACEEXIT.
When a tracer wants to ptrace_attach a task that already
is in execve, we simply retry the ptrace_may_access
check while temporarily installing the new credentials
and dumpability which are about to be used after execve
completes. If the ptrace_attach happens on a thread that
is a sibling-thread of the thread doing execve, it is
sufficient to check against the old credentials, as this
thread will be waited for, before the new credentials are
installed.
Other threads die quickly since the cred_guard_mutex is
released, but a deadly signal is already pending. In case
the mutex_lock_killable misses the signal, the non-zero
current->signal->exec_bprm makes sure they release the
mutex immediately and return with -ERESTARTNOINTR.
This means there is no API change, unlike the previous
version of this patch which was discussed here:
https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace882@hotmail.d…
See tools/testing/selftests/ptrace/vmaccess.c
for a test case that gets fixed by this change.
Note that since the test case was originally designed to
test the ptrace_attach returning an error in this situation,
the test expectation needed to be adjusted, to allow the
API to succeed at the first attempt.
Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
fs/exec.c | 69 ++++++++++++++++-------
fs/proc/base.c | 6 ++
include/linux/cred.h | 1 +
include/linux/sched/signal.h | 18 ++++++
kernel/cred.c | 28 +++++++--
kernel/ptrace.c | 32 +++++++++++
kernel/seccomp.c | 12 +++-
tools/testing/selftests/ptrace/vmaccess.c | 23 +++++---
8 files changed, 155 insertions(+), 34 deletions(-)
v10: Changes to previous version, make the PTRACE_ATTACH
retun -EAGAIN, instead of execve return -ERESTARTSYS.
Added some lessions learned to the description.
v11: Check old and new credentials in PTRACE_ATTACH again without
changing the API.
Note: I got actually one response from an automatic checker to the v11 patch,
https://lore.kernel.org/lkml/202107121344.wu68hEPF-lkp@intel.com/
which is complaining about:
>> kernel/ptrace.c:425:26: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct cred const *old_cred @@ got struct cred const [noderef] __rcu *real_cred @@
417 struct linux_binprm *bprm = task->signal->exec_bprm;
418 const struct cred *old_cred;
419 struct mm_struct *old_mm;
420
421 retval = down_write_killable(&task->signal->exec_update_lock);
422 if (retval)
423 goto unlock_creds;
424 task_lock(task);
> 425 old_cred = task->real_cred;
v12: Essentially identical to v11.
- Fixed a minor merge conflict in linux v5.17, and fixed the
above mentioned nit by adding __rcu to the declaration.
- re-tested the patch with all linux versions from v5.11 to v6.6
v10 was an alternative approach which did imply an API change.
But I would prefer to avoid such an API change.
The difficult part is getting the right dumpability flags assigned
before de_thread starts, hope you like this version.
If not, the v10 is of course also acceptable.
Thanks
Bernd.
diff --git a/fs/exec.c b/fs/exec.c
index 2f2b0acec4f0..902d3b230485 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1041,11 +1041,13 @@ static int exec_mmap(struct mm_struct *mm)
return 0;
}
-static int de_thread(struct task_struct *tsk)
+static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm)
{
struct signal_struct *sig = tsk->signal;
struct sighand_struct *oldsighand = tsk->sighand;
spinlock_t *lock = &oldsighand->siglock;
+ struct task_struct *t = tsk;
+ bool unsafe_execve_in_progress = false;
if (thread_group_empty(tsk))
goto no_thread_group;
@@ -1068,6 +1070,19 @@ static int de_thread(struct task_struct *tsk)
if (!thread_group_leader(tsk))
sig->notify_count--;
+ while_each_thread(tsk, t) {
+ if (unlikely(t->ptrace)
+ && (t != tsk->group_leader || !t->exit_state))
+ unsafe_execve_in_progress = true;
+ }
+
+ if (unlikely(unsafe_execve_in_progress)) {
+ spin_unlock_irq(lock);
+ sig->exec_bprm = bprm;
+ mutex_unlock(&sig->cred_guard_mutex);
+ spin_lock_irq(lock);
+ }
+
while (sig->notify_count) {
__set_current_state(TASK_KILLABLE);
spin_unlock_irq(lock);
@@ -1158,6 +1173,11 @@ static int de_thread(struct task_struct *tsk)
release_task(leader);
}
+ if (unlikely(unsafe_execve_in_progress)) {
+ mutex_lock(&sig->cred_guard_mutex);
+ sig->exec_bprm = NULL;
+ }
+
sig->group_exec_task = NULL;
sig->notify_count = 0;
@@ -1169,6 +1189,11 @@ static int de_thread(struct task_struct *tsk)
return 0;
killed:
+ if (unlikely(unsafe_execve_in_progress)) {
+ mutex_lock(&sig->cred_guard_mutex);
+ sig->exec_bprm = NULL;
+ }
+
/* protects against exit_notify() and __exit_signal() */
read_lock(&tasklist_lock);
sig->group_exec_task = NULL;
@@ -1253,6 +1278,24 @@ int begin_new_exec(struct linux_binprm * bprm)
if (retval)
return retval;
+ /* If the binary is not readable then enforce mm->dumpable=0 */
+ would_dump(bprm, bprm->file);
+ if (bprm->have_execfd)
+ would_dump(bprm, bprm->executable);
+
+ /*
+ * Figure out dumpability. Note that this checking only of current
+ * is wrong, but userspace depends on it. This should be testing
+ * bprm->secureexec instead.
+ */
+ if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
+ is_dumpability_changed(current_cred(), bprm->cred) ||
+ !(uid_eq(current_euid(), current_uid()) &&
+ gid_eq(current_egid(), current_gid())))
+ set_dumpable(bprm->mm, suid_dumpable);
+ else
+ set_dumpable(bprm->mm, SUID_DUMP_USER);
+
/*
* Ensure all future errors are fatal.
*/
@@ -1261,7 +1304,7 @@ int begin_new_exec(struct linux_binprm * bprm)
/*
* Make this the only thread in the thread group.
*/
- retval = de_thread(me);
+ retval = de_thread(me, bprm);
if (retval)
goto out;
@@ -1284,11 +1327,6 @@ int begin_new_exec(struct linux_binprm * bprm)
if (retval)
goto out;
- /* If the binary is not readable then enforce mm->dumpable=0 */
- would_dump(bprm, bprm->file);
- if (bprm->have_execfd)
- would_dump(bprm, bprm->executable);
-
/*
* Release all of the old mmap stuff
*/
@@ -1350,18 +1388,6 @@ int begin_new_exec(struct linux_binprm * bprm)
me->sas_ss_sp = me->sas_ss_size = 0;
- /*
- * Figure out dumpability. Note that this checking only of current
- * is wrong, but userspace depends on it. This should be testing
- * bprm->secureexec instead.
- */
- if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
- !(uid_eq(current_euid(), current_uid()) &&
- gid_eq(current_egid(), current_gid())))
- set_dumpable(current->mm, suid_dumpable);
- else
- set_dumpable(current->mm, SUID_DUMP_USER);
-
perf_event_exec();
__set_task_comm(me, kbasename(bprm->filename), true);
@@ -1480,6 +1506,11 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex))
return -ERESTARTNOINTR;
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(¤t->signal->cred_guard_mutex);
+ return -ERESTARTNOINTR;
+ }
+
bprm->cred = prepare_exec_creds();
if (likely(bprm->cred))
return 0;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ffd54617c354..0da9adfadb48 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2788,6 +2788,12 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
if (rv < 0)
goto out_free;
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(¤t->signal->cred_guard_mutex);
+ rv = -ERESTARTNOINTR;
+ goto out_free;
+ }
+
rv = security_setprocattr(PROC_I(inode)->op.lsm,
file->f_path.dentry->d_name.name, page,
count);
diff --git a/include/linux/cred.h b/include/linux/cred.h
index f923528d5cc4..b01e309f5686 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -159,6 +159,7 @@ extern const struct cred *get_task_cred(struct task_struct *);
extern struct cred *cred_alloc_blank(void);
extern struct cred *prepare_creds(void);
extern struct cred *prepare_exec_creds(void);
+extern bool is_dumpability_changed(const struct cred *, const struct cred *);
extern int commit_creds(struct cred *);
extern void abort_creds(struct cred *);
extern const struct cred *override_creds(const struct cred *);
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0014d3adaf84..14df7073a0a8 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -234,9 +234,27 @@ struct signal_struct {
struct mm_struct *oom_mm; /* recorded mm when the thread group got
* killed by the oom killer */
+ struct linux_binprm *exec_bprm; /* Used to check ptrace_may_access
+ * against new credentials while
+ * de_thread is waiting for other
+ * traced threads to terminate.
+ * Set while de_thread is executing.
+ * The cred_guard_mutex is released
+ * after de_thread() has called
+ * zap_other_threads(), therefore
+ * a fatal signal is guaranteed to be
+ * already pending in the unlikely
+ * event, that
+ * current->signal->exec_bprm happens
+ * to be non-zero after the
+ * cred_guard_mutex was acquired.
+ */
+
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
* (notably. ptrace)
+ * Held while execve runs, except when
+ * a sibling thread is being traced.
* Deprecated do not use in new code.
* Use exec_update_lock instead.
*/
diff --git a/kernel/cred.c b/kernel/cred.c
index 98cb4eca23fb..586cb6c7cf6b 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -433,6 +433,28 @@ static bool cred_cap_issubset(const struct cred *set, const struct cred *subset)
return false;
}
+/**
+ * is_dumpability_changed - Will changing creds from old to new
+ * affect the dumpability in commit_creds?
+ *
+ * Return: false - dumpability will not be changed in commit_creds.
+ * Return: true - dumpability will be changed to non-dumpable.
+ *
+ * @old: The old credentials
+ * @new: The new credentials
+ */
+bool is_dumpability_changed(const struct cred *old, const struct cred *new)
+{
+ if (!uid_eq(old->euid, new->euid) ||
+ !gid_eq(old->egid, new->egid) ||
+ !uid_eq(old->fsuid, new->fsuid) ||
+ !gid_eq(old->fsgid, new->fsgid) ||
+ !cred_cap_issubset(old, new))
+ return true;
+
+ return false;
+}
+
/**
* commit_creds - Install new credentials upon the current task
* @new: The credentials to be assigned
@@ -467,11 +489,7 @@ int commit_creds(struct cred *new)
get_cred(new); /* we will require a ref for the subj creds too */
/* dumpability changes */
- if (!uid_eq(old->euid, new->euid) ||
- !gid_eq(old->egid, new->egid) ||
- !uid_eq(old->fsuid, new->fsuid) ||
- !gid_eq(old->fsgid, new->fsgid) ||
- !cred_cap_issubset(old, new)) {
+ if (is_dumpability_changed(old, new)) {
if (task->mm)
set_dumpable(task->mm, suid_dumpable);
task->pdeath_signal = 0;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 443057bee87c..eb1c450bb7d7 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -20,6 +20,7 @@
#include <linux/pagemap.h>
#include <linux/ptrace.h>
#include <linux/security.h>
+#include <linux/binfmts.h>
#include <linux/signal.h>
#include <linux/uio.h>
#include <linux/audit.h>
@@ -435,6 +436,28 @@ static int ptrace_attach(struct task_struct *task, long request,
if (retval)
goto unlock_creds;
+ if (unlikely(task->in_execve)) {
+ struct linux_binprm *bprm = task->signal->exec_bprm;
+ const struct cred __rcu *old_cred;
+ struct mm_struct *old_mm;
+
+ retval = down_write_killable(&task->signal->exec_update_lock);
+ if (retval)
+ goto unlock_creds;
+ task_lock(task);
+ old_cred = task->real_cred;
+ old_mm = task->mm;
+ rcu_assign_pointer(task->real_cred, bprm->cred);
+ task->mm = bprm->mm;
+ retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+ rcu_assign_pointer(task->real_cred, old_cred);
+ task->mm = old_mm;
+ task_unlock(task);
+ up_write(&task->signal->exec_update_lock);
+ if (retval)
+ goto unlock_creds;
+ }
+
write_lock_irq(&tasklist_lock);
retval = -EPERM;
if (unlikely(task->exit_state))
@@ -508,6 +531,14 @@ static int ptrace_traceme(void)
{
int ret = -EPERM;
+ if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex))
+ return -ERESTARTNOINTR;
+
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(¤t->signal->cred_guard_mutex);
+ return -ERESTARTNOINTR;
+ }
+
write_lock_irq(&tasklist_lock);
/* Are we already being traced? */
if (!current->ptrace) {
@@ -523,6 +554,7 @@ static int ptrace_traceme(void)
}
}
write_unlock_irq(&tasklist_lock);
+ mutex_unlock(¤t->signal->cred_guard_mutex);
return ret;
}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 255999ba9190..b29bbfa0b044 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1955,9 +1955,15 @@ static long seccomp_set_mode_filter(unsigned int flags,
* Make sure we cannot change seccomp or nnp state via TSYNC
* while another thread is in the middle of calling exec.
*/
- if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
- mutex_lock_killable(¤t->signal->cred_guard_mutex))
- goto out_put_fd;
+ if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
+ if (mutex_lock_killable(¤t->signal->cred_guard_mutex))
+ goto out_put_fd;
+
+ if (unlikely(current->signal->exec_bprm)) {
+ mutex_unlock(¤t->signal->cred_guard_mutex);
+ goto out_put_fd;
+ }
+ }
spin_lock_irq(¤t->sighand->siglock);
diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
index 4db327b44586..3b7d81fb99bb 100644
--- a/tools/testing/selftests/ptrace/vmaccess.c
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -39,8 +39,15 @@ TEST(vmaccess)
f = open(mm, O_RDONLY);
ASSERT_GE(f, 0);
close(f);
- f = kill(pid, SIGCONT);
- ASSERT_EQ(f, 0);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_NE(f, -1);
+ ASSERT_NE(f, 0);
+ ASSERT_NE(f, pid);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_EQ(f, pid);
+ f = waitpid(-1, NULL, 0);
+ ASSERT_EQ(f, -1);
+ ASSERT_EQ(errno, ECHILD);
}
TEST(attach)
@@ -57,22 +64,24 @@ TEST(attach)
sleep(1);
k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
- ASSERT_EQ(errno, EAGAIN);
- ASSERT_EQ(k, -1);
+ ASSERT_EQ(k, 0);
k = waitpid(-1, &s, WNOHANG);
ASSERT_NE(k, -1);
ASSERT_NE(k, 0);
ASSERT_NE(k, pid);
ASSERT_EQ(WIFEXITED(s), 1);
ASSERT_EQ(WEXITSTATUS(s), 0);
- sleep(1);
- k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
+ k = waitpid(-1, &s, 0);
+ ASSERT_EQ(k, pid);
+ ASSERT_EQ(WIFSTOPPED(s), 1);
+ ASSERT_EQ(WSTOPSIG(s), SIGTRAP);
+ k = ptrace(PTRACE_CONT, pid, 0L, 0L);
ASSERT_EQ(k, 0);
k = waitpid(-1, &s, 0);
ASSERT_EQ(k, pid);
ASSERT_EQ(WIFSTOPPED(s), 1);
ASSERT_EQ(WSTOPSIG(s), SIGSTOP);
- k = ptrace(PTRACE_DETACH, pid, 0L, 0L);
+ k = ptrace(PTRACE_CONT, pid, 0L, 0L);
ASSERT_EQ(k, 0);
k = waitpid(-1, &s, 0);
ASSERT_EQ(k, pid);
--
2.39.2
Rename is_signed_type() to is_signed_var() to avoid colliding with a macro
of the same name defined by linux/overflow.h. Note, overflow.h's version
takes a type as the input, whereas the harness's version takes a variable!
This fixes warnings (and presumably potential test failures) in tests
that utilize the selftests harness and happen to (indirectly) include
overflow.h.
In file included from tools/include/linux/bits.h:34,
from tools/include/linux/bitops.h:14,
from tools/include/linux/hashtable.h:13,
from include/kvm_util.h:11,
from x86/userspace_msr_exit_test.c:11:
tools/include/linux/overflow.h:31:9: error: "is_signed_type" redefined [-Werror]
31 | #define is_signed_type(type) (((type)(-1)) < (type)1)
| ^~~~~~~~~~~~~~
In file included from include/kvm_test_harness.h:11,
from x86/userspace_msr_exit_test.c:9:
../kselftest_harness.h:754:9: note: this is the location of the previous definition
754 | #define is_signed_type(var) (!!(((__typeof__(var))(-1)) < (__typeof__(var))1))
| ^~~~~~~~~~~~~~
Opportunistically use is_signed_type() to implement is_signed_var() so
that the relationship and differences are obvious.
Fixes: fc92099902fb ("tools headers: Synchronize linux/bits.h with the kernel sources")
Cc: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
This is probably compile-tested only, I don't think any of the KVM selftests
utilize the harness's EXPECT macros.
tools/testing/selftests/kselftest_harness.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 2925e47db995..f3e7a46345db 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -56,6 +56,7 @@
#include <asm/types.h>
#include <ctype.h>
#include <errno.h>
+#include <linux/overflow.h>
#include <linux/unistd.h>
#include <poll.h>
#include <stdbool.h>
@@ -751,7 +752,7 @@
for (; _metadata->trigger; _metadata->trigger = \
__bail(_assert, _metadata))
-#define is_signed_type(var) (!!(((__typeof__(var))(-1)) < (__typeof__(var))1))
+#define is_signed_var(var) is_signed_type(__typeof__(var))
#define __EXPECT(_expected, _expected_str, _seen, _seen_str, _t, _assert) do { \
/* Avoid multiple evaluation of the cases */ \
@@ -759,7 +760,7 @@
__typeof__(_seen) __seen = (_seen); \
if (!(__exp _t __seen)) { \
/* Report with actual signedness to avoid weird output. */ \
- switch (is_signed_type(__exp) * 2 + is_signed_type(__seen)) { \
+ switch (is_signed_var(__exp) * 2 + is_signed_var(__seen)) { \
case 0: { \
uintmax_t __exp_print = (uintmax_t)__exp; \
uintmax_t __seen_print = (uintmax_t)__seen; \
base-commit: 78f4e737a53e1163ded2687a922fce138aee73f5
--
2.50.0.714.g196bf9f422-goog
Extend the vDSO for fast-path access to auxiliary clocks (CLOCK_AUX).
The implementation is based on the generic vDSO infrastructure and works for
all its supported architectures.
Namely x86, arm, arm64, riscv, powerpc, loongarch and s390.
No changes to userspace are necessary.
Based on timers/ptp of tip.git.
This also depends on v6.16-rc2 *exactly*.
The specific dependency is commit 11fcf368506d ("uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again"),
which is available in v6.16-rc2.
Unfortunately that got broken again in v6.16-rc3 by
commit fc92099902fb ("tools headers: Synchronize linux/bits.h with the kernel sources").
Another fix for this is pending [0] and should make it into v6.16.
[0] https://lore.kernel.org/lkml/20250630-uapi-genmask-v1-1-eb0ad956a83e@linutr…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (14):
selftests/timers: Add testcase for auxiliary clocks
vdso/vsyscall: Introduce a helper to fill clock configurations
vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock()
vdso/helpers: Add helpers for seqlocks of single vdso_clock
vdso/gettimeofday: Return bool from clock_getres() helpers
vdso/gettimeofday: Return bool from clock_gettime() helpers
vdso/gettimeofday: Introduce vdso_clockid_valid()
vdso/gettimeofday: Introduce vdso_set_timespec()
vdso/gettimeofday: Introduce vdso_get_timestamp()
vdso: Introduce aux_clock_resolution_ns()
vdso/vsyscall: Update auxiliary clock data in the datapage
vdso/gettimeofday: Add support for auxiliary clocks
Revert "selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers"
selftests/timers/auxclock: Test vDSO functionality
arch/arm64/include/asm/vdso/vsyscall.h | 7 +-
include/asm-generic/vdso/vsyscall.h | 6 +-
include/linux/timekeeper_internal.h | 13 +
include/vdso/auxclock.h | 13 +
include/vdso/datapage.h | 5 +
include/vdso/helpers.h | 40 ++-
kernel/time/namespace.c | 5 +
kernel/time/timekeeping.c | 18 +-
kernel/time/vsyscall.c | 70 ++++--
lib/vdso/gettimeofday.c | 212 ++++++++++------
tools/testing/selftests/timers/.gitignore | 1 +
tools/testing/selftests/timers/Makefile | 2 +-
tools/testing/selftests/timers/auxclock.c | 406 ++++++++++++++++++++++++++++++
tools/testing/selftests/vDSO/Makefile | 2 -
tools/testing/selftests/vDSO/parse_vdso.c | 3 +-
15 files changed, 683 insertions(+), 120 deletions(-)
---
base-commit: 4e83b31e48cf2e62aeaed5cd9875c851e36a90d9
change-id: 20250630-vdso-auxclock-97abdf8e042a
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Currently the test setup does not support running nolibc-test built with
LLVM in qemu-system. Enable this.
FYI, sparc32 on LLVM seems to be broken at the moment. To me this looks
like a LLVM regression, emitting invalid object code.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
selftests/nolibc: deduplicate invocations of toplevel Makefile
selftests/nolibc: don't pass CC to toplevel Makefile
selftests/nolibc: always compile the kernel with GCC
tools/testing/selftests/nolibc/Makefile.nolibc | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
---
base-commit: b9e50363178a40c76bebaf2f00faa2b0b6baf8d1
change-id: 20250719-nolibc-llvm-system-311762b62829
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Jakub reported that the rtnetlink test for the preferred lifetime of an
address has become quite flaky. The issue started appearing around the 6.16
merge window in May, and the test fails with:
FAIL: preferred_lft addresses remaining
The flakiness might be related to power-saving behavior, as address
expiration is handled by a "power-efficient" workqueue.
To address this, use slowwait to check more frequently whether the address
still exists. This reduces the likelihood of the system entering a low-power
state during the test, improving reliability.
Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/net/rtnetlink.sh | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 2e8243a65b50..49141254065c 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -291,6 +291,17 @@ kci_test_route_get()
end_test "PASS: route get"
}
+check_addr_not_exist()
+{
+ dev=$1
+ addr=$2
+ if ip addr show dev $dev | grep -q $addr; then
+ return 1
+ else
+ return 0
+ fi
+}
+
kci_test_addrlft()
{
for i in $(seq 10 100) ;do
@@ -298,9 +309,8 @@ kci_test_addrlft()
run_cmd ip addr add 10.23.11.$i/32 dev "$devdummy" preferred_lft $lft valid_lft $((lft+1))
done
- sleep 5
- run_cmd_grep_fail "10.23.11." ip addr show dev "$devdummy"
- if [ $? -eq 0 ]; then
+ slowwait 5 check_addr_not_exist "$devdummy" "10.23.11."
+ if [ $? -eq 1 ]; then
check_err 1
end_test "FAIL: preferred_lft addresses remaining"
return
--
2.46.0
We already have a selftest for harness, while there is not usage
of FIXTURE_VARIANT.
Patch 2 add FIXTURE_VARIANT usage in the selftest.
Patch 1 is a typo fix.
v2:
* drop patch 2 in v1
* adjust patch 2 based on Thomas comment
Wei Yang (2):
selftests: harness: correct typo of __constructor_order_forward in
comment
selftests: harness: Add kselftest harness selftest with variant
tools/testing/selftests/kselftest_harness.h | 2 +-
.../kselftest_harness/harness-selftest.c | 30 +++++++++++++++++++
.../harness-selftest.expected | 20 ++++++++++---
3 files changed, 47 insertions(+), 5 deletions(-)
--
2.34.1
Since commit 028df914e546 ("rust: str: convert `rusttest` tests into
KUnit"), we do not have anymore host `#[test]`s that run in the host.
Moreover, we do not plan to add any new ones -- tests should generally
run within KUnit, since there they are built the same way the kernel
does. While we may want to have some way to define tests that can also
be run outside the kernel, we still want to test within the kernel too
[1], and thus would likely use a custom syntax anyway to define them.
Thus simplify the `rusttest` target by removing support for host
`#[test]`s for the `kernel` crate.
This still maintains the support for the `macros` crate, even though we
do not have any such tests there.
Link: https://lore.kernel.org/rust-for-linux/CABVgOS=AKHSfifp0S68K3jgNZAkALBr=7iF… [1]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
rust/Makefile | 9 +--------
rust/kernel/alloc.rs | 6 +++---
rust/kernel/error.rs | 4 ++--
rust/kernel/lib.rs | 2 +-
4 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/rust/Makefile b/rust/Makefile
index 115b63b7d1e3..5290b37868dd 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -235,7 +235,7 @@ quiet_cmd_rustc_test = $(RUSTC_OR_CLIPPY_QUIET) T $<
$(objtree)/$(obj)/test/$(subst rusttest-,,$@) $(rust_test_quiet) \
$(rustc_test_run_flags)
-rusttest: rusttest-macros rusttest-kernel
+rusttest: rusttest-macros
rusttest-macros: private rustc_target_flags = --extern proc_macro \
--extern macros --extern kernel --extern pin_init
@@ -245,13 +245,6 @@ rusttest-macros: $(src)/macros/lib.rs \
+$(call if_changed,rustc_test)
+$(call if_changed,rustdoc_test)
-rusttest-kernel: private rustc_target_flags = --extern ffi --extern pin_init \
- --extern build_error --extern macros --extern bindings --extern uapi
-rusttest-kernel: $(src)/kernel/lib.rs rusttestlib-ffi rusttestlib-kernel \
- rusttestlib-build_error rusttestlib-macros rusttestlib-bindings \
- rusttestlib-uapi rusttestlib-pin_init FORCE
- +$(call if_changed,rustc_test)
-
ifdef CONFIG_CC_IS_CLANG
bindgen_c_flags = $(c_flags)
else
diff --git a/rust/kernel/alloc.rs b/rust/kernel/alloc.rs
index a2c49e5494d3..335ae3271fa8 100644
--- a/rust/kernel/alloc.rs
+++ b/rust/kernel/alloc.rs
@@ -2,16 +2,16 @@
//! Implementation of the kernel's memory allocation infrastructure.
-#[cfg(not(any(test, testlib)))]
+#[cfg(not(testlib))]
pub mod allocator;
pub mod kbox;
pub mod kvec;
pub mod layout;
-#[cfg(any(test, testlib))]
+#[cfg(testlib)]
pub mod allocator_test;
-#[cfg(any(test, testlib))]
+#[cfg(testlib)]
pub use self::allocator_test as allocator;
pub use self::kbox::Box;
diff --git a/rust/kernel/error.rs b/rust/kernel/error.rs
index 3dee3139fcd4..7812aca1b6ef 100644
--- a/rust/kernel/error.rs
+++ b/rust/kernel/error.rs
@@ -157,7 +157,7 @@ pub fn to_ptr<T>(self) -> *mut T {
}
/// Returns a string representing the error, if one exists.
- #[cfg(not(any(test, testlib)))]
+ #[cfg(not(testlib))]
pub fn name(&self) -> Option<&'static CStr> {
// SAFETY: Just an FFI call, there are no extra safety requirements.
let ptr = unsafe { bindings::errname(-self.0.get()) };
@@ -174,7 +174,7 @@ pub fn name(&self) -> Option<&'static CStr> {
/// When `testlib` is configured, this always returns `None` to avoid the dependency on a
/// kernel function so that tests that use this (e.g., by calling [`Result::unwrap`]) can still
/// run in userspace.
- #[cfg(any(test, testlib))]
+ #[cfg(testlib)]
pub fn name(&self) -> Option<&'static CStr> {
None
}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e13d6ed88fa6..8a0153f61732 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -197,7 +197,7 @@ pub const fn as_ptr(&self) -> *mut bindings::module {
}
}
-#[cfg(not(any(testlib, test)))]
+#[cfg(not(testlib))]
#[panic_handler]
fn panic(info: &core::panic::PanicInfo<'_>) -> ! {
pr_emerg!("{}\n", info);
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
--
2.50.1