This patch set enables the Intel flexible return and event delivery
(FRED) architecture with KVM VMX to allow guests to utilize FRED.
The FRED architecture defines simple new transitions that change
privilege level (ring transitions). The FRED architecture was
designed with the following goals:
1) Improve overall performance and response time by replacing event
delivery through the interrupt descriptor table (IDT event
delivery) and event return by the IRET instruction with lower
latency transitions.
2) Improve software robustness by ensuring that event delivery
establishes the full supervisor context and that event return
establishes the full user context.
The new transitions defined by the FRED architecture are FRED event
delivery and, for returning from events, two FRED return instructions.
FRED event delivery can effect a transition from ring 3 to ring 0, but
it is used also to deliver events incident to ring 0. One FRED
instruction (ERETU) effects a return from ring 0 to ring 3, while the
other (ERETS) returns while remaining in ring 0. Collectively, FRED
event delivery and the FRED return instructions are FRED transitions.
The Intel VMX architecture is extended to run FRED guests. The major
changes are:
1) New VMCS fields for FRED context management: two new event data
VMCS fields, eight new guest FRED context VMCS fields and eight new
host FRED context VMCS fields.
2) VMX nested-exception support for proper virtualization of stack
levels introduced with FRED architecture.
Search for the latest FRED spec in most search engines with this search
pattern:
site:intel.com FRED (flexible return and event delivery) specification
As the native FRED patches are committed in the tip tree "x86/fred"
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/fred,
and we have received a good amount of review comments for v1, it's time
to send out v2 based on this branch for further help from the community.
Patches 1-2 are cleanups to VMX basic and misc MSRs, which were sent
out earlier as a preparation for FRED changes:
https://lore.kernel.org/kvm/20240206182032.1596-1-xin3.li@intel.com/T/#u
Patches 3-15 add FRED support to VMX.
Patches 16-21 add FRED support to nested VMX.
Patch 22 exposes FRED and its baseline features to KVM guests.
Patches 23-25 add FRED selftests.
There is also a counterpart QEMU patch set for FRED at:
https://lore.kernel.org/qemu-devel/20231109072012.8078-1-xin3.li@intel.com/…,
which works with this patch set to allow KVM to run FRED guests.
Changes since v1:
* Always load the secondary VM exit controls (Sean Christopherson).
* Remove FRED VM entry/exit controls consistency checks in
setup_vmcs_config() (Sean Christopherson).
* Clear FRED VM entry/exit controls if FRED is not enumerated (Chao Gao).
* Use guest_can_use() to track FRED enumeration in a vCPU (Chao Gao).
* Enable FRED MSRs intercept if FRED is no longer enumerated in CPUID
(Chao Gao).
* Move guest FRED states init into __vmx_vcpu_reset() (Chao Gao).
* Don't use guest_cpuid_has() in vmx_prepare_switch_to_{host,guest}(),
which are called from IRQ-disabled context (Chao Gao).
* Reset msr_guest_fred_rsp0 in __vmx_vcpu_reset() (Chao Gao).
* Fail host-requested FRED MSR accesses if KVM cannot virtualize FRED
(Chao Gao).
* Handle the case where FRED MSRs are valid but KVM cannot virtualize FRED
(Chao Gao).
* Add sanity checks when writing to FRED MSRs.
* Explain why it is ok to only check CR4.FRED in kvm_is_fred_enabled()
(Chao Gao).
* Document that event data should be equal to CR2/DR6/IA32_XFD_ERR instead
of using WARN_ON() (Chao Gao).
* Zero event data if a #NM was not caused by extended feature disable
(Chao Gao).
* Set the nested flag when there is an original interrupt (Chao Gao).
* Dump guest FRED states only if guest has FRED enabled (Nikolay Borisov).
* Add a prerequisite to SHADOW_FIELD_R[OW] macros
* Remove hyperv TLFS related changes (Jeremi Piotrowski).
* Use kvm_cpu_cap_has() instead of cpu_feature_enabled() to decouple
KVM's capability to virtualize a feature and host's enabling of a
feature (Chao Gao).
Xin Li (25):
KVM: VMX: Cleanup VMX basic information defines and usages
KVM: VMX: Cleanup VMX misc information defines and usages
KVM: VMX: Add support for the secondary VM exit controls
KVM: x86: Mark CR4.FRED as not reserved
KVM: VMX: Initialize FRED VM entry/exit controls in vmcs_config
KVM: VMX: Defer enabling FRED MSRs save/load until after set CPUID
KVM: VMX: Set intercept for FRED MSRs
KVM: VMX: Initialize VMCS FRED fields
KVM: VMX: Switch FRED RSP0 between host and guest
KVM: VMX: Add support for FRED context save/restore
KVM: x86: Add kvm_is_fred_enabled()
KVM: VMX: Handle FRED event data
KVM: VMX: Handle VMX nested exception for FRED
KVM: VMX: Disable FRED if FRED consistency checks fail
KVM: VMX: Dump FRED context in dump_vmcs()
KVM: VMX: Invoke vmx_set_cpu_caps() before nested setup
KVM: nVMX: Add support for the secondary VM exit controls
KVM: nVMX: Add a prerequisite to SHADOW_FIELD_R[OW] macros
KVM: nVMX: Add FRED VMCS fields
KVM: nVMX: Add support for VMX FRED controls
KVM: nVMX: Add VMCS FRED states checking
KVM: x86: Allow FRED/LKGS/WRMSRNS to be exposed to guests
KVM: selftests: Run debug_regs test with FRED enabled
KVM: selftests: Add a new VM guest mode to run user level code
KVM: selftests: Add fred exception tests
Documentation/virt/kvm/x86/nested-vmx.rst | 19 +
arch/x86/include/asm/kvm_host.h | 8 +-
arch/x86/include/asm/msr-index.h | 15 +-
arch/x86/include/asm/vmx.h | 59 ++-
arch/x86/kvm/cpuid.c | 4 +-
arch/x86/kvm/governed_features.h | 1 +
arch/x86/kvm/kvm_cache_regs.h | 17 +
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/vmx/capabilities.h | 30 +-
arch/x86/kvm/vmx/nested.c | 329 ++++++++++++---
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmcs.h | 1 +
arch/x86/kvm/vmx/vmcs12.c | 19 +
arch/x86/kvm/vmx/vmcs12.h | 38 ++
arch/x86/kvm/vmx/vmcs_shadow_fields.h | 80 ++--
arch/x86/kvm/vmx/vmx.c | 385 +++++++++++++++---
arch/x86/kvm/vmx/vmx.h | 15 +-
arch/x86/kvm/x86.c | 103 ++++-
arch/x86/kvm/x86.h | 5 +-
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 1 +
.../selftests/kvm/include/x86_64/processor.h | 36 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 5 +-
.../selftests/kvm/lib/x86_64/processor.c | 15 +-
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 4 +-
.../testing/selftests/kvm/x86_64/debug_regs.c | 50 ++-
.../testing/selftests/kvm/x86_64/fred_test.c | 297 ++++++++++++++
27 files changed, 1320 insertions(+), 223 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/fred_test.c
base-commit: e13841907b8fda0ae0ce1ec03684665f578416a8
--
2.43.0
During the review of the iommufd PASID series, Kevin and Jason suggested
attaching the PASID to the blocked domain, thereby replacing the usage of
the remove_dev_pasid() op [1]. This makes sense because it aligns the PASID
path with the RID path, which attaches the RID to the blocked_domain
when it is to be blocked. Doing so requires passing the old domain
to the iommu driver, which has been done in [2].
This series makes the Intel iommu driver and the ARM SMMUv3 driver support
attaching a PASID to the blocked domain. The AMD iommu driver does
not have a blocked domain yet, so it still uses the remove_dev_pasid() op.
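The intended fallback can be modelled roughly as follows. This is a toy Python sketch, not kernel code: the `Driver` class, `block_pasid()` and the string results are invented for illustration. Only the remove_dev_pasid() op and the idea of attaching the blocked domain come from this cover letter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Driver:
    """Toy stand-in for an iommu driver's ops (hypothetical names)."""

    name: str
    # Set when the driver's blocked domain can be attached to a PASID.
    blocked_set_dev_pasid: Optional[Callable[[int, str], str]] = None
    # Legacy op, still needed by drivers without a blocked domain.
    remove_dev_pasid: Optional[Callable[[int, str], str]] = None


def block_pasid(drv: Driver, pasid: int, old_domain: str) -> str:
    """Prefer attaching the blocked domain; otherwise use the legacy op."""
    if drv.blocked_set_dev_pasid is not None:
        return drv.blocked_set_dev_pasid(pasid, old_domain)
    if drv.remove_dev_pasid is not None:
        return drv.remove_dev_pasid(pasid, old_domain)
    raise RuntimeError(f"{drv.name}: no way to block PASID {pasid}")


# A driver with a PASID-capable blocked domain, and one without.
vtd = Driver("vt-d", blocked_set_dev_pasid=lambda p, old: f"blocked({p}) was {old}")
amd = Driver("amd", remove_dev_pasid=lambda p, old: f"removed({p}) from {old}")
```

In this model the old domain is handed to whichever path is taken, mirroring the requirement that the driver learns which domain is being replaced.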
[1] https://lore.kernel.org/linux-iommu/20240816130202.GB2032816@nvidia.com/
[2] https://lore.kernel.org/linux-iommu/20240912130427.10119-1-yi.l.liu@intel.c…
Regards,
Yi Liu
Jason Gunthorpe (1):
iommu/arm-smmu-v3: Make smmuv3 blocked domain support PASID
Yi Liu (2):
iommu/vt-d: Make blocked domain support PASID
iommu: Add a wrapper for remove_dev_pasid
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 12 ++++-----
drivers/iommu/intel/iommu.c | 17 ++++++++----
drivers/iommu/iommu.c | 30 ++++++++++++++++-----
3 files changed, 42 insertions(+), 17 deletions(-)
--
2.34.1
This series adds support for the C-SKY architecture to nolibc.
It is hard to find a usable C-SKY userspace and compiler, so having
support in nolibc provides an easy way to perform tests there.
The nolibc test suite requires system power off support in QEMU,
so a driver for that is added, too.
I'm not sure who is responsible for drivers/virt/ and can take the
driver.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (5):
drivers/virt: introduce csky_exit system poweroff driver
tools/nolibc: provide a fallback for lseek through llseek
selftests/nolibc: add support to use standalone kernels for tests
tools/nolibc: add csky support
selftests/nolibc: skip test for getppid() on csky
drivers/virt/Kconfig | 11 ++
drivers/virt/Makefile | 1 +
drivers/virt/csky_exit.c | 57 ++++++++++
tools/include/nolibc/arch-csky.h | 161 +++++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/include/nolibc/sys.h | 8 ++
tools/testing/selftests/nolibc/Makefile | 21 +++-
tools/testing/selftests/nolibc/nolibc-test.c | 9 +-
8 files changed, 265 insertions(+), 5 deletions(-)
---
base-commit: e7ed343658792771cf1b868df061661b7bcc5cef
change-id: 20240928-nolibc-csky-eff1104825d2
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi all,
This patch was developed during a hackathon organized by LKCAMP [1],
with the objective of writing KUnit tests, both to introduce people to
the kernel development process and to learn about different subsystems
(with the positive side effect of improving the kernel test coverage, of
course).
We noticed there were tests for CRC32 in lib/crc32test.c and thought it
would be nice to have something similar for CRC16, since it seems to be
widely used in network drivers (as well as in some ext4 code).
Although this patch turned out quite big, most of its lines are
tables of randomly generated test data that we use to validate
the kernel's implementation of CRC-16.
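For readers unfamiliar with the approach, here is a minimal Python sketch of validating a table-driven CRC-16 against a bit-by-bit reference on random data. It is not the KUnit test itself. It assumes the kernel's crc16() is the common reflected CRC-16 with polynomial 0x8005 (0xA001 when bit-reversed) and a caller-supplied initial value; the function names are made up for this example.

```python
import random

POLY_REFLECTED = 0xA001  # bit-reversed form of 0x8005 (x^16 + x^15 + x^2 + 1)


def crc16_bitwise(crc: int, data: bytes) -> int:
    """Reference implementation: process one bit at a time."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc & 0xFFFF


# 256-entry lookup table derived from the bitwise reference.
TABLE = [crc16_bitwise(0, bytes([i])) for i in range(256)]


def crc16_table(crc: int, data: bytes) -> int:
    """Table-driven variant: process one byte at a time."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc & 0xFFFF


# Compare both implementations on randomly generated buffers and seeds.
rng = random.Random(0)
for _ in range(100):
    buf = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 64)))
    seed = rng.randrange(0x10000)
    assert crc16_bitwise(seed, buf) == crc16_table(seed, buf)
```

With an initial value of 0, this variant gives 0xBB3D for the ASCII string "123456789", the published check value for CRC-16/ARC.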
We would really appreciate any feedback/suggestions on how to improve
this. Thanks! :-)
Vinicius Peixoto (1):
lib/crc16_kunit.c: add KUnit tests for crc16
lib/Kconfig.debug | 8 +
lib/Makefile | 1 +
lib/crc16_kunit.c | 715 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 724 insertions(+)
create mode 100644 lib/crc16_kunit.c
--
2.43.0
While reading the code, I found that these variables are never
referenced. Remove them.
Signed-off-by: Ba Jing <bajing(a)cmss.chinamobile.com>
---
Notes:
v1: https://lore.kernel.org/all/20240903034300.10443-1-bajing@cmss.chinamobile.…
v2: Modify the commit subject and commit log.
tools/testing/selftests/damon/access_memory_even.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/damon/access_memory_even.c b/tools/testing/selftests/damon/access_memory_even.c
index 3be121487432..a9f4e9aaf3a9 100644
--- a/tools/testing/selftests/damon/access_memory_even.c
+++ b/tools/testing/selftests/damon/access_memory_even.c
@@ -14,10 +14,8 @@
 int main(int argc, char *argv[])
 {
 	char **regions;
-	clock_t start_clock;
 	int nr_regions;
 	int sz_region;
-	int access_time_ms;
 	int i;
 	if (argc != 3) {
--
2.33.0
Hi Linus,
Please pull this fixes update for Linux 6.12-rc1.
This kselftest fixes update for Linux 6.12-rc1 consists of an urgent
fix to vDSO as automated testing is failing due to this bug.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a0474b8d5974e142461ac7584c996feea167bcc1:
selftests: kselftest: Use strerror() on nolibc (2024-09-11 09:52:33 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-next-6.12-rc1-fixes
for you to fetch changes up to 4b721fcc094e9eb6dd4702df8d79ab11e120833d:
selftests: vDSO: align stack for O2-optimized memcpy (2024-09-27 12:17:12 -0600)
----------------------------------------------------------------
linux_kselftest-next-6.12-rc1-fixes
This kselftest fixes update for Linux 6.12-rc1 consists of an urgent
fix to vDSO as automated testing is failing due to this bug.
----------------------------------------------------------------
Jason A. Donenfeld (1):
selftests: vDSO: align stack for O2-optimized memcpy
tools/testing/selftests/vDSO/vdso_standalone_test_x86.c | 2 ++
1 file changed, 2 insertions(+)
----------------------------------------------------------------
From: Tycho Andersen <tandersen(a)netflix.com>
Zbigniew mentioned at Linux Plumbers that systemd is interested in
switching to execveat() for service execution, but can't, because
/proc/pid/comm then contains the number of the file descriptor that was
used instead of the name of the binary. This makes the output of tools like
top and ps useless, especially in a world where most fds are opened
CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have
allocated one of these synthetic paths in bprm_init(). This way the actual
exec machinery is unchanged, but cosmetically the comm looks reasonable to
admins investigating things.
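The effect on comm can be illustrated with a small Python model (not kernel code). It assumes that the synthetic path built for an fd-based execveat() has the form /dev/fd/<N> and that TASK_COMM_LEN is 16; the function names are made up for this sketch.

```python
import posixpath

TASK_COMM_LEN = 16  # comm holds at most 15 characters plus a terminating NUL


def comm_from_filename(filename: str) -> str:
    """Old behaviour: comm is the basename of bprm->filename."""
    return posixpath.basename(filename)[: TASK_COMM_LEN - 1]


def comm_from_dentry(dentry_name: str) -> str:
    """New behaviour for fd-based exec: comm is the dentry name of the file."""
    return dentry_name[: TASK_COMM_LEN - 1]


# execveat(3, "", ..., AT_EMPTY_PATH) on a binary named "systemd-journald"
print(comm_from_filename("/dev/fd/3"))
print(comm_from_dentry("systemd-journald"))
```

The first call yields only the fd number, which is what top and ps would show without the fix-up; the second yields the binary's name, truncated to fit comm.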
Signed-off-by: Tycho Andersen <tandersen(a)netflix.com>
Suggested-by: Zbigniew Jędrzejewski-Szmek <zbyszek(a)in.waw.pl>
CC: Aleksa Sarai <cyphar(a)cyphar.com>
Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
---
v2: * drop the flag, everyone :)
* change the rendered value to f_path.dentry->d_name.name instead of
argv[0], Eric
---
fs/exec.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index dad402d55681..9520359a8dcc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm)
 		set_dumpable(current->mm, SUID_DUMP_USER);
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If fdpath was set, execveat() made up a path that will
+	 * probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->fdpath) {
+		BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
+		__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
+	} else {
+		__set_task_comm(me, kbasename(bprm->filename), true);
+	}
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
base-commit: baeb9a7d8b60b021d907127509c44507539c15e5
--
2.34.1
Introduce a new test to identify regressions causing devices to go
missing on the system.
For each bus and class on the system the test checks the number of
devices present against a reference file, which needs to have been
generated by the program at a previous point on a known-good kernel, and
if there are missing devices they are reported.
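The core check can be sketched in a few lines of Python. This is a simplified illustration, not the test program from the patch: the function name and the sample counts are hypothetical, and only the per-subsystem count comparison is modelled.

```python
def missing_per_subsystem(reference, current):
    """Map each subsystem to the number of devices it lost.

    Both arguments map a subsystem name to a device count. Subsystems with
    the same or a higher count than the reference are not reported.
    """
    missing = {}
    for subsys, expected in reference.items():
        found = current.get(subsys, 0)
        if found < expected:
            missing[subsys] = expected - found
    return missing


# Hypothetical counts: one media device disappeared, one net device appeared.
reference = {"bus.media": 3, "class.net": 2}
current = {"bus.media": 2, "class.net": 3}
print(missing_per_subsystem(reference, current))
```

Because only counts are compared, a subsystem that loses one device and gains another in the same run is not reported, which is why keeping the reference up to date matters.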
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Hi,
For details about the test, please see the README.rst included in the
patch.
This v2 contains changes addressing feedback received on the RFCv1
series, during the session at plumbers [1] and a few other things I
noticed along the way.
[1] https://www.youtube.com/live/kcr8NXEbzcg?si=QWBvJAOjj7tg264o&t=11283
For the open questions I posted in v1, the v2 changelog below should
make it clear what was decided. A few clarifications are needed though:
* I've decided to leave driver probe out of this test to keep it simple
and avoid potential false-positives
* The reference file now includes the full kernel config as part of its
metadata (Example at [2]). This is clunky but seems worth it for the
purposes of reproducibility, and potentially (in the future) choosing
the reference that best matches the running system
[2] https://github.com/kernelci/platform-test-parameters/pull/3/files
Let me know your thoughts.
Thanks,
Nícolas
---
Changes in v2:
- Switched reference format from YAML to JSON
- Introduced metadata to reference file, it includes: kernel version,
kernel configuration and platform identifier
- Added -u flag to allow updating reference file in-place if it is a
superset
- Added -f flag to allow specifying filename of the reference
- Added a few device properties (., device, firmware_node, driver)
- Un-ignored devlink device class
- Refactored code to improve legibility
- Added README.rst with documentation
- Renamed test from exist.py to test_dev_exist.py
- Link to v1: https://lore.kernel.org/r/20240724-kselftest-dev-exist-v1-1-9bc21aa761b5@co…
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/exist/Makefile | 3 +
tools/testing/selftests/devices/exist/README.rst | 146 +++++++++
.../selftests/devices/exist/test_dev_exist.py | 357 +++++++++++++++++++++
4 files changed, 507 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b38199965f99..eacf4b062f01 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += cpufreq
TARGETS += cpu-hotplug
TARGETS += damon
TARGETS += devices/error_logs
+TARGETS += devices/exist
TARGETS += devices/probe
TARGETS += dmabuf-heaps
TARGETS += drivers/dma-buf
diff --git a/tools/testing/selftests/devices/exist/Makefile b/tools/testing/selftests/devices/exist/Makefile
new file mode 100644
index 000000000000..df85f661aa99
--- /dev/null
+++ b/tools/testing/selftests/devices/exist/Makefile
@@ -0,0 +1,3 @@
+TEST_PROGS := test_dev_exist.py
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/devices/exist/README.rst b/tools/testing/selftests/devices/exist/README.rst
new file mode 100644
index 000000000000..1599204e355d
--- /dev/null
+++ b/tools/testing/selftests/devices/exist/README.rst
@@ -0,0 +1,146 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright (c) 2024 Collabora Ltd
+
+==========================
+Device existence kselftest
+==========================
+
+This test verifies whether all devices still exist on the system when compared
+to a reference run, allowing detection of regressions that cause devices to go
+missing.
+
+TL;DR
+=====
+
+Run ``./test_dev_exist.py -g``, then run ``./test_dev_exist.py``.
+
+Usage
+=====
+
+The test program can be found as ``test_dev_exist.py`` in this directory. Run it
+with the ``--help`` argument to get information for all available arguments.
+Detailed usage follows below.
+
+Reference generation
+--------------------
+
+Before running the test, it is necessary to generate a reference. To do that,
+run it with the ``--generate-reference`` argument. This will generate a JSON
+file encoding all the devices available, per subsystem (class or bus), in the
+running system, as well as metadata about the system (kernel version,
+configuration and system identifiers).
+
+By default, the file will be saved in the current directory and named based on
+the system identifier, but that can be changed through the use of the
+``--reference-dir`` and ``--reference-file`` flags.
+
+Running the test
+----------------
+
+To run the test, simply execute it **without** the ``--generate-reference``
+argument. By default, once again, the test will look for the reference file in
+the current directory and named as the system identifier, but that can be
+changed through the ``--reference-dir`` and ``--reference-file`` flags.
+
+Reading the results
+-------------------
+
+The test outputs in the KTAP format, with one result per subsystem. For each
+failure the output shows the devices that were expected by the reference file,
+the devices that were found in the running system, and a best-effort guess for
+the devices that are missing in the system. For each device, its main properties
+are printed out to help in identifying it.
+
+As an example, below is the snippet printed when one of the three devices in the
+media bus went missing::
+
+ # Missing devices for subsystem 'media': 1 (Expected 3, found 2)
+ # =================
+ # Devices expected:
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8.3/3-8.3.2/3-8.3.2:1.0/media2
+ # uevent:
+ # MAJOR=237
+ # MINOR=2
+ # DEVNAME=media2
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0/media0
+ # uevent:
+ # MAJOR=237
+ # MINOR=0
+ # DEVNAME=media0
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.2/media1
+ # uevent:
+ # MAJOR=237
+ # MINOR=1
+ # DEVNAME=media1
+ #
+ # -----------------
+ # Devices found:
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.0/media0
+ # uevent:
+ # MAJOR=237
+ # MINOR=0
+ # DEVNAME=media0
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-9/3-9:1.2/media1
+ # uevent:
+ # MAJOR=237
+ # MINOR=1
+ # DEVNAME=media1
+ #
+ # -----------------
+ # Devices missing (best guess):
+ #
+ # .:
+ # /sys/devices/pci0000:00/0000:00:14.0/usb3/3-8/3-8.3/3-8.3.2/3-8.3.2:1.0/media2
+ # uevent:
+ # MAJOR=237
+ # MINOR=2
+ # DEVNAME=media2
+ #
+ # =================
+ not ok 67 bus.media
+
+Updating the reference
+----------------------
+
+As time goes on, new devices might be introduced in the system. To replace a
+reference file with a more up-to-date one containing more devices, pass both
+``--generate-reference`` and ``--update-reference`` arguments. The program will
+refuse to replace the reference if the new one doesn't contain all the devices
+in the old reference, as that is usually not desirable.
+
+Caveats
+=======
+
+The test relies solely on the count of devices per subsystem to detect missing
+devices. [#f1]_ That means that it is possible for the test to fail to detect a
+missing device.
+
+For example, if the running system contains one extra device and one missing
+device on the same subsystem compared to the reference, no test will fail since
+the count is the same. To minimize the risk of this happening, it is recommended
+to keep the reference file as up-to-date as possible.
+
+.. [#f1] The reason for this is that there aren't any device properties that are
+ used for every device and that are guaranteed to uniquely identify them and be
+ stable across kernel releases, so any attempt to match devices based on their
+ properties would lead to false-positives.
+
+Pre-existing reference files
+============================
+
+Due to the per-platform nature of the reference files, it is not viable to keep
+them in-tree.
+
+To facilitate running the test, especially by CI systems, a collection of
+pre-existing reference files is kept at
+https://github.com/kernelci/platform-test-parameters.
diff --git a/tools/testing/selftests/devices/exist/test_dev_exist.py b/tools/testing/selftests/devices/exist/test_dev_exist.py
new file mode 100755
index 000000000000..58bff5ea99e7
--- /dev/null
+++ b/tools/testing/selftests/devices/exist/test_dev_exist.py
@@ -0,0 +1,357 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Collabora Ltd
+
+import os
+import sys
+import argparse
+import gzip
+import json
+
+# Allow ksft module to be imported from different directory
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../../kselftest/"))
+
+import ksft
+
+
+def generate_ref_metadata():
+    metadata = {}
+
+    config_file = "/proc/config.gz"
+    if os.path.isfile(config_file):
+        with gzip.open(config_file, "rt") as f:
+            config = f.read()
+            metadata["config"] = config
+
+    metadata["version"] = os.uname().release
+
+    metadata["platform_ids"] = get_possible_ref_filenames()
+
+    return metadata
+
+
+def generate_dev_data():
+    data = {}
+
+    device_subsys_types = [
+        {
+            "type": "class",
+            "base_dir": "/sys/class",
+            "add_path": "",
+            "ignored": [],
+        },
+        {
+            "type": "bus",
+            "base_dir": "/sys/bus",
+            "add_path": "devices",
+            "ignored": [],
+        },
+    ]
+
+    properties = sorted(
+        [
+            ".",
+            "uevent",
+            "name",
+            "device",
+            "firmware_node",
+            "driver",
+            "device/uevent",
+            "firmware_node/uevent",
+        ]
+    )
+
+    for dev_subsys_type in device_subsys_types:
+        subsystems = {}
+        for subsys_name in sorted(os.listdir(dev_subsys_type["base_dir"])):
+            if subsys_name in dev_subsys_type["ignored"]:
+                continue
+
+            devs_path = os.path.join(
+                dev_subsys_type["base_dir"], subsys_name, dev_subsys_type["add_path"]
+            )
+            # Filter out non-symlinks as they're not devices
+            dev_dirs = [dev for dev in os.scandir(devs_path) if dev.is_symlink()]
+            devs_data = []
+            for dev_dir in dev_dirs:
+                dev_path = os.path.join(devs_path, dev_dir)
+                dev_data = {"info": {}}
+                for prop in properties:
+                    prop_path = os.path.join(dev_path, prop)
+                    if os.path.isfile(prop_path):
+                        with open(prop_path) as f:
+                            dev_data["info"][prop] = f.read()
+                    elif os.path.isdir(prop_path):
+                        dev_data["info"][prop] = os.path.realpath(prop_path)
+                devs_data.append(dev_data)
+            if len(dev_dirs):
+                subsystems[subsys_name] = {
+                    "count": len(dev_dirs),
+                    "devices": devs_data,
+                }
+        data[dev_subsys_type["type"]] = subsystems
+
+    return data
+
+
+def generate_reference():
+    return {"metadata": generate_ref_metadata(), "data": generate_dev_data()}
+
+
+def commented(s):
+    return s.replace("\n", "\n# ")
+
+
+def indented(s, n):
+    return " " * n + s.replace("\n", "\n" + " " * n)
+
+
+def stripped(s):
+    return s.strip("\n")
+
+
+def devices_difference(dev1, dev2):
+    difference = 0
+
+    for prop in dev1["info"].keys():
+        for l1, l2 in zip(
+            dev1["info"].get(prop, "").split("\n"),
+            dev2["info"].get(prop, "").split("\n"),
+        ):
+            if l1 != l2:
+                difference += 1
+    return difference
+
+
+def guess_missing_devices(cur_subsys_data, ref_subsys_data):
+    # Greedily pair each device on the current system with its most similar
+    # device in the reference. The reference devices left unpaired are the
+    # most dissimilar ones and therefore most likely the missing ones.
+    found_count = cur_subsys_data["count"]
+    expected_count = ref_subsys_data["count"]
+    missing_count = expected_count - found_count
+
+    diffs = []
+    for cur_d in cur_subsys_data["devices"]:
+        for ref_d in ref_subsys_data["devices"]:
+            diffs.append((devices_difference(cur_d, ref_d), cur_d, ref_d))
+
+    diffs.sort(key=lambda x: x[0])
+
+    assigned_ref_devs = []
+    assigned_cur_devs = []
+    for diff in diffs:
+        if len(assigned_ref_devs) >= expected_count - missing_count:
+            break
+        if diff[1] in assigned_cur_devs or diff[2] in assigned_ref_devs:
+            continue
+        assigned_cur_devs.append(diff[1])
+        assigned_ref_devs.append(diff[2])
+
+    missing_devices = []
+    for d in ref_subsys_data["devices"]:
+        if d not in assigned_ref_devs:
+            missing_devices.append(d)
+
+    return missing_devices
+
+
+def dump_devices_info(cur_subsys_data, ref_subsys_data):
+    def dump_device_info(dev):
+        for name, val in dev["info"].items():
+            ksft.print_msg(indented(name + ":", 2))
+            val = stripped(val)
+            if val:
+                ksft.print_msg(commented(indented(val, 4)))
+        ksft.print_msg("")
+
+    ksft.print_msg("=================")
+    ksft.print_msg("Devices expected:")
+    ksft.print_msg("")
+    for d in ref_subsys_data["devices"]:
+        dump_device_info(d)
+    ksft.print_msg("-----------------")
+    ksft.print_msg("Devices found:")
+    ksft.print_msg("")
+    for d in cur_subsys_data["devices"]:
+        dump_device_info(d)
+    ksft.print_msg("-----------------")
+    ksft.print_msg("Devices missing (best guess):")
+    ksft.print_msg("")
+    missing_devices = guess_missing_devices(cur_subsys_data, ref_subsys_data)
+    for d in missing_devices:
+        dump_device_info(d)
+    ksft.print_msg("=================")
+
+
+def load_reference(ref_filename):
+    with open(ref_filename) as f:
+        ref = json.load(f)
+    return ref
+
+
+def run_test(ref_filename):
+    ksft.print_msg(f"Using reference file: '{ref_filename}'")
+
+    ref_data = load_reference(ref_filename)["data"]
+
+    num_tests = 0
+    for subsys_type in ref_data.values():
+        num_tests += len(subsys_type)
+    ksft.set_plan(num_tests)
+
+    cur_data = generate_dev_data()
+
+    reference_outdated = False
+
+    for subsys_type_name, ref_subsys_type_data in ref_data.items():
+        for subsys_name, ref_subsys_data in ref_subsys_type_data.items():
+            test_name = f"{subsys_type_name}.{subsys_name}"
+            if not (
+                cur_data.get(subsys_type_name)
+                and cur_data.get(subsys_type_name).get(subsys_name)
+            ):
+                ksft.print_msg(f"Device subsystem '{subsys_name}' missing")
+                ksft.test_result_fail(test_name)
+                continue
+            cur_subsys_data = cur_data[subsys_type_name][subsys_name]
+
+            found_count = cur_subsys_data["count"]
+            expected_count = ref_subsys_data["count"]
+            if found_count < expected_count:
+                ksft.print_msg(
+                    f"Missing devices for subsystem '{subsys_name}': {expected_count - found_count} (Expected {expected_count}, found {found_count})"
+                )
+                dump_devices_info(cur_subsys_data, ref_subsys_data)
+                ksft.test_result_fail(test_name)
+            else:
+                ksft.test_result_pass(test_name)
+                if found_count > expected_count:
+                    reference_outdated = True
+
+        if len(cur_data[subsys_type_name]) > len(ref_subsys_type_data):
+            reference_outdated = True
+
+    if reference_outdated:
+        ksft.print_msg(
+            "Warning: The current system contains more devices and/or subsystems than the reference. Updating the reference is recommended."
+        )
+
+
+def ref_is_superset(new_ref_data, old_ref_data):
+    for subsys_type in old_ref_data:
+        for subsys in old_ref_data[subsys_type]:
+            if subsys not in new_ref_data[subsys_type]:
+                return False
+            if (
+                new_ref_data[subsys_type][subsys]["count"]
+                < old_ref_data[subsys_type][subsys]["count"]
+            ):
+                return False
+    return True
+
+
+def get_possible_ref_filenames():
+    filenames = []
+
+    dt_board_compatible_file = "/proc/device-tree/compatible"
+    if os.path.exists(dt_board_compatible_file):
+        with open(dt_board_compatible_file) as f:
+            for line in f:
+                compatibles = [compat for compat in line.split("\0") if compat]
+                filenames.extend(compatibles)
+    else:
+        dmi_id_dir = "/sys/devices/virtual/dmi/id"
+        vendor_dmi_file = os.path.join(dmi_id_dir, "sys_vendor")
+        product_dmi_file = os.path.join(dmi_id_dir, "product_name")
+
+        with open(vendor_dmi_file) as f:
+            vendor = f.read().replace("\n", "")
+        with open(product_dmi_file) as f:
+            product = f.read().replace("\n", "")
+
+        filenames = [vendor + "," + product]
+
+    return filenames
+
+
+def get_ref_filename(ref_dir, should_exist=True):
+    chosen_ref_filename = ""
+    full_ref_paths = [
+        os.path.join(ref_dir, f + ".json") for f in get_possible_ref_filenames()
+    ]
+    if not should_exist:
+        return full_ref_paths[0]
+
+    for path in full_ref_paths:
+        if os.path.exists(path):
+            chosen_ref_filename = path
+            break
+
+    if not chosen_ref_filename:
+        tried_paths = ",".join(["'" + p + "'" for p in full_ref_paths])
+        ksft.print_msg(f"No matching reference file found (tried {tried_paths})")
+        ksft.exit_fail()
+
+    return chosen_ref_filename
+
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+    "--reference-dir",
+    "-d",
+    default=".",
+    help="Directory containing the reference files",
+)
+parser.add_argument(
+    "--reference-file", "-f", help="File name of the reference to read from or write to"
+)
+parser.add_argument(
+    "--generate-reference",
+    "-g",
+    action="store_true",
+    help="Generate a reference file with the devices on the running system",
+)
+parser.add_argument(
+    "--update-reference",
+    "-u",
+    action="store_true",
+    help="Allow overwriting the reference in-place if the existing reference is a subset of the new one",
+)
+args = parser.parse_args()
+
+if args.reference_file:
+    ref_filename = os.path.join(args.reference_dir, args.reference_file)
+    if not os.path.exists(ref_filename) and not args.generate_reference:
+        ksft.print_msg(f"Reference file not found: '{ref_filename}'")
+        ksft.exit_fail()
+else:
+    ref_filename = get_ref_filename(args.reference_dir, not args.generate_reference)
+
+if args.generate_reference:
+    if os.path.exists(ref_filename) and not args.update_reference:
+        print(
+            f"Reference file '{ref_filename}' already exists; won't overwrite; aborting"
+        )
+        sys.exit(1)
+
+    gen_ref = generate_reference()
+    if args.update_reference and os.path.exists(ref_filename):
+        loaded_ref = load_reference(ref_filename)
+        if not ref_is_superset(gen_ref["data"], loaded_ref["data"]):
+            print(
+                f"New reference is not a superset of the existing one; skipping update for '{ref_filename}'"
+            )
+            sys.exit(1)
+
+    with open(ref_filename, "w") as f:
+        json.dump(gen_ref, f, indent=4)
+    print(f"Reference generated to file '{ref_filename}'")
+    sys.exit(0)
+
+ksft.print_header()
+
+run_test(ref_filename)
+
+ksft.finished()
---
base-commit: 40e0c9d414f57d450e3ad03c12765e797fc3fede
change-id: 20240724-kselftest-dev-exist-bb1bcf884654
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Hello all,
This patch series targets a long-standing BPF usability issue - the lack
of general cross-compilation support - by enabling cross-endian usage of
libbpf and bpftool, as well as supporting cross-endian build targets for
selftests/bpf.
Benefits include improved BPF development and testing for embedded systems
based on e.g. big-endian MIPS, more build options e.g. for s390x systems,
and better accessibility to the very latest test tools e.g. 'test_progs'.
The series touches many functional areas: BTF.ext handling; object access,
introspection, and linking; generation of normal and "light" skeletons.
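For readers new to the problem, the core of cross-endian BTF.ext handling is
recognizing which byte order a blob was written in. The following is a minimal
Python sketch of that idea, not the libbpf C implementation. It assumes the
minimal BTF.ext header layout (magic, version, flags, hdr_len, then the
func_info and line_info offset/length pairs) and omits the optional core_relo
fields; the function name and sample values are hypothetical.

```python
import struct

BTF_MAGIC = 0xEB9F  # magic value shared by BTF and BTF.ext headers

# Minimal header: magic(u16) version(u8) flags(u8) hdr_len(u32), followed by
# func_info_off/len and line_info_off/len (u32 each), 24 bytes in total.
HDR_FMT = "HBBIIIII"
HDR_FIELDS = ("magic", "version", "flags", "hdr_len",
              "func_info_off", "func_info_len",
              "line_info_off", "line_info_len")


def parse_btf_ext_header(raw):
    """Return (fields, byteorder) for a header stored in either endianness."""
    for prefix, name in (("<", "little"), (">", "big")):
        values = struct.unpack_from(prefix + HDR_FMT, raw)
        # The magic only matches when decoded in the order it was written.
        if values[0] == BTF_MAGIC:
            return dict(zip(HDR_FIELDS, values)), name
    raise ValueError("not a BTF.ext header: bad magic")


# Hypothetical sample: the same header encoded in both byte orders.
sample = (BTF_MAGIC, 1, 0, 24, 0, 16, 16, 32)
little = struct.pack("<" + HDR_FMT, *sample)
big = struct.pack(">" + HDR_FMT, *sample)
```

Both encodings decode to the same field values, and the second element of the
result says whether byte-swapping would be needed on a given host.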
Initial development and testing used mips64, since this arch makes
switching the build byte-order trivial and is thus very handy for A/B
testing. However, it lacks some key features (bpf2bpf call, kfuncs, etc.),
making for poor selftests/bpf coverage.
Final testing takes the kernel and selftests/bpf cross-built from x86_64
to s390x, and runs the result under QEMU/s390x. That same configuration
could also be used on kernel-patches/bpf CI for regression testing endian
support or perhaps load-sharing s390x builds across x86_64 systems.
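When checking such a cross-build, it can help to confirm that an object file
really targets the other byte order. The sketch below is only an illustration
and is not part of the series: it reads the ELF identification bytes, where
index 5 (EI_DATA) is 1 for little-endian and 2 for big-endian. The helper
names are hypothetical.

```python
import sys

EI_DATA = 5  # index of the data-encoding byte within e_ident
ELF_BYTE_ORDERS = {1: "little", 2: "big"}  # ELFDATA2LSB, ELFDATA2MSB


def elf_byteorder(e_ident):
    """Return 'little' or 'big' given the leading bytes of an ELF file."""
    if len(e_ident) <= EI_DATA or e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    try:
        return ELF_BYTE_ORDERS[e_ident[EI_DATA]]
    except KeyError:
        raise ValueError("unknown ELF data encoding") from None


def is_cross_endian(e_ident, host=sys.byteorder):
    """True when the object's byte order differs from the host's."""
    return elf_byteorder(e_ident) != host
```

In practice the input would be the first 16 bytes of a built object, read in
binary mode; an s390x object inspected on an x86_64 host should report "big"
and be flagged as cross-endian.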
This thread includes some background regarding testing on QEMU/s390x and
the generally favourable results:
https://lore.kernel.org/bpf/ZsEcsaa3juxxQBUf@kodidev-ubuntu/
Earlier versions and related discussion of the series are here:
v1: https://lore.kernel.org/bpf/cover.1724216108.git.tony.ambardar@gmail.com/
v2: https://lore.kernel.org/bpf/cover.1724313164.git.tony.ambardar@gmail.com/
v3: https://lore.kernel.org/bpf/cover.1724843049.git.tony.ambardar@gmail.com/
v4: https://lore.kernel.org/bpf/cover.1724976539.git.tony.ambardar@gmail.com/
v5: https://lore.kernel.org/bpf/cover.1725347944.git.tony.ambardar@gmail.com/
Feedback and suggestions are welcome!
Best regards,
Tony
Changelog:
---------
v5 -> v6: (comments from Andrii, Alexei, Eduard)
- clarify info_blob_bswap() by making it explicitly conditional on
non-native target endianness, and merge a pair of related debug
statements
- reformat debug statement in bpf_object_bswap_progs() on single line
- update existing info setup functions to validate and parse info
section metadata prior to any byte-swapping, and drop earlier added
validation checks
- rework cross-endian BTF.ext handling by using callback functions to
byte-swap different types of info records, but after initial parsing
- fix a bug that always output BTF.ext raw data in native endianness
- include v5 "Acked-by:" from Alexei, Yonghong
v4 -> v5: (feedback from Andrii and Eduard)
- add separate functions to byte-swap info metadata and records, and
ensure ordering so record bswaps occur when metadata is native endian
- use new and existing macros to iterate through info sections/records,
and check that embedded record sizes match those of the info structs used
- drop use of <cough> evil callbacks
- move setting swapped_endian flag to after byte-swapping functions are
called during initialization, allowing funcs to infer endianness and
drop a 'bool native' call parameter
- simplify byte-swapping macro used to generate light skeleton, and use
internal lib funcs to swap info records instead of assuming all __u32
- change info bswap library funcs to void return
- rework/consolidate new debug statements to reduce their number
- remove some unneeded handling of impossible errors, and drop a safety
check already handled elsewhere
- add and clarify some comments
v3 -> v4:
- fix a use-after-free ELF data-handling error causing rare CI failures
- move bswap functions for func/line/core-relo records to internal header
- use bswap functions also for info blobs in light skeleton
v2 -> v3: (feedback from Andrii)
- improve some log and commit message formatting
- restructure BTF.ext endianness safety checks and byte-swapping
- use BTF.ext info record definitions for swapping, require BTF v1
- follow BTF API implementation more closely for BTF.ext
- explicitly reject loading non-native endianness program into kernel
- simplify linker output byte-order setting
- drop redundant safety checks during linking
- simplify endianness macro and improve blob setup code for light skel
- no unexpected test failures after cross-compiling x86_64 -> s390x
v1 -> v2:
- fixed a light skeleton bug causing test_progs 'map_ptr' failure
- simplified some BTF.ext related endianness logic
- remove an 'inline' usage related to CI checkpatch failure
- improve some formatting noted by checkpatch warnings
- unexpected 'test_progs' failures drop 3 -> 2 (x86_64 to s390x cross)
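The ordering constraint noted under v4 -> v5 (record swaps must happen while
the metadata is native-endian) can be illustrated with a small Python sketch.
This is a simplification under stated assumptions, not the libbpf code: the
blob holds a single info section, its metadata is three u32 words
(record_size, sec_name_off, num_info), and every record consists only of u32
fields, as in func_info. All names are hypothetical.

```python
import struct

OTHER = {"<": ">", ">": "<"}
META_WORDS = 3  # record_size, sec_name_off, num_info (simplified layout)


def swap_u32s(buf, offset, count, src):
    """Byte-swap `count` u32 values in place, decoding them in order `src`."""
    values = struct.unpack_from(f"{src}{count}I", buf, offset)
    struct.pack_into(f"{OTHER[src]}{count}I", buf, offset, *values)


def bswap_info(buf, native, is_native):
    """Flip the byte order of the whole blob in place.

    `native` is the host order ('<' or '>'); `is_native` says whether `buf`
    currently holds native-endian data.
    """
    foreign = OTHER[native]
    if not is_native:
        # Foreign input: make the metadata readable before touching records.
        swap_u32s(buf, 0, META_WORDS, foreign)
    # The metadata is native-endian at this point in both directions.
    record_size, _, num_info = struct.unpack_from(f"{native}{META_WORDS}I", buf, 0)
    words = (record_size // 4) * num_info
    swap_u32s(buf, META_WORDS * 4, words, native if is_native else foreign)
    if is_native:
        # Native input: swap the metadata last, once the records are done.
        swap_u32s(buf, 0, META_WORDS, native)
```

Calling the function twice, first with is_native=True and then with
is_native=False, should return the blob to its original bytes, which makes a
convenient round-trip check.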
Tony Ambardar (8):
  libbpf: Improve log message formatting
  libbpf: Fix header comment typos for BTF.ext
  libbpf: Fix output .symtab byte-order during linking
  libbpf: Support BTF.ext loading and output in either endianness
  libbpf: Support opening bpf objects of either endianness
  libbpf: Support linking bpf objects of either endianness
  libbpf: Support creating light skeleton of either endianness
  selftests/bpf: Support cross-endian building
 tools/lib/bpf/bpf_gen_internal.h     |   1 +
 tools/lib/bpf/btf.c                  | 284 +++++++++++++++++++++------
 tools/lib/bpf/btf.h                  |   3 +
 tools/lib/bpf/btf_dump.c             |   2 +-
 tools/lib/bpf/btf_relocate.c         |   2 +-
 tools/lib/bpf/gen_loader.c           | 187 +++++++++++++-----
 tools/lib/bpf/libbpf.c               |  56 ++++--
 tools/lib/bpf/libbpf.map             |   2 +
 tools/lib/bpf/libbpf_internal.h      |  45 ++++-
 tools/lib/bpf/linker.c               |  80 ++++++--
 tools/lib/bpf/relo_core.c            |   2 +-
 tools/lib/bpf/skel_internal.h        |   3 +-
 tools/testing/selftests/bpf/Makefile |   7 +-
 13 files changed, 527 insertions(+), 147 deletions(-)
--
2.34.1