Problem
=======
When host APEI is unable to claim a synchronous external abort (SEA)
during guest abort, today KVM directly injects an asynchronous SError
into the VCPU then resumes it. The injected SError usually results in
unpleasant guest kernel panic.
One of the major situation of guest SEA is when VCPU consumes recoverable
uncorrected memory error (UER), which is not uncommon at all in modern
datacenter servers with large amounts of physical memory. Although SError
and guest panic is sufficient to stop the propagation of corrupted memory,
there is room to recover from an UER in a more graceful manner.
Proposed Solution
=================
The idea is, we can replay the SEA to the faulting VCPU. If the memory
error consumption or the fault that cause SEA is not from guest kernel,
the blast radius can be limited to the poison-consuming guest process,
while the VM can keep running.
In addition, instead of doing under the hood without involving userspace,
there are benefits to redirect the SEA to VMM:
- VM customers care about the disruptions caused by memory errors, and
VMM usually has the responsibility to start the process of notifying
the customers of memory error events in their VMs. For example some
cloud provider emits a critical log in their observability UI [1], and
provides a playbook for customers on how to mitigate disruptions to
their workloads.
- VMM can protect future memory error consumption by unmapping the poisoned
pages from stage-2 page table with KVM userfault [2], or by splitting the
memslot that contains the poisoned pages.
- VMM can keep track of SEA events in the VM. When VMM thinks the status
on the host or the VM is bad enough, e.g. number of distinct SEAs
exceeds a threshold, it can restart the VM on another healthy host.
- Behavior parity with x86 architecture. When machine check exception
(MCE) is caused by VCPU, kernel or KVM signals userspace SIGBUS to
let VMM either recover from the MCE, or terminate itself with VM.
The prior RFC proposes to implement SIGBUS on arm64 as well, but
Marc preferred KVM exit over signal [3]. However, implementation
aside, returning SEA to VMM is on par with returning MCE to VMM.
Once SEA is redirected to VMM, among other actions, VMM is encouraged
to inject external aborts into the faulting VCPU.
New UAPIs
=========
This patchset introduces following userspace-visible changes to empower
VMM to control what happens for SEA on guest memory:
- KVM_CAP_ARM_SEA_TO_USER. While taking SEA, if userspace has enabled
this new capability at VM creation, and the SEA is not owned by kernel
allocated memory, instead of injecting SError, return KVM_EXIT_ARM_SEA
to userspace.
- KVM_EXIT_ARM_SEA. This is the VM exit reason VMM gets. The details
about the SEA is provided in arm_sea as much as possible, including
sanitized ESR value at EL2, faulting guest virtual and physical
addresses if available.
* From v2 [4]:
- Rebased on "[PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection" [5]
and kvmarm/next commit 7b8346bd9fce ("KVM: arm64: Don't attempt vLPI
mappings when vPE allocation is disabled")
- Took the host_owns_sea implementation from Oliver [6, 7].
- Excluded the guest SEA injection patches.
- Updated selftest.
* From v1 [8]:
- Rebased on commit 4d62121ce9b5 ("KVM: arm64: vgic-debug: Avoid
dereferencing NULL ITE pointer").
- Sanitize ESR_EL2 before reporting it to userspace.
- Do not do KVM_EXIT_ARM_SEA when SEA is caused by memory allocated to
stage-2 translation table.
[1] https://cloud.google.com/solutions/sap/docs/manage-host-errors
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@google.com
[3] https://lore.kernel.org/kvm/86pljbqqh0.wl-maz@kernel.org
[4] https://lore.kernel.org/kvm/20250604050902.3944054-1-jiaqiyan@google.com/
[5] https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.…
[6] https://lore.kernel.org/kvm/aHFohmTb9qR_JG1E@linux.dev/#t
[7] https://lore.kernel.org/kvm/aHK-DPufhLy5Dtuk@linux.dev/
[8] https://lore.kernel.org/kvm/20250505161412.1926643-1-jiaqiyan@google.com
Jiaqi Yan (3):
KVM: arm64: VM exit to userspace to handle SEA
KVM: selftests: Test for KVM_EXIT_ARM_SEA
Documentation: kvm: new UAPI for handling SEA
Documentation/virt/kvm/api.rst | 61 ++++
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/arm.c | 5 +
arch/arm64/kvm/mmu.c | 68 +++-
include/uapi/linux/kvm.h | 10 +
tools/arch/arm64/include/asm/esr.h | 2 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/arm64/sea_to_user.c | 327 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
9 files changed, 476 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
--
2.50.1.565.gc32cd1483b-goog
Hi,
While staring at epoll, I noticed ep_events_available() looks wrong. I
wrote a small program to confirm, and yes it is definitely wrong.
This series adds a reproducer to kselftest, and fix the bug.
Nam Cao (2):
selftests/eventpoll: Add test for multiple waiters
eventpoll: Fix epoll_wait() report false negative
fs/eventpoll.c | 16 +------
.../filesystems/epoll/epoll_wakeup_test.c | 45 +++++++++++++++++++
2 files changed, 47 insertions(+), 14 deletions(-)
--
2.39.5
From: Jeff Xu <jeffxu(a)google.com>
Since Linux introduced the memfd feature, memfd have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure by default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP in this kind.
On the other hand, executable memfd has its legit use, runc uses memfd’s
seal and executable feature to copy the contents of the binary then
execute them, for such system, we need a solution to differentiate runc's
use of executable memfds and an attacker's [3].
To address those above, this set of patches add following:
1> Let memfd_create() set X bit at creation time.
2> Let memfd to be sealed for modifying X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
X bit.For example, if a container has vm.memfd_noexec=2, then
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This make it possible to a new
LSM, which rejects or allows executable memfd based on its security policy.
Change history:
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).
v6:https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h
v5:https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c
v3:https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() to set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().
v2:https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of
memfd_create to be non-executable.
v1:https://lwn.net/Articles/890096/
[1] https://crbug.com/1305411
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/
Daniel Verkamp (2):
mm/memfd: add F_SEAL_EXEC
selftests/memfd: add tests for F_SEAL_EXEC
Jeff Xu (4):
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
mm/memfd: security hook for memfd_create
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/pid_namespace.h | 19 ++
include/linux/security.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 4 +
kernel/pid_namespace.c | 5 +
kernel/pid_sysctl.h | 59 ++++
mm/memfd.c | 61 +++-
mm/shmem.c | 6 +
security/security.c | 5 +
tools/testing/selftests/memfd/fuse_test.c | 1 +
tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
13 files changed, 510 insertions(+), 3 deletions(-)
create mode 100644 kernel/pid_sysctl.h
base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
From: Yicong Yang <yangyicong(a)hisilicon.com>
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
Add support for Armv8.7 FEAT_{LS64, LS64_V}:
- Add identifying and enabling in the cpufeature list
- Expose the support of these features to userspace through HWCAP3
and cpuinfo
- Add related hwcap test
- Handle the trap of unsupported memory (normal/uncacheable) access in a VM
A real scenario for this feature is that the userspace driver can make use of
this to implement direct WQE (workqueue entry) - a mechanism to fill WQE
directly into the hardware.
Picked Marc's 2 patches form [1] for handling the LS64 trap in a VM on emulated
MMIO and the introduce of KVM_EXIT_ARM_LDST64B.
[1] https://lore.kernel.org/linux-arm-kernel/20240815125959.2097734-1-maz@kerne…
Tested with updated hwcap test:
[root@localhost tmp]# dmesg | grep "All CPU(s) started"
[ 14.789859] CPU: All CPU(s) started at EL2
[root@localhost tmp]# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64_V
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
root@localhost:/mnt# dmesg | grep "All CPU(s) started"
[ 0.281152] CPU: All CPU(s) started at EL1
root@localhost:/mnt# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
Change since v3:
- Inject DABT fault for LS64 fault on unsupported memory but with valid memslot
Link: https://lore.kernel.org/linux-arm-kernel/20250626080906.64230-1-yangyicong@…
Change since v2:
- Handle the LS64 fault to userspace and allow userspace to inject LS64 fault
- Reorder the patches to make KVM handling prior to feature support
Link: https://lore.kernel.org/linux-arm-kernel/20250331094320.35226-1-yangyicong@…
Change since v1:
- Drop the support for LS64_ACCDATA
- handle the DABT of unsupported memory type after checking the memory attributes
Link: https://lore.kernel.org/linux-arm-kernel/20241202135504.14252-1-yangyicong@…
Marc Zyngier (2):
KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
Yicong Yang (5):
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported
memory
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
arm64: Add support for FEAT_{LS64, LS64_V}
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
kselftest/arm64: Add HWCAP test for FEAT_{LS64, LS64_V}
Documentation/arch/arm64/booting.rst | 12 +++
Documentation/arch/arm64/elf_hwcaps.rst | 6 ++
Documentation/virt/kvm/api.rst | 43 +++++++++--
arch/arm64/include/asm/el2_setup.h | 12 ++-
arch/arm64/include/asm/esr.h | 8 ++
arch/arm64/include/asm/hwcap.h | 2 +
arch/arm64/include/asm/kvm_emulate.h | 7 ++
arch/arm64/include/uapi/asm/hwcap.h | 2 +
arch/arm64/kernel/cpufeature.c | 51 +++++++++++++
arch/arm64/kernel/cpuinfo.c | 2 +
arch/arm64/kvm/inject_fault.c | 29 ++++++++
arch/arm64/kvm/mmio.c | 27 ++++++-
arch/arm64/kvm/mmu.c | 14 +++-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/kvm.h | 3 +-
tools/testing/selftests/arm64/abi/hwcap.c | 90 +++++++++++++++++++++++
16 files changed, 299 insertions(+), 11 deletions(-)
--
2.24.0
For testing the functionality of the vDSO, it is necessary to build
userspace programs for multiple different architectures.
It is additional work to acquire matching userspace cross-compilers with
full C libraries and then building root images out of those.
The kernel tree already contains nolibc, a small, header-only C library.
By using it, it is possible to build userspace programs without any
additional dependencies.
For example the kernel.org crosstools or multi-target clang can be used
to build test programs for a multitude of architectures.
While nolibc is very limited, it is enough for many selftests.
With some minor adjustments it is possible to make parse_vdso.c
compatible with nolibc.
As an example, vdso_standalone_test_x86 is now built from the same C
code as the regular vdso_test_gettimeofday, while still being completely
standalone.
Also drop the dependency of parse_vdso.c on the elf.h header from libc and only
use the one from the kernel's UAPI.
While this series is useful on its own now, it will also integrate with the
kunit UAPI framework currently under development:
https://lore.kernel.org/lkml/20250217-kunit-kselftests-v1-0-42b4524c3b0a@li…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Provide a limits.h header in nolibc
- Pick up Reviewed-by tags from Kees
- Link to v1: https://lore.kernel.org/r/20250203-parse_vdso-nolibc-v1-0-9cb6268d77be@linu…
---
Thomas Weißschuh (16):
MAINTAINERS: Add vDSO selftests
elf, uapi: Add definition for STN_UNDEF
elf, uapi: Add definition for DT_GNU_HASH
elf, uapi: Add definitions for VER_FLG_BASE and VER_FLG_WEAK
elf, uapi: Add type ElfXX_Versym
elf, uapi: Add types ElfXX_Verdef and ElfXX_Veraux
tools/include: Add uapi/linux/elf.h
selftests: Add headers target
tools/nolibc: add limits.h shim header
selftests: vDSO: vdso_standalone_test_x86: Use vdso_init_form_sysinfo_ehdr
selftests: vDSO: parse_vdso: Drop vdso_init_from_auxv()
selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers
selftests: vDSO: parse_vdso: Test __SIZEOF_LONG__ instead of ULONG_MAX
selftests: vDSO: vdso_test_gettimeofday: Clean up includes
selftests: vDSO: vdso_test_gettimeofday: Make compatible with nolibc
selftests: vDSO: vdso_standalone_test_x86: Switch to nolibc
MAINTAINERS | 1 +
include/uapi/linux/elf.h | 38 ++
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/limits.h | 7 +
tools/include/uapi/linux/elf.h | 524 +++++++++++++++++++++
tools/testing/selftests/lib.mk | 5 +-
tools/testing/selftests/vDSO/Makefile | 11 +-
tools/testing/selftests/vDSO/parse_vdso.c | 19 +-
tools/testing/selftests/vDSO/parse_vdso.h | 1 -
.../selftests/vDSO/vdso_standalone_test_x86.c | 143 +-----
.../selftests/vDSO/vdso_test_gettimeofday.c | 4 +-
11 files changed, 590 insertions(+), 164 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20241017-parse_vdso-nolibc-e069baa7ff48
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v21:
- Align to bar size, try to fix Niklas reported problem.
- Rebase to v6.16-rc5
- Link to v20: https://lore.kernel.org/r/20250709-ep-msi-v20-0-43d56f9bd54a@nxp.com
Changes in v20:
- remove set epf of_node's patch and only support one epf now.
- move imx6's patch to first
- detail change see each patches' change log
- Link to v19: https://lore.kernel.org/r/20250609-ep-msi-v19-0-77362eaa48fa@nxp.com
Changes in v19:
- irq part already in v6.16-rc1, only missed pcie/dts part
- rebase to v6.16-rc1
- update commit message for patch IMMUTABLE check.
- Link to v18: https://lore.kernel.org/r/20250414-ep-msi-v18-0-f69b49917464@nxp.com
Changes in v18:
- pci-ep.yaml: sort property order, fix maxvalue to 0x7ffff for msi-map-mask and
iommu-map-mask
- Link to v17: https://lore.kernel.org/r/20250407-ep-msi-v17-0-633ab45a31d0@nxp.com
Changes in v17:
- move document part to pci-ep.yaml
- Link to v16: https://lore.kernel.org/r/20250404-ep-msi-v16-0-d4919d68c0d0@nxp.com
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (9):
PCI: imx6: Add helper function imx_pcie_add_lut_by_rid()
PCI: imx6: Add LUT configuration for MSI/IOMMU in Endpoint mode
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
arm64: dts: imx95: Add msi-map for pci-ep device
Documentation/PCI/endpoint/pci-test-howto.rst | 14 +++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/misc/pci_endpoint_test.c | 85 ++++++++++++-
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Kconfig | 8 ++
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 136 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 98 +++++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 36 ++++++
include/linux/pci-ep-msi.h | 28 +++++
include/linux/pci-epf.h | 18 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 +++++
13 files changed, 470 insertions(+), 9 deletions(-)
---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
--
Frank Li <Frank.Li(a)nxp.com>
Fix to use the return value of the function 'chdir("/")' and check if the
return is either 0 (ok) or 1 (not ok, so the test stops).
The patch fies the solves the following errors:
mount-notify_test.c:468:17: warning: ignoring return value of ‘chdir’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
468 | chdir("/");
mount-notify_test_ns.c:489:17: warning: ignoring return value of
‘chdir’ declared with attribute ‘warn_unused_result’ [-Wunused-
result]
489 | chdir("/");
To reproduce the issue, use the command:
make kselftest TARGET=filesystems/statmount
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
.../selftests/filesystems/mount-notify/mount-notify_test_ns.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 5a3b0ace1a88..a7f899599d52 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -458,7 +458,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
index d91946e69591..dc9eb3087a1a 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
@@ -486,7 +486,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
--
2.43.0
The BTF dumper code currently displays arrays of characters as just that -
arrays, with each character formatted individually. Sometimes this is what
makes sense, but it's nice to be able to treat that array as a string.
This change adds a special case to the btf_dump functionality to allow
0-terminated arrays of single-byte integer values to be printed as
character strings. Characters for which isprint() returns false are
printed as hex-escaped values. This is enabled when the new ".emit_strings"
is set to 1 in the btf_dump_type_data_opts structure.
As an example, here's what it looks like to dump the string "hello" using
a few different field values for btf_dump_type_data_opts (.compact = 1):
- .emit_strings = 0, .skip_names = 0: (char[6])['h','e','l','l','o',]
- .emit_strings = 0, .skip_names = 1: ['h','e','l','l','o',]
- .emit_strings = 1, .skip_names = 0: (char[6])"hello"
- .emit_strings = 1, .skip_names = 1: "hello"
Here's the string "h\xff", dumped with .compact = 1 and .skip_names = 1:
- .emit_strings = 0: ['h',-1,]
- .emit_strings = 1: "h\xff"
Signed-off-by: Blake Jones <blakejones(a)google.com>
---
tools/lib/bpf/btf.h | 3 ++-
tools/lib/bpf/btf_dump.c | 55 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 4392451d634b..ccfd905f03df 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -326,9 +326,10 @@ struct btf_dump_type_data_opts {
bool compact; /* no newlines/indentation */
bool skip_names; /* skip member/type names */
bool emit_zeroes; /* show 0-valued fields */
+ bool emit_strings; /* print char arrays as strings */
size_t :0;
};
-#define btf_dump_type_data_opts__last_field emit_zeroes
+#define btf_dump_type_data_opts__last_field emit_strings
LIBBPF_API int
btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 460c3e57fadb..7c2f1f13f958 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -68,6 +68,7 @@ struct btf_dump_data {
bool compact;
bool skip_names;
bool emit_zeroes;
+ bool emit_strings;
__u8 indent_lvl; /* base indent level */
char indent_str[BTF_DATA_INDENT_STR_LEN];
/* below are used during iteration */
@@ -2028,6 +2029,52 @@ static int btf_dump_var_data(struct btf_dump *d,
return btf_dump_dump_type_data(d, NULL, t, type_id, data, 0, 0);
}
+static int btf_dump_string_data(struct btf_dump *d,
+ const struct btf_type *t,
+ __u32 id,
+ const void *data)
+{
+ const struct btf_array *array = btf_array(t);
+ const char *chars = data;
+ __u32 i;
+
+ /* Make sure it is a NUL-terminated string. */
+ for (i = 0; i < array->nelems; i++) {
+ if ((void *)(chars + i) >= d->typed_dump->data_end)
+ return -E2BIG;
+ if (chars[i] == '\0')
+ break;
+ }
+ if (i == array->nelems) {
+ /* The caller will print this as a regular array. */
+ return -EINVAL;
+ }
+
+ btf_dump_data_pfx(d);
+ btf_dump_printf(d, "\"");
+
+ for (i = 0; i < array->nelems; i++) {
+ char c = chars[i];
+
+ if (c == '\0') {
+ /*
+ * When printing character arrays as strings, NUL bytes
+ * are always treated as string terminators; they are
+ * never printed.
+ */
+ break;
+ }
+ if (isprint(c))
+ btf_dump_printf(d, "%c", c);
+ else
+ btf_dump_printf(d, "\\x%02x", (__u8)c);
+ }
+
+ btf_dump_printf(d, "\"");
+
+ return 0;
+}
+
static int btf_dump_array_data(struct btf_dump *d,
const struct btf_type *t,
__u32 id,
@@ -2055,8 +2102,13 @@ static int btf_dump_array_data(struct btf_dump *d,
* char arrays, so if size is 1 and element is
* printable as a char, we'll do that.
*/
- if (elem_size == 1)
+ if (elem_size == 1) {
+ if (d->typed_dump->emit_strings &&
+ btf_dump_string_data(d, t, id, data) == 0) {
+ return 0;
+ }
d->typed_dump->is_array_char = true;
+ }
}
/* note that we increment depth before calling btf_dump_print() below;
@@ -2544,6 +2596,7 @@ int btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
d->typed_dump->compact = OPTS_GET(opts, compact, false);
d->typed_dump->skip_names = OPTS_GET(opts, skip_names, false);
d->typed_dump->emit_zeroes = OPTS_GET(opts, emit_zeroes, false);
+ d->typed_dump->emit_strings = OPTS_GET(opts, emit_strings, false);
ret = btf_dump_dump_type_data(d, NULL, t, id, data, 0, 0);
--
2.49.0.1204.g71687c7c1d-goog