On Tue, Aug 19, 2025 at 5:13 PM Radim Krčmář rkrcmar@ventanamicro.com wrote:
2025-08-19T12:00:43+05:30, Anup Patel apatel@ventanamicro.com:
On Mon, Aug 18, 2025 at 3:59 PM Radim Krčmář rkrcmar@ventanamicro.com wrote:
2025-08-14T21:25:42+05:30, Anup Patel apatel@ventanamicro.com:
This series adds ONE_REG interface for SBI FWFT extension implemented by KVM RISC-V.
I think it would be better to ONE_REG the CSRs (medeleg/menvcfg), or at least expose their CSR fields (each sensible medeleg bit, PMM, ...) through kvm_riscv_config, than to couple this with SBI/FWFT.
The controlled behavior is defined by the ISA, and userspace might want to configure the S-mode execution environment even when SBI/FWFT is not present, which is not possible with the current design.
Is there a benefit in expressing the ISA model through SBI/FWFT?
Exposing medeleg/menvcfg is not the right approach because a Guest/VM does not have M-mode hence it is not appropriate to expose m<xyz> CSRs via ONE_REG interface. This also aligns with H-extension architecture which does not virtualize M-mode.
We already have mvendorid, marchid, and mipid in kvm_riscv_config.
The mvendorid, marchid, and mipid are accessible via SBI BASE extension but not any other M-mode CSRs hence these are special.
The virtualized M-mode is userspace+KVM. (KVM doesn't allow userspace to configure most things now, but I think we'll have to change that when getting ready for production.)
The RISC-V architecture is not designed to virtualize M-mode and there is no practical use-case for virtualized M-mode hence WE WON'T BE SUPPORTING IT IN KVM RISC-V.
FYI, the KVM ARM64 does not virtualize EL3 either and it is already in production so please stop making random arguments for requiring virtualized M-mode for production.
We already had discussions about this in the past.
As such, we have two options. One option is to expose hedeleg/henvcfg via kvm_riscv_config and another option is to have a separate ONE_REG for each FWFT feature.
Separate ONE_REG registers for each FWFT feature is better than directly exposing hedeleg/henvcfg via ONE_REG because:
- Once nested virtualization lands, we will be having separate
hedeleg/henvcfg as part of nested virtualization state of Guest which is trap-n-emulated by KVM. The existence of hedeleg/henvcfg in kvm_riscv_config and nested virtualization state will only create more confusion.
Right, the userspace registers for this can't be called h*.
- Not all bits in hedeleg/henvcfg are used for FWFT since quite
a few bits are programmed with fixed value based on KVM implementation choices (which may change in future).
Yes, we'll want to expose some to userspace.
Also,
things like set_debug_ioctl() change hedeleg at runtime which allow KVM user space to decide who takes breakpoint traps from Guest/VM.
This is still doable. The clear hedeleg bit does not have to change the virtualized behavior -- if the guest is expecting to see breakpoint traps, then even if userspace+KVM configure the architecture to direct the traps to the hypervisor, they must then forward the breakpoints that were supposed to be delivered to the guest.
This means value saved/restored
through hedeleg/henvcfg in kvm_riscv_config becomes specific to the kernel version and specific to host ISA features.
Hedeleg/henvcfg bits do not have to be the same as userspace interface bits -- KVM always has to distinguish what the userspace wants to virtualize, and what the KVM changed for its own reasons.
- We anyway need to provide ONE_REG interface to
save/restore FWFT feature flags so it's better to keep the FWFT feature value as part of the same ONE_REG interface.
I think we want to have SBI in userspace (especially for single-shot ecalls like FWFT). The userspace implementation will want an interface to set the ISA bits, and it's very awkward with the proposed design.
Flags can to stay, in case the userpace wants to accelerate FWFT.
- The availability of quite a few FWFT features is dependent
on corresponding ISA extensions so having separate ONE_REG registers of each FWFT feature allows get_reg_list_ioctl() to provide KVM user-space only available FWFT feature registers.
Yes, but similarly the userspace would be forbidden from setting bits that cannot be expressed in henvcfg/hededeg.
Instead of having henvcfg/hededeg via ONE_REG with restrictions, it's much cleaner and maintainable to have a separate ONE_REG register for each FWFT feature.
There are also behaviors we want to configure that do not have a FWFT toggle. e.g. the recent patches for delegation of illegal-instruction exceptions that changed the guest behavior -- someone might want to keep incrementing the SBI PMU counter, and someone will want to forward them to be implemented in userspace (when developing a new extension, because most of the existing ISA can still be accelerated by KVM).
There are alternate approaches for supporting SBI PMU counters in user-mode where we don't need virtualized M-mode. In any case, we will support user-mode emulated perf counters only when absolutely needed.
For general virtualization, we want to be able to configure the following behavior for each exception that would go to the virtualized M-mode: 0) delegated to the guest
- implemented by userspace
2-N) implementations by KVM (ideally zero or one)
We can have medeleg, and another method to decide how to handle trapped exceptions, but it probably makes more sense to have a per-exception ONE_REG that sets how each exception behaves.
No pointing in discussing this further since we won't be supporting virtualized M-mode.
-- Anup