On Tue, Mar 28, 2023 at 1:35 PM Heiko Stübner <heiko@sntech.de> wrote:
On Monday, 27 March 2023, 18:31:57 CEST, Evan Green wrote:
There's been a bunch of off-list discussions about this, including at Plumbers. The original plan was to do something involving providing an ISA string to userspace, but ISA strings just aren't sufficient for a stable ABI any more: in order to parse an ISA string, users need the version of the specifications that the string is written to, the version of each extension (sometimes at a finer granularity than the RISC-V releases/versions encode), and the expected use case for the ISA string (i.e., is it a U-mode or M-mode string). That's a lot of complexity to try to keep ABI compatible, and it's probably going to continue to grow: even if there's no more complexity in the specifications, we'll have to deal with the various ISA string parsing oddities that end up all over userspace.
Instead, this patch set takes a very different approach and provides a set of key/value pairs that encode various bits about the system. The big advantage here is that we can clearly define what these mean, so we can ensure ABI stability, but it also allows us to encode information that's unlikely to ever appear in an ISA string (see the misaligned access performance, for example). The resulting interface looks a lot like what arm64 and x86 do, and will hopefully fit well into something like ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of it. The vDSO function can answer some queries without a syscall at all, and falls back to the syscall for cases it doesn't have answers to. Currently we prepopulate it with an array of answers for all keys and a CPU set of "all CPUs". This can be adjusted as necessary to provide fast answers to the most common queries.
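To make the fast-path/fallback split concrete, here is a minimal userspace model of it, not the code from the series. Every name in it (model_vdso_hwprobe, model_syscall, MODEL_MAX_KEY, the cache contents) is hypothetical, and it assumes that "all CPUs" is requested with a CPU count of 0 and a NULL CPU set, and that an unknown key is reported by setting the key to -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One query slot: the caller fills in key, the probe fills in value. */
struct hwprobe_pair {
    int64_t  key;
    uint64_t value;
};

#define MODEL_MAX_KEY 5  /* hypothetical: highest key this model knows */

/* Hypothetical cache of answers valid for "all CPUs".
 * The values are placeholders, not real hardware data. */
const uint64_t all_cpu_values[MODEL_MAX_KEY + 1] = { 10, 11, 12, 13, 14, 15 };

/* Counts how often the model had to take the slow path. */
int slow_path_calls;

/* Stand-in for the real syscall; a real vDSO would trap into the kernel. */
long model_syscall(struct hwprobe_pair *pairs, size_t pair_count,
                   size_t cpu_count, const unsigned long *cpus,
                   unsigned int flags)
{
    (void)pairs; (void)pair_count; (void)cpu_count; (void)cpus; (void)flags;
    slow_path_calls++;
    return 0;
}

/* Answer "all CPUs" queries from the cache, otherwise fall back. */
long model_vdso_hwprobe(struct hwprobe_pair *pairs, size_t pair_count,
                        size_t cpu_count, const unsigned long *cpus,
                        unsigned int flags)
{
    int all_cpus = (cpu_count == 0 && cpus == NULL);

    assert(pairs != NULL || pair_count == 0);

    /* Anything the cache cannot answer goes to the (modelled) syscall. */
    if (flags != 0 || !all_cpus)
        return model_syscall(pairs, pair_count, cpu_count, cpus, flags);

    for (size_t i = 0; i < pair_count; i++) {
        if (pairs[i].key >= 0 && pairs[i].key <= MODEL_MAX_KEY) {
            pairs[i].value = all_cpu_values[pairs[i].key];
        } else {
            pairs[i].key = -1;   /* assumed convention for an unknown key */
            pairs[i].value = 0;
        }
    }
    return 0;
}
```

A caller fills in the keys it cares about, calls model_vdso_hwprobe(pairs, n, 0, NULL, 0), and reads the values back. Only a query with a specific CPU set or non-zero flags reaches model_syscall().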
An example series in glibc exposing this syscall and using it in an ifunc selector for memcpy can be found at [1]. I'm about to send a v2 of that series out that incorporates the vDSO function.
I was asked about the performance delta between this and something like sysfs. I created a small test program [2] and ran it on a Nezha D1 Allwinner board. Doing each operation 100000 times and dividing by the iteration count, the average time per operation is:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: 0.0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
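The "repeat N times and divide" measurement can be sketched roughly as below. This is not the test program from [2]; the helper names mean_us_per_call and op_access are hypothetical, and using CLOCK_MONOTONIC as the clock is an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#endif
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Operation under test: one access() call on the sysfs file named above. */
void op_access(void)
{
    (void)access("/sys/kernel/cpu_byteorder", R_OK);
}

/* Run op() `iterations` times and return the mean cost per call in
 * microseconds. Loop overhead is included in the result. */
double mean_us_per_call(void (*op)(void), long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        op();
    clock_gettime(CLOCK_MONOTONIC, &end);

    double total_us = (end.tv_sec - start.tv_sec) * 1e6 +
                      (end.tv_nsec - start.tv_nsec) / 1e3;
    return total_us / (double)iterations;
}
```

Calling mean_us_per_call(op_access, 100000) gives a per-call figure comparable in kind to the access() line above. The other rows would need their own op functions.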
Looks like this series spawned a thread on one of the riscv-lists [0].
As auxvals were mentioned in that thread, I was wondering what the difference is between adding a new syscall and putting the keys + values in as architecture auxvec elements [1]?
The auxvec approach would also work. The primary difference is that auxvec bits are actively copied into every new process, forever. If you predict a slow pace of new bits coming in, the auxvec approach probably makes more sense. This series was born out of a prediction that this set of "stuff" was going to be larger than on traditional x86/ARM architectures, fiddly (i.e., bits possibly representing specific versions of various extensions), evolving regularly over time, and heterogeneous between cores. With that sort of rubber band ball in mind, a key/value interface seemed to make more sense.
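For comparison, this is roughly what the auxvec side looks like from userspace today, using glibc's getauxval(). The wrapper name read_hwcap is hypothetical, and the sketch assumes Linux with glibc. It only illustrates that the value is a fixed per-process word handed over at exec time, not something queried per key or per CPU set.

```c
#include <assert.h>
#include <sys/auxv.h>

/* Return the AT_HWCAP word the kernel placed in this process's auxiliary
 * vector at exec time. getauxval() returns 0 if the entry is absent. */
unsigned long read_hwcap(void)
{
    return getauxval(AT_HWCAP);
}
```

The result is a single bitmask whose bit meanings are defined per architecture, which is a much narrower shape than a key/value query.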
-Evan