Hi Thomas!
On Sun, Dec 22, 2024 at 12:39:01PM +0100, Thomas Weißschuh wrote:
> > Maybe it will be time for us to run an overall audit of arch-dependent syscalls we currently have, to make sure that the common ones continue to work fine there (and waitpid() definitely is as common a syscall as open() since it's the good old and portable one).
>
> Isn't this what nolibc-test is already doing?
My concern is that it might progressively drift away from this if we replace some standard syscalls with new ones that are cross-arch.
> Or do you also want to compare it to non-current kernel versions?
I mean that we progressively replace old POSIX calls with new cross-arch ones in the system (e.g. open->openat, waitpid->waitid, etc.). Normally it's a libc's role to preserve application-level compatibility by maintaining the mapping between the standard calls and the specific ones, so that applications relying on the standard ones continue to work. That was one of the original goals of nolibc.
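To illustrate the kind of mapping I mean, here is a minimal sketch of an open() that keeps working whether or not the arch still has the legacy syscall. The name compat_open() is made up for the example and it goes through the regular libc syscall() helper, so it's not what nolibc actually contains, just the idea:

```c
#include <errno.h>        /* errno, ENOSYS */
#include <fcntl.h>        /* AT_FDCWD, O_* flags, mode_t */
#include <sys/syscall.h>  /* SYS_open / SYS_openat, only where the arch has them */
#include <unistd.h>       /* syscall(), close() */

/* Hypothetical wrapper for illustration; not nolibc's actual code. */
static int compat_open(const char *path, int flags, mode_t mode)
{
#if defined(SYS_open)
	/* The legacy syscall still exists on this arch (e.g. x86). */
	return (int)syscall(SYS_open, path, flags, mode);
#elif defined(SYS_openat)
	/* Newer archs only provide openat(); AT_FDCWD gives open() semantics. */
	return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
#else
	/* Neither is known at build time: fail everywhere the same way. */
	errno = ENOSYS;
	return -1;
#endif
}
```

The application only ever sees open()-like semantics, e.g. compat_open("/", O_RDONLY, 0), and doesn't need to know which of the two the kernel provides.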
I have nothing against missing some calls in newly added architectures, of course, but when I'm seeing for example that we switch some of the lower-layer tests to use a pipe because some call was not present, I tend to think that maybe we should first define the minimal set of working syscalls that the nolibc-test program requires to be usable on any arch.
In the current case, we seem to have to arbitrate between pipe() and lseek() support for basic nolibc-test support. But maybe a new arch will be added for which it will be the opposite choice between the two. We may very well require both of them to work if needed, or either, at the risk of delaying support of a specific arch in the future, but that's fine.
Second, we should have a new look at all our supported calls and check whether some of them are present while the legacy calls they're supposed to replace are missing (which would be perfectly possible). For example, if we had implemented waitpid() much later, it would have been perfectly possible that we'd only implement waitid() and miss the waitpid() that applications expect.
Honestly it's not a particularly interesting job to do. That's why I'm mostly saying that we should just keep that in mind to be careful with new additions.
> In general the special rv32 syscalls are not really architecture-dependent, they just dropped the "legacy" ones, especially all using 32bit timestamps.
I understand, and when adding a new arch we need to start with something. I just think that we should consider that for a new arch to switch from "in progress" to "working", it would require the legacy ones working on other archs to work on that one as well. My concern is that early boot tools would only build on certain archs but not all when all of them are supposed to be in a working state. When it fails everywhere that's fine, it just means we're missing some calls and the user is welcome to submit a patch. But when the user only tests on, say, x86 and arm, and someone relies on that to package kernels and discovers late that it fails on riscv for example, that's a problem. Note that I'm just making up examples, and not designating any particular issue.
Maybe it would be convenient to maintain a support matrix for the syscalls we currently support. It could look something like:
  waitpid()   x86: native   arm: native   riscv32: via waitid()   foobar: not yet
  open()      ...
etc. I could try to work on such a thing if you're interested as well, but not now as I don't have the time at the moment.
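As a rough idea of how a row of such a matrix could be produced for the arch being built, here is a small sketch that only looks at which __NR_* macros the kernel headers define. All names in it (HAVE_*, struct matrix_row, support_status(), print_matrix_rows()) are invented for the example, and it says nothing about whether nolibc itself has a wrapper, only what the headers expose:

```c
#include <stdio.h>
#include <sys/syscall.h>  /* pulls in the arch's __NR_* definitions */

/* Turn "is the macro defined?" into 0/1 values usable in a table. */
#ifdef __NR_waitpid
#define HAVE_WAITPID 1
#else
#define HAVE_WAITPID 0
#endif
#ifdef __NR_waitid
#define HAVE_WAITID 1
#else
#define HAVE_WAITID 0
#endif
#ifdef __NR_open
#define HAVE_OPEN 1
#else
#define HAVE_OPEN 0
#endif
#ifdef __NR_openat
#define HAVE_OPENAT 1
#else
#define HAVE_OPENAT 0
#endif

/* Hypothetical row layout: a legacy call and its cross-arch replacement. */
struct matrix_row {
	const char *legacy;
	const char *replacement;
	int have_legacy;
	int have_replacement;
};

static const struct matrix_row rows[] = {
	{ "waitpid()", "waitid()", HAVE_WAITPID, HAVE_WAITID },
	{ "open()",    "openat()", HAVE_OPEN,    HAVE_OPENAT },
};

/* Classify one row the same way as the matrix above. */
static const char *support_status(int have_legacy, int have_replacement)
{
	if (have_legacy)
		return "native";
	if (have_replacement)
		return "via replacement";
	return "not yet";
}

/* Print one line per legacy call for the build arch; returns the row count. */
static int print_matrix_rows(FILE *out)
{
	int n = (int)(sizeof(rows) / sizeof(rows[0]));

	for (int i = 0; i < n; i++) {
		const struct matrix_row *r = &rows[i];

		fprintf(out, "%-10s %s", r->legacy,
			support_status(r->have_legacy, r->have_replacement));
		if (!r->have_legacy && r->have_replacement)
			fprintf(out, " (%s)", r->replacement);
		fputc('\n', out);
	}
	return n;
}
```

Building this once per supported arch and concatenating the outputs would give one column per arch; calling print_matrix_rows(stdout) prints the rows for the current one.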
Cheers, Willy