On Tue, Jul 6, 2021 at 4:56 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
On Tue, Jul 06, 2021 at 04:09:25PM +0200, Daniel Vetter wrote:
Anyway, for anything that works like a gpu accelerator, like 3d accel, or parallel compute accel (aka gpgpu) or spatial compute accel (aka NN/AI) or maybe even fpga accel, most of the magic to use the hardware is in this backend compiler, which translates from an IR into whatever your accelerator consumes. That's the part we really care about for modern accelerators, because without it the hardware is de facto useless. Generally these chips have a full-blown, if special-purpose, ISA, with register files, spilling, branches, loops and other control flow (sometimes only execution masks on simpler hw).
I don't know if I see it as clearly as you do - at the end of the day the user keys in the program in some proprietary (or open!) language and a whack of proprietary magic transforms it to "make it work".
There are many barriers that prevent someone without the secret knowledge from duplicating the end result of a working program. An accelerator ISA is certainly one example, but I wouldn't overly focus on it as the only blocker.
Well we don't, we do just ask for the full driver stack to make the hw work. It's just that in the past most vendors chose to leave out the compiler/ISA from their open stack/specs. Well, except nvidia, which still chooses to leave out everything aside from some very, very minimal documentation of display functionality.
Like you said below the NVIDIA GPU ISA seems known but the HW is still not really useful for other reasons.
Habana seems to have gone the other way, the HW is fully useful but we don't have the ISA transformation and other details.
You can actually use nvidia gpus, they're fully functional.
If you install the blobby stack. Which is exactly the same thing as with habanalabs, plus/minus a few things at the fringes.
In the end it's about drawing the line somewhere, so maybe we should merge the nvidia glue code that makes their blobby stack work better with upstream? There are quite a few pieces there, e.g. their display driver is by design a userspace driver, whereas with kernel modesetting it needs to be in the kernel to expose the common kms ioctl interfaces, so they've built up a glue layer to forward everything to userspace and back. On Windows it works because kernel code there can have growing stacks and fun stuff like that, at least that's my understanding. Not really an option to just run the code in Linux.
I'm pretty sure nvidia would appreciate that, and maybe every once in a while they open up a header for a generation or two of products like they've done in the past.
Both cases seem to have ended up with something useless, and I have a hard time saying nouveau has more right to be in the kernel tree than Habana does.
Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks.
That's not possible without some serious cans of worms though. Wayland is a protocol, and you can't forbid people from implementing it. Otherwise all the compatible open implementations of closed protocols wouldn't be possible either.
Well, in many ways so is Linux, but nobody would seriously re-implement Linux just to produce a driver.
Well in the gpu space for 2+ decades nvidia has been setting the standard, and the open stack has been trying to catch up by reimplementing the entire thing. It took a fair while.
So I'm not clear what you're suggesting here we should do different.
Not enabling proprietary stacks as above would be a good start.
I'm still not sure what exactly you mean here. Like on the 3d side there's opengl and vulkan, and nvidia just has an entirely different implementation of that compared to any of the open drivers. That is a bit less code than linux, but it's not small, and reimplementing it over decades is pretty much what happened. And if that weren't allowed we'd actually not have an open 3d gpu stack at all, because only very recently did we get an agreement with Khronos around the trademark/licensing issues of that stuff. Recently compared to the history of opengl at least.
So I'm still not clear what exactly it is you're suggesting we should do? Not implement the industry standards for 3d (and accept we stay irrelevant forever)? Reject nvidia blobs harder than we do already? Distros will continue to ship an auto-installer for that stack, at least some, so we're pretty much maxed out already. Like in what way do you think the upstream stack does enable the proprietary nvidia stack? Should we permanently ban any contributions from anyone with an @nvidia.com address, even if it helps the open stack improve?
Like I'm not seeing something concrete that could be done which would actually prevent nvidia from having their completely independent stack, with exactly the same functionality and not a line of code shared. Which is where we are right now. The only thing where we could be more strict is to reject any contributions from them at all, just because we don't like them. That seems a bit too extreme.
-Daniel