On Tue, Jul 6, 2021 at 12:03 PM Oded Gabbay oded.gabbay@gmail.com wrote:
On Tue, Jul 6, 2021 at 11:40 AM Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Jul 05, 2021 at 04:03:12PM +0300, Oded Gabbay wrote:
Hi, I'm sending v4 of this patch-set following the long email thread. I want to thank Jason for reviewing v3 and pointing out the errors, saving us debugging time later :)
I consulted with Christian on how to fix patch 2 (the implementation) and at the end of the day I shamelessly copied the relevant content from amdgpu_vram_mgr_alloc_sgt() and amdgpu_dma_buf_attach(), regarding the usage of dma_map_resource() and pci_p2pdma_distance_many(), respectively.
I also made a few improvements after looking at the relevant code in amdgpu. The details are in the changelog of patch 2.
I took the time to write import code for the driver, allowing me to check real P2P with two Gaudi devices, one as the exporter and the other as the importer. I'm not going to include the import code in the product; it was just for testing purposes (although I can share it if anyone wants).
I ran it in a bare-metal environment with the IOMMU enabled, on a Skylake CPU with a whitelisted PCIe bridge (to make pci_p2pdma_distance_many() happy).
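To illustrate the pattern for anyone not familiar with these p2p helpers, here is a rough sketch of an exporter along the lines described above (loosely modeled on the amdgpu code mentioned earlier; the hl_-prefixed names are placeholders and this is not the actual patch): the attach callback uses pci_p2pdma_distance_many() to check that the importing device can reach our BAR over PCIe, and the map callback builds an sg_table whose DMA address comes from dma_map_resource().

/* Hypothetical sketch only; hl_* names are placeholders, not real habanalabs symbols. */
#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/pci.h>
#include <linux/pci-p2pdma.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

struct hl_dmabuf_priv {
	struct pci_dev *pdev;	/* exporting device */
	phys_addr_t bar_addr;	/* device memory address inside the BAR */
	size_t size;
};

static int hl_dmabuf_attach(struct dma_buf *dmabuf,
			    struct dma_buf_attachment *attachment)
{
	struct hl_dmabuf_priv *priv = dmabuf->priv;

	/* Refuse the attachment if the importer can't do p2p to our BAR. */
	if (pci_p2pdma_distance_many(priv->pdev, &attachment->dev, 1, true) < 0)
		return -EOPNOTSUPP;

	return 0;
}

static struct sg_table *hl_map_dmabuf(struct dma_buf_attachment *attachment,
				      enum dma_data_direction dir)
{
	struct hl_dmabuf_priv *priv = attachment->dmabuf->priv;
	struct sg_table *sgt;
	dma_addr_t addr;
	int rc;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	rc = sg_alloc_table(sgt, 1, GFP_KERNEL);
	if (rc)
		goto free_sgt;

	/*
	 * Map the BAR-backed region for the importing device; with an IOMMU
	 * enabled this produces the IOVA the peer device will actually use.
	 */
	addr = dma_map_resource(attachment->dev, priv->bar_addr, priv->size,
				dir, DMA_ATTR_SKIP_CPU_SYNC);
	rc = dma_mapping_error(attachment->dev, addr);
	if (rc)
		goto free_table;

	sg_dma_address(sgt->sgl) = addr;
	sg_dma_len(sgt->sgl) = priv->size;
	return sgt;

free_table:
	sg_free_table(sgt);
free_sgt:
	kfree(sgt);
	return ERR_PTR(rc);
}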
Greg, I hope this will be good enough for you to merge this code.
So we're officially going to use dri-devel for the technical details review and then Greg for merging, so we don't have to deal with the other merge criteria dri-devel folks have?
I'm glad to receive any help or review, regardless of the subsystem the person giving that help belongs to.
I don't expect anything less by now, but it does make the original claim - that drivers/misc will not step all over the accelerator folks under the totally-not-a-gpu banner - a complete farce.
This essentially means that for any other accelerator stack that doesn't fit the dri-devel merge criteria, even if it's acting like a gpu and uses other gpu driver stuff, you can just send it to Greg and it's good to go.
What's wrong with Greg ??? ;)
On a more serious note, yes, I do think the dri-devel merge criteria are very extreme, and they effectively drive out many AI accelerator companies that want to contribute to the kernel but can't/won't open up their software IP and patents.
I think the expectation that AI startups (who are 90% of the deep learning field) will cooperate outside of company boundaries is not realistic, especially on the userspace side, where the real IP of the company resides.
Personally I don't think there is a real justification for that at this point in time, but if it will make you (and other people here) happy I really don't mind creating a non-gpu accelerator subsystem that will contain all the totally-not-a-gpu accelerators and will have more relaxed criteria for upstreaming. Something along the lines of an "rdma-core"-style library looks like the right amount of user-level open source, and should be enough.
The question is, what will happen later? Will it be sufficient to "allow" us to use dmabuf and maybe other gpu stuff in the future (e.g. hmm)?
If the community and the dri-devel maintainers (you among them) assure me it is good enough, then I'll happily contribute my work and personal time to organizing this effort and implementing it.
I think dri-devel's stance is pretty clear and well known: we want the userspace to be open, because that's where most of the driver stack is. Without an open driver stack there's no way to ever have anything cross-vendor.
And that includes the compiler and anything else you need to drive the hardware.
Afaik linux cpu arch ports are also not accepted if there's no open gcc or llvm port around, because without that the overall stack just becomes useless.
If that means AI companies don't want to open up their hw specs enough to allow that, so be it - all you get in that case is offloading the kernel side of the stack for convenience, with zero long term prospects to ever make this into a cross vendor subsystem stack that does something useful. If the business case says you can't open up your hw enough for that, I really don't see the point in merging such a driver, it'll be an unmaintainable stack by anyone else who's not having access to those NDA covered specs and patents and everything.
If the stack is actually cross vendor to begin with that's just bonus, but generally that doesn't happen voluntarily and needs a few years to decades to get there. So that's not really something we require.
tldr; just a runtime isn't enough for dri-devel.
Now Greg seems to be happy to merge kernel drivers that aren't useful with the open bits provided, so *shrug*.
Cheers, Daniel
PS: If requiring an actually useful open driver stack is somehow *extreme*, I have no idea why we even bother with merging device drivers upstream. Just make a stable driver api and be done; vendors can then do whatever they feel like and protect their "valuable IP and patents" or whatever it is.
Thanks, oded
There's quite a lot of these floating around actually (and many do have semi-open runtimes, like habanalabs have now too, just not open enough to be actually useful). It's going to be absolutely lovely having to explain to these companies in background chats why habanalabs gets away with their stack and they don't.
Or maybe we should just merge them all and give up on the idea of having open cross-vendor driver stacks for these accelerators.
Thanks, Daniel
Thanks, Oded
Oded Gabbay (1): habanalabs: define uAPI to export FD for DMA-BUF
Tomer Tayar (1): habanalabs: add support for dma-buf exporter
 drivers/misc/habanalabs/Kconfig             |   1 +
 drivers/misc/habanalabs/common/habanalabs.h |  26 ++
 drivers/misc/habanalabs/common/memory.c     | 480 +++++++++++++++++++-
 drivers/misc/habanalabs/gaudi/gaudi.c       |   1 +
 drivers/misc/habanalabs/goya/goya.c         |   1 +
 include/uapi/misc/habanalabs.h              |  28 +-
 6 files changed, 532 insertions(+), 5 deletions(-)
-- 2.25.1
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Jul 6, 2021 at 12:36 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
Perhaps this isn't clear, so let me explain it differently.
The deal when having a driver in upstream is that both the vendor and upstream benefit:
- the vendor gets their driver carried and adjusted in upstream, because there's no stable uapi, and the benefit of being included everywhere by default
- upstream gets the benefit of being able to hack around in more drivers, which generally leads to a more robust subsystem and driver architecture
Now what you want is to have the benefits for you, without giving the wider community the benefit of actually being able to hack on your driver stack. Because you prefer to keep critical pieces of it protected and closed, which makes sure no one can create a new cross-vendor stack without your permission. Or without investing a lot of time into reverse-engineering the hardware. That's not extreme, that's just preferring to have your cake and eat it too.
And frankly on dri-devel we don't take such a lopsided deal. Greg otoh seems to be totally fine, or doesn't really understand what it takes to build an accelerator stack, or I dunno what, but he's happy merging them.
Cheers, Daniel
On Tue, Jul 6, 2021 at 12:47 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On the "rdma-core" idea, afaik rdma NIC do not have fully programmable cores in their hw, for which you'd need some kind of compiler to make use of the hardware and the interfaces the kernel provides? So not really compareable, but also my understanding is that rdma-core does actually allow you to reasonable use&drive all the hw features and kernel interfaces fully.
So we actually want less on dri-devel, because for compute/accel chips we're currently happy with a vendor userspace. It just needs to be functional and complete, and open in its entirety.
Now if there's going to be an AI/NN/spatial compute core runtime, with all the things included, that's cross-vendor, that's obviously going to be great, but that's strictly a bonus. And eventually the long-term goal, once we have a few open stacks from various vendors. But atm we have 0 open stacks, so one thing at a time.
s/uapi/kernel driver api/ in the previous mail, ofc, but I got it right in the first reply at least. -Daniel
On Tue, Jul 06, 2021 at 02:07:16PM +0200, Daniel Vetter wrote:
On the "rdma-core" idea, afaik rdma NIC do not have fully programmable cores in their hw, for which you'd need some kind of compiler to make use of the hardware and the interfaces the kernel provides? So not really compareable, but also my understanding is that rdma-core does actually allow you to reasonable use&drive all the hw features and kernel interfaces fully.
The whole HPC stack has speciality compilers of course: OpenMP, PGAS, etc. These compilers map onto library primitives that eventually boil down into rdma-core calls. Even the HW devices have various programmability that is being targeted with compilers now. People are making NIC devices with ARM cores/etc - P4 is emerging for some packet processing tasks.
rdma-core can drive all the kernel interfaces with at least an ioctl wrapper, and it has a test suite that tries to cover this. It does not exercise the full HW capability, programmability, etc of every single device.
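For illustration, a minimal sketch of what driving the kernel interfaces through rdma-core looks like from userspace (generic libibverbs calls, not tied to any particular vendor, with most error handling omitted): open a device, allocate a protection domain and register a memory region; all of these end up going through the common uverbs kernel interface rather than a vendor-private path.

/* Minimal libibverbs (rdma-core) sketch; build with -libverbs. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
	int num;
	struct ibv_device **devs = ibv_get_device_list(&num);

	if (!devs || num == 0) {
		fprintf(stderr, "no RDMA devices found\n");
		return 1;
	}

	/* Open the first device and set up the basic objects every verbs
	 * consumer needs: a context, a protection domain and a memory region. */
	struct ibv_context *ctx = ibv_open_device(devs[0]);
	struct ibv_pd *pd = ibv_alloc_pd(ctx);

	void *buf = malloc(4096);
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_READ);

	printf("registered MR lkey=0x%x rkey=0x%x on %s\n",
	       mr->lkey, mr->rkey, ibv_get_device_name(devs[0]));

	ibv_dereg_mr(mr);
	free(buf);
	ibv_dealloc_pd(pd);
	ibv_close_device(ctx);
	ibv_free_device_list(devs);
	return 0;
}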
I actually don't entirely know what everyone has built on top of rdma-core, or how I'd try to map it to the DRI ideas you are trying to explain.
Should we ban all Intel RDMA drivers because they are shipping proprietary Intel HPC compilers and proprietary Intel MPI which drives their RDMA HW? Or is that OK because there are open analogs for some of that stuff? And yes, the open versions are inferior in various metrics.
Pragmatically what I want to see is enough RDMA common/open user space to understand the uAPI and thus more about how the kernel driver works. Forcing everyone into rdma-core has already prevented a number of uAPI mistakes in drivers that would have been bad - so at least this level really is valuable.
So we actually want less on dri-devel, because for compute/accel chips we're currently happy with a vendor userspace. It just needs to be functional and complete, and open in its entirety.
In a sense yes: DRI doesn't insist on a single code base to act as the kernel interface, but that is actually the thing that has brought the most value to RDMA, IMHO.
We've certainly had some interesting successes because of this. The first submission for AWS's EFA driver proposed to skip the rdma-core step, which was rejected. However since EFA has been in that ecosystem it has benefited greatly, I think.
However, in another sense no: RDMA hasn't been blocking, say Intel, just because they have built proprietary stuff on top of our open stack.
Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks.
Jason
On Tue, Jul 6, 2021 at 3:44 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 02:07:16PM +0200, Daniel Vetter wrote:
On the "rdma-core" idea, afaik rdma NIC do not have fully programmable cores in their hw, for which you'd need some kind of compiler to make use of the hardware and the interfaces the kernel provides? So not really compareable, but also my understanding is that rdma-core does actually allow you to reasonable use&drive all the hw features and kernel interfaces fully.
The whole HPC stack has speciality compilers of course: OpenMP, PGAS, etc. These compilers map onto library primitives that eventually boil down into rdma-core calls. Even the HW devices have various programmability that is being targeted with compilers now. People are making NIC devices with ARM cores/etc - P4 is emerging for some packet processing tasks.
Well it depends on which compilers we're talking about here, and what kind of features. Higher-level compilers that break down some fancy language like OpenMP into what it actually should do on a given piece of hardware - a gpu, an rdma-connected cluster, whatever - are something we really don't care about. You don't need that to drive the hardware. Usually that stuff works by breaking some of the code down into cpu compiler IR (most of this is built on top of LLVM IR nowadays), interspersed with library calls to the runtime.
Now the thing I care about here is when things don't get compiled down to cpu code, but to some other IR (SPIR-V is starting to win, but very often it's still a hacked-up version of LLVM IR), which then gets compiled down by a hw-specific backend into instructions that run on the hw. I had no idea that rdma NICs can do that, but it sounds like they can? I guess maybe some openmpi operations could be done directly on the rdma chip, but I'm not sure why you'd want a backend compiler here.
Anyway, for anything that works like a gpu accelerator - 3d accel, parallel compute accel (aka gpgpu), spatial compute accel (aka NN/AI), or maybe even fpga accel - most of the magic to use the hardware is in this backend compiler, which translates from an IR into whatever your accelerator consumes. That's the part we really care about for modern accelerators, because without it the hardware is de facto useless. Generally these chips have a full-blown, if special-purpose, ISA, with register files, spilling, branches, loops and other control flow (sometimes only execution masks on simpler hw).
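To make that split concrete, here's a generic OpenMP offload example (not specific to any vendor discussed here): the host side is lowered to normal cpu IR plus calls into the offloading runtime, while the body of the target region is compiled to a device IR, and only the accelerator's backend compiler can turn that IR into the actual ISA - which is exactly the piece that usually stays closed.

/* Generic OpenMP target-offload example. With an offload-capable compiler
 * (e.g. clang -fopenmp plus an offload target), the loop body is compiled
 * separately for the device; the host code just calls the offload runtime
 * to move data and launch the kernel. */
#include <stdio.h>

#define N 1024

int main(void)
{
	float a[N], b[N], c[N];

	for (int i = 0; i < N; i++) {
		a[i] = i;
		b[i] = 2.0f * i;
	}

	/* Everything inside this region becomes device code: frontend -> IR,
	 * then the accelerator backend turns the IR into the device ISA. */
	#pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
	for (int i = 0; i < N; i++)
		c[i] = a[i] + b[i];

	printf("c[42] = %f\n", c[42]);
	return 0;
}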
rdma-core can drive all the kernel interfaces with at least an ioctl wrapper, and it has a test suite that tries to cover this. It does not exercise the full HW capability, programmability, etc of every single device.
I actually don't entirely know what everyone has built on top of rdma-core, or how I'd try to map it to the DRI ideas you are trying to explain.
Should we ban all Intel RDMA drivers because they are shipping proprietary Intel HPC compilers and proprietary Intel MPI which drives their RDMA HW? Or is that OK because there are open analogs for some of that stuff? And yes, the open versions are inferior in various metrics.
Pragmatically what I want to see is enough RDMA common/open user space to understand the uAPI and thus more about how the kernel driver works. Forcing everyone into rdma-core has already prevented a number of uAPI mistakes in drivers that would have been bad - so at least this level really is valuable.
So we actually want less on dri-devel, because for compute/accel chips we're currently happy with a vendor userspace. It just needs to be functional and complete, and open in its entirety.
In a sense yes: DRI doesn't insist on a single code base to act as the kernel interface, but that is actually the thing that has brought the most value to RDMA, IMHO.
So in practice we're not that different in DRI wrt userspace - if there is an established cross-vendor project in the given area, we do expect the userspace side to be merged there. And nowadays most of the feature work is done that way, it's just that we don't have a single project like rdma-core for this. We do still allow per-driver submit interfaces because hw is just not standardized enough there, the standards are at a higher level. Which is why it just doesn't make sense to talk about a kernel driver as something that's useful stand-alone at all.
We've certainly had some interesting successes because of this. The first submission for AWS's EFA driver proposed to skip the rdma-core step, which was rejected. However since EFA has been in that ecosystem it has benefited greatly, I think.
However, in another sense no: RDMA hasn't been blocking, say Intel, just because they have built proprietary stuff on top of our open stack.
Oh we allow this too. We only block the initial submission if the proprietary stuff is the only thing out there.
Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks.
That's not possible without some serious cans of worms though. Wayland is a protocol, and you can't forbid people from implementing it. Otherwise all the compatible open implementations of closed protocols wouldn't be possible either.
Now the implementation is a different thing, and there a few compositors have succumbed to market pressure and enabled the nvidia stack, as a mostly separate piece from supporting the open stack. And that's largely because nvidia managed to completely kill the open source r/e effort through firmware licensing and crypto-key based verified loading, so unless you install the proprietary stack you actually can't make use of the hardware at all - well display works without the firmware, but 3d/compute just doesn't. So you just can't use nvidia hw without accepting their proprietary driver licenses and all that entails for the latest hardware.
So I'm not clear what you're suggesting here we should do differently. -Daniel
On Tue, Jul 06, 2021 at 04:09:25PM +0200, Daniel Vetter wrote:
Anyway, for anything that works like a gpu accelerator - 3d accel, parallel compute accel (aka gpgpu), spatial compute accel (aka NN/AI), or maybe even fpga accel - most of the magic to use the hardware is in this backend compiler, which translates from an IR into whatever your accelerator consumes. That's the part we really care about for modern accelerators, because without it the hardware is de facto useless. Generally these chips have a full-blown, if special-purpose, ISA, with register files, spilling, branches, loops and other control flow (sometimes only execution masks on simpler hw).
I don't know if I see it as clearly as you do - at the end of the day the user keys in the program in some proprietary (or open!) language and a whack of proprietary magic transforms it to "make it work".
There are many barriers that prevent someone without the secret knowledge from duplicating the end result of a working program. An accelerator ISA is certainly one example, but I wouldn't overly focus on it as the only blocker.
Like you said below the NVIDIA GPU ISA seems known but the HW is still not really useful for other reasons.
Habana seems to have gone the other way, the HW is fully useful but we don't have the ISA transformation and other details.
Both cases seem to have ended up with something useless, and I have a hard time saying nouveau has more right to be in the kernel tree than Habana does.
Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks.
That's not possible without some serious cans of worms though. Wayland is a protocol, and you can't forbid people from implementing it. Otherwise all the compatible open implementations of closed protocols wouldn't be possible either.
Well, in many ways so is Linux, but nobody would seriously re-implement Linux just to produce a driver.
So I'm not clear what you're suggesting here we should do differently.
Not enabling proprietary stacks as above would be a good start.
Jason
On Tue, Jul 6, 2021 at 4:56 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 04:09:25PM +0200, Daniel Vetter wrote:
Anyway, for anything that works like a gpu accelerator - 3d accel, parallel compute accel (aka gpgpu), spatial compute accel (aka NN/AI), or maybe even fpga accel - most of the magic to use the hardware is in this backend compiler, which translates from an IR into whatever your accelerator consumes. That's the part we really care about for modern accelerators, because without it the hardware is de facto useless. Generally these chips have a full-blown, if special-purpose, ISA, with register files, spilling, branches, loops and other control flow (sometimes only execution masks on simpler hw).
I don't know if I see it as clearly as you do - at the end of the day the user keys in the program in some proprietary (or open!) language and a whack of proprietary magic transforms it to "make it work".
There are many barriers that prevent someone without the secret knowledge from duplicating the end result of a working program. An accelerator ISA is certainly one example, but I wouldn't overly focus on it as the only blocker.
Well we don't, we do just ask for the full driver stack to make the hw work. It's just that in the past most vendors chose to leave out the compiler/ISA from their open stack/specs. Well except nvidia, which still chooses to leave out everything aside from some very, very minimal thing around documenting display functionality.
Like you said below the NVIDIA GPU ISA seems known but the HW is still not really useful for other reasons.
Habana seems to have gone the other way, the HW is fully useful but we don't have the ISA transformation and other details.
You can actually use nvidia gpus, they're fully functional.
If you install the blobby stack. Which is exactly the same thing as with habanalabs, plus/minus a few things at the fringes.
In the end it's about drawing the line somewhere, so maybe we should merge the nvidia glue code that makes their blobby stack work better with upstream? There are quite a few pieces there, e.g. their display driver is by design a userspace driver, whereas with kernel modesetting it needs to be in the kernel to expose the common kms ioctl interfaces, so they've built up a glue layer to forward everything to userspace and back. On Windows it works because kernel code there can have growing stacks and fun stuff like that, at least that's my understanding. Not really an option to just run the code in linux.
I'm pretty sure nvidia would appreciate that, and maybe every once in a while they open up a header for a generation or two of products like they've done in the past.
Both cases seem to have ended up with something useless, and I have a hard time saying nouveau has more right to be in the kernel tree than Habana does.
Honestly, I think GPU is approaching this backwards. Wayland should have been designed to prevent proprietary userspace stacks.
That's not possible without some serious cans of worms though. Wayland is a protocol, and you can't forbid people from implementing it. Otherwise all the compatible open implementations of closed protocols wouldn't be possible either.
Well, in many ways so is Linux, but nobody would seriously re-implement Linux just to produce a driver.
Well in the gpu space for 2+ decades nvidia has been setting the standard, and the open stack has been trying to catch up by reimplementing the entire thing. It took a fair while.
So I'm not clear what you're suggesting here we should do differently.
Not enabling proprietary stacks as above would be a good start.
I'm still not sure what exactly you mean here. Like on the 3d side there's opengl and vulkan, and nvidia just has an entirely different implementation of that compared to any of the open drivers. That is a bit less code than linux, but it's not small, and reimplementing over decades is pretty much what happened. And if it's not allowed we'd actually not have an open 3d gpu stack at all, because only very recently did we get an agreement around the trademark/licensing issues of that stuff with Khronos. Recently compared to the history of opengl at least.
So I'm still not clear what exactly it is you're suggesting we should do? Not implement the industry standards for 3d (and accept we stay irrelevant forever)? Reject nvidia blobs harder than we do already? At least some distros will continue to ship an auto-installer for that stack, so we're pretty much maxed out already. Like in what way do you think the upstream stack does enable the proprietary nvidia stack? Should we permanently ban any contributions from anyone with an @nvidia.com address, even if it helps the open stack improve?
Like I'm not seeing something concrete that could be done, which would actually prevent nvidia from having their completely independent stack, with the exact same functionality and not a line of code shared. Which is where we are right now. The only thing where we could be more strict is to reject any contributions from them at all, just because we don't like them. That seems a bit too extreme. -Daniel
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
Afaik linux cpu arch ports are also not accepted if there's no open gcc or llvm port around, because without that the overall stack just becomes useless.
Yes. And the one architecture that has an open but not upstream compiler already is more than enough of a pain to not repeat that mistake ever again.
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
If that means AI companies don't want to open up their hw specs enough to allow that, so be it - all you get in that case is offloading the kernel side of the stack for convenience, with zero long term prospects to ever make this into a cross vendor subsystem stack that does something useful.
I don't think this is true at all - nouveau is probably the best example.
nouveau reverse engineered a userspace stack for one of these devices.
How much further ahead would they have been by now if they had a vendor supported, fully featured, open kernel driver to build the userspace upon?
open up your hw enough for that, I really don't see the point in merging such a driver, it'll be an unmaintainable stack by anyone else who's not having access to those NDA covered specs and patents and everything.
My perspective from RDMA is that the drivers are black boxes. I can hack around the interface layers but there is a lot of wild stuff in there that can't be understood without access to the HW documentation.
I think only HW that has open specs, like say NVMe, can really be properly community oriented. Otherwise we have to work in a community partnership with the vendor.
Jason
On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
If that means AI companies don't want to open up their hw specs enough to allow that, so be it - all you get in that case is offloading the kernel side of the stack for convenience, with zero long term prospects to ever make this into a cross vendor subsystem stack that does something useful.
I don't think this is true at all - nouveau is probably the best example.
nouveau reverse engineered a userspace stack for one of these devices.
How much further ahead would they have been by now if they had a vendor supported, fully featured, open kernel driver to build the userspace upon?
There are actually tons of examples here; most of the arm socs have fully open kernel drivers, supported by the vendor (out of tree).
The hard part is the userspace driver and all the things you're submitting to it. We've had open kernel drivers for mali/qualcomm/... years before any believable open implementation started existing. Typing up the memory manager and hw submission queue handling is comparatively trivial. Generally the kernel driver is also done last; you bring up the userspace first, often by just directly programming the hw from userspace. The kernel driver only gets in the way with this stuff (nouveau is entirely developed as a userspace driver, as the most extreme example).
This is a bit different for the display side, but nowadays those drivers are fully in-kernel so they're all open. Well except the nvidia one, and I've not heard of nvidia working on even an out-of-tree open display driver, so that won't help the in-tree effort at all.
Where it would have helped is if this open driver would come with redistributable firmware, because that is right now the thing making nouveau reverse-engineering painful enough to be non-feasible. Well not the reverse-engineering, but the "shipping the result as a working driver stack".
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
open up your hw enough for that, I really don't see the point in merging such a driver, it'll be an unmaintainable stack by anyone else who's not having access to those NDA covered specs and patents and everything.
My perspective from RDMA is that the drivers are black boxes. I can hack around the interface layers but there is a lot of wild stuff in there that can't be understood without access to the HW documentation.
There are shipping gpu drivers with entirely reverse-engineered stacks. And I don't mean "shipping in fedora" but "shipping in Chrome tablets sold by OEM partners of Google". So it's very much possible, even if the vendor is maximally stubborn about things.
I think only HW that has open specs, like say NVMe, can really be properly community oriented. Otherwise we have to work in a community partnership with the vendor.
Well sure that's the ideal case, but most vendors in the accel space aren't interested in actual partnership with the wider community. It's "merge this kernel driver and have no further demands about anything else". Well there are some who are on board, but it does take pretty enormous amounts of coercion. -Daniel
On Tue, Jul 06, 2021 at 04:39:19PM +0200, Daniel Vetter wrote:
On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
If that means AI companies don't want to open up their hw specs enough to allow that, so be it - all you get in that case is offloading the kernel side of the stack for convenience, with zero long term prospects to ever make this into a cross vendor subsystem stack that does something useful.
I don't think this is true at all - nouveau is probably the best example.
nouveau reverse engineered a userspace stack for one of these devices.
How much further ahead would they have been by now if they had a vendor supported, fully featured, open kernel driver to build the userspace upon?
There are actually tons of examples here; most of the arm socs have fully open kernel drivers, supported by the vendor (out of tree).
I chose nouveau because of this:
$ git ls-files drivers/gpu/drm/arm/ | xargs wc -l
  15039 total
$ git ls-files drivers/gpu/drm/nouveau/ | xargs wc -l
 204198 total
At 13x the size of mali this is not just some easy to wire up memory manager and command submission. And after all that typing it still isn't very good. The fully supported AMD vendor driver is over 3 million lines, so nouveau probably needs to grow several times.
My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendors collaboration to build that monstrous thing can only help the end goal of an end to end open stack.
For instance a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed.
I'm not sure about this all or nothing approach. AFAIK DRM has the worst problems with out of tree drivers right now.
Where it would have helped is if this open driver would come with redistributable firmware, because that is right now the thing making nouveau reverse-engineering painful enough to be non-feasible. Well not the reverse-engineering, but the "shipping the result as a working driver stack".
I don't think much of the out of tree but open drivers. The goal must be to get vendors in tree.
I would applaud Habana for getting an intree driver at least, even if the userspace is not what we'd all want to see.
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
The internet tells me there is nvgpu, it doesn't seem to have helped.
Jason
On Tue, Jul 6, 2021 at 5:25 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 04:39:19PM +0200, Daniel Vetter wrote:
On Tue, Jul 6, 2021 at 4:23 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 12:36:51PM +0200, Daniel Vetter wrote:
If that means AI companies don't want to open up their hw specs enough to allow that, so be it - all you get in that case is offloading the kernel side of the stack for convenience, with zero long term prospects to ever make this into a cross vendor subsystem stack that does something useful.
I don't think this is true at all - nouveau is probably the best example.
nouveau reverse engineered a userspace stack for one of these devices.
How much further ahead would they have been by now if they had a vendor supported, fully featured, open kernel driver to build the userspace upon?
There are actually tons of examples here; most of the arm socs have fully open kernel drivers, supported by the vendor (out of tree).
I chose nouveau because of this:
$ git ls-files drivers/gpu/drm/arm/ | xargs wc -l
  15039 total
$ git ls-files drivers/gpu/drm/nouveau/ | xargs wc -l
 204198 total
drm/arm is the arm display driver, which isn't actually shipping anywhere afaik. Also it doesn't include the hdmi/dp output drivers; those are generally external on socs, but integrated in discrete gpus.
The other thing to keep in mind is that one of these drivers supports 25 years of product generations, and the other one doesn't. So I think, adding it all up, it's not that much different. Last time I looked, if you consider just command submission and rendering/compute and don't include display (which heavily skews the stats), it's about 10% kernel, 90% userspace driver parts. That's not including anything that's shared, which is most of it (compiler frontend, intermediate optimizer, the entire runtime/state tracker, and largely all the integration and glue pieces).
At 13x the size of mali this is not just some easy to wire up memory manager and command submission. And after all that typing it still isn't very good. The fully supported AMD vendor driver is over 3 million lines, so nouveau probably needs to grow several times.
AMD is 3 million lines in size because it includes per-generation generated header files.
And of course once you throw an entire vendor team at a driver all those engineers will produce something, and there's the usual problem that the last 10% of features produce about 90% of the complexity and code. E.g. the kbase driver for the arm mali gpu is 20x the size of the in-tree panfrost driver - they need to keep typing to justify their continued employment, or something like that. Usually it's because they reinvent the world.
My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendors collaboration to build that monstrous thing can only help the end goal of an end to end open stack.
Not sure where this got lost, but we're totally fine with vendors using the upstream driver together with their closed stack. And most of the drivers we do have in upstream are actually, at least in parts, supported by the vendor. E.g. if you'd have looked the drm/arm driver you picked is actually 100% written by ARM engineers. So kinda unfitting example.
For instance a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed.
Nvidia has been claiming to try and sort out the FW problem for years. They even managed to release a few things, but I think the last one is 2-3 years late now. Partially the reason is that there don't have a stable api between the firmware and driver, it's all internal from the same source tree, and they don't really want to change that.
I'm not sure about this all or nothing approach. AFAIK DRM has the worst problems with out of tree drivers right now.
Well I guess someone could stand up a drivers/totally-not-gpu and just let the flood in. Even duplicated drivers and everything included, because the vendor drivers are better. Worth a shot, we've practically started this already, I'm just not going to help with the cleanup.
Where it would have helped is if this open driver would come with redistributable firmware, because that is right now the thing making nouveau reverse-engineering painful enough to be non-feasible. Well not the reverse-engineering, but the "shipping the result as a working driver stack".
I don't think much of the out of tree but open drivers. The goal must be to get vendors in tree.
Agreed. We actually got them in-tree largely. Nvidia even contributes the oddball thing, and I think the tegra line is still fully supported in upstream with the upstream driver.
I'm not sure the bleak picture you're drawing is reality, aside from the fact that Nvidia discrete gpu drivers being a disaster with no redistributable firmware, no open kernel driver that works, and nothing else really either.
I would applaud Habana for getting an intree driver at least, even if the userspace is not what we'd all want to see.
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
The internet tells me there is nvgpu, it doesn't seem to have helped.
Not sure which one you mean, but every once in a while they open up a few headers, or a few programming specs, or a small driver somewhere for a very specific thing, and then it dies again or gets obfuscated for the next platform, or just never gets updated. I've never seen anything that comes remotely close to something complete, aside from tegra socs, which are fully supported in upstream afaik. -Daniel
On Tue, Jul 6, 2021 at 5:49 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Tue, Jul 6, 2021 at 5:25 PM Jason Gunthorpe jgg@ziepe.ca wrote:
I'm not sure about this all or nothing approach. AFAIK DRM has the worst problems with out of tree drivers right now.
Well I guess someone could stand up a drivers/totally-not-gpu and just let the flood in. Even duplicated drivers and everything included, because the vendor drivers are better. Worth a shot, we've practically started this already, I'm just not going to help with the cleanup.
tbh I think at this point someone should just do that. Ideally with some boundary like please don't use dma-fence or dma-buf and stuff like that so drivers/gpu doesn't ever have to deal with the fallout. But way too many people think that somehow you magically get the other 90% of an open accel stack if you're just friendly enough and merge the kernel driver, so we really should just that experiment in upstream and watch it pan out in reality.
Minimally it would be some great entertainment :-)
Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports. So it seems to be about the same level of screwed up: if you ask the vendor, they tell you the upstream driver isn't a thing they care about and it's just done for a bit of goodwill. Except if you have enormous amounts of volume, then suddenly it's an option ... Minus the fw issue for nvidia, upstream does support all the gpus you can buy right now and that can run on linux with some vendor driver (aka excluding apple M1 and ofc upcoming products from most vendors).
drivers/accel otoh is mostly out-of-tree, because aside from Greg merging habanalabs no one is bold enough anymore to just merge them all. There are lots of those going around that would be ready for picking. And they've been continuously submitted to upstream over the years, even before the entire habanalabs thing. -Daniel
On Tue, Jul 06, 2021 at 06:07:17PM +0200, Daniel Vetter wrote:
Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports.
In the enterprise world everyone has their out of tree backport drivers. It varies on the vendor how much deviation there is from the upstream driver and what commercial support relationship the vendor has with the enterprise distros.
So it seems to be about the same level of screwed up: if you ask the vendor, they tell you the upstream driver isn't a thing they care about and it's just done for a bit of goodwill.
Sounds like you should get a new RDMA supplier :)
To be fair Intel is getting better, they got their new RDMA HW support merged into v5.14 after about 2 years in the out of tree world. Though it is still incomplete compared to their out of tree driver, the gap is much smaller now.
amounts of volume, then suddenly it's an option ... Minus the fw issue for nvidia, upstream does support all the gpus you can buy right now and that can run on linux with some vendor driver (aka excluding apple M1 and ofc upcoming products from most vendors).
I would look at how many actual commercial systems are running the upstream/inbox stack. I personally know of quite a few sites with big HPC RDMA deployments running pure inbox kernels, no add on kernel modules, with full commercial support.
If you can say that kind of arrangement is also commonplace in the GPU world then I will happily be wrong.
Jason
On Tue, Jul 06, 2021 at 02:28:28PM -0300, Jason Gunthorpe wrote:
Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports.
In the enterprise world everyone has their out of tree backport drivers. It varies on the vendor how much deviation there is from the upstream driver and what commercial support relationship the vendor has with the enterprise distros.
I think he means the Mellanox OFED stack, which is a complete and utter mess and which gets force-fed by Mellanox/Nvidia to unsuspecting customers. I know many big HPC sites that ignore it, but a lot of enterprise customers are dumb enough to deploy it.
On Tue, Jul 06, 2021 at 07:31:37PM +0200, Christoph Hellwig wrote:
On Tue, Jul 06, 2021 at 02:28:28PM -0300, Jason Gunthorpe wrote:
Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports.
In the enterprise world everyone has their out of tree backport drivers. It varies on the vendor how much deviation there is from the upstream driver and what commercial support relationship the vendor has with the enterprise distros.
I think he means the Mellanox OFED stack, which is a complete and utter mess and which gets force-fed by Mellanox/Nvidia to unsuspecting customers. I know many big HPC sites that ignore it, but a lot of enterprise customers are dumb enough to deploy it.
No, I don't think so. While MOFED is indeed a giant mess, the mlx5 upstream driver is not some token effort to generate goodwill, and Mellanox certainly does provide full commercial support for the mlx5 drivers shipped inside various enterprise distros.
MOFED also doesn't have a big functional divergence from RDMA upstream, and it is not mandatory just to use the hardware.
I can't say the same about other companies' RDMA driver distributions; Daniel's description of "minimal effort to get goodwill" would match others much better.
You are right that there are a lot of enterprise customers who deploy the MOFED. I can't agree with their choices, but they are not forced into using it anymore.
Jason
On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote:
The other thing to keep in mind is that one of these drivers supports 25 years of product generations, and the other one doesn't.
Sure, but that is the point, isn't it? To have an actually useful thing you need all of this mess
My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendors' collaboration to build that monstrous thing can only help the end goal of an end-to-end open stack.
Not sure where this got lost, but we're totally fine with vendors using the upstream driver together with their closed stack. And most of the drivers we do have in upstream are actually, at least in parts, supported by the vendor. E.g. if you'd looked, the drm/arm driver you picked is actually 100% written by ARM engineers. So kinda an unfitting example.
So the argument with Habana really boils down to how much do they need to show in the open source space to get a kernel driver? You want to see the ISA or compiler at least?
That at least doesn't seem "extreme" to me.
For instance a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed.
Nvidia has been claiming to try and sort out the FW problem for years. They even managed to release a few things, but I think the last one is 2-3 years late now. Partially the reason is that they don't have a stable API between the firmware and the driver; it's all internal from the same source tree, and they don't really want to change that.
Right, companies have no incentive to work in a sane way if they have their own parallel world. I think drawing them part by part into the standard open workflows and expectations is actually helpful to everyone.
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
The internet tells me there is nvgpu, it doesn't seem to have helped.
Not sure which one you mean, but every once in a while they open up a few headers, or a few programming specs, or a small driver somewhere for a very specific thing, and then it dies again, gets obfuscated for the next platform, or is just never updated. I've never seen anything that comes remotely close to something complete, aside from tegra socs, which are fully supported in upstream afaik.
I understand nvgpu is the tegra driver that people actually use. nouveau may have good tegra support, but is it used in any actual commercial product?
Jason
On Tue, Jul 6, 2021 at 6:29 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote:
The other thing to keep in mind is that one of these drivers supports 25 years of product generations, and the other one doesn't.
Sure, but that is the point, isn't it? To have an actually useful thing you need all of this mess
My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendors' collaboration to build that monstrous thing can only help the end goal of an end-to-end open stack.
Not sure where this got lost, but we're totally fine with vendors using the upstream driver together with their closed stack. And most of the drivers we do have in upstream are actually, at least in parts, supported by the vendor. E.g. if you'd looked, the drm/arm driver you picked is actually 100% written by ARM engineers. So kinda an unfitting example.
So the argument with Habana really boils down to how much do they need to show in the open source space to get a kernel driver? You want to see the ISA or compiler at least?
Yup. We don't care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all that. Generally that's also not that hard to reverse engineer if someone is bored enough; the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly.
That at least doesn't seem "extreme" to me.
For instance a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed.
Nvidia has been claiming to try and sort out the FW problem for years. They even managed to release a few things, but I think the last one is 2-3 years late now. Partially the reason is that they don't have a stable API between the firmware and the driver; it's all internal from the same source tree, and they don't really want to change that.
Right, companies have no incentive to work in a sane way if they have their own parallel world. I think drawing them part by part into the standard open workflows and expectations is actually helpful to everyone.
Well, we do try to get them on board part by part, generally starting with the kernel and ending with a proper compiler instead of the usual llvm hack job, but for whatever reasons they really like their in-house stuff; see below for what I mean.
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
The internet tells me there is nvgpu, it doesn't seem to have helped.
Not sure which one you mean, but every once in a while they open up a few headers, or a few programming specs, or a small driver somewhere for a very specific thing, and then it dies again, gets obfuscated for the next platform, or is just never updated. I've never seen anything that comes remotely close to something complete, aside from tegra socs, which are fully supported in upstream afaik.
I understand nvgpu is the tegra driver that people actually use. nouveau may have good tegra support, but is it used in any actual commercial product?
I think it was almost the case. Afaik they still have their internal userspace stack working on top of nvidia; at least last year someone fixed up a bunch of issues in the tegra+nouveau combo to enable format modifiers properly across the board. But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware.
And it's not just nvidia, it's pretty much everyone. Like a soc company I don't want to know started collaborating with upstream and the reverse-engineered mesa team on a kernel driver, which seems to work pretty well for current hardware. But for the next generation they decided it's going to be again only their in-house tree that completely ignores drivers/gpu/drm, and also tosses all the foundational work they helped build on the userspace side. And this is consistent across all companies: over the last 20 years I know of (often non-public) stories from every single company where they decided that all the time invested into community/upstream collaboration isn't useful anymore and they go all vendor-solo for the next one.
Most of those you luckily don't hear about anymore; all it results in is the upstream driver being 1-2 years late or so. But even the good ones, where we collaborate well, can't seem to help themselves and want to throw it all away every few years. -Daniel
I should stop typing and prep dinner, but I found some too hilarious typos below.
On Tue, Jul 6, 2021 at 7:35 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Tue, Jul 6, 2021 at 6:29 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 05:49:01PM +0200, Daniel Vetter wrote:
The other thing to keep in mind is that one of these drivers supports 25 years of product generations, and the other one doesn't.
Sure, but that is the point, isn't it? To have an actually useful thing you need all of this mess
My argument is that an in-tree open kernel driver is a big help to reverse engineering an open userspace. Having the vendors' collaboration to build that monstrous thing can only help the end goal of an end-to-end open stack.
Not sure where this got lost, but we're totally fine with vendors using the upstream driver together with their closed stack. And most of the drivers we do have in upstream are actually, at least in parts, supported by the vendor. E.g. if you'd looked, the drm/arm driver you picked is actually 100% written by ARM engineers. So kinda an unfitting example.
So the argument with Habana really boils down to how much do they need to show in the open source space to get a kernel driver? You want to see the ISA or compiler at least?
Yup. We don't care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all
s/demons/demos/ but hw tends to be funky enough that either fits :-)
that. Generally that's also not that hard to reverse engineer if someone is bored enough; the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly.
That at least doesn't seem "extreme" to me.
For instance a vendor with an in-tree driver has a strong incentive to sort out their FW licensing issues so it can be redistributed.
Nvidia has been claiming to try and sort out the FW problem for years. They even managed to release a few things, but I think the last one is 2-3 years late now. Partially the reason is that they don't have a stable API between the firmware and the driver; it's all internal from the same source tree, and they don't really want to change that.
Right, companies have no incentive to work in a sane way if they have their own parallel world. I think drawing them part by part into the standard open workflows and expectations is actually helpful to everyone.
Well, we do try to get them on board part by part, generally starting with the kernel and ending with a proper compiler instead of the usual llvm hack job, but for whatever reasons they really like their in-house stuff; see below for what I mean.
I don't think the facts on the ground support your claim here, aside from the practical problem that nvidia is unwilling to even create an open driver to begin with. So there isn't anything to merge.
The internet tells me there is nvgpu, it doesn't seem to have helped.
Not sure which one you mean, but every once in a while they open up a few headers, or a few programming specs, or a small driver somewhere for a very specific thing, and then it dies again, gets obfuscated for the next platform, or is just never updated. I've never seen anything that comes remotely close to something complete, aside from tegra socs, which are fully supported in upstream afaik.
I understand nvgpu is the tegra driver that people actually use. nouveau may have good tegra support, but is it used in any actual commercial product?
I think it was almost the case. Afaik they still have their internal userspace stack working on top of nvidia; at least last year someone fixed up a bunch of issues in the tegra+nouveau combo to enable format modifiers properly across the board. But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware.
And it's not just nvidia, it's pretty much everyone. Like a soc company I don't want to know started collaborating with upstream and
s/know/name/ I do know them unfortunately quite well ...
Cheers, Daniel
the reverse-engineered mesa team on a kernel driver, which seems to work pretty well for current hardware. But for the next generation they decided it's going to be again only their in-house tree that completely ignores drivers/gpu/drm, and also tosses all the foundational work they helped build on the userspace side. And this is consistent across all companies: over the last 20 years I know of (often non-public) stories from every single company where they decided that all the time invested into community/upstream collaboration isn't useful anymore and they go all vendor-solo for the next one.
Most of those you luckily don't hear about anymore; all it results in is the upstream driver being 1-2 years late or so. But even the good ones, where we collaborate well, can't seem to help themselves and want to throw it all away every few years.
-Daniel
Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote:
Yup. We don't care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all that. Generally that's also not that hard to reverse engineer if someone is bored enough; the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly.
Seems reasonable to me
And it's not just nvidia, it's pretty much everyone. Like a soc company I don't want to know started collaborating with upstream and the reverse-engineered mesa team on a kernel driver, seems to work pretty well for current hardware.
What I've seen is that this only works with customer demand. Companies need to hear from their customers that upstream is what is needed, and companies cannot properly hear that until they are at least already partially invested in the upstream process and have the right customers that are sophisticated enough to care.
Embedded makes everything 10x worse because too many customers just don't care about upstream, you can hack your way through everything, and indulge in single generation thinking. Fork the whole kernel for 3 years, EOL, no problem!
It is the enterprise world, particularly with an opinionated company like RH saying NO stuck in the middle, that really seems to drive things toward upstream.
Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an example) but it is incredibly time consuming, expensive and becoming more and more difficult every year.
The big point is this:
But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware.
I think this is at the core of Linux's success in the enterprise world. Big customers who care demanding open source. Any vendor, even nvidia, will want to meet customer demands.
IMHO upstream success is found by motivating the customer to demand it and making it "easy" for the vendor to supply it.
Jason
On Tue, Jul 6, 2021 at 8:31 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote:
Yup. We don't care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all that. Generally that's also not that hard to reverse engineer if someone is bored enough; the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly.
Seems reasonable to me
And it's not just nvidia, it's pretty much everyone. Like a soc company I don't want to know started collaborating with upstream and the reverse-engineered mesa team on a kernel driver, seems to work pretty well for current hardware.
What I've seen is that this only works with customer demand. Companies need to hear from their customers that upstream is what is needed, and companies cannot properly hear that until they are at least already partially invested in the upstream process and have the right customers that are sophisticated enough to care.
Embedded makes everything 10x worse because too many customers just don't care about upstream, you can hack your way through everything, and indulge in single generation thinking. Fork the whole kernel for 3 years, EOL, no problem!
It's not entirely hopeless in embedded either. Sure, there's the giant pile of sell&forget abandonware, but there are lots of embedded things where multi-year to multi-decade support is required. And an upstream gfx stack beats anything the vendor has to offer on that, easily.
And on the server side it's actually pretty hard to convince customers of the upstream driver benefits, because they don't want to or can't abandon nvidia and have just learned to accept the pain. They either build a few abstraction layers on top (and demand the vendor support those), or they flat out demand you support the nvidia proprietary interfaces. And AMD has been trying to move the needle here for years, with not that much success.
It is the enterprise world, particularly with an opinionated company like RH saying NO stuck in the middle, that really seems to drive things toward upstream.
Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an example) but it is incredibly time consuming, expensive and becoming more and more difficult every year.
The big point is this:
But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware.
I think this is at the core of Linux's success in the enterprise world. Big customers who care demanding open source. Any vendor, even nvidia, will want to meet customer demands.
IMHO upstream success is found by motivating the customer to demand it and making it "easy" for the vendor to supply it.
Yup, exactly the same situation here. The problem seems to be a bit that gpu vendor stubbornness is higher than even established customer demand, or they just don't care, and so in the last few years that customer demand has resulted in payments to consulting shops and hiring of engineers to reverse-engineer a full driver, instead of customer and vendor splitting the difference and the vendor upstreaming their stack. And that's for companies who've done it in the past, or at least collaborated on parts like the kernel driver, so I really have no clue why they don't just continue. We have well-established customers who do want it all open and upstream, across kernel and userspace pieces.
And it looks like it's going to repeat itself a few more times unfortunately. I'm not sure when exactly the lesson will sink in.
Maybe I missed some, but looking at current render/compute drivers I think (though I'm not even sure on that) only drm/lima is a hobbyist project, and perhaps you want to include drm/nouveau as not paid for by customers and more something redhat does out of principle. All the others are paid for by customers, with vendor involvement ranging from "just helping out with the kernel driver" to "pays for pretty much all of the development". And still, apparently, that's not enough demand for an upstream driver stack. -Daniel
On Tue, Jul 6, 2021 at 2:31 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jul 06, 2021 at 07:35:55PM +0200, Daniel Vetter wrote:
Yup. We don't care about any of the fancy pieces you build on top, nor does the compiler need to be the optimizing one. Just something that's good enough to drive the hw in some demons to see how it works and all that. Generally that's also not that hard to reverse engineer if someone is bored enough; the real fancy stuff tends to be in how you optimize the generated code. And make it fit into the higher levels properly.
Seems reasonable to me
And it's not just nvidia, it's pretty much everyone. Like a soc company I don't want to know started collaborating with upstream and the reverse-engineered mesa team on a kernel driver, seems to work pretty well for current hardware.
What I've seen is that this only works with customer demand. Companies need to hear from their customers that upstream is what is needed, and companies cannot properly hear that until they are at least already partially invested in the upstream process and have the right customers that are sophisticated enough to care.
Embedded makes everything 10x worse because too many customers just don't care about upstream, you can hack your way through everything, and indulge in single generation thinking. Fork the whole kernel for 3 years, EOL, no problem!
It is the enterprise world, particularly with an opinionated company like RH saying NO stuck in the middle, that really seems to drive things toward upstream.
Yes, vendors can work around Red Hat's No (and NVIDIA GPU is such an example) but it is incredibly time consuming, expensive and becoming more and more difficult every year.
The big point is this:
But also nvidia is never going to sell you that as the officially supported thing, unless your ask comes back with enormous amounts of sold hardware.
I think this is at the core of Linux's success in the enterprise world. Big customers who care demanding open source. Any vendor, even nvidia, will want to meet customer demands.
IMHO upstream success is found by motivating the customer to demand it and making it "easy" for the vendor to supply it.
I think this is one of the last big challenges on Linux. It's REALLY hard to align new products with Linux kernel releases and distro kernels. Hardware cycles are too short and drivers (at least for GPUs) are too big to really fit well with the current Linux release model. In many cases enterprise distros have locked down on a kernel version around the same time we are doing new chip bring-up. You are almost always off by one when it comes to kernel version alignment. Even if you can get the initial code upstream in the right kernel version, it tends to be aligned to such early silicon that you end up needing a pile of additional patches to make production cards work. Those changes are often deemed "too big" for stable kernel fixes. The only real way to deal with that effectively is with vendor-provided packaged drivers using something like dkms to cover launch. Thus you need to do your bring-up against latest upstream and then backport, or do your bring-up against some older kernel and forward-port for upstream. You end up doing everything twice. Things get better with sustaining support in subsequent distro releases, but it doesn't help at product launch. I don't know what the right solution for this is.
Alex