On Tue, Jul 06, 2021 at 06:07:17PM +0200, Daniel Vetter wrote:
> Also on your claim that drivers/gpu is a non-upstream disaster: I've also learned that for drivers/rdma there's the upstream driver, and then there's the out-of-tree hackjob the vendor actually supports.
In the enterprise world everyone has their out of tree backport drivers. How far those deviate from the upstream driver, and what commercial support relationship the vendor has with the enterprise distros, varies by vendor.
> So seems to be about the same level of screwed up, if you ask the vendor they tell you the upstream driver isn't a thing they care about and it's just done for a bit of goodwill.
Sounds like you should get a new RDMA supplier :)
To be fair Intel is getting better, they got their new RDMA HW support merged into v5.14 after about 2 years in the out of tree world. Though it is still incomplete compared to their out of tree driver, the gap is much smaller now.
> amounts of volume, then suddenly it's an option ... Minus the fw issue for nvidia, upstream does support all the gpus you can buy right now and that can run on linux with some vendor driver (aka excluding apple M1 and ofc upcoming products from most vendors).
I would look at how many actual commercial systems are running the upstream/inbox stack. I personally know of quite a few sites with big HPC RDMA deployments running pure inbox kernels, no add on kernel modules, with full commercial support.
If you can say that kind of arrangement is also commonplace in the GPU world then I will happily be wrong.
Jason