Am I right in saying the bad firmware was introduced with:
commit 0a6890b9b4df89a83678eba0bee3541bcca8753c Author: Sudarsana Reddy Kalluru <skalluru@marvell.com> Date: Mon Nov 4 21:51:09 2019 -0800
bnx2x: Utilize FW 7.13.15.0. Commit 97a27d6d6e8d "bnx2x: Add FW 7.13.15.0" added said .bin FW to linux-firmware tree. This FW addresses few important issues in the earlier FW release. This patch incorporates FW 7.13.15.0 in the bnx2x driver.
Correct.
And that means v5.5 through to at least v5.16 will be broken? It has been broken for a little under 2 years? And both 5.10 and 5.15 are LTS. And you don't care. You will leave them broken, even knowing that distribution kernels are going to use these LTS kernels?
Not correct. We would like to solve the problem here too. But what we plan is to push these fixes upstream.
Isn't mainline the top of upstream? You cannot get any further up. Yet you plan to drop stable? Please could you explain some more.
Are you thinking of releasing a 7.13.15.1 which only fixes the problem, keeping ABI compatibility, so it can be added to stable? And then submit 7.13.20.0 for net-next?
It is not correct that this would have been avoided by not breaking the ABI. The breakage was a bug introduced in the FW for SR-IOV. Having a backwards/forwards compatible ABI would not change the fact that the bug would be there. The bug is only exposed when an old VM runs on a new hypervisor, so it is not correct to say "the bug was there for 2 years". Although the problem was introduced 2 years ago, it was only exposed now, and now we want to fix it. Whether or not the fix allows the driver to work with the old FW file on disk is unrelated to the problem itself.
I stand by the claim that, *generally*, this HW architecture is not designed for backward/forward compatibility with regard to this FW. But it is true that in this case it can be done. Numerous FW versions of this device have already been accepted; none of them were backwards compatible, and all had this same issue (updating the driver mandates syncing up to the latest FW tree, otherwise the driver load fails gracefully). Since this is the last FW we are pushing for this EOL device, it seems overly meticulous to insist on this for what is (hopefully) the last version of the device FW.
Part of the problem is that Marvell keeps doing this for its products. See the discussion with Prestera. It is as if there is a Marvell policy not to even try to keep ABI compatibility with the firmware.
If the community wants Marvell to get better in this respect, we need to push back and say ABI compatibility is important. I hope Prestera has learned its lesson; they say they will never break ABI compatibility again, but I think we need to wait a few years before we can actually trust that statement.
What about other NIC drivers? I hope you don't have any other ABI breaks planned.
Andrew