On Mon, 16 Apr 2018 11:52:48 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
On Mon, Apr 16, 2018 at 11:41 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
I never said the second bug fix should not have been backported. I even said that the first bug "didn't go far enough".
You're still not getting it.
The "didn't go far enough" means that the bug fix is *BUGGY*. It needs to be reverted.
It wasn't reverted. Look at the code in question.
Commit d63c7dd5bcb
+++ b/drivers/scsi/ipr.c
@@ -4003,13 +4003,12 @@ static ssize_t ipr_store_update_fw(struct device *dev,
 	struct ipr_sglist *sglist;
 	char fname[100];
 	char *src;
-	int len, result, dnld_size;
+	int result, dnld_size;

 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;

-	len = snprintf(fname, 99, "%s", buf);
-	fname[len-1] = '\0';
+	snprintf(fname, sizeof(fname), "%s", buf);

 	if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
 		dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);
The bug is that len returned by snprintf() can be much larger than 100. That fname[len-1] = '\0' can allow a user to decide where to write zeros.
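To make the return-value problem concrete, here is a minimal userspace sketch, not the driver code. The helper names old_terminator_index() and index_in_bounds() are made up for illustration. It relies only on the standard behaviour that snprintf() returns the length the output would have had without truncation:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from ipr.c): returns the index that the old
 * "fname[len-1] = '\0'" statement would have written to for this input.
 * dst and size stand in for fname and the bound passed to snprintf().
 */
static int old_terminator_index(char *dst, size_t size, const char *buf)
{
	/*
	 * snprintf() reports how long the output would have been without
	 * truncation, not how many bytes were actually stored in dst.
	 */
	int len = snprintf(dst, size, "%s", buf);

	return len - 1;
}

/* Nonzero when idx is a valid index into a buffer of the given size. */
static int index_in_bounds(int idx, size_t size)
{
	return idx >= 0 && (size_t)idx < size;
}
```

With a 300-character input and a 100-byte buffer, the helper returns 299, far past the end of the buffer. An empty input gives -1, which is out of bounds in the other direction.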
That patch never got reverted in mainline. It was fixed with this:
Commit 21b81716c6bf
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -4002,6 +4002,7 @@ static ssize_t ipr_store_update_fw(struct device *dev,
 	struct ipr_sglist *sglist;
 	char fname[100];
 	char *src;
+	char *endline;
 	int result, dnld_size;

 	if (!capable(CAP_SYS_ADMIN))
@@ -4009,6 +4010,10 @@ static ssize_t ipr_store_update_fw(struct device *dev,

 	snprintf(fname, sizeof(fname), "%s", buf);

+	endline = strchr(fname, '\n');
+	if (endline)
+		*endline = '\0';
+
 	if (request_firmware(&fw_entry, fname, &ioa_cfg->pdev->dev)) {
 		dev_err(&ioa_cfg->pdev->dev, "Firmware file %s not found\n", fname);
 		return -EIO;
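The combined effect of the two patches can be tried out in userspace with a sketch like the one below. copy_fw_name() is a hypothetical stand-in, not a kernel function. The idea that the breakage came from a trailing newline no longer being stripped (for instance when the name is written with echo) is an inference from the two diffs, not something the commits are quoted as saying here:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the filename handling after the
 * second patch. fname/size play the role of the on-stack buffer.
 */
static void copy_fw_name(char *fname, size_t size, const char *buf)
{
	char *endline;

	/* Bounded copy; the result is NUL-terminated whenever size > 0. */
	snprintf(fname, size, "%s", buf);

	/* Cut at the first newline, if any, so "name\n" becomes "name". */
	endline = strchr(fname, '\n');
	if (endline)
		*endline = '\0';
}
```

Both "fw.bin\n" and "fw.bin" come out as "fw.bin", and an oversized input is simply truncated to fit, so nothing is written outside the buffer.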
I hope the answer was not to revert the bug fix and put the possible bad memory access back in to keep the API.
But that very much *IS* the answer. If there isn't a fix for the ABI breakage, then the first bugfix needs to be reverted.
It wasn't reverted and that was my point. It just wasn't a complete fix. And I'm saying that once the API breakage became apparent, the second fix should have been backported as well.
I'm not saying that we should allow API breakage to fix a critical bug. I'm saying that the API breakage was really a secondary bug that needed to be addressed. My point is the first fix was NOT reverted!
Really. There is no such thing as "but the fix was more important than the bug it introduced".
I'm not saying that.
This is why we started with the whole "actively revert things that introduce regressions". Because people always kept claiming that "but but I fixed a worse bug, and it's better to fix the worse bug even if it then introduces another problem, because the other problem is lesser".
NO.
Right, but the fix to the API was also trivial. I don't understand why you are arguing with me. I agree with you. I'm talking about this specific instance, where a bug was fixed and the API breakage was another bug whose fix needed to be backported.
Are you saying if code could allow userspace to write zeros anywhere in memory, that we should keep it to allow API compatibility?
We're better off making *no* progress, than making "unsteady progress".
Really. Seriously.
If you cannot fix a bug without introducing another one, don't do it. Don't do kernel development.
Um, I think that's impossible, as the example shows. Not many people would have caught that the original fix would cause another bug. That requirement would pretty much keep everyone from ever doing any kernel development.
The whole mentality you show is NOT ACCEPTABLE.
So the *only* answer is: "fix the bug _and_ keep the API". There is no other choice.
I agree. But that wasn't the question.
The whole "I fixed one problem but introduced another" is not how we work. You should damn well know that. There are no excuses.
And yes, sometimes that means jumping through hoops. But that's what it takes to keep users happy.
I'm talking about the given example of a simple memory bug that caused a very subtle breakage of API, which had another trivial fix that should be backported. I'm not sure that's what you were talking about.
-- Steve