On Mon, Apr 16, 2018 at 01:44:23PM -0400, Steven Rostedt wrote:
On Mon, 16 Apr 2018 17:16:10 +0000 Sasha Levin <Alexander.Levin@microsoft.com> wrote:
So if a user is operating a nuclear power plant, and has 2 leds: green one that says "All OK!" and a red one saying "NUCLEAR MELTDOWN!", and once in a blue moon a race condition is causing the red one to go on and cause panic in the little province he lives in, we should tell that user to fuck off?
LEDs may not be critical for you, but they can be critical for someone else. Think of all the different users we have and the wildly different ways they use the kernel.
We can point them to the fix and have them backport it. Or they should ask their distribution to backport it.
It may work in your subsystem, but it really doesn't work this way with the kernel.
Let me share a concrete example with you: there's a hard-to-reproduce vfs bug going around. It was originally reported on CoreOS/AWS:
https://github.com/coreos/bugs/issues/2356
But our customers reported to us that they were hitting this issue too.
We couldn't reproduce it, and the call trace indicated it may be a memory corruption. We could, however, confirm with the customers that the latest mainline fixes the issue.
Given that we couldn't reproduce it, and neither of us is a fs/ expert, we sent a mail to LKML, just like you suggested doing:
https://lkml.org/lkml/2018/3/2/1038
But unlike what you said, no one pointed us to the fix, even though the issue was fixed on mainline. Heck, no one engaged in any meaningful conversation about the bug.
I really think that we have different views on how well the whole "let me shoot a mail to LKML" process works, which leads to different views on -stable.
Hopefully they tested the kernel they are using for something like that, and only want critical fixes. What happens if they take the next stable assuming that it has critical fixes only, and this fix causes a regression that lights up "All OK!" when things aren't?
Basically, I'd rather have stable be more bug-compatible with the version it is based on, with only critical fixes (things that will cause an oops), than try to be bug-compatible with mainline, because then we get into a state where things are a Frankenstein of the stable base version and mainline. I could say, "Yeah, this feature works better on this 4.x version of the kernel" and not worry about "4.x.y" versions having it better.
This is how things used to work, right? Look at Red Hat kernels, for example: they'd stick with a kernel for tens of years, doing the tiniest fixes only when customers complained, and encouraging users to upgrade only when the kernel went EoL, and even then customers often couldn't upgrade because they were too locked into that kernel version.
Red Hat still supports 2.6.9.
I thought we agreed that this is bad? We wanted users to be closer to mainline, and we can't do it without bringing -stable closer to mainline as well.