On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote:
On 2018-12-03 at 11:22, Sasha Levin wrote:
This is a case where theory collides with the real world. Yes, our QA is lacking, but we don't have the option of abandoning the current process. If we stop backporting until some future date when our QA problem is solved, we'll end up with what we had before: users stuck on ancient kernels without a way to upgrade.
Sorry, but you seem to be living in a different "real world"...
People stay on "ancient kernels" that "just work" instead of updating to a newer one that "hopefully/maybe/... works".
If users are stuck at older kernels and refuse to update then there's not much I can do about it. They are knowingly staying on kernels with known issues and will end up paying a much bigger price later to update.
With the current model we're aware that bugs sneak through, but we try to deal with it by both improving our QA, and encouraging users to do their own extensive QA. If we encourage users to update frequently we can keep improving our process and the quality of kernels will keep getting better.
And here you want to turn/force users into QA ... good luck with that.
Yes, users are expected to test their workloads with new kernels - I'm not sure why this is a surprise to anyone. Isn't it true for every other piece of software?
I invite you to read Jon's great summary on LWN of a related session that happened during the Maintainers Summit: https://lwn.net/Articles/769253/ . The conclusion reached was very similar.
In reality they won't "update frequently"; instead they will stop updating once they have something that works... and start ignoring updates because they expect something "to break as usual", since they actually need to get some real work done too...
Again, this model was proven to be bad in the past, and if users keep following it then they're knowingly shooting themselves in the foot.
We simply can't go back to the "enterprise distro" days.
Maybe so, but we should at least get back to having "stable" or "longterm" actually mean something again...
Or what does it say when distros start thinking about ignoring stable/longterm trees (and some already do) because _way_ too many questionable changes are coming through, even overriding maintainers, to the point where they basically state "we don't care about monitoring stable trees anymore, as they add whatever they want anyway"...
I'm assuming you mean "enterprise distros" here, as most of the community distros I'm aware of are tracking stable trees.
Enterprise distros are a mix of everything: on one hand they refuse most stable patches because there is no customer demand to fix those bugs, but on the other hand they update drivers and subsystems wholesale, creating Frankenstein kernels that are very difficult to support.
When your kernel is driven by paying customer demands it's difficult to argue for the technical merits of your process.
And pretending that every fix is important enough to backport, and saying that if you don't take everything you have an "insecure" kernel, won't help, as reality has shown from time to time that backports can and will open up a new issue instead, for no good reason.
Which, for distros, is starting to mean that switching back to selectively taking fixes for _known_ security issues is considered a way better choice.
That was my exact thinking two years ago (see my stable-security project: https://lwn.net/Articles/683335/). I even had a back-and-forth with Greg on LKML when I was trying to argue your point: "Let's only take security fixes because no one cares about the other crap".
If you're interested, I'd be happy to explain further why this was a complete flop.
-- Thanks, Sasha