On Sat, Mar 11, 2023 at 11:24:31AM -0800, Eric Biggers wrote:
On Sat, Mar 11, 2023 at 07:33:58PM +0100, Willy Tarreau wrote:
On Sat, Mar 11, 2023 at 09:48:13AM -0800, Eric Biggers wrote:
The purpose of all these mailing list searches would be to generate a list of potential issues with backporting each commit, which would then undergo brief human review.
This is one big part that I suspect is underestimated. I'll speak from my past experience maintaining extended LTS for 3.10. I couldn't produce as many releases as I would have liked to, because despite the scripts that helped me figure out some series, dependencies, origin branches, etc., the whole process of reviewing ~600 patches to end up with ~200 at the end (and adapting some of them to fit) required ~16 hours a day for a full weekend, and I didn't always have that amount of time available. And my choices were far from perfect: during the reviews I got a number of "please don't backport this there" and "if you take this one you also need these ones". I also used to intentionally drop what had no business being in an old LTS, so even from that perspective my work could have been perceived as insufficient.
The reviewing process is overwhelming, really. There is a point where you start to fail and make choices that are no better than a machine's. But is an occasional mistake dramatic if, on the other hand, it fixes 200 other issues? I think not, as long as it's transparent and accepted by the users, because for one user who could experience a regression (one that escaped all the testing in place), thousands get fixes for existing problems. I'm not saying that regressions are good, I hate them, but as James said, we have to accept that users are part of the quality process.
My approach on another project I maintain is to announce upfront my own level of trust in my backport work, saying "I had a difficult week fixing that problem, do not rush to it or be extra careful", or "nothing urgent, no need to upgrade if you have no problem", or "just upgrade, it's almost riskless". Users love that, because they know they're part of the quality assurance process, and they will either take small risks when they can, or wait for others to take risks.
But thinking that having one person review patches affecting many subsystems, after pre-selection and with extra info regarding the discussions on each individual patch, could result in more reliable stable releases is just an illusion IMHO, because the root of the problem is that there are not enough humans to fix all the problems that humans introduce in the first place, and despite this we need to fix them. Just like automated scripts scraping lore, AUTOSEL does bring some value if it offloads some work from the available humans, even in its current state. And I hope that more of the selection and review work will be automated in the future and made even less dependent on humans, because that does have a chance to be more reliable in the face of such a vast amount of work.
As I said in a part of my email which you did not quote, the fallback option is to send the list of issues to the mailing list for others to review.
If even that fails, then it could be cut down to *just the most useful* heuristics, with decisions made automatically based on those... "Don't AUTOSEL patch N of a series without patches 1...N-1" might be a good one; a sketch of that heuristic follows below.
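To make that concrete, here is a minimal sketch, in Python, of how such a series heuristic could be applied to a candidate list. The Candidate fields (series_id, position) are hypothetical; real tooling would have to recover them from the "[PATCH x/y]" subjects on lore, and would also need to treat prerequisites already present in the stable branch as satisfied, which this sketch does not attempt.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass(frozen=True)
    class Candidate:
        sha: str
        series_id: Optional[str]  # None if the commit was not posted as part of a series
        position: int             # 1-based "[PATCH position/total]" index

    def filter_partial_series(candidates: List[Candidate]) -> List[Candidate]:
        """Drop any candidate whose earlier siblings in the same series
        were not also selected (assumption: nothing from the series is
        already in the stable tree)."""
        selected = {(c.series_id, c.position) for c in candidates if c.series_id}
        kept = []
        for c in candidates:
            if c.series_id is None:
                kept.append(c)
                continue
            missing = [n for n in range(1, c.position)
                       if (c.series_id, n) not in selected]
            if not missing:
                kept.append(c)
            # else: hold the commit back until its prerequisites are picked too
        return kept

The point is only that this kind of rule is mechanical enough to run without a human in the loop, leaving reviewers to deal with the genuinely ambiguous cases.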
But again, this comes back to one of the core issues here, which is: how does one even build something for the stable maintainers if their requirements are unknown to others?
Another issue that I'd like to reiterate is that AUTOSEL is currently turned up to 11. It's simply selecting too much.
It should be made less sensitive and select only higher-confidence commits.
That would cut down on the workload slightly.
(And please note, the key word here is *confidence*. We all agree that it's never possible to be absolutely 100% sure whether a commit is appropriate for stable or not. That's a red herring.
And I would assume, or at least hope, that the neural network thing being used for AUTOSEL outputs a confidence rating and not just a yes/no answer. If it actually just outputs yes/no, well, how is anyone supposed to know that and fix that, given that it does not seem to be an open source project?)
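For the avoidance of doubt, here is a minimal sketch of what "make it less sensitive" means if the model does expose a score. score_commit() is a stand-in for whatever AUTOSEL actually does internally (which is not public), and the 0.9 threshold is an arbitrary placeholder; the only point is that a yes/no decision should come from thresholding a probability, so raising the threshold trades how much gets selected for how confident the selections are.

    # Hypothetical selection step: threshold a per-commit confidence score
    # instead of taking a bare yes/no answer from the model.
    THRESHOLD = 0.9   # higher => fewer, higher-confidence selections

    def select_for_stable(commits, score_commit):
        """Return (sha, score) pairs above the threshold, most confident first,
        so reviewers can stop reading wherever their time runs out."""
        scored = [(sha, score_commit(sha)) for sha in commits]
        picked = [(sha, score) for sha, score in scored if score >= THRESHOLD]
        return sorted(picked, key=lambda p: p[1], reverse=True)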
- Eric