Hello,
As discussed with Greg at LPC, we are starting an iterative process to deliver meaningful stable test reports from KernelCI. As Greg pointed out, he doesn't look at the current reports sent automatically from KernelCI. Those are not clean enough to help the stable release process, so we discussed starting over again.
This reporting process is a learning exercise, growing over time. We are starting small with data we can verify manually (at first) to make sure we are not introducing noise or reporting flakes and false-positives. The feedback loop will teach us how to filter the results and report with incremental automation of the steps.
Today we are starting with build and boot tests (for the hardware platforms in KernelCI with sustained availability over time). Then, at every iteration, we will try to improve the report, increasing coverage and data visualization. Feedback is really important. Eventually, we will also have this report implemented in the upcoming KernelCI Web Dashboard.
This work is a contribution from Collabora (on behalf of its clients) to improve kernel integration as a whole. Moving forward, Shreeya Patel, from the Collabora team, will take on the responsibility of delivering these reports.
Without further ado, here's our first report:
## stable-rc HEADs:
Date: 2023-12-08
6.1: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
5.15: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
5.10: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
5.4: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
* 6.6 stable-rc has not been added to KernelCI yet, but we plan to add it next week.
## Build failures:
No build failures seen for the stable-rc/queue commit heads for 6.1/5.15/5.10/5.4 \o/
## Boot failures:
No **new** boot failures seen for the stable-rc/queue commit heads for 6.1/5.15/5.10/5.4 \o/
(for the time being, we are setting aside pre-existing failures)
## Considerations
All this data is available in the legacy KernelCI Web Dashboard - https://linux.kernelci.org/ - but not easily filtered there. The data in this report was checked manually. As we evolve this report, we want to add traceability of the information, making it really easy for anyone to dig deeper for more info, logs, etc.
The report covers the hardware platforms in KernelCI with sustained availability over time - we will detail this further in future reports.
We opted to make the report really simple as you can see above. It is just an initial spark. From here your feedback will drive the process. So really really tell us what you want to see next. We want FEEDBACK!
Best,
- Gus
On Fri, Dec 08, 2023 at 12:29:35PM -0300, Gustavo Padovan wrote:
Looks great!
A few notes, it can be a bit more verbose if you want :)
One email per -rc release (i.e. one per branch) is fine, and that way, if you add a "Tested-by: kernelci-bot <email goes here>" line, or something like that, to the email, my systems will pick it up and it will get added to the final commit message.
But other than that, hey, I'll take the above, it's better than what was there before!
How about if something breaks, what will it look like? That's where it gets more "interesting" :)
thanks,
greg k-h
On 08/12/2023 16:58, Greg KH wrote:
Brings back some memories, 5.10.20-rc2 :)
https://lore.kernel.org/stable/32a6c609-642c-71cf-0a84-d5e8ccd104b1@collabor...
I see some people are following in my footsteps now; it'll be interesting to see if they reach the same conclusions about how to automate these emails and track regressions. I guess it's hard to convince others that the solutions we now know we need to put in place are going to solve this, so everyone has to do the journey themselves. Maybe that's part of upstream development, not always removing duplication of effort.
Here's some feedback in general:
* Showing what is passing is mostly noise
As Greg pointed out, what's important is the things that are broken (so new regressions). For stable, I think we also established that it was good to keep a record of all the things that were tested and passed, but it's not too relevant when gating releases. See the other manual emails sent by Shuah, Guenter and some Linaro folks for example.
* Replying to the stable review
This email is a detached thread; I know it's a draft and just a way to discuss things, but obviously a real report would need to be sent as a reply to the stable-rc patch review thread.
On a related topic, it was once mentioned that since stable releases occur once a week and they are used as the basis for many distros and products, it would make sense to have long-running tests after the release has been declared. So we could have say, 48h of testing with extended coverage from LTP, fstests, benchmarks etc. That would be a reply to the email with the release tag, not the patch review.
For the record, a few years ago, KernelCI used to reply to the review threads on the list. Unfortunately this broke at some point, mostly because the legacy system is too bloated and hard to maintain, and now it's waiting to be enabled again with the new API. Here's one example, 4.4.202 in 2019, a bit before it stopped:
https://lore.kernel.org/stable/5dce97f3.1c69fb81.6633c.685c@mx.google.com/
* Automation
And also obviously, doing this by hand isn't really practical. It's OK for a maintainer looking just at a small amount of results, but for KernelCI it would take maybe 2h per stable release candidate for a dedicated person to look at all the regressions etc. So discussing the format and type of content is more relevant at this stage I think, while the automated data harvesting part gets implemented in the background. And of course, we need the new API in production for this to be actually enabled - so still a few months away from now.
I've mentioned before the concept of finding "2nd derivatives" in the test results: basically, the first delta gives you all the regressions, and then you do a delta of the regressions to find the new ones. Maintainer trees would typically be comparing against mainline or, say, the -rc2 tag where they based their branch. In the case of stable, it would be between the stable-rc branch being tested and the base stable branch with the last tagged release.
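As a rough illustration of that "2nd derivative" idea, here is a minimal Python sketch; the test names and helper functions are hypothetical, not KernelCI's actual code:

```python
# Hypothetical sketch of the "second derivative" idea, not KernelCI's
# actual implementation. Results are reduced to sets of test identifiers
# that passed on each revision.

def regressions(base_passing: set, candidate_passing: set) -> set:
    """First delta: tests passing on the base revision (e.g. the last
    tagged stable release) but not on the candidate (stable-rc)."""
    return base_passing - candidate_passing

def new_regressions(known_regressions: set, current_regressions: set) -> set:
    """Second delta: regressions not already seen on earlier candidates,
    i.e. the ones worth reporting against this release candidate."""
    return current_regressions - known_regressions

# Example with made-up test names:
base = {"boot/rk3399-gru-kevin", "kselftest/net", "ltp/syscalls"}
rc = {"boot/rk3399-gru-kevin", "ltp/syscalls"}
known = set()  # regressions already reported on earlier candidates

current = regressions(base, rc)
print(new_regressions(known, current))  # {'kselftest/net'}
```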
But hey, I'm not a stable maintainer :) This is merely a summary of what I recall from the past few years of discussions and what I believe to be the current consensus on what people wanted to do next.
One last thing, I see there's a change in KernelCI now to actually stop sending the current (suboptimal) automated reports to the stable mailing list:
https://github.com/kernelci/kernelci-jenkins/pull/136
Is this actually what people here want? I would argue that we need the new reports first before deliberately stopping the old ones. Maybe I missed something; it just felt a bit arbitrary. Some folks might actually be reading these emails; if we wanted to stop them, we should probably first send a warning about when they'll stop, etc. Anyway, I'll go back under my rock for now :)
Cheers, Guillaume
On Mon, Dec 11, 2023 at 11:14:03AM +0100, Guillaume Tucker wrote:
What tests take longer than 48 hours?
Yes, that is going to be required for this to be useful.
If these reports are currently for me, I'm just deleting them as they provide no value anymore. So yes, let's stop this until we can get something that actually works for us please.
thanks,
greg k-h
On 11/12/2023 14:07, Greg KH wrote:
Well, I'm not sure what you're actually asking here. Strictly speaking, some benchmarks and fuzzing can run for longer than 48h.
What I meant is that testing is always open-ended; we could run tests literally forever on every kernel revision if we wanted to. For maintainer trees, it's really useful to have a short feedback loop and get useful results within, say, 1h. For linux-next and mainline, maybe more testing can be done and results could take up to 4h to arrive. Then for stable releases (not stable-rc), as they happen basically once a week and are adopted as a base revision by a large group of users, it would make sense to have a bigger "testing budget" and allow up to maybe 48h of testing effort. As to how to make best use of this time, there are various ways to look at it.
I would suggest first running the tests that aren't usually run, such as some less common fstests combinations as well as some LTP and kselftests suites that take more than 30 min to complete. Also, if there are any reproducers for the fixes that have been applied to the stable branch, then they could be run as true regression testing to confirm those issues don't come back. Then some additional benchmarks and tests that are known to "fail" occasionally could also be run to gather more stats. This could potentially show trends in case of, say, performance deviation over several months on LTS, with finer granularity.
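As a rough illustration of the budget and prioritization described above, here is a minimal Python sketch; the tree names, hour values and suite lists are assumptions for illustration, not an agreed plan or an existing KernelCI configuration:

```python
# Hypothetical sketch of a per-tree "testing budget" policy; all names
# and numbers below are illustrative assumptions from this discussion.

TESTING_BUDGET_HOURS = {
    "maintainer-tree": 1,    # short feedback loop for maintainer branches
    "linux-next": 4,
    "mainline": 4,
    "stable-release": 48,    # weekly releases, widely used as a base
}

# Suggested ordering for the extended stable-release run: suites not
# normally run on every revision first, then reproducers for fixes
# already in the branch, then flaky benchmarks for trend statistics.
STABLE_EXTENDED_PLAN = [
    "fstests (less common configurations)",
    "ltp (long-running suites)",
    "kselftests (suites taking more than 30 min)",
    "reproducers for fixes applied to the stable branch",
    "benchmarks and known-flaky tests (trend statistics)",
]

def budget_for(tree: str) -> int:
    """Return the testing budget in hours for a given tree type."""
    return TESTING_BUDGET_HOURS.get(tree, 1)

print(budget_for("stable-release"))  # 48
```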
OK thanks for confirming.
Right, I wasn't sure if anyone else was interested in them. It sounds like Sasha doesn't really need them either, although he wrote on IRC that he wouldn't disable them until something better was in place. I would suggest sending at least an email to the stable list proposing to stop these emails on a particular date, ideally with some kind of plan for when new emails would be available to replace them. But if really nobody other than you needs the current emails, then effectively nobody needs them and we can stop now, of course.
Cheers, Guillaume