On Tue, Nov 21, 2023 at 01:04:50AM -0500, Theodore Ts'o wrote:
> On Mon, Nov 20, 2023 at 10:27:33PM +0000, Mark Brown wrote:
> > This is the sort of thing that kcidb (which Nikolai works on) is good at ingesting; I actually do push all my CI's test results into there already:
> > (the dashboard is down currently.) A few other projects including the current KernelCI and RedHat's CKI push their data in there too, and I'm sure Nikolai would be delighted to get more people pushing data in. The goal is to merge this with the main KernelCI infrastructure; it's currently separate while people figure out the whole big data thing.
> Looking at kernelci, it appears that it's using a JSON submission format. Are there conversion scripts that take a KTAP test report, or a JUnit XML test report?
Probably - I know I've got something for KUnit which is annoyingly difficult to publish for non-technical reasons, and which is a little broken (things weren't visible in the dashboard when it was up, which might mean some missing field or a date set wrong). My KTAP stuff is all mediated through LAVA, which can push results into a webhook directly, so it's really easy to just add a notifier to your job and stream the results in directly (I intend to push that into kcidb in my copious free time so other people can use my code there). It's relatively straightforward to write these things.
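To make the "relatively straightforward" claim concrete, here is a minimal sketch of what a KTAP-to-JSON converter could look like. The record fields (`id`, `build_id`, `origin`, `path`, `status`) are illustrative of the kind of schema kcidb uses, but the authoritative schema lives in the kcidb documentation; the origin and build IDs here are made-up placeholders.

```python
import json
import re

# Matches KTAP result lines such as:
#   ok 1 test_name
#   not ok 2 - test_name
#   ok 3 test_name # SKIP reason
RESULT_RE = re.compile(r"^(ok|not ok) (\d+)(?: -)? ([^#]+?)(?:\s*#\s*(SKIP\b.*))?$")

def ktap_to_tests(ktap_text, origin="example", build_id="example:build-1"):
    """Turn KTAP result lines into a list of kcidb-style test records.

    The field names are illustrative, not the authoritative kcidb schema.
    """
    tests = []
    for line in ktap_text.splitlines():
        m = RESULT_RE.match(line.strip())
        if not m:
            continue  # skip version lines, test plans, diagnostics
        result, num, name, directive = m.groups()
        if directive:  # a "# SKIP" directive overrides the ok/not ok result
            status = "SKIP"
        elif result == "ok":
            status = "PASS"
        else:
            status = "FAIL"
        tests.append({
            "id": f"{origin}:{build_id}:{num}",
            "build_id": build_id,
            "origin": origin,
            "path": name.strip(),
            "status": status,
        })
    return tests

if __name__ == "__main__":
    sample = ("KTAP version 1\n1..3\n"
              "ok 1 test_a\n"
              "not ok 2 test_b\n"
              "ok 3 test_c # SKIP not supported\n")
    print(json.dumps(ktap_to_tests(sample), indent=2))
```

A real converter would also need to carry checkout and build metadata alongside the per-test records, but the per-test mapping itself is about this simple.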
> > The KernelCI LF project is funding kcidb with precisely this goal for the reasons you outline; the data collection part seems to be relatively mature at this point, but AIUI there's a bunch of open questions on the analysis and usage side, partly due to needing to find people to work on it.
> Indeed, this is the super hard part. Having looked at the kernelci web site, its dashboard isn't particularly helpful for what I'm trying to do: analyzing a single test run. What I need is something more like this:
>
> ext4/4k: 554 tests, 48 skipped, 4301 seconds
> ext4/1k: 550 tests, 3 failures, 51 skipped, 6739 seconds
> Failures: generic/051 generic/475 generic/476
That should be achievable with the KernelCI stuff (which is different to kcidb at present) - you're a lot of the way there with how kselftest is currently reported, modulo the list of failures, which currently requires you to drill down to a second level page.
> ... which summarizes 6,592 tests in 20 lines, and for any test that has failed, we rerun it four more times, so we can get an indication of whether a test is a hard failure or a flaky failure.
>
> (I don't need to see all of the tests that pass; it's the test failures or the test flakes that are significant.)
The listing of tests does get a bit more complex when you mix in running on different platforms.
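The per-configuration summary described above is easy to produce once the results are in one place. As a sketch, assuming results have been flattened into `(config, test_name, status, seconds)` tuples (a hypothetical data model, not anything kcidb actually exposes):

```python
from collections import defaultdict

def summarize(results):
    """Produce a per-configuration summary in the style quoted above.

    results: iterable of (config, test_name, status, seconds) tuples,
    where status is one of "PASS", "FAIL", "SKIP".  This input shape is
    a hypothetical flattening of whatever the results store holds.
    """
    configs = defaultdict(lambda: {"tests": 0, "failures": [],
                                   "skipped": 0, "seconds": 0})
    for config, test, status, seconds in results:
        c = configs[config]
        c["tests"] += 1
        c["seconds"] += seconds
        if status == "FAIL":
            c["failures"].append(test)
        elif status == "SKIP":
            c["skipped"] += 1

    lines = []
    for config, c in sorted(configs.items()):
        parts = [f"{c['tests']} tests"]
        if c["failures"]:
            parts.append(f"{len(c['failures'])} failures")
        if c["skipped"]:
            parts.append(f"{c['skipped']} skipped")
        parts.append(f"{c['seconds']} seconds")
        lines.append(f"{config}: " + ", ".join(parts))

    # One combined failure list at the end, as in the example format.
    all_failures = sorted({t for c in configs.values() for t in c["failures"]})
    if all_failures:
        lines.append("Failures: " + " ".join(all_failures))
    return "\n".join(lines)
```

Extending this to multiple platforms would mostly mean adding the platform name to the grouping key, which is where the listing starts getting more complex.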
> And then when comparing multiple test runs, that's when I'm interested in seeing which tests may have regressed, or which tests may have been fixed, between version A and version B.
Yup, that comparison stuff is useful. The landing pages for individual tests do have something there but not really anything higher level:
https://linux.kernelci.org/test/case/id/655b0fa18dc4b7e0c47e4a88/
> And right now, kernelci doesn't have any of that. So it might be hard to convince overloaded maintainers to upload test runs to kernelci when they don't see any immediate benefit from uploading to the kernelci db.
Note that kcidb and KernelCI are currently different databases - with the dashboard being down, kcidb has no UI at all. Personally I'm pushing my data in on the basis that it costs me basically nothing to do so, given that I'm already running the tests.
> There is a bit of a chicken-and-egg problem, since without the test results getting uploaded, it's hard to get the analysis functionality implemented, and without the analysis features, it's hard to get developers to upload the data.
I think if we get tooling in place so that people can just run a script, add a flag to their tools or whatever, to ingest results from the standard testsuites, the barrier to reporting becomes sufficiently low that it's more of a "why not?" type thing.

There are also other things we can do beyond big data analysis, some of which are a lot easier - for example, checking other people's CI results for your branch before sending or accepting a pull request (if you've got a one stop shop to pull data from, that's a lot easier than if you have to go round a bunch of places).
> That being said, a number of file system developers probably have several years' worth of test results that we could give you. I have hundreds of junit.xml files, with information about the kernel version, the version of xfstests, etc., that was used. I'm happy to make samples of it available for anyone who is interested.
Right, I've likewise got a pile of results I can reprocess at will.
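For anyone wanting to reprocess archived junit.xml files like the ones mentioned above, a first step might look like the sketch below: it pulls per-test pass/fail/skip status and timing back out of the standard JUnit XML layout (`<testsuite>` elements containing `<testcase>` children with `<failure>`, `<error>`, or `<skipped>` markers). The sample XML in it is invented for illustration.

```python
import xml.etree.ElementTree as ET

def parse_junit(source):
    """Extract (name, status, seconds) tuples from a junit.xml file.

    source: a path or file-like object.  A junit.xml file may have a
    <testsuites> wrapper element or a bare <testsuite> at the root.
    """
    root = ET.parse(source).getroot()
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    results = []
    for suite in suites:
        for case in suite.findall("testcase"):
            if (case.find("failure") is not None
                    or case.find("error") is not None):
                status = "FAIL"
            elif case.find("skipped") is not None:
                status = "SKIP"
            else:
                status = "PASS"
            results.append((case.get("name"), status,
                            float(case.get("time", 0))))
    return results
```

Mapping these tuples onto kcidb submission records, together with the kernel and xfstests version metadata stored alongside each file, would be the remaining (and more site-specific) half of the job.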