On Mon, Nov 20, 2023 at 10:27:33PM +0000, Mark Brown wrote:
> This is the sort of thing that kcidb (which Nikolai works on) is good at ingesting, I actually do push all my CI's test results into there already:
> https://github.com/kernelci/kcidb/
> (the dashboard is down currently.) A few other projects including the current KernelCI and RedHat's CKI push their data in there too, I'm sure Nikolai would be delighted to get more people pushing data in. The goal is to merge this with the main KernelCI infrastructure, it's currently separate while people figure out the whole big data thing.
Looking at kernelci, it appears to use a JSON submission format. Are there conversion scripts that take a KTAP test report, or a JUnit XML test report?
> The KernelCI LF project is funding kcidb with precisely this goal for the reasons you outline, the data collection part seems to be relatively mature at this point but AIUI there's a bunch of open questions with the analysis and usage side, partly due to needing to find people to work on it.
Indeed, this is the super hard part. Having looked at the kernelci web site, its dashboard isn't particularly useful for what I'm trying to do. When analyzing a single test run, what I need is something more like this:
ext4/4k: 554 tests, 48 skipped, 4301 seconds
ext4/1k: 550 tests, 3 failures, 51 skipped, 6739 seconds
  Failures: generic/051 generic/475 generic/476
ext4/ext3: 546 tests, 138 skipped, 4239 seconds
ext4/encrypt: 532 tests, 3 failures, 159 skipped, 3218 seconds
  Failures: generic/681 generic/682 generic/691
ext4/nojournal: 549 tests, 3 failures, 118 skipped, 4477 seconds
  Failures: ext4/301 ext4/304 generic/455
ext4/ext3conv: 551 tests, 49 skipped, 4655 seconds
ext4/adv: 551 tests, 4 failures, 56 skipped, 4987 seconds
  Failures: generic/477 generic/506
  Flaky: generic/269: 40% (2/5)   generic/455: 40% (2/5)
ext4/dioread_nolock: 552 tests, 48 skipped, 4538 seconds
ext4/data_journal: 550 tests, 2 failures, 120 skipped, 4401 seconds
  Failures: generic/455 generic/484
ext4/bigalloc_4k: 526 tests, 53 skipped, 4537 seconds
ext4/bigalloc_1k: 526 tests, 61 skipped, 4847 seconds
ext4/dax: 541 tests, 1 failures, 152 skipped, 3069 seconds
  Flaky: generic/269: 60% (3/5)
Totals: 6592 tests, 1053 skipped, 72 failures, 0 errors, 50577s
... which summarizes 6,592 tests in 20 lines. Any test that fails is rerun four more times, so we can get an indication of whether it is a hard failure or a flaky failure.
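A minimal sketch of that hard-versus-flaky distinction, assuming a failing test ends up with five outcomes (the initial run plus four reruns) and that "hard" means it failed every time. The function name and the exact rule are assumptions for illustration, not the actual test runner's logic:

```python
def classify(outcomes):
    """Classify one test from its run outcomes.

    outcomes: list of booleans, True meaning that run failed.
    The first entry is the initial run, the rest are reruns.
    """
    failures = sum(outcomes)
    total = len(outcomes)
    if failures == 0:
        return "pass"
    if failures == total:
        # Assumption: failing on every run counts as a hard failure.
        return "hard failure"
    # Anything in between is reported as a failure rate.
    return f"flaky: {100 * failures // total}% ({failures}/{total})"


print(classify([True, False, True, False, False]))  # flaky: 40% (2/5)
print(classify([True, True, True, True, True]))     # hard failure
```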
(I don't need to see all of the tests that pass; it's the test failures or the test flakes that are significant.)
And then when comparing multiple test runs, that's when I'm interested in seeing which tests may have regressed, or which tests may have been fixed, when going from version A to version B.
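The comparison described here could be sketched roughly as follows, assuming each run has already been reduced to a mapping from test name to a status string. The `compare_runs` name, the status values, and the sample data are hypothetical, made up for illustration:

```python
def compare_runs(run_a, run_b):
    """Return (regressed, fixed) test names going from run A to run B.

    Each run maps a test name to 'pass', 'fail', or 'skip'.
    Only tests present in both runs are compared, so a test that
    was skipped or not run in one version is never reported as fixed.
    """
    common = run_a.keys() & run_b.keys()
    regressed = sorted(
        t for t in common if run_a[t] == "pass" and run_b[t] == "fail"
    )
    fixed = sorted(
        t for t in common if run_a[t] == "fail" and run_b[t] == "pass"
    )
    return regressed, fixed


# Illustrative data only; not taken from a real test run.
version_a = {"generic/051": "fail", "generic/455": "pass", "generic/475": "skip"}
version_b = {"generic/051": "pass", "generic/455": "fail", "generic/475": "pass"}
regressed, fixed = compare_runs(version_a, version_b)
print("Regressed:", " ".join(regressed))
print("Fixed:", " ".join(fixed))
```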
And right now, kernelci doesn't have any of that. So it might be hard to convince overloaded maintainers to upload test runs to kernelci when they don't see any immediate benefit from uploading to the kernelci db.
There is a bit of a chicken-and-egg problem, since without the test results getting uploaded, it's hard to get the analysis functionality implemented, and without the analysis features, it's hard to get developers to upload the data.
That being said, a number of file system developers probably have several years' worth of test results that we could probably give you. I have hundreds of junit.xml files, with information about the kernel version, the version of xfstests, etc., that was used. I'm happy to make samples of them available for anyone who is interested.
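For anyone picking up such samples, here is a rough sketch of turning one junit.xml file into a one-line summary in the style shown earlier. It assumes the common JUnit XML layout (`<testcase>` elements with `name` and `time` attributes and optional `<failure>` or `<skipped>` children); the real files may carry more detail, and `summarize_junit` is a hypothetical helper, not an existing tool:

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text, label):
    """Summarize JUnit XML text as '<label>: N tests, ...' plus failures."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nested <testsuite> elements are fine.
    cases = list(root.iter("testcase"))
    failed = [c for c in cases if c.find("failure") is not None]
    skipped = [c for c in cases if c.find("skipped") is not None]
    seconds = sum(float(c.get("time", 0)) for c in cases)

    parts = [f"{len(cases)} tests"]
    if failed:
        parts.append(f"{len(failed)} failures")
    if skipped:
        parts.append(f"{len(skipped)} skipped")
    parts.append(f"{round(seconds)} seconds")

    lines = [f"{label}: " + ", ".join(parts)]
    if failed:
        names = " ".join(c.get("name", "?") for c in failed)
        lines.append("  Failures: " + names)
    return "\n".join(lines)


# Tiny made-up sample, only to show the expected input shape.
sample = """<testsuite name="xfstests">
  <testcase name="generic/001" time="1.0"/>
  <testcase name="generic/051" time="2.0"><failure/></testcase>
  <testcase name="generic/475" time="0"><skipped/></testcase>
</testsuite>"""
print(summarize_junit(sample, "ext4/1k"))
```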
Cheers,
- Ted