Hi!
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
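For the exclusion side I'm assuming there's a dedicated Kconfig switch for these tests that we could simply force off instead of maintaining the ignore list - I haven't checked the exact symbol name, so this is only a guess at what that would look like:

./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64 \
	--kconfig_add CONFIG_FW_CS_DSP_KUNIT_TEST=n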
On Wed, Apr 02, 2025 at 10:36:55AM -0700, Jakub Kicinski wrote:
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
You've not said what tree you're testing there but everything is fine in -next AFAICT, and there is one fix for cs_dsp on its way to Linus at the minute specifically for KUnit.
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
This is the first report I've seen from you...
On Wed, 2 Apr 2025 18:55:29 +0100 Mark Brown wrote:
On Wed, Apr 02, 2025 at 10:36:55AM -0700, Jakub Kicinski wrote:
The Cirrus tests keep failing for me when run on x86 ./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64 https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
You've not said what tree you're testing there but everything is fine in -next AFAICT, and there is one fix for cs_dsp on its way to Linus at the minute specifically for KUnit.
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
This is the first report I've seen from you...
More of a test balloon than a real report, as you alluded to. I was wondering if it's just me, and from your response it seems to be just me.
I did more digging and with newer compilers I don't see the issue either. So I did:
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64 \
	--kconfig_add CONFIG_INIT_STACK_ALL_ZERO=n
And the problem reproduces on recent compilers, too. Which leads me to think that the tests are broken but stack init covers it up..? Strangely, though,
make W=1 LLVM=1 ...
doesn't report any uninitialized variable warnings.
On Wed, Apr 02, 2025 at 03:56:44PM -0700, Jakub Kicinski wrote:
On Wed, 2 Apr 2025 18:55:29 +0100 Mark Brown wrote:
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
This is the first report I've seen from you...
More of a test balloon than a real report, as you alluded to. I was wondering if it's just me, and from your response it seems to be just me.
There were some reports of things but not ongoing ones - that's where the fix that was queued came from.
And the problem reproduces on recent compilers, too. Which leads me to think that the tests are broken but stack init covers it up..? Strangely, though,
make W=1 LLVM=1 ...
doesn't report any uninitialized variable warnings.
I think there's parameter passing by pointer which manages to confuse matters.
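Roughly this shape, I think - a completely made-up illustration rather than the actual cs_dsp code, but it shows the pattern: the struct is only ever written through a pointer in a helper, so the caller-side analysis typically can't see that the error path leaves the field untouched, W=1 stays quiet, and the stack zeroing just papers over the stale value:

#include <stddef.h>

struct blob {
	const void *data;
	size_t len;
};

/* The error path returns without ever writing b->len. */
static int fill(struct blob *b, const void *src, size_t n)
{
	if (!src)
		return -1;
	b->data = src;
	b->len = n;
	return 0;
}

size_t blob_size(const void *src, size_t n)
{
	struct blob b;		/* uninitialised stack memory */

	fill(&b, src, n);	/* error return ignored */
	return b.len;		/* stale stack data if src was NULL,
				 * but a harmless 0 when the compiler
				 * zero-initialises the stack */
}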
On Thu, 3 Apr 2025 at 01:55, 'Mark Brown' via KUnit Development <kunit-dev@googlegroups.com> wrote:
On Wed, Apr 02, 2025 at 10:36:55AM -0700, Jakub Kicinski wrote:
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
You've not said what tree you're testing there but everything is fine in -next AFAICT, and there is one fix for cs_dsp on its way to Linus at the minute specifically for KUnit.
FWIW, I can reproduce the issue in the current torvalds/master, but not in linux-next, so I assume this[1] fixed it. (Which looks at a glance like it's fixing an uninitialised variable, so that fits.)
That being said, there are a _lot_ of Cirrus FW tests (over 5000 by my count when all of the parameterised ones are taken into account), so this can still trigger the default 5-minute kunit.py timeout. And because it's split up over a lot of smaller suites, it doesn't really make sense to mark all of those tests/suites as slow (though some of them might merit it). If that's causing issues for people, maybe that could be a good reason to remove them from KUNIT_ALL_TESTS, though personally, I suspect the better answer might be to move it behind a config option which can then be excluded from the default .kunitconfig if necessary, as in [2]. (Or just increase the default timeout.)
Cheers, -- David
[1]: https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/comm... [2]: https://lore.kernel.org/linux-kselftest/20250319230539.140869-1-npache@redha...
On Thu, Apr 03, 2025 at 02:19:40PM +0800, David Gow wrote:
That being said, there are a _lot_ of Cirrus FW tests (over 5000 by my count when all of the parameterised ones are taken into account), so this can still trigger the default 5-minute kunit.py timeout. And because it's split up over a lot of smaller suites, it doesn't really make sense to mark all of those tests/suites as slow (though some of them might merit it). If that's causing issues for people, maybe that could be a good reason to remove them from KUNIT_ALL_TESTS, though personally, I suspect the better answer might be to move it behind a config option which can then be excluded from the default .kunitconfig if necessary, as in [2]. (Or just increase the default timeout.)
I've not heard anyone mention hitting the timeouts, though now I run interactively I do see that the Cirrus stuff is a good proportion of the runtime for --alltests. We could potentially add a config option, though it'd need to end up in the config file so it gets turned on and we'd still run into any timeout issues. There's a tension with people expecting --alltests to actually run all the tests as well...
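(Not that the timeout seems to be biting anyone yet, but for reference kunit.py already takes a --timeout in seconds if the five minute default ever does become the problem - assuming I'm remembering the option name right, something like

./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64 --timeout=1800

would allow half an hour.)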
On 3/4/25 14:26, Mark Brown wrote:
On Thu, Apr 03, 2025 at 02:19:40PM +0800, David Gow wrote:
That being said, there are a _lot_ of Cirrus FW tests (over 5000 by my count when all of the parameterised ones are taken into account), so this can still trigger the default 5-minute kunit.py timeout. And because it's split up over a lot of smaller suites, it doesn't really make sense to mark all of those tests/suites as slow (though some of them might merit it). If that's causing issues for people, maybe that could be a good reason to remove them from KUNIT_ALL_TESTS, though personally, I suspect the better answer might be to move it behind a config option which can then be excluded from the default .kunitconfig if necessary, as in [2]. (Or just increase the default timeout.)
I've not heard anyone mention hitting the timeouts, though now I run interactively I do see that the Cirrus stuff is a good proportion of the runtime for --alltests. We could potentially add a config option, though it'd need to end up in the config file so it gets turned on and we'd still run into any timeout issues. There's a tension with people expecting --alltests to actually run all the tests as well...
I don't want to get into the situation where nobody outside of Cirrus is running the tests. One of the main points of converting our tests to kunit was the hope that more people would run them on more configurations.
Ideally if they have any failures they would report those back with enough information for us to investigate. Disabling tests that fail and ignoring them isn't helpful.
On Thu, Apr 03, 2025 at 05:57:51PM +0100, Richard Fitzgerald wrote:
On 3/4/25 14:26, Mark Brown wrote:
I've not heard anyone mention hitting the timeouts, though now I run interactively I do see that the Cirrus stuff is a good proportion of the runtime for --alltests. We could potentially add a config option, though it'd need to end up in the config file so it gets turned on and we'd still run into any timeout issues. There's a tension with people expecting --alltests to actually run all the tests as well...
I don't want to get into the situation where nobody outside of Cirrus is running the tests. One of the main points of converting our tests to kunit was the hope that more people would run them on more configurations.
Yes, me neither - I do actually run them as part of my automated testing.
On Fri, 4 Apr 2025 at 01:48, Mark Brown <broonie@kernel.org> wrote:
On Thu, Apr 03, 2025 at 05:57:51PM +0100, Richard Fitzgerald wrote:
On 3/4/25 14:26, Mark Brown wrote:
I've not heard anyone mention hitting the timeouts, though now I run interactively I do see that the Cirrus stuff is a good proportion of the runtime for --alltests. We could potentially add a config option, though it'd need to end up in the config file so it gets turned on and we'd still run into any timeout issues. There's a tension with people expecting --alltests to actually run all the tests as well...
I don't want to get into the situation where nobody outside of Cirrus is running the tests. One of the main points of converting our tests to kunit was the hope that more people would run them on more configurations.
Yeah, I definitely agree that this should be enabled by --alltests regardless: we possibly need to adjust the default timeout for --alltests compared to without --alltests.
Ultimately, this is about balancing a few different use cases:
- I want to check nothing relevant to my config is broken (enable CONFIG_KUNIT_ALL_TESTS on top of the existing config)
- I want to test everything I can in my tree / CI system / etc (run with --alltests, possibly several times across different architectures, or with KASAN, etc)
- I'm developing something specific, and want to run all of the tests related to that (use a subsystem/driver/feature specific .kunitconfig)
- I want to quickly check if my source tree builds, boots, and is vaguely sane (run the "default" set of tests)
- I want to run tests on my ancient/slow machine/emulator without waiting all day (run whichever config makes sense, but filtering out "slow" tests).
Now, my gut feeling is that we definitely want to run all of these tests for the --alltests case, and also for the "developing the Cirrus Logic driver" case, but that only a subset of them need to run in the "does my tree look vaguely sane" case, and there's limited (but not zero) use in running them in the "I'm testing my config which doesn't have any of the Cirrus Logic drivers enabled" case.
If that's the case, I think the ideal solution is to:
- Make sure these tests don't automatically enable themselves if no driver which depends on the firmware loader is enabled. (e.g. make it depend on the library, rather than selecting it. If there's an extra "dummy" option which force-enables it, that's fine, but that shouldn't be enabled if KUNIT_ALL_TESTS is)
- As a result, make them more explicitly enabled with --alltests, and probably disabled -- or only run a subset -- in the default. Currently this is mostly decided by whether CONFIG_REGMAP is enabled, having a specific item to use for these tests would be less surprising.
- If any of the individual tests are particularly slow (more than a ~second or so on fast hardware), mark them as slow. Most people still enable slow tests, so they'll still get run most of the time, but they can be skipped on old m68k machines, or gated behind the quick tests passing in CI systems, etc.
- Bump up the default timeout (at least for --alltests), as 5 minutes clearly isn't enough for, e.g., qemu-based emulation anymore. (I'm happy to do this: I've got some timeout-related patches I'm working on anyway.)
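For the first two points that would be roughly the usual KUnit Kconfig pattern, just with "depends on" rather than "select" - symbol names written from memory here, so treat this as a sketch rather than a tested diff:

config FW_CS_DSP_KUNIT_TEST
	tristate "KUnit tests for the cs_dsp firmware library" if !KUNIT_ALL_TESTS
	depends on KUNIT
	depends on FW_CS_DSP
	default KUNIT_ALL_TESTS

i.e. KUNIT_ALL_TESTS only turns the tests on in configs which already have cs_dsp enabled, rather than dragging the library in by itself.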
Does that seem like a sensible approach?
Cheers, -- David
On 4/4/25 07:53, David Gow wrote:
On Fri, 4 Apr 2025 at 01:48, Mark Brown <broonie@kernel.org> wrote:
On Thu, Apr 03, 2025 at 05:57:51PM +0100, Richard Fitzgerald wrote:
On 3/4/25 14:26, Mark Brown wrote:
I've not heard anyone mention hitting the timeouts, though now I run interactively I do see that the Cirrus stuff is a good proportion of the runtime for --alltests. We could potentially add a config option, though it'd need to end up in the config file so it gets turned on and we'd still run into any timeout issues. There's a tension with people expecting --alltests to actually run all the tests as well...
I don't want to get into the situation where nobody outside of Cirrus is running the tests. One of the main points of converting our tests to kunit was the hope that more people would run them on more configurations.
Yeah, I definitely agree that this should be enabled by --alltests regardless: we possibly need to adjust the default timeout for --alltests compared to without --alltests.
Ultimately, this is about balancing a few different use cases:
- I want to check nothing relevant to my config is broken (enable CONFIG_KUNIT_ALL_TESTS on top of the existing config)
- I want to test everything I can in my tree / CI system / etc (run with --alltests, possibly several times across different architectures, or with KASAN, etc)
- I'm developing something specific, and want to run all of the tests related to that (use a subsystem/driver/feature specific .kunitconfig)
- I want to quickly check if my source tree builds, boots, and is vaguely sane (run the "default" set of tests)
I don't mind reorganizing the cs_dsp test into smoke and thorough.
Although the feedback I've seen (such as it is) is not about the time it takes but the fact that it flagged an error. A reaction of "this test fails - disable it" isn't helped by splitting things up if it's the quick smoke test which failed.
It's reasonable that people are annoyed if the test fails because of a bug in the test. It seems strange to disable the test because it fails on a real bug in the code it is testing (which it did).
- I want to run tests on my ancient/slow machine/emulator without waiting all day (run whichever config makes sense, but filtering out "slow" tests).
Now, my gut feeling is that we definitely want to run all of these tests for the --alltests case, and also for the "developing the Cirrus Logic driver" case, but that only a subset of them need to run in the "does my tree look vaguely sane" case, and there's limited (but not zero) use in running them in the "I'm testing my config which doesn't have any of the Cirrus Logic drivers enabled" case.
If that's the case, I think the ideal solution is to:
- Make sure these tests don't automatically enable themselves if no driver which depends on the firmware loader is enabled. (e.g. make it depend on the library, rather than selecting it. If there's an extra "dummy" option which force-enables it, that's fine, but that shouldn't be enabled if KUNIT_ALL_TESTS is)
I thought someone had already patched that but apparently not. I'm not the only person who thought "ALL_TESTS" meant "all tests", not "some tests". Given that ALL_TESTS is badly named and really means "All tests for modules that are already selected" then it should only build if cs_dsp is specifically selected.
However the users of cs_dsp can have a lot of dependencies that are entirely irrelevant to cs_dsp and so cs_dsp *could* be tested without those. (This also applies to other Cirrus KUnit tests.) I started on some patches to make the Kconfig for the tested libraries visible if CONFIG_KUNIT is enabled so that they can be selected for testing without the need to enable a large bunch of other frameworks just to be able to enable a user of the library. They are somewhere in my to-do pile.
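The rough idea is just a conditional prompt, something like this (untested sketch, not a real patch - the symbol stays select-only as today unless KUNIT is enabled):

config FW_CS_DSP
	tristate "Cirrus Logic DSP firmware parsing library" if KUNIT
	help
	  Library for parsing and downloading firmware to the DSP cores
	  on Cirrus Logic devices. Normally selected by the drivers that
	  need it; the prompt is only exposed so the KUnit tests can be
	  built without pulling in a full driver stack.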
Related to this I created a UM kunit.py config specifically to run all Cirrus KUnit tests with minimum other clutter. But does that mean nobody outside Cirrus will run it? Or is there a process somewhere that runs kunit.py on all configs in tools/testing/kunit/configs ?
- As a result, make them more explicitly enabled with --alltests, and probably disabled -- or only run a subset -- in the default. Currently this is mostly decided by whether CONFIG_REGMAP is enabled, having a specific item to use for these tests would be less surprising.
- If any of the individual tests are particularly slow (more than a ~second or so on fast hardware), mark them as slow. Most people still enable slow tests, so they'll still get run most of the time, but they can be skipped on old m68k machines, or gated behind the quick tests passing in CI systems, etc.
I don't expect any individual test cases are slow (I'll check through). One of the advantages of parameterizing them was that it avoids one single test case taking a long time looping through a lot of testing. They should be a quick write-read-check-exit. "Slowness" is tricky to judge. The slowest hardware I have is an ARM7 clocked at a few hundred MHz, or QEMU in full emulation.
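For reference the cases are basically the stock KUNIT_ARRAY_PARAM pattern - this is only a toy sketch rather than the real cs_dsp test code, but it's the shape of it, with each parameter being one quick write-read-check:

#include <kunit/test.h>

struct wr_param {
	unsigned int reg;
	unsigned int val;
};

static const struct wr_param wr_params[] = {
	{ .reg = 0x10, .val = 0x1234 },
	{ .reg = 0x14, .val = 0xabcd },
};

static void wr_param_desc(const struct wr_param *p, char *desc)
{
	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "reg 0x%x", p->reg);
}

/* Generates wr_gen_params(), which walks wr_params[] one entry per case. */
KUNIT_ARRAY_PARAM(wr, wr_params, wr_param_desc);

static unsigned int fake_regs[0x20];

static void wr_test(struct kunit *test)
{
	const struct wr_param *p = test->param_value;

	fake_regs[p->reg] = p->val;				/* write */
	KUNIT_EXPECT_EQ(test, fake_regs[p->reg], p->val);	/* read back and check */
}

static struct kunit_case wr_test_cases[] = {
	KUNIT_CASE_PARAM(wr_test, wr_gen_params),
	{}
};

static struct kunit_suite wr_test_suite = {
	.name = "wr_param_example",
	.test_cases = wr_test_cases,
};
kunit_test_suite(wr_test_suite);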
- Bump up the default timeout (at least for --alltests), as 5 minutes clearly isn't enough for, e.g., qemu-based emulation anymore. (I'm happy to do this: I've got some timeout-related patches I'm working on anyway.)
Is this a default for each test case or the entire suite?
Does that seem like a sensible approach?
Cheers, -- David
On Fri, Apr 04, 2025 at 12:10:13PM +0100, Richard Fitzgerald wrote:
On 4/4/25 07:53, David Gow wrote:
- Make sure these tests don't automatically enable themselves if no driver which depends on the firmware loader is enabled. (e.g. make it depend on the library, rather than selecting it. If there's an extra "dummy" option which force-enables it, that's fine, but that shouldn't be enabled if KUNIT_ALL_TESTS is)
I thought someone had already patched that but apparently not. I'm not the only person who thought "ALL_TESTS" meant "all tests", not "some tests". Given that ALL_TESTS is badly named and really means "All tests for modules that are already selected" then it should only build if cs_dsp is specifically selected.
The change didn't enable the drivers in the config fragment for all tests, and TBH I suspect it's a better idea to have a specific config option for the tests that depends on KUnit or something, since the drivers will pull in subsystems which gets to be a mess and possibly fragile.
However the users of cs_dsp can have a lot of dependencies that are entirely irrelevant to cs_dsp and so cs_dsp *could* be tested without those. (This also applies to other Cirrus KUnit tests.) I started on some patches to make the Kconfig for the tested libraries visible if CONFIG_KUNIT is enabled so that they can be selected for testing without the need to enable a large bunch of other frameworks just to be able to enable a user of the library. They are somewhere in my to-do pile.
Yeah, I think that's the most sensible thing for coverage.
Related to this I created a UM kunit.py config specifically to run all Cirrus KUnit tests with minimum other clutter. But does that mean nobody outside Cirrus will run it? Or is there a process somewhere that runs kunit.py on all configs in tools/testing/kunit/configs ?
I'm not aware of anything. I'm running --alltests (on virtual hardware rather than um due to the environment where things get run).
- As a result, make them more explicitly enabled with --alltests, and probably disabled -- or only run a subset -- in the default. Currently this is mostly decided by whether CONFIG_REGMAP is enabled, having a specific item to use for these tests would be less surprising.
- If any of the individual tests are particularly slow (more than a ~second or so on fast hardware), mark them as slow. Most people still enable slow tests, so they'll still get run most of the time, but they can be skipped on old m68k machines, or gated behind the quick tests passing in CI systems, etc.
I don't expect any individual test cases are slow (I'll check through). One of the advantages of parameterizing them was that it avoids one single test case taking a long time looping through a lot of testing. They should be a quick write-read-check-exit. "Slowness" is tricky to judge. The slowest hardware I have is an ARM7 clocked at a few hundred MHz, or QEMU in full emulation.
On the scale of KUnit tests there's quite a few that are noticeable, it's just a second or two per test, but even in qemu the tests are usually ticking by faster than can be read.
On 2/4/25 18:36, Jakub Kicinski wrote:
Hi!
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
Please report the errors so we can investigate. "tests keep failing" isn't telling me anything I need to know to be able to investigate what the problem is. They all pass for me on x86_64. I ran them a few days ago when I submitted a fix for a bug that the test had found.
Have you got the latest bugfixes?
https://lore.kernel.org/all/20250211-cs_dsp-kunit-strings-v1-1-d9bc2035d154@linutronix.de/
https://lore.kernel.org/all/174291786934.56229.8841259212687977896.b4-ty@ker...
If you haven't got the bugfixes, then it's expected that the test fails, because it's telling you there is a bug.
If you just add failures to an ignore list instead of reporting them, you might just be helping to ensure real bugs in production code don't get fixed.
On 02/04/2025 6:36 pm, Jakub Kicinski wrote:
Hi!
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
It seems like new cases continue to appear and we have to keep adding them to the local ignored list. Is it possible to get these fixed or can we exclude the cirrus tests from KUNIT_ALL_TESTS?
This looks like the problem that was fixed by: https://lore.kernel.org/linux-sound/174291786934.56229.8841259212687977896.b...