Hello LKFT maintainers, CI operators,
First, I would like to say thank you to the people behind the LKFT project for validating stable kernels (and more), and for including some networking selftests in their test suites.
A lot of improvements around the networking kselftests have been done this year. At the last Netconf [1], we discussed how these tests were validated on stable kernels from CIs like the LKFT one, and we have some suggestions to improve the situation.
KSelftests from the same version
--------------------------------
According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out from this policy for the networking tests: it is hard to maintain given the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and supporting and maintaining them there looks like a lot of work.
Many networking tests are validating internal behaviour that is not exposed to userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how userspace interacts with the kernel. The kernel could expose capabilities, but that does not seem natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. But that doesn't always work, it increases the complexity, and often a "false positive" issue will only be noticed after a patch hits stable, causing a bunch of tests to be ignored.
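For illustration only, here is a minimal sketch in Python of such a workaround, assuming a test would skip itself when a given internal symbol is absent; the symbol name is made up, and real tests may prefer other heuristics:

    import sys

    KSFT_SKIP = 4  # exit code kselftest uses to report a skipped test

    def kernel_has_symbol(name):
        # /proc/kallsyms lines look like: "<address> <type> <name> [module]"
        try:
            with open("/proc/kallsyms") as f:
                for line in f:
                    fields = line.split()
                    if len(fields) >= 3 and fields[2] == name:
                        return True
        except OSError:
            pass
        return False

    # "tcp_some_internal_helper" is a hypothetical symbol, used only for this example.
    if not kernel_has_symbol("tcp_some_internal_helper"):
        print("SKIP: kernel does not have the internal behaviour this test checks")
        sys.exit(KSFT_SKIP)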
Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too.
Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
Skipped tests
-------------
It looks like many tests are skipped:
- Some have been in a skip file [3] for a while: maybe they can be removed from that list?
- Some are skipped because of missing tools: maybe these tools can be added? e.g. iputils, tshark, ipv6toolkit, etc.
- Some tests are in 'net', but in subdirectories, and hence not tested, e.g. forwarding, packetdrill, netfilter, tcp_ao. Could they be tested too? (See the example command after this list.)
How can we change this to increase the code coverage using existing tests?
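For reference, these subdirectories can usually be selected as separate kselftest targets; something like the following invocation might work (the exact target names vary between kernel versions, so this is only indicative):

    make -C tools/testing/selftests TARGETS="net net/forwarding net/packetdrill net/netfilter net/tcp_ao" run_tests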
KVM
---
It looks like different VMs are being used to execute the different tests. Do these VMs benefit from any acceleration like KVM? If not, some tests might fail because the environment is too slow.
The KSFT_MACHINE_SLOW=yes env var can be set to increase some tolerances and timeouts, or to skip some parts, but that might not be enough for some tests.
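As an illustration, a test honouring this variable could scale its tolerances along these lines (a rough Python sketch; the base timeout and the scaling factor are arbitrary values for the example):

    import os

    # KSFT_MACHINE_SLOW=yes is expected to be exported by the runner on slow setups.
    slow = os.environ.get("KSFT_MACHINE_SLOW", "").lower() in ("yes", "1", "true")
    timeout_s = 30 * (4 if slow else 1)  # widen the timeout on slow machines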
Notifications
-------------
In case of new regressions, who is being notified? Are the people listed in the MAINTAINERS file for the corresponding selftests being notified, or do they need to do the monitoring on their side?
Looking forward to improving the networking selftests results when validating stable kernels!
[1] https://netdev.bots.linux.dev/netconf/2024/
[2] https://docs.kernel.org/dev-tools/kselftest.html
[3] https://github.com/Linaro/test-definitions/blob/master/automated/linux/kself...
Cheers, Matt
On Fri, Nov 08, 2024 at 07:21:59PM +0100, Matthieu Baerts wrote:
KSelftests from the same version
According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out from this policy for the networking tests: it is hard to maintain given the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and supporting and maintaining them there looks like a lot of work.
Many networking tests are validating internal behaviour that is not exposed to userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how userspace interacts with the kernel. The kernel could expose capabilities, but that does not seem natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. But that doesn't always work, it increases the complexity, and often a "false positive" issue will only be noticed after a patch hits stable, causing a bunch of tests to be ignored.
Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too.
Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
These kinds of decisions are something that Greg and Shuah need to decide on.
You would still need some way to automatically detect that kselftest is running on an old kernel and disable the networking checks. Otherwise when random people on the internet try to run selftests they would run into issues.
regards, dan carpenter
Hi Dan,
Thank you for your reply!
On 13/11/2024 18:08, Dan Carpenter wrote:
On Fri, Nov 08, 2024 at 07:21:59PM +0100, Matthieu Baerts wrote:
KSelftests from the same version
According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out from this policy for the networking tests: it is hard to maintain given the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and supporting and maintaining them there looks like a lot of work.
Many networking tests are validating internal behaviour that is not exposed to userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how userspace interacts with the kernel. The kernel could expose capabilities, but that does not seem natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. But that doesn't always work, it increases the complexity, and often a "false positive" issue will only be noticed after a patch hits stable, causing a bunch of tests to be ignored.
Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too.
Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
These kinds of decisions are something that Greg and Shuah need to decide on.
Thank you, it makes sense.
You would still need some way to automatically detect that kselftest is running on an old kernel and disable the networking checks. Otherwise when random people on the internet try to run selftests they would run into issues.
Indeed. I guess we can always add a warning when the kernel and selftests versions are different. I suppose the selftests are built from one kernel version's sources, then executed on older kernels: we could then compare the kernel versions at build time and at run time, no?
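Something along these lines could work (a rough Python sketch; the "built-for" version would have to be recorded at build time, e.g. in a generated file whose name here is purely hypothetical):

    import os

    # Version the selftests were built from, recorded at build time (hypothetical file).
    built_for = open("selftests_build_version").read().strip()   # e.g. "6.12.1"
    running = os.uname().release                                  # e.g. "6.1.118"

    # Only compare the major.minor part: a mismatch there is what matters here.
    if built_for.split(".")[:2] != running.split(".")[:2]:
        print(f"WARNING: selftests built for v{built_for}, but running on v{running}")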
Regarding the other questions from my previous email -- skipped tests (e.g. I think Netfilter tests are no longer validated), KVM, notifications -- do you know who at Linaro could eventually look at them?
Cheers, Matt
On Fri, Nov 15, 2024 at 01:43:14PM +0100, Matthieu Baerts wrote:
Regarding the other questions from my previous email -- skipped tests (e.g. I think Netfilter tests are no longer validated), KVM, notifications -- do you know who at Linaro could eventually look at them?
Those tests were skipped because they led to hangs. We're going to look at them again to see if they're still an issue. And we're also going to try to enable the other tests you mentioned.
regards, dan carpenter
Hi Dan,
On 15/11/2024 14:07, Dan Carpenter wrote:
On Fri, Nov 15, 2024 at 01:43:14PM +0100, Matthieu Baerts wrote:
Regarding the other questions from my previous email -- skipped tests (e.g. I think Netfilter tests are no longer validated), KVM, notifications -- do you know who at Linaro could eventually look at them?
Those tests were skipped because they led to hangs. We're going to look at them again to see if they're still an issue. And we're also going to try to enable the other tests you mentioned.
Great, thank you!
For KVM (or similar), I guess it is not available, right? Some time-sensitive tests might be unstable in such an environment, and would need to be skipped.
For the notifications, do not hesitate to contact the corresponding maintainers, the last people who modified the problematic selftests and the netdev list. These "net" selftests are now better maintained, and they are regularly validated on the development branches:
https://netdev.bots.linux.dev/status.html
Cheers, Matt
On 11/8/24 11:21, Matthieu Baerts wrote:
Hello LKFT maintainers, CI operators,
First, I would like to say thank you to the people behind the LKFT project for validating stable kernels (and more), and for including some networking selftests in their test suites.
A lot of improvements around the networking kselftests have been done this year. At the last Netconf [1], we discussed how these tests were validated on stable kernels from CIs like the LKFT one, and we have some suggestions to improve the situation.
KSelftests from the same version
According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out from this policy for the networking tests: it is hard to maintain given the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and supporting and maintaining them there looks like a lot of work.
This is from the Documentation/dev-tools/kselftest.rst:
----
Kselftest from mainline can be run on older stable kernels. Running tests from mainline offers the best coverage. Several test rings run mainline kselftest suite on stable releases. The reason is that when a new test gets added to test existing code to regression test a bug, we should be able to run that test on an older kernel. Hence, it is important to keep code that can still test an older kernel and make sure it skips the test gracefully on newer releases.
----
As it states, running tests from mainline increases the coverage when new tests are added to regression test an existing kernel feature in a stable release.
It also says that when mainline tests are running on an older kernel, the test should detect missing features and report skips.
The above paragraph addresses test developers and users. I would say the policy regarding the test development will not change. We want to keep it the same, continuing to take measures to skip tests when a feature isn't supported in the kernel the tests are running on. This addresses not just a kernel and test revision mismatch, but also when a feature isn't enabled when kernel and test revisions match.
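For example, a test can typically detect a missing feature and report a skip along these lines (a rough Python sketch; the sysctl path below is made up for illustration):

    import os
    import sys

    KSFT_SKIP = 4  # exit code kselftest uses to report a skipped test

    # Hypothetical knob that only exists when the tested feature is present.
    if not os.path.exists("/proc/sys/net/ipv4/tcp_some_new_knob"):
        print("SKIP: feature not available on this kernel")
        sys.exit(KSFT_SKIP)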
This policy helps us find bugs in the tests failing when they should skip. If test rings move to a new policy, our ability to find bugs like this goes down.
As for users and test ring maintainers, they need to be aware of the reduced coverage if they revision-match kernel and tests. Revision matching example: 6.11.8 tests on 6.11.8 stable
Greg KH and other stable maintainers can weigh in on whether they would like LKFT to go from running mainline tests on stable releases to revision matching.
Many networking tests are validating internal behaviour that is not exposed to userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how userspace interacts with the kernel. The kernel could expose capabilities, but that does not seem natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. But that doesn't always work, it increases the complexity, and often a "false positive" issue will only be noticed after a patch hits stable, causing a bunch of tests to be ignored.
Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too.
The assumption that new tests can be backported is incorrect. It goes against the stable rules. We backport fixes and not new features and new tests.
Running kselftests from the same release will reduce coverage when a new test is added to regression test a 6.11 feature. This happens more often than not. Revision matching example: 6.11.8 tests on 6.11.8 stable
Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
It is expected that there will be more skipped tests as you run tests from mainline on stable releases. You will see more skips on older stables.
An alternative would be to revision match for older stables. New tests could be written for 6.12 that should be run on 6.11, but maybe not on 6.1, depending on the missed coverage.
Before changing the current approach, it is important to understand that running mainline tests on stable releases increases test coverage, that newer tests will not be backported, and that the coverage gap will increase over time.
thanks, -- Shuah
Hi Shuah, Greg,
Thank you for your reply!
On 13/11/2024 19:33, Shuah Khan wrote:
On 11/8/24 11:21, Matthieu Baerts wrote:
Hello LKFT maintainers, CI operators,
First, I would like to say thank you to the people behind the LKFT project for validating stable kernels (and more), and for including some networking selftests in their test suites.
A lot of improvements around the networking kselftests have been done this year. At the last Netconf [1], we discussed how these tests were validated on stable kernels from CIs like the LKFT one, and we have some suggestions to improve the situation.
KSelftests from the same version
According to the doc [2], kselftests should support all previous kernel versions. The LKFT CI is then using the kselftests from the last stable release to validate all stable versions. Even if there are good reasons to do that, we would like to ask for an opt-out from this policy for the networking tests: it is hard to maintain given the increased complexity, hard to validate on all stable kernels before applying patches, and hard to put in place in some situations. As a result, many tests are failing on older kernels, and supporting and maintaining them there looks like a lot of work.
This is from the Documentation/dev-tools/kselftest.rst:
Kselftest from mainline can be run on older stable kernels. Running tests from mainline offers the best coverage. Several test rings run mainline kselftest suite on stable releases. The reason is that when a new test gets added to test existing code to regression test a bug, we should be able to run that test on an older kernel. Hence, it is important to keep code that can still test an older kernel and make sure it skips the test gracefully on newer releases.
As it states, running tests from mainline increases the coverage when new tests are added to regression test an existing kernel feature in a stable release.
It also says that when mainline tests are running on an older kernel, the test should detect missing features and report skips.
The above paragraph addresses test developers and users. I would say the policy regarding the test development will not change. We want to keep it the same, continuing to take measures to skip tests when a feature isn't supported in the kernel the tests are running on. This addresses not just a kernel and test revision mismatch, but also when a feature isn't enabled when kernel and test revisions match.
This policy helps us find bugs in the tests failing when they should skip. If test rings move to a new policy, our ability to find bugs like this goes down.
As for users and test ring maintainers, they need to be aware of the reduced coverage if they revision-match kernel and tests. Revision matching example: 6.11.8 tests on 6.11.8 stable
Greg KH and other stable maintainers can weigh in on whether they would like LKFT to go from running mainline tests on stable releases to revision matching.
I appreciate these explanations. When we discussed this subject at Netconf, we looked at the documentation, and we understood the advantages of running newer kselftests on older kernels. But the issue we have is to "detect missing features and report skips": that's hard to maintain, because it increases the code complexity, and it is hard to validate before applying patches.
One of the reasons is that many networking selftests are validating internal behaviours that are not exposed to the userspace. That makes it hard to detect what behaviour to expect, and checking the kernel version doesn't seem to be the right thing to do here. Or does it mean that these essential tests should not validate the internal behaviours, e.g. checking that the packets sent on the wire are formatted correctly?
A compromise could be to mark the tests checking internal behaviours, and warn testers that they should be executed on the same version. Or even run all the tests twice: once with the kselftests from the same version, and once using the kselftests from the latest stable version. WDYT?
The main problem we saw when using kselftests from a newer version is that the code coverage of many 'net' tests might even decrease over time. In this subsystem, it is common to have "big" selftests running many subtests. When a new feature is added, a new subtest might be added to an existing selftest. When one subtest fails -- e.g. because the test is not skipped on older kernels -- the whole selftest is marked as failed. In a situation where a selftest is always failing due to one subtest, it means people stop looking at regressions in the other subtests. If we cannot easily predict which internal behaviour is expected, a workaround to avoid reducing the code coverage is to parse subtests, but not all selftests format their results using the inner TAP 13 format. Both predicting the kernel behaviour and changing the output format look like quite a lot of work, as there are hundreds of existing selftests with thousands of subtests.
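For reference, the inner TAP format mentioned above looks roughly like this (the subtest names are made up); with such output, a CI could report only the second subtest as failing instead of failing the whole selftest:

    TAP version 13
    1..3
    ok 1 subtest_a
    not ok 2 subtest_b
    ok 3 subtest_c # SKIP feature not available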
Many networking tests are validating internal behaviour that is not exposed to userspace. A typical example: some tests look at the raw packets being exchanged during a test, and this behaviour can change without modifying how userspace interacts with the kernel. The kernel could expose capabilities, but that does not seem natural to put in place for internal behaviours that are not exposed to end users. Maybe workarounds could be used, e.g. looking at kernel symbols, etc. But that doesn't always work, it increases the complexity, and often a "false positive" issue will only be noticed after a patch hits stable, causing a bunch of tests to be ignored.
Regarding fixes, ideally they will come with a new or modified test that can also be backported. So the coverage can continue to grow in stable versions too.
The assumption that new tests can be backported is incorrect. It goes against the stable rules. We backport fixes and not new features and new tests.
I'm sorry, I don't think I clearly explained what I wanted to say here: tests validating new features are obviously not backported. On the other hand, fixes regularly come with a regression test, and often, they are even part of the same commit. So both the fix, and the modified / added test are backported. It is useful to quickly validate a fix on a stable version. Is it something that should not be done?
Running kselftests from the same release will reduce coverage when a new test is added to regression test a 6.11 feature. This happens more often than not. Revision matching example: 6.11.8 tests on 6.11.8 stable
I see, then does that mean tests attached to a fix cannot be backported? If they can, and assuming new tests are validating new features, not old ones, then the impact should be limited, no?
Do you think that from the kernel v6.12 (or before?), the LKFT CI could run the networking kselftests from the version that is being validated, and not from a newer one? So validating the selftests from v6.12.1 on a v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.
It is expected that there will be more skipped tests as you run tests from mainline on stable releases. You will see more skips on older stables.
Indeed, if it is possible to detect when the test should be skipped or adapted on older kernel versions. Some tests cannot be easily adapted to run on older kernel versions. It means they would need to be skipped when running on older versions after having been adapted to support an internal behaviour change, e.g. a packet being formatted differently. That would reduce the code coverage on older kernels then.
An alternative would be to revision match for older stables. New tests could be written for 6.12 that should be run on 6.11, but maybe not on 6.1, depending on the missed coverage.
That could be an alternative indeed. When looking at the results of the 5.10 kernel for example, we can see a very high number of failures -- 1/3 for the basic net tests, 2/3 in some net sub-systems -- and not many skips. This doesn't look good.
Before changing the current approach, it is important to understand that running mainline tests on stable releases increases test coverage, that newer tests will not be backported, and that the coverage gap will increase over time.
Understood.
Again, thank you for your reply!
Cheers, Matt