These patches aim to make the openvswitch test suite more reliable. They should address the major sources of flakiness in the suite, allowing the CI infrastructure to exercise the openvswitch module on submitted patch series. There should be no change for users who simply run the tests, except that patch 3/3 makes debugging somewhat easier by making some output more verbose.
Aaron Conole (3):
  selftests: openvswitch: Bump timeout to 15 minutes.
  selftests: openvswitch: Attempt to autoload module.
  selftests: openvswitch: Be more verbose with selftest debugging.

 .../selftests/net/openvswitch/openvswitch.sh | 23 ++++++++++++-------
 .../selftests/net/openvswitch/settings       |  1 +
 2 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/net/openvswitch/settings
We found that because some tests rely on TCP SYN timeouts to cause flow misses, the default kselftest timeout of 45 seconds is easily exceeded. Bump the timeout to 15 minutes.
Signed-off-by: Aaron Conole aconole@redhat.com
---
 tools/testing/selftests/net/openvswitch/settings | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 tools/testing/selftests/net/openvswitch/settings
diff --git a/tools/testing/selftests/net/openvswitch/settings b/tools/testing/selftests/net/openvswitch/settings
new file mode 100644
index 000000000000..e2206265f67c
--- /dev/null
+++ b/tools/testing/selftests/net/openvswitch/settings
@@ -0,0 +1 @@
+timeout=900
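The kselftest runner reads per-directory `settings` files made of `key=value` lines and falls back to its 45-second default when no `timeout` is given. Purely to illustrate that file format, here is a minimal shell sketch; `read_ksft_timeout` is a hypothetical helper, not the kselftest runner's own code.

```bash
#!/bin/bash
# Hypothetical helper: print the timeout (in seconds) configured in a
# kselftest-style "settings" file, or the 45 s default when the file or the
# "timeout" key is absent. Illustrates the file format only; this is not
# the kselftest runner's implementation.
read_ksft_timeout() {
	local settings="$1"
	local timeout=45	# kselftest default timeout
	local key value

	if [ -r "$settings" ]; then
		# Each line is "key=value". The "|| [ -n "$key" ]" also handles a
		# final line that has no trailing newline.
		while IFS='=' read -r key value || [ -n "$key" ]; do
			if [ "$key" = "timeout" ] && [ -n "$value" ]; then
				timeout="$value"
			fi
		done < "$settings"
	fi
	echo "$timeout"
}
```

With the file added by this patch (`timeout=900`), the helper prints 900; for a directory without a `settings` file it prints 45.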
On Tue, Jul 02, 2024 at 09:28:28AM -0400, Aaron Conole wrote:
[...]
Reviewed-by: Simon Horman horms@kernel.org
Tested-by: Simon Horman horms@kernel.org
FWIW, locally I had been using a timeout of 720s, so 900 seems entirely reasonable to me.
Previously, the openvswitch.sh test suite would not attempt to autoload the openvswitch module. The idea was that a user manually running the tests might not have the OVS module loaded, or even configured, for their own development. However, if the kernel module is configured and can be autoloaded, we should simply attempt to load it and run the tests. This is especially true in CI environments, where the tests should be able to rely on autoloading to get the suite running.
Signed-off-by: Aaron Conole aconole@redhat.com
---
 .../selftests/net/openvswitch/openvswitch.sh | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 15bca0708717..0bd0425848d9 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -613,16 +613,20 @@ run_test() {
 	tname="$1"
 	tdesc="$2"
 
-	if ! lsmod | grep openvswitch >/dev/null 2>&1; then
-		stdbuf -o0 printf "TEST: %-60s [NOMOD]\n" "${tdesc}"
-		return $ksft_skip
-	fi
-
 	if python3 ovs-dpctl.py -h 2>&1 | \
 	     grep -E "Need to (install|upgrade) the python" >/dev/null 2>&1; then
 		stdbuf -o0 printf "TEST: %-60s [PYLIB]\n" "${tdesc}"
 		return $ksft_skip
 	fi
+
+	python3 ovs-dpctl.py show >/dev/null 2>&1 || \
+		echo "[DPCTL] show exception."
+
+	if ! lsmod | grep openvswitch >/dev/null 2>&1; then
+		stdbuf -o0 printf "TEST: %-60s [NOMOD]\n" "${tdesc}"
+		return $ksft_skip
+	fi
+
 	printf "TEST: %-60s [START]\n" "${tname}"
 
 	unset IFS
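The reordering means `python3 ovs-dpctl.py show` now runs before the `lsmod` check; presumably it is the generic netlink family lookup done by that command that lets the kernel autoload openvswitch. A minimal sketch of the same "trigger first, then check, else skip" ordering follows. `try_autoload_then_check`, `module_present` and `SYSFS_MODULE_DIR` are hypothetical names, and a `/sys/module` directory test is swapped in for the patch's `lsmod | grep openvswitch` so the logic can be exercised without loading a real module.

```bash
#!/bin/bash
# Sketch of the check ordering introduced by the patch: run a command that
# may trigger module autoloading, then check whether the module is present,
# and only report a skip if it is still missing.
ksft_skip=4	# kselftest exit code for "skipped"

# Hypothetical helper: treat a module as present if it has a directory under
# sysfs. SYSFS_MODULE_DIR exists only so the check can be redirected.
module_present() {
	[ -d "${SYSFS_MODULE_DIR:-/sys/module}/$1" ]
}

# Hypothetical helper: try_autoload_then_check MODULE TRIGGER_CMD [ARGS...]
try_autoload_then_check() {
	local module="$1"
	shift
	# A failing trigger is only logged, not fatal, much like the
	# "[DPCTL] show exception." message in the patch.
	"$@" >/dev/null 2>&1 || echo "[AUTOLOAD] trigger failed: $*"
	if ! module_present "$module"; then
		printf "TEST: %-60s [NOMOD]\n" "$module"
		return "$ksft_skip"
	fi
	return 0
}
```

In a test script this might be invoked as `try_autoload_then_check openvswitch python3 ovs-dpctl.py show`, returning 4 so the caller can propagate a kselftest skip.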
On Tue, Jul 02, 2024 at 09:28:29AM -0400, Aaron Conole wrote:
[...]
Reviewed-by: Simon Horman horms@kernel.org
Tested-by: Simon Horman horms@kernel.org
The openvswitch selftest is difficult to debug for anyone who isn't directly familiar with the openvswitch module and the specifics of the test cases. When something fails, the debug log is often sparsely populated, and it takes some time to understand where the failure occurred.
Increase the amount of detail written to the debug log by capturing all 'info' messages and all commands run through 'ovs_sbx'.
Signed-off-by: Aaron Conole aconole@redhat.com
---
NOTE: This conflicts with a patch on the list that adds psample support, but the conflict should be simple to resolve, since it would only be due to a context change in tests="". I can also respin if the patches collide.

 tools/testing/selftests/net/openvswitch/openvswitch.sh | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index 0bd0425848d9..531951086d9c 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -23,7 +23,9 @@ tests="
 	drop_reason				drop: test drop reasons are emitted"
 
 info() {
-	[ $VERBOSE = 0 ] || echo $*
+	[ "${ovs_dir}" != "" ] &&
+		echo "`date +"[%m-%d %H:%M:%S]"` $*" >> ${ovs_dir}/debug.log
+	[ $VERBOSE = 0 ] || echo $*
 }
 
 ovs_base=`pwd`
@@ -65,7 +67,8 @@ ovs_setenv() {
 
 ovs_sbx() {
 	if test "X$2" != X; then
-		(ovs_setenv $1; shift; "$@" >> ${ovs_dir}/debug.log)
+		(ovs_setenv $1; shift;
+		 info "run cmd: $@"; "$@" >> ${ovs_dir}/debug.log)
 	else
 		ovs_setenv $1
 	fi
@@ -139,7 +142,7 @@ ovs_add_flow () {
 	info "Adding flow to DP: sbx:$1 br:$2 flow:$3 act:$4"
 	ovs_sbx "$1" python3 $ovs_base/ovs-dpctl.py add-flow "$2" "$3" "$4"
 	if [ $? -ne 0 ]; then
-		echo "Flow [ $3 : $4 ] failed" >> ${ovs_dir}/debug.log
+		info "Flow [ $3 : $4 ] failed"
 		return 1
 	fi
 	return 0
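The logging change can be illustrated with a small standalone sketch: `info()` appends a timestamped line to `$ovs_dir/debug.log` whenever `ovs_dir` is set, and still echoes to stdout when `VERBOSE` is non-zero, as in the patch. This variant quotes its expansions and uses `$(...)` instead of backticks; `run_logged` is a hypothetical stand-in for the command logging that `ovs_sbx` now performs inside its subshell, not a function from the test suite.

```bash
#!/bin/bash
# Sketch modeled on the patched info(): log every message, with a
# timestamp, to $ovs_dir/debug.log when ovs_dir is set, and also print it
# when VERBOSE is non-zero.
VERBOSE=${VERBOSE:-0}
ovs_dir=${ovs_dir:-}

info() {
	if [ -n "$ovs_dir" ]; then
		echo "$(date +"[%m-%d %H:%M:%S]") $*" >> "$ovs_dir/debug.log"
	fi
	[ "$VERBOSE" = 0 ] || echo "$*"
}

# Hypothetical wrapper: record the command line via info(), then run the
# command with its stdout appended to the same debug log.
run_logged() {
	[ -n "$ovs_dir" ] || { echo "ovs_dir is not set" >&2; return 1; }
	info "run cmd: $*"
	"$@" >> "$ovs_dir/debug.log"
}
```

Because the command line is logged immediately before the command's own output, a failure in the debug log can be traced back to the exact invocation that produced it.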
On Tue, Jul 02, 2024 at 09:28:30AM -0400, Aaron Conole wrote:
[...]
Reviewed-by: Simon Horman horms@kernel.org
Hello:
This series was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Tue, 2 Jul 2024 09:28:27 -0400 you wrote:
[...]
Here is the summary with links:
  - [net-next,1/3] selftests: openvswitch: Bump timeout to 15 minutes.
    https://git.kernel.org/netdev/net-next/c/ff015706fc73
  - [net-next,2/3] selftests: openvswitch: Attempt to autoload module.
    https://git.kernel.org/netdev/net-next/c/818481db3df4
  - [net-next,3/3] selftests: openvswitch: Be more verbose with selftest debugging.
    https://git.kernel.org/netdev/net-next/c/7abfd8ecb785
You are awesome, thank you!
On Tue, 2 Jul 2024 09:28:27 -0400 Aaron Conole wrote:
[...]
Hi Aaron!
The results look solid on normal builds now, but with a debug kernel the test is failing consistently:
https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=...
Jakub Kicinski kuba@kernel.org writes:
[...]
Yes, it shows a test case issue with the upcall and psample tests.
Adrian and I discussed that the correct approach would be to use a wait_for instead of just sleeping, because the dbg environment seems too racy for fixed sleeps. I think he is working on a follow-up to submit after the psample work gets merged; we were hoping not to hold up that patch series with more potential conflicts or merge issues, if that's okay.
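The "wait_for instead of sleeping" idea amounts to polling a condition until it holds or a deadline passes, so a slow debug kernel gets more time without slowing down normal runs. Below is a hedged sketch; `wait_for` here is a hypothetical helper with an assumed `TIMEOUT COMMAND...` signature, not the follow-up patch mentioned in this thread. It relies on bash's `SECONDS` variable and on a `sleep` that accepts fractional seconds (as GNU coreutils `sleep` does).

```bash
#!/bin/bash
# Hypothetical polling helper: re-run a condition until it succeeds or a
# deadline passes, instead of sleeping for a fixed time that may be too
# short on slow (debug) kernels.
#
# Usage: wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout expires first.
wait_for() {
	local timeout="$1"
	shift
	local deadline=$((SECONDS + timeout))

	until "$@"; do
		if [ "$SECONDS" -ge "$deadline" ]; then
			return 1
		fi
		sleep 0.1
	done
	return 0
}
```

A test would then replace something like `sleep 1; grep -q "expected line" "$logfile"` with `wait_for 10 grep -q "expected line" "$logfile"`, which returns as soon as the line appears on a fast machine and keeps polling on a slow one.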
On Fri, 05 Jul 2024 09:49:12 -0400 Aaron Conole wrote:
[...]
Makes sense, thanks!
On Fri, Jul 05, 2024 at 09:49:12AM GMT, Aaron Conole wrote:
[...]
Yes. I am working on a patch to fix the failures on slow systems.

Thanks.
Adrián
linux-kselftest-mirror@lists.linaro.org