The ksft runner sends 2 SIGTERMs in a row if a test runs out of time. Handle this the same way we handle SIGINT: clean up and stop running further tests.
Because we get 2 signals, we need a bit of logic to ignore the second one. They come immediately one after the other (due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM to runner child")).
This change makes sure we run cleanup (scheduled defer()s) and also print a stack trace on SIGTERM, which doesn't happen by default. Tests occasionally hang in NIPA and it's impossible to tell what they are waiting for or doing.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: petrm@nvidia.com
CC: willemb@google.com
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/lib/py/ksft.py | 27 +++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 3cfad0fd4570..73710634d457 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -3,6 +3,7 @@
 import builtins
 import functools
 import inspect
+import signal
 import sys
 import time
 import traceback
@@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
     pass
 
 
+class KsftTerminate(KeyboardInterrupt):
+    pass
+
+
 def ksft_pr(*objs, **kwargs):
     print("#", *objs, **kwargs)
 
@@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
     return env
 
 
+term_cnt = 0
+
+def _ksft_intr(signum, frame):
+    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
+    # if we don't ignore the second one it will stop us from handling cleanup
+    global term_cnt
+    term_cnt += 1
+    if term_cnt == 1:
+        raise KsftTerminate()
+    else:
+        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
@@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
                     cases.append(value)
                     break
 
+    global term_cnt
+    term_cnt = 0
+    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
+
     totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
 
     print("TAP version 13")
@@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
             cnt_key = 'xfail'
         except BaseException as e:
             stop |= isinstance(e, KeyboardInterrupt)
+            stop |= isinstance(e, KsftTerminate)
             tb = traceback.format_exc()
             for line in tb.strip().split('\n'):
                 ksft_pr("Exception|", line)
             if stop:
-                ksft_pr("Stopping tests due to KeyboardInterrupt.")
+                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
             KSFT_RESULT = False
             cnt_key = 'fail'
 
@@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
         if stop:
             break
 
+    signal.signal(signal.SIGTERM, prev_sigterm)
+
     print(
         f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
     )
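For readers who want to try the idea outside the selftest tree, here is a minimal standalone sketch of the pattern described above: the first SIGTERM is turned into an exception so cleanup can run, and any later SIGTERM is only recorded. It is not the ksft.py code. The names Terminate, on_sigterm, events and run_with_timeout_signals are hypothetical, and the two signals are raised by the process itself to stand in for what runner.sh does on a timeout.

```python
import signal
import time


class Terminate(KeyboardInterrupt):
    """Hypothetical stand-in for the patch's KsftTerminate."""


term_cnt = 0
events = []


def on_sigterm(signum, frame):
    # First SIGTERM: unwind through an exception so cleanup code gets to run.
    # Any later SIGTERM: only record it, so it cannot interrupt that cleanup.
    global term_cnt
    term_cnt += 1
    if term_cnt == 1:
        raise Terminate()
    events.append(f"ignored SIGTERM (cnt: {term_cnt})")


def run_with_timeout_signals():
    """Simulate a hung test that receives two SIGTERMs back to back."""
    global term_cnt
    term_cnt = 0
    events.clear()
    prev = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        try:
            signal.raise_signal(signal.SIGTERM)  # first signal (assumed timeout)
            time.sleep(0.5)                      # the "hung" test body
            events.append("finished normally")
        except Terminate:
            events.append("terminated")
        # The second signal arrives while cleanup is in progress.
        signal.raise_signal(signal.SIGTERM)
        time.sleep(0.05)                         # give the handler time to run
        events.append("cleanup done")
    finally:
        # Put the previous handler back, as the patch does after the test loop.
        signal.signal(signal.SIGTERM, prev)
    return list(events)
```

Calling run_with_timeout_signals() should record the termination, the ignored second signal, and the completed cleanup. Deriving the exception from KeyboardInterrupt mirrors the patch, so code that already stops on Ctrl-C treats the timeout the same way.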
Jakub Kicinski wrote:
> diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
> index 3cfad0fd4570..73710634d457 100644
> --- a/tools/testing/selftests/net/lib/py/ksft.py
> +++ b/tools/testing/selftests/net/lib/py/ksft.py
> @@ -3,6 +3,7 @@
>  import builtins
>  import functools
>  import inspect
> +import signal
>  import sys
>  import time
>  import traceback
> @@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
>      pass
>
>
> +class KsftTerminate(KeyboardInterrupt):
> +    pass
> +
> +
>  def ksft_pr(*objs, **kwargs):
>      print("#", *objs, **kwargs)
>
> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>      return env
>
>
> +term_cnt = 0
A bit ugly to initialize this here. Also, it already is initialized below.
> +
> +def _ksft_intr(signum, frame):
> +    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
> +    # if we don't ignore the second one it will stop us from handling cleanup
> +    global term_cnt
> +    term_cnt += 1
> +    if term_cnt == 1:
> +        raise KsftTerminate()
> +    else:
> +        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
> +
> +
>  def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
>      cases = cases or []
>
> @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
>                      cases.append(value)
>                      break
>
> +    global term_cnt
> +    term_cnt = 0
> +    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
> +
>      totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
>
>      print("TAP version 13")
> @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
>              cnt_key = 'xfail'
>          except BaseException as e:
>              stop |= isinstance(e, KeyboardInterrupt)
> +            stop |= isinstance(e, KsftTerminate)
>              tb = traceback.format_exc()
>              for line in tb.strip().split('\n'):
>                  ksft_pr("Exception|", line)
>              if stop:
> -                ksft_pr("Stopping tests due to KeyboardInterrupt.")
> +                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
>              KSFT_RESULT = False
>              cnt_key = 'fail'
>
> @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
>          if stop:
>              break
>
> +    signal.signal(signal.SIGTERM, prev_sigterm)
Why is prev_sigterm saved and reassigned as handler here?
> +
>      print(
>          f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
>      )
> --
> 2.49.0
On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
>> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>>      return env
>>
>>
>> +term_cnt = 0
>
> A bit ugly to initialize this here. Also, it already is initialized below.
We need a global so that the signal handler can access it. Python doesn't have syntax to define a variable without a value. Or do you suggest term_cnt = None ?
The whole term_cnt dance is super ugly, couldn't think of a cleaner way. It's really annoying that ksft infra sends 2 terminating signals one immediately after the other :|
>> @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
>>          if stop:
>>              break
>>
>> +    signal.signal(signal.SIGTERM, prev_sigterm)
>
> Why is prev_sigterm saved and reassigned as handler here?
Because we ignore all signals when cnt >= 2 I didn't want to keep our handler installed, just in case something after ksft_run() hangs. It should be equivalent to

  signal.signal(signal.SIGTERM, signal.SIG_DFL)

if the prev is of concern. Then again, keeping prev doesn't change #LOC.
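The save-and-restore step discussed above can be sketched on its own. This is an illustration only, not code from ksft.py: temporary_handler, swallow and demo are hypothetical names, and the sketch assumes it runs in the main thread of a normal Python interpreter, where signal.signal() returns the previously installed handler.

```python
import contextlib
import signal


@contextlib.contextmanager
def temporary_handler(signum, handler):
    """Install handler for signum, then put back whatever was there before."""
    prev = signal.signal(signum, handler)  # returns the previous handler
    try:
        yield prev
    finally:
        signal.signal(signum, prev)


def swallow(signum, frame):
    """Hypothetical handler that ignores the signal, like the repeat-SIGTERM path."""


def demo():
    before = signal.getsignal(signal.SIGTERM)
    with temporary_handler(signal.SIGTERM, swallow):
        during = signal.getsignal(signal.SIGTERM)
    after = signal.getsignal(signal.SIGTERM)
    return before, during, after
```

If nothing else has installed a SIGTERM handler, the value restored here is signal.SIG_DFL, which is why restoring prev and setting SIG_DFL explicitly end up equivalent in that case. Restoring prev additionally preserves a handler that a caller may have installed earlier.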
Reviewed-by: Willem de Bruijn <willemb@google.com>
Jakub Kicinski wrote:
> On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
>>> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>>>      return env
>>>
>>>
>>> +term_cnt = 0
>>
>> A bit ugly to initialize this here. Also, it already is initialized below.
>
> We need a global so that the signal handler can access it. Python doesn't have syntax to define a variable without a value. Or do you suggest term_cnt = None ?
I meant that the "global term_cnt" in ksft_run below already creates the global var, and is guaranteed to do so before _ksft_intr, so no need to also define it outside a function.
Obviously not very important, don't mean to ask for a respin. LGTM.
> The whole term_cnt dance is super ugly, couldn't think of a cleaner way. It's really annoying that ksft infra sends 2 terminating signals one immediately after the other :|
>>> @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
>>>          if stop:
>>>              break
>>>
>>> +    signal.signal(signal.SIGTERM, prev_sigterm)
>>
>> Why is prev_sigterm saved and reassigned as handler here?
>
> Because we ignore all signals when cnt >= 2 I didn't want to keep our handler installed, just in case something after ksft_run() hangs. It should be equivalent to
>
>   signal.signal(signal.SIGTERM, signal.SIG_DFL)
>
> if the prev is of concern. Then again, keeping prev doesn't change #LOC.
Oh I see. Ok.
On 4/29/25 3:27 AM, Willem de Bruijn wrote:
> Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> Jakub Kicinski wrote:
>> On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
>>>> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>>>>      return env
>>>>
>>>>
>>>> +term_cnt = 0
>>>
>>> A bit ugly to initialize this here. Also, it already is initialized below.
>>
>> We need a global so that the signal handler can access it. Python doesn't have syntax to define a variable without a value. Or do you suggest term_cnt = None ?
>
> I meant that the "global term_cnt" in ksft_run below already creates the global var, and is guaranteed to do so before _ksft_intr, so no need to also define it outside a function.
>
> Obviously not very important, don't mean to ask for a respin. LGTM.
FWIW I think it's better to avoid the unneeded assignment in global scope, so I would suggest either a follow-up or a v2, whichever is simpler.
Thanks,
Paolo
On Mon, 28 Apr 2025 21:27:32 -0400 Willem de Bruijn wrote:
>>> A bit ugly to initialize this here. Also, it already is initialized below.
>>
>> We need a global so that the signal handler can access it. Python doesn't have syntax to define a variable without a value. Or do you suggest term_cnt = None ?
>
> I meant that the "global term_cnt" in ksft_run below already creates the global var, and is guaranteed to do so before _ksft_intr, so no need to also define it outside a function.
>
> Obviously not very important, don't mean to ask for a respin. LGTM.
Oh wow, thanks! I totally didn't know that using "global" in a function is enough to add something to the global scope.
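The scoping point above can be checked with a short sketch. The names counter, handler and run are hypothetical and only mirror the roles of term_cnt, _ksft_intr and ksft_run. Strictly speaking, the "global" statement only declares which scope the name belongs to, and it is the assignment under it that creates the module-level variable.

```python
def handler():
    # Updates a module-level name that is never assigned at module scope.
    global counter
    counter += 1
    return counter


def run():
    # "global" declares the scope; the assignment on the next line is what
    # actually creates the module-level name.
    global counter
    counter = 0
    return handler()
```

Calling handler() before run() raises NameError, because nothing has assigned counter yet. That is why the ordering matters: in the patch, ksft_run() assigns term_cnt = 0 before installing the signal handler, so the handler cannot run against an undefined name.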
linux-kselftest-mirror@lists.linaro.org