Currently, if you run
$ ./tools/testing/kunit/kunit_tool_test.py
you'll see a lot of output from the parser as we feed it testdata.
This makes the output hard to read and fairly confusing, esp. since our testdata includes example failures, which get printed out in red.
Silence that output so real failures are easier to see.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
---
 tools/testing/kunit/kunit_tool_test.py | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index e2cd2cc2e98f..a6e53945656e 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -80,6 +80,9 @@ class KconfigTest(unittest.TestCase):
                 self.assertEqual(actual_kconfig, expected_kconfig)
 
 class KUnitParserTest(unittest.TestCase):
+        def setUp(self):
+                self.print_mock = mock.patch('kunit_printer.Printer.print').start()
+                self.addCleanup(mock.patch.stopall)
 
         def assertContains(self, needle: str, haystack: kunit_parser.LineStream):
                 # Clone the iterator so we can print the contents on failure.
@@ -485,6 +488,9 @@ class LinuxSourceTreeTest(unittest.TestCase):
 
 class KUnitJsonTest(unittest.TestCase):
+        def setUp(self):
+                self.print_mock = mock.patch('kunit_printer.Printer.print').start()
+                self.addCleanup(mock.patch.stopall)
 
         def _json_for(self, log_file):
                 with open(test_data_path(log_file)) as file:
base-commit: 8f8b51f7d5c8bd3a89e7ea87aed2cdaa52ca5ba4
E.g. all the hw_breakpoint tests are failing right now. So if I run `kunit.py run --alltests --arch=x86_64`, then I see
Testing complete. Ran 408 tests: passed: 392, failed: 9, skipped: 7
Seeing which 9 tests failed out of the hundreds is annoying. If my terminal doesn't have scrollback support, I have to resort to looking at `.kunit/test.log` for the `not ok` lines.
Teach kunit.py to print a summarized list of failures if the # of tests reaches an arbitrary threshold (>=100 tests).
To try and keep the output from being too long/noisy, this new logic
a) just reports "parent_test failed" if every child test failed
b) won't print anything if there are >10 failures (also arbitrary)
With this patch, we get an extra line of output showing:
Testing complete. Ran 408 tests: passed: 392, failed: 9, skipped: 7
Failures: hw_breakpoint
This also works with parameterized tests, e.g. if I add a fake failure:
Failures: kcsan.test_atomic_builtins_missing_barrier.threads=6
Note: we didn't have enough tests for this to be a problem before. But with commit 980ac3ad0512 ("kunit: tool: rename all_test_uml.config, use it for --alltests"), --alltests works and thus running >100 tests will probably become more common.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
---
 tools/testing/kunit/kunit_parser.py    | 47 ++++++++++++++++++++++++++
 tools/testing/kunit/kunit_tool_test.py | 22 ++++++++++++
 2 files changed, 69 insertions(+)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 1ae873e3e341..94dba66feec5 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -58,6 +58,10 @@ class Test:
                 self.counts.errors += 1
                 stdout.print_with_timestamp(stdout.red('[ERROR]') + f' Test: {self.name}: {error_message}')
 
+        def ok_status(self) -> bool:
+                """Returns true if the status was ok, i.e. passed or skipped."""
+                return self.status in (TestStatus.SUCCESS, TestStatus.SKIPPED)
+
 class TestStatus(Enum):
         """An enumeration class to represent the status of a test."""
         SUCCESS = auto()
@@ -565,6 +569,40 @@ def print_test_footer(test: Test) -> None:
         stdout.print_with_timestamp(format_test_divider(message, len(message) - stdout.color_len()))
+
+
+def _summarize_failed_tests(test: Test) -> str:
+        """Tries to summarize all the failing subtests in `test`."""
+
+        def failed_names(test: Test, parent_name: str) -> List[str]:
+                # Note: we use 'main' internally for the top-level test.
+                if not parent_name or parent_name == 'main':
+                        full_name = test.name
+                else:
+                        full_name = parent_name + '.' + test.name
+
+                if not test.subtests:  # this is a leaf node
+                        return [full_name]
+
+                # If all the children failed, just say this subtest failed.
+                # Don't summarize it down to "the top-level test failed", though.
+                failed_subtests = [sub for sub in test.subtests if not sub.ok_status()]
+                if parent_name and len(failed_subtests) == len(test.subtests):
+                        return [full_name]
+
+                all_failures = []  # type: List[str]
+                for t in failed_subtests:
+                        all_failures.extend(failed_names(t, full_name))
+                return all_failures
+
+        failures = failed_names(test, '')
+        # If there are too many failures, printing them out will just be noisy.
+        if len(failures) > 10:  # this is an arbitrary limit
+                return ''
+
+        return 'Failures: ' + ', '.join(failures)
+
+
 def print_summary_line(test: Test) -> None:
         """
         Prints summary line of test object. Color of line is dependent on
@@ -587,6 +625,15 @@ def print_summary_line(test: Test) -> None:
                 color = stdout.red
         stdout.print_with_timestamp(color(f'Testing complete. {test.counts}'))
+        # Summarize failures that might have gone off-screen since we had a lot
+        # of tests (arbitrarily defined as >=100 for now).
+        if test.ok_status() or test.counts.total() < 100:
+                return
+        summarized = _summarize_failed_tests(test)
+        if not summarized:
+                return
+        stdout.print_with_timestamp(color(summarized))
+
 # Other methods:
 
 def bubble_up_test_results(test: Test) -> None:
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index a6e53945656e..7dcd67003b23 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -312,6 +312,28 @@ class KUnitParserTest(unittest.TestCase):
                         result.status)
                 self.assertEqual('kunit-resource-test', result.subtests[0].name)
 
+        def test_summarize_failures(self):
+                output = """
+                KTAP version 1
+                1..2
+                        # Subtest: all_failed_suite
+                        1..2
+                        not ok 1 - test1
+                        not ok 2 - test2
+                not ok 1 - all_failed_suite
+                        # Subtest: some_failed_suite
+                        1..2
+                        ok 1 - test1
+                        not ok 2 - test2
+                not ok 1 - some_failed_suite
+                """
+                result = kunit_parser.parse_run_tests(output.splitlines())
+                self.assertEqual(kunit_parser.TestStatus.FAILURE, result.status)
+
+                self.assertEqual(kunit_parser._summarize_failed_tests(result),
+                        'Failures: all_failed_suite, some_failed_suite.test2')
+
+
 def line_stream_from_strs(strs: Iterable[str]) -> kunit_parser.LineStream:
         return kunit_parser.LineStream(enumerate(strs, start=1))
On Sat, Oct 29, 2022 at 5:03 AM Daniel Latypov dlatypov@google.com wrote:
I like it! I do think we'll ultimately want some more options for the main results display as well (e.g., only display failed tests, limit the depth of nested results, etc), but this would be useful even then, as the number of tests displayed could still be large. (And you might not know what failures you'd be looking for in advance.)
Reviewed-by: David Gow davidgow@google.com
Cheers,
-- David
On Sun, Oct 30, 2022 at 7:05 AM David Gow davidgow@google.com wrote:
Agreed, there's a lot of room to play around with the main output. The hope here is this is enough to tide us over (usability-wise) until we get around to that.
E.g. in the future, it might make sense to only print suite names by default. If subtests (test cases and individual parameters) fail, we could print those in expanded detail. But there are obviously tradeoffs:
* the real-time output is nice, esp. since some test cases are slower than others
* I think most people are only running 1-2 suites at a time right now
Another thing we could do is optionally use \r so that the in-progress output only takes up the last few lines? E.g.
t=1
Running suite: example
[PASSED] example_simple_test

t=2, use \r to update the test case line
Running suite: example
[SKIPPED] example_skip_test
Then we could print out the results of interest in more detail at the end.
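(Very rough illustration of the \r idea -- a hypothetical standalone sketch, not kunit.py's actual output code; the suite/case names and formatting here are made up:)

#!/usr/bin/env python3
# Sketch: keep overwriting the in-progress line with \r, then print final results.
import sys
import time

def show_progress(suite: str, case: str) -> None:
        # '\r' moves the cursor back to the start of the line, so the next call
        # overwrites the previous in-progress status instead of scrolling.
        sys.stdout.write(f'\rRunning suite: {suite}: {case:<40}')
        sys.stdout.flush()

for case in ('example_simple_test', 'example_skip_test'):
        show_progress('example', case)
        time.sleep(0.5)  # stand-in for actually running the test case
sys.stdout.write('\n')

# At the end, print only the results of interest in full detail.
print('[SKIPPED] example_skip_test')

Something along those lines would keep the live status on a single updating line while the detailed summary stays at the end, where it won't scroll away.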
On Sat, Oct 29, 2022 at 5:03 AM Daniel Latypov dlatypov@google.com wrote:
Thanks -- this has been annoying me for ages.
That being said, this isn't a perfect fix: the "usage" text and "Reconfiguring .config" still show up for me:
---
davidgow@slicestar:~/Development/linux-kselftest$ ./tools/testing/kunit/kunit_tool_test.py
..............................usage: kunit_tool_test.py run [-h] [--build_dir DIR] [--make_options X=Y]
    [--alltests] [--kunitconfig PATHS] [--kconfig_add CONFIG_X=Y] [--arch ARCH] [--cross_compile PREFIX]
    [--qemu_config FILE] [--qemu_args] [--jobs N] [--timeout SECONDS] [--kernel_args]
    [--run_isolated {suite,test}] [--raw_output [{all,kunit}]] [--json [FILE]] [filter_glob]
kunit_tool_test.py run: error: argument --raw_output: invalid choice: 'invalid' (choose from 'all', 'kunit')
..............................Generating .config ...
.Regenerating .config ...
.........
----------------------------------------------------------------------
Ran 70 tests in 0.232s

OK
---
That's still a significant improvement on what we had before, though, so:
Reviewed-by: David Gow davidgow@google.com
Cheers,
-- David
On Sat, Oct 29, 2022 at 8:31 PM David Gow davidgow@google.com wrote:
Yeah, I was originally going to make this patch silence all the output. But I figured I would focus this patch on just the code using our kunit_printer abstraction, aka the parser.
We could address the other ones, but we'd have to mock out the built-in print function. As someone who is guilty of print(f) debugging, doing that means I no longer have an easy way of adding debugging code to those test cases :P
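(For reference, silencing those would mean something like the following rough sketch -- a hypothetical test, not part of this patch -- which is exactly what makes print() debugging painful:)

import unittest
from unittest import mock

class ExampleTest(unittest.TestCase):
        def setUp(self):
                # Swallow everything written via the built-in print(), including
                # any print() calls temporarily added while debugging the test.
                self.print_mock = mock.patch('builtins.print').start()
                self.addCleanup(mock.patch.stopall)

        def test_something(self):
                print('this debugging output would silently disappear')
                self.assertTrue(True)

if __name__ == '__main__':
        unittest.main()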