Re: [PATCH v4 09/14] selftests: harness: Move teardown conditional into test metadata

11 Jun 2025


      On Tue, Jun 10, 2025 at 08:46:57PM -0300, Jason Gunthorpe wrote:
...
On Tue, Jun 10, 2025 at 11:48:44AM -0700, Nicolin Chen wrote:
...
On Tue, Jun 10, 2025 at 09:09:02AM -0300, Jason Gunthorpe wrote:
...
On Tue, Jun 10, 2025 at 01:38:22PM +0200, Thomas Weißschuh wrote:
...
...

#  RUN           iommufd_dirty_tracking.domain_dirty128M_huge.enforce_dirty ...
# enforce_dirty: Test terminated unexpectedly by signal 11
Sig 11 is weird..
...
...
On another note, the selftest should use the kselftest_harness' ASSERT_*()
macros instead of plain assert().
IIRC the kselftest stuff explodes if you try to use it's assert
functions within a fixture setup/teardown context.
I also wasn't able to reproduce this (x86 ubuntu 24 LTS OS) Maybe
it is ARM specific, I think Nicolin is running on ARM..
Yes. And I was running with 64KB page size. I just quickly retried
with 4KB page size (matching x86), and all failed tests pass now.
That's a weird thing to be sensitive too. Can you get a backtrace from
the crash, what function/line is crashing?
I think I am getting what's going on. Here the harness code has a
parent process and a child process:
--------------------------------------------------------------
428-            /* _metadata and potentially self are shared with all forks. */ \
429:            child = fork(); \
430:            if (child == 0) { \
431-                    fixture_name##_setup(_metadata, self, variant->data); \
432-                    /* Let setup failure terminate early. */ \
433-                    if (_metadata->exit_code) \
434-                            _exit(0); \
435-                    *_metadata->no_teardown = false; \
436-                    fixture_name##_##test_name(_metadata, self, variant->data); \
437-                    _metadata->teardown_fn(false, _metadata, self, variant->data); \
438-                    _exit(0); \
439:            } else if (child < 0 || child != waitpid(child, &status, 0)) { \
440-                    ksft_print_msg("ERROR SPAWNING TEST GRANDCHILD\n"); \
441-                    _metadata->exit_code = KSFT_FAIL; \
442-            } \
443-            _metadata->teardown_fn(true, _metadata, self, variant->data); \
444-            munmap(_metadata->no_teardown, sizeof(*_metadata->no_teardown)); \
445-            _metadata->no_teardown = NULL; \
446-            if (self && fixture_name##_teardown_parent) \
447-                    munmap(self, sizeof(*self)); \
448-            if (WIFEXITED(status)) { \
449-                    if (WEXITSTATUS(status)) \
450-                            _metadata->exit_code = WEXITSTATUS(status); \
451-            } else if (WIFSIGNALED(status)) { \
452-                    /* Forward signal to __wait_for_test(). */ \
453-                    kill(getpid(), WTERMSIG(status)); \
454-            } \
....
456-    static void wrapper_##fixture_name##_##test_name##_teardown( \
457-            bool in_parent, struct __test_metadata *_metadata, \
458-            void *self, const void *variant) \
459-    { \
460-            if (fixture_name##_teardown_parent == in_parent && \
461-                            !__atomic_test_and_set(_metadata->no_teardown, __ATOMIC_RELAXED)) \
462-                    fixture_name##_teardown(_metadata, self, variant); \
463-    } \
--------------------------------------------------------------
I found there is a race between those two processes, resulting in
the teardown() not getting invoked: I added some ksft_print_msg()
calls in-between the lines to debug, those tests can pass mostly,
as teardown() got invoked.
I think the reason why those huge page cases fail is just because 
the huge version of setup() takes longer time..
I haven't figured out a proper fix yet, but something smells bad:
1) *no_teardown is set non-atomically, while both processes calls
   __atomic_test_and_set()
2) parent doesn't seem to wait for the setup() to complete..
3) when parent runs faster than the child that is still running
   setup(), the parent unmaps the no_teardown and set it to NULL,
   then UAF in the child, i.e. signal 11?
I think Thomas should have an idea with these info :)
Thanks
Nicolin

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v4 09/14] selftests: harness: Move teardown conditional into test metadata