On 3/17/21 7:53 PM, David Rientjes wrote:
On Wed, 17 Mar 2021, Vlastimil Babka wrote:
[ 22.154049] random: get_random_u32 called from __kmem_cache_create+0x23/0x3e0 with crng_init=0 [ 22.154070] random: get_random_u32 called from cache_random_seq_create+0x7c/0x140 with crng_init=0 [ 22.154167] random: get_random_u32 called from allocate_slab+0x155/0x5e0 with crng_init=0 [ 22.154690] test_slub: 1. kmem_cache: Clobber Redzone 0x12->0x(ptrval) [ 22.164499] ============================================================================= [ 22.166629] BUG TestSlub_RZ_alloc (Not tainted): Redzone overwritten [ 22.168179] ----------------------------------------------------------------------------- [ 22.168179] [ 22.168372] Disabling lock debugging due to kernel taint [ 22.168372] INFO: 0x(ptrval)-0x(ptrval) @offset=1064. First byte 0x12 instead of 0xcc [ 22.168372] INFO: Allocated in resiliency_test+0x47/0x1be age=3 cpu=0 pid=1 [ 22.168372] __slab_alloc+0x57/0x80 [ 22.168372] kmem_cache_alloc (kbuild/src/consumer/mm/slub.c:2871 kbuild/src/consumer/mm/slub.c:2915 kbuild/src/consumer/mm/slub.c:2920) [ 22.168372] resiliency_test (kbuild/src/consumer/lib/test_slub.c:34 kbuild/src/consumer/lib/test_slub.c:107) [ 22.168372] test_slub_init (kbuild/src/consumer/lib/test_slub.c:124) [ 22.168372] do_one_initcall (kbuild/src/consumer/init/main.c:1226) [ 22.168372] kernel_init_freeable (kbuild/src/consumer/init/main.c:1298 kbuild/src/consumer/init/main.c:1315 kbuild/src/consumer/init/main.c:1335 kbuild/src/consumer/init/main.c:1537) [ 22.168372] kernel_init (kbuild/src/consumer/init/main.c:1426) [ 22.168372] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_32.S:856) [ 22.168372] INFO: Slab 0x(ptrval) objects=16 used=1 fp=0x(ptrval) flags=0x40000201 [ 22.168372] INFO: Object 0x(ptrval) @offset=1000 fp=0x(ptrval) [ 22.168372] [ 22.168372] Redzone (ptrval): cc cc cc cc cc cc cc cc ........ [ 22.168372] Object (ptrval): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 22.168372] Object (ptrval): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk. [ 22.168372] Redzone (ptrval): 12 cc cc cc .... [ 22.168372] Padding (ptrval): 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [ 22.168372] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 5.12.0-rc2-00001-ge48d82b67a2b #1 [ 22.168372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 22.168372] Call Trace: [ 22.168372] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122) [ 22.168372] print_trailer (kbuild/src/consumer/mm/slub.c:737) [ 22.168372] check_bytes_and_report.cold (kbuild/src/consumer/mm/slub.c:807) [ 22.168372] check_object (kbuild/src/consumer/mm/slub.c:914) [ 22.168372] validate_slab (kbuild/src/consumer/mm/slub.c:4635)
Hm but in this case the output means the tested functionality (slub debugging) is working as intended. So what can we do? Indicate/teach somehow to the bot that this is OK? Does kselftest have some support for this? Or silence the validation output for testing purposes? (I would prefer not to)
Unless you're familiar with everything that CONFIG_TEST_SLUB does, maybe this could be inferred as an actual issue that the test has uncovered that is unexpected?
I don't have a good way of silencing the check_bytes_and_report() output other than a big hammer: implement {disable,enable}_slub_warnings() that the resiliency test could call into before triggering these checks.
So Oliver has implemented this, but now I got a different idea that should be much cleaner IMHO. We could add a per-cache flag SLAB_SILENT_ERRORS (similar to SLAB_RED_ZONE etc) instead of a global bool. The test would just create the caches with this flag. The flag should be added to the SLAB_NEVER_MERGE, SLAB_DEBUG_FLAGS, SLAB_FLAGS_PERMITTED macros as well.
A similar suggestion is that adding the errors counter parameter to all validate_slab_cache() and relevant functions is tedious - there are more that had to be modified like this than initially expected. Instead the error counter can be added to SLUB's struct kmem_cache definition, incremented by the various checks and the tests can look at that after validation.
Thanks, Vlastimil