Let me add Richard to the CC list. See lore for more details. https://lore.kernel.org/all/CA+G9fYuA643RHHpPnz9Ww7rr3zV5a0y=7_uFcybBSL=QP_s...
On Tue, Oct 31, 2023 at 09:57:48PM +0530, Naresh Kamboju wrote:
On Mon, 30 Oct 2023 at 14:33, Dan Carpenter <dan.carpenter@linaro.org> wrote:
We have started printing more and more intentional stack traces, whether it's testing that KASAN is able to detect use after frees or it's part of a kunit test.
These stack traces can be problematic. They suddenly show up as a new failure. Now the test team has to contact the developers. A bunch of people have to investigate the bug. We finally decide that it's intentional so now the test team has to update their filter scripts to mark it as intentional. These filters are ad-hoc because there is no standard format for warnings.
A better way would be to mark it as intentional from the start.
Here, I have marked the beginning and the end of the trace. It's more tricky for things like lkdtm_FORTIFY_MEM_MEMBER() where the flow doesn't reach the end of the function. I guess I would print a different warning for stack traces that can't have an "Intentional warning finished\n" message at the end.
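To illustrate why begin/end markers help the test side, here is a rough sketch of a log filter that drops everything between the two markers. Only the "Intentional warning finished" text comes from the mail above; the start marker string and the function name strip_intentional() are made up for illustration and are not what the patch prints.

```python
# Hypothetical start marker: the mail only spells out the end marker text.
START_MARKER = "Intentional warning started"
# End marker text as quoted in the mail (without the trailing newline).
END_MARKER = "Intentional warning finished"


def strip_intentional(lines):
    """Return only the log lines that fall outside a marked region."""
    kept = []
    inside = False
    for line in lines:
        if not inside and START_MARKER in line:
            # Entering an annotated trace: drop the marker line itself.
            inside = True
            continue
        if inside:
            # Drop everything up to and including the end marker.
            if END_MARKER in line:
                inside = False
            continue
        kept.append(line)
    return kept
```

Note that an unterminated region swallows the rest of the log in this sketch, which is the reason traces that never reach the end of the function would need a different message.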
I haven't actually tested this patch... Daniel, do you have a list of intentional stack traces we could annotate?
[My two cents]
I have been noticing the following kernel warnings / BUGs
Some are intentional and some are not. I had a similar thing happen to me last week where I had too many Smatch false positives in my devel code so I accidentally sent a patch with a stupid bug. I've since updated my QC process to run both the devel and released versions of Smatch.
But a similar thing is happening here where we have so many bogus warnings that we missed a real bug.
These started happening from next-20231009. I am not sure which are "Intentional warnings" or real regressions.
[ 37.378220] BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xc4/0x300
[ 37.645506] BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xec/0x300
..
[ 632.407425] BUG: KASAN: null-ptr-deref in kobject_namespace+0x3c/0xb0
Logs: [Sorry for sharing long logs ]
Not your fault. These long warnings are the issue at hand.
==========
------------[ cut here ]------------
[ 629.699281] WARNING: CPU: 0 PID: 2834 at drivers/gpu/drm/drm_rect.c:138 drm_rect_calc_hscale+0xbc/0xe8
Deliberate.
[ 629.914458] WARNING: CPU: 5 PID: 2836 at drivers/gpu/drm/drm_rect.c:138 drm_rect_calc_hscale+0xbc/0xe8 [drm_kms_helper]
Deliberate.
[ 630.172564] WARNING: CPU: 5 PID: 2846 at drivers/gpu/drm/drm_rect.c:138 drm_rect_calc_vscale+0xbc/0xe8 [drm_kms_helper]
Deliberate.
------------[ cut here ]------------
[ 630.388003] WARNING: CPU: 3 PID: 2848 at drivers/gpu/drm/drm_rect.c:138 drm_rect_calc_vscale+0xbc/0xe8 [drm_kms_helper]
Deliberate.
------------[ cut here ]------------
[ 631.679963] kobject: '(null)' (00000000f512f33b): is not initialized, yet kobject_get() is being called.
Not deliberate. This seems like a straightforward bug to fix.
Failing a kobject_get() seems like it would obviously lead to a refcounting underflow and a use after free so I suspect some of the other warnings that follow are caused by this issue. We should fix it first and see which warnings disappear.
Testing the Linux Kernel Dump Test Module is always going to create warnings, so intentional warnings are a part of life. We should annotate them.
But having too many warnings is bad and has caused this kobject_get() bug. We should delete the warning in drm_calc_scale() or make it a WARN_ONCE() and mark it as intentional in the kunit test.
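As a rough way to see which warning sites are drowning out the rest, something like the following could count WARNING lines per source location. It assumes the "WARNING: CPU: N PID: N at file:line" layout seen in the logs above; the helper name count_warning_sites() is made up for illustration.

```python
import re
from collections import Counter

# Matches the "WARNING: CPU: 0 PID: 2834 at path/file.c:138 " prefix
# seen in the logs above and captures the "path/file.c:138" part.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+) ")


def count_warning_sites(lines):
    """Count how many times each file:line location emitted a WARNING."""
    counts = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A location that shows up many times in one run is a candidate for WARN_ONCE() or for an intentional-warning annotation in the test.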
regards, dan carpenter