On Fri, May 23, 2025 at 05:17:15PM +0200, Arnd Bergmann wrote:
On Fri, May 23, 2025, at 16:08, Kent Overstreet wrote:
On Fri, May 23, 2025 at 03:49:54PM +0200, Arnd Bergmann wrote:
On Fri, May 23, 2025, at 15:19, Naresh Kamboju wrote:
I reproduced the problem locally and found this to go down to 1440 bytes after turning off KASAN_STACK. next-20250523 has some changes that take the number down further, to 1136 bytes without KASAN_STACK or 1552 bytes with KASAN_STACK.
I've re-enabled bcachefs with KASAN_STACK in my randconfig builds to see if there are any remaining corner cases.
Thanks for the numbers - that does still seem high, I'll have to have a look with pahole.
I agree it's still larger than it should be: a function with more than a few hundred bytes of stack usually means there is both a risk of actual overflow and general inefficiency if all that stack data gets accessed as well.
It's probably not actually structure data though, but a combination of effects:
- KASAN_STACK adds extra redzones for each variable
- KASAN_STACK further prevents stack slots from getting reused inside one function, in order to better pinpoint which instance caused problems like out-of-scope access
- passing structures by value causes them to be put on the stack on some architectures, even when the structure size is only one or two registers
We mainly do this with bkey_s_c, which is just two words: on x86_64, that gets passed in registers. Is riscv different?
- sanitizers turn off optimizations that lead to better stack usage
- in some cases, the missed optimizations combined with all of the above cause local variables to get spilled to the stack many times.
Yeesh.
I suspect we should be running with a larger stack when the sanitizers are enabled, and perhaps tweak the warnings accordingly. I did a bunch of stack usage work after I found a KMSAN build was blowing out the stack, but running with max stack usage tracing enabled then showed it to be largely a non-issue on non-sanitizer builds, IIRC.
The good news is that so far my randconfig builds have not shown any more stack frame warnings on next-20250523 with bcachefs force-enabled, now 55 builds into the change, across arm32/arm64/x86 using gcc-15.1.
Good to know, thanks.