Hi Arnd,
I took a look at the stack usage issue in the kernel snippet you provided [1], and as you noted, most of the impact indeed comes from the -ftree-ch optimization. It is enabled at -O1 and above but not at -Os (since besides possibly increasing stack usage it might also increase code size).
I am still fully grasping what the -ftree-ch optimization does, but my understanding so far is that it copies loop headers to reorganize the loop for later loop optimization phases. More specifically, what it ends up doing on this specific snippet is creating extra stack variables for the internal member accesses in the inner loop (which in turn increases stack usage).
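To make the transformation more concrete, here is a minimal hand-written sketch of what loop header copying does in general. It is not the kernel snippet and not actual GCC output; the function names are made up for illustration. The exit test of a while loop is duplicated in front of the loop as a guard, and the loop itself becomes a do-while:

```c
#include <stddef.h>

/* Original form: the exit test sits at the top of the loop. */
long sum_while(const long *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* Hand-written equivalent after header copying (illustrative only):
   the test is copied in front of the loop as a guard, and the loop
   becomes a do-while whose body is known to run at least once. */
long sum_header_copied(const long *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    if (i < n) {
        do {
            total += v[i];
            i++;
        } while (i < n);
    }
    return total;
}
```

Both functions compute the same result for any n, including n == 0. The second shape is easier for later loop passes to work with, which is presumably where the additional temporaries come from.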
The extra variables also explain why adding the compiler barrier avoids the problem: the barrier prevents -ftree-ch from reorganizing the inner loop, so the loop is passed as-is to the later optimization phases.
It is also a generic pass that affects all architectures, although the resulting stack usage depends on later passes. With GCC 9.2.1, -fstack-usage along with -O2 reports:
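For reference, a rough sketch of the barrier workaround follows. The struct and function names are hypothetical stand-ins, not the real mlx5e definitions, and the exact placement of the barrier in the snippet under discussion is an assumption on my side. The barrier() macro is the usual empty asm with a "memory" clobber:

```c
#include <stddef.h>

/* Empty asm with a "memory" clobber: emits no instructions, but the
   compiler may not move memory accesses across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-ins for the per-channel statistics. */
struct rq_stats { unsigned long packets; unsigned long bytes; };
struct channel_stats { struct rq_stats rq[4]; };
struct sw_stats { unsigned long rx_packets; unsigned long rx_bytes; };

void update_stats(struct sw_stats *s, const struct channel_stats *ch,
                  size_t nch)
{
    s->rx_packets = 0;
    s->rx_bytes = 0;

    for (size_t i = 0; i < nch; i++) {
        for (size_t j = 0; j < 4; j++) {
            const struct rq_stats *rq = &ch[i].rq[j];

            s->rx_packets += rq->packets;
            s->rx_bytes += rq->bytes;
            /* Assumed placement: one barrier per inner iteration. */
            barrier();
        }
    }
}
```

The barrier does not change the computed result; it only constrains how the compiler may reorder and cache the member accesses around it.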
arch          stack usage (bytes)
arm           632
aarch64       448
powerpc       912
powerpc64le   560
s390          600
s390x         632
i386          1376
x86_64        784
Also, -fconserve-stack does not really help with this pass, since -ftree-ch does not check the flag. Currently -fconserve-stack only seems to affect the inliner, by setting both large-stack-frame and large-stack-frame-growth to some conservative values.
The straightforward change I am checking is simply to disable the tree-ch pass when -fconserve-stack is also enabled:
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index b894a7e0918..b14dd66257c 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -291,7 +291,8 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_ch != 0; }
+  virtual bool gate (function *) { return flag_tree_ch != 0
+                                          && flag_conserve_stack == 0; }
 
   /* Initialize and finalize loop structures, copying headers inbetween.  */
   virtual unsigned int execute (function *);
On powerpc64le with gcc master:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 496 static

$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -O2 ../stack_usage.c -c -fstack-usage -fconserve-stack && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 176 static
The reference for minimal stack usage is with -Os:
$ /home/azanella/gcc/gcc-git-build/gcc/xgcc -B /home/azanella/gcc/gcc-git-build/gcc -Os ../stack_usage.c -c -fstack-usage && cat stack_usage.su
../stack_usage.c:157:6:mlx5e_grp_sw_update_stats 32 static
I will also check whether adding the same test to -fgcse and -ftree-ter makes sense.