Hi,
On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
I'd like to report a very severe performance regression due to
mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
Yes, that is a known problem, with a couple of reports already in the past months. And I posted a fix from which I thought it is on its way upstream, but apparently its not:
https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
Adding Andrew and the x86 maintainers to Cc.
Regards,
Joerg
in v4.19.88. I believe this was included since v4.19.67. It is also in all the other LTS kernels, except 3.16.
So today I switched an x86_64 production server from v5.1.21 to v4.19.88, because we kept hitting runaway kcompactd and kswapd. Plus there was a significant increase in memory usage compared to v5.1.5. I'm still bisecting that on another production server.
The service we run is one of the largest forums in Taiwan [1]. It is a terminal-based bulletin board system running over telnet, SSH or a custom WebSocket bridge. The service itself is the one-process-per-user type of design from the old days. This means a lot of forks when there are user spikes or reconnections.
(Reconnections happen because a lot of people use mobile apps that wrap the service, but they get disconnected as soon as they are backgrounded.)
With v4.19.88 we saw a lot of contention on pgd_lock in the process fork path with CONFIG_VMAP_STACK=y:
Samples: 937K of event 'cycles:ppp', Event count (approx.): 499112453614 Children Self Command Shared Object Symbol
- 31.15% 0.03% mbbsd [kernel.kallsyms]
[k] entry_SYSCALL_64_after_hwframe
- 31.12% 0.02% mbbsd [kernel.kallsyms]
[k] do_syscall_64
- 28.12% 0.42% mbbsd [kernel.kallsyms]
[k] do_raw_spin_lock
- 27.70% 27.62% mbbsd [kernel.kallsyms]
[k] queued_spin_lock_slowpath
- 18.73% __libc_fork
- 18.33% entry_SYSCALL_64_after_hwframe do_syscall_64
- _do_fork
- 18.33% copy_process.part.64
- 11.00% __vmalloc_node_range
- 10.93% sync_global_pgds_l4 do_raw_spin_lock queued_spin_lock_slowpath
- 7.27% mm_init.isra.59 pgd_alloc do_raw_spin_lock queued_spin_lock_slowpath
- 8.68% 0x41fd89415541f689
- __libc_start_main
- 7.49% main
- 0.90% main
This hit us pretty hard, with the service dropping below one-third of its original capacity.
With CONFIG_VMAP_STACK=n, the fork code path skips this, but other vmalloc users are still affected. One other area is the tty layer. This also causes problems for us since there can be as many as 15k users over SSH, some coming and going. So we got a lot of hung sshd processes as well. Unfortunately I don't have any perf reports or kernel logs to go with.
Now I understand that there is already a fix in -next:
https://lore.kernel.org/patchwork/patch/1137341/
However the code has changed a lot in mainline and I'm not sure how to backport this. For now I just reverted the commit by hand by removing the offending code. Seems to work OK, and based on the commit logs I guess it's safe to do so, as we're not running X86-32 or PTI.
Regards ChenYu