I was able to reproduce a crash on a 5.15.y kernel during copy-on-write (COW): the grandchild process attempts a write to a private page inherited from the child process, and that page contains an uncorrectable memory error. The reproduction steps are described in Tony's patch and use his ras-tools/einj_mem_uc. The patch series fixes the panic on 5.15.y.
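For readers unfamiliar with the access pattern, here is a minimal userspace sketch of the process layout only. It is not the reproducer: it injects no error (that part needs einj_mem_uc and EINJ support), and the helper name cow_write_in_grandchild() is hypothetical, not taken from ras-tools or the patches. On a healthy page the grandchild's write simply triggers an ordinary COW copy.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: parent -> child -> grandchild, where the grandchild
 * writes to a private anonymous page it inherited through two fork() calls.
 * No memory error is injected here, so the write just takes a normal COW fault.
 * Returns 0 if every process saw the expected page contents, -1 otherwise.
 */
int cow_write_in_grandchild(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    unsigned char *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* Touch the page so it is backed by a real frame before forking. */
    memset(page, 0xA5, (size_t)page_size);

    pid_t child = fork();
    if (child < 0) {
        munmap(page, (size_t)page_size);
        return -1;
    }

    if (child == 0) {
        /* Child: the page is now shared copy-on-write with the parent. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(2);

        if (grandchild == 0) {
            /* Grandchild: this store faults and makes the kernel copy the page. */
            page[0] = 0x5A;
            _exit(page[0] == 0x5A && page[1] == 0xA5 ? 0 : 1);
        }

        int gstatus;
        if (waitpid(grandchild, &gstatus, 0) < 0)
            _exit(2);
        /* The child's own view of the page must be unchanged. */
        int ok = WIFEXITED(gstatus) && WEXITSTATUS(gstatus) == 0 &&
                 page[0] == 0xA5;
        _exit(ok ? 0 : 1);
    }

    int status;
    int rc = -1;
    if (waitpid(child, &status, 0) == child &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0 && page[0] == 0xA5)
        rc = 0;

    munmap(page, (size_t)page_size);
    return rc;
}
```

Calling cow_write_in_grandchild() from a small main() should return 0 on a machine without injected errors. The crash reported above needs the extra step of poisoning the source page before the grandchild's write.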
What follows is the backport of Tony's patch series to stable 5.15 and stable 6.1. Both backports hit trivial conflicts due to missing dependencies; details are provided in each patch.
Please let me know whether the backport is acceptable.
Tony Luck (2):
  mm, hwpoison: try to recover from copy-on write faults
  mm, hwpoison: when copy-on-write hits poison, take page offline

 include/linux/highmem.h | 24 ++++++++++++++++++++++++
 include/linux/mm.h      |  5 ++++-
 mm/memory.c             | 33 +++++++++++++++++++++++----------
 3 files changed, 51 insertions(+), 11 deletions(-)