On Tue, Jul 05, 2022 at 11:53:22AM +0100, Kajetan Puchalski wrote:
On Mon, Jul 04, 2022 at 10:22:24AM +0100, Kajetan Puchalski wrote:
On Sat, Jul 02, 2022 at 10:56:51PM +0200, Florian Westphal wrote:
That would make sense. From further experiments I ran, it seems to be related to the number of workers spawned by stress-ng, along with the CPUs/cores involved.
For instance, running the test with <=25 workers (--udp-flood 25 etc.) results in the test running fine for at least 15 minutes.
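In case it helps reproduction, the passing and failing runs look roughly like this (only --udp-flood 25 is from my actual runs; the 30-worker count and the 15-minute timeout are illustrative assumptions):

  # hypothetical example invocations, not the exact commands I used:
  stress-ng --udp-flood 25 --timeout 15m   # runs fine for the full 15 minutes
  stress-ng --udp-flood 30 --timeout 15m   # eventually triggers the crash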
Ok. I will let it run for longer on the machines I have access to.
In the meantime, you could test the attached patch; it's a simple s/refcount_/atomic_/ in nf_conntrack.
If mainline (patch vs. HEAD 69cb6c6556ad89620547318439) crashes for you but works with the attached patch, someone who understands aarch64 memory ordering would have to look more closely at the refcount_XXX functions to see where they differ from the atomic_ ones.
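For anyone reading along without the attachment, the substitution is mechanical; a sketch of what it looks like for the conntrack use counter (written from memory against the 5.19 tree, so treat the hunks as an approximation of the attached patch, not the patch itself):

--- a/include/linux/netfilter/nf_conntrack_common.h
+++ b/include/linux/netfilter/nf_conntrack_common.h
@@ struct nf_conntrack @@
 struct nf_conntrack {
-	refcount_t use;
+	atomic_t use;
 };
@@ nf_conntrack_put @@
 static inline void nf_conntrack_put(struct nf_conntrack *nfct)
 {
-	if (nfct && refcount_dec_and_test(&nfct->use))
+	if (nfct && atomic_dec_and_test(&nfct->use))
 		nf_conntrack_destroy(nfct);
 }

plus the matching refcount_inc()/refcount_inc_not_zero() call sites elsewhere in nf_conntrack.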
I can confirm that the patch seems to solve the issue. With it applied on top of the 5.19-rc5 tag, the test runs fine for at least 15 minutes, which was not the case before, so it does look like an aarch64 memory ordering problem.
I'm CCing some people who should be able to help with aarch64 memory ordering; maybe they could take a look.
(re-sending due to a typo in CC, sorry for duplicate emails!)
Sorry, but I have absolutely no context here. We have a handy document describing the differences between atomic_t and refcount_t:
Documentation/core-api/refcount-vs-atomic.rst
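The headline difference for the case at hand, paraphrasing that document (the document itself is authoritative, not this summary; obj and free_obj below are placeholders):

/*
 * atomic_dec_and_test() is fully ordered: it acts as a full memory
 * barrier on both sides of the atomic RMW.
 *
 * refcount_dec_and_test() is weaker: it provides RELEASE ordering
 * before the RMW, plus ACQUIRE ordering on success. That is cheaper
 * on a weakly ordered architecture such as arm64, but it permits
 * reorderings that the fully ordered atomic op forbids.
 */
if (atomic_dec_and_test(&obj->ref))	/* fully ordered */
	free_obj(obj);			/* obj/free_obj: placeholders */

if (refcount_dec_and_test(&obj->ref))	/* RELEASE; ACQUIRE on success */
	free_obj(obj);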
What else do you need to know?
Will