Re: [PATCH net v3] net: stmmac: protect updates of 64-bit statistics counters

28 Feb 2024

Net maintainers, chiming in here, as it seems handling of this regression
has stalled.
On 13.02.24 16:52, Eric Dumazet wrote:
...
On Tue, Feb 13, 2024 at 4:26 PM Guenter Roeck <linux@roeck-us.net> wrote:
...
On Tue, Feb 13, 2024 at 03:51:35PM +0100, Eric Dumazet wrote:
...
On Tue, Feb 13, 2024 at 3:29 PM Jisheng Zhang <jszhang@kernel.org> wrote:
...
On Sun, Feb 11, 2024 at 08:30:21PM -0800, Guenter Roeck wrote:
...
On Sat, Feb 03, 2024 at 08:09:27PM +0100, Petr Tesarik wrote:
...
As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
u64_stats_sync must ensure mutual exclusion, or one seqcount update could
be lost on 32-bit platforms, thus blocking readers forever. Such lockups
have been observed in real world after stmmac_xmit() on one CPU raced with
stmmac_napi_poll_tx() on another CPU.
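
As an illustration of the lost update described above, here is a minimal
userspace sketch. It is only a model under stated assumptions: struct
model_seq and the helper names are hypothetical, and the plain load/store
pair stands in for the non-atomic sequence increment a writer performs on
32-bit platforms. It is not the kernel implementation.

```c
/* Userspace model only; names are hypothetical, not kernel APIs. */
struct model_seq {
	unsigned int sequence;	/* even: no writer active, odd: update in progress */
};

/* A writer's "sequence++" split into its load and store halves,
 * so that two writers can be interleaved by hand. */
unsigned int model_load(const struct model_seq *s)
{
	return s->sequence;
}

void model_store(struct model_seq *s, unsigned int v)
{
	s->sequence = v;
}

/* A reader retries for as long as the sequence is odd. */
int model_reader_would_spin(const struct model_seq *s)
{
	return s->sequence & 1;
}

/* Two writers, A and B, enter their update sections without mutual
 * exclusion. Both load the same value, so one increment is lost. */
unsigned int model_racing_writers(void)
{
	struct model_seq s = { 0 };
	unsigned int a, b;

	a = model_load(&s);		/* A begin: reads 0 */
	b = model_load(&s);		/* B begin: reads 0 */
	model_store(&s, a + 1);		/* A begin: writes 1 */
	model_store(&s, b + 1);		/* B begin: writes 1, A's increment lost */

	model_store(&s, model_load(&s) + 1);	/* A end: 1 -> 2 */
	model_store(&s, model_load(&s) + 1);	/* B end: 2 -> 3 */

	return s.sequence;		/* odd although no writer is active */
}

/* The same two updates, properly serialized, for comparison. */
unsigned int model_serialized_writers(void)
{
	struct model_seq s = { 0 };
	int i;

	for (i = 0; i < 4; i++)		/* begin, end, begin, end */
		model_store(&s, model_load(&s) + 1);

	return s.sequence;		/* even again */
}
```

In this model the racing case leaves the sequence odd with no writer
active, so a reader that waits for an even value never makes progress,
while the serialized case ends on an even value.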
To fix the issue without introducing a new lock, split the statistics into
three parts:

fields updated only under the tx queue lock,
fields updated only during NAPI poll,
fields updated only from interrupt context,

Updates to fields in the first two groups are already serialized through
other locks. It is sufficient to split the existing struct u64_stats_sync
so that each group has its own.
Note that tx_set_ic_bit is updated from both contexts. Split this counter
so that each context gets its own, and calculate their sum to get the total
value in stmmac_get_ethtool_stats().
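
The counter split described above can be sketched in userspace as follows.
This is a hedged illustration, not the patch itself: the structure and
field names are made up for the example, and the synchronization that each
group gets from its own struct u64_stats_sync is left out.

```c
#include <stdint.h>

/* Hypothetical model; names are illustrative, not the driver's fields. */
struct model_txq_stats {
	uint64_t xmit_set_ic_bit;	/* written only by the transmit path */
	uint64_t napi_set_ic_bit;	/* written only by the NAPI poll path */
};

/* Transmit context touches only its own counter. */
void model_xmit_account(struct model_txq_stats *s)
{
	s->xmit_set_ic_bit++;
}

/* NAPI poll context touches only its own counter. */
void model_napi_account(struct model_txq_stats *s, uint64_t n)
{
	s->napi_set_ic_bit += n;
}

/* The reporting side computes the total as the sum of both halves. */
uint64_t model_total_set_ic_bit(const struct model_txq_stats *s)
{
	return s->xmit_set_ic_bit + s->napi_set_ic_bit;
}
```

Because each counter has exactly one writing context, neither writer has
to exclude the other; only the reader combines the two values.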
For the third group, multiple interrupts may be processed by different CPUs
at the same time, but interrupts on the same CPU will not nest. Move fields
from this group to a newly created per-cpu struct stmmac_pcpu_stats.
Fixes: 133466c3bbe1 ("net: stmmac: use per-queue 64 bit statistics where necessary")
Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
Cc: stable@vger.kernel.org
Signed-off-by: Petr Tesarik <petr@tesarici.cz>
This patch results in a lockdep splat. Backtrace and bisect results attached.

[   33.736728] ================================
[   33.736805] WARNING: inconsistent lock state
[   33.736953] 6.8.0-rc4 #1 Tainted: G                 N
[   33.737080] --------------------------------
[   33.737155] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[   33.737309] kworker/0:2/39 [HC1[1]:SC0[2]:HE0:SE0] takes:
[   33.737459] ef792074 (&syncp->seq#2){?...}-{0:0}, at: sun8i_dwmac_dma_interrupt+0x9c/0x28c
[   33.738206] {HARDIRQ-ON-W} state was registered at:
[   33.738318]   lock_acquire+0x11c/0x368
[   33.738431]   __u64_stats_update_begin+0x104/0x1ac
[   33.738525]   stmmac_xmit+0x4d0/0xc58
Interesting lockdep splat...
stmmac_xmit() operates on txq_stats->q_syncp, while
sun8i_dwmac_dma_interrupt() operates on the per-cpu priv->xstats.pcpu_stats.
They are different syncp, so how does the lockdep splat happen?
Right, I do not see anything obvious yet.
Wild guess: I think it may be saying that due to
    inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.

the critical code may somehow be interrupted and, while handling the
interrupt, try to acquire the same lock again.
This should not happen, the 'syncp' are different. They have different
lockdep classes.
One is exclusively used from hard irq context.
The second one only used from BH context.
Alexis Lothoré hit this now as well, see yesterday's report in this
thread; apart from that nothing seems to have happened for two weeks now.
The change recently made it to some stable/longterm kernels, too. Makes
me wonder:
What's the plan forward here? Is this considered to be a false positive?
Or a real problem? Or a kind of situation along the lines of "that
commit should not cause the problem we are seeing, so it might have
exposed an older bug in the code, but nobody looked closer yet to check"?
Or something else?
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
