Testing a network device that has large numbers of bytes/packets may overflow. Using stats64 when comparing fixes this problem.
I tripped on this while iterating on a qstats patch for mlx5. See below for confirmation without my added code that this is a bug.
Before this patch (with added debugging output):
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec rstat: 481708634 qstat: 666201639514 key: tx-bytes not ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex
Note the huge delta above ^^^ in the rtnl vs qstats.
After this patch:
$ NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex
It looks like rtnl_fill_stats in net/core/rtnetlink.c will attempt to copy the 64bit stats into a 32bit structure which is probably why this behavior is occurring.
To show this is happening, you can get the underlying stats that the stats.py test uses like this:
$ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifi-index": 7}'
And examine the output (heavily snipped to show relevant fields):
'stats': { 'multicast': 3739197, 'rx-bytes': 1201525399, 'rx-packets': 56807158, 'tx-bytes': 492404458, 'tx-packets': 1200285371,
'stats64': { 'multicast': 3739197, 'rx-bytes': 35561263767, 'rx-packets': 56807158, 'tx-bytes': 666212335338, 'tx-packets': 1200285371,
The stats.py test prior to this patch was using the 'stats' structure above, which matches the failure output on my system.
Comparing side by side, rx-bytes and tx-bytes, and getting ethtool -S output:
rx-bytes stats: 1201525399 rx-bytes stats64: 35561263767 rx-bytes ethtool: 36203402638
tx-bytes stats: 492404458 tx-bytes stats64: 666212335338 tx-bytes ethtool: 666215360113
Note that the above was taken from a system with an mlx5 NIC, which only exposes ndo_get_stats64.
Based on the ethtool output and qstat output, it appears that stats.py should be updated to use the 'stats64' structure for accurate comparisons when packet/byte counters get very large.
To confirm that this was not related to the qstats code I was iterating on, I booted a kernel without my driver changes and re-ran the test which shows the qstats are skipped (as they don't exist for mlx5):
NETIF=eth0 tools/testing/selftests/drivers/net/stats.py KTAP version 1 1..4 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum # SKIP qstats not supported by the device ok 4 stats.qstat_by_ifindex # SKIP No ifindex supports qstats
But, fetching the stats using the CLI
$ ./cli.py --spec ../../../Documentation/netlink/specs/rt_link.yaml \ --do getlink --json '{"ifi-index": 7}'
Shows the same issue (heavily snipped for relevant fields only):
'stats': { 'multicast': 105489, 'rx-bytes': 530879526, 'rx-packets': 751415, 'tx-bytes': 2510191396, 'tx-packets': 27700323, 'stats64': { 'multicast': 105489, 'rx-bytes': 530879526, 'rx-packets': 751415, 'tx-bytes': 15395093284, 'tx-packets': 27700323,
Comparing side by side with ethtool -S on the unmodified mlx5 driver:
tx-bytes stats: 2510191396 tx-bytes stats64: 15395093284 tx-bytes ethtool: 17718435810
Fixes: f0e6c86e4bab ("testing: net-drv: add a driver test for stats reporting") Signed-off-by: Joe Damato jdamato@fastly.com --- tools/testing/selftests/drivers/net/stats.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py index 7a7b16b180e2..820b8e0a22c6 100755 --- a/tools/testing/selftests/drivers/net/stats.py +++ b/tools/testing/selftests/drivers/net/stats.py @@ -69,7 +69,7 @@ def pkt_byte_sum(cfg) -> None: return 0
for _ in range(10): - rtstat = rtnl.getlink({"ifi-index": cfg.ifindex})['stats'] + rtstat = rtnl.getlink({"ifi-index": cfg.ifindex})['stats64'] if stat_cmp(rtstat, qstat) < 0: raise Exception("RTNL stats are lower, fetched later") qstat = get_qstat(cfg)
Hello:
This patch was applied to netdev/net.git (main) by Paolo Abeni pabeni@redhat.com:
On Mon, 20 May 2024 23:58:43 +0000 you wrote:
Testing a network device that has large numbers of bytes/packets may overflow. Using stats64 when comparing fixes this problem.
I tripped on this while iterating on a qstats patch for mlx5. See below for confirmation without my added code that this is a bug.
Before this patch (with added debugging output):
[...]
Here is the summary with links: - [net] testing: net-drv: use stats64 for testing https://git.kernel.org/netdev/net/c/a61a459f5822
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org