On Fri, 12 Jul 2024, Jeff Layton wrote:
On Mon, 2024-07-08 at 17:49 +0000, Chuck Lever III wrote:
On Jul 8, 2024, at 6:36 AM, Greg KH greg@kroah.com wrote:
On Sat, Jul 06, 2024 at 07:46:19AM +0000, Sherry Yang wrote:
On Jul 6, 2024, at 12:11 AM, Greg KH greg@kroah.com wrote:
On Fri, Jul 05, 2024 at 02:19:18PM +0000, Chuck Lever III wrote:
> On Jul 2, 2024, at 6:55 PM, Calum Mackay calum.mackay@oracle.com wrote: > > To clarify… > > On 02/07/2024 5:54 pm, Calum Mackay wrote: > > hi Petr, > > I noticed your LTP patch [1][2] which adjusts the nfsstat01 test on v6.9 kernels, to account for Josef's changes [3], which restrict the NFS/RPC stats per-namespace. > > I see that Josef's changes were backported, as far back as longterm v5.4, > > Sorry, that's not quite accurate. > > Josef's NFS client changes were all backported from v6.9, as far as longterm v5.4.y: > > 2057a48d0dd0 sunrpc: add a struct rpc_stats arg to rpc_create_args > d47151b79e32 nfs: expose /proc/net/sunrpc/nfs in net namespaces > 1548036ef120 nfs: make the rpc_stat per net namespace > > > Of Josef's NFS server changes, four were backported from v6.9 to v6.8: > > 418b9687dece sunrpc: use the struct net as the svc proc private > d98416cc2154 nfsd: rename NFSD_NET_* to NFSD_STATS_* > 93483ac5fec6 nfsd: expose /proc/net/sunrpc/nfsd in net namespaces > 4b14885411f7 nfsd: make all of the nfsd stats per-network namespace > > and the others remained only in v6.9: > > ab42f4d9a26f sunrpc: don't change ->sv_stats if it doesn't exist > a2214ed588fb nfsd: stop setting ->pg_stats for unused stats > f09432386766 sunrpc: pass in the sv_stats struct through svc_create_pooled > 3f6ef182f144 sunrpc: remove ->pg_stats from svc_program > e41ee44cc6a4 nfsd: remove nfsd_stats, make th_cnt a global counter > 16fb9808ab2c nfsd: make svc_stat per-network namespace instead of global > > > > I'm wondering if this difference between NFS client, and NFS server, stat behaviour, across kernel versions, may perhaps cause some user confusion?
As a refresher for the stable folken, Josef's changes make nfsstats silo'd, so they no longer show counts from the whole system, but only for NFS operations relating to the local net namespace. That is a surprising change for some users, tools, and testing.
I'm not clear on whether there are any rules/guidelines around LTS backports causing behavior changes that user tools, like nfsstat, might be impacted by.
The same rules that apply for Linus's tree (i.e. no userspace regressions.)
Given the current data we have, LTP nfsstat01[1] failed on LTS v5.4.278 because of kernel commit 1548036ef1204 ("nfs: make the rpc_stat per net namespace") [2]. Other LTS which backported the same commit are very likely troubled with the same LTP test failure.
The following are the LTP nfsstat01 failure output
======== network 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface network 1 TINFO: add local addr 10.0.0.2/24 network 1 TINFO: add local addr fd00:1:1:1::2/64 network 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface network 1 TINFO: add remote addr 10.0.0.1/24 network 1 TINFO: add remote addr fd00:1:1:1::1/64 network 1 TINFO: Network config (local -- remote): network 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1 network 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24 network 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64 <<<test_start>>> tag=veth|nfsstat3_01 stime=1719943586 cmdline="nfsstat01" contacts="" analysis=exit <<<test_output>>> incrementing stop nfsstat01 1 TINFO: timeout per run is 0h 20m 0s nfsstat01 1 TINFO: setup NFSv3, socket type udp nfsstat01 1 TINFO: Mounting NFS: mount -t nfs -o proto=udp,vers=3 10.0.0.2:/tmp/netpan-4577/LTP_nfsstat01.lz6zhgQHoV/3/udp /tmp/netpan-4577/LTP_nfsstat01.lz6zhgQHoV/3/0 nfsstat01 1 TINFO: checking RPC calls for server/client nfsstat01 1 TINFO: calls 98/0 nfsstat01 1 TINFO: Checking for tracking of RPC calls for server/client nfsstat01 1 TINFO: new calls 102/0 nfsstat01 1 TPASS: server RPC calls increased nfsstat01 1 TFAIL: client RPC calls not increased nfsstat01 1 TINFO: checking NFS calls for server/client nfsstat01 1 TINFO: calls 2/2 nfsstat01 1 TINFO: Checking for tracking of NFS calls for server/client nfsstat01 1 TINFO: new calls 3/3 nfsstat01 1 TPASS: server NFS calls increased nfsstat01 1 TPASS: client NFS calls increased nfsstat01 2 TINFO: Cleaning up testcase nfsstat01 2 TINFO: SELinux enabled in enforcing mode, this may affect test results nfsstat01 2 TINFO: it can be disabled with TST_DISABLE_SELINUX=1 (requires super/root) nfsstat01 2 TINFO: install seinfo to find used SELinux profiles nfsstat01 2 TINFO: loaded SELinux profiles: none
Summary: passed 3 failed 1 skipped 0 warnings 0 <<<execution_status>>> initiation_status="ok" duration=1 termination_type=exited termination_id=1 corefile=no cutime=11 cstime=16 <<<test_end>>> ltp-pan reported FAIL ========
We can observe the number of RPC client calls is 0, which is wired. And this happens from the kernel commit 57d1ce96d7655 ("nfs: make the rpc_stat per net namespace”). So now we’re not sure the kernel backport of nfs client changes is proper, or the LTP tests / userspace need to be modified.
If no userspace regression, should we revert the Josef’s NFS client-side changes on LTS?
This sounds like a regression in Linus's tree too, so why isn't it reverted there first?
There is a change in behavior in the upstream code, but Josef's patches fix an information leak and make the statistics more sensible in container environments. I'm not certain that should be considered a regression, but confess I don't know the regression rules to this fine a degree of detail.
If it is indeed a regression, how can we go about retaining both behaviors (selectable by Kconfig or perhaps administrative UI)?
I'd argue that the old behavior was a bug, and that Josef fixed it. These stats should probably have been made per-net when all of the original nfsd namespace work was done, but no one noticed until recently. Whoops.
A couple of hacky ideas for how we might deal with this:
1/ add a new line to the output of /proc/net/rpc/nfsd. It could just say "per-net\n" or "per-net <netns_id_number>\n" or something. nfsstat should ignore it, but LTP test could look for it and handle it appropriately. That could even be useful later for nfsstat too I guess.
2/ move the file to a new name and make the old filename be a symlink to the new one. nfsstat would still work, but LTP would be able to see whether it was a symlink to detect the difference...or could just make a new symlink that points to the file and LTP could look for its presence.
I don't think it makes sense to present a solution which requires LTP to be modified. If we are willing to modify LTP, then we should modify it to work with the per-net stats.
I think we need to create a new interface for the per-net stats, then deprecate the old interface and remove it in (say) 2 years. That given LTP time to update, and means that an old LTP won't give incorrect numbers, it will simply fail.
All we need to do is bikeshed the new interface. netlink ? /proc/net/rpc-pernet/nfsd ?
This means that we still need to keep the combined stats, or to combine all the per-net stats on each access.
NeilBrown