Re: [LTP] [PATCH 1/1] nfsstat01: Update client RPC calls for kernel 6.9

12 Jul 2024

      On Fri, 2024-07-12 at 15:45 +0200, Thorsten Leemhuis wrote:
...
On 08.07.24 19:49, Chuck Lever III wrote:
...
...
On Jul 8, 2024, at 6:36 AM, Greg KH greg@kroah.com wrote:
On Sat, Jul 06, 2024 at 07:46:19AM +0000, Sherry Yang wrote:
...
...
On Jul 6, 2024, at 12:11 AM, Greg KH greg@kroah.com wrote:
On Fri, Jul 05, 2024 at 02:19:18PM +0000, Chuck Lever III
wrote:
...
> On Jul 2, 2024, at 6:55 PM, Calum Mackay
> calum.mackay@oracle.com wrote:
> On 02/07/2024 5:54 pm, Calum Mackay wrote:
> > I noticed your LTP patch [1][2] which adjusts the
> > nfsstat01 test on v6.9 kernels, to account for Josef's
> > changes [3], which restrict the NFS/RPC stats per-
> > namespace.
> > I see that Josef's changes were backported, as far back
> > as longterm v5.4,
[...]
...
...
...
...
> I'm wondering if this difference between NFS client, and
> NFS server, stat behaviour, across kernel versions, may
> perhaps cause some user confusion?
As a refresher for the stable folken, Josef's changes make
nfsstats silo'd, so they no longer show counts from the
whole
system, but only for NFS operations relating to the local
net
namespace. That is a surprising change for some users,
tools,
and testing.
I'm not clear on whether there are any rules/guidelines
around
LTS backports causing behavior changes that user tools,
like
nfsstat, might be impacted by.
The same rules that apply for Linus's tree (i.e. no userspace
regressions.)
[...]
If no userspace regression, should we revert the Josef’s NFS
client-side changes on LTS?
This sounds like a regression in Linus's tree too, so why isn't
it
reverted there first?
There is a change in behavior in the upstream code, but Josef's
patches fix an information leak and make the statistics more
sensible in container environments. I'm not certain that
should be considered a regression, but confess I don't know
the regression rules to this fine a degree of detail.
Chuck pointed me to this thread (I had an eye on it already anyway)
and
asked for advice. Take everything I write here with a grain of salt,
as
this is somewhat tricky situation which makes it hard to predict how
Linus would actually want to see this handled. Maybe I should have
CCed
him, but I doubt he cares right now; but we maybe should bring him
in,
if an actual user complains.
With that out of the way, let me write a few thoughts:

That some test breaks is not a regression, as regressions are about

"practical issues", not some ABI/API changes that only some tests
care
about. So if it's just a test that broke update it.

If a user would reported something like "this change broke my app"

it
obviously would be something totally different. But that did not
happen
yet afaics -- or did it? But from the discussion it sounds like that
is
something that will likely happen down the road. If that's the case
I'd
say it's best to prevent that from happening.
I doubt anyone outside of automated testcases will ever notice this,
and if they do, then they probably want the new behavior. This was an
oversight when the nfs client and server were first containerized.
These stats should have been made per-net then, but never were.
Basically, the old "/proc/net/rpc/nfs{d}" files presented aggregate
stats for all of the nfsd's on the machine _and_ they presented the
same information no matter the net namespace you're in when reading
them.
Josef's patches changed it so that we collect this information on a
per-net namespace basis, and we only present the totals of the net
namespace reading the procfile.
There are 2 possibilities of breakage:
1) someone in a container expects to see stats for the entire host in
/proc/net/rpc/nfsd. This is a bug -- users in the container should
never have seen this in the first place. In practice, it's probably
pretty benign, but fixing it is the right thing to do.
2) someone in the init_net_ns reads the procfile and expects to see
global totals. Ok, this is a change, but I argue that it's a good one,
since it gives more immediate info about the server running in the
init_net_ns.
In practice most usage of these procfiles is pretty informal (mostly
acting as the source for nfsstat). Segregating this info by container
is the right outcome. It should have been done that way from the get-
go.
...

Not sure how Linus would react if a user would complain that some

workflow broke because rpc_stat are now per net namespace and shows
different numbers (e.g. using a format that does not break any apps).
It
would likely depend on the actual case and how bad he would consider
the
information leak.
...
If it is indeed a regression, how can we go about retaining
both behaviors (selectable by Kconfig or perhaps administrative
UI)?
That likely might be the best idea if user report an actual
regression
due to this. But switching the format of any existing file creates
quite
some trouble, as others already mentioned in this thread. So maybe
providing the newer format in a different file and allowing to
disable
the older one though a Kconfig setting might be the best way forward.
Sure, it would take years until people would have switched over, but
that's how it is with our "no regressions" rule.
Does that help?
Ciao, Thorsten
-- 
Jeff Layton jlayton@kernel.org

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [LTP] [PATCH 1/1] nfsstat01: Update client RPC calls for kernel 6.9